Leveraging Genetic Algorithm and Neural Networks in Automated Protein Crystal Recognition Ming Jack Po and Andrew Laine Department of Biomedical Engineering.

Presentation transcript:

Leveraging Genetic Algorithm and Neural Networks in Automated Protein Crystal Recognition
Ming Jack Po and Andrew Laine
Department of Biomedical Engineering, Columbia University, New York, NY, USA
August 22nd, 2008
IEEE EMBS Annual Conference 2008, Vancouver, Canada

Agenda
Introduction
Current Algorithm
Future Direction

Protein structure determination currently relies on X-ray crystallography
The production of protein crystals is crucial to protein structure determination via X-ray crystallography.
In 2000, the US National Institute of General Medical Sciences of the National Institutes of Health funded the Protein Structure Initiative (PSI), a ten-year project to uncover the three-dimensional shapes of a wide range of proteins.
Unfortunately, there is currently no reliable methodology for predicting the environments that lead to protein crystallization.
–High-throughput experiments with varying crystallization parameters are being performed in order to “brute force” the problem.

High-throughput (HTP) protein crystallization screening is currently the bottleneck in protein crystal discovery
An extensive backlog of images has accumulated.
–1536 wells/plate × 5K plates × 6 time points ≈ 46M images*
Manual inspection of images from HTP experiments is not practical.
–Qualified and trained crystallographers are in short supply.
–Crystallographers cannot keep up with the speed of the robotic systems used in production experiments.
Automated protein crystallization screening is needed to tackle both the existing backlog of images and future images.
* Feb 2002 to October 2006 only
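The backlog estimate above can be checked with a few lines of arithmetic. The figures are taken from the slide; the variable names are only illustrative.

```python
# Figures from the slide; names are illustrative, not from the authors' code.
wells_per_plate = 1536
plates = 5_000
time_points = 6

total_images = wells_per_plate * plates * time_points
print(f"{total_images:,} images (~{total_images / 1e6:.0f}M)")
```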

Several key challenges have to be overcome for automated protein crystal recognition
Arbitrary geometric orientation and structure of crystals
Presence of organic matter
Non-uniform lighting conditions
Irregular droplet boundaries and sizes

Our solution to the problem: neural networks
Advantages
–Allow for incremental learning
–Can deal with the seemingly arbitrary geometric orientation and structure of crystals
–Fast classification once the neural net has been trained
Disadvantages
–Black-box methodology
–A good feature set must be identified for good performance
–A sufficiently large training set is needed for robustness

Training database has been compiled by HWI expert crystallographers
Categories: Crystal; Precipitate; Precipitate & Skin; Clear; Skin; Phase Separation; Precipitate & Crystal; Unsure; Phase Separation & Precipitate; Phase Separation & Crystal
Dr. George DeTitta et al. at HWI (Buffalo, NY) have compiled a data set of 73,632 manually classified images.
–3 independent crystallographers each categorized 75,000 images into one of the above categories.
–73,632 of these images have consensus between at least two crystallographers. Only these images were used for validation and training.
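The "consensus between at least two crystallographers" rule can be sketched as a simple majority vote. This is a minimal illustration, not the HWI tooling; the function name, image identifiers and votes are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


# Hypothetical votes from three crystallographers for three images.
annotations = {
    "img_001": ["Crystal", "Crystal", "Precipitate"],
    "img_002": ["Clear", "Skin", "Unsure"],
    "img_003": ["Precipitate", "Precipitate", "Precipitate"],
}

# Keep only images on which at least two annotators agree.
training_set = {
    name: label
    for name, votes in annotations.items()
    if (label := consensus_label(votes)) is not None
}
print(training_set)
```

Images without a two-vote majority (here `img_002`) are simply left out of the training and validation pool.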


Pre-Processing Steps
Images are converted to Sobel edge sets, and isolated single edge points are removed.
A multi-population genetic algorithm (MPGA) is run on the image to find the ellipsoidal region of interest (ROI), as elaborated on the next few slides.
Pipeline: Image Normalization → MPGA 1 (ROI Detection) → MPGA 2 (Area of Crystals) → Linearity Detector → Laplacian Pyramidal Decomposition → Feature Extraction
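The first pre-processing step (Sobel edges, then removal of single edge points) might look roughly like the sketch below. It is plain Python on a list-of-lists image; the threshold, the 8-neighbour definition of "single edge point" and all names are assumptions, since the slide does not specify them.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def sobel_edges(img: Image, threshold: float) -> Mask:
    """Mark interior pixels whose Sobel gradient magnitude exceeds `threshold`."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


def remove_isolated(edges: Mask) -> Mask:
    """Drop edge pixels that have no other edge pixel among their 8 neighbours."""
    h, w = len(edges), len(edges[0])
    out = [row[:] for row in edges]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            has_neighbour = any(
                edges[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = False
    return out


# Toy image: dark left half, bright right half (a vertical step edge).
step = [[0.0, 0.0, 0.0, 10.0, 10.0, 10.0] for _ in range(5)]
edge_mask = remove_isolated(sobel_edges(step, threshold=20.0))
```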

Multiple Population Genetic Algorithm
Randomly select 100 “chromosomes” of 5 points each.
Fitness is based on a similarity metric and a distance metric.
–Similarity = [formula not preserved in transcript]
–Distance = [formula not preserved in transcript]
Evolution proceeds through selection and diversification.
–Optimize for a high fitness score based on a combination of the similarity and distance scores.
–Selection eliminates low-fitness populations.
–Diversification is realized through crossover, mutation and clustering.
Significant speed and accuracy improvements vs. Randomized Hough Transforms.
–Processing time dropped 50% to ~10 seconds for ROI detection.
Yao, J., Kharma, N., and Grogono, P., "A multi-population genetic algorithm for robust and fast ellipse detection", Pattern Analysis & Applications, Volume 8, Issue 1-2, Sep 2005, pp
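The selection, crossover and mutation loop described above can be sketched for a single population as follows. This is not the authors' or Yao et al.'s implementation: the multi-population and clustering parts are omitted, the similarity and distance formulas are not available in this transcript, and `toy_fitness`, the truncation selection scheme and all parameter values are hypothetical stand-ins.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]
Chromosome = Tuple[Point, ...]  # 5 edge points per chromosome, as on the slide


def evolve(edge_points: Sequence[Point],
           fitness: Callable[[Chromosome], float],
           pop_size: int = 100,
           generations: int = 30,
           mutation_rate: float = 0.1,
           seed: int = 0) -> Chromosome:
    """Evolve one population and return its fittest chromosome."""
    rng = random.Random(seed)
    pop: List[Chromosome] = [tuple(rng.sample(edge_points, 5))
                             for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half (assumed elitist truncation scheme).
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children: List[Chromosome] = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randint(1, 4)  # single-point crossover
            child = list(mum[:cut] + dad[cut:])
            if rng.random() < mutation_rate:
                # Mutation: replace one point with a random edge point.
                child[rng.randrange(5)] = rng.choice(edge_points)
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)


# Toy data: 12 integer points on a circle of radius 10 plus 12 clutter points.
circle = [(10, 0), (0, 10), (-10, 0), (0, -10), (6, 8), (8, 6),
          (-6, 8), (-8, 6), (6, -8), (8, -6), (-6, -8), (-8, -6)]
clutter = [(x, y) for x in (1, 3, 5) for y in (-2, 0, 2, 4)]
edge_points = circle + clutter


def toy_fitness(chrom: Chromosome) -> float:
    # Hypothetical score: penalise points that lie off the radius-10 circle.
    return -sum(abs(math.hypot(x, y) - 10.0) for x, y in chrom)


best = evolve(edge_points, toy_fitness)
print(best, toy_fitness(best))
```

Because the fitter half always survives, the best score in the population never decreases from one generation to the next. Crossover and mutation can produce repeated points within a chromosome; a fuller implementation would reject such candidates.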

Ellipsoidal Geometry
The equation of a conic through 5 points is [equation not preserved in transcript].
–This conic is an ellipse iff [condition not preserved in transcript].
With 5 (x, y) pairs, it is possible to solve for the parameters (a, h, b, g, f), and in turn for the physically related ellipsoidal parameters shown to the right on the slide.
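As an illustration of solving for (a, h, b, g, f) from five points, the sketch below assumes the standard general conic with the constant term normalised to 1, a·x² + 2h·xy + b·y² + 2g·x + 2f·y + 1 = 0, for which the ellipse condition is a·b − h² > 0. The slide's own equations are not preserved, so this form is an assumption chosen to match the five parameter names; the helper names are hypothetical. The normalisation breaks down for conics passing through the origin.

```python
from typing import List, Sequence, Tuple


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def conic_through(points: Sequence[Tuple[float, float]]) -> Tuple[float, ...]:
    """Return (a, h, b, g, f) of the assumed conic form through 5 points."""
    A = [[x * x, 2 * x * y, y * y, 2 * x, 2 * y] for x, y in points]
    return tuple(solve_linear(A, [-1.0] * 5))


def is_ellipse(a: float, h: float, b: float, g: float, f: float) -> bool:
    """Ellipse test for the assumed form: a*b - h^2 > 0."""
    return a * b - h * h > 0


# Five points on the ellipse (x - 3)^2 / 4 + (y - 2)^2 = 1.
pts = [(5.0, 2.0), (1.0, 2.0), (3.0, 3.0), (3.0, 1.0), (4.0, 2.0 + 0.75 ** 0.5)]
params = conic_through(pts)
print(params, is_ellipse(*params))
```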

MPGA is run twice with different fitness criteria
–Similarity = [formula not preserved in transcript]; Distance = [formula not preserved in transcript]
–The multi-population genetic algorithm gives significantly faster and more robust search results than the Randomized Hough Transform.
–MPGA 1 – ROI Detection
  Heavy distance penalties for points that do not line up exactly on the perimeter of the projected ellipse.
  Looks for r_maj close to r_min (more circular shapes: droplets, wells).
  r_maj and r_min are bounded at empirically determined values.
–MPGA 2 – Crystal Detection
  Only run inside the ROI.
  Heavy distance penalties only for far-away points, which allows the ellipsoidal shape to be more “flexible”.
  Looks for r_maj far from r_min (more elongated ellipses, closer to crystals).
  r_maj and r_min are bounded by no more than ½ of the ROI's r_maj and r_min.
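One hypothetical way to express the two shape preferences (near-circular for the ROI pass, elongated for the crystal pass) and the axis bounds is shown below. The scoring formula, the `PassConfig` structure and the numeric bounds are illustrative assumptions; the slide gives only the qualitative criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassConfig:
    name: str
    prefer_circular: bool  # True: r_maj close to r_min; False: r_maj far from r_min
    r_maj_max: float       # upper bound on the semi-major axis (pixels, assumed)
    r_min_max: float       # upper bound on the semi-minor axis (pixels, assumed)


def shape_score(r_maj: float, r_min: float, cfg: PassConfig) -> float:
    """Score in [0, 1]; 0 if the ellipse violates the pass's axis bounds."""
    if not (0 < r_min <= r_maj <= cfg.r_maj_max and r_min <= cfg.r_min_max):
        return 0.0
    circularity = r_min / r_maj  # 1.0 for a circle, near 0 when elongated
    return circularity if cfg.prefer_circular else 1.0 - circularity


def crystal_pass_for(roi_r_maj: float, roi_r_min: float) -> PassConfig:
    """Second pass: axes bounded by half of the ROI's axes, as on the slide."""
    return PassConfig("MPGA 2", False, roi_r_maj / 2, roi_r_min / 2)


# Hypothetical empirical bounds for the first pass.
roi_pass = PassConfig("MPGA 1", True, r_maj_max=200.0, r_min_max=200.0)
crystal_pass = crystal_pass_for(roi_r_maj=100.0, roi_r_min=95.0)

print(shape_score(100.0, 95.0, roi_pass))     # near-circular droplet
print(shape_score(40.0, 10.0, crystal_pass))  # elongated candidate inside the ROI
```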

Crystal Recognition Code Execution Speed
Pre-Processing: 12 s
–Background Normalization
–GA ROI Detection
–Laplacian Pyramidal Decomposition
Feature Extraction: 2 s
–Mean, Standard Deviation
–Skewness, Kurtosis
–Energy, Entropy
–Area*, Linearity*
Network Classifier: 0.5 s
–Feed-forward Network (log-sig)
Total: 14.5 s
* Not scale invariant, and done on original scale
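The six statistical features listed above could be computed along the following lines. The exact definitions used by the authors are not given, so this sketch assumes population moments for mean, standard deviation, skewness and kurtosis, and histogram-based energy (sum of squared bin probabilities) and Shannon entropy; the function name and bin count are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def histogram_features(values: Sequence[float], bins: int = 16) -> Dict[str, float]:
    """Moments of the values plus energy/entropy of their histogram."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3) if std else 0.0
    kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) if std else 0.0

    # Equal-width histogram; the top edge is folded into the last bin.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    probs = [c / n for c in counts.values()]

    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return {"mean": mean, "std": std, "skewness": skew,
            "kurtosis": kurt, "energy": energy, "entropy": entropy}


features = histogram_features([1.0, 2.0, 3.0, 4.0, 5.0], bins=5)
print(features)
```

In the pipeline on the slide these statistics would be taken over Laplacian pyramid coefficients rather than raw numbers, which is what makes them usable across scales.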

Performance of the current algorithm
Performance metrics were derived using a 10% randomized holdout, averaged over 3 iterations.
Current false negative rate ~10%.
–Working to reduce this to below 5% at minimum before putting it into production.*
–Current false negatives are total misses, so they cannot be corrected through thresholding. There is also no intuitive visual correlation.
Current true negative rate ~99%.
* Conversations with John Hunt
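A minimal sketch of the evaluation protocol (10% randomized holdout, averaged over 3 iterations) and of the two reported rates is given below. The `train_and_predict` callback, the sample layout and the toy data are hypothetical; the real classifier is the neural network described earlier.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, bool]  # (feature, is_crystal); a toy one-feature layout


def rates(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (false negative rate, true negative rate)."""
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    fnr = fn / (fn + tp) if fn + tp else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return fnr, tnr


def holdout_eval(samples: Sequence[Sample],
                 train_and_predict: Callable[[List[Sample], List[float]], List[bool]],
                 holdout: float = 0.1,
                 iterations: int = 3,
                 seed: int = 0) -> Tuple[float, float]:
    """Average FNR and TNR over repeated random holdout splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        k = max(1, int(len(shuffled) * holdout))
        test, train = shuffled[:k], shuffled[k:]
        preds = train_and_predict(train, [x for x, _ in test])
        results.append(rates([y for _, y in test], preds))
    n = len(results)
    return sum(r[0] for r in results) / n, sum(r[1] for r in results) / n


# Toy data and a trivial threshold "classifier" that ignores the training set.
data = [(float(i), i >= 50) for i in range(100)]
avg_fnr, avg_tnr = holdout_eval(data, lambda train, xs: [x >= 50 for x in xs])
print(avg_fnr, avg_tnr)
```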


Future Directions
Incremental neural network training has been implemented in Matlab.
–Allows us to learn new crystal shapes and precipitate. Negligible performance hit.
Porting the simulation portion of the network classifier to C++.
–The current program consists of:
  Preprocessing done in C++ inside the IT++ framework
  Neural network toolbox in Matlab
Currently working on making new training data sets.
–Selectively biasing the training data set in order to increase accuracy.
Expansion of feature sets in order to improve false negative rates.
Bishop, C. Neural Networks for Pattern Recognition.
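The idea of incremental training can be illustrated with a single log-sigmoid unit that keeps its weights between calls, so new examples refine the existing model instead of triggering a retrain from scratch. This is only a sketch under that assumption; it is not the Matlab toolbox implementation, and the class name, learning rate and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logsig(z: float) -> float:
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


class LogSigUnit:
    """One log-sigmoid neuron trained by online gradient descent."""

    def __init__(self, n_inputs: int) -> None:
        self.w: List[float] = [0.0] * n_inputs
        self.b = 0.0

    def predict(self, x: Sequence[float]) -> float:
        return logsig(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, batch: Sequence[Tuple[Sequence[float], float]],
                    lr: float = 0.5, epochs: int = 200) -> None:
        """Continue training from the current weights (cross-entropy gradient)."""
        for _ in range(epochs):
            for x, target in batch:
                err = self.predict(x) - target
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


unit = LogSigUnit(n_inputs=2)
unit.partial_fit([([0.0, 0.0], 0.0), ([1.0, 1.0], 1.0)])  # initial training set
w_before = list(unit.w)
unit.partial_fit([([2.0, 2.0], 1.0), ([-1.0, -1.0], 0.0)], epochs=50)  # new examples
print(unit.predict([1.0, 1.0]), unit.predict([0.0, 0.0]))
```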

Acknowledgements
This project is part of the Northeast Structural Genomics Consortium (NESG), sponsored by the NIH for evaluating the feasibility, costs, economies of scale, and value of structural genomics.
Protein crystal images were acquired from the Hauptman-Woodward Medical Research Institute, Buffalo, NY.