Highlights from an ongoing NSF award on computational methods for exploring the geometry of large data sets
Gilad Lerman, University of Minnesota
Outline
1. Multiscale strip construction and applications to bioinformatics and imaging
2. The SCC algorithm for hybrid linear modeling and applications
3. Distances of functional protein domains and function-structure correlation
4. REU project: Detection of railroad cracks
5. REU project: Detection of cholesterol
Part 1 Multiscale strip construction and applications to bioinformatics and imaging
Curve and Strip Construction Goal: Estimate curve (light blue) and “strip” (green) around a main distribution of points (blue) while isolating “outliers” (red)
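As a rough illustration of the goal above, the following single-scale sketch fits a straight line, builds a strip around it from the median absolute deviation (MAD) of the residuals, and flags points outside the strip as outliers. It is not the multiscale construction presented here: the straight-line model, the function name `fit_strip`, the `width_factor` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_strip(x, y, width_factor=3.0, n_iter=10):
    """Fit a line and a strip around it; flag points outside the strip.

    Hypothetical single-scale stand-in, not the multiscale construction.
    """
    inliers = np.ones(len(x), dtype=bool)
    for _ in range(n_iter):
        # Least-squares line through the current inliers only.
        slope, intercept = np.polyfit(x[inliers], y[inliers], deg=1)
        resid = y - (slope * x + intercept)
        # Robust center and spread of all residuals
        # (MAD scaled by 1.4826 to match a Gaussian standard deviation).
        center = np.median(resid)
        sigma = 1.4826 * np.median(np.abs(resid - center))
        half_width = width_factor * sigma
        # The strip is centered on the median residual.
        new_inliers = np.abs(resid - center) <= half_width
        if np.array_equal(new_inliers, inliers):
            break
        inliers = new_inliers
    return slope, intercept, half_width, ~inliers

# Synthetic data: a noisy line plus a few one-sided (asymmetric) outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)
outlier_idx = np.arange(10, 200, 20)
y[outlier_idx] += 5.0

slope, intercept, half_width, flagged = fit_strip(x, y)
print(f"slope={slope:.3f}, half-width={half_width:.3f}, flagged={flagged.sum()}")
```

Using the median and MAD rather than the mean and standard deviation keeps the strip width from being inflated by the very outliers it is meant to isolate.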
Movie demonstrating construction
Application 1: Bioinformatics. Normalization of asymmetric ChIP-on-chip microarrays
Application 2: Imaging. Robust edge detection
Part 2 The SCC algorithm for hybrid linear modeling
Hybrid Linear Modeling Input: data sampled from lines, planes, etc. Output: clusters respecting the geometry
How SCC works: SCC maps the original data (left column) to new coordinates and segments the data in this space (middle column). It then recovers the clusters of the original data accordingly (right column).
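The map-then-segment idea can be sketched in a much simplified setting: two lines through the origin, with a pairwise affinity based on the angle between points. This is an assumption-laden stand-in and not the SCC affinity itself; the function name `spectral_two_lines`, the `power` parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def spectral_two_lines(points, power=8):
    """Split points lying near two lines through the origin into two clusters.

    Simplified stand-in: pairwise |cosine|**power affinity, not SCC's affinity.
    """
    # Project points to the unit sphere so that only direction matters.
    unit = points / np.linalg.norm(points, axis=1, keepdims=True)
    # Affinity is near 1 for points on the same line and small otherwise.
    affinity = np.abs(unit @ unit.T) ** power
    # Symmetric normalization D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order, so the last two
    # columns are the eigenvectors of the two largest eigenvalues.
    _, vecs = np.linalg.eigh(normalized)
    embedding = vecs[:, -2:]  # new coordinates of each point
    # Column 0 is the second-largest eigenvector; with two clusters
    # its sign separates them.
    labels = (embedding[:, 0] > 0).astype(int)
    return embedding, labels

# Synthetic data: 50 noisy points on each of two lines in 3D.
rng = np.random.default_rng(1)
n = 50
directions = [np.array([1.0, 0.0, 0.0]),
              np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)]
blocks = []
for u in directions:
    # Keep points away from the origin, where direction is ill-defined.
    t = rng.uniform(1.0, 2.0, n) * rng.choice([-1.0, 1.0], n)
    blocks.append(t[:, None] * u + 0.01 * rng.standard_normal((n, 3)))
points = np.vstack(blocks)

embedding, labels = spectral_two_lines(points)
print(labels[:n], labels[n:])
```

In the embedded coordinates the two lines collapse to two well-separated groups, which is why a very simple rule suffices for the segmentation step in this toy case.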
Application 1: separation of moving object from background Points on the moving object and the background correspond to blue and red planes respectively in each scenario. The separation is thus achieved by segmenting the two planes.
Application 2: segmentation of face images The facial images of different human subjects (e.g., 5, 8, 10) live on different planes in the image space. The separation is achieved by clustering the three planes.
Part 3 Distances of functional protein domains and function-structure relation
Function-Structure-Sequence The figure compares distances between “protein domains” (suggested in our work) with commonly used similarities of protein structure, sequence and phylogenetic information.
Functional Domain Graph The graph is formed according to the functional distances suggested in our work. The clusters it indicates are directly related to manually classified structures.
Part 4 REU Project: Detection of railroad cracks
Detecting Cracks in Railroads: This railroad track has tiny cracks. This damage needs to be identified quickly by a computer.
How to detect damaged rails? Traditionally: drive along the rail (very long) and inspect. It is very easy to miss defects (falling asleep...). New technology: taking pictures of the rails.
Demonstration: Original image | Detected edges. Changes in brightness across the image were used to detect cracks in the track.
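A minimal sketch of detecting edges from changes in brightness, assuming a grayscale image stored as a NumPy array: compute the brightness gradient and keep the pixels where its magnitude exceeds a threshold. The function name `detect_edges`, the threshold value, and the synthetic "rail" image are assumptions for illustration, not the project's actual pipeline.

```python
import numpy as np

def detect_edges(image, threshold=0.25):
    """Mark pixels where brightness changes sharply.

    Hypothetical sketch: gradient magnitude above a fixed threshold.
    """
    # Finite-difference brightness gradient along rows and columns.
    d_row, d_col = np.gradient(image.astype(float))
    magnitude = np.hypot(d_row, d_col)
    return magnitude > threshold

# Synthetic "rail": bright surface with a one-pixel-wide dark vertical crack.
rail = np.ones((20, 20))
rail[:, 10] = 0.0

edges = detect_edges(rail)
print(np.flatnonzero(edges.any(axis=0)))  # columns flagged as edges
```

In this toy image the flagged columns are the two that border the crack, since that is where brightness jumps between neighboring pixels.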
Part 5 REU Project: Detection of cholesterol in histology slices
Detection of cholesterol: This image of a blood vessel is used in testing the efficacy of new drugs. A computer is used to rapidly determine the amount of cholesterol in the image. Applied to a large database of images, this can be used to determine the efficacy of an anti-cholesterol drug. Images courtesy of Merck.
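One simple way to express "the amount of cholesterol in the image" is the fraction of vessel pixels classified as cholesterol. The sketch below assumes a grayscale image in which the vessel is brighter than the background and cholesterol is brighter still; the detection method actually used is not described here, so the function name `cholesterol_fraction`, both thresholds, and the synthetic image are hypothetical.

```python
import numpy as np

def cholesterol_fraction(image, vessel_threshold=0.3, cholesterol_threshold=0.7):
    """Return vessel mask, cholesterol mask, and cholesterol share of vessel area.

    Hypothetical sketch based on two fixed intensity thresholds.
    """
    vessel = image > vessel_threshold
    cholesterol = image > cholesterol_threshold
    if vessel.sum() == 0:
        # No vessel detected, so the fraction is defined as zero.
        return vessel, cholesterol, 0.0
    return vessel, cholesterol, cholesterol.sum() / vessel.sum()

# Synthetic slide: dark background, a 6x6 vessel, a 3x3 cholesterol deposit.
slide = np.zeros((10, 10))
slide[2:8, 2:8] = 0.5
slide[3:6, 3:6] = 0.9

vessel, cholesterol, fraction = cholesterol_fraction(slide)
print(f"cholesterol covers {fraction:.0%} of the vessel area")
```

Reporting a fraction of vessel area rather than a raw pixel count makes values comparable across images of different sizes, which matters when many images are compared.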
Actual detection: Detected blood vessel | Detected cholesterol | Original image
Collaborators:
Harvard University: B. Shakhnovich
Loram: Dhaval Daftari
Merck: Belma Dodgas
New York University: A. Blais, B. Dynlacht, B. Mishra
State University of New York, Stony Brook: J. McQuown
University of Minnesota: G. Chen, K. Heuton, T. Whitehouse