Crystallography Labeling Using Machine Learning

Slides:



Advertisements
Similar presentations
MRI Image Segmentation for Brain Injury Quantification Lindsay Kulkin 1 and Bir Bhanu 2 1 Department of Biomedical Engineering, Syracuse University, Syracuse,
Advertisements

Lionel F. Lovett, II Jackson State University Research Alliance in Math and Science Computer Science and Mathematics Division Mentors: George Ostrouchov.
The Relativistic Quantum Field Calculator for Elementary Particle Interactions Brandon Green, Dr. Edward Deveney (Mentor) Department of Physics, Bridgewater.
Planning and Scheduling.  A job can be made up of a number of smaller tasks that can be completed by a number of different “processors.”  The processors.
Color Image Segmentation Mentor : Dr. Rajeev Srivastava Students: Achit Kumar Ojha Aseem Kumar Akshay Tyagi.
MESH Crystal Injectors
The Electromagnetic Spectrum High Harmonic Generation
Mathematical Derivation of Probability
Jonathan Donkers1, Lin Zhang1+
Measuring the Thickness of Cryosheet Jets
Madhu Karra1, Jason Koglin1,2
Sentiment Analysis and Data visualization Techniques of Experiments
Complications and Troubleshooting
Image Processing For Soft X-Ray Self-Seeding
Structure and Function: Ubiquitination of NEMO and Optineurin
Characterizing Optics with White-Light Interferometry
Matter in Extreme Conditions at LCLS:
Kevin Shimbo1, Matt Hayes+
X-Ray Shielding Introduction Application Research Conclusions
Crystallography images
Hutch 4 (XCS) Laser System
Anomaly Detection LCLS data: Introduction Research Conclusion
MAKING WAY FOR LCLS II HERRIOTT CELL IGNIS THE NEED FOR LCLS-II
A remote EPICS monitoring tool
EASE: Alert Management
Designing and Prototyping Optics Table and Shelves
Managing Diffraction Beam
System Design Ashima Wadhwa.
Integration of Blu-Ice into
Effects of Wrapping Vacuum
Scintillation Material Encapsulation Fixture
LCLS CXI Sample Chambers and Engineering Solutions
Values to Normalize Data
1. Welcome to Software Construction
Designing a molecular probe for detection
LAMP Vacuum System Documentation
G6PD Complex Structural Study Crystallization Setup
Your Name, Elias Catalano, Robert Sublett, Robin Curtis
Determining a Detector’s Saturation Threshold
HXRSD Helium Enclosure
IT Inventory and Optimization
Structural study of soluble ammonia monooxygenase
Error Analysis for Sparse Data Volume Visualization
Creation and development of a PRP database
IT Inventory and Optimization
Electro Optic Sampling of Ultrashort Mid-IR/THz Pulses
Margaret Koulikova1, William Colocho+
Project Implementation for ITCS4122
Massachusetts Institute of Technology
Analytical Solutions for Heat Distribution in KB Mirror Systems
Repeatability Test of Attocube Piezo Stages
Analytical Solutions for Temperature
Barron Montgomery1,2, Christopher O’Grady2, Chuck Yoon2, Mark Hunter2+
Combined Science (1-9): Astronomy Orbits and Theories of the Universe
Elizabeth Van Campen, Barrack Obama2, Stephen Curry1,2, Steve Kerr2+
Advanced Methods of Klystron Phasing
Chemistry Unit 7 Notes: Diffusion
Ryan Dudschus, Alex Wallace, Zachary Lentz
TED Talks – A Predictive Analysis Using Classification Algorithms
Jennifer Hoover1, William Colocho2
Multiple Instance Learning: applications to computer vision
X-Ray Spectrometry Using Cauchois Geometry For Temperature Diagnostics
Redesigning of The SED Calendars
Automatic compression control for the LCLS injector laser
Russell Edmonds1, Matt Hayes2+ 1LCLS Summer Research Intern
Prototype and Design of High-Speed X-Ray Chopper
Title of Your Poster Introduction Research Conclusions Acknowledgments
Reconstruction algorithms Challenges & Future work
Planning and Scheduling
Gaurab KCa,b, Zachary Mitchella,c and Sarat Sreepathia
Presentation transcript:

Crystallography Labeling Using Machine Learning Abdallah AbuHashem1, Chuck Yoon2+ 1Stanford University, 450 Serra Mall, Stanford, CA 94305, USA 2Linac Coherent Light Source, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025, USA. +Contact: yoon82@stanford.edu Introduction Labeling GUI and Use A GUI was developed to get use of the diffusion map algorithm and the constructed manifolds. The GUI consisted of two parts: First Part (Fig. 2) included the manifold, and few controlling buttons. Second Part (Fig. 3) showed the picked events and their labeled. It also allowed labeling them. In crystallography experiments, the size of data received is too big. In addition, the number of events/images is in the magnitude of hundreds of thousands. However, scientists usually are interested in a small part of the overall events: single hits and sometimes multiple hits. Already existing software, Psocake, helped in cutting down the number of events, yet it did not have an interface to classify them, so a GUI had to be developed. Moreover, the process of labeling was still slow, so to make it faster, a Machine Learning algorithm using the concept of diffusion maps was developed to construct manifolds out of the different events data. Keywords: Crystallography, Psocake, Labeling, Machine Learning, Diffusion Maps, GUI. Fig. 4 – User Provided Labels Research Diffusion Map Manifold Overview: Diffusion maps is a feature extraction algorithm. In this research, we use crystallography events data to construct manifolds (E.g. Fig. 1). The distances between points in the manifold represents the diffusion distances or similarity between different points/events. Fig. 2 – Part One of The GUI Fig. 5 – Propagated Labels Recommender System: In order to make the accuracy of the label propagation algorithm higher, a recommender system was implemented. The system would show 10 images at a time for the user to label. The shown images are the most confusing images for the propagating algorithm. Conclusions The results we got are really interesting, and can help scientists a lot. However, it was possible to do things in a better way. Getting use of the manifold’s labels and data in other runs or other experiments' manifolds. Although this is currently possible between small number of runs, it does not work on large numbers. The label propagating works at an acceptable accuracy of about 85%. However, a higher accuracy is always better. In addition, it would be better if it required less than 20% user labeled events. Fig. 3 – Part Two of The GUI How it works: A label propagating algorithm was developed based on the manifold distances. As soon as the user starts labeling, the algorithm starts spreading the labels. The spread labels are then shown on the manifold as different colors*. If the user found out that some of the spread labels are not spread correctly, he can correct them, and the spreading algorithm will fix the rest of the labels to achieve a better accuracy. Fig. 1 – Diffusion Map Manifold Exp. Amo06516 Runs 93-97 Result: From the manifold in Fig. 1, it is noticeable that there are two main clusters and a bunch of sparse points in the Euclidean space around the two clusters. Every cluster is dominated by a specific type of events. The one to the right has a majority of Single hit events, while the smaller cluster to the left has a majority of water or unknown hits, which are not important nor interesting for scientists. For the sparse points, they are not dominated by any type of hits, but most of multiple hits exist in these areas. Example: The developed algorithms were applied on different runs in experiment amo06516 (A crystallography experiment on virus PR772). The included Manifold shows runs 93-97. In Fig. 4, the colored* points represent user labeled events. Fig. 5 shows how it was propagated. The percentage of user provided labels is 20%. The number of points in the manifold is 1770, and the accuracy that was achieved is 87%. Acknowledgments Use of the Linac Coherent Light Source (LCLS), SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515. A special thanks to my mentor, Chuck Yoon, for providing countless advice and support on the coding and algorithm development process. *Red = Single, Green = Multiple, Blue = Water/Unknown Date: 09/07/2016