Mining and Visualization of Flow Cytometry Data
Angela Chin
University of Houston Research Experience for Undergraduates
July 3, 2013

Contents
1. Introduction to Flow Cytometry
2. The Problem
3. Current Approaches & Results
4. Future Work

Flow Cytometry
A medical technique used for cell counting and cell sorting.

How It Works
[Figure: how flow cytometry works; picture from Abcam]

Flow Cytometry Application
- Determine whether a person has B-cell lymphoma
- Based on the number of clusters that result from flow cytometry:
  - Two clusters: cancer patient
  - Three clusters: healthy individual

Example: Flow Cytometry Results
[Figure panels: Cancer Patient; Healthy Patient]

Problems with Current Methods
- The process for determining whether there are two or three clusters is manual
- Doctors' time could be better spent on other tasks

The Problem
Creating an automated method for determining the number of clusters

Past Approaches
- There are many clustering methods, but most need to know the number of clusters ahead of time
- The most popular is k-means, but it has some problems:
  - The number of clusters must be given to the algorithm beforehand
  - It has difficulty when clusters are close together, differ in size, etc.
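A minimal sketch of Lloyd's k-means in Python (standard library only) makes the first limitation concrete: `k` is an input, so the algorithm always partitions the data into exactly `k` groups whether or not that many clusters exist. The function name `kmeans` and its parameters are illustrative, not taken from the project.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on 2D points; the number of clusters k is an input."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    groups = []
    for _ in range(iters):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for j, group in enumerate(groups):
            if group:
                new_centers.append((sum(x for x, _ in group) / len(group),
                                    sum(y for _, y in group) / len(group)))
            else:
                new_centers.append(centers[j])  # leave an empty group's center in place
        if new_centers == centers:  # assignments can no longer change
            break
        centers = new_centers
    return centers, groups
```

Calling `kmeans(data, k=3)` on data that contains only two clusters still returns three centers, which is why k-means on its own cannot answer the two-versus-three question.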

Further Defining the Problem
We want to be able to determine the number of clusters when:
- The distance between clusters is very small
- The ratio of cluster sizes is large (100:1 to 1000:1)
We decided to further constrain the problem so that we only need to distinguish:
- 1 cluster vs. 2 clusters when the size ratio is up to 1000:1

Current Approaches & Results

Two Approaches
Approach #1: Transformation
- Find the center of the data
- For each point, compute its angle from the horizontal line through the center (new x-value) and its distance from the center (new y-value)
- Use the transformed data to determine the number of clusters
Approach #2: Testing Normal Fit
- Project the 2D data onto a line to create 1D data
- Fit a normal distribution to the 1D data
- Compare the Bayesian Information Criterion (BIC) of the fit to a cut-off limit
- If the BIC is above the limit, there are two clusters; otherwise, there is one
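Approach #2 (the second list above) can be sketched in Python as follows. The slides do not say which line the 2D data is projected onto or how the cut-off is chosen, so this sketch assumes the first principal axis (the direction of largest variance) and takes the cut-off as a caller-supplied, hypothetical `cutoff` argument; the function names are also illustrative. The BIC follows the common convention BIC = k ln n - 2 ln L, with k = 2 parameters (mean and variance).

```python
import math


def principal_axis(points):
    """Unit vector along the direction of largest variance of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the major eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def normal_fit_bic(values):
    """BIC = k ln n - 2 ln L for a maximum-likelihood normal fit (k = 2)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # MLE variance
    log_lik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * math.log(n) - 2 * log_lik


def looks_like_two_clusters(points, cutoff):
    """Project onto the principal axis, fit one normal, compare BIC to cutoff."""
    ux, uy = principal_axis(points)
    projected = [x * ux + y * uy for x, y in points]
    return normal_fit_bic(projected) > cutoff
```

For a single fitted normal the BIC equals 2 ln n + n(ln(2πσ²) + 1), so it rises with the variance σ² of the projected data (as a second cluster along the axis would cause) but also with the number of points n; a fixed cut-off is therefore only comparable across data sets of the same size.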

Approach #1: Transformation

Approach #1: Transformation Process
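The transformation step of Approach #1 can be sketched as a conversion to polar-style coordinates about the data center. This is a sketch under stated assumptions: the slides do not define the "center of the data", so the mean point is used here, and angles are reported in degrees in [0, 360); the name `to_angle_distance` is hypothetical.

```python
import math


def to_angle_distance(points):
    """Re-express 2D points as (angle from horizontal, distance) about the center."""
    n = len(points)
    # Assumption: the "center of the data" is taken to be the mean point.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    transformed = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        angle = math.degrees(math.atan2(dy, dx)) % 360.0  # new x-value, in [0, 360)
        distance = math.hypot(dx, dy)                     # new y-value
        transformed.append((angle, distance))
    return transformed, (cx, cy)
```

Because the new x-value is an angle that wraps from 360 back to 0, a cluster lying across the horizontal line to the right of the center would be split between both ends of the transformed x axis.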

Approach #1: Transformation

Approach #2: Testing Normal Fit
Example data: 3 standard deviations apart, ratio 1:99
[Figure panels: one-cluster best fits; two-cluster best fits]
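The one-cluster versus two-cluster comparison illustrated on this slide can be sketched by fitting a two-component Gaussian mixture to the projected 1D data with expectation-maximization (EM) and comparing its BIC with that of a single normal. This is an illustrative reimplementation rather than the project's code: the decile-based initialization, the variance floor, the iteration limits, and the rule "prefer the model with the lower BIC" are assumptions. The mixture has 5 free parameters (two means, two variances, one mixing weight) versus 2 for a single normal.

```python
import math
import random


def _log_normal(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def bic_one(xs):
    """BIC of a single maximum-likelihood normal (2 parameters)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return 2 * math.log(n) - 2 * sum(_log_normal(x, mu, var) for x in xs)


def _mixture_loglik(xs, w, mu, var):
    """Mixture log-likelihood and each point's responsibility for component 1
    (log-sum-exp keeps tiny densities from underflowing to zero)."""
    total, resp = 0.0, []
    for x in xs:
        a = math.log(w[0]) + _log_normal(x, mu[0], var[0])
        b = math.log(w[1]) + _log_normal(x, mu[1], var[1])
        m = max(a, b)
        lse = m + math.log(math.exp(a - m) + math.exp(b - m))
        total += lse
        resp.append(math.exp(b - lse))
    return total, resp


def bic_two(xs, iters=300, tol=1e-6):
    """BIC of a 2-component Gaussian mixture fitted by EM (5 parameters)."""
    n = len(xs)
    s = sorted(xs)
    mean = sum(xs) / n
    v0 = sum((x - mean) ** 2 for x in xs) / n
    floor = 1e-6 * v0  # assumed variance floor against component collapse
    w, mu, var = [0.5, 0.5], [s[n // 10], s[9 * n // 10]], [v0, v0]
    prev = -math.inf
    for _ in range(iters):
        log_lik, r = _mixture_loglik(xs, w, mu, var)  # E-step
        if log_lik - prev < tol:
            break
        prev = log_lik
        n1 = sum(r)
        n0 = n - n1
        if min(n0, n1) < 1e-9:  # one component has emptied out
            break
        # M-step: responsibility-weighted means, variances and mixing weights.
        mu = [sum((1 - ri) * x for ri, x in zip(r, xs)) / n0,
              sum(ri * x for ri, x in zip(r, xs)) / n1]
        var = [
            max(sum((1 - ri) * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n0, floor),
            max(sum(ri * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n1, floor),
        ]
        w = [n0 / n, n1 / n]
    log_lik, _ = _mixture_loglik(xs, w, mu, var)
    return 5 * math.log(n) - 2 * log_lik


if __name__ == "__main__":
    # A 99:1 example with the small cluster 3 standard deviations away.
    rng = random.Random(0)
    data = [rng.gauss(0, 1) for _ in range(990)] + [rng.gauss(3, 1) for _ in range(10)]
    print("one cluster:", bic_one(data), "two clusters:", bic_two(data))
```

Under the lower-BIC rule the second component is accepted only if it raises the log-likelihood by more than (5 - 2) ln n / 2. With a size ratio near 99:1 the small cluster contributes few points, so that gain can fall short of the penalty even when a second cluster is present, which illustrates why large size ratios make the problem hard.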

Approach #2: Testing Normal Fit
- Compared the BIC of the one-cluster fit versus the two-cluster fit
- All data was generated using points and the same standard deviations
- The ratio between cluster sizes and the distance between the two clusters (if applicable) were varied:
  - Ratios: 199:1 to 63:1
  - Distance: 1.5 to 5 standard deviations apart
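Synthetic test data of the kind described above can be generated as follows. The slide does not give the number of points or the exact layout, so `n_total`, the placement of both centers on the x axis, and the name `make_two_clusters` are hypothetical; `ratio` and `separation` correspond to the varied size ratio and distance (in standard deviations), and both clusters share the same standard deviation `sd`.

```python
import random


def make_two_clusters(n_total, ratio, separation, sd=1.0, seed=0):
    """Hypothetical generator: a large and a small 2D Gaussian cluster with the
    same standard deviation sd, sizes in the proportion ratio:1, and centers
    `separation` standard deviations apart along the x axis."""
    rng = random.Random(seed)
    n_small = max(1, round(n_total / (ratio + 1)))
    n_big = n_total - n_small
    points, labels = [], []
    for label, count, cx in ((0, n_big, 0.0), (1, n_small, separation * sd)):
        for _ in range(count):
            points.append((rng.gauss(cx, sd), rng.gauss(0.0, sd)))
            labels.append(label)  # 0 = large cluster, 1 = small cluster
    return points, labels
```

Calling it with `ratio=199, separation=1.5` and with `ratio=63, separation=5` covers the extremes of the listed ranges; one-cluster control cases, where no distance applies, could instead draw every point from a single normal.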


Future Work
- Approach #1: Determine whether the second cluster can be detected in the transformed data
- Approach #2: Use real data to see whether a cut-off can be determined
- Overall: After figuring out how to distinguish one cluster from two, extend the method to two versus three clusters

Limitations
- Assumes the data have a Gaussian distribution
- The number of clusters is limited to two or three

Acknowledgements
I would like to thank my research advisor, Dr. Stephen Huang, and Mitch Shih for their guidance on this project. I would also like to thank the University of Houston Computer Science Department and the National Science Foundation for providing me with the opportunity to participate in the REU.