Published by Gabriel Jordan. Modified over 9 years ago.
1
The Impact of Big Data on Health Science Research
Vipin Kumar, University of Minnesota
kumar@cs.umn.edu, www.cs.umn.edu/~kumar
Delivery Science Summit, Mayo Clinic, 2015
2
Big Data Era
3
Big Data in Health Science
- EHR Data
- Genomics Data: SNP, Gene Expression, Protein Network, Mass Spectrometry
- Brain Imaging Data: fMRI Data, Diffusion Tensor Imaging, EEG Data, MEG Data, PET Data
- Mobile Health Data: Activity, Heart rate, Sleep monitoring, Glucose monitoring, Blood pressure
4
Electronic Health Records
- Big Data holds the potential for improving clinical quality and reducing healthcare costs
  - Understanding the Natural History of Disease
  - Risk Prediction / Biomarker
  - Quantifying the Effect of Intervention
  - Constructing Evidence Based Guidelines
  - Adverse Event Detection
- Many Challenges
  - Data is High-Dimensional, Sparse, Fragmented, often Missing/Censored
  - Questions of interest are complex
  - Need to integrate Expert Knowledge
5
Case study: Why does colonoscopy sometimes fail to prevent colon cancer?
- 2008 ACG/Olympus and ACG Presidential Award
- Plenary talk at Digestive Disease Week (DDW), 2009
- Joint work with Piet de Groen (Mayo Clinic)
- Focuses on Endoscopist and Withdrawal Time
- Makes use of Mayo Clinic Rochester Dataset
(Gupta et al. 2009)
6
Miss Rate for each Endoscopist
- 9.9% of cancers are missed
- Miss rate varies among endoscopists
- Not related to experience or withdrawal time
[Figure legend: Truly Missed, Probably missed, Seen & Removed]
7
Case study: Predicting Improvement of Mobility from Home Health Care Data
Outcome and Assessment Information Set Data (OASIS)
- Sample of 270,634 patient records (10/1/2008 – 12/31/2009) from 581 Medicare-certified home healthcare agencies
- Demographic, behavioral, pathological, psychosocial factors, outcome variables
[Figure labels: Home Healthcare; Diagnosis Codes (ICD-9); Admission Survey (OASIS); Discharge Survey (OASIS)]
Objectives
- Identify patterns of factors associated with improvement and no improvement in mobility within each group
8
Patterns Associated with Improvement and No-Improvement in Mobility Outcome
- The size of each circle represents the magnitude of the individual odds ratio (OR) of each variable present in the patterns.
- The larger the circle/node in the pattern, the more strongly the variable is associated with the outcome.
[Dey et al. Nur. res. 2014]
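The slide does not say how the ORs were computed. As a minimal sketch, assuming each variable is summarized in a 2x2 table (variable present or absent against improved or not improved), an odds ratio could be obtained as below. The function name and the counts are hypothetical and are not taken from the OASIS data.

```python
def odds_ratio(present_improved, present_not, absent_improved, absent_not):
    """Odds ratio from a 2x2 table of variable presence vs. mobility outcome.

    All four arguments are patient counts (hypothetical layout, assumed here).
    """
    if present_not == 0 or absent_improved == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    # Odds of improvement with the variable divided by odds without it.
    return (present_improved * absent_not) / (present_not * absent_improved)


# Illustrative counts only.
example_or = odds_ratio(30, 10, 20, 40)
```

An OR above 1 would point toward improvement and an OR below 1 toward no improvement, which is the magnitude a circle size could encode.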
9
Genomics Data
- Driven by advances in high-throughput technologies
- Holds great promise for revolutionizing practice of medicine
  - Determining predisposition to a disease
  - Personalized medicine
- Data types: SNP, Gene Expression, Protein Network, Mass Spectrometry
10
NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/
Published Genome-Wide Associations through 12/2013
Published GWA at p ≤ 5 × 10^-8 for 17 trait categories
11
Missing heritability
- Most have modest effect sizes
- Limited overall impact even when combined
- Marked disparity with extent of overall familial aggregation
Missing heritability:
- Rare variations
- Combinatorial biomarkers
- Novel types of (epi)genetic variations
(Brendan Maher, 2008; McCarthy et al. 2008; Manolio et al. 2009; Eichler et al. 2010)
12
Association Analysis for Discovering Interesting Combinations
- Anti-monotonic upper bound
- Disqualified: prune all the supersets [Agrawal et al. 1994]
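The pruning idea on this slide can be sketched with a generic level-wise frequent-itemset search in the style of Agrawal et al. Support is anti-monotone (a superset can never be more frequent than its subsets), so once a combination is disqualified, none of its supersets needs to be counted. This is an illustrative sketch only, not the speaker's implementation; the function name and the toy transactions are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search that never counts a superset of a disqualified itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: single items that meet the support threshold.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {s: support(s) for s in current}

    k = 2
    while current:
        # Size-k candidates are unions of frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Anti-monotone pruning: drop a candidate if any (k-1)-subset is infrequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        result.update({c: support(c) for c in current})
        k += 1
    return result


# Toy data with hypothetical markers A-D.
toy = [{"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"C", "D"}]
found = frequent_itemsets(toy, min_support=0.5)
```

In the toy data, D is infrequent, so no combination containing D is ever generated; that is the "prune all the supersets" step.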
13
Case study: Combinatorial markers for Lung Cancer
[Figure values: ≈ 60%, ≈ 10%]
- Selected for highlight talk, RECOMB SB 2010
- Best Network Model award, Sage Congress, 2010
[Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
14
Big Data in Human Neuroscience
Data types: fMRI Data, Diffusion Tensor Imaging, EEG Data, MEG Data, Gene Exp., PET Data
An increasing amount of neuroimaging data is available
- E.g., a single 20 min. fMRI scan is 6GB in size
- 8,000 MRI datasets available in public domain (Poldrack et al. 2014)
- Human Connectome Project (HCP)
- Alzheimer’s Disease Neuroimaging Initiative (ADNI)
Healthcare questions
- How can we automate diagnosis using imaging data?
- How can we predict the disease in advance?
- How can we study effectiveness of treatment methodologies?
15
Brain Connectivity: Healthy vs. Disease
[Figure labels: fMRI scan; Atlas, 90 brain regions; axes: time, regions; Brain Network (Lynall et al. 2010)]
What are the key differences in networks between two groups?
16
Healthy vs. Disease – univariate testing
[Figure labels: Correlation matrix, regions, edges, Healthy, schizophrenia]
- Several studies have reported connections associated with schizophrenia in the last decade
- Are these reported connections consistent across studies?
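As a rough sketch of what univariate edge testing can look like: each subject's region time series are turned into pairwise correlations (one value per edge), and every edge is then compared between groups on its own. The choice of Pearson correlation and Welch's t statistic is an assumption made for illustration; the slides do not name the statistic used, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def edge_values(region_series):
    """Map each region pair (i, j), i < j, to the correlation of their series."""
    n = len(region_series)
    return {(i, j): pearson(region_series[i], region_series[j])
            for i in range(n) for j in range(i + 1, n)}


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def univariate_edge_tests(healthy, patients):
    """One t statistic per edge; each group is a list of per-subject region series."""
    h = [edge_values(s) for s in healthy]
    p = [edge_values(s) for s in patients]
    return {e: welch_t([s[e] for s in h], [s[e] for s in p]) for e in h[0]}
```

With the 90-region atlas from the previous slide this gives 90 × 89 / 2 = 4005 edges, and therefore 4005 separate tests, each of which needs correction for multiple comparisons.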
17
Inconsistent findings reported in independent studies
18
Case study: A Big Data Approach
[Figure labels: edges, subjects; Cluster edges (cluster1, cluster2, cluster3, ...); Test significance]
Strengths:
- Handles redundancy, small number of tests, statistical power
- Handles noise by averaging connectivity across multiple edges
- Could potentially increase reliability
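The averaging and testing steps can be sketched as follows, assuming the edge clusters have already been found by some clustering step (not shown, and not specified on the slide). Each subject gets one score per cluster, the mean connectivity over that cluster's edges, so the number of tests equals the number of clusters rather than the number of edges. Welch's t statistic, the function names, and the edge and cluster labels are illustrative assumptions, not the method of the cited work.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def cluster_scores(subject_edges, clusters):
    """Average each subject's connectivity over the edges of each cluster."""
    return {name: [mean(s[e] for e in edges) for s in subject_edges]
            for name, edges in clusters.items()}


def cluster_tests(healthy, patients, clusters):
    """One test statistic per cluster instead of one per edge."""
    h = cluster_scores(healthy, clusters)
    p = cluster_scores(patients, clusters)
    return {name: welch_t(h[name], p[name]) for name in clusters}


# Hypothetical per-subject edge values and a hypothetical cluster assignment.
healthy = [{"e1": 0.8, "e2": 0.6, "e3": 0.1}, {"e1": 0.6, "e2": 0.7, "e3": 0.3}]
patients = [{"e1": 0.2, "e2": 0.4, "e3": 0.2}, {"e1": 0.4, "e2": 0.0, "e3": 0.0}]
clusters = {"cluster1": ["e1", "e2"], "cluster2": ["e3"]}
stats = cluster_tests(healthy, patients, clusters)
```

Averaging over several redundant edges reduces edge-level noise, which is the reliability argument made in the strengths list.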
19
Case study: A Big Data Approach
[Figure labels: edges, subjects; Cluster edges (cluster1, cluster2, cluster3, ...); Test significance]
Clusters of connections
- Thalamus and primarily striate visual regions
- Thalamus and lateral visual regions
- Thalamus and lateral temporal regions
20
Studying Dynamics in Brain Networks
- Resting state connectivity (Atluri et al. SDM 2014)
- Resting state vs. Watching cartoons (Atluri et al. 2014)
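One common way to look at connectivity that changes over time is a sliding-window correlation between two region time series. The sketch below shows that generic technique only; it is not the algorithm of the cited Atluri et al. papers, and the function names and toy series are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def sliding_correlation(x, y, window, step=1):
    """Correlation of x and y inside each window of the given length."""
    return [pearson(x[s:s + window], y[s:s + window])
            for s in range(0, len(x) - window + 1, step)]


# Toy series: the two signals move together early on and in opposition later.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 3, 2, 1]
windows = sliding_correlation(x, y, window=3)
```

A single correlation over the whole scan would hide this change, because the early agreement and the later opposition partly cancel.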
21
Studying Dynamics
- Dynamic Brain Connectivity (Atluri et al. SDM 2014)
- Resting state vs. Watching cartoons (Atluri et al. 2014)
Expeditions in Computing: Understanding Climate Change - A Data Driven Approach
- 5-year, $10 Million project
- Leverages the wealth of climate and ecosystem data
- Opportunities for cross-fertilization among diverse domains with space-time data such as Climate, Bio-images, Taxi data, Astronomy, Precision agriculture
22
mHealth
- Rapid growth in wearable devices market
- Collects a variety of variables: Respiration, Heart rate, Activity, Sleep, Glucose, Blood pressure, Audio-visual samples
Healthcare questions:
- Can we predict psychological state (e.g., stress, anger)?
- Can we assess lifestyle choices (e.g., smoking and diet)?
- Can we study the relapse patterns in treating addiction?
- Can we deliver telemedicine at the right time?
23
- MD2K is one of 11 national NIH Big Data Centers of Excellence
- Part of Big-Data-to-Knowledge (BD2K) Initiative
- Collaborative effort between diverse areas: Computer Science, Engineering, Medicine, Behavioral Science, and Statistics
Advancing Biomedical Discovery and Improving Health through Mobile Sensor Big Data
Autosense Project
- mConverse: Inferring conversations (Rahman et al. WirelessHealth’11)
- mPuff: Detecting smoking episodes (Ali et al. IPSN’12)
- mStress: Identifying stress (Raij et al. WirelessHealth’11)
24
Conclusion
- Huge opportunities for application of Big Data to Healthcare Research
- Many Challenges
  - Data not in the form to be used by readily available big data technologies
  - Translating healthcare questions to data science questions
  - Many health specific data science questions require advances in big data analytics
25
References
Atluri, G., Steinbach, M., Lim, K. O., Kumar, V., & MacDonald, A. (2015). Connectivity cluster analysis for discovering discriminative subnetworks in schizophrenia. Human brain mapping, 36(2), 756-767.
Atluri, G., Steinbach, M., Lim, K. O., MacDonald III, A., & Kumar, V. Discovering Groups of Time Series with Similar Behavior in Multiple Small Intervals of Time. SDM 2014.
Atluri, G., Steinbach, M., Lim, K., MacDonald, A., & Kumar, V. (2014). Discovering the Longest Set of Distinct Maximal Correlated Intervals in Time Series Data. University of Minnesota, Tech-report 14-025, 2014.
Rahman, M. M., Ali, A. A., Plarre, K., al'Absi, M., Ertin, E., & Kumar, S. (2011, October). mConverse: inferring conversation episodes from respiratory measurements collected in the field. In Proceedings of the 2nd Conference on Wireless Health (p. 10). ACM.
Ali, A. A., Hossain, S. M., Hovsepian, K., Rahman, M. M., Plarre, K., & Kumar, S. (2012, April). mPuff: automated detection of cigarette smoking puffs from respiration measurements. In Proceedings of the 11th international conference on Information Processing in Sensor Networks (pp. 269-280). ACM.
Raij, A., Blitz, P., Ali, A. A., Fisk, S., French, B., Mitra, S., ... & Smailagic, A. (2010). mstress: Supporting continuous collection of objective and subjective measures of psychosocial stress on mobile devices. ACM Wireless Health 2010 San Diego, California USA.
Dey, Sanjoy, et al. "Mining Patterns Associated With Mobility Outcomes in Home Healthcare." Nursing research 64.4 (2015): 235-245.
Gupta, Rohit, et al. "284 Colorectal Cancer Despite Colonoscopy: Critical Is the Endoscopist, Not the Withdrawal Time." Gastroenterology 136.5 (2009): A-55.
Simon, Gyorgy J., et al. "Survival association rule mining towards type 2 diabetes risk assessment." AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
Li, Dingcheng, et al. "Using Association Rule Mining for Phenotype Extraction from Electronic Health Records." AMIA Summits on Translational Science Proceedings 2013 (2013): 142.
Schrom, John R., et al. "Quantifying the effect of statin use in pre-diabetic phenotypes discovered through association rule mining." AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
Pandey, Gaurav, et al. "An association analysis approach to biclustering." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
Kim, Hye Soon, et al. "Comorbidity study on type 2 diabetes mellitus using data mining." The Korean journal of internal medicine 27.2 (2012): 197-202.
Shin, A. Mi, et al. "Diagnostic analysis of patients with essential hypertension using association rule mining." Healthcare informatics research 16.2 (2010): 77-81.
Fang, Gang, et al. "Subspace differential coexpression analysis: problem definition and a general approach." Pacific symposium on biocomputing. Vol. 15. 2010.