The Impact of Big Data on Health Science Research Vipin Kumar University of Minnesota Delivery Science Summit, Mayo Clinic, 2015

Presentation transcript:

The Impact of Big Data on Health Science Research
Vipin Kumar, University of Minnesota
Delivery Science Summit, Mayo Clinic, 2015

Big Data Era

Big Data in Health Science
- EHR Data
- Genomics Data: SNP, Gene Expression, Protein Network, Mass Spectrometry
- Brain Imaging Data: fMRI Data, Diffusion Tensor Imaging, EEG Data, MEG Data, PET Data
- Mobile Health Data: Activity, Heart rate, Sleep monitoring, Glucose monitoring, Blood pressure

Electronic Health Records
- Big Data holds the potential for improving clinical quality and reducing healthcare costs
  - Understanding the Natural History of Disease
  - Risk Prediction / Biomarker
  - Quantifying the Effect of Intervention
  - Constructing Evidence Based Guidelines
  - Adverse Event Detection
- Many Challenges
  - Data is High-Dimensional, Sparse, Fragmented, often Missing/Censored
  - Questions of interest are complex
  - Need to integrate Expert Knowledge

Case study: Why does colonoscopy sometimes fail to prevent colon cancer?
- ACG/Olympus and ACG Presidential Award
- Plenary talk at Digestive Disease Week (DDW)
- Joint work with Piet de Groen (Mayo Clinic)
- Focuses on Endoscopist and Withdrawal Time
- Makes use of Mayo Clinic Rochester Dataset
Gupta et al.

Miss Rate for each Endoscopist
- 9.9% of cancers are missed
- Varies among endoscopists
- Not related to experience or withdrawal time
[Figure: miss rate per endoscopist; legend: Truly Missed, Probably missed, Seen & Removed]

Case study: Predicting Improvement of Mobility from Home Health Care Data
Outcome and Assessment Information Set Data (OASIS)
- Sample of 270,634 patient records (10/1/2008 – 12/31/2009) from 581 Medicare-certified home healthcare agencies
- Data sources: Diagnosis Codes (ICD-9), Admission Survey (OASIS), Discharge Survey (OASIS)
- Home healthcare variables: demographic, behavioral, pathological, and psychosocial factors; outcome variables
Objectives
- Identify patterns of factors associated with improvement and no improvement in mobility within each group

Patterns Associated with Improvement and No-Improvement in Mobility Outcome
- The size of each circle represents the magnitude of the individual odds ratio (OR) of a variable present in the pattern.
- The larger the circle/node in the pattern, the more strongly the variable is associated with the outcome.
[Dey et al., Nurs. Res. 2014]
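The node sizes on this slide encode odds ratios. As a minimal sketch of how a single OR and an approximate 95% confidence interval could be computed from a 2x2 table: the counts, the function name, and the choice of the log (Woolf) interval are assumptions for illustration and are not taken from Dey et al.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: factor present, improved      b: factor present, not improved
    c: factor absent, improved       d: factor absent, not improved
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    or_value = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper

# Hypothetical counts, not from the OASIS data.
or_value, lower, upper = odds_ratio(120, 80, 90, 110)
print(f"OR = {or_value:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An OR above 1 with an interval that excludes 1 would correspond to a larger node associated with improvement in a plot of this kind.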

Genomics Data
- Data types: SNP, Gene Expression, Protein Network, Mass Spectrometry
- Driven by advances in high-throughput technologies
- Holds great promise for revolutionizing the practice of medicine
  - Determining predisposition to a disease
  - Personalized medicine

NHGRI GWA Catalog
- Published Genome-Wide Associations through 12/2013
- Published GWA at p ≤ 5 × 10^-8 for 17 trait categories

Missing heritability
- Most have modest effect sizes
- Limited overall impact even when combined
- Marked disparity with extent of overall familial aggregation
- Missing heritability: rare variations, combinatorial biomarkers, novel types of (epi)genetic variations
Brendan Maher, 2008; Manolio et al.; Eichler et al.; McCarthy et al.; Manolio et al.

Association Analysis for Discovering Interesting Combinations
- Anti-monotonic upper bound
- Disqualified: prune all the supersets [Agrawal et al. 1994]
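The pruning idea on this slide can be sketched with a small Apriori-style search. Support is anti-monotonic: an itemset can never be more frequent than any of its subsets, so once an itemset is disqualified, none of its supersets needs to be counted. The transactions, marker labels, and threshold below are hypothetical, and this is a plain frequent-itemset sketch rather than the specific measure used in the talk.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join: build size-k candidates from frequent (k-1)-itemsets only.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: discard any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical samples, each listing the markers observed in it.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g3"},
    {"g2", "g3"},
    {"g1", "g2", "g3"},
    {"g1", "g4"},
]
frequent_sets = apriori(samples, min_support=3)
for itemset, support in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Because "g4" appears in only one sample, it is disqualified at the first level and no combination containing it is ever generated or counted.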

Case study: Combinatorial markers for Lung Cancer
[Figure values: ≈ 60%, ≈ 10%]
- Selected for highlight talk, RECOMB SB 2010
- Best Network Model award, Sage Congress, 2010
[Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]

Big Data in Human Neuroscience
- Data types: fMRI Data, Diffusion Tensor Imaging, EEG Data, MEG Data, Gene Exp., PET Data
- Increasing amount of neuroimaging data is available
  - E.g., a single 20 min. fMRI scan is 6 GB in size
  - 8,000 MRI datasets available in public domain (Poldrack et al. 2014)
  - Human Connectome Project (HCP)
  - Alzheimer’s Disease Neuroimaging Initiative (ADNI)
- Healthcare questions
  - How can we automate diagnosis using imaging data?
  - How can we predict the disease in advance?
  - How can we study effectiveness of treatment methodologies?

Brain Connectivity: Healthy vs. Disease
[Figure: fMRI scan (axes: time, regions); Atlas with 90 brain regions; Brain Network]
- What are the key differences in networks between two groups?
Lynall et al.

Healthy vs. Disease: univariate testing
[Figure: correlation matrix over regions; edges compared between schizophrenia and healthy groups]
- Several studies have reported connections associated with schizophrenia in the last decade
- Are these reported connections consistent across studies?
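A minimal sketch of the edge-wise pipeline this slide describes, assuming each subject is represented by one time series per brain region: every pair of regions yields one edge (a Pearson correlation), and each edge is then compared between groups on its own. The toy series, group values, and function names are hypothetical, and Welch's t statistic stands in for whichever test the individual studies used.

```python
import math
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def edge_correlations(region_series):
    """Map each region pair (i, j), i < j, to its correlation for one subject."""
    n = len(region_series)
    return {
        (i, j): pearson(region_series[i], region_series[j])
        for i in range(n) for j in range(i + 1, n)
    }

def n_edges(n_regions):
    """Number of distinct region pairs, i.e. tests in an edge-wise analysis."""
    return n_regions * (n_regions - 1) // 2

def welch_t(group_a, group_b):
    """Welch's t statistic for one edge across two groups of subjects."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Toy subject with three regions and five time points (hypothetical values).
toy_subject = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
edges = edge_correlations(toy_subject)

# One edge, measured in three healthy and three patient subjects (hypothetical).
t_stat = welch_t([0.5, 0.6, 0.7], [0.2, 0.3, 0.4])

print(edges)
print(n_edges(90), "edges for a 90-region atlas")
print("t =", round(t_stat, 3))
```

With a 90-region atlas there are 4,005 edges, so an edge-wise analysis runs thousands of tests on typically small samples, which is one plausible reason reported connections differ across studies.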

Inconsistent findings reported in independent studies

Case study: A Big Data Approach
[Figure: edges per subject; cluster edges into cluster1, cluster2, cluster3; test significance]
Strengths:
- Handles redundancy, small number of tests, statistical power
- Handles noise by averaging connectivity across multiple edges
- Could potentially increase reliability

Case study: A Big Data Approach
[Figure: edges per subject; cluster edges into cluster1, cluster2, cluster3; test significance]
Clusters of connections:
- Thalamus and primarily striate visual regions
- Thalamus and lateral visual regions
- Thalamus and lateral temporal regions
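The cluster-then-test idea can be sketched as follows, assuming the edge clusters have already been found by some clustering step that is not shown here. Connectivity is averaged over the edges in a cluster for each subject, and only the cluster-level averages are compared between groups. The data, cluster assignments, and the use of a permutation test are assumptions for illustration, not the procedure from Atluri et al.

```python
import random
from statistics import mean

def cluster_means(edge_values, clusters):
    """Average one subject's edge values within each cluster of edges."""
    return {cid: mean(edge_values[e] for e in members)
            for cid, members in clusters.items()}

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical clusters of edges and one subject's edge correlations.
clusters = {"cluster1": [(0, 1), (0, 2)], "cluster2": [(1, 2)]}
subject_edges = {(0, 1): 0.2, (0, 2): 0.4, (1, 2): 0.9}
subject_summary = cluster_means(subject_edges, clusters)

# Hypothetical cluster1 averages for six healthy subjects and six patients.
healthy = [0.61, 0.58, 0.65, 0.60, 0.63, 0.59]
patients = [0.41, 0.45, 0.39, 0.44, 0.40, 0.43]
p_value = permutation_p_value(healthy, patients)

print(subject_summary)
print("p =", round(p_value, 4))
```

Testing a handful of cluster averages instead of thousands of individual edges reduces the multiple-testing burden, and averaging over edges dampens noise in any single edge, which matches the strengths the slides list.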

Studying Dynamics in Brain Networks
- Resting state connectivity (Atluri et al. SDM 2014)
- Resting state vs. Watching cartoons (Atluri et al.)

Studying Dynamics
- Dynamic Brain Connectivity (Atluri et al. SDM 2014)
- Resting state vs. Watching cartoons (Atluri et al.)
Expeditions in Computing: Understanding Climate Change - A Data Driven Approach
- 5-year, $10 Million project
- Leverages the wealth of climate and ecosystem data
- Opportunities for cross-fertilization among diverse domains with space-time data such as Climate, Bio-images, Taxi data, Astronomy, Precision agriculture

mHealth
- Rapid growth in wearable devices market
- Collects a variety of variables: Respiration, Heart rate, Activity, Sleep, Glucose, Blood pressure, Audio-visual samples
- Healthcare questions:
  - Can we predict psychological state (e.g., stress, anger)?
  - Can we assess lifestyle choices (e.g., smoking and diet)?
  - Can we study the relapse patterns in treating addiction?
  - Can we deliver telemedicine at the right time?

MD2K is one of 11 national NIH Big Data Centers of Excellence
- Part of Big-Data-to-Knowledge (BD2K) Initiative
- Collaborative effort between diverse areas: Computer Science, Engineering, Medicine, Behavioral Science, and Statistics
- Advancing Biomedical Discovery and Improving Health through Mobile Sensor Big Data
Autosense Project
- mConverse: Inferring conversations (Rahman et al. WirelessHealth’11)
- mPuff: Detecting smoking episodes (Ali et al. IPSN’12)
- mStress: Identifying stress (Raij et al. WirelessHealth’11)

Conclusion
- Huge opportunities for application of Big Data to Healthcare Research
- Many Challenges
  - Data not in the form to be used by readily available big data technologies
  - Translating healthcare questions to data science questions
  - Many health-specific data science questions require advances in big data analytics

References
Atluri, G., Steinbach, M., Lim, K. O., Kumar, V., & MacDonald, A. (2015). Connectivity cluster analysis for discovering discriminative subnetworks in schizophrenia. Human Brain Mapping, 36(2).
Atluri, G., Steinbach, M., Lim, K. O., MacDonald III, A., & Kumar, V. Discovering Groups of Time Series with Similar Behavior in Multiple Small Intervals of Time. SDM 2014.
Atluri, G., Steinbach, M., Lim, K., MacDonald, A., & Kumar, V. (2014). Discovering the Longest Set of Distinct Maximal Correlated Intervals in Time Series Data. University of Minnesota, Tech-report.
Rahman, M. M., Ali, A. A., Plarre, K., al'Absi, M., Ertin, E., & Kumar, S. (2011, October). mConverse: inferring conversation episodes from respiratory measurements collected in the field. In Proceedings of the 2nd Conference on Wireless Health (p. 10). ACM.
Ali, A. A., Hossain, S. M., Hovsepian, K., Rahman, M. M., Plarre, K., & Kumar, S. (2012, April). mPuff: automated detection of cigarette smoking puffs from respiration measurements. In Proceedings of the 11th International Conference on Information Processing in Sensor Networks. ACM.
Raij, A., Blitz, P., Ali, A. A., Fisk, S., French, B., Mitra, S., ... & Smailagic, A. (2010). mStress: Supporting continuous collection of objective and subjective measures of psychosocial stress on mobile devices. ACM Wireless Health 2010, San Diego, California, USA.
Dey, Sanjoy, et al. "Mining Patterns Associated With Mobility Outcomes in Home Healthcare." Nursing Research 64.4 (2015).
Gupta, Rohit, et al. "284 Colorectal Cancer Despite Colonoscopy: Critical Is the Endoscopist, Not the Withdrawal Time." Gastroenterology (2009): A-55.
Simon, Gyorgy J., et al. "Survival association rule mining towards type 2 diabetes risk assessment." AMIA Annual Symposium Proceedings. American Medical Informatics Association.
Li, Dingcheng, et al. "Using Association Rule Mining for Phenotype Extraction from Electronic Health Records." AMIA Summits on Translational Science Proceedings 2013 (2013): 142.
Schrom, John R., et al. "Quantifying the effect of statin use in pre-diabetic phenotypes discovered through association rule mining." AMIA Annual Symposium Proceedings. American Medical Informatics Association.
Pandey, Gaurav, et al. "An association analysis approach to biclustering." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
Kim, Hye Soon, et al. "Comorbidity study on type 2 diabetes mellitus using data mining." The Korean Journal of Internal Medicine 27.2 (2012).
Shin, A. Mi, et al. "Diagnostic analysis of patients with essential hypertension using association rule mining." Healthcare Informatics Research 16.2 (2010).
Fang, Gang, et al. "Subspace differential coexpression analysis: problem definition and a general approach." Pacific Symposium on Biocomputing.