Big Data and the Life Sciences

Slides:



Advertisements
Similar presentations
“BIG DATA” from RNA-Seq Experiments. Significance of RNA-Seq Approaches  Reveals which genes are expressed and the levels at which they are expressed;
Advertisements

Huge Raw Data Cleaning Data Condensation Dimensionality Reduction Data Wrapping/ Description Machine Learning Classification Clustering Rule Generation.
Automatic in vivo Microscopy Video Mining for Leukocytes * Chengcui Zhang, Wei-Bang Chen, Lin Yang, Xin Chen, John K. Johnstone.
Introduction to Data Science Kamal Al Nasr, Matthew Hayes and Jean-Claude Pedjeu Computer Science and Mathematical Sciences College of Engineering Tennessee.
CS Machine Learning. What is Machine Learning? Adapt to / learn from data  To optimize a performance function Can be used to:  Extract knowledge.
Bioinformatics Jan Taylor. A bit about me Biochemistry and Molecular Biology Computer Science, Computational Biology Multivariate statistics Machine learning.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Chun-Hung Chou
Last Words COSC Big Data (frameworks and environments to analyze big datasets) has become a hot topic; it is a mixture of data analysis, data mining,
Tennessee Technological University1 The Scientific Importance of Big Data Xia Li Tennessee Technological University.
Four Types of Decisions (p p.130) Structured vs. Nonstructured(Examples?) –Structured: Follow rules and criteria. The right answer exists. No “feel”
Big Data and Large Scale Data Analysis Andrew Mead School of Life Sciences 23 rd October 2013.
Big Data Analytics Bigger & Better Opportunity for LIS Professionals Invited presentation prepared for NACLIN th November 2015 Ananda T. Byrappa.
Big Data – Big Opportunity Mohammad Khansari ITRC President Jan 2015 ITRC, Tehran, Iran.
© 2012 IBM Corporation Converting Big Data into Big Knowledge.
SUPPLY CHAIN OF BIG DATA. WHAT IS BIG DATA?  A lot of data  Too much data for traditional methods  The 3Vs  Volume  Velocity  Variety.
Amar K. Das, MD, PhD Associate Professor of Biomedical Data Science, Psychiatry and Health Policy & Clinical Practice Geisel School of Medicine at Dartmouth.
What is Big Data? Dr Alasdair Rutherford University of Stirling.
Data Science Interview Questions 1.What do you mean by word Data Science? Data Science is the extraction of knowledge from large.
Am I my connectome? Statistical issues in functional connectomics Brian Caffo, PhD Department of Statistics at National Cheng-Kung University, Taiwan 2015.
1 Chapter 1 Section 1 Notes NatureOfScience. 2 What is Science?  A way or process used to investigate what is happening around you.  Not New  Early.
Why Intelligent Data Analysis? Joost N. Kok Leiden Institute of Advanced Computer Science Universiteit Leiden.
Vipin Kumar Regents Professor and William Norris Chair in Large Scale Computing Research interests – Data mining, – high-performance computing, and – their.
BLO #3:Discuss how and why particular research methods are used at the biological level of analysis.
Big Data-An Analysis. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult.
Data Analytics 1 - THE HISTORY AND CONCEPTS OF DATA ANALYTICS
Healthcare Science careers
4/19/ :02 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Database management system Data analytics system:
Fundamentals of Information Systems, Sixth Edition
Areas of Research Xia Jiang Associate Professor of
Journal Club M Havaei, et al. Université de Sherbrooke, Canada
An Artificial Intelligence Approach to Precision Oncology
Intro to Machine Learning
Gregory Cooper Professor of Biomedical Informatics Director, Center for Causal Discovery Vice Chair Research, Department of Biomedical Informatics.
School of Computer Science & Engineering
It’s All About Me From Big Data Models to Personalized Experience
INFORMATION COMPRESSION, MULTIPLE ALIGNMENT, AND INTELLIGENCE
Done Done Course Overview What is AI? What are the Major Challenges?
Graph Frequency Analysis of Learning Progression
Introductory Seminar on Research: Fall 2017
Moo K. Chung1,3, Kim M. Dalton3, Richard J. Davidson2,3
CHAPTER 1 The Science of Life.
Introduction Data Mining for Business Analytics.
Data Science Process Chapter 2 Rich's Training 11/13/2018.
Fenglong Ma1, Jing Gao1, Qiuling Suo1
What is Pattern Recognition?
Partial Differential Equations for Data Compression and Encryption
Session 2: The Biology of the Brain
Areas of Research Xia Jiang Assistant Professor
Gregory Cooper Professor of Biomedical Informatics Director, Center for Causal Discovery Vice Chair, Department of Biomedical Informatics Research involves.
Connectionist Knowledge Representation and Reasoning
Model-based Symmetric Information Theoretic Large Deformation
Overview of Machine Learning
Introduction to Neural Networks And Their Applications - Basics
3.1.1 Introduction to Machine Learning
Data Science in Industry
Spatial Information and Urban Analytics for Smart City Policy.
Hubs and Authorities & Learning: Perceptrons
M/EEG Statistical Analysis & Source Localization
Anatomical Measures John Ashburner
Wellcome Centre for Neuroimaging at UCL
III. Introduction to Neural Networks And Their Applications - Basics
Discovery of Hidden Structure in High-Dimensional Data
Deep Learning Authors: Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Machine Learning for Space Systems: Are We Ready?
Design of Experiments CHM 585 Chapter 15.
Models of the brain hardware
Is Statistics=Data Science
Big Data.
Presentation transcript:

Big Data and the Life Sciences Joshua T. Vogelstein, James Taylor, Elana Fertig, Alexis Battle

What is Big Data Science for the Life Sciences? A field that develops, applies and combines big data systems and machine learning / artificial intelligence with life science domain knowledge to improve health and healthcare.

Why is it hard? Each stage of the data analysis workflow is broken by big data: Volume - each genome is 1B bytes Velocity - can generate TBs per day Variety - images, questionnaires, text, ‘omics, time-series Veracity - lots of noise

Why is it hard? Life sciences introduce a number of additional challenges Wide: sample size is ~10, dimensionality is ~109 Multiscale: molecules, cells, tissues, organs, individuals, populations Interpretable: doctors, scientists, insurers must understand

NeuroData Joshua T. Vogelstein http://neurodata.io

NeuroData Neural coding is about brain activity & info. representation Connectal coding is about brain connectivity & info. storage Connectopathies underly psychiatric & learning disorders Depression is leading cause of disability worldwide NeuroData Vogelstein et al., Current Opinion in Neurobiology (2019) http://neurodata.io

Storing Extended SDSS database Support 3D + multi-spectral data Now host the largest open source brain image repo in the world Storing Vogelstein et al., Nature Methods, 2018 (https://neurodata.io/ndcloud/) Vogelstein et al., Nature Methods (2018) http://neurodata.io/ndcloud/

Wrangling 10 TB raw image volume Bias correct, stitch, register, align Multi-contrast nonlinear deformation Wrangling Kutten et al., MICCAI (2017) http://neurodata.io/ndreg/

Exploring Often faced with high-dimensional multi-modal data Desire to understand which variables are nonlinearly associated Juxtapose harmonic analysis with manifold learning Exploring Vogelstein et al., eLife (2019) http://neurodata.io/mgc/

Modeling Larval Drosophila Mushroom Body Connectome Network valued data Parsimonious models enable estimation & testing Modeling Athreya et al., JMLR (2018) http://neurodata.io/graspy/

Predicting Must predict class or magnitude from high-dimensional data Requires interpretable predictions Juxtapose random matrix theory and decision forests Predicting Tomita et al., arXiv (2018) http://neurodata.io/rerf/

Acknowledgements