Overview of Biomedical Informatics


Overview of Biomedical Informatics
Vipin Kumar, University of Minnesota
kumar@cs.umn.edu, www.cs.umn.edu/~kumar
Team Members: Michael Steinbach, Rohit Gupta, Gowtham Atluri, Gang Fang, Gaurav Pandey, Sanjoy Dey, Vanja Paunic
Collaborators: Brian Van Ness, Bill Oetting, Gary L. Nelsestuen, Christine Wendt, Piet C. de Groen, Michael Wilson
Research supported by NSF, IBM, BICB-UMR, Pfizer
Nov 12th, 2009, Understanding Biotechnology – The Science of the ‘Omics’

Biomedical Informatics
Recent technological advances are helping to generate large amounts of biomedical data:
- Data from high-throughput experimental techniques
  - Gene expression data
  - Biological networks
  - Proteomics and metabolomics data
  - Single Nucleotide Polymorphism (SNP) data
- Electronic Medical Records
  - The IBM-Mayo Clinic partnership has created a database of 5 million patients
Great potential benefits from the analysis of these large-scale data sets:
- Automated analysis of patient history for customized treatment
- Discovery of biomarkers for complex diseases and other phenotypes
- Cheminformatics and drug discovery

Large-scale Data is Everywhere!
There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
- New mantra: gather whatever data you can, whenever and wherever possible.
- Expectations: gathered data will have value either for the purpose collected or for a purpose not envisioned.
Examples shown: Homeland Security, Business Data, Geo-spatial Data, Computational Simulations, Sensor Networks, Scientific Data

Data Mining
- Automated techniques for analyzing large data sets.
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
Tasks shown: Predictive Modeling, Clustering, Association Rules, Anomaly Detection

Predictive Modeling: Classification
Find a model for the class attribute as a function of the values of the other attributes.
Example shown: a model for predicting credit worthiness.

Discovering biomarkers
Gene Expression Data
- Given: n labeled subjects, each with expression levels of p genes
- Objective: build a predictive model to identify cancer subtypes
- Classical study of cancer subtypes: Golub et al. (1999), identification of diagnostic genes
SNP Data
- Given: n labeled subjects, each with genotypes of p SNPs
- Objective: build a model using genotypes to predict labels

             SNP 1   SNP 2   SNP 3   ...   Class
  Patient 1  AC      GT      AA      ...   1
  Patient 2  GG      ...     ...     ...   ...
  ...
  Patient n  CC      AG      ...     ...   ...
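The SNP setup above (n labeled subjects, p genotypes each) can be sketched with a small classifier. This is only an illustration, not the method used in the talk: it assumes a categorical naive Bayes model with Laplace smoothing, and the four toy patients, their genotypes, and the function names are all hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(genotypes, labels, alpha=1.0):
    """Fit a categorical naive Bayes model on genotype strings (e.g. 'AC')."""
    n_snps = len(genotypes[0])
    class_counts = Counter(labels)
    # counts[c][j][g] = number of class-c subjects with genotype g at SNP j
    counts = {c: [Counter() for _ in range(n_snps)] for c in class_counts}
    # values[j] = set of genotypes observed at SNP j in the training data
    values = [set() for _ in range(n_snps)]
    for row, c in zip(genotypes, labels):
        for j, g in enumerate(row):
            counts[c][j][g] += 1
            values[j].add(g)
    return class_counts, counts, values, alpha


def predict(model, row):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, counts, values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, g in enumerate(row):
            smoothed = (counts[c][j][g] + alpha) / (n_c + alpha * len(values[j]))
            score += math.log(smoothed)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy data: 4 patients, 3 SNPs, binary class labels.
genotypes = [
    ["AC", "GT", "AA"],
    ["AC", "GG", "AA"],
    ["CC", "GT", "AG"],
    ["CC", "TT", "AG"],
]
labels = [1, 1, 0, 0]
model = train_naive_bayes(genotypes, labels)
print(predict(model, ["AC", "GT", "AA"]))
```

With thousands of SNPs and only about a hundred subjects, as in the myeloma study on the next slide, a model like this would overfit easily, so feature selection and cross-validation would be needed in practice.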

Predicting short-term vs. long-term survivors among myeloma subjects
- 3404 SNPs (selected according to potential relevance to myeloma)
- Cases: 70 patients who survived less than 1 year
- Controls: 73 patients who survived longer than 3 years
Figure labels: SNPs, cases, controls
Brian Van Ness et al., Genomic Variation in Myeloma: Design, content and initial application of the Bank On A Cure SNP Panel to detect associations with progression free survival, BMC Medicine, Volume 6, pp. 26, 2008.

Clustering
Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Applications:
- Finding groups of similar genes or proteins based upon their expression profiles
- Clustering of patients based on phenotypic and genotypic factors for efficient disease diagnosis
- Market segmentation
- Document clustering
Figure courtesy of Michael Eisen (Michael Eisen et al., 1999)
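Grouping genes by expression profile can be sketched as follows. This is a minimal illustration under stated assumptions: it uses agglomerative average-linkage clustering with distance 1 minus Pearson correlation, which is one common choice and not necessarily the procedure behind the figure. The gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_profiles(profiles, k):
    """Merge the closest pair of clusters (average linkage) until k remain."""
    names = list(profiles)
    # Pairwise distance between genes: 1 - Pearson correlation.
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
result = cluster_profiles(profiles, k=2)
print(result)
```

Correlation-based distance groups genes whose profiles rise and fall together regardless of absolute level, which is why g1 and g2 end up together even though their magnitudes differ.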

Association Pattern Discovery
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.
Biological applications:
- Identifying functional modules in protein interaction networks
- Identifying transcription modules in gene expression data
- Identifying biological entities associated with disease phenotypes
- Biomarker discovery from genomic data, e.g. gene expression, single-nucleotide polymorphism (SNP), and metabolite data
Rules discovered:
  {Milk} --> {Coke}
  {Diaper, Milk} --> {Beer}
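How strongly a rule such as {Diaper, Milk} --> {Beer} holds is usually measured by its support and confidence. The sketch below computes both. The five market-basket transactions are illustrative data assumed for this example (the slide text does not list them), and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Illustrative transactions (assumed, not taken from the slide).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

rule_support = support({"Diaper", "Milk", "Beer"}, transactions)
rule_confidence = confidence({"Diaper", "Milk"}, {"Beer"}, transactions)
print(rule_support, rule_confidence)
```

On this toy data, two of the five transactions contain all three items, and two of the three transactions with both Diaper and Milk also contain Beer. In a biomedical setting the "transactions" could be samples and the "items" discretized genes, SNP genotypes, or metabolites.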

Discovery of Discriminative Patterns from Lung Cancer Gene Expression Data
- 67 normal samples, 102 cancer patients, 8787 genes [Stearman et al. 2005], [Su et al. 2007], [Bhattacharjee et al. 2001]
- Visualization of a size-10 pattern using a new discriminative pattern finding technique
- Enriched with the TNF/NFkB signaling pathway, which is well known to be related to lung cancer
- P-value: 1.4 × 10^-5 (6/10 overlap with the pathway)
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and A General Approach, in Proceedings of the 15th Pacific Symposium on Biocomputing (PSB), pp. 145-156, 2010.
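An enrichment p-value of this kind is often computed with a hypergeometric tail test: the probability of seeing at least the observed overlap if the pattern's genes were drawn at random. The slide does not say which test was used or how many genes the pathway contains, so the sketch below is an assumption throughout. The pathway size of 150 is a hypothetical placeholder and the result is not expected to reproduce the reported value.

```python
from math import comb


def enrichment_p_value(overlap, universe, pathway, pattern):
    """Hypergeometric tail: P(at least `overlap` pathway genes in the pattern).

    universe: total number of genes measured
    pathway:  number of those genes annotated to the pathway
    pattern:  number of genes in the discovered pattern
    """
    upper = min(pathway, pattern)
    favourable = sum(
        comb(pathway, i) * comb(universe - pathway, pattern - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(universe, pattern)


# 8787 genes, a size-10 pattern and an overlap of 6 come from the slide;
# the pathway size (150) is a hypothetical value chosen for illustration.
p = enrichment_p_value(overlap=6, universe=8787, pathway=150, pattern=10)
print(p)
```

The computed value depends strongly on the pathway size and on the gene universe used as background, so both should be reported alongside any enrichment p-value.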

Discriminative Metabolite Patterns from Liver Cirrhosis Data
- 41 alcoholic liver cirrhosis cases (rows 1-41), 19 controls (rows 42-60), 3610 metabolites
- Data from Gary Nelsestuen et al.
- A sample group of five metabolites having very similar (in relative terms) intensity values in cases, but mostly absent in controls
- Figure panels: (a) rank values (black is 10, white is 0); (b) original intensity values
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar, An Association Analysis Approach to Biclustering, Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 677-686, 2009.

Summary
- Data mining techniques hold great promise for data-driven hypothesis generation in the biomedical domain.
- Ample scope exists for the development and application of novel techniques for the analysis of different types of biomedical data.

For further information
- Visit www.cs.umn.edu/~kumar/dmbio.
- Send email to kumar@cs.umn.edu.
- Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2005.