Bioinformatics Brad Windle Ph# 628-1956 Web Site:


Bioinformatics Brad Windle Ph# Web Site: Click on Link to MEDC 310 course Or

Profiling

The term "bioinformatics" is about 15 years old. It covers a variety of data analyses, including: DNA and protein sequence analysis; biological analysis of drugs (which can overlap with chemoinformatics); genetics; taxonomy; clinical data statistics; and genomic and proteomic research. Bioinformatics is sometimes equated with the term "data mining", which is commonly used in e-business and internet data handling.

Chemoinformatics: Chemoinformatics has a special challenge in that the structure of a compound or drug needs to be quantified. Specific structures are characterized by molecular descriptors, which are useful in Quantitative Structure Activity Relationship (QSAR) modeling. QSAR tells you what it is about the structure of a drug that makes it do what it does. Much of this information has implications for what a drug will do in a cell. However, the complexity of a cell makes what a drug actually does in the cell deviate significantly from what is anticipated based on chemistry and enzymatic assays. This stresses the need for characterizing drugs based on more biological data.

Analogies for looking for patterns Looking at patterns in images

A mixture of many patterns We need to identify individual patterns

There are methods for extracting the patterns from the data

There is also noise that obscures the patterns

One method for identifying object patterns of interest amidst the noise

Another method for identifying different object patterns of interest amidst the noise

This is what was actually buried in the noise

Questions?

Philosophy of Science Reductionist Approach (Reductionism) VS Systems Approach (Systemism)

Reductionist

Systems Approach

How Does a Cell, or Person, Respond to Therapy or a Drug? Treat 10 people suffering from Disease A with Drug X: 2 people suffer adverse reactions; 3 exhibit good recovery from disease; 2 exhibit modest recovery from disease; 3 exhibit no sign of recovery from disease.

What Factors Cause Differences Between People? Genes and their sequence. Health-wise: disease, health-related traits, response to drugs.

What Are the Differences in Genes? Single nucleotide polymorphisms (SNPs)
SerSerIleAsnGlyGlnLeuArgPro
AGTTCTATAAATGGCCAGCTTAGACCT
TCAAGATATTTACCGGTCGAATCTGGA
SerSerIleHisGlyGlnIleArgPro
AGTTCTATACATGGCCAGATTAGACCA
TCAAGATATGTACCGGTCTAATCTGGT
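
The two sequences on this slide can be compared with a few lines of code. This is only a sketch: it uses the standard genetic code, and the function names (translate, snp_positions) are made up for illustration. It lists the positions where the two coding strands differ and translates each strand to one-letter amino acid codes.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def snp_positions(seq_a, seq_b):
    """Return the 1-based positions where two aligned sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


# Coding strands copied from the slide.
seq_a = "AGTTCTATAAATGGCCAGCTTAGACCT"
seq_b = "AGTTCTATACATGGCCAGATTAGACCA"

print(snp_positions(seq_a, seq_b))
print(translate(seq_a), translate(seq_b))
```

Note that the last difference changes the codon but not the amino acid (both codons encode Pro), so not every SNP changes the protein.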

How does a difference in a gene affect drug response? Transport of the drug; metabolism of the drug; interaction with the drug target.

5 Million SNPs: Let’s say there are 10 SNPs that contribute to response to Drug X. Combinatorial approach to identifying SNPs that correlate with drug response. All combinations = Narrow SNPs down to those within genes, to 100,000. Combinations = 10^43
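
The size of this search can be checked directly. The sketch below assumes that "combinations" means every unordered set of exactly 10 SNPs (n choose 10), which is an interpretation rather than something the slide states. The helper names are hypothetical.

```python
import math


def n_combinations(n_snps, k):
    """Number of unordered sets of k SNPs drawn from n_snps candidates."""
    return math.comb(n_snps, k)  # requires Python 3.8+


def order_of_magnitude(n):
    """Power of ten of a positive integer, e.g. 2500 -> 3."""
    return len(str(n)) - 1


all_snps = n_combinations(5_000_000, 10)   # every known SNP
genic_snps = n_combinations(100_000, 10)   # only SNPs within genes

print("all SNPs:   10^%d" % order_of_magnitude(all_snps))
print("genic SNPs: 10^%d" % order_of_magnitude(genic_snps))
```

Under that assumption the 100,000-SNP case comes out on the order of 10^43, in agreement with the slide. Either number is far too large to test exhaustively.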

Traveling Salesman Problem

The SNPs described thus far were inherited, affecting the quality of proteins. What about differences between people that are somatic? What about quantitative differences in proteins?

Differences in Protein Expression and Gene Expression: 20,000 genes - Genomics; 100,000 proteins - Proteomics

In genomics and proteomics research, the data is extensive and the patterns complex. The emphasis shifts from asking specific questions or testing hypotheses to trying to filter out the most significant observations the data offers. Bioinformatics and data mining in general use two forms of learning: unsupervised learning and supervised learning. Supervised learning is the process of learning by example: use example patterns with known characteristics to learn and predict characteristics for the unknown. This is essentially the modeling process.
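
As a rough illustration of learning by example, here is a minimal nearest-centroid classifier. It is one simple kind of supervised model, not necessarily the kind used in the studies discussed in this lecture. The labels and expression values are invented for the example.

```python
def centroid(profiles):
    """Average each measurement across a list of equal-length profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def train(examples):
    """Learn one centroid per label from examples with known labels."""
    return {label: centroid(profiles) for label, profiles in examples.items()}


def predict(model, profile):
    """Assign the label whose centroid is closest to the unknown profile."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], profile))


# Hypothetical training data: two measurements per sample, known outcome.
training = {
    "responder": [[2.0, 0.5], [1.8, 0.7]],
    "non-responder": [[0.4, 1.9], [0.6, 2.1]],
}
model = train(training)
print(predict(model, [1.7, 0.9]))  # profile of an unknown sample
```

The known examples build the model, and the model then predicts the characteristic for a sample whose outcome is unknown.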

Unsupervised learning is learning by observation, and exploratory data analysis is a general form of it. Let the data reveal prominent patterns and associations; you don’t look for specific patterns. Exploratory data analysis is used when there is no hypothesis to test, or when there is no specific pattern expected. This type of analysis shows the most significant patterns or trends within the data; it does not imply biological or statistical significance. Cluster analysis is a popular form of exploratory data analysis.

Cluster analysis sorts whatever is being analyzed into clusters with the greatest similarities in trend or pattern. It is a form of non-descriptive statistics and exploratory data analysis. A dendrogram or tree diagram is used to present the results. Below is an example of a dendrogram for bacterial species of Escherichia.
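
A dendrogram is a drawing of the order in which items are merged into clusters. The sketch below uses single-linkage agglomerative clustering, which is one common choice and only an assumption here; the lecture does not say which linkage was used. The points are made-up values.

```python
import math


def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    Each history entry is (members_a, members_b, distance). Cluster distance
    is the smallest point-to-point distance between the two clusters.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical data: two tight pairs that sit far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = single_linkage(points)
for members_a, members_b, distance in merges:
    print(members_a, "+", members_b, "at distance", round(distance, 2))
```

Reading the merge history from first to last corresponds to reading a dendrogram from the leaves to the root: the earliest merges are the most similar items.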

New technology = lots of data

Microarray Technology DNA Microarray Cell 1’s mRNA Cell 2’s mRNA

Pseudo-colored Microarray Spots

The total intensity for each spot is summed and the values plotted on a scatterplot. A scatterplot of 2000 points is shown. Each point represents a gene.

Cluster analysis methods: The most straightforward methods involve calculating the Euclidean (Euclid) distance between two points, for all combinations of points, using the Pythagorean theorem.
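
A minimal sketch of that calculation for points on a two-dimensional scatterplot follows. The coordinates are invented and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean_2d(p, q):
    """Pythagorean theorem: the hypotenuse of the x and y differences."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pairwise_distances(points):
    """Distance for every unordered pair of points, keyed by index pair."""
    return {
        (i, j): euclidean_2d(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


# Hypothetical spot intensities (x = sample 1, y = sample 2).
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = pairwise_distances(points)
for pair, d in distances.items():
    print(pair, d)
```

For n points there are n(n-1)/2 pairs, so the 2000-gene scatterplot would need about two million distance calculations.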

If we perform cluster analysis on the 2000 points, we can see that we have one giant cluster with a handful of outliers.

Adding Dimensions to Cluster Analysis

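The distance calculation would be: d = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2), where a and b are two points with n dimensions each. Thus, while we can't visualize more than three dimensions, the computer can perform cluster analysis on as many dimensions as imaginable, or as processing time allows.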

Pearson Correlation Coefficient
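
The Pearson correlation coefficient measures whether two profiles rise and fall together, regardless of their absolute magnitude. The sketch below uses the standard textbook formula. The two gene profiles are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical expression of two genes across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]   # same trend, ten times the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]       # opposite trend

print(pearson(gene_a, gene_b))
print(pearson(gene_a, gene_c))
```

Genes a and b are far apart in Euclidean distance but perfectly correlated, which is why correlation is often preferred when the shape of the expression pattern matters more than its level. A profile with no variation at all would make the denominator zero; this sketch does not handle that case.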

Two-fold Cluster Analysis: Gene expression analysis in drug development can involve a large number of genes and a large number of drugs. It is important to identify not only which genes cluster together, but also which drugs cluster. This is done by two-fold cluster analysis: the genes are arranged and clustered, as are the drugs. Drugs that elicit similar gene expression patterns will cluster. Both clusterings can be viewed in a single 2-D dendrogram.

Questions?

Cluster Tree of cell lines

Classifying Cancer: Using supervised learning, models have been developed for classifying different subsets of cancers that the pathologist can’t distinguish, and for predicting response to therapy and patient prognosis.

Any kind of data can be explored

Cell response profile Monks et al. Anti-Cancer Drug Design 12:553 (1997)

Drug clusters correspond to drug targets or mechanisms of action, not necessarily drug structure. Scherf et al., Nature Genetics 24:236 (2000)

Exploratory tools allow us to focus on what is most relevant based on the data, and to develop relevant hypotheses. For example: Geldanamycin is cytotoxic through inhibition of microtubules.

The End Any Questions?