The rest of bioinformatics Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington.

One-minute responses
- I always like it when we ask questions and you first say "good question," even though the question is not good.
- I liked the lecture, although the concepts were a bit advanced for me. I understood about 90% of everything.
- The Python is more challenging, but it is good to get confused sometimes.
- Python was more interesting!
- My comprehension of Python has improved to 95%.
- Today's program (the first one) was really challenging. I thought the second one was easier to understand.
- Python problem 3 was really challenging for me.
- The Python today was completely different from the rest and needed more time.
- Do your students at home write one-minute responses every day for the whole semester? Yes.
- How did we discover the first mutation? I am not sure I understand the question. We can observe mutations happening in microorganisms in the lab by sequencing their DNA from one generation to the next.
- Are you going to be readily available in the future for consultations in case I get stuck? Yes, you can always email me.
- I do not think species are related because I believe in creation.

Outline
- Parsimony
- Distance methods
  - Computing distances
  - Finding the tree
- Maximum likelihood

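Revision
How do we compute the probability of observing this column, Pr(column | tree, model), given this tree and an assumed model of evolution?
(Figure: a multiple alignment with rows such as ACGCGTTGGG, ACGCAATGAA and ACACAGGGAA; one column is placed at the leaves of a four-leaf tree as T, T, A, G.)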

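Revision
We enumerate all possible assignments of bases to the internal nodes, compute the probability of each assigned tree, and sum these probabilities.
(Figure: the leaf column T, T, A, G shown with several different labelings of the internal nodes.)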
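The same computation can be written as a formula. Here $a$ denotes one labeling of the internal nodes and $k$ the number of internal nodes; both are notation introduced for this sketch, not taken from the slides. The sum runs over all $4^k$ labelings:

$$\Pr(\text{column} \mid \text{tree}, \text{model}) = \sum_{a \in \{A,C,G,T\}^{k}} \Pr(\text{column}, a \mid \text{tree}, \text{model})$$

A rooted binary tree with four leaves has three internal nodes, so for the tree on this slide the sum has $4^3 = 64$ terms. Each term is the probability of one assigned tree, obtained by multiplying its branch probabilities.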

Revision
How do we compute the probability of observing this column, Pr(column | tree, model), given this assigned tree and an assumed model of evolution?
(Figure: the same alignment and tree, with leaves T, T, A, G and internal nodes labeled T, A, A.)

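Revision
We use our evolutionary model to assign a probability to each branch, and then take the product of the probabilities of the branches. The root contributes $L_0$, the background probability ($\pi_A$, $\pi_C$, $\pi_G$ or $\pi_T$) of the base assigned to it. Each of the six branches contributes one factor, $L_1$ through $L_6$:
$$L(\text{tree}) = L_0 L_1 L_2 L_3 L_4 L_5 L_6$$
(Figure: the assigned tree, with leaves T, T, A, G and internal nodes T, A, A.)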
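Here is a minimal Python sketch of this product for a single assigned tree. The slides do not name a model of evolution, so the sketch assumes the Jukes-Cantor model: equal base frequencies (π = 1/4), with branch lengths measured in expected substitutions per site. The node names, the tree shape, the branch lengths and the placement of the internal labels T, A, A are all hypothetical, chosen for illustration.

```python
import math

BASES = "ACGT"


def jc_prob(parent: str, child: str, t: float) -> float:
    """Jukes-Cantor probability that base `parent` is `child` after a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    if parent == child:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def assigned_tree_likelihood(labels, branches, root, pi):
    """L(tree) = L0 * L1 * ... * Lm for a tree in which every node has a base.

    labels:   node name -> base, for leaves and internal nodes alike
    branches: list of (parent, child, branch_length)
    pi:       background base frequencies; pi[base at root] is L0
    """
    likelihood = pi[labels[root]]  # L0: probability of the root's base
    for parent, child, t in branches:
        likelihood *= jc_prob(labels[parent], labels[child], t)  # one factor per branch
    return likelihood


# Hypothetical rooted tree ((leaf1, leaf2)u, (leaf3, leaf4)v)root with six branches.
branches = [("root", "u", 0.1), ("root", "v", 0.1),
            ("u", "leaf1", 0.1), ("u", "leaf2", 0.1),
            ("v", "leaf3", 0.1), ("v", "leaf4", 0.1)]
labels = {"leaf1": "T", "leaf2": "T", "leaf3": "A", "leaf4": "G",
          "root": "A", "u": "T", "v": "A"}
pi = {b: 0.25 for b in BASES}
print(assigned_tree_likelihood(labels, branches, "root", pi))
```

Two branches in this labeling (root to u, and v to leaf4) carry a substitution. The result is therefore $\frac{1}{4}\,p_{\text{same}}^4\,p_{\text{diff}}^2$, where $p_{\text{same}}$ and $p_{\text{diff}}$ are the Jukes-Cantor probabilities for an unchanged and a changed base.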

Revision
- In maximum likelihood estimation, are mutations that occur on branches of a single tree considered independent or mutually exclusive events? Independent, so their probabilities are multiplied.
- What do different labelings of the internal nodes of a tree represent? Different possible evolutionary histories.
- Are the different labelings independent or mutually exclusive? Mutually exclusive, so their probabilities are summed.
- Are the columns of a multiple alignment considered independent or mutually exclusive? Independent, so their probabilities are multiplied.

Maximum likelihood revisited

for each possible tree
    for each column of the alignment
        for each assignment of internal nodes
            for each branch
                compute the probability of that branch
            assigned tree probability = multiply branch probabilities
        column probability = sum assigned tree probabilities
    tree probability = multiply column probabilities
return the tree with the highest probability
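The loop structure above translates almost line for line into Python. This is a brute-force sketch under stated assumptions. It uses the Jukes-Cantor model (π = 1/4, a model the slides do not specify) and fixes every branch length at a hypothetical 0.1. Its only candidate trees are the three ways of splitting four hypothetical sequences into two pairs. A real maximum likelihood search would also optimize branch lengths and explore many more topologies. For convenience, the sketch adds log probabilities instead of multiplying column probabilities; the two are equivalent.

```python
import itertools
import math

BASES = "ACGT"


def jc_prob(parent, child, t):
    """Jukes-Cantor probability that `parent` is `child` after a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay


def make_tree(name, a, b, c, d, t=0.1):
    """Hypothetical rooted tree ((a, b)u, (c, d)v)root with all branch lengths t."""
    return {"name": name, "internal": ["root", "u", "v"],
            "branches": [("root", "u", t), ("root", "v", t),
                         ("u", a, t), ("u", b, t), ("v", c, t), ("v", d, t)]}


def tree_log_likelihood(tree, alignment):
    """log Pr(alignment | tree, Jukes-Cantor), by enumerating internal labelings."""
    internal = tree["internal"]
    n_columns = len(next(iter(alignment.values())))
    log_tree_prob = 0.0
    for col in range(n_columns):                                    # for each column
        leaf_bases = {leaf: seq[col] for leaf, seq in alignment.items()}
        column_prob = 0.0
        for assignment in itertools.product(BASES, repeat=len(internal)):  # each labeling
            labels = {**leaf_bases, **dict(zip(internal, assignment))}
            assigned_prob = 0.25                      # root prior, pi = 1/4
            for parent, child, t in tree["branches"]:                     # each branch
                assigned_prob *= jc_prob(labels[parent], labels[child], t)
            column_prob += assigned_prob              # labelings are mutually exclusive: sum
        log_tree_prob += math.log(column_prob)        # columns are independent: add logs
    return log_tree_prob


def best_tree(trees, alignment):
    """Return the candidate tree with the highest likelihood."""
    return max(trees, key=lambda tree: tree_log_likelihood(tree, alignment))


# The three ways to split four hypothetical sequences into two pairs.
candidates = [make_tree("((s1,s2),(s3,s4))", "s1", "s2", "s3", "s4"),
              make_tree("((s1,s3),(s2,s4))", "s1", "s3", "s2", "s4"),
              make_tree("((s1,s4),(s2,s3))", "s1", "s4", "s2", "s3")]
alignment = {"s1": "ACGTA", "s2": "ACGTA", "s3": "CATGC", "s4": "CATGC"}
print(best_tree(candidates, alignment)["name"])
```

With four taxa there are only three unrooted topologies. Under a reversible model such as Jukes-Cantor, moving the root does not change the likelihood, so these three rooted trees stand in for all unrooted topologies. The innermost enumeration grows as $4^k$ in the number of internal nodes. Felsenstein's pruning algorithm computes the same sum far more efficiently.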

Sequence analysis tasks
- Protein structure prediction
- Remote homology detection
- Gene finding

Protein structure prediction
- Given: an amino acid sequence
- Return: the protein's structure
(Figure: a complex of earthworm hemoglobin, composed of 144 globin chains. Source: Protein Data Bank.)

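Remote homology detection
The hidden Markov model generalizes the position-specific scoring matrix (PSSM) used by PSI-BLAST. The model is trained using expectation-maximization.
(Figure: a profile HMM with match states M1 to M8, insert states I0 to I8, delete states D1 to D8, and begin and end states B and E.)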
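The state diagram of a profile HMM can be written down as a transition graph. The Python sketch below builds that graph for a model with a given number of match columns. It assumes one common textbook layout: B acts as match state 0, and from M_k, I_k or D_k the model moves to M_{k+1}, I_k or D_{k+1}, with I_L and E as the only exits after the last column. Real tools such as HMMER use variants that drop some of these transitions. The sketch gives only the topology. Emission and transition probabilities, which expectation-maximization would estimate from training sequences, are omitted.

```python
from typing import Dict, List


def profile_hmm_topology(length: int) -> Dict[str, List[str]]:
    """Map each state of a profile HMM with `length` match columns to its successors.

    Assumed layout: B plays the role of M0; M_k, I_k and D_k all lead to
    M_{k+1}, I_k and D_{k+1}; after the last column only I_L and E remain.
    """

    def successors(k: int) -> List[str]:
        # Successors of any state belonging to column k.
        if k == length:
            return [f"I{k}", "E"]
        return [f"M{k + 1}", f"I{k}", f"D{k + 1}"]

    graph: Dict[str, List[str]] = {"B": successors(0), "I0": successors(0), "E": []}
    for k in range(1, length + 1):
        for kind in ("M", "I", "D"):
            graph[f"{kind}{k}"] = successors(k)
    return graph


model = profile_hmm_topology(8)  # eight match columns, as on the slide
print(len(model), model["B"], model["M8"])
```

With eight columns, the graph has the 27 states shown in the figure: M1 to M8, I0 to I8, D1 to D8, B and E.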

Gene finding
(Figure source: Pedersen and Hein, Bioinformatics 2003.)

Mass spectrometry
- Spectrum identification
- Protein inference
- Biomarker discovery

Example peptides: EAMPK, GDIFYPGYCPDVK, LPLENENQGK, ASVYNSFVSNGVK, YVMTFK, ENQGVVNR
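Spectrum identification usually begins by comparing a spectrum's measured precursor mass with the masses of candidate peptides. The Python sketch below computes the monoisotopic mass of each peptide listed above, together with its m/z at charge 2. It uses standard monoisotopic residue masses and assumes unmodified residues. In practice, cysteine is often chemically modified, which shifts the mass. Matching fragment ions, the main part of spectrum identification, is not shown.

```python
# Standard monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the peptide's free N- and C-termini
PROTON = 1.007276   # mass of a proton


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


for pep in ["EAMPK", "GDIFYPGYCPDVK", "LPLENENQGK", "ASVYNSFVSNGVK", "YVMTFK", "ENQGVVNR"]:
    print(f"{pep:15s} {peptide_mass(pep):10.4f} {precursor_mz(pep, 2):10.4f}")
```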

Biological networks
- Functional networks
- Protein-protein interaction networks
- Metabolic networks
- Regulatory networks

(Figure source: Adai et al., JMB 340 (2004).)

Protein-protein interactions
- Each node is a protein.
- Each edge is a physical interaction.
- Edges are measured via yeast two-hybrid assays or TAP tagging plus MS/MS.
(Figure source: Jeong et al., Nature.)
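Because each node is a protein and each edge an undirected physical interaction, an adjacency list is a natural in-memory representation. The Python sketch below builds one from a list of hypothetical interacting pairs. It then computes each protein's degree (its number of interaction partners) and the degree distribution. Sets absorb the duplicate pairs that repeated experiments often report.

```python
from collections import Counter, defaultdict


def build_network(edges):
    """Undirected interaction graph: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)  # an edge is a physical interaction,
        graph[b].add(a)  # so record it in both directions
    return graph


def degree_distribution(graph):
    """Count how many proteins have each number of partners."""
    return Counter(len(partners) for partners in graph.values())


# Hypothetical interactions; the last pair repeats the first in reverse order.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5"), ("P2", "P1")]
network = build_network(edges)
print({protein: len(partners) for protein, partners in sorted(network.items())})
print(degree_distribution(network))
```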

Regulatory networks
The mammalian cell cycle. Colors represent different types of interactions:
- Black: binding
- Red: covalent modifications and gene expression
- Green: enzyme actions
- Blue: stimulations and inhibitions
(Figure source: Kohn, Mol Cell Biol, 1999.)

Metabolic networks
Nodes are enzymes or metabolites, and edges represent interactions. This network represents the Arabidopsis TCA cycle.

Gene expression
- Clustering
- Predictive modeling
- Clinical applications

Gene expression matrix
The matrix entry at (i, j) is the expression level of gene i in experiment j.
(Figure: rows are genes, columns are experiments.)
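The matrix layout can be shown concretely, together with one common way of comparing genes before clustering them: Pearson correlation between rows. The Python sketch below uses a small hypothetical matrix with invented values. Correlation is only one of several similarity measures used for expression clustering, and it is undefined for a gene whose expression is constant across experiments.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical matrix: expression[i][j] = level of gene i in experiment j.
genes = ["geneA", "geneB", "geneC"]
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
correlations = [[pearson(row_i, row_j) for row_j in expression] for row_i in expression]
for name, row in zip(genes, correlations):
    print(name, [round(r, 2) for r in row])
```

Here geneA and geneB rise together across the experiments (correlation 1), whereas geneC moves in the opposite direction (correlation -1). A clustering algorithm would group geneA with geneB.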

Fibroblast gene clustering
Clusters: cholesterol biosynthesis; cell cycle; immediate-early response; signaling and angiogenesis; wound healing and tissue remodeling.
Iyer et al. The transcriptional program in the response of human fibroblasts to serum. Science 283:83-7, 1999.

Achieves >75% accuracy.

Next-generation sequencing (video)

Spaced seed alignment
- Tags and tag-sized pieces of the reference are cut into small seeds.
- Pairs of spaced seeds are stored in an index.
- Look up the spaced seeds for each tag.
- For each hit, confirm the remaining positions.
- Report results to the user.
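Here is a simplified Python sketch of the seed, look-up and confirm steps. It departs from the slide in two ways. It indexes a single hypothetical spaced-seed pattern over the reference only, rather than pairs of seeds, and it uses a toy reference and tag. Because a spaced seed ignores its "0" positions, and every offset of the tag is tried, some seed can usually avoid a mismatched base. Any hit is then verified by counting mismatches along the whole tag.

```python
from collections import defaultdict

SEED_PATTERN = "110101"  # hypothetical: '1' = position used in the seed, '0' = ignored


def seed_key(seq, start, pattern=SEED_PATTERN):
    """Bases of seq at the '1' positions of pattern, starting at `start`."""
    return "".join(seq[start + i] for i, care in enumerate(pattern) if care == "1")


def build_index(reference, pattern=SEED_PATTERN):
    """Map every spaced seed in the reference to the positions where it starts."""
    index = defaultdict(list)
    for start in range(len(reference) - len(pattern) + 1):
        index[seed_key(reference, start, pattern)].append(start)
    return index


def map_tag(tag, reference, index, pattern=SEED_PATTERN, max_mismatches=1):
    """Look up each seed of the tag, then confirm the remaining positions."""
    hits = set()
    for offset in range(len(tag) - len(pattern) + 1):
        for ref_start in index.get(seed_key(tag, offset, pattern), []):
            pos = ref_start - offset  # where the whole tag would start
            if pos < 0 or pos + len(tag) > len(reference):
                continue
            window = reference[pos:pos + len(tag)]
            mismatches = sum(1 for a, b in zip(tag, window) if a != b)
            if mismatches <= max_mismatches:
                hits.add(pos)
    return sorted(hits)


reference = "ACGTACGGTCAGTTAGCCATG"
index = build_index(reference)
print(map_tag("CGGTGAGTTA", reference, index))  # one substitution relative to reference[5:15]
```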

Burrows-Wheeler
- Store the entire reference genome as a Burrows-Wheeler index.
- Align the tag base by base, starting from its end.
- When the whole tag has been traversed, report all locations that are still active.
- If no match is found, back up and try a substitution.
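Here is a toy Python sketch of a Burrows-Wheeler (FM) index with backward search. Starting from the tag's last base, each step narrows the range of "active" suffixes. The suffixes left in the range when the tag is exhausted give the reported locations. The sketch simplifies substitution handling: instead of backing up only after a failure, as the slide describes, it explores every substitution within a mismatch budget. It also builds the index naively and counts occurrences by scanning. Production aligners use compressed, checkpointed tables instead, and the reference and tags here are invented.

```python
class FMIndex:
    """Toy Burrows-Wheeler (FM) index supporting backward search with substitutions."""

    def __init__(self, reference: str):
        text = reference + "$"  # '$' marks the end and sorts before A, C, G, T
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array
        self.bwt = "".join(text[i - 1] for i in self.sa)  # text[-1] is '$' when i == 0
        self.first = {}  # C[c]: number of characters in text smaller than c
        total = 0
        for c in sorted(set(text)):
            self.first[c] = total
            total += text.count(c)

    def occ(self, c: str, k: int) -> int:
        """Occurrences of c in bwt[:k] (real indexes use precomputed checkpoints)."""
        return self.bwt[:k].count(c)

    def search(self, tag: str, max_mismatches: int = 0):
        """Reference positions where tag matches with at most max_mismatches substitutions."""
        hits = set()

        def extend(i, lo, hi, mismatches):
            if lo >= hi:
                return  # no active locations left on this branch
            if i < 0:
                hits.update(self.sa[lo:hi])  # whole tag traversed: report active locations
                return
            for c in "ACGT":
                cost = 0 if c == tag[i] else 1
                if c not in self.first or mismatches + cost > max_mismatches:
                    continue
                extend(i - 1, self.first[c] + self.occ(c, lo),
                       self.first[c] + self.occ(c, hi), mismatches + cost)

        extend(len(tag) - 1, 0, len(self.sa), 0)  # start from the tag's last base
        return sorted(hits)


index = FMIndex("ACGTACGGTCAGTTAGCCATG")
print(index.search("ACG"))
print(index.search("CGGTGAGTTA", max_mismatches=1))
```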

Spliced-read mapping
- Used for processed mRNA data.
- Reports reads that span introns.
- Examples: TopHat, ERANGE

Beyond the genome
- Epigenetics
- Chromatin state assignment
- Genome 3D architecture

Next-generation assays
(Figure source: ENCODE Project Consortium, PLoS Biol 9:e)

Rediscovering genes

Population genetics
- Genotype to phenotype
- Human disease genetics
- Population history

Human migrations
(Figure source: jbiol.com)

Other topics
- Natural language processing
- Image analysis