Computational Genomics Lecture 1, Tuesday April 1, 2003.

Slides:



Advertisements
Similar presentations
Sequence Alignment I Lecture #2
Advertisements

CS 5263 Bioinformatics Lecture 3: Dynamic Programming and Global Sequence Alignment.
Sequence allignement 1 Chitta Baral. Sequences and Sequence allignment Two main kind of sequences –Sequence of base pairs in DNA molecules (A+T+C+G)*
Rapid Global Alignments How to align genomic sequences in (more or less) linear time.
Sequence Alignment.
Welcome to CS262: Computational Genomics Instructor: Serafim Batzoglou TAs: Eugene Davydov Christina Pop Monday & Wednesday.
Welcome to CS262: Computational Genomics Instructor: Serafim Batzoglou TAs: Marc Schaub Andreas Sundquist Monday & Wednesday.
Lecture 6, Thursday April 17, 2003
Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.
Genomic Sequence Alignment. Overview Dynamic programming & the Needleman-Wunsch algorithm Local alignment—BLAST Fast global alignment Multiple sequence.
Welcome to CS262!. Goals of this course Introduction to Computational Biology  Basic biology for computer scientists  Breadth: mention many topics &
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Sequence Alignment Algorithms in Computational Biology Spring 2006 Edited by Itai Sharon Most slides have been created and edited by Nir Friedman, Dan.
CS 5263 Bioinformatics Lecture 5: Affine Gap Penalties.
Sequence Alignment. Scoring Function Sequence edits: AGGCCTC  MutationsAGGACTC  InsertionsAGGGCCTC  DeletionsAGG. CTC Scoring Function: Match: +m Mismatch:
Sequence Alignment. CS262 Lecture 2, Win06, Batzoglou Complete DNA Sequences More than 300 complete genomes have been sequenced.
CS 262 Discussion Section 1. Purpose of discussion sections To clarify difficulties/ambiguities in the problem set questions and lecture material. To.
Large-Scale Global Alignments Multiple Alignments Lecture 10, Thursday May 1, 2003.
Sequence Alignment Cont’d. Sequence Alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings.
Reminder -Structure of a genome Human 3x10 9 bp Genome: ~30,000 genes ~200,000 exons ~23 Mb coding ~15 Mb noncoding pre-mRNA transcription splicing translation.
Sequence Alignment.
Sequence Alignment. Before we start, administrivia Instructor: Serafim Batzoglou, CS x Office hours: Monday 2:00-3:30 TA:
Welcome to CS262: Computational Genomics Instructor: Serafim Batzoglou TAs: Marc Schaub Andreas Sundquist Monday & Wednesday.
Comparative ab initio prediction of gene structures using pair HMMs
Alignments and Comparative Genomics. Welcome to CS374! Today: Serafim: Alignments and Comparative Genomics Omkar: Administrivia.
Sequence Alignment Cont’d. Evolution Scoring Function Sequence edits: AGGCCTC  Mutations AGGACTC  Insertions AGGGCCTC  Deletions AGG.CTC Scoring Function:
Multiple Sequence Alignments. Lecture 12, Tuesday May 13, 2003 Reading Durbin’s book: Chapter Gusfield’s book: Chapter 14.1, 14.2, 14.5,
Welcome to CS262: Computational Genomics Instructor: Serafim Batzoglou TAs: George Asimenos Andreas Sundquist Tuesday&Thursday 2:45-4:00 Skilling Auditorium.
CS 6293 Advanced Topics: Current Bioinformatics Lectures 3-4: Pair-wise Sequence Alignment.
Welcome to CS262: Computational Genomics Instructor: Serafim Batzoglou TAs: Eugene Davydov Christina Pop Monday & Wednesday.
Sequence Alignment Slides courtesy of Serafim Batzoglou, Stanford Univ.
Implementation of Planted Motif Search Algorithms PMS1 and PMS2 Clifford Locke BioGrid REU, Summer 2008 Department of Computer Science and Engineering.
BNFO 235 Lecture 5 Usman Roshan. What we have done to date Basic Perl –Data types: numbers, strings, arrays, and hashes –Control structures: If-else,
. Computational Genomics Lecture #3a (revised 24/3/09) This class has been edited from Nir Friedman’s lecture which is available at
Phylogenetic Shadowing Daniel L. Ong. March 9, 2005RUGS, UC Berkeley2 Abstract The human genome contains about 3 billion base pairs! Algorithms to analyze.
Computational Genomics Lecture 1, Tuesday April 1, 2003.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Sequence Alignment Lecture 2, Thursday April 3, 2003.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Sequence Alignment. -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Given two strings x = x 1 x 2...x M, y = y 1 y 2 …y N,
Class 2: Basic Sequence Alignment
Variants of HMMs. Higher-order HMMs How do we model “memory” larger than one time point? P(  i+1 = l |  i = k)a kl P(  i+1 = l |  i = k,  i -1 =
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Deoxyribonucleic acid (DNA) Biometrics CPSC 4600 Biometrics and Cryptography.
CS 5263 Bioinformatics Lecture 4: Global Sequence Alignment Algorithms.
Sequence Alignment. 2 Sequence Comparison Much of bioinformatics involves sequences u DNA sequences u RNA sequences u Protein sequences We can think of.
BIOMETRICS Module Code: CA641 Week 11- Pairwise Sequence Alignment.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
. EM with Many Random Variables Another Example of EM Sequence Alignment via HMM Lecture # 10 This class has been edited from Nir Friedman’s lecture. Changes.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Minimum Edit Distance Definition of Minimum Edit Distance.
Overview of Bioinformatics 1 Module Denis Manley..
AdvancedBioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2002 Mark Craven Dept. of Biostatistics & Medical Informatics.
Advanced Algorithms and Models for Computational Biology -- a machine learning approach Computational Genomics I: Sequence Alignment Eric Xing Lecture.
Intro to Alignment Algorithms: Global and Local Intro to Alignment Algorithms: Global and Local Algorithmic Functions of Computational Biology Professor.
Dynamic Programming Presenters: Michal Karpinski Eric Hoffstetter.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Sequence Alignment Tanya Berger-Wolf CS502: Algorithms in Computational Biology January 25, 2011.
Sequence Similarity.
CS 5263 Bioinformatics Lecture 3: Dynamic Programming and Sequence Alignment.
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
1 Sequence Alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings x = x 1 x 2...x M, y = y.
CS502: Algorithms in Computational Biology
CSCI2950-C Genomes, Networks, and Cancer
Definition of Minimum Edit Distance
Pairwise sequence Alignment.
Intro to Alignment Algorithms: Global and Local
Presentation transcript:

Computational Genomics Lecture 1, Tuesday April 1, 2003

Biology in One Slide

Lecture 1, Tuesday April 1, 2003 High Throughput Biology 1.DNA Sequencing …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

Lecture 1, Tuesday April 1, 2003 High Throughput Biology 2.Sequencing of expressed genes (EST sequencing) mRNA sequence protein sequence

Lecture 1, Tuesday April 1, 2003 High Throughput Biology 3.Gene Expression: Microarrays

Lecture 1, Tuesday April 1, 2003 High Throughput Biology 4.Gene Regulation: CH.IP.

Lecture 1, Tuesday April 1, 2003 The goals of genomics Study organisms at the DNA level –Identify “parts” (genes, etc) –Figure out “connections” between “parts” Study evolution at the DNA level –Compare organisms –Uncover evolutionary history

Lecture 1, Tuesday April 1, 2003 The role of CS in Biology Essential –DNA sequencing and assembly –Microarray analysis –Protein 3D reconstruction Complementary –Gene finding, genome annotation –Protein fold prediction –Phylogeny, comparative genomics

Lecture 1, Tuesday April 1, 2003 Syllabus Tools –Alignment algorithms –Hidden Markov models –Statistical algorithms Applications –DNA sequencing and assembly –Sequence analysis (comparison, annotation) –Microarray analysis –Evolutionary analysis

Lecture 1, Tuesday April 1, 2003 Course responsibilities Homeworks[80%] –4 challenging problem sets, 4-5 problems/pset –Collaboration allowed –5 late days total –Televised students required to do 75% Final[20%] –Takehome, 1 day –Collaboration not allowed –Easy! Scribing –“Mandatory” –Grade replaces lowest 2 problems –Due one week after the lecture

Lecture 1, Tuesday April 1, 2003 Reading material Books –“Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchinson Chapters 1-4, 6, (7-8), (9-10) –“Algorithms on strings, trees, and sequences” by Gusfield Chapters (5-7), 11-12, (13), 14, (17) Papers Lecture notes

Topic 1. Sequence Alignment

Lecture 1, Tuesday April 1, 2003 Complete genomes

Lecture 1, Tuesday April 1, 2003 Evolution

Lecture 1, Tuesday April 1, 2003 Evolution at the DNA level …ACGGTGCAGTCACCA… …ACGTTGCAGTCCACCA… C SEQUENCE EDITSREARRANGEMENTS

Lecture 1, Tuesday April 1, 2003 Evolutionary Rates OK X X Still OK? next generation

Lecture 1, Tuesday April 1, 2003 Sequence conservation implies function Interleukin region in human and mouse

Lecture 1, Tuesday April 1, 2003 Sequence Alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings x = x 1 x 2...x M, y = y 1 y 2 …y N, an alignment is an assignment of gaps to positions 0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gap in the other sequence AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC

Lecture 1, Tuesday April 1, 2003 What is a good alignment? Alignment: The “best” way to match the letters of one sequence with those of the other How do we define “best”? Alignment: A hypothesis that the two sequences come from a common ancestor through sequence edits Parsimonious explanation: Find the minimum number of edits that transform one sequence into the other

Lecture 1, Tuesday April 1, 2003 Scoring Function Sequence edits: AGGCCTC –Mutations AGGACTC –Insertions AGGGCCTC –Deletions AGG.CTC Scoring Function: Match: +m Mismatch: -s Gap:-d Score F = (# matches)  m - (# mismatches)  s – (#gaps)  d

Lecture 1, Tuesday April 1, 2003 How do we compute the best alignment? AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC Too many possible alignments: O( 2 M+N )

Lecture 1, Tuesday April 1, 2003 Alignment is additive Observation: The score of aligningx 1 ……x M y 1 ……y N is additive Say thatx 1 …x i x i+1 …x M aligns to y 1 …y j y j+1 …y N The two scores add up: F(x[1:M], y[1:N]) = F(x[1:i], y[1:j]) + F(x[i+1:M], y[j+1:N])

Lecture 1, Tuesday April 1, 2003 Dynamic Programming We will now describe a dynamic programming algorithm Suppose we wish to align x 1 ……x M y 1 ……y N Let F(i,j) = optimal score of aligning x 1 ……x i y 1 ……y j

Lecture 1, Tuesday April 1, 2003 Dynamic Programming (cont’d) Notice three possible cases: 1.x i aligns to y j x 1 ……x i-1 x i y 1 ……y j-1 y j 2.x i aligns to a gap x 1 ……x i-1 x i y 1 ……y j - 3.y j aligns to a gap x 1 ……x i - y 1 ……y j-1 y j m, if x i = y j F(i,j) = F(i-1, j-1) + -s, if not F(i,j) = F(i-1, j) - d F(i,j) = F(i, j-1) - d

Lecture 1, Tuesday April 1, 2003 Dynamic Programming (cont’d) How do we know which case is correct? Inductive assumption: F(i, j-1), F(i-1, j), F(i-1, j-1) are optimal Then, F(i-1, j-1) + s(x i, y j ) F(i, j) = maxF(i-1, j) – d F( i, j-1) – d Where s(x i, y j ) = m, if x i = y j ;-s, if not

Lecture 1, Tuesday April 1, 2003 Example x = AGTAm = 1 y = ATAs = -1 d = -1 AGTA A10 -2 T 0010 A-3 02 F(i,j) i = j = Optimal Alignment: F(4,3) = 2 AGTA A - TA

Lecture 1, Tuesday April 1, 2003 The Needleman-Wunsch Matrix x 1 ……………………………… x M y 1 ……………………………… y N Every nondecreasing path from (0,0) to (M, N) corresponds to an alignment of the two sequences Can think of it as a divide-and-conquer algorithm

Lecture 1, Tuesday April 1, 2003 The Needleman-Wunsch Algorithm 1.Initialization. a.F(0, 0) = 0 b.F(0, j) = - j  d c.F(i, 0)= - i  d 2.Main Iteration. Filling-in partial alignments a.For each i = 1……M For eachj = 1……N F(i-1,j) – d [case 1] F(i, j) = max F(i, j-1) – d [case 2] F(i-1, j-1) + s(x i, y j ) [case 3] UP, if [case 1] Ptr(i,j)= LEFTif [case 2] DIAGif [case 3] 3.Termination. F(M, N) is the optimal score, and from Ptr(M, N) can trace back optimal alignment

Lecture 1, Tuesday April 1, 2003 Performance Time: O(NM) Space: O(NM) Later we will cover more efficient methods

Lecture 1, Tuesday April 1, 2003 A variant of the basic algorithm: Maybe it is OK to have an unlimited # of gaps in the beginning and end: CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC GCGAGTTCATCTATCAC--GACCGC--GGTCG Then, we don’t want to penalize gaps in the ends

Lecture 1, Tuesday April 1, 2003 Different types of overlaps

Lecture 1, Tuesday April 1, 2003 The Overlap Detection variant Changes: 1.Initialization For all i, j, F(i, 0) = 0 F(0, j) = 0 2.Termination max i F(i, N) F OPT = max max j F(M, j) x 1 ……………………………… x M y 1 ……………………………… y N

Lecture 1, Tuesday April 1, 2003 Next Lecture Local alignment More elaborate scoring function Memory-efficient algorithms Reading: Durbin, Chapter 2 Gusfield, Chapter 11