1
Computational Genomics Lecture 1, Tuesday April 1, 2003
2
Biology in One Slide: 2 Paradigms. Molecular Paradigm; Evolution Paradigm.
3
High Throughput Biology. Biology is becoming an information science. …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT… [Figure labels: Gene Expression, DNA Sequencing]
4
Goals of this course. Introduction to Computational Biology: basic biology for computer scientists; breadth (mention many topics & applications). In-depth coverage of Computational Genomics: algorithms for sequence analysis; current applications, trends, and open problems. Coverage of useful algorithms: hidden Markov models; dynamic programming; string algorithms; applications of AI techniques.
6
Topics in CS262, Part 1: In-depth coverage of basic computational methods for analysis of biological sequences: sequence alignment & dynamic programming; hidden Markov models. These methods are used heavily in most genomics applications: DNA sequencing; comparison of DNA and proteins across organisms; discovery of genes, promoters, regulatory sites.
7
Topics in CS262, Part 2: Topics in computational genomics, more algorithms, and areas of active research. DNA sequencing & assembly: reading a complete genome such as the human DNA. Gene finding: marking genes on the DNA sequence. Large-scale comparative genomics: comparing whole genomes from multiple organisms. Microarrays & regulation: understanding the regulatory code, and potential disease-causing genes. RNA structure: predicting the folding of RNA. Phylogeny and evolution: quantifying the evolution of biological sequences.
8
Course responsibilities. Homeworks [72%]: 4 challenging problem sets, 4-5 problems per set; collaboration allowed, please give credit; homeworks due Thursday, solutions explained Friday; the two worst problems across all homeworks do not count. Final [18%]: take-home, 1 day; collaboration not allowed; basic questions, much easier than the homeworks. Scribing [10%]: due one week after the lecture, except by special permission.
9
Reading material. Books: “Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchison, chapters 1-4, 6, (7-8), (9-10); “Algorithms on strings, trees, and sequences” by Gusfield, chapters (5-7), 11-12, (13), 14, (17). Papers. Lecture notes.
10
Topic 1. Sequence Alignment
11
Complete genomes
12
Evolution
13
Evolution at the DNA level. …ACGGTGCAGTCACCA… …ACGTTGCAGTCCACCA… [Figure labels: sequence edits, rearrangements]
14
Evolutionary Rates. Changes in non-functional sites are OK, so will be propagated. Most changes in functional sites are deleterious and will be rejected. [Figure labels: OK, X, X, Still OK?, next generation]
15
Sequence conservation implies function. [Figure: interleukin region in human and mouse; axis labels 100% and 40%]
16
Sequence Alignment
Input sequences:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Aligned:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition: Given two strings $x = x_1 x_2 \ldots x_M$ and $y = y_1 y_2 \ldots y_N$, an alignment is an assignment of gaps to positions $0, \ldots, M$ in $x$ and $0, \ldots, N$ in $y$, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
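The definition can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_valid_alignment` and the use of `-` as the gap character are assumptions, as is the conventional extra rule that no column pairs a gap with a gap.

```python
def is_valid_alignment(x, y, row_x, row_y, gap="-"):
    """Return True if (row_x, row_y) is an alignment of x and y.

    Hypothetical helper for illustration; '-' as the gap symbol is an assumption.
    """
    # Both rows must have the same number of columns.
    if len(row_x) != len(row_y):
        return False
    # Assumed convention: a column never lines up a gap with a gap.
    if any(a == gap and b == gap for a, b in zip(row_x, row_y)):
        return False
    # Removing the gaps must give back the original sequences unchanged.
    return row_x.replace(gap, "") == x and row_y.replace(gap, "") == y


# The two sequences and the alignment shown on the slide.
x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"

print(is_valid_alignment(x, y, row_x, row_y))
```

The check only confirms that the rows are a legal alignment of the inputs; it says nothing about whether the alignment is a good one.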
17
What is a good alignment? Alignment: the “best” way to match the letters of one sequence with those of the other. How do we define “best”? Alignment: a hypothesis that the two sequences come from a common ancestor through sequence edits. Parsimonious explanation: find the minimum number of edits that transform one sequence into the other.
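One common way to compute the minimum number of edits is the standard unit-cost edit distance, solved by dynamic programming. The sketch below is an illustration rather than the lecture's own algorithm: it assumes that each substitution, insertion, and deletion counts as exactly one edit, and the name `edit_distance` is hypothetical.

```python
def edit_distance(x, y):
    """Minimum number of single-letter substitutions, insertions, and
    deletions that transform x into y (unit costs are an assumption)."""
    # prev[j] holds the distance between the current prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        # Turning x[:i] into the empty string takes i deletions.
        curr = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            substitution = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + substitution,  # match or mutate x[i-1] into y[j-1]
                prev[j] + 1,                 # delete x[i-1]
                curr[j - 1] + 1,             # insert y[j-1]
            )
        prev = curr
    return prev[len(y)]


print(edit_distance("AGGCCTC", "AGGACTC"))   # one mutation
print(edit_distance("AGGCCTC", "AGGGCCTC"))  # one insertion
print(edit_distance("AGGCCTC", "AGGCTC"))    # one deletion
```

Keeping only two rows of the table is enough to get the distance; recovering the alignment itself would require storing the full table and tracing back.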
18
Scoring Function. Sequence edits, starting from AGGCCTC: mutations (AGGACTC), insertions (AGGGCCTC), deletions (AGG.CTC). Scoring function: match: $+m$; mismatch: $-s$; gap: $-d$. Score: $F = (\#\,\text{matches}) \cdot m - (\#\,\text{mismatches}) \cdot s - (\#\,\text{gaps}) \cdot d$
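The score of a given alignment follows directly from the formula. This is a minimal sketch under stated assumptions: the function name `alignment_score`, the `-` gap character, and the default values `m=1`, `s=1`, `d=1` are illustrative choices not given in the slides, and every gap column is counted once.

```python
def alignment_score(row_x, row_y, m=1, s=1, d=1, gap="-"):
    """Score F = (#matches)*m - (#mismatches)*s - (#gaps)*d.

    Defaults for m, s, d are placeholders, not values from the lecture.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            gaps += 1        # a letter lined up with a gap
        elif a == b:
            matches += 1     # identical letters
        else:
            mismatches += 1  # different letters
    return matches * m - mismatches * s - gaps * d


# The mutation and deletion examples, written as aligned rows.
print(alignment_score("AGGCCTC", "AGGACTC", m=2, s=1, d=3))  # 6 matches, 1 mismatch
print(alignment_score("AGGCCTC", "AGG-CTC", m=2, s=1, d=3))  # 6 matches, 1 gap
```

With these example weights the mutation scores 11 and the deletion scores 9, so the relative sizes of $s$ and $d$ decide which kind of edit an alignment prefers.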