Welcome to CS262: Computational Genomics. Instructor: Serafim Batzoglou. TAs: George Asimenos, Andreas Sundquist. Tuesday & Thursday 2:45-4:00, Skilling Auditorium.

Presentation transcript:

1

2 Welcome to CS262: Computational Genomics
Instructor: Serafim Batzoglou
TAs: George Asimenos, Andreas Sundquist
Tuesday & Thursday 2:45-4:00, Skilling Auditorium

3 Goals of this course
Introduction to Computational Biology & Genomics
- Basic concepts and scientific questions
- Why does it matter?
- Basic biology for computer scientists
- In-depth coverage of algorithmic techniques
- Current active areas of research
Useful algorithms
- Dynamic programming
- String algorithms
- Graphical models (Hidden Markov Models and others) for sequence analysis

4 Topics in CS262
Part 1: Basic Algorithms
- Sequence Alignment & Dynamic Programming
- Hidden Markov Models, Context Free Grammars, Conditional Random Fields
Part 2: Topics in computational genomics and areas of active research
- DNA sequencing
- Comparative genomics
- Genes: finding genes, gene regulation
- Proteins, families, and evolution
- Networks of protein interactions

5 Course responsibilities
Homeworks
- 4 challenging problem sets, 4-5 problems per problem set
  - Due at beginning of class
  - Up to 3 late days (24-hr periods) for the quarter
- Collaboration allowed; please give credit
  - Teams of 2 or 3 students
  - Individual writeups
  - If individual (no team), then drop score of worst problem per problem set
(Optional) Scribing
- Due one week after the lecture, except with special permission
- Scribing grade replaces 2 lowest problems from all problem sets
  - First-come, first-served; email staff list to sign up

6 Reading material
Books
- “Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchison
  Chapters 1-4, 6, (7-8), (9-10)
- “Algorithms on strings, trees, and sequences” by Gusfield
  Chapters (5-7), 11-12, (13), 14, (17)
Papers
Lecture notes

7 Birth of Molecular Biology
DNA: Phosphate Group, Sugar, Nitrogenous Base (A, C, G, T)
[Figure labels: Physicist, Ornithologist]

8 DNA to RNA to Protein to Cell
DNA, ~3x10^9 long in humans
Contains ~22,000 genes
G A G U C A G C (messenger-RNA)
transcription, translation, folding

9 DNA strands: T C A C T G G C, G A G T C A G C
RNA: G A G U C A G C
Base pairing: A - T, G - C
DNA to RNA: T → U
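
The base-pairing and T → U rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function names `complement` and `transcribe` are hypothetical, and the sketch assumes the strand `GAGTCAGC` from the slide is the coding strand (the one whose sequence the RNA copies).

```python
# Watson-Crick pairing rules from the slide: A - T, G - C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna):
    """Return the base-by-base complement of a DNA strand (no reversal)."""
    return "".join(DNA_COMPLEMENT[base] for base in dna)


def transcribe(coding_strand):
    """Return the mRNA for a coding strand: same letters, with U in place of T."""
    return coding_strand.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("GAGTCAGC"))   # RNA copy of the coding strand
    print(complement("GAGTCAGC"))   # the paired (template) strand, read in the same direction
```

The sketch ignores strand orientation (5’ to 3’) for simplicity.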

10 Proteins
Composed of a chain of amino acids.
R: 20 possible groups

     R
     |
H2N--C--COOH
     |
     H

11 Proteins

     R                R
     |                |
H2N--C--COOH     H2N--C--COOH
     |                |
     H                H

12 Dipeptide

     R  O      R
     |  ||     |
H2N--C--C--NH--C--COOH
     |         |
     H         H

The C--NH link in the middle is the peptide bond.

13 Protein structure
Linear sequence of amino acids folds to form a complex 3-D structure.
The structure of a protein is intimately connected to its function.

14 DNA in action
Questions about DNA as information:
- How is the information stored in DNA?
- How is the stored information used?
Answers:
- Information is stored as nucleotide sequences
- ... and used in protein synthesis

15 Transcription & Translation
The DNA is contained in the nucleus of the cell.
A stretch of it unwinds, and its message (or sequence) is transcribed onto a molecule of mRNA.
The mRNA's destination is a molecular workbench in the cytoplasm, a structure called a ribosome, which translates the mRNA to a protein.
Think of AUGCCGGGAGUAUAG as AUG-CCG-GGA-GUA-UAG.
Each triplet (codon) maps to an amino acid.
A gene is a length of DNA that codes for a protein.
Genome = the entire DNA sequence within the nucleus.

16 The Genetic Code
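
To make the codon-to-amino-acid mapping concrete, here is a minimal Python sketch that translates the example mRNA from slide 15. It is an illustration added to the transcript, not code from the course. The table is the standard genetic code written in the usual U, C, A, G ordering with one-letter amino-acid codes and `*` for stop; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated as UUU, UUC, UUA, UUG, UCU, ...
# (first base varies slowest). "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(mrna):
    """Split an mRNA string into consecutive triplets, dropping any incomplete tail."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def translate(mrna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(mrna):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = "AUGCCGGGAGUAUAG"
    print("-".join(split_codons(mrna)))  # AUG-CCG-GGA-GUA-UAG
    print(translate(mrna))               # Met, Pro, Gly, Val in one-letter codes
```

The sketch assumes the reading frame starts at the first base; a real translator would first locate the start codon.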

17 Transcription – key steps
- Initiation
- Elongation
- Termination
[Figure labels: DNA; RNA + DNA]

22 Promoters
Promoters are sequences in the DNA just upstream of transcripts that define the sites of initiation.
The role of the promoter is to attract RNA polymerase to the correct start site so transcription can be initiated.
[Figure: 5’ Promoter 3’]

25 Genetics in the 20th Century

26 21st Century Technology drives an information revolution
AGTAGCACAGACTACGACGA
GACGATCGTGCGAGCGACGG
CGTAGTGTGCTGTACTGTCG
TGTGTGTGTACTCTCCTCTC
TCTAGTCTACGTGCTGTATG
CGTTAGTGTCGTCGTCTAGT
AGTCGCGATGCTCTGATGTT
AGAGGATGCACGATGCTGCT
GCTACTAGCGTGCTGCTGCG
ATGTAGCTGTCGTACGTGTA
GTGTGCTGTAAGTCGAGTGT
AGCTGGCGATGTATCGTGGT

27 From DNA to organisms
DNA, ~3x10^9 long in humans
Contains ~22,000 genes
G A G U C A G C (messenger-RNA)
transcription, translation, folding
[Figure marker: 1]

28 Sequencing and Fragment Assembly
3x10^9 nucleotides:
AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT
Cut at random

29 Sequencing and Fragment Assembly
3x10^9 nucleotides, cut at random
Reads: ~700 nucleotides, paired
acgtccagcatgactacgagactgtgtagcgcgatcgatct

30 Sequencing and Fragment Assembly
3x10^9 nucleotides
Reads: ~700 nucleotides, paired
Overlapping reads and the sequence they assemble into:
                        gatcgactgtacgatcagtgtcatc
                   gagctgatcgactgtacgatcagtgt
          catgactacgagctgatcgactgtacgatca
       cagcatgactacgagctgatcgac
  acgtccagcatgactacgag
……acgtccagcatgactacgagctgatcgactgtacgatcagtgtgcatc……
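
The overlap picture on slide 30 suggests a simple way to think about assembly: repeatedly merge the two reads that share the longest suffix/prefix overlap. The sketch below is a toy greedy assembler added for illustration; it is not presented in the slides as the course's method, and the names `overlap`, `greedy_assemble`, and `min_len` are assumptions. It uses four of the reads shown on the slide and ignores sequencing errors, reverse complements, and paired-read information.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlap of at least min_len remains."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; return the remaining contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Four of the reads shown on the slide (order does not matter).
    reads = [
        "gagctgatcgactgtacgatcagtgt",
        "acgtccagcatgactacgag",
        "catgactacgagctgatcgactgtacgatca",
        "cagcatgactacgagctgatcgac",
    ]
    for contig in greedy_assemble(reads):
        print(contig)
```

A greedy merge like this has no way to tell two copies of a repeated region apart, so on repeat-rich DNA it can glue distant regions together.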

31 Sequencing and Fragment Assembly
3x10^9 nucleotides
50% of human DNA is composed of repeats
Error! Glued together two distant regions

32 Sequencing and Fragment Assembly
3x10^9 nucleotides
[Figure labels: A R B; C R D]
ARB, CRD or ARD, CRB?

33 Sequencing and Fragment Assembly
3x10^9 nucleotides

34 Complete genomes today More than 300 complete genomes have been sequenced

35 DNA to RNA, and genes
DNA, ~3x10^9 long in humans
Contains ~22,000 genes
RNA: carries the “message” for “translating”, or “expressing” one gene
G A G U C A G C
transcription, translation, folding
[Figure markers: 1, 2]

36 2. Gene Finding
Where are the genes?
In humans: ~22,000 genes, ~1.5% of human DNA

37 2. Gene Finding
[Figure, 5’ to 3’: Start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, Stop codon TAG/TGA/TAA; splice sites]
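
As a rough illustration of the start and stop signals on this slide, the sketch below scans a DNA string for open reading frames: an ATG followed, in the same frame, by the first TAG, TGA, or TAA. This is a deliberately simplified stand-in for gene finding and an assumption of this transcript, not the method taught in the course: it ignores introns and splice sites, looks only at the forward strand, and the names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG ... stop stretch on the forward strand.

    start is the index of the A in ATG; end is one past the last base of the stop codon.
    min_codons is the minimum number of codons before the stop, counting the ATG.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs


if __name__ == "__main__":
    # Toy input: the slide-15 message written as DNA, with a few flanking bases.
    for start, end, seq in find_orfs("AAATGCCGGGAGTATAGCC"):
        print(start, end, seq)
```

In eukaryotic genomes the coding sequence is interrupted by introns, so a scan like this would miss or truncate most real genes; it only shows how the start and stop codons delimit a reading frame.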

38

39 From DNA to organisms
DNA, ~3x10^9 long in humans
Contains ~22,000 genes
G A G U C A G C (messenger-RNA)
transcription, translation, folding
[Figure marker: 3]

40 3. Protein Folding
The amino-acid sequence of a protein determines the 3D fold.
The 3D fold of a protein determines its function.
Can we predict the 3D fold of a protein given its amino-acid sequence?
- Holy grail of compbio: a 35-year-old problem
- Molecular dynamics, robotics, machine learning, computational geometry
- Not a topic for CS262; take CS273 for protein structure

41 Complete Genomes More than 200 complete genomes have been sequenced

42 Evolution

43 From DNA to organisms
DNA, ~3x10^9 long in humans
Contains ~22,000 genes
G A G U C A G C (messenger-RNA)
transcription, translation, folding
[Figure marker: 4]

44 Evolution at the DNA level
[Figure labels: OK; X; X; Still OK?; next generation]

45 4. Sequence Comparison
Sequence conservation implies function
Sequence comparison is key to
- Finding genes
- Determining function
- Uncovering the evolutionary processes

46 Sequence Comparison: Alignment
Input sequences:
    AGGCTATCACCTGACCTCCAGGCCGATGCCC
    TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment:
    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
     || |||||||  ||||x|  || |||  |||||
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Sequence Alignment
- Introduced ~1970
- BLAST: 1990, most cited paper in history
- Still very active area of research
[Figure labels: query, DB, BLAST]
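
An alignment like the one above can be computed by dynamic programming. The sketch below is a minimal global alignment in the style of Needleman-Wunsch, added here for illustration. The scoring values (match +1, mismatch -1, gap -1) are assumptions, not the parameters behind the slide's alignment, so the output on the slide's sequences need not reproduce the slide exactly. The function name is hypothetical.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_x, aligned_y), with '-' marking gaps.
    F[i][j] is the best score for aligning x[:i] with y[:j].
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                          F[i - 1][j] + gap,     # x[i-1] against a gap
                          F[i][j - 1] + gap)     # y[j-1] against a gap

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if (i > 0 and j > 0 and x[i - 1] == y[j - 1]) else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, a, b = needleman_wunsch("AGGCTATCACCTGACCTCCAGGCCGATGCCC",
                                   "TAGCTATCACGACCGCGGTCGATTTGCCCGAC")
    print(score)
    print(a)
    print(b)
```

The table has (n+1) x (m+1) entries, so time and memory grow with the product of the two sequence lengths.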

47 Comparison of Human, Mouse, and Rat

48 More DNA is coming…

49 From DNA to organisms
DNA, ~3x10^9 long in humans
Contains ~22,000 genes
G A G U C A G C (messenger-RNA)
transcription, translation, folding
[Figure marker: 5]

50 5. Clustering of Microarrays
Clinical prediction of Leukemia type
2 types
- Acute lymphoid (ALL)
- Acute myeloid (AML)
Different treatment & outcomes
Predict type before treatment?
Bone marrow samples: ALL vs AML
Measure the amount (expression level) of each gene
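
To show what clustering samples by gene expression means, here is a toy two-cluster k-means sketch. Everything in it is an assumption made for illustration: the expression values are invented (they are not leukemia data), the choice of k-means is ours and is not stated in the slides, and the function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def two_means(samples, max_iter=100):
    """Cluster samples into two groups; returns a list of labels (0 or 1).

    Centroids are initialised to the first and last sample, which makes the
    result deterministic (a simplification; k-means is usually randomly seeded).
    """
    centroids = [list(samples[0]), list(samples[-1])]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min((0, 1), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in (0, 1):
            members = [s for s, label in zip(samples, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


if __name__ == "__main__":
    # Made-up expression levels: one row per sample, one column per gene.
    toy_samples = [
        [9.0, 1.0, 8.0],
        [8.0, 2.0, 9.0],
        [1.0, 9.0, 2.0],
        [2.0, 8.0, 1.0],
    ]
    print(two_means(toy_samples))
```

With real microarray data there are thousands of genes per sample, and predicting a known class (ALL vs AML) is usually framed as supervised classification rather than clustering.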

51 6. Protein networks
Fresh research area
Construct networks from multiple data sources
Navigate networks
Compare networks across organisms
- Statistics
- Machine learning
- Graph algorithms
- Databases

52 Some goals of biology for the next 50 years
- List all molecular parts that build an organism
  - Genes, proteins, other functional parts
- Understand the function of each part
- Understand how parts interact
- Study how function has evolved across all species
- Find genetic defects that cause diseases
- Design drugs rationally
- Sequence the genome of every human, use it for personalized medicine

53 Computer Scientists vs Biologists

54 Computer scientists vs Biologists
(Almost) nothing is ever true or false in Biology.
Everything is true or false in computer science.

55 Computer scientists vs Biologists
Biologists strive to understand the complicated, messy natural world.
Computer scientists seek to build their own clean and organized virtual worlds.

56 Computer scientists vs Biologists
Biologists are obsessed with being the first to discover something.
Computer scientists are obsessed with being the first to invent or prove something.

57 Computer scientists vs Biologists
Biologists are comfortable with the idea that all data have errors.
Computer scientists are not.

58 Computer scientists vs Biologists
Computer scientists get high-paid jobs after graduation.
Biologists typically have to complete one or more 5-year post-docs...

59 Computer Science is to Biology what Mathematics is to Physics

