
1 2: Large-Scale 1 / 42 1 Large!

2 High-throughput technologies: sequencing; gene expression profiling; ChIP-chip and tiling arrays; whole-genome yeast two-hybrid scans; genomic knockout of every single gene; SNP/CGH; methylation profiling; …; proteome profiling

3 Genomic sequencing – shotgun sequencing. A single sequencing run usually reads only ~700 bp. How, then, can we sequence a whole genome?

4 Genomic sequencing – walking. 1. Design a primer. 2. Sequence. 3. Design a new primer from the end of the new read. 4. Sequence. 5. … A new primer has to be designed at every step, and designing it requires waiting for the result of the previous sequencing run.
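The serial nature of walking can be illustrated with a small simulation on a known test sequence. This is a minimal sketch under simplifying assumptions that are not from the slides: each read is exactly `read_len` bases long and starts right after the primer, and the next primer is simply the last bases of the previous read (in practice primers are placed inside the high-quality part of a read, so consecutive reads overlap). The function name `primer_walk` is hypothetical.

```python
import random


def primer_walk(genome: str, first_primer: str, read_len: int = 700) -> list[str]:
    """Hypothetical helper: simulate primer walking along `genome`, returning reads in order."""
    k = len(first_primer)
    reads: list[str] = []
    primer, search_from = first_primer, 0
    while True:
        site = genome.find(primer, search_from)  # where the primer anneals
        if site == -1:
            break
        start = site + k                          # sequencing starts right after the primer
        read = genome[start:start + read_len]     # one sequencing run
        if not read:
            break
        reads.append(read)
        if start + read_len >= len(genome):
            break                                 # reached the end of the sequence
        # The next primer can only be designed once this read is known,
        # so every round has to wait for the previous one.
        primer, search_from = read[-k:], start
    return reads


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(3000))
reads = primer_walk(genome, first_primer=genome[:20])
print(len(reads), "sequential rounds")
```

Covering 3,000 bases with 700-base reads takes several rounds that cannot run in parallel, which is exactly the bottleneck that shotgun sequencing removes.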

5 Genomic sequencing – shotgun sequencing: 1. Break the DNA into small pieces. 2. Sequence each piece. 3. Assemble. Example reads: GAGGAGACGAACACCCGTATACAGTCGACG, GTATACAGTCGACGTTTATATATA and ACCCCGAGGAGACGA; assembled sequence: ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA.
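The three steps can be illustrated with a toy greedy assembler that repeatedly merges the two fragments with the longest exact suffix-prefix overlap. This is a minimal sketch, not the method of any particular genome project; the helper names `overlap` and `greedy_assemble` and the `min_len` threshold (3, to ignore chance overlaps of one or two bases) are assumptions. Applied to the three example reads, it recovers the assembled sequence.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Hypothetical toy assembler: greedily merge reads by their longest exact overlaps."""
    unique = list(dict.fromkeys(reads))  # drop duplicate reads, keep their order
    # Drop reads that are fully contained in a longer read.
    contigs = [r for r in unique if not any(r != s and r in s for s in unique)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the result is several separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["GAGGAGACGAACACCCGTATACAGTCGACG",
         "GTATACAGTCGACGTTTATATATA",
         "ACCCCGAGGAGACGA"]
print(greedy_assemble(reads))  # ['ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA']
```

The first merge joins the pair with a 14-base overlap, the second the pair with a 10-base overlap. Greedy merging is easy to follow but can be misled by repeats, one of the difficulties discussed below.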

6 1. Break the DNA into small pieces. After the DNA is isolated (from the tissue, cells or virus), it is fragmented either by restriction enzymes or by mechanical force. The fragments have “frayed ends”, i.e., single-stranded overhangs [figure: a fragment with strands ACGTAACGTATACCCGAC and TATATGCATTGCATATG].

7 To blunt (“fix”) frayed ends, one needs a DNA polymerase. In the example shown (strands ←ATACGTAACGTATACCCGAC and TATATGCATTGCATATGGG →, with their 5’ and 3’ ends marked), simply adding a polymerase makes the ends blunt. Polymerases always grow the chain from 5’ towards 3’ (5’ → 3’), i.e., they add nucleotides to a free 3’ end.

8 Polymerases always grow the chain from 5’ towards 3’ (5’ → 3’). But what about this case? [figure: strands ACGTAACGTATACCCGAC and ATTGCATATGGGCTGAACAT, and strands ←ATACGTAACGTATACCCGAC and TATATGCATTGCATATGGG →, with their 5’ and 3’ ends marked]

9 E. coli DNA polymerase I has three activities, in separate domains: one does the replication; one digests DNA 3’ → 5’ (exonuclease); one digests DNA 5’ → 3’ (exonuclease). The Klenow fragment is a truncated form of the polymerase that lacks the 5’ → 3’ exonuclease activity.

10 Polymerases always grow the chain from 5’ towards 3’ (5’ → 3’). But what about this case? [figure: strands ACGTAACGTATACCCGAC and ATTGCATATGGGCTGAACAT, and strands ←ATACGTAACGTATACCCGAC and TATATGCATTGCATATGGG →] A protruding 3’ end cannot be filled in, but Klenow retains the 3’ → 5’ exonuclease activity, which trims the overhang back until the end is blunt.
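The end-repair logic of the last few slides can be sketched with a toy model of one end of a duplex. The representation is an assumption, not taken from the slides: the top strand is written 5’→3’ left to right, the bottom strand is written 3’→5’ left to right directly beneath it, and `offset` gives the position of the bottom strand’s leftmost (3’) base relative to the top strand. A recessed 3’ end is filled in by the polymerase; a protruding 3’ end is trimmed by the 3’→5’ exonuclease. The name `blunt_left_end` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def blunt_left_end(top: str, bottom: str, offset: int) -> tuple[str, str]:
    """Hypothetical helper: blunt the left end of a duplex.

    top:    top strand, written 5'->3' left to right.
    bottom: bottom strand, written 3'->5' left to right, paired base-by-base with top.
    offset: position in `top` of the bottom strand's leftmost (3') base.
            offset > 0: bottom's 3' end is recessed (top has a 5' overhang).
            offset < 0: bottom has a 3' overhang of -offset bases.
    """
    if offset > 0:
        # The polymerase extends the recessed 3' end, copying the overhang.
        fill = top[:offset].translate(COMPLEMENT)
        return top, fill + bottom
    if offset < 0:
        # A protruding 3' end cannot be extended; the 3'->5' exonuclease trims it.
        return top, bottom[-offset:]
    return top, bottom  # already blunt


print(blunt_left_end("ACGTAACG", "ATTGC", 3))   # ('ACGTAACG', 'TGCATTGC')
print(blunt_left_end("TAACG", "GCATTGC", -2))   # ('TAACG', 'ATTGC')
```

In both cases the result is a bottom strand that pairs with the top strand base for base up to the left end, i.e., a blunt end.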

11 2. Sequence each piece. The pieces are inserted into a vector, e.g., a plasmid, and each insert is sequenced from both sides [figure: reads GAGGAGACGAACACCCGTATACAGTCGACG, GTATACAGTCGACGTTTATATATA and ACCCCGAGGAGACGA]. Because every piece sits in the same vector, the same primers can be used for all of the sequencing, so the pieces can be sequenced in parallel.

12 Shotgun sequencing – why isn’t it a trivial task? 1. By chance, some parts of the genome are not sequenced even once! [figure: reads GAGGAGACGAACACCCGTATACAGTCGACG, ACCCCGAGGAGACGA and GTATACAGTCGACGTTTATATATA, with a ? marking a region that no read covers]

13 Shotgun sequencing – definition of coverage. 5X coverage: each base in the final sequence was present, on average, in 5 reads. Although the human genome was sequenced at 12X coverage, about 1% of it is still either unassembled or unreliable.
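Coverage is total sequenced bases divided by genome length, c = N·L/G for N reads of length L. Under the idealized random-sampling model of Lander and Waterman, the number of reads covering a given base is approximately Poisson with mean c, so the expected fraction of bases never sequenced is e^(-c). A minimal sketch; the function names are hypothetical, and the 700-base read length and 3.2 × 10^9-base genome are the figures quoted elsewhere in this lecture.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_len / genome_len


def reads_needed(genome_len: int, read_len: int, target_cov: float) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_cov * genome_len / read_len)


def fraction_unsequenced(cov: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases covered by no read."""
    return math.exp(-cov)


G, L = 3_200_000_000, 700
n = reads_needed(G, L, 5)
print(n, "reads give", round(coverage(n, L, G), 2), "X coverage")
for c in (1, 5, 12):
    print(f"{c}X: {fraction_unsequenced(c):.2e} of the genome expected unsequenced")
```

Under this idealized model, sampling gaps alone are tiny at 12X (e^(-12) ≈ 6 × 10^-6), which suggests that most of the remaining ~1% stems from the other difficulties listed in these slides rather than from sampling.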

14 Shotgun sequencing – why isn’t it a trivial task? 2. Some pieces do not align because of sequencing errors. [figure: reads GAGGTGAGGAACACCCGTATACAGTCGACG and ACCCCGAGGAGACGA against the consensus ACCCCGAGG?GA?GAACACCCGTATACAGTCGACGTTTATATATA, where each ? marks a position at which the reads disagree]
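One way to cope with such errors is to accept an overlap that contains a few mismatches. A minimal sketch, assuming a fixed mismatch budget `max_mismatches` instead of the quality-aware scoring used by real assemblers; the name `approx_overlap` is hypothetical, and the toy reads are invented for illustration.

```python
def approx_overlap(a: str, b: str, min_len: int = 3, max_mismatches: int = 1) -> int:
    """Hypothetical helper: longest suffix of `a` matching a prefix of `b`
    with at most `max_mismatches` mismatching positions (0 if none)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-k:], b[:k]))
        if mismatches <= max_mismatches:
            return k
    return 0


a = "CCCCAGGATT"
b = "AGGTTTAAAA"  # toy read: a sequencing error turned its fourth base A into T
print(approx_overlap(a, b, max_mismatches=0))  # 0: no exact overlap is found
print(approx_overlap(a, b, max_mismatches=1))  # 6: the true overlap is recovered
```

Only substitutions are handled here; insertions and deletions would need an alignment-based overlap, and allowing more mismatches raises the risk of accepting spurious overlaps.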

15 Shotgun sequencing – why isn’t it a trivial task? 3. Repetitive sequences: satellite DNA. [figure: reads GGGGGGGGGGGGGGGGGGGGGGGGGGGG, GGGGGGGGGGGGGGGGGGGGGGA and ACCCCGGGGG; consensus ACCCCGGGGGGGGGGGGG????GGGGGGGGGGGGGA, where ???? marks the unknown length of the G run]
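Why tandem repeats are hard can be checked directly: if a repeat run is longer than the reads, two genomes that differ only in the length of the run produce exactly the same set of distinct reads. A minimal sketch with hypothetical toy sequences and error-free 10-base reads taken at every position:

```python
def distinct_reads(genome: str, read_len: int) -> set[str]:
    """All distinct substrings of length read_len, i.e. every possible error-free read."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


short_run = "ACCCC" + "G" * 30 + "A"
long_run = "ACCCC" + "G" * 40 + "A"
print(short_run == long_run)                                          # False
print(distinct_reads(short_run, 10) == distinct_reads(long_run, 10))  # True
```

Only read depth (how often the all-G read turns up) hints at the length of the run, and with random sampling that signal is noisy.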

16 Shotgun sequencing – why isn’t it a trivial task? 4. Repetitive sequences: duplicated regions. The genome contains duplicated regions with almost identical sequences, so reads from the different copies are hard to tell apart.

17 Shotgun sequencing – why isn’t it a trivial task? 5. Some fragments are never sequenced because, once inserted into a bacterium, they are toxic to it.

18 A contig: a section of the genome that could be reliably assembled.

19 A contig. [figure: the Lander–Waterman estimate of the number of contigs as a function of genome coverage]

20 At 8X–10X coverage, ~5 contigs are expected, so part of the genome is expected to remain unsequenced.
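The Lander–Waterman estimate is commonly written as follows: with N reads of length L, coverage c = N·L/G, and a minimum detectable overlap of T bases (θ = T/L), the expected number of contigs is N·e^(-c(1-θ)). A minimal sketch; the function name is hypothetical, and the 5 Mb genome and 700-base reads below are illustrative assumptions (the predicted count depends strongly on both).

```python
import math


def expected_contigs(genome_len: int, read_len: int, cov: float, min_overlap: int = 0) -> float:
    """Lander-Waterman expected number of contigs: N * exp(-c * (1 - T/L))."""
    num_reads = cov * genome_len / read_len
    sigma = 1 - min_overlap / read_len
    return num_reads * math.exp(-cov * sigma)


# Illustrative only: a 5 Mb genome sequenced with 700-base reads.
for c in (1, 2, 4, 8, 10):
    print(f"{c}X: ~{expected_contigs(5_000_000, 700, c):.1f} contigs")
```

Requiring a longer detectable overlap (larger T) lowers σ and therefore increases the expected number of contigs at a given coverage.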

21 Scaffolding

22 [figure: a vector, propagated in a host such as E. coli, carrying a cloned fragment of the genome, e.g., 10 kb] When a large genome is sequenced, the inserts are often very large (~10 kb). Such an insert is too long to be sequenced entirely, so only its two ends are sequenced.

23 Short stretches at both ends of the insert are sequenced; the two reads from the same insert form a mate pair. [figure labels: mate pairs; a read]

24 The size of the insert (e.g., 10 kb) is also recorded, so the distance between the two mates is approximately known. [figure labels: mate pairs; a read; 10 kb]

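25 Information from mate pairs is used to build a scaffold of the genome: when the two mates fall in different contigs, they show how those contigs are ordered and roughly how far apart they are. [figure label: a contig]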
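Mate pairs also allow the size of each gap in the scaffold to be estimated. A minimal sketch under stated assumptions: contig A precedes contig B, the first mate starts at position `start_in_a` of A and reads towards A’s end, and the outer end of the second mate lies at position `end_in_b` of B (both measured from the contig’s left end). The insert then spans the rest of A, the gap, and the first `end_in_b` bases of B. The function names and numbers are hypothetical.

```python
from statistics import mean


def estimate_gap(insert_size: int, len_a: int, start_in_a: int, end_in_b: int) -> int:
    """Hypothetical helper: gap between contigs A and B implied by one mate pair."""
    # insert_size = (len_a - start_in_a) + gap + end_in_b
    return insert_size - (len_a - start_in_a) - end_in_b


def scaffold_gap(links: list[tuple[int, int]], insert_size: int, len_a: int) -> float:
    """Average the gap estimates of all mate pairs linking A to B."""
    return mean(estimate_gap(insert_size, len_a, s, e) for s, e in links)


# Two hypothetical mate pairs from a 10 kb library joining a 6 kb contig A to contig B.
links = [(2000, 3500), (1000, 2600)]
print(scaffold_gap(links, insert_size=10_000, len_a=6_000))  # 2450
```

Because real libraries have a spread of insert sizes, estimates from several pairs are averaged; a negative estimate would suggest that the two contigs actually overlap.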

26 The human genome matches the chimp genome with ~99% identity. Comparative assembly: when sequencing the chimp genome, information from the human genome can aid the assembly.

27 If someone offers to sequence your genome at 99.9% accuracy, don’t take it, even for $5.
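The arithmetic behind this advice, using the genome size of 3.2 × 10^9 nucleotides quoted at the end of the lecture:

```python
genome_len = 3_200_000_000   # nucleotides in the (haploid) human genome
accuracy = 0.999             # per-base accuracy on offer

expected_errors = genome_len * (1 - accuracy)
print(f"{expected_errors:,.0f} expected errors")  # 3,200,000 expected errors
```

About 3 million wrong bases is of the same order as the genuine single-nucleotide differences between two people (commonly estimated at roughly 1 in 1,000 bases), so such a sequence could not separate your real variants from errors.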

28 In standard cloning experiments, phages are often used as cloning vectors. For genomic sequencing, bacterial artificial chromosomes (BACs) are often used. These are based on the F plasmid, a large plasmid that replicates stably in E. coli. Inserts of up to about 300 kb can be cloned into a BAC.

29 BAC-by-BAC: the idea is to first divide a big genome into overlapping regions, put each in a BAC, and then use the shotgun method to sequence each BAC. [figure labels: BAC; assembly of the genome into BACs; shotgun; sequencing the edges; assembling each BAC]
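The first BAC-by-BAC step, cutting a genome into overlapping pieces, can be sketched on a toy string. This assumes evenly spaced BACs whose positions are known, which is a simplification: in practice the order of the BACs has to be worked out, e.g. from their sequenced ends, and each piece is shotgun-sequenced and assembled on its own before the pieces are joined. The helpers `split_into_bacs` and `join_bacs` are hypothetical.

```python
def split_into_bacs(genome: str, bac_len: int, overlap: int) -> list[str]:
    """Hypothetical helper: cut `genome` into pieces of bac_len bases,
    each overlapping the next by `overlap` bases (requires bac_len > overlap)."""
    step = bac_len - overlap
    starts = range(0, max(len(genome) - overlap, 1), step)
    return [genome[s:s + bac_len] for s in starts]


def join_bacs(bacs: list[str], overlap: int) -> str:
    """Rebuild the sequence: keep the first piece, then append each next piece minus its overlap."""
    return bacs[0] + "".join(b[overlap:] for b in bacs[1:])


genome = "ACCCCGAGGAGACGAACACCCGTATACAGTCGACGTTTATATATA"
bacs = split_into_bacs(genome, bac_len=20, overlap=5)
print(bacs)
print(join_bacs(bacs, overlap=5) == genome)  # True
```

Working on ~100–300 kb pieces keeps each shotgun assembly small and confines long-range repeats to separate problems.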

30 Pyrosequencing: sequencing at the speed of light

31 Pyrosequencing: a relatively new technique (invented 1986) in which the sequence of a DNA molecule is determined by synthesizing its complementary strand (the “sequencing by synthesis” principle).

32 Pyrosequencing: gel-free; nucleotides are label-free; parallelism.

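33 The reaction cascade:
dNTP (e.g., dGTP) + DNA(n) → DNA(n+1) + PPi (enzyme: DNA polymerase)
PPi + APS (adenosine 5’-phosphosulfate) → ATP + sulfate (enzyme: ATP sulfurylase)
ATP + luciferin + O2 → light + oxyluciferin + AMP + PPi (enzyme: luciferase)
ATP → AMP + 2 Pi, and likewise for unincorporated dNTPs, clearing the well before the next nucleotide is added (enzyme: apyrase)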

34 Pyrosequencing. [figure: growing strand ACGTAACGTATACCCG paired with template TGCATT?] Only if G is added will there be light! Nucleotides are added one at a time in a fixed order (in practice dATPαS replaces dATP, since dATP is itself a luciferase substrate): 1. Add dATP → no light. 2. Add dCTP → no light. 3. Add dGTP → light. 4. Add dTTP → no light. 5. Add dATP → no light. 6. Add dCTP → light. 7. Add dGTP → no light. 8. Add dTTP → no light. 9. Add dATP → light. Sequence = GCA.
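The flow logic can be simulated. Nucleotides are added in a fixed cyclic order, and each addition gives a light signal proportional to the number of bases incorporated: 0 when the nucleotide does not match the next template position, more than 1 for a homopolymer run. A minimal, idealized sketch assuming the A, C, G, T flow order of the slide and noise-free signals; `flowgram` and `decode` are hypothetical names, and `new_bases` is the sequence of bases added to the growing strand (GCA in the slide’s example).

```python
from itertools import cycle


def flowgram(new_bases: str, flow_order: str = "ACGT") -> list[tuple[str, int]]:
    """Hypothetical helper: idealized (nucleotide added, light units) for each flow."""
    if set(new_bases) - set(flow_order):
        raise ValueError("bases must be drawn from flow_order")
    signals = []
    pos = 0
    for nt in cycle(flow_order):
        if pos == len(new_bases):
            break
        run = 0
        while pos < len(new_bases) and new_bases[pos] == nt:
            run += 1  # each incorporated base releases one PPi, hence one unit of light
            pos += 1
        signals.append((nt, run))
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Read the added sequence back from the flowgram."""
    return "".join(nt * n for nt, n in signals)


print(flowgram("GCA"))           # light only at flows 3, 6 and 9, as on the slide
print(decode(flowgram("GGA")))   # GGA: the GG run shows up as a double-height signal
```

Real signals are noisy, so telling a run of, say, five identical bases from six is the main source of pyrosequencing errors.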

35 Pyrosequencing. Each DNA fragment is attached to its own bead (one fragment per bead) and amplified on it. Each bead is then placed in its own fibre-optic well.

36 Pyrosequencing. A computer reads the light pattern from about a million wells simultaneously (a bacterial genome can be sequenced in about 7 hours).

37 Bioinformatics and medicine [cartoon caption: “Your chip analysis suggests stress”]

38 Bioinformatics and medicine 1. Today, medicine is based on episodic treatment. 2. A first step, already taking place, is the use of digital imaging and its analysis (e.g., optical fibers). 3. Next step: “digital health”: a person’s medical data will be shared by all doctors, wherever the patient is.

39 Bioinformatics and medicine 4. Clinical genomics: fast and accurate identification of pathogens. 5. Clinical genomics: sequencing (part of) the genome to learn which drugs will be effective. 6. Predisposition analysis for diseases. 7. Towards “lifetime treatment”… 8. Less doctor intuition, more quantitative parameters and statistical analysis.

40 Bioinformatics and medicine. Differences between humans: SNPs (single nucleotide polymorphisms); CNVs (copy number variations, detected, e.g., by CGH, comparative genomic hybridization); chromatin; epigenetics. We want to link these differences to diseases.

41 Some more important buzzwords: genomics; proteomics; metabolomics; systems biology; in silico (vs. in vitro, in vivo); protein engineering; synthetic biology; post-genomic era.

42 Some important NUMBERS: human DNA ≈ 2 meters (per cell); 300 × 10^9 cells; 3.2 × 10^9 nucleotides.
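The first and last numbers are consistent with each other. A diploid cell holds two copies of the 3.2 × 10^9-nucleotide genome, and B-form DNA rises about 0.34 nm per base pair (the 0.34 nm figure is not from the slides):

```python
haploid_bp = 3.2e9          # nucleotides (base pairs) per genome copy
copies_per_cell = 2         # diploid cell
rise_per_bp_m = 0.34e-9     # metres per base pair in B-form DNA

length_m = haploid_bp * copies_per_cell * rise_per_bp_m
print(f"{length_m:.2f} m of DNA per cell")  # 2.18 m of DNA per cell
```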

