Genomics. Gene expression DNA (Genome) pre-mRNA mRNA mRNA (Transcriptome) Proteins (Proteome) Metabolites (Metabolome) Regulation Nucleus Cytoplasm Chromatography.


1 Genomics

2 Gene expression
Nucleus: DNA (genome) -> pre-mRNA -> mRNA
Cytoplasm: mRNA (transcriptome) -> proteins (proteome) -> metabolites (metabolome), with regulation between the levels
- Structural genomics (genome): genome mapping, genome sequencing, genome annotation
- Functional genomics, transcriptome: DNA arrays and chips, (semi-)qRT-PCR, Northern blot and hybridization, transcriptional fusions
- Functional genomics, proteome: 2D electrophoresis, mass spectrometry, protein sequencing, translational fusions, immunodetection, enzyme activities
- Metabolome: chromatography, mass spectrometry, NMR

3 History of genome sequencing
- 1977 bacteriophage φX174 (5,386 bp; 11 genes)
- 1981 mitochondrial genome (16,568 bp; 13 proteins; 2 rRNAs; 22 tRNAs)
- 1986 chloroplast genome (120,000-200,000 bp)
- 1992 Saccharomyces chromosome III (315 kb; 182 ORFs)
- 1995 Haemophilus influenzae (1.8 Mb)
- 1996 Saccharomyces whole genome (12.1 Mb; over 600 people, 100 laboratories)
- 1997 E. coli (4.6 Mb; 4,200 proteins)
- 1998 Caenorhabditis elegans (97 Mb; 19,000 genes)
- 2000 Arabidopsis thaliana (115 Mb; 25,000-30,000 genes)
- 2001 mouse (1 year!)
- 2001 Homo sapiens (2 projects)
- 2005 Pan, rice
- 2006 Populus
Driven throughout by technological improvements

4 DNA sequencing: principle (Sanger's method)
- polymerization from a primer in the presence of a low concentration of a terminator (dideoxynucleotide, ddNTP)
- random termination at every position where the given nucleotide occurs

5 Original arrangement of sequencing
- radioisotope-labelled primer
- 4 separate reactions, each with one ddNTP
- ddNTP:dNTP ratio approx. 1:20 (up to 1:100)
- PAGE separation by size
(Figure: gel lanes and read-out, A T C G C C C T G T T G A G A)
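The four-reaction layout above can be illustrated with a small idealized simulation. This is only a sketch under simplifying assumptions: every possible termination product is present, fragments are resolved to single-base precision, and the function names and the example strand are hypothetical, not taken from the slides.

```python
def sanger_lanes(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths seen in each ddNTP lane.

    `synthesized` is the newly made strand, read from the primer outwards.
    A fragment of length n appears in lane X if position n is base X.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the shortest to the longest fragment."""
    lane_of_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(lane_of_length[length] for length in sorted(lane_of_length))


lanes = sanger_lanes("GATTACA")  # hypothetical example strand
print(lanes)
print(read_gel(lanes))
```

Reading the lanes in order of fragment size recovers the synthesized strand, which is the logic behind reading a sequencing gel from bottom to top.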

6 Automated sequencing with fluorescence-labelled ddNTPs
- every ddNTP is labelled with a different fluorescent dye, so all four run together in one reaction
- separation by size in a capillary, with fluorescence detection

7 Genome sequencing is more than sequencing of DNA
- one sequencing reaction: 300-800 bp
- typical genome: hundreds of millions to billions of bp
- How to manage?
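To give a feeling for the scale of the problem, the number of reads needed can be estimated as genome size times redundancy divided by read length. The sketch below uses genome sizes quoted in the history slide; the 500 bp read length and the 8-fold redundancy are illustrative assumptions, and `reads_needed` is a hypothetical helper.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, redundancy: float) -> int:
    """Reads required to cover a genome `redundancy` times on average."""
    return math.ceil(genome_size_bp * redundancy / read_length_bp)


# Genome sizes from the history slide; read length and redundancy are assumed.
genomes = {
    "Haemophilus influenzae": 1_800_000,
    "Arabidopsis thaliana": 115_000_000,
}
for name, size_bp in genomes.items():
    print(f"{name}: {reads_needed(size_bp, 500, 8):,} reads")
```

Under these assumptions the bacterial genome needs tens of thousands of reads and the plant genome close to two million, which is why a strategy for ordering and assembling them is required.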

8 Strategies of genome sequencing
- Classical strategy (map-based assembly): minimal amount of DNA sequencing; ordering of large DNA fragments followed by successive reading (the original strategy for the human genome); the map is a scaffold for assembling the genome sequence; time consuming
- Whole-genome shotgun (WGS): random sequencing with 7-9x redundancy, then ordering of the sequence data (Haemophilus); problems with repetitive DNA
- Combination: "hierarchical shotgun", "chromosome shotgun"

9 Hierarchical shotgun sequencing vs. whole-genome shotgun sequencing (Green (2001) Nature Reviews Genetics 2: 573-583)
- hierarchical only: production of overlapping clones (e.g. BACs, YACs) and construction of a physical map
- shearing of DNA and sequencing of subclones
- assembly

10 Hierarchical shotgun sequencing
First step: a library of large DNA inserts (= genome fragments)
- phage (λ) vectors: 30 kb
- cosmids: 50 kb
- BACs (bacterial artificial chromosomes): 100-300 kb
- YACs (yeast artificial chromosomes): approx. 0.5-1 Mb

11 Physical "BAC" map of the genome
- arrangement (position, orientation) of the individual BACs in the genome
- fundamental for classical sequencing
- very useful for assembly of "shotgun" sequences
- How to make the map from BACs with unknown sequence?

12 Map construction: BAC fingerprinting
- map construction needs 10-20x more bp in BACs than in the genome (Arabidopsis: 20,000 BACs; rice: 70,000 BACs)
- restriction sites
- sequencing of DNA ends

13 BAC fingerprinting ANIMATION of HIERARCHICAL SHOTGUN: http://www.weedtowonder.org/sequencing.html

14 Minimum tiling path = the smallest possible set of BACs covering the whole sequence
Physical map arrangement, mapping and clone selection:
- by restriction fragment analysis
- using terminal sequences and hybridization
- by hybridization with markers of known position in the genetic map
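Once clones have positions on the physical map, choosing a minimum tiling path is an interval-cover problem. The sketch below uses the standard greedy approach (always take the clone that starts inside the covered region and reaches furthest). The clone names and coordinates are invented for illustration, and real pipelines usually also require a minimum overlap between neighbouring clones, which is not modelled here.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones covering [region_start, region_end).

    `clones` is a list of (name, start, end) map positions.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    index = 0
    while covered_to < region_end:
        best = None
        # Among clones starting inside the covered part, keep the furthest reach.
        while index < len(ordered) and ordered[index][1] <= covered_to:
            if best is None or ordered[index][2] > best[2]:
                best = ordered[index]
            index += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical BAC positions in kb on a 400 kb region.
bacs = [
    ("BAC1", 0, 120), ("BAC2", 50, 150), ("BAC3", 100, 260),
    ("BAC4", 140, 230), ("BAC5", 250, 400), ("BAC6", 200, 330),
]
print(minimum_tiling_path(bacs, 0, 400))
```

In this toy map three of the six clones are enough, so only those three would be passed on to shotgun sequencing.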

15 Shotgun sequencing
- random cleavage + direct sequencing (NGS) of a BAC, chromosome or whole genome
- sequencing of clone ends (known distance between them)
- cosmids (40 kbp): ~500 bp

16 Genome (chromosome, BAC...) assembly (..ACGATTACAATAGGTT..)
1. Looking for overlaps in the primary sequences
2. Assembly into contigs to get short consensus sequences
3. Assembly into supercontigs using the information from sequence pairs (ends + distance)
4. Complete consensus sequence
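Steps 1 and 2 above can be sketched with a toy greedy assembler that repeatedly merges the pair of sequences with the longest suffix-prefix overlap. This is a deliberately naive illustration, not how production assemblers work: it assumes error-free reads from one strand, ignores read pairs (step 3), and the three reads are invented so that they tile the sequence fragment shown on the slide.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge reads into contigs, best overlap first."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_length, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_length:
                    best_length, best_i, best_j = length, i, j
        if best_length == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ACGATTACA", "TTACAATAG", "CAATAGGTT"]  # hypothetical reads
print(greedy_assemble(reads))
```

With these reads the assembler ends with a single contig; reads without a sufficiently long overlap simply stay as separate contigs.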

17 Repetitive sequences and contig assembly
Repeats are a serious problem in assembly if they are conserved and longer than a sequencing read, because the reads flanking them can no longer be joined unambiguously.

18 Use of markers for whole-genome assembly
- STS (sequence-tagged sites) = short sequences with known position on chromosomes
- supercontigs with a scaffold (BAC-end sequences with known distance)

19 Filling of gaps
- optimal: libraries with different insert sizes (2, 10 and 50 kbp)
- sequencing the linker clone = filling the gap
- shorter clones are better

20 What to do with the genome sequence? Annotate it!
Searching for genes:
- automatic prediction of coding sequences
- prediction of introns/exons
- prediction according to related sequences
- confirmation by cDNA and ESTs
Prediction of function: from experimentally characterized homologues
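The simplest form of automatic coding-sequence prediction is a scan for open reading frames. The sketch below looks for ATG ... stop stretches in the three forward reading frames only. It is an illustrative assumption rather than the annotation method used in the lecture: it ignores introns, the reverse strand and alternative start codons, and the example sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 3):
    """Return (start, end, frame) for forward-strand ORFs.

    `start` is the index of the A in ATG, `end` is one past the stop codon,
    and `min_codons` counts the codons before the stop.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


example = "GGATGGCTTGTAAATAGCC"  # hypothetical sequence with one short ORF
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])
```

For eukaryotic genomes such a scan is only a starting point, which is why the slide lists intron/exon prediction, comparison with related sequences and cDNA/EST evidence as further steps.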

21 Fragment of GenBank BAC clone annotation

22 Graphical interface of BAC annotation

23 Large genomes: alternative sequencing strategies
- isolation of individual chromosomes, e.g. wheat (allohexaploid): allows separate assembly of homeologous chromosomes
- shotgun sequencing of non-methylated DNA (maize)
- sequencing of ESTs (potato)

24 Expressed Sequence Tags (ESTs)
- short sequenced regions of cDNA (300-600 nt)
- usually gene fragments (they originate primarily from mRNA)
- highly redundant, but also incomplete!
- problems: no regulatory sequences (promoters, introns, ...); only transcripts of certain genes

25 Preparation of an EST library
- mRNA
- reverse transcription with an oligo(dT) primer -> cDNA
- cleavage of RNA from the heteroduplex by RNase H
- 2nd-strand cDNA synthesis
- cleavage with a restriction endonuclease
- adaptor ligation, cloning
- sequencing of Expressed Sequence Tags (ESTs)

26 Assembly of EST contigs - Unigenes

27 Next generation sequencing
- faster and cheaper!
- parallel sequencing of high numbers of sequences
- no handling of individual sequences
Examples of recently developed or developing technologies:
- 454 sequencing, pyrosequencing (Roche): complementary strand synthesis
- Illumina, sequencing by synthesis: complementary strand synthesis
- SOLiD (Sequencing by Oligonucleotide Ligation and Detection): ligation of labelled oligonucleotides
- Oxford Nanopore technology: exonuclease degradation, detection of changes in electric current

28 NGS: comparison of basic parameters (http://en.wikipedia.org/wiki/DNA_sequencing)
Method: read length; reads per run; cost per 1 million bases (US$)
- Single-molecule real-time sequencing (Pacific Bio): 5,000-10,000 (30,000) bp; 50,000; $0.33-$1.00
- Ion semiconductor (Ion Torrent sequencing): up to 400 bp; up to 80 million; $1
- Pyrosequencing (454): 700 bp; 1 million; $10
- Sequencing by synthesis (Illumina): 50 to 300 bp; up to 3 billion; $0.05 to $0.15
- Sequencing by ligation (SOLiD sequencing): 50+50 bp; 1.2 to 1.4 billion; $0.13
- Chain termination (Sanger sequencing): 400 to 900 bp; N/A; $2400

29 454 technology: pyrosequencing
- up to 1 million reads (length 700-1000 bp)
- one day (23-hour procedure) = 500-800 Mbp

30 454 technology - pyrosequencing

31 454 technology

32

33 Illumina – sequencing by synthesis (Solexa)

34 Illumina – sequencing by synthesis (Solexa)

35

36

37 SOLiD™ System (Applied Biosystems): 2 Base Encoding, Sequencing by Oligonucleotide Ligation and Detection
- reads up to 75 bases
- 20-30 Gb per day!
- high accuracy, up to 99.99%
- initial step: clonal multiplication (similar to 454)
http://appliedbiosystems.cnpg.com/Video/flatFiles/699/index.aspx

38 SOLiD™ System
- mix of 1024 octamers = 64 variants of the degenerate positions (NNN) x 16 known dinucleotides
- Z = nucleotides pairing universally with any nucleotide (prolongation); cleaved off after ligation
- labelling: 4 fluorescent dyes, each for 256 octamers (i.e. each dye marks just 4 of the known middle dinucleotides)
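The probe counts on this slide can be checked by enumeration. The sketch below assumes the commonly published two-base colour code, in which the colour of a dinucleotide is the XOR of its two bases under the numbering A=0, C=1, G=2, T=3; the slide itself does not state the dye assignment, so treat the mapping and all names here as assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def color(dinucleotide: str) -> int:
    """Assumed two-base colour: XOR of the two base codes (0..3)."""
    return CODE[dinucleotide[0]] ^ CODE[dinucleotide[1]]


dinucleotides = ["".join(pair) for pair in product(BASES, repeat=2)]
degenerate_triplets = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# One probe per (known dinucleotide, NNN) combination; Z positions are universal.
probes = [(dinuc, nnn) for dinuc in dinucleotides for nnn in degenerate_triplets]
probes_per_dye = Counter(color(dinuc) for dinuc, _ in probes)
dinucleotides_per_dye = Counter(color(dinuc) for dinuc in dinucleotides)

print(len(probes))
print(dict(probes_per_dye))
print(dict(dinucleotides_per_dye))
```

The enumeration reproduces the figures on the slide: 1024 probes in total, 256 per dye, and 4 dinucleotides sharing each dye.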

39 5 independent reactions, each consisting of 10-15 repeated ligations of labelled octamers and each starting from a primer with a shifted end

40 Knowledge of the first nucleotide allows translation of the color sequence into a nucleotide sequence
(Figure: A A T G C A G G C A T G C C G T A C, with an alternative translation for a different 1st nucleotide)
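Why the first nucleotide matters can be shown with a short decoder. As a hedged sketch, it assumes the commonly published two-base code (colour = XOR of the two bases with A=0, C=1, G=2, T=3); the function names and the example sequence are hypothetical and not taken from the slide.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode(sequence: str) -> list[int]:
    """Colour (0..3) of every pair of neighbouring bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Translate colours back to bases, given the first base."""
    bases = [first_base]
    for value in colors:
        bases.append(BASES[CODE[bases[-1]] ^ value])
    return "".join(bases)


colors = encode("ATGC")
print(colors)
print(decode("A", colors))  # correct first base
print(decode("C", colors))  # same colours, different first base
```

The same colour sequence decodes to a different but equally consistent nucleotide sequence for each possible first base, so the known last base of the primer site is what anchors the translation.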

41 Oxford Nanopore Technologies: direct sequencing of one DNA strand
- protein nanopore in a membrane (alpha-hemolysin)
- covalently bound exonuclease
- monitoring of the specific decrease in current (metC!)
http://www.nanoporetech.com/sequences

