1 Computational Molecular Biology MPI for Molecular Genetics DNA sequence analysis Gene prediction methods Gene indices Mapping cDNA on genomic DNA Genome-genome.


1 1 Computational Molecular Biology MPI for Molecular Genetics DNA sequence analysis Gene prediction methods Gene indices Mapping cDNA on genomic DNA Genome-genome comparison Applications

2 2 Computational Molecular Biology MPI for Molecular Genetics DNA sequences: gene structure (eukaryotes). Figure labels: promoter, 5' UTR, exon 1, exon 2, exon n-1, exon n, 3' UTR; protein-coding sequence

3 3 Computational Molecular Biology MPI for Molecular Genetics DNA sequences: repeats, repetitive elements. Long INterspersed Elements (LINE); Short INterspersed Elements (SINE, e.g. Alu); transposons; simple repeats (e.g. ATATA...)

4 4 Computational Molecular Biology MPI for Molecular Genetics DNA sequences repeats, repetitive elements High copy number Sequence variability Mostly located in untranslated regions

5 5 Computational Molecular Biology MPI for Molecular Genetics Gene prediction: strategies for detecting ORFs / exons. Distribution of stop codons; codon usage; hexamer frequencies; prediction of the coding frame; splice site recognition (eukaryotes only)
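The first strategy on this slide, the distribution of stop codons, can be illustrated with a minimal sketch. It scans the three forward reading frames and reports stretches that contain no stop codon. The function name `open_reading_frames` and the `min_codons` cutoff are assumptions made for this illustration; real gene finders also score codon usage and hexamer frequencies, handle the reverse strand, and model splice sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_reading_frames(seq, min_codons=3):
    """Return (frame, start, end) for stop-free stretches on the forward strand.

    start and end are 0-based and end-exclusive. min_codons is an
    illustrative cutoff, not a value taken from the slides.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        # Walk the frame codon by codon; a stop codon closes the current run.
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, pos))
                run_start = pos + 3
        # Close the run that reaches the end of the sequence.
        last = frame + ((len(seq) - frame) // 3) * 3
        if (last - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, last))
    return orfs

if __name__ == "__main__":
    print(open_reading_frames("ATGAAACCCGGGTAAATG"))
```

Long stop-free stretches in one frame and frequent stops in the other two are a hint that the first frame may be coding; the sketch only finds the stretches and leaves that judgement to the caller.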

6 6 Computational Molecular Biology MPI for Molecular Genetics Gene prediction by sequence comparison Comparison of genomic DNA and cDNA/ESTs Comparison of related genomic DNA of different organisms

7 7 Computational Molecular Biology MPI for Molecular Genetics Gene prediction Codon usage (single exon) Frame 1 Frame 2 Frame 3 coding non-coding

8 8 Computational Molecular Biology MPI for Molecular Genetics Gene prediction Codon usage (single exon) Frame 1 Frame 2 Frame 3 coding non-coding correct start coding sequence

9 9 Computational Molecular Biology MPI for Molecular Genetics Gene prediction Codon usage (multiple exons) Frame 1 Frame 2 Frame 3 coding non-coding Splice sites Exons: 208..295 1029..1349 1500..1688 2686..2934 3326..3444 3573..3680 4135..4309 4708..4846 4993..5096 7301..7389 7860..8013 8124..8405 8553..8713 9089..9225 13841..14244

10 10 Computational Molecular Biology MPI for Molecular Genetics Gene prediction Codon usage (multiple exons) Frame 1 Frame 2 Frame 3 coding non-coding Splice sites Exons: 208..295 1029..1349 1500..1688 2686..2934 3326..3444 3573..3680 4135..4309 4708..4846 4993..5096 7301..7389 7860..8013 8124..8405 8553..8713 9089..9225 13841..14244

11 11 Computational Molecular Biology MPI for Molecular Genetics Gene prediction: additional criteria. Detection of start codons; detection of potential promoter elements; detection of repetitive sequences (mostly untranslated); homology to known genes of related organisms

12 12 Computational Molecular Biology MPI for Molecular Genetics Gene prediction: software. GENSCAN (C. Burge & S. Karlin); Grail (neural network; Uberbacher et al.); MZEF (M. Zhang, 1997); FGeneH, Hexon (V. Solovyev et al., 1994); Genie, etc. All programs use dynamic programming to find the optimal solution

13 13 Computational Molecular Biology MPI for Molecular Genetics DNA sequences in public databases Human ~ 2.8 million ESTs + 130 000 RNAs Mouse ~ 1.8 million ESTs + 30 000 RNAs

14 14 Computational Molecular Biology MPI for Molecular Genetics Expressed sequence tags (EST). Figure labels: mRNA (AAAAAA...), cDNA (TTTTTT...). cDNA synthesis is usually primed with oligo-dT or with random primers. Reverse transcriptase stops 'randomly'. Several cDNAs may be generated for the same mRNA

15 15 Computational Molecular Biology MPI for Molecular Genetics Expressed sequence tags (EST). Figure labels: average: 1500 bp; <700 bp; vector (known sequence); clone = mRNA fragment; deciphered sequence (EST); 3'-primer

16 16 Computational Molecular Biology MPI for Molecular Genetics Expressed sequence tags (EST) Isolation of mRNAs from tissue(s) Generation of cDNAs reflecting parts of the RNAs Cloning of cDNAs into a vector (often random orientation) End sequencing of the clones

17 17 Computational Molecular Biology MPI for Molecular Genetics Generation of ESTs Basecalling problems close to 3‘ end of EST close to 5‘ end of EST missing bases

18 18 Computational Molecular Biology MPI for Molecular Genetics Coverage of an mRNA by ESTs. Figure labels: putative mRNA (5' UTR, exon 1, exon 2, 3' UTR, AAAAAA...); expressed sequence tags (ESTs)

19 19 Computational Molecular Biology MPI for Molecular Genetics Characteristics of ESTs Highly redundant Low sequence quality (Cheap) Reflect expressed genes May be tissue/stage specific

20 20 Computational Molecular Biology MPI for Molecular Genetics Gene indices: clustering of EST and mRNA sequences of an organism to reduce redundancy in the sequence data. Goal: each cluster represents one gene or mRNA. Examples: UniGene (NCBI); TIGR Gene Indices; STACK (SANBI); GeneNest (DKFZ, MPI)

21 21 Computational Molecular Biology MPI for Molecular Genetics Gene indices: GeneNest workflow. EMBL database, UniGene database; quality clipping; BLAST/QUASAR search, clustering; assembly, consensus sequences; visualization

22 22 Computational Molecular Biology MPI for Molecular Genetics Gene indices: quality clipping. In order to cluster based on gene-specific sequence data, the following steps have to be performed: removal of vector sequence; masking of repetitive sequences (e.g. Alu); removal of terminal sequences of low quality
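The last clipping step, removal of low-quality terminal sequence, might look like the following minimal sketch. It assumes that each base comes with a Phred-style quality value and that a fixed threshold is applied from both ends; the name `clip_low_quality_ends` and the threshold of 20 are illustrative assumptions, not the GeneNest implementation. Vector removal and repeat masking are not covered here.

```python
def clip_low_quality_ends(seq, quals, min_quality=20):
    """Trim bases from both ends until a base with quality >= min_quality is reached.

    seq is the base string, quals a list of per-base quality values of the
    same length. min_quality=20 is an assumed cutoff chosen for illustration.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    start = 0
    while start < len(seq) and quals[start] < min_quality:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end]

if __name__ == "__main__":
    read = "NNACGTACGTNN"
    qualities = [3, 8, 25, 30, 32, 31, 29, 33, 30, 27, 9, 4]
    print(clip_low_quality_ends(read, qualities))
```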

23 23 Computational Molecular Biology MPI for Molecular Genetics Gene indices: clustering. Sequences are usually clustered if the matching part between two sequences fulfills several (empirical) criteria: minimal % identity (e.g. > 95%); minimal length of match (e.g. > 40 bp); no internal matches (TIGR gene indices); same tissue of origin (only STACK)
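The first two criteria can be turned into a small single-linkage clustering sketch. It assumes that pairwise matches have already been computed (for example by a BLAST-like search) and are given as tuples of two sequence IDs, percent identity, and match length; the function name `cluster_sequences` and the input format are assumptions for illustration. The thresholds reuse the example values from the slide. The "no internal matches" and "same tissue" criteria are left out.

```python
def cluster_sequences(ids, matches, min_identity=95.0, min_length=40):
    """Group sequences connected by matches that pass both thresholds.

    matches: iterable of (id_a, id_b, percent_identity, match_length).
    Uses a union-find structure, so clusters are connected components.
    """
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, length in matches:
        if identity > min_identity and length > min_length:
            parent[find(a)] = find(b)

    clusters = {}
    for seq_id in ids:
        clusters.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(clusters.values(), key=lambda members: sorted(members))

if __name__ == "__main__":
    hits = [
        ("est1", "est2", 98.0, 120),
        ("est2", "est3", 96.5, 60),
        ("est3", "est4", 99.0, 30),  # too short, does not join est4
    ]
    print(cluster_sequences(["est1", "est2", "est3", "est4"], hits))
```

Single linkage means one chimeric or repeat-containing sequence can merge two unrelated clusters, which is one reason quality clipping comes first in the workflow.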

24 24 Computational Molecular Biology MPI for Molecular Genetics Gene indices: assembly. Sequences in a cluster are assembled to group those sequences which are globally similar, resulting in: contigs, reflecting partially different sequences; one consensus sequence per contig; a relative order of the sequences (alignment)

25 25 Computational Molecular Biology MPI for Molecular Genetics Gene indices Consensus sequences Reduced error rate Consensus often longer than any single sequence contributing Efficient database search Detection of exon/intron boundaries and alternative splice variants
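A minimal sketch of how a consensus can reduce the error rate and extend beyond any single read: a column-wise majority vote over already aligned sequences. The input convention is an assumption for this illustration: all rows share one coordinate system, `.` marks positions a read does not cover, and `-` marks an alignment gap. Assemblers such as those listed in this deck also weigh base qualities, which this sketch ignores.

```python
from collections import Counter

def consensus(aligned_rows):
    """Majority-vote consensus over aligned rows.

    '.' = read does not cover this column, '-' = gap in the alignment.
    Columns where the gap wins the vote are skipped in the output.
    """
    width = max(len(row) for row in aligned_rows)
    result = []
    for col in range(width):
        column = [row[col] for row in aligned_rows
                  if col < len(row) and row[col] != "."]
        if not column:
            continue
        winner = Counter(column).most_common(1)[0][0]
        if winner != "-":
            result.append(winner)
    return "".join(result)

if __name__ == "__main__":
    reads = [
        "ACGTTC....",  # contains one miscalled base (T instead of A)
        "..GTACGT..",
        "..GTACGTAA",
    ]
    print(consensus(reads))
```

In the toy input the miscalled base in the first read is outvoted, and the consensus is longer than each contributing read.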

26 26 Computational Molecular Biology MPI for Molecular Genetics Gene indices Alignment consensus

27 27 Computational Molecular Biology MPI for Molecular Genetics Gene indices Alignment Software Phrap (Phil Green) CAP3 (X. Huang) TIGR assembler GAP4 (R. Staden)

28 28 Computational Molecular Biology MPI for Molecular Genetics GeneNest visualization ( http://genenest.molgen.mpg.de )

29 29 Computational Molecular Biology MPI for Molecular Genetics GeneNest visualization ( http://genenest.molgen.mpg.de )

30 30 Computational Molecular Biology MPI for Molecular Genetics TIGR Gene Indices ( http://www.tigr.org/ ) Alignment scheme

31 31 Computational Molecular Biology MPI for Molecular Genetics UniGene ( http://www.ncbi.nih.nlm.gov/UniGene )

32 32 Computational Molecular Biology MPI for Molecular Genetics UniGene ( http://www.ncbi.nih.nlm.gov/UniGene )

33 33 Computational Molecular Biology MPI for Molecular Genetics Mapping of EST consensus sequences on genomic DNA genomic sequence exons consensus sequence (  mRNA) missing intron

34 34 Computational Molecular Biology MPI for Molecular Genetics Mapping cDNA on genomic DNA

35 35 Computational Molecular Biology MPI for Molecular Genetics Mapping cDNA on genomic DNA (http://splicenest.molgen.mpg.de)

36 36 Computational Molecular Biology MPI for Molecular Genetics Genome-genome comparison. Figure labels: xxxxx ancestral gene; human; mouse; X = region with low mutation rate

37 37 Computational Molecular Biology MPI for Molecular Genetics Genome-genome comparison

38 38 Computational Molecular Biology MPI for Molecular Genetics Genome-genome comparison Conserved coding regions (protein similarity, similar function) Conserved coding exons (protein domain similarity, functional feature) Conserved non-coding regions (regulatory sites, transcription factor binding sites)

39 39 Computational Molecular Biology MPI for Molecular Genetics Gene indices Applications Detection of exon/intron boundaries Detection of alternative splicing Detection of Single Nucleotide Polymorphisms Genome annotation Analysis of gene expression Design of DNA-chips/arrays

40 40 Computational Molecular Biology MPI for Molecular Genetics Alternative splicing. Figure labels: hnRNA (5' UTR, exon 1, exon 2, exon 3); mRNA 1 (5' UTR, exon 1, exon 3); mRNA 2 (5' UTR, exon 1, exon 2)

41 41 Computational Molecular Biology MPI for Molecular Genetics Alternative Splicing genomic sequence exons consensus sequence (  mRNA) splice variant

42 42 Computational Molecular Biology MPI for Molecular Genetics Alternative splicing: splice variants of the adenylosuccinate lyase gene. Figure labels: (additional exon); skipped exon; prediction errors?; unspliced?

43 43 Computational Molecular Biology MPI for Molecular Genetics Alternative splicing: splice variants of the APECED gene. Figure labels: number of sequences; genomic sequence; alternative variants

44 44 Computational Molecular Biology MPI for Molecular Genetics Alternative splicing

45 45 Computational Molecular Biology MPI for Molecular Genetics Alternative Splicing (alternative donor site)

46 46 Computational Molecular Biology MPI for Molecular Genetics Alternative Splicing

47 47 Computational Molecular Biology MPI for Molecular Genetics Alternative Splicing (alternative exons)

48 48 Computational Molecular Biology MPI for Molecular Genetics Alternative Splicing (unknown gene Hs16936)

49 49 Computational Molecular Biology MPI for Molecular Genetics Single Nucleotide Polymorphisms (SNP) SNPs are single base differences within one species Several million SNPs detected in Human SNPs may be related to diseases

50 50 Computational Molecular Biology MPI for Molecular Genetics Single Nucleotide Polymorphisms (SNP) SNP or basecalling error ?
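One simple heuristic for the question on this slide is to require that the minor base in an alignment column is supported by more than one independent sequence, since an isolated mismatch is more likely a basecalling error. The sketch below assumes aligned rows of equal coordinate system and a support threshold of 2; the name `candidate_snps` and the threshold are assumptions for illustration, and real pipelines would also use base quality values.

```python
from collections import Counter

def candidate_snps(aligned_rows, min_support=2):
    """Return (column, major_base, minor_base) for columns where the
    second most frequent base occurs in at least min_support rows.

    Only A, C, G, T are counted; gaps and padding characters are ignored.
    """
    width = max(len(row) for row in aligned_rows)
    hits = []
    for col in range(width):
        counts = Counter(row[col] for row in aligned_rows
                         if col < len(row) and row[col] in "ACGT")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_support:
            hits.append((col, ranked[0][0], ranked[1][0]))
    return hits

if __name__ == "__main__":
    reads = ["ACGTA", "ACGTA", "ATGTA", "ATGTC", "ACGTA"]
    # Column 1 has C x3 and T x2 (candidate SNP); column 4 has a single C (likely error).
    print(candidate_snps(reads))
```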

51 51 Computational Molecular Biology MPI for Molecular Genetics Genome Annotation / Ensembl (http://www.ensembl.org)

52 52 Computational Molecular Biology MPI for Molecular Genetics Analysis of gene expression tissue-specificity Counting frequency of ESTs derived from a specific tissue within one sequence cluster Searching for cluster/contigs which are tissue specific (e.g. tumor) Searching for alternative splice variants which are potentially tissue specific
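Counting ESTs per tissue within one cluster can be sketched as follows. The input format, a list of (EST ID, tissue label) pairs for a single cluster, and the function name `tissue_profile` are assumptions for illustration; the tissue annotation would come from the EST library description.

```python
from collections import Counter

def tissue_profile(cluster_ests):
    """Return the fraction of ESTs per tissue for one sequence cluster.

    cluster_ests: list of (est_id, tissue) pairs belonging to the same cluster.
    """
    counts = Counter(tissue for _, tissue in cluster_ests)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}

if __name__ == "__main__":
    cluster = [
        ("est1", "brain"),
        ("est2", "brain"),
        ("est3", "brain"),
        ("est4", "liver"),
    ]
    print(tissue_profile(cluster))
```

Such fractions are only meaningful relative to how many clones were sequenced from each tissue, so a usable comparison would normalize by library size.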

53 53 Computational Molecular Biology MPI for Molecular Genetics Analysis of gene expression tissue-specificity neuron-specific gene (Hs90005)

54 54 Computational Molecular Biology MPI for Molecular Genetics Analysis of gene expression tissue-specificity neuron-specific gene (Hs90005)

55 55 Computational Molecular Biology MPI for Molecular Genetics Analysis of gene expression: internal priming

56 56 Computational Molecular Biology MPI for Molecular Genetics Analysis of gene expression tissue-specificity

57 57 Computational Molecular Biology MPI for Molecular Genetics Analysis of gene expression: tissue-specificity. Analysis of tissue-specificity depends on: expression level; number of clones sequenced

58 58 Computational Molecular Biology MPI for Molecular Genetics Design of DNA chips/arrays non-redundant gene set Selection of ‚optimal‘ clones Generation of gene-specific PCR-products

59 59 Computational Molecular Biology MPI for Molecular Genetics Design of DNA chips/arrays ‚optimal clones‘ clone availability type of clone library length of the clone relative position to the consensus sequence homology to other genes existence of repetitive elements

60 60 Computational Molecular Biology MPI for Molecular Genetics Design of DNA chips/arrays: gene-specific PCR-products. Figure labels: putative gene; consensus sequence; exon A, exon B, exon C; repetitive sequence; similarity to another gene; potential gene-specific fragment (shown twice)

61 61 Computational Molecular Biology MPI for Molecular Genetics Design of DNA chips/arrays optimal gene-specific PCR-product minimal similarity to other genes minimal content of repetitive sequences not spanning over several exons +/- constant length of PCR-products of different genes

62 62 Computational Molecular Biology MPI for Molecular Genetics Primer design What are primers? short oligonucleotides (15-25 bp) unique sequence defined melting temperature

63 63 Computational Molecular Biology MPI for Molecular Genetics Primer design: primer hybridization/elongation
   5' TTTCAGTAATTAAAAAGATTTCTGT 3'
      |||||||||||||||||||||||||
3'... AAAGTCATTAATTTTTCTAAAGACACCGGTAAA ...5'

64 64 Computational Molecular Biology MPI for Molecular Genetics Primer design Applications DNA sequencing Polymerase Chain Reaction (PCR) DNA chip/array design

65 65 Computational Molecular Biology MPI for Molecular Genetics Primer design Features Melting temperature Self-complementarity Secondary binding capacity

66 66 Computational Molecular Biology MPI for Molecular Genetics Primer design melting temperature / 2+4 rule TTT C A G TAATTAAAAA G ATTT C T G T 5 x 4°C + 20 x 2°C = 60°C
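The 2+4 rule on this slide (often called the Wallace rule) is easy to check in code: each G or C contributes 4 °C and each A or T contributes 2 °C. The function name `melting_temperature_2_4` is chosen for this illustration. The rule is a rough estimate intended for short oligonucleotides; it ignores salt and primer concentration.

```python
def melting_temperature_2_4(primer):
    """Estimate the melting temperature in degrees Celsius with the 2+4 rule.

    4 degrees per G/C base, 2 degrees per A/T base; other characters are ignored.
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at

if __name__ == "__main__":
    # The primer from the slide: 5 G/C bases and 20 A/T bases.
    print(melting_temperature_2_4("TTTCAGTAATTAAAAAGATTTCTGT"))
```

For the slide's primer this reproduces the arithmetic shown: 5 x 4 °C + 20 x 2 °C = 60 °C.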

67 67 Computational Molecular Biology MPI for Molecular Genetics Primer design: thermodynamic stability / nearest neighbour. Table (columns A T C G): row A: -1.2 -0.9 -1.5; row T: -0.9 -1.2 -1.5 -1.7; row C: -1.5 -2.1 -2.8; row G: -1.5 -2.3 -2.1. Example: TTT C A G TAATTAAAAA G ATTT C T G T: -1.2 -1.5 -1.7 kcal/mol

68 68 Computational Molecular Biology MPI for Molecular Genetics Primer design self-complementarity 5‘ TTTCAGTAATTAAAAAGATTTCTGT 3‘ | | |||||| | 3‘ TGTCTTTAGAAAAATTAATGACTTT 5‘ all primers able to form internal loops are also able to form dimers
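A minimal way to screen a primer for self-complementarity is to slide a second, antiparallel copy of the primer along the first and count complementary base pairs at each offset, as in the figure on this slide. The sketch below reports the best count over all offsets. The name `max_self_complementarity` is an assumption for illustration, and the score is a plain pair count without thermodynamic weighting; the input is assumed to contain only A, C, G, T.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def max_self_complementarity(primer):
    """Return the largest number of complementary pairs over all ungapped
    antiparallel alignments of the primer with a second copy of itself."""
    primer = primer.upper()
    reverse = primer[::-1]  # second copy written 3'->5' under the first
    n = len(primer)
    best = 0
    for shift in range(-(n - 1), n):
        pairs = 0
        for i in range(n):
            j = i + shift
            if 0 <= j < n and COMPLEMENT[primer[i]] == reverse[j]:
                pairs += 1
        best = max(best, pairs)
    return best

if __name__ == "__main__":
    print(max_self_complementarity("GAATTC"))  # a palindromic site pairs fully with itself
    print(max_self_complementarity("AAAA"))    # no complementary bases at all
```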

69 69 Computational Molecular Biology MPI for Molecular Genetics Primer design: secondary binding sites. 5' TTTCAGTAATTAAAAAGATTTCTGT 3' || | | | ||||||| 3' ACGGTAGGCATTCTACGAAAAGACA 5' The stability of the 3'-terminal bases gets a higher weight, simulating its importance for the polymerase
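The idea of weighting the 3'-terminal bases more heavily can be sketched with a simple scoring function. It compares the primer (5'->3') with a candidate site written 3'->5', as on the slide, and counts complementary pairs, multiplying those in the last few primer positions by a factor. The window of 5 bases and the weight of 3.0 are arbitrary assumptions for illustration, as is the name `binding_score`; the slides do not give the actual weights.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def binding_score(primer, site, three_prime_window=5, three_prime_weight=3.0):
    """Score an ungapped primer/site pairing, up-weighting the primer's 3' end.

    primer is written 5'->3', site is written 3'->5', so primer[i] faces site[i].
    Window size and weight are illustrative assumptions.
    """
    score = 0.0
    first_weighted = len(primer) - three_prime_window
    for i, (p, s) in enumerate(zip(primer.upper(), site.upper())):
        if COMPLEMENT.get(p) == s:
            score += three_prime_weight if i >= first_weighted else 1.0
    return score

if __name__ == "__main__":
    primer = "TTTCAGTAATTAAAAAGATTTCTGT"
    secondary_site = "ACGGTAGGCATTCTACGAAAAGACA"  # 3'->5', from the slide
    print(binding_score(primer, secondary_site))
```

With these assumed weights, a site that pairs only at the 3' end can outscore one with the same number of pairs spread over the 5' half, which matches the slide's point that 3'-terminal stability matters most for elongation.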

70 70 Computational Molecular Biology MPI for Molecular Genetics Primer design secondary binding sites / suffix tree AACGTAGCC......NACGTCAAA......NACGTCGCA... A C G T AGCC...C A C GTAGCC... GCA... AAA...
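To show why a suffix index helps when searching for secondary binding sites, here is a depth-limited suffix trie, a simpler and less memory-efficient relative of the suffix tree named on the slide. Every suffix of the text is inserted up to `max_depth` characters, and each node records the start positions of the suffixes passing through it, so all occurrences of a short pattern are found by walking one path. The function names, the depth limit, and the toy text (the three fragments visible on the slide joined together) are assumptions for illustration.

```python
def build_suffix_trie(text, max_depth=8):
    """Build a trie of all suffixes of text, truncated to max_depth characters.

    Each node is a dict with 'children' (char -> node) and 'starts'
    (start positions of all suffixes that pass through the node).
    """
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:start + max_depth]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root

def find_occurrences(trie, pattern):
    """Return start positions of pattern (length 1..max_depth) in the indexed text."""
    node = trie
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return list(node["starts"])

if __name__ == "__main__":
    text = "AACGTAGCCNACGTCAAANACGTCGCA"
    trie = build_suffix_trie(text)
    print(find_occurrences(trie, "ACGT"))
    print(find_occurrences(trie, "ACGTC"))
```

A true suffix tree compresses single-child paths and can be built in linear time; the trie above trades that efficiency for brevity and is only practical for short patterns such as primer 3' ends.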

