Published by Aldo Folds. Modified over 10 years ago.
1
BIOINFORMATICS AND GENE DISCOVERY
Iosif Vaisman, 1998
UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL
Bioinformatics Tutorials
3
From genes to proteins
4
From genes to proteins
[Diagram: DNA -> RNA -> mRNA -> protein. Processes: transcription, splicing, translation. Signals: promoter elements, splice sites, start codon, stop codon.]
6
Comparative Sequence Sizes (base pairs)
Yeast chromosome 3                            350,000
Escherichia coli (bacterium) genome         4,600,000
Largest yeast chromosome now mapped         5,800,000
Entire yeast genome                        15,000,000
Smallest human chromosome (Y)              50,000,000
Largest human chromosome (1)              250,000,000
Entire human genome                     3,000,000,000
7
Low-resolution physical map of chromosome 19
8
Chromosome 19 gene map
9
Computational Gene Prediction
Where are genes unlikely to be located?
How do transcription factors know where to bind a region of DNA?
Where are the transcription, splicing, and translation start and stop signals?
What do coding regions do (that non-coding regions do not)?
Can we learn from examples?
Does this sequence look familiar?
10
Artificial Intelligence in Biosciences
Neural Networks (NN)
Genetic Algorithms (GA)
Hidden Markov Models (HMM)
Stochastic context-free grammars (SCFG)
11
Information Theory
[Figure: two states, 0 and 1; label: 1 bit]
12
Information Theory
[Figure: four states, 00, 01, 11, 10; label: 1 bit]
13
Information Theory
[Figure; label: 1 bit]
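As a rough illustration of the bit as a unit of information: a choice between two equally likely outcomes carries 1 bit, and a choice among four equally likely outcomes (such as the four DNA bases) carries 2 bits. The sketch below computes Shannon entropy for a few distributions. The function name and the example distributions are illustrative assumptions, not taken from the slides.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits.

    Outcomes with zero probability contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two equally likely outcomes (0 or 1).
fair_coin = entropy_bits([0.5, 0.5])

# Four equally likely outcomes, e.g. the bases A, C, G, T.
uniform_bases = entropy_bits([0.25] * 4)

# A hypothetical skewed base composition: less uncertainty per base.
skewed_bases = entropy_bits([0.7, 0.1, 0.1, 0.1])
```

A skewed distribution always has lower entropy than the uniform one over the same alphabet, which is why biased base composition can serve as a signal.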
14
Scientific Models
Physical models -- Mathematical models
Mechanistic models: predictive power, elegance, consistency
Stochastic models: predictive power
Hidden Markov models: mechanism, black box, stochastic mechanism
15
Neural Networks
An interconnected assembly of simple processing elements (units or nodes).
Node functionality is similar to that of the animal neuron.
Processing ability is stored in the inter-unit connection strengths (weights).
Weights are obtained by a process of adaptation to, or learning from, a set of training patterns.
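The idea that weights are learned from training patterns can be sketched with the simplest possible network, a single threshold unit trained by the perceptron rule. This is a minimal illustration only: the learning rule, the learning rate, and the toy training set (logical AND) are assumptions chosen for brevity, not the networks used by any gene-finding program.

```python
def train_perceptron(patterns, targets, epochs=50, rate=0.5):
    """Learn weights and a bias for one threshold unit with the perceptron rule."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(patterns, targets):
            activation = bias + sum(w * xi for w, xi in zip(weights, x))
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the unit is already correct
            # Adapt each connection strength in proportion to its input.
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

def predict(weights, bias, x):
    """Apply the trained unit to one input pattern."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

# Toy training set (hypothetical): the unit should fire only for (1, 1).
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(patterns, targets)
predictions = [predict(weights, bias, x) for x in patterns]
```

After training, all knowledge about the task lives in `weights` and `bias`; the code of the unit itself never changes.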
16
Genetic Algorithms
Search or optimization methods using simulated evolution.
A population of potential solutions is subjected to natural selection, crossover, and mutation.
  choose initial population
  evaluate each individual's fitness
  repeat
    select individuals to reproduce
    mate pairs at random
    apply crossover operator
    apply mutation operator
    evaluate each individual's fitness
  until terminating condition
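The loop above can be sketched on a toy problem: evolving a bit string with as many 1s as possible. Everything specific here is an assumption made for illustration, including the fitness function, tournament selection, single-point crossover, the parameter values, and the function names.

```python
import random

def fitness(bits):
    """Toy fitness (assumed): the number of 1s in the bit string."""
    return sum(bits)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover producing children AB and BA."""
    point = rng.randrange(1, len(parent_a))
    child_ab = parent_a[:point] + parent_b[point:]
    child_ba = parent_b[:point] + parent_a[point:]
    return child_ab, child_ba

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def select(population, rng, k=3):
    """Tournament selection (assumed): fittest of k random individuals."""
    return max(rng.sample(population, k), key=fitness)

def run_ga(length=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Choose initial population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == length:  # terminating condition
            break
        next_population = []
        while len(next_population) < pop_size:
            a, b = select(population, rng), select(population, rng)
            for child in crossover(a, b, rng):
                next_population.append(mutate(child, mutation_rate, rng))
        population = next_population[:pop_size]
        # Keep the best individual seen so far.
        best = max(population + [best], key=fitness)
    return best

best = run_ga()
```

Fixing the seed makes a run repeatable, but the result still depends on the chosen operators and rates; different settings may converge faster or stall.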
17
Crossover
[Diagram: Parent A and Parent B are cut at a crossover point and recombined into Child AB and Child BA]
Mutation
18
Markov Model (or Markov Chain)
Example sequence: A T C T A G
The probability of each character is based only on several preceding characters in the sequence.
Number of preceding characters = order of the Markov Model
Probability of a sequence (second-order model):
P(s) = P[A] P[A,T] P[A,T,C] P[T,C,T] P[C,T,A] P[T,A,G]
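The product of conditional probabilities above can be sketched as follows. This is a minimal illustration assuming a second-order model estimated from counts; the training sequences, the add-one smoothing, and the function names are all assumptions, not part of the slides. Log-probabilities are summed to avoid numerical underflow on long sequences.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, order=2):
    """Count how often each base follows each context of up to `order` bases."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, base in enumerate(seq):
            context = seq[max(0, i - order):i]  # shorter at the sequence start
            counts[context][base] += 1
    return counts

def log_prob(seq, counts, order=2):
    """Sum of log P(base | preceding bases), with add-one smoothing (assumed)."""
    total = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]
        context_counts = counts.get(context, {})
        numerator = context_counts.get(base, 0) + 1
        denominator = sum(context_counts.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total

# Hypothetical training data.
training = ["ATCTAG", "ATCTAG", "ATCGAG"]
counts = train_markov(training)

lp_seen = log_prob("ATCTAG", counts)    # resembles the training data
lp_unseen = log_prob("GGGGGG", counts)  # unlike anything in the training data
```

A sequence that resembles the training data scores a higher log-probability than one that does not, which is the basis for asking whether a sequence "looks familiar".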
19
Hidden Markov Models
States -- well-defined conditions
Edges -- transitions between the states
[Diagram: a chain of states emitting A, then T or C, then G or T, then A, then C; it generates the sequences ATGAC, ATTAC, ACGAC, ACTAC]
Each transition is assigned a probability.
Probability of the sequence:
  single path with the highest probability -- Viterbi path
  sum of the probabilities over all paths -- forward algorithm (the quantity used in Baum-Welch training)
20
Hidden Markov Model of Biased Coin Tosses
States (S_i): two biased coins {C1, C2}
Outputs (O_j): two possible outputs {H, T}
p(Output O_ij): p(C1, H), p(C1, T), p(C2, H), p(C2, T)
Transitions from state X to Y: {A11, A22, A12, A21}
p(Initial S_i): p(I, C1), p(I, C2)
p(End S_i): p(C1, E), p(C2, E)
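The two-coin model can be sketched in a few lines. All numeric probabilities below are invented for illustration, and the end-state probabilities p(C1, E) and p(C2, E) are left out as a simplification. `forward` sums the probability over all state paths, while `viterbi` returns the single most probable path.

```python
STATES = ("C1", "C2")

# Hypothetical parameters: C1 is a fair coin, C2 strongly favours heads.
initial = {"C1": 0.5, "C2": 0.5}
transition = {"C1": {"C1": 0.9, "C2": 0.1},
              "C2": {"C1": 0.1, "C2": 0.9}}
emission = {"C1": {"H": 0.5, "T": 0.5},
            "C2": {"H": 0.9, "T": 0.1}}

def forward(observations):
    """Probability of the observations, summed over all state paths."""
    alpha = {s: initial[s] * emission[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {s: emission[s][obs] *
                    sum(alpha[r] * transition[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())

def viterbi(observations):
    """Return (probability, path) for the single most probable state path."""
    best = {s: (initial[s] * emission[s][observations[0]], [s])
            for s in STATES}
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            # Best predecessor state for arriving in s at this step.
            prob, path = max(((best[r][0] * transition[r][s], best[r][1])
                              for r in STATES), key=lambda item: item[0])
            new_best[s] = (prob * emission[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

tosses = "HHHHTT"
total_probability = forward(tosses)
path_probability, path = viterbi(tosses)
```

The Viterbi probability can never exceed the forward probability, since the best path is just one term of the sum over all paths.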
21
Hidden Markov Model for Exon and Stop Codon (VEIL Algorithm)
22
GRAIL gene identification program
[Diagram: possible exons -> refined exon positions -> final exon candidates]
23
Suboptimal Solutions for the Human Growth Hormone Gene (GeneParser)
24
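Measures of Prediction Accuracy: Nucleotide Level
[Figure: REALITY and PREDICTION tracks aligned along the sequence, with each position labelled TP, FP, TN, or FN; a table of reality (c, nc) against prediction (c, nc)]
Sensitivity: S_n = TP / (TP + FN)
Specificity: S_p = TP / (TP + FP)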
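The nucleotide-level measures can be computed directly from two aligned annotation tracks. In this sketch each position is marked "c" (coding) or "n" (non-coding); that encoding, the example tracks, and the function name are assumptions made for illustration.

```python
def nucleotide_accuracy(actual, predicted):
    """Return (Sn, Sp, counts) for two equal-length strings of 'c' and 'n'."""
    if len(actual) != len(predicted):
        raise ValueError("annotation tracks must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == "c" and p == "c" for a, p in pairs)  # coding, predicted coding
    fp = sum(a == "n" and p == "c" for a, p in pairs)  # non-coding, predicted coding
    tn = sum(a == "n" and p == "n" for a, p in pairs)  # non-coding, predicted non-coding
    fn = sum(a == "c" and p == "n" for a, p in pairs)  # coding, predicted non-coding
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tp / (tp + fp) if tp + fp else float("nan")
    return sn, sp, (tp, fp, tn, fn)

# Hypothetical tracks: two real exons, one predicted exon shifted to the right.
actual = "nn" + "cccccc" + "nnnn" + "ccc" + "nn"
predicted = "nnn" + "ccccccc" + "nnnnnnn"

sn, sp, counts = nucleotide_accuracy(actual, predicted)
```

Note that S_p as defined on the slide, TP / (TP + FP), is the fraction of predicted coding nucleotides that are truly coding; other fields call this quantity precision.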
25
Measures of Prediction Accuracy: Exon Level
[Figure: REALITY and PREDICTION tracks showing a wrong exon, a correct exon, and a missing exon]
Sensitivity: S_n = number of correct exons / number of actual exons
Specificity: S_p = number of correct exons / number of predicted exons
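The exon-level measures can be sketched in the same way. Here an exon is a (start, end) pair and a predicted exon counts as correct only if both boundaries match an actual exon exactly; a missing exon is an actual exon with no overlapping prediction, and a wrong exon is a predicted exon with no overlapping actual exon. These definitions, the coordinates, and the function names are assumptions for illustration.

```python
def overlaps(exon_a, exon_b):
    """True if two (start, end) intervals share at least one position."""
    return exon_a[0] <= exon_b[1] and exon_b[0] <= exon_a[1]

def exon_accuracy(actual_exons, predicted_exons):
    """Return exon-level Sn, Sp, plus the missing and wrong exons."""
    actual, predicted = set(actual_exons), set(predicted_exons)
    correct = actual & predicted  # exact match on both boundaries
    sn = len(correct) / len(actual) if actual else float("nan")
    sp = len(correct) / len(predicted) if predicted else float("nan")
    missing = sorted(a for a in actual
                     if not any(overlaps(a, p) for p in predicted))
    wrong = sorted(p for p in predicted
                   if not any(overlaps(p, a) for a in actual))
    return sn, sp, missing, wrong

# Hypothetical coordinates: one exact hit, one boundary error,
# one exon missed entirely, and one spurious prediction.
actual_exons = [(100, 200), (300, 400), (500, 600)]
predicted_exons = [(100, 200), (310, 400), (700, 750)]

exon_sn, exon_sp, missing, wrong = exon_accuracy(actual_exons, predicted_exons)
```

The exon-level measure is much stricter than the nucleotide-level one: the prediction (310, 400) covers most of a real exon but still does not count as correct.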
26
GeneMark Accuracy Evaluation
27
Gene Discovery Exercise
http://metalab.unc.edu/pharmacy/Bioinfo/Gene
Bibliography
http://linkage.rockefeller.edu/wli/gene/list.html and http://www-hto.usc.edu/software/procrustes/fans_ref/