
Frog’s eye view of the jungle (time frozen) Push to restart time.


5 Frog’s eye view of the jungle (time frozen) Push to restart time

6 Frog’s eye view of the jungle (time moving) Frog’s eye view of the jungle (time frozen)

7 Frog’s eye view of the jungle (through movement filter) Push to restart time

8 Frog’s eye view of the jungle (through movement filter)

9 Filters: Information reducers Movement filter

10 Filters: Information reducers Sequence filter TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT CTCCGTAAAC CTCTAAC... How organism is made How organism works

11 From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site
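The "Genetic code" step on this slide can be reproduced with a short sketch. It assumes the standard genetic code; the names `CODON_TABLE`, `translate`, and `three_letter` are illustrative and not part of the presentation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter amino acid names, for slide-style output.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(dna):
    """Translate a DNA string codon by codon, starting at position 0."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def three_letter(protein):
    """Render a one-letter protein string as hyphenated three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in protein)


protein = translate("ATGACTTATGATCAACGCACAGGGCTA")
print(three_letter(protein))  # Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu
```

The lookup covers only the first arrow on the slide. The "rules of folding" step has no comparably simple table, which is the point the later slides build on.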

12 From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site Cell interaction Metabolism, Architecture

13 From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site Gives us: Custom antibiotics

14 From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Gives us: Custom antibiotics Custom antibodies Custom enzymes New materials Genetic code Rules of folding Active site

15 From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of transcriptional and post-transcriptional control Transcr’l initiation Transcr’l termination/ polyA tailing Splicing Transl’l initiation ? TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA ATGACTTATGATCAACGCACAGGGCTA 3% TCTACTTATATTCAATCCACAGGGCTA CACCTAGTTCTTGAAGAGTCTGTTGAA TGAACACATACATGGTTTATCTGTTTT TCTGTCTGCTCTGACCTCTGGCAGCTT TAGCCTGCCCCACTCTTAGATAAACGA ACCTTAGTGACTTCTGCTATACCAAAG TCTCCACGCCCCTCCGTAAACCTCTAA CATGATGTCAGCAAATATTAAAAATGA 97%

16 From Sequence to Organism How does Nature do it? Natural filters/transformations Selective transcription Selective processing Translation Folding DNA Functional protein

17 From Sequence to Organism How does Nature do it? Natural filters/transformations DNA Functional protein

18 From Sequence to Organism How can WE do it? Simulation of Nature Utterance of Wm Shakespeare Utterance of George W Bush “Whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous fortune...” “We must give our military every tool and weapon it needs to prevail...” ???

19 From Sequence to Organism How can WE do it? Surrogate Processes Utterance of Wm Shakespeare Utterance of George W Bush “Whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous fortune...” “We must give our military every tool and weapon it needs to prevail...” Words/sentence; Choice of words; Sentence structure; …

20 From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding Surrogate filters Characteristics of coding sequences/introns My sequence Gene finders Predicted coding regions

21 From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding Surrogate filters Gene finders Similarity finders Sequence/motif Databases My sequence

22 From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding Surrogate filters Gene finders Similarity finders Feature finders Predicted features Characteristics of features My sequence

23 From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding Surrogate filters Gene finders Similarity finders Feature finders Pattern finders My sequences Statistical engine

24 Surrogate Filters Gene finders Similarity finders Feature finders Pattern finders How do they work? Case studies Real problems Mixed strategies You do it

25 Surrogate Filters Gene finders Class 1: Start/Stop codon search (Map, Frames, OrfFinder) CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAATGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA CT CCA CGC CCC TCC GTA CAC CTC TAA CAT GAT CTC AGC AAA TAT TAA AAA TGA ATA AAC TTT GTG ACA TGT ACA AAT GGA AAT ATG CAA Look for start codons (ATG) (GTG,TTG) Look for stop codons (TAA,TAG,TGA)

26 CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAATGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA TTGCATATTTCCATTTGTACATGTCACAAAGTTTATTCATTTTTAATATTTGCTGAGATCATGTTAGAGGTGTACGGAGGGGCGTGGAG Surrogate Filters Gene finders Class 1: Start/Stop codon search (Map, Frames, OrfFinder) Look for start codons (ATG) (GTG,TTG) Look for stop codons (TAA,TAG,TGA)
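A Class 1 search of this kind can be sketched as follows. This is a minimal illustration and not the algorithm of Map, Frames, or OrfFinder. The function names, the `min_codons` threshold, and the choice to report each open reading frame from its first start codon are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, starts=("ATG",), min_codons=2):
    """Scan the three reading frames of one strand for start...stop runs.

    Returns (frame, start, end) tuples with 0-based coordinates; `end` is
    exclusive and includes the stop codon. `min_codons` counts the codons
    before the stop. Pass starts=("ATG", "GTG", "TTG") to accept the
    alternative start codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in starts:
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


dna = "CCATGAAATAGCC"
print(find_orfs(dna))                      # forward strand: [(2, 2, 11)]
print(find_orfs(reverse_complement(dna)))  # reverse strand: []
```

Scanning both the sequence and its reverse complement covers all six reading frames, which is why the slide shows both strands.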

27 Surrogate Filters: Gene finders. Class 1: Start/Stop codon search (Map, Frames, OrfFinder). Pro: Quick, simple. Con: Useless for eukaryotic genomic sequences (introns); inaccurate (start codon problem); inaccurate (doubtful short open reading frames).

28 Surrogate Filters: Gene finders. Class 2: Codon bias recognition (TestCode). The code is degenerate. Are codons equally used?

29 Surrogate Filters: Gene finders. Class 2: Codon bias recognition (TestCode). Codon usage is biased (most frequently used codons). Is codon bias universal?

30 Surrogate Filters: Gene finders. Class 2: Codon bias recognition (TestCode). Pro: Quick, simple, available through GCG; better than Class 1 in excluding false open reading frames. Con: Useless for eukaryotic genomic sequences (introns); gives only general areas of open reading frames.
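The question "Are codons equally used?" can be checked directly by counting. The sketch below tallies in-frame codon frequencies for a coding sequence. It illustrates the idea of codon bias only and is not the statistic TestCode computes; the function name `codon_usage` is an assumption.

```python
from collections import Counter


def codon_usage(coding_seq):
    """Return the fraction of each codon in an in-frame coding sequence."""
    coding_seq = coding_seq.upper()
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# AAA and AAG both encode lysine; unequal fractions indicate a usage bias.
usage = codon_usage("AAAAAAAAG")
print(usage)  # AAA is used twice as often as AAG in this toy sequence
```

A window of sequence whose codon fractions resemble those of known genes from the same organism is more likely to be coding than one that does not.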

31 Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition Principle Step 1: Create model through extensive training set * Training set = proven or suspected genes * Organism-specific Step 2: Assess candidate genes through filter of model

32 Step 1: Create model through extensive training set Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition AAA AAC AAG AAT ACA... TTG TTT Training Set AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATC AATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAA CCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAAT GACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACAC TTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCT ATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACG TTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAA TCCATAGTTATTATTACTTATGACTAAAACAAAATTACTA TGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTA TATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTC AAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACT GAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCA CTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGAT GCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGG TAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT

33 Step 1: Create model through extensive training set AAAA: 33% AAAC: 25% AAAG: 12% AAAT: 30% Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition AAA AAC AAG AAT ACA... TTG TTT Training Set AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATC AATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAA CCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAAT GACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACAC TTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCT ATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACG TTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAA TCCATAGTTATTATTACTTATGACTAAAACAAAATTACTA TGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTA TATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTC AAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACT GAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCA CTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGAT GCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGG TAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT

34 Step 1: Create model through extensive training set AACA: 30% AACC: 20% AACG: 15% AACT: 35% AAA AAC AAG AAT ACA... TTG TTT Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition Training Set AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATC AATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAA CCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAAT GACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACAC TTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCT ATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACG TTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAA TCCATAGTTATTATTACTTATGACTAAAACAAAATTACTA TGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTA TATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTC AAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACT GAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCA CTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGAT GCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGG TAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT

35 Surrogate Filters: Gene finders. Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes (3rd-order Markov model)
       A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
...
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCAA… Score: 0.12

36 Surrogate Filters: Gene finders. Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes (3rd-order Markov model; table as on slide 35)
Candidate gene: AAAGCAA… Score: 0.12 x 0.15

37 Surrogate Filters: Gene finders. Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes (3rd-order Markov model; table as on slide 35)
Candidate gene: AAAGCTA… Score: 0.12 x 0.15... So far, not a good candidate!
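Steps 1 and 2 can be sketched in a few lines. The table on slides 35 to 37 gives the probability of each base given the preceding three, which is a plain 3rd-order Markov chain; a full hidden Markov model adds hidden states that this sketch omits. The function names `train` and `log_score` and the probability floor for unseen contexts are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(training_seqs, order=3):
    """Step 1: estimate P(next base | preceding `order` bases) from genes."""
    counts = defaultdict(Counter)
    for seq in training_seqs:
        for i in range(len(seq) - order):
            counts[seq[i:i + order]][seq[i + order]] += 1
    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        model[context] = {base: nexts[base] / total for base in "ACGT"}
    return model


def log_score(model, candidate, order=3, floor=1e-6):
    """Step 2: sum of log-probabilities of each base given its context."""
    score = 0.0
    for i in range(len(candidate) - order):
        p = model.get(candidate[i:i + order], {}).get(candidate[i + order], 0.0)
        score += math.log(max(p, floor))  # floor avoids log(0)
    return score


# Two rows of the table from the slides, entered by hand.
slide_model = {
    "AAA": {"A": 0.33, "C": 0.25, "G": 0.12, "T": 0.30},
    "AAG": {"A": 0.35, "C": 0.15, "G": 0.20, "T": 0.30},
}
# P(G | AAA) * P(C | AAG) = 0.12 * 0.15 = 0.018, as on slide 36.
print(math.exp(log_score(slide_model, "AAAGC")))
```

Working in log space keeps long products from underflowing. In practice the score is compared against the same candidate scored under a noncoding model rather than judged on its own.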

38 Surrogate Filters: Gene finders. Class 3: Hidden Markov Model (HMM)-based recognition. Pro: Close to the most accurate method known. Con: Needs a big training set; may miss genes of foreign origin; will miss very small genes.

40 Surrogate Filters Scenario I – Case of the Hidden Heterocyst

41 Case of the Hidden Heterocyst [Figure: heterocysts, with labels N2, NH3, O2. Matveyev and Elhai (unpublished)]

42 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Nostoc genome Transposon 1. Use transposon mutagenesis

43 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Nostoc genome Transposon 1. Use transposon mutagenesis to find a mutant defective in heterocyst differentiation

44 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Nostoc genome 2. Sequence out from transposon AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATA ATCAATGACTATCAGACAGAGAATCATCGTGCTGTCA GTAAAACCTCTGATTTCGATCTTTACCATAATTGTTA TGTTGTAATGACTAACCAGACTATCTTTTACAGAGCT TCTGGTTAACACTTGTCTAATTAGACATTGATAATGT TTGTGGGGGTTGGTCATCAGGAATGGTAAATAGCAAT TACCCTTCAGACTTTCCTATGAGACGCTCCGCCAACG AGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTT AACTTCAGAAATTCACGGCGGAAATCCATAGTTATTA TTACTTATGACTAAAACAAAATTACTATGGCGGCTTG TTTAATATAGATTCTGTGTTCTGAGAAATGACTTTTA AAGTCCCACTAACTTTTTTCTCATCTATTGCTATATT TCGACTTTAAAACTTATAGTAGATGGCTTAATTCTCA AATAACAAACTCATTTTTAGTAGATATTTCATGCAAA CTGAGGTTTTTAGTGATATTTTCCCCTTATTGAGTAC AGCCACTCCACAAACCTTAGAATGGCTACTCAATATT GCAATTGATCATGAATATCCCACTGGTAGAGCAGTTT TAATGGAAGATGCCTGGGGTAATGCAGTTTATTTCGT TGTATCTGGATGGGTAAAAGTTCGGCGCACCTGTGGA 1. Use transposon mutagenesis to find a mutant defective in heterocyst differentiation

45 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Nostoc genome 2. Sequence out from transposon AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATA ATCAATGACTATCAGACAGAGAATCATCGTGCTGTCA GTAAAACCTCTGATTTCGATCTTTACCATAATTGTTA TGTTGTAATGACTAACCAGACTATCTTTTACAGAGCT TCTGGTTAACACTTGTCTAATTAGACATTGATAATGT TTGTGGGGGTTGGTCATCAGGAATGGTAAATAGCAAT TACCCTTCAGACTTTCCTATGAGACGCTCCGCCAACG AGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTT AACTTCAGAAATTCACGGCGGAAATCCATAGTTATTA TTACTTATGACTAAAACAAAATTACTATGGCGGCTTG TTTAATATAGATTCTGTGTTCTGAGAAATGACTTTTA AAGTCCCACTAACTTTTTTCTCATCTATTGCTATATT TCGACTTTAAAACTTATAGTAGATGGCTTAATTCTCA AATAACAAACTCATTTTTAGTAGATATTTCATGCAAA CTGAGGTTTTTAGTGATATTTTCCCCTTATTGAGTAC AGCCACTCCACAAACCTTAGAATGGCTACTCAATATT GCAATTGATCATGAATATCCCACTGGTAGAGCAGTTT TAATGGAAGATGCCTGGGGTAATGCAGTTTATTTCGT TGTATCTGGATGGGTAAAAGTTCGGCGCACCTGTGGA 1. Use transposon mutagenesis to find a mutant defective in heterocyst differentiation 3. Find gene boundaries 4. Identify gene Do it

46 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes 1. Go to http://www.vcu.edu/~elhaij/BioInf 2. Open second browser (Ctrl-N in Netscape); go to same site (copy and paste URL) 3. In 1st browser, go to Program List, click on Gene Finders, open GeneMark 4. In 2nd browser, open Nostoc sequence


48 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Mission successful: >Translation: 397..639 (direct), 81 amino acids VLGSKIEEGPKHIILDLSQIDFIDSSGLGALVQLAKQAQTAEGTLQIVTNAR VTQTVKLVRLEKFLSLQKSVEEALENVK* … or was it? Check predicted protein against databases

49 Surrogate Filters: Similarity finders
Blast
BlastP: Protein sequence to search protein database
BlastN: Nucleotide sequence to search nucleotide database
BlastX: Nucleotide sequence (translated) to search protein database
TBlastN: Protein sequence to search (translated) nucleotide database
Blast2Seq: Compare two sequences you specify
FastA (Various flavors)
Pfam (Protein motif families): Finds conserved motifs similar to protein sequence
Do it
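The choice among the Blast flavors depends only on the type of the query and the type of the database. The lookup below restates the slide as a table; the function name `choose_blast` and the type labels are assumptions for illustration.

```python
# (query type, database type) -> Blast flavor, as listed on the slide.
BLAST_PROGRAMS = {
    ("protein", "protein"): "BlastP",
    ("nucleotide", "nucleotide"): "BlastN",
    ("nucleotide", "protein"): "BlastX",   # query is translated
    ("protein", "nucleotide"): "TBlastN",  # database is translated
}


def choose_blast(query_type, database_type):
    """Return the Blast flavor for a given query/database combination."""
    return BLAST_PROGRAMS[(query_type, database_type)]


# A predicted protein checked against protein databases:
print(choose_blast("protein", "protein"))  # BlastP
```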


51 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Mission successful: >Translation: 397..639 (direct), 81 amino acids VLGSKIEEGPKHIILDLSQIDFIDSSGLGALVQLAKQAQTAEGTLQIVTNAR VTQTVKLVRLEKFLSLQKSVEEALENVK* Why? GeneMark correct: Conservation of noncoding regions VLGSK GeneMark wrong: Fooled by weird aa sequence or start codon

52 Case of the Hidden Heterocyst Strategy to find heterocyst differentiation genes Moral: Automated gene finders are wonderful, but common sense is better. Don’t trust automated annotation.

53 Surrogate Filters Feature finders Hidden Markov model-based methods Good for contiguous features (e.g. signal sequences) Not good with features with gaps (e.g. promoters) Ad hoc methods Feature-specific rules (e.g. tandem repeats, terminators) Position-dependent frequency tables = Position-specific scoring matrix (PSSM) = Weight table

54 Surrogate Filters: Feature finders. Position-dependent frequency tables
Some of 106 aligned human promoter sequences (near -26):
CCCTATATAAGGC... histone H1t
CGCTATAAAAACT... HMG-17
GGGTATATAAGCG... β-tubulin β2
GGCTATATAAAAC... α-actin skel-m.
TTCTATAAAGCGG... α-cardiac actin
CCCTATAAAACCC... β-actin
GAGTATAAAGCAC... keratin I 50K
GGTTATAAAAACA... vimentin
CAGTATAAAAGGG... α1(I) collagen
CCGTATAAATAGG... α2(I) collagen
TCCCATATAAGCC... fibronectin
Consensus: TATAAA

55 Surrogate Filters: Feature finders. Position-dependent frequency tables (same aligned promoter sequences as slide 54, shown without the consensus)
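A position-dependent frequency table can be built from the eleven aligned promoter sequences shown on the slide. The sketch below assumes a pseudocount of 1 and a uniform background of 0.25 per base; both choices, and the function names, are illustrative.

```python
import math

# The eleven aligned 13-mers shown on the slide (a subset of the 106).
ALIGNED = [
    "CCCTATATAAGGC", "CGCTATAAAAACT", "GGGTATATAAGCG", "GGCTATATAAAAC",
    "TTCTATAAAGCGG", "CCCTATAAAACCC", "GAGTATAAAGCAC", "GGTTATAAAAACA",
    "CAGTATAAAAGGG", "CCGTATAAATAGG", "TCCCATATAAGCC",
]


def frequency_table(aligned, pseudocount=1.0):
    """One {base: frequency} dict per alignment column."""
    table = []
    for column in zip(*aligned):
        total = len(column) + 4 * pseudocount
        table.append(
            {b: (column.count(b) + pseudocount) / total for b in "ACGT"}
        )
    return table


def consensus(table):
    """Most frequent base at each position."""
    return "".join(max("ACGT", key=col.get) for col in table)


def log_odds_score(table, window, background=0.25):
    """Score a window of the same width against the table (PSSM score)."""
    return sum(math.log2(col[b] / background) for col, b in zip(table, window))


table = frequency_table(ALIGNED)
print(consensus(table)[3:9])  # TATAAA, matching the slide's consensus
```

To scan a new sequence, slide a 13-base window along it and keep the windows whose `log_odds_score` is highest. Unlike a bare consensus string, the table tolerates the TATATA variants that several of these promoters carry.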

56 Surrogate Filters: Feature finders. Position-Specific Scoring Matrix in action
Labels: Experimentally proven start sites; unknown
aceB ACTATGGAGCATCTGCACATGAAAACC
atpI ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ TTCACACAGGAAACAGCTATGACCATG
rpsJ AATTGGAGCTCTGGTCTCATGCAGAAC
serC GCAACGTGGTGAGGGGAAATGGCTCAA
sucA GATGCTTAAGGGATCACGATGCAGAAC
trpE CAAAATTAGAGAATAACAATGCAAACA

57 Surrogate Filters: Feature finders. Position-Specific Scoring Matrix in action (same start-site sequences as slide 56)

58 Surrogate Filters: Feature finders. Position-Specific Scoring Matrix in action (frequency table rows: A, C, G, T)
aceB ACCACATAACTATGGAGCATCTGCACATGAAAACC
atpI ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ TTCACACAGGAAACAG....CTATGACCATG
rpsJ AATTGGAGCTCTGGTCTCATGCAGAAC
serC GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA GATGCTTAAGGGATCA....CGATGCAGAAC
trpE CAAAATTAGAGAATA...ACAATGCAAACA

59 Surrogate Filters: Feature finders. Position-Specific Scoring Matrix in action (as slide 58, with aceB realigned)
aceB ACCACATAACTATGGAGCATCT.GCACATGAAAACC

60 Surrogate Filters: Pattern finders
Specified patterns (FindPatterns, PatScan), e.g. find instances of restriction sites
New pattern discovery (Meme, Gibbs sampler)
Human sequences 5’ to transcriptional start:
snRNA U1 (pU1-6) AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14 CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1 CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14 GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17 TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19 ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1 GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2 GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m. CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA

61 Surrogate Filters: Pattern finders. How do pattern finders work?
snRNA U1 (pU1-6) AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14 CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1 CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Matches: GACAGGGCAGAA GCCCGGGTGTTT GCCGGGGACGCG GCCCCCGGGCCT GCCGCAGAGCTG

62 Surrogate Filters: Pattern finders. How do pattern finders work? (sequences and Steps 1 - 4 as on slide 61)
Step 5. If probability score high, remember pattern and score

63 Surrogate Filters: Pattern finders. How do pattern finders work? (sequences and Steps 1 - 5 as on slides 61 and 62)
Step 6. Repeat Steps 1 - 5
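Steps 1 to 6 can be sketched as a simple loop. This is a simplified greedy illustration and not the algorithm of Meme (expectation maximization) or the Gibbs sampler. To keep the result reproducible it tries every window as the candidate instead of choosing arbitrarily. The function names, the mismatch-count matching rule, the pseudocount, and the toy sequences are all assumptions.

```python
import math


def best_match(pattern, seq):
    """Step 2: the window of seq with the fewest mismatches to pattern."""
    width = len(pattern)
    windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
    return min(windows, key=lambda w: sum(a != b for a, b in zip(pattern, w)))


def frequency_table(matches, pseudocount=1.0):
    """Step 3: position-dependent base frequencies over the matches."""
    table = []
    for column in zip(*matches):
        total = len(column) + 4 * pseudocount
        table.append(
            {b: (column.count(b) + pseudocount) / total for b in "ACGT"}
        )
    return table


def relative_score(table, matches, background=0.25):
    """Step 4: log-odds of the matches under the table vs. background."""
    return sum(
        math.log2(col[base] / background)
        for match in matches
        for col, base in zip(table, match)
    )


def discover(seqs, width):
    """Steps 1-6: try candidate patterns, keep the best-scoring match set."""
    best_score, best_matches = float("-inf"), None
    for seq in seqs:                                  # Step 6: repeat
        for i in range(len(seq) - width + 1):
            candidate = seq[i:i + width]              # Step 1
            matches = [best_match(candidate, s) for s in seqs]
            table = frequency_table(matches)
            score = relative_score(table, matches)
            if score > best_score:                    # Step 5
                best_score, best_matches = score, matches
    return best_score, best_matches


# Toy input with TATAAA planted in each sequence.
toy = ["GGCTATAAAGC", "CTATAAACGGG", "GCGCGTATAAA"]
score, matches = discover(toy, 6)
print(matches)  # ['TATAAA', 'TATAAA', 'TATAAA']
```

Real pattern finders refine each candidate iteratively and estimate statistical significance, so a high score here is only a starting point.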

64 Surrogate Filters: Scenario II – Case of the Masked Motif. You’ve found a gene related to Purple Tongue Syndrome. BlastP: Encoded protein related to cAMP-binding proteins. Are the similarities trivial? Related to cAMP binding? Does your protein contain cAMP-binding site? What IS a cAMP-binding site? Task: 1. Determine what is a cAMP-binding site 2. Determine if your protein has one

65 Surrogate Filters: Scenario II – Case of the Masked Motif. Strategy: 1. Collect sequences of known cAMP-binding proteins 2. Run Meme, a pattern-finding program. Ask it to find any significant motifs 3. Rerun Meme. Demand that every protein has identified motifs 4. Run Pfam over known sequence to check. Do it


67 Surrogate Filters Scenario III – Case of the Mortal Mitochondrion Progressive External Ophthalmoplegia (PEO) Slow paralysis of voluntary eye muscles Many other symptoms (e.g., frequent deafness) Loss of mitochondrial DNA

68 Surrogate Filters Scenario III – Case of the Mortal Mitochondrion Progressive External Ophthalmoplegia (PEO) Slow paralysis of voluntary eye muscles Many other symptoms (e.g., frequent deafness) Loss of mitochondrial DNA Inheritance Mendelian Autosomal dominant Linked to chromosome 4q34

69 Surrogate Filters Scenario III – Case of the Mortal Mitochondrion Progressive External Ophthalmoplegia (PEO) Slow paralysis of voluntary eye muscles Many other symptoms (e.g., frequent deafness) Loss of mitochondrial DNA Inheritance Mendelian Autosomal dominant Linked to chromosome 4q34 Your task Examine sequence of 4q34 region Assess likelihood that a gene in the area could cause disease symptoms

70 Surrogate Filters Scenario III – Case of the Mortal Mitochondrion Examining Sequence of 4q34 Region tctacttatattcaatccacagggctacacctagttcttggtacacagtacatgctcagcaagagtctgttgaatgaacacatacatggtttatctgtttgtctcttccgagttcttgacttctgtctgct ctgacctctggcagctttccactagtttctagctttcattctgcttacctggatttcggaactctagcctgccccactcttagataaacgcatgccctctgtggccctggaaccttagtgacttctgctat accaaagtctccacgcccagggtgacacgcagctgcagctccgtaaacctctaacatgatgtcagcaaatattaaaaaaaaaaagtttataaaaacaatgaataaactttgttaaaggtacaaatgaaaat tagcaaacatgggaagataattgagtaaagagtttaaagttaaaaacgaattgcagtcattctaggggaaggaacagttgtatttgaaaacctgtatggttacatgaactgcctaaaaaacaagctaagga aaattaaagctcagatttatatattttaagaaattaattgcaattaatttcctgggattaaatagcatttcctcaaccccagctgtcattaaaaagaggcaaatacagccaaggactggatcttctccgga aggctgacagcactgaccctcaagaaggcaccggctgacagacagaacattctgccctaatatgtgctgaaattccgctgagagcagagtggtacattgaaccctttaggggcttacaaaagaagtgtcct gtgttttagagtcacagagttttgcagaaacaagtatgaattcacctagtggccccctgcaccaggtctttcctgtgggcactgagtgcagacacatcaatatgtaatagcagaatgaatgactgaacgaa cgattgaatgaaaagaaatgagaggcagcaggttgtcagattctatgaggcaatcacagcatcaggtgaccttagtatctatttgagaggactgccatttattctcgggagcgcacggctctaaagaggcc catatccaggcagtgagctctggtggggggcgcctttagatgcaagaaggaggaaacagctcgaaatccctgggcctgagcgcggcccgtgcaggccggagggtcaagaactctccaccggcggcagcggc ccggtgtctgccccggcttcgccccggcctaaggctgcctgtgctataaatacgcggcccacatgccgcggtgacacggtgttccctgggctcggcgggacagataacatgaatgtgccctttaaacgtcc caagttgcagggacagcccccggcccagcctcgctcccggaagcgccttcgcccccgatgccctctgcagctgggaggagggggcgccccgcacctgcccagccaatgcgcggcgcgagcgccggccgcga cccgcctcctctcgcgagagcccggcggggatataagggggagctgcgggccaggcggcggccccctagcgtcgcgcagggtcggggactgcgcgcggtgccaggccgggcgtgggcgagagcacgaacgg gctgcctgcgggctgagagcgtcgagctgtcaccatgggtgatcacgcttggagcttcctaaaggacttcctggccgggggcgtcgccgctgccgtctccaagaccgcggtcgcccccatcgagagggtca aactgctgctgcaggtgaggaccgcgcggtgcaagaggcgggcgcgggcgcggcgggccgggcggggcgcgcgatgcggcgcgagctgcagggcgcggggcgccgcggaaaatctgcgccaggccacaggc 
ccgggcgcccgcccgcccgcgggggaagaaggtgccctctgcgtagagacaggtccagcgtcagtcgcagattcctggtgtcgggtggcgcccggcgttcgggtgtctatatatggaaacccacccggagc cggtttacgtgtgccagatcctgcgcccgtgacagcacgggcgtgcactcaggcccggaggcacctagtgattgccagtatttttggcaccgtcttatgcgcacgcacctttacaataaaaacatcaaaat aatcatcacccaagaattcccttatcgtatctcatgcacaatgctgtatgtaggctgacgccttcatctttatgtaacctctgtgagagagttattcttctccattttacagatgaagctgaggttttgaa atattaagaaacaattttcggaataaactcagatcatcctgtctccaaatcttttcctcccctacctggtcgctgaatggtttatcatcctctcgtgttttcctccacctgcccaaaaggtcagggcccct caatgaggaagagcccaatttgggagtcagaattactaacaacaaaacccccacaaattgctcacaacggcagcaaacccttaataattgattacttggattatctgcttgaaaactttggaggcctaatg tttagtggatttattctccttcctctattagagcatctagtagagatcctcatctccagggtgatcagagtgacactgagaaattgtcattttttggccatcatgtctattaaatccaaagccctttgaag cagggagtgttactcatttctgtcccccagtaagcccctcatacagttctcaaacctagggaaagtgaaataaataaatggctatagctttatataattcaatcaccttttcagtttatttggggcaatac ctttccctcaaataccctaataattgaagcaacattggattattttggcttgttatccagtaactaacatggataacagtatccatttacacgtcctcgtatccatttgatttcctcatcctttttttctt caaaaaaaaaatctaggaagtgcaaaccttttttttttctcctgtcctcttcccttctctctaccctgcctgtcctctgtcacccaccctcccctccaccaggtccagcatgccagcaaacagatcagtgc tgagaagcagtacaaagggatcattgattgtgtggtgagaatccctaaggagcagggcttcctctccttctggaggggtaacctggccaacgtgatccgttacttccccacccaagctctcaacttcgcct tcaaggacaagtacaagcagctcttcttagggggtgtggatcggcataagcagttctggcgctactttgctggtaacctggcgtccggtggggccgctggggccacctccctttgctttgtctacccgctg gactttgctaggaccaggttggctgctgatgtgggcaagggcgccgcccagcgtgagttccatggtctgggcgactgtatcatcaagatcttcaagtctgatggcctgagggggctctaccagggtttcaa cgtctctgtccaaggcatcattatctatagagctgcctacttcggagtctatgatactgccaagggtgagagaggggcatcggggagaaggagggtggtgtggaaagaggatcctatgggatctataactc acaaaggacctgatatatattgatcttgttttttctagtctctgggataattgaggcttctgaatgaggaggtgatgtgcataagttaatagctgaagcgttccttgtgtcctctactgaaataaactctg gcctttagttattcagagaggaggaggggggagcctgtctccctctagacacagccatagcagttactgagtttaacttgaagccacttccaatgccctgtatacaagctgagcactgcccctccggggtc 
cggagagggcagcagccacctttgctgtctgcctggtcatatgtgaagcacctgcacaggggcaggttccccgcaaggtcagagcatggagctggaggtgcagtggcctctctccctccacctgctttctg ctgagaacaggcacttcatagccgttcggcttctgggctctgtccacagggatgctgcctgaccccaagaacgtgcacatttttgtgagctggatgattgcccagagtgtgacggcagtcgcagggctggt gtcctacccctttgacactgttcgtcgtagaatgatgatgcagtccggccggaaagggggtaagcttgtgctctactcatctaaacttgtttggttttgcccgaggagaacattttacagggctcctttca gtcttccttactggaaattaattttcaaaattatttgataaggacttagggaagaaagatggtattaattccccctaacgttctcaactatcctattagggaaaagtattttccattttattagagatgat aagaacatgaatagtaagacatttagatgtgaatttaactaggtatccagcattatagagaccctaggccctcttcccttagagcctgggtgcaaaagctagggaaaagaagtagttagctacttcttaca aagaactcttgcttccctcctagttacaggtgttagtgggatggggtgtttagctgggtagagatggcctgaagcaatctgttgtgccagagaaagttttggcttctataggttgaaccatatgaaattgc cactttaaaagtcaaaaacagtccaatgttagcagtttcgtatgtttcaacgaatagttacagccttttatttagactgcataacctcgtgcaggatcatctgaggctcagcctcagttcggtcctccata aaaaaaggtaaccgcgtagcataatactcctgctccactgcgcccttcttgtttcgcagttgggcagtccatgaattacttggttaattgccccagttcttcactgaccttgaactaatggagtaggaatg acaggagacccagcctgccagtgaagcaaggaaggagatgtccagtgggatgttgcatggagctgggactccatgcccagatgaccctgattttataaaactggtaacagtgtgtacagatatgtttcagg ggaaaagtctctttcctccagcgttacggagccctcaccagcatttgtttccacagccgatattatgtacacggggacagttgactgctggaggaagattgcaaaagacgaaggagccaaggccttcttca aaggtgcctggtccaatgtgctgagaggcatgggcggtgcttttgtattggtgttgtatgatgagatcaaaaaatatgtctaatgtaattaaaacacaagttcacagatttacatgaacttgatctacaag ttcacagatccattgtgtggtttaatagactattcctaggggaagtaaaaagatctgggataaaaccagactgaaggaatacctcagaagagatgcttcattgagtgttcattaaaccacacatgtatttt gtatttattttacatttaaattcccacagcaaatagaaaataatttatcatacttgtacaattaactgaagaattgataataactgaatgtgaaacatcaataaagaccacttaatgcacgctttctattt tattgaactcttattaactgtaaaatgcatttttaaaagatcaaaaatgcatattttctagcatgattcatgtatcagtcagcagccaagcttctaaatgccagatattatattgagaatgtattatatga gaacgtacaatgcttaaagttccggttttcaaacttaggcaggtcatattctatctatcttatccagcgttactgtaggctagaaagtgataatggctttcataatcctgccttgtcttaggcactttcct gcag

71 Surrogate Filters: Scenario III – Case of the Mortal Mitochondrion. Strategy: Protein has function associated with mitochondrial location? Protein has structure associated with mitochondrial location? Assume that encoded protein is in mitochondria – Use Gene finder to identify protein sequence(s) – Use Similarity finder to identify possible function – Use Feature finders to identify pertinent regions – (What ARE pertinent regions?)

72 Surrogate Filters: Scenario III – Case of the Mortal Mitochondrion. Run 4q34 region through FGene
Name: PEO-related_gene?
First three lines of sequence:
tctacttatattcaatccacagggctacacctagttcttggtacacagtacatgctcagcaagagtctgttgaat
gaacacatacatggtttatctgtttgtctcttccgagttcttgacttctgtctgctctgacctctggcagctttc
cactagtttctagctttcattctgcttacctggatttcggaactctagcctgccccactcttagataaacgcatg
fgene Wed Feb 27 16:55:29 GMT 2002
>PEO-related_gene?
length of sequence - 5768
number of predicted exons - 5
positions of predicted exons:
1607 - 1717 w= 17.84 ORF: 1607 - 1717
2985 - 3231 w= 9.13 ORF: 2985 - 3230
3421 - 3471 w= 6.08 ORF: 3423 - 3470
3980 - 4120 w= 12.62 ORF: 3982 - 4119
5035 - 5192 w= 1.93 ORF: 5037 - 5192
Length of Coding region- 708bp
Amino acid sequence - 235aa
MGDHAWSFLKDFLAGGVAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVR
IPKEQGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQLFLGGVDRHKQFWRYFAGNLASG
IIIYRAAYFGVYDTAKGMLPDPKNVHIFVSWMIAQSVTAVAGLVSYPFDTVRRRMMMQSG
RKGADIMYTGTVDCWRKIAKDEGAKAFFKGAWSNVLRGMGGAFVLVLYDEIKKYV*

73 Surrogate Filters: Scenario III – Case of the Mortal Mitochondrion. Run 4q34 region through FGeneSH
Name: PEO-related_gene?
First three lines of sequence:
tctacttatattcaatccacagggctacacctagttcttggtacacagtacatgctcagcaagagtctgttgaat
gaacacatacatggtttatctgtttgtctcttccgagttcttgacttctgtctgctctgacctctggcagctttc
cactagtttctagctttcattctgcttacctggatttcggaactctagcctgccccactcttagataaacgcatg
Fgenesh Wed Feb 27 16:59:14 GMT 2002
FGENESH 1.0 Prediction of potential genes in Human genomic DNA
Time: Wed Feb 27 16:59:14 2002
Seq name: PEO-related_gene?
Length of sequence: 5768 GC content: 48 Zone: 2
Positions of predicted genes and exons:
G Str   Feature   Start - End     Score   ORF           Len
1 +     TSS       1216            -2.70
1 +   1 CDSf      1607 - 1717     18.01   1607 - 1717   111
1 +   2 CDSi      2985 - 3471     52.41   2985 - 3470   486
1 +   3 CDSi      3980 - 4120     20.99   3982 - 4119   138
1 +   4 CDSl      5035 - 5192      2.32   5037 - 5192   156
1 +     PolA      5471             0.92
Predicted protein(s):
>FGENESH 1 4 exon (s) 1607 - 5192 298 aa, chain +
MGDHAWSFLKDFLAGGVAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVR
IPKEQGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQLFLGGVDRHKQFWRYFAGNLASG
GAAGATSLCFVYPLDFARTRLAADVGKGAAQREFHGLGDCIIKIFKSDGLRGLYQGFNVS
VQGIIIYRAAYFGVYDTAKGMLPDPKNVHIFVSWMIAQSVTAVAGLVSYPFDTVRRRMMM
QSGRKGADIMYTGTVDCWRKIAKDEGAKAFFKGAWSNVLRGMGGAFVLVLYDEIKKYV
FGENE output (for comparison):
1607 - 1717 w= 17.84
2985 - 3231 w= 9.13
3421 - 3471 w= 6.08
3980 - 4120 w= 12.62
5035 - 5192 w= 1.93
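The two predictions can be cross-checked by arithmetic on the exon coordinates reported above. The sketch assumes coordinates are 1-based and inclusive, and that the coding region ends with one stop codon; the variable and function names are illustrative.

```python
# Exon coordinates as reported by the two gene finders.
FGENE_EXONS = [(1607, 1717), (2985, 3231), (3421, 3471),
               (3980, 4120), (5035, 5192)]
FGENESH_EXONS = [(1607, 1717), (2985, 3471), (3980, 4120), (5035, 5192)]


def coding_length(exons):
    """Total coding length in bp for 1-based inclusive exon coordinates."""
    return sum(end - start + 1 for start, end in exons)


def protein_length(exons):
    """Amino acids encoded, assuming the last codon is a stop codon."""
    bp = coding_length(exons)
    if bp % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return bp // 3 - 1


print(coding_length(FGENE_EXONS), protein_length(FGENE_EXONS))      # 708 235
print(coding_length(FGENESH_EXONS), protein_length(FGENESH_EXONS))  # 897 298
```

Both totals agree with the reported outputs (708 bp and 235 aa for FGene, 298 aa for FGeneSH). The 63-residue difference arises because FGeneSH joins FGene's second and third exons into one exon spanning 2985 - 3471, so the 189 bp between them (3232 - 3420) is read as coding.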

74 How to decide where exons are? [Figure: DNA (promoter P; Exon, Intron, Exon, Intron, Exon), hnRNA, and mRNA with poly(A) tail] Strategy: Compare sequence of 4q34 region to sequence of mRNA. Sequence of mRNA may be in cDNA library or Expressed Sequence Tag (EST) library. Problems: Library may not exist; expression of gene may be low.

75 Surrogate Filters: Scenario III – Case of the Mortal Mitochondrion. Run 4q34 region through BlastN (against human ESTs). Final Score Card for Gene Finders: 3980-4120. MORAL: Trust, but verify.

76 Surrogate Filters: Scenario III – Case of the Mortal Mitochondrion. Strategy: Protein has function associated with mitochondrial location? Protein has structure associated with mitochondrial location? Assume that encoded protein is in mitochondria – Use Gene finder to identify protein sequence(s) – Use Similarity finder to identify possible function – Use Feature finders to identify pertinent structures – (What ARE pertinent structures?)

77 Name: PEO-related_gene? First three lines of sequence: tctacttatattcaatccacagggctacacctagttcttggtacacagtacatgctcagcaagagtctgttgaat gaacacatacatggtttatctgtttgtctcttccgagttcttgacttctgtctgctctgacctctggcagctttc cactagtttctagctttcattctgcttacctggatttcggaactctagcctgccccactcttagataaacgcatg Fgenesh Wed Feb 27 16:59:14 GMT 2002 FGENESH 1.0 Prediction of potential genes in Human genomic DNA Time: Wed Feb 27 16:59:14 2002 Seq name: PEO-related_gene? Length of sequence: 5768 GC content: 48 Zone: 2 Positions of predicted genes and exons: G Str Feature Start End Score ORF Len 1 + TSS 1216 -2.70 1 + 1 CDSf 1607 - 1717 18.01 1607 - 1717 111 1 + 2 CDSi 2985 - 3471 52.41 2985 - 3470 486 1 + 3 CDSi 3980 - 4120 20.99 3982 - 4119 138 1 + 4 CDSl 5035 - 5192 2.32 5037 - 5192 156 1 + PolA 5471 0.92 Predicted protein(s): >FGENESH 1 4 exon (s) 1607 - 5192 298 aa, chain + MGDHAWSFLKDFLAGGVAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVR IPKEQGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQLFLGGVDRHKQFWRYFAGNLASG GAAGATSLCFVYPLDFARTRLAADVGKGAAQREFHGLGDCIIKIFKSDGLRGLYQGFNVS VQGIIIYRAAYFGVYDTAKGMLPDPKNVHIFVSWMIAQSVTAVAGLVSYPFDTVRRRMMM QSGRKGADIMYTGTVDCWRKIAKDEGAKAFFKGAWSNVLRGMGGAFVLVLYDEIKKYV Surrogate Filters Scenario III – Case of the Mortal Mitochondrion Run 4q34 region through BlastP

78 Summary One protein in region Contains mitochondrial carrier motifs Similar to ATP/ADP transporter Mitochondrial signal sequence? Reasonable candidate for PEO-related protein Surrogate Filters Scenario III – Case of the Mortal Mitochondrion Run 4q34 region through BlastP

79 Complex gene discovery Your turn: Repeat and extend characterization of PEO-related gene 1. Take same sequence (FastA format) e-mailed to you 2. Get better estimate of promoter and polyA site (e.g. by TSSW and PolyASH) (Is there a TATA box upstream from the predicted promoter?) 3. Find encoded protein sequence by suitable method (e.g. FGeneSH(GC) or comparison with cDNA) 4. Continue characterization of protein * Contains signal sequence? * Contains transmembrane domains?


81 Filter limitation Inevitable… but whose filter?

82 Filters controlled by outside programmers

83 Filters controlled by you
