Published by Kristian Bates. Modified over 9 years ago.
2
Billions and Billions of Bases How does a biologist maintain a grip on reality?
3
The Human Genome Project 46 chromosomes, ~3 billion nucleotides One millionth of total
4
The Human Genome Project TGAGACACATATTTTTGATATTCCAGTTGTTGCAATC GAATGTAAAACATATTTAGATCTTTAAATGTATGGTAC ATTCAAGATCCAACCTTCATTCTAGTGTTTAAAGAGAAC TGATTTGTTTGCAGGGGCAGGAGGCTTTGGTTTAGGTTTTG AAATGGCAGGCTTCTCTGTACCTTTATCTGTTGAAATTGATAC C TGGGCTTGTGATACACTACGCTACAACCGCCCTGATTCAACAGT TA TTCAAAATGATATCGGTAACTTTAGTACAGAAAATGACGTTAAGA ATA TCTGCAACTTTAAACCTGATATTATTATTGGCGGGCCTCCATGCCAG GGATT TAGTATTGCTGGGCCAGCCCAAAAAGATCCTAAAGATCCTAGAAATG GTTTATT CATCAACTTTGCACAATGGATAAAATTTCTTGAACCTAAAGCGTTTGTC ATGGAAAA CGTAAAAGGATTGCTATCAAGGAAAAATGCAGAAGGTTTTAAAGTTATAG ATATTATTAAG AAAACATTTGAAGAACTTGGTTATTTTGTCGAAGTATGGGTTTTAAATGCTG CGGAATATGGCATT CCGCAAATTAGAGAACGTATTTTTATTGTTGGCAATAAAAAAGGTAAAGTACT AGGTATTCCTAAAAAAA CACATTCTCTGCAATTTTTAAATTTAAATAGGTCTCAATTATCGATCTTCGATGAT ATGAGTATTATACCTGCACTAA CTTTGTGGGACGCAATATCAGACTTACCAGAACTTAATGCGCGTGAAGGAAGTGAA GAGCAACCCTATCATTTAAAACCTC AAAATACTTATCAGACTTGGGCTAGAAATGGTAGTGCTACGCTTTACAATCATGTTGCAAT GGAACATTCTGACCGTTTAGTAGAACG TTTCCGGCATATAAAATGGGGTGAATCCAGTTCGGATGTATCTAAAGAACATGGAGCTAGACGACGT AGTGGTAATGGTGAATTATCAAACAAATCA TATGATCAGAATAATCGCCGTTTAAATCCTCATAAACCGTCTCACACTATTGCTGCGTCATTCTATGCTAATTTTG TCCATCCTTTTCAACATCGAAATTTAACAGCCCGT GAAGGAGCTAGAATCCAATCTTTTCCAGATAACTATAGATTTTTTGGAAAAAAAACTGTCGTATCTCATAAACTATTGCATCGA GAAGAAAGATTTGATGAAAAATTTCTTTGTCAATATAATC AAATCGGTAATGCTGTACCCCCTCTTCTCGCTAAAGTAATTGCACATCATCTTCTAGAGAAATTAGAGTTATGCCAACAACTGATAGAAATCCTCTA GTGCATGGATCAAATCTTGAACAAAAAGAGAATCATCGTACAAAA TACAGAGATACTGAAAGCAGGACTTTCCTTAGAGAAATCAGAACTGAATATGACAAATGGCATAAAGCAAATATGAACCTGGTTGGACCAAAATCAGAAATTACTGACCA AGATGATTCAATTATTACTCAAAGAGTGGAACTTCTCACTAAATATAAAGATTTT TTAGATCAGCAGCATTATGCAGAAAAATTTGATTCAAGATCCAACCTTCATTCTAGTGTTTTAGAGACCATTTATAAAGTAAATCTTTAGACGACTAGACGACGTAGCATAATACGAGTCATAACGGCATATATG GCAGCCTCACTCATTTCTGGGAGACGCTCATAATCCTTACTGAGACGACGGTACTGGTTTAACCAGCC 
AAATGTTCTTTCTACTACCCACCGTTTGGGCAAAACCTGAAATTCTTGATTAGTACGCCGGATTACCTCAACATGAGCTTGAATCATCAGCCAAACAGAGAGCGCAAATTTATCACCGTCATAGCCGGAATCAACCCAGATGACTTCAACTTTTTCCAGTAATTCTGGACGCTCTTCTAACAGTTCCATCAAAGTATA GGCGGCAAGTAATCTTTCTCCAGCATTTGCTTCACTTACAACCACTTTTAACAA AAGTCCCAGACTATCAACCAAAGTTTGCCGCTTTCGTCCTTTTACCTTCTTGCCACCATCAAAACCGTACACATCCCCCTTTTTTCAGTCGTTTTTACCGACTGGCTGTCTGCCGCGATCGCCGTGGGTTGAGTTGACTTCCCCATTTTTTGACGAACTTGATCGCGCAAAGTATGATTCATTTCAGTTGAACTAGGAGGAAAATCCCCTGGAAGCATATCCCACTGAC AACCTGTTTTCAGATGGTAGTAGATAGCGTTGCATACTTCTCGCATATCAGTTGTTCGGGGATGCCCACCGCATTTAGCGGGTGGAATCAAAGGAGCTAAAATTGCCCATTCTGAGTCATTAAGGTCTGTA GAATAAGACTTTCGTCTCATTGTTTCCTATGTAAATACACTCTACAAACAGTATCTTATCGCTGCCTTTTTATCTTAGCTCTCCTTTAGATTTACTTTATAAATAGCCTCTTAGAAGAATTTCTTTATTATTTATTTAAAGATTTAGTACAAGATTTCGGGCAGAACGCTCTTATTGGTAAGTCACACACGTTCAAAGATATTTTCTTCGTACCACCAAAATATTCTGAAATGCTCAAGCGACCTTATGCGCGAATTGAGAGAAAAGATCATGATTTCGTAATTGGTGCAACTGTTCAAGCATCGCTTGAAGCAGCACCTCCTCCAGAACAAAACCATGCTTGAGGGATCTTCACGCGCAGCAGAGGATTTAAAAGCGAGAAATCCTAACAGTTTATACCTTGTGGTTATGGAATGGATAAAACTGACCAATGATGTAAATTTACGAAAATATAAAGTTGAT CAAATTTATGTACTACGTCAGCAAAAAAATACTGATAGAGAGTTTAGGTATGAGTCAACTTACATAAAAAAT
5
The Human Genome Project AATAAAGCTTTACAAACCAA ACTCTGGCTTCAATTGTGTAA CCCAAGCTTTGATTCTTTCCT CTGTTAAATCGGATTGATTAT CTTCATCAAGGGCAAGACCT ACAAATTTACCATCACGAAC AGCTTTAGACTCACTGAATT CATAACCTTCTGTAGGCCAA TAGCCAACTGTTTCACCACC ATTTTCTGAAATTTTTTCCTCT AGAATACCGAGGGCATCTTG AAATGTATCAGGATAACCAA CCTGGTCTCCAGGAGCAAAA TAAGCAACTTTTTTGCCGATG AAGTCAATGTTATCTAACTC ATCATAAAAATTTTCCCAAT CACTTTGCAATTCTCCAACAT TCCAGGTAGGACAACCAAC AACGATATAATCGTAGTTAT TGAAATCACTTGGTTCAGCTT GTGAAATATCATATAAAGTT ACAACACTATCACCACCAAA CTCCTTCTGAATTATTTCTGA TTCAGTTTGGGTATTGCCTGT TTGAGTACCAAAAAATAAAC CAATATTAGACATTTTTACTC CTTTTATGTATTTGCAAAATT ATTTCAATTAAAATATTTAGT AATAATTAATTGTTAGCTAG CTAATAATTAAATTTTTATTA CAATCATTGTAAAAGGCATT GAAAAAGTAAATAAAAATT TTTATTCTACGTTATTTCAAA AATATTTACTTACATATACTT AACCTTTATAGTGATGTAAT ATACTCTAATTCCTATTTTAC TTATAAATACCATCTCAGCTT AATGTAACGAATTTTTCTGTT TATCTTTAAATACAAAAAAT TCAACAAAACTACAGAAAA TTAATCTTAATAACACAAAA CAAGTATCAATCTGTAATAC AACTAAGCTTAAATAAATTA ATAGAAAGCTTCATCTATCT AATAGGTTGAGAATAGTTTA TGTCTAATGACATAAATTCA TTCGTGTTGATTTCATTTGGG TATATTCATCTGATTTAGGAT TTACTCCATTAAGTTTGTACT CATCAATGCCCGCCTGTTGG TATCCACAATTCTCATACAG TGCGCGAGCAAAGTAATCA ATCGTTCGTCGCCATATCTA ACTTTGAGTCAAACAAACCA GTTGGATTACCAACCCTCAA CTAATCGCTTCTTTAAGGCG AGCGATCGCACATTTAACTG TTGGTTGTCACAAGAGAACT AATACTACAGCAGTATATTT AACAACTAAGGGTGGTTCAA CTTTCGCTGCGACTCCTCCAA CGCGCTGAAATACACAGGA CTGATGCGATCGCAAACTCT TTGACTAAATTCCATACATT ATCATGACCATCTCCCAAAC AAACAAGTGGGTTAACCAG ATGCTGACTATTAACATCCC CTGAGTTCGGAGTTGTAGGT CTATTTGACTGGTTCAAAGC GATGATGGAACGGCTTTGTT GCATGAATTAAAAAAAGAC ACACCATCACCTACTTCTAG GATAGACACATCAAACGTCC CACCGCCTAAGTCAAATACC AAGATAATTTCGTTAGTTTTC TTGTCAAGTCCGTAAGCGAG GGCCGCCGCCGTGGGCTAGT TGATAATTCGCAGAACTTTA ATCCCGGCAATTCTACTGGC ATCTTTGGTAGCCTGCCGTTG AGAGTCATTGAAATAGGCAG GGGTGGTAATTACCGCTTGC CTCACTGGTTCCCCCAGATA TGTGCTGGCATCATCTATCA GCTTGCGGACTACCTCATAC CATTTCACGAAAAACCTGAT ACACATGTAAACTCTGAAAC CCTTGCTGTATCAAAGTTTTG TAATTACGAATTACGAATTA CGAATTGATATCAGCCGAGA TTTCTTCGGGTGAAAATTCCT TGTTCAGAGCGGGACAGTGT AGCTTGACATTGCCATTACT 
GTCACGTACCACTTTGTAAG TAACTTGTTTTGCCTCTTGCG TAACTTCATCATACCTGCGC CCGATGAACCGCTTCACAGA ATAAAAAGTGTTTTCTGGGT TCATTACACCCTGGCGCTT
6
The Human Genome Project
10
A Walk in the Forest * Photo courtesy of www.webshots.com
11
Observation * Photos courtesy of www.webshots.com and Peter Smallwood
12
Observation * Photos courtesy of www.webshots.com and Peter Smallwood
13
Observation * Photos courtesy of www.webshots.com and Peter Smallwood
14
Observation * Photos courtesy of www.webshots.com and Peter Smallwood
15
Experiment * Photos courtesy of www.webshots.com and Peter Smallwood
16
Filters: Information reducers Squirrel filter
17
Filters: Information reducers Molecule filter
18
Filters: Information reducers Sequence filter How organism is made How organism works TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT CTCCGTAAAC CTCTAAC...
19
From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site
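The genetic-code step on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code (the slide does not say which table is meant); the names `CODON_TABLE`, `THREE_LETTER`, and `translate` are made up for this example.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter amino acid names ("*" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "Stop",
}

def translate(dna):
    """Translate a DNA string codon by codon, starting at position 0."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(THREE_LETTER[CODON_TABLE[dna[i:i + 3]]])
    return "-".join(residues)

print(translate("ATGACTTATGATCAACGCACAGGGCTA"))
```

Applied to the slide's sequence, this reproduces the peptide shown there. The later steps on the slide (folding, active site) have no comparably simple rule, which is the point the talk goes on to make.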
20
From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site Cell interaction Metabolism, Architecture
21
From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site Gives us: Custom antibiotics
22
From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Gives us: Custom antibiotics Custom antibodies Custom enzymes New materials Genetic code Rules of folding Active site
23
From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of transcriptional and post-transcriptional control Begin transcription End transcription Splice transcript Begin translation ATGACTTATGATCAACGCACAGGGCTA 3% ? TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA
24
From Sequence to Organism How does Nature do it? ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of transcriptional and post-transcriptional control TCTACTTATATTCAATCCACAGGGCTA CACCTAGTTCTTGAAGAGTCTGTTGAA TGAACACATACATGGTTTATCTGTTTT TCTGTCTGCTCTGACCTCTGGCAGCTT TAGCCTGCCCCACTCTTAGATAAACGA ACCTTAGTGACTTCTGCTATACCAAAG TCTCCACGCCCCTCCGTAAACCTCTAA CATGATGTCAGCAAATATTAAAAATGA 97% TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA ? Begin transcription End transcription Splice transcript Begin translation
25
From Sequence to Organism How does Nature do it? Natural filters/transformations Selective transcription Selective processing Translation Folding DNA Functional protein
26
From Sequence to Organism How does Nature do it? Natural filters/transformations DNA Functional protein From Sequence to Organism How can WE do it? Simulation of Nature Surrogate Processes
27
Simulation of Nature Utterance of W Shakespeare: “Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune...” Utterance of George W Bush: “We must give our military every tool and weapon it needs to prevail...” ???
28
From Sequence to Organism How can WE do it? Surrogate Processes Utterance of W Shakespeare: “Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune...” Utterance of George W Bush: “We must give our military every tool and weapon it needs to prevail...” Word frequency
29
From Sequence to Organism How can WE do it? Surrogate Processes Utterance of W Shakespeare: “Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune...” Utterance of George W Bush: “We must give our military every tool and weapon it needs to prevail...” Word frequency, words/sentence…
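The two surrogate measures named on this slide, word frequency and words per sentence, can be computed without any understanding of the text. The sketch below is only an illustration: the function name `profile` and the simple tokenizing rules are assumptions, not something the talk specifies.

```python
import re
from collections import Counter

def profile(text):
    """Return (word-frequency Counter, average words per sentence)."""
    # Crude tokenizer: runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Crude sentence splitter: break on ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return Counter(words), len(words) / max(len(sentences), 1)

shakespeare = ("Whether 'tis nobler in the mind to suffer "
               "the slings and arrows of outrageous fortune")
bush = ("We must give our military every tool and weapon "
        "it needs to prevail")

for name, text in [("Shakespeare", shakespeare), ("Bush", bush)]:
    freq, words_per_sentence = profile(text)
    print(name, freq.most_common(3), words_per_sentence)
```

Two short quotes are far too little data to separate authors. The sketch only shows that such statistics stand in for the real process that produced the utterance, which is what a surrogate filter does.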
30
From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding/function Surrogate filters TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC Characteristics of coding sequences/introns Gene finders Predicted coding regions My sequence
31
From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding/function Surrogate filters Gene finders Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Function?
32
From Sequence to Organism How can WE do it? Natural filters/transformations Selective transcription Selective processing Translation Folding/function Surrogate filters Gene finders Similarity finders My predicted gene Sequence/motif databases globin globin? Similar genes
33
Surrogate Filters Gene finders Start/Stop codon search CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAATGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA CT CCA CGC CCC TCC GTA CAC CTC TAA CAT GAT CTC AGC AAA TAT TAA AAA TGA ATA AAC TTT GTG ACA TGT ACA AAT GGA AAT ATG CAA Look for start codons (ATG) (GTG,TTG) Look for stop codons (TAA,TAG,TGA)
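A start/stop codon search of the kind described here can be sketched as follows. It is a minimal version under stated assumptions: only ATG is treated as a start by default (the alternative starts GTG and TTG can be passed in), only the three forward reading frames are scanned, and the function name `find_orfs` is invented for this example.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

def find_orfs(seq, starts=("ATG",), stops=STOP_CODONS):
    """Return (start, end) index pairs of start-to-stop stretches.

    Each of the three forward reading frames is scanned codon by codon;
    `end` is exclusive and includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in starts:
                start = i                    # first start codon opens a candidate
            elif start is not None and codon in stops:
                orfs.append((start, i + 3))  # in-frame stop closes it
                start = None
    return orfs

example = "CCATGAAATTTTAGGG"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

Every ATG followed by an in-frame stop is reported, whether or not it is a real gene, which is why the talk calls this approach highly inaccurate.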
34
CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAATGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA TTGCATATTTCCATTTGTACATGTCACAAAGTTTATTCATTTTTAATATTTGCTGAGATCATGTTAGAGGTGTACGGAGGGGCGTGGAG Surrogate Filters Gene finders Start/Stop codon search Look for start codons (ATG) (GTG,TTG) Look for stop codons (TAA,TAG,TGA) Highly inaccurate
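The second sequence on this slide is the opposite strand, read in the reverse direction, because genes can lie on either strand. Producing that strand is a one-line transformation; the sketch below assumes an uppercase A/C/G/T alphabet, and the name `reverse_complement` is chosen for this example.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# The last 12 bases of the slide's top strand ...
tail = "GGAAATATGCAA"
# ... become the first 12 bases of the opposite strand.
print(reverse_complement(tail))
```

A codon search would then be run on both `seq` and `reverse_complement(seq)`, giving six reading frames in total.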
35
Surrogate Filters Gene finders Hidden Markov Model (HMM)-based recognition Step 1: Create model through extensive training set AAA AAC AAG AAT ACA... TTG TTT Training Set AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATC AATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAA CCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAAT GACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACAC TTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCT ATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACG TTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAA TCCATAGTTATTATTACTTATGACTAAAACAAAATTACTA TGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTA TATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTC AAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACT GAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCA CTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGAT GCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGG TAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT
36
Step 1: Create model through extensive training set AAAA: 33% AAAC: 25% AAAG: 12% AAAT: 30% Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition AAA AAC AAG AAT ACA... TTG TTT Training Set AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATC AATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAA CCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAAT GACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACAC TTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCT ATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACG TTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAA TCCATAGTTATTATTACTTATGACTAAAACAAAATTACTA TGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTA TATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTC AAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACT GAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCA CTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGAT GCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGG TAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT
37
Step 1: Create model through extensive training set AACA: 30% AACC: 20% AACG: 15% AACT: 35% AAA AAC AAG AAT ACA... TTG TTT Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition Training Set AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATC AATGACTATCAGACAGAGAATCATCGTGCTGTCAGTAAAA CCTCTGATTTCGATCTTTACCATAATTGTTATGTTGTAAT GACTAACCAGACTATCTTTTACAGAGCTTCTGGTTAACAC TTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCT ATGAGACGCTCCGCCAACGAGCAGTGTCTCTTAAAGAACG TTATGAGCGCTCAGTTAACTTCAGAAATTCACGGCGGAAA TCCATAGTTATTATTACTTATGACTAAAACAAAATTACTA TGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTA TATTTCGACTTTAAAACTTATAGTAGATGGCTTAATTCTC AAATAACAAACTCATTTTTAGTAGATATTTCATGCAAACT GAGGTTTTTAGTGATATTTTCCCCTTATTGAGTACAGCCA CTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGAT GCCTGGGGTAATGCAGTTTATTTCGTTGTATCTGGATGGG TAAAAGTTCGGCGCACCTGTGGAGATGATTCGGTAGCTTT
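Step 1 amounts to counting, for every three-base context, how often each base follows it in the training set. A minimal sketch, assuming the training set is a list of known coding sequences over A/C/G/T; the function name `train_markov` and the toy input are illustrative only, and the percentages on the slide come from the speaker's own training data.

```python
from collections import Counter, defaultdict

def train_markov(sequences, order=3):
    """Estimate P(next base | previous `order` bases) from training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = seq[i:i + order]
            next_base = seq[i + order]
            counts[context][next_base] += 1

    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        # Convert raw counts into conditional probabilities.
        model[context] = {b: next_counts[b] / total for b in "ACGT"}
    return model

# Toy training set: context AAA is followed by A four times and by C once.
model = train_markov(["AAAAAAAC"])
print(model["AAA"])
```

With a realistic training set, each row of the resulting table plays the role of a line such as "AAAA: 33% AAAC: 25% AAAG: 12% AAAT: 30%" on the slide.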
38
Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
      A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
...
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCAA… 0.12
39
Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
      A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
...
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCAA… 0.12 x 0.15
40
Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
      A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
...
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCTA… 0.12 x 0.15... So far, not a good candidate!
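The running product on these slides (0.12 x 0.15 ...) is the probability of each base given the three bases before it. The sketch below reproduces the first two factors using the AAA and AAG rows of the slide's table. Summing logarithms rather than multiplying is an assumption made here to avoid underflow on long sequences, and the names `MODEL` and `log_score` are invented for the example.

```python
import math

# Two rows copied from the slide's table: P(next base | previous three bases).
MODEL = {
    "AAA": {"A": 0.33, "C": 0.25, "G": 0.12, "T": 0.30},
    "AAG": {"A": 0.35, "C": 0.15, "G": 0.20, "T": 0.30},
}

def log_score(seq, model, order=3):
    """Sum of log P(base | previous `order` bases) along the sequence."""
    total = 0.0
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        total += math.log(model[context][seq[i + order]])
    return total

# AAA -> G contributes 0.12, AAG -> C contributes 0.15.
prefix = "AAAGC"
print(math.exp(log_score(prefix, MODEL)))  # approximately 0.12 * 0.15
```

A candidate whose score stays low relative to other candidates of the same length does not resemble the training genes, which is the slide's "not a good candidate" verdict.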
41
Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition Step 2: Assess candidate genes 3rd order Markov model Candidate genes Predicted genes
42
Surrogate Filters Gene finders Class 3: Hidden Markov Model (HMM)-based recognition Step 2: Assess candidate genes 3rd order Markov model Candidate genes Predicted genes Conform to standard model Challenge accepted beliefs
43
Computers are powerful globin Highly filtered output Easy to grasp High-level insights Unfiltered output Confusing Basic insights
44
Computers are tempting
45
Globin Computers are tempting
46
Crisis in Bioinformatics
1. Need high-level filters
2. Need access to raw phenomena
3. Need new tools for new phenomena
4. Need intuitive representation of results
5. Need ability to build new tools
Need a new generation
47
View of the Future
48
View of the Future Integration of information ATGACTTATGATCAACGCACAGGGCTA Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu... Genetic code Rules of folding Active site Cell interaction Metabolism, Architecture
49
Prochlorococcus MED4 Prochlorococcus MIT9313
50
How do cells control response to light?
What genes are related to the adaptation to high light? Look for:
Gene present in Prochlorococcus MED4: MED4 is naturally adapted to grow in high light.
Ortholog absent in Prochlorococcus MIT9313: MIT9313 is naturally adapted to grow in low light.
Ortholog present in Synechocystis PCC 6803: reason will become apparent in a moment.
Synechocystis PCC 6803 ortholog responds to high light: gene turns on by a factor > 2 in response to high light.
51
Click on Build Set to begin finding orfs with the desired specifications.
Buttons: Build set, Display set, HELP, Set operation
52
The goal is to find all open reading frames within Prochlorococcus MED4 that meet certain specifications, so click on All open reading frames of.
Query so far: All items in [Choose set type]
Menu (Choose set type): All open reading frames of, All amino acid sequences of, All intergenic regions of, Human-annotated orfs of, Private set, Public set
Buttons: Build set, Display set, Cancel, HELP, Set operation
53
Click on Prochlorococcus MED4.
Query so far: All items in All open reading frames of [Choose database]
Menu (Choose database): Arthrobacter platensis, Gloeobacter violaceus, Microcystis aeruginosa, Nostoc punctiforme, Nostoc PCC 7120, Prochlorococcus MED4, Prochlorococcus MIT9313, Prochlorococcus S120, Synechococcus PCC6301, Synechococcus PCC7942, Synechococcus WH, Synechocystis PCC 6803, Thermosynechococcus, Trichodesmium; groups: Unicellular, Filamentous, All
Buttons: Build set, Display set, Cancel, HELP, Set operation
54
You will ask that an ortholog of each desired MED4 gene exists in Synechocystis PCC 6803. It is convenient to define the ortholog now. Click the Variable button.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
Buttons: Variable, Data, Operation, Function, Done, Build set, Display set, Cancel, HELP, Set operation
55
Item refers to the MED4 orf under consideration. You want to define its ortholog in Synechocystis, so click on New variable.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that: [Variable]
Menu (Variable): Item, New variable
56
You can name the variable representing the ortholog anything you like. For this simulation, a name is provided. Press the Enter key.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog =
57
One variable can be defined with respect to another in several ways. The relationship you want is Ortholog of.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = [Choose function]
Menu (Choose function): Closest ortholog of, Protein product of, Upstream region of, Downstream region of, Ortholog of
58
Clicking on Synechocystis PCC 6803 defines the variable 6803 ortholog as the ortholog in Synechocystis of a given orf of MED4.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
Menu (Choose database): Arthrobacter platensis, Gloeobacter violaceus, Microcystis aeruginosa, Nostoc punctiforme, Nostoc PCC 7120, Prochlorococcus MED4, Prochlorococcus MIT9313, Prochlorococcus S120, Synechococcus PCC6301, Synechococcus PCC7942, Synechococcus WH, Synechocystis PCC 6803, Thermosynechococcus, Trichodesmium
59
The first limitation on the MED4 orf is that no ortholog of it exists in MIT9313. To invoke the concept of ortholog, press the Function button.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
60
Click on Ortholog of.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  [Choose function]
Menu (Choose function): Closest ortholog of, Protein product of, Upstream region of, Downstream region of, Ortholog of
61
As always, Item refers to the orf of MED4 that is being defined. You want to specify that an ortholog of it in MIT9313 doesn’t exist, so click on Item.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of ([Variable] in [Choose database])
Menu (Variable): Item, 6803 ortholog
62
Clicking on Prochlorococcus MIT9313 defines an ortholog of a MED4 gene in MIT9313 (if such an ortholog exists).
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313)
Menu (Choose database): Arthrobacter platensis, Gloeobacter violaceus, Microcystis aeruginosa, Nostoc punctiforme, Nostoc PCC 7120, Prochlorococcus MED4, Prochlorococcus MIT9313, Prochlorococcus S120, Synechococcus PCC6301, Synechococcus PCC7942, Synechococcus WH, Synechocystis PCC 6803, Thermosynechococcus, Trichodesmium
63
You want to keep only those MED4 genes where an ortholog in MIT9313 does NOT exist, so click on doesn’t exist.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) [Op]
Menu (Op): =, exists, doesn’t exist
64
That completes one specification, but there are more. Click on the Operation button to connect one specification to the next.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist
65
You want both the first specification AND the second to be true, so click on AND.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist [Op]
Menu (Op): AND, OR
66
The second specification is that microarray data for the 6803 ortholog meets a certain criterion. To get at that data, press the Data button.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [
67
The data you want is for the 6803 ortholog. Click on 6803 ortholog.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for ([Variable] in [Choose data set])
Menu (Variable): Item, 6803 ortholog, New variable
68
Choose the Hihara experiment (high light vs low light), which measured expression changes upon shift from low light to high light. If you didn’t know which experiment was appropriate, you could have clicked on Choose data set for a description of the choices.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for (6803 ortholog in Microarray:Hihara1(6803))
Menu (Choose data set): Microarray:Hihara1(6803), Microarray:Suzuki1(6803), Microarray:Yoshimura1(6803), Microarray:Meeks(Npun), Microarray:Golden(7120)
69
You want the ratio of experimental condition to control to exceed a specified value. Click on >.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) [Op]
Menu (Op): <, < or =, =, > or =, >
70
You can type in the value you want. For this simulation a number is supplied. Press the Enter key.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]
71
No more specifications. Press the Done button.
Query so far: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]
72
This was a complicated search. If you wanted to do it again, you could save the search description. In this case, just save the results by clicking on Save only results.
Query: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]
Menu: Save results and script, Save only results
Selected: Save only results
73
All MED4 genes meeting the given specifications will be collected into a set. You can name the set anything you want. For this simulation, a name is provided. Press the Enter key.
Query: All items in All open reading frames of Prochlorococcus MED4 such that:
  6803 ortholog = Ortholog of (Item in Synechocystis PCC 6803)
  Ortholog of (Item in Prochlorococcus MIT9313) doesn’t exist AND [ data for (6803 ortholog in Microarray:Hihara1(6803)) > +2 ]
Type name of set: Light-specific genes
74
Display set
:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus
:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus
:all0688 hupS [NiFe] uptake hydrogenase small subunit
:alr0692 similar to nifU
:alr0874 nifH2 dinitrogenase reductase
:asr1309 similar to nifU
:alr1407 nifV1 homocitrate synthase
:asr1408 nifZ iron-sulfur cofactor synthesis
:asr1408 nifT
Set: Light-specific genes
ProcMed4:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus
ProcMed4:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus
ProcMed4:all0688 hupS [NiFe] uptake hydrogenase small subunit
ProcMed4:alr0692 similar to nifU
ProcMed4:alr0874 psbBX dinitrogenase reductase
ProcMed4:asr1309 similar to nifU
ProcMed4:alr1407 psbY1 homocitrate synthase
ProcMed4:asr1408 psbX iron-sulfur cofactor synthesis
ProcMed4:asr1408 nifT
The results are displayed as a list of orfs. (Of course, the search capabilities do not now exist, and the results of the described search are unknown.) Clicking on the name of any orf brings you to its page (see Scenarios 1 and 2). Clicking on circles next to the orf names allows you to modify the set. The genetic neighborhood of each orf is shown to the right.
Buttons: Build set, Display set, Done, HELP, Set operation
[WARNING: Fantasy filtration not in effect!]
75
Prochlorococcus MED4: pll1290
Replicon: Chromosome
Coordinates: 1533026 (stop) <- 1533931 (start-TTG) [Human]
Length = 301 amino acids
Strand: Complementary
Gene name(s): proXM
Function: Putative type II DNA cytosine methyltransferase (CAGCTG-specific) [Human]
Classification: Type II beta (N4) [Human]
Activity: Protects against: PvuII [Experiment]
In vivo activity: exists [Experiment]
Cyanobacterial orthologs: none
ProcMED4, Proteus vulgaris, Salmonella paratyphi, Streptomyces spectabilis
Buttons: Options, Annotate, Main Menu, History, More, HELP
[WARNING: Fantasy filtration not in effect!]
76
Build set / Display set
All items in [set type: All open reading frames] of [database: Prochlorococcus MED4] such that:
Variable [name: 6803 ortholog] = Ortholog of (item) in [database: Synechocystis PCC 6803]
Ortholog of (item) in [database: Prochlorococcus MIT9313] doesn’t exist
AND [data for (6803 ortholog) in [data set: Microarray:Hihara1(6803)] > +2]
This was a complicated search. If you wanted to do it again, you could save the search description. In this case, just save the results by clicking on Save only results.
Save options: Save results and script / Save only results
(Controls: Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation)
77
Equivalent script that bypasses interface

    FOR orf IN (orfs:ProcMED4) {
        6803ortholog = Ortholog(orf,orfs:Syny6803);
        WHEN (NOT Exists(Ortholog(orf,orfs:Proc9313)) AND Data(6803ortholog,microarray:Hihara1) > +2) {
            COLLECT orf INTO light_specific_genes;
        }
    }
    DISPLAY (light_specific_genes, “BNC”);
    or
    MAIL (light_specific_genes,Rocap@Ocean.Washington.Edu,“BNC”);

The same search could have been conducted through the script shown above. The script interface makes possible complex searches beyond the scope of the graphical interface.
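For readers who think in a general-purpose language, the loop-and-filter logic of the script might be sketched in Python as follows. This is only an illustration, not the knowledge base's actual interface: the function name, the dictionary-based lookup tables, and the toy identifiers (orfA, orfB, orfC, s1, ...) are all invented here, and orfs with no 6803 ortholog are assumed to be skipped.

```python
from typing import Dict, List, Optional


def light_specific_genes(
    med4_orfs: List[str],
    ortholog_in_6803: Dict[str, Optional[str]],   # hypothetical lookup table
    ortholog_in_9313: Dict[str, Optional[str]],   # hypothetical lookup table
    hihara1: Dict[str, float],                    # hypothetical microarray values
    threshold: float = 2.0,
) -> List[str]:
    """Collect MED4 orfs that lack an MIT9313 ortholog and whose
    6803 ortholog has a microarray value above the threshold."""
    collected = []
    for orf in med4_orfs:
        ortholog = ortholog_in_6803.get(orf)
        if ortholog_in_9313.get(orf) is not None:
            continue  # an MIT9313 ortholog exists, so the orf is excluded
        if ortholog is None:
            continue  # assumption: no 6803 ortholog means no data to test
        if hihara1.get(ortholog, float("-inf")) > threshold:
            collected.append(orf)
    return collected


# Toy data, invented purely to exercise the filter.
toy_orfs = ["orfA", "orfB", "orfC"]
toy_6803 = {"orfA": "s1", "orfB": "s2", "orfC": None}
toy_9313 = {"orfA": None, "orfB": "m2", "orfC": None}
toy_array = {"s1": 3.1, "s2": 4.0}

print(light_specific_genes(toy_orfs, toy_6803, toy_9313, toy_array))
```

Only orfA passes in the toy data: orfB has an MIT9313 ortholog, and orfC has no 6803 ortholog to look up.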
78
Build set / Display set
All items in [set type: All open reading frames] of [database: Prochlorococcus MED4] such that:
(Controls: Variable, Data, Operation, Function, Done, Cancel, HELP, Set operation)
HELP ???
79
Cyanobacterial Knowledge Base Virtual Help Desk How to search for data? How to build a new filter?
80
Cyanobacterial Knowledge Base Virtual Help Desk How to......I don’t know! Virtual Help Desk Staff HELP
81
Cyanobacterial Knowledge Base Virtual Help Desk Upper echelons Staff You Virtual Help Desk Staff HELP
82
Billions and Billions of Bases How does a biologist maintain a grip on sanity? reality?
83
View of the Future Interplay of low- & high-level perception ProcMED4 Proteus vulgaris Salmonella paratyphi Streptomyces spectabilis
84
View of the Future Interplay of low- & high-level perception Anab7120 Proteus vulgaris Salmonella paratyphi Streptomyces spectabilis TCTACTTATATTCAATCCACAGGGCTA CACCTAGTTCTTGAAGAGTCTGTTGAA TGAACACATACATGGTTTATCTGTTTT TCTGTCTGCTCTGACCTCTGGCAGCTT TAGCCTGCCCCACTCTTAGATAAACGA ACCTTAGTGACTTCTGCTATACCAAAG TCTCCACGCCCCTCCGTAAACCTCTAA CATGATGTCAGCAAATATTAAAAATGA 97% TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA
85
Anabaena Chromosome (6413771 bp): 4001 to 5000 cgcccaacaataacaaatgtgtaatctagaccttctgccttgagttcctt ggcgcggttttcggcacgacggatgacgttggtattgtaaccgccgcaca aaccacgatcgccagaaataactagcaagcctactgatttaacttcccgt tttttcagtagaggtaagtctacatcttcaaaccgtagacgagtttgcaa accgtataatacttgtgccaaacggtcagcaaaaggacgagtagcgatta cttgttcttgggcgcgacgtacacgcgccgccgctaccagccgcatggct tctgtgattttcttggtgtttttgaccgactgaatgcgatcgcgtattga tttgagattaggcataatatttgttgattgtcagttgtcagttgtcagtt gtcagttgtcagtgtctattgctactgaccactgaccaatgactaatgac taattacgctgtagctttgaaggtctttttgtagtcttctaaagctgcct tcaatgctttttcttcatcatcacccagtgctttcttcgattgtacgtct tggaagtaggggttaacgccggacttcaagtaatctctcaagcctttggt gaaggtggtgactttatcaacagggatatcatctaagtaaccgttgatac ctgcgtacagaatggctacttgttcagctacggatagaggctgattttgg gactgtttgaggagttcccgcaggcgttgacctcttgccaattggtcttg ggtggctttatctaggtcggaagcaaattgcgcgaaggcttggaggtcgt caaactgtgctagttcgagcttaatcttaccagcaacttttttcatcgct ttggtttgtgccgcagaacccacacgggatacagagataccagggtttac agccggacgaataccagcgttaaataagtcagaagataagaatatctgac cgtctgtaatagaaattacgttggtaggaatgtaggcagaaacgtcacca Typical output of current programs
86
Future: Sequence plus genetic context Noncoding region
87
Future: Both filtered and raw data
89
Filters: Information reducers Build filter to find repeated sequences TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT TGGTCTCCGACCGACCGTAGGTCATCG CTTGTACTGAGCGAAGTCGAAGTA CTTGTACTGAGCGTAGCCGAAGTA GTTCGACTGAGCGTAGTCGAAGTC... Repeat filter Entire genomeRepeated sequences
91
Filters: Information reducers Build repeats filter TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT TGGTCTCCGACCGACCGTAGGTCATCG CTTGTACTGAGCGAAGTCGAAGTA CTTGTACTGAGCGTAGCCGAAGTA GTTCGACTGAGCGTAGTCGAAGTC... Repeat filter Entire genomeRepeated sequences NIS-1: repeat family
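One simple way to picture a repeat filter is as a k-mer count: slide a fixed-width window along the genome and keep every window that occurs more than once. The sketch below is an assumption for illustration only; the slides do not say how the filter is implemented, and the function name, window length, and toy sequence are invented. It scans one strand and counts overlapping windows.

```python
from collections import Counter
from typing import Dict


def repeated_kmers(genome: str, k: int, min_copies: int = 2) -> Dict[str, int]:
    """Return every length-k substring occurring at least min_copies times."""
    # One window per start position; overlapping windows are all counted.
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_copies}


# Toy sequence, invented for the example.
toy_genome = "ACGTTTACGTGGACGT"
print(repeated_kmers(toy_genome, k=4))
```

The filter reduces the whole sequence to the few windows worth looking at, which is the sense in which the slide calls filters "information reducers". A realistic filter would also need to consider the reverse complement and inexact copies.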
92
Alignment of NIS-1 (…271 more)
93
Filters: Information reducers Build secondary repeats filter A: CTTGTACTGAGCGAAGTCGAAGTA B: CTTGTACTGAGCGTAGCCGAAGTA Distance = 2 CTTGTACTGAGCGAAGTCGAAGTA... CTTGTACTGAGCGAAGTCGAAGTA Copy number = 10 Subfamily A CTTGTACTGAGCGTAGCCGAAGTA Copy number = 2 Subfamily B GTTCGACTGAGCGTAGTCGAAGTC Copy number = 1 Subfamily C
94
Filters: Information reducers Build secondary repeats filter Distance = 2 A: CTTGTACTGAGCGAAGTCGAAGTA C: GTTCGACTGAGCGTAGTCGAAGTC Distance = 5 CTTGTACTGAGCGAAGTCGAAGTA... CTTGTACTGAGCGAAGTCGAAGTA Copy number = 10 Subfamily A CTTGTACTGAGCGTAGCCGAAGTA Copy number = 2 Subfamily B GTTCGACTGAGCGTAGTCGAAGTC Copy number = 1 Subfamily C
95
Filters: Information reducers Build secondary repeats filter B: CTTGTACTGAGCGTAGCCGAAGTA C: GTTCGACTGAGCGTAGTCGAAGTC Distance = 5 Do for all pairs of subfamilies CTTGTACTGAGCGAAGTCGAAGTA... CTTGTACTGAGCGAAGTCGAAGTA Copy number = 10 Subfamily A CTTGTACTGAGCGTAGCCGAAGTA Copy number = 2 Subfamily B GTTCGACTGAGCGTAGTCGAAGTC Copy number = 1 Subfamily C Distance = 2
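The "do for all pairs" step can be sketched as a mismatch count over every pair of subfamily sequences. The three sequences below are the ones shown on the slides; the function names are invented for this sketch, and it assumes the sequences are already aligned and of equal length, so distance is simply the number of mismatched positions.

```python
from itertools import combinations
from typing import Dict, Tuple


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(subfamilies: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Mismatch count for every unordered pair of subfamilies."""
    return {
        (m, n): mismatches(subfamilies[m], subfamilies[n])
        for m, n in combinations(sorted(subfamilies), 2)
    }


# Subfamily sequences as given on the slides.
nis1_subfamilies = {
    "A": "CTTGTACTGAGCGAAGTCGAAGTA",
    "B": "CTTGTACTGAGCGTAGCCGAAGTA",
    "C": "GTTCGACTGAGCGTAGTCGAAGTC",
}

for pair, distance in pairwise_distances(nis1_subfamilies).items():
    print(pair, distance)
```

The resulting table of distances, together with each subfamily's copy number, is the kind of data a graph of related repeats could be drawn from.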
96
Relationship between related repeats in genome (sequences within NIS-1 repeat family). Diameter: copies of exact repeats. Distance: number of mismatches.
97
Crisis in Bioinformatics 1. Need high-level filters 2. Need access to raw phenomena Integrated knowledge base
98
Crisis in Bioinformatics 1. Need high-level filters 2. Need access to raw phenomena 3. Need new tools for new phenomena 4. Need intuitive representation of results Integrated knowledge base Tools that bridge levels of perception
99
Crisis in Bioinformatics
1. Need high-level filters
2. Need access to raw phenomena
   Integrated knowledge base
3. Need new tools for new phenomena
4. Need intuitive representation of results
   Tools that bridge levels of perception
5. Need ability to build new tools
   Short term: Graphical programming, Human help
   Long term: Need a new generation
100
Billions and Billions of Bases How does a biologist maintain a grip on reality? Filtering reality Raw reality Real questions with real answers
101
Pre-genomic Molecular Biology
107
How do we figure out how cars are made? Genetic approach / Biochemical approach
108
Pre-genomic Molecular Biology Geneticist’s Approach
110
Isolation of Defective Gene Pre-genomic Molecular Biology Geneticist’s Approach
111
Pre-genomic Molecular Biology How do we figure out how cars are made? Genetic approach / Biochemical approach
112
Pre-genomic Molecular Biology Biochemist’s Approach
116
Pre-genomic Molecular Biology How do we figure out how cars are made? Genetic approach / Biochemical approach
117
Pre-genomic Molecular Biology: How we viewed the world. One component at a time. Highly filtered perception. Many local viewpoints.
118
Post-genomic Molecular Biology
119
Post-genomic Molecular Biology Bioinformaticist’s Approach (long term) Assemble the whole
120
Post-genomic Molecular Biology Bioinformaticist’s Approach (short term) Identify critical parts
121
Globin Current Biology
122
AATAAAGCTTTACAAACCAA ACTCTGGCTTCAATTGTGTAA CCCAAGCTTTGATTCTTTCCT CTGTTAAATCGGATTGATTAT CTTCATCAAGGGCAAGACCT ACAAATTTACCATCACGAAC AGCTTTAGACTCACTGAATT CATAACCTTCTGTAGGCCAA TAGCCAACTGTTTCACCACC ATTTTCTGAAATTTTTTCCTCT AGAATACCGAGGGCATCTTG AAATGTATCAGGATAACCAA CCTGGTCTCCAGGAGCAAAA TAAGCAACTTTTTTGCCGATG AAGTCAATGTTATCTAACTC ATCATAAAAATTTTCCCAAT CACTTTGCAATTCTCCAACAT TCCAGGTAGGACAACCAAC AACGATATAATCGTAGTTAT TGAAATCACTTGGTTCAGCTT GTGAAATATCATATAAAGTT ACAACACTATCACCACCAAA CTCCTTCTGAATTATTTCTGA TTCAGTTTGGGTATTGCCTGT TTGAGTACCAAAAAATAAAC CAATATTAGACATTTTTACTC CTTTTATGTATTTGCAAAATT ATTTCAATTAAAATATTTAGT AATAATTAATTGTTAGCTAG CTAATAATTAAATTTTTATTA CAATCATTGTAAAAGGCATT GAAAAAGTAAATAAAAATT TTTATTCTACGTTATTTCAAA AATATTTACTTACATATACTT AACCTTTATAGTGATGTAAT ATACTCTAATTCCTATTTTAC TTATAAATACCATCTCAGCTT AATGTAACGAATTTTTCTGTT TATCTTTAAATACAAAAAAT TCAACAAAACTACAGAAAA TTAATCTTAATAACACAAAA CAAGTATCAATCTGTAATAC AACTAAGCTTAAATAAATTA ATAGAAAGCTTCATCTATCT AATAGGTTGAGAATAGTTTA TGTCTAATGACATAAATTCA TTCGTGTTGATTTCATTTGGG TATATTCATCTGATTTAGGAT TTACTCCATTAAGTTTGTACT CATCAATGCCCGCCTGTTGG TATCCACAATTCTCATACAG TGCGCGAGCAAAGTAATCA ATCGTTCGTCGCCATATCTA ACTTTGAGTCAAACAAACCA GTTGGATTACCAACCCTCAA CTAATCGCTTCTTTAAGGCG AGCGATCGCACATTTAACTG TTGGTTGTCACAAGAGAACT AATACTACAGCAGTATATTT AACAACTAAGGGTGGTTCAA CTTTCGCTGCGACTCCTCCAA CGCGCTGAAATACACAGGA CTGATGCGATCGCAAACTCT TTGACTAAATTCCATACATT ATCATGACCATCTCCCAAAC AAACAAGTGGGTTAACCAG ATGCTGACTATTAACATCCC CTGAGTTCGGAGTTGTAGGT CTATTTGACTGGTTCAAAGC GATGATGGAACGGCTTTGTT GCATGAATTAAAAAAAGAC ACACCATCACCTACTTCTAG GATAGACACATCAAACGTCC CACCGCCTAAGTCAAATACC AAGATAATTTCGTTAGTTTTC TTGTCAAGTCCGTAAGCGAG GGCCGCCGCCGTGGGCTAGT TGATAATTCGCAGAACTTTA ATCCCGGCAATTCTACTGGC ATCTTTGGTAGCCTGCCGTTG AGAGTCATTGAAATAGGCAG GGGTGGTAATTACCGCTTGC CTCACTGGTTCCCCCAGATA TGTGCTGGCATCATCTATCA GCTTGCGGACTACCTCATAC CATTTCACGAAAAACCTGAT ACACATGTAAACTCTGAAAC CCTTGCTGTATCAAAGTTTTG TAATTACGAATTACGAATTA CGAATTGATATCAGCCGAGA TTTCTTCGGGTGAAAATTCCT TGTTCAGAGCGGGACAGTGT AGCTTGACATTGCCATTACT GTCACGTACCACTTTGTAAG 
TAACTTGTTTTGCCTCTTGCG TAACTTCATCATACCTGCGC CCGATGAACCGCTTCACAGA ATAAAAAGTGTTTTCTGGGT TCATTACACCCTGGCGCTT Future Biology
123
AATAAAGCTTTACAAACCAA ACTCTGGCTTCAATTGTGTAA CCCAAGCTTTGATTCTTTCCT CTGTTAAATCGGATTGATTAT CTTCATCAAGGGCAAGACCT ACAAATTTACCATCACGAAC AGCTTTAGACTCACTGAATT CATAACCTTCTGTAGGCCAA TAGCCAACTGTTTCACCACC ATTTTCTGAAATTTTTTCCTCT AGAATACCGAGGGCATCTTG AAATGTATCAGGATAACCAA CCTGGTCTCCAGGAGCAAAA TAAGCAACTTTTTTGCCGATG AAGTCAATGTTATCTAACTC ATCATAAAAATTTTCCCAAT CACTTTGCAATTCTCCAACAT TCCAGGTAGGACAACCAAC AACGATATAATCGTAGTTAT TGAAATCACTTGGTTCAGCTT GTGAAATATCATATAAAGTT ACAACACTATCACCACCAAA CTCCTTCTGAATTATTTCTGA TTCAGTTTGGGTATTGCCTGT TTGAGTACCAAAAAATAAAC CAATATTAGACATTTTTACTC CTTTTATGTATTTGCAAAATT ATTTCAATTAAAATATTTAGT AATAATTAATTGTTAGCTAG CTAATAATTAAATTTTTATTA CAATCATTGTAAAAGGCATT GAAAAAGTAAATAAAAATT TTTATTCTACGTTATTTCAAA AATATTTACTTACATATACTT AACCTTTATAGTGATGTAAT ATACTCTAATTCCTATTTTAC TTATAAATACCATCTCAGCTT AATGTAACGAATTTTTCTGTT TATCTTTAAATACAAAAAAT TCAACAAAACTACAGAAAA TTAATCTTAATAACACAAAA CAAGTATCAATCTGTAATAC AACTAAGCTTAAATAAATTA ATAGAAAGCTTCATCTATCT AATAGGTTGAGAATAGTTTA TGTCTAATGACATAAATTCA TTCGTGTTGATTTCATTTGGG TATATTCATCTGATTTAGGAT TTACTCCATTAAGTTTGTACT CATCAATGCCCGCCTGTTGG TATCCACAATTCTCATACAG TGCGCGAGCAAAGTAATCA ATCGTTCGTCGCCATATCTA ACTTTGAGTCAAACAAACCA GTTGGATTACCAACCCTCAA CTAATCGCTTCTTTAAGGCG AGCGATCGCACATTTAACTG TTGGTTGTCACAAGAGAACT AATACTACAGCAGTATATTT AACAACTAAGGGTGGTTCAA CTTTCGCTGCGACTCCTCCAA CGCGCTGAAATACACAGGA CTGATGCGATCGCAAACTCT TTGACTAAATTCCATACATT ATCATGACCATCTCCCAAAC AAACAAGTGGGTTAACCAG ATGCTGACTATTAACATCCC CTGAGTTCGGAGTTGTAGGT CTATTTGACTGGTTCAAAGC GATGATGGAACGGCTTTGTT GCATGAATTAAAAAAAGAC ACACCATCACCTACTTCTAG GATAGACACATCAAACGTCC CACCGCCTAAGTCAAATACC AAGATAATTTCGTTAGTTTTC TTGTCAAGTCCGTAAGCGAG GGCCGCCGCCGTGGGCTAGT TGATAATTCGCAGAACTTTA ATCCCGGCAATTCTACTGGC ATCTTTGGTAGCCTGCCGTTG AGAGTCATTGAAATAGGCAG GGGTGGTAATTACCGCTTGC CTCACTGGTTCCCCCAGATA TGTGCTGGCATCATCTATCA GCTTGCGGACTACCTCATAC CATTTCACGAAAAACCTGAT ACACATGTAAACTCTGAAAC CCTTGCTGTATCAAAGTTTTG TAATTACGAATTACGAATTA CGAATTGATATCAGCCGAGA TTTCTTCGGGTGAAAATTCCT TGTTCAGAGCGGGACAGTGT AGCTTGACATTGCCATTACT GTCACGTACCACTTTGTAAG 
TAACTTGTTTTGCCTCTTGCG TAACTTCATCATACCTGCGC CCGATGAACCGCTTCACAGA ATAAAAAGTGTTTTCTGGGT TCATTACACCCTGGCGCTT Future Biology
124
Globin TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC TTAGATAAAC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCACGCCC CTCCGTAAAC CTCTAACATG ATGTCAGCAA ATATTAAAAA TGAATAAACT TTGTTAAAGG TACAAATGAA AATTAGCAAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT CATTCTAGGG AAACCTGTAT GGTTACATGA ACTGCCTAAA AAACAAGCTA TTATATATTT TAAGAAATTA ATTGCAATTA ATTTCCTGGG CCCCAGCTGT CATTAAAAAG AGGCAAATAC AGCCAAGGAC GACAGCACTG ACCCTCAAGA AGGCACCGGC TGACAGACAG GCTGAAATTC CGCTGAGAGC AGAGTGGTAC ATTGAACCCT CCCTGCACCA GGTCTTTCCT GTGGGCACTG AGTGCAGACA ATGAATGACT GAACGAACGA TTGAATGAAA AGAAATGAGA TATGAGGCAA TCACAGCATC AGGTGACCTT AGTATCTATT CTCGGGAGCG CACGGCTCTA AAGAGGCCCA TATCCAGGCA CCTTTAGATG CAAGAAGGAG GAAACAGCTC GAAATCCCTG AGGCCGGAGG GTCAAGAACT CTCCACCGGC GGCAGCGGCC CCCCGGCCTA AGGCTGCCTG TGCTATAAAT ACGCGGCCCA TTCCCTGGGC TCGGCGGGAC AGATAACATG AATGTGCCCT Current Biology Current Life
125
“Axis of Evil...” Current Life
126
“No war for oil...” Globin Current Life
127
“No war for oil...” Globin Current Life
130
Contact Information
Jeff Elhai
Department of Biology
Virginia Commonwealth University
Richmond, VA
E-Mail: ElhaiJ@VCU.Edu
Tel: 804-828-0794
Web: www.people.vcu.edu/~elhaij/