1
Challenges for computer science as a part of Systems Biology
Benno Schwikowski
Institute for Systems Biology, Seattle, WA
2
Math and Computer Science Challenges, Benno Schwikowski
Towards integrative models
Species, Conditions/time, Genes
Protein interaction
- Interaction partner
- Direct/indirect
- Affinity
- Effect
DNA
- Sequence
- Genomic locus
- Domain content
- Intron/exon structure
- Regulatory motifs
- Chemical modifications
- SNPs
- Splice variants
- Accessibility
- Variation
mRNA
- Abundance
- Regulatory information
- Initiation/termination signals
Protein
- Abundance
- State
- Localization
- 3D structure
- Functional characterization
- Half-life
- Active sites
- Biochemical function
- Cellular role
3
Challenge: Integrative models
…Across genes and proteins: Many genes involved (e.g., multifactorial diseases)
…Across model systems: Lack of experimental platforms in target system
…Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation)
…Across experiments: Robustness against errors in mass spectrometry, mRNA measurements
…Across timescales
4
Challenge: Capturing evolutionary constraints
DNA, RNA, Proteins, Modules, Organelles, Cells, Organs, Individuals, Populations, Ecologies
“Nothing in biology makes sense except in the light of evolution.” Theodosius Dobzhansky
5
Challenge: Which tools and experiments to use
6
Challenge: Choosing experiments
Machine Learning: Determine most likely classification/parameterization on the basis of a randomly sampled dataset.
Active Learning: Allow an algorithm to query selected data points, using the result of previous queries.
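The contrast between random sampling and active querying can be illustrated with a toy problem that is not from the slides: learning an unknown threshold on an interval. This is a minimal sketch, assuming a hypothetical `oracle` function that stands in for one experiment and returns True when the queried point lies at or above the threshold. Each query is chosen from the results of the previous ones, so the uncertainty halves with every "experiment".

```python
def learn_threshold_actively(oracle, lo=0.0, hi=1.0, queries=20):
    """Estimate an unknown threshold in [lo, hi] by choosing each query point.

    `oracle(x)` is a hypothetical stand-in for one experiment; it is assumed
    to return True exactly when x >= threshold.
    """
    for _ in range(queries):
        mid = (lo + hi) / 2      # most informative point given earlier answers
        if oracle(mid):
            hi = mid             # threshold is at or below mid
        else:
            lo = mid             # threshold is above mid
    return (lo + hi) / 2


# Usage: the hidden threshold 0.3 is only reachable through the oracle.
estimate = learn_threshold_actively(lambda x: x >= 0.3)
```

A learner restricted to randomly sampled points would typically need far more labeled points to reach the same precision, which is the motivation for letting the algorithm choose its experiments.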
7
Challenge: Relations between system variables can be quite complex
Yuh, Bolouri, Davidson, Science, 1998
8
Challenge: Relations between system variables can be quite complex
Yuh, Bolouri, Davidson, Science, 1998
9
Challenge: Develop models that allow extremely efficient algorithms
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
10
CLUSTALW(1.74) multiple sequence alignment

Cotton     ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea        GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco    TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant  TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip     ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat      TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed   TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch      TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton     CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea        C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco    AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant  ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip     CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat      GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed   ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch      TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton     ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea        GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco    GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant  GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip     CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat      CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed   TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch      CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton     T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea        TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco    CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant  TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch      TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip     TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat      GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed   CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
11
Challenge: Developing models that allow extremely efficient algorithms
Parsimony score: 1
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
Labels shown in the tree: ACGG, ACGT
J. Comp Biol. 2002
12
An Exact Algorithm (generalizing Sankoff and Rousseau 1975)
$W_u[s]$ = best parsimony score for the subtree rooted at node $u$, if $u$ is labeled with string $s$.
\[ W_u[s] = \sum_{v \,:\, \text{child of } u} \min_{t} \bigl( W_v[t] + d(s, t) \bigr) \]
Sequences at the leaves:
AGTCGTACGTG
ACGGGACGTGC
ACGTGAGATAC
GAACGGAGTAC
TCGTGACGGTG
[Figure: tree over these sequences with a table of $4^k$ entries at each node, e.g. "… ACGG: 2, ACGT: 1 ..."; some entries are $+\infty$]
J. Comp Biol. 2002
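The table-filling idea behind this recurrence can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation. It assumes that d(s, t) is the Hamming distance, that a leaf's table holds 0 for every length-k substring of its sequence and infinity otherwise, and that a tree is written as nested tuples with sequence strings at the leaves. The function names are hypothetical.

```python
from itertools import product
from math import inf


def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def all_kmers(k, alphabet="ACGT"):
    """All 4^k strings of length k over the DNA alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def leaf_table(seq, k):
    """Assumed leaf rule: 0 if s occurs in the sequence, infinity otherwise."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {s: (0 if s in present else inf) for s in all_kmers(k)}


def internal_table(child_tables, k):
    """W_u[s] = sum over children v of min over t of (W_v[t] + d(s, t))."""
    kmers = all_kmers(k)
    return {
        s: sum(min(w[t] + hamming(s, t) for t in kmers) for w in child_tables)
        for s in kmers
    }


def best_parsimony_score(tree, k):
    """tree is a sequence string (leaf) or a tuple of subtrees (internal node)."""
    def solve(node):
        if isinstance(node, str):
            return leaf_table(node, k)
        return internal_table([solve(child) for child in node], k)

    return min(solve(tree).values())


# Usage on a toy tree with three made-up leaf sequences.
score = best_parsimony_score((("AACGTA", "TACGGT"), "GGACGT"), k=4)
```

The double loop over s and t in `internal_table` touches 4^k · 4^k pairs per child and compares k characters for each pair, which is where a per-node cost proportional to k · 4^(2k) comes from in this naive form.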
13
What are good challenges to tackle?
- Biological/medical questions asked
- Experimental technologies to acquire a lot of relevant data
- Available datasets with a formalized notion of “data quality”
14
Memory complexity: $O(k \cdot 4^{2k})$ per node
Time complexity: total time $O(n k (4^{2k} + l))$
where $n$ = number of species, $l$ = average sequence length, $k$ = motif length
J. Comp Biol. 2002
15
Technology-based challenges: Universal DNA Tag Systems
Existing applications in high-throughput technologies
- Universal DNA arrays
- Padlock probes
- LYNX mRNA technology
16
Formalization
Define: weight(A/T) = 1, weight(C/G) = 2
weight(AACTTG) = 1+1+2+1+1+2 = 8
melting temperature(AACTTG) = 2·weight(AACTTG)
l-u code problem
Given two integers l < u, find the largest set of tags such that
- Each tag has weight u
- Each string of weight l occurs at most once
J. Comp Biol. 2000 & 2003
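The weight definition and the two conditions of the l-u code problem can be checked mechanically. The sketch below is a hedged illustration, not the published algorithm: it only verifies a given tag set and does not search for the largest one. It assumes that "each string of weight l occurs at most once" refers to substrings counted across all tags, and the function names are hypothetical.

```python
from collections import Counter

# Weights as defined on the slide: A/T count 1, C/G count 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(seq):
    """Sum of the per-base weights of a DNA string."""
    return sum(WEIGHT[base] for base in seq)


def melting_temperature(seq):
    """Melting temperature estimate from the slide: 2 * weight."""
    return 2 * weight(seq)


def substrings_of_weight(tag, l):
    """Yield every substring of tag whose weight is exactly l."""
    for start in range(len(tag)):
        total = 0
        for end in range(start, len(tag)):
            total += WEIGHT[tag[end]]
            if total == l:
                yield tag[start:end + 1]
            if total >= l:
                break  # weights are positive, so longer substrings only grow


def is_lu_code(tags, l, u):
    """Check both conditions under the stated reading of the problem."""
    if any(weight(tag) != u for tag in tags):
        return False
    counts = Counter(
        sub for tag in tags for sub in substrings_of_weight(tag, l)
    )
    return all(count <= 1 for count in counts.values())


# Usage: reproduces the slide's worked example.
example_weight = weight("AACTTG")  # 1+1+2+1+1+2 = 8
```

Counting occurrences with a `Counter` keeps the check linear in the number of candidate substrings, which is enough for validating small tag sets by hand.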
17
Challenge: Visualization
Andrea Weston et al. @ ISB & Cytoscape
18
Challenge: Visualization
Cytoscape, pre-release 2.0
19
A computer scientist’s perspective
“Biology is so digital, and incredibly complicated […] I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.”
Donald Knuth, 7 Dec 1993