Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics) Practice (Molecular Biology) Theory: Central.

Similar presentations


Presentation on theme: "Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics) Practice (Molecular Biology) Theory: Central."— Presentation transcript:

1 Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics) Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization Experiment as “Art” Theory (population/statistical genetics) Theory: 80+ years of Mathematical Biology Methods: Ag,RFLPs,SNPs… Bioinformatics Theory: 40 years of algorithms, information theory 20+ years of statistics Current practiceMy ideal The spectrum of experimental Biology Practice – Theory

2 Teaching and Bioinformatics What is the goal? Learning Biology / learning Computer Science Becoming "computer literate" scripting/programming Exploring uncertainty –experimental shortcomings –computational biases Utility – getting something done Bioinformatics is challenging because biology is complicated and idiosyncratic

3 Biology: A “clean” experiment – Internal positive and negative controls Southern blot of human class-mu Glutathione transferase genes from individuals with low (-) or high (+) GT-tSBO activity. Bands found with high GT-tSBO (GSTM1) RFLP independent of GT-tSBO When GSTM1 is present, it is detected When it is not detected, it is absent

4 Bioinformatics – ambiguity or computational error? D3BUQ5 is “clearly” homologous to GSTA6_RAT, aligning from beginning to end Does it have a GST_C domain? Does it have glutathione transferase activity? Could it be a steroid isomerase? Prostaglandin synthetase?

5 Why is Bioinformatics “hard”? Bioinformatics is at the intersection of Biology, Computer science, and Statistics What is interesting to Computer Scientists, – algorithms, optimality – is less relevant to Biologists (text book bias) “irrelevant” parameters for Computer Scientists – DNA vs protein – are important in practice Statistics are central, and the statistical perspective is not well integrated into either Biology or CS curricula The biological assumptions behind a “null hypothesis” are rarely explicit and often idealistic Biologists do experiments (CS folks like theory). If it works, use it. Bioinformatics uses "hard/true/reproducible" techniques to solve "soft/ambiguous/varying" biological questions. A teaching "opportunity"

6 Alberts is wrong about sequence similarity (three times in three claims) “With such a large number of proteins in the database, the search programs find many nonsignificant matches, resulting in a background noise level that makes it very difficult to pick out all but the closest relatives. Generally speaking, one requires a 30% identity in sequence to consider that two proteins match. However, we know the function of many short signature sequences ("fingerprints"), and these are widely used to find more distant relationships.” – Alberts, Molecular Biology of the Cell (5 th ed, 2007) p. 139 6 Sequences producing statistically significant alignments ALWAYS share a common structure Many significant alignments share < 30% identity (<25% identity is routine, and <20% identity can be significant) In the absence of significant similarity, “fingerprints” should never be trusted.

7 How can we teach better? Discuss the strengths and weaknesses of data resources Examine how published protocols go out of date (or are optimized for different problems). Examine potential weaknesses – what do the protocols assume? Review high-profile papers with mistaken conclusions to understand what went wrong.

8 Biology 4XXX – Bioinformatics and Functional Genomics 3hr lecture, 1hr lab Introduction to Unix / perl (python) scripting / web resources –programming by imitation similarity searching / domain identification –homology, scoring matrices –errors in domain annotation (why) multiple sequence alignment –sequences vs domains evolutionary tree-building –finding the best tree –evaluating alternative trees –where is the uncertainty (why) Introduction to 'R' statistical language – programming by imitation Expression analysis – read mapping, read counting Motif extraction, mapping – motif independence? Pathway analysis – gene enrichment Gene models and alternative splicing – which gene/splicing models supported?

9 Computational and Comparative Genomics Oct 29 – Nov 4, 2014 (application deadline July 15, 2014)


Download ppt "Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics) Practice (Molecular Biology) Theory: Central."

Similar presentations


Ads by Google