UTACCEL 2010 Adventures in Biotechnology Graham Cromar
Bioinformatics
Bioinformatics integrates biological themes with the help of computer tools and biological databases in order to gain new knowledge.
Sanger sequencing
Automated Sequencing
In the past, separating the DNA strands by electrophoresis was a time-consuming process. Today, fluorescent labels and advances in gel electrophoresis have made DNA sequencing fast and accurate. The process is also almost fully automated, including the readout of the final sequence.
Parallelizing Sequencing
GenBank doubles every 14 months (source: National Center for Biotechnology Information). That doubling time is shorter than Moore’s law (computer power doubling every 20 months)!
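To see what the two doubling times quoted above imply, here is a minimal arithmetic sketch. It assumes idealized exponential growth, and the function name `growth_factor` is made up for illustration; only the 14-month and 20-month figures come from the slide.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Fold increase after `months` for a quantity that doubles
    every `doubling_time_months` (idealized exponential growth)."""
    return 2.0 ** (months / doubling_time_months)


GENBANK_DOUBLING = 14  # months, figure quoted on the slide
MOORE_DOUBLING = 20    # months, figure quoted on the slide

if __name__ == "__main__":
    for years in (5, 10):
        months = 12 * years
        data = growth_factor(months, GENBANK_DOUBLING)
        compute = growth_factor(months, MOORE_DOUBLING)
        print(f"{years:>2} years: data x{data:,.0f}, "
              f"compute x{compute:,.0f}, gap x{data / compute:.1f}")
```

Under these assumptions, ten years gives roughly a 380-fold increase in sequence data against a 64-fold increase in computer power, so the gap between data and compute keeps widening.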
Genomes
1971: first published DNA sequence

Genome                      Number of base pairs
PhiX174                     5,
Lambda                      48,
Yeast Chromosome III        316,
Haemophilus influenzae      1,830,
Saccharomyces               12,068,
C. elegans                  97,000,
D. melanogaster             120,000,
H. sapiens (draft)          2,600,000,
H. sapiens                  2,850,000,000

Complexity does not always correlate with size. The largest genome known to date belongs to an amoeba!
The next step is to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between groups (e.g. “disease” vs. “healthy”). Bioinformatics plays a critical role: storage, search, retrieval and visualization are key.
Bioinformatics will help with: Structure-Function Relationships
Can we predict the function of protein molecules from their sequence? sequence > structure > function
Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)
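As one illustration of predicting a simple structural feature from sequence alone, the sketch below scans a protein with a sliding-window hydropathy average on the Kyte-Doolittle scale. This is a classic textbook heuristic, not a method taken from the slides. The function names are hypothetical, and the window of 19 residues and cutoff of 1.6 are commonly cited defaults assumed here.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_membrane_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows hydrophobic enough to be
    candidate membrane-spanning segments (assumed cutoff)."""
    return [i for i, score in enumerate(hydropathy_profile(seq, window))
            if score >= threshold]


if __name__ == "__main__":
    # Toy sequence, invented for illustration: a hydrophobic stretch
    # flanked by charged residues.
    toy = "DDDDD" + "L" * 19 + "DDDDD"
    print(candidate_membrane_windows(toy))
```

A hit only flags a stretch worth a closer look; real predictors combine many more signals than average hydrophobicity.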
BLAST (Basic Local Alignment Search Tool) Result
Microarray analysis
Figure 4 and Figure 1 from: “The Transcriptional Program in the Response of Human Fibroblasts to Serum,” Science, Jan. Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross, Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent, Louis M. Staudt, James Hudson Jr., Mark S. Boguski, Deval Lashkari, Dari Shalon, David Botstein, Patrick O. Brown
PubMed Text Neighboring
Example titles: “Genetic Analysis of Cancer in Families” and “The Genetic Predisposition to Cancer”
Common terms could indicate similar subject matter
Statistical method
Weights based on term frequencies within the document and within the database as a whole
Some terms are better than others
There are over 1 million papers published in the life sciences each year!
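The weighting idea above (term frequency within a document, balanced against frequency across the whole database) can be sketched with a generic TF-IDF score and cosine similarity. This is an assumed, simplified stand-in and not PubMed's actual algorithm. All function names are hypothetical, and the third title in the usage example is invented as a contrast case.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,:;()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """One {term: weight} dict per document.
    weight = (term count / document length) * log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n_docs / doc_freq[t])
                        for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    titles = [
        "Genetic Analysis of Cancer in Families",
        "The Genetic Predisposition to Cancer",
        "Protein Structure Prediction Methods",  # invented contrast case
    ]
    vecs = tfidf_vectors(titles)
    print("cancer titles:", cosine(vecs[0], vecs[1]))
    print("unrelated    :", cosine(vecs[0], vecs[2]))
```

A term that appears in every document gets a weight of zero here, which is one way to read "some terms are better than others": rare shared terms say more about subject matter than common ones.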
Top 10 Future Challenges for Bioinformatics
1. Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
2. Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue
3. Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli
4. Determining effective protein:DNA, protein:RNA and protein:protein recognition codes
5. Accurate ab initio protein structure prediction
6. Rational design of small molecule inhibitors of proteins
7. Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
8. Mechanistic understanding of speciation: molecular details of how speciation occurs
9. Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
10. Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education
Tutorial