Bioinformatics: A New Frontier for Computer Scientists
  Ruth G. Alscher, Lenwood S. Heath
The Language of the New Biology
  A new language has been created. Words in that language that are useful for today’s talk:
  Genomics
  Functional Genomics
  Proteomics
  cDNA microarrays
  Global Gene Expression Patterns
Human Genome Project
  How many individuals?
  Which races?
  Statistics about sequencing
  Etc. (Ruth)
New Computational Tools Needed for Biology
  Sequencing
  Analyzing experimental data
  Representing vast quantities of information
  Searching
  Pattern matching
  Data mining
  Gene discovery
  Function discovery
Molecular Biology
  Cell function
  Nucleic acids, DNA, RNA, chromosomes, genes
  Amino acids, proteins
DNA Strand
  A = adenine complements T = thymine
  C = cytosine complements G = guanine
Complementary DNA Strands Double-Stranded DNA
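The pairing rule on the DNA Strand slide (A with T, C with G) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: the function names are invented here, and the reversal step reflects the fact that the two strands of double-stranded DNA run in opposite directions, a detail the slides do not state.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own direction.

    The two strands are antiparallel, so the partner strand is the
    complement read backwards (an assumption added for this sketch).
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Applying `complement` twice returns the original strand, which is one way to see that either strand carries enough information to rebuild the other.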
RNA Strand
  U = uracil replaces T = thymine
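For a computer scientist, the RNA rule above amounts to a one-character substitution. The sketch below assumes the input is the DNA sequence as it would appear in the RNA (the coding strand); the function name `transcribe` is chosen here for illustration.

```python
def transcribe(dna: str) -> str:
    """Rewrite a DNA sequence as RNA by replacing every T with U.

    Assumption: `dna` is the coding strand, so the RNA has the same
    sequence apart from the T-to-U substitution.
    """
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCT"))  # AUGGCU
```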
Proteins
  Unlike DNA, whose double helix has a largely uniform shape, each protein has its own three-dimensional structure.
  A protein folds to a three-dimensional shape that minimizes energy.
Amino Acids
  A protein is a large molecule that is a chain of amino acids (100 to 5000).
  There are 20 common amino acids (Alanine, Cysteine, …, Tyrosine).
  Three bases, called a codon, suffice to encode an amino acid: with 4 bases there are 4^3 = 64 possible codons, more than the 20 needed.
  There are also START and STOP codons.
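Reading codons is essentially a table lookup over three-letter words. The sketch below is illustrative only: the table is deliberately partial (it holds just the codons used in the example, plus the three STOP codons), the function name `translate` is invented here, and codons missing from the partial table are reported as "???".

```python
from typing import List

# Partial RNA codon table: only the entries this example needs.
CODON_TABLE = {
    "AUG": "Met",   # methionine; also the usual START codon
    "GCU": "Ala",   # alanine
    "UGU": "Cys",   # cysteine
    "UAU": "Tyr",   # tyrosine
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}


def translate(rna: str) -> List[str]:
    """Translate RNA from the first AUG up to (not including) a STOP codon."""
    start = rna.find("AUG")
    if start == -1:
        return []  # no START codon, nothing to translate
    protein = []
    # Step through the sequence three bases at a time.
    for i in range(start, len(rna) - 2, 3):
        amino = CODON_TABLE.get(rna[i:i + 3], "???")  # "???": not in partial table
        if amino == "STOP":
            break
        protein.append(amino)
    return protein


if __name__ == "__main__":
    print(translate("AUGGCUUGUUAUUAA"))  # ['Met', 'Ala', 'Cys', 'Tyr']
```

A full table would have all 64 codons; the lookup logic would not change.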
Chromosomes
  Long molecules of DNA: 10^4 to 10^8 base pairs
  23 matched pairs in humans
  A gene is a subsequence of a chromosome that encodes a protein.
  Proteins associated with regulation.
  Only a fraction of the genes are in use at any time.
  Every gene is present in every cell.
Cell’s Fetch-Execute Cycle
  Stored Program: DNA, chromosomes, genes
  Fetch/Decode: RNA, ribosomes
  Execute Functions: Proteins (oxygen transport, cell structures, enzymes)
  Inputs: Nutrients, environmental signals, external proteins
  Outputs: Waste, response proteins, enzymes
Evolution
  Genotype: Genetic makeup of individuals or species
  Mutations are the basis for the evolution of species
  Phenotype: Perceived traits of an organism (eye color, number of limbs, etc.); controlled by the interaction of many genes
Genetics
  An individual organism has some set of genes, stored in the DNA of each cell.
  The gene set determines biological functions and individual characteristics.
  The genetic makeup of a particular species defines that species.
Protein-Coding Genes
Genomics: Discovery of genetic sequences and the ordering of those sequences into individual genes, into gene families, and into chromosomes. Identification of sequences that code for gene products/proteins and sequences that act as regulatory elements.
Functional Genomics: The biological role of individual genes, mechanisms underlying the regulation of their expression, and regulatory interactions among them.
Biologists Need Computer Scientists
  Assembling DNA fragments
  Physical mapping
  Identifying genes and gene families
  Protein folding
  Determining protein function
  Data analysis (microarrays)
  Data visualization
  Searching
  Sequence alignment
  Data mining
Microarray Data Analysis
  How can microarrays be used to learn more about the influence of drought stress on gene expression?
  Where the biologists need the computer scientists:
  A. Confounding factors in the raw data
     1. Limitations in accuracy (technique)
     2. Biological variation (individuals)
  B. How to apply corrections for these confounding factors to maximize the predictive power of the data
  C. Modeling regulatory networks
Effects of Drought Stress
  Effects of drought stress on loblolly pine: a pilot experiment
  Virginia Tech:
    Plant Biologists: Ruth Alscher, Boris Chevone
    CS: Lenny Heath and colleagues
    Statistics: Ina Hoeschele, Shun-Hwa Li
  NC State (Forest Biotechnology): Ying-Hsuan Sun, Ron Sederoff, Ross Whetten
[Figure: microarray experiment. Labels: spots (sequences affixed to slide); treatment and control samples, mixed; hybridization; excitation; emission; detection; relative abundance.]
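One common way to express the relative abundance measured at each spot is a log ratio of treatment to control intensity, followed by a simple correction for overall bias between the two channels. The sketch below is an assumption-laden illustration, not the analysis used in the pilot experiment: the intensities are made up, the function names are invented here, and global median centering is only one of several possible corrections.

```python
import math
from statistics import median


def log_ratios(treatment, control):
    """Return log2(treatment / control) for each spot.

    A value of 0 means no change; +1 means twice as abundant in the
    treated sample; -1 means half as abundant.
    """
    return [math.log2(t / c) for t, c in zip(treatment, control)]


def median_center(values):
    """Subtract the median so that the typical spot has log ratio 0.

    Assumption: most genes do not change, so a nonzero median reflects
    technical bias rather than biology.
    """
    m = median(values)
    return [v - m for v in values]


if __name__ == "__main__":
    treatment = [200.0, 800.0, 400.0]   # made-up spot intensities
    control = [100.0, 100.0, 100.0]     # made-up spot intensities
    raw = log_ratios(treatment, control)
    print(raw)                 # [1.0, 3.0, 2.0]
    print(median_center(raw))  # [-1.0, 1.0, 0.0]
```

Working in log space makes a doubling and a halving symmetric around zero, which is convenient when comparing spots.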
Biological Variation
  Biological variation as reflected in a comparison of expression in two trees of the same clone (a subquadrant).
Iterative Strategy
  Iterative strategy for detection of genetic interactions using microarrays
  [Diagram labels: detection of gene expression effects on microarrays; characterize gene function; test mutant phenotypes; genetic regulatory networks; identify mutants]
Glycolysis, Citric Acid Cycle, and Related Metabolic Processes
Gene Expression: Control Points
Responses to Environmental Signals
Intracellular Decision Making
Drosophila Genome
Expressed Sequence Tags: A publicly accessible collection of cDNAs representing mRNAs present in specific tissues. The cDNAs have been partially sequenced and identified, where possible, as homologs to publicly accessible genes of known function.
Microarray Quotes
  “A fresh, comprehensive and open-minded look at every problem in biology” (Brown and Botstein, page 33). WOW!
  “… the construction of a Biological Periodic Table…” (Lander, page 3)
  “… as model-independent as possible…” (Brown and Botstein, page 33)
  From The Chipping Forecast
ROS (reactive oxygen species) arise throughout the cell.
Free Radicals
Bioinformatics Institute
  Research institute based at Virginia Tech
  Begins July 1 with $3 million
  Will occupy 2 buildings and have 100+ employees in 4 years
Getting Into Bioinformatics
  Get a minor in biology
  Get involved with bioinformatics research
    Dr. Alscher
    Dr. Heath
    Dr. Keller
    Dr. Ramakrishnan
    Dr. Watson