Introduction to Bioinformatics
Summer Bioinformatics Workshop 2008
Chi-Cheng Lin, Ph.D., Professor
Department of Computer Science
Winona State University – Rochester

Outline
What is Bioinformatics
The Human Genome Project
Applications of Bioinformatics
References
Acknowledgement: The presentation includes adaptations from DOE’s “Human Genome Project and Beyond Primer” and Dr. Yan Asmann’s (Mayo Clinic) lecture notes.

Bioinformatics
Living things have the ability to store, utilize, and pass on information
Bioinformatics strives to
  – determine what information is biologically important
  – decipher how it is used to precisely control the chemical environment within living organisms

What is Bioinformatics
The collaboration of Biology and Informatics
Originally referred to the use of computational tools to organize and analyze genetic and protein sequence data (the term was first coined by Dr. Hwa Lim in 1988)

NCBI’s Definition of Bioinformatics
NCBI (National Center for Biotechnology Information):
  – “Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.”
  – “The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.”

Human Genome Project

Human Genome Project
Goals include
  – Identify genes in human DNA
  – Determine the sequence making up human DNA
  – Store this information in databases
  – Improve tools for data analysis
  – Etc.
Milestone
  – April 2003: HGP sequencing is completed and the project is declared finished, two years ahead of schedule

Interesting Numbers Characterizing the Human Genome
3 billion:
  – The number of chemical nucleotide bases (A, C, G, and T) contained in the haploid human genome
3 million:
  – The number of locations where single-base DNA differences occur in the human genome
2.4 million:
  – The number of bases comprising the largest known human gene (the average gene comprises 3,000 bases)
30,000:
  – The estimated total number of genes (much lower than previous estimates of 80,000 to 140,000)

Interesting Numbers Characterizing the Human Genome
99.9%:
  – Fraction of nucleotide bases that are exactly the same in all people
50%:
  – Fraction of discovered genes for which function is unknown
2%:
  – Fraction of the genome that codes for proteins (the rest: “junk”(?) DNA)
9%, 11%, 26%, 28%, 45%, 83%, 89%, and 95%:
  – The percentage of genes that E. coli, rice, roundworm, yeast, fruit fly, zebrafish, mouse, and chimpanzee share with humans, respectively

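As a rough consistency check, the 99.9% and 2% fractions can be applied to the 3-billion-base haploid genome. The Python sketch below is illustrative only; the constant and function names are assumptions and do not come from the workshop materials.

```python
# Figures quoted on the "Interesting Numbers" slides.
GENOME_BASES = 3_000_000_000   # bases in the haploid human genome
IDENTICAL_FRACTION = 0.999     # fraction of bases identical in all people
CODING_FRACTION = 0.02         # fraction of the genome that codes for proteins


def bases_in_fraction(total_bases: int, fraction: float) -> int:
    """Return the number of bases covered by a given fraction of the genome."""
    return round(total_bases * fraction)


# Bases that are NOT identical across people: total minus the identical part.
variable_sites = GENOME_BASES - bases_in_fraction(GENOME_BASES, IDENTICAL_FRACTION)
coding_bases = bases_in_fraction(GENOME_BASES, CODING_FRACTION)

print(f"Bases that differ between people: about {variable_sites:,}")
print(f"Protein-coding bases: about {coding_bases:,}")
```

The 0.1% that differs works out to about 3 million bases, in line with the 3 million single-base difference locations quoted among the genome figures. The 2% coding fraction corresponds to about 60 million bases.
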
How does the human genome stack up?

Organism                              Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)

Humans share most of the same protein families with worms, flies, and plants!

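One way to read the comparison is as gene density, that is, estimated genes per million bases. The Python sketch below uses only the figures listed for each organism. HIV is left out because no figures are given for it here, and the names `GENOMES` and `genes_per_million_bases` are illustrative assumptions.

```python
# (organism, genome size in bases, estimated genes) as listed in the comparison.
GENOMES = [
    ("Human (Homo sapiens)", 3_000_000_000, 30_000),
    ("Laboratory mouse (M. musculus)", 2_600_000_000, 30_000),
    ("Mustard weed (A. thaliana)", 100_000_000, 25_000),
    ("Roundworm (C. elegans)", 97_000_000, 19_000),
    ("Fruit fly (D. melanogaster)", 137_000_000, 13_000),
    ("Yeast (S. cerevisiae)", 12_100_000, 6_000),
    ("Bacterium (E. coli)", 4_600_000, 3_200),
]


def genes_per_million_bases(genome_bases: int, genes: int) -> float:
    """Estimated genes per one million bases of genome."""
    return genes / (genome_bases / 1_000_000)


density = {name: genes_per_million_bases(size, genes) for name, size, genes in GENOMES}

# Print organisms from the sparsest to the densest genome.
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:32s} {value:8.1f} genes per million bases")
```

On these estimates the human genome carries about 10 genes per million bases, while E. coli carries roughly 700, so genome size alone is a poor guide to gene count.
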
Anticipated Benefits of Genome Research
Molecular medicine
Microbial genomics
Bioarchaeology, anthropology, evolution, and human migration
DNA identification (forensics)
Agriculture, livestock breeding, and bioprocessing

ELSI: Ethical, Legal, and Social Issues
Privacy and confidentiality of genetic information
Fairness in the use of genetic information
Psychological impact, stigmatization, and discrimination
Reproductive issues
Clinical issues
Uncertainties associated with gene tests for susceptibilities and complex conditions
Fairness in access to advanced genomic technologies
Conceptual and philosophical implications
Health and environmental issues
Commercialization of products

[Cartoon: Mike Thompson, Detroit, Michigan, from The Detroit Free Press]

Future Challenges: What We Still Don’t Know
Gene prediction and discovery
  – Location, function, structure, regulation, etc.
Single-base DNA variations among individuals
  – Correlation with health and disease
  – Disease-susceptibility prediction
Genes involved in complex traits and multigene disorders
Protein conservation (structure and function)
Proteomes (total protein content and function) in organisms
Systems biology
  – Coordination of gene expression and protein synthesis
  – Interaction of proteins in complex molecular machines
  – Microbial consortia useful for environmental restoration
Developmental genetics and genomics
Evolutionary conservation among organisms
And many more …

Tackle Future Challenges: Bioinformatics
High volume of data to store, compute, and analyze
Huge amount of information to retrieve, interpret, and visualize
Complex systems to study, model, and simulate
THAT’S WHY BIOINFORMATICS IS INDISPENSABLE!!

Genomics Studies
Genomics
  – Study of the whole genome
  – Sequencing and annotating genomes
Comparative genomics
  – Comparison and characterization of genomes from different species to identify genes and their functions and to investigate evolutionary history
Functional genomics
  – Understanding the function of genes and other parts of the genome
Structural genomics
  – Determining the 3D structure of all proteins
Pharmacogenomics
  – Study of how an individual's genetic inheritance affects the body's response to drugs

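To give a flavor of the comparison step in comparative genomics, the Python sketch below computes percent identity between two sequences. It is a toy illustration under strong assumptions: the two fragments are invented, already aligned, and of equal length, whereas real studies rely on dedicated alignment tools. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-base fragments, invented for illustration only.
fragment_a = "ACGTTGCAAC"
fragment_b = "ACGTAGCATC"

identity = percent_identity(fragment_a, fragment_b)
print(f"{identity:.1f}% identical")
```

The two fragments differ at 2 of 10 positions, so the sketch reports 80.0% identity.
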
Genome Sequencing
[Cartoon: Drew Sheneman, New Jersey, The Newark Star Ledger]

Human Migration Patterns using DNA Sequences

Medicine and the New Genetics
Gene Testing
Pharmacogenomics
Gene Therapy
Anticipated benefits:
  – Improved diagnosis of disease
  – Earlier detection of genetic predispositions to disease
  – Pharmacogenomics:
      Genetic testing before prescribing drugs
      Dose selection based on genetic variations
      Drugs tailor-made to each patient
However, the application of pharmacogenomics in medical practice is still quite limited today, due to the lack of genetic information from a large population

References
NCBI (National Center for Biotechnology Information) homepage: http://
NCBI Science Primer
Human Genome Project Information: enome/home.shtml (esp. link to the Education module)
The Human Genome Project and Beyond Primer: enome/publicat/primer2001/primer.ppt