Topics
The topics: basic concepts of molecular biology; more on Perl; overview of the field; biological databases and database searching; sequence alignments; phylogenetic trees; protein structure prediction; microarray data analysis.

The Human Genome Project
The human genome sequence is complete (almost): approximately 3 billion base pairs on 23 chromosomes. The project started in 1990.
Some of these slides are adapted from lecture notes by Stuart M. Brown at NYU.

Whole genome sequencing has now become routine

How does the human genome stack up?

| Organism | Genome size (bases) | Estimated genes |
|---|---|---|
| Human (Homo sapiens) | 3.2 billion | 25,000 |
| Laboratory mouse (M. musculus) | 2.6 billion | |
| Mustard weed (A. thaliana) | 100 million | |
| Roundworm (C. elegans) | 97 million | 19,000 |
| Fruit fly (D. melanogaster) | 137 million | 13,000 |
| Yeast (S. cerevisiae) | 12.1 million | 6,000 |
| Bacterium (E. coli) | 4.6 million | 3,200 |
| Human immunodeficiency virus (HIV) | 9,700 | 9 |

Source: U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
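As a rough way to read the table, a minimal Python sketch can divide estimated genes by genome size to get gene density. The figures are copied from the table; mouse and mustard weed are left out because no gene counts are listed for them. The names `GENOMES` and `genes_per_megabase` are our own, chosen for illustration only.

```python
# Genome sizes (bases) and estimated gene counts copied from the table.
# Mouse and mustard weed are omitted: the table lists no gene counts for them.
GENOMES = {
    "Human (Homo sapiens)": (3_200_000_000, 25_000),
    "Roundworm (C. elegans)": (97_000_000, 19_000),
    "Fruit fly (D. melanogaster)": (137_000_000, 13_000),
    "Yeast (S. cerevisiae)": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def genes_per_megabase(size_bases: int, genes: int) -> float:
    """Estimated number of genes per million bases (Mb)."""
    return genes / (size_bases / 1_000_000)


if __name__ == "__main__":
    # Print organisms from lowest to highest gene density.
    ranked = sorted(GENOMES.items(), key=lambda kv: genes_per_megabase(*kv[1]))
    for name, (size, genes) in ranked:
        print(f"{name:30s} {genes_per_megabase(size, genes):8.1f} genes/Mb")
```

With these numbers, human comes out at about 7.8 genes per million bases and E. coli at roughly 700, so genome size on its own says little about gene count.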

The Path Forward
How does DNA impact health? Identify and understand the differences in DNA sequence (A, T, C, G) among human populations.
What do all the genes do? Discover the functions of human genes by experimentation and by finding genes with similar functions in model organisms.
What are the functions of non-gene regions? Identify important elements in the non-gene regions of DNA.
How does information in the genome enable life? Explore life at the level of the whole organism instead of single genes or proteins.
U.S. Department of Energy, 2005
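Comparing DNA among individuals ultimately comes down to locating the positions where sequences differ. A minimal sketch, assuming the two sequences are already aligned to the same length with no insertions or deletions; real variant calling also has to handle alignment, sequencing errors and indels. `variant_positions` is a hypothetical helper name.

```python
def variant_positions(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every
    position where two equal-length, pre-aligned DNA sequences differ."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(ref.upper(), sample.upper()))
        if r != s
    ]
```

For example, `variant_positions("ACGTACGT", "ACGAACGC")` returns `[(4, 'T', 'A'), (8, 'T', 'C')]`.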

Diverse applications
Medicine: customized treatments, …
Microbes for energy and the environment: generating clean energy sources, cleaning up toxic wastes, …
Bioanthropology: tracing human lineage
Agriculture, livestock breeding, and bioprocessing: crops and animals more resistant to diseases, more efficient industrial processes, …
DNA identification: implicating or exonerating people accused of crimes, identifying contaminants in air and water, …
U.S. Department of Energy, 2005

Genomics: Journey to the Center of Biology
"Without doubt, the greatest achievement in biology over the past millennium has been the elucidation of the mechanism of heredity. The instructions for assembling every organism on the planet are all specified in DNA sequences that can be translated into digital information and stored in a computer for analysis. As a consequence of this revolution, biology in the 21st century is rapidly becoming an information science. Powerful new types of bioinformatics will clearly be required to assimilate and interpret the data that will issue from various types of genomics research."
Eric Lander & Robert Weinberg, Science, 2000
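One concrete way DNA becomes "digital information" is that each of the four bases needs only 2 bits. A minimal sketch of such a packing; the particular code (A=00, C=01, G=10, T=11) and the names `pack` and `unpack` are illustrative assumptions, not a standard file format.

```python
# Hypothetical 2-bit code: four bases fit in two bits each.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {bits: base for base, bits in ENCODE.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base,
    with the first base in the highest-order bits."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack(). The length is required because leading
    'A's encode as zero bits and would otherwise be lost."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))
```

For example, `pack("ACGT")` gives 27 (binary 00 01 10 11) and `unpack(27, 4)` recovers "ACGT". At 2 bits per base, 3.2 billion bases need about 800 million bytes before any compression.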

Nucleic Acid Sequence Databases
The principal nucleic acid sequence databases are GenBank, EMBL and DDBJ. Each collects a portion of the sequence data reported worldwide, and they exchange new and updated entries daily.
Nucleic acid sequence databases:
EMBL (European Molecular Biology Laboratory)
GenBank (USA)
DDBJ (DNA Data Bank of Japan)
ENSEMBL (a joint project of EMBL-EBI and the Sanger Institute that produces and maintains automatic annotation of selected eukaryotic genomes)
dbEST (division of GenBank)
GSDB (Genome Sequence DataBase, division of GenBank)

GenBank
Once upon a time, GenBank sent out sequence updates on CD-ROM a few times per year.

Specialised Genomic Resources
In addition to the comprehensive DNA sequence databases, there is a variety of more specialised genomic resources. These so-called boutique databases focus on species-specific genomics or on particular sequencing techniques.
Specialised genomic resources:
SGD – Saccharomyces Genome Database
UniGene – gene-oriented clusters from GenBank
TIGR – databases of The Institute for Genomic Research
ACeDB – A C. elegans DataBase

Protein Information Resources
The primary structure of a protein is its amino acid sequence.
The secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands).
The tertiary structure of a protein arises from the packing of its secondary structure elements, which may form discrete domains within a fold.
[Figure: levels of protein sequence and structural organisation: primary, secondary, tertiary]
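Secondary structure is often written as one character per residue. This sketch assumes a three-state alphabet (H for helix, E for strand, anything else treated as coil) and finds runs of local regularity; `ss_segments` is a hypothetical helper, and the minimum run length of 3 is an arbitrary choice.

```python
import re


def ss_segments(ss: str, min_len: int = 3) -> list[tuple[str, int, int]]:
    """Return (kind, start, end) for each run of helix ('H') or strand ('E')
    in a per-residue secondary-structure string, using 1-based inclusive
    positions. Runs shorter than min_len are ignored."""
    segments = []
    for match in re.finditer(r"H+|E+", ss):
        if match.end() - match.start() >= min_len:
            kind = "helix" if match.group()[0] == "H" else "strand"
            segments.append((kind, match.start() + 1, match.end()))
    return segments
```

For example, `ss_segments("CCHHHHHCCEEEECCHHC")` returns `[('helix', 3, 7), ('strand', 10, 13)]`; the two-residue "HH" near the end is below the minimum length and is skipped.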

Primary Protein Databases
The primary structure of a protein is its amino acid sequence. These sequences are stored in primary databases as linear alphabets that denote the constituent residues.
Protein sequence databases:
SWISS-PROT - Protein knowledgebase
TrEMBL - Computer-annotated supplement to Swiss-Prot
PIR - Protein Information Resource
MIPS - Munich Information Centre for Protein Sequences
NRL-3D - produced by PIR
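A common text format for exchanging such linear sequences is FASTA: a '>' header line followed by one or more sequence lines. The slide does not name a format, so FASTA is an assumption here; `parse_fasta` and `nonstandard_residues` are hypothetical helpers, and only the 20 standard one-letter amino acid codes are treated as valid.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.
    The header is everything after '>'; sequence lines are joined and uppercased."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def nonstandard_residues(seq: str) -> set[str]:
    """Letters outside the 20 standard amino acids (e.g. X for an unknown residue)."""
    return set(seq) - AMINO_ACIDS
```

Sequences split across several lines are joined, so a record's length can be checked with a plain `len()` on the returned string.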

Structure Classification DBs
These databases contain 3D structures determined by crystallographic and spectroscopic studies (e.g. X-ray crystallography and NMR).
Structure classification databases:
PDB – Protein Data Bank
CATH – Class, Architecture, Topology, Homology
SCOP – Structural Classification of Proteins
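To show what a stored 3D structure looks like to a program, this sketch reads atoms from the legacy fixed-column PDB text format, where ATOM records hold x, y and z in columns 31 to 54, and measures the distance between alpha-carbon (CA) atoms. `Atom`, `parse_atom`, `ca_atoms` and `distance` are hypothetical names. Current PDB entries are also distributed as PDBx/mmCIF, and a maintained parser such as Biopython's `Bio.PDB` is the safer choice for real files.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Atom:
    """Parse one ATOM record using the fixed PDB column layout.
    The spec numbers columns from 1, hence the 0-based slices below."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def ca_atoms(pdb_text: str) -> list[Atom]:
    """Return the alpha-carbon atoms from the ATOM records of a PDB-format text."""
    return [
        parse_atom(line)
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms (in the file's units, angstroms for PDB)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```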

PDB: Growth (2006)

Databases concerning Mutations
dbSNP: http://www.ncbi.nlm.nih.gov/SNP
HGBASE (Human Genome Variation Database): http://hgbase.cgr.ki.se
The SNP Consortium (TSC): http://snp.cshl.org

Literature Databases
PubMed: http://www.ncbi.nlm.nih.gov/entrez/query
Bioinformatics Online: http://www.bioinformatics.oupjournals.org
Nature: http://www.nature.com
Science: http://www.sciencemag.org

Systems Biology
Integrate different levels of information to understand how biological systems function.
Use computational and mathematical models to analyze, model and simulate cellular networks, interactions and pathways.
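One of the simplest models of this kind is a single gene whose mRNA is produced at a constant rate and degraded in proportion to its current level: dm/dt = k_syn - k_deg * m. The model, the parameter values and the name `simulate_mrna` are illustrative assumptions, not taken from the slides. The sketch integrates it with the forward Euler method; the level should approach the steady state k_syn / k_deg.

```python
def simulate_mrna(k_syn: float, k_deg: float, m0: float = 0.0,
                  dt: float = 0.01, t_end: float = 50.0) -> list[float]:
    """Forward-Euler integration of dm/dt = k_syn - k_deg * m.
    Returns the mRNA level (arbitrary units) at every time step, starting at m0."""
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m += dt * (k_syn - k_deg * m)
        trajectory.append(m)
    return trajectory
```

With `simulate_mrna(2.0, 0.5)` the level rises from 0 and settles at 2.0 / 0.5 = 4. Euler integration is chosen only because it is short; a step size that is too large relative to 1 / k_deg makes it inaccurate or unstable, which is why real simulations usually rely on adaptive ODE solvers.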

Microarray
A DNA microarray is a technology for measuring the levels of the mRNA gene products of a living cell, for many genes in parallel.
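Expression levels from such experiments are commonly compared between two conditions as log2 ratios, so that a doubling and a halving are symmetric (+1 and -1). A minimal sketch, assuming intensities that are already normalized; the pseudocount of 1 (to avoid dividing by zero), the gene names and `log2_fold_changes` are illustrative assumptions.

```python
import math


def log2_fold_changes(control: dict[str, float], treated: dict[str, float],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2((treated + c) / (control + c)) for every gene measured in both samples.
    Positive values mean a higher signal in the treated sample."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control.keys() & treated.keys()
    }
```

For example, with control `{"g1": 99.0, "g2": 399.0, "g3": 50.0}` and treated `{"g1": 399.0, "g2": 99.0}`, the result is `{"g1": 2.0, "g2": -2.0}`; g3 is dropped because it was not measured in both samples.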

Affymetrix GeneChip® Probe Arrays
[Figure: a GeneChip probe array (1.28 cm) and a hybridized probe cell (24~50 µm), in which single-stranded, fluorescently labeled cRNA target binds to oligonucleotide probes; image of a hybridized probe array. Source: BGT108_DukeUniv]
Each probe cell or feature contains millions of copies of a specific oligonucleotide probe.

Bioinformatics Tools
Databases and searching
Computational algorithms: alignment, similarity, clustering, pattern searching, structure prediction
Statistical methods
Data visualization
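Alignment, one of the algorithms listed on the slide, is classically computed by dynamic programming. A minimal sketch of Needleman-Wunsch global alignment scoring with a linear gap penalty; the scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, and real protein alignments usually use substitution matrices such as BLOSUM62 with affine gap penalties. Only the optimal score is returned, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]
```

For example, `global_alignment_score("ACGT", "AGT")` returns 1: three matches and one gap (3 x 1 - 2). Tracing back through the table from the bottom-right cell would recover the alignment itself.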

Bioinformatics
Bioinformatics is the research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational biology is the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.