BioInformatics - What and Why?

Presentation transcript:

BioInformatics - What and Why? The following PowerPoint presentation is designed to give some background information on bioinformatics. This presentation is modified from information supplied by Dr. Bruno Gaeta, and with permission from eBioInformatics Pty Ltd (c) Copyright

The need for bioinformaticians. The number of entries in databases of gene sequences is increasing exponentially. Bioinformaticians are needed to understand and use this information. Figure: GenBank growth.

Genome sequencing projects, including the Human Genome Project, are producing vast amounts of information. The challenge is to put this information to good use.

Publicly available genomes (April 1998)

COMPLETE/PUBLIC: Aquifex aeolicus, Pyrococcus horikoshii, Bacillus subtilis, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Archaeoglobus fulgidus, Methanobacterium thermo., Escherichia coli, Mycoplasma pneumoniae, Synechocystis sp. PCC6803, Methanococcus jannaschii, Saccharomyces cerevisiae, Mycoplasma genitalium, Haemophilus influenzae

COMPLETE/PENDING PUBLICATION: Rickettsia prowazekii, Pseudomonas aeruginosa, Pyrococcus abyssi, Bacillus sp. C-125, Ureaplasma urealyticum, Pyrobaculum aerophilum

ALMOST/PUBLIC: Pyrococcus furiosus, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CSU93, Neisseria gonorrhoeae, Neisseria meningitidis, Streptococcus pyogenes

Terry Gaasterland, Siv Andersson, Christoph Sensen

Bioinformatics impacts on all aspects of biological research.

"...We must hook our individual computers into the worldwide network that gives us access to daily changes in the databases and also makes immediate our communications with each other. The programs that display and analyze the material for us must be improved - and we must learn to use them more effectively. Like the purchased kits, they will make our life easier, but also like the kits, we must understand enough of how they work to use them effectively..."
Walter Gilbert (1991) "Towards a paradigm shift in biology" Nature News and Views 349:99

Promises of genomics and bioinformatics
- Medicine
  - Knowledge of protein structure facilitates drug design
  - Understanding of genomic variation allows the tailoring of medical treatment to the individual's genetic make-up
  - Genome analysis allows the targeting of genetic diseases
  - The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
- The same techniques can be applied to biotechnology, crop and livestock improvement, etc.

What is bioinformatics?
- Application of information technology to the storage, management and analysis of biological information
- Facilitated by the use of computers

What is bioinformatics?
- Sequence analysis
  - Geneticists/molecular biologists analyse genome sequence information to understand disease processes
- Molecular modeling
  - Crystallographers/biochemists design drugs using computer-aided tools
- Phylogeny/evolution
  - Geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences
- Ecology and population studies
  - Bioinformatics is used to handle large amounts of data obtained in population studies
- Medical informatics
  - Personalised medicine

Sequence analysis: overview (flowchart; box labels listed by stage)

Sequence entry
- Manual sequence entry
- Sequence database browsing
- Sequencing project management

Nucleotide sequence analysis (starting from a nucleotide sequence file)
- Search databases for similar sequences
- Sequence comparison
- Multiple sequence analysis
- Design further experiments
- Restriction mapping
- PCR planning
- Search for protein coding regions (branches: coding, non-coding)
- Translate into protein
- Search for known motifs
- RNA structure prediction

Protein sequence analysis (starting from a protein sequence file)
- Search databases for similar sequences
- Sequence comparison
- Search for known motifs
- Predict secondary structure
- Predict tertiary structure
- Create a multiple sequence alignment
- Edit the alignment
- Format the alignment for publication
- Molecular phylogeny
- Protein family analysis

Gene sequencing: Automated chemical sequencing methods allow rapid generation of large data banks of gene sequences.

Database similarity searching: The BLAST program was written to allow rapid comparison of a new gene sequence with the hundreds of thousands of gene sequences in databases.

Sequences producing significant alignments: (bits) Value
gnl|PID|e (Z74911) ORF YOR003w [Saccharomyces cerevisiae] 112 7e-26
gi| (U18795) Prb1p: vacuolar protease B [Saccharomyces ce e-24
gnl|PID|e (X59720) YCR045c, len:491 [Saccharomyces cerevi e-13
gnl|PID|e (Z71514) ORF YNL238w [Saccharomyces cerevisiae]
gnl|PID|e (Z71603) ORF YNL327w [Saccharomyces cerevisiae]
gnl|PID|e (Z71554) ORF YNL278w [Saccharomyces cerevisiae]

gnl|PID|e (Z74911) ORF YOR003w [Saccharomyces cerevisiae] Length = 478
Score = 112 bits (278), Expect = 7e-26
Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

Query: 2 QSVPWGISRVQAPAAHNRG LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV 50
+ PWG+ RV G G GV VLDTGI T H D R + +
Sbjct: 174 EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI 233

Query: 51 PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE 110
P D NGHGTH AG I + + GVA G+E
Sbjct: 234 PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE 288

Sequence comparison: Gene sequences can be aligned to see similarities between genes from different sources.

768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
|| || || | | ||| | |||| ||||| ||| |||
87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG

AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
| | | | |||||| | |||| | || | |
136 AAGGATC TCAGTAATTAATCATGCACCTATGTGGCGG

AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
||| | ||| || || ||| | ||||||||| || |||||| |
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
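One way to see how a pairwise alignment like this could be produced is a small global-alignment routine. The sketch below uses the Needleman-Wunsch dynamic-programming method with a simple match/mismatch/gap scoring scheme; the function name and the scoring values are assumptions for illustration and are not taken from the program that generated the slide.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = global_align("ACGT", "AGT")
    print(s)
    print(top)
    print(bottom)
```

Real alignment programs typically use affine gap penalties and substitution matrices; the linear gap cost here keeps the recurrence short.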

Restriction mapping: Genes can be analysed to detect gene sequences that can be cleaved with restriction enzymes
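As a rough illustration of restriction mapping, the sketch below scans a DNA string for the recognition sequences of three well-known enzymes (EcoRI, BamHI, HindIII). The function name and the choice of enzymes are assumptions for this example; a real mapping tool would cover a far larger enzyme set and report cut positions within each site.

```python
# Recognition sequences for three common restriction enzymes.
# All three are palindromic, so scanning one strand is sufficient.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)          # report 1-based coordinates
            start = seq.find(motif, start + 1)   # allow overlapping matches
        hits[name] = positions
    return hits


if __name__ == "__main__":
    print(find_sites("ttGAATTCaaGGATCCggGAATTC"))
```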

PCR primer design: Oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs.

OPTIMAL primer length --> 20
MINIMUM primer length --> 18
MAXIMUM primer length --> 22
OPTIMAL primer melting temperature -->
MINIMUM acceptable melting temp -->
MAXIMUM acceptable melting temp -->
MINIMUM acceptable primer GC% -->
MAXIMUM acceptable primer GC% -->
Salt concentration (mM) -->
DNA concentration (nM) -->
MAX no. unknown bases (Ns) allowed --> 0
MAX acceptable self-complementarity --> 12
MAXIMUM 3' end self-complementarity --> 8
GC clamp how many 3' bases --> 0
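To make the parameters above more concrete, here is a minimal sketch of a primer filter. It uses the length limits shown on the slide (18 to 22 bases, no unknown bases) but, as an assumption, substitutes the simple Wallace rule (2 °C per A/T, 4 °C per G/C) for whatever melting-temperature model the original program used, and the GC% limits of 40 to 60 are placeholder values, since the slide's own limits did not survive.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    This ignores salt and DNA concentration, unlike the program on the slide.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def candidate_primers(template, min_len=18, max_len=22,
                      gc_min=40.0, gc_max=60.0):
    """List (start, primer, tm) for every window passing the filters.

    gc_min and gc_max are assumed values, not taken from the slide.
    """
    template = template.upper()
    out = []
    for length in range(min_len, max_len + 1):
        for start in range(len(template) - length + 1):
            primer = template[start:start + length]
            if "N" in primer:            # MAX no. unknown bases allowed = 0
                continue
            if gc_min <= gc_percent(primer) <= gc_max:
                out.append((start, primer, wallace_tm(primer)))
    return out


if __name__ == "__main__":
    for start, primer, tm in candidate_primers("ATGCATGCATGCATGCATGC"):
        print(start, primer, tm)
```

A fuller tool would also score self-complementarity and 3' end complementarity, which this sketch leaves out.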

Gene discovery: Computer programs can be used to recognise the protein-coding regions in DNA. Figure: plot created using codon preference (GCG), with a sequence position axis running from 0 to 4,000.
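The codon preference method relies on codon usage statistics, which are not reproduced here. As a simpler stand-in, the sketch below looks for open reading frames (an ATG followed in frame by a stop codon) in the three forward reading frames. The function name and the `min_codons` threshold are assumptions for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ATG...stop open reading frames on the forward strand.

    Returns (start, end, frame) tuples; start is 0-based, end is exclusive
    and includes the stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAACCCGGGTAACC"
    for start, end, frame in find_orfs(dna):
        print(frame, dna[start:end])
```

A complete gene finder would also scan the reverse complement and weigh codon usage, as the plotted method does.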

RNA structure prediction: Structural features of RNA can be predicted. Figure: predicted RNA secondary structure.
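A classic toy model for RNA secondary structure prediction is Nussinov-style base-pair maximisation. The sketch below counts the largest number of nested base pairs (A-U, G-C and G-U) a sequence can form, with an assumed minimum hairpin loop of three unpaired bases. It is not the method behind the slide's figure, which is not stated; practical tools minimise free energy instead.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = max pairs within rna[i..j]; short spans stay at 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                  # case: j is unpaired
            for k in range(i, j - min_loop):        # case: j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```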

Protein structure prediction: Particular structural features can be recognised in protein sequences. Figure: plot with tracks for KD Hydrophobicity, Surface Prob., Flexibility, Antigenic Index, CF Turns, CF Alpha Helices, CF Beta Sheets, GOR Alpha Helices, GOR Turns, GOR Beta Sheets and Glycosylation Sites.
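The "KD Hydrophobicity" track refers to the Kyte-Doolittle hydropathy scale. A minimal sketch of how such a profile can be computed is a sliding-window average over per-residue values; the function name and the default window of 9 residues are assumptions for illustration, not settings taken from the slide.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean Kyte-Doolittle hydropathy for each window along the sequence.

    Raises KeyError for non-standard residue codes such as X.
    """
    protein = protein.upper()
    scores = [KD[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


if __name__ == "__main__":
    profile = hydropathy_profile("MKTLLLTLVVVTIVCLDLGYT", window=9)
    print([round(x, 2) for x in profile])
```

Stretches with high average values suggest hydrophobic segments, for example possible membrane-spanning regions.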

Protein structure: The 3-D structure of proteins is used to understand protein function and to design new drugs.

Multiple sequence alignment: Sequences of proteins from different organisms can be aligned to see similarities and differences. Figure: alignment formatted using MacBoxshade.

Phylogeny inference: Analysis of sequences allows evolutionary relationships to be determined. Figure: phylogenetic tree constructed using the Phylip package, with leaves E. coli, C. botulinum, C. cadaveris, C. butyricum, B. subtilis and B. cereus.
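The slide does not say which Phylip method built the tree. As an assumed, simplified stand-in, the sketch below computes pairwise p-distances (the fraction of differing sites) between aligned sequences and clusters them with UPGMA, returning the topology as a Newick string without branch lengths. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions that differ (gaps count as differences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(aligned):
    """Cluster aligned sequences with UPGMA.

    aligned: dict mapping name -> aligned sequence.
    Returns a Newick string describing the topology only.
    """
    clusters = {name: 1 for name in aligned}          # label -> cluster size
    dist = {frozenset((a, b)): p_distance(aligned[a], aligned[b])
            for a, b in combinations(aligned, 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)                # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        for other in clusters:
            da = dist.pop(frozenset((a, other)))
            db = dist.pop(frozenset((b, other)))
            # size-weighted average distance to the merged cluster
            dist[frozenset((merged, other))] = (
                (da * size_a + db * size_b) / (size_a + size_b))
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


if __name__ == "__main__":
    toy = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "CCCCAAAA", "D": "CCCCAAGG"}
    print(upgma(toy))
```

UPGMA assumes a constant rate of change across lineages; packages such as Phylip also offer methods that do not make that assumption.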

Large-scale bioinformatics: genome projects
- Mapping: Identifying the location of clones and markers on the chromosome by genetic linkage analysis and physical mapping
- Sequencing: Assembling clone sequence reads into large (eventually complete) genome sequences
- Gene discovery: Identifying coding regions in genomic DNA by database searching and other methods
- Function assignment: Using database searches, pattern searches, protein family analysis and structure prediction to assign a function to each predicted gene
- Data mining: Searching for relationships and correlations in the information
- Genome comparison: Comparing different complete genomes to infer evolutionary history and genome rearrangements

Challenges in bioinformatics
- Explosion of information
  - Need for faster, automated analysis to process large amounts of data
  - Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
  - Need for "smarter" software to identify interesting relationships in very large data sets
- Lack of "bioinformaticians"
  - Software needs to be easier to access, use and understand
  - Biologists need to learn about the software, its limitations, and how to interpret its results