BIOINFORMATICS Introduction

(c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu
Presentation transcript:

BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a

What is Bioinformatics?
(Molecular) Bio-informatics
One idea for a definition: Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules on a large scale.
Bioinformatics is “MIS” for Molecular Biology Information. It is a practical discipline with many applications.

Organizing Molecular Biology Information: Redundancy and Multiplicity
- Different sequences have the same structure
- An organism has many similar genes
- A single gene may have multiple functions
- Genes are grouped into pathways
- Genomic sequence redundancy due to the genetic code
- How do we find the similarities? (idea from D. Brutlag, Stanford)
Integrative Genomics: genes -> structures -> functions -> pathways -> expression levels -> regulatory systems -> ...

A Parts List Approach to Bike Maintenance
- How many roles can these parts play?
- How flexible and adaptable are they mechanically?
- What are the shared parts (bolt, nut, washer, spring, bearing) and the unique parts (cogs, levers)?
- What are the common parts, i.e. the common types of parts (nuts & washers)?
- Where are the parts located?

General Types of “Informatics” Techniques in Bioinformatics
- Databases: building, querying; object DB
- Text string comparison: text search; 1D alignment; significance statistics; Alta Vista, grep
- Finding patterns: AI / machine learning; clustering; datamining
- Geometry: robotics; graphics (surfaces, volumes); comparison and 3D matching (vision, recognition)
- Physical simulation: Newtonian mechanics; electrostatics; numerical algorithms; simulation

New Paradigm for Scientific Computing
- Because of the increase in data and the improvement in computers, new calculations become possible
- But bioinformatics has a new style of calculation...
Two Paradigms
- Physics: prediction based on physical principles
  - Example: exact determination of a rocket trajectory
  - Supercomputer, CPU
- Biology: classifying information and discovering unexpected relationships
  - Example: globin ~ colicin ~ plastocyanin ~ repressor
  - Networks, “federated” database

Bioinformatics Topics -- Genome Sequence
- Finding genes in genomic DNA: introns, exons, promoters
- Characterizing repeats in genomic DNA: statistics, patterns
- Duplications in the genome

Bioinformatics Topics -- Protein Sequence
- Sequence alignment
  - Non-exact string matching, gaps
  - How to align two strings optimally via dynamic programming
  - Local vs. global alignment
  - Suboptimal alignment
  - Hashing to increase speed (BLAST, FASTA)
  - Amino acid substitution scoring matrices
- Multiple alignment and consensus patterns
  - How to align more than one sequence and then fuse the result into a consensus representation
  - Transitive comparisons
  - HMMs, profiles
  - Motifs

Bioinformatics Topics -- Protein Sequence
- Scoring schemes and matching statistics
  - How to tell if a given alignment or match is statistically significant
  - A P-value (or an e-value)?
  - Score distributions (extreme value distribution)
- Low-complexity sequences
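As an illustration of "how to align two strings optimally via dynamic programming", here is a minimal sketch of global (Needleman-Wunsch style) alignment. The function name `global_align` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for the example, not values from the slides; a real protein alignment would use a substitution matrix and usually affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the sequence lengths; that cost is what the hashing heuristics listed on the slide (BLAST, FASTA) are meant to reduce.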

Bioinformatics Topics -- Sequence / Structure
- Secondary structure “prediction”
  - Via propensities
  - Neural networks, genetic algorithms
  - Simple statistics
  - TM-helix finding
  - Assessing secondary structure prediction
- Tertiary structure prediction
  - Fold recognition
  - Threading
  - Ab initio
- Function prediction
  - Active site identification
- Relation of sequence similarity to structural similarity

Topics -- Structures
- Basic protein geometry and least-squares fitting
  - Distances, angles, axes, rotations
  - Calculating a helix axis in 3D via fitting a line
  - LSQ fit of 2 structures
  - Molecular graphics
- Calculation of volume and surface
  - How to represent a plane
  - How to represent a solid
  - How to calculate an area
  - Docking and drug design as surface matching
  - Packing measurement
- Structural alignment
  - Aligning sequences on the basis of 3D structure
  - Unlike sequence alignment, DP does not converge; what to do?
  - Other approaches: distance matrices, hashing
  - Fold library
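The "distances, angles" item under basic protein geometry can be made concrete with a few lines of vector arithmetic. The sketch below uses only the standard library; the helper names are assumptions for illustration, and the coordinates in the usage lines are toy points, not atoms from a real structure.

```python
import math


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def distance(p, q):
    """Euclidean distance between two points."""
    return norm(sub(p, q))


def angle(p, q, r):
    """Angle p-q-r in degrees, measured at the middle point q."""
    u, v = sub(p, q), sub(r, q)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    # Clamp to [-1, 1] to guard against rounding just outside the domain of acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 axis."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two planes
    x = dot(n1, n2)
    y = norm(b1) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))


print(distance((0, 0, 0), (3, 4, 0)))          # 5.0
print(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))  # 90 degrees, up to rounding
```

Backbone torsions such as phi and psi are dihedral angles over four consecutive backbone atoms, so the same `dihedral` function would apply once real coordinates are read from a structure file.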

Topics -- Databases
- Relational database concepts
  - Keys, foreign keys
  - SQL, OODBMS, views, forms, transactions, reports, indexes
  - Joining tables, normalization
  - Natural join as "where" selection on cross product
  - Array referencing (perl/dbm)
  - Forms and reports
  - Cross-tabulation
- Protein units?
  - What are the units of biological information? Sequence, structure; motifs, modules, domains
  - How classified: folds, motions, pathways, functions?
- Clustering and trees
  - Basic clustering: UPGMA, single-linkage, multiple linkage
  - Other methods: parsimony, maximum likelihood
  - Evolutionary implications
- The bias problem
  - Sequence weighting
  - Sampling
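The phrase "natural join as 'where' selection on cross product" can be shown directly: form every pairing of rows from two tables, then keep only the pairs that agree on the shared columns. The sketch below models tables as lists of dictionaries; the table names, column names, and ID values are hypothetical and exist only for this example.

```python
from itertools import product


def natural_join(left, right):
    """Join two tables (lists of dict rows) on all column names they share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    joined = []
    for l_row, r_row in product(left, right):           # cross product
        if all(l_row[c] == r_row[c] for c in shared):   # "where" selection
            joined.append({**l_row, **r_row})
    return joined


# Hypothetical tables: fold_id is a foreign key from proteins into folds.
proteins = [
    {"protein": "myoglobin", "fold_id": 1},
    {"protein": "hemoglobin alpha", "fold_id": 1},
    {"protein": "plastocyanin", "fold_id": 2},
]
folds = [
    {"fold_id": 1, "fold": "globin-like"},
    {"fold_id": 2, "fold": "cupredoxin-like"},
]

for row in natural_join(proteins, folds):
    print(row["protein"], "->", row["fold"])
```

A database engine would not materialize the full cross product; it would use an index or hash on the key. The sketch favors the definition over efficiency.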

Topics -- Genomics
- Expression analysis
  - Time courses clustering
  - Measuring differences
  - Identifying regulatory regions
- Large-scale cross-referencing of information
- Function classification and orthologs
- The genomic vs. single-molecule perspective
- Genome comparisons
  - Ortholog families, pathways
  - Large-scale censuses
  - Frequent words analysis
  - Genome annotation
  - Trees from genomes
  - Identification of interacting proteins
- Structural genomics
  - Folds in genomes, shared & common folds
  - Bulk structure prediction
- Genome trees
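"Frequent words analysis" amounts to counting every length-k substring (k-mer) of a sequence and looking at the most common ones. A minimal sketch, assuming overlapping windows and a single strand; the function name `kmer_counts` and the toy sequence are illustrative only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = kmer_counts("ATATAT", 2)
print(counts.most_common(1))  # [('AT', 3)]
```

For genome-scale use one would typically also count the reverse complement and compare observed counts against an expectation from base composition, which this sketch leaves out.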

Topics -- Simulation
- Molecular simulation
  - Geometry -> Energy -> Forces
  - Basic interactions, potential energy functions
  - Electrostatics
  - VDW forces
  - Bonds as springs
  - How does structure change over time?
  - How to measure the change in a vector (gradient)
  - Molecular dynamics & MC
  - Energy minimization
- Parameter sets
- Number density
- Poisson-Boltzmann equation
- Lattice models and simplification
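The chain "Geometry -> Energy -> Forces" with "bonds as springs" can be sketched for a single bond: the coordinates give a bond length r, a harmonic potential E = (1/2) k (r - r0)^2 gives the energy, and the force is the negative gradient of E with respect to the atom's position. The force constant `k`, the rest length `r0`, and the coordinates below are arbitrary illustrative numbers, not values from any published parameter set.

```python
import math


def bond_energy(xi, xj, k, r0):
    """Harmonic bond energy E = 0.5 * k * (r - r0)**2 for atoms at xi and xj."""
    r = math.dist(xi, xj)
    return 0.5 * k * (r - r0) ** 2


def bond_force_on_i(xi, xj, k, r0):
    """Analytic force on atom i: F_i = -dE/dx_i = -k * (r - r0) * (xi - xj) / r."""
    r = math.dist(xi, xj)
    scale = -k * (r - r0) / r
    return tuple(scale * (a - b) for a, b in zip(xi, xj))


def numerical_force_on_i(xi, xj, k, r0, h=1e-6):
    """Central-difference estimate of -dE/dx_i, as a check on the analytic form."""
    force = []
    for axis in range(3):
        plus, minus = list(xi), list(xi)
        plus[axis] += h
        minus[axis] -= h
        d_energy = bond_energy(plus, xj, k, r0) - bond_energy(minus, xj, k, r0)
        force.append(-d_energy / (2 * h))
    return tuple(force)


xi, xj = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
print(bond_energy(xi, xj, k=100.0, r0=1.5))      # 12.5
print(bond_force_on_i(xi, xj, k=100.0, r0=1.5))  # x-component is 50.0
```

The bond here is stretched (r = 2.0 > r0 = 1.5), so the force on atom i points toward atom j. Energy minimization and molecular dynamics both build on this same energy-and-gradient pair, summed over all interactions.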

Major Application I: Designing Drugs
- Understanding how structures bind other molecules (function)
- Designing inhibitors
- Docking, structure modeling
(From left to right, figures adapted from the Olsen Group Docking Page at Scripps, the Dyson NMR Group Web page at Scripps, and the Computational Chemistry Page at Cornell Theory Center.)

Major Application II: Finding Homologs

Major Application III: Overall Genome Characterization
- Overall occurrence of a certain feature in the genome, e.g. how many kinases in yeast
- Compare organisms and tissues: expression levels in cancerous vs. normal tissues
- Databases, statistics
(Clock figures, yeast v. Synechocystis, adapted from GeneQuiz Web Page, Sander Group, EBI)

Bioinformatics Schematic

Bioinformatics - History
Timeline axis: 1980, 1985, 1990, 1995, 2000, 2005
- Single structures: modeling & geometry; forces & simulation; docking
- Sequences, sequence-structure relationships: alignment; structure prediction; fold recognition
- Genomics: dealing with many sequences; gene finding & genome annotation; databases
- Integrative analysis: expression & proteomics data; datamining; simulation again...