Bioinformatics. Not only small molecules and QM, MM techniques rule the world.



Central dogma of molecular biology. The term is due to Francis Crick. The conversion DNA → protein is not direct; RNA is involved. DNA is the information store, while RNA acts as messenger (mRNA), transporter (tRNA), and biomolecular nanomachine (rRNA). Source: wikipedia.org
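The DNA → RNA → protein flow on the slide above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the input is assumed to be the coding strand, and the codon table is deliberately partial (it lists only the standard-code codons used in the example).

```python
# Partial codon table (standard genetic code); only the codons used below.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GAU": "D",  # aspartic acid
    "GAA": "E",  # glutamic acid
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore an incomplete last codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon is not in the partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGATGAATGGTAA")
protein = translate(mrna)
print(mrna, protein)
```

A real translation routine would use the full 64-codon table; the partial one keeps the sketch short.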

Nucleic acids. Four letters (DNA, RNA); a sequence is written 5' → 3', e.g. AACTAACG. DNA forms a double helix. RNA is a “single-stranded” helix that folds: double-helical regions and the C2'-OH group give rise to secondary and tertiary motifs.
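Because both strands of the double helix are read 5' → 3', the partner strand of a DNA sequence is its reverse complement. A minimal sketch, assuming an unambiguous A/C/G/T alphabet (the function name is hypothetical):

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' -> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


partner = reverse_complement("AACTAACG")
print(partner)
```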

nucleoside nucleotide

B-DNA, A-DNA, Z-DNA (figure panels B, A, Z)

RNA secondary motifs Nowakowski and Tinoco, Seminars in Virology 8, 153, 1997.

RNA source:

Proteins. 20 letters. Primary structure: the sequence, e.g. AMNTSSTVG (N-terminus → C-terminus). Alberts, Molecular Biology of the Cell, 5th Ed.

Secondary structure (random coil, α-helix, β-sheet, loops). Several secondary structure elements form motifs, e.g. Greek key, β-α-β, HTH (helix-turn-helix).

Tertiary structure (the arrangement of motifs into one or more domains). Quaternary structure (multimeric complexes).

Proteins source:

Proteins source: Petsko, Ringe – Protein structure and function

Systems biology focuses on the systematic study of complex interactions in biological systems from a new perspective: holism instead of reductionism. Holism: the properties of a system cannot be determined or explained by its component parts alone. One of the goals of systems biology is to discover new emergent properties. It is a new field, booming since 2000, and very little covered in the Czech Republic.

Systems biology source: wikipedia.org

Systems biology is based on mathematical modelling of systems, control theory, and cybernetics; it takes an engineering view of complex biological systems. For example, it answers questions about the robustness of a given system when one of its parts fails, or about the response of a system to a change in environmental conditions.

quantum chemistry molecular dynamics bioinformatics systems biology

Bioinformatics: the application of information technology to molecular biology, genomics, and related biological disciplines. It deals with a tremendous amount of data. It covers the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve problems arising from the management and analysis of biological data.

According to the classification used by Russian scientists, we distinguish two fields of paranormal phenomena: bioinformatics and bioenergetics. Bioinformatics (i.e. extrasensory perception, ESP) covers the acquisition and exchange of information by extrasensory means (not through the normal sense organs). In essence, we distinguish the following forms of bioinformation: hypnosis (control of consciousness), telepathy, remote perception, precognition, retrocognition, out-of-body experience, "seeing" with the hands or other parts of the body, inspiration, and revelation. source:

Bioinformatics: sequence analysis (sequence bioinformatics), structural analysis (structural bioinformatics), functional analysis (systems biology).

Key terms: genetic code, gene, genome, genomics, large data sets, high throughput. Human genome: DNA is localized mainly in the nucleus, and each nucleus carries the whole genetic information. The genome has about 3.2 billion bp; only ca. 1.5% codes for proteins, and the rest has been called “junk DNA”. What is the proteome? Proteomics. Is it more difficult to study the genome or the proteome?

Sequence bioinformatics. Reconstruction of sequences from fragments. Searching for genes and other interesting regions in the genome. Junk DNA: over 95% of the human genome consists of non-coding sequences, which either have no function or a function that is not yet understood. Querying huge genomes for a given sequence. Comparison of genes within a species: similarities between protein functions. Comparison of genes between species: the evolutionary relationships of organisms (phylogenetic analysis).

Sequence alignment: the procedure of comparing sequences.
Point mutations are easy (gapless alignment):
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
A more difficult example:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
However, gaps can be inserted to get something like this (gapped alignment):
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
A gap corresponds to an insertion in one sequence or a deletion in the other (insertion × deletion), hence the term indel.
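The two alignments on this slide can be compared column by column. The sketch below is a hypothetical helper, not part of the original slides; it counts matches, mismatches, and gap columns for two already-aligned strings of equal length.

```python
def summarize_alignment(top: str, bottom: str, gap: str = "-") -> dict:
    """Count matching, mismatching, and gap columns of an alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Sequences taken from the slide.
gapless = summarize_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                              "ACGTCTGATTCGCCCTATCGTCTATCT")
gapped = summarize_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                             "----CTGATTCGC---ATCGTCTATCT")
print(gapless)  # 24 matches, 3 mismatches, 0 gap columns
print(gapped)   # 18 matches, 2 mismatches, 7 gap columns
```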

Flavors of sequence alignment pair-wise alignment × multiple sequence alignment

Flavors of sequence alignment: global alignment × local alignment. Global: the entire sequences are aligned. Local: stretches of sequence with the highest density of matches are aligned, generating islands of matches (subalignments) in the aligned sequences.
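The difference between the two flavors shows up clearly in the classic dynamic-programming formulations, which the slides do not name: Needleman-Wunsch for global and Smith-Waterman for local alignment. The sketch below computes only the optimal score, with a linear gap penalty; the scoring values (+1, -1, -1) and the function name are assumptions chosen for illustration.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Global: score of aligning both sequences end to end.
    # Local: best-scoring island anywhere in the table.
    return best if local else score[-1][-1]


global_score = alignment_score("AAAACGT", "CGTCCCC")
local_score = alignment_score("AAAACGT", "CGTCCCC", local=True)
print(global_score, local_score)
```

With these inputs the local score comes from the shared island CGT, while the global score is pulled down by the unrelated flanking residues.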

Scoring systems I: identity matrix. DNA and protein sequences can be aligned so that the number of identically matching pairs is maximized. Counting the number of matches gives a score (3 in this case). A higher score means a better alignment. This procedure can be formalized using a substitution matrix.
Example sequences:
A T T G T
A - - G A C A T
Identity matrix:
  A T C G
A 1
T 0 1
C 0 0 1
G 0 0 0 1
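Scoring with a substitution matrix amounts to a table lookup per aligned column. In the sketch below the identity matrix is built as a dictionary, and the alignment `ATTG---T` over `A--GACAT` is an assumed arrangement of the slide's letters (the original layout was lost); it happens to reproduce the quoted score of 3. Gap columns are scored as 0 here, which is also an assumption.

```python
from itertools import product

ALPHABET = "ATCG"
# Identity matrix: 1 on the diagonal, 0 elsewhere.
IDENTITY = {(x, y): int(x == y) for x, y in product(ALPHABET, repeat=2)}


def matrix_score(top: str, bottom: str, matrix: dict, gap: str = "-") -> int:
    """Sum substitution scores over aligned columns; gap columns add 0."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a != gap and b != gap:
            total += matrix[(a, b)]
    return total


score = matrix_score("ATTG---T", "A--GACAT", IDENTITY)
print(score)
```

Swapping `IDENTITY` for another dictionary keyed by residue pairs changes the scoring scheme without touching the scoring function.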

Scoring systems II. For nucleotide sequences, the identity matrix is usually good enough. For protein sequences, the identity matrix is not sufficient to describe biological and evolutionary processes, because amino acids are not exchanged with the uniform probability one might assume in theory. For example, substitution of aspartic acid (D) by glutamic acid (E) is frequently observed, whereas a change from aspartic acid to tryptophan (W) is very rare. Why is that? 1. Triplet-based genetic code: GAT (D) → GAA (E) requires one nucleotide change, GAT (D) → TGG (W) requires three. 2. D and E have similar properties, but D and W differ considerably. D is hydrophilic, W is hydrophobic; a D → W mutation can greatly alter the 3D structure and consequently the function.
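The first argument above is simple arithmetic on codons: count the positions at which two triplets differ. A minimal sketch (the function name is hypothetical):

```python
def nucleotide_changes(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must have exactly three nucleotides")
    return sum(x != y for x, y in zip(codon_a, codon_b))


d_to_e = nucleotide_changes("GAT", "GAA")  # aspartic acid -> glutamic acid
d_to_w = nucleotide_changes("GAT", "TGG")  # aspartic acid -> tryptophan
print(d_to_e, d_to_w)
```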

Substitution matrices. Amino acid groups in the figure: small, polar; small, nonpolar; polar or acidic; basic; large, hydrophobic; aromatic (Zvelebil, Baum, Understanding Bioinformatics). Positive score: the frequency of substitutions is greater than would have occurred by random chance. Zero score: the frequency is equal to that expected by chance. Negative score: the frequency is less than would have occurred by random chance.
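The sign convention above is what a log-odds score produces: the logarithm of the observed substitution frequency divided by the frequency expected by chance. The sketch below assumes the expected frequency is the product of the two background frequencies and uses base-2 logarithms; published matrices additionally scale and round their values, so this illustrates the principle rather than reproducing any particular matrix.

```python
import math


def log_odds_score(observed_pair_freq: float, freq_a: float, freq_b: float) -> float:
    """log2 of observed pair frequency over the chance expectation freq_a * freq_b."""
    expected = freq_a * freq_b
    return math.log2(observed_pair_freq / expected)


# Hypothetical frequencies, chosen so the ratios are exactly 2, 1, and 1/2.
more_than_chance = log_odds_score(0.125, 0.25, 0.25)
equal_to_chance = log_odds_score(0.0625, 0.25, 0.25)
less_than_chance = log_odds_score(0.03125, 0.25, 0.25)
print(more_than_chance, equal_to_chance, less_than_chance)
```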

Sequence database search: BLAST, the “Google” of the sequence world.

Phylogenetic analysis

Structural bioinformatics. The function of a chemical moiety is given by its structure. While the DNA structure is “given” (double helix), RNA and proteins can adopt very different conformations (i.e. specific arrangements of atoms in 3D space). Structural bioinformatics covers the analysis of nucleic acid and protein structures and the prediction of structure from sequence.

Protein structure prediction. Secondary structure prediction: the conformational state of each residue is predicted as H (helix), E (extended, β-sheet), or C (coil); accuracy: 80%. Tertiary structure prediction. Why? Many sequences are known, but not that many 3D structures have been solved. Some proteins (e.g. transmembrane proteins) are difficult to characterize experimentally. Many proteins have a known function but an unknown structure (which is, however, needed to understand their behavior in detail). Approaches: ab initio, threading, homology modelling.
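An accuracy figure for three-state prediction is typically the fraction of residues whose predicted state (H, E, or C) agrees with the observed one, often called Q3. The slides do not say how the 80% was measured, so the sketch below is an assumption about the metric, and both strings are made-up examples.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues with the correct H/E/C state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical 10-residue example: 8 of 10 states agree.
accuracy = q3_accuracy("HHHHCEEECC", "HHHHCCEEEC")
print(accuracy)
```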

CASP: Critical Assessment of Structure Prediction. Held every 2 years since 1994; CASP10 in preparation. Participants predict structures that are solved but not yet publicly released. Individual groups compete in 3D prediction: human groups answer within 14 days, software (automated prediction) within 48 hours.