Introduction to Bioinformatics Summary Thomas Nordahl Petersen.

DNA/RNA
DNA is found in the cell nucleus (eukaryotes). Base pairing; T is substituted with U in RNA. Reading direction; reading frames (1, 2, 3, -1, -2, -3); 64 codons. DNA -> mRNA. Intron, exon and UTR (non-coding exon); intron/exon splice sites.
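A minimal Python sketch of two points from this slide follows. It assumes the input is a DNA coding-strand string and ignores splicing, so introns are not removed. Transcription here only replaces T with U. The 64 codons are every ordered triplet over the four RNA bases (4 x 4 x 4). The names `transcribe` and `CODONS` are illustrative, not from any library.

```python
from itertools import product

RNA_BASES = "UCAG"

def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: same sequence, T replaced by U.
    (Sketch only: no splicing, so introns are kept.)"""
    return coding_strand.upper().replace("T", "U")

# Every ordered triplet of the four bases: 4 * 4 * 4 = 64 codons.
CODONS = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

print(transcribe("ATGGCATAG"))  # AUGGCAUAG
print(len(CODONS))              # 64
```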

Reading frame and reverse complement
Having a piece of DNA like: TGCCATGCATAGCCCCTGCCATATCT
Forward strings and reading frames
 1: TGCCATGCATAGCCCCTGCCATATCT
 2: GCCATGCATAGCCCCTGCCATATCT
 3: CCATGCATAGCCCCTGCCATATCT
Reverse-complement strings and reading frames
-1: AGATATGGCAGGGGCTATGCATGGCA
-2: GATATGGCAGGGGCTATGCATGGCA
-3: ATATGGCAGGGGCTATGCATGGCA
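The six strings above can be reproduced with a short sketch, assuming an unambiguous A/C/G/T sequence. The reverse complement complements every base and then reverses the string so it reads 5' -> 3'. Frames 1-3 and -1 to -3 then drop 0, 1 or 2 leading bases from each strand. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse so the result reads 5' -> 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def six_frames(dna: str) -> dict:
    """Frames 1..3 drop 0..2 leading bases of the forward strand;
    frames -1..-3 do the same on the reverse complement."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = forward[offset:]
        frames[-(offset + 1)] = reverse[offset:]
    return frames

frames = six_frames("TGCCATGCATAGCCCCTGCCATATCT")
for frame in (1, 2, 3, -1, -2, -3):
    print(f"{frame:>2}: {frames[frame]}")
```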

Amino acids
20 naturally occurring amino acids. mRNA -> protein; reading direction. 4 backbone atoms. Amino acid properties: L or D isomers (only the L isomer is found in living organisms); acidic, basic, polar, charged, hydrophobic. 1- and 3-letter codes.
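The step mRNA -> protein can be sketched as a codon lookup. The table below encodes the standard genetic code as a 64-letter string, with codons ordered by the bases U, C, A, G at each position and '*' marking a stop. Codons are read 5' -> 3' from a chosen frame offset. This is an illustrative sketch: it handles only upper-case A/C/G/U input and ignores start-codon scanning and alternative genetic codes.

```python
BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(mrna: str, frame: int = 0) -> str:
    """Read codons 5' -> 3' starting at offset `frame` (0, 1 or 2).
    An incomplete trailing codon is ignored."""
    mrna = mrna.upper()
    return "".join(CODON_TABLE[mrna[pos:pos + 3]]
                   for pos in range(frame, len(mrna) - 2, 3))

print(translate("AUGGCAUAG"))  # MA*
```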

Amino acids
Amine and carboxyl groups. The side chain 'R' is attached to the C-alpha carbon. The amino acids found in living organisms are L-amino acids.

Amino acids: peptide bond
[Figure: peptide chain running from the N-terminal to the C-terminal end]

Databases and web tools
Databases and biological information: GenBank (DNA and genes), UniProt (protein sequences). Web tools: NCBI BLAST, UCSC genome browser, WebLogo.

Center for Biological Sequence Analysis
Theory of evolution: Charles Darwin

Phylogenetic tree

Global versus local alignments
Global alignment: align the full length of both sequences (the Needleman-Wunsch algorithm). Local alignment: find the best partial alignment of two sequences (the Smith-Waterman algorithm).
[Figure: global and local alignment of Seq 1 and Seq 2]
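A minimal sketch of the local (Smith-Waterman) score computation follows. The scoring values are assumptions chosen for illustration: match +1, mismatch -1 and a linear gap penalty of -2. Real tools use a substitution matrix and usually affine gap penalties. Two details make the alignment local: every cell is clamped at 0, and the best cell anywhere in the matrix is reported rather than the bottom-right corner.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # First row/column stay 0: a local alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 drops any negative-scoring prefix.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("TTTACGTTT", "GGACGGG"))  # 3 (the shared "ACG")
```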

Pairwise alignment: the solution
"Dynamic programming" (the Needleman-Wunsch algorithm)
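The dynamic-programming idea can be sketched in Python as follows, using assumed toy scores (match +1, mismatch -1, linear gap -2). Cell F[i][j] holds the best score for aligning the first i letters of `a` with the first j letters of `b`. The first row and column pay gap penalties, because a global alignment must cover both sequences in full. A traceback from the bottom-right cell then recovers one optimal alignment. This illustrates the technique and is not a tuned implementation.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),  # (mis)match
                          F[i - 1][j] + gap,            # gap in b
                          F[i][j - 1] + gap)            # gap in a

    # Traceback: follow the moves that produced each cell's value.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("AC", "A"))  # (-1, 'AC', 'A-')
```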

Sequence alignment: BLAST

Phylogenetic trees and distance matrices
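As a hedged illustration of a distance matrix, the sketch below uses the simple p-distance: the fraction of aligned positions at which two sequences differ. This is one of several possible distance measures and is an assumption here, not necessarily the one used in the lecture. The input must be equal-length aligned sequences, and the sequence names and data are hypothetical. A tree-building method such as UPGMA or neighbor joining would then operate on this matrix.

```python
def p_distance(s1: str, s2: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)

def distance_matrix(seqs: dict):
    """Return (names, matrix) with matrix[r][c] = p-distance between names r and c."""
    names = list(seqs)
    matrix = [[p_distance(seqs[r], seqs[c]) for c in names] for r in names]
    return names, matrix

# Hypothetical aligned toy sequences.
names, D = distance_matrix({"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "TCGAACGA"})
for name, row in zip(names, D):
    print(name, row)
```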

BLOSUM and PAM matrices
BLOSUM matrices are the most commonly used substitution matrices; the usual defaults are BLOSUM62 and BLOSUM50. For very remote sequences, use a low-numbered BLOSUMxx; for very similar sequences, use a high-numbered BLOSUMxx.

Log-odds scores
BLOSUM is a log-likelihood matrix. The likelihood of observing j given that you have i is $P(j \mid i) = P_{ij}/P_i$. The prior likelihood of observing j is $Q_j$, which is simply the frequency of j. Because $P_i$ is the frequency of i, $P_i = Q_i$, and the log-likelihood score is
$$S_{ij} = 2\log_2\frac{P(j \mid i)}{Q_j} = 2\log_2\frac{P_{ij}}{Q_i Q_j},$$
where $\log_2(x) = \log_n(x)/\log_n(2)$. S has been normalized to half bits, hence the factor 2.
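The score formula can be checked numerically with a small sketch. It takes a hypothetical list of aligned residue pairs and counts each pair in both orientations so that $P_{ij} = P_{ji}$. The background frequencies $Q_i$ are taken as the marginals of that joint distribution, and each score is returned as $S_{ij} = 2\log_2(P_{ij}/(Q_i Q_j))$ in half bits. This is a toy illustration only. Real BLOSUM matrices are built from pair counts in clustered alignment blocks, and their entries are rounded to integers.

```python
import math
from collections import Counter

def log_odds_scores(pairs):
    """Half-bit log-odds scores S_ij = 2*log2(P_ij / (Q_i * Q_j)) from aligned residue pairs."""
    counts = Counter()
    for i, j in pairs:
        # Count each pair in both orientations so the matrix is symmetric.
        counts[(i, j)] += 1
        counts[(j, i)] += 1
    total = sum(counts.values())
    P = {pair: n / total for pair, n in counts.items()}
    # Background frequency Q_i = marginal of the joint pair distribution.
    Q = Counter()
    for (i, _), p in P.items():
        Q[i] += p
    return {(i, j): 2 * math.log2(p / (Q[i] * Q[j])) for (i, j), p in P.items()}

# Hypothetical toy data: residue pairs observed in aligned columns.
scores = log_odds_scores([("A", "A"), ("A", "A"), ("A", "S"), ("S", "S")])
for pair, s in sorted(scores.items()):
    print(pair, round(s, 2))
```

In this toy data, identical pairs score positive and the A/S substitution scores negative. That is the same qualitative pattern as the diagonal versus off-diagonal entries of a BLOSUM matrix.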

BLAST Exercise

Genome browsers: UCSC
Intron-exon structure; single nucleotide polymorphisms (SNPs).

SNPs

UCSC genome browser: BLAT search and browser details

From genotype to phenotype

Protein 3D-structure

Protein structure
Primary structure: the amino acid sequence. Secondary structure: helix/beta sheet. Tertiary structure: fold, 3D coordinates.

Protein structure  -helix    helix3 residues/turn - few, but not uncommon  - helix3.6 residues/turn - by far the most common helix Pi-helix4.1 residues/turn - very rare

Protein structure  strand/sheet

Protein folds
Class: alpha, beta, alpha+beta and alpha/beta, plus a last class with no or few secondary-structure (SS) elements. Architecture: the overall shape of a domain. Topology: domains that share secondary-structure connectivity.

Protein 3D-structure

Neural networks
From knowledge to information: protein sequence -> biological feature.

Use of artificial neural networks
A data-driven method to predict a feature, given a set of training data. In biology, the input features could be an amino acid or nucleotide sequence. Examples: secondary structure prediction, signal peptide prediction, surface accessibility, propeptide prediction.
[Figure: protein from N- to C-terminus: signal peptide, propeptide, mature/active protein]
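To make "amino acid sequence as input features" concrete, here is a hedged sketch of a common setup. The sequence is cut into a sliding window around each residue. Each window is one-hot ("sparse") encoded with 20 inputs per position. The encoding is then passed through a small feedforward network with one sigmoid hidden layer. The window size (9), the 'X' padding at the termini, the hidden-layer size and the random weights are all assumptions. The network is untrained: in practice the weights would be fitted to training data (for example by backpropagation), which is not shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(window: str) -> list:
    """One-hot ('sparse') encoding: 20 inputs per residue; unknown/padding 'X' is all zeros."""
    vec = []
    for aa in window:
        vec.extend(1.0 if aa == x else 0.0 for x in AMINO_ACIDS)
    return vec

def windows(seq: str, size: int = 9):
    """Yield one window per residue, centred on it; 'X' pads the termini (assumption)."""
    half = size // 2
    padded = "X" * half + seq + "X" * half
    for i in range(len(seq)):
        yield padded[i:i + size]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class FeedForwardNet:
    """One hidden layer, sigmoid units, single output in (0, 1). Weights are random (untrained)."""
    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Last weight in each list is the bias.
        self.w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]

    def predict(self, x: list) -> float:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1]) for ws in self.w_hidden]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1])

net = FeedForwardNet(n_in=9 * 20, n_hidden=5)
per_residue = [net.predict(sparse_encode(w)) for w in windows("MKTAYIAKQR")]
print([round(p, 3) for p in per_residue])
```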

Prediction of biological features: surface accessibility
Predict whether residues are surface accessible from the amino acid sequence only.

Logo plots
Information content: how is it calculated, and what does it mean?

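Logo plots: information content
A logo plot (sequence logo) is a visualization of a multiple alignment. The information content at each position is calculated as
$$I = \sum_a p_a \log_2 p_a + \log_2(4),$$
where $p_a$ is the frequency of letter a at that position; for DNA the maximal value is 2 bits. The total height at a position is the information content, measured in bits, and the height of each letter is proportional to the frequency of that letter.
[Figure: example logo contrasting a position with ~0.5 of each of two letters and a completely conserved position]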
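The formula can be applied column by column with a short sketch. It assumes a DNA alignment given as equal-length strings, and it skips gaps or other characters outside A/C/G/T when counting. No small-sample correction is applied. Each column's information content is $\sum_a p_a \log_2 p_a + \log_2 4$, and each letter's height is its frequency times that value. A completely conserved column therefore reaches the 2-bit maximum, and a column with ~0.5 each of two bases gives 1 bit. The function names are hypothetical.

```python
import math
from collections import Counter

def column_information(column: str, alphabet: str = "ACGT"):
    """Return (information content in bits, {letter: height}) for one alignment column."""
    counts = Counter(c for c in column.upper() if c in alphabet)
    n = sum(counts.values())
    freqs = {a: counts[a] / n for a in alphabet}
    # sum_a p_a*log2(p_a) is <= 0; terms with p_a = 0 contribute nothing.
    info = sum(p * math.log2(p) for p in freqs.values() if p > 0) + math.log2(len(alphabet))
    heights = {a: p * info for a, p in freqs.items()}
    return info, heights

def logo_columns(alignment: list):
    """Information content and letter heights for every column of an alignment."""
    return [column_information("".join(col)) for col in zip(*alignment)]

for info, heights in logo_columns(["AAC", "AAG", "ACT", "ACA"]):
    print(round(info, 2), {a: round(h, 2) for a, h in heights.items() if h > 0})
```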