It & Health 2010 Summary Thomas Nordahl Petersen.

Presentation transcript:

It & Health 2010 Summary Thomas Nordahl Petersen

DNA/RNA
- DNA is found in the cell nucleus (eukaryotes)
- Base pairing
- T is substituted with U in RNA
- Reading direction
- Reading frame (1, 2, 3, -1, -2, -3)
- 64 codons
- DNA -> mRNA
- Intron, exon & UTR (non-coding exon)
- Intron/exon splice site
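As a small illustration of two points on this slide (T is replaced by U in RNA, and there are 64 codons), here is a minimal Python sketch. The function name `transcribe` and the constant `CODONS` are hypothetical names chosen for this example, and the input is assumed to be the coding strand written 5' to 3'.

```python
from itertools import product


def transcribe(dna: str) -> str:
    """Return the mRNA-style string for a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


# Every triplet over the 4-letter RNA alphabet: 4 * 4 * 4 = 64 codons.
CODONS = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

if __name__ == "__main__":
    print(transcribe("ATGGCC"))
    print(len(CODONS))
```

The count of 64 follows directly from three positions with four possible bases each.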

Reading frame and reverse complement
Having a piece of DNA like: TGCCATGCATAGCCCCTGCCATATCT
Forward strings & reading frames
 1: TGCCATGCATAGCCCCTGCCATATCT
 2: GCCATGCATAGCCCCTGCCATATCT
 3: CCATGCATAGCCCCTGCCATATCT
Reverse complement strings & reading frames
-1: AGATATGGCAGGGGCTATGCATGGCA
-2: GATATGGCAGGGGCTATGCATGGCA
-3: ATATGGCAGGGGCTATGCATGGCA
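The six reading frames can be generated programmatically. The sketch below is one possible way to do it in Python; `reverse_complement` and `reading_frames` are hypothetical helper names, and only the unambiguous bases A, C, G and T are assumed. Note that the reverse complement requires both reversing the string and complementing every base (A<->T, C<->G); for the example sequence this gives AGATATGGCAGGGGCTATGCATGGCA.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def reading_frames(dna: str) -> dict[int, str]:
    """Return the strings for frames 1, 2, 3 (forward) and -1, -2, -3 (reverse complement)."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = forward[offset:]      # frames 1, 2, 3
        frames[-(offset + 1)] = reverse[offset:]   # frames -1, -2, -3
    return frames


if __name__ == "__main__":
    for frame, seq in sorted(reading_frames("TGCCATGCATAGCCCCTGCCATATCT").items()):
        print(f"{frame:>2}: {seq}")
```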

Amino acids
- 20 naturally occurring amino acids
- mRNA -> protein
- Reading direction
- 4 backbone atoms
- Amino acid properties: acidic, basic, polar, charged, hydrophobic
- 1- and 3-letter codes

Amino Acids: amine and carboxyl groups. The side chain 'R' is attached to the C-alpha carbon. The amino acids found in living organisms are L-amino acids.

Amino Acids - peptide bond: N-terminal, C-terminal

Databases and web-tools
Databases and biological information: GenBank, UniProt
Web-tools: NCBI BLAST, UCSC genome browser, WebLogo

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Theory of evolution Charles Darwin

Phylogenetic tree

Global versus local alignments
Global alignment: align the full length of both sequences (the "Needleman-Wunsch" algorithm).
Local alignment: find the best partial alignment of two sequences (the "Smith-Waterman" algorithm).
(Figure: Seq 1 and Seq 2 shown in a global alignment and in a local alignment.)

Pairwise alignment: the solution is "dynamic programming" (the Needleman-Wunsch algorithm)
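To make the dynamic programming idea concrete, the following Python sketch fills the score matrix for either a global (Needleman-Wunsch style) or a local (Smith-Waterman style) alignment and returns the optimal score only, without traceback. The function name `alignment_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions made for this example and are not taken from the slides; real protein alignments would use a substitution matrix such as Blosum62.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Return the optimal alignment score of a and b using dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized in the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diagonal, up, left)
            if local:
                # Local alignment: a negative prefix is dropped by restarting at 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))
    print(alignment_score("TTTACGTTTT", "GGACGGG", local=True))
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from.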

Sequence alignment - BLAST

Blosum & PAM matrices
Blosum matrices are the most commonly used substitution matrices: Blosum50, Blosum62, Blosum80.
PAM - Point Accepted Mutation (also read as Percent Accepted Mutation).
PAM-0 is the identity matrix.
PAM-1: diagonal elements deviate slightly from 1, off-diagonal elements deviate slightly from 0.
PAM-250 is PAM-1 raised to the 250th power (repeated multiplication of PAM-1 by itself).
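The relationship between PAM-0, PAM-1 and PAM-250 can be illustrated with repeated matrix multiplication. The Python sketch below uses a made-up 3-letter alphabet; the matrix `TOY_PAM1` and its values are purely hypothetical and are not real PAM data (the real matrix is 20 x 20). The helper names `mat_mul` and `mat_pow` are also chosen for this example.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power; power 0 gives the identity."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical mutation probabilities: each row sums to 1, the diagonal is
# close to 1 and the off-diagonal entries are close to 0, as for PAM-1.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]

toy_pam0 = mat_pow(TOY_PAM1, 0)      # identity matrix
toy_pam250 = mat_pow(TOY_PAM1, 250)  # much flatter than TOY_PAM1

if __name__ == "__main__":
    for row in toy_pam250:
        print([round(value, 3) for value in row])
```

After many multiplications the rows still sum to 1, but the probability mass spreads away from the diagonal, which is why high-numbered PAM matrices suit distantly related sequences.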

Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

Log-odds scores
BLOSUM is a log-likelihood matrix:
- The likelihood of observing j given that you have i is $P(j|i) = P_{ij} / Q_i$
- The prior likelihood of observing j is $Q_j$, which is simply the frequency of j
- The log-likelihood score is $S_{ij} = 2\log_2\left(P(j|i) / Q_j\right) = 2\log_2\left(P_{ij} / (Q_i Q_j)\right)$
- where $\log_2(x) = \log_n(x) / \log_n(2)$
- S has been normalized to half bits, hence the factor 2
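The log-odds formula can be evaluated directly. The Python sketch below assumes a pair frequency P_ij and background frequencies Q_i and Q_j are already known; the function name `log_odds_half_bits` and the numbers in the usage example are hypothetical and are not taken from a real BLOSUM matrix. Published BLOSUM entries are additionally rounded to integers, which this sketch does not do.

```python
import math


def log_odds_half_bits(p_ij: float, q_i: float, q_j: float) -> float:
    """Return S_ij = 2 * log2(P_ij / (Q_i * Q_j)), a log-odds score in half bits."""
    return 2 * math.log2(p_ij / (q_i * q_j))


if __name__ == "__main__":
    # A pair seen twice as often as expected by chance scores +2 half bits.
    print(log_odds_half_bits(0.02, 0.1, 0.1))
    # A pair seen half as often as expected by chance scores -2 half bits.
    print(log_odds_half_bits(0.005, 0.1, 0.1))
```

A positive score means the pair is observed more often than expected by chance, zero means exactly as expected, and a negative score means less often.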

BLAST Exercise

Genome browsers - UCSC
Intron-exon structure
Single nucleotide polymorphism (SNP)

SNPs

Protein 3D-structure

Protein structure
Primary structure: amino acid sequence
Secondary structure: helix/beta sheet
Tertiary structure: fold, 3D coordinates

Protein structure: α-helix
3-10 helix: 3 residues/turn - few, but not uncommon
α-helix: 3.6 residues/turn - by far the most common helix
π-helix: 4.1 residues/turn - very rare

Protein structure: β-strand/sheet

Protein folds
Class: alpha, beta, alpha+beta and alpha/beta, plus a last class with few or no secondary structure (SS) elements
Architecture: overall shape of a domain
Topology: shared secondary structure connectivity

Protein 3D-structure

Neural Networks From knowledge to information Protein sequence Biological feature

Use of artificial neural networks
A data-driven method to predict a feature, given a set of training data.
In biology, input features could be amino acid or nucleotide sequences.
- Secondary structure prediction
- Signal peptide prediction
- Surface accessibility
- Propeptide prediction
(Figure: protein from N- to C-terminus with signal peptide, propeptide and mature/active protein.)

Prediction of biological features: surface accessibility. Predict surface accessibility from the amino acid sequence only.

Logo plots: information content, how it is calculated and what it means.

Logo plots - Information Content
Sequence logo: calculate the information content, $I = \sum_a p_a \log_2 p_a + \log_2(4)$. The maximal value is 2 bits.
The total height at a position is the 'Information Content', measured in bits.
The height of a letter is proportional to the frequency of that letter.
A logo plot is a visualization of a multiple alignment.
(Figure labels: "~0.5 each", "Completely conserved".)
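The information content of a single alignment column can be computed directly from the formula I = sum_a p_a log2(p_a) + log2(4). The Python sketch below assumes a DNA alphabet of 4 letters and a column given as a plain string with no gaps; the function names `information_content` and `letter_heights` are hypothetical, and the small-sample correction used by some logo tools is deliberately left out.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int = 4) -> float:
    """Return I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    counts = Counter(column)
    total = len(column)
    # Letters with zero count contribute nothing, so only observed letters are summed.
    entropy_term = sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column: str, alphabet_size: int = 4) -> dict[str, float]:
    """Return the height of each letter: its frequency times the column's information content."""
    info = information_content(column, alphabet_size)
    total = len(column)
    return {letter: (n / total) * info for letter, n in Counter(column).items()}


if __name__ == "__main__":
    print(information_content("AAAA"))  # completely conserved column
    print(information_content("ACGT"))  # all four letters equally frequent
    print(letter_heights("AACC"))       # two letters at 50% each
```

A completely conserved column reaches the maximum of 2 bits, a column with all four letters equally frequent carries 0 bits, and a column split evenly between two letters carries 1 bit, so each of the two letters is drawn 0.5 bits high.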