It & Health 2010 Summary Thomas Nordahl Petersen.



DNA/RNA
- DNA is found in the cell nucleus (eukaryotes)
- Base pairing
- T is replaced by U in RNA
- Reading direction
- Reading frames (1, 2, 3, -1, -2, -3)
- 64 codons
- DNA -> mRNA
- Intron, exon & UTR (non-coding exon)
- Intron/exon splice site
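As a small illustration of two points on this slide (T replaced by U in RNA, and 64 possible codons), here is a minimal Python sketch. The function names `transcribe` and `all_codons` are made up for this example, and the sketch assumes the input is the coding strand written 5' to 3'.

```python
from itertools import product


def transcribe(dna):
    """Return the mRNA-style string for a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


def all_codons(alphabet="ACGU"):
    """Enumerate every triplet over a 4-letter alphabet: 4 * 4 * 4 = 64 codons."""
    return ["".join(triplet) for triplet in product(alphabet, repeat=3)]


if __name__ == "__main__":
    print(transcribe("ATGGCC"))
    print(len(all_codons()))
```

The codon count follows directly from three positions with four possible bases each.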

Reading frame and reverse complement
Having a piece of DNA like: TGCCATGCATAGCCCCTGCCATATCT
Forward strand & reading frames
1 : TGCCATGCATAGCCCCTGCCATATCT
2 : GCCATGCATAGCCCCTGCCATATCT
3 : CCATGCATAGCCCCTGCCATATCT
Reverse complement strand & reading frames
-1: AGATATGGCAGGGGCTATGCATGGCA
-2: GATATGGCAGGGGCTATGCATGGCA
-3: ATATGGCAGGGGCTATGCATGGCA
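The six reading frames can be derived mechanically. The reverse complement is obtained by complementing every base (A<->T, C<->G) and reading the result backwards. A minimal Python sketch follows. The helper names `reverse_complement` and `six_frames` are illustrative only, and the input is assumed to contain just the letters A, C, G and T.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return a dict mapping frame number (1, 2, 3, -1, -2, -3) to its string."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = forward[offset:]     # frames 1, 2, 3
        frames[-(offset + 1)] = reverse[offset:]  # frames -1, -2, -3
    return frames


if __name__ == "__main__":
    for frame, seq in sorted(six_frames("TGCCATGCATAGCCCCTGCCATATCT").items()):
        print(frame, seq)
```

Each frame simply drops 0, 1 or 2 leading bases, so codons are then read in steps of three from the start of the string.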

Amino acids
- 20 naturally occurring amino acids
- mRNA -> protein
- Reading direction
- 4 backbone atoms
- Amino acid properties
- Acidic, basic, polar, charged, hydrophobic
- 1 and 3 letter codes

Amino Acids
- Amine and carboxyl groups.
- The side chain 'R' is attached to the C-alpha carbon.
- The amino acids found in living organisms are L-amino acids.

Amino Acids - peptide bond: N-terminal, C-terminal

Databases and web-tools
Databases and biological information: GenBank, UniProt
Web-tools: NCBI BLAST, UCSC genome browser, WebLogo

Theory of evolution: Charles Darwin (Center for Biological Sequence Analysis)

Phylogenetic tree

Global versus local alignments
Global alignment: align the full length of both sequences (the "Needleman-Wunsch" algorithm).
Local alignment: find the best partial alignment of two sequences (the "Smith-Waterman" algorithm).
[Figure: Seq 1 and Seq 2 shown in a global alignment and in a local alignment]
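To make the difference concrete, the sketch below computes only the optimal scores for both alignment types. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration and are not taken from the slides. The one change that turns the global recursion into the local one is the extra 0 in the maximum, which lets an alignment restart anywhere.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b.

    Scoring parameters are illustrative assumptions, not from the source.
    """
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]

    # Global alignment pays for leading gaps; local alignment does not.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # The 0 allows a local alignment to start fresh at any cell.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    # Global score is read from the bottom-right cell; local score is the
    # best cell anywhere in the matrix.
    return glob[-1][-1], best_local


if __name__ == "__main__":
    print(alignment_scores("GGGACGT", "ACGTCCC"))
```

For two sequences that share only a short common region, the local score stays high while the global score is pulled down by the unmatched ends.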

Pairwise alignment: the solution is "dynamic programming" (the Needleman-Wunsch algorithm)
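A compact sketch of the dynamic programming idea, with traceback, is given below. It is not the course's implementation: the function name `needleman_wunsch` and the scoring scheme (match +1, mismatch -1, linear gap -2) are assumptions for illustration, whereas real protein alignments would normally use a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        """Substitution score for a[i-1] aligned with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell is the best of diagonal, up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

When several alignments share the optimal score, this traceback prefers the diagonal move, so it reports only one of them.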

Sequence alignment - Blast

Blosum & PAM matrices
Blosum matrices are the most commonly used substitution matrices: Blosum50, Blosum62, Blosum80.
PAM: Point Accepted Mutation (also expanded as Percent Accepted Mutations).
- PAM-0 is the identity matrix.
- PAM-1 has diagonal entries that deviate slightly from 1 and off-diagonal entries that deviate slightly from 0.
- PAM-250 is PAM-1 multiplied by itself 250 times (that is, PAM-1 raised to the 250th power).
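The relation between PAM-0, PAM-1 and PAM-250 can be sketched with repeated matrix multiplication. The 3x3 matrix `TOY_PAM1` below is a made-up toy example over a three-letter alphabet, not real PAM data (the real matrix is 20x20). It only mimics the structure described above: diagonal close to 1, off-diagonal close to 0, rows summing to 1.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Multiply m by itself `power` times; power 0 gives the identity matrix."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM-1-like transition matrix (toy values, NOT real PAM data).
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]

if __name__ == "__main__":
    for row in mat_pow(TOY_PAM1, 250):
        print([round(value, 3) for value in row])
```

After 250 steps the diagonal entries are much smaller than in the one-step matrix, which reflects sequences that have diverged for a long time. Note that the PAM scoring matrices used in alignment are log-odds scores derived from such probability matrices, a step this sketch leaves out.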

Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

Log-odds scores
BLOSUM is a log-likelihood matrix:
- The likelihood of observing j given that you have i is $P(j|i) = P_{ij}/Q_i$
- The prior likelihood of observing j is $Q_j$, which is simply the frequency of j
- The log-likelihood score is $S_{ij} = 2\log_2\left(P(j|i)/Q_j\right) = 2\log_2\left(P_{ij}/(Q_i Q_j)\right)$
- where $\log_2(x) = \log_n(x)/\log_n(2)$
- S has been normalized to half bits, hence the factor 2
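The score S_ij = 2 log2(P_ij / (Q_i Q_j)) can be evaluated directly. In the sketch below the function name and the example frequencies are invented for illustration and are not taken from any real BLOSUM data.

```python
import math


def log_odds_half_bits(p_ij, q_i, q_j):
    """Log-odds score in half bits: 2 * log2(P_ij / (Q_i * Q_j)).

    p_ij is the observed pair frequency; q_i and q_j are the background
    frequencies of the two residues.
    """
    return 2 * math.log2(p_ij / (q_i * q_j))


if __name__ == "__main__":
    # Pair seen exactly as often as expected by chance.
    print(log_odds_half_bits(0.25, 0.5, 0.5))
    # Pair seen more often than expected by chance.
    print(log_odds_half_bits(0.02, 0.1, 0.1))
```

A pair observed as often as chance predicts scores 0, pairs seen more often than chance score positive, and pairs seen less often score negative. Published BLOSUM matrices additionally round the scores to integers.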

BLAST Exercise

Genome browsers - UCSC
- Intron/exon structure
- Single nucleotide polymorphism (SNP)

SNPs

Protein 3D-structure

Protein structure
Primary structure: amino acid sequence
Secondary structure: helix/beta sheet
Tertiary structure: fold, 3D coordinates

Protein structure: α-helix
3-10 helix: 3 residues/turn - few, but not uncommon
α-helix: 3.6 residues/turn - by far the most common helix
Pi-helix: 4.1 residues/turn - very rare

Protein structure: β-strand/sheet

Protein folds
Class: alpha, beta, alpha+beta and alpha/beta, plus a last class with no or few secondary-structure (SS) elements
Architecture: overall shape of a domain
Topology: shared secondary structure connectivity

Protein 3D-structure

Neural Networks: from knowledge to information
Protein sequence -> Biological feature

Use of artificial neural networks
A data-driven method to predict a feature, given a set of training data.
In biology, input features could be amino acid sequences or nucleotides.
- Secondary structure prediction
- Signal peptide prediction
- Surface accessibility
- Propeptide prediction
[Figure: protein from N- to C-terminus: signal peptide, propeptide, mature/active protein]

Prediction of biological features: surface accessibility
Predict surface accessibility from the amino acid sequence only.

Logo plots: information content, how is it calculated and what does it mean?

Logo plots - Information Content
Sequence logo: calculate the information content $I = \sum_a p_a \log_2 p_a + \log_2(4)$. The maximal value is 2 bits.
The total height at a position is the information content, measured in bits.
The height of a letter is proportional to the frequency of that letter.
A logo plot is a visualization of a multiple alignment.
[Figure labels: "~0.5 each"; "Completely conserved"]
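The formula I = sum_a p_a log2(p_a) + log2(4) can be checked on single alignment columns. The following is a minimal sketch for DNA. The function names are illustrative, and it assumes each column is given as a string of observed letters without gaps and without any small-sample correction.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """Information content in bits: sum_a p_a * log2(p_a) + log2(alphabet_size)."""
    total = len(column)
    entropy_term = sum((count / total) * math.log2(count / total)
                       for count in Counter(column).values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column, alphabet_size=4):
    """Height of each letter in the logo: its frequency times the column's I."""
    info = information_content(column, alphabet_size)
    total = len(column)
    return {letter: info * count / total
            for letter, count in Counter(column).items()}


if __name__ == "__main__":
    for column in ("AAAA", "AACC", "ACGT"):
        print(column, information_content(column), letter_heights(column))
```

A completely conserved column reaches the maximum of 2 bits, a column split evenly between two letters gives 1 bit, and a column where all four bases are equally frequent gives 0 bits. For protein logos the same idea would use an alphabet size of 20.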