Introduction to Bioinformatics Summary Thomas Nordahl Petersen.

1 Introduction to Bioinformatics Summary Thomas Nordahl Petersen

2 DNA/RNA: DNA is found in the cell nucleus (eukaryotes). Base pairing. T is substituted with U in RNA. Reading direction. Reading frame (1, 2, 3, -1, -2, -3). 64 codons. DNA -> mRNA. Intron, exon & UTR (non-coding exon). Intron/exon splice site.
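
As a minimal sketch of two points on this slide (T replaced by U in RNA, and 64 codons), the Python below assumes the input is the DNA coding strand, so the mRNA has the same sequence with U in place of T. The function name `transcribe` and the example sequence are illustrative and not from the slides.

```python
from itertools import product


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand by replacing T with U."""
    return coding_strand.upper().replace("T", "U")


# All triplets over the four RNA bases: 4 * 4 * 4 = 64 codons.
codons = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

print(transcribe("ATGGCC"))
print(len(codons))
```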

3 Reading frame and reverse complement. Having a piece of DNA like TGCCATGCATAGCCCCTGCCATATCT:
Forward strings & reading frames
1 : TGCCATGCATAGCCCCTGCCATATCT
2 : GCCATGCATAGCCCCTGCCATATCT
3 : CCATGCATAGCCCCTGCCATATCT
Reverse complement strings & reading frames
-1: AGATATGGCAGGGGCTATGCATGGCA
-2: GATATGGCAGGGGCTATGCATGGCA
-3: ATATGGCAGGGGCTATGCATGGCA
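
The six reading frames on this slide can be reproduced with a short Python sketch. The function names (`reverse_complement`, `reading_frames`, `split_codons`) are illustrative choices, and the code assumes an uppercase or lowercase sequence containing only A, C, G and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def reading_frames(dna: str) -> dict[int, str]:
    """Return frames 1, 2, 3 (forward) and -1, -2, -3 (reverse complement)."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = forward[offset:]
        frames[-(offset + 1)] = reverse[offset:]
    return frames


def split_codons(frame: str) -> list[str]:
    """Cut a frame into complete triplets, dropping any trailing 1-2 bases."""
    return [frame[i:i + 3] for i in range(0, len(frame) - 2, 3)]


frames = reading_frames("TGCCATGCATAGCCCCTGCCATATCT")
for number in (1, 2, 3, -1, -2, -3):
    print(f"{number:>2}: {frames[number]}")
```

Calling `split_codons(frames[1])` gives the triplets that would be read in frame 1, starting with `TGC` and `CAT`.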

4 Amino acids: 20 naturally occurring amino acids. mRNA -> protein. Reading direction. 4 backbone atoms. Amino acid properties: L or D isomers; only the L isomer is found in the proteins of living organisms. Acidic, basic, polar, charged, hydrophobic. 1- and 3-letter codes.

5 Amino Acids: amine and carboxyl groups. The side chain 'R' is attached to the C-alpha carbon. The amino acids found in the proteins of living organisms are L-amino acids.

6 Amino Acids - peptide bond. N-terminal, C-terminal.

7 Databases and web-tools. Databases and biological information: GenBank (DNA and genes), UniProt (protein sequences). Web-tools: NCBI BLAST, UCSC genome browser, WebLogo.

8 CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Theory of evolution Charles Darwin 1809-1882

9 Phylogenetic tree

10 Global versus local alignments. Global alignment: align the full length of both sequences (the "Needleman-Wunsch" algorithm). Local alignment: find the best partial alignment of two sequences (the "Smith-Waterman" algorithm). [Figure: Seq 1 and Seq 2 under global alignment and under local alignment]

11 Pairwise alignment: the solution is "dynamic programming" (the Needleman-Wunsch algorithm)
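
To illustrate how dynamic programming handles both alignment types, here is a minimal Python sketch that fills a Needleman-Wunsch (global) table and a Smith-Waterman (local) table side by side and returns only the two scores. The scoring values (match +1, mismatch -1, linear gap -2) and the function name `alignment_scores` are assumptions chosen for illustration; the slides do not specify them, and real tools typically use substitution matrices and affine gap penalties.

```python
def alignment_scores(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, int]:
    """Return (global_score, local_score) for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]

    # Global alignment pays for leading gaps; local alignment does not.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                             glob[i - 1][j] + gap,     # gap in b
                             glob[i][j - 1] + gap)     # gap in a
            # Local scores never drop below zero, so an alignment can restart.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    # Global score is the bottom-right cell; local score is the best cell anywhere.
    return glob[-1][-1], best_local


print(alignment_scores("TTTACGT", "ACGT"))
```

With these assumed scores, the shared `ACGT` segment gives a high local score, while the global score is pulled down by the three gaps needed to span the full length of both sequences.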

12 Sequence alignment - BLAST

13

14 Phylogenetic trees and distance matrices

15 BLOSUM & PAM matrices. BLOSUM matrices are the most commonly used substitution matrices. The usual defaults are the BLOSUM62 and BLOSUM50 matrices. Very remote sequences - use a low BLOSUMxx. Very similar sequences - use a high BLOSUMxx.

16 Log-odds scores. BLOSUM is a log-likelihood matrix. The likelihood of observing j given you have i is $P(j|i) = P_{ij}/P_i$. The prior likelihood of observing j is $Q_j$, which is simply the frequency of j. The log-likelihood score is $S_{ij} = 2\log_2\left(P(j|i)/Q_j\right) = 2\log_2\left(P_{ij}/(Q_i Q_j)\right)$, taking $P_i = Q_i$, where $\log_2(x) = \log_n(x)/\log_n(2)$. S has been normalized to half bits, hence the factor 2.
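
The half-bit score on this slide can be worked through numerically. The sketch below is only an illustration: the function name `half_bit_score` is hypothetical, and the frequencies are made-up round numbers chosen so the arithmetic is easy to follow, not values taken from any BLOSUM matrix.

```python
import math


def half_bit_score(p_ij: float, q_i: float, q_j: float) -> float:
    """Log-odds score S_ij = 2 * log2(P_ij / (Q_i * Q_j)), in half bits."""
    return 2 * math.log2(p_ij / (q_i * q_j))


# Pair seen twice as often as expected by chance -> positive score.
print(half_bit_score(0.125, 0.25, 0.25))
# Pair seen exactly as often as expected by chance -> zero.
print(half_bit_score(0.0625, 0.25, 0.25))
# Pair seen half as often as expected by chance -> negative score.
print(half_bit_score(0.03125, 0.25, 0.25))
```

Published matrices round these values to integers, which is why substitution-matrix entries are whole numbers.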

17 BLAST Exercise

18 Genome browsers - UCSC. Intron-exon structure. Single nucleotide polymorphism (SNP).

19 SNPs

20 UCSC genome browser: BLAT search, browser details

21 From genotype to phenotype

22 Protein 3D-structure

23 Protein structure. Primary structure: amino acid sequence. Secondary structure: helix/beta sheet. Tertiary structure: fold, 3D coordinates.

24 Protein structure: α-helix. 3₁₀-helix: 3 residues/turn - few, but not uncommon. α-helix: 3.6 residues/turn - by far the most common helix. Pi-helix: 4.1 residues/turn - very rare.

25 Protein structure: β-strand/sheet

26 Protein folds. Class: alpha, beta, alpha+beta and alpha/beta, plus a last class with no or few secondary-structure (SS) elements. Architecture: overall shape of a domain. Topology: shared secondary-structure connectivity.

27 Protein 3D-structure

28 Neural Networks: from knowledge to information. Protein sequence; biological feature.

29 Use of artificial neural networks: a data-driven method to predict a feature, given a set of training data. In biology, input features could be amino acid sequences or nucleotides. Examples: secondary structure prediction, signal peptide prediction, surface accessibility, propeptide prediction. [Figure: protein from N to C terminus: signal peptide, propeptide, mature/active protein]
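
The slide does not say how a sequence is turned into network input. One common approach, shown here purely as an assumption, is to slide a window along the protein and one-hot encode each residue in it. The function name `one_hot_windows`, the window size and the zero padding at the termini are illustrative choices, not the method used in the course.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def one_hot_windows(sequence: str, window: int = 3):
    """Yield (center_index, features) for every residue in the sequence.

    Each window position contributes 20 values (a single 1 marks the residue).
    Positions that fall outside the sequence, and unknown letters, stay all-zero.
    """
    half = window // 2
    for center in range(len(sequence)):
        features = []
        for pos in range(center - half, center + half + 1):
            vec = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence):
                idx = AMINO_ACIDS.find(sequence[pos])
                if idx >= 0:
                    vec[idx] = 1
            features.extend(vec)
        yield center, features


for center, features in one_hot_windows("MKV"):
    print(center, len(features), sum(features))
```

Each feature vector would then be paired with a label for the central residue (for example, buried or exposed) to form one training example.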

30 Prediction of biological features. Surface accessibility: predict surface accessibility from the amino acid sequence only.

31 Logo plots Information content, how is it calculated - what does it mean.

32 Logo plots - Information Content. A logo plot (sequence logo) is a visualization of a multiple alignment. The information content at a position is $I = \sum_a p_a \log_2 p_a + \log_2(4)$, with a maximal value of 2 bits (for the four nucleotides). The total height at a position is the information content, measured in bits. The height of each letter is proportional to the frequency of that letter. [Figure annotations: "~0.5 each"; "Completely conserved"]
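
The information-content formula can be checked on small alignment columns. The Python sketch below applies it directly to a column of nucleotides; the function names are illustrative, the example columns are made up, and no small-sample correction or gap handling is included.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int = 4) -> float:
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    total = len(column)
    entropy_term = sum((n / total) * math.log2(n / total)
                       for n in Counter(column).values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column: str, alphabet_size: int = 4) -> dict[str, float]:
    """Height of each letter: its frequency times the column's information content."""
    total = len(column)
    info = information_content(column, alphabet_size)
    return {letter: (n / total) * info for letter, n in Counter(column).items()}


print(information_content("AAAA"))  # completely conserved column
print(information_content("AATT"))  # two letters at frequency 0.5 each
print(information_content("ACGT"))  # all four letters equally frequent
print(letter_heights("AATT"))
```

A completely conserved column reaches the 2-bit maximum, while a column with all four nucleotides equally frequent carries no information and would be drawn with zero height.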

