Published by Madeleine Wilkinson. Modified over 8 years ago.
1
Introduction to Bioinformatics: Summary
Thomas Nordahl Petersen
2
DNA/RNA
DNA is found in the cell nucleus (eukaryotes)
Base pairing
T is substituted with U in RNA
Reading direction
Reading frame (1, 2, 3, -1, -2, -3)
64 codons
DNA -> mRNA
Intron, exon & UTR (non-coding exon)
Intron/exon splice site
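As a minimal sketch of two points on this slide (T replaced by U in RNA, and the 64 possible codons), the following Python snippet may help. The function name `transcribe` is hypothetical, and the input is assumed to be the coding strand written 5' to 3'.

```python
from itertools import product


def transcribe(dna: str) -> str:
    """Return the mRNA-style string for a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


# Every triplet over the four RNA bases: 4 * 4 * 4 = 64 codons.
codons = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

print(transcribe("ATGGCC"))
print(len(codons))
```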
3
Reading frame and reverse complement
Having a piece of DNA like: TGCCATGCATAGCCCCTGCCATATCT
Forward strings & reading frames
1 : TGCCATGCATAGCCCCTGCCATATCT
2 : GCCATGCATAGCCCCTGCCATATCT
3 : CCATGCATAGCCCCTGCCATATCT
Reverse complement strings & reading frames
-1: AGATATGGCAGGGGCTATGCATGGCA
-2: GATATGGCAGGGGCTATGCATGGCA
-3: ATATGGCAGGGGCTATGCATGGCA
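The six strings above can be reproduced with a short Python sketch. The names `reverse_complement` and `reading_frames` are illustrative choices, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def reading_frames(dna: str) -> dict:
    """Return the six reading-frame strings keyed 1, 2, 3, -1, -2, -3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = dna[offset:]    # forward frames
        frames[-(offset + 1)] = rc[offset:]  # reverse-complement frames
    return frames


frames = reading_frames("TGCCATGCATAGCCCCTGCCATATCT")
for key in (1, 2, 3, -1, -2, -3):
    print(f"{key:>2}: {frames[key]}")
```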
4
Amino acids
20 naturally occurring amino acids
mRNA -> protein
Reading direction
4 backbone atoms
Amino acid properties
L or D isomers
Only the L isomer is found in living organisms
Acidic, basic, polar, charged, hydrophobic
1- and 3-letter codes
5
Amino Acids
Amine and carboxyl groups. The sidechain 'R' is attached to the C-alpha carbon.
The amino acids found in living organisms are L-amino acids.
6
Amino Acids - peptide bond
N-terminal, C-terminal
7
Databases and web-tools
Databases and biological information: GenBank (DNA and genes), UniProt (protein sequences)
Web-tools: NCBI BLAST, UCSC Genome Browser, WebLogo
8
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Theory of evolution Charles Darwin 1809-1882
9
Phylogenetic tree
10
Global versus local alignments
Global alignment: align the full length of both sequences (the "Needleman-Wunsch" algorithm).
Local alignment: find the best partial alignment of two sequences (the "Smith-Waterman" algorithm).
[Figure: Seq 1 and Seq 2 shown in a global alignment and in a local alignment]
11
Pairwise alignment: the solution is "dynamic programming" (the Needleman-Wunsch algorithm)
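A minimal sketch of the dynamic-programming idea, computing only the optimal score (no traceback). The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name `alignment_score` are assumptions made for illustration; real tools use substitution matrices and usually affine gap penalties. With `local=False` the recurrence is Needleman-Wunsch style (global); with `local=True` cells are floored at zero and the best cell is returned, as in Smith-Waterman.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal global or local alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from.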
12
Sequence alignment - Blast
14
Phylogenetic trees and distance matrices
15
BLOSUM & PAM matrices
BLOSUM matrices are the most commonly used substitution matrices.
The defaults are the BLOSUM62 and BLOSUM50 matrices.
Very remote sequences: use a low BLOSUMxx.
Very similar sequences: use a high BLOSUMxx.
16
Log-odds scores
BLOSUM is a log-likelihood matrix:
The likelihood of observing j given you have i is $P(j|i) = P_{ij}/Q_i$, where $Q_i$ is the frequency of i
The prior likelihood of observing j is $Q_j$, which is simply the frequency of j
The log-likelihood score is $S_{ij} = 2\log_2\left(P(j|i)/Q_j\right) = 2\log_2\left(P_{ij}/(Q_i Q_j)\right)$
where $\log_2(x) = \log_n(x)/\log_n(2)$
S has been normalized to half bits, hence the factor 2
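The score formula can be checked numerically with a small Python sketch. The function name `log_odds_half_bits` and the example frequencies are made up for illustration and are not taken from any real BLOSUM matrix.

```python
import math


def log_odds_half_bits(p_ij: float, q_i: float, q_j: float) -> float:
    """S_ij = 2 * log2(P_ij / (Q_i * Q_j)), i.e. a log-odds score in half bits."""
    return 2 * math.log2(p_ij / (q_i * q_j))


# Hypothetical frequencies: the pair is seen 4 times more often than expected by chance.
print(log_odds_half_bits(p_ij=0.04, q_i=0.1, q_j=0.1))
```

A pair observed more often than chance gets a positive score, a pair observed exactly as often as chance scores zero, and a rarer pair scores negative.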
17
BLAST Exercise
18
Genome browsers - UCSC
Intron-exon structure
Single nucleotide polymorphism (SNP)
19
SNPs
20
UCSC Genome Browser: BLAT search and browser details
21
From genotype to phenotype
22
Protein 3D-structure
23
Protein structure
Primary structure: amino acid sequence
Secondary structure: helix/beta sheet
Tertiary structure: fold, 3D coordinates
24
Protein structure: helices
3_10-helix: 3 residues/turn, few but not uncommon
α-helix: 3.6 residues/turn, by far the most common helix
π-helix: 4.1 residues/turn, very rare
25
Protein structure: β-strand/sheet
26
Protein folds
Class: alpha, beta, alpha+beta and alpha/beta; a last class has none or few secondary-structure (SS) elements
Architecture: overall shape of a domain
Topology: shared secondary structure connectivity
27
Protein 3D-structure
28
Neural Networks
From knowledge to information
[Figure: protein sequence -> biological feature]
29
Use of artificial neural networks
A data-driven method to predict a feature, given a set of training data
In biology, input features could be amino acid or nucleotide sequences
Secondary structure prediction
Signal peptide prediction
Surface accessibility
Propeptide prediction
[Figure: protein from N- to C-terminus: signal peptide, propeptide, mature/active protein]
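One common way to turn an amino acid sequence into numeric input for a neural network is a sliding window of one-hot vectors. The sketch below is an assumption about how such input could be prepared, not the encoding used by any particular predictor; the names `one_hot` and `window_features`, the window size, and the padding symbol `X` are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def one_hot(residue: str) -> list:
    """20-element vector with a single 1.0; all zeros for padding or unknown residues."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def window_features(sequence: str, window: int = 3) -> list:
    """One feature vector per residue, built from a centered window (window must be odd)."""
    half = window // 2
    padded = "X" * half + sequence + "X" * half  # pad so every residue has a full window
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


feats = window_features("MKT", window=3)
print(len(feats), len(feats[0]))
```

Each vector would be paired with a label for the central residue (for example helix or not, exposed or buried) to form one training example.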
30
Prediction of biological features
Surface accessibility: predict surface accessibility from the amino acid sequence only.
31
Logo plots Information content, how is it calculated - what does it mean.
32
Logo plots - Information Content
Sequence logo
Calculate the information content: $I = \sum_a p_a \log_2 p_a + \log_2(4)$; the maximal value is 2 bits.
Total height at a position is the 'Information Content' measured in bits.
Height of a letter is proportional to the frequency of that letter.
A logo plot is a visualization of a multiple alignment.
[Figure: sequence logo with positions annotated "~0.5 each" and "Completely conserved"]
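The information content formula can be tried on a single alignment column with a short Python sketch. The function names are illustrative, the `alphabet_size=4` default assumes DNA (hence the 2-bit maximum), and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int = 4) -> float:
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    counts = Counter(column)
    total = len(column)
    entropy_term = sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column: str, alphabet_size: int = 4) -> dict:
    """Height of each letter: its frequency times the column's information content."""
    info = information_content(column, alphabet_size)
    total = len(column)
    return {letter: (n / total) * info for letter, n in Counter(column).items()}


print(information_content("AAAA"))  # completely conserved column
print(information_content("AACC"))  # two letters at frequency 0.5 each
print(information_content("ACGT"))  # all four letters equally frequent
```

A completely conserved column reaches the maximum of 2 bits, while a column with all four bases equally frequent carries no information.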