Published by Aldous Stanley; modified over 9 years ago
1
Bioinformatics Computing 1 CMP 807 – Day 2 Kevin Galens
2
Today’s Objectives
  Sequence Alignment
    Global
    Local
    Substitution Matrices
  DNA Sequencing
  BLAST Algorithm
  Install Software:
    BLAST DB
    EMBOSS: emboss.open-bio.org
    ClustalW: ftp.ebi.ac.uk
  File Formats
3
Fundamentals of Sequence Alignment
4
Global Alignment: Needleman-Wunsch
  What is a global alignment?
    Uses the whole length of both sequences
    Result: one optimal alignment (several alignments can tie for the best score)
  Needleman-Wunsch:
    Uses a 2-D matrix
  Scenario:
    Align: COELACANTH and PELICAN
    Match: +1
    Mismatch: -1
    Gap: -1
5
Global Alignment: Needleman-Wunsch
7
Resulting alignment:
  COELACANTH
  P-ELICAN--
or
  COELACANTH
  -PELICAN--
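The matrix fill and traceback behind this result can be sketched in a few lines of Python. This is a minimal illustration under the slide's scoring (+1 match, -1 mismatch, -1 gap), not the course's own code; the function name `needleman_wunsch` and its tie-breaking order (diagonal first, then up, then left) are assumptions, so it reports only one of the equally good alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("COELACANTH", "PELICAN")
    print(total)   # 0: five matches, two mismatches, three gaps
    print(top)
    print(bottom)
```

Both alignments on the slide score 0 under this scheme, which is why either can be reported as the result.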
8
Local Alignment: Smith-Waterman
  What is a local alignment?
    Finds the highest-scoring pair of substrings, one from each sequence
    No assumption on sequence length
  Smith-Waterman:
    Uses a 2-D matrix
  Scenario:
    Align: COELACANTH and PELICAN
    Match: +1
    Mismatch: -1
    Gap: -1
9
Local Alignment: Smith-Waterman
11
Resulting alignment:
  ELACAN
  ELICAN
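A short Python sketch of the local version, again assuming the slide's scoring (+1 match, -1 mismatch, -1 gap). The differences from the global case are that no cell is allowed to fall below 0 and that the traceback starts at the highest-scoring cell and stops at the first 0. The function name `smith_waterman` is illustrative only.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,                                # start a fresh local alignment
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("COELACANTH", "PELICAN"))
    # (4, 'ELACAN', 'ELICAN')
```

The score of 4 comes from five matches (E, L, C, A, N) and one mismatch (A against I).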
12
Sequence Alignment
  More sophisticated scoring: Substitution Matrix
  PAMX (Point Accepted Mutation)
    Derived from closely related proteins and scaled according to evolutionary distance
    PAM1 = 1% of amino acid positions have changed
    PAM250: most commonly used
  BLOSUMX (BLOcks SUbstitution Matrix)
    Derived from more distantly related proteins
    BLOSUM62: built from aligned blocks in which sequences with 62% identity or more are clustered together
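Entries in matrices of this kind are log-odds scores: how much more often two residues are seen aligned in related proteins than chance alone would predict. The sketch below shows that calculation only; the frequencies passed in are made-up numbers, not values from any PAM or BLOSUM table, and the default `scale` of 2 (half-bit units, the convention in which BLOSUM62 is usually distributed) is an assumption for illustration.

```python
import math


def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Rounded log-odds score for aligning residue i with residue j.

    q_ij  : observed frequency of the pair (i, j) in trusted alignments
    p_i   : background frequency of residue i
    p_j   : background frequency of residue j
    scale : multiplier applied to the log2 ratio (2.0 gives half-bit units)
    """
    return round(scale * math.log2(q_ij / (p_i * p_j)))


if __name__ == "__main__":
    # Hypothetical frequencies, chosen only to show the sign of the score.
    print(log_odds_score(0.04, 0.1, 0.1))    # pair seen 4x more than chance: positive
    print(log_odds_score(0.0025, 0.1, 0.1))  # pair seen 4x less than chance: negative
```

A positive entry marks a substitution that is tolerated more often than expected, a negative entry one that is selected against.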
13
Sequence Alignment Questions?
14
DNA Sequencing
15
Sanger (Chain-Termination) Sequencing
  Sanger
  Purified DNA
    Isolation via a clone (plasmid/phage)
    Polymerase Chain Reaction (PCR)
  ddNTP: chain-terminating nucleotide with a fluorescent (or radioactive) label
  Denature DNA
  Reanneal with primer
  Elongate (random-length fragments because of ddNTP)
16
DNA Sequencing
  Sanger (Chain-Termination) Sequencing
    PCR yields: random-length pieces of labeled DNA
    Gel electrophoresis
      DNA has a net negative (-) charge
      Separates DNA by size
      Largest fragments move slowest
      Smallest fragments move fastest
    Sequencing gel lanes: A C G T
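Why sorting fragments by size recovers the sequence can be shown with a toy model. The sketch assumes an idealized reaction in which exactly one fragment terminates at every position of the newly synthesized strand, and it ignores the primer and the complementarity between template and product; the function names are hypothetical.

```python
import random


def chain_terminated_fragments(strand):
    """One fragment per position: synthesis stopped by a ddNTP at that base."""
    return [strand[:n] for n in range(1, len(strand) + 1)]


def read_gel(fragments):
    """Read the labeled terminal base of each fragment, shortest first.

    Shorter fragments migrate faster through the gel, so ordering by length
    mimics reading the gel from the far end back toward the wells.
    """
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


if __name__ == "__main__":
    fragments = chain_terminated_fragments("GATTACA")
    random.shuffle(fragments)      # the reaction tube holds them unordered
    print(read_gel(fragments))     # GATTACA
```

Each fragment's last base carries the label, so the ordered labels spell out the strand one position at a time.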
17
DNA Sequencing
  Modern DNA Sequencing
    Capillary gel electrophoresis
    Read fluorescence
18
BLAST
  Filter regions of ‘low complexity’
    GGGGGGGGG becomes XXXXXXXXX
  Generate a list of words from the query
    Length 3 for protein
    Length 11 for DNA
    ABC, BCD, CDE…
  Find all similar words among the database sequence words
    Impose a cutoff score (T) for a given word’s ‘neighborhood’
    Limits the search space
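The word list and the neighborhood cutoff can be sketched as follows. This is only an illustration of the idea: `toy_score` (+1 match, -1 mismatch) stands in for a real substitution matrix, the four-letter alphabet is arbitrary, and enumerating every candidate word is far slower than what a real BLAST implementation does.

```python
from itertools import product


def query_words(query, k):
    """All overlapping words of length k, in order of position."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def neighborhood(word, alphabet, score, threshold):
    """Every word of the same length scoring at least `threshold` against `word`."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(x, y) for x, y in zip(word, candidate))
        if total >= threshold:
            hits.append("".join(candidate))
    return hits


def toy_score(x, y):
    """Placeholder scoring; a real search would look up a substitution matrix."""
    return 1 if x == y else -1


if __name__ == "__main__":
    print(query_words("ABCDE", 3))                     # ['ABC', 'BCD', 'CDE']
    print(len(neighborhood("ABC", "ABCD", toy_score, threshold=1)))  # 10
```

Raising the threshold T shrinks each neighborhood, which is how the cutoff limits the search space: with a threshold of 3 in this toy setup only the exact word survives.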
19
BLAST
20
Scan the entire sequence database with the high-scoring words
  Use a precomputed word index (for example a lookup table) for speed
  If a word matches…
    Extend the alignment in both directions
    Until the score drops a set amount below the best score seen so far
  Throw out High-scoring Segment Pairs (HSPs) below a cutoff
  Assess significance of the HSP score
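The extension step can be sketched as an ungapped walk outward from the matching word that gives up once the running score has fallen more than a chosen amount below its best value. The names `extend_seed` and `x_drop`, the +1/-1 scoring, and the decision to ignore gaps are all assumptions made for this illustration.

```python
def _best_extension(query, subject, i, j, step, x_drop, pair):
    """Walk from (i, j) in direction `step`; return (best gain, length at best)."""
    best, running, best_len, length = 0, 0, 0, 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += pair(query[i], subject[j])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break                      # score dropped too far below the best
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, k, x_drop=2, match=1, mismatch=-1):
    """Extend a length-k word hit in both directions without gaps.

    Returns (score, query_segment, subject_segment) for the resulting
    high-scoring segment pair.
    """
    def pair(x, y):
        return match if x == y else mismatch

    seed = sum(pair(query[q_pos + n], subject[s_pos + n]) for n in range(k))
    right, r_len = _best_extension(query, subject, q_pos + k, s_pos + k, 1, x_drop, pair)
    left, l_len = _best_extension(query, subject, q_pos - 1, s_pos - 1, -1, x_drop, pair)
    q_start, s_start = q_pos - l_len, s_pos - l_len
    span = l_len + k + r_len
    return (seed + left + right,
            query[q_start:q_start + span],
            subject[s_start:s_start + span])


if __name__ == "__main__":
    # The word CAN starts at index 4 in PELICAN and index 5 in COELACANTH.
    print(extend_seed("PELICAN", "COELACANTH", 4, 5, k=3, x_drop=2))
    # (4, 'ELICAN', 'ELACAN')
    print(extend_seed("PELICAN", "COELACANTH", 4, 5, k=3, x_drop=0))
    # (3, 'CAN', 'CAN')
```

With a tolerance of 2 the extension survives the I/A mismatch and picks up the EL match beyond it; with a tolerance of 0 it stops at the first mismatch. Segment pairs whose final score falls below the reporting cutoff would then be discarded.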
21
BLAST Questions?
22
Common Bioinformatics File Formats
23
File Formats FASTA/PIR GenBank/EMBL/DDBJ Swiss-Prot PDB
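Of these, FASTA is the simplest: each record is a header line beginning with `>` followed by one or more lines of sequence. A minimal reader might look like the sketch below; the function name `parse_fasta` and the sample records are made up for illustration, and real files are better handled with an established library.

```python
def parse_fasta(text):
    """Map each header (without the leading '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                     # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    sample = """>seq1 hypothetical example
COELACANTH
>seq2
PELI
CAN
"""
    print(parse_fasta(sample))
    # {'seq1 hypothetical example': 'COELACANTH', 'seq2': 'PELICAN'}
```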