Bioinformatics Computing 1 CMP 807 – Day 2 Kevin Galens.

1 Bioinformatics Computing 1 CMP 807 – Day 2 Kevin Galens

2 Today’s Objectives
  - Sequence Alignment: Global, Local, Substitution Matrices
  - DNA Sequencing
  - BLAST Algorithm
  - Install Software: BLAST DB, EMBOSS (emboss.open-bio.org), ClustalW (ftp.ebi.ac.uk)
  - File Formats

3 Fundamentals of Sequence Alignment

4 Global Alignment: Needleman-Wunsch
  What is global alignment?
    - Uses the whole length of both sequences
    - Result: an optimal end-to-end alignment (several alignments can tie for the best score)
  Needleman-Wunsch:
    - Utilizes a 2-D matrix
  Scenario: align COELACANTH and PELICAN
    - Match: +1
    - Mismatch: -1
    - Gap: -1

5 Global Alignment: Needleman-Wunsch

7 Resulting alignment:
    COELACANTH
    P-ELICAN--
  or
    COELACANTH
    -PELICAN--
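
The COELACANTH/PELICAN scenario can be reproduced with a short Python sketch of Needleman-Wunsch. This is a minimal illustration, not the course's reference implementation: the function name `needleman_wunsch` and the tie-breaking order in the traceback (diagonal, then up, then left) are assumptions made here, and the tie-break decides which of the equally good alignments is reported.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


total, top, bottom = needleman_wunsch("COELACANTH", "PELICAN")
print(total)
print(top)
print(bottom)
```

With the slide's scoring (+1, -1, -1) the optimal global score is 0, and the printed alignment is one of the two shown on the slide.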

8 Local Alignment: Smith-Waterman
  What is a local alignment?
    - Finds the highest-scoring pair of substrings, one from each sequence
    - No assumption on sequence length
  Smith-Waterman:
    - Uses a 2-D matrix
  Scenario: align COELACANTH and PELICAN
    - Match: +1
    - Mismatch: -1
    - Gap: -1

9 Local Alignment: Smith-Waterman

11 Resulting alignment:
    ELACAN
    ELICAN
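
The local alignment of the same two words can be sketched in Python as well. This is a minimal Smith-Waterman illustration under the slide's scoring scheme; the function name `smith_waterman` is an assumption made here. The differences from the global case are that cell scores are clamped at 0 and the traceback starts at the highest-scoring cell and stops at the first 0.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                       # start a fresh alignment here
                          h[i - 1][j - 1] + sub,   # extend diagonally
                          h[i - 1][j] + gap,       # gap in b
                          h[i][j - 1] + gap)       # gap in a
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score falls to 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("COELACANTH", "PELICAN"))
```

For this scenario the best local score is 4 (five matches and one mismatch), covering ELACAN against ELICAN.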

12 Sequence Alignment
  More sophisticated scoring: Substitution Matrix
  PAMX (Point Accepted Mutation)
    - Scaled according to evolutionary distance of closely related proteins
    - PAM1 = 1% of amino acid positions have changed
    - PAM250: most common
  BLOSUMX (BLOcks SUbstitution Matrix)
    - Scaled according to more distantly related proteins
    - BLOSUM62: based on proteins with <=62% identity
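
Entries in substitution matrices of this kind are log-odds scores: how much more often a residue pair is observed in trusted alignments than expected by chance. The Python sketch below illustrates only that calculation. The function name `log_odds_score` and the frequencies passed to it are hypothetical values chosen for illustration, not figures from any published matrix; the default scale of 2 corresponds to half-bit units.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Rounded log-odds score for one residue pair.

    observed: frequency of the pair in trusted alignments (hypothetical input)
    expected: frequency of the pair if residues were paired at random
    scale:    2.0 gives half-bit units (an assumption of this sketch)
    """
    return round(scale * math.log2(observed / expected))


# Hypothetical frequencies, for illustration only.
print(log_odds_score(0.02, 0.01))    # pair seen twice as often as chance
print(log_odds_score(0.005, 0.01))   # pair seen half as often as chance
```

A pair seen more often than chance gets a positive score and a pair seen less often gets a negative one, which is why conservative substitutions score above rare ones.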

13 Sequence Alignment Questions?

14 DNA Sequencing

15 Sanger (Chain-Termination) Sequencing
  - Purified DNA
  - Isolation via a clone (plasmid/phage)
  - Polymerase Chain Reaction (PCR)
  - ddNTP: chain-terminating nucleotide with a fluorescent (or radioactive) label
  - Denature DNA
  - Reanneal with primer
  - Elongate (random-length fragments because of ddNTP)

16 DNA Sequencing: Sanger (Chain-Termination) Sequencing
  PCR yields: random-length pieces of labeled DNA
  Gel electrophoresis
    - DNA has a net negative (-) charge
    - Separates DNA by size
        - Largest fragments move slowest
        - Smallest fragments move fastest
  Sequencing gel lanes: A C G T
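
The logic of reading a sequencing gel can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not a simulation of the chemistry: it assumes exactly one labeled fragment terminates at every position, and the function names `chain_termination_fragments` and `read_gel` are hypothetical.

```python
import random


def chain_termination_fragments(synthesized):
    """One labeled fragment per position: (fragment length, terminating base)."""
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_gel(fragments):
    """Read the bases in order of increasing fragment length.

    The shortest fragments migrate fastest, so reading from the fastest band
    to the slowest recovers the synthesized strand from its start.
    """
    return "".join(base for _, base in sorted(fragments))


fragments = chain_termination_fragments("GATTACA")
random.shuffle(fragments)        # the reaction yields fragments in no particular order
print(read_gel(fragments))       # GATTACA
```

Sorting by length plays the role of electrophoresis, and the terminating base of each fragment plays the role of the lane (or dye color) in which the band appears.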

17 DNA Sequencing
  Modern DNA Sequencing
    - Capillary gel electrophoresis
    - Read fluorescence

18 BLAST
  - Filter regions of ‘low complexity’: GGGGGGGGG becomes XXXXXXXXX
  - Generate a list of words from the query
      - Length 3 for protein
      - Length 11 for DNA
      - ABC, BCD, CDE…
  - Find all similar words among the database sequence words
  - Impose a cutoff score (T) on a given word’s ‘neighborhood’
      - Limits the search space
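
The word-generation and neighborhood steps can be sketched in Python as follows. This is a simplified illustration, not BLAST's actual code: the function names `query_words` and `neighborhood` are hypothetical, and a plain match/mismatch score stands in for the substitution matrix that a protein search would use.

```python
from itertools import product


def query_words(query, k):
    """All overlapping words of length k in the query (ABC, BCD, CDE, ...)."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def neighborhood(word, alphabet, threshold, match=1, mismatch=-1):
    """Every word over `alphabet` scoring at least `threshold` (T) against `word`.

    Assumption of this sketch: match/mismatch scoring replaces a real
    substitution matrix.
    """
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        score = sum(match if c == w else mismatch
                    for c, w in zip(candidate, word))
        if score >= threshold:
            hits.append("".join(candidate))
    return hits


print(query_words("ABCDE", 3))          # ['ABC', 'BCD', 'CDE']
print(neighborhood("ABC", "ABC", 1))    # words with at most one mismatch
```

Raising the threshold T shrinks each neighborhood, which is how the cutoff limits the search space: fewer words need to be looked up in the database.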

19 BLAST

20 Scan the entire sequence database with high-scoring words
  - Use a precomputed word lookup table for speed
  - If a word matches...
      - Extend the alignment in both directions
      - Stop when the score drops a set amount below the best score so far
  - Throw out High-scoring Segment Pairs (HSPs) that score below a cutoff
  - Assess the significance of each HSP score
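
The extension step can be sketched in Python as an ungapped extension with a drop-off limit. This is a minimal illustration under stated assumptions rather than BLAST's implementation: the names `extend_hit`, `_extend` and `x_drop` are hypothetical, scoring is plain match/mismatch, and the seed position is supplied by hand instead of coming from a database scan.

```python
def _extend(query, subject, qi, si, step, x_drop, match, mismatch):
    """Walk in one direction; return (best score gain, residues kept)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        gain += match if query[qi] == subject[si] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain > x_drop:
            break                      # score fell too far below the best seen
        qi += step
        si += step
    return best_gain, best_len


def extend_hit(query, subject, q_pos, s_pos, word_len, x_drop=2,
               match=1, mismatch=-1):
    """Extend a word hit in both directions; return (score, query part, subject part)."""
    seed = sum(match if query[q_pos + k] == subject[s_pos + k] else mismatch
               for k in range(word_len))
    left_gain, left_len = _extend(query, subject, q_pos - 1, s_pos - 1, -1,
                                  x_drop, match, mismatch)
    right_gain, right_len = _extend(query, subject, q_pos + word_len,
                                    s_pos + word_len, 1, x_drop, match, mismatch)
    q_start, s_start = q_pos - left_len, s_pos - left_len
    total_len = left_len + word_len + right_len
    return (seed + left_gain + right_gain,
            query[q_start:q_start + total_len],
            subject[s_start:s_start + total_len])


# Seed: the word CAN, found at offset 5 in COELACANTH and offset 4 in PELICAN.
print(extend_hit("COELACANTH", "PELICAN", 5, 4, 3))
```

Starting from the three-letter seed CAN, the extension tolerates the A/I mismatch because the score never falls more than `x_drop` below the best so far, and it ends with the segment pair ELACAN/ELICAN at score 4.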

21 BLAST Questions?

22 Common Bioinformatics File Formats

23 File Formats
  - FASTA/PIR
  - GenBank/EMBL/DDBJ
  - Swiss-Prot
  - PDB
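
FASTA is the simplest of these formats: each record is a header line beginning with ">" followed by one or more lines of sequence. A minimal Python reader might look like the sketch below. The function name `parse_fasta` and the example records are hypothetical, and established libraries such as Biopython handle many more edge cases.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()        # header without the leading '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)     # sequence may span several lines
    return {header: "".join(parts) for header, parts in records.items()}


# Hypothetical example records.
example = """>seq1 example fish
COELA
CANTH
>seq2 example bird
PELICAN
"""
print(parse_fasta(example))
```

Sequence lines are joined so that a record wrapped over several lines comes back as a single string.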

