Projects….

FASTA Lookup Tables
sequence 1: ACNGTSCHQE
sequence 2: GCHCLSAGQD
[Figure: sequence 1 aligned against sequence 2 at three offsets, with the matching residues marked at each offset: C, S, Q with no shift; G, C with sequence 2 shifted right by 3; CH with sequence 2 shifted right by 5.]

SSEARCH
Smith-Waterman local alignment, run pairwise against the entire database
Extremely slow
Best for identifying weak, distant relationships
Review of Scoring
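To make the cost concrete, here is a minimal sketch of a Smith-Waterman score computed against every database entry. It is not SSEARCH's implementation: the match, mismatch, and gap values are hypothetical placeholders (real searches typically use a substitution matrix and affine gap penalties), and `search_database` is an illustrative name.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def search_database(query, database):
    """Score the query against every entry; highest scores first."""
    hits = [(smith_waterman_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)

print(search_database("GGACGTGG", {"rec1": "TTACGTTT", "rec2": "CCCCCCCC"}))
```

Each comparison fills a table of size len(query) × len(record), and this is repeated for every record, which is why the full search is slow.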

Scoring
Scores collected from SW matches against a database of sequences are the BEST scores for each pair, not random.
Thus, the distribution is not normal, but positively skewed.
For database searches, we can use the actual scores of all pairwise comparisons in the DB as the set of scores.
Knowing the distribution allows us to compute P(Score ≥ x).
The Gumbel extreme value distribution has 2 parameters: μ (center) and λ (scaling).
[Figure panels: Normal; Extreme Value]
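As a small illustration of computing P(Score ≥ x), the sketch below uses the standard Gumbel tail formula P = 1 − exp(−e^(−λ(x − μ))). It assumes the scaling parameter is used as an inverse scale (larger values give a narrower distribution); the function name and the example parameter values are hypothetical.

```python
import math

def gumbel_tail_probability(x, mu, lam):
    """P(Score >= x) for an extreme value distribution.

    mu is the center (mode); lam is the inverse scale.
    expm1 keeps precision when the probability is very small.
    """
    return -math.expm1(-math.exp(-lam * (x - mu)))

# Hypothetical parameters, for illustration only.
print(gumbel_tail_probability(60.0, mu=30.0, lam=0.2))
```

At x = μ the tail probability is 1 − 1/e (about 0.63), which reflects the positive skew: well over half of the scores lie above the mode.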

Scoring, cont.
Parameter Estimation [μ (center) and λ (scaling)]
Estimate from moments [μ = x̄ − 0.4500σ and λ = 1.2825/σ, where x̄ and σ are the mean and standard deviation of the scores]
Maximum likelihood estimation
[SSEARCH, FASTA] Scores between random sequences increase with sequence length.
For each seq. near length L, plot SW-score vs. log(avg. LENGTH)
Fit scores by linear regression
High scores and low outliers are trimmed from the regression fit.
"Normalize": subtract predicted value from real value
Compute z-score: how many standard deviations away the normalized score is
Z-scores have known extreme value distribution parameters.
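The moment estimates and the length normalization can be sketched as below. This is a simplified illustration under stated assumptions, not the FASTA or SSEARCH procedure. The scaling parameter is treated as an inverse scale, so it is computed as 1.2825 divided by the standard deviation. The regression is fitted to individual sequences rather than to length bins, and the trimming of high scores and outliers is omitted. All function names are hypothetical.

```python
import math
import statistics

def gumbel_params_from_moments(scores):
    """Method-of-moments estimate of (mu, lam) from sample mean and sd."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return mean - 0.4500 * sd, 1.2825 / sd

def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def length_normalized_zscores(scores, lengths):
    """Regress score on log(length), then express residuals in sd units."""
    xs = [math.log(n) for n in lengths]
    slope, intercept = fit_line(xs, scores)
    residuals = [s - (slope * x + intercept) for s, x in zip(scores, xs)]
    sd = statistics.pstdev(residuals)
    return [r / sd for r in residuals]

# Hypothetical scores and sequence lengths, for illustration only.
z = length_normalized_zscores([20, 25, 30, 35, 90], [100, 200, 400, 800, 1600])
print([round(v, 2) for v in z])
```

A production version would add the binning and trimming steps so that true homologs do not distort the fitted line.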

Profile/Scoring Matrices
So far, the query is a single sequence
Compare: query as a regular expression or other generalized pattern
Example: Position-Specific Scoring Matrix (PSSM)
WHY?
Motifs
Multiple sequence alignments

PSSM
Query sequence: A M P G V
Scoring matrix (one row per residue, one column per position; "." marks entries not shown):
A  4  .  .  .
C  .  .  .  .
G  .  .  2  0
M  1  2  .  .
P  .  3  1  .
V  .  .  .  1
-
Scores: 4+2-1+0=5; 1+3+2-1=5; 0+0+0=0
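Scoring a sequence against a PSSM amounts to sliding a window along the sequence and summing one matrix entry per position. The sketch below uses only the entries visible in the matrix above and assumes every entry not shown is 0, so its totals need not match the sums on the slide, which include −1 terms that are not visible in the matrix. The function names are hypothetical.

```python
# Visible matrix entries only, one dict per matrix column.
PSSM = [
    {"A": 4, "M": 1},
    {"M": 2, "P": 3},
    {"G": 2, "P": 1},
    {"G": 0, "V": 1},
]

def score_window(window, pssm, missing=0):
    """Sum the matrix entry for each residue at its position.

    `missing` is the assumed score for entries not given in the matrix.
    """
    return sum(col.get(res, missing) for res, col in zip(window, pssm))

def scan(sequence, pssm, missing=0):
    """Score every full-width window; returns (start index, score) pairs."""
    width = len(pssm)
    return [
        (i, score_window(sequence[i:i + width], pssm, missing))
        for i in range(len(sequence) - width + 1)
    ]

print(scan("AMPGV", PSSM))  # [(0, 7), (1, 7)]
```

Passing a negative `missing` value would penalize residues that the motif does not list, which is closer to how real PSSMs behave.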