Projects….

FASTA Lookup Tables
sequence 1: ACNGTSCHQE
sequence 2: GCHCLSAGQD
[Figure: the two sequences aligned at three different offsets, with the residues that match at each offset marked: "C S Q", "CH", and "G C".]
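The lookup-table idea can be sketched in a few lines of Python. This is a minimal illustration, not the FASTA implementation: the function names `build_lookup` and `diagonal_hits` are hypothetical, and the word length `k=1` is an assumption chosen so that the result lines up with the single-residue matches shown on the slide.

```python
from collections import defaultdict

def build_lookup(seq, k=1):
    """Map every k-letter word in seq to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table

def diagonal_hits(seq1, seq2, k=1):
    """Group word matches by offset (position in seq1 minus position in seq2)."""
    table = build_lookup(seq1, k)
    hits = defaultdict(list)
    for j in range(len(seq2) - k + 1):
        word = seq2[j:j + k]
        for i in table.get(word, []):
            hits[i - j].append(word)
    return dict(hits)

hits = diagonal_hits("ACNGTSCHQE", "GCHCLSAGQD")
# Offsets with the most matches are the most promising diagonals.
for offset, words in sorted(hits.items(), key=lambda kv: -len(kv[1])):
    print(offset, "".join(words))
```

For the two slide sequences, the three best offsets carry the matches C, S, Q (offset 0), C, H (offset 5), and G, C (offset 3). Only one pass over each sequence is needed, which is why the lookup step is fast.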

SSEARCH
Smith-Waterman local alignment, run pairwise against the entire database
Extremely slow
Best for identifying weak, distant relationships
Review of Scoring
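To make the cost concrete, here is a minimal sketch of a Smith-Waterman score computed against every record of a toy database. The scoring scheme (match +2, mismatch -1, linear gap -2), the function names, and the record names are all assumptions made for illustration; SSEARCH itself uses substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (score only, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def search_database(query, database):
    """Score the query against every record; return (score, name), best first."""
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)

# Hypothetical two-record database.
toy_db = {"rec1": "GGACGGG", "rec2": "TTTT"}
ranking = search_database("TTACGTT", toy_db)
print(ranking)
```

Each comparison fills a table of size len(query) × len(record), and every record in the database is visited, which is where the "extremely slow" on the slide comes from.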

Scoring
Scores collected from SW matches against a database of sequences are the BEST scores for each pair, not random scores.
Thus, the distribution is not normal, but positively skewed.
For database searches, we can use the actual scores of all pairwise comparisons in the DB as the set of scores.
Knowing the distribution allows us to compute P(Score ≥ x).
The Gumbel extreme value distribution has 2 parameters: μ (center) and λ (scaling).
[Figure: a normal distribution compared with an extreme value distribution.]
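Once the two parameters are known, the tail probability P(Score ≥ x) follows directly from the Gumbel cumulative distribution. The sketch below assumes the common parameterization in which the scaling parameter acts as an inverse scale, so the CDF is exp(−exp(−λ(x − μ))); the function name `gumbel_tail` and the example parameter values are hypothetical.

```python
import math

def gumbel_tail(x, mu, lam):
    """P(Score >= x) for a Gumbel distribution with center mu and inverse scale lam."""
    # CDF is exp(-exp(-lam * (x - mu))); the tail is 1 - CDF.
    # expm1 keeps precision when the tail probability is very small.
    return -math.expm1(-math.exp(-lam * (x - mu)))

# Example parameter values chosen only for illustration.
mu, lam = 30.0, 0.25
for score in (30, 40, 60):
    print(score, gumbel_tail(score, mu, lam))
```

At x = μ the tail probability is 1 − 1/e, roughly 0.63, and it falls off exponentially as the score grows, which is why high-scoring matches are rarely produced by chance.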

Scoring, cont.
Parameter estimation [μ (center) and λ (scaling)]
Estimate from moments [μ = x̄ − 0.4500σ and λ = 1.2825/σ, where x̄ and σ are the mean and standard deviation of the scores]
Maximum likelihood estimation
[SSEARCH, FASTA] Scores between random sequences increase with sequence length.
For sequences near each length L, plot SW score vs. log(average length).
Fit the scores by linear regression.
High scores and low outliers are trimmed from the regression fit.
"Normalize": subtract the predicted value from the real value.
Compute the z-score: how many standard deviations away the normalized score is.
Z-scores have known extreme value distribution parameters.
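Both estimation steps can be sketched briefly. The moment estimates below assume the scaling parameter is an inverse scale, so that λ = π/(σ√6) ≈ 1.2825/σ and μ = mean − γ/λ ≈ mean − 0.4500σ, with γ the Euler-Mascheroni constant. The length correction is a simplified version of the regression described above: it fits individual scores against log(length) and omits the outlier trimming. Function names and the sample numbers are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_from_moments(scores):
    """Estimate Gumbel center mu and inverse scale lam from mean and std. dev."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (math.sqrt(6) * sd)   # about 1.2825 / sd
    mu = mean - EULER_GAMMA / lam         # about mean - 0.4500 * sd
    return mu, lam

def length_z_scores(scores, lengths):
    """Regress score on log(length), then express residuals in standard deviations."""
    xs = [math.log(n) for n in lengths]
    mx, my = statistics.fmean(xs), statistics.fmean(scores)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = my - slope * mx
    # "Normalize": observed score minus the score predicted for that length.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, scores)]
    sd = statistics.stdev(residuals)
    return [r / sd for r in residuals]

# Made-up numbers, for illustration only.
mu, lam = gumbel_from_moments([10, 12, 14, 16, 18])
z = length_z_scores([20, 25, 29, 36], [100, 200, 400, 800])
print(mu, lam, z)
```

A production fit would bin sequences by length and drop outliers before regressing, as the slide describes; the sketch keeps only the normalize-then-scale logic.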

Profile/Scoring Matrices
So far, the query has been a single sequence.
Compare: the query as a regular expression or other generalized pattern
Example: Position-Specific Scoring Matrix (PSSM)
WHY? Motifs; multiple sequence alignments

PSSM
Query: A M P G V

Position    1    2    3    4
A           4    .    .    .
C           .    .    .    .
G           .    .    2    0
M           1    2    .    .
P           .    3   -1    .
V           .    .    .   -1

The PSSM is slid along the query one position at a time; entries shown as "." contribute 0.
Offset 0 (AMPG): 4+2-1+0 = 5
Offset 1 (MPGV): 1+3+2-1 = 5
Offset 2 (PGV): 0+0+0 = 0
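The sliding-window scoring can be sketched as follows. The matrix values are read off the slide, with two assumptions stated plainly: entries shown as "." are treated as 0, and the entries for P at position 3 and V at position 4 are taken as −1, since that is what the slide's sums (4+2−1+0 and 1+3+2−1) require. The names `PSSM` and `score_at` are hypothetical.

```python
# Rows are residues, columns are the four PSSM positions.
# Assumption: "." on the slide means 0; the two -1 entries are inferred from the sums.
PSSM = {
    "A": [4, 0, 0, 0],
    "C": [0, 0, 0, 0],
    "G": [0, 0, 2, 0],
    "M": [1, 2, 0, 0],
    "P": [0, 3, -1, 0],
    "V": [0, 0, 0, -1],
}
WIDTH = 4

def score_at(query, offset):
    """Sum the PSSM entries for the residues of query starting at offset."""
    total = 0
    for col in range(WIDTH):
        pos = offset + col
        if pos >= len(query):
            break  # the matrix overhangs the end of the query
        total += PSSM[query[pos]][col]
    return total

query = "AMPGV"
scores = [score_at(query, offset) for offset in range(3)]
print(scores)  # [5, 5, 0]
```

Unlike a single-sequence query, each column of the matrix can reward or penalize different residues, which is what lets a PSSM capture a motif derived from a multiple sequence alignment.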