Bioinformatics Shared Resource Bioinformatics : How to… Bioinformatics Shared Resource Kutbuddin Doctor, PhD.

1 Bioinformatics Shared Resource Bioinformatics : How to… Bioinformatics Shared Resource http://bsrweb.burnham.org Kutbuddin Doctor, PhD

2 Bioinformatics Shared Resource Overview Homology Defined Searching for Homologs Homology Exceptions & Warnings Alignments (protein & genomic) BSR Services

3 Bioinformatics Shared Resource Overview Homology Defined Searching for Homologs Homology Exceptions & Warnings Alignments (protein & genomic) BSR Services

4 Bioinformatics Shared Resource Homology Defined Homologs: “Proteins/genes that share a common ancestral protein/gene.” They may share function. Homology is inferred based on sequence similarity. A statistical model decides if similarity is sufficient to infer homology. (never say “% homology” – this is wrong)

5 Bioinformatics Shared Resource Statistical Models Smith-Waterman: optimal* alignment. More true homologs predicted. Slow, NOT heuristic. BLAST: provides reliable predictions. Low false positives. Fast, heuristic method. Protein-based homology: more sensitive models than nucleotide sequence searches. (Multiple) Sequence Alignments: NO! (*Heuristic methods make assumptions at the risk of missing some alignments.)
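To make the "optimal, not heuristic" point concrete, here is a minimal sketch of Smith-Waterman local alignment. The scoring values (match=2, mismatch=-1, linear gap=-2) are illustrative assumptions, not values from the slides; real protein searches typically use a substitution matrix such as BLOSUM62 and affine gap penalties. Every cell of the matrix is filled, which is why the method is exhaustive and slow.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Scoring parameters are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill every cell: no shortcuts, so no alignment can be missed.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

The run time grows with the product of the two sequence lengths, which is the cost that heuristic tools such as BLAST avoid by only extending promising seed matches.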

6 Bioinformatics Shared Resource

7 Bioinformatics Shared Resource Overview Homology Defined Searching for Homologs Homology Exceptions & Warnings Alignments (protein & genomic) BSR Services

8 Bioinformatics Shared Resource Option to Choose species, or.. http://www.ncbi.nlm.nih.gov/blast

9 Bioinformatics Shared Resource

10 Bioinformatics Shared Resource Enter Query sequence. Choose reference DB to search: nr: non-redundant; refseq_protein: curated reference sequences by NCBI; SwissProt: curated reference seq by EMBL / SIB; PDB: 3-D Structure database (not models); env_nr: environmental samples (unknown organism); pat: patented protein sequences. Use default “blastp” algorithm

11 Bioinformatics Shared Resource BLAST Enter Query sequence. Choose reference DB to search: nr: non-redundant; refseq: curated reference sequences by NCBI; SwissProt: curated reference seq by EMBL / SIB; PDB: experimental 3-D Structure database; env_nr: environmental sequences (unknown organism)

12 Bioinformatics Shared Resource Domains

13 Bioinformatics Shared Resource Switch to Browser

14 Bioinformatics Shared Resource BLAST - output

15 Bioinformatics Shared Resource BLAST - output Expectation value (“E-value”): The number of RANDOM alignments you can expect to have with that score or higher given the size of the database searched. This alignment was MUCH BETTER than we expect for a random occurrence. The alignment is most likely due to common ancestry rather than a random chance alignment. 2e-109 = 2 x 10^-109 (0.000000000…..2)
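The dependence of the E-value on database size can be sketched with the standard relation between a normalized bit score S' and the expected number of chance hits, E = m × n × 2^(-S'), where m is the query length and n the total database length. This is a simplified sketch: BLAST itself uses corrected "effective" lengths, and the numbers in the example are made up for illustration.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Ignores BLAST's effective-length (edge effect) corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical query of 300 residues against two database sizes.
    small = expect_value(50.0, 300, 1_000_000)
    large = expect_value(50.0, 300, 10_000_000)
    print(small, large)  # the same score is 10x less significant in the 10x larger DB
```

The same alignment score therefore earns a larger (worse) E-value when the database grows, which is why E-values from searches against different databases are not directly comparable.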

16 Bioinformatics Shared Resource Expectation value (“E-value”): The number of RANDOM alignments you can expect to have with that score or higher given the size of the database searched. We expect ~1 false positive (non-homolog) alignment at this score or better… is this the false positive? What false positive rate can you tolerate? Based on that, you can choose an “E-value cutoff”
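Applying an E-value cutoff can be sketched as a small filter over BLAST+ tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The default cutoff of 1e-5 and the hit rows below are illustrative assumptions, not results from the talk.

```python
def filter_hits(tabular_text, evalue_cutoff=1e-5):
    """Keep hits whose E-value is at or below the cutoff.

    Expects BLAST+ '-outfmt 6' default columns; returns a list of
    (query_id, subject_id, evalue, bitscore) tuples.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])      # column 11: evalue
        bitscore = float(fields[11])    # column 12: bitscore
        if evalue <= evalue_cutoff:
            hits.append((fields[0], fields[1], evalue, bitscore))
    return hits


# Made-up rows for illustration only.
EXAMPLE = "\n".join([
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t2e-109\t390",
    "query1\tsubjB\t27.0\t60\t40\t2\t5\t64\t10\t70\t1.2\t28.1",
])

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE):
        print(hit)
```

A stricter cutoff lowers the false positive rate at the cost of discarding distant true homologs, so the right value depends on how many false positives the downstream analysis can tolerate.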

17 Bioinformatics Shared Resource BLAST - output Better statistical models are able to move this “FALSE NEGATIVE” to a “TRUE POSITIVE” prediction.

18 Bioinformatics Shared Resource PSI-BLAST (not default)

19 Bioinformatics Shared Resource BLASTp vs. PSI-BLAST [Diagram. BLASTp: Query + Fixed Statistical Model, searched against REFERENCE DATABASE (RefSeq), gives Alignments. PSI-BLAST: Query + Fixed Statistical Model gives Alignments and Homologs, which are used to build a Modified Statistical Model.]

20 Bioinformatics Shared Resource BLASTp vs. PSI-BLAST [Same diagram, adding the iteration step: the Modified Statistical Model is searched against the REFERENCE DATABASE (RefSeq) again to find more Homologs.]
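The "modified statistical model" in PSI-BLAST is a position-specific scoring matrix built from the homologs found in the previous round. The following is a deliberately simplified sketch of that idea, assuming ungapped, equal-length aligned fragments, a uniform background frequency, and a flat pseudocount. Real PSI-BLAST uses sequence weighting, data-dependent pseudocounts, and gapped alignments; the function names here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build log-odds scores per alignment column from aligned homolog fragments.

    Simplifying assumptions: no gaps, uniform background (1/20), flat pseudocount.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score(pssm, seq):
    """Score a candidate fragment against the position-specific model."""
    return sum(col[aa] for col, aa in zip(pssm, seq))


if __name__ == "__main__":
    # Toy "homologs from the previous iteration" (made up for illustration).
    model = build_pssm(["ACD", "ACE", "ACD"])
    print(score(model, "ACD"), score(model, "WWW"))
```

Because the scores now reflect which residues are conserved at each position in the family, a distant relative that matches the conserved positions can score well even when its overall similarity to the original query is weak.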

21 Bioinformatics Shared Resource PSI-BLAST Iterations QUERY: Hu RAIDD-2 protein DB: refseq_protein

22 Bioinformatics Shared Resource ORIGINAL BLAST ITERATIVE BLAST : PSI-BLAST

23 Bioinformatics Shared Resource Nucleotide BLAST Protein vs DNA BLAST: searches with the encoded protein are better. Translated DNA BLAST: TBLASTN searches a protein query against a nucleotide database translated in all 6 frames; TBLASTX translates both query and reference database to protein (6 frames). BLAST is then run as a protein vs protein search.
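The six-frame translation step behind translated searches can be sketched as follows. This is an illustrative stand-alone example using the standard genetic code, not the code BLAST itself runs; codons containing characters other than A, C, G, T are rendered as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

Searching at the protein level after translation lets synonymous codon changes drop out of the comparison, which is one reason protein-level searches detect more distant homologs than nucleotide-level ones.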

24 Bioinformatics Shared Resource Nucleotide Databases BLAST DATABASES UCSC genome – http://genome.ucsc.edu – “BLAT” search

25 Bioinformatics Shared Resource

26 Bioinformatics Shared Resource Choose which species to show in synteny tracks

27 Bioinformatics Shared Resource Homology Search NCBI BLAST & UCSC BLAT on WEB: (as shown). Local BLAST: Specialized Reference DB. 10+ queries. BLAST as service: BSR – Compute Cluster: 132 CPU. Example: Smith-Waterman, 1000 siRNA. Single computer: 4,000 hrs. BSR cluster: 8 hrs. Also “Small jobs”: single families, multi-domain families

28 Bioinformatics Shared Resource ALIGNMENTS BLAST alignments: specialized for searches. Multiple Sequence Alignments. – ClustalW, T-Coffee, COBALT, etc. – Not a search alignment: goals differ. Structure-based Alignments. – Some based only on structure; query is a structure – FatCat (Godzik lab) / CE (Bourne lab)

29 Bioinformatics Shared Resource

30 Bioinformatics Shared Resource

31 Bioinformatics Shared Resource From EBI - ClustalW

32 Bioinformatics Shared Resource Parting TIPS.. Biobar - great browser bar for biologists: BSR website: http://bsrweb.burnham.org

33 Bioinformatics Shared Resource Contact http://bsrweb.burnham.org/ Myself: Kutbuddin Doctor, PhD Bldg 10, Rm 1205 (downstairs) x3488 ; ksdoctor@burnham.org

