Published by Baldric Reed. Modified over 8 years ago.
1
Bioinformatics Shared Resource Bioinformatics : How to… Bioinformatics Shared Resource http://bsrweb.burnham.org Kutbuddin Doctor, PhD
2
Bioinformatics Shared Resource Overview Homology Defined Searching for Homologs Homology Exceptions & Warnings Alignments (protein & genomic) BSR Services
4
Bioinformatics Shared Resource Homology Defined Homologs: “Proteins/genes that share a common ancestral protein/gene.” They may share function. Homology is inferred from sequence similarity: a statistical model decides whether the similarity is sufficient to infer homology. Never say “% homology”; this is wrong. Homology is all-or-nothing, so report “% identity” or “% similarity” instead.
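What can be reported as a percentage is identity, not homology. The sketch below computes percent identity for two sequences that have already been aligned. The function name and the choice of denominator (aligned columns with no gap in either sequence) are assumptions made for illustration; other tools divide by alignment length or by the shorter sequence and will give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must be rows of the same alignment (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0
```

A high value supports an inference of homology but does not measure it; the statistical model still makes that call.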
5
Bioinformatics Shared Resource Statistical Models Smith-Waterman : optimal* alignment. More true homologs predicted. Slow; NOT heuristic. BLAST : Provides reliable predictions. Low false positives. Fast, heuristic method. Protein-based homology : more sensitive models than nucleotide sequence searches. (Multiple) Sequence Alignments : NO! These are not a search method. * Heuristic methods make assumptions at the risk of missing some alignments.
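To make "optimal, not heuristic" concrete, here is a minimal sketch of the Smith-Waterman local alignment score. It fills the full dynamic-programming table, which is why it is slow but cannot miss the best local alignment. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions; real protein searches use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Only two rows of the table are kept in memory, since the score
    of a cell depends on the current and previous rows alone.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # The floor of 0 is what makes the alignment local:
            # a bad prefix is dropped rather than carried along.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

The run time grows with the product of the two sequence lengths for every database entry, whereas a heuristic such as BLAST only extends alignments around short seed matches.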
6
Bioinformatics Shared Resource
7
Bioinformatics Shared Resource Overview Homology Defined Searching for Homologs Homology Exceptions & Warnings Alignments (protein & genomic) BSR Services
8
Bioinformatics Shared Resource Option to Choose species, or.. http://www.ncbi.nlm.nih.gov/blast
9
Bioinformatics Shared Resource
10
Bioinformatics Shared Resource Enter Query sequence Choose reference DB to search nr : non-redundant refseq_protein : curated reference sequences by NCBI SwissProt : curated reference seq by EMBL / SIB PDB : 3-D Structure database (not models) env_nr : environmental samples (unknown organism) pat : patented protein sequences Use default “blastp” algorithm
11
Bioinformatics Shared Resource BLAST Enter Query sequence Choose reference DB to search nr : non-redundant refseq : curated reference sequences by NCBI SwissProt : curated reference seq by EMBL / SIB PDB : experimental 3-D Structure database env_nr : environmental sequences (unknown organism)
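The same choices made on the web form (query, reference database, blastp) can be expressed as a command line for the NCBI BLAST+ `blastp` program. The sketch below only builds the argument list; it does not run anything. The function name and the file names `query.fasta` and `hits.tsv` are hypothetical, and the E-value cutoff of 1e-5 is an arbitrary illustration, not a recommendation from the slides.

```python
def build_blastp_command(query_fasta: str,
                         db: str = "refseq_protein",
                         evalue: float = 1e-5,
                         out_path: str = "hits.tsv",
                         remote: bool = True) -> list[str]:
    """Build a BLAST+ blastp argument list (nothing is executed here)."""
    cmd = [
        "blastp",
        "-query", query_fasta,    # FASTA file holding the query sequence
        "-db", db,                # reference database to search
        "-evalue", str(evalue),   # report only hits at or below this E-value
        "-outfmt", "6",           # tab-separated output, one hit per line
        "-out", out_path,
    ]
    if remote:
        cmd.append("-remote")     # search at NCBI instead of a local database
    return cmd
```

With BLAST+ installed, the list can be passed to `subprocess.run(build_blastp_command("query.fasta"), check=True)`. Setting `remote=False` assumes a local copy of the database is available.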
12
Bioinformatics Shared Resource Domains
13
Bioinformatics Shared Resource Switch to Browser
14
Bioinformatics Shared Resource BLAST - output
15
Bioinformatics Shared Resource BLAST - output Expectation value (“E-value”): The number of RANDOM alignments you can expect to have with that score or higher given the size of the database searched. This alignment was MUCH BETTER than we expect for a random occurrence, so it is far more likely due to common ancestry than to a chance alignment. 2e-109 = 2 × 10^-109 (0.000…2, with 108 zeros after the decimal point)
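The dependence of the E-value on database size can be sketched from the standard relation between a BLAST bit score S' and the E-value, E = m · n · 2^(-S'), where m is the query length and n the total database length. The function names are assumptions for illustration, and real BLAST uses edge-corrected "effective" lengths, so reported values differ somewhat from this simple form.

```python
import math


def evalue_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def prob_at_least_one_chance_hit(evalue: float) -> float:
    """P-value for a given E-value: P = 1 - exp(-E)."""
    return -math.expm1(-evalue)
```

Two consequences follow directly: the same alignment gets a worse (larger) E-value when the database is larger, and every extra bit of score halves the E-value. For very small E-values, P and E are nearly equal.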
16
Bioinformatics Shared Resource Expectation value (“E-value”): The number of RANDOM alignments you can expect to have with that score or higher given the size of the database searched. We expect ~1 false positive (non-homolog) alignment at this score or better… is this the false positive? What false positive rate can you tolerate? Based on that, you can choose an “E-value cutoff”
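Applying an E-value cutoff is a simple filter once the hits are in tabular form. The sketch below assumes BLAST+ tabular output (`-outfmt 6`) with its default 12 columns, where column 2 is the subject ID, column 11 the E-value, and column 12 the bit score. The function name and the default cutoff are illustrative assumptions.

```python
def filter_hits(tabular_text: str, evalue_cutoff: float = 1e-5):
    """Keep (subject_id, evalue, bit_score) for hits at or below the cutoff."""
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= evalue_cutoff:
            kept.append((fields[1], evalue, float(fields[11])))
    return kept


# Two made-up hits for illustration only: one strong, one near E = 1.
example = (
    "q1\tsubjA\t98.0\t200\t4\t0\t1\t200\t1\t200\t2e-109\t390\n"
    "q1\tsubjB\t30.0\t80\t50\t3\t5\t84\t10\t92\t1.1\t30.5\n"
)
strong_hits = filter_hits(example)
```

A stricter cutoff lowers the false positive rate at the cost of discarding weak but real homologs, which is the trade-off the question on this slide is asking about.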
17
Bioinformatics Shared Resource Better statistical models are able to move this “FALSE NEGATIVE” to a “TRUE POSITIVE” prediction. BLAST - output
18
Bioinformatics Shared Resource PSI-BLAST (not default)
19
Bioinformatics Shared Resource BLASTp vs. PSI-BLAST [Diagram: both searches run against the REFERENCE DATABASE (RefSeq). BLASTp: Query → Fixed Statistical Model → Alignments → Homologs. PSI-BLAST: Query → Fixed Statistical Model → Alignments → Homologs → Modified Statistical Model.]
20
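Bioinformatics Shared Resource BLASTp vs. PSI-BLAST [Diagram: both searches run against the REFERENCE DATABASE (RefSeq). BLASTp: Query → Fixed Statistical Model → Alignments → Homologs. PSI-BLAST: Query → Fixed Statistical Model → Alignments → Homologs → Modified Statistical Model → iteration, with the modified model used for the next search.]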
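The "modified statistical model" in PSI-BLAST is a position-specific scoring matrix (PSSM) built from the hits of the previous round. The toy sketch below shows the idea only: it counts residues per alignment column, adds a pseudocount, and converts frequencies to log-odds scores. The uniform background, add-one pseudocount, lack of sequence weighting, and ungapped input are simplifying assumptions; this is not PSI-BLAST's actual procedure.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount: float = 1.0):
    """Build a toy log-odds PSSM from equal-length, ungapped aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for col in range(len(aligned_seqs[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_seqs:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_with_pssm(pssm, seq: str) -> float:
    """Score a sequence position by position; unknown residues score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq))
```

Residues that recur in a column score above zero there and rare ones below, so each position is scored according to the family rather than by one fixed matrix. Searching again with the updated matrix is one iteration.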
21
Bioinformatics Shared Resource PSI-BLAST Iterations QUERY: Hu RAIDD-2 protein DB: refseq_protein
22
Bioinformatics Shared Resource ORIGINAL BLAST ITERATIVE BLAST : PSI-BLAST
23
Bioinformatics Shared Resource Nucleotide BLAST Protein vs DNA BLAST: searches on the encoded protein are better. Translated DNA BLAST: TBLASTN takes a protein query and translates the nucleotide reference database to protein (6 frames); TBLASTX translates both query and reference database (6 frames each). BLAST is then run as a protein vs protein search.
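"Translated in 6 frames" means three reading-frame offsets on the forward strand plus three on the reverse complement. A minimal sketch using the standard genetic code follows; the function names and the frame labels (`+1` … `-3`) are assumptions for illustration, and codons containing anything other than A, C, G, T are rendered as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> dict:
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, `six_frame_translations("ATGGCCTAA")["+1"]` gives `"MA*"`. Each of the six protein strings is then compared as protein, which is why translated searches are more sensitive than comparing the nucleotides directly.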
24
Bioinformatics Shared Resource Nucleotide Databases BLAST DATABASES UCSC genome –http://genome.ucsc.edu –“BLAT” search
25
Bioinformatics Shared Resource
26
Bioinformatics Shared Resource Choose which species to show in synteny tracks
27
Bioinformatics Shared Resource Homology Search NCBI BLAST & UCSC BLAT on WEB: (as shown) Local BLAST: Specialized Reference DB. 10+ queries. BLAST as service: BSR – Compute Cluster : 132 CPU Example : Smith-Waterman 1000 siRNA single computer: 4,000 hrs BSR cluster : 8 hrs Also “Small jobs” : single families, multi-domain families
28
Bioinformatics Shared Resource ALIGNMENTS BLAST alignments : specialized for searches. Multiple Sequence Alignments. –ClustalW, T-coffee, COBALT, etc. –Not a Search alignment.. Goals differ Structure-based Alignments. –Some based only on structure, query is structure –FatCat (Godzik lab) / CE (Bourne lab)
29
Bioinformatics Shared Resource
30
Bioinformatics Shared Resource
31
Bioinformatics Shared Resource From EBI - ClustalW
32
Bioinformatics Shared Resource Parting TIPS.. Biobar - great browser bar for biologists: BSR website: http://bsrweb.burnham.org
33
Bioinformatics Shared Resource Contact http://bsrweb.burnham.org/ Myself: Kutbuddin Doctor, PhD Bldg 10, Rm 1205 (downstairs) x3488 ; ksdoctor@burnham.org