Published by Ralf Gilbert. Modified over 9 years ago.
1
http://www.faculty.ucr.edu/~tgirke/Teaching/Gen240B_2003.ppt Web-based/Open-source Tools for Bioinformatics and Genome Analysis
2
Bioinformatics Areas A. Traditional Bioinformatics Sequence analysis Gene expression analysis Proteomics Metabolic profiling Phenotypes Networks B. Structural Bioinformatics Molecular modeling Drug design C. Biological Databases Systems Biology
3
Focus of this Seminar 1. Sequences 2. Structure 3. Expression 4. Functional Groups Bio* Projects and Databases
4
1. Some Analysis Steps Fragment Assembly: ESTs and genes Mapping Annotation Gene predictions ORFs, UTRs, introns, exons, promoters Lots of errors in eukaryote genomes!! Similarity searches BLAST, FASTA, Smith-Waterman Gene families Domain databases Multiple alignments Structure/Function 2D, 3D structure (availability?)
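As a rough illustration of the ORF step listed above, the following sketch scans the three forward reading frames for ATG-to-stop stretches. The function name `find_orfs` and the `min_codons` cutoff are illustrative assumptions, not part of the seminar material. Real gene prediction in eukaryotes also has to handle introns and the reverse strand, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs in the forward frames.

    Coordinates are 0-based and end-exclusive; the stop codon is included
    in the reported span and in the codon count.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

Only the outermost ATG of each open stretch is reported; internal start codons are treated as ordinary codons.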
5
Important Sequence Databases Selection NCBI Entrez: http://www.ncbi.nlm.nih.gov/ Batch Entrez: http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi Downloads: ftp://ftp.ncbi.nih.gov/blast/db/ EMBL-EBI General: http://www.ebi.ac.uk/ Downloads: http://www.ebi.ac.uk/FTP/ Swiss-Prot General: http://us.expasy.org/ Downloads: http://us.expasy.org/expasy_urls.html TIGR General: http://www.tigr.org/ Downloads: ftp://ftp.tigr.org/pub/data/ Protein Data Bank (PDB) General: http://www.rcsb.org/pdb/ Downloads: ftp://ftp.rcsb.org/pub/pdb/data
6
Example: NCBI
7
Sequence Database Searches Important search algorithms Smith-Waterman, FASTA, BLAST BLAST Flavors: http://www.ncbi.nlm.nih.gov/Sitemap/index.html#BLAST BLAST: BLASTN, BLASTP, TBLASTN, TBLASTX Psi-BLAST: Position-Specific Iterated BLAST RPS-BLAST: Reverse Position-Specific BLAST Phi-BLAST: Pattern Hit Initiated BLAST Mega-BLAST: 10x faster than BLASTN BLAST2: pairwise comparisons WU-BLAST: Washington University BLAST Download of NCBI BLAST tools: ftp://ftp.ncbi.nih.gov/toolbox/
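To make the Smith-Waterman entry above more concrete, here is a minimal sketch of the local alignment score computed by dynamic programming. The scoring values (match 2, mismatch -1, linear gap -2) are arbitrary assumptions chosen for illustration. Practical searches use substitution matrices such as BLOSUM and affine gap penalties, and BLAST and FASTA are faster heuristics rather than this exact recurrence.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The sketch keeps only two rows of the matrix, so it reports the score but not the aligned residues; recovering the alignment would require storing the full matrix and tracing back from the best cell.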
8
Homework Assignment (finish only one assignment!)
- Go to http://www.ncbi.nlm.nih.gov/, select the protein DB and run the query: P450 & hydroxylase & human [organism]. Under ‘Limits’, select SwissProt.
  - Report the final query syntax from the ‘Details’ page.
- Save the GIs from this final query to a file (select ‘GI List’ format under display).
  - Report how many GIs you retrieved.
- Retrieve the corresponding sequences through Batch-Entrez (http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi), using the GI list file as query input, and save the sequences in FASTA format.
- Generate a multiple alignment and tree of these sequences using MultAlin (http://prodes.toulouse.inra.fr/multalin/multalin.html).
  - Save the multiple alignment and tree to file.
  - Identify the putative heme-binding cysteine.
- Open the corresponding SwissProt page (http://us.expasy.org/sprot/) for the first P450 sequence in your list.
  - Compare the putative heme-binding cysteine with the consensus pattern from the Prosite database.
  - Report the corresponding Pfam ID.
  - How many mouse (Mus musculus) sequences are in this family? (Use ‘species tree’ on the Pfam db.)
- Run BLASTP against the nr database (again using the first P450 in your list), select “See Conserved Domains from CDD” (this runs RPS-BLAST) and click on the red P450 domain.
  - Compare the resulting alignment with the result from MultAlin.
  - View the 3D structure in Cn3D, highlight the heme-binding cysteine and save the structure (screen shot).
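The Prosite comparison in the assignment above relies on PROSITE's pattern notation. The sketch below shows one way to translate the most common pattern elements into a Python regular expression and scan a sequence with it. The helper names `prosite_to_regex` and `find_pattern_hits` are hypothetical, and the pattern and sequence in the usage lines are toy examples, not the actual P450 signature, which should be taken from the Prosite entry itself. Anchors (`<`, `>`) are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported elements: single residues, x (any residue), [ABC] (allowed
    residues), {ABC} (excluded residues), and repeats such as x(2) or x(2,4).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue part from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, lo, hi = m.groups()
        if residue == "x":
            residue = "."  # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # excluded residues
        # [ABC] and single letters are already valid regex fragments.
        if lo and hi:
            residue += "{%s,%s}" % (lo, hi)
        elif lo:
            residue += "{%s}" % lo
        parts.append(residue)
    return "".join(parts)


def find_pattern_hits(sequence, pattern):
    """Return (0-based start, matched text) for non-overlapping pattern hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    toy_pattern = "[FW]-x(2)-{P}-C-[LIV]"  # toy pattern, not a real Prosite entry
    print(prosite_to_regex(toy_pattern))
    print(find_pattern_hits("MKFGSACLTT", toy_pattern))
```

Because `re.finditer` reports non-overlapping matches only, overlapping occurrences of a pattern would need a lookahead-based search instead.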
9
Remote Homology Detection Psi-BLAST/RPS-BLAST HMMs: HMMER, SAM Domain databases Fold recognition approaches (Meta Servers)
10
Protein Domain Databases Selection PFAM http://pfam.wustl.edu/ PROSITE http://us.expasy.org/prosite/ ProDom http://prodes.toulouse.inra.fr/prodom/2002.1/html/home.php InterPro http://www.ebi.ac.uk/interpro/
11
Selection of Tools for Promoter Analysis Verbumculus, UC Riverside http://www.cs.ucr.edu/%7Estelo/Verbumculus/ AlignACE & ScanACE http://arep.med.harvard.edu/mrnadata/mrnasoft.html MEME and META-MEME, San Diego Supercomputer Center: http://www.sdsc.edu/Research/biology/ Regulatory Sequence Analysis Tools (RSA) http://rsat.ulb.ac.be/rsat/ Gibbs Motif Sampler, Cold Spring Harbor: http://argon.cshl.org/ioschikz/gibbsDNA/mgibbsDNA-form.html Motif Sampler, searches for over-represented motifs http://www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html Stanford, motif finding in upstream sequences http://genome-www4.stanford.edu/cgi-bin/ewing/oligoAnalysis.pl
12
Example: RSA
13
Promoter Databases Selection Regulatory Sequence Analysis Tools (RSA) http://rsat.ulb.ac.be/rsat/ Eukaryotic Promoter Database http://www.epd.isb-sib.ch/ Human Promoter Database http://zlab.bu.edu/%7Emfrith/HPD.html Arabidopsis http://exon.cshl.org/cgi-bin/atprobe/atprobe.pl
14
Alternative Homework Do only one assignment! Work through tutorial of Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Provide short summary for different tools
15
2. Protein Modeling Tool collection: http://faculty.ucr.edu/~tgirke/Links.htm Databases: Protein Data Bank: General: http://www.rcsb.org/pdb/ Downloads: ftp://ftp.rcsb.org/pub/pdb/data More databases: http://faculty.ucr.edu/~tgirke/Links.htm#Databases
16
3. Microarrays and Chips Definition: Hybridization-based technique that allows simultaneous analysis of thousands of samples on a solid substrate. Applications: Examples Transcriptional Profiling Gene copy number Resequencing Genotyping Single-nucleotide polymorphism DNA-protein interaction Insertional library screening Identification of new cell lines Etc. Developing Areas: Protein arrays Chemical arrays
17
Why Microarrays? Simultaneous analysis of over 50,000 genes Signaling and Metabolic Networks Regulatory genes First step in discovery of gene function Prediction of limiting factors in biological processes Rapid analysis of mutants and transgenics Reduce time of costly clinical studies and field trials [Figure: DNA arrays (gene expression). Input samples: WT, mutants, transgenics; treatments (biotic, abiotic, chemicals). Outputs: prognosis, diagnosis, target identification.]
18
Basic Analysis Steps Image analysis Filtering, background correction Standardization, scaling and normalization Significance analysis (replicates) Cluster analysis (time series) Integration with sequence and functional information
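The filtering, background correction and normalization steps listed above can be sketched for a two-color array as follows. This is a simplified illustration under stated assumptions: the function name, the `min_signal` cutoff and global median centering are choices made for this example. Tools such as SNOMAD or R packages offer more refined methods, for example intensity-dependent normalization.

```python
import math
from statistics import median


def normalized_log_ratios(red, green, red_bg, green_bg, min_signal=1.0):
    """Background-correct two channels and return median-centered log2 ratios.

    Spots whose corrected signal is below min_signal in either channel are
    filtered out and reported as None. At least one spot must pass the filter.
    """
    raw = []
    for r, g, rb, gb in zip(red, green, red_bg, green_bg):
        r_corr, g_corr = r - rb, g - gb  # background correction
        if r_corr < min_signal or g_corr < min_signal:
            raw.append(None)  # filtered spot
        else:
            raw.append(math.log2(r_corr / g_corr))
    # Global normalization: shift so that the median log ratio is 0.
    center = median(v for v in raw if v is not None)
    return [None if v is None else v - center for v in raw]


if __name__ == "__main__":
    red = [500, 2100, 8100, 100.5]
    green = [300, 600, 1100, 300]
    background = [100, 100, 100, 100]
    print(normalized_log_ratios(red, green, background, background))
```

Median centering assumes that most genes on the array are not differentially expressed, which may not hold for small or highly targeted arrays.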
19
Planning Steps of Transcriptional Profiling Experiments
1. Biological question(s), e.g.:
   - Which genes are up- or down-regulated in a mutant/transgenic line?
   - Which genes cycle during a series of treatments?
2. Selection of best biological samples
   - Minimize variability in sample collection.
3. Develop validation and follow-up strategy for expected expression hits
   - e.g. real-time PCR and analysis of transgenics or mutants
4. Choose type of experiment
   - pairwise: e.g. WT vs. mutant/transgenic
   - series of time points or treatments: allows cluster analysis
5. Choose reference
   - sample with maximum number of expressed genes (maximum biological information)
   - pooled RNA of all points: less variability from reference, saves chips
[Figure: sample labels WT t1, WT t2, MT t1, MT t2; WT t1, WT t2, WT t3, WT t4, WT t5]
20
Planning Steps of Transcriptional Profiling Experiments
6. How many replicates?
   - biological replicate: starts with sample collection
   - technical replicate: usually starts with the same RNA isolation
   - dye-swaps: (1) WT-Cy3:MT-Cy5, (2) WT-Cy5:MT-Cy3
7. Management of sample collection and RNA isolation
   - Define a “realistic” volume
   - RNA quality tests!
8. cDNA/cRNA labeling
   - Which labeling technique? RNA amplification, reliability, sensitivity, etc.
9. Array hybridizations and post-processing
10. Array scanning
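One common way to use the dye-swap pair described under step 6 is to average the two arrays so that gene-specific dye effects cancel. The sketch below assumes that each array has already been reduced to per-gene log2(Cy5/Cy3) values; the function name `combine_dye_swap` is hypothetical, and this is only one of several ways to analyze dye-swap designs.

```python
def combine_dye_swap(m_array1, m_array2):
    """Combine per-gene log2(Cy5/Cy3) values from a dye-swap pair.

    m_array1: array (1), WT-Cy3:MT-Cy5, so log2(Cy5/Cy3) estimates log2(MT/WT)
    m_array2: array (2), WT-Cy5:MT-Cy3, so log2(Cy5/Cy3) estimates log2(WT/MT)

    Returns (log_ratio, dye_bias): the dye-balanced log2(MT/WT) estimate and
    the part of the signal that follows the dye rather than the sample.
    """
    log_ratio = [(a - b) / 2 for a, b in zip(m_array1, m_array2)]
    dye_bias = [(a + b) / 2 for a, b in zip(m_array1, m_array2)]
    return log_ratio, dye_bias


if __name__ == "__main__":
    # Toy values: true log2(MT/WT) of +1 and -1, each with a dye bias of 0.5
    ratios, bias = combine_dye_swap([1.5, -0.5], [-0.5, 1.5])
    print(ratios, bias)
```

A large dye-bias value for a gene suggests that its apparent regulation on a single array should be treated with caution.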
21
Important Pattern Recognition (clustering) Methods Hierarchical clustering single, average (UPGMA) and complete linkage Non-hierarchical clustering Self Organizing Maps (SOM) k-means Dimension Reduction Analysis Principal Component Analysis Neural Networks & Machine Learning
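As an illustration of the k-means method listed above, here is a minimal pure-Python sketch that clusters expression profiles by squared Euclidean distance. Starting from the first k profiles is an assumption made to keep the example deterministic; practical tools use random or smarter initialization and often correlation-based distances.

```python
def kmeans(points, k, iterations=20):
    """Cluster expression profiles with plain k-means.

    points: list of equal-length numeric profiles (one per gene)
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]  # naive deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    profiles = [[0, 1, 2], [5, 4, 3], [0.1, 1.1, 2.1], [5.2, 4.1, 3.0]]
    labels, _ = kmeans(profiles, k=2)
    print(labels)
```

Unlike hierarchical clustering, k-means needs the number of clusters up front, and its result can depend on the starting centroids.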
22
Tools for Microarray Analysis Image analysis: ScanAlyze Normalization: SNOMAD, R projects Mining/clustering: J-Express, R projects Much more: http://faculty.ucr.edu/%7Etgirke/Links.htm#Profiling
23
Example of an Integrated Clustering Tool: J-Express
24
Microarray Databases Selection Stanford Microarray Database (SMD) http://genome-www5.stanford.edu/MicroArray/SMD/ Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/
25
Alternative Homework Assignment (do only one assignment!)
- Go to the SNOMAD page (Standardization and Normalization of Microarray Data): http://pevsnerlab.kennedykrieger.org/snomadinput.html
- Select “Use an Example dataset to see how SNOMAD works” and choose either option #2 (Incyte dataset) or #3 (Affymetrix dataset). If you prefer, you can use your own or other public data instead. A good resource for downloading public data is the Stanford site: http://genome-www5.stanford.edu/cgi-bin/SMD/publicData.pl
- Select all possible transformations and graphs and submit the data for processing.
- Report: Give a short description (one or two sentences) of each graph/transformation in the returned results.
26
4. Functional Groups Assigning “Biological Meaning” to Profiling Data Protein Families COGs (43 genomes, NCBI): http://www.ncbi.nlm.nih.gov/COG/ Protein Domain Databases (PFAM) Gene Ontology Consortium Definition: controlled vocabulary for all organisms http://www.geneontology.org/ Pathways KEGG Metabolic Pathways http://www.genome.ad.jp/kegg/kegg2.html WIT Database (39 genomes) http://wit.mcs.anl.gov/WIT2/
27
Toolboxes for Bioinformaticians Popular scripting languages Perl: http://www.perl.com/ Python: http://www.python.org/ Bio* modules for processing data from databases and applications BioPerl: http://bio.perl.org/ BioPython: http://biopython.org/ BioJava: http://www.biojava.org/ BioRuby: http://bioruby.org/ Statistics R: http://www.R-project.org BioConductor (Microarray): http://www.bioconductor.org/ Database systems MySQL: http://www.mysql.com/ PostgreSQL: http://www.postgresql.org/