Public data and tool repositories
Section 2: Survey of analysis tools and tutorials
Problems from last section
  Query Entrez Gene with the following two queries separately, then explain the differences between the two results using a logical NOT operation:
    tyrosine kinase[Gene Ontology] AND human[Organism]
    cd00192[Domain Name] AND human[Organism]
  Retrieve the APP gene record from NCBI and use the Display dropdown menu to display Conserved Domain Links. Use the IDs of the listed domains to query Entrez Gene for records with the same domains.
  Use the SNP GeneView link at NCBI to identify coding SNPs in the APP gene. Which SNP that was present in the Ensembl APP protein record is missing from this display?
  Use the HomoloGene link at NCBI to identify possible functional orthologs of human APP. How does this list compare with the Ensembl list of orthologs that we reviewed previously?
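The NOT comparison in the first problem can be sketched in Python. This is a minimal illustration, not an official NCBI client: the helper names `entrez_not` and `esearch_url` are hypothetical, the URL is only built (never sent), and the ID sets at the end are made up to show that the same comparison on retrieved results is a set difference.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint; the URL is constructed but not requested here.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_not(query_a: str, query_b: str) -> str:
    """Return an Entrez query for records matching query_a but not query_b."""
    return f"({query_a}) NOT ({query_b})"


def esearch_url(term: str, db: str = "gene", retmax: int = 20) -> str:
    """Build an esearch URL for the given term (hypothetical helper)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


go_query = "tyrosine kinase[Gene Ontology] AND human[Organism]"
domain_query = "cd00192[Domain Name] AND human[Organism]"

# Records annotated as tyrosine kinases that lack the cd00192 domain, and the reverse.
only_go = entrez_not(go_query, domain_query)
only_domain = entrez_not(domain_query, go_query)

print(only_go)
print(esearch_url(only_go))

# The same comparison on already-retrieved ID lists is a set difference.
# These IDs are invented for illustration only.
go_ids = {"101", "102", "103"}
domain_ids = {"102", "103", "104"}
print(sorted(go_ids - domain_ids))
print(sorted(domain_ids - go_ids))
```

Pasting the string in `only_go` into the Entrez Gene search box should give the same result as requesting the generated URL, assuming the field names are accepted as written in the problem.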
Review of last section
  Example: human APP gene
  NCBI Entrez databases
    Constructing queries
    Gene, Nucleotide and Protein
    RefSeq
  UCSC Genome Browser
    Finding genes
    Displaying data tracks
    Comparing data from different sources
  EBI/Ensembl
    Viewing genes, transcripts, exons, proteins and SNPs
  Common ID and data formats
This section
  Protein structure visualization/analysis example
  Promoter/enhancer analysis example
  More information
Amyloid Precursor Protein (APP)
  G-protein coupled receptor that binds heparin and laminin
  [Figure labels: β-secretase; γ-secretase; peptide sequence DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA; amyloid fibril; amyloid plaque]
  Ex: Viewing the structure of an amyloid fibril
Other structure tools
  Structure visualization (free applications)
    RasMol
    Cn3D
    VMD
  Structure prediction servers/applications
    CASP: Critical Assessment of Techniques for Protein Structure Prediction
    General method:
      Sequence similarity search to identify the closest homolog with a known structure
      Fit to the homolog's known structure, minimizing some constraint
APP Upstream Region
  15 kb
  Ex: Extracting and aligning human and mouse APP upstream regions
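Once the two upstream regions are aligned, conserved stretches can be located by scanning the alignment with a sliding window. The sketch below assumes the pairwise alignment has already been produced by some alignment tool and is supplied as two equal-length gapped strings. The function names, window size, step and identity threshold are illustrative choices, and the toy sequences are made up rather than real APP upstream sequence.

```python
def window_identity(aln_a, aln_b, window=10, step=5):
    """Return (start, percent identity) for each window of a pairwise alignment.

    Columns containing a gap ('-') never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        columns = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for a, b in columns if a == b and a != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(aln_a, aln_b, window=10, step=5, min_identity=80.0):
    """Keep only the windows whose identity reaches the threshold."""
    return [
        (start, pid)
        for start, pid in window_identity(aln_a, aln_b, window, step)
        if pid >= min_identity
    ]


# Toy gapped alignment (invented): the first 10 columns are identical.
human_aln = "ACGTACGTACTTTTTTTTTT"
mouse_aln = "ACGTACGTACGGGG--GGGG"

print(window_identity(human_aln, mouse_aln))
print(conserved_windows(human_aln, mouse_aln))
```

Windows that pass the threshold are candidates for closer inspection, for example for known transcription factor binding sites.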
Promoter/enhancer analysis approaches
  Same gene, multiple species
    Assumes evolutionary conservation of non-coding regions
    Can use pairwise or multiple alignment methods
    Examples:
      Precomputed: UCSC conservation tracks
      Dynamic: e.g., rVista
  Different genes, same species
    Gene sets are typically co-expressed clusters from microarray data
    Looking for over-represented, small binding sites
    Much better results if looking for a pattern or clustering of multiple sites
    Motif-finding algorithm, e.g., MEME
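The idea of over-represented small sites in a co-expressed cluster can be illustrated with plain k-mer counting. This is not how MEME works (MEME fits a probabilistic motif model); it is a much simpler stand-in that compares how many cluster sequences contain each k-mer with how many background sequences do. The function names, the pseudocount and the toy sequences, including the planted site GATAAG, are assumptions made for illustration.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def overrepresented(foreground, background, k=6, pseudo=1.0):
    """Rank foreground k-mers by the ratio of foreground to background presence.

    A pseudocount keeps the ratio finite for k-mers absent from the background.
    """
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_frac = (n + pseudo) / (len(foreground) + 2 * pseudo)
        bg_frac = (bg.get(kmer, 0) + pseudo) / (len(background) + 2 * pseudo)
        scores[kmer] = fg_frac / bg_frac
    # Highest ratio first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy upstream regions (invented): each cluster sequence carries GATAAG.
cluster = ["CCATGATAAGTTCA", "TTGGGATAAGCACC", "ACGCGATAAGGGTT"]
background = ["ACGTACGTTGCAAC", "TTCCGGAATCGTAC", "GGCATCCATGCTAG"]

ranking = overrepresented(cluster, background, k=6)
print(ranking[0])
```

On real data a single short k-mer will often score well by chance, which is why looking for patterns or clusters of multiple sites tends to give better results.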
Tutorials
  NCBI
    Field Guide
    Information and tutorials
    Science Primer
  EBI
    2Can Tutorials
  UCSC
    Genome Browser User's Guide
Next week's sections
  John Major
  Genome browsers
    Genome build process, ongoing and complete genome projects
    Genome browsers of Ensembl, UCSC and NCBI Mapviewer
  Bulk downloads
    How bulk bioinformatics data might be useful
    Common data formats
    Retrieving data