Locating Gene/Protein Information September 26, 2013 Ansuman Chattopadhyay, PhD Head, Molecular Biology Information Services Health Sciences Library System University of Pittsburgh
Topics Literature Informatics Gene / Protein Information Gateways Search Engine for MolBio / Bioinformatics Databases and Software
Literature Informatics Which genes/proteins are reported to be associated with the disease - Schizophrenia? Citations: 19 million Journals: 5200 Schizophrenia: 86, Schizophrenia gene: 5851…7295
Challenges in Literature Search Am I getting everything? Too much Information.. How to digest? A list with citations
Medical Subject Headings (MeSH)
Medical Subject Headings (MeSH): the U.S. National Library of Medicine's controlled vocabulary (thesaurus). Arranged in a hierarchical manner called the MeSH Tree Structures. Updated annually.
MeSH Vocabulary. Headings: over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste). Subheadings: attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy). Supplementary Concept Records: over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4). Publication Types (Letter, Review, Randomized Controlled Trial).
MeSH Tree Structure A. Anatomy B. Organisms C. Diseases D. Chemicals and Drugs E. Analytical, Diagnostic and Therapeutic Techniques and Equipment F. Psychiatry and Psychology G. Biological Sciences H. Physical Sciences I. Anthropology, Education, Sociology and Social Phenomena J. Technology and Food and Beverages K. Humanities L. Information Science M. Persons N. Health Care V. Publication Characteristics Z. Geographic Locations
MeSH Indexing Source: NLM
MeSH Indexing Genes/Chemicals MeSH Terms
PubMed Query Using MeSH
Find articles on “Dengue outbreaks in India” by searching PubMed using MeSH terms. Link to the video tutorial. Resources: MeSH Browser, PubMed.
Building PubMed Queries
Term      Boolean   Term                                          Boolean   Term    # papers
Dengue    AND       Outbreaks                                                       823
Dengue*   AND       Outbreaks                                                       746
Dengue    AND       Outbreaks                                     AND       India   131
Dengue*   AND       Outbreaks                                     AND       India   116
Dengue    AND       Outbreaks/statistics and numerical data       AND       India   7
Dengue*   AND       Outbreaks/statistics and numerical data       AND       India   7
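The query combinations above can also be assembled programmatically. The following is a minimal sketch, not an official tool: the helper names `mesh_term` and `and_query` are hypothetical, and the heading names are copied from the slide, so the exact MeSH heading should be confirmed in the MeSH Browser before use. The `[MeSH Terms]` field tag is the same one that appears in PubMed's own translated queries.

```python
def mesh_term(heading, subheading=None):
    """Return a PubMed search term restricted to the MeSH Terms field.

    heading    -- MeSH heading as typed on the slide (an assumption; verify it)
    subheading -- optional MeSH subheading, joined with "/" as in the table
    """
    label = f"{heading}/{subheading}" if subheading else heading
    return f'"{label}"[MeSH Terms]'


def and_query(*terms):
    """Join already formatted terms with the Boolean operator AND."""
    return " AND ".join(terms)


# Broad query: two headings.
broad = and_query(mesh_term("Dengue"), mesh_term("Outbreaks"))

# Narrow query: add a subheading and a geographic heading.
narrow = and_query(
    mesh_term("Dengue"),
    mesh_term("Outbreaks", "statistics and numerical data"),
    mesh_term("India"),
)

print(broad)
print(narrow)
```

Each added AND term or subheading narrows the result set, which is the pattern the paper counts in the table illustrate.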
Useful links for MeSH. MeSH Browser. Link to Wikipedia, YouTube videos, blogs etc. on “medical subject heading”. Ways to improve your PubMed searches, by Carrie Iwema. Searching by using the MeSH Database (NCBI Handbook).
Find genes that are reported to be associated with the disease SCHIZOPHRENIA by searching PubMed. Link to the video tutorial. Resources: PubMed Clinical Queries.
Topic-Specific PubMed Queries
Research on Optimal Search Strategies
PubMed Special Topic Queries
Search Filters
PubMed Search Filter: Medical Genetics ("schizophrenia"[MeSH Terms] OR "schizophrenia"[All Fields]) AND (("genetics, medical"[MeSH Terms] OR ("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields]) OR "genetics"[Subheading] AND ("genetics"[Subheading] OR "genetics"[All Fields] OR "genetics"[MeSH Terms]))
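A query like the one above can also be sent to PubMed without the web interface. The sketch below only builds the request URL for NCBI's E-utilities `esearch` endpoint and makes no network call. The function name `esearch_url` and the shortened example query are illustrative assumptions, not part of the slides.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an esearch URL that would search PubMed for `term`.

    term   -- a PubMed query string, field tags included
    retmax -- maximum number of PMIDs to return
    """
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    # urlencode percent-escapes quotes, brackets and spaces in the query.
    return f"{ESEARCH}?{urlencode(params)}"


# Shortened version of the filter query, for illustration only.
query = '"schizophrenia"[MeSH Terms] AND "genetics"[Subheading]'
print(esearch_url(query))
```

Pasting the full translated query from the slide into `term` should work the same way, since the escaping is handled by `urlencode`.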
PubMed Search Result Display
Latest Innovations in Literature Searching: GoPubMed displays search results sorted into meaningful topics and subtopics
GoPubMed
Find genes that are reported to be associated with the disease SCHIZOPHRENIA by using GoPubMed. Link to the video tutorial. Resources: GoPubMed.
GoPubMed Search Result
GoPubMed Search Result Analysis
Latest Innovations in Literature Searching
PubMed driven Web Tools
Literature to Gene List GLAD4U
Gene list to common functions
Literature to Gene list
NIH Grant Applications to Gene List
Curated Molecular Databases
Molecular Databases. Nucleic Acids Research (NAR): Annual Database Issue. NAR: Annual Web Server Issue. Oxford Journals: Bioinformatics. BioMed Central: BMC Bioinformatics. Growth of bioinformatics tools.
Growth of Molecular Databases Source: Nodal Point Blog 2008: : 1330
GWA Studies Catalog
Search Engine Just for Human Genetics: CDC HuGE Navigator
Find human genes that are reported to be associated with asthma. Find human SNPs that are reported to be associated with asthma. Link to the video tutorial. Resources: HuGE Navigator.
Find Disease Causing SNPs What SNPs are associated with “Schizophrenia”?
Hands-on exercise on literature search: Which proteins are related to Alzheimer’s disease? Where/who are the leading centers and scientists for liver transplantation? Which hormones are associated with Autistic Disorder?
Search Engine for Bioinformatics Tools
Biomedical and Life Sciences Search Engines. OBRC: University of Pittsburgh. Vadlo. OReFil: University of Tokyo.
Search.HSLS.MolBio
Search.HSLS.MolBio: integrated search system. Databases & Software; Articles on Databases & Software; Genes/Proteins; Pathways; Protocols; Seminar/Talks Videos; Recommended Articles. Tabbed browsing. Clustered search results.
Search term: “phosphorylation”
Molecular Databases and Software: search term: “Phosphorylation”
Search Result Page
Citation Trackers
Search PubMed Articles on Databases and Software: “phosphorylation”
Articles on Databases and Software
Articles on Prediction of Phosphorylation Sites
Prediction of Phosphorylation Sites
MetaPredPS
Clustering Remix
Genes/Proteins Info
Entrez Gene
BioBase Knowledge Library
Protocols:
Seminar Talks Video
Recommended Articles Faculty of 1000 Biology: a literature awareness tool that highlights and reviews the most interesting papers published in the biological sciences, based on the recommendations of a faculty of well over 2300 selected leading researchers.
Faculty of 1000
Recommended Articles
Gene/Protein Information Mining
Bioinformatics Databases & Software Providers. National Center for Biotechnology Information (NCBI): Home page, Site map, Resource Guide. European Bioinformatics Institute (EBI): Home page, Databases, Software.
Gene Information Gateways. Open access resources: National Center for Biotechnology Information (NCBI): GenBank, RefSeq, Entrez Gene, Gene Expression Omnibus (GEO), OMIM.
Protein Information Hubs. Open access resources: European Bioinformatics Institute (EBI): UniProt, InterPro, PROSITE, STRING. UCSC Genome Bioinformatics: BLAT Search, Gene Detail Page.
Protein Information Hubs. Open access resources: National Center for Biotechnology Information (NCBI): RefSeq, Entrez Gene, Conserved Domain Database (CDD), Molecular Modeling Database (MMDB). 3D structure viewer: Cn3D.
Gene/Protein Information Chromosomal location, mRNA, genomic seq, orthologs, paralogs, regulatory elements, Amino acid seq, domain architecture, protein structure, post translational modifications Gene expression, biological pathways, protein interaction map, disease association, biomarkers
Gene Questions: What is its function? What are its neighboring genes? What is its genomic sequence? How many splice variants are there? What is its intron-exon architecture? What diseases are associated with it? In which tissues is it expressed? How can I get its cDNA clone?
SNP Genomic Sequence Expression Profile Interacting Partners 3D Structure mRNA Sequence Chromosomal Localization Disease Amino acid Sequence Homologous Sequences NCBI : Entrez Gene
Entrez Gene. Find: gene symbols and aliases; sequences (genomic, mRNA, protein); intron-exon architecture; genomic context (neighboring and antisense genes); interacting partners; associated Gene Ontology terms (function, cellular component and biological process).
Entrez Gene: a searchable database of genes, from RefSeq genomes, defined by sequence and/or located in the NCBI Map Viewer. Statistics: Gene: 7974 organisms; GenBank: 160,000 organisms. Each record represents a single gene from a given organism.
NCBI Sequence Databases. GenBank: archival database of nucleotide sequences from >160,000 organisms (more info). GenPept: conceptual translation of GenBank CDS. RefSeq: based on GenBank records; non-redundant, expert-verified databases of reference sequences.
International Nucleotide Sequence Database Collaboration
Primary Vs Derivative databases
RefSeq Scope & Accessions. Genomic DNA: NC_ (complete genome, complete chromosome, complete plasmid); NG_ (genomic region); NT_ (genomic contig). mRNA: NM_. Protein: NP_. More about RefSeq scope and accessions...
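Because the RefSeq prefix encodes the molecule type, an accession can be sorted by a simple lookup. This is a minimal sketch that covers only the five prefixes listed on the slide; RefSeq defines further prefixes that are not handled here, and the function name `classify_refseq` is a hypothetical helper.

```python
# Prefix-to-meaning table, restricted to the prefixes named on the slide.
REFSEQ_PREFIXES = {
    "NC_": "genomic DNA: complete genome, chromosome or plasmid",
    "NG_": "genomic DNA: genomic region",
    "NT_": "genomic DNA: genomic contig",
    "NM_": "mRNA",
    "NP_": "protein",
}


def classify_refseq(accession):
    """Return the molecule type implied by a RefSeq accession prefix."""
    prefix = accession.strip().upper()[:3]
    # Anything outside the slide's list is reported as unknown, not guessed.
    return REFSEQ_PREFIXES.get(prefix, "unknown prefix")


# The accession strings below only illustrate the format.
for acc in ["NM_000546", "NP_000537", "XX_123456"]:
    print(acc, "->", classify_refseq(acc))
```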
RefSeq Status Codes: Provisional, Reviewed, Predicted, Genome Annotation. More about RefSeq status codes.
Hands-on: Find the mRNA sequence for your gene of interest (p53, BRCA1, EGFR, PLCg1). Start page: Entrez core nucleotide. Use Limits, History and Preview/Index.
Sequence Format. GenBank: Header, Features, Sequence. FASTA: Sequence. Example: U49845. Sample GenBank record. Sequence Revision History tool.
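The FASTA format is simple enough to read by hand: a header line starting with ">" followed by one or more sequence lines. The sketch below shows one way to parse it in plain Python. The function name `parse_fasta` and the two records are made-up placeholders, not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder records, invented for this example.
example = """>example_1 hypothetical mRNA
ACGTACGTAC
GTTTGA
>example_2 hypothetical fragment
GGCC
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

A GenBank flat file carries the same sequence plus the Header and Features sections, so it needs a fuller parser than this.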
Video Tutorials
Find mRNA Sequence for Reelin Gene.
Gene Function What is its function? Entrez Gene Page: Summary (TOC) Gene Ontology GeneRIFs Pathways (TOC) Biosystems (Links)
Gene Ontology (GO): controlled vocabulary tagging. Function, Biological Processes, Cellular Component.
Gene Ontology (GO) and KEGG. GO information page. GO evidence codes. KEGG information page.
Function How many splice variants are there? What is/are its sequence? Entrez Gene Page: Genomic regions… (TOC) UCSC (Links) Video Tutorials
Alternative Splicing
Intron-Exon Coordinates: What is its intron-exon architecture? Entrez Gene Page: Display: change it from Full Report to Gene Table. Video Tutorials
Neighboring Genes What are its neighboring genes? Entrez Gene Page: Genomic context (TOC) Video Tutorials
Chromosomal location
Associated Diseases: What diseases are associated with it? Entrez Gene Page: TOC: General Information_Phenotype. Links: OMIM, HuGE Navigator. Video Tutorials
Homologene What are its homologous genes? Entrez Gene Page: Link Homologene change Display settings Video Tutorials
Reagents: How can I get its cDNA clone? ...antibodies? ...siRNA? Entrez Gene Page: TOC: Additional Links: Research Materials, Exact Antigen. Video Tutorials
Protein Information Gateways
UniProtKB: Universal Protein Resource: a comprehensive, centralized protein information resource. Developed by a consortium: the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). Comprises: Swiss-Prot (biologist-curated annotation data); TrEMBL (computationally annotated data); PIR-International Protein Sequence Database (PIR-PSD), the most comprehensive and expertly curated protein sequence database in the public domain for over 20 years. Funded by: NIH, NSF, the European Union and the Swiss Federal government. Links to Wiki, YouTube, blogs and tweets. Tutorial video.
Protein Questions: What is its function? Amino acid sequence? … molecular wt? … isoelectric point (pI)? … post-translational modifications? … presence of domain/pattern/profile? … hydrophobicity? … homologous orthologs? Etc. Structure? … secondary and tertiary? Interaction partners?
Uniprot Video Tutorial
Protein Function from UniprotKB Uniprot Search: Look under: general annotation_Function, ontologies_keywords, geneontology
Protein Sequence Uniprot Sequence annotations sequences Gene Genomic regions, transcripts, and products ccds (consensus cds report) UCSC Sequence and links
Protein Sequence Analysis
Question          Where to look
PTM               UniProt: Sequence annotation; IPA: Modifications and Regulation
pI/MW             UniProt: Seq_Tool: Compute pI
Hydrophobicity    UniProt: Seq_Tool: ProtScale
Peptide Digest    UniProt: Seq_Tools: PeptideMass, PeptideCutter
Homologous Seq    Entrez Gene: HomoloGene
Domain/pattern    UniProt: Sequence annotation; InterPro; Entrez Gene: Conserved Domain
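To show what a hydrophobicity tool such as ProtScale computes, here is a minimal sliding-window sketch. It assumes the Kyte-Doolittle hydropathy scale and a window of 9 residues; the settings used in the workshop are not stated on the slides. The function names are hypothetical, and the test sequence is a toy string, not a real protein.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Average hydropathy of every window of `window` residues.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def most_hydrophobic_window(seq, window=9):
    """Return (0-based start, window sequence, score) of the top window."""
    seq = seq.upper()
    profile = hydropathy_profile(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, seq[start:start + window], profile[start]


# Toy sequence: an isoleucine stretch flanked by aspartates.
print(most_hydrophobic_window("DDDDDIIIIIDDDDD", window=5))
```

Applying `most_hydrophobic_window` to a full protein sequence gives a first guess at its most hydrophobic region, which can then be compared with the ProtScale plot.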
Protein Domain Resources Protein Domain Databases: InterPro
Protein Domains (Wikipedia): A protein domain is a part of protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 to 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin.
Protein Domain: SH3. Src homology 3 domains; SH3 domains bind to proline-rich ligands with moderate affinity and selectivity, preferentially to PxxP motifs; they play a role in the regulation of enzymes by intramolecular interactions, in changing the subcellular localization of signal pathway components, and in mediating multiprotein complex assemblies.
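The PxxP motif named above (proline, any two residues, proline) is easy to scan for in a sequence. The sketch below uses a regular expression with a lookahead so that overlapping motifs are all reported. The function name `find_pxxp` and the test peptides are invented for illustration; a sequence match alone does not show that a site binds an SH3 domain.

```python
import re

# Lookahead lets the scan report overlapping matches such as PPLP and PLPP.
PXXP = re.compile(r"(?=(P[A-Z]{2}P))")


def find_pxxp(seq):
    """Return (1-based position, motif) for every PxxP match in `seq`."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in PXXP.finditer(seq)]


# Made-up peptides, used only to exercise the pattern.
print(find_pxxp("AAPLLPAA"))
print(find_pxxp("APPLPPRPAA"))
```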
Protein Structure: Primary, Secondary, Tertiary, Quaternary. Useful links. Taken from Wikipedia.
Protein Structure NCBI
Finding Protein Structure
Structure Databases and Viewers. Databases: RCSB Protein Data Bank (PDB): State University of New Jersey (Rutgers), the San Diego Supercomputer Center at the University of California San Diego, the University of Wisconsin-Madison. MMDB: NCBI's structure database is called MMDB (Molecular Modeling DataBase), and it is a subset of three-dimensional structures obtained from the Protein Data Bank (PDB), excluding theoretical models. Viewers: Cn3D: a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service. RasMol: EBI. FirstGlance in Jmol: a simple tool for macromolecular visualization. (More..)
Protein Structure Search for the 3D structure of P53 Entrez structure View the crystal structure of mouse p53 core domain (MMDB: 42987) or Crystal Structure Of A P53 Core Dimer Bound To Dna ( PDB:2GEQ)
Manipulating the Structure Viewer Window
Find Similar Structure: NCBI VAST
NCBI BLink: BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search.
Hands-on Protein Structure: View the crystal structure of Chronophin (PDB entry: 2P69). A variant of this protein with mutations in its amino acid sequence has been isolated. Can you predict any effect of these mutations on its function? Hint: Find the amino acid residues which are in close contact (3.5 Å) with PYRIDOXAL-5'-PHOSPHATE (PLP). Label the amino acids and save the picture in PNG format. Learn more on Chronophin structure at:
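The hint above comes down to a distance test: a residue is "in close contact" when any of its atoms lies within 3.5 Å of any ligand atom. The sketch below shows that calculation on toy coordinates. The atom records, residue names and numbers are invented for illustration and are not taken from 2P69; in the exercise itself the structure viewer does this selection for you.

```python
from math import dist  # Euclidean distance, Python 3.8+


def residues_in_contact(protein_atoms, ligand_atoms, cutoff=3.5):
    """Return sorted (residue number, residue name) pairs near the ligand.

    protein_atoms -- iterable of (res_name, res_num, atom_name, (x, y, z))
    ligand_atoms  -- iterable of (atom_name, (x, y, z))
    cutoff        -- contact distance in angstroms
    """
    contacts = set()
    for res_name, res_num, _atom, xyz in protein_atoms:
        # One atom within the cutoff is enough to count the residue.
        if any(dist(xyz, lig_xyz) <= cutoff for _name, lig_xyz in ligand_atoms):
            contacts.add((res_num, res_name))
    return sorted(contacts)


# Hypothetical coordinates, NOT real 2P69 data.
TOY_LIGAND = [("P", (0.0, 0.0, 0.0))]
TOY_PROTEIN = [
    ("LYS", 10, "NZ", (3.0, 0.0, 0.0)),   # 3.0 A away: contact
    ("ASP", 25, "OD1", (0.0, 4.0, 0.0)),  # 4.0 A away: no contact
    ("SER", 58, "OG", (2.0, 2.0, 2.0)),   # about 3.46 A away: contact
]

print(residues_in_contact(TOY_PROTEIN, TOY_LIGAND))
```

Residues found this way are the natural first candidates when asking whether a mutation could disturb PLP binding.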
Hands-on Protein Structure of Chronophin
Sequence Alignment in Cn3D NCBI
Hands-On: Can you identify the human protein which contains a short peptide sequence: GPDGMPVIYHGHTLTTKIKFSDVLHTIKE? What is its function? What is its calculated pI and molecular wt? Which region of this protein is most hydrophobic? Locate five experimentally verified S/T/Y phosphorylation sites present in this protein. Find the mouse and fruit fly orthologs of this human protein and report the % protein identity it shares with these orthologs. How many protein domains are reported to be present in this human protein? Find the location of its largest domain.
Licensed Tools for Gene/Protein Information
HSLS Licensed Tools: BioBase, MetaCore, Ingenuity IPA
Gene/Protein facts from Biobase
BioBase BioKnowledge Library
Protein Function from IPA
Gene Lists: Microarrays, Protein arrays, ChIP-chip, SNP arrays, RNA-Seq, Literature Search
Gene Expression Databases NCBI Gene Expression Omnibus (GEO) EBI ArrayExpress
Use of GEO Data
ArrayExpress
Gene Expression Atlas
NextBio
Thank you! Any questions? Carrie Iwema, Ansuman Chattopadhyay