Locating Gene/Protein Information September 26, 2013 Ansuman Chattopadhyay, PhD Head, Molecular Biology Information Services Health Sciences Library System.

Locating Gene/Protein Information September 26, 2013 Ansuman Chattopadhyay, PhD Head, Molecular Biology Information Services Health Sciences Library System University of Pittsburgh

Topics: Literature Informatics; Gene/Protein Information Gateways; Search Engine for MolBio/Bioinformatics Databases and Software

Literature Informatics: Which genes/proteins are reported to be associated with schizophrenia? Citations: 19 million; Journals: 5,200; "Schizophrenia": 86,…; "Schizophrenia gene": 5851…7295

Challenges in Literature Search: Am I getting everything? Too much information: how do I digest it? A list with citations.

Medical Subject Headings (MeSH)

Medical Subject Headings (MeSH): the U.S. National Library of Medicine's controlled vocabulary (thesaurus), arranged hierarchically in the MeSH Tree Structures and updated annually.

MeSH Vocabulary
- Headings: over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste)
- Subheadings: attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy)
- Supplementary Concept Records: over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4)
- Publication Types (Letter, Review, Randomized Controlled Trial)

MeSH Tree Structure
A. Anatomy
B. Organisms
C. Diseases
D. Chemicals and Drugs
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment
F. Psychiatry and Psychology
G. Biological Sciences
H. Physical Sciences
I. Anthropology, Education, Sociology and Social Phenomena
J. Technology and Food and Beverages
K. Humanities
L. Information Science
M. Persons
N. Health Care
V. Publication Characteristics
Z. Geographic Locations
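Each MeSH heading carries one or more tree numbers: the category letter from the list above followed by dot-separated digit groups, and a narrower heading's tree number extends its parent's. PubMed's "explode" behavior (retrieving a heading plus everything beneath it) follows from this. The minimal sketch below shows the prefix logic only; the heading names and tree numbers in the usage example are hypothetical placeholders, not real MeSH assignments.

```python
from __future__ import annotations


def category(tree_number: str) -> str:
    """Top-level MeSH category letter, e.g. 'C' (Diseases)."""
    return tree_number[0]


def is_descendant(tree_number: str, ancestor: str) -> bool:
    """True if tree_number lies strictly below ancestor in the tree.

    Comparing against ancestor + '.' avoids false hits such as
    'C01.1001' being treated as a child of 'C01.100'.
    """
    return tree_number.startswith(ancestor + ".")


def explode(ancestor: str, index: dict[str, list[str]]) -> set[str]:
    """Headings located at `ancestor` or anywhere beneath it.

    `index` maps heading -> its tree numbers (a heading may sit in
    several branches of the tree).
    """
    return {
        heading
        for heading, numbers in index.items()
        if any(n == ancestor or is_descendant(n, ancestor) for n in numbers)
    }


if __name__ == "__main__":
    # HYPOTHETICAL toy index, for illustration only.
    toy_index = {
        "Heading A": ["C01.100"],
        "Heading A1": ["C01.100.200"],
        "Heading A2": ["C01.100.300", "F03.500"],
        "Heading B": ["C01.1001"],
    }
    print(sorted(explode("C01.100", toy_index)))
```

Running it prints `['Heading A', 'Heading A1', 'Heading A2']`; "Heading B" shares a string prefix but is not a child.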

MeSH Indexing (Source: NLM)

MeSH Indexing: Genes/Chemicals, MeSH Terms

PubMed Query Using MeSH

Exercise: Find articles on “Dengue outbreaks in India” by searching PubMed with MeSH terms. Link to the video tutorial: Resources: MeSH Browser; PubMed.

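Building PubMed Queries

| Term | Boolean | Term | Boolean | Term | # papers |
|---|---|---|---|---|---|
| Dengue | AND | Outbreaks | | | 823 |
| Dengue* | AND | Outbreaks | | | 746 |
| Dengue | AND | Outbreaks | AND | India | 131 |
| Dengue* | AND | Outbreaks | AND | India | 116 |
| Dengue | AND | Outbreaks/statistics and numerical data | AND | India | 7 |
| Dengue* | AND | Outbreaks/statistics and numerical data | AND | India | 7 |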
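The same Boolean combinations can be sent to PubMed programmatically through NCBI's E-utilities ESearch endpoint. The sketch below only builds the request URL and makes no network call. Two assumptions to flag: it reads the table's asterisk as "MeSH Major Topic" and maps it to PubMed's `[Majr]` field tag, and the heading string "Disease Outbreaks" is illustrative rather than taken from the slide.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def mesh(term: str, major: bool = False) -> str:
    """Tag a term as a MeSH heading; major=True restricts it to MeSH Major Topic."""
    tag = "Majr" if major else "MeSH Terms"
    return f'"{term}"[{tag}]'


def boolean_and(*terms: str) -> str:
    """Combine query terms with AND, parenthesising each one."""
    return " AND ".join(f"({t})" for t in terms)


def esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request against PubMed."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    q = boolean_and(mesh("Dengue", major=True), mesh("Disease Outbreaks"), "India")
    print(q)
    print(esearch_url(q))
```

Keeping query construction separate from URL encoding makes it easy to print and check the query string before submitting it.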

Useful links for MeSH
- MeSH Browser
- Links to Wikipedia, YouTube videos, blogs etc. on “medical subject heading”
- “Ways to improve your PubMed searches” by Carrie Iwema (pubmed-searches/)
- Searching by using the MeSH Database, NCBI Handbook (&part=pubmedhelp#pubmedhelp.Searching_by_using_t)

Exercise: Find genes reported to be associated with schizophrenia by searching PubMed. Link to the video tutorial: Resources: PubMed Clinical Queries.

Topic-Specific PubMed Queries

Research on Optimal Search Strategies

PubMed Special Topic Queries

Search Filters

PubMed Search Filter: Medical Genetics
("schizophrenia"[MeSH Terms] OR "schizophrenia"[All Fields]) AND (("genetics, medical"[MeSH Terms] OR ("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields]) OR "genetics"[Subheading] AND ("genetics"[Subheading] OR "genetics"[All Fields] OR "genetics"[MeSH Terms]))
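The filter query shows a recurring pattern: each plain term is searched both as a MeSH heading and in all fields, and the topic is then ANDed with the filter. The sketch below imitates only that visible pattern; PubMed's real automatic term mapping also consults MeSH translation tables, subheadings and phrase lists, so treat this as a simplified approximation. `GENETICS_FILTER` is a hypothetical, shortened stand-in for the full filter above.

```python
def expand_term(term: str) -> str:
    """Simplified imitation of PubMed's term mapping for a single term:
    search it as a MeSH heading OR in all fields."""
    t = term.lower()
    return f'("{t}"[MeSH Terms] OR "{t}"[All Fields])'


def apply_filter(topic: str, search_filter: str) -> str:
    """Restrict a topic query with a search filter by ANDing the two."""
    return f"({topic}) AND ({search_filter})"


# HYPOTHETICAL shortened filter, for illustration only.
GENETICS_FILTER = '"genetics"[Subheading] OR "genotype"[MeSH Terms]'

if __name__ == "__main__":
    print(apply_filter(expand_term("Schizophrenia"), GENETICS_FILTER))
```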

PubMed Search Result Display

Latest Innovations in Literature Searching: GoPubMed displays search results sorted into meaningful topics and subtopics.

GoPubMed

Exercise: Find genes reported to be associated with schizophrenia by using GoPubMed. Link to the video tutorial: Resources: GoPubMed.

GoPubMed Search Result

GoPubMed Search Result Analysis

Latest Innovations in Literature Searching

PubMed driven Web Tools

Literature to Gene List: GLAD4U

Gene list to common functions
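Tools that turn a gene list into shared functions typically ask whether a function term is over-represented in the list relative to a background gene set. The sketch below uses a one-sided hypergeometric test, one widely used choice; it is not necessarily the statistic used by any specific tool on these slides, and any counts you plug in are your own.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes annotated with the function term
    n: genes in the user's list
    k: list genes annotated with the term
    """
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible terms drop out of the sum automatically.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Observed fraction of annotated genes in the list over the background fraction."""
    return (k / n) / (K / N)
```

For example, `hypergeom_sf(2, 4, 2, 2)` is 1/6: drawing both annotated genes when picking 2 of 4. With many terms tested at once, the p-values would normally be corrected for multiple testing.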

Literature to Gene list

NIH Grant Applications to Gene List

Curated Molecular Databases

Molecular Databases  Nucleic Acids Research : Annual databases IssueAnnual databases Issue  NAR: Annual Web Server IssueAnnual Web Server Issue  Oxford Journal : BioinformaticsBioinformatics  BioMedCentral: BMC Bioinformatics BioMedCentral: BMC Bioinformatics Growth of bioinformatics tools

Growth of Molecular Databases (Source: Nodal Point Blog). 2008: 1330

GWA Studies Catalog

Search Engine Just for Human Genetics: CDC HuGE Navigator

Exercise:
- Find human genes reported to be associated with asthma.
- Find human SNPs reported to be associated with asthma.
Link to the video tutorial: Resources: HuGE Navigator.

Find Disease-Causing SNPs: What SNPs are associated with schizophrenia?

Hands-On Exercise on Literature Search
- Which proteins are related to Alzheimer’s disease?
- Where/who are the leading centers and scientists for liver transplantation?
- Which hormones are associated with autistic disorder?

Search Engine for Bioinformatics Tools

Biomedical and Life Sciences Search Engines
- OBRC: University of Pittsburgh
- Vadlo
- OReFil: University of Tokyo

Search.HSLS.MolBio

Search.HSLS.MolBio: an integrated search system
- Databases & Software
- Articles on Databases & Software
- Genes/Proteins
- Pathways
- Protocols
- Seminar/Talk Videos
- Recommended Articles
Tabbed browsing; clustered search results

Search term: “phosphorylation”

Molecular Databases and Software: search term: “Phosphorylation”

Search Result Page

Citation Trackers

Search PubMed Articles on Databases and Software: “phosphorylation”

Articles on Databases and Software

Articles on Prediction of Phosphorylation Sites

Prediction of Phosphorylation Sites

MetaPredPS

Clustering Remix

Genes/Proteins Info

Entrez Gene

BioBase Knowledge Library

Protocols:

Seminar Talks Video

Recommended Articles: Faculty of 1000 Biology, a literature awareness tool that highlights and reviews the most interesting papers published in the biological sciences, based on the recommendations of a faculty of well over 2,300 selected leading researchers.

Faculty of 1000

Recommended Articles

Gene/Protein Information Mining

Bioinformatics Databases & Software Providers
- National Center for Biotechnology Information (NCBI): Home page; Site map; Resource Guide
- European Bioinformatics Institute (EBI): Home page; Databases; Software

Gene Information Gateways
Open access resources, National Center for Biotechnology Information (NCBI):
- GenBank
- RefSeq
- Entrez Gene
- Gene Expression Omnibus (GEO)
- OMIM

Protein Information Hubs
Open access resources, European Bioinformatics Institute (EBI):
- UniProt
- InterPro
- PROSITE
- STRING
UCSC Genome Bioinformatics:
- BLAT Search
- Gene Detail Page

Protein Information Hubs
Open access resources, National Center for Biotechnology Information (NCBI):
- RefSeq
- Entrez Gene
- Conserved Domain Database (CDD)
- Molecular Modeling Database (MMDB)
- 3D structure viewer: Cn3D

Gene/Protein Information
- Chromosomal location, mRNA, genomic sequence, orthologs, paralogs, regulatory elements
- Amino acid sequence, domain architecture, protein structure, post-translational modifications
- Gene expression, biological pathways, protein interaction map, disease association, biomarkers

Gene Questions
- What is its function?
- What are its neighboring genes?
- What is its genomic sequence?
- How many splice variants are there?
- What is its intron-exon architecture?
- What diseases are associated with it?
- In which tissues is it expressed?
- How can I get its cDNA clone?

NCBI Entrez Gene brings together: SNPs, genomic sequence, expression profile, interacting partners, 3D structure, mRNA sequence, chromosomal localization, disease, amino acid sequence, homologous sequences

Entrez Gene: find
- gene symbols and aliases
- sequences: genomic, mRNA, protein
- intron-exon architecture
- genomic context: neighboring and antisense genes
- interacting partners
- associated Gene Ontology terms: function, cellular component and biological process

Entrez Gene: a searchable database of genes from RefSeq genomes, defined by sequence and/or located in the NCBI Map Viewer. Each record represents a single gene from a given organism. RefSeq statistics: Gene: 7,974 organisms; GenBank: 160,000 organisms.

NCBI Sequence Databases
- GenBank: archival database of nucleotide sequences from >160,000 organisms (More info)
- GenPept: conceptual translation of GenBank CDS
- RefSeq: based on GenBank records; a non-redundant, expert-verified database of reference sequences
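GenPept records are conceptual translations of the CDS features in GenBank entries. The sketch below translates a CDS with the standard genetic code (NCBI translation table 1) only; real annotation also handles alternative genetic codes, alternative start codons and codon-start offsets, which are deliberately left out here.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence codon by codon.

    Ambiguous codons (e.g. containing N) become 'X'; a trailing partial
    codon is ignored. With to_stop=True translation ends at the first stop.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` gives `"MA"`, while `translate("ATGGCCTAA", to_stop=False)` keeps the stop as `"MA*"`.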

International Nucleotide Sequence Database Collaboration

Primary vs. Derivative Databases

RefSeq Scope & Accessions
- Genomic DNA: NC_ complete genome, complete chromosome, complete plasmid; NG_ genomic region; NT_ genomic contig
- mRNA: NM_
- Protein: NP_
(more about RefSeq scope and accessions...)
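Because the two-letter prefix encodes the molecule type, a list of RefSeq accessions can be sorted before retrieval. The sketch below covers only the five prefixes on this slide; RefSeq defines others (for example XM_/XP_ for predicted models), which it returns as `None`. The accession strings in the test are format illustrations only.

```python
import re

# Only the prefixes listed on the slide.
REFSEQ_PREFIXES = {
    "NC": ("genomic DNA", "complete genome, complete chromosome or complete plasmid"),
    "NG": ("genomic DNA", "genomic region"),
    "NT": ("genomic DNA", "genomic contig"),
    "NM": ("mRNA", "mRNA"),
    "NP": ("protein", "protein"),
}
# Two capital letters, underscore, digits, optional ".version".
ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession: str):
    """Describe a RefSeq accession, or return None if the prefix is not covered."""
    m = ACCESSION.match(accession.strip())
    if not m or m.group(1) not in REFSEQ_PREFIXES:
        return None
    molecule, scope = REFSEQ_PREFIXES[m.group(1)]
    version = int(m.group(3)) if m.group(3) else None
    return {"prefix": m.group(1) + "_", "molecule": molecule,
            "scope": scope, "version": version}
```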

RefSeq Status Codes: Provisional, Reviewed, Predicted, Genome Annotation (more about RefSeq status codes)

Hands-on: Find the mRNA sequence for your gene of interest (p53, BRCA1, EGFR, PLCg1). Start page: Entrez Core Nucleotide. Use Limits, History and Preview/Index.

Sequence Format GenBank  Header  Features  Sequence FASTA  Sequence  Example: U49845U49845  Sample GenBank record Sample GenBank record  Sequence Revision History tool Sequence Revision History tool

Video Tutorials

Find mRNA Sequence for Reelin Gene.

Gene Function: What is its function? Entrez Gene page:
- Summary (TOC)
- Gene Ontology
- GeneRIFs
- Pathways (TOC)
- BioSystems (Links)

Gene Ontology (GO)GO  Controlled vocabulary tagging Function Biological Processes Cellular Component

Gene Ontology (GO) and KEGG: GO information page; GO evidence codes; KEGG information page

Function: How many splice variants are there? What are their sequences? Entrez Gene page: Genomic regions… (TOC); UCSC (Links). Video Tutorials

Alternative Splicing

Intron-Exon Coordinates: What is its intron-exon architecture? Entrez Gene page: change Display from Full Report to Gene Table. Video Tutorials
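Once exon coordinates are in hand, for example copied from a gene table, the introns are simply the gaps between consecutive exons. A minimal sketch, assuming 1-based, inclusive genomic coordinates; the example exon positions in the test are hypothetical. For minus-strand genes the genomic order is the reverse of transcript order.

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Introns between consecutive exons, all as 1-based inclusive (start, end)."""
    ordered = sorted(exons)
    introns = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1 + 1:
            raise ValueError(f"exons ({s1}, {e1}) and ({s2}, {e2}) overlap or abut")
        introns.append((e1 + 1, s2 - 1))
    return introns


def feature_lengths(features: list[tuple[int, int]]) -> list[int]:
    """Length of each inclusive (start, end) interval."""
    return [end - start + 1 for start, end in features]
```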

Neighboring Genes: What are its neighboring genes? Entrez Gene page: Genomic context (TOC). Video Tutorials

Chromosomal location

Associated Diseases: What diseases are associated with it? Entrez Gene page: TOC > General Information > Phenotype; Links: OMIM, HuGE Navigator. Video Tutorials

HomoloGene: What are its homologous genes? Entrez Gene page: follow the HomoloGene link, then change the Display settings. Video Tutorials

Reagents: How can I get its cDNA clone? Antibodies? siRNA? Entrez Gene page: TOC > Additional Links > Research Materials; Exact Antigen. Video Tutorials

Protein Information Gateways

UniProtKB (UniProt, the Universal Protein Resource): a comprehensive, centralized protein information resource
- Developed by a consortium: the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR)
- Comprises:
  - Swiss-Prot: biologist-curated annotation data
  - TrEMBL: computationally annotated data
  - PIR-International Protein Sequence Database (PIR-PSD): the most comprehensive and expertly curated protein sequence database in the public domain for over 20 years
- Funded by: NIH, NSF, the European Union and the Swiss Federal Government
- Link to Wiki, YouTube, Blogs and Tweets:
- Tutorial Video:

Protein Questions: What is its
- Function?
- Amino acid sequence? Molecular weight? Isoelectric point (pI)? Post-translational modifications? Presence of domains/patterns/profiles? Hydrophobicity? Homologous orthologs? Etc.
- Structure? Secondary and tertiary?
- Interaction partners?

UniProt Video Tutorial

Protein Function from UniProtKB: in a UniProt entry, look under General annotation > Function; Ontologies > Keywords; Gene Ontology.

Protein Sequence
- UniProt: Sequence annotations; Sequences
- Entrez Gene: Genomic regions, transcripts, and products; CCDS (Consensus CDS report)
- UCSC: Sequence and Links

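Protein Sequence Analysis

| Analysis | Resource | Where to look |
|---|---|---|
| Post-translational modifications (PTM) | UniProt | Sequence annotation |
| Post-translational modifications (PTM) | IPA | Modifications and Regulation |
| pI/MW | UniProt | Sequence tools: Compute pI |
| Hydrophobicity | UniProt | Sequence tools: ProtScale |
| Peptide digest | UniProt | Sequence tools: PeptideMass, PeptideCutter |
| Homologous sequences | Entrez Gene | HomoloGene |
| Domain/pattern | UniProt | Sequence annotation; InterPro |
| Domain/pattern | Entrez Gene | Conserved Domains |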
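The web tools in this table do far more, but the core arithmetic behind three of the rows fits in a few lines. This is a simplified sketch under stated assumptions: residue masses are approximate average values rounded to 0.01 Da, Kyte-Doolittle is just one of the hydrophobicity scales ProtScale offers, the 9-residue window is an illustrative default, and the trypsin rule is the simple "after K/R, not before P" approximation. Isoelectric point is omitted because it depends on which set of pKa values you adopt.

```python
from __future__ import annotations

import re

# Approximate average residue masses (Da): amino acid minus one water.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02

# Kyte-Doolittle hydropathy index (positive values = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def average_mass(seq: str) -> float:
    """Average mass of an unmodified peptide: sum of residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def hydropathy_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """Sliding-window mean hydropathy as (1-based window centre, score) pairs."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a centre residue")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    half = window // 2
    return [
        (start + half + 1, sum(values[start:start + window]) / window)
        for start in range(len(values) - window + 1)
    ]


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq.upper()) if p]
```

To find the most hydrophobic stretch, take `max(hydropathy_profile(seq), key=lambda p: p[1])`.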

Protein Domain Resources: Protein Domain Databases: InterPro

Protein Domains (from Wikipedia): A protein domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 to 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin.

Protein Domain: SH3 (Src homology 3) domains. SH3 domains bind to proline-rich ligands with moderate affinity and selectivity, preferentially to PxxP motifs. They play a role in regulating enzymes through intramolecular interactions, changing the subcellular localization of signaling pathway components, and mediating multiprotein complex assembly.
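Candidate SH3-binding sites can be listed by scanning a sequence for the PxxP core named above. This sketch searches for the bare PxxP core only; real SH3 ligands usually need additional flanking residues, so treat hits as candidates. The lookahead pattern reports overlapping matches, which are common in proline-rich stretches.

```python
import re

# Lookahead so overlapping PxxP occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for every PxxP occurrence in seq."""
    return [(m.start() + 1, m.group(1)) for m in PXXP.finditer(seq.upper())]
```

For example, `find_pxxp("APPLPPAP")` returns `[(2, "PPLP"), (3, "PLPP"), (5, "PPAP")]`.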

Protein Structure: primary, secondary, tertiary, quaternary (image taken from Wikipedia). Useful links:

Protein Structure NCBI

Finding Protein Structure

Structure Databases and Viewers
Databases:
- RCSB Protein Data Bank (PDB): Rutgers, The State University of New Jersey; the San Diego Supercomputer Center at the University of California San Diego; the University of Wisconsin-Madison
- MMDB: NCBI's structure database, the Molecular Modeling Database (MMDB), is a subset of three-dimensional structures obtained from the Protein Data Bank (PDB), excluding theoretical models.
Viewers:
- Cn3D: a helper application for your web browser that lets you view 3-dimensional structures from NCBI's Entrez retrieval service
- RasMol: EBI
- FirstGlance in Jmol: a simple tool for macromolecular visualization

Protein Structure: Search for the 3D structure of p53 in Entrez Structure. View the crystal structure of the mouse p53 core domain (MMDB: 42987) or the crystal structure of a p53 core dimer bound to DNA (PDB: 2GEQ).

Manipulating the Structure Viewer Window

Find Similar Structure: NCBI VAST

NCBI BLink: BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search.

Hands-on Protein Structure: View the crystal structure of Chronophin (PDB entry: 2P69).
- A variant of this protein with mutations in its amino acid sequence has been isolated. Can you predict any effect of these mutations on its function?
- Hint: find the amino acid residues that are in close contact (within 3.5 Å) with PYRIDOXAL-5'-PHOSPHATE (PLP).
- Label the amino acids and save the picture in PNG format.
Learn more on Chronophin structure at:
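Cn3D answers the 3.5 Å question interactively. The same check can be scripted from a PDB-format coordinate file. This minimal sketch assumes the file text is already available locally, assumes the ligand is recorded under the residue name `PLP`, and reads only the fixed ATOM/HETATM coordinate columns. It counts a protein residue as "in contact" if any of its atoms lies within the cutoff of any ligand atom. HETATM residues such as waters are excluded from the result.

```python
import math


def parse_atoms(pdb_text: str) -> list[dict]:
    """Parse ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def residues_near_ligand(atoms: list[dict], ligand: str = "PLP",
                         cutoff: float = 3.5) -> list[tuple[str, int, str]]:
    """Protein residues with any atom within `cutoff` angstroms of the ligand."""
    ligand_xyz = [a["xyz"] for a in atoms if a["resname"] == ligand]
    near = set()
    for a in atoms:
        if a["record"] != "ATOM":  # skip ligand, waters and other HETATMs
            continue
        if any(math.dist(a["xyz"], lx) <= cutoff for lx in ligand_xyz):
            near.add((a["chain"], a["resseq"], a["resname"]))
    return sorted(near)
```

The sorted (chain, residue number, residue name) list can then be used to label those residues in a structure viewer.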

Hands-on Protein Structure of Chronophin

Sequence Alignment in Cn3D NCBI

Hands-On  Can you identify the human protein which contains a short peptide sequence: GPDGMPVIYHGHTLTTKIKFSDVLHTIKE ? What is its function? What is its calculated PI and molecular wt? Which region of this protein is most hydrophobic? Locate five experimentally verified S/T/Y phosphorylation sites present in this protein. Find the homologous mouse and fruit fly orthologs of this human protein and report the % protein identity it shares with these orthologs. How many protein domains are reported to be present in this human protein? Find the location of its largest domain.

Licensed Tools for Gene/Protein Information

HSLS Licensed Tools: BioBase, MetaCore, Ingenuity IPA

Gene/Protein facts from BioBase

BioBase BioKnowledge Library

Protein Function from IPA

Gene Lists: microarrays, protein arrays, ChIP-chip, SNP arrays, RNA-Seq, literature search

Gene Expression Databases NCBI Gene Expression Omnibus (GEO) EBI ArrayExpress

Use of GEO Data

ArrayExpress

Gene Expression Atlas

NextBio

Thank you! Any questions? Carrie Iwema, Ansuman Chattopadhyay