gene-CENTRIC database

gene-CENTRIC database Daniel Svozil

NCBI Nucleotide Exercise
1. How many nucleotide sequences are there from the bacterium Chlamydia trachomatis in the NCBI Sequence Database?
2. How many mRNA sequences for collagen genes from nematode worms are there in the NCBI Database?
3. How many nucleotide sequences from nematode worms are there in the RefSeq Database?
4. How many structures containing nucleotide sequences from nematode worms are known?
5. How many nucleotide sequences were submitted to NCBI by Matthew Berriman?
Answers:
1. "Chlamydia trachomatis"[ORGN]: found 40565 nucleotide sequences, Nucleotide (40417) and GSS (148). (Ask students what GSS is.) (12.3.2012)
2. collagen AND nematode[Organism], Limits: mRNA (the number is also shown directly on the right): 1041 (12.3.2012)
3. nematode[Organism], Limits: RefSeq (the number is also shown directly on the right), or Nematode[ORGN] AND srcdb_refseq[PROP]: 168 612 (12.3.2012)
4. Nematode[ORGN], Limits: Source database PDB, or Nematode[ORGN] AND srcdb_pdb[PROP]: 5 sequences (12.3.2012)
5. "Berriman M"[AU]: found 475057 nucleotide sequences, Nucleotide (265333), EST (121075) and GSS (88649). Note that unfortunately the NCBI website does not allow us to search for "Berriman Matthew"[AU], so we cannot be sure that all of these sequences were submitted by Matthew Berriman. Note also that this search finds sequences that were either submitted to the NCBI database by M. Berriman or described in a paper on which M. Berriman was an author. Therefore, not all of the sequences found were necessarily submitted by M. Berriman.
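The same searches can also be sent to NCBI programmatically. The following is a minimal sketch, not part of the original slides, that only builds E-utilities ESearch URLs asking for hit counts. The property biomol_mrna[PROP] is an assumed stand-in for the "Limits: mRNA" setting, the database name nuccore is assumed to correspond to Nucleotide, and counts returned today will differ from the 2012 numbers above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query terms taken from the exercise answers.
# ASSUMPTION: biomol_mrna[PROP] stands in for "Limits: mRNA" on the web page.
EXERCISE_TERMS = {
    "chlamydia": '"Chlamydia trachomatis"[ORGN]',
    "collagen_mrna": "collagen AND nematode[ORGN] AND biomol_mrna[PROP]",
    "nematode_refseq": "nematode[ORGN] AND srcdb_refseq[PROP]",
    "nematode_pdb": "nematode[ORGN] AND srcdb_pdb[PROP]",
    "berriman": '"Berriman M"[AU]',
}


def esearch_count_url(term, db="nuccore"):
    """Build an ESearch URL that requests only the number of hits for `term`."""
    query = urlencode({"db": db, "term": term, "rettype": "count"})
    return f"{ESEARCH_URL}?{query}"


def term_of(url):
    """Recover the decoded query term from a URL built above (for checking)."""
    return parse_qs(urlparse(url).query)["term"][0]


if __name__ == "__main__":
    # Only prints the URLs; opening them requires network access.
    for name, term in EXERCISE_TERMS.items():
        print(name, esearch_count_url(term))
```

Opening one of the printed URLs in a browser returns a small XML document with the count, which can be compared with the number shown on the website.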

Gene-centric databases
Sequence databases are great tools when you want to come up with a bibliography for a particular sequence. However, they do not provide easy access to sequence data when your query deals with broader issues related to a gene or function. The second-generation nucleotide-sequence databases have adopted a more gene-centric perspective: all the sequence information relevant to a given gene is made accessible at once.
NCBI Gene: http://www.ncbi.nlm.nih.gov/gene
Gene described in: http://www.ncbi.nlm.nih.gov/books/NBK21085/
Gene Help: http://www.ncbi.nlm.nih.gov/books/NBK3839/
Gene FAQ: http://www.ncbi.nlm.nih.gov/books/NBK3840/

NCBI Gene
The central functions of Gene are to establish unique identifiers (GeneID) for genes that can be tracked and, in so doing, support accurate connections with the defining sequences, nomenclature and other descriptors.
GeneID: an integer, species specific (the GeneID assigned to dystrophin in human is different from that in any other species).
Exercise:
1. Search for the DUT gene in human.
2. How will you get from the sequence record U90223 to the gene record this sequence belongs to?
3. Find human and mouse genes having reviewed RefSeq records.
Answers:
1. DUT[gene] AND human[organism] in NCBI Gene, or use Advanced search in NCBI Gene.
2. Search for U90223 in NCBI Nucleotide, display the Summary, and in the right column under Related information click Gene.
3. Click on Limits, check Mus musculus and Homo sapiens, and set Limit by RefSeq Status to Reviewed.
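The route from a sequence record to its gene record can also be followed through E-utilities. The sketch below is an illustration under stated assumptions, not the method used in the slides: it builds an ESearch URL for the gene query, an ESearch URL that looks up U90223 in Nucleotide (assumed database name nuccore, assumed field tag [accn]), and an ELink URL from Nucleotide to Gene. The UID passed to elink_url in the usage example is a made-up placeholder; the real UID would come from the ESearch result.

```python
from urllib.parse import urlencode

# Base address of the public E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_term(symbol, organism):
    """Compose a Gene query such as DUT[gene] AND human[organism]."""
    return f"{symbol}[gene] AND {organism}[organism]"


def esearch_url(db, term):
    """Build an ESearch URL that searches database `db` for `term`."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(dbfrom, db, uid):
    """Build an ELink URL that follows links for record `uid` from `dbfrom` to `db`."""
    return f"{EUTILS}/elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})


if __name__ == "__main__":
    # Step 1: find the gene record directly.
    print(esearch_url("gene", gene_term("DUT", "human")))
    # Step 2: look up the sequence record, then follow its link to Gene.
    print(esearch_url("nuccore", "U90223[accn]"))
    print(elink_url("nuccore", "gene", "123456"))  # placeholder UID, not a real record
```

Building the URLs as plain strings keeps the sketch free of network calls; any HTTP client can then fetch them.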

NCBI Gene
Gene does not claim to be comprehensive; rather, it serves as a guide to additional information in other databases. For example, a gene can be represented by multiple sequences, but not all are reported explicitly from Gene. Instead, connections are supplied from Gene to Entrez Nucleotide, Entrez Protein, and BLink (BLAST Link), where more sequences with significant similarity can be retrieved. In addition to the multiple links to NCBI databases, LinkOuts submitted to Gene from external databases support ready navigation to more gene-specific information.

NCBI Gene
Go to the record for the DUT gene in human.
Right column: table of contents (TOC) of the record. Additional links in the TOC contain LinkOut. What is NCBI LinkOut?
Right column: Links contain connections to other databases.
Go to the Protein database link: right column, Find related data, Database: Protein. It runs BLAST.
LinkOut: http://www.ncbi.nlm.nih.gov/projects/linkout/index.html
LinkOut is a service that allows you to link directly from PubMed and other NCBI databases to a wide range of information and services beyond the NCBI systems. LinkOut aims to facilitate access to relevant online resources in order to extend, clarify, and supplement information found in NCBI databases.

NCBI Gene
Genomic context: location on the chromosome.
OMIM (MIM): Online Mendelian Inheritance in Man. OMIM is a directory of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases.

NCBI Gene Location of the gene on the chromosome in non-sequence coordinates. If the gene has been included in a genomic annotation, the section also diagrams neighboring genes and indicates their orientations. The gene being shown on the diagram is in maroon.

NCBI Gene: three transcript variants
This portion is provided when a gene has been annotated on a genomic RefSeq, in other words, when the position of the intron/exon/coding region information is available in some genomic coordinate system. How annotated structures are rendered is described in http://www.ncbi.nlm.nih.gov/projects/sviewer/help/legends.pdf

Genomic context
You can use this section to:
view the intron/exon/coding region organization on a genomic RefSeq;
identify the RefSeqs that correspond to any RNA or protein product and see an overview of the exons they represent;
alter the zoom level of the display;
move upstream and downstream in the sequence being displayed (just drag in the window);
navigate to a full display of the genomic context via the link Go to nucleotide Graphics;
navigate to the genomic sequence of the gene in FASTA format;
navigate to the genomic sequence of the gene in GenBank format;
change the display of the genomic sequence on which the gene is annotated. The default display is the chromosome of the reference assembly; for some taxa there are alternate assemblies. For human, the RefSeqGene can also be selected.
RefSeqGene defines genomic sequences to be used as reference standards for well-characterized genes.

Obtaining gene sequence
In the Genomic regions section of the full report, click on FASTA. If you want to adjust the range to capture, modify the values in the Change region shown tool on the FASTA display and click on Update View.
From http://www.ncbi.nlm.nih.gov/books/NBK3840/#genefaq.Obtaining_genomic_sequence
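Retrieving a sub-range of a genomic sequence as FASTA can also be expressed as an E-utilities EFetch request. This is a minimal sketch, not taken from the slides: it assumes the Nucleotide database is addressed as nuccore and uses the EFetch parameters seq_start and seq_stop with 1-based, inclusive coordinates. The accession in the usage example is a placeholder, not a real record.

```python
from urllib.parse import urlencode

# Public E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, start=None, stop=None):
    """Build an EFetch URL for a FASTA record, optionally limited to start..stop."""
    if (start is None) != (stop is None):
        raise ValueError("give both start and stop, or neither")
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    if start is not None:
        # Coordinates are treated as 1-based and inclusive.
        if not 1 <= start <= stop:
            raise ValueError("need 1 <= start <= stop")
        params["seq_start"] = start
        params["seq_stop"] = stop
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "NC_PLACEHOLDER.1" is a made-up accession used only for illustration.
    print(efetch_fasta_url("NC_PLACEHOLDER.1", start=100, stop=200))
```

The start and stop values play the same role as the fields in the Change region shown tool on the web page.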

Obtaining gene sequence
In the Genomic regions section, click on Graphics. (Screenshot callouts: "Click these arrows", "Place your cursor over this bar".) Again, the region can be adjusted in the FASTA view.
From http://www.ncbi.nlm.nih.gov/books/NBK3840/#genefaq.Obtaining_genomic_sequence

Obtaining gene sequence Genomic context section – MapViewer Click on Download/View Sequence/Evidence in the upper right of Map Viewer display, or click on dl in the label for the gene.

Obtaining gene sequence
How many transcript variants exist for the human TP53 gene?
Search for TP53[gene] AND human[orgn]. In the GenBank view, find the mRNA entries in FEATURES: seven variants.
The sequence can also be obtained from the GenBank record of the TP53 gene by changing the view to FASTA.
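Counting mRNA entries by eye works for one gene; for a saved GenBank flat file the same count can be scripted. The sketch below is an illustration, not part of the slides. It assumes the usual flat-file layout in which feature keys are indented by five spaces and qualifier lines are indented further, and the record in TOY_RECORD is invented for the example (it is not the TP53 record).

```python
import re
from collections import Counter

# A feature key line: exactly five spaces, the key, spaces, then the location.
FEATURE_KEY = re.compile(r"^ {5}(\S+) +\S")

# Invented toy record used only to exercise the parser.
TOY_RECORD = """LOCUS       TOY_GENE                 300 bp    DNA
FEATURES             Location/Qualifiers
     source          1..300
                     /organism="Homo sapiens"
     gene            1..300
                     /gene="TOY1"
     mRNA            join(1..50,100..300)
                     /product="toy, transcript variant 1"
     mRNA            join(1..50,150..300)
                     /product="toy, transcript variant 2"
     CDS             join(20..50,100..250)
ORIGIN
"""


def count_feature_keys(genbank_text):
    """Count how often each feature key occurs in the FEATURES table."""
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features:
            if line and not line.startswith(" "):
                break  # next top-level section (for example ORIGIN) ends the table
            match = FEATURE_KEY.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    print(count_feature_keys(TOY_RECORD)["mRNA"])
```

For real work a dedicated GenBank parser is the safer choice; this hand-written version only shows what "find mRNAs in FEATURES" amounts to.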

Obtaining gene sequence For a limited number of genes in the human genome, gene-specific genomic RefSeqs, termed RefSeqGene, have been created. These have a RefSeq accession beginning with NG_ and can be retrieved from the nucleotide database using the query keyword refseqgene. What is the accession number of RefSeqGene of TP53 gene?

GeneRIF (Gene Reference into Function)
What is GeneRIF? A GeneRIF is a concise phrase describing a function or functions of a gene, with the PubMed citation supporting that assertion. The majority of GeneRIFs have been provided by a collaboration between the NLM's Index Section and NCBI. There is no constraint on the number of independent submissions of GeneRIFs per PubMed ID, although those from non-NLM sources are reviewed by RefSeq staff.
From http://www.ncbi.nlm.nih.gov/books/NBK3841/

Phenotypes
This section reports the effect of the gene on phenotype, especially disease. For human genes, the first row links to the Phenotype-Genotype Integrator (PheGenI), a web portal providing a tabular display of genome-wide association study results relating the gene and/or its expression to a phenotype. Named phenotypes are provided in subsequent rows. Each phenotype row may be expanded, providing links to more information as available.
PheGenI is pronounced FEE-GEE-NEE.

Interactions
There are two major subcategories of information reported as Interactions: HIV-1 interactions and general interactions (TP53 has both). The HIV-1, Human Protein Interaction Database focuses on the human proteins that have been shown to interact with proteins from HIV-1.
Columns of the interactions table (screenshot callouts): the other interactant; the product of the gene that is part of the interaction; the source of these data; a description of the interaction.

General gene information
Several subcategories of information, including:
Pathways: a description of pathways that include this gene, with links to more information about each pathway.
Homology: a partial listing, with links, of orthologs in other species.
Gene Ontology (GO): the specific GO terms are listed by source of the information, category, term, evidence information, and links to supporting publications.

Gene Ontology (GO)
Goal: to unify the representation of gene and gene product attributes across all species.
Project aims:
maintain and develop a controlled vocabulary of gene and gene product attributes;
annotate genes and gene products;
provide tools for easy access to all aspects of the data provided by the project.
The ontology covers three domains:
molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis;
biological process: operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units (cells, tissues, organs, and organisms);
cellular component: the parts of a cell or its extracellular environment.
http://www.geneontology.org/
AmiGO browser: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

NCBI Reference Sequences (RefSeqs) This section describes the gene-specific NCBI reference sequences (RefSeqs) that have been established for this gene.

Exercise
Retrieve all records for human genes that are associated with OMIM and have been annotated on the genome.
Use Advanced search plus Limits: Homo sapiens.
Full list of Entrez filters: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html

Selected Entrez filters http://www.ncbi.nlm.nih.gov/books/NBK3841/table/EntrezGene.T.filter_sets_partial_complet/?report=objectonly

Genome-centric databases
Nucleotide sequences are routinely determined at the whole-genome or chromosome scale, at least for microorganisms. We now have information not only about individual gene sequences but also, for example, about their relative positions or strand orientation. To take advantage of this more global information, researchers have had to design state-of-the-art genome-centric sequence-information management systems that can connect specialized sequence collections with browsing tools.
MapViewer described in: http://www.ncbi.nlm.nih.gov/books/NBK21089/
MapViewer exercises from: http://www.ncbi.nlm.nih.gov/books/NBK21096/
UCSC Genome Browser video tutorials: http://libguides.mit.edu/bits