UCSC Genome Tools and Databases Jim Kent - Genome Bioinformatics Group University of California Santa Cruz.

Behind the Genome Browser
One ‘Genome’ database for each assembly of each genome:
– hg17 (human genome assembly 17)
– mm6 (Mus musculus 6)
– canFam1 (Canis familiaris 1)
hg17 has 1616 tables, but not really:
– Some tables are split across chromosomes for speed
– 228 logical tables
– Only ~30 different types of tables
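A minimal sketch of how split tables collapse into logical tables. It assumes split tables are named `<chrom>_<logicalTable>` (for example `chr1_mrna`); the regular expression and the table list are illustrative, not taken from hg17.

```python
import re
from collections import defaultdict

# Assumed naming convention for split tables: "<chrom>_<logicalTable>".
# The chromosome pattern is an illustrative guess, not the browser's own rule.
SPLIT_PREFIX = re.compile(r"^chr[0-9A-Za-z]+(?:_random)?_(?P<logical>.+)$")


def logical_name(table):
    """Return the logical table name for a physical table name."""
    match = SPLIT_PREFIX.match(table)
    return match.group("logical") if match else table


def group_tables(tables):
    """Map each logical table name to the physical tables behind it."""
    groups = defaultdict(list)
    for table in tables:
        groups[logical_name(table)].append(table)
    return dict(groups)


# Hypothetical physical table list.
physical = ["chr1_mrna", "chr2_mrna", "chrX_mrna", "chr1_random_mrna",
            "cpgIsland", "ensGene"]
logical = group_tables(physical)
print(len(physical), "physical tables ->", len(logical), "logical tables")
```

The same grouping applied to a real table listing would show how many of the physical tables are only per-chromosome pieces of one logical table.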

Results of selecting fields from related tables: Ensembl Genes (ensGene) and Superfamily Description (sfDescription).

Custom Track Output
Useful for visualizing the results of queries in the Genome Browser.
The way to produce more complex queries.
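A minimal sketch of what custom track output can look like, assuming the simple BED form: a `track` line followed by chrom, start, end, and name columns. The function name and the example rows are hypothetical.

```python
def bed_custom_track(name, description, rows):
    """Format query results as a BED custom track.

    rows: iterable of (chrom, start, end, label) with 0-based,
    half-open coordinates, as BED expects.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in rows:
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical query results.
rows = [("chr1", 1000, 2000, "hit1"), ("chr2", 500, 900, "hit2")]
track_text = bed_custom_track("myQuery", "Results of a table query", rows)
print(track_text, end="")
```

Text in this form can be pasted into the browser's custom track page, which is what lets the result of one query become the input to the next.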

681/3,329 (20%) of Ensembl genes that are not known genes are also not conserved.
1,728/33,666 (5%) of Ensembl genes in general are not conserved.
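The two percentages can be checked directly from the counts on the slide. The helper name is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


not_known = percent(681, 3329)      # Ensembl genes not among the known genes
in_general = percent(1728, 33666)   # all Ensembl genes

print(f"not known and not conserved: {not_known:.1f}%")
print(f"not conserved in general:    {in_general:.1f}%")
```

Both values round to the figures quoted on the slide, 20% and 5%.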

Meta-data behind Table Browser
The trackDb table describes each track.
Table and field descriptions are in AutoSql .as files, which also generate SQL code and C code to load/save from the database and tab-separated files.
Descriptions of how tables are connected are in the all.joiner file, which, along with the joinerCheck program, checks database integrity.

.as Files - table and field docs

table cpgIsland
"Describes the CpG Islands"
    (
    string chrom;      "Human chromosome or FPC contig"
    uint chromStart;   "Start position in chromosome"
    uint chromEnd;     "End position in chromosome"
    string name;       "CpG Island"
    uint length;       "Island Length"
    uint cpgNum;       "Number of CpGs in island"
    uint gcNum;        "Number of C and G in island"
    float perCpg;      "Percentage of island that is CpG"
    float perGc;       "Percentage of island that is C or G"
    )

autoSql generates code from these. They also help document.
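A rough sketch of the idea behind generating SQL from a .as description. This is not the autoSql program: it handles only the three scalar types in the cpgIsland example, and the type mapping and output layout are assumptions made for illustration.

```python
import re

AS_TEXT = '''
table cpgIsland
"Describes the CpG Islands"
    (
    string chrom;      "Human chromosome or FPC contig"
    uint chromStart;   "Start position in chromosome"
    uint chromEnd;     "End position in chromosome"
    string name;       "CpG Island"
    uint length;       "Island Length"
    uint cpgNum;       "Number of CpGs in island"
    uint gcNum;        "Number of C and G in island"
    float perCpg;      "Percentage of island that is CpG"
    float perGc;       "Percentage of island that is C or G"
    )
'''

# Assumed mapping from .as scalar types to SQL column types.
TYPE_MAP = {"string": "varchar(255)", "uint": "int unsigned", "float": "float"}

HEADER = re.compile(r'table\s+(\w+)\s+"([^"]*)"')
FIELD = re.compile(r'(\w+)\s+(\w+)\s*;\s*"([^"]*)"')


def parse_as(text):
    """Return (table name, table comment, [(type, field, comment), ...])."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no table header found")
    fields = FIELD.findall(text[header.end():])
    return header.group(1), header.group(2), fields


def to_sql(text):
    """Emit a CREATE TABLE statement, carrying the docs along as comments."""
    name, comment, fields = parse_as(text)
    cols = []
    for i, (ftype, fname, doc) in enumerate(fields):
        comma = "," if i < len(fields) - 1 else ""
        cols.append(f"    {fname} {TYPE_MAP[ftype]} not null{comma}  -- {doc}")
    return f"-- {comment}\nCREATE TABLE {name} (\n" + "\n".join(cols) + "\n);"


sql = to_sql(AS_TEXT)
print(sql)
```

Because the field comments travel with the schema, one file serves as both the code generator input and the documentation.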

all.joiner - basic example
The central concept is an identifier that appears in fields in multiple tables, sometimes even in multiple databases.
$gbd is a variable that contains a comma-separated list of databases.
An identifier record ends with a blank line.

identifier softberryGeneName
"Link together Fshgene++ gene structure, peptide, and homolog"
    $gbd.softberryGene.name
    $gbd.softberryPep.name
    $gbd.softberryHom.name
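A minimal sketch of reading one identifier record and expanding the database variable. The value given to `gbd` here is an assumption for illustration, and the parser covers only the record shape shown on the slide, not the full all.joiner syntax.

```python
JOINER_TEXT = '''
identifier softberryGeneName
"Link together Fshgene++ gene structure, peptide, and homolog"
    $gbd.softberryGene.name
    $gbd.softberryPep.name
    $gbd.softberryHom.name

'''

# Assumed value; the real file defines its own variables.
VARIABLES = {"gbd": "hg17,mm6,canFam1"}


def expand_field(spec, variables):
    """Expand 'dbs.table.field' into one (db, table, field) per database."""
    dbs, table, field = spec.split(".")
    if dbs.startswith("$"):
        dbs = variables[dbs[1:]]
    return [(db, table, field) for db in dbs.split(",")]


def parse_identifier(text, variables):
    """Parse the first identifier record; a blank line ends the record."""
    record = []
    for line in (ln.strip() for ln in text.splitlines()):
        if not line:
            if record:
                break
            continue
        record.append(line)
    head, comment, *specs = record
    keyword, name, *attrs = head.split()
    if keyword != "identifier":
        raise ValueError("record does not start with 'identifier'")
    # Only the first token of each line is the field; the rest are options.
    fields = [expand_field(spec.split()[0], variables) for spec in specs]
    return {"name": name, "attrs": attrs,
            "comment": comment.strip('"'), "fields": fields}


rec = parse_identifier(JOINER_TEXT, VARIABLES)
for expanded in rec["fields"]:
    print(expanded)
```

Each expanded list names the same logical field in every database, which is the information a join tool needs to connect the three softberry tables.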

# Genbank/trEMBL Accessions and meaningful subsets thereof
identifier genbankAccession external=genbank
"Generic Genbank Accession. More specific Genbank accessions follow"
    $gbd.seq.acc

identifier bacEndAccession typeOf=genbankAccession
"Genbank accession of a BAC end read."
    $gbd.all_bacends.qName dupeOk
    $gbd.bacEndPairs.lfNames comma
    $hg.fishClones.beNames comma minCheck=0.70

typeOf - allows joins between parent and child, but not between siblings.
dupeOk - allows more than one row with the same identifier in the primary table.
comma - indicates the field is a comma-separated list of identifiers.
minCheck - indicates that only a portion of the identifiers in the field are in the primary table.
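A sketch of the kind of check that `comma` and `minCheck` describe: split comma-separated values, then measure what fraction of the identifiers appear in the primary table. This illustrates the idea only and is not the joinerCheck implementation; the accessions are made up.

```python
def check_field(primary_ids, values, comma=False, min_check=1.0):
    """Return (fraction of identifiers found in primary_ids, passed?).

    comma:     treat each value as a comma-separated list of identifiers.
    min_check: minimum fraction that must be found for the check to pass.
    """
    ids = []
    for value in values:
        parts = value.split(",") if comma else [value]
        ids.extend(part for part in parts if part)
    if not ids:
        return 1.0, True
    hits = sum(1 for ident in ids if ident in primary_ids)
    fraction = hits / len(ids)
    return fraction, fraction >= min_check


# Hypothetical data: primary-table accessions and a comma-separated field.
primary = {"AQ000001", "AQ000002", "AQ000003"}
be_names = ["AQ000001,AQ000002", "AQ000003,AQ999999"]

print(check_field(primary, be_names, comma=True, min_check=0.70))
print(check_field(primary, be_names, comma=True))
```

With three of four identifiers present, the field passes at minCheck=0.70 but would fail a strict check that requires every identifier to be in the primary table.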

identifier hugoName external=HUGO fuzzy
"International Human Gene Identifier"
    $hg.refLink.name
    $hg.atlasOncoGene.locusSymbol
    $hg.kgAlias.alias
    $hg.kgXref.geneSymbol
    $hg.refFlat.geneName
    $hg.jaxOrtholog.humanSymbol
    hg13,hg15.geneBands.name

“Biological” names for human genes are so messy that no validation is done (note the ‘fuzzy’ keyword).

Other Databases
Genome databases - one for each assembly of each organism: hg17, mm6, canFam1, etc.
hgCentral - home to dbDb and user settings info. One database shared by all web servers.
hgFixed - mostly microarray data.
uniProt - relationalized SwissProt/trEMBL database.
go - Gene Ontology terms and term/gene associations.
genePix - gene image database.

Gene Pix
Image browser for in-situ and other gene-oriented pictures.
Hopefully in the long run will have a million images covering almost all vertebrate genes.
(Needs new name, Gene Pix is a microarray analysis program. VisiGene?)

Data Sets
Paul Gray - ~1000 mouse transcription factor genes - whole embryo & sections. These are in the database now.
Other potential sources:
– German AxelDB frog in situs
– Japanese NIBB frog in situs (have nice browser)
– Genepaint.org - mouse stuff
– EMAGE and Jackson Lab mouse images
From development and other journals, copyright issues.
– Nathaniel Heintz BAC expression constructs
– Eddy Rubin lab mouse embryos
– UCSF cell-localization stuff?

Types of images
Whole animal vs. sectioned tissues vs. single cell.
Single vs. multiple probes within the same image.
Single image vs. image series (movies even).
RNA, antibody, fusion protein.
Mitotic cell, 3 stains

Gene Pix Programs
genePixLoad - loads SQL database from a well-defined format involving a .ra file and a tab-separated file. See genePixLoad.doc
loadMahoney - converts Paul Gray (Mahoney center) spreadsheet and image directory into genePixLoad format
Hg/lib/genePix.c - interface with SQL database.
hgGenePix - CGI script to display images
knownToGenePix - makes table in mm5 (or other) genome database to connect known genes to genePix Ids.

Gene Pix Database
Just a single database for all assemblies of all organisms.
A knownToGenePix table in the assembly database.

GenePix tables
fileLocation - directory
bodyPart - whole, brain etc.
sliceType - transverse, sagittal
treatment - tech details
contributor - who done it
Journal - scientific journal
submissionSet - info about a whole set of images from one author
sectionSet - links together separate sections of same specimen.
Gene - gene info
geneSynonym
Antibody - info on an antibody
probeType - antibody, RNA, fusion protein
Probe - links gene, primers, sequence, Ab.
probeColor - color probe is
imageFile - file containing image
Image - a single image.
imageProbe - links image and probe
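A sketch of how the link tables could be followed from a gene to its image files. Only the table names (written in lower case here) come from the list above; every column name and every row is hypothetical, and SQLite stands in for the real database server.

```python
import sqlite3

# Table names follow the slide; all columns and rows are hypothetical.
SCHEMA = """
CREATE TABLE gene       (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE probe      (id INTEGER PRIMARY KEY, gene INTEGER);
CREATE TABLE imageFile  (id INTEGER PRIMARY KEY, fileName TEXT);
CREATE TABLE image      (id INTEGER PRIMARY KEY, imageFile INTEGER);
CREATE TABLE imageProbe (image INTEGER, probe INTEGER);
"""

ROWS = """
INSERT INTO gene VALUES (1, 'geneA'), (2, 'geneB');
INSERT INTO probe VALUES (10, 1), (11, 2);
INSERT INTO imageFile VALUES (100, 'a1.jpg'), (101, 'a2.jpg'), (102, 'b1.jpg');
INSERT INTO image VALUES (1000, 100), (1001, 101), (1002, 102);
INSERT INTO imageProbe VALUES (1000, 10), (1001, 10), (1002, 11);
"""

QUERY = """
SELECT imageFile.fileName
FROM gene
JOIN probe      ON probe.gene = gene.id
JOIN imageProbe ON imageProbe.probe = probe.id
JOIN image      ON image.id = imageProbe.image
JOIN imageFile  ON imageFile.id = image.imageFile
WHERE gene.name = ?
ORDER BY imageFile.fileName
"""


def images_for_gene(conn, gene_name):
    """Follow gene -> probe -> imageProbe -> image -> imageFile."""
    return [row[0] for row in conn.execute(QUERY, (gene_name,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + ROWS)
print(images_for_gene(conn, "geneA"))
```

Keeping imageProbe as a separate link table is what allows one image to carry several probes, as in the multiple-probe images mentioned earlier.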

Some Anatomy Required

Especially with slices

Edinburgh mouse atlas

Theiler Stages

Later Stages

NIBB Japanese Frog Site

Earlier Stages

Who you gonna call?
Angie Hinrichs - developer of 2nd and 4th versions of Table Browser. Genome browser hacker extraordinaire.
Hiram Clawson - main mouse man at the moment. Developed ‘wiggle’ tracks.

Kate Rosenbloom - ENCODE project and multiple alignment display.
Bob Kuhn - Software and database quality assurance.
David Haussler - Ideas. Money. Comparative genomics.

More Acknowledgements
UCSC - Robert Baertsch, Gill Bejerano, Galt Barber, Ron Chao, Mark Diekhans, Jorge Garcia, Patrick Gavin, Rachel Harte, Fan Hsu, Yontao Lu, Crystal Lynch, Donna Karolchik, Jennifer Jackson, Ann Pace, Jacob Pedersen, Andy Pohl, Katie Pollard, Ali Sultan-Qurraie, Brian Raney, Krishna Roskin, Adam Siepel, Chuck Sugnet, Paul Tatarsky, Daryl Thomas, Heather Trumbower
Penn State - Scott Schwartz, Laura Elnitski, Belinda Giardine, Ross Hardison, Minmei Hou, Webb Miller, Anton Nekrutenko
Funding - NHGRI, HHMI, NCI, UCSC