The (new) Table Browser

Talk Outline
Table Browser History
New Table Browser Features
New Table Browser Implementation
–all.joiner & .as files
–Overall control and data flow
–Joining and intersection modules
Limits and future directions

Table Browser History
Goal: annotations over a particular region of the genome in text rather than graphic format.
Krish did the first successful implementation: separated tables into positional and non-positional, merged chrN_ tables, split off hgFind.
Angie added sequence output, filters, intersections, and many help pages.
These versions of the table browser were called hgText.

Why a New Table Browser
hgText is powerful, but much of the power is not obvious on the first page.
In hgText the association between tracks and tables was not clear.
There was no way to join fields across related tables.

New Table Browser
Flip to demoing new table browser online.
–Show overall controls
–Demo getting genome position, common name, and review status for refSeq on ENCODE
–Demo getting alt-splice variants with knownCanonical and knownIsoforms
–Demo custom track created from filtered cpgIslands (>= 500 bases, >= 0.9 Exp/Obs)
–Intersect custom fat cpg track with most conserved, requiring 75% overlap, output as custom track
–Intersect conserved fat cpg with exoniphy, requiring <= 5% overlap, output as hyperlink (custom track output crashes!)

New Table Browser Implementation
Built using:
–AutoSql .as files to describe table fields
–all.joiner file to describe table relationships
–.bed based intersection and sequence output code from old table browser
–About 8000 lines of new C code in 19 .c files in src/hg/hgTables

Data Flow
Each region (piece of a chromosome) is processed separately.
The filter is turned into a SQL where clause.
Field oriented output, especially selected tables, is handled by one branch of code:
–SQL rows -> joining routines -> output
GFF, Custom Track, Sequence, Hyperlink, and Summary Stats outputs are handled by a branch of code that turns things into BED format internally:
–SQL rows -> BED -> intersecting -> output
Ultimately, fields and BEDs need to be merged so that joining and intersecting can happen at the same time.
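As a rough illustration of the "filter is turned into a SQL where clause" step, here is a minimal Python sketch (the real code is C in src/hg/hgTables and is not shown on the slides). The function name region_where, the (field, op, value) filter format, and the chrom/chromStart/chromEnd column names are assumptions made for this example only.

```python
def region_where(chrom, start, end, filters):
    """Build a parameterized WHERE clause for one region plus user filters.

    Hypothetical helper: `filters` is a list of (field, op, value) tuples,
    and the positional columns are assumed to be chrom/chromStart/chromEnd.
    """
    allowed_ops = {"=", "!=", "<", "<=", ">", ">="}
    # Half-open interval overlap: item starts before region end
    # and ends after region start.
    clauses = ["chrom = %s", "chromStart < %s", "chromEnd > %s"]
    params = [chrom, end, start]
    for field, op, value in filters:
        if op not in allowed_ops or not field.isidentifier():
            raise ValueError(f"unsupported filter: {field} {op}")
        clauses.append(f"{field} {op} %s")
        params.append(value)
    return " AND ".join(clauses), params


# A filter in the spirit of the "fat CpG island" demo; the field names
# `length` and `obsExp` are assumed here.
where, params = region_where(
    "chr7", 1000, 2000, [("length", ">=", 500), ("obsExp", ">=", 0.9)]
)
```

The clause and its parameters are returned separately so that a database driver can do the quoting, which is a design choice of this sketch rather than something the slides specify.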

Joining Code
Use all.joiner to find the route from the primary table to the other tables in the join.
Construct a SQL query for each table that applies table filters and region, and includes key fields even if they are not part of the final output.
Construct a row object (array of lists) for each row returned on the primary table.
Construct a hash keyed by the joining field of the primary table, with row objects as values.
Execute the SQL query for the next table, and when keys match add info to the row object.
Repeat with third and subsequent tables if any.
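The hash-based join described above can be sketched in Python as follows. This is a simplified illustration, not the hgTables code: it assumes rows arrive as dictionaries, that every secondary table joins directly on the primary table's key (the real route comes from all.joiner and may go through intermediate tables), and the name join_tables is made up for the example.

```python
from collections import defaultdict


def join_tables(primary_rows, primary_key, other_tables):
    """Hash join where each output field holds a list of values.

    primary_rows: list of dicts from the primary table's query.
    other_tables: list of (rows, key_field) pairs, one per joined table.
    """
    # One row object per primary row; every field is a list so that
    # several matches can be added without duplicating the row.
    row_objects = [{f: [v] for f, v in row.items()} for row in primary_rows]

    # Hash keyed by the joining field, with row objects as values.
    by_key = defaultdict(list)
    for obj, row in zip(row_objects, primary_rows):
        by_key[row[primary_key]].append(obj)

    # For each further table, add matching values to the row objects.
    for rows, key_field in other_tables:
        for row in rows:
            for obj in by_key.get(row[key_field], []):
                for field, value in row.items():
                    if field != key_field:
                        obj.setdefault(field, []).append(value)
    return row_objects


# Illustrative data only; the values are invented.
canonical = [{"clusterId": 1, "name": "tA"}, {"clusterId": 2, "name": "tC"}]
isoforms = [
    {"clusterId": 1, "isoform": "tA"},
    {"clusterId": 1, "isoform": "tB"},
    {"clusterId": 2, "isoform": "tC"},
]
joined = join_tables(canonical, "clusterId", [(isoforms, "clusterId")])
```

In this example the first cluster keeps a single output row whose isoform field lists both matches, instead of producing one output row per match.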

Limits/Features of Joining Code
Unless a filter is applied, non-positional tables will be scanned completely. This takes 3 minutes for gbCdnaInfo. (Hint: add filter type=mRNA.)
Joining code is only applied to field oriented output.
Will handle joins across split tables.
Can chop off prefixes and suffixes on a key field before joining if specified in all.joiner. (Needed for chopping off the version number in some Ensembl tables, for instance.)
Avoids combinatorial explosion of output rows by allowing fields to contain lists.
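Chopping a prefix or suffix off a key before joining might look like the sketch below. The function chop_key and its parameters prefix_end and suffix_start are hypothetical names chosen for this example; they are not the all.joiner syntax, and the identifiers are made up.

```python
def chop_key(key, prefix_end=None, suffix_start=None):
    """Normalize a join key by trimming an optional prefix and suffix.

    prefix_end: drop everything up to and including its first occurrence.
    suffix_start: drop everything from its last occurrence onward.
    """
    if prefix_end and prefix_end in key:
        key = key.split(prefix_end, 1)[1]
    if suffix_start and suffix_start in key:
        key = key.rsplit(suffix_start, 1)[0]
    return key


# Dropping a version suffix so a versioned and an unversioned ID can match.
versioned = "ENST00000000233.5"
unversioned = chop_key(versioned, suffix_start=".")
```

Applying the same normalization to both sides of the join before hashing is what lets a versioned accession in one table match an unversioned one in another.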

Intersecting Code
Primarily inherited from hgText.
Uses hTableInfo (a call in hg/lib/hdb.c), which reports which fields in the database store chromosome, start, end, etc.
Analyses hTableInfo to figure out how many fields are in the corresponding BED structure, and how to query the database and massage the output to get a BED.
Converts the second table in the intersection into a bitmap.
Counts up the number of bases in the bitmap that intersect each BED item in the first table. (For pure bitwise operations, converts the first table to a bitmap too.)
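The bitmap approach to intersection can be illustrated with a short Python sketch. It is an approximation under stated assumptions: items are (start, end) half-open intervals on a single chromosome, a bytearray with one byte per base stands in for a true bitmap, and the names overlap_counts and keep_by_overlap are invented for this example.

```python
def overlap_counts(first_items, second_items, chrom_size):
    """Count, for each first-table item, the bases covered by the second table."""
    covered = bytearray(chrom_size)  # 0 = not covered, 1 = covered
    for start, end in second_items:
        # Clamp to the chromosome so slice assignment never resizes the array.
        start, end = max(start, 0), min(end, chrom_size)
        if end > start:
            covered[start:end] = b"\x01" * (end - start)
    return [sum(covered[max(s, 0):min(e, chrom_size)]) for s, e in first_items]


def keep_by_overlap(first_items, second_items, chrom_size, min_fraction):
    """Keep first-table items whose covered fraction is at least min_fraction."""
    counts = overlap_counts(first_items, second_items, chrom_size)
    return [
        (s, e)
        for (s, e), n in zip(first_items, counts)
        if e > s and n / (e - s) >= min_fraction
    ]


# Invented coordinates: overlapping second-table items are only counted once.
first = [(0, 12), (18, 26), (40, 50)]
second = [(10, 20), (15, 30)]
counts = overlap_counts(first, second, 100)            # [2, 8, 0]
kept = keep_by_overlap(first, second, 100, 0.75)       # [(18, 26)]
```

Because the second table is flattened into coverage first, overlapping items in it cannot inflate the count, and a percentage threshold such as the 75% overlap requirement from the demo reduces to a simple ratio per item.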

Limits and Features of Intersections
Not applied to field or MAF output.
Information is lost in converting to BED.
Does allow intersection code for sequence, GFF, custom track, BED, statistics, and hyperlinks output to go through the same path.

Future Directions
Make a combined BED/Row structure to bring together intersections and joining.
Polish sequence output in some places.
Get .as file info for all tables.
Encourage people to pay a little more attention to database concerns as well as genome browser concerns when designing tables.
See if split tables can be phased out by tuning MySQL aggressively.