EnsMart: A Generic System for Fast and Flexible Access to Biological Data Arek Kasprzyk et al (2004) 14:160-169, Genome research EBI, Wellcome Trust.


1 EnsMart: A Generic System for Fast and Flexible Access to Biological Data. Arek Kasprzyk et al. (2004), Genome Research 14:160-169. EBI, Wellcome Trust

2 Objectives
• Understand the idea of a “Data Mart”.
• Understand why this idea is useful to biology.
• Have an idea of how EnsMart works.
• Assess the significance of the EnsMart system. Will it last?

3 Data Mart defined
• A database, potentially derived from many other databases, whose primary purpose is query processing and report generation for non-technical users.
• Similar to a “Data Warehouse”.
• Marts and warehouses are important components of “decision support systems” in business.

4 Data Mart in EnsMart
[Figure: flow diagram; stages: data collected, standardized, query optimized, presented to users]

5 Marts: benefits
• Allow a good division of labor:
  - Computers for transactions are separate from computers for queries.
  - Interface development is separate from database development.
  - Biologists can be separated from computer scientists as a result of good interface design.
  - Produces a faster, more stable system for users.

6 Costs
• Construction of the Mart is a challenging and continuous process.
• New sources of data need to be incorporated and validated constantly.
• Trust

7 The case for EnsMart: why now?
• Growing number of different databases and opportunities: genomes, expression, protein, disease…
• Assembled, high-quality genomes are available.
  - “Finished” genomes can be used as references to link data from different databases consistently.
• EnsMart was built to take advantage of the opportunities for cross-database queries.
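
The idea of using a finished genome as a shared reference can be illustrated with a small sketch. It assumes two hypothetical sources, one supplying gene intervals and one supplying SNP positions, both expressed in the same chromosome coordinates. The record types, identifiers, and coordinates are invented for illustration and are not taken from EnsMart.

```python
from collections import namedtuple

# Hypothetical record types; field names are assumptions, not EnsMart's schema.
Gene = namedtuple("Gene", "gene_id chrom start end")
Snp = namedtuple("Snp", "snp_id chrom pos")


def snps_in_genes(genes, snps):
    """Pair each SNP with every gene whose interval contains it (inclusive)."""
    # Group genes by chromosome so each SNP is only compared with
    # genes on the same chromosome.
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    links = []
    for snp in snps:
        for gene in by_chrom.get(snp.chrom, []):
            if gene.start <= snp.pos <= gene.end:
                links.append((snp.snp_id, gene.gene_id))
    return links


# Invented example data from two "different databases".
genes = [Gene("geneA", "1", 100, 500), Gene("geneB", "2", 100, 500)]
snps = [Snp("snp1", "1", 250), Snp("snp2", "2", 900), Snp("snp3", "2", 100)]

links = snps_in_genes(genes, snps)
print(links)
```

Because both sources share one coordinate system, no source-specific identifier mapping is needed to relate them; the reference assembly does the linking.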

8 Inside EnsMart
• 9 organisms
• At least 17 different primary sources of data, many with multiple databases.
• 2 kinds of “Foci”:
  - Genes
    • Ensembl
    • EST
    • Vega
  - SNPs

9 EnsMart schema
[Figure: schema diagram for one focus; labels: Focus, 1, Many, One, Many]

10 EnsMart schema: another focus

11 Schema -> Query Speed
• “Central” tables, or foci, contain a binary value for each satellite table indicating whether matching rows exist there. The first step in query generation uses these values to limit the range of satellite tables accessed.
• These values are only useful in the query process (they take extra space and time for transactions).
• As a result, many queries may not need to access satellite tables at all.
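
A minimal sketch of the existence-flag idea, using Python's built-in `sqlite3` module. The table names (`gene_main`, `gene_snp`, `gene_disease`), the flag columns (`has_snp`, `has_disease`), and all rows are hypothetical and chosen only for illustration; they are not EnsMart's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central (focus) table: one row per gene, plus one 0/1 flag per satellite.
CREATE TABLE gene_main (
    gene_id     TEXT PRIMARY KEY,
    chrom       TEXT,
    has_snp     INTEGER NOT NULL DEFAULT 0,
    has_disease INTEGER NOT NULL DEFAULT 0
);
-- Satellite tables: many rows per gene.
CREATE TABLE gene_snp     (gene_id TEXT, snp_id TEXT);
CREATE TABLE gene_disease (gene_id TEXT, disease TEXT);
""")

conn.executemany(
    "INSERT INTO gene_main (gene_id, chrom) VALUES (?, ?)",
    [("geneA", "1"), ("geneB", "1"), ("geneC", "2")],
)
conn.executemany(
    "INSERT INTO gene_snp VALUES (?, ?)",
    [("geneA", "snp1"), ("geneA", "snp2"), ("geneC", "snp3")],
)
conn.execute("INSERT INTO gene_disease VALUES ('geneC', 'disease1')")

# Build step: precompute the flags once, when the mart is constructed.
conn.execute("""
UPDATE gene_main SET
    has_snp = EXISTS (SELECT 1 FROM gene_snp s
                      WHERE s.gene_id = gene_main.gene_id),
    has_disease = EXISTS (SELECT 1 FROM gene_disease d
                          WHERE d.gene_id = gene_main.gene_id)
""")

# "Which genes on chromosome 1 have SNPs?" is answered from the
# central table alone; no satellite table is touched.
genes_with_snps = [row[0] for row in conn.execute(
    "SELECT gene_id FROM gene_main "
    "WHERE chrom = '1' AND has_snp = 1 ORDER BY gene_id")]

# Only when SNP details are requested is the satellite joined,
# and then only for genes that passed the flag filter.
snp_rows = conn.execute(
    "SELECT m.gene_id, s.snp_id FROM gene_main m "
    "JOIN gene_snp s ON s.gene_id = m.gene_id "
    "WHERE m.chrom = '1' AND m.has_snp = 1 ORDER BY s.snp_id").fetchall()

print(genes_with_snps)
print(snp_rows)
```

The flags duplicate information already present in the satellites, which is why they cost extra space and must be recomputed whenever the satellites change; the trade is acceptable in a read-mostly mart.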

12 User Interfaces
• Supposedly Confucian quote:
  - "What I hear I forget.
  - What I see I remember.
  - What I do I understand."

13 User Interfaces
• MartView: website with “wizard” query construction.
• MartExplorer: stand-alone tool with tree-based query construction.
• MartShell: text-based application that uses an SQL-like query language. Can be used interactively or in batch processes.
• Write your own, using the MartLib Java library!

14 MartView 1: Choose organism and focus

15 MartView 2: Design query

16 MartView 3: Specify output

17 MartExplorer

18 MartShell

19 Conclusions
• Powerful query system for biologists.
• Useful framework for software engineers.
  - All open source!
• What about other loci such as repetitive elements?
• Data validation?
• Annotation updates?

20 EnsMart Discussion
• What, if any, are the problems with the foci system?
• What alternatives to this system exist?
• Describe a task that EnsMart could be used to accomplish.
• Describe any personal experiences with EnsMart.

