Published by Dwight Stewart. Modified over 9 years ago.
1
EnsMart: A Generic System for Fast and Flexible Access to Biological Data
Arek Kasprzyk et al. (2004), Genome Research 14:160-169
EBI, Wellcome Trust
2
Objectives
Understand the idea of a “data mart”.
Understand why this idea is useful to biology.
Gain an idea of how EnsMart works.
Assess the significance of the EnsMart system: will it last?
3
Data Mart defined
A database, potentially derived from many other databases, whose primary purpose is query processing and report generation for non-technical users.
Similar to a “data warehouse”.
Marts and warehouses are important components of “decision support systems” in business.
4
Data Mart in EnsMart
Data collected -> standardized -> query-optimized -> presented to users
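The four stages on this slide can be sketched as a toy pipeline. This is a minimal illustration on made-up data, not EnsMart's actual build process: the two "source" lists, their field names (`GeneID`, `gene`, `go_term`) and the stage functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical source records: two "primary databases" that describe the same
# gene with different field names and identifier formatting.
SOURCE_A = [{"GeneID": "ensg0001 ", "chrom": "1", "start": 1000, "end": 5000}]
SOURCE_B = [{"gene": "ENSG0001", "go_term": "GO:0005634"},
            {"gene": "ENSG0001", "go_term": "GO:0003677"}]


def collect():
    """Stage 1: gather raw records from each source (here, in-memory lists)."""
    return {"a": list(SOURCE_A), "b": list(SOURCE_B)}


def standardize(raw):
    """Stage 2: map every record onto one shared identifier and field naming."""
    genes = [{"gene_id": r["GeneID"].strip().upper(), "chromosome": r["chrom"],
              "start": r["start"], "end": r["end"]} for r in raw["a"]]
    go = [{"gene_id": r["gene"].strip().upper(), "go_term": r["go_term"]}
          for r in raw["b"]]
    return genes, go


def optimize(genes, go):
    """Stage 3: denormalize into one read-only table keyed by gene_id, so a
    query needs a single lookup instead of a join across sources."""
    terms = defaultdict(list)
    for r in go:
        terms[r["gene_id"]].append(r["go_term"])
    return {g["gene_id"]: {**g, "go_terms": sorted(terms[g["gene_id"]])}
            for g in genes}


def present(mart, columns):
    """Stage 4: render the requested columns as tab-separated report lines."""
    lines = ["\t".join(columns)]
    for row in mart.values():
        lines.append("\t".join(
            ",".join(v) if isinstance(v, list) else str(v)
            for v in (row[c] for c in columns)))
    return lines


mart = optimize(*standardize(collect()))
for line in present(mart, ["gene_id", "chromosome", "go_terms"]):
    print(line)
```

The expensive work (standardizing and denormalizing) happens once at build time, which is what lets the final query and report step stay simple.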
5
Marts – benefits
Allows good division of labor:
– Computers for transactions separate from computers for queries.
– Interface development separate from database development.
– Biologists (can be) separated from computer scientists as a result of good interface design.
– Produces a faster, more stable system for users.
6
Costs
Construction of the mart is a challenging and continuous process.
New sources of data need to be incorporated and validated constantly.
Trust
7
The case for EnsMart: why now?
Growing number of different databases and opportunities: genomes, expression, protein, disease…
Assembled, high-quality genomes are available.
– “Finished” genomes can be used as references to link data from different databases consistently.
EnsMart was built to take advantage of these opportunities for cross-database queries.
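A finished assembly lets data from unrelated sources be linked through shared coordinates. The sketch below assumes two hypothetical sources (a gene table and a SNP table) that report positions on the same reference assembly, and links them with a simple interval-containment scan; the identifiers (`GENE_A`, `rs_x`, and so on) are made up.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class Snp(NamedTuple):
    snp_id: str
    chromosome: str
    position: int  # 1-based


def snps_in_genes(genes, snps):
    """Pair each SNP with every gene whose interval contains it.

    This only works because both sources use coordinates on the same
    reference assembly; on different assemblies the positions would not agree.
    """
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chromosome, []).append(g)
    pairs = []
    for s in snps:
        for g in by_chrom.get(s.chromosome, []):
            if g.start <= s.position <= g.end:
                pairs.append((s.snp_id, g.gene_id))
    return pairs


# Hypothetical records from two independent databases.
genes = [Gene("GENE_A", "1", 1000, 5000), Gene("GENE_B", "2", 200, 900)]
snps = [Snp("rs_x", "1", 1500), Snp("rs_y", "1", 9000), Snp("rs_z", "2", 900)]
print(snps_in_genes(genes, snps))  # → [('rs_x', 'GENE_A'), ('rs_z', 'GENE_B')]
```

The linear scan per chromosome keeps the example short; a real system with millions of features would use a sorted or indexed interval structure instead.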
8
Inside EnsMart
9 organisms.
At least 17 different primary data sources, many with multiple databases.
2 kinds of “foci”:
– Genes: Ensembl, EST, Vega
– SNPs
9
EnsMart schema
[Figure: schema diagram centered on a focus table, annotated with one/many relationship labels]
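A minimal sketch of a focus/satellite layout in SQLite, assuming one focus table with one row per gene and satellite tables holding zero or more rows per gene (the one-to-many relationships in the diagram). The table and column names (`gene_main`, `gene_go`, `gene_xref`) are hypothetical and do not reproduce EnsMart's real schema.

```python
import sqlite3

# Hypothetical focus/satellite layout: one row per gene in the central
# "focus" table, and satellite tables holding zero or more rows per gene.
SCHEMA = """
CREATE TABLE gene_main (
    gene_id    TEXT PRIMARY KEY,
    chromosome TEXT NOT NULL,
    biotype    TEXT NOT NULL
);
CREATE TABLE gene_go (          -- satellite: many GO terms per gene
    gene_id TEXT NOT NULL REFERENCES gene_main(gene_id),
    go_term TEXT NOT NULL
);
CREATE TABLE gene_xref (        -- satellite: many external IDs per gene
    gene_id TEXT NOT NULL REFERENCES gene_main(gene_id),
    xref    TEXT NOT NULL
);
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO gene_main VALUES (?, ?, ?)",
                [("G1", "1", "protein_coding"), ("G2", "X", "pseudogene")])
con.executemany("INSERT INTO gene_go VALUES (?, ?)",
                [("G1", "GO:0005634"), ("G1", "GO:0003677")])
con.executemany("INSERT INTO gene_xref VALUES (?, ?)", [("G2", "XREF:42")])

# Filters apply to the focus table; output columns come from a satellite.
rows = con.execute("""
    SELECT m.gene_id, g.go_term
    FROM gene_main AS m
    JOIN gene_go AS g ON g.gene_id = m.gene_id
    WHERE m.chromosome = ?
    ORDER BY g.go_term
""", ("1",)).fetchall()
print(rows)  # → [('G1', 'GO:0003677'), ('G1', 'GO:0005634')]
```

Every query starts from the focus table, and satellites are joined only when the requested filters or output columns need them.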
10
EnsMart schema: another focus
11
Schema -> Query Speed
“Central” tables, or foci, contain a binary flag for each satellite table indicating whether that satellite holds any data for the row.
The first step in query generation uses these flags to limit the range of satellite tables accessed.
As a result, many queries may not require access to satellite tables at all.
These flags are useful only in the query process; they take extra space and add time to transactions (data updates).
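A minimal sketch of the existence-flag idea, assuming the binary values are modeled as 0/1 integer columns on the focus table. The names `gene_main`, `has_go` and `has_snp` are hypothetical; the point is that the flags are computed once at build time and then let a query skip the satellite tables entirely.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gene_main (
    gene_id    TEXT PRIMARY KEY,
    chromosome TEXT NOT NULL,
    has_go     INTEGER NOT NULL DEFAULT 0,  -- 1 if gene_go holds rows for this gene
    has_snp    INTEGER NOT NULL DEFAULT 0   -- 1 if gene_snp holds rows for this gene
);
CREATE TABLE gene_go  (gene_id TEXT NOT NULL, go_term TEXT NOT NULL);
CREATE TABLE gene_snp (gene_id TEXT NOT NULL, snp_id  TEXT NOT NULL);
""")
con.executemany("INSERT INTO gene_main (gene_id, chromosome) VALUES (?, ?)",
                [("G1", "1"), ("G2", "1"), ("G3", "2")])
con.executemany("INSERT INTO gene_go VALUES (?, ?)", [("G1", "GO:0005634")])
con.executemany("INSERT INTO gene_snp VALUES (?, ?)", [("G1", "rs1"), ("G3", "rs2")])

# Build step: precompute the flags once, after loading. This is the extra
# space and update-time cost the slide mentions; it pays off at query time.
con.executescript("""
UPDATE gene_main SET
    has_go  = EXISTS (SELECT 1 FROM gene_go  WHERE gene_go.gene_id  = gene_main.gene_id),
    has_snp = EXISTS (SELECT 1 FROM gene_snp WHERE gene_snp.gene_id = gene_main.gene_id);
""")

# Query: "genes on chromosome 1 that have a GO annotation and a SNP".
# The satellite tables are never touched; the flags answer it directly.
rows = con.execute(
    "SELECT gene_id FROM gene_main "
    "WHERE chromosome = ? AND has_go = 1 AND has_snp = 1 ORDER BY gene_id",
    ("1",),
).fetchall()
print(rows)  # → [('G1',)]
```

The trade-off is visible in the code: any change to `gene_go` or `gene_snp` makes the flags stale until the `UPDATE` is rerun, which is why this design suits a read-mostly mart rather than a transactional database.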
12
User Interfaces
Supposedly Confucian quote:
– “What I hear I forget.
– What I see I remember.
– What I do I understand.”
13
User Interfaces
MartView: website with “wizard”-style query construction.
MartExplorer: stand-alone tool with tree-based query construction.
MartShell: text-based application that uses an SQL-like query language; can be used interactively or in batch processes.
Write your own with the MartLib Java library!
14
MartView 1
Choose organism and focus
15
MartView 2
Design query
16
MartView 3
Specify output
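MartView's three steps (choose a dataset, add filters, pick output columns) map naturally onto a small query object that is compiled to SQL. The sketch below is hypothetical: `MartQuery`, `to_sql` and the `hsapiens_gene` dataset name are illustrative only and are not part of MartLib or any EnsMart API.

```python
from dataclasses import dataclass, field


@dataclass
class MartQuery:
    """Hypothetical query object mirroring MartView's three steps."""
    dataset: str                                     # step 1: organism + focus
    filters: dict = field(default_factory=dict)      # step 2: column -> value
    attributes: list = field(default_factory=list)   # step 3: output columns

    def to_sql(self, catalog):
        """Compile to (sql, params). Only names listed in `catalog` may be
        spliced into the SQL text; filter values become bound parameters."""
        if self.dataset not in catalog:
            raise ValueError(f"unknown dataset: {self.dataset}")
        columns = catalog[self.dataset]
        for name in [*self.filters, *self.attributes]:
            if name not in columns:
                raise ValueError(f"unknown column: {name}")
        if not self.attributes:
            raise ValueError("choose at least one output attribute")
        sql = f"SELECT {', '.join(self.attributes)} FROM {self.dataset}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{c} = ?" for c in self.filters)
        return sql, list(self.filters.values())


CATALOG = {"hsapiens_gene": {"gene_id", "chromosome", "biotype"}}
q = MartQuery("hsapiens_gene")            # step 1
q.filters["chromosome"] = "22"            # step 2
q.attributes += ["gene_id", "biotype"]    # step 3
print(q.to_sql(CATALOG))
# → ('SELECT gene_id, biotype FROM hsapiens_gene WHERE chromosome = ?', ['22'])
```

Keeping the user's choices as structured data (rather than free text) is what lets a wizard, a tree view or a command shell all drive the same back end.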
17
MartExplorer
18
MartShell
19
Conclusions
Powerful query system for biologists.
Useful framework for software engineers.
– All open source!
What about other loci, such as repetitive elements?
Data validation?
Annotation updates?
20
EnsMart Discussion
What, if any, are the problems with the foci system?
What alternatives to this system exist?
Describe a task that EnsMart could be used to accomplish.
Describe any personal experiences with EnsMart.