Download presentation
Presentation is loading. Please wait.
1
Making the Most of What We Know: Towards Effective Use of Genomics Data Terence Critchlow Center for Applied Scientific Computing Lawrence Livermore National Laboratory www.llnl.gov/CASC/people/critchlow PITTCON 2001 March 6, 2001 UCRL-VG-141500
2
Outline l A day in the life of a geneticist/biologist. l The business approach. (present) l The impact of the web.(future) l The key to success.
3
Hey! Today, I finally get to analyze the sequencing data I have been collecting for the last week.
4
Wow! I need to look at 500 clones. That will take a couple of weeks.
5
Man, this sure would be easier if I could move between these different sites without having to reformat the data every time.
6
Different users end up doing the same thing. Currently, accessing genomics data is challenging. Source Specific Schema The user is required to perform all data management tasks. dbEST SCoP SWISS-PROT User applications Transform Map data format similar concepts ParseAccess input/the data output PDB
7
The result of the current environment: Scientists limit their searches to a small number of sources they are familiar with. A huge amount of relevant information is not utilized. Resources are wasted. Time is wasted. Knowledge is wasted.
8
What is our ideal environment? A single location that provides effective access to a consistent view of data from many sources through an intuitive and useful interface. PDB dbEST SCoP SWISS-PROT User applications Transform Map data format similar concepts ParseAccess input/the data output
9
What is our ideal environment? Businesses use data warehouses to accomplish this. A single location that provides effective access to a consistent view of data from many sources through an intuitive and useful interface. PDB dbEST SCoP SWISS-PROT User applications Transform Map data format similar concepts ParseAccess input/the data output
10
The business approach.
11
Data warehouses Swiss ProtSCoPPDB l Interfaces provide intuitive access to the data possibly change data format to meet user expectations l Warehouse stores a consistent view of data in a local repository Ingest Code l Mediator transform data from source format to warehouse format l Wrappers read data from source into internal representation Wrapper Mediator Wrapper dbEST Data Warehouse
12
Typical approaches combine wrapper and mediator functionality. User applications User task Parsing Transformations Data Entry SWISS-PROT Parsing Transformations Data Entry dbEST Parsing Transformations Data Entry SCoP Parsing Transformations DataEntry APIAPI Global Schema Source Specific Schema Transform Map data format similar concepts ParseAccess input/the data output PDB Data Warehouse
13
When a source changes, propagating the change can be time consuming. User applications User task PDB Parsing Transformations Data Entry SWISS-PROT Parsing Transformations Data Entry dbEST Parsing Transformations Data Entry SCoP Parsing Transformations DataEntry APIAPI Global Schema Source Specific Schema Transform Map data format similar concepts ParseAccess input/the data output Data Warehouse
14
When a source changes, propagating the change can be time consuming. User applications User task PDB Parsing Transformations Data Entry SWISS-PROT Parsing Transformations Data Entry dbEST Parsing Transformations Data Entry SCoP Parsing Transformations DataEntry APIAPI Global Schema Source Specific Schema Need an architecture capable of managing change Transform Map data format similar concepts ParseAccess input/the data output Data Warehouse
15
DataFoundry separates these tasks. User applications User task PDB wrapper SWISS-PROT wrapper dbEST wrapper mediator Mediator Generator Meta-data SCoP wrapper APIAPI Global Schema Source Specific Schema Transform Map data format similar concepts ParseAccess input/the data output Data Warehouse
16
The impact of the Web.
17
4 Years Ago (when we started) The web has changed how scientific data is shared. PDB dbEST SCoP SWISS-PROT Scientists access data from 1-10 community resources using the web :::::: ~ 30 sources distributed over the WWW
18
~ 500 sources distributed over the WWW Today PDB dbEST SCoP SWISS-PROT Scientists access data from 4-15 community resources using the web :::::: DF The web has changed how scientific data is shared.
19
So, what makes this so much more difficult? l No central directory How do you find the sources in the first place? l Different data formats and semantics Once you find a source, how to you make sense of it? l Complex analysis Queries require more than simple data retrieval, they need to invoke complex programs. l Lots of data sources Too much work to be done manually. Need to select appropriate subset for each user / query. How do you present the results in an useful format?
20
A hybrid approach to information access holds the most promise. Detailed analysis PDB wrapper SWISS-PROT wrapper dbEST wrapper mediator Mediator Generator Meta-data SCoP wrapper APIAPI Global Schema Source Specific Schema Data Warehouse Dynamic access to large numbers of non-integrated sources. Query Engine Dynamic source access wrapper Preproces s Meta-data Detailed analysis PDB wrapper SWISS-PROT wrapper dbEST wrapper mediator Mediator Generator Meta-data SCoP wrapper Source Specific Schema Data Warehouse
21
The key to success is management support.
22
Ideally, management fully supports the bioinformatics effort. Good management understands the complexity of the problems and provides the required resources. The technical problems surrounding bioinformatics can be overcome. l Schedules are reasonable l Goals are realistic l Staff is properly trained l Funding is adequate and stable
23
Unfortunately, that is usually not the case. Poor management underestimates the complexity of the problems and provides inadequate resources.
24
Unfortunately, that is usually not the case. Poor management underestimates the complexity of the problems and provides inadequate resources. You are contributing to the problems facing bioinformatics instead of their solutions. l Schedules are unreasonable l Goals are unrealistic l Morale is low & staff is poorly trained l Funding is minimal and unstable
25
Conclusions The good news The technical challenges surrounding bioinformatics can be overcome. Bioinformatics has a long way to go before it can support scientific exploration. The bad news The major challenge is political, not technical.
26
Conclusions Be part of the solution, not part of the problem.
27
Questions? www.llnl.gov/CASC/people/critchlow
28
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405- ENG-48.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.