
1 Large Scientific Databases

2 Large scientific datasets are those which are systematically collected and organized and which stretch the technical capabilities of the species to store, manipulate, and distribute data for scientific investigation, and which therefore limit that investigation.

3 What is a “small” dataset? “Only a few hundred gigabytes.” -Alex Szalay

4 What about non-scientific databases? Why not Google?

5 Fields producing these datasets
Observational data
– Earth and space sciences: Astronomy and Astrophysics, Space Physics, Atmospheric Science, Geoscience, Ocean Science
Experimental laboratory data
– CERN
[From Preserving Scientific Data on Our Physical Universe (Washington: National Academy Press, 1995)]

6 Observations
– The datasets being collected are huge and will grow.
– These datasets stretch the technical capabilities of what our species can do with computer applications and hardware, and so limit what we can learn.
– There are bottlenecks in storage, in manipulation, and in distribution.
– There is not enough bandwidth to move datasets of the sizes that now exist for scientific use.
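A rough back-of-envelope sketch of the distribution bottleneck named on slide 6. The 300 GB size is an assumption standing in for the "few hundred gigabytes" of slide 3, and the link speeds are illustrative assumptions, not figures from the presentation. The calculation is idealized: it ignores protocol overhead, contention, and disk speed.

```python
def transfer_time_hours(size_gigabytes: float, link_megabits_per_s: float) -> float:
    """Idealized time to move a dataset over a network link, in hours.

    Uses decimal units (1 GB = 1e9 bytes, 1 Mbit = 1e6 bits) and assumes
    the link runs at its full nominal rate for the whole transfer.
    """
    size_bits = size_gigabytes * 1e9 * 8
    seconds = size_bits / (link_megabits_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed dataset size: a "small" dataset in the sense of slide 3.
    assumed_size_gb = 300

    # Assumed nominal link speeds in megabits per second (hypothetical examples).
    assumed_links_mbps = {
        "10 Mbit/s link": 10,
        "100 Mbit/s link": 100,
        "1 Gbit/s link": 1000,
    }

    for name, mbps in assumed_links_mbps.items():
        hours = transfer_time_hours(assumed_size_gb, mbps)
        print(f"{name}: {hours:.1f} hours")
```

Transfer time scales linearly with dataset size, so a dataset ten times larger takes ten times as long on the same link.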

7 More observations
It may be that there are solutions in other disciplines for addressing some of the problems that scientists working with large datasets are wrestling with.
– Library & Information Science
– Graphics
– Hardware and software vendors
They shouldn't all have to reinvent everything separately.

8 Is there a field?
– Connections between scientists working on large datasets appear to be informal.
– Assembling scientists who work with large datasets will be useful, because different ones may already have solved different problems or may have useful insights to share.
– There is an extensive literature, but it is technical and largely not self-aware.

9 Is there a field 2 On a broader scale, if in 10 years these datasets can be put on a desktop computer, there will be scientists out gathering even bigger datasets. It is what humans do. Can principles be derived from current experience that will help deal with those future larger limits? Can we focus on this aspect of science?

10 Ancillary issues
– Policy
– Characteristics of the data
– etc.

11 What next?
Conference?
– Gather: the scientists, vendors, and disciplines that might help the scientists
Literature review

