GLEON Data Management Luke Winslow PASEO 3/18/09.


1 GLEON Data Management Luke Winslow PASEO 3/18/09

2 GLEON Sites September 2008 Lake Observatory + IT Development

3 Buoys (Hi-res Data)

4 Data Integration Goal
Single interface
  – Access and download *all* data, near real time
Include as much of the necessary metadata as possible
Currently
  – 6 groups, 20 sites
  – Soon: 5 more groups
  – http://dbbadger.gleonrcn.org

5 Challenges with Global and Grassroots
Entire globe covered
  – Long distances involved
Grassroots, Bottom Up
  – No Top Level Funding or Mandates
  – Sites have different staffing and funding levels
Existing systems
  – Some have extensive in-place infrastructure
  – Some have no existing systems
Tens of diverse groups
  – Common language, vocabulary?

6 Many Potential Strategies
Transferring Data
  – Push
  – Pull
Archiving Data
  – Distributed (Federated)
  – Centralized

7 Distributed Storage
No replication
Control Designation
  – Each site stores their data
Highly susceptible to faults
Potentially poor performance
[Diagram label: Central Portal]

8 Centralized Storage
Good query performance
Less susceptible to network faults
Responsibility and control change
  – Sites want local copy of data
[Diagram label: Central DB]

9 Moving Data
Push
  – Sites send data
  – Central listens
Pull
  – Central requests data
  – Sites listen for requests
[Diagram labels: Central, Data Request, Data Sent]
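The difference between push and pull is only in who initiates the transfer. A minimal in-process sketch of the two directions follows; there is no networking here, and the `Central` and `Site` classes, their method names, and the sample readings are all hypothetical, chosen for illustration rather than taken from the GLEON software.

```python
class Central:
    """Stands in for the central store (hypothetical, in-memory only)."""

    def __init__(self):
        self.values = []

    def receive(self, site_name, rows):
        # Push: central only listens; the site decides when to send.
        self.values.extend((site_name, ts, v) for ts, v in rows)

    def harvest(self, site):
        # Pull: central initiates the request; the site answers it.
        rows = site.handle_request()
        self.values.extend((site.name, ts, v) for ts, v in rows)


class Site:
    """Stands in for one lake site holding readings not yet transferred."""

    def __init__(self, name, rows):
        self.name = name
        self.pending = list(rows)

    def push_to(self, central):
        # The site sends whatever it has, then clears its queue.
        central.receive(self.name, self.pending)
        self.pending = []

    def handle_request(self):
        # The site hands over its queue when asked.
        rows, self.pending = self.pending, []
        return rows


central = Central()
# Illustrative readings, not real data.
Site("Mendota", [("2009-03-18T00:00", 2.1)]).push_to(central)
central.harvest(Site("Annie", [("2009-03-18T00:00", 21.4)]))
```

Either way the same rows end up in `central.values`; what changes is which side must be reachable and which side schedules the transfer.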

10 GLEON Model: Mixed
All data are stored centrally
  – Some replicated at local site
Pull: Sites with existing systems
  – Based on XML standard
Push: Sites with GLEON system
[Diagram label: Central]

11 Data-Integration Project
XML Based Standard
  – Sites expose data
  – Data are harvested
Underlying DB can be anything
Still creates issues
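As a rough illustration of the expose-and-harvest idea, the sketch below parses an XML document that a site might expose and flattens it into rows. The transcript does not show the actual XML standard, so the element and attribute names (`site`, `stream`, `value`, `variable`, `time`) and the sample reading are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document layout; the real GLEON XML standard is not shown here.
SAMPLE = """<site name="Lake Annie">
  <stream variable="Water_Temperature" unit="C">
    <value time="2009-03-18T00:00">21.6</value>
  </stream>
</site>"""


def harvest(xml_text):
    """Flatten one exposed document into (site, variable, time, value) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for stream in root.iter("stream"):
        for v in stream.iter("value"):
            rows.append(
                (root.get("name"), stream.get("variable"), v.get("time"), float(v.text))
            )
    return rows


harvested = harvest(SAMPLE)
```

Because the harvester only sees the XML, the site is free to keep its data in whatever database it already runs.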

12 ZiggyStardust
Source: any data originator
Repository: any ‘next step’ for data
Filter
  – QA/QC
  – Event Detection
  – Derived product generator
Notification Services
[Diagram labels: Source, Filter, Repository/Middleware]
CoreData
  – Value
Metadata
  – Site
  – Variable
  – Offset
  – Source
  – Aggregation
  – RepNumber
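The source, filter, and repository roles can be sketched as a small pipeline. This is not ZiggyStardust's actual API: the `CoreData` dataclass simply bundles the value with the metadata fields named on the slide, and `range_check`, `run_pipeline`, and the sample readings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class CoreData:
    # Field names follow the slide; the types are assumptions.
    value: float
    site: str
    variable: str
    offset: float
    source: str
    aggregation: str
    rep_number: int


# A filter returns the (possibly modified) datum, or None to drop it.
Filter = Callable[[CoreData], Optional[CoreData]]


def range_check(low: float, high: float) -> Filter:
    """A simple QA/QC filter: drop values outside [low, high]."""
    def check(datum: CoreData) -> Optional[CoreData]:
        return datum if low <= datum.value <= high else None
    return check


def run_pipeline(source: Iterable[CoreData], filters: List[Filter],
                 repository: List[CoreData]) -> List[CoreData]:
    """Pass each datum through every filter, storing those that survive."""
    for datum in source:
        for f in filters:
            datum = f(datum)
            if datum is None:
                break  # a filter rejected this datum
        else:
            repository.append(datum)
    return repository


# Illustrative readings; -999.0 mimics a sensor error code.
readings = [
    CoreData(4.2, "Mendota", "Water_Temperature", 0.5, "buoy", "1 min", 1),
    CoreData(-999.0, "Mendota", "Water_Temperature", 0.5, "buoy", "1 min", 1),
]
stored = run_pipeline(readings, [range_check(-5.0, 40.0)], [])
```

Event detection or derived-product generation would slot in as further filters with the same signature.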

13 Data Storage: Flat Structure
Create data table
DateTime column
Each variable is a unique column
Mendota_Buoy_Table:
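The table shown on the slide is not part of this transcript. A minimal sketch of what a flat structure looks like, using SQLite through Python's `sqlite3` module, is given below; the variable columns and the sample row are assumptions for illustration.

```python
import sqlite3

flat = sqlite3.connect(":memory:")

# One table per buoy: a DateTime column plus one column per variable.
flat.execute("""
    CREATE TABLE Mendota_Buoy_Table (
        DateTime TEXT PRIMARY KEY,
        Water_Temperature REAL,
        Air_Temperature REAL
    )
""")
flat.execute(
    "INSERT INTO Mendota_Buoy_Table VALUES (?, ?, ?)",
    ("2008-09-01 12:00", 22.5, 24.1),  # illustrative values
)

# Adding a sensor means changing the table definition itself.
flat.execute("ALTER TABLE Mendota_Buoy_Table ADD COLUMN Relative_Humidity REAL")

flat_columns = [row[1] for row in flat.execute("PRAGMA table_info(Mendota_Buoy_Table)")]
```

The layout is easy to read, but every new variable or site requires a schema change.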

14 Data Storage: Vega Data Model
Data Model similar to “Star” database schema
  – Vega is a star
‘Data Stream’ as core entity
Inspiration from CUAHSI’s Observations Data Model

15 Data Stream
Data
  – Same metadata
  – Change only in time
Example
  – Var: Water Temperature
  – Site: Lake Annie
  – Unit: C
  – Depth: 0.5 m
  – Aggregation: 24:00 Mean

16 Vega Data Model
Value oriented structure
Store data from any number of sites
Highly optimized ‘Values’ table
Query Times < 1 sec
GLEON central
  – Now 40 million values
[Diagram label: Streams]
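A value-oriented layout can be sketched with two tables: one row per data stream holding the shared metadata, and a narrow values table keyed by stream and time. The actual Vega schema is not shown in this transcript, so the table and column names below are assumptions, as are the two sample readings; the stream itself uses the Lake Annie example from the Data Stream slide.

```python
import sqlite3

vega = sqlite3.connect(":memory:")
vega.executescript("""
    CREATE TABLE Streams (
        StreamID INTEGER PRIMARY KEY,
        Site TEXT, Variable TEXT, Unit TEXT, Depth_m REAL, Aggregation TEXT
    );
    -- "Values" is quoted because VALUES is an SQL keyword.
    CREATE TABLE "Values" (
        StreamID INTEGER REFERENCES Streams(StreamID),
        DateTime TEXT,
        Value REAL,
        PRIMARY KEY (StreamID, DateTime)
    );
""")

# Metadata is stored once per stream.
cur = vega.execute(
    "INSERT INTO Streams (Site, Variable, Unit, Depth_m, Aggregation) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Lake Annie", "Water_Temperature", "C", 0.5, "24:00 Mean"),
)
stream_id = cur.lastrowid

# Each reading only adds a (stream, time, value) row; values are illustrative.
vega.executemany(
    'INSERT INTO "Values" VALUES (?, ?, ?)',
    [(stream_id, "2009-03-17", 21.3), (stream_id, "2009-03-18", 21.6)],
)

rows = vega.execute(
    'SELECT v.DateTime, v.Value FROM "Values" v '
    "JOIN Streams s USING (StreamID) "
    "WHERE s.Site = ? AND s.Variable = ? ORDER BY v.DateTime",
    ("Lake Annie", "Water_Temperature"),
).fetchall()
```

New sites and variables become new rows in `Streams` rather than new tables or columns, which is what lets one schema hold data from any number of sites.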

17 Controlled Vocabulary
AirTemp? Air Temperature? RH? RelHum?
Water_Temperature
Air_Temperature
Phycocyanin
Precipitation
Relative_Humidity
Etc…
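One way to apply a controlled vocabulary is to map each site's local variable names onto the agreed terms at ingest time. The sketch below uses the terms and informal spellings from the slide; the alias table and the `to_controlled` function are hypothetical, not part of any GLEON tool.

```python
# Controlled terms listed on the slide.
CONTROLLED = {
    "Water_Temperature",
    "Air_Temperature",
    "Phycocyanin",
    "Precipitation",
    "Relative_Humidity",
}

# Informal spellings from the slide, mapped to their controlled term.
ALIASES = {
    "airtemp": "Air_Temperature",
    "air temperature": "Air_Temperature",
    "rh": "Relative_Humidity",
    "relhum": "Relative_Humidity",
}


def to_controlled(name: str) -> str:
    """Return the controlled term for a site-local variable name."""
    key = name.strip().lower().replace("_", " ")
    if key in ALIASES:
        return ALIASES[key]
    for term in CONTROLLED:
        if term.lower().replace("_", " ") == key:
            return term
    # Unknown names are rejected rather than guessed at.
    raise KeyError(f"no controlled term for {name!r}")
```

For example, `to_controlled("AirTemp")` and `to_controlled("Air Temperature")` both resolve to `Air_Temperature`, while an unrecognized name raises `KeyError` so it can be reviewed by a person.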

18 Software Sharing and Reuse “Good programmers write good code. Great programmers steal great code.” – Unknown

19 Science and Software Development Parallels
Science
  – Heavily collaborative
  – Sharing ideas and results
  – Benefits from openness
Software Development
  – Could do the same
  – Open source community an example (Other Expertise)
  – Gleon.org
[Chart labels: Science, Software Dev, Level of Collaboration]

20 Grass Roots Software Dev Model
Open Source/Free Software Community
  – Can be hugely successful
  – Many high profile projects
Lake Analyzer: first example
  – Received input to improve algorithms
  – Available to everyone
ZiggyStardust, VADER, others also available

21 Current Challenges
Metadata (Quality Control Specifically)
  – Collection, standards, storage
  – Challenging for real time and streaming data
  – Meaningful output
  – Replicating updates
Metadata (Controlled vocabulary)
  – Correct way to differentiate variables?
Other Observations (manually sampled)
  – Expand to more diverse datasets

22 Questions?
Acknowledgements
  – All GLEON Members, Tim Kratz, Paul Hanson, Tom Harmon, and all others who have contributed ideas and support
  – NSF Grants DEB-0217533, DBI-0639229, and DBI-0446017, and the Gordon and Betty Moore Foundation

