1
GLEON Data Management
Luke Winslow
PASEO, 3/18/09
2
GLEON Sites, September 2008
Lake Observatory + IT Development
3
Buoys (Hi-res Data)
4
Data Integration Goal
Single interface
– Access and download *all* data, near real time
Include all necessary metadata where possible
Currently
– 6 groups, 20 sites
– Soon: 5 more groups
– http://dbbadger.gleonrcn.org
5
Challenges with Global and Grassroots
Entire globe covered
– Long distances involved
Grassroots, Bottom Up
– No Top Level Funding or Mandates
– Sites have different staffing and funding levels
Existing systems
– Some have extensive in-place infrastructure
– Some have no existing systems
Tens of diverse groups
– Common language, vocabulary?
6
Many Potential Strategies
Transferring Data
– Push
– Pull
Archiving Data
– Distributed (Federated)
– Centralized
7
Distributed Storage
No replication
Control Designation
– Each site stores their data
Highly susceptible to faults
Potentially poor performance
[Diagram label: Central Portal]
8
Centralized Storage
Good query performance
Less susceptible to network faults
Responsibility and control change
– Sites want local copy of data
[Diagram label: Central DB]
9
Moving Data
Push
– Sites send data
– Central listens
Pull
– Central requests data
– Sites listen for requests
[Diagram labels: Central, Data Request, Data Sent]
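The difference between the two transfer modes is only in who initiates the exchange. A minimal sketch of that distinction follows; the `Site` and `Central` classes, their method names, and the sample readings are all hypothetical and are not taken from any GLEON software.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A lake site holding readings that have not yet reached central."""
    name: str
    pending: list = field(default_factory=list)

    def handle_request(self):
        # Pull model: the site listens and answers when central asks.
        readings, self.pending = self.pending, []
        return readings

    def push_to(self, central):
        # Push model: the site initiates the transfer.
        readings, self.pending = self.pending, []
        central.receive(self.name, readings)


@dataclass
class Central:
    """Central store keyed by site name (illustrative only)."""
    store: dict = field(default_factory=dict)

    def receive(self, site_name, readings):
        # Push model: central only listens for incoming data.
        self.store.setdefault(site_name, []).extend(readings)

    def pull_from(self, site):
        # Pull model: central initiates the request.
        self.receive(site.name, site.handle_request())


central = Central()
pushing_site = Site("Site_A", pending=[("2009-03-18T00:00", 4.1)])
pulled_site = Site("Site_B", pending=[("2009-03-18T00:00", 3.7)])

pushing_site.push_to(central)   # site-initiated
central.pull_from(pulled_site)  # central-initiated
```

Either way the readings end up in the same central store; the choice mainly determines which side has to run a listening service.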
10
GLEON Model: Mixed
All data are stored centrally
– Some replicated at local site
Pull: Sites with existing systems
– Based on XML standard
Push: Sites with GLEON system
[Diagram label: Central]
11
Data-Integration Project
XML Based Standard
– Sites expose data
– Data are harvested
Underlying DB can be anything
Still creates issues
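As a rough illustration of the expose-and-harvest idea, the sketch below parses an XML document that a site might expose and turns it into rows for a central store. The slides do not show the actual XML standard, so the element and attribute names here (`observations`, `observation`, `site`, `variable`, `datetime`, `value`) and the sample values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document a site could expose, whatever database sits behind it.
EXPOSED_XML = """
<observations site="Example_Lake">
  <observation variable="Water_Temperature" datetime="2009-03-18T12:00:00" value="4.2"/>
  <observation variable="Air_Temperature" datetime="2009-03-18T12:00:00" value="6.0"/>
</observations>
"""


def harvest(xml_text):
    """Turn the exposed XML into (site, variable, datetime, value) rows."""
    root = ET.fromstring(xml_text)
    site = root.get("site")
    return [
        (site, obs.get("variable"), obs.get("datetime"), float(obs.get("value")))
        for obs in root.findall("observation")
    ]


harvested = harvest(EXPOSED_XML)
```

Because the harvester only sees the XML, the site is free to keep its data in any underlying database.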
12
ZiggyStardust
Source
– Any data originator
Repository
– Any ‘next step’ for data
Filter
– QA/QC
– Event Detection
– Derived product generator
Notification Services
[Diagram labels: Source, Filter, Repository/Middleware]
CoreData
– Value
Metadata
– Site
– Variable
– Offset
– Source
– Aggregation
– RepNumber
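The source, filter, and repository roles can be pictured as a small pipeline. The sketch below is not ZiggyStardust code: the `CoreData` field names follow the slide, but the types, the range-check filter, the `Repository` class, and the sample values are assumptions chosen only to show the flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreData:
    """One value plus the metadata fields listed on the slide (types assumed)."""
    value: float
    site: str
    variable: str
    offset: float
    source: str
    aggregation: str
    rep_number: int


def source():
    # Source: any data originator; here, three made-up buoy readings.
    meta = dict(site="Mendota", variable="Water_Temperature", offset=0.5,
                source="buoy", aggregation="none", rep_number=1)
    for value in (12.4, -999.0, 12.6):
        yield CoreData(value=value, **meta)


def range_filter(records, low, high):
    # Filter: a simple QA/QC step that drops out-of-range values.
    for record in records:
        if low <= record.value <= high:
            yield record


class Repository:
    # Repository: any 'next step' for data; here, an in-memory list.
    def __init__(self):
        self.records = []

    def store(self, records):
        self.records.extend(records)


repo = Repository()
repo.store(range_filter(source(), low=-5.0, high=40.0))
```

Because each stage only consumes and produces `CoreData` records, filters for event detection or derived products could be chained in the same way.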
13
Data Storage: Flat Structure
Create data table
DateTime column
Each variable is a unique column
Mendota_Buoy_Table:
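A flat structure of this kind might look like the following sketch, which uses Python's built-in sqlite3 module. Only the table name and the DateTime column come from the slide; the variable columns and the sample row are assumptions, since the original table is not reproduced in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per site: a DateTime column plus one column per variable.
conn.execute("""
    CREATE TABLE Mendota_Buoy_Table (
        DateTime TEXT PRIMARY KEY,
        Water_Temperature REAL,
        Air_Temperature REAL
    )
""")
conn.execute(
    "INSERT INTO Mendota_Buoy_Table VALUES (?, ?, ?)",
    ("2009-03-18 12:00", 3.9, 5.2),
)

# In this layout, a new variable requires a schema change.
conn.execute("ALTER TABLE Mendota_Buoy_Table ADD COLUMN Relative_Humidity REAL")

columns = [row[1] for row in conn.execute("PRAGMA table_info(Mendota_Buoy_Table)")]
```

Existing rows get NULL in the new column, which is one reason wide flat tables become awkward as sensors are added.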
14
Data Storage: Vega Data Model
Data Model similar to “Star” database schema
– Vega is a star
‘Data Stream’ as core entity
Inspiration from CUAHSI’s Observations Data Model
15
Data Stream
Data
– Same metadata
– Change only in time
Example
– Var: Water Temperature
– Site: Lake Annie
– Unit: C
– Depth: 0.5m
– Aggregation: 24:00 Mean
16
Vega Data Model
Value oriented structure
Store data from any number of sites
Highly optimized ‘Values’ table
Query Times < 1 sec
GLEON central
– Now 40 million values
[Diagram label: Streams]
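To make the value-oriented idea concrete, the sketch below stores metadata once per data stream and keeps a narrow values table keyed by stream and time. It is not the actual Vega schema: the table names, column names, and sample values are assumptions, and only the Lake Annie stream metadata is taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per data stream: metadata that does not change over time.
    CREATE TABLE streams (
        stream_id   INTEGER PRIMARY KEY,
        variable    TEXT,
        site        TEXT,
        unit        TEXT,
        depth_m     REAL,
        aggregation TEXT
    );
    -- Narrow table: one row per value; the key doubles as the lookup index.
    CREATE TABLE stream_values (
        stream_id INTEGER REFERENCES streams(stream_id),
        date_time TEXT,
        value     REAL,
        PRIMARY KEY (stream_id, date_time)
    );
""")

cur = conn.execute(
    "INSERT INTO streams (variable, site, unit, depth_m, aggregation) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Water_Temperature", "Lake Annie", "C", 0.5, "24:00 Mean"),
)
stream_id = cur.lastrowid

conn.executemany(
    "INSERT INTO stream_values VALUES (?, ?, ?)",
    [(stream_id, "2009-03-16", 21.3), (stream_id, "2009-03-17", 21.8)],
)

rows = conn.execute(
    """
    SELECT v.date_time, v.value
    FROM stream_values AS v
    JOIN streams AS s ON s.stream_id = v.stream_id
    WHERE s.site = ? AND s.variable = ?
    ORDER BY v.date_time
    """,
    ("Lake Annie", "Water_Temperature"),
).fetchall()
```

A new site or variable adds rows to `streams` rather than new tables or columns, so data from any number of sites fits the same structure.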
17
Controlled Vocabulary
AirTemp? Air Temperature? RH? RelHum?
Water_Temperature
Air_Temperature
Phycocyanin
Precipitation
Relative_Humidity
Etc…
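One way to apply a controlled vocabulary is to map each site's local names onto the agreed terms before data are stored. The sketch below assumes a simple lookup table; the controlled terms and the ambiguous names come from the slide, while the `normalize` function and the decision to reject unknown names are illustrative choices.

```python
# Terms listed on the slide as the controlled vocabulary.
CONTROLLED_TERMS = {
    "Water_Temperature",
    "Air_Temperature",
    "Phycocyanin",
    "Precipitation",
    "Relative_Humidity",
}

# Site-specific spellings mapped to controlled terms (lower-cased keys).
SYNONYMS = {
    "airtemp": "Air_Temperature",
    "air temperature": "Air_Temperature",
    "rh": "Relative_Humidity",
    "relhum": "Relative_Humidity",
}


def normalize(name):
    """Return the controlled term for a site-supplied variable name."""
    if name in CONTROLLED_TERMS:
        return name
    key = name.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Unknown names are rejected rather than guessed.
    raise KeyError(f"No controlled term for variable name: {name!r}")


normalized = [normalize(n) for n in ("AirTemp", "Air Temperature", "RH", "RelHum")]
```

Rejecting unmapped names forces a person to decide how a new variable should be named, instead of letting near-duplicates accumulate.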
18
Software Sharing and Reuse
“Good programmers write good code. Great programmers steal great code.” – Unknown
19
Science and Software Development Parallels
Science
– Heavily collaborative
– Sharing ideas and results
– Benefits from openness
Software Development
– Could do the same
– Open source community an example (Other Expertise)
– Gleon.org
[Diagram labels: Science, Software Dev, Level of Collaboration]
20
Grass Roots Software Dev Model
Open Source/Free Software Community
– Can be hugely successful
– Many high profile projects
First example: Lake Analyzer
– Received input to improve algorithms
– Available to everyone
ZiggyStardust, VADER, others also available
21
Current Challenges
Metadata (Quality Control Specifically)
– Collection, standards, storage
– Challenging for real time and streaming data
– Meaningful output
– Replicating updates
Metadata (Controlled vocabulary)
– Correct way to differentiate variables?
Other Observations (manually sampled)
– Expand to more diverse datasets
22
Questions?
Acknowledgements
– All GLEON Members, Tim Kratz, Paul Hanson, Tom Harmon, and all others that have contributed ideas and support
– NSF Grants DEB-0217533, DBI-0639229, and DBI-0446017, and the Gordon and Betty Moore Foundation