Sensor Systems and Large Data Sources
Jim Myers, NCSA

Exponential Trends in Observational Technologies
Decreasing costs
Increasing rates/resolution
Increasing automation
Increasing dimensionality
Increasing breadth of sources

Examples
– Add your favorite here…
– Home Monitoring
– Health Monitoring
– Environment Monitoring
– Habitat Monitoring
– Earthquake Monitoring
– Battlefield Monitoring
3rd grade project - 70 MB

Challenges: Volume
The ability to create data is outstripping storage…
– locally and globally
And storage is growing faster than access speeds…
“Over the last 10 years while disk sizes have increased by a factor of 1,000, the rotation speed of large disks used in disk arrays has only changed a factor of 2…”

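As a rough illustration of why capacity outpacing access speed matters, the time to read an entire disk grows with the ratio of the two growth factors. The sketch below is not from the talk: the function names and sample figures are hypothetical, and rotation speed is only one contributor to real transfer rates, so treating it as the bandwidth factor is an assumption.

```python
def scan_time_growth(capacity_factor: float, bandwidth_factor: float) -> float:
    """Factor by which a full sequential scan slows down when capacity
    and read bandwidth grow by the given factors (illustrative model)."""
    if capacity_factor <= 0 or bandwidth_factor <= 0:
        raise ValueError("growth factors must be positive")
    return capacity_factor / bandwidth_factor


def full_scan_hours(capacity_gb: float, bandwidth_mb_per_s: float) -> float:
    """Hours needed to read a whole disk once, using 1 GB = 1000 MB."""
    if capacity_gb < 0 or bandwidth_mb_per_s <= 0:
        raise ValueError("capacity must be >= 0 and bandwidth > 0")
    seconds = capacity_gb * 1000 / bandwidth_mb_per_s
    return seconds / 3600


# Assumed inputs: capacity x1000 (from the quote), bandwidth x2 (assumption).
growth = scan_time_growth(1000, 2)
```

Under those assumed inputs a full scan takes 500 times longer than it did a decade earlier, which is the motivation for the bandwidth-oriented hardware and "move the algorithm to the data" approaches later in the deck.
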
Challenges: Discovery, Organization, Trust (Quality)
Data is being collected
– In multiple dimensions
– On multiple subjects
– In many locations
File names don’t scale
Rich metadata and provenance are needed to allow discovery, to organize data for use, and to support assessment of quality

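One way to picture the difference between file names and rich metadata is a small catalog in which each dataset carries descriptive fields plus links to the datasets it was derived from. This is a minimal sketch under stated assumptions: `DatasetRecord`, `find`, `lineage`, and all field names are hypothetical and do not correspond to any system named in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    identifier: str
    subject: str            # what was measured (hypothetical field)
    location: str           # where it was measured (hypothetical field)
    derived_from: tuple = ()  # provenance: identifiers of source datasets


def find(records, **criteria):
    """Discovery: return records whose fields match every criterion."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


def lineage(records, identifier):
    """Provenance: list the identifiers of all ancestors of a dataset."""
    by_id = {r.identifier: r for r in records}
    ancestors = []
    stack = list(by_id[identifier].derived_from)
    while stack:
        current = stack.pop()
        if current in ancestors:
            continue
        ancestors.append(current)
        if current in by_id:
            stack.extend(by_id[current].derived_from)
    return ancestors


catalog = [
    DatasetRecord("raw-1", "temperature", "lake"),
    DatasetRecord("clean-1", "temperature", "lake", derived_from=("raw-1",)),
    DatasetRecord("avg-1", "temperature", "lake", derived_from=("clean-1",)),
    DatasetRecord("raw-2", "humidity", "forest"),
]
```

Here `find(catalog, subject="humidity")` locates data by what it describes rather than by what it is called, and `lineage(catalog, "avg-1")` traces a derived product back to its raw source, which is the kind of information a quality assessment would need.
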
Challenges: Analysis
Whether physical or statistical, analysis methods often scale as powers of data size
Research is requiring more sophisticated analysis, not less

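To make the power-law point concrete: if an analysis costs roughly N^k operations for N data points, then growing the data by some factor multiplies the work by that factor raised to k. The function below is an illustrative sketch; the name `work_growth` and the exponents used are assumptions, not figures from the talk.

```python
def work_growth(data_growth: float, exponent: float) -> float:
    """Factor by which compute work grows when data grows by `data_growth`
    and the method's cost scales as N ** exponent."""
    if data_growth <= 0:
        raise ValueError("data_growth must be positive")
    return data_growth ** exponent


# Ten times more data under linear, quadratic, and cubic methods.
factors = {k: work_growth(10, k) for k in (1, 2, 3)}
```

With ten times more data, a linear method needs 10 times the work, a quadratic method 100 times, and a cubic method 1,000 times, so compute demand can outrun even the growth in data volume.
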
Solutions/Trends: Innovation in Storage/Management
Hardware to optimize data bandwidth (e.g. GrayWulf)
New forms of databases and content systems:
– Streaming
– Spatial
– SciDB
– Column stores
– Semantic stores
– Bigtable

Solutions/Trends: Innovation in Acquisition, Access, and Processing
Adaptive sensing
Query optimization/parallelization
Moving algorithms to data
One-pass algorithms
Analysis over compressed data

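As an example of a one-pass algorithm, the sketch below keeps a running mean and variance of a sensor stream using Welford's method, so each reading is seen once and never stored. It is an illustration chosen for this note, not a method named in the talk; the class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new reading into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two readings have arrived."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

Memory use stays constant no matter how long the stream runs, which is what makes this style of algorithm attractive when the data cannot be stored or re-read cheaply.
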
Summary
Data Deluge
Metadata Deluge
Processing Deluge
Innovation required across the life cycle
– Including development of new data organizations (e.g. DataNet)

References/Image Credits
1. Collins et al. (2003). Science 300; Hugenholtz & Tyson (2008). Nature 455.
2. Scientific Data Management in the Coming Decade, Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alexander S. Szalay, David DeWitt, Gerd Heber, January 2005, Microsoft Research Technical Report, MSR-TR
3. The Sensor Spectrum: Technology, Trends, and Requirements, Joseph M. Hellerstein, Wei Hong, Samuel R. Madden
4. The Diverse and Exploding Digital Universe, IDC Whitepaper, March
5. GrayWulf: Scalable Clustered Architecture for Data Intensive Computing, Alexander S. Szalay, Gordon Bell, Jan Vandenberg, Alainna Wonders, Randal Burns, Dan Fay, Jim Heasley, Tony Hey, Maria Nieto-Santisteban, Ani Thakar, Catharine van Ingen, and Richard Wilton, 15 September 2008, MSR Tech Report MSR-TR
6. Fran Berman, Ken Kennedy Award Presentation, SC
7. Dick Crutcher, Gul Agha, Parya Moinzadeh, personal communication