Sensor systems and large data sources Jim Myers, NCSA.

Exponential Trends in Observational Technologies
– Decreasing costs
– Increasing rates/resolution
– Increasing automation
– Increasing dimensionality
– Increasing breadth of sources

Examples
– Add your favorite here…
– Home Monitoring
– Health Monitoring
– Environment Monitoring
– Habitat Monitoring
– Earthquake Monitoring
– Battlefield Monitoring
3rd grade project - 70 MB

Challenges: Volume
– The ability to create data is outstripping storage, both locally and globally
– Storage capacity, in turn, is growing faster than access speeds
“Over the last 10 years while disk sizes have increased by a factor of 1,000, the rotation speed of large disks used in disk arrays has only changed a factor of 2…”
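
A rough arithmetic sketch of the capacity/access gap. It assumes, as a simplification the slides do not state, that the time to read a whole disk is capacity divided by bandwidth, and it uses the rotation-speed factor as a stand-in for bandwidth growth. Real sequential bandwidth also benefits from higher recording density, so the result is closer to a worst case (random access) than a measurement. The function name is hypothetical.

```python
def full_read_time_growth(capacity_factor: float, bandwidth_factor: float) -> float:
    """Return how much longer it takes to read an entire disk.

    Assumption: read time = capacity / bandwidth, so the growth in read
    time is the ratio of the two growth factors.
    """
    if capacity_factor <= 0 or bandwidth_factor <= 0:
        raise ValueError("growth factors must be positive")
    return capacity_factor / bandwidth_factor


# Factors taken from the quote: capacity x1,000, rotation speed x2.
growth = full_read_time_growth(1000, 2)
print(f"Reading a full disk takes about {growth:.0f}x longer under these assumptions")
```

Under these assumptions, touching every byte on a disk has become several hundred times more expensive relative to a decade earlier, which is why capacity alone does not solve the volume problem.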

Challenges: Discovery, Organization, Trust (Quality)
– Data is being collected
  – In multiple dimensions
  – On multiple subjects
  – In many locations
– File names don’t scale. Rich metadata and provenance are needed to allow discovery of the data, to organize it for use, and to support assessment of its quality

Challenges: Analysis
– Whether physical or statistical, analysis methods often scale as powers of data size
– Research is requiring more sophisticated analysis, not less
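
To make "scale as powers of data size" concrete, here is a small sketch. The all-pairs comparison is only an illustrative example of a quadratic method, not one named in the slides, and both function names are hypothetical.

```python
def pair_count(n: int) -> int:
    """Number of distinct pairs among n records (work for an all-pairs comparison)."""
    return n * (n - 1) // 2


def work_growth(data_factor: float, exponent: float) -> float:
    """Growth in work when data grows by data_factor and cost scales as N**exponent."""
    return data_factor ** exponent


# Ten times more records means roughly a hundred times more pairs to compare.
small, large = pair_count(1_000), pair_count(10_000)
print(small, large, round(large / small, 1))

# The same relation for an assumed cubic method.
print(work_growth(10, 3))
```

Any method with an exponent above 1 falls further behind with every increase in data volume, even if storage keeps up.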

Solutions/Trends: Innovation in storage/management
– HW to optimize data bandwidth (e.g. GrayWulf)
– New forms of databases and content systems:
  – Streaming
  – Spatial
  – SciDB
  – Column Stores
  – Semantic stores
  – Bigtable

Solutions/Trends: Innovation in Acquisition, Access, and Processing
– Adaptive Sensing
– Query Optimization/Parallelization
– Moving algorithms to data
– One-pass algorithms
– Analysis over compressed data
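
Two of the items above can be sketched briefly. The first function is a one-pass mean and variance using Welford's method, chosen here only as a well-known example of a one-pass algorithm. The second aggregates directly over run-length-encoded readings, assuming (hypothetically) that the compressed form is a sequence of (value, run_length) pairs. Neither is taken from the slides, and all names are illustrative.

```python
from typing import Iterable, Tuple


def running_mean_variance(readings: Iterable[float]) -> Tuple[int, float, float]:
    """One pass over a stream: returns (count, mean, sample variance).

    Uses Welford's update, so no reading needs to be stored or revisited.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in readings:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


def rle_sum_and_count(runs: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Sum and count of readings stored as (value, run_length) pairs.

    The data is never expanded; each run contributes value * run_length.
    """
    total = 0.0
    count = 0
    for value, length in runs:
        total += value * length
        count += length
    return total, count


# A generator stands in for a sensor stream that can only be read once.
stream = (x for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(running_mean_variance(stream))

# Five readings stored as two runs.
print(rle_sum_and_count([(20.5, 3), (21.0, 2)]))
```

Both functions use constant memory regardless of how many readings arrive, which is the property that matters when the data cannot be stored or re-read.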

Summary
– Data Deluge
– Metadata Deluge
– Processing Deluge
– Innovation required across the life cycle, including development of new data organizations (e.g. DataNet)

References/Image Credits
– Collins et al. (2003). Science 300, ; Hugenholtz & Tyson (2008) Nature 455,
– Scientific Data Management in the Coming Decade, Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alexander S. Szalay, David DeWitt, Gerd Heber, January 2005, Microsoft Research Technical Report, MSR-TR
– The Sensor Spectrum: Technology, Trends, and Requirements, Joseph M. Hellerstein, Wei Hong, Samuel R. Madden, doi= ,
– The Diverse and Exploding Digital Universe, IDC Whitepaper, March
– GrayWulf: Scalable Clustered Architecture for Data Intensive Computing, Alexander S. Szalay, Gordon Bell, Jan Vandenberg, Alainna Wonders, Randal Burns, Dan Fay, Jim Heasley, Tony Hey, Maria Nieto-Santisteban, Ani Thakar, Catharine van Ingen, and Richard Wilton, 15 September 2008, MSR Tech Report MSR-TR
– Fran Berman, Ken Kennedy Award Presentation, SC
– Dick Crutcher, Gul Agha, Parya Moinzadeh, personal communication