The Big Picture
- Scientific disciplines have developed a computational branch
  - Models without closed-form solutions are solved numerically
- This has led to an explosion of data
  - Simulation and analysis workloads are data-intensive
  - Producing/scanning large amounts of data
- Management of these data represents a significant challenge
  - Storage/archiving
  - Query processing
  - Visualization

Remote Immersive Analysis
- Formerly, analysis was performed during the computation
  - No data were stored for subsequent examination
- Data-intensive computing breakthroughs have allowed for new interaction with scientific numerical simulations
- Turbulence Database Cluster
  - Stores the entire space-time evolution of the simulation
  - Provides public access to world-class simulations
  - Implements the "immersive turbulence"* approach
  - Introduces new challenges

* E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.

Goals
- Develop data-driven query processing techniques
  - Reduce I/O and computation costs
  - Reduce or eliminate storage overhead
  - Exploit domain knowledge and structure
- Provide user interfaces that are efficient and flexible
- Streamline the process of data ingest

Turbulence Database Cluster

Processing a Batch Query
[Figure: a batch of three queries, q1, q2 and q3, each evaluated separately]
- Redundant I/O
- Multiple disk seeks

I/O Streaming Evaluation Method
- Linear data requirements of the computation allow for:
  - Incremental evaluation
  - Streaming over the data
  - Concurrent evaluation of batch queries

Processing a Batch Query
[Figure: the same batch of queries, q1, q2 and q3, evaluated concurrently over shared data]
- I/O Streaming:
  - Sequential I/O
  - Single pass
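
The streaming idea on these slides can be sketched as follows. This is a minimal sketch, not the cluster's implementation: it assumes each query needs a known set of data atoms and that its answer is a sum of per-atom contributions (the "linear data requirements" above). The names `stream_evaluate`, `read_atom` and `contribution` are hypothetical, and ascending atom id stands in for on-disk order.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set

AtomId = int


def stream_evaluate(
    queries: Dict[Hashable, Set[AtomId]],
    read_atom: Callable[[AtomId], List[float]],
    contribution: Callable[[Hashable, AtomId, List[float]], float],
) -> Dict[Hashable, float]:
    """Evaluate a batch of decomposable queries in a single pass over the data.

    queries:      query id -> ids of the data atoms that query needs
    read_atom:    loads one atom from storage (the expensive I/O step)
    contribution: partial result of one query computed from one atom
    """
    # Invert query -> atoms into atom -> queries, so every atom is fetched
    # once and then shared by all queries that touch it.
    readers: Dict[AtomId, List[Hashable]] = defaultdict(list)
    for qid, atoms in queries.items():
        for atom in atoms:
            readers[atom].append(qid)

    results: Dict[Hashable, float] = {qid: 0.0 for qid in queries}
    # Visit atoms in ascending id order (standing in for on-disk order),
    # so reads are sequential rather than one seek pattern per query.
    for atom in sorted(readers):
        data = read_atom(atom)
        for qid in readers[atom]:
            # Linearity: a query's answer is the sum of per-atom parts,
            # so it can be built up incrementally as atoms stream by.
            results[qid] += contribution(qid, atom, data)
    return results


# Usage: three queries with overlapping data needs.
reads: List[AtomId] = []


def read_atom(atom: AtomId) -> List[float]:
    reads.append(atom)  # record every storage access
    return [float(atom)] * 4


batch = {"q1": {0, 1, 2}, "q2": {1, 2}, "q3": {2, 3}}
answers = stream_evaluate(batch, read_atom, lambda qid, atom, data: sum(data))
print(reads)    # [0, 1, 2, 3]: four reads instead of seven
print(answers)  # {'q1': 12.0, 'q2': 12.0, 'q3': 20.0}
```

Evaluating the three queries one at a time would read atoms 0, 1, 2, then 1, 2, then 2, 3; the streaming pass reads each atom once, which is the "single pass" and "sequential I/O" behavior the slide describes.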

Lagrange Polynomial Interpolation
[Equation: the interpolated value is a weighted sum of the data, with the Lagrange coefficients as the weights]
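
The "Lagrange coefficients" times "Data" structure of this slide can be illustrated with a short sketch. It assumes a tensor-product Lagrange interpolant over a small stencil of grid nodes in each dimension; the four-point stencil and the function names are illustrative assumptions, not taken from the slides.

```python
from typing import List, Sequence


def lagrange_weights(nodes: Sequence[float], x: float) -> List[float]:
    """Lagrange coefficients l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        weights.append(w)
    return weights


def lagrange_interpolate_3d(
    nodes_x: Sequence[float],
    nodes_y: Sequence[float],
    nodes_z: Sequence[float],
    data: Sequence[Sequence[Sequence[float]]],
    point: Sequence[float],
) -> float:
    """Tensor-product Lagrange interpolation at `point`.

    data[i][j][k] holds the field value at (nodes_x[i], nodes_y[j], nodes_z[k]).
    """
    wx = lagrange_weights(nodes_x, point[0])
    wy = lagrange_weights(nodes_y, point[1])
    wz = lagrange_weights(nodes_z, point[2])
    total = 0.0
    # Weighted sum: Lagrange coefficients times data. Each term depends on a
    # single data value, so the sum can be accumulated in any order.
    for i, a in enumerate(wx):
        for j, b in enumerate(wy):
            for k, c in enumerate(wz):
                total += a * b * c * data[i][j][k]
    return total


# Usage: a 4-point (cubic) stencil reproduces a linear field.
nodes = [0.0, 1.0, 2.0, 3.0]
field = [[[x + 2 * y + 3 * z for z in nodes] for y in nodes] for x in nodes]
print(lagrange_interpolate_3d(nodes, nodes, nodes, field, (0.5, 1.5, 2.25)))  # ~10.25
```

Because the result is linear in the data, each atom's share of the sum can be computed as that atom streams past, which is what makes interpolation a candidate for the I/O streaming method.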

Spatial Differentiation
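
The slide gives only the title, so the following is a sketch under stated assumptions: spatial derivatives are taken with a centered finite-difference stencil on a periodic grid, and the fourth-order stencil is chosen purely for illustration. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def centered_derivative_4th(values: Sequence[float], h: float) -> List[float]:
    """df/dx at every node of a periodic 1-D grid with spacing h.

    Uses the fourth-order centered stencil
        f'(x_i) ~ (f[i-2] - 8 f[i-1] + 8 f[i+1] - f[i+2]) / (12 h).
    """
    n = len(values)
    out = []
    for i in range(n):
        # Periodic wrap-around for the two neighbours on each side.
        fm2, fm1 = values[(i - 2) % n], values[(i - 1) % n]
        fp1, fp2 = values[(i + 1) % n], values[(i + 2) % n]
        out.append((fm2 - 8.0 * fm1 + 8.0 * fp1 - fp2) / (12.0 * h))
    return out


# Usage: differentiate sin(x) on 64 points over [0, 2*pi) and compare with cos(x).
n = 64
h = 2.0 * math.pi / n
samples = [math.sin(i * h) for i in range(n)]
deriv = centered_derivative_4th(samples, h)
max_err = max(abs(d - math.cos(i * h)) for i, d in enumerate(deriv))
print(max_err)
```

Each derivative is a fixed linear combination of a few neighbouring samples, so, like interpolation, it only needs a small halo of data around each point and can be evaluated incrementally.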

Derivative Interpolation
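
One plausible reading of "Derivative Interpolation", assumed here rather than confirmed by the slides, is to compute finite-difference derivatives at the grid nodes around a query point and then Lagrange-interpolate them to the point's position. The sketch below (1-D, periodic grid, fourth-order differences, four-point interpolation, all hypothetical choices) folds both steps into a single set of weights over the raw samples.

```python
import math
from typing import Dict, List, Sequence

# Fourth-order centered first-derivative stencil: offset -> coefficient,
# to be divided by 12 h.
FD4 = {-2: 1.0, -1: -8.0, 1: 8.0, 2: -1.0}


def lagrange_weights(nodes: Sequence[float], x: float) -> List[float]:
    """Lagrange coefficients l_i(x) for the given interpolation nodes."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        weights.append(w)
    return weights


def derivative_interpolation_weights(n: int, h: float, x: float) -> Dict[int, float]:
    """Weights w[g] such that f'(x) ~ sum_g w[g] * f[g] on a periodic grid.

    Finite-difference derivatives at the 4 grid nodes around x are combined
    with 4-point Lagrange interpolation; the result is folded into a single
    set of weights over the raw grid samples.
    """
    base = math.floor(x / h) - 1  # leftmost of the 4 interpolation nodes
    nodes = [(base + m) * h for m in range(4)]
    weights: Dict[int, float] = {}
    for m, lag in enumerate(lagrange_weights(nodes, x)):
        for offset, coeff in FD4.items():
            g = (base + m + offset) % n  # wrap onto the periodic grid
            weights[g] = weights.get(g, 0.0) + lag * coeff / (12.0 * h)
    return weights


# Usage: d/dx sin at x = 1.0 on a 64-point periodic grid.
n = 64
h = 2.0 * math.pi / n
samples = [math.sin(g * h) for g in range(n)]
w = derivative_interpolation_weights(n, h, 1.0)
estimate = sum(wg * samples[g] for g, wg in w.items())
print(len(w), estimate, math.cos(1.0))  # 8 raw samples; estimate close to cos(1.0)
```

The combined operator is still one weighted sum of raw data (eight samples per dimension in this configuration), so derivative interpolation keeps the linear, decomposable form that I/O streaming relies on.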

128 Workload
- Over an order of magnitude improvement
- Sorting: leads to more sequential access
- Join/Order By: executes the entire batch as a join
- I/O Streaming:
  - Each atom is read only once
  - Effective cache usage
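
The "Sorting" entry can be illustrated by ordering query points by the atom that contains them. This sketch assumes atoms are laid out on disk in Morton (Z-order) key order, a common choice for 3-D blocked data; whether the cluster uses exactly this layout, and the 8-point atom size in the usage line, are assumptions. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

GridPoint = Tuple[int, int, int]


def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key


def sort_by_atom(points: Sequence[GridPoint], atom_size: int) -> List[GridPoint]:
    """Order grid points by the Z-order key of the atom that contains them."""
    def atom_key(p: GridPoint) -> int:
        ax, ay, az = (c // atom_size for c in p)
        return morton3(ax, ay, az)
    return sorted(points, key=atom_key)


# Usage: with 8-point atoms, points sharing an atom become contiguous and
# atoms are visited in Z-order.
pts = [(0, 9, 0), (9, 0, 0), (1, 2, 3), (9, 9, 9)]
print(sort_by_atom(pts, 8))  # [(1, 2, 3), (9, 0, 0), (0, 9, 0), (9, 9, 9)]
```

Sorting alone makes access more sequential but may still revisit an atom for each query that needs it; the streaming approach goes further by reading each atom only once for the whole batch.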

I/O streaming alleviates the I/O bottleneck
- Computation emerges as the more costly operation

Particle Tracking
[Figure: the Web Server/Mediator distributes points to DB Node 1 through DB Node N; on each node, a Computational Module retrieves data from its Storage Layer. Step shown: $x_p(t_m) \to x^*_p(t_m)$]

Particle Tracking
[Figure: the Web Server/Mediator distributes points to DB Node 1 through DB Node N; on each node, a Computational Module retrieves data from its Storage Layer. Step shown: $x^*_p(t_m) \to x_p(t_{m+1})$]
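
The two particle-tracking slides show an intermediate position x*_p(t_m) computed from x_p(t_m), followed by x_p(t_{m+1}) computed from x*_p(t_m). A minimal sketch, assuming this is a second-order predictor-corrector (Heun) step, a hypothetical partition of the domain into slabs along x (one per DB node), and an analytic velocity field standing in for velocities interpolated from the database. The names `owner_node`, `distribute`, `velocities_at` and `advect_rk2` are illustrative only.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VelocityField = Callable[[Point, float], Point]


def owner_node(p: Point, num_nodes: int, domain: float) -> int:
    """Hypothetical partitioning: node k stores the k-th slab along x."""
    slab = domain / num_nodes
    return min(int((p[0] % domain) // slab), num_nodes - 1)


def distribute(points: Sequence[Point], num_nodes: int, domain: float) -> Dict[int, List[int]]:
    """Mediator step: group particle indices by the node owning their position."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        groups[owner_node(p, num_nodes, domain)].append(idx)
    return groups


def velocities_at(points: Sequence[Point], u: VelocityField, t: float,
                  num_nodes: int, domain: float) -> List[Point]:
    """Evaluate u at every point, one batch per node (simulated locally here)."""
    out: List[Point] = [(0.0, 0.0, 0.0)] * len(points)
    for node, idxs in distribute(points, num_nodes, domain).items():
        # In a real deployment this batch would be shipped to DB node `node`.
        for i in idxs:
            out[i] = u(points[i], t)
    return out


def advect_rk2(points: Sequence[Point], u: VelocityField, t: float, dt: float,
               num_nodes: int, domain: float) -> List[Point]:
    """One predictor-corrector (Heun / RK2) step for every particle."""
    u0 = velocities_at(points, u, t, num_nodes, domain)
    # Predictor: x*_p = x_p(t_m) + dt * u(x_p(t_m), t_m)
    pred = [tuple(x + dt * v for x, v in zip(p, vel)) for p, vel in zip(points, u0)]
    # Predicted points may have crossed into another node's slab, so the
    # mediator redistributes them before the corrector evaluation.
    u1 = velocities_at(pred, u, t + dt, num_nodes, domain)
    # Corrector: x_p(t_{m+1}) = x_p(t_m) + dt/2 * (u0 + u1)
    return [
        tuple(x + 0.5 * dt * (a + b) for x, a, b in zip(p, va, vb))
        for p, va, vb in zip(points, u0, u1)
    ]


# Usage: solid-body rotation should keep particles (nearly) on their circles.
def rotation(p: Point, t: float) -> Point:
    return (-p[1], p[0], 0.0)


pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
for step in range(100):
    pts = advect_rk2(pts, rotation, step * 0.01, 0.01, num_nodes=4, domain=2 * math.pi)
print([math.hypot(p[0], p[1]) for p in pts])  # radii stay close to 1.0 and 2.0
```

Each time step needs two mediator-to-node round trips, one per velocity evaluation, which is the communication the future-work list proposes to reduce.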

Summary and Future Work
- Extend the I/O streaming technique to different decomposable kernel computations:
  - Differentiation
  - Spatial interpolation
  - Temporal interpolation
  - Filtering and coarse-graining
- Provide a flexible user interface
  - Allow for different filter functions
  - Allow for new kernel computations
- Improve the particle tracking routine
  - Reduce communication between the mediator and DB nodes
  - Asynchronous processing
  - Caching and pre-fetching
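
Filtering and coarse-graining fit the same decomposable pattern as the other kernels. As a sketch, assuming a box (top-hat) filter on a periodic 1-D grid split into hypothetical fixed-size atoms, the filtered value can be accumulated atom by atom and matches a direct evaluation. The filter choice and the function names are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def split_into_atoms(values: Sequence[float], atom_size: int) -> Dict[int, List[float]]:
    """Cut a periodic 1-D field into atoms keyed by their first global index."""
    return {s: list(values[s:s + atom_size]) for s in range(0, len(values), atom_size)}


def atom_contribution(start: int, atom: Sequence[float], center: int, radius: int, n: int) -> float:
    """Sum of the atom's samples inside the periodic window center +/- radius."""
    total = 0.0
    for offset, v in enumerate(atom):
        d = (start + offset - center) % n  # offset from center, wrapped into [0, n)
        if d <= radius or d >= n - radius:
            total += v
    return total


def box_filter_streaming(values: Sequence[float], atom_size: int, center: int, radius: int) -> float:
    """Box-filtered value at `center`, accumulated atom by atom in one pass."""
    n = len(values)
    assert 2 * radius + 1 <= n, "window must fit inside the periodic domain"
    acc = 0.0
    for start, atom in sorted(split_into_atoms(values, atom_size).items()):
        acc += atom_contribution(start, atom, center, radius, n)
    return acc / (2 * radius + 1)


def box_filter_direct(values: Sequence[float], center: int, radius: int) -> float:
    """Reference result: average the 2r+1 samples around `center` directly."""
    n = len(values)
    return sum(values[(center + k) % n] for k in range(-radius, radius + 1)) / (2 * radius + 1)


field = [float(i * i) for i in range(32)]
print(box_filter_streaming(field, 8, 1, 3), box_filter_direct(field, 1, 3))  # both 1891/7
```

Because each atom contributes an independent partial sum, a filter kernel could be evaluated concurrently with other queries in the same streaming pass; supporting user-defined filter functions would mean supplying a different per-sample weight in place of the uniform one used here.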

Questions?

Images courtesy of Kai Buerger