Research Overview Gagan Agrawal Associate Professor.

Presentation transcript:

Research Overview Gagan Agrawal Associate Professor

Personnel Involved
- Ph.D. students: Liang Chen, Wei Du, Ruoming Jin, Feng Li (jointly with Joel Saltz), Xiaogang Li
- Master's (thesis) student: Ge Yang
- Undergraduate student: Leo Glimcher
- Faculty collaborations: Joel Saltz, Tahsin Kurc, Umit Catalyurek, Srini Parthasarathy, Raghu Machiraju

An Overall Vision
- Our world will be full of distributed and dynamic data sources
  - High-speed networking (Grid computing)
  - Sensor networks, mobile systems, embedded devices
- Processing this information involves many challenges
  - A lot of data, distributed
  - Often, continuous data streams (can't store all data, real-time processing constraint)
  - Complex interplay of communication and computational costs
  - Application programmers want more transparency

Research Projects
- Compilers: compiling XQuery (query language for XML data), compiling for a distributed heterogeneous (grid) environment, parallelizing scientific data-intensive and data mining codes
- Middleware and runtime support: FREERIDE (Framework for Rapid Implementation of Datamining Engines), ongoing work on distributed processing of data streams
- Data mining and OLAP algorithms: mining for streaming data, parallel and scalable mining algorithms, OLAP algorithms

Compiling Data Intensive Applications for a Grid Environment

Compiling XQuery
- Vision: XML has become an accepted standard for distribution of datasets
- XQuery is the well-accepted high-level query language for querying and processing XML datasets
- Compiling complex data-intensive reduction operations written in XQuery
- Reductions written using recursion
- Data-centric execution strategies
- Using XML Schemas to describe the datasets
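The contrast between a reduction written with recursion and a data-centric execution of it can be sketched outside XQuery. The Python below is an illustration only: `recursive_sum` and `data_centric_sum` are hypothetical names, and the rewrite shown is an assumption about what such a strategy might look like, not the output of the compiler described on this slide.

```python
from typing import Iterable, Sequence


def recursive_sum(values: Sequence[float]) -> float:
    """Reduction expressed with recursion, as a functional query language would write it."""
    if not values:
        return 0.0
    # Head element plus the reduction of the remaining elements.
    return values[0] + recursive_sum(values[1:])


def data_centric_sum(values: Iterable[float]) -> float:
    """The same reduction as a single pass over the data with one accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


data = [1.5, 2.5, 4.0]
assert recursive_sum(data) == data_centric_sum(data) == 8.0
```

The single-pass form touches each element once and never needs the whole dataset in memory, which is why it suits large datasets better than the recursive form.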

System Support for Data Mining in a Parallel Environment
[Diagram; labels as extracted: Clusters of SMPs; Data Parallel Java; Compiler Techniques; MPI + POSIX Threads + File I/O; FREERIDE (middleware); Runtime Techniques]
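One common way for middleware of this kind to organize a parallel data mining computation is a local reduction on each worker followed by a global combination. The slide does not spell this out, so the sketch below is an assumption: the function names are hypothetical, this is not the FREERIDE API, and Python threads merely stand in for the MPI and POSIX threads layer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def local_reduction(chunk: Sequence[str]) -> Counter:
    """Each worker folds its own chunk into a private reduction object."""
    return Counter(chunk)


def global_combine(partials: List[Counter]) -> Counter:
    """Merge the per-worker reduction objects into the final result."""
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)
    return result


def parallel_count(items: Sequence[str], num_workers: int = 2) -> Counter:
    # Split the input into roughly equal contiguous chunks, one per worker.
    size = max(1, (len(items) + num_workers - 1) // num_workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_reduction, chunks))
    return global_combine(partials)


counts = parallel_count(["a", "b", "a", "c", "a", "b"], num_workers=3)
```

Because every worker updates only its own reduction object, no locking is needed during the local phase; the only synchronization point is the final merge.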

Distributed Processing of Data Streams
- Processing continuous data streams arising from distributed sources
- A number of system and algorithmic challenges
  - Real-time requirement on processing rate: tradeoffs between accuracy of analysis and efficiency
  - Placement of data: obviously want to process an individual stream close to the source of data
  - Feedback-based control of accuracy: cannot allow any computational or communication stage to become the bottleneck
  - Performance modeling: impact of output size, level of sampling, etc. on performance
- Recently started work in this area
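Feedback-based control of accuracy can be illustrated with a very small controller that lowers the sampling rate when a downstream stage backs up and raises it again when there is slack. This is a sketch under assumptions: the function name, the queue-length thresholds, and the multiplicative step are all hypothetical and are not taken from the project.

```python
def adjust_sampling_rate(rate: float, queue_len: int,
                         low: int = 10, high: int = 100,
                         step: float = 0.5,
                         min_rate: float = 0.01) -> float:
    """Return a new sampling rate in [min_rate, 1.0] based on downstream queue length."""
    if queue_len > high:
        # Downstream stage is falling behind: sample less, trading accuracy for speed.
        rate *= step
    elif queue_len < low:
        # Downstream stage has spare capacity: sample more to recover accuracy.
        rate /= step
    return min(1.0, max(min_rate, rate))


rate = 1.0
for observed_queue in [150, 150, 50, 5]:
    rate = adjust_sampling_rate(rate, observed_queue)
```

The gap between `low` and `high` acts as a dead band, so the rate does not oscillate while the queue length stays in a normal range.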

Algorithms for Mining and OLAP
- Decision tree construction for streaming data: new one-pass algorithm with statistical accuracy bound
- Parallel and scalable decision tree construction: use sampling, but without losing accuracy
- Data cube construction
  - Parallel algorithms with optimal communication volume
  - Tiling-based algorithms for scaling output sizes
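For readers unfamiliar with data cube construction, the sketch below computes a SUM aggregate for every subset of dimensions, that is, all 2^d group-bys. It is a naive sequential version meant only to show what the output looks like; it is not the parallel or tiling-based algorithm mentioned above, and `build_cube` and the sample rows are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Row = Tuple[Tuple[str, ...], float]  # (dimension values, measure)
Cube = Dict[Tuple[int, ...], Dict[Tuple[str, ...], float]]


def build_cube(rows: List[Row], num_dims: int) -> Cube:
    """Compute SUM(measure) for every subset of dimensions (all 2^d group-bys)."""
    cube: Cube = {}
    for k in range(num_dims + 1):
        for dims in combinations(range(num_dims), k):
            groups: Dict[Tuple[str, ...], float] = defaultdict(float)
            for values, measure in rows:
                # Project the row onto the chosen dimensions to form the group key.
                key = tuple(values[d] for d in dims)
                groups[key] += measure
            cube[dims] = dict(groups)
    return cube


rows = [(("ohio", "2003"), 10.0), (("ohio", "2004"), 5.0), (("texas", "2003"), 2.0)]
cube = build_cube(rows, num_dims=2)
```

Here `cube[(0,)]` holds totals per value of the first dimension and `cube[()]` holds the grand total. This version rescans the raw rows for each group-by, which is the kind of redundant work that more careful cube algorithms are designed to avoid.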