Research Overview
Gagan Agrawal
Associate Professor

Personnel Involved
Ph.D. students: Liang Chen, Wei Du, Ruoming Jin, Feng Li (jointly with Joel Saltz), Xiaogang Li
Master's (thesis) student: Ge Yang
Undergraduate student: Leo Glimcher
Faculty collaborations: Joel Saltz, Tahsin Kurc, Umit Catalyurek, Srini Parthasarathy, Raghu Machiraju

An Overall Vision
Our world will be full of distributed and dynamic data sources
  High-speed networking (Grid computing)
  Sensor networks, mobile systems, embedded devices
Processing this information involves many challenges
  A lot of data, distributed
  Often, continuous data streams (can't store all data, real-time processing constraint)
  Complex interplay of communication and computational costs
  Application programmers want more transparency

Research Projects
Compilers: compiling XQuery (query language for XML data), compiling for a distributed heterogeneous (grid) environment, parallelizing scientific data-intensive and data mining codes
Middleware and runtime support: FREERIDE (Framework for Rapid Implementation of Datamining Engines), ongoing work on distributed processing of data streams
Data mining and OLAP algorithms: mining for streaming data, parallel and scalable mining algorithms, OLAP algorithms

Compiling Data Intensive Applications for a Grid Environment

Compiling XQuery
Vision:
  XML has become an accepted standard for distribution of datasets
  XQuery is the well-accepted high-level query language for querying and processing XML datasets
Compiling complex data-intensive reduction operations written in XQuery
  Reductions written using recursion
  Data-centric execution strategies
  Using XML Schemas to describe the datasets
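The contrast between a reduction written using recursion and a data-centric execution of the same reduction can be sketched as follows. This is a minimal Python illustration under stated assumptions, not XQuery and not the output of the compiler described in these slides; the function names `recursive_sum` and `data_centric_sum` and the chunked layout are hypothetical.

```python
from typing import Iterable, Sequence


def recursive_sum(values: Sequence[float], i: int = 0) -> float:
    """Reduction expressed with recursion, as a functional query language encourages."""
    if i == len(values):
        return 0.0
    return values[i] + recursive_sum(values, i + 1)


def data_centric_sum(chunks: Iterable[Sequence[float]]) -> float:
    """Same reduction driven by the data: each stored chunk is visited exactly once."""
    total = 0.0
    for chunk in chunks:
        for v in chunk:
            total += v
    return total


values = [1.0, 2.0, 3.0, 4.0]
chunks = [values[:2], values[2:]]  # assumed on-disk chunking, for illustration only
assert recursive_sum(values) == data_centric_sum(chunks) == 10.0
```

The recursive form is convenient to write, while the loop form reads each chunk once in storage order, which is the property a data-centric strategy is after.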

System Support for Data Mining in a Parallel Environment
[Diagram labels: Clusters of SMPs; Data Parallel Java; Compiler Techniques; MPI + Posix Threads + File I/O; FREERIDE (middleware); Runtime Techniques]
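One common way to structure data mining on such a parallel stack is a local reduction over each data chunk followed by a global combination of the partial results. The sketch below illustrates that pattern only; the slides do not show FREERIDE's interface, so every name here (`local_reduction`, `global_combine`, `parallel_item_counts`) is hypothetical, and Python threads stand in for the nodes and threads of an SMP cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def local_reduction(chunk):
    """Count, within one chunk, how many transactions contain each item."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # each item counted once per transaction
    return counts


def global_combine(a, b):
    """Merge two partial results; addition is associative, so order does not matter."""
    return a + b


def parallel_item_counts(chunks, workers=2):
    """Run local reductions concurrently, then fold the partial results together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_reduction, chunks))
    return reduce(global_combine, partials, Counter())


chunks = [
    [["milk", "bread"], ["milk"]],
    [["bread", "eggs"], ["milk", "eggs"]],
]
counts = parallel_item_counts(chunks)
assert counts["milk"] == 3 and counts["bread"] == 2 and counts["eggs"] == 2
```

Because the combine step is associative and commutative, chunks can be processed in any order and on any worker without changing the result.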

Distributed Processing of Data Streams
Processing continuous data streams arising from distributed sources
A number of system and algorithmic challenges
  Real-time requirement on processing rate: tradeoffs between accuracy of analysis and efficiency
  Placement of data: obviously want to process an individual stream close to the source of data
  Feedback-based control of accuracy: cannot allow any computational or communication stage to become the bottleneck
  Performance modeling: impact of output size, level of sampling, etc. on performance
Recently started work in this area …
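Feedback-based control of accuracy can be pictured as a stage that watches its input queue and trades accuracy (sampling rate) for throughput when it starts to fall behind. The following is a minimal sketch of that idea, not the project's actual controller; the function name `adjust_sampling_rate` and all thresholds are assumptions chosen for illustration.

```python
def adjust_sampling_rate(rate, queue_len, high=100, low=20, step=0.5, min_rate=0.01):
    """Return a new sampling rate in [min_rate, 1.0] based on queue backlog.

    A long queue means this stage is becoming the bottleneck, so sample less
    (lower accuracy, less work). A short queue means spare capacity, so sample
    more. The thresholds and step are hypothetical tuning knobs.
    """
    if queue_len > high:
        return max(min_rate, rate * step)
    if queue_len < low:
        return min(1.0, rate / step)
    return rate


rate = 1.0
history = []
for observed_queue_len in [150, 150, 50, 10]:  # made-up backlog readings
    rate = adjust_sampling_rate(rate, observed_queue_len)
    history.append(rate)
assert history == [0.5, 0.25, 0.25, 0.5]
```

The rate is halved while the backlog is above the high-water mark, held steady in between, and restored once the queue drains below the low-water mark.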

Algorithms for Mining and OLAP
Decision tree construction for streaming data: new one-pass algorithm with statistical accuracy bound
Parallel and scalable decision tree construction: use sampling, but without losing accuracy
Data cube construction
  Parallel algorithms with optimal communication volume
  Tiling-based algorithms for scaling output sizes
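For readers unfamiliar with data cubes: constructing one means computing an aggregate for every subset of the dimensions. The sketch below is a naive sequential version meant only to show what is being computed; it is not the parallel or tiling-based algorithms mentioned above, and the function name `data_cube` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Sum `measure` grouped by every subset of `dims` (2**len(dims) group-bys)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                agg[key] += row[measure]
            cube[group] = dict(agg)
    return cube


rows = [  # hypothetical sales records
    {"item": "pen", "city": "Columbus", "sales": 3.0},
    {"item": "pen", "city": "Dayton", "sales": 2.0},
    {"item": "ink", "city": "Columbus", "sales": 5.0},
]
cube = data_cube(rows, ("item", "city"), "sales")
assert cube[()] == {(): 10.0}
assert cube[("item",)] == {("pen",): 5.0, ("ink",): 5.0}
assert cube[("city",)] == {("Columbus",): 8.0, ("Dayton",): 2.0}
```

The number of group-bys doubles with each added dimension, which is why communication volume and output size become the central concerns at scale.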
