Published by Lorin Watkins. Modified over 9 years ago.
1
Research Overview
Gagan Agrawal, Associate Professor
2
Personnel Involved
- Ph.D. students: Liang Chen, Wei Du, Ruoming Jin, Feng Li (jointly with Joel Saltz), Xiaogang Li
- Master's (thesis) student: Ge Yang
- Undergraduate student: Leo Glimcher
- Faculty collaborations: Joel Saltz, Tahsin Kurc, Umit Catalyurek, Srini Parthasarathy, Raghu Machiraju
3
An Overall Vision
- Our world will be full of distributed and dynamic data sources:
  - high-speed networking (grid computing)
  - sensor networks, mobile systems, embedded devices
- Processing this information involves many challenges:
  - large volumes of distributed data
  - often, continuous data streams (not all data can be stored, and processing is subject to real-time constraints)
  - a complex interplay between communication and computation costs
- Application programmers want more transparency
4
Research Projects
- Compilers: compiling XQuery (the query language for XML data), compiling for a distributed, heterogeneous (grid) environment, and parallelizing scientific data-intensive and data mining codes
- Middleware and runtime support: FREERIDE (Framework for Rapid Implementation of Datamining Engines), and ongoing work on distributed processing of data streams
- Data mining and OLAP algorithms: mining streaming data, parallel and scalable mining algorithms, and OLAP algorithms
5
Compiling Data-Intensive Applications for a Grid Environment
6
Compiling XQuery
- Vision: XML has become an accepted standard for distributing datasets, and XQuery is the widely accepted high-level language for querying and processing XML datasets
- Compiling complex, data-intensive reduction operations written in XQuery
- Reductions written using recursion
- Data-centric execution strategies
- Using XML Schemas to describe the datasets
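Reductions over XML data are often expressed recursively in XQuery, and a compiler can turn such recursion into a single data-centric pass with an accumulator. The sketch below illustrates that idea in Python rather than XQuery; the element names (`readings`, `r`, attribute `value`) and the function names are hypothetical, and the transformation shown is a generic recursion-to-iteration rewrite, not the project's actual compiler output.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset: <readings>/<r value="..."/> is an illustrative
# assumption, not a schema from the original work.
DOC = """<readings>
  <r value="3"/><r value="7"/><r value="5"/><r value="1"/>
</readings>"""

def values(root):
    """Extract the numeric attribute from each <r> element."""
    return [float(e.get("value")) for e in root.iter("r")]

def max_recursive(seq):
    """Reduction written recursively, in the style of a recursive
    XQuery function: max(seq) = max(head, max(tail))."""
    if not seq:
        return float("-inf")
    return max(seq[0], max_recursive(seq[1:]))

def max_iterative(seq):
    """The same reduction rewritten as one pass: an accumulator updated
    once per element, with no call stack growth or sequence copies."""
    acc = float("-inf")
    for v in seq:
        acc = max(acc, v)
    return acc

root = ET.fromstring(DOC)
vals = values(root)
print(max_recursive(vals), max_iterative(vals))  # 7.0 7.0
```

The iterative form touches each element exactly once in storage order, which is what makes data-centric execution possible on datasets too large for the recursive formulation.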
7
System Support for Data Mining in a Parallel Environment
[Figure: layered system stack with components Data Parallel Java, FREERIDE (middleware), MPI + POSIX Threads + File I/O, and Clusters of SMPs, connected through compiler techniques and runtime techniques]
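Many data mining kernels, such as counting item frequencies, can be structured as a reduction: each worker processes its own data partition into a private reduction object, and the objects are then combined. The sketch below shows that local-then-global pattern under stated assumptions: the function names `local_reduction`, `global_reduction`, and `run` are hypothetical, this is not FREERIDE's interface, and Python threads stand in for the MPI processes and POSIX threads named in the stack.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_reduction(chunk):
    """Process one data partition into a private reduction object.
    Each update depends only on the item, so order does not matter."""
    obj = Counter()
    for item in chunk:
        obj[item] += 1
    return obj

def global_reduction(objs):
    """Combine the per-partition reduction objects into the final result."""
    total = Counter()
    for o in objs:
        total.update(o)  # Counter.update adds counts
    return total

def run(data, num_workers=4):
    # Split the dataset into roughly equal partitions, one per worker.
    size = max(1, (len(data) + num_workers - 1) // num_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        objs = list(pool.map(local_reduction, chunks))
    return global_reduction(objs)

counts = run(["a", "b", "a", "c", "b", "a"], num_workers=2)
print(counts)  # Counter({'a': 3, 'b': 2, 'c': 1})
```

Because each worker writes only to its own reduction object, no locking is needed during the local phase; synchronization is confined to the final combine.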
8
Distributed Processing of Data Streams
- Processing continuous data streams arising from distributed sources
- A number of system and algorithmic challenges:
  - Real-time requirement on processing rate: tradeoffs between accuracy of analysis and efficiency
  - Placement of data: an individual stream should be processed close to its source
  - Feedback-based control of accuracy: no computational or communication stage can be allowed to become the bottleneck
  - Performance modeling: impact of output size, level of sampling, etc. on performance
- Recently started work in this area
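One way to picture feedback-based control is a stage that adjusts its sampling rate after each batch so that processing time stays within a budget, trading accuracy for throughput. The sketch below assumes a simple multiplicative controller and a linear cost model (time proportional to the number of sampled items); `adjust_rate`, `simulate`, and all parameter values are hypothetical illustrations, not the project's control scheme.

```python
def adjust_rate(rate, observed_time, budget, lo=0.01, hi=1.0, step=0.1):
    """Multiplicative feedback: lower the sampling rate when a batch
    overran its time budget, raise it when there was slack.
    The result is clamped to [lo, hi]."""
    if observed_time > budget:
        rate *= (1 - step)
    else:
        rate *= (1 + step)
    return min(hi, max(lo, rate))

def simulate(batches, batch_size, cost_per_item, budget, rate=1.0):
    """Run the controller against an assumed linear cost model and
    record (rate, elapsed) for each batch."""
    history = []
    for _ in range(batches):
        sampled = int(rate * batch_size)
        elapsed = sampled * cost_per_item  # stand-in for measured time
        history.append((rate, elapsed))
        rate = adjust_rate(rate, elapsed, budget)
    return history

hist = simulate(batches=20, batch_size=1000, cost_per_item=0.002, budget=1.0)
for r, t in hist[-4:]:
    print(f"rate={r:.3f} elapsed={t:.3f}")
```

With these numbers the stage cannot process every item within the budget, so the rate falls from 1.0 and then oscillates near 0.5, the fraction the budget can sustain.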
9
Algorithms for Mining and OLAP
- Decision tree construction for streaming data: a new one-pass algorithm with a statistical accuracy bound
- Parallel and scalable decision tree construction: uses sampling, but without losing accuracy
- Data cube construction:
  - parallel algorithms with optimal communication volume
  - tiling-based algorithms for scaling to large output sizes
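Two of these ideas can be made concrete with small examples. The slide does not specify which statistical bound the one-pass algorithm uses, so the first part substitutes the well-known Hoeffding bound from the VFDT streaming decision tree as an illustration of how a bound decides when enough samples have been seen to choose a split. The second part is a naive sequential data cube (one SUM group-by for every subset of dimensions), not the communication-optimal parallel or tiling-based algorithms. All function and field names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range is within epsilon of its n-sample mean."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def can_split(best_gain, second_gain, value_range, delta, n):
    """VFDT-style test (a substitute, not the slide's bound): split once the
    gap between the two best attributes' gains exceeds the bound."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)

def data_cube(rows, dims, measure):
    """Naive data cube: SUM(measure) for each of the 2^d subsets of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(float)
            for row in rows:
                agg[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(agg)
    return cube

# Gain ranges over [0, 1] for two classes; delta and n are example values.
print(can_split(0.30, 0.15, 1.0, 1e-7, 1000))  # True
print(can_split(0.30, 0.25, 1.0, 1e-7, 1000))  # False

rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]
cube = data_cube(rows, ["region", "product"], "sales")
print(cube[("region",)])  # {('east',): 15.0, ('west',): 7.0}
```

The cube's output grows as 2^d group-bys, which is why scaling output size (for example by tiling) and minimizing communication in parallel construction matter for larger dimensionalities.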