Information Capture and Re-Use
Joe Hellerstein
Scenario
Ubiquitous computing is more than clients!
– sensors and their data feeds are key
– smart dust (MEMS sensors)
– biomedical monitoring devices (MEMS sensors)
– every item of value records its use/misuse (disposable computing)
– tacit information from human behavior
– video from surveillance cameras, broadcasts, etc.
There’s a Data Flood Coming
What does it look like?
– Never ends: interactivity required
– Big: data reduction/aggregation is key
– Unpredictable: this scale of devices and nets will not behave nicely
Key Technologies:
– CONTROL: early answers and interactivity
   – online aggregation for data reduction
– River/Eddy: massively parallel, adaptive dataflow
CONTROL
Continuous Output and Navigation Technology with Refinement On Line
Data-intensive jobs are long-running. How to give early answers and interactivity?
– Statistical estimators, and their performance implications
– Online query processing algorithms: ripple joins
– Online interactivity over feeds: data “juggle”
Appreciate the interplay of massive data processing, statistics, and UIs
Challenges: apply to sequence data, scale up
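The "early answers" idea can be illustrated with a running aggregate that reports an estimate and an error bound long before the input is exhausted. The following is a minimal sketch, not the CONTROL implementation: the function name `online_mean`, the `report_every` parameter, and the synthetic data are all assumptions made for illustration, and the interval is a plain normal-approximation bound that presumes rows arrive in random order.

```python
import math
import random


def online_mean(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while the stream is read.

    Hypothetical helper: uses Welford's incremental variance update and a
    normal-approximation interval (z = 1.96 for roughly 95% confidence).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


# Synthetic data standing in for a long-running feed (assumption).
rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

estimates = list(online_mean(data))
for rows, mean, half_width in estimates[:3]:
    print(f"after {rows} rows: {mean:.2f} +/- {half_width:.2f}")
```

A user watching these reports can stop the job as soon as the interval is tight enough, which is the interactivity the slide asks for; the interval shrinks roughly with the square root of the number of rows seen.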
River
We built the world’s fastest sorting machine
– On the “NOW”: 100 Sun workstations + SAN
– But it only beat the record under ideal conditions!
River: performance adaptivity for data flows on clusters
– simplifies management and programming
– perfect for sensor-based streams
Challenges: deploy over a wide area
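Why "ideal conditions" matter can be seen in a small simulation. The sketch below is an illustration only, not River's code: it compares a fixed up-front partition of records against consumers pulling from one shared queue, under the assumption of one consumer running ten times slower than the rest. The function names and the rates are hypothetical.

```python
import heapq


def static_partition_time(num_records, rates):
    """Split records evenly up front; the slowest consumer gates completion."""
    k = len(rates)
    base, extra = divmod(num_records, k)
    counts = [base + (1 if i < extra else 0) for i in range(k)]
    finish = max(c / r for c, r in zip(counts, rates))
    return finish, counts


def shared_queue_time(num_records, rates):
    """Each consumer takes the next record as soon as it is free."""
    free_at = [(0.0, i) for i in range(len(rates))]  # (time free, consumer id)
    heapq.heapify(free_at)
    counts = [0] * len(rates)
    finish = 0.0
    for _ in range(num_records):
        now, i = heapq.heappop(free_at)
        done = now + 1.0 / rates[i]  # rates are in records per second
        counts[i] += 1
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish, counts


# Three healthy nodes and one node running 10x slower (assumed numbers).
rates = [10.0, 10.0, 10.0, 1.0]
static_time, static_counts = static_partition_time(1200, rates)
adaptive_time, adaptive_counts = shared_queue_time(1200, rates)
print(f"static:   {static_time:.1f}s, records per node {static_counts}")
print(f"adaptive: {adaptive_time:.1f}s, records per node {adaptive_counts}")
```

With a fixed partition a single slow node dictates the finish time, while the shared queue lets the fast nodes absorb most of the work. That is the kind of performance adaptivity the slide describes.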
Eddy
How to order and reorder operators over time
– key complement to River: adapt not only to the hardware, but to the processing rates
Challenges: scale up, consider parallel scheduling
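Reordering operators at run time can be sketched with a router that tracks how often each filter passes a tuple and tries the most selective filter first. This is a simplified greedy stand-in written for illustration; it is not claimed to be the routing policy used in the actual system, and the class name `GreedyEddy`, the smoothing rule, and the two example predicates are all assumptions.

```python
class GreedyEddy:
    """Route each tuple through filters, most selective (observed) first."""

    def __init__(self, predicates):
        self.predicates = predicates  # name -> callable(tuple) -> bool
        self.seen = {name: 0 for name in predicates}
        self.passed = {name: 0 for name in predicates}
        self.evaluations = 0

    def _pass_rate(self, name):
        # Smoothed so an operator with no history starts at 0.5.
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def route(self, tup):
        # Re-decide the order for every tuple from the statistics so far.
        for name in sorted(self.predicates, key=self._pass_rate):
            self.seen[name] += 1
            self.evaluations += 1
            if not self.predicates[name](tup):
                return False
            self.passed[name] += 1
        return True


def fixed_order_evaluations(tuples, ordered_predicates):
    """Count predicate calls when the order is chosen once and never changed."""
    calls = 0
    for tup in tuples:
        for pred in ordered_predicates:
            calls += 1
            if not pred(tup):
                break
    return calls


# Hypothetical filters: one passes ~90% of tuples, the other ~10%.
predicates = {
    "common": lambda x: x % 10 != 9,
    "rare": lambda x: x % 10 == 0,
}
tuples = list(range(2000))

eddy = GreedyEddy(predicates)
results = [t for t in tuples if eddy.route(t)]
fixed_calls = fixed_order_evaluations(
    tuples, [predicates["common"], predicates["rare"]]
)
print(f"adaptive: {eddy.evaluations} calls, fixed bad order: {fixed_calls} calls")
```

The router starts with no knowledge of either filter, yet after a couple of tuples it puts the selective one first and avoids most of the wasted calls a poorly chosen static plan would make. A version meant for streams whose behavior drifts would also need to age out old statistics, which this sketch omits.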
Telegraph: Putting it Together
Want to build a next-gen global DB system.
Capture and re-use, embodied in a vertical solution.
Marriage of:
– CONTROL, River & Eddy
– OceanStore + optionally-transactional storage that handles new hardware realities and scale
– Federation in the wide area via Negotiation/Economics
– Combinations of browse/query/mine at the UI
   – no magic bullet there! CONTROL is key.
Integration with other options
Integration
– Use Oceanic Data Utility for distribution, caching, protection of streams
– Use negotiation architectures to connect federated and stored streams
– Be data-intensive backbone to diverse clients
– Be a scalable platform for tacit knowledge extraction
Cooperation
– Tacit information as a feed
– Capture/merge classroom feeds
– Use UI design tools for device-independent, interactive stream-based apps
Plan for Success
One Year
– Implement River/Eddy over parallel cluster, deploy CONTROL modules
– Deploy data analysis apps over sequence data (MEMS/Web/Video)
Three Year
– Integrate w/ wide area storage & processing
– Get data-intensive Endeavour apps running on architecture (e.g. tacit knowledge mining)
– Develop UI tools for interacting with never-ending streams