AnHai Doan University of Wisconsin-Madison Managing Unstructured Data.



1 Managing Unstructured Data
AnHai Doan, University of Wisconsin-Madison

2 Unstructured Data...
Appears in many forms
–emails, Web pages, memos, call center text records, etc.
Is pervasive
–80% of the world's data, and growing
Managed by many players
–SIGIR/WWW/KDD/AAAI, Google/Yahoo/Microsoft/IBM
We should work on it, or risk missing the boat!
But what sets us apart from the above guys?

3 Structure + System Focus!
Make it very easy to extract structures from raw data
–in raw form → keyword search / bag analysis
–many apps want to go beyond that; they want structure
–we should encourage this → back to our playground
–not just DB + IR, but DB + IR + IE
Instead of working on isolated research problems, let's build an end-to-end UDMS
–should repeat what we did with System R / Ingres: system blueprint, followed by 20 years of rapid progress
–unifies & accelerates our research efforts
–keeps work grounded, makes an impact
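To make the contrast on this slide concrete, here is a minimal sketch of raw-form keyword search versus querying extracted structure. It is an illustration only: the sample texts, the function names (`keyword_search`, `extract`), and the regular expressions are assumptions made for this example and stand in for a real information extraction (IE) component, which the slides do not specify.

```python
import re

# Hypothetical sample documents (not from the talk).
DOCS = [
    "Meeting with Alice Smith on 2007-03-12 about the budget.",
    "Call center note: Bob Jones reported a billing error on 2007-03-15.",
]

def keyword_search(docs, keyword):
    """Raw-form access: return the documents that contain the keyword."""
    return [d for d in docs if keyword.lower() in d.lower()]

# Toy extraction patterns; a real IE system would be far more robust.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PERSON = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def extract(doc):
    """Best-effort extraction of a (person, date) record from one document."""
    person = PERSON.search(doc)
    date = DATE.search(doc)
    return {
        "person": person.group(0) if person else None,
        "date": date.group(0) if date else None,
    }

# Keyword search can only say which documents mention a word.
hits = keyword_search(DOCS, "billing")

# With extracted structure, a DB-style predicate becomes possible,
# e.g. "who appears in records dated after 2007-03-13?"
records = [extract(d) for d in DOCS]
recent = [r["person"] for r in records if r["date"] and r["date"] > "2007-03-13"]
print(hits, recent)
```

The structured query at the end is the kind of request that keyword search alone cannot answer, which is the motivation for adding IE to DB + IR.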

4 What Does this System Look Like?
Extraction + Integration
Flexible modes of interaction
Mass collaboration
Best-effort, pay-as-you-go, improving over time
Scale up to huge data (by running over clusters)
Joe Hellerstein, Joe Six-Pack
DB + IR + IE + II, in a best-effort, Web 2.0 fashion

5 Broader Impacts
Great for many current applications
–e-science, business, personal data, Web data, etc.
Great for many current research topics
–IR, integration, PIM, data spaces
–user interfaces, HCI, mashup
–provenance, uncertainty
–cluster management
–query processing
–monitoring, handling changes, pub/sub systems
Raises novel research issues
–mass collab, best-effort, extraction, helping Joe Six-Pack
Helps define data mgt principles in broader contexts

