1
Managing Unstructured Data
AnHai Doan
University of Wisconsin-Madison
2
Unstructured Data...
Appears in many forms
– emails, Web pages, memos, call center text records, etc.
Is pervasive
– 80% of the world's data, and growing
Is managed by many players
– SIGIR/WWW/KDD/AAAI, Google/Yahoo/Microsoft/IBM
We should work on it, or risk missing the boat!
But what sets us apart from the above guys?
3
Structure + System Focus!
Make it very easy to extract structures from raw data
– in raw form: keyword search / bag-of-words analysis
– many apps want to go beyond that; they want structure
– we should encourage this, since it is back in our playground
– not just DB + IR, but DB + IR + IE
Instead of working on isolated research problems, let's build an end-to-end UDMS
– should repeat what we did with System R / Ingres: system blueprint, followed by 20 years of rapid progress
– unifies & accelerates our research efforts
– keeps work grounded, makes an impact
4
What Does this System Look Like?
Extraction + Integration
Flexible modes of interaction
Mass collaboration
Best-effort, pay-as-you-go, improving over time
Scale up to huge data (by running over clusters)
[Figure labels: Joe Hellerstein, Joe Six-Pack]
DB + IR + IE + II, in a best-effort, Web 2.0 fashion
5
Broader Impacts
Great for many current applications
– e-science, business, personal data, Web data, etc.
Great for many current research topics
– IR, integration, PIM, data spaces
– user interfaces, HCI, mashups
– provenance, uncertainty
– cluster management
– query processing
– monitoring, handling changes, pub/sub systems
Raises novel research issues
– mass collaboration, best-effort, extraction, helping Joe Six-Pack
Helps define data management principles in broader contexts