Web Data Management. Dr. Daniel Deutch.


1 Web Data Management Dr. Daniel Deutch

2 Web Data
The web has revolutionized our world
Data is everywhere
It holds great potential
But also poses a lot of challenges
  – Web data is huge, unstructured, heterogeneous, partially incorrect...
Just the ingredients of a fun topic!

3 Goals
Searching for relevant web pages
  – E.g. given keywords
Understanding the results
Ranking the results
Combining results from different sources
  – E.g. social networks + search history
  – Combining rankings
Recommendations
  – Movies, restaurants...

4 Types of Data on the Web
Text
XML
Tables
Hyperlinks
Semantic tags
…

5 Challenges
Scale
  – The web is huge...
Heterogeneous sources
  – Different models and analysis techniques need to be designed
Uncertainty
  – A lot of errors (intentional or not) in data
  – A lot of errors in understanding data
  – Probabilistic modeling will be needed

6 Ingredients (Unordered)
Web Data Types
  – Semi-structured
  – Structured
  – Unstructured
Modeling & Storage
  – XML, text and relational DB representation
  – XML typing & querying
  – Text models
Search and Retrieval
  – Crawling
  – Querying
  – Information Retrieval and Extraction (basics)

7 Text Analysis
  – POS tagging
Ranking
  – HITS algorithm
  – Google PageRank
  – Rank Aggregation and Top-K algorithms
Recommendations
  – Collaborative Filtering
  – The Netflix Million Dollar Challenge

8 Semantic Web
  – Ontologies
  – Data Integration
  – Deriving semantic information
  – Wikipedia as an example
Web Services and Business Processes
  – BPEL, WSDL standards
  – Orchestration
  – Mashups
  – Analysis

9 Advanced Topics (time permitting)
Querying the deep web
Online advertisements
  – Models
  – Algorithms
Distributed Data Management
  – MapReduce and Pig Latin

10 Resources
Website
  – Accessible from http://cs.tau.ac.il/~danielde
  – Slides, exercises, links...
Book
  – http://webdam.inria.fr/Jorge/index.php
  – Free full version available online
Papers
  – Links will be available when relevant

11 Your Duties
70% Final Exam
30% Exercises
  – Including programming tasks

