Published by Lindsey York. Modified over 9 years ago.
1
Web Data Management
Dr. Daniel Deutch
2
Web Data
The web has revolutionized our world
Data is everywhere
This holds great potential
But also a lot of challenges
 – Web data is huge, unstructured, heterogeneous, partially incorrect...
Just the ingredients of a fun topic!
3
Goals
Searching for relevant web pages
 – E.g. given keywords
Understanding the results
Ranking the results
Combining results from different sources
 – E.g. social networks + search history
 – Combining rankings
Recommendations
 – Movies, restaurants...
4
Types of Data on the Web
Text
XML
Tables
Hyperlinks
Semantic tags
…
5
Challenges
Scale
 – The web is huge...
Heterogeneous sources
 – Different models and analysis techniques need to be designed
Uncertainty
 – A lot of errors (intentional or not) in data
 – A lot of errors in understanding data
 – Probabilistic modeling will be needed
6
Ingredients (Unordered)
Web Data Types
 – Semi-structured
 – Structured
 – Unstructured
Modeling & Storage
 – XML, text and relational DB representation
 – XML typing & querying
 – Text models
Search and Retrieval
 – Crawling
 – Querying
 – Information Retrieval and Extraction (basics)
7
Text Analysis
 – POS (part-of-speech) tagging
Ranking
 – HITS algorithm
 – Google PageRank
 – Rank Aggregation and Top-K algorithms
Recommendations
 – Collaborative Filtering
 – The Netflix Million Dollar Challenge
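To make the PageRank item above concrete, here is a minimal sketch of the standard power-iteration formulation. It is not taken from the course materials: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy link graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical toy representation of the web graph).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

On this toy graph the most linked-to page ends up ranked highest, and the scores sum to 1, so they can be read as a probability distribution over pages.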
8
Semantic Web
 – Ontologies
 – Data Integration
 – Deriving semantic information
 – Wikipedia as an example
Web Services and Business Processes
 – BPEL, WSDL standards
 – Orchestration
 – Mashups
 – Analysis
9
Advanced Topics (time permitting)
Querying the deep web
Online advertisements
 – Models
 – Algorithms
Distributed Data Management
 – MapReduce and Pig Latin
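As a rough illustration of the MapReduce programming model listed above, the following single-process sketch counts words with explicit map, shuffle, and reduce steps. It only mimics the structure of a MapReduce job and does no distribution. The function names and the two sample documents are assumptions made for this example, not part of any real framework's API.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = [kv for doc_id, text in documents.items()
             for kv in map_phase(doc_id, text)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


# Hypothetical input: document id -> text.
docs = {"d1": "web data is huge", "d2": "web data is everywhere"}
counts = word_count(docs)
```

In a real deployment the map and reduce calls would run in parallel on different machines, and the framework would handle the shuffle; the sketch keeps only the data flow.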
10
Resources
Web-site
 – Accessible from http://cs.tau.ac.il/~danielde
 – Slides, exercises, links...
Book
 – http://webdam.inria.fr/Jorge/index.php
 – Free full version available online
Papers
 – Links will be available when relevant
11
Your Duties
70% Final Exam
30% Exercises
 – Including programming tasks