Web Data Management
Dr. Daniel Deutch
Web Data
The web has revolutionized our world
Data is everywhere
Constitutes a great potential
But also a lot of challenges
– Web data is huge, unstructured, heterogeneous, partially incorrect...
Just the ingredients of a fun topic!
Challenges
Bringing structure to the Web
Utilizing the structure for various tasks
Searching for relevant web-pages
– Given keywords, social profile…
Ranking the results
Combining results from different sources
– E.g. Social networks + Search history
– Combining rankings
Recommendations
All with huge and uncertain databases
Ingredients
Modeling & Storage
– XML representation
– XML Typing
– XPath, XQuery
– Efficient XML querying and manipulation
Search and Retrieval
– Crawling
– Querying
– Information Retrieval and Extraction (basics)
Ranking
– HITS algorithm
– Google PageRank
– Rank Aggregation and Top-K algorithms
Semantic Web
– Ontologies
– Data Integration
– Deriving semantic information
– Wikipedia as an example
Web Services and Business Processes
– BPEL, WSDL standards
– Orchestration
– Mashups
– Analysis
Recommendations
– Collaborative Filtering
– The Netflix Million-Dollar Challenge
Querying the deep web
Online advertisements
– Models
– Algorithms
Building a large-scale application
– Distributed data management
– MapReduce and Pig Latin
Resources
Book
–
– Free full version available online
Papers
– Links will be available when relevant
Website
– Accessible from
– All slides will be available online
Your Duties
20% Quiz
40% Project
40% Exercises
– Including programming tasks