Web Data Management
Dr. Daniel Deutch

Web Data
The web has revolutionized our world
Data is everywhere
Constitutes a great potential
But also a lot of challenges
  – Web data is huge, unstructured, heterogeneous, partially incorrect...
Just the ingredients of a fun topic!

Goals
Searching for relevant web pages
  – E.g. given keywords
Understanding the results
Ranking the results
Combining results from different sources
  – E.g. social networks + search history
  – Combining rankings
Recommendations
  – Movies, restaurants...

Types of Data on the Web
Text
XML
Tables
Hyperlinks
Semantic tags
…

Challenges
Scale
  – The web is huge...
Heterogeneous sources
  – Different models and analysis techniques need to be designed
Uncertainty
  – A lot of errors (intentional or not) in data
  – A lot of errors in understanding data
  – Probabilistic modeling will be needed

Ingredients (Unordered)
Web Data Types
  – Semi-structured
  – Structured
  – Unstructured
Modeling & Storage
  – XML, text and relational DB representation
  – XML typing & querying
  – Text models
Search and Retrieval
  – Crawling
  – Querying
  – Information Retrieval and Extraction (basics)

Text Analysis
  – POS tagging
Ranking
  – HITS algorithm
  – Google PageRank
  – Rank Aggregation and Top-K algorithms
Recommendations
  – Collaborative Filtering
  – The Netflix Million Dollar Challenge
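
To make the PageRank item above concrete, here is a minimal sketch of the usual power-iteration formulation. It is an illustration only, not the course's own material: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping a page to the list of pages it links to
    (a hypothetical toy representation of the web graph).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (assumed for illustration): "c" is linked to by a, b and d.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most in-links, "c", ends up with the highest score, while "d", which nobody links to, keeps only the random-jump share.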

Semantic Web
  – Ontologies
  – Data Integration
  – Deriving semantic information
  – Wikipedia as an example
Web Services and Business Processes
  – BPEL, WSDL standards
  – Orchestration
  – Mashups
  – Analysis

Advanced Topics (time permitting)
Querying the deep web
Online advertisements
  – Models
  – Algorithms
Distributed Data Management
  – MapReduce and Pig Latin
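
As a rough illustration of the MapReduce item above, the following single-process sketch mimics the map, shuffle and reduce phases on a word-count task. It does not use any real MapReduce framework; the helper names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical and exist only for this example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine the values of one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run the three phases sequentially over a dict of documents."""
    mapped = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Sample input (assumed for illustration).
docs = {"d1": "the web is big", "d2": "the web is dynamic"}
counts = word_count(docs)
```

In a distributed setting the map and reduce calls would run in parallel on different machines, with the shuffle step moving data between them; here everything runs in one process to keep the structure visible.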

Resources
Web-site
  – Accessible from
  – Slides, exercises, links...
Book
  –
  – Free full version available online
Papers
  – Links will be available when relevant

Your Duties
70% Final Exam
30% Exercises
  – Including programming tasks