Web Data Management
Dr. Daniel Deutch

Web Data
The web has revolutionized our world
Data is everywhere
Holds great potential
But also a lot of challenges
  – Web data is huge, unstructured, heterogeneous, partially incorrect...
Just the ingredients of a fun topic!

Challenges
Bringing structure to the Web
Utilizing the structure for various tasks
Searching for relevant web pages
  – Given keywords, social profile…
Ranking the results
Combining results from different sources
  – E.g. social networks + search history
  – Combining rankings
Recommendations
All with huge and uncertain databases

Ingredients
Modeling & Storage
  – XML representation
  – XML typing
  – XPath, XQuery
  – Efficient XML querying and manipulation
Search and Retrieval
  – Crawling
  – Querying
  – Information Retrieval and Extraction (basics)
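To give a flavor of the XPath item above, here is a minimal sketch in Python. It assumes the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The `CATALOG` document and the `titles_by_language` helper are hypothetical and are not taken from the course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document, invented for this illustration.
CATALOG = """
<catalog>
  <page lang="en"><title>Web Data</title></page>
  <page lang="fr"><title>Le Web</title></page>
  <page lang="en"><title>Ranking</title></page>
</catalog>
"""


def titles_by_language(xml_text, lang):
    """Return the titles of all <page> elements whose lang attribute matches."""
    root = ET.fromstring(xml_text)
    # Path expression: child <page> elements filtered by an attribute
    # predicate, then their <title> children.
    path = f"./page[@lang='{lang}']/title"
    return [title.text for title in root.findall(path)]


if __name__ == "__main__":
    print(titles_by_language(CATALOG, "en"))
```

A full XPath or XQuery engine offers far more (axes, functions, joins); the sketch only shows the idea of selecting nodes with a path expression and a predicate.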

Ranking
  – HITS algorithm
  – Google PageRank
  – Rank Aggregation and Top-K algorithms
Semantic Web
  – Ontologies
  – Data Integration
  – Deriving semantic information
  – Wikipedia as an example
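As an illustration of the PageRank item above, the following is a minimal sketch of the usual power-iteration formulation on a tiny link graph. The damping factor 0.85, the fixed iteration count, the even redistribution of rank from pages without out-links, and the three-page example graph are all assumptions made for this sketch, not details from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # Split this page's damped rank among its out-links.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: a links to b and c, b links to c, c links to a.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

In this toy graph, page c collects links from both a and b and ends up with the highest score, while b, linked only from a, ends up lowest.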

Web Services and Business Processes
  – BPEL, WSDL standards
  – Orchestration
  – Mashups
  – Analysis
Recommendations
  – Collaborative Filtering
  – The Netflix million-dollar challenge
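The collaborative filtering item above can be sketched as a small user-based predictor. This is one simple variant, assumed here for illustration: cosine similarity between users' rating vectors and a similarity-weighted average of neighbors' ratings. The user names, movie ids, and ratings are invented.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Predict user's rating for item as a similarity-weighted average.

    Returns None when no other user has rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 scale.
    ratings = {
        "ann": {"m1": 5, "m2": 1},
        "bob": {"m1": 5, "m2": 1, "m3": 4},
        "cat": {"m1": 1, "m2": 5, "m3": 2},
    }
    print(predict(ratings, "ann", "m3"))
```

Because ann's tastes match bob's more closely than cat's, the prediction for m3 lands nearer to bob's rating of 4 than to cat's rating of 2.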

Querying the deep web
Online advertisements
  – Models
  – Algorithms
Building a large-scale application
  – Distributed data management
  – MapReduce and Pig Latin
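To make the MapReduce item above concrete, here is a single-process sketch of the map, shuffle, and reduce phases using the classic word-count example. It only imitates the programming model in memory; a real deployment distributes these phases across machines. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["the web is big", "the web is dynamic"]))
```

The point of the decomposition is that `map_phase` and `reduce_phase` have no shared state, which is what lets a framework run them in parallel on separate partitions of the data.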

Resources
Book
  – Free full version available online
Papers
  – Links will be available when relevant
Web-site
  – Accessible from
  – All slides will be available online

Your Duties
20% Quiz
40% Project
40% Exercises
  – Including programming tasks