Exercise 1: Bayes Theorem (a). Exercise 1: Bayes Theorem (b) $P(b_1 \mid c_{plain}) = \frac{P(c_{plain} \mid b_1) \cdot P(b_1)}{P(c_{plain})}$

Presentation transcript:

Exercise 1: Bayes Theorem (a)

Exercise 1: Bayes Theorem (b) $P(b_1 \mid c_{plain}) = \frac{P(c_{plain} \mid b_1) \cdot P(b_1)}{P(c_{plain})}$

Exercise 1: Bayes Theorem $P(b_1 \mid c_{plain}) = \frac{P(c_{plain} \mid b_1) \cdot P(b_1)}{P(c_{plain})}$
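A minimal sketch of how the formula above could be evaluated numerically. The exercise's actual probabilities are not part of this transcript, so the function name `bayes_posterior` and the numbers in the usage lines are hypothetical and serve only as an illustration.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(b1 | c_plain) = P(c_plain | b1) * P(b1) / P(c_plain).

    likelihood: P(c_plain | b1)
    prior:      P(b1)
    evidence:   P(c_plain), must be greater than zero
    """
    if evidence <= 0:
        raise ValueError("P(c_plain) must be positive")
    return likelihood * prior / evidence


# Hypothetical values (NOT taken from the exercise):
# P(c_plain | b1) = 0.5, P(b1) = 0.4, P(c_plain) = 0.25
posterior = bayes_posterior(0.5, 0.4, 0.25)
print(f"P(b1 | c_plain) = {posterior:.2f}")
```

With these assumed inputs the numerator is 0.5 * 0.4 = 0.2, and dividing by 0.25 gives a posterior of 0.8.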

Exercise 2: Bayes Theorem
d1: (Germany, win, soccer, worldcup, final, Brazil)
d2: (Germany, win, soccer, worldcup, final, Brazil, champion, defeat)
d3: (Germany, win, soccer, worldcup, final, Brazil, champion, defeat, Plasma, TV, sale, increase)
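The task statement for this exercise is not included in the transcript. As one possible illustration only, the sketch below applies Bayes' theorem to the three documents under two assumptions that are not from the slides: a uniform prior P(d) over the documents, and P(t | d) estimated as the relative frequency of term t in d. The function name `posterior_over_docs` is hypothetical.

```python
from fractions import Fraction

# The three documents from the exercise, as term lists.
docs = {
    "d1": ["Germany", "win", "soccer", "worldcup", "final", "Brazil"],
    "d2": ["Germany", "win", "soccer", "worldcup", "final", "Brazil",
           "champion", "defeat"],
    "d3": ["Germany", "win", "soccer", "worldcup", "final", "Brazil",
           "champion", "defeat", "Plasma", "TV", "sale", "increase"],
}


def posterior_over_docs(term, documents):
    """Return P(d | term) for every document via Bayes' theorem.

    Assumptions (not from the slides): uniform prior P(d) and
    P(term | d) = relative frequency of the term in d.
    """
    prior = Fraction(1, len(documents))
    # Numerator of Bayes' theorem: P(term | d) * P(d)
    joint = {
        name: Fraction(terms.count(term), len(terms)) * prior
        for name, terms in documents.items()
    }
    # Denominator: P(term), summed over all documents
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError(f"term {term!r} occurs in no document")
    return {name: p / evidence for name, p in joint.items()}


for name, p in posterior_over_docs("champion", docs).items():
    print(name, p)
```

Under these assumptions, "champion" yields 0 for d1, 3/5 for d2 and 2/5 for d3: the shorter document d2 gets more weight because the term makes up a larger share of its text.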

Exercise 2: Bayes Theorem

Exercise 3: Query expansion

Web Search – Summer Term 2006 III. Web Search - Introduction (c) Wolfgang Hürst, Albert-Ludwigs-University

Recap: IR System & Tasks Involved
[Diagram labels: INFORMATION NEED, DOCUMENTS, User Interface, PERFORMANCE EVALUATION, QUERY, QUERY PROCESSING (PARSING & TERM PROCESSING), LOGICAL VIEW OF THE INFORM. NEED, SELECT DATA FOR INDEXING, PARSING & TERM PROCESSING, INDEX, SEARCHING, RANKING, RESULTS, DOCS., RESULT REPRESENTATION]

Information Retrieval (IR)
Main problem: Unstructured, imprecisely and imperfectly defined data
But also: The whole search process can be characterized as uncertain and vague
Hence: Information is often returned in the form of a sorted list (docs ranked by relevance).
[Diagram labels: INFORMATION NEED, QUERY, DATA / DOCUMENTS, INFORMATION]
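To illustrate what "a sorted list (docs ranked by relevance)" can mean in practice, here is a deliberately simple sketch. The scoring rule (number of distinct query terms that occur in a document) is an assumption chosen for brevity, not the ranking method taught in the lecture, and the sample documents and the name `rank_documents` are hypothetical.

```python
def rank_documents(query, documents):
    """Return (doc_id, score) pairs sorted by descending score.

    Score = number of distinct query terms found in the document
    (a toy relevance estimate, assumed here for illustration only).
    """
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        doc_terms = set(text.lower().split())
        scored.append((doc_id, len(query_terms & doc_terms)))
    # sorted() is stable, so documents with equal scores keep input order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini collection
sample_docs = {
    "a": "web search ranking",
    "b": "soccer final",
    "c": "web crawling and web indexing",
}

for doc_id, score in rank_documents("web ranking", sample_docs):
    print(doc_id, score)
```

Document "a" matches both query terms, "c" matches one, and "b" matches none, so the result list is ordered a, c, b. Because the underlying relevance judgement is uncertain, the system returns all candidates in ranked order rather than a single exact answer.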

Classic IR vs. Web Search: Documents
Huge amount of data, continuous growth, high rate of change
Huge variability and heterogeneity
- Quality, credibility and reputation of the source
- Static vs. dynamic docs
- Different media types (text, pics, audio, video)
- Different formats (HTML, Flash, PDF,...)
- Miscellaneous topics
- Continuous text vs. note form / keywords
- Different languages, encoding
Spam and advertisements
Web-specific characteristics
- Hypertext, linking
- Broken links
- Unstructured, not always conforming to standards
Redundancy (syntactic and semantic)
Distributed (need to collect them automatically)
Different popularity and access frequency

Classic IR vs. Web Search: Users
Different needs and aims, e.g. users might want
- to learn something ("informational")
- to go to a particular site ("navigational")
- to do something, e.g. shopping, download,... ("transactional")
- to do other, miscellaneous things, e.g. finding hubs, "exploratory search",...
Different premises, qualifications, languages,...
Different network connections / bandwidths
Imprecise, unspecific queries: short, ambiguous, inexact, incorrect, no usage of operators or special syntax

IR vs. Web Search
Note: Most of this is true for IR as well, but...
The no. of users is huge. Very huge.
The web is huge. Very huge.
Big variety in data
Big variety in users
Doc. authors don't cooperate (spam,...)
Users don't cooperate (short queries,...)
[Diagram labels: INFORMATION NEED, QUERY, DATA / DOCUMENTS, INFORMATION]

How does web search work?
Problem: High commercial interest -> Commercial search engines don't disclose exactly how they work
Nevertheless, much information is available:
- High scientific interest -> Publications
- Basic research is done (and published) by some companies (e.g. Google: labs.google.com/papers/)
- Hard-fought market -> well observed and documented (e.g.
- Many fan pages, "anti" fan pages, critical observers, web blogs, etc.

Schedule
Web Search:
- Introduction
- Crawling
- Page Repository
- Indexing
- Ranking (PageRank, HITS)
- Exercises for web search basics
- Advanced / additional web search topics
In parallel:
- Programming project (Lucene)