The Last Lecture Agenda
–1:40-2:00pm Integrating XML and Search Engines: the Niagara way
–2:00-2:10pm My concluding remarks (if any)
–2:10-2:45pm Interactive summarization of the semester
–Teaching evaluations (I leave)

This part is based on the Niagara slides.

Niagara

Generating a SEQL Query from XML-QL
–A different kind of containment

“Review”

Main Topics
Approximately three equal parts:
–Information retrieval
–Information integration/aggregation
–Information mining
–Other topics as permitted by time
Useful course background:
–CSE 310 Data Structures (also a 4xx course on Algorithms)
–CSE 412 Databases
–CSE 471 Intro to AI
(What I said on 1/17)

What we did by 4/30

Information Retrieval
Traditional model
–Given:
  A set of documents
  A query expressed as a set of keywords
–Return:
  A ranked set of documents most relevant to the query
–Evaluation:
  Precision: fraction of returned documents that are relevant
  Recall: fraction of relevant documents that are returned
  Efficiency
Web-induced headaches
–Scale (billions of documents)
–Hypertext (inter-document connections)
Consequently
–Ranking that takes link structure into account (Authority/Hub)
–Indexing and retrieval algorithms that are ultra fast
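To make the two evaluation measures concrete, here is a minimal sketch (mine, not from the slides) that computes precision and recall for a single query; the document IDs and relevance judgments are invented for the example.

```python
def precision_recall(returned, relevant):
    """Precision/recall for one query.

    returned: document IDs the system returned (assumed duplicate-free)
    relevant: document IDs judged relevant to the query
    """
    hits = set(returned) & set(relevant)  # relevant docs that were returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: 4 documents returned, 5 judged relevant.
p, r = precision_recall(["d1", "d3", "d7", "d9"],
                        {"d1", "d2", "d3", "d5", "d8"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```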

Database-Style Retrieval
Traditional model (relational)
–Given:
  A single relational database (schema + instances)
  A relational (SQL) query
–Return:
  All tuples satisfying the query
–Evaluation:
  Soundness/completeness
  Efficiency
Web-induced headaches
–Many databases
  All are partially complete
  Overlapping
  Heterogeneous schemas
  Access limitations
–Network (un)reliability
Consequently
–Newer models of DB
–Newer notions of completeness
–Newer approaches for query planning
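The traditional model in miniature: one database, one SQL query, and an answer that is exactly the set of satisfying tuples. The schema and data below are invented for illustration.

```python
import sqlite3

# One database: a schema plus instances (all made up for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO papers VALUES (?, ?)",
                 [("Niagara", 2000), ("XML-QL", 1998), ("WebSQL", 1996)])

# One SQL query. The answer is sound (every returned tuple satisfies the
# predicate) and complete (every satisfying tuple is returned).
for row in conn.execute("SELECT title, year FROM papers WHERE year >= 1998"):
    print(row)
```

The web-induced headaches come precisely from losing these guarantees: with many partially complete, overlapping sources there is no longer a single database over which completeness can even be stated.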

What about "mining"?
Didn't do too much "data" mining
–But did do some "web" mining
Mining the link structure
–A/H computation, etc.
Clustering the search engine results
–K-means; agglomerative clustering
Classification as part of focused crawling
–The "distiller" approach
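A sketch of the A/H (authority/hub) computation on a toy link graph, written as plain power iteration; the four-page graph and the code are my illustration of HITS, not material from the course.

```python
import numpy as np

# Toy adjacency matrix: A[i, j] = 1 if page i links to page j (invented graph).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

a = np.ones(4)  # authority scores
h = np.ones(4)  # hub scores
for _ in range(50):
    a = A.T @ h  # good authorities are pointed to by good hubs
    h = A @ a    # good hubs point to good authorities
    a /= np.linalg.norm(a)
    h /= np.linalg.norm(h)

print("authorities:", a.round(3))
print("hubs:       ", h.round(3))
```

The converged authority vector is the dominant eigenvector of AᵀA, which is the eigenvalue connection alluded to in the interactive review below.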

Interactive Review… 2:00-2:45: An interactive summarization of the class. Rather than me showing you the list of topics we covered, I thought up a more interesting approach: summarizing the class in *your* collective words. Here is how it will go: *Everyone* in the class will be called on to list one topic/technique/issue that they felt they learned from the course. Generic answers like "I learned about search engines" are discouraged in favor of specific answers (such as "I thought the connection between the dominant eigenvalues and the way A/H computation works was quite swell"). It is okay to list topics/issues that you got interested in even if they were just a bit beyond what we actually covered. Note that there is an expectation that, when your turn comes, you will mention something that has not been mentioned by the folks who spoke ahead of you. Since I get to decide the order in which to call on you, it is best if you jot down up to 5 things you thought you learned, so the chance that you will say something different is higher.

Further Headaches Brought on by Semi-structured Retrieval
If everyone puts their pages in XML:
–Introducing similarity-based retrieval into traditional databases
–Standardizing on shared ontologies
...

Learning Patterns (Web/DB Mining)
Traditional classification learning (supervised)
–Given: a set of structured instances of a pattern (concept)
–Induce: the description of the pattern
–Evaluation:
  Accuracy of classification on the test data
  (Efficiency of learning)
Mining headaches
–Training data is not obvious
–Training data is massive
–Training instances are noisy and incomplete
Consequently
–Primary emphasis on fast classification, even at the expense of accuracy
–80% of the work is "data cleaning"
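A minimal sketch of the supervised setup above, using a shallow decision tree (fast classification, possibly at some cost in accuracy) on synthetic data with deliberately noisy labels; the data, the classifier choice, and the noise rate are all invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Structured instances of a made-up concept (x0 + x1 > 1), with 10% label
# noise to mimic noisy, incomplete training data.
X = rng.random((1000, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
flip = rng.random(1000) < 0.10
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree trades some accuracy for very fast classification.
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Evaluation: accuracy of classification on held-out test data.
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```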