Web Programming Week 13 Old Dominion University Department of Computer Science CS 418/518 Fall 2010 Martin Klein 11/23/10.

Presentation transcript:

Relational Data Model is a Special Case…
SELECT pi.fname, m.aces, m.unforced_errors, m.winners
FROM player_info pi, matches m
WHERE pi.fname = 'Andre' AND m.opponent_name = 'Sampras' AND m.year = '2002';

Unstructured Data is More Common…

Precision and Recall
source:
– Precision: how much extra stuff did you get?
– Recall: how much did you miss?

Precision and Recall
source:
– 10 documents in the index are relevant
– search returns 20 documents, 5 of which are relevant
– precision = 5/20: 1 out of 4 retrieved documents is relevant
– recall = 5/10: half of the relevant documents were retrieved
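The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the document ids below are made up so that the counts match the slide (10 relevant, 20 retrieved, 5 in common).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: set of document ids the search returned
    relevant:  set of document ids that are actually relevant
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids chosen to reproduce the slide's numbers.
relevant = set(range(1, 11))                          # 10 relevant documents
retrieved = set(range(6, 11)) | set(range(101, 116))  # 5 relevant + 15 others = 20

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Precision is computed over what was returned, recall over what should have been returned, so the two denominators differ (20 versus 10).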

Precision and Recall
[Figure: precision and recall; figure 1.2 in FBY]

Why Isn’t Recall Always 100%? Virginia Agricultural and Mechanical College? Virginia Agricultural and Mechanical College and Polytechnic Institute? Virginia Polytechnic Institute? Virginia Polytechnic Institute and State University? Virginia Tech?

Precision and Recall - Literature

CREATE Table
mysql> CREATE TABLE ODUtennis (
    -> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    -> title VARCHAR(200),
    -> body TEXT
    -> );
Query OK, 0 rows affected (0.00 sec)

INSERT
mysql> INSERT INTO ODUtennis (title, body) VALUES
    -> ('Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals', 'The fourth-seeded Monarchs...'),
    -> ('Monarchs Close Out Season With 4-3 Win Over South Alabama', 'ODU closes out the 2010 tennis schedule...'),
    -> ('ODU Edged By DePaul in Mens Tennis Action, 4-3', 'Junior Tobias Fanselow was the other Monarch…'),
    -> ('ODU Mens Tennis Drops Delaware, 5-2, in CAA Action', 'The Old Dominion mens tennis team…doubles…Monarchs...'),
    -> ('Mens Tennis Nipped by #64 UNC Wilmington, 4-3', 'The two teams split the first two doubles matches…Monarchs…doubles...');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0

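LIKE & REGEXP
We can search rows with the “LIKE” (or “REGEXP”) operator
– for tables of any appreciable size, this will be s-l-o-w (a pattern with a leading wildcard, or any REGEXP, cannot use an ordinary index, so every row is scanned)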

Example 1
mysql> SELECT id, title FROM ODUtennis WHERE title LIKE 'Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals';
mysql> SELECT id, title FROM ODUtennis WHERE title REGEXP 'Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals';
| id | title |
| 1 | Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals |
1 row in set (0.00 sec)

Example 2
mysql> SELECT id, title FROM ODUtennis WHERE title REGEXP 'Monarchs';
| id | title |
| 1 | Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals |
| 2 | Monarchs Close Out Season With 4-3 Win Over South Alabama |
2 rows in set (0.01 sec)
BUT
mysql> SELECT id, title FROM ODUtennis WHERE title LIKE 'Monarchs';
Empty set (0.00 sec)
– LIKE without % wildcards must match the whole column value; REGEXP matches anywhere in it
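The difference between the two operators can be reproduced outside MySQL. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption of this example and not part of the lecture: SQLite's LIKE follows the same whole-value rule, but it has no built-in REGEXP, so one is registered from Python's re module (case-sensitive here, which does not matter for this pattern).

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    return int(value is not None and re.search(pattern, value) is not None)


conn.create_function("REGEXP", 2, regexp)

titles = [
    "Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals",
    "Monarchs Close Out Season With 4-3 Win Over South Alabama",
    "ODU Edged By DePaul in Mens Tennis Action, 4-3",
    "ODU Mens Tennis Drops Delaware, 5-2, in CAA Action",
    "Mens Tennis Nipped by #64 UNC Wilmington, 4-3",
]
conn.execute("CREATE TABLE ODUtennis (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO ODUtennis (title) VALUES (?)", [(t,) for t in titles])


def ids(operator, pattern):
    """Return the ids of rows whose title matches pattern under operator."""
    sql = "SELECT id FROM ODUtennis WHERE title " + operator + " ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (pattern,))]


like_exact = ids("LIKE", "Monarchs")     # no wildcard: must equal the whole title
like_prefix = ids("LIKE", "Monarchs%")   # % wildcard: titles starting with Monarchs
regexp_any = ids("REGEXP", "Monarchs")   # regex: match anywhere in the title

print(like_exact, like_prefix, regexp_any)
```

Adding the % wildcard is what makes LIKE behave like the REGEXP query in Example 2.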

Full-Text Search – The Better Way
MATCH()…AGAINST()
Performs a natural language search over an index
– Index = set of one or more columns of the same table
– Index as argument to MATCH()
– Search string as argument to AGAINST()
If used in the WHERE clause, results are returned in order of relevance score
– Relevance: similarity between search string and index row

CREATE Table
mysql> CREATE TABLE ODUtennis (
    -> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    -> title VARCHAR(200),
    -> body TEXT,
    -> FULLTEXT (title,body)
    -> );
Query OK, 0 rows affected (0.00 sec)
– can only create FULLTEXT on CHAR, VARCHAR or TEXT columns
– “title” and “body” still available as regular columns
– if you want to search only on “title”, you need to create a separate FULLTEXT index on “title” alone

INSERT
mysql> INSERT INTO ODUtennis (title, body) VALUES
    -> ('Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals', 'The fourth-seeded Monarchs...'),
    -> ('Monarchs Close Out Season With 4-3 Win Over South Alabama', 'ODU closes out the 2010 tennis schedule...'),
    -> ('ODU Edged By DePaul in Mens Tennis Action, 4-3', 'Junior Tobias Fanselow was the other Monarch…'),
    -> ('ODU Mens Tennis Drops Delaware, 5-2, in CAA Action', 'The Old Dominion mens tennis team…doubles…Monarchs...'),
    -> ('Mens Tennis Nipped by #64 UNC Wilmington, 4-3', 'The two teams split the first two doubles matches…Monarchs…doubles...');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0

MATCH.. AGAINST
mysql> SELECT * FROM ODUtennis WHERE MATCH(title,body) AGAINST('fanselow');
| id | title | body |
| 3 | ODU Edged By DePaul in Mens Tennis Action, 4-3 | Junior Tobias Fanselow was the other … |
1 row in set (0.00 sec)
mysql> SELECT * FROM ODUtennis WHERE MATCH(title,body) AGAINST('Monarchs');
Empty set (0.00 sec)
why?!

Ranking
If a word appears in more than 50% of the rows, it is treated as a “stop word” and is not matched (unless you are in Boolean mode)
– this makes sense for large collections (the word is not a good discriminator of records), but can lead to unexpected results for small collections
– here, “Monarchs” appears in 4 of the 5 rows (80%), which is why the natural language search for it returned an empty set
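A rough way to see the 50% rule at work on the five sample rows is to count, per word, the fraction of rows that contain it. This is a sketch only: the tokenizer is a simple lowercase split invented for this example, not MySQL's parser, and the threshold test follows the slide's wording ("> 50%").

```python
import re

# (title, body) pairs as shown on the INSERT slide; bodies are the truncated snippets.
rows = [
    ("Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals",
     "The fourth-seeded Monarchs..."),
    ("Monarchs Close Out Season With 4-3 Win Over South Alabama",
     "ODU closes out the 2010 tennis schedule..."),
    ("ODU Edged By DePaul in Mens Tennis Action, 4-3",
     "Junior Tobias Fanselow was the other Monarch..."),
    ("ODU Mens Tennis Drops Delaware, 5-2, in CAA Action",
     "The Old Dominion mens tennis team...doubles...Monarchs..."),
    ("Mens Tennis Nipped by #64 UNC Wilmington, 4-3",
     "The two teams split the first two doubles matches...Monarchs...doubles..."),
]


def tokens(text):
    """Lowercase the text and return the set of alphanumeric words in it."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def row_fraction(word, rows):
    """Fraction of rows whose title or body contains the word."""
    word = word.lower()
    hits = sum(1 for title, body in rows if word in tokens(title + " " + body))
    return hits / len(rows)


def too_common(word, rows, threshold=0.5):
    """True if the word is in more than `threshold` of the rows (the slide's rule)."""
    return row_fraction(word, rows) > threshold


for word in ("Monarchs", "fanselow", "doubles"):
    print(word, row_fraction(word, rows), too_common(word, rows))
```

Row 3 contains only the singular "Monarch", so it does not count toward "Monarchs"; the word is still in 4 of 5 rows and is dropped, while "fanselow" and "doubles" stay searchable.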

Stopwords
Stopwords exist in stoplists or negative dictionaries
Idea: remove words with low semantic content
– index should only have “important stuff”
What not to index is domain dependent, but often includes:
– “small” words: a, and, the, but, of, an, very, etc.
– NASA ADS example:
– MySQL full-text index:

Stopwords
Punctuation, numbers often stripped or treated as stopwords
– precision suffers on searches for: NASA TM-3389, F-15, X.500, .NET, Tree::Suffix
MySQL also treats words < 4 characters as stopwords
– too bad for: “Liu”, “ORF”, “DEA”, etc.
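The effect of these rules on what ends up in an index can be sketched in a few lines. Everything here is illustrative: the stoplist is just the "small" words named on the previous slide, the 4-character minimum is the one stated above, and the letter-only tokenizer is an assumption of this example, not MySQL's actual parser.

```python
import re

# Illustrative stoplist: the "small" words listed on the slide, not MySQL's list.
STOPLIST = {"a", "and", "the", "but", "of", "an", "very"}
MIN_WORD_LEN = 4  # words shorter than this are dropped, per the slide


def index_terms(text, stoplist=STOPLIST, min_len=MIN_WORD_LEN):
    """Return the words of text that would survive stopword and length filtering."""
    # Punctuation and digits act as separators, so they never reach the index.
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stoplist]


for query in ("NASA TM-3389", "F-15", "X.500", ".NET", "Liu ORF DEA",
              "The Monarchs won the doubles"):
    print(repr(query), "->", index_terms(query))
```

Queries such as "F-15" or "Liu" leave nothing to search on, and "NASA TM-3389" degrades to a search for "nasa" alone.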

Getting the Rank
mysql> SELECT id, MATCH(title,body) AGAINST('doubles') FROM ODUtennis;
| id | MATCH(title,body) AGAINST('doubles') |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | |
| 5 | |
5 rows in set (0.00 sec)

Getting the Rank in Order
mysql> SELECT id, title, MATCH(title,body) AGAINST('doubles') AS score
    -> FROM ODUtennis WHERE MATCH(title,body) AGAINST('doubles');
| id | title | score |
| 5 | Mens Tennis Nipped by #64 UNC Wilmington, 4-3 | |
| 4 | ODU Mens Tennis Drops Delaware, 5-2, in CAA Action | |
2 rows in set (0.00 sec)

Boolean Mode
Does not use the 50% threshold
Does use stopwords, length limitation
Operator list:
mysql> SELECT id, title FROM ODUtennis WHERE MATCH(title,body) AGAINST('+Monarchs' IN BOOLEAN MODE);
| id | title |
| 1 | Monarchs Eliminated from 2010 CAA Tournament in Quarterfinals |
| 2 | Monarchs Close Out Season With 4-3 Win Over South Alabama |
| 4 | ODU Mens Tennis Drops Delaware, 5-2, in CAA Action |
| 5 | Mens Tennis Nipped by #64 UNC Wilmington, 4-3 |
4 rows in set (0.00 sec)

Blind Query Expansion (AKA Automatic Relevance Feedback)
General assumption: user query is insufficient
– Too short
– Too generic
– Too many results
How does one keep up with Virginia Tech’s multiple names / nicknames?
– Hokies, Fighting Gobblers, VPI, VPI&SU, Va Tech, VT
Idea:
1. run the query with the requested terms
2. then take the results and
3. re-run the query with the most relevant terms from the initial results
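The three steps of the idea can be sketched in Python. This is a toy model under stated assumptions: scoring is a plain count of query-term occurrences, "most relevant terms" means the most frequent non-query words in the top results, and the three documents are invented for illustration. It does not claim to be how MySQL implements WITH QUERY EXPANSION.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and return its words in order."""
    return re.findall(r"[a-z]+", text.lower())


def search(query_terms, docs):
    """Return ids of matching docs, best first (score = query-term occurrences)."""
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokens(text))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def expand_query(query_terms, docs, top_docs=1, extra_terms=1, stoplist=frozenset()):
    """Steps 1 and 2: run the query, then pick frequent terms from the top results."""
    first_pass = search(query_terms, docs)[:top_docs]
    counts = Counter()
    for doc_id in first_pass:
        counts.update(t for t in tokens(docs[doc_id])
                      if t not in stoplist and t not in query_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:extra_terms]]


def search_with_expansion(query_terms, docs, **kwargs):
    """Step 3: re-run the search with the expanded term list."""
    return search(expand_query(query_terms, docs, **kwargs), docs)


# Hypothetical documents, not taken from the lecture.
docs = {
    1: "Virginia Tech Hokies win; the Hokies defense dominated",
    2: "Hokies fans celebrate in Blacksburg",
    3: "ODU Monarchs open the tennis season",
}

plain = search(["virginia"], docs)
expanded_terms = expand_query(["virginia"], docs, stoplist={"the"})
expanded = search_with_expansion(["virginia"], docs, stoplist={"the"})
print(plain, expanded_terms, expanded)
```

The expanded query picks up "hokies" from the first result and so also finds document 2, which never mentions "Virginia". The same mechanism can hurt precision when the top results of the first pass are off topic.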

Blind Query Expansion (AKA Automatic Relevance Feedback)
mysql> SELECT * FROM ODUtennis WHERE MATCH(title,body) AGAINST('fanselow' WITH QUERY EXPANSION);
| id | title | body |
| 3 | ODU Edged By DePaul in Mens Tennis Action, 4-3 | Junior Tobias Fanselow was the other Monarch to win, earning the win at No. 1 singles over Alasdair Graetz, 6-2, 7-6 |
| 4 | ODU Mens Tennis Drops Delaware, 5-2, in CAA Action | The Old Dominion mens tennis team improved to 14-8 overall and 2-3 in the CAA with a 5-2 victory over the Delaware Fighting Blue Hens on Wednesday. After dropping all three doubles matches, the Monarchs went on to win five of the six singles matches for the victory. |
2 rows in set (0.00 sec)

For More Information…
MySQL documentation:
Chapter 12/13 “Building a Content Management System”
CS 751/851 “Introduction to Digital Libraries”
– esp. “Information Retrieval Concepts” lecture
Is MySQL the right tool for your job?
MySQL examples in this lecture based on those found at dev.mysql.com
content snippets taken from