Web Programming Week 14 Old Dominion University Department of Computer Science CS 418/518 Fall 2006 Michael L. Nelson 11/27/06.



Relational Data Model is a Special Case…

SELECT name, catches, yards, touchdowns
FROM VT_Boxscores, VT_Roster
WHERE game_id = '12' AND number = '4' AND year = '2006';

Unstructured Data is More Common…

Precision and Recall

Precision
– “ratio of the number of relevant documents retrieved over the total number of documents retrieved” (p. 10)
– how much extra stuff did you get?

Recall
– “ratio of relevant documents retrieved for a given query over the number of relevant documents for that query in the database” (p. 10)
– note: assumes a priori knowledge of the denominator!
– how much did you miss?
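The two ratios quoted above can be worked through on a toy result set. The following is a minimal sketch in Python; the function name and the document ids are made up for illustration and do not come from the lecture.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: ids of the documents the system returned
    relevant:  ids of all documents in the collection that are relevant
               (the a priori knowledge the recall definition assumes)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 relevant documents exist, 2 overlap.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4],
                                     relevant=[2, 4, 6, 8, 10])
print(precision, recall)  # 0.5 0.4
```

Here half of what came back was "extra stuff" (precision 0.5), and three of the five relevant documents were missed (recall 0.4).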

Precision and Recall

[Figure: precision and recall, figure 1.2 in FBY]

LIKE & REGEXP

We can search rows with the “LIKE” (or “REGEXP”) operator
– for tables of any size, this will be s-l-o-w
– there is a better way…

mysql> SELECT id, name FROM VT_Roster WHERE name LIKE 'Se%'
    -> AND year='2006';
+----+---------------+
| id | name          |
+----+---------------+
|  7 | Sean Glennon  |
| 70 | Sergio Render |
+----+---------------+
2 rows in set (0.00 sec)

CREATE Table

mysql> CREATE TABLE recaps (
    -> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    -> title VARCHAR(200),
    -> body TEXT,
    -> FULLTEXT (title,body)
    -> );
Query OK, 0 rows affected (0.00 sec)

– can only create FULLTEXT on CHAR, VARCHAR or TEXT columns
– “title” and “body” are still available as regular columns
– if you want to search only on “title”, you need to create a separate index

INSERT

mysql> INSERT INTO recaps (title,body) VALUES
    -> ('Hokies Blank UVa', '#17 Hokies ended the season...'),
    -> ('Hokies Put Wake in Their Place', 'Sean Glennon threw for...'),
    -> ('Hokies Blank Kent State', 'Virginia Tech overcame a sloppy...');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

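MATCH.. AGAINST

mysql> SELECT * FROM recaps
    -> WHERE MATCH (title,body) AGAINST ('sloppy');
+----+-------------------------+------------------------------------+
| id | title                   | body                               |
+----+-------------------------+------------------------------------+
|  3 | Hokies Blank Kent State | Virginia Tech overcame a sloppy... |
+----+-------------------------+------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM recaps
    -> WHERE MATCH (title,body) AGAINST ('Hokies');
+----+-------------------------+------------------------------------+
| id | title                   | body                               |
+----+-------------------------+------------------------------------+
0 rows in set (0.00 sec)

why?!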

Ranking

If you are not in Boolean mode and the word appears in > 50% of the rows, then the word is considered a “stopword” and is not matched
– this makes sense for large collections (the word is not a good discriminator of records), but can lead to unexpected results for small collections
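The effect of the 50% rule on a three-row table can be simulated outside the database. The sketch below is a Python illustration of the rule as stated on the slide, not MySQL's implementation; the function name, the regular-expression tokenizer, and the shortened row texts are assumptions made for the example.

```python
import re


def natural_language_search(rows, term, threshold=0.5):
    """Return ids of rows containing term, or nothing if the term is too common.

    Follows the slide's wording: a word found in more than `threshold` of the
    rows is treated as a stopword and matches no rows at all.
    """
    term = term.lower()
    matches = [row_id for row_id, text in rows.items()
               if term in re.findall(r"[a-z0-9]+", text.lower())]
    if len(matches) > threshold * len(rows):
        return []  # too common to discriminate between rows
    return matches


# Title and body of each recap concatenated (illustrative text only).
recaps = {
    1: "Hokies Blank UVa #17 Hokies ended the season",
    2: "Hokies Put Wake in Their Place Sean Glennon threw for",
    3: "Hokies Blank Kent State Virginia Tech overcame a sloppy",
}

print(natural_language_search(recaps, "sloppy"))  # [3]
print(natural_language_search(recaps, "Hokies"))  # []
print(natural_language_search(recaps, "Blank"))   # []
```

With only three rows, a word needs to appear in just two of them ("Blank") to be suppressed, which is the small-collection surprise the slide warns about.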

Stopwords

Stopwords exist in stoplists or negative dictionaries

Idea: remove words with low semantic content
– index should only have “important stuff”

What not to index is domain dependent, but often includes:
– “small” words: a, and, the, but, of, an, very, etc.
– NASA ADS example:
– MySQL full-text index:

Stopwords

Punctuation, numbers often stripped or treated as stopwords
– precision suffers on searches for:
  NASA TM-3389
  F-15
  X.500
  .NET
  Tree::Suffix

MySQL also treats words < 4 characters as stopwords
– too bad for: “Liu”, “CFD”, “Ada”, etc.
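A rough picture of what survives stopword removal, punctuation stripping, and the minimum word length can be had with a few lines of Python. This is a sketch only: the stoplist is just the "small words" named in these slides (not MySQL's built-in list), the letters-only tokenizer is an assumption rather than MySQL's parser, and the sample sentence is invented.

```python
import re

# Stoplist taken from the examples on the slide, not from MySQL itself.
STOPWORDS = {"a", "and", "the", "but", "of", "an", "very"}
MIN_WORD_LEN = 4  # words shorter than this are dropped, per the slide


def index_terms(text, stopwords=STOPWORDS, min_len=MIN_WORD_LEN):
    """Return the words of text that would be kept as index terms."""
    # Keep runs of letters only, so digits and punctuation are stripped.
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


print(index_terms("Liu wrote the CFD solver in Ada for the F-15"))
# ['wrote', 'solver']
```

The names "Liu", "CFD", "Ada" and the designation "F-15" all disappear from the index, so a search for any of them finds nothing even though the text contains them.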

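Getting the Rank

mysql> SELECT id, MATCH (title,body) AGAINST ('Sewell')
    -> FROM recaps;
+----+---------------------------------------+
| id | MATCH (title,body) AGAINST ('Sewell') |
+----+---------------------------------------+
|  1 |                                       |
|  2 |                                     0 |
|  3 |                                     0 |
+----+---------------------------------------+
3 rows in set (0.00 sec)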

Boolean Mode

Does not use the 50% threshold
Does use stopwords, length limitation
Operator list:

mysql> SELECT * FROM recaps
    -> WHERE MATCH (title,body) AGAINST ('+Hokies' IN BOOLEAN MODE);
+----+-------------------------+------------------------------------+
| id | title                   | body                               |
+----+-------------------------+------------------------------------+
|  1 | Hokies Blank UVa        | #17 Hokies ended the season...     |
|  2 | Hokies Put Wake in...   | Sean Glennon threw for...          |
|  3 | Hokies Blank Kent State | Virginia Tech overcame a sloppy... |
+----+-------------------------+------------------------------------+
3 rows in set (0.00 sec)
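The meaning of a leading "+" (term must be present) and a leading "-" (term must be absent) can be sketched in Python. This is an illustration of those two operators only, under simplifying assumptions: it ignores stopwords, the length limitation, ranking, and every other Boolean-mode operator, and the function names and row texts are hypothetical.

```python
import re


def boolean_match(text, query):
    """True if text satisfies a query of +required, -excluded and optional terms."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(term in words for term in excluded):
        return False
    if any(term not in words for term in required):
        return False
    # With no required terms, at least one optional term must be present.
    return bool(required) or any(term in words for term in optional)


def boolean_search(rows, query):
    """Return ids of the rows whose text matches the Boolean query."""
    return [row_id for row_id, text in rows.items() if boolean_match(text, query)]


recaps = {
    1: "Hokies Blank UVa #17 Hokies ended the season",
    2: "Hokies Put Wake in Their Place Sean Glennon threw for",
    3: "Hokies Blank Kent State Virginia Tech overcame a sloppy",
}

print(boolean_search(recaps, "+Hokies"))        # [1, 2, 3]
print(boolean_search(recaps, "+Hokies -Kent"))  # [1, 2]
```

Because no 50% threshold is applied, "Hokies" matches all three rows here, unlike in the natural-language search.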

Blind Query Expansion (AKA Automatic Relevance Feedback)

How does one keep up with Virginia Tech’s multiple names / nicknames?
– Hokies, Fighting Gobblers, VPI, VPI&SU, Va Tech, VT

Idea: run the query with the requested terms, then take the results and re-run the query with the most relevant terms from the initial results

mysql> SELECT * FROM recaps
    -> WHERE MATCH (title,body) AGAINST ('Virginia Tech');
+----+-------------------------+------------------------------------+
| id | title                   | body                               |
+----+-------------------------+------------------------------------+
|  3 | Hokies Blank Kent State | Virginia Tech overcame a sloppy... |
+----+-------------------------+------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM recaps
    -> WHERE MATCH (title,body) AGAINST ('Virginia Tech' WITH QUERY EXPANSION);
+----+-------------------------+------------------------------------+
| id | title                   | body                               |
+----+-------------------------+------------------------------------+
|  1 | Hokies Blank UVa        | #17 Hokies ended the season...     |
|  2 | Hokies Put Wake in...   | Sean Glennon threw for...          |
|  3 | Hokies Blank Kent State | Virginia Tech overcame a sloppy... |
+----+-------------------------+------------------------------------+
3 rows in set (0.00 sec)

in this example, pretend “Virginia Tech” did not appear in the game recaps and that “Hokies” appears in > 50% of rows
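The two-pass idea can be sketched in a few lines of Python. This is not how MySQL selects expansion terms (the slides do not say); the sketch assumes "most relevant" simply means "most frequent in the first-pass results", and the function names and document texts are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def search(docs, terms):
    """Return ids of documents containing at least one of the terms."""
    terms = set(terms)
    return [doc_id for doc_id, text in docs.items() if terms & set(tokenize(text))]


def expanded_search(docs, query, extra_terms=1):
    """Run the query, then re-run it with frequent terms from the first results."""
    query_terms = tokenize(query)
    first_pass = search(docs, query_terms)
    # Count the non-query words that occur in the first-pass documents.
    counts = Counter()
    for doc_id in first_pass:
        counts.update(t for t in tokenize(docs[doc_id]) if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(extra_terms)]
    return search(docs, query_terms + expansion)


# Illustrative documents: only one of them uses the name "Virginia Tech".
docs = {
    1: "Hokies blank UVa as Hokies defense dominates",
    2: "Hokies put Wake in their place",
    3: "Virginia Tech Hokies: Hokies overcame a sloppy start",
}

print(search(docs, tokenize("Virginia Tech")))   # [3]
print(expanded_search(docs, "Virginia Tech"))    # [1, 2, 3]
```

The first pass finds only document 3; "hokies" is the most frequent other word there, so the second pass also picks up the documents that use the nickname alone. The same mechanism can just as easily pull in irrelevant documents, which is why the expansion is called "blind".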

For More Information…

MySQL documentation:
Chapter 12/13 “Building a Content Management System”
CS 751/851 “Introduction to Digital Libraries”
– esp. “Information Retrieval Concepts” lecture
Is MySQL the right tool for your job?

MySQL examples in this lecture based on those found at dev.mysql.com
content snippets taken from