Generalized Vector Space Model

Presentation transcript:

Generalized Vector Space Model

Definition: Let $\vec{k}_i$ be a vector associated with the index term $k_i$. Independence of index terms in the vector model implies that the set of vectors $\{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_t\}$ is linearly independent and forms a basis for the subspace of interest. The dimension of this space is the number $t$ of index terms in the collection.

An example of independent vectors: $V_1 = (1, 0, 0)$, $V_2 = (0, 1, 0)$, $V_3 = (0, 0, 1)$. Here $V_1 \cdot V_2 = 0 + 0 + 0 = 0$, and in general $V_i \cdot V_j = 0$ for $i \neq j$. Each element represents a keyword, so different keywords are treated as totally different items. This is not reasonable, since keywords are sometimes related.

Definition: Given the set $\{k_1, k_2, \ldots, k_t\}$ of index terms in a collection, as before, let $w_{i,j}$ be the weight associated with the term-document pair $[k_i, d_j]$. If the $w_{i,j}$ weights are all binary, then all possible patterns of term co-occurrence (inside documents) can be represented by a set of $2^t$ minterms given by $m_1 = (0, 0, \ldots, 0)$, $m_2 = (1, 0, \ldots, 0)$, …, $m_{2^t} = (1, 1, \ldots, 1)$. Let $g_i(m_j)$ return the weight $\{0, 1\}$ of the index term $k_i$ in the minterm $m_j$.
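The enumeration of minterms can be sketched in Java as follows. This is only an illustration: the class name `Minterms` and the methods `enumerate` and `g` are hypothetical names chosen here, and the ordering (the bit for $k_1$ varies fastest) is an assumption picked so that the first, second, and last patterns match $m_1$, $m_2$, and $m_{2^t}$ above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: enumerates the 2^t binary minterms over t index terms. */
final class Minterms {
    /** Returns all 2^t patterns; position i-1 of each pattern belongs to term k_i. */
    static List<int[]> enumerate(int t) {
        List<int[]> result = new ArrayList<>();
        for (int code = 0; code < (1 << t); code++) {
            int[] m = new int[t];
            for (int i = 0; i < t; i++) {
                m[i] = (code >> i) & 1; // bit i of the counter decides term k_(i+1)
            }
            result.add(m);
        }
        return result;
    }

    /** g_i(m): the weight (0 or 1) of index term k_i (1-based) in minterm m. */
    static int g(int i, int[] m) {
        return m[i - 1];
    }

    public static void main(String[] args) {
        // Prints the 8 minterms for t = 3, one per line.
        for (int[] m : enumerate(3)) {
            System.out.println(Arrays.toString(m));
        }
    }
}
```

Because the number of patterns grows as $2^t$, a full enumeration like this is only practical for very small $t$; the worked example later in the deck keeps only the patterns that actually occur in documents.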

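Definition: Let us define a set of vectors, one for each of the minterm patterns $m_1 = (0, 0, \ldots, 1)$, $m_2 = (0, 0, \ldots, 1, 0)$, …, $m_{2^t-1} = (1, 1, \ldots, 1)$, where each vector $\vec{m}_i$ is associated with the respective minterm $m_i$ and the vectors are pairwise orthogonal: $\vec{m}_i \cdot \vec{m}_j = 0$ for all $i \neq j$.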

An example for the Generalized Vector Space Model

Suppose that the system has 12 documents and 4 keywords.

D1 = (2, 1, 0, 0), D2 = (5, 1, 0, 0), D3 = (1, 1, 1, 1), D4 = (0, 0, 2, 2),
D5 = (0, 1, 1, 2), D6 = (0, 0, 1, 1), D7 = (0, 0, 1, 0), D8 = (1, 1, 0, 0),
D9 = (2, 1, 1, 1), D10 = (0, 2, 2, 2), D11 = (1, 0, 2, 0), D12 = (0, 0, 2, 1).

Minterms: 6 minterms are used as independent vectors to form a basis.

m1 = (1, 1, 0, 0), m2 = (1, 1, 1, 1), m3 = (0, 0, 1, 1),
m4 = (0, 1, 1, 1), m5 = (0, 0, 1, 0), m6 = (1, 0, 1, 0).
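The six minterms are the distinct presence/absence patterns of the 12 documents. A minimal Java sketch of that grouping is shown below; the class name `ActiveMinterms` and its methods are hypothetical, and the document weights are copied from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: groups the example documents by their binary minterm pattern. */
final class ActiveMinterms {
    // DOCS[j] is document D(j+1); each entry is the weight of one keyword.
    static final int[][] DOCS = {
        {2, 1, 0, 0}, {5, 1, 0, 0}, {1, 1, 1, 1}, {0, 0, 2, 2},
        {0, 1, 1, 2}, {0, 0, 1, 1}, {0, 0, 1, 0}, {1, 1, 0, 0},
        {2, 1, 1, 1}, {0, 2, 2, 2}, {1, 0, 2, 0}, {0, 0, 2, 1}
    };

    /** Binarizes a weight vector: 1 where the keyword occurs, 0 otherwise. */
    static int[] pattern(int[] doc) {
        int[] p = new int[doc.length];
        for (int i = 0; i < doc.length; i++) {
            p[i] = doc[i] > 0 ? 1 : 0;
        }
        return p;
    }

    /** Maps each distinct pattern, in order of first appearance, to 1-based document numbers. */
    static Map<String, List<Integer>> group(int[][] docs) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int j = 0; j < docs.length; j++) {
            String key = Arrays.toString(pattern(docs[j]));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(j + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        group(DOCS).forEach((p, ds) -> System.out.println(p + " <- documents " + ds));
    }
}
```

With the documents in the order given, the patterns come out in the same order as m1 through m6, and each group lists the documents that share that minterm (for example D1, D2, and D8 for m1).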

Generalized Vector Space Model

Independent vectors: V1 = (1, 0, 0, 0, 0, 0), V2 = (0, 1, 0, 0, 0, 0), V3 = (0, 0, 1, 0, 0, 0), V4 = (0, 0, 0, 1, 0, 0), V5 = (0, 0, 0, 0, 1, 0), V6 = (0, 0, 0, 0, 0, 1). Vi represents minterm mi. Each pair Vi and Vj with i ≠ j is orthogonal (dot product = 0). The four keywords k1, k2, k3, and k4 are represented by combinations of the independent vectors.

For k1:

$\vec{k}_1 = (c_{1,1}V_1 + c_{1,2}V_2 + c_{1,3}V_3 + c_{1,4}V_4 + c_{1,5}V_5 + c_{1,6}V_6) / C$

where
$c_{1,1} = w_{1,1} + w_{1,2} + w_{1,8} = 2 + 5 + 1 = 8$ (D1, D2, and D8 have minterm m1),
$c_{1,2} = w_{1,3} + w_{1,9} = 1 + 2 = 3$ (D3 and D9 have minterm m2),
$c_{1,3} = w_{1,4} + w_{1,6} + w_{1,12} = 0 + 0 + 0 = 0$ (D4, D6, and D12 have minterm m3),
$c_{1,4} = w_{1,5} + w_{1,10} = 0 + 0 = 0$ (D5 and D10 have minterm m4),
$c_{1,5} = w_{1,7} = 0$ (D7 has minterm m5),
$c_{1,6} = w_{1,11} = 1$ (D11 has minterm m6),
$C = (c_{1,1}^2 + c_{1,2}^2 + c_{1,3}^2 + c_{1,4}^2 + c_{1,5}^2 + c_{1,6}^2)^{0.5}$
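As a worked check, the coefficients for k1 can be plugged into the normalization. The decimal values below are rounded to three places and are computed here from the example data, not taken from the slides.

```latex
\begin{align*}
C &= \left(8^2 + 3^2 + 0^2 + 0^2 + 0^2 + 1^2\right)^{0.5} = \sqrt{74} \approx 8.602 \\
\vec{k}_1 &= \frac{8V_1 + 3V_2 + V_6}{\sqrt{74}}
  \approx (0.930,\ 0.349,\ 0,\ 0,\ 0,\ 0.116)
\end{align*}
```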

For k2:

$\vec{k}_2 = (c_{2,1}V_1 + c_{2,2}V_2 + c_{2,3}V_3 + c_{2,4}V_4 + c_{2,5}V_5 + c_{2,6}V_6) / C$

where
$c_{2,1} = w_{2,1} + w_{2,2} + w_{2,8} = 1 + 1 + 1 = 3$ (D1, D2, and D8 have minterm m1),
$c_{2,2} = w_{2,3} + w_{2,9} = 1 + 1 = 2$ (D3 and D9 have minterm m2),
$c_{2,3} = w_{2,4} + w_{2,6} + w_{2,12} = 0 + 0 + 0 = 0$ (D4, D6, and D12 have minterm m3),
$c_{2,4} = w_{2,5} + w_{2,10} = 1 + 2 = 3$,
$c_{2,5} = w_{2,7} = 0$,
$c_{2,6} = w_{2,11} = 0$,
$C = (c_{2,1}^2 + c_{2,2}^2 + c_{2,3}^2 + c_{2,4}^2 + c_{2,5}^2 + c_{2,6}^2)^{0.5}$

For k3:

$\vec{k}_3 = (c_{3,1}V_1 + c_{3,2}V_2 + c_{3,3}V_3 + c_{3,4}V_4 + c_{3,5}V_5 + c_{3,6}V_6) / C$

where
$c_{3,1} = w_{3,1} + w_{3,2} + w_{3,8} = 0$ (D1, D2, and D8 have minterm m1),
$c_{3,2} = w_{3,3} + w_{3,9} = 1 + 1 = 2$ (D3 and D9 have minterm m2),
$c_{3,3} = w_{3,4} + w_{3,6} + w_{3,12} = 2 + 1 + 2 = 5$ (D4, D6, and D12 have minterm m3),
$c_{3,4} = w_{3,5} + w_{3,10} = 1 + 2 = 3$,
$c_{3,5} = w_{3,7} = 1$,
$c_{3,6} = w_{3,11} = 2$,
$C = (c_{3,1}^2 + c_{3,2}^2 + c_{3,3}^2 + c_{3,4}^2 + c_{3,5}^2 + c_{3,6}^2)^{0.5}$

For k4:

$\vec{k}_4 = (c_{4,1}V_1 + c_{4,2}V_2 + c_{4,3}V_3 + c_{4,4}V_4 + c_{4,5}V_5 + c_{4,6}V_6) / C$

where
$c_{4,1} = w_{4,1} + w_{4,2} + w_{4,8} = 0$ (D1, D2, and D8 have minterm m1),
$c_{4,2} = w_{4,3} + w_{4,9} = 1 + 1 = 2$ (D3 and D9 have minterm m2),
$c_{4,3} = w_{4,4} + w_{4,6} + w_{4,12} = 2 + 1 + 1 = 4$ (D4, D6, and D12 have minterm m3),
$c_{4,4} = w_{4,5} + w_{4,10} = 2 + 2 = 4$,
$c_{4,5} = w_{4,7} = 0$,
$c_{4,6} = w_{4,11} = 0$,
$C = (c_{4,1}^2 + c_{4,2}^2 + c_{4,3}^2 + c_{4,4}^2 + c_{4,5}^2 + c_{4,6}^2)^{0.5}$

Each ki is thus converted from a vector of length 4 into a vector of length 6.
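The four keyword vectors can be computed mechanically from the document weights. The Java sketch below does so under the assumptions of the example: the class and method names (`KeywordVectors`, `coefficients`, `keywordVector`, `dot`) are hypothetical, and every keyword is assumed to occur in at least one document so that the normalization factor C is nonzero. The final dot product is an illustrative addition showing that keywords which co-occur no longer come out orthogonal.

```java
import java.util.Arrays;

/** Sketch: builds the length-6 keyword vectors of the worked example. */
final class KeywordVectors {
    // DOCS[j][i] holds w_{i+1,j+1}: the weight of keyword k(i+1) in document D(j+1).
    static final int[][] DOCS = {
        {2, 1, 0, 0}, {5, 1, 0, 0}, {1, 1, 1, 1}, {0, 0, 2, 2},
        {0, 1, 1, 2}, {0, 0, 1, 1}, {0, 0, 1, 0}, {1, 1, 0, 0},
        {2, 1, 1, 1}, {0, 2, 2, 2}, {1, 0, 2, 0}, {0, 0, 2, 1}
    };
    // MINTERMS[r] is minterm m(r+1).
    static final int[][] MINTERMS = {
        {1, 1, 0, 0}, {1, 1, 1, 1}, {0, 0, 1, 1},
        {0, 1, 1, 1}, {0, 0, 1, 0}, {1, 0, 1, 0}
    };

    /** True if the document's presence/absence pattern equals the minterm. */
    static boolean hasMinterm(int[] doc, int[] minterm) {
        for (int i = 0; i < doc.length; i++) {
            if ((doc[i] > 0) != (minterm[i] == 1)) {
                return false;
            }
        }
        return true;
    }

    /** c_{i,r} for r = 1..6: sum of w_{i,j} over documents with minterm m_r (i is 1-based). */
    static double[] coefficients(int i) {
        double[] c = new double[MINTERMS.length];
        for (int[] doc : DOCS) {
            for (int r = 0; r < MINTERMS.length; r++) {
                if (hasMinterm(doc, MINTERMS[r])) {
                    c[r] += doc[i - 1];
                }
            }
        }
        return c;
    }

    /** The coefficients divided by C, the square root of the sum of their squares. */
    static double[] keywordVector(int i) {
        double[] c = coefficients(i);
        double norm = Math.sqrt(dot(c, c)); // assumed nonzero for this data
        double[] k = new double[c.length];
        for (int r = 0; r < c.length; r++) {
            k[r] = c[r] / norm;
        }
        return k;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int r = 0; r < a.length; r++) {
            sum += a[r] * b[r];
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 4; i++) {
            System.out.println("k" + i + " = " + Arrays.toString(keywordVector(i)));
        }
        System.out.println("k1 . k2 = " + dot(keywordVector(1), keywordVector(2)));
    }
}
```

Since V1 through V6 are unit vectors, the array returned by `keywordVector(i)` is exactly the coordinate list of ki in that basis.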

Google Web API See:

Concept: With the Google Web APIs service, software developers can query more than 3 billion web documents directly from their own computer programs. Google uses the SOAP and WSDL standards so a developer can program in his or her favorite environment - such as Java, Perl, or Visual Studio.NET.

Google Web APIs provide three services:
-- Search for relevant web pages according to the keyword(s) the user supplies
-- Return the cached copy of the web page at the URL the user supplies
-- Suggest a spelling correction for the word(s) the user inputs

Search Requests: Search requests submit a query string and a set of parameters to the Google Web APIs service and receive in return a set of search results. Search results are derived from Google’s index of over 2 billion Web pages.

Search Request Format:

Name          Description
key           Provided by Google; Google uses the key for authentication and logging.
q             Query string.
start         Zero-based index of the first desired result.
maxResults    Number of results desired per query. The maximum value per query is 10.
filter        Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host.
restrict      Restricts the search to a subset of the Google Web index, such as a topic like “Linux”.
safeSearch    A Boolean value which enables filtering of adult content in the search results.
lr            Language Restrict: restricts the search to documents within one or more languages.
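Because at most 10 results are returned per query, retrieving more results means issuing several requests with increasing zero-based start values. The Java sketch below only plans those requests; `SearchPaging` and `plan` are hypothetical names introduced here and are not part of the Google Web APIs library.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: plans the (start, maxResults) pairs needed to fetch a number of results. */
final class SearchPaging {
    // Limit taken from the parameter table: at most 10 results per query.
    static final int MAX_RESULTS_PER_QUERY = 10;

    /** Returns one {start, maxResults} pair per request, in request order. */
    static List<int[]> plan(int wanted) {
        if (wanted < 0) {
            throw new IllegalArgumentException("wanted must not be negative");
        }
        List<int[]> requests = new ArrayList<>();
        for (int start = 0; start < wanted; start += MAX_RESULTS_PER_QUERY) {
            // The last request may ask for fewer than the maximum.
            int count = Math.min(MAX_RESULTS_PER_QUERY, wanted - start);
            requests.add(new int[] {start, count});
        }
        return requests;
    }

    public static void main(String[] args) {
        for (int[] r : plan(25)) {
            System.out.println("start=" + r[0] + " maxResults=" + r[1]);
        }
    }
}
```

Each planned pair would then be copied into the start and maxResults parameters of one search request.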

Search Results Format:
-- Search Response: each time you issue a search request to the Google service, a response is returned to you. (We will describe the meanings of the values returned to you.)
-- Result Element

Search Response:
-- A Boolean value indicating whether filtering was performed on the search results
-- A text string intended for display to an end user
-- The estimated total number of results that exist for the query
-- A Boolean value indicating that the estimated value is actually the exact value
-- An array of items; this corresponds to the actual list of search results
-- The value of the query string for the search request
-- The index (1-based) of the first search result in the returned list
-- The index (1-based) of the last search result in the returned list
-- A text string intended for display to the end user; it provides instructive suggestions on how to use Google
-- An array of items
-- Text, a floating-point number indicating the total server time to return the search results, measured in seconds

Cache Requests: Cache requests submit a URL to the Google Web APIs service and receive in return the contents of the URL when Google’s crawlers last visited the page.

Spelling Requests: Spelling requests submit a query to the Google Web APIs service and receive in return a suggested spelling correction for the query (if available).

Java Implementation: Google provides a Java implementation of the Google Web APIs. We will take a look at it and then provide an example.

The Java classes:
-- com.google.soap.search.GoogleSearch
-- com.google.soap.search.GoogleSearchResult
-- com.google.soap.search.GoogleSearchResultElement
-- com.google.soap.search.GoogleSearchFault
-- com.google.soap.search.GoogleSearchDirectoryCategory

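Usage Demo:

    // clientKey, directive, and directiveArg are assumed to be defined earlier.
    GoogleSearch s = new GoogleSearch();
    s.setKey(clientKey);
    try {
        if (directive.equalsIgnoreCase("search")) {
            // Run a web search and print the result set.
            s.setQueryString(directiveArg);
            GoogleSearchResult r = s.doSearch();
            System.out.println(r.toString());
        } else if (directive.equalsIgnoreCase("cached")) {
            // Fetch Google's cached copy of the page at the given URL.
            byte[] cachedBytes = s.doGetCachedPage(directiveArg);
            String cachedString = new String(cachedBytes);
            System.out.println(cachedString);
        } else if (directive.equalsIgnoreCase("spell")) {
            // Ask for a spelling suggestion for the given phrase.
            System.out.println("Spelling suggestion:");
            String suggestion = s.doSpellingSuggestion(directiveArg);
            System.out.println(suggestion);
        }
    } catch (GoogleSearchFault f) {
        // The request failed; report the fault.
        System.out.println("The call to the Google Web APIs failed:");
        System.out.println(f.toString());
    }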

How to build the executable file:
1. Write your own code in the appropriate place in GoogleAPIDemo.java.
2. Compile GoogleAPIDemo.java.
3. Add GoogleAPIDemo$1.class and GoogleAPIDemo.class (both generated by step 2) to the directory “com.google.soap.search” of GoogleAPI.jar using WinRAR.
4. Click exec.bat to run the program.

Example program: You can download the executable files and source files of the example from Dr. Wang’s home page.