1 Searching through the Internet Dr. Eslam Al Maghayreh Computer Science Department Yarmouk University.

2 Outline
- Introduction
- Information Retrieval
- Indexing
- Smarter Internet Searching
- Examples

3 Introduction
The Internet holds an enormous quantity of information:
- billions of web pages
- thousands of newsgroups
Two questions face any information seeker:
(1) How can I find what I want?
(2) How can I know that what I find is any good?

4 Information Retrieval
Goal: find documents relevant to an information need from a large document set.
[Diagram: an information need is expressed as a query; the IR system performs retrieval over the document collection and returns an answer list.]

5 Example
Google Web

6 Search Engine
Consists of:
- the interface you use to type in a query
- an index of Web sites that the query is matched against
- a software program (called a spider or bot) that goes out on the Web and gets new sites for the index

7 IR problem
First applications: in libraries (1950s)
ISBN:
Author: Salton, Gerard
Title: Automatic text processing: the transformation, analysis, and retrieval of information by computer
Publisher: Addison-Wesley
Date: 1989
Content:
External attributes and internal attribute (content)
Search by external attributes = search in a database (DB)
IR: search by content

8 Possible approaches
1. String matching (linear search in documents)
   - Slow
2. Indexing
   - Fast
   - Flexible to further improvement
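
The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the toy collection `docs` and the helper names `linear_search`, `build_index`, and `indexed_search` are assumptions made for the example.

```python
from collections import defaultdict

# Toy document collection (illustrative data, not from the slides).
docs = {
    "d1": "computer architecture and computer networks",
    "d2": "network protocols",
    "d3": "flower arrangement",
}

def linear_search(docs, term):
    """String matching: scan the words of every document on every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())

def build_index(docs):
    """Indexing: map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def indexed_search(index, term):
    """Answer a query with a single dictionary lookup."""
    return sorted(index.get(term, set()))

index = build_index(docs)
print(linear_search(docs, "computer"))
print(indexed_search(index, "computer"))
```

The index is built once and reused for every query, which is where the speed advantage comes from; it can also be extended later (for example with weights), which is the flexibility the slide refers to.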

9 [Diagram: Documents are indexed into a Document Representation (the Index); the Query is turned into a Query Representation; a Comparison Function matches the two and produces the Results.]

10 Main problems in IR
Query evaluation (or retrieval process)
- To what extent does a document correspond to a query?
System evaluation
- How good is a system?
- Are the retrieved documents relevant? (precision)
- Are all the relevant documents retrieved? (recall)
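
Precision and recall can be made concrete with a small calculation. The sketch below uses the usual set-based definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the function name `precision_recall` and the document IDs are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: the system returns 4 documents, 3 are truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so the two measures answer the two different questions on the slide.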

11 Document indexing
Goal: find the important meanings and create an internal representation.
Factors to consider:
- Accuracy in representing meanings (semantics)
- Exhaustiveness (covering all the contents)
- Ease of manipulation by computer
What is the best representation of contents?
- Word: good coverage, not precise
- Phrase: poor coverage, more precise
- Concept: poor coverage, precise
[Figure: Word, Phrase, and Concept placed on axes of coverage (recall) versus accuracy (precision).]

12 Keyword selection and weighting
How to select important keywords?
- Simple method: use middle-frequency words.
- Search engines usually disregard minor words such as "the", "and", and "to".
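
A rough sketch of this simple method follows. The stop-word list, the frequency thresholds `min_count` and `max_count`, and the sample texts are all assumptions chosen for illustration; real systems tune these on their own collections.

```python
from collections import Counter

# Illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "to", "of", "a", "in"}

def select_keywords(texts, min_count=2, max_count=3):
    """Keep words that are neither stop words, too rare, nor too frequent."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in STOP_WORDS
    )
    return sorted(w for w, c in counts.items() if min_count <= c <= max_count)

texts = [
    "the web and the index",
    "the spider crawls the web",
    "the index of the web",
    "web search and web ranking",
]
print(select_keywords(texts))
```

In this toy collection "web" occurs too often to discriminate between documents and "spider" occurs only once, so only the middle-frequency word "index" survives.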

13 Result of indexing
Each document is represented by a set of weighted keywords (terms):
$D_1 \rightarrow \{(t_1, w_1), (t_2, w_2), \ldots\}$
e.g.
$D_1 \rightarrow \{(\text{comput}, 0.2), (\text{architect}, 0.3), \ldots\}$
$D_2 \rightarrow \{(\text{comput}, 0.1), (\text{network}, 0.5), \ldots\}$

14 Retrieval
The problems underlying retrieval:
Retrieval model
- How is a document represented with the selected keywords?
- How are document and query representations compared to calculate a score?

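15 Vector space model
Vector space = all the keywords encountered: $\langle t_1, t_2, \ldots, t_n \rangle$
Document: $D = \langle a_1, a_2, \ldots, a_n \rangle$, where $a_i$ = weight of $t_i$ in $D$
Query: $Q = \langle b_1, b_2, \ldots, b_n \rangle$, where $b_i$ = weight of $t_i$ in $Q$
$R(D,Q) = \mathrm{Sim}(D,Q)$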

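16 Matrix representation
$$
\begin{array}{c|ccccc}
       & t_1    & t_2    & t_3    & \cdots & t_n    \\ \hline
D_1    & a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
D_2    & a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
D_3    & a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots &        &        &        &        &        \\
D_m    & a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \\ \hline
Q      & b_1    & b_2    & b_3    & \cdots & b_n
\end{array}
$$
Columns: term vector space. Rows: document space.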
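
As a small sketch, the matrix can be built from the weighted keyword sets of the indexing slide. Only the two terms shown for each of D1 and D2 are used here (the slide's lists are truncated), missing terms are assumed to have weight 0, and the variable names are illustrative.

```python
# Weighted keywords for two documents, taken from the indexing example
# (only the terms shown on the slide; the originals are truncated).
docs = {
    "D1": {"comput": 0.2, "architect": 0.3},
    "D2": {"comput": 0.1, "network": 0.5},
}

# The vector space: every keyword encountered, in a fixed (sorted) order.
terms = sorted({term for weights in docs.values() for term in weights})

# One row per document; a term absent from a document gets weight 0.0.
matrix = {
    doc_id: [weights.get(term, 0.0) for term in terms]
    for doc_id, weights in docs.items()
}

print(terms)
for doc_id, row in matrix.items():
    print(doc_id, row)
```

Fixing the term order once is what makes the rows comparable: position i in every row, and in the query vector, always refers to the same term.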

17 Some formulas for Sim
- Dot product
- Cosine
- Dice
- Jaccard
[Figure: vectors D and Q drawn in the plane spanned by t1 and t2.]
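
The four measures can be sketched in Python using their standard weighted-vector forms, which are assumed here since the slide lists only the names: dot product $\sum_i a_i b_i$; cosine $\sum_i a_i b_i / \sqrt{\sum_i a_i^2 \sum_i b_i^2}$; Dice $2\sum_i a_i b_i / (\sum_i a_i^2 + \sum_i b_i^2)$; Jaccard $\sum_i a_i b_i / (\sum_i a_i^2 + \sum_i b_i^2 - \sum_i a_i b_i)$. The function names and sample vectors are illustrative.

```python
import math

def dot(d, q):
    """Dot product: sum of a_i * b_i."""
    return sum(a * b for a, b in zip(d, q))

def cosine(d, q):
    """Dot product normalized by the lengths of both vectors."""
    denom = math.sqrt(dot(d, d) * dot(q, q))
    return dot(d, q) / denom if denom else 0.0

def dice(d, q):
    """Twice the dot product over the sum of squared lengths."""
    denom = dot(d, d) + dot(q, q)
    return 2 * dot(d, q) / denom if denom else 0.0

def jaccard(d, q):
    """Dot product over the sum of squared lengths minus the dot product."""
    denom = dot(d, d) + dot(q, q) - dot(d, q)
    return dot(d, q) / denom if denom else 0.0

d = [1, 0, 1]  # document weights over terms t1..t3
q = [1, 1, 0]  # query weights over the same terms
for sim in (dot, cosine, dice, jaccard):
    print(sim.__name__, sim(d, q))
```

Unlike the raw dot product, the other three are normalized, so a long document does not score higher merely because it contains more weighted terms.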

18 (Classic) Presentation of results
The result of query evaluation is a list of documents, sorted by their similarity to the query. E.g.:
doc1 0.67
doc2 0.65
doc3 0.54
…
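
Producing such a ranked list amounts to scoring every document against the query and sorting in descending order. The sketch below assumes dot-product similarity and made-up vectors; `rank` and `top_k` are hypothetical names, not part of any particular system.

```python
def rank(doc_vectors, query, top_k=None):
    """Score each document against the query and sort by descending score."""
    def dot(d, q):
        return sum(a * b for a, b in zip(d, q))

    scored = [(doc_id, dot(vec, query)) for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps the full list

# Illustrative vectors over three terms.
doc_vectors = {
    "doc1": [0.3, 0.2, 0.0],
    "doc2": [0.0, 0.1, 0.5],
    "doc3": [0.0, 0.0, 0.0],
}
query = [0, 1, 1]

for doc_id, score in rank(doc_vectors, query):
    print(doc_id, round(score, 2))
```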

19 IR on the Web
- No stable document collection (spider, crawler)
- Duplication
- Huge number of documents
- Multimedia documents
- Multilingual problem
- …

20 Tips for smarter Internet searching
- Use unique, specific terms.
- Use the minus operator (-) to narrow the search, e.g. yarmouk -university
- Use quotation marks to match the consecutive words of a phrase, such as "flower arrangement".
- Enter a short question, such as "what time is it in amman?", "3.55* =", "who is the king of england?", or "what is the distance between the sun and earth".

21 Smarter Internet Searching
- inurl:test results
  Only test must be found in the web address (URL).
- allinurl:test results
  Both test AND results must be found in the web address.
- define:
  Provides definitions of the words, gathered from various online sources. Example: define: search engine

22 Smarter Internet Searching
allintext:
Sometimes you get pages that do not have your search term or phrase in them. Why? Because Google also searches for pages that just link to the target page. Use allintext: to get only those pages that have your search terms in them.

23 Smarter Internet Searching
- allinanchor:
  Returns only pages for which your search terms appear in the text of links pointing to them, not necessarily in the pages themselves. This is the opposite of allintext:.
- site:
  Limits your search to a specific web site.
  Examples:
  students site:yu.edu.jo
  students site:yu.edu.jo filetype:pdf

24 Smarter Internet Searching
- Don't use common words and punctuation in ordinary queries.
- Do use common words and punctuation marks when searching for a specific phrase inside quotes.
- Most search engines do not distinguish between uppercase and lowercase.
- Make the most of AutoComplete.

25 Smarter Internet Searching
The wildcard operator (*):
- Google calls it the fill-in-the-blank operator. For example, amusement * will return pages with amusement and any other term(s) the Google search engine deems relevant.
- Using a wildcard (*) for a character does not work in Google: cat* returns the same results as cat.

26 Smarter Internet Searching
- Related sites: for example, related: can be used to find sites similar to the Yarmouk University site.
- Specific file type: for example, information retrieval filetype:ppt
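
The operators from the tips above (quoted phrases, the minus operator, site:, and filetype:) can be combined into a single query string programmatically. The sketch below is only an illustration: `build_query` is a hypothetical helper, and the `https://www.google.com/search?q=...` URL form is assumed rather than taken from the slides.

```python
from urllib.parse import urlencode

def build_query(terms, phrase=None, exclude=(), site=None, filetype=None):
    """Assemble a search string from plain terms and optional operators."""
    parts = [terms] if terms else []
    if phrase:
        parts.append(f'"{phrase}"')              # exact phrase in quotes
    parts.extend(f"-{word}" for word in exclude)  # minus operator
    if site:
        parts.append(f"site:{site}")             # restrict to one web site
    if filetype:
        parts.append(f"filetype:{filetype}")     # restrict to one file type
    return " ".join(parts)

query = build_query("students", site="yu.edu.jo", filetype="pdf")
# Assumed URL form; urlencode escapes spaces and colons for use in a URL.
url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)
print(url)
print(build_query("yarmouk", exclude=["university"]))
```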

27 Examples
- Searching for papers: YU library, Google Scholar
- Searching for instructor resources: Morgan Kaufmann, Pearson

28 Examples
- Searching for books to buy: Amazon.com, Ebay.com
- Searching for items to buy: electronics at bestbuy.com
- Searching for hotels: Expedia.com, Priceline.com, Booking.com

29 Examples
- Regional search: Google jo
- Searching for images: Google Images
- Searching for a job: Jobsinacademia.net, Academickeys.com

30 The End.