Searching for Truth: Locating Information on the WWW
Chapter 5
Lawrence Snyder
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

Searching in All the Right Places
The Obvious and Familiar
  – To find tax information, ask the tax office
Libraries Online
  – Many college and public libraries let you access their online catalogs and other information resources
Libraries provide online facilities that are well organized and trustworthy
Remember that many pre-1985 documents are not yet available online
Plus, librarians are real live experts

How Is Information Organized?
Hierarchical classification (like a family tree)
Information is grouped into a small number of categories, each of which is easily described (top-level classification)
Information in each category is divided into subcategories (second-level classifications), and so on
Eventually the classifications become small enough for you to look through the whole category to find the information you need
  – This is a process of elimination as much as choosing appropriate subcategories

Important Properties of Classifications
Descriptive terms must cover all the information in the category and be easy for a searcher to apply
Subcategories do not all have to use the same classifications
Information in the category defines how best to classify it
There is no single way to classify information

How Is Web Site Information Organized?
The homepage is the top-level classification for the whole Web site
Classifications are the roots of hierarchies that organize large volumes of similar types of information
Topic clusters are sets of related links
  – For example, sidebar and top-of-page navigation links
Content information often fills the rest of a page

Alternate Hierarchy Presentations
Top-level classifications can be expanded individually to show next-level information
Alternatively, a tabular form of the tree can be presented for a broader picture at a glance (sometimes called a site map)
Our NPR homepage example offers both forms

Design of Hierarchies
General rules for design and terminology of hierarchies
  – Root is usually at the top (branching metaphor)
"Going up in the hierarchy" means the classifications become more inclusive or general
"Going down in the hierarchy" means the classifications become more specific or detailed
The greater-than (>) symbol is a common way to show going down through levels of classification

Levels in a Hierarchy
A one-level hierarchy has only one level of "branching" (no subdirectories)
To count levels, remember
  – There is always a root
  – There are always "leaves" (the categories themselves)
  – The root and leaves do not count as levels
The NPR hierarchy, drawn as a tree, shows 2 classification levels between the root (homepage) and the leaves (content pages)
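The counting rule above can be sketched in a few lines of Python. This is only an illustration: the `site` dictionary is a made-up hierarchy (not the actual NPR site), and representing a leaf as an empty dictionary is an assumption of the sketch.

```python
# Hypothetical site hierarchy: nested dicts are classifications,
# and an empty dict marks a leaf (a content page).
site = {
    "Home": {
        "News": {
            "World": {"Story A": {}, "Story B": {}},
            "Business": {"Story C": {}},
        },
        "Music": {
            "Browse Artists": {"Artist page": {}},
        },
    }
}

def classification_levels(tree):
    """Count the levels between the root and the leaves.

    `tree` maps the root's name to its children. Following the rule
    above, the root and the leaves are not counted as levels.
    """
    def depth(children):
        # Number of nodes on the longest path below this point.
        if not children:
            return 0
        return 1 + max(depth(sub) for sub in children.values())

    (root_children,) = tree.values()
    # depth() counts every node below the root, including the leaf,
    # so subtract one to leave only the classification levels.
    return max(depth(root_children) - 1, 0)

print(classification_levels(site))
```

For this made-up tree the function reports two classification levels between the root and the leaves, the same count the slide gives for the NPR example.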

Other Hierarchy Considerations
Groupings may overlap (one item can appear in more than one category), or be partitioned (each item appears in exactly one category)
Number of levels may differ by category, even in the same hierarchical tree
A single path from root to leaf is a full classification of the leaf content
  – Home > Music > Browse Artists > C > Cave Singers
The "Tree of Life" biological taxonomy for humans is such a path
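As a small illustration of "a single path from root to leaf", the sketch below walks a hierarchy and prints one `>`-separated path per leaf. The `hierarchy` dictionary is hypothetical; it borrows the example path from the slide and adds one invented branch.

```python
# Hypothetical slice of a site's hierarchy; an empty dict is a leaf.
hierarchy = {
    "Home": {
        "Music": {
            "Browse Artists": {
                "C": {"Cave Singers": {}},
            },
        },
        "News": {"World": {}},
    }
}

def full_classifications(tree, prefix=()):
    """Yield one 'A > B > C' string per leaf: the path from root to leaf."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Keep going down: classifications become more specific.
            yield from full_classifications(children, path)
        else:
            yield " > ".join(path)

paths = list(full_classifications(hierarchy))
for p in paths:
    print(p)
```

Note that the two leaves sit at different depths, which matches the point that the number of levels may differ by category within one tree.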

Searching the Web for Information
Individual Web sites are carefully organized by their designers (hierarchically, for example)
But… no one organizes the entire Web, and it has grown unimaginably HUGE… too huge to just browse looking for specific items
Search engines solve the problem
Popular search engines: Google, Yahoo!, MSN, AOL, Ask

Search Engine Basics
A search engine has two basic parts
  – Crawler: Runs constantly, visiting sites on the Internet, discovering Web pages, and building/updating an index to the content it finds
  – Query processor: Looks up user-submitted keywords in the index and reports back a list of Web pages the crawler has found containing those words

Crawlers
Build an index telling which Web pages (URLs) contain which words, based on their HTML text
When a crawler visits a Web page it:
  1. Adds all tokens (words) on the page to the index (words from the title, the body content, anchor text, META tags)
  2. Associates the URL for the page with each of these words
  3. Then visits all pages that are linked to the page being examined, and does steps 1 through 3 on each
Crawlers can miss pages
  – If no page points to it
  – If a page is dynamically created on the fly
  – If a page has only images, or is of an unknown type (not HTML, PDF, etc.)
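The three crawler steps can be sketched without touching the network. Everything here is an assumption made for illustration: the `PAGES` dictionary stands in for the Web, the URLs are invented, and a real crawler would fetch and parse HTML rather than read strings from memory.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("red fish blue fish", ["a.example/guppy"]),
    "a.example/guppy": ("blue guppy care", ["a.example/home"]),
    "a.example/orphan": ("red guppy", []),  # no page links here
}

def crawl(start_url, pages):
    """Build an index mapping each word to the set of URLs containing it."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        # Steps 1 and 2: add each token and associate the URL with it.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(url)
        # Step 3: visit linked pages that have not been seen yet.
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example/home", PAGES)
print(sorted(index["blue"]))
print(sorted(index["red"]))
```

The page `a.example/orphan` never enters the index because no page points to it, which is the first way the slide says a crawler can miss a page.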

Query Processors
User submits one or more keywords (the query)
Index is consulted for these keywords, producing a list of Web page URLs found by the crawler
Important to give a good query to get a useful list of pages in reply
Query not specific → huge list of unwanted URLs

Multiword Index Searches
A list of several keywords is an AND-query
  – red fish blue guppy
Very common (default) use; it means each found page must contain ALL the words
Look in the index for the URL list for each word, then scan the lists for URLs common to all
URL lists in indexes are often alphabetized to make this faster
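To see why alphabetized URL lists help, here is a minimal sketch of an AND-query that scans two sorted lists in step rather than comparing every pair. The index contents and the function names `intersect_sorted` and `and_query` are invented for this example.

```python
def intersect_sorted(a, b):
    """Return the URLs present in both alphabetized lists, in one pass."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, so skip it
        else:
            j += 1
    return common

def and_query(index, words):
    """Intersect the URL lists of every query word (an AND-query)."""
    lists = [index.get(w, []) for w in words]
    if not lists:
        return []
    result = lists[0]
    for urls in lists[1:]:
        result = intersect_sorted(result, urls)
    return result

# Hypothetical index: each word maps to an alphabetized URL list.
index = {
    "red": ["a.example", "c.example", "d.example"],
    "fish": ["a.example", "b.example", "d.example"],
    "blue": ["a.example", "d.example", "e.example"],
    "guppy": ["d.example", "e.example"],
}
print(and_query(index, ["red", "fish", "blue", "guppy"]))
```

Because both lists are in the same order, each comparison lets the scan discard one entry for good, so the work grows with the combined length of the lists instead of their product.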

Advanced Searches
Search engines allow complex queries to get a smaller, more useful list of returned pages (see Google's Advanced Search page)
Logical operators
  – AND: Tells the search engine to return only pages containing all terms
      red AND fish AND blue AND guppy
  – OR: Finds pages containing any word given, including pages where 2 or more appear
      marshmallow OR strawberry OR chocolate
  – NOT/-: Excludes pages containing the given word
  – Combinations:
      tigers AND NOT baseball
      (chocolate OR strawberry) AND sundae
      Simpson bart OR lisa OR maggie -homer -marge
  – Use parentheses to make your intent clear
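The logical operators map directly onto set operations, which the following sketch uses to evaluate two of the combined queries above. The index and its page numbers are invented; the helper name `pages` is hypothetical.

```python
# Hypothetical index: word -> set of page IDs containing it.
index = {
    "chocolate": {1, 2, 5},
    "strawberry": {2, 3},
    "sundae": {2, 3, 4, 5},
    "tigers": {6, 7, 8},
    "baseball": {7},
}

def pages(word):
    """Pages containing the word (empty set if the word is unknown)."""
    return index.get(word, set())

# (chocolate OR strawberry) AND sundae
sundaes = (pages("chocolate") | pages("strawberry")) & pages("sundae")

# chocolate OR (strawberry AND sundae): same words, different grouping
other_grouping = pages("chocolate") | (pages("strawberry") & pages("sundae"))

# tigers AND NOT baseball
big_cats = pages("tigers") - pages("baseball")

print(sorted(sundaes))
print(sorted(other_grouping))
print(sorted(big_cats))
```

The first two queries use the same words but return different pages, which is why the slide recommends parentheses to make the intent clear.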

Effective Queries: Narrow the Hitlist
Suppose you are writing a report on red giant stars. You issue the Google query
      red giant
and get 4.9 million hits (URLs)… now what?
The first few pages deal with software and rock bands, so try again… with some restrictions
      red giant -software -music
Now we have 824,000 hits… better, still big, but the early URLs are somehow the ones we need

Ordering the Hits
How did Google decide on the order for the 824,000 URLs, and put the "best" ones up top?
Order is determined by relevance, which is judged in several ways
Top URLs have "red giant" together on the page, in the order given in the query
Farther down the list are pages with "giant red", or with the words separated, or with the words in anchor text
Google enhances this with a relevance score called PageRank for each page
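A toy scoring function can make the ordering idea concrete. This is not how Google scores pages; the three-level score, the function name `relevance`, and the sample pages are all assumptions chosen to mirror the slide's description (exact phrase first, scattered words later).

```python
def relevance(page_text, query):
    """Toy score: 2 if the query words appear together in order,
    1 if all words appear somewhere on the page, 0 otherwise."""
    words = page_text.lower().split()
    terms = query.lower().split()
    n = len(terms)
    # Slide a window of n words across the page looking for the phrase.
    if any(words[i:i + n] == terms for i in range(len(words) - n + 1)):
        return 2
    if all(t in words for t in terms):
        return 1
    return 0

# Hypothetical pages keyed by an invented identifier.
pages = {
    "p1": "the giant red barn",
    "p2": "a red giant is a dying star",
    "p3": "red dwarfs outnumber every giant star",
    "p4": "blue stars burn hot",
}
ranked = sorted(pages, key=lambda url: relevance(pages[url], "red giant"), reverse=True)
print(ranked)
```

The page containing the exact phrase comes first, pages with "giant red" or separated words follow, and the page with neither word comes last.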

PageRank
Count the links into a page ("important" pages are pointed to by lots of other pages)
  – Each page that links to a target page is considered a "vote" for that target page
If the "voting page" is itself highly ranked, this ups the PageRank for the target page
Words in anchor text up the PageRank
Crawler computes this as it indexes
Complete details of the PageRank algorithm are Google proprietary information
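Since the full algorithm is proprietary, the sketch below only illustrates the voting idea: each page splits its current score among the pages it links to, and the process repeats until scores settle. The damping value of 0.85, the iteration count, the link graph, and the name `simple_pagerank` are assumptions of this example, and it ignores anchor text entirely.

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links shares its vote with everyone.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # A page's vote is split evenly among its link targets,
                # so votes from highly ranked pages count for more.
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Hypothetical link graph: three pages point to C, and C points to A.
links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = simple_pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph C collects the most votes, and A ends up well ahead of B and D even though it has only one incoming link, because that link comes from the highly ranked C.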

Further Constraining Search
Some Web sites offer search limited only to the pages of that site
We can often focus a search by limiting it to URLs in specific domains (like .gov or .edu) or to specific sites
This lets Google's PageRank order the hits we get from the restricted domains
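Restricting hits to certain domains amounts to a filter on the host name of each URL. The sketch below shows the idea with Python's standard `urllib.parse`; the URLs are invented and the function name `restrict_to_domains` is hypothetical.

```python
from urllib.parse import urlparse

def restrict_to_domains(urls, suffixes):
    """Keep only URLs whose host name ends with one of the suffixes.

    Suffixes are expected to include the leading dot, e.g. ".gov".
    """
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if any(host.endswith(suffix) for suffix in suffixes):
            kept.append(url)
    return kept

# Hypothetical hit list for the query "red giant".
hits = [
    "https://space.example.gov/red-giants",
    "https://astro.example.edu/stars",
    "https://redgiant.example.com/software",
]
print(restrict_to_domains(hits, [".gov", ".edu"]))
```

The filter keeps the original order of the hits, so whatever ranking the search engine produced still applies within the restricted set.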

Web Information: Truth or Fiction?
Anyone can publish anything on the Web
  – Note prevalence of blogs and wikis
Some of what gets published is false, misleading, deceptive, self-serving, slanderous, or disgusting
  – If it is on the Web it must be true. NOT!
How do we know if the pages we find in our search are reliable?

Do Not Assume Too Much
Registered domain names may be misleading or deliberate hoaxes
Look for who or what organization publishes the Web page
  – Respected organizations publish the best information
A two-step check for the site's publisher
  1. InterNIC provides the name of the company that assigned the site's IP address, and a link to the WhoIs server maintained by that company
  2. Go to the WhoIs server site and type the domain name or IP address again
  – Information returned is the owner's name and physical address

Characteristics of Legitimate Sites
Web sites are most believable if they have these features:
  – Physical Existence: Site provides a street address, phone number, address
  – Expertise: Site includes references, citations or credentials, related links
  – Clarity: Site is well organized, easy to use, and has site-searching facilities
  – Currency: Site was recently updated
  – Professionalism: Site's grammar, spelling, and punctuation are correct; all links work

Check and Double Check
Remember that a site can have many of the features of legitimacy and still not be authoritative
  – Example: a hoax about the dangers of dihydrogen monoxide (H2O)
Use known authoritative sites to cross-check, or consult respected debunkers (like snopes.com)
When in doubt, check it out. Ask a librarian.
Test your assessment skills… check out the Burmese Mountain Dog web page

Summary
Libraries are excellent primary resource tools
Large libraries have extensive online resources
Libraries not only provide information digitally, they also connect us with "pre-digital" archives: the millions of books, journals, and manuscripts that still exist only in paper form
We need software and our own intelligence to search the Internet effectively

Summary
We create search queries using the logical operators AND, OR, NOT, and specific terms to pinpoint the information we seek
Once we've found information, we must judge whether it is correct by investigating the organization that publishes the page, including checking the credentials of the people who write the content
We must cross-check the information with other sources, especially when the information is important