Presentation is loading. Please wait.

Presentation is loading. Please wait.

Fluency with Information Technology

Similar presentations


Presentation on theme: "Fluency with Information Technology"— Presentation transcript:

1 Fluency with Information Technology
7th Edition Chapter 5 Locating Information on the World Wide Web

2 Learning Objectives 1.1 Explain how a web search engine works
1.2 Use logical operators, filtering to express complex queries 1.3 Find information by using a search engine 1.4 Recognize and find authoritative information sources 1.5 Decide whether web information is truth or fiction

3 1.1 Search Engine Fundamentals
A search engine is a collection of computer programs that help us find information on the Web No one organizes the information posted on the Web Search engines look around to find out what’s out there, and then organize what is found

4 How a Search Engine Works
In the first step, crawling, the search engine visits every web page that it can find The crawler has a to-do list that is loaded with a set of pages to start When a URL is found while crawling a page, it adds that URL to the to-do list The main work of the crawler is to build an index

5 Building an Index The index is a list of tokens, or words, that are associated with the page The token might be part of the page’s title There are other ways for a token to be associated with a page For each token, the crawler creates a list of the URLs associated with that token

6 The user presents tokens (aka search terms) to the query processor
Query Processing The user presents tokens (aka search terms) to the query processor The search engine then looks up the word in the index and returns a hit list By creating the index ahead of time, search engines are able to answer user queries with hit lists very quickly

7 Multiple-word searches
With a multiple-word query, the pages returned should be appropriate for all of the queried words AND-query Each page returned should be associated with all the words There are only index entries for the individual words

8 Intersecting Queries For multiple words, the query processor fetches the index lists for each of the terms URLs that are in all of the lists are looked at and compared The query processor intersects the lists The URL lists are alphabetized to speed up the processing Easier to notice when the same URL is on multiple lists

9 Rules for Intersecting Alphabetized Lists
To intersect several alphabetized lists: Put a marker (arrow) at the start of each token’s index list If all markers point to the same URL, save it, because all tokens are associated with the page Move the marker(s) to the next position for whichever URL is earliest in the alphabet Repeat steps 2–3 until some marker reaches the end of the list

10 Figure 5.2 Intersecting Alphabetized Lists
Illustrating the Intersecting Alphabetized Lists rules: In each step (row of boxes), one or more arrows advance; notice Step 3 where the arrow has advanced in two lists in the same step because the earliest URL is on both lists.

11 The Power of Indexed Search
The search engine: takes the time to crawl the data (web pages) builds an index Finds the index entries for each word intersects the lists to find the information for an AND-query Search engines can review billions of web pages and return an answer to a query in a quarter of a second once the index is built

12 What Comes First: Terms Impacting Relevancy
Tokens can be found in different descriptive places. Some that impact relevancy are: Title: The <title> tag encloses a short phrase describing the whole page Anchor text: The link text, inside <a> tags, describes the page it links to Meta tag: A <meta> tag in the <head> section can hold a description of the page Alt attributes: The <img> tag has an alt attribute that gives a textual description <h1> tags: Text of top-level headers

13 PageRank (aka Relevancy)
Why, when the hit list is returned, the page you’re looking for is often first on the hit list or in the top 10? The order in which hits are returned to a query is determined by a number called the PageRank The higher the PageRank, the closer to the top of the list

14 Part of PageRank is Voting Done by Links
Pioneered by Google, page ranking works like a voting system: If page A links to page B, A’s link adds to B’s importance Pages that are linked to by many pages have a higher page ranking and are assumed to be more important Links from pages with a high page ranking are also viewed as more important PageRank is computed by the crawler Counting the number of links to a page is not sufficient

15 Link Voting Complete; Hit List Created
After the crawling is completed, the PageRank computation is completed The query processor puts together the hit list The URLs are sorted by their page ranking, highest to lowest, and returned in that order

16 1.2 Using Logical Operators: AND-queries
All basic queries are AND-queries AND is a logical operator (or Boolean operator) A logical operator specifies a logical relationship between the words it connects For AND, all words submitted in the search must be associated with the page for a hit Search engines treat search words as independent words, which can appear anywhere on the page in any order Search engines insert AND between words by default, so that human powered flight is searched for as: human AND powered AND flight

17 More Advanced Searches: OR-queries
OR-queries hit on pages that are associated with at least one of the words marshmallow OR strawberry OR chocolate This search will find hits on any page with: Just marshmallow Just strawberry Just chocolate Or any combination of all three

18 AND-queries A Venn diagram showing the small set returned when searching using AND-queries

19 OR-queries .A Venn diagram showing the large set returned when searching using OR-queries

20 Complex Queries: AND, OR, and NOT in combination
Another logical operator is NOT NOT queries specify words that are not to be associated with the page Logical operators work like arithmetic, and can be combined and grouped using parentheses: (marshmallow OR strawberry) AND sundae NOT shake

21 Advanced Searching: Using Filters
Many advanced search boxes allow you to enter additional constraints, or filters These limit your results to specified languages, dates, sites or regions, and so on Constraints can be used to help narrow your search and pinpoint specific results

22 1.3 Finding Information Using a Search Engine
Research on the Web requires serious consideration and strategies: Selecting search terms: Choose useful words to include in a query The anatomy of a hit: how to use the information returned Using the hit list: skimming to find what you want quickly Once you find a likely page: locating the desired data on the page

23 Selecting Search Terms
Try these steps: Use advanced search Begin with a general topic Choose descriptive terms Refine by adding words Avoid overconstraining Remove specific words

24 1. Use Advanced Search Advanced search functions give you more precise control over filters and logical operators Search results can be narrowed before the search results come back

25 2. Begin with the General Topic
Words have multiple meanings, so adding the topic to your search can eliminate most of those conflicting words Adding programming to a search for Java will narrow your search in the right direction if you are looking for information about the programming language and not the island

26 3. Choose Descriptive Terms
As you search, take note of how the search terms show in your results and adjust your search to use similar terms when the results are more relevant Be precise: using "framing hammer" instead of "hammer" for learning about building houses will give you more relevant results

27 4. Refine by Adding Words Begin with a “first guess” search
Check the initial results to see what you’ve found The initial list often suggests additional terms to search Consider going through several rounds of adding a word to the query each time This should give you fewer, more relevant results

28 5. Avoid Overconstraining
Adding more words one at a time works best because interesting pages might be accidentally overlooked in a larger set of results Care is required in this process: Add a word only if you are sure the pages you want will have it Don't eliminate interesting pages by adding words too quickly

29 6. Remove Specific Words You can remove words in your query using NOT (or, in some search engines, the minus sign) It is useful to consider eliminating pages with certain words: "I want all of the pages that have arrows AND traffic AND signs NOT cupid" NOT is a good way to eliminate wrong interpretations of words

30 Getting Results: The Anatomy of a Hit
You'll usually see: The title: This is the text between the page’s <title> and </title> tags A snippet A preview of what might be on the page Usually a short phrase from the page containing one or more of the searched words The URL (linked from the title)

31 Using the Hit List (Your Search Results)
Skim the top level of information, and look deeper if it looks promising. Start at the top and go in order:: Title Snippet Search terms are shown in bold, the context around the occurrence is given URL The domain of the site hosting the page is given, and is the first check of how authoritative the information is Page If everything looks good at this point, there is some likelihood that the page includes information you need

32 Once You Find a Likely Page
First, “roll through the page” checking out its main features Next, look for a date to see how current it is Finally, find the location of one of the keywords using ^F to open a search box for that page If the site is good, decide how important it is that the information is perfectly correct: If the importance is high, cross-check the information with another site Find corroborating information! If you’ve found what you want, you’re done If not, return to the hit list

33 1.3 Finding and Recognizing Authoritative Information
Don’t believe everything you read No one in charge of the WWW, so no one checks to see that the information is correct Some information is inaccurate Most information on the Web should be considered suspect Anyone can post a web page (or a full site) and make random statements or claims that can be true, partly true, or completely false

34 Sorting out the Bad from the CRAAP
Librarians developed this test for finding reliable information on the Web – look for: Currency: how recently the information was updated Relevancy: whether or not the information is related to your search Authority: who created the page/site, and are they generally viewed as a trusted resource Accuracy: If the information is supported by evidence, using an unbiased tone, and free from spelling or typographical errors Purpose: if the page/site is trying to sell you something or persuade you

35 What is Authoritative? Authoritative means that we are looking for what experts say We assume that experts are well informed on the topic, and that what they experts say is true There is no way to verify the truth without becoming experts ourselves, so we accept that it’s the best available information

36 Who is Authoritative: Respected Sources
Thousands of professional organizations host websites we can trust, like the Mayo Clinic's site or that of the World Health Organization Individuals can also be good sources of information, as long as we have some reason to believe they are reliable Why might it be difficult to know if a government site in your own country or elsewhere is trustworthy on a particular topic?

37 1.5 Separating Truth from Fiction
Perform a site analysis. Look for: Broken links Failure to give contact information Failure to have a non-web identity No recent updates or blog entries Spelling or grammar mistakes Remember: Legitimate sites may have these issues, too! To decide if a site is legitimate, do a little direct research on the information found there, and cross-check it with another resource

38 Summary Learned the key parts of a search engine: the crawler and the query processor Created basic and complex queries using the logical operators AND, OR, and NOT, using specific terms to pinpoint the information we seek Followed the CRAAP test for verifying information on the Web Identified the hallmarks of authoritative resources and legitimate sites


Download ppt "Fluency with Information Technology"

Similar presentations


Ads by Google