And Beyond How to Find What You’re Looking for on the Internet.


1 And Beyond How to Find What You’re Looking for on the Internet

2 How to find information? Think of the Internet as the world's biggest library – but instead of books, its shelves contain billions of individual web pages. Imagine being in such a vast library. It would take forever to find what you were looking for!

3 Two Categories of Search Tools Search Engines – Individual search engine – Meta-search engine (all-in-one) Subject Directories – Developed and maintained by humans (like reference librarians) – Google Directory shut down in 2011 – Yahoo Directory shut down in 2014 – http://www.ipl.org/ (Internet Public Library) closed in 2015 (after 20 years)

4 What is a Web Search Engine? Search Engine – a program that searches for and identifies items in a database that correspond to keywords or characters specified by the user, used especially for finding particular sites on the World Wide Web. Search engines are special websites that have indexed billions of pages and make it easy for you to find a website or page in an instant. Popular search engines include Google, Yahoo!, Bing and Ask.

5 What is a Web Search Engine for? A search engine finds content on the web that matches the search query you’ve entered. A query is used to extract data from the database in a readable format according to the user's request. The more accurately and quickly a search engine can match your search query with a good answer, the happier you are as a user of that search engine.

6 Keyword Search Engines Individual search engines use computer programs called “spiders” to match key search words with the web pages that contain them. – Returns a large volume of results – Information is not filtered for validity, authenticity, or adult content – Results are returned in the form of links to sites that match terms used in the search

7 Google’s Dominant Share of Search (2015)

8 Google Corporate Background Google.com was launched in 1998 with an initial investment of $1 million and 8 employees. Google's mission is to “organize the world's information and make it universally accessible and useful.” Google's founders Larry Page and Sergey Brin developed Google in a Stanford University dorm room, and it is currently the world's largest search engine.

9 Google Corporate Background “Google” is a play on the word “googol,” which is the mathematical term for 1 followed by 100 zeros and “…reflects the company's mission to organize the immense, seemingly infinite amount of information available on the web.”

10 How Search Engines Work 1. Crawling – Discovery 2. Indexing – Database (in data centers) 3. User Search – keyword, questions, etc. 4. Presentation and Ranking – What we see (top secret) How Search Works_Google

11 How Google Finds New Pages Google has special programs called spiders or robots (a.k.a. “Google bots”) that constantly search the Internet looking for new or updated Web pages. It’s a never-ending process. When a spider finds a new or updated page, it reads that entire page, reports back to Google, and then visits all of the other pages to which that new page links. Image source: http://www.disobey.com/
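The "read a page, then visit every page it links to" loop described on this slide can be sketched as a breadth-first walk over links. This is only an illustration, not Google's actual crawler: the `TOY_WEB` dictionary and its page names are made up, and the sketch follows links in memory instead of downloading real pages.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "parks"],
    "about": ["home"],
    "parks": ["fantasyland", "pirates"],
    "fantasyland": ["parks"],
    "pirates": [],
    "orphan": ["home"],  # no page links here, so a crawl from "home" never finds it
}

def crawl(start, web):
    """Visit every page reachable from `start` by following links, breadth-first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)              # "read" the page
        for link in web.get(page, []):  # then queue each page it links to
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

visited = crawl("home", TOY_WEB)
print(visited)
```

Note that the `orphan` page is never visited: a spider that only follows links can discover a page only if some already-known page links to it.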

12 How Google Finds New Pages – Collects copies of web pages – Assembles and organizes them into an indexed database – Stores them in data centers around the world

13 Search Engines – Collect copies of web pages from various host servers – Assemble them into a database – Display them according to relevancy and rank – Use programs (spiders) that crawl the web through links to search and retrieve web sites – Use an algorithm to determine rank; it is a company ‘trade secret’ – Examples: Google, Ask Jeeves, Bing, Yahoo

14 Google’s Cache When the spider reports back to Google, it doesn’t just tell Google the new or updated page’s URL. The spider also sends Google a complete copy of the entire Web page – HTML, text, images, etc. Google then adds that page and all of its content to Google’s cache.

15 So What? When you search Google, you’re actually searching Google’s cache of Web pages. And because of this, you can search for more than text or phrases in the body of a Web page. Google has some secret, advanced search operators that let you search specific parts of Web pages or specific types of information. Source: Google Hacks, p. 5

16 How Indexing Works Indexing is the process of taking all of the data collected in a crawl and placing it in a big database, just like a library card catalog.
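The card-catalog idea can be sketched as an inverted index: a table that maps each word to the pages containing it. This is a minimal sketch under stated assumptions, not how any real search engine stores its data; the `PAGES` texts and page ids are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages (page id -> page text).
PAGES = {
    "page1": "Disney opened Fantasyland in 1955",
    "page2": "Pirates of the Caribbean is a Disney attraction",
    "page3": "Pirates sailed the Caribbean",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of page ids on which it appears."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

index = build_index(PAGES)
print(sorted(index["disney"]))
print(sorted(index["pirates"]))
```

Looking a word up in `index` is like pulling one card from the catalog: the answer is ready without rereading every page.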

17 How Google Indexing Works When you search for multiple keywords, Google first searches for all of your keywords as a phrase. So, if your keywords are disney fantasyland pirates, any pages on which those words appear as a phrase receive a score of X.

18 How Google Works – Adjacency Google then measures the adjacency between your keywords and gives those pages a score of Y. What does this mean in English? Well … Image source: Google. Source: Google Hacks, p. 21

19 How Adjacency Works A page that says “My favorite Disney attraction, outside of Fantasyland, is Pirates of the Caribbean” will receive a higher adjacency score than a page that says “Walt Disney was both a genius and a taskmaster. The team at WDI spent many sleepless nights designing Fantasyland. But nothing could compare to the amount of Imagineering work required to create Pirates of the Caribbean.”

20 How Google Works – Weights Then, Google measures the number of times your keywords appear on the page (the keywords’ “weights”) and gives those pages a score of Z. A page that has the word disney four times, fantasyland three times, and pirates seven times would receive a higher weights score than a page that only has those words once. Source: Google Hacks, p. 21

21 Putting it All Together Google takes – The phrase hits (the Xs), – The adjacency hits (the Ys), – The weights hits (the Zs), and – About 100 other secret variables. It throws out everything but the top 2,000, multiplies each remaining page’s individual score by its “PageRank” and, finally, displays the top 1,000 in order according to relevancy and rank.
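The three signals from the preceding slides (phrase, adjacency, weights) can be illustrated with a toy scorer. The real formulas are not public, so every function here is a hypothetical stand-in: `phrase_hit` checks for the exact phrase, `adjacency_span` measures the smallest window of words that contains every keyword (smaller means closer together), and `weight` counts occurrences.

```python
import re
from itertools import product

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_hit(words, keywords):
    """X (assumed form): 1 if the keywords appear consecutively and in order, else 0."""
    n = len(keywords)
    return int(any(words[i:i + n] == keywords for i in range(len(words) - n + 1)))

def adjacency_span(words, keywords):
    """Y (assumed form): size of the smallest word window holding every keyword.

    Returns None if a keyword is missing. A smaller span means closer keywords.
    """
    positions = [[i for i, w in enumerate(words) if w == k] for k in keywords]
    if not all(positions):
        return None
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))

def weight(words, keywords):
    """Z (assumed form): total number of keyword occurrences on the page."""
    return sum(words.count(k) for k in keywords)

KEYWORDS = ["disney", "fantasyland", "pirates"]
PAGE_A = "My favorite Disney attraction, outside of Fantasyland, is Pirates of the Caribbean"
PAGE_B = ("Walt Disney was both a genius and a taskmaster. The team at WDI spent many "
          "sleepless nights designing Fantasyland. But nothing could compare to the amount "
          "of Imagineering work required to create Pirates of the Caribbean.")

words_a, words_b = tokenize(PAGE_A), tokenize(PAGE_B)
span_a = adjacency_span(words_a, KEYWORDS)
span_b = adjacency_span(words_b, KEYWORDS)
print(span_a, span_b)  # the first page keeps the keywords much closer together
```

With the two example pages from the adjacency slide, the first page has the smaller span, which matches the slide's claim that it earns the better adjacency score.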

22 The Anatomy of a Search Like the “index” at the end of a book Web page or doc title, URL, snapshot, similar & related searches

23 Google – PageRank? There is a premise in higher education that the importance of a research article can be judged by the number of citations (references) to it from subsequent articles in the same field. Google PageRank (Google PR) is one of the methods Google uses to determine a page's relevance or importance. Important pages receive a higher PageRank and are more likely to appear at the top of the search results.

24 Google – PageRank? Google interprets a link from page A to page B as a vote, by page A, for page B. Google looks at considerably more than the sheer volume (quantity) of votes, or links a page receives; for example, it also analyzes the page that casts the vote (quality). Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.” PageRank As Votes
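The "links as votes" idea can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's production ranking: the damping value of 0.85 is the commonly cited default, and the five-page `LINKS` graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page splits its score among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its vote over every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to (every target is also a key).
LINKS = {
    "A": ["B"],
    "B": ["C"],
    "C": ["B"],
    "D": ["B"],
    "E": ["A"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Pages A and C each receive exactly one link, but C's single vote comes from the heavily linked page B, so C ends up ranked well above A. That is the "quality of the voter" effect the slide describes.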

25 Meta-search Engines Meta-search engines send requests for information to several search engines simultaneously and compile the results. – Duplicates are eliminated, thus yielding fewer results These search engines are useful if you need to run a comprehensive search quickly across a number of different engines
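The compile-and-deduplicate step can be sketched as follows. The two result lists are invented stand-ins for what separate engines might return, and the round-robin merge order is an assumption; real meta-search engines may combine rankings differently.

```python
from itertools import zip_longest

def meta_search(result_lists):
    """Interleave ranked result lists and drop duplicate URLs, keeping first sightings."""
    seen = set()
    merged = []
    # Take the top result from each engine, then the second from each, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical ranked results from two engines.
ENGINE_A = ["a.example/1", "b.example/2", "c.example/3"]
ENGINE_B = ["b.example/2", "d.example/4"]

merged = meta_search([ENGINE_A, ENGINE_B])
print(merged)
```

Five raw results shrink to four because `b.example/2` was returned by both engines.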

26 Google doesn’t understand “natural language.” How you search can be as important as what you search for.

27 Four Search Strategies – Keyword Searching – Boolean – Question – Advanced (Basic Search Strategies_GCF)

28 Search Math – Boolean Operators – AND (+), the DEFAULT: a limiter – OR: an expander – NOT (–): a limiter – Phrase (“ ”): a limiter

29 All Search Engines Use Boolean Search Strategy – a AND b (a limiter): disney AND pirates – a OR b (an expander): fantasyland OR adventureland – a NOT b (a limiter): pirates NOT fantasyland
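The three Boolean operators behave like set operations on the pages that contain each word. In this sketch the page numbers in `PAGES_WITH` are made up; only the set logic is the point.

```python
# Hypothetical index: keyword -> set of page numbers containing that keyword.
PAGES_WITH = {
    "disney": {1, 2, 3, 4},
    "pirates": {2, 3, 5},
    "fantasyland": {3, 4, 6},
    "adventureland": {5, 7},
}

# AND keeps only pages that contain both words (a limiter).
and_result = PAGES_WITH["disney"] & PAGES_WITH["pirates"]

# OR keeps pages that contain either word (an expander).
or_result = PAGES_WITH["fantasyland"] | PAGES_WITH["adventureland"]

# NOT removes pages that contain the second word (a limiter).
not_result = PAGES_WITH["pirates"] - PAGES_WITH["fantasyland"]

print(sorted(and_result), sorted(or_result), sorted(not_result))
```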

30 Boolean Searching – AND (Default) Enter words connected with AND – the results include only sites where both words are found. Uses: joining different topics (e.g., president AND Washington)

31 Boolean Searching – OR (uppercase) Enter words connected with OR – requires that at least one of the terms is found. Uses: joining similar or synonymous topics (e.g., “global warming” OR “greenhouse effect”)

32 Search for Similar Words – the ~ operator searches for all pages that include the word and all appropriate synonyms. Samples: ~elderly finds pages that include not just the word “elderly,” but also the words “senior,” “aged,” and “nursing homes”; ~pets finds pages that include “cats,” “dogs,” and “rabbits.”

33 Where IS that “~” key?

34 Three Ways to OR at Google (Linda J. Goff - Fall 2005 - http://library.csus.edu) – Just type OR between keywords: disney fantasyland OR “pirates of the caribbean” – Put your OR statement in parentheses: disney (fantasyland OR “pirates of the caribbean”) – Use the | (“pipe”) character in place of the word OR: disney (fantasyland | “pirates of the caribbean”) All three methods yield the exact same results. Source: Google Hacks, p. 3
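If you build queries in a script, the parenthesized OR form is easy to generate. The helper names `quote_if_phrase` and `or_group` are hypothetical, and the sketch only assembles query text using straight quotation marks; it does not contact any search engine.

```python
def quote_if_phrase(term):
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term

def or_group(terms, separator=" OR "):
    """Join alternatives into one parenthesized OR statement."""
    return "(" + separator.join(quote_if_phrase(term) for term in terms) + ")"

alternatives = ["fantasyland", "pirates of the caribbean"]
query = "disney " + or_group(alternatives)
pipe_query = "disney " + or_group(alternatives, separator=" | ")

print(query)
print(pipe_query)
```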

35 Where IS that “pipe” key? Located just above the enter (return) key. Don’t forget to shift!

36 Boolean Searching – NOT (uppercase) Enter words connected with NOT – searches for the first term and excludes sites that have the second term. Uses: excluding an unwanted topic or meaning (e.g., Washington NOT school)

37 Boolean Searching “–” To exclude a word from your search, use the “–” operator Include a space before the – but not after To exclude many words from your search, use multiple “–” operators (e.g., Washington –state –school –street)

38 Phrases To search for a phrase, put it in quotes. For example, disney adventureland “pirates of the caribbean” – This would show you all the pages in Google’s index that contain the word disney AND the word adventureland AND the phrase pirates of the caribbean (without the quotes) Source: http://www.google.com/help/refinesearch.html

39 Boolean – (Nesting) and NEAR – In a search string, terms placed in parentheses are searched first – Parentheses must be used to group items if another Boolean operator is being used – NEAR may be used to require words to be found within 16 words of each other in the pages returned – E.g., (vegan NOT vegetarian) AND (cooking OR recipes) – E.g., (“global warming” NEAR “sea level rise”) AND “pacific coast”
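A proximity test like NEAR can be sketched by comparing word positions. This sketch assumes single-word terms and treats "within 16 words" as a position difference of at most 16; a search engine that offers NEAR may count differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(text, term_a, term_b, max_distance=16):
    """Return True if the two single-word terms occur within max_distance words."""
    words = tokenize(text)
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

close_text = "Global warming is expected to raise the sea level along the coast"
far_text = "warming " + "filler " * 20 + "sea"  # 20 words sit between the terms

print(near(close_text, "warming", "sea"))
print(near(far_text, "warming", "sea"))
```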

40 Other Google Searching Tips – Common words are ignored (that, to, which, a, the, …) – Google won’t accept more than 32 keywords at a time – Searching is not case sensitive – Google does support stemming or truncation (“wildcards”) to obtain variations on the root word (e.g., schedul*) and different spellings – Because Google searches for phrases first, the order of your keywords matters

41 Searching Tips – Use more than one search tool or meta-search engine; the results will differ – Bookmark search results if you think you might use them again! – A question may be entered in the search field of a search engine; Ask Jeeves is a search engine that encourages the use of question searching

42 Advanced Search Advanced search features are offered on many engines by going to an “Advanced Search” page and making selections This is effective in narrowing search returns to a specific topic or phrase Google Advanced Search https://www.google.com/advanced_search

43 https://www.google.com/advanced_search

44 In Summary – Be specific: use more keywords & Boolean operators – Search terms are not case sensitive – Google’s Boolean default is AND (no need to type it) – Use minus or NOT to exclude keywords – Use OR to expand a search – Put quotes around a name or a phrase you want to search; the order of your keywords in quotes matters – Google supports truncation or stemming (wildcard*) How Search Works-BrainPop

45 Search Smarter Use Google if you use a search engine! It’s the best. Google.com Put quotes around a name or a phrase you want to search Use the plus sign to add to the search and get more specific Check the Google results page – check for the search terms in bold, read summaries – make sure you can use it.

46 What are you really looking for? Words that will definitely be on a website with the answer to your question Words that might be there (or will narrow your result) If at first you don’t succeed... search, search again!

47 Tune Your Search with Other Operators (Operator | Use | Usage)
allinanchor: | Restricts search to words in the link text on web pages (with multiple keywords) | allinanchor:keyword1 keyword2
allintext: | Restricts search to the body text of web pages (with multiple keywords) | allintext:keyword1 keyword2
allintitle: | Restricts search to the titles only of web pages (with multiple keywords) | allintitle:keyword1 keyword2
allinurl: | Restricts search to web page addresses (with multiple keywords) | allinurl:keyword1 keyword2
filetype: | Restricts search to files of a specified type | filetype:extension
inanchor: | Restricts search to words in the link text on web pages | inanchor:keyword
intext: | Restricts search to the body text of web pages | intext:keyword
intitle: | Restricts search to the titles only of web pages | intitle:keyword
inurl: | Restricts search to web page addresses | inurl:keyword
site: | Restricts search to a specific domain or website | site:domain
related: | Displays pages that are in some way similar to the specified page | related:site
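Operators like these are just text typed into the search box, so a query can also be assembled in code and turned into a search link. In this sketch `build_query` and `search_url` are hypothetical helper names, exclusions are written with a plain hyphen, and the only assumption about Google itself is that the search page reads the query from the `q` parameter of its URL.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_query(keywords, site=None, filetype=None, intitle=None, exclude=()):
    """Combine plain keywords with a few of the operators from the table above."""
    parts = [keywords]
    if intitle:
        parts.append(f"intitle:{intitle}")
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.extend(f"-{word}" for word in exclude)  # space before the minus, none after
    return " ".join(parts)

def search_url(query):
    """Percent-encode the query into a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = build_query('"global warming"', site="gov", filetype="pdf", exclude=["school"])
url = search_url(query)
print(query)
print(url)
```

Decoding the URL again with `parse_qs` gives back the same query text, which is a quick way to confirm the quotes and colons were encoded safely.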

49 Limit to government sites: add site:gov

50 Limit to edu institution sites: add site:edu

51 A Word on Domain Types. Restricted domains: .edu – Limited to post-secondary accredited educational institutions; .gov – Limited to U.S. government agencies; .mil – Limited to use by the U.S. military. Unrestricted domains: .com, .org, .info, .net – Anyone can register a site with one of these domains – Take the information with a grain of salt (maintain a degree of skepticism about its truth). In www.google.com, “.com” is the top-level domain. There are MANY other domains, including country-specific domains (e.g., .au, .ca).
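Checking a link's top-level domain is easy to automate. This sketch simply takes the text after the last dot of the host name and compares it with the three restricted domains named on the slide; the function names are hypothetical, and a restricted domain is a hint about who runs a site, not proof that its content is accurate.

```python
from urllib.parse import urlparse

# Restricted top-level domains listed on the slide.
RESTRICTED = {
    "edu": "post-secondary accredited educational institutions",
    "gov": "U.S. government agencies",
    "mil": "the U.S. military",
}

def top_level_domain(url):
    """Return the text after the last dot of the host name, e.g. 'com'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()

def is_restricted(url):
    """True if the site's top-level domain is one of the restricted ones."""
    return top_level_domain(url) in RESTRICTED

for link in ["http://library.csus.edu", "https://www.google.com", "https://archive.org/"]:
    print(link, top_level_domain(link), is_restricted(link))
```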

53 Search within a particular site

54 Use Google as a dictionary or an encyclopedia.

55 Google is the new spell check.

56 Timer & Stopwatch

57 Knowledge Engine -Wolfram|Alpha Wolfram Alpha Example

58 Things Wolfram|Alpha Can Do

59 Ever wonder what the Internet looked like in 1996? Is there a backup for the Internet?

60 Wayback Machine https://archive.org/
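Archived copies in the Wayback Machine are reachable through addresses that put a timestamp in front of the original URL. The sketch below only builds such a link as text. It assumes the `https://web.archive.org/web/<timestamp>/<url>` pattern with a 14-digit `YYYYMMDDhhmmss` timestamp, and that the archive shows the capture closest to that moment if one exists; `wayback_url` is a hypothetical helper name.

```python
def wayback_url(page_url, timestamp="19961231000000"):
    """Build a Wayback Machine link for page_url near the given timestamp.

    The timestamp format YYYYMMDDhhmmss is an assumption stated in the text above.
    """
    return f"https://web.archive.org/web/{timestamp}/{page_url}"

link = wayback_url("http://www.example.com/")
print(link)
```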

61 Broken Links

63 Limit results by eliminating keywords

64 Keywords with many meanings

65 Limit results by eliminating keywords

67 Picture Perfect – Image Search Image: 'Across the Chinese Countryside' http://www.flickr.com/photos/95572727@N00/1341027869

68 The Power of the Tools Image Search

81 The Power of the Tools

83 Evaluating a Web Site How do you know what to believe? Evaluating Information on the Internet a Tutorial

84 The Problem? – Quality Control Anyone can publish ANYTHING they want online, including on Wikipedia (you can self-publish online: you don’t need an editor or a publisher, just some web space). Some sites are MEANT to mislead – they have a hidden agenda

85 A Word on Domain Types. Restricted TLDs: .edu – Limited to post-secondary accredited educational institutions; .gov – Limited to U.S. government agencies; .mil – Limited to use by the U.S. military. Unrestricted TLDs: .com, .org (no longer restricted to nonprofits), .info, .net – Anyone can register a site with one of these domains – Take the information with a grain of salt (maintain a degree of skepticism about its truth). In www.google.com, “.com” is the top-level domain (TLD). There are MANY other domains, including country-specific domains (e.g., .au, .ca).

86 Remember… – Question your source: just because it’s online doesn’t mean it’s true – Review your search results: make sure you’ve found what you need – Search smarter, find results faster – If you’re in doubt, throw it out. Media Literacy

