Locating the right information on the WWW requires effort

Presentation transcript:

Searching the WWW
Locating the right information on the WWW requires effort

Looking In the Right Place
Google is not necessarily the first place to look!
Go directly to a Web site -- www.irs.gov
Go to your bookmarks -- dictionary.cambridge.org
Go to the library -- www.lib.washington.edu
Go to the place with the information you want -- www.npr.org
Ask, “What site provides this information?”
Guessing a site’s URL is often very easy, making it a fast way to find information
8/7/2018 Copyright 2010, Larry Snyder, Computer Science and Engineering

Google Advanced – Use It!

Caution!
In the next few slides, the general principles of keyword search are discussed … Google and Bing “adjust” the results somewhat
8/7/2018 © 2011 Larry Snyder, CSE

Boolean Queries
Search engine words are independent
Words don’t have to occur together
Use Boolean queries and quotes
Logical operators: AND, OR, NOT
  monet AND water AND lilies
  “van gogh” OR gauguin
  vermeer AND girl AND NOT pearl
Search for: Mona Lisa
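The three logical operators can be read as set operations on the pages that contain each word. The following is a minimal Python sketch under stated assumptions: the page names and word lists are invented for illustration, quoted phrases such as “van gogh” are not modeled (they need word positions), and real engines adjust results beyond this.

```python
# Hypothetical mini-collection: page name -> words on that page (invented data).
PAGES = {
    "monet.html": {"monet", "water", "lilies", "painting"},
    "vermeer1.html": {"vermeer", "girl", "pearl", "earring"},
    "vermeer2.html": {"vermeer", "girl", "reading", "letter"},
    "gauguin.html": {"gauguin", "tahiti"},
}


def pages_with(word):
    """Return the set of pages whose word set contains the given word."""
    return {name for name, words in PAGES.items() if word in words}


# AND is set intersection, OR is set union, AND NOT is set difference.
monet_query = pages_with("monet") & pages_with("water") & pages_with("lilies")
either_painter = pages_with("vermeer") | pages_with("gauguin")
no_pearl = (pages_with("vermeer") & pages_with("girl")) - pages_with("pearl")
```

Each query yields a set of page names, so the order of the words in an AND or OR query does not matter in this sketch.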

Queries In Advanced Search
Searching strategies …
Limit by top level domains or format … .edu
Find terms most specific to topic … ibuprofen
Look elsewhere for candidate words, e.g. bio
Use exact phrase only if universal, … “Play it again”
If too many hits, re-query … let the computer work
“Search within results” using “-” … to get rid of junk

Queries, continued
Once found, ask if site is best source
How authoritative is it? Can you believe it?
How crucial is it that the information be true?
  Cancer cure for Grandma
  Hikes around Seattle
  Party game

Search Engines
No one controls what’s published on the WWW ... it is totally decentralized
To find out what is there, search engines crawl the Web
Two parts:
  Crawler visits Web pages, building an index of the content (stored in a database)
  Query processor checks user requests against the index, reports on known pages [You use this!]
We’ll see how these work momentarily
Only a fraction of the Web’s content is crawled

HTML and the Web
As you know, the Web uses the http:// protocol
It’s asking for a Web page, which usually means a page expressed in hyper-text markup language, or HTML
Hyper-text refers to text containing links that allow you to leave the linear stream of text, see something else, and return to the place you left
Markup language is a notation to describe how a published document is supposed to look: fonts, text color, headings, images, etc.

Three Slides: Basics of HTML 1
Rule 0: Content is given directly; anything that is not content is given inside of tags
Rule 1: Tags are made of < and > and used this way:
  <p style="color:red">This is paragraph.</p>
  (start tag carrying an attribute and value, then the content, then the end tag)
It produces: This is paragraph.
Rule 2: Tags must be paired or “self terminated”

Example
Write HTML in a text editor: Notepad++ or TextWrangler
The file extension is .html; show it in Firefox or your browser

Three Slides: Basics of HTML 2
Rule 3: An HTML file has this structure:
  <html>
    <head><title>Name of Page</title></head>
    <body>
      Actual HTML page description goes here
    </body>
  </html>
Rule 4: Tags must be properly nested
Rule 5: White space is mostly ignored
Rule 6: Attributes (style="color:red") preceded by space, name not quoted, value quoted
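Rules 2 and 4 (paired tags, proper nesting) can be checked mechanically with a stack: push each start tag, and require every end tag to match the most recent unclosed start tag. The following is a rough Python sketch using the standard library's html.parser; the class name, the function name, and the short list of tags that take no end tag are assumptions made for this illustration, not part of the slides.

```python
from html.parser import HTMLParser

# Tags that never take an end tag in HTML (a partial list, enough for this sketch).
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records nesting errors."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-terminated tag such as <img ... />: nothing to pair up.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_well_nested(html_text):
    """Return True if every start tag is closed in the reverse order it was opened."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return not checker.errors and not checker.stack
```

With this sketch, the Rule 3 skeleton passes the check, while something like <p><b>bold</p></b> fails because </p> arrives while <b> is still open.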

Three Slides: Basics of HTML 3
To put in an image (.gif, .jpg, .png), use 1 tag:
  <img src="skier.jpg" alt="Skier in Snow"/>
  (tag name, image source, alt description, self-terminating end)
To put in a link, use 2 tags:
  <a href="http://www.cs.uw.edu/cse120">Pilot </a>
  (a is the anchor tag; href is the hyper-text reference, the link)
More on HTML (including good tutorials) at http://www.w3schools.com/html/default.asp

Return To Search Engines
How to crawl the Web:
Begin with some Web sites, entered “manually”
Select a page not yet crawled; look at its HTML
For each keyword, associate it with this page’s URL, as in
  http://www.cs.uw.edu/cse120/example : downhill
  http://www.cs.uw.edu/cse120/example : skiing
Harvest words from the URL and inside <title> tags …
For every link tag on the page, associate the link’s URL with the words inside of the anchor text, that is,
  http://www.cs.uw.edu/cse120/ : pilot
Save all links and add them to the list to be crawled
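The per-page part of the crawl can be sketched in Python with the standard library's html.parser. This is an illustration under stated assumptions, not how any real crawler is written: the class and function names are invented, the sample page text is made up (loosely modeled on the slide's example), and no network access is attempted.

```python
from html.parser import HTMLParser
import re


class PageScanner(HTMLParser):
    """Collects title words, body words, and (link URL, anchor word) pairs."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.current_href = None   # href of the <a> tag we are inside, if any
        self.title_words = []
        self.page_words = []
        self.anchor_terms = []     # (linked URL, word) pairs
        self.links = []            # URLs to add to the crawl list

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            self.current_href = dict(attrs).get("href")
            if self.current_href:
                self.links.append(self.current_href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        words = re.findall(r"[a-z0-9]+", data.lower())
        if self.in_title:
            self.title_words.extend(words)
        elif self.current_href:
            # Anchor text describes the page being linked to, not this page.
            self.anchor_terms.extend((self.current_href, w) for w in words)
        else:
            self.page_words.extend(words)


def scan_page(html_text):
    """Parse one page and return the scanner holding what was harvested."""
    scanner = PageScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner


# Hypothetical page content for illustration only.
EXAMPLE_HTML = """<html><head><title>Skiing Example</title></head>
<body><p>Downhill skiing is fun.</p>
<a href="http://www.cs.uw.edu/cse120">Pilot </a></body></html>"""

result = scan_page(EXAMPLE_HTML)
```

A full crawler would repeat this for each URL in result.links that has not been visited yet, which is the “save all links” step.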

Net Result From Crawling A Page
After crawling a page like
  http://www.cs.washington.edu/education/courses/cse120/11wi/freeProgramming.html
the crawler will associate many terms with the URL: Picasso, Chelsey, Tron, … as well as Free, Programming [from anchor] and cse120 [from URL]
Terms from URL and anchor are more important in describing the page

Net Result of Crawling All Pages
When the crawling is “done” (it’s never done), the result is an index, a special data structure that a query processor can use to look up your queries:
  Free: …, www.cs.washington.edu/cse120/freeProgramming.html, …
  Programming: …, www.cs.washington.edu/cse120/freeProgramming.html, …
  Picasso: …, www.cs.washington.edu/cse120/freeProgramming.html, …
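An index of this shape is often called an inverted index: instead of URL to terms, it maps each term to the URLs associated with it. The following is a minimal Python sketch; the function name is invented, and the two www.example.edu entries are hypothetical pages added only so that some terms have more than one URL.

```python
from collections import defaultdict

# Hypothetical crawl output: URL -> terms the crawler associated with it.
CRAWLED = {
    "www.cs.washington.edu/cse120/freeProgramming.html": ["free", "programming", "picasso"],
    "www.example.edu/art.html": ["picasso", "monet"],
    "www.example.edu/code.html": ["programming", "python"],
}


def build_index(crawled):
    """Invert URL -> terms into term -> sorted list of URLs."""
    index = defaultdict(set)
    for url, terms in crawled.items():
        for term in terms:
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}


INDEX = build_index(CRAWLED)
```

Looking up a word is then a single dictionary access, which is why the query processor can answer without revisiting any page.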

Make A Query
When Google gets the query
It “ands” the two lists (one per query word) together, finding URLs that are on both lists
It counts them up, records time, shows 10 hits
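The “and” step amounts to intersecting the URL lists for the query words. The following Python sketch uses an invented index fragment and a hypothetical run_query function; it sorts hits alphabetically only to make the output stable, whereas a real engine orders them by rank.

```python
import time

# Hypothetical index fragment: term -> list of URLs (invented entries).
INDEX = {
    "free": ["a.example/free.html", "b.example/prog.html", "c.example/both.html"],
    "programming": ["b.example/prog.html", "c.example/both.html", "d.example/code.html"],
}


def run_query(index, words, page_size=10):
    """AND the URL lists for all query words.

    Returns (total hit count, elapsed seconds, first page of hits).
    """
    start = time.perf_counter()
    lists = [set(index.get(word.lower(), [])) for word in words]
    hits = sorted(set.intersection(*lists)) if lists else []
    elapsed = time.perf_counter() - start
    return len(hits), elapsed, hits[:page_size]


count, seconds, first_page = run_query(INDEX, ["Free", "Programming"])
```

A word that is missing from the index contributes an empty list, so the whole AND query returns no hits.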

Houston, We Have A Problem
You want the most likely hits … how does Google show you what you want?
Page Rank – a mechanism to estimate the “importance” of a page; pages are listed by page rank, highest to lowest

Page Rank
Google has never revealed all details of the ranking algorithm, but we know …
URLs are ranked higher for words that occur in the URL and in anchors
URLs get ranked higher if more pages point to them; it’s like: A links to B is a vote by A for B
URLs get ranked higher if the pages that point to them are ranked higher
We Are Hit #25
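The “links are votes, and votes from important pages count more” idea can be sketched as a repeated redistribution of scores over the link graph. The Python below is a textbook-style simplification, not Google's actual ranking, whose details are not public. The damping value of 0.85, the iteration count, the function name, and the four-page graph are all assumptions chosen for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Estimate importance scores; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of incoming links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and C all link to D; D links back to A.
LINKS = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
RANKS = page_rank(LINKS)
```

In this graph D ends up ranked highest because three pages vote for it, and A outranks B and C because its single vote comes from the highly ranked D.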

Crawling/Querying Personally
Virtual Folders are a “crawling/querying” technology that helps you
  Mac: Smart Folders
  PC: Saved Folders
In both cases your files are “indexed”, that is, crawled, and the query you make results in a smart folder of the files that “hit”
It’s like Googling the stuff on your own computer

Query “thesis”
The folder doesn’t exist … it just contains links to the files shown
Very convenient!

Search Engines … A Summary
A search engine has two parts
  Crawler, to index the data
  Query Processor, to answer queries based on index
In the case of many hits, a query processor must rank the results; page rank does that by
  “using data differentially” … not all associations are equivalent; anchors and file names count more
  “noting relationship of pages” … a page is more important if important pages link to it
Google, Bing, Yahoo and other Search Engines Use All of These Ideas