Hypertext Markup Language
INFO/CSE 100 Fluency in Information Technology
If you’ve started homework 2, you may already have found some tools for making Web pages that let you enter formatted text just as you would in a word processor, without having to know anything about how the formatting is handled. Editing software like this is referred to as WYSIWYG: what you see is what you get. But there are limits to what you can do with these tools, and you’ll likely run into those limits fairly quickly. Today, we’re going to look at what’s really in those Web pages, so that if you have to, you can go beyond what the WYSIWYG editors can do.
11/19/2018 fit html © University of Washington
Readings and References
Fluency with Information Technology
Chapter 4, Marking up with HTML
References
World Wide Web Consortium
fit100-05-html © 2004-2005 University of Washington
History of the WWW
Issues that drove development of the Web
No transparent way to link documents or objects
  References by citation or ftp address
No automatic way to retrieve linked documents
  Downloading via ftp required human intervention
No standard document format
  Each format required its own application
Even with links, no way to find documents except by being told where to find them or by following links
Almost from the beginning, the Internet allowed people to make files available for downloading. This was by means of ftp, the file transfer protocol, which you’ve already used. Ftp was developed over the years, but there was no mechanism for automatically fetching a document that was referred to in another document. A human had to run the ftp tool, log in on the remote machine, navigate to the directory, and request the file.
Except for plain text documents, any formatting required having the correct application to display the particular format, and formats and applications were specific to each type of operating system.
Even if automated linking of documents had existed, the only way to find out about a document would have been for one person to tell another, or to find a link to it in a document one already had.
History of the WWW
Web beginnings
1989: Tim Berners-Lee
  URLs, http, first browser
1993: NCSA Mosaic
  Then Netscape, then Mozilla
1994: World Wide Web Consortium
  Standards organization for Web protocols and formats
1994-5: Web crawlers and search engines
  WebCrawler, Lycos, AltaVista, Yahoo
While working as a software engineer at CERN, the main European high-energy physics lab, Tim Berners-Lee called attention to the reference problem and proposed a way of dealing with it: links embedded in documents that specify the location of another document, automated retrieval, a simple format for text documents, and a means of displaying them.
Marc Andreessen and co-workers wrote Mosaic with public funding, then went off to form Netscape, which offered a browser based on Mosaic. This was the original no-apparent-business-model Internet company, because they offered their browser for free. They had no choice, as they were already taking flak for using a publicly funded tool as the basis for their product.
The W3 Consortium is the semi-official standards organization for the Web, semi-official in the sense that it can’t enforce its standards. Certain large companies with a lot of market share tend to ignore the standards or the desire for interoperability when it is convenient for themselves.
WebCrawler is often given credit for being the first crawler-based search engine. It was developed right here at the UW. Lycos, like Netscape, served as a model for later Internet companies. It holds the record for the fastest time from launch to IPO in the history of the NASDAQ. AltaVista was for years the main search site. It did not rank results, but had features available nowhere else that allowed the user to control the sort order for results. Only Ask Jeeves today has something similar.
Web Pages in HTML
Web pages are written in a special form called HTML, hypertext markup language. The term markup comes from typesetting: it refers to the notations written on a text document to tell how it should be laid out, such as where paragraph breaks go and what text should be set in bold or italics. A markup language is the set of notations, or tags, that indicate these formatting options.
Before we get to details of HTML coding, here’s a little philosophical background. When a web browser reads an HTML page, it uses the tags to help it lay out the page contents on the screen. This HTML code looked like this when it was displayed by a browser. But that’s not the only way that the page could be displayed. The user may choose to have the browser not show any images. They might resize the window, which would change how much text the browser can put in each line of a paragraph. Or maybe they’re reading the page on a tiny display, like a cell phone or PDA. Then the browser might abbreviate, or just show headings and replace paragraphs by links. Some browsers don’t display pages visually at all: they read the page aloud. Headings and paragraph breaks provide places where a screen reading program can pause to let the user choose when to continue.
The feature that permits this reformatting is that the tags don’t indicate location on the page; they indicate the purpose of the text. Using that, the browser can make informed decisions about how to display or read the page. Allowing the user and the browser to decide details of how a page is displayed was a design goal of the W3 committee. Attempting to force the page to be laid out in exactly one particular way breaks this goal. If you see a web page that won’t resize properly, or where text is splatted on top of images, it’s probably due to someone trying to use position controls to make it look right just in their own browser, with their own settings.
The moral here is, don’t try to impose strict choices about how the page should be laid out. Let the person viewing the page choose how to customize the display.
HTML Structure
All HTML files use the same basic structure:
  <html>
  <head>
  <title>Name of Page Goes Here</title>
  Heading content goes here
  </head>
  <body>
  Body content goes here
  </body>
  </html>
In order to do automated formatting, the typesetting tags have to be readable by a computer, so they’re text just like the document contents. But then we need a way to tell the tags apart from the document contents. HTML does that by using angle brackets to surround the names of the tags, because those are rare in most text.
Some kinds of tags say to do something at just the place in the document where the tag is, like start a new paragraph or insert a horizontal line. Others say to start something at the point where the tag is that will be ended by another tag later, for instance, tags that mark the beginning and end of some text in italics. The ending tags have names that are the same as the beginning tags, but with a slash in front. For instance, here’s a <table> tag that starts a table, and here’s a </table> that ends the table.
Some tags need more information than just the name of the tag. For instance, a link has to tell what URL it points to, and a tag that changes the font may need to tell what font name to use and what the size of the font is. To include this other information, the tag can have attributes (also called options or parameters). For instance, this <a> tag is a link, and this href attribute tells what other web page the link points to.
Now let’s look at HTML code itself. Before we get to the visible contents of the page, we need a few preliminaries. First, there are tags that tell which part of the text in the file the web browser should pay attention to when it interprets the HTML. This part starts with the <html> tag and ends with the </html> tag. Inside that are two sections of the page: the head and the body. The head includes information about the page, e.g. title and author. Or you can put in keywords to tell what the page topic is; these are used by search engines. (Bring up and show meta tags.) The body is where the actual page contents are.
We’re going to make a simple page right here, so we’ll talk about the tags for page contents as we put them in.
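To make the notes above concrete, here is a minimal sketch of a page whose head carries author and keyword information and whose body uses one paired tag and one standalone tag. The author name, keyword list, and body text are made up for illustration and are not from the slides.

```html
<html>
<head>
<title>Name of Page Goes Here</title>
<!-- meta tags carry information about the page; these values are hypothetical -->
<meta name="author" content="A. Student">
<meta name="keywords" content="html, markup, web pages">
</head>
<body>
<!-- A paired tag: <i> starts italics and </i> ends them -->
<p>This word is in <i>italics</i>.</p>
<!-- A standalone tag: <hr> draws a horizontal line right where it appears -->
<hr>
</body>
</html>
```

Nothing in the head is displayed in the page area itself; the title typically shows up in the browser's title bar, and the meta information is there for programs such as search engines to read.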
First HTML Web Page
This HTML code produces this result
So let’s get a page started. I’ll bring up a text editor to write the page in, and a browser to display it. Here are the start and end of the entire page, and the head and body sections. Here’s a title, and let’s put in one brief paragraph.
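The code shown on this slide did not survive in the transcript. As a rough sketch of what such a first page might look like (the title and paragraph text here are assumptions, not the lecture's actual example):

```html
<html>
<head>
<!-- The title is not shown in the page area itself -->
<title>My First Page</title>
</head>
<body>
<!-- One brief paragraph, marked with a start tag and an end tag -->
<p>Hello, this is my first Web page.</p>
</body>
</html>
```

Saving this as plain text under a name ending in .html (for example first.html, a hypothetical name) and opening it in a browser should show the single paragraph.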
HTML Must Be Text
Word processors (recall Chap. 2) insert formatting tags, confusing browsers
Create source in NotePad, etc.
Save in Text or txt format
Save with file extension .html
Avoid Confusion
You’ll notice I’m not using any fancy word processing program that does page layout itself. Word processing programs have their own set of formatting tags that are saved in the file with the document contents. The word processor just isn’t showing them to you, just like the browser doesn’t normally show you the HTML tags. If you try to display a word processor file, like a Word doc file, using a web browser, the browser will not open the page itself. It will either tell you it doesn’t know how to display the file, or it will start up Word to display it. In any case, the browser itself won’t be what’s displaying the file.
Instead, I’m using an ordinary text editor that only puts in exactly the characters I tell it to. And I save the file as just plain text.
The Source
The HTML code producing a page is the source, which can always be viewed, either in the browser or in an editor.
Now, the browser will also show us what the HTML code for the page is. The code for a page, or a program, is called its source, or sometimes source code. This is a good way to find out about HTML: if you see a page that uses some sort of formatting that you’d like to use in your own page, you can look at the source. Even if it’s too cluttered to serve as a clear example, you can at least see what tags they’re using, and then go look up information about using those tags elsewhere.
So far, we have a page with just some text in it.
Add An Image
Images are encoded many ways:
GIF (Graphics Interchange Format) is for diagrams and simple drawings
PNG (Portable Network Graphics) is GIF without patent problems
JPEG (Joint Photographic Experts Group) is for high resolution photos, complex art
Image tags for placing images
  <img src="awa.png" alt="American Writers">
Slide callouts label the parts of this tag: tag, attribute name, attribute value.
Now let’s add a picture. Browsers can display images only if they’re stored in a form that the browser understands. If you want your web page to be displayed by any type of browser, it’s best to stick to image formats that are supported by any browser. These three formats are the ones you can expect almost all browsers to be able to display. Just as with word processor documents, a browser might be able to display another format by starting up a separate program that’s a viewer for that image format. But for now, let’s stick to what the browser itself can display.
GIF files store individual spots of color laid out in a rectangular grid. Then they compress the file in a way that doesn’t lose any of the original information. Unfortunately, the compression technique they use (LZW) was patented, so technically a license from the patent owner was required to work with GIF files. The patents have since expired, but to avoid that issue one can use PNG format instead. It also stores a grid of colors, but uses a different compression method. JPEG also compresses files, but it throws away information in order to get even better compression. Its compression doesn’t work well with really sharp edges like lines in diagrams or text, so it’s better used for photographs.
The tag for including an image is img. It inserts the image at one point in the web page, so there is no separate end tag. It has an attribute, src, which is short for source, that tells where the browser can find the image file. In this example, the image file is in the same folder as the web page itself, so we don’t need to include anything but the file name; we don’t need to include a path.
Let’s add an image to our page. (Bring up Explorer.) I have a few in a directory that’s inside the same directory as the web page. So to tell the browser how to get to the image from where the web page is, I need to include the folder name in the path. The alt attribute tells what to show if the browser can’t show the image. For instance, if the page is being read aloud by a screen reader, it would read the alt text instead. Here’s a png image. And here’s the same in jpeg format. You may be able to see that the jpeg is just a little fuzzier.
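The image tags described above might look like the following sketch. The first line is the tag from the slide; the folder name images and the file name awa.jpg are hypothetical stand-ins for whatever the lecture demo actually used.

```html
<!-- Image file in the same folder as the page (the tag shown on the slide) -->
<img src="awa.png" alt="American Writers">

<!-- Same image stored in a folder inside the page's folder; "images" is a hypothetical folder name -->
<img src="images/awa.png" alt="American Writers">

<!-- JPEG version of the same picture; the file name is hypothetical -->
<img src="images/awa.jpg" alt="American Writers">
```

In each case the alt text is what a screen reader would read, or what a browser would show if the image cannot be displayed.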
File Locations
The path must say how to reach the file
When the file is in the same directory as the web page, just give the file name, ski.jpg
If the file is in a subdirectory, say how to navigate to it, pix/ski.jpg
If the file is in a superdirectory, move up using dot-dot notation, ../ski.jpg
The most common reason that an image is not displayed is that the path is wrong … check!
As you saw, we had to include the directory that the image file was in, because it wasn’t in the same directory as the web page. So we’ve mentioned these two cases. Here’s another example, where the image is one directory level back out. This is the Unix command line syntax for referring to the enclosing directory, or parent directory.
So we have text and images. Let’s add a link. After all, linking documents was the main motivation for the Web.
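The three cases can be sketched side by side. The directory layout in the comment is an assumption made up for this example (only ski.jpg and pix come from the slide), and the alt text is likewise illustrative.

```html
<!--
  Assumed layout (folder names site and trip, and the file page.html, are hypothetical):
    site/
      ski.jpg          one level above the page
      trip/
        page.html      the page containing these tags
        ski.jpg        same directory as the page
        pix/
          ski.jpg      subdirectory of the page's directory
-->

<!-- Same directory: the file name alone is enough -->
<img src="ski.jpg" alt="Skier">

<!-- Subdirectory: name the folder, then the file -->
<img src="pix/ski.jpg" alt="Skier">

<!-- Parent directory: dot-dot moves up one level -->
<img src="../ski.jpg" alt="Skier">
```

All three paths are relative to the location of the page itself, which is why moving the page without moving its images tends to break them.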
Linking to another page
The link starts with an a tag…
  <a href=" ... ">
The value of the href attribute is a URL.
  <a href=" ... ">
The link encloses some text, and ends with…
  <a href=" ... ">CSE home page</a>
You’ve seen links that surround some text. A link tag is the type that starts in one place and has an end tag later on. It has an attribute, href, that tells what the link refers to, that is, what the other end of the link is. Its value is a URL, and it has to be enclosed in double quotes. Then comes the text that will serve as the place to click on to go to the other page, and finally, the end tag.
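A complete link might look like the sketch below. The URL on the slide was lost in the transcript, so the address here is only a placeholder, and the second file name is hypothetical.

```html
<!-- The URL below is a placeholder; substitute the real address of the page you want -->
<p>Visit the <a href="http://www.example.com/">CSE home page</a> for more information.</p>

<!-- A link to another page in the same directory can use a relative path; the file name is hypothetical -->
<p>See also my <a href="hobbies.html">hobbies page</a>.</p>
```

Only the text between the start tag and the end tag becomes clickable; the surrounding sentence is ordinary paragraph text.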
So, we have text, images, links – enough for a very simple page.
Simple HTML
What we’ve seen here is very simple HTML
HTML is changing
Each document should start with a “DOCTYPE” declaration telling which version of HTML it follows
We used 4.01 Transitional
There is a validator service that will check your page
HTML isn’t static. New features are being added all the time, and the interpretation of some old tags has changed slightly. So to be sure the browser interprets our page according to the version of HTML used in that page, we can tell it what the version is. We put in a document type declaration, right at the top of the page. Let’s look at a page that has a DOCTYPE tag. A tag like this that starts with an exclamation point displays nothing on the page. Strictly speaking it is a declaration rather than a comment (comments are written <!-- ... -->), but like a comment it is not part of the visible page contents. Its job is to give the browser information about the document. So let’s put in a DOCTYPE declaration.
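For reference, the standard declaration for HTML 4.01 Transitional is shown below at the top of a minimal page. The page contents are placeholders in the style of the earlier skeleton, not the lecture's actual demo page.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Name of Page Goes Here</title>
</head>
<body>
<p>Body content goes here</p>
</body>
</html>
```

The declaration must come before the <html> tag. A validator service can use it to decide which version of HTML to check the page against.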
Summary
Web pages are written in HTML
The files must be text
The file extension must be .html (or .htm)
Tags provide formatting and other info
Some have attributes
Some need an end tag
Use a change-and-test process
Specify the relative path to local images
Use a link to refer to other documents