Documents, Text Editors, Text Retrieval, and Web Pages Class 3 LBSC 690 Information Technology.

2 Agenda Questions Unix Survival Guide Document Creation (Word Processing and HTML) Document Retrieval Project Overview

3 Unix Survival Guide WAM account Directory structure (mkdir, cd, .., /) How much space is used (du, ls -l) Eliminating unneeded files (rm) Managing mail (pine, attachments) Moving files (mv, cp, ftp) Editing files (pico, more) Web anywhere (lynx)
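
The file-management commands on this slide can be tried safely in a scratch directory. The sketch below is a rough Python equivalent of mkdir, cp, mv, du, and rm, using only the standard library. The directory and file names (`pub`, `notes.txt`, `old.txt`) are hypothetical, and the byte count is only an approximation of what `du` reports, since `du` counts disk blocks.

```python
import shutil
import tempfile
from pathlib import Path


def disk_usage_bytes(directory: Path) -> int:
    """Rough analogue of `du`: total size in bytes of all files under directory."""
    return sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())


def demo() -> dict:
    """Walk through mkdir, cp, mv, du, and rm inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as scratch:
        home = Path(scratch)
        pub = home / "pub"                    # hypothetical directory name
        pub.mkdir()                           # mkdir pub

        notes = home / "notes.txt"            # hypothetical file name
        notes.write_bytes(b"hello\n")         # a 6-byte file

        shutil.copy(notes, pub / "notes.txt")            # cp notes.txt pub/
        shutil.move(str(notes), str(home / "old.txt"))   # mv notes.txt old.txt

        used_before = disk_usage_bytes(home)             # du (approximately)
        (home / "old.txt").unlink()                      # rm old.txt
        used_after = disk_usage_bytes(home)

        # ls: list what is left, relative to the scratch "home" directory
        left = sorted(p.relative_to(home).as_posix() for p in home.rglob("*"))
        return {"before": used_before, "after": used_after, "left": left}


if __name__ == "__main__":
    print(demo())
```

Removing the unneeded copy halves the space used, which is the point of checking usage before and after a cleanup.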

4 Document Creation Editors Word Processors Desktop Publishing Structured Documents HTML/SGML/XML

5 Editors (Text Editing vs. Word Processing) Purpose –Create and modify ASCII text Examples –pico, axe, and emacs on WAM Advantages –Compatible with virtually everything (VT-100) Disadvantages –Limited format control, sometimes no mouse

6 Word Processors Purpose –Create documents intended for human readers Examples –Microsoft Word and WordPerfect in OWL Advantages –Good format control –WYSIWYG (“What You See is What You Get”) Disadvantages –No (universal) standard interchange format

7 Desktop Publishing Purpose –Produce documents for wide (paper) distribution Examples –Adobe PageMaker in the WAM labs Advantages –Allows very detailed layout control Disadvantages –Requires fairly extensive user expertise

8 Structured Documents Purpose –Specify logical structure of the documents Examples –email, HTML, LaTeX, SGML/XML Advantages –Allows easy reformatting for different displays Disadvantages –Hard to read unless “rendered” before viewing
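
To illustrate why logical structure allows easy reformatting, here is a minimal sketch that renders one structured document for two display widths. The role names (`title`, `para`) and the sample text are made up for this example; real structured formats such as HTML or LaTeX are far richer.

```python
import textwrap

# A tiny logically structured document: (role, text) pairs.
# The role names are hypothetical and exist only for this sketch.
DOCUMENT = [
    ("title", "Structured Documents"),
    ("para", "Markup records what each piece of text is, not how it looks."),
]


def render(document, width):
    """Render the same logical structure for a display of the given width."""
    lines = []
    for role, text in document:
        if role == "title":
            # Titles are upper-cased and centered, whatever the width is.
            lines.append(text.upper().center(width).rstrip())
        else:
            # Paragraphs are re-wrapped to fit the display.
            lines.extend(textwrap.wrap(text, width=width))
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    print(render(DOCUMENT, 60))
    print("-" * 30)
    print(render(DOCUMENT, 30))
```

The document itself never changes; only the renderer's width does, which is the advantage the slide describes.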

9 Hyper-Text Markup Language (HTML) Purpose –Structured document language for web pages Advantages –Adapts easily to different display capabilities –Widely available rendering software (browsers) Disadvantages –Direct control over layout is limited –The HTML “standard” is still evolving

10 First Steps in HTML Find a web page you like Select “Document Source” in “View” menu Compare HTML code with rendered version –Observe how to achieve each effect Select “Save As” in “File” menu FTP the file to ~/../pub/ on WAM Edit the file using pico

11 HTML Document Structure Markup tags (open and close) bracket content … Title shows up in the Web browser’s frame Headers show up in the page itself For each link, specify the URL and link text <A HREF="URL">link text</A> Inline graphics can replace the link text
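
One way to see the pieces named on this slide is to pull them out of a page programmatically. The sketch below uses Python's standard `html.parser` module to collect the title, the headers, and each link's URL and link text. The sample page and the class name `StructureParser` are invented for illustration and are not from the course materials.

```python
from html.parser import HTMLParser

# A hypothetical page written for this sketch.
PAGE = """<html>
<head><title>My Home Page</title></head>
<body>
<h1>Welcome</h1>
<p>Visit the <a href="http://www.example.com/">example site</a>.</p>
</body>
</html>"""


class StructureParser(HTMLParser):
    """Collect the title, the headers, and (URL, link text) pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headers = []
        self.links = []
        self._open = []  # stack of currently open tags

    def handle_starttag(self, tag, attrs):
        self._open.append(tag)
        if tag == "a":
            # Start a new link; its text arrives later via handle_data.
            self.links.append((dict(attrs).get("href"), ""))

    def handle_endtag(self, tag):
        if tag in self._open:
            # Pop back to (and including) the matching open tag.
            while self._open and self._open.pop() != tag:
                pass

    def handle_data(self, data):
        if not self._open:
            return
        current = self._open[-1]
        if current == "title":
            self.title += data
        elif current in ("h1", "h2", "h3"):
            self.headers.append(data)
        elif current == "a":
            url, text = self.links[-1]
            self.links[-1] = (url, text + data)


if __name__ == "__main__":
    parser = StructureParser()
    parser.feed(PAGE)
    print(parser.title, parser.headers, parser.links)
```

Each open tag is pushed on a stack and popped at its close tag, which mirrors the slide's point that tags bracket the content they describe.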

12 Designing Web Pages Key design issues: –Content: What do you want to publish? –Style: How do you want to present it? –Syntax: How can you achieve that presentation? Sources of information –Online tutorials (Yahoo points to lots of these) –Technical materials (e.g., the HTML 3.0 spec)

13 Style Guidelines Design for generic browsers –And test on every version you wish to support Provide appropriate access points –User needs and navigation strategies differ Design useful navigational aids –A web search may lead to the middle of a site Include some indication of currency –Date of last update, “new” icons, etc.

14 HTML Editors Goal is to create web pages, not learn HTML! Several are available –In Explorer, “Edit-Page” for Front Page Express –In Netscape, “File-Edit Page” for Composer You may still need to edit the HTML file –Some editors use browser-specific features –Some HTML features may be missing entirely –File names may be butchered by FTP

15 SGML/XML Generalized Markup Languages –SGML - Standard Generalized Markup Language (for paper documents) –XML - eXtensible Markup Language (for Web documents) (see W3C) These allow people to design –DTDs - Document-type definitions A document also needs: –DSSSL - Document Style Semantics and Specification Language
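
As a small illustration of a designer-defined vocabulary, the sketch below parses an XML document whose element names were invented for this example (`readinglist`, `reading`, `title`, `format`). It uses Python's standard `xml.etree.ElementTree`, which checks that the document is well formed but does not validate it against a DTD, so no DTD is shown here.

```python
import xml.etree.ElementTree as ET

# A hypothetical vocabulary for a course reading list. The element and
# attribute names are assumptions made for this sketch.
READING_LIST = """<readinglist course="LBSC 690">
  <reading week="3">
    <title>An HTML Tutorial</title>
    <format>web page</format>
  </reading>
  <reading week="3">
    <title>Text Retrieval Overview</title>
    <format>article</format>
  </reading>
</readinglist>"""


def titles_by_format(xml_text, wanted_format):
    """Return the titles of all readings whose <format> matches wanted_format."""
    root = ET.fromstring(xml_text)
    return [
        reading.findtext("title")
        for reading in root.findall("reading")
        if reading.findtext("format") == wanted_format
    ]


if __name__ == "__main__":
    print(titles_by_format(READING_LIST, "article"))
```

Because the tags name what each piece of content is, a program can select readings by format without guessing from layout.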

16 Document Retrieval Making documents is often easier than finding them! Hypertext vs. Cataloging vs. Searching –Yahoo vs. AltaVista Lots of applications –Chasing down citations in papers you read –Web search engines –Managing your personal files Two basic approaches to searching –Explicit queries (“information retrieval”) –“Watch what I do” (“adaptive filtering”)

17 Ways of Searching for Text Controlled vocabulary –Manual indexing based on named concepts Free text –Characterize documents by the words they contain Social filtering –Exchange and interpret personal ratings

18 “Exact Match” Retrieval Find all documents with some characteristic –Indexed as “Presidents -- United States” –Containing the words “Clinton” and “Peso” –Read by my boss A set of documents is returned –Each is as likely to be useful as any other –Usually listed in date or alphabetical order
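
The "containing the words" case can be sketched with an inverted index and a Boolean AND. This is a minimal illustration under simple assumptions (whitespace tokenization, lower-casing, a made-up three-document collection), not a description of any particular search system.

```python
from collections import defaultdict

# A hypothetical three-document collection.
DOCS = {
    "doc1": "Clinton backs loan package to support the peso",
    "doc2": "Clinton visits Maryland campus",
    "doc3": "Peso falls against the dollar",
}


def build_index(docs):
    """Map each lower-cased word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match_all(index, words):
    """Return ids of documents containing every word, in alphabetical order."""
    sets = [index.get(word.lower(), set()) for word in words]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(match_all(index, ["Clinton", "Peso"]))
```

The result is a set: every returned document satisfies the query equally, so the only available ordering is something external such as the alphabetical order used here.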

19 Ranked Retrieval Put most useful documents near top of a list –Put possibly useful documents lower in the list No need to exclude any documents –Just list those least likely to be useful last Two basic techniques –Similarity-based –Probability-based

20 Similarity-Based Retrieval Assume “most useful” = most similar to query Lots of clues to meaning –Repeated words are good cues to meaning –Rarely used words make searches more selective Easily combined –Compute a “weight” for each term –Add up the weights for query terms in a document
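
The two clues and the "add up the weights" step can be sketched as follows. The slide does not name a weighting formula, so this example assumes one common choice: a term's count in the document multiplied by log(N / number of documents containing the term). The collection and the function names are hypothetical.

```python
import math
from collections import Counter

# A hypothetical three-document collection.
DOCS = {
    "doc1": "peso crisis in mexico",
    "doc2": "clinton speech on the economy",
    "doc3": "clinton responds to peso crisis as peso falls",
}


def term_weights(docs):
    """Weight each term by (count in document) * log(N / documents containing it)."""
    counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())  # each document counts once per term
    return {
        doc_id: {
            term: tf * math.log(n_docs / doc_freq[term])
            for term, tf in counter.items()
        }
        for doc_id, counter in counts.items()
    }


def rank(docs, query):
    """Score each document by summing the weights of the query terms it contains."""
    weights = term_weights(docs)
    terms = query.lower().split()
    scores = {
        doc_id: sum(doc_weights.get(term, 0.0) for term in terms)
        for doc_id, doc_weights in weights.items()
    }
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for doc_id, score in rank(DOCS, "clinton peso"):
        print(doc_id, round(score, 3))
```

Here doc3 ranks first because it contains both query terms and repeats one of them, while no document is excluded from the list.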

21 Project Overview Goal: Solve a practical problem –One which is fairly complex You choose the technology –Make a set of web pages (a web “site”) –Make a database (optional for summer 690) –Do something else that is equally complex Multimedia presentation, Java program, … Suggest two-person groups

22 Web Projects Have significant content! (see “What is a Book” web site under CLIS Dean’s Award) Multiple access points –Taxonomy, search engine, map, etc. Be creative (in a useful way)! For example: –Choose a novel application –Engage the user with an interactive approach –Adopt an innovative organization –Implement a creative layout

23 Database Projects (very ambitious for Summer 690) Your focus should be on scalability –What if the IRS decided to use your database? The user interface is important –Designed to be used without taking 690 first! Include enough content to allow testing –But focus on organization, not on content The same creativity issues as web projects

24 Project Timeline and Deliverables (summer 690) Project specification (1-2 pages) Should include User Manual (FAQ) and Test Plan components Project demonstrations last week of class –Scheduled individually –All team members (two or three) get the same grade

