1 CS 502: Computing Methods for Digital Libraries Lecture 4 Text.

Presentation transcript:

1 CS 502: Computing Methods for Digital Libraries Lecture 4 Text

2 Administration Assignment 1 submission problems: Due date postponed to Thursday 12:20 Demonstration by Dean Eckstrom Wednesday discussion classes: Olin 155, 7:30-8:25 and 8:35 to 9:00 Check Notices for sections

3 Digital Libraries and Checking Information to Teaching Assistants: "I have heard that..." "There is a rumor that..." Authoritative source(s): Course web site -- Notices

4 Text The richness of text. Elements: letters, scripts, symbols. Structure: words, sentences, paragraphs, headings, tables. Appearance: fonts, layout, design, materials. Special: mathematics, music. Digital libraries must represent every variant!

5 Markup and Page Description Mark-up languages represent the structure of text e.g., SGML, XML The mark-up must be combined with a style sheet for rendering. Page description languages represent the appearance of text e.g., PostScript, PDF

6 Markup and Style Sheets [Diagram: document content and structure, combined with a style sheet, passes through rendering software to produce a formatted document.]

7 Alternative Renderings [Diagram: the same document content and structure passes through rendering software with a style sheet for display to produce a computer display, and with a style sheet for print to produce a printed document.]
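The separation shown on slides 6 and 7 can be sketched in a few lines of Python. This is only an illustration: the document structure, the two style sheets, and the `render` function are hypothetical stand-ins, not part of SGML, XML, or any real style-sheet language.

```python
# Document content and structure: (element kind, text) pairs, no appearance.
document = [
    ("heading", "Text"),
    ("paragraph", "The richness of text."),
]

# Two hypothetical style sheets: each maps an element kind to a formatter.
display_style = {
    "heading": lambda s: s.upper(),
    "paragraph": lambda s: s,
}
print_style = {
    "heading": lambda s: f"{s}\n{'=' * len(s)}",
    "paragraph": lambda s: "    " + s,
}

def render(doc, style_sheet):
    """Rendering software: apply the style sheet to each structural element."""
    return "\n".join(style_sheet[kind](text) for kind, text in doc)

print(render(document, display_style))
print(render(document, print_style))
```

The same `document` yields two different renderings; only the style sheet changes.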

8 Example: the Oxford English Dictionary Typography of printed text represented semantic information. Keyboard the text, capturing all typographic information. Automatic parser to extract semantics (e.g., date, quotation, phonetics, etc.). Markup in SGML to tag semantic information. Separate style sheets for various editions, print, CD-ROM, online. Before the web, yet used with the web.

9 Character Distinguish between: the abstract character as a structural element ("capital a"); and representations of the character: A A A A A A (the same letter set in different typefaces).

10 ASCII A binary encoding of a character as an 8-bit byte, e.g., 01000001 is the encoding for "A". Ranges: printable ASCII (starting at code 32), standard (7-bit) ASCII, extended (8-bit) ASCII.
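A minimal Python sketch of the ASCII encoding described on this slide. The function names are illustrative; the printable range used here (codes 32 through 126) is the conventional one.

```python
def ascii_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single 7-bit ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "08b")

def is_printable_ascii(ch: str) -> bool:
    """Printable ASCII runs from code 32 (space) through 126 (tilde)."""
    return 32 <= ord(ch) <= 126

print(ascii_bits("A"))           # bit pattern stored in one byte
print(is_printable_ascii("A"))   # a letter is printable
print(is_printable_ascii("\n"))  # a control character is not
```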

11 Unicode 16-bit codes that represent distinct characters; organized by scripts, not languages; compatible with Unihan (Chinese, Japanese, Korean).
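To see what "organized by scripts, not languages" means in practice, the sketch below looks up code points and character names with Python's standard `unicodedata` module. Taking the first word of the name as a script label is a rough heuristic chosen for this example, not a Unicode property lookup.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

samples = [
    "A",        # Latin
    "\u03b1",   # Greek alpha
    "\u044f",   # Cyrillic ya
    "\u05d0",   # Hebrew alef
    "\u4e2d",   # Han ideograph, shared by Chinese and Japanese text
]

for ch in samples:
    name = unicodedata.name(ch)
    # Heuristic: the first word of the name usually hints at the script.
    print(describe(ch), "| script hint:", name.split()[0])
```

The Han ideograph has a single code whether it appears in Chinese or Japanese text, because the code identifies the character in its script rather than the language.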

12 Scripts Scripts supported by Unicode 2.0: Arabic Armenian Bengali Bopomofo Cyrillic Devanagari Georgian Greek Gujarati Gurmukhi Han Hangul Hebrew Hiragana Kannada Katakana Latin Lao Malayalam Oriya Phonetic Tamil Telugu Thai Tibetan

13 More Scripts Numbers General Diacritics General Punctuation General Symbols Mathematical Symbols Technical Symbols Dingbats Arrows, Blocks, Box Drawing Forms & Geometric Shapes Miscellaneous Symbols Presentation Forms

14 Unicode and UTF-8 UTF-8 is a stream encoding of Unicode characters. Each Unicode character is represented by one to six bytes, and the number of leading ones in the first byte identifies the length of the sequence (the current UTF-8 standard limits sequences to four bytes). Single-byte characters are identical to standard (7-bit) ASCII, e.g., 01000001 has no leading one, therefore it is a single-byte code.
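The leading-ones rule can be checked directly with Python's built-in UTF-8 codec. The helper names below are illustrative only.

```python
def leading_ones(byte: int) -> int:
    """Count the consecutive 1 bits at the high end of an 8-bit value."""
    count = 0
    mask = 0b10000000
    while mask and byte & mask:
        count += 1
        mask >>= 1
    return count

def utf8_bits(ch: str) -> list:
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

for ch in ["A", "\u03b1", "\u4e2d"]:   # Latin A, Greek alpha, a Han ideograph
    first = ch.encode("utf-8")[0]
    print(utf8_bits(ch), "leading ones in first byte:", leading_ones(first))
```

For a multi-byte character, the count of leading ones in the first byte equals the number of bytes in the sequence; each following byte starts with the bits 10.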

15 Markup Languages SGML (Standard Generalized Markup Language): a system for creating markup languages that represent the structure of a document. XML (eXtensible Markup Language): a simplified version of SGML intended for use with online information. DTD (Document Type Definition): a markup specification for a class of documents, defined within the SGML framework. HTML (Hypertext Markup Language): a markup and formatting language with links to other objects.

16 XML Example (Metadata) Digital Libraries and the Problem of Purpose; David M. Levy; Corporation for National Research Initiatives; January 2000; article (continued on next slide)

17 XML Example (Metadata) (continued from previous slide) /january2000-levy; English; D-Lib Magazine; Copyright (c) David M. Levy
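The markup tags of this example did not survive in the transcript, so the sketch below only illustrates how such a metadata record could be tagged and read back. The element names (`record`, `title`, `creator`, `date`, `type`, `language`) are assumptions made for this illustration and are not taken from the slide.

```python
import xml.etree.ElementTree as ET

# Field values come from the slides; the element names are hypothetical.
fields = [
    ("title", "Digital Libraries and the Problem of Purpose"),
    ("creator", "David M. Levy"),
    ("date", "January 2000"),
    ("type", "article"),
    ("language", "English"),
]

record = ET.Element("record")
for name, value in fields:
    ET.SubElement(record, name).text = value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Reading the record back: the tags identify what each value means.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("creator"))
```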

18 Constructing a DTD: Entities Entities are basic units of information. Character entities: a b ... z ! ? ... &lt; &alpha;. Any other entities: &logo; &square-root;

19 Entities The name of an entity is purely mnemonic. It makes no assertions about the context in which the entity is used or its appearance when rendered. The DTD used by a scientific publisher will have about 4,000 entities to represent all the special symbols and the variants used in scientific disciplines.
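As a rough illustration of entity names being purely mnemonic, the Python sketch below resolves two character entities with the standard `html` module (which knows HTML's predefined entity names, used here as a stand-in for a DTD's entity declarations) and expands a custom entity from a lookup table. The `CUSTOM_ENTITIES` table and the `expand` function are hypothetical.

```python
import html
import re
from html.entities import name2codepoint

# Character entities: the name maps to a character, nothing more.
print(html.unescape("&lt;"))       # the less-than sign
print(html.unescape("&alpha;"))    # Greek small letter alpha
print(name2codepoint["alpha"])     # its code point as an integer

# Hypothetical table for "any other entities" declared by a publisher.
CUSTOM_ENTITIES = {"logo": "[publisher logo]"}

def expand(text: str, table: dict) -> str:
    """Replace &name; with its table entry; leave unknown names untouched."""
    def substitute(match):
        return table.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9-]*);", substitute, text)

print(expand("Published by &logo; in 2000", CUSTOM_ENTITIES))
```

How the expansion looks when rendered is left to the table, which plays the role of the rendering step; the name itself carries no appearance.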

20 Constructing a DTD: Elements Elements define the structure. An element is a string of entities, bracketed by tags. Examples, each enclosed in its own pair of tags: "This is a paragraph."; "Some heading"; "Jane Austen"; "John Hancock".
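The tags around these examples are missing from the transcript. The sketch below shows what tagged elements look like in general, using made-up tag names (`doc`, `h1`, `p`, `author`) that are assumptions for illustration, and reads them back with Python's standard XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names; only the text content comes from the slide.
sample = (
    "<doc>"
    "<h1>Some heading</h1>"
    "<p>This is a paragraph.</p>"
    "<author>Jane Austen</author>"
    "</doc>"
)

root = ET.fromstring(sample)

# Each element is its content bracketed by a start tag and an end tag.
pairs = [(child.tag, child.text) for child in root]
for tag, text in pairs:
    print(f"<{tag}> ... </{tag}> contains: {text}")
```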

21 Constructing a DTD: Grammar Every DTD has a grammar that defines: allowable relationships between entities and elements hierarchies and nesting etc. The grammar is expressed as a set of rules that can be processed automatically.
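A simplified sketch of rules that can be processed automatically: a table of which elements may appear inside which, and a function that checks a parsed document against it. The element names and the `GRAMMAR` table are hypothetical, and the check covers nesting only; a real DTD also constrains order and repetition.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar: element name -> set of allowed child elements.
GRAMMAR = {
    "article": {"title", "author", "section"},
    "section": {"heading", "p"},
    "title": set(),
    "author": set(),
    "heading": set(),
    "p": set(),
}

def violations(element, grammar):
    """Return a list of nesting-rule violations found under this element."""
    if element.tag not in grammar:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in grammar[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(violations(child, grammar))
    return problems

good = ET.fromstring(
    "<article><title>T</title><section><heading>H</heading><p>x</p></section></article>"
)
bad = ET.fromstring("<article><p>x</p></article>")

print(violations(good, GRAMMAR))
print(violations(bad, GRAMMAR))
```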