Lis508 lecture 2: characters to textual documents Thomas Krichel 2002-09-30.

Presentation transcript:

lis508 lecture 2: characters to textual documents Thomas Krichel

Structure Character sets –Coded character set –Character encoding

Literature Norton “new inside the PC” chapter 4 http:// htm http://wwwinfo.cern.ch/asdoc/WWW/publications/ictp99/ictp99N2705.html http:// html

Recall from last lecture UCS is a character set defined by ISO 10646. The most important characters are in the basic multilingual plane (BMP), which has 2^16 = 65536 code positions. UCS characters in the BMP can be represented by two bytes. Other characters need more space.
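The two-byte claim can be checked with a minimal Python sketch. It assumes the UTF-16 encoding (the slide does not name an encoding), and the sample characters and the helper names `in_bmp` and `utf16_length` are made up for illustration.

```python
# Sample characters chosen for illustration only (not from the lecture).
samples = ["A", "é", "€", "😀"]

def in_bmp(ch: str) -> bool:
    """True if the character's code point lies in the basic multilingual plane."""
    return ord(ch) < 2**16

def utf16_length(ch: str) -> int:
    """Bytes needed for the character in UTF-16 (big-endian, no byte-order mark)."""
    return len(ch.encode("utf-16-be"))

for ch in samples:
    print(f"U+{ord(ch):04X}  in BMP: {in_bmp(ch)}  UTF-16 bytes: {utf16_length(ch)}")
```

Characters inside the BMP come out at two bytes each; the last sample lies outside the BMP and needs four.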

Unicode The Unicode Consortium is an industry consortium. The Unicode Standard published by the Unicode Consortium corresponds to the BMP of ISO 10646. All characters are at the same positions and have the same names in both standards. In addition, the Unicode Standard defines much more semantics associated with some of the characters. There is a free online book at ml ml

Application Word and Wordpad give the option to input Unicode characters –Insert symbol –Hex sequence followed by ALT-X You may not see the character if you do not have a font for it. Wordpad and Notepad allow you to save a file in various Unicode encodings. When in doubt, use Unicode UTF-8. –likely to be the most widely supported –does not break ASCII text
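Two points from this slide can be illustrated with a short Python sketch: a hex sequence names a character by its code point, and UTF-8 leaves ASCII text byte-for-byte unchanged. The helper names `char_from_hex`, `ascii_survives_utf8` and `utf8_lengths` are hypothetical, and the sample strings are made up.

```python
def char_from_hex(hex_code: str) -> str:
    """Return the character whose code point is given as a hex sequence,
    the same number one would type before ALT-X."""
    return chr(int(hex_code, 16))

def ascii_survives_utf8(text: str) -> bool:
    """True if the ASCII bytes and the UTF-8 bytes of the text are identical.
    Only meaningful for text that contains ASCII characters alone."""
    return text.encode("ascii") == text.encode("utf-8")

def utf8_lengths(text: str) -> list:
    """Pair each character with the number of bytes it takes in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]

print(char_from_hex("20AC"))                    # the euro sign
print(ascii_survives_utf8("plain ASCII text"))
print(utf8_lengths("a€"))
```

Non-ASCII characters take two or more bytes in UTF-8, so only the ASCII part of a file is guaranteed to look the same to software that knows nothing about Unicode.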

Textual documents

What is a textual document? A text is a sequence of characters. A textual document is a text with some formatting –Font –Font shape (e.g. italics) –Spacing and other “lay-out” issues Why are librarians concerned about textual documents?

Creation of textual documents Pure text editors only create text. Usually text is created with word-processing software. This surrounds the text with digital gibberish that explains the formatting. Formatting instructions depend on the word-processing software. Why is this bad?
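The difference between a text and the formatting that surrounds it can be sketched in Python. This is only an illustration under an assumption: it uses HTML tags as a stand-in for the formatting instructions, not the file format of any particular word processor, and the names `TextExtractor` and `strip_markup` and the sample sentence are made up.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the character data of a marked-up document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_markup(marked_up: str) -> str:
    """Return only the text of a document, dropping the formatting instructions."""
    parser = TextExtractor()
    parser.feed(marked_up)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

formatted = "<p>Librarians care about <i>textual</i> documents.</p>"
plain = strip_markup(formatted)
print(plain)
print(len(formatted) - len(plain), "characters of the file are formatting instructions")
```

Recovering the text was easy here because the markup rules are public; with formatting that only one program understands, the same step depends on that program.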

Storing of textual documents The most widely used format is PDF. It is based on a language called PostScript that describes documents. –Support for fonts –Support for inclusion of non-textual files PDF compresses PostScript files. Proprietary format owned by Adobe Inc. Requires special software. Also bad for digital preservation.

Thank you for your attention!