Published by Amy Gardner. Modified over 9 years ago.
1
Document Computing: Technologies for Managing Electronic Document Collections
Ross Wilkinson... [et al.], 1998
Circulation Counter [RES3H], call number ZA4080.D63 1998
2
Chapter 1 Document Lifecycle
3
What is a document?
A document records a message from people to people.
4
Characteristics of a document
– Content
– Structure
– Metadata
5
A message has a context, which is important for understanding the message. A document therefore contains not only the content of a message but also some information about the document itself, e.g. its author, date and recipients. We call such information the metadata of the document.
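The split between a message and the information about it can be sketched as a small data type. This is a minimal sketch under two assumptions: the class name `Document` and its `describe` helper are hypothetical, and metadata is modelled as a plain string dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical illustration: a document = message content + metadata about the document.
@dataclass
class Document:
    content: str                                             # the message itself
    metadata: dict[str, str] = field(default_factory=dict)   # e.g. author, date, recipients

    def describe(self) -> str:
        # Prefix the content with its metadata (sorted by key) so it is read in context.
        context = ", ".join(f"{k}={v}" for k, v in sorted(self.metadata.items()))
        return f"[{context}] {self.content}"

memo = Document(
    content="I have finished Stage B of the design. Could you take a look at it?",
    metadata={"author": "Kate M.", "recipients": "John D.", "date": "7/8/98"},
)
print(memo.describe())
```

Keeping the metadata in a separate field lets a management system search or sort on it without parsing the message text.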
6
Why Document Management?
– It is hard to find documents.
– It is hard to organize documents.
– It is hard to control documents.
Metadata helps document management.
7
Benefits of Document Management
– Location-independent delivery of documents on demand
– Controlled access to documents
– A record of the life of a document
– Better re-use of documents
8
Chapter 2 Electronic Document Description
9
Document Content
– Simplest type of content: unformatted text
– Text retrieval systems based on keyword search, e.g. Windows Desktop Search
– Optical character recognition (OCR) systems, which convert scanned page images into text
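Keyword search over unformatted text is commonly built on an inverted index, which maps each word to the documents that contain it. The sketch below shows that general technique only. It is not how Windows Desktop Search is implemented. The tokenizer (lower-case alphanumeric runs) and the AND semantics of `search` are simplifying assumptions.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lower-case the text and split it into runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: keyword -> ids of the documents containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND: return the documents that contain every query keyword.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    "memo1": "I have finished Stage B of the design.",
    "memo2": "Stage C of the design starts next week.",
    "note": "Lunch meeting moved to Friday.",
}
index = build_index(docs)
print(sorted(search(index, "design stage")))  # ['memo1', 'memo2']
print(sorted(search(index, "stage b")))       # ['memo1']
```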
10
Document Structure
Even unformatted text has some structure, e.g. lines and words. A document may have a more elaborate structure, e.g. sections, paragraphs and images. There are two levels of structure:
– Logical structure
– Presentational structure
11
Logical structures
Example:
TO: John D.
FROM: Kate M.
DATE: 7/8/98
I have finished Stage B of the design. Could you take a look at it?
Simple logical structure: lines of text
A logical structure of a memo: (see next slide)
12
A logical structure for a memo
Memo
  Head
    Sender
    Receiver
    Date
  Body
    Paragraph
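The memo's logical structure is a tree: the root element has children, and terminal elements hold text. The sketch below encodes that tree as nested `(name, children-or-text)` tuples and walks it depth-first. The element names come from the slide. The tuple representation and the `outline` function are hypothetical choices made for brevity.

```python
# Logical structure of the memo as (element name, children or text) tuples.
memo = ("Memo", [
    ("Head", [
        ("Sender", "Kate M."),
        ("Receiver", "John D."),
        ("Date", "7/8/98"),
    ]),
    ("Body", [
        ("Paragraph", "I have finished Stage B of the design. Could you take a look at it?"),
    ]),
])

def outline(node, depth=0):
    # Depth-first walk yielding one indented line per element;
    # terminal elements also show their text.
    name, value = node
    if isinstance(value, str):
        yield "  " * depth + f"{name}: {value}"
    else:
        yield "  " * depth + name
        for child in value:
            yield from outline(child, depth + 1)

for line in outline(memo):
    print(line)
```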
13
Presentational Structure
A different presentational structure for the same memo:
John D., 7/8/98
I have finished Stage B of the design. Could you take a look at it?
Kate M.
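Because the logical content stays the same, each presentational structure can be produced by a different rendering function applied to one logical record. This is a minimal sketch: the field names and both layout functions are hypothetical and only mimic the two memo layouts shown on the slides.

```python
# One logical record (hypothetical field names) for the memo.
memo = {
    "sender": "Kate M.",
    "receiver": "John D.",
    "date": "7/8/98",
    "paragraphs": ["I have finished Stage B of the design. Could you take a look at it?"],
}

def header_layout(m: dict) -> str:
    # Labelled header lines first, then the body paragraphs.
    head = [f"TO: {m['receiver']}", f"FROM: {m['sender']}", f"DATE: {m['date']}"]
    return "\n".join(head + m["paragraphs"])

def letter_layout(m: dict) -> str:
    # Addressee and date on one line, then the body, then the sender as a sign-off.
    return "\n".join([f"{m['receiver']}, {m['date']}"] + m["paragraphs"] + [m["sender"]])

print(header_layout(memo))
print()
print(letter_layout(memo))
```

Changing the layout means changing only the rendering function. The stored logical record is untouched.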
14
Presentation medium
The content of the same document can be presented in different media, each with a different presentational structure, e.g. a PDF file vs. an online Web page.
15
Metadata
Generally, we need metadata to capture:
– Registration information
– Usage information
– Structural properties
– Contextual information
– Content description
– Historical information
16
The Dublin Core metadata set
– Title
– Creator
– Subject
– Description
– Publisher
– Contributor
– Date
– Type
– Format: e.g. HTML, PDF
– Identifier: e.g. URI
– Source
– Language
– Relation
– Coverage: e.g. the time period or place that the content covers
– Rights: e.g. copyright
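A Dublin Core record can be checked against the 15-element set and then embedded in a Web page. This is a minimal sketch. It follows the common convention of `<meta name="DC.element">` tags, but treat the exact tag form as an assumption. The record values are hypothetical.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def to_meta_tags(record: dict[str, list[str]]) -> list[str]:
    # Reject names outside the element set, then emit one <meta> tag per value
    # (Dublin Core elements are optional and may repeat).
    unknown = set(record) - DUBLIN_CORE
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    tags = []
    for element in sorted(record):
        for value in record[element]:
            tags.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return tags

# Hypothetical record describing the memo.
record = {
    "title": ["Memo on Stage B of the design"],
    "creator": ["Kate M."],
    "format": ["text/html"],
    "language": ["en"],
}
for tag in to_meta_tags(record):
    print(tag)
```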
17
Document Description Language (DDL)
For use by document management systems, e.g. RTF, PostScript, SGML
DDL support:
– Language support, media support, transparency, structure, link support, metadata support
Other DDL characteristics:
– Document creation, import conversion, export transformation, update, presentation quality, presentation flexibility, etc.
18
Examples of DDLs
– ASCII (American Standard Code for Information Interchange)
– Unicode
As plain character encodings, ASCII and Unicode offer very limited support for structure, presentation and metadata.
– Rich Text Format
– TeX and LaTeX
– SGML, HTML, XML
– PostScript, PDF
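Character encodings only decide which characters can be stored and how they map to bytes. ASCII has 128 characters. Unicode covers most of the world's scripts, and UTF-8 is one byte encoding for it. The helper `encodes_as` below is a hypothetical name for this sketch.

```python
def encodes_as(text: str, encoding: str) -> bool:
    # True if every character of text can be represented in the encoding.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for s in ["Stage B", "Café", "嶺南大學"]:
    print(s, "ascii:", encodes_as(s, "ascii"), "utf-8 bytes:", len(s.encode("utf-8")))
```

Neither encoding records anything beyond the characters. A memo stored this way carries no marking of which line is the sender or the date.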
19
Rich Text Format (RTF)
– Developed by Microsoft
– For interchange between Microsoft Word and other software
– Main purpose: preserve the information in Word documents (blocks of text)
Example: next slide
20
{\rtf1\adeflang1025\ansi\ansicpg1252\uc2\adeff0\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang2057\deflangfe1028{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman …
{\title John D}{\author Dr. Yeung}{\operator Dr. Yeung}{\creatim\yr2008\mo3\dy18\hr15\min24}{\revtim\yr2008\mo3\dy18\hr15\min25}{\version1}{\edmins1}{\nofpages1}{\nofwords14}{\nofchars81}{\*\company Lingnan University}{\nofcharsws94} …
\ltrch\fcs0 \insrsid1782868\charrsid1782868 \hich\af0\dbch\af13\loch\f0 John D., 7/8/98 \par
\hich\af0\dbch\af13\loch\f0 I have finished Stage B of the design. Could you take a look at it? \par \par
\hich\af0\dbch\af13\loch\f0 Kate M\hich\af0\dbch\af13\loch\f0. \par }\pard \ltrpar\ql \li0\ri0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid4811147 \par }}
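The groups such as `{\title John D}` and `{\creatim\yr2008...}` in the RTF above hold the document's metadata. This sketch pulls a few of them out with regular expressions. It assumes each field is a simple group with no nested groups or escaped characters. A real RTF reader needs a proper tokenizer, so treat the patterns and helper names as hypothetical.

```python
import re
from datetime import datetime

# Simple info groups like {\title John D} or {\*\company Lingnan University}.
INFO_FIELD = re.compile(r"\{\\(title|author|operator|\*\\company)\s+([^{}\\]*)\}")
# Creation time written as \creatim\yrYYYY\moM\dyD\hrH\minM.
CREATIM = re.compile(r"\\creatim\\yr(\d+)\\mo(\d+)\\dy(\d+)\\hr(\d+)\\min(\d+)")

def rtf_info(rtf: str) -> dict[str, str]:
    # Map each keyword (dropping the \* prefix of "ignorable" destinations) to its text.
    return {key.replace("*\\", ""): value.strip() for key, value in INFO_FIELD.findall(rtf)}

def creation_time(rtf: str):
    # Return the \creatim timestamp as a datetime, or None if absent.
    m = CREATIM.search(rtf)
    return datetime(*map(int, m.groups())) if m else None

sample = (r"{\title John D}{\author Dr. Yeung}{\operator Dr. Yeung}"
          r"{\creatim\yr2008\mo3\dy18\hr15\min24}{\*\company Lingnan University}")
print(rtf_info(sample))
print(creation_time(sample))
```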
21
TeX and LaTeX
– TeX, created by Donald Knuth, is typesetting software.
– LaTeX, created by Leslie Lamport, is built on TeX.
– LaTeX uses markup constructs to separate the logical description of a document from its presentation.
LaTeX example: see next slide
22
\documentclass{article}
\usepackage{times}
\pagestyle{empty}
\begin{document}
\title{Sample Document}
\author{W. L. Yeung\\Department of Computing and Decision Sciences\\
Lingnan University, Hong Kong\\wlyeung@ln.edu.hk}
\maketitle
\section{Introduction}
…
\section{Conclusion}
…
\end{document}
24
SGML
Standard Generalized Markup Language
To describe a document in SGML, we need:
– An SGML declaration
– A document type definition (DTD)
– A document instance
An SGML declaration specifies the character set and the concrete syntax (e.g. the markup delimiters) used in the DTD and the document instance. Normally a default declaration is used.
25
SGML (cont.)
A document type definition (DTD) defines the rules for forming a class of documents, i.e. the grammar of a document class.
The building blocks of SGML documents are elements.
A DTD for the memo document: next slide.
27
DTD
An element definition gives the name of the element, followed by the rules for building that element. Elements can contain other elements. Terminal (basic) elements often consist of parsed character data (#PCDATA) or character data (CDATA).
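A validating parser checks that each element's children match the rule in its element definition. The sketch below mimics this for the memo. The content models are hypothetical, inferred from the memo's logical structure (Memo: Head, Body; Head: Sender, Receiver, Date). They are not the DTD from the original slides. A trailing "+" means "one or more", as the occurrence indicator does in a DTD.

```python
import re

# Hypothetical content models; "#PCDATA" marks a terminal (text-only) element.
CONTENT_MODEL = {
    "memo": ["head", "body"],
    "head": ["sender", "receiver", "date"],
    "body": ["paragraph+"],
    "sender": "#PCDATA",
    "receiver": "#PCDATA",
    "date": "#PCDATA",
    "paragraph": "#PCDATA",
}

def model_regex(model: list[str]) -> re.Pattern:
    # Compile a sequence like ["head", "body"] into a regex over
    # comma-terminated child names, e.g. "(?:head,)(?:body,)".
    parts = []
    for token in model:
        name, repeat = (token[:-1], "+") if token.endswith("+") else (token, "")
        parts.append(f"(?:{re.escape(name)},){repeat}")
    return re.compile("".join(parts))

def is_valid(name: str, content) -> bool:
    # content is text for terminal elements, otherwise a list of (name, content) pairs.
    model = CONTENT_MODEL.get(name)
    if model is None:
        return False                      # element not declared
    if model == "#PCDATA":
        return isinstance(content, str)
    if isinstance(content, str):
        return False                      # text where child elements are required
    children = "".join(f"{child}," for child, _ in content)
    if not model_regex(model).fullmatch(children):
        return False
    return all(is_valid(child, sub) for child, sub in content)

head = ("head", [("sender", "Kate M."), ("receiver", "John D."), ("date", "7/8/98")])
body = ("body", [("paragraph", "I have finished Stage B of the design.")])
print(is_valid("memo", [head, body]))  # True
print(is_valid("memo", [body, head]))  # False: head must come first
```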
28
The memo in SGML
John D
Kate M
7/8/1998
I have finished Stage B of the design.
29
HTML
Hypertext Markup Language
– For World Wide Web (WWW) documents
– Conforms to an SGML DTD
– HTML is presentation oriented: instructions (tags) are inserted into a document for presentation effects
The DTD for HTML is available at http://www.w3.org/TR/html401/sgml/dtd.html
30
The memo in HTML
Memo
Memo
I have finished Stage B of the design.
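In HTML the presentation tags are interleaved with the text, so a program has to separate the two. This sketch uses Python's standard `html.parser` module on a hypothetical HTML version of the memo. The markup is an assumption, not the original slide's source. Once the tags are removed, only the text runs remain.

```python
from html.parser import HTMLParser

class TagAndText(HTMLParser):
    # Collect start-tag names and non-blank text runs separately.
    def __init__(self):
        super().__init__()
        self.tags: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

# Hypothetical HTML markup for the memo.
page = ("<html><head><title>Memo</title></head><body><h1>Memo</h1>"
        "<p>I have finished Stage <b>B</b> of the design.</p></body></html>")
parser = TagAndText()
parser.feed(page)
parser.close()
print(parser.tags)
print(" ".join(parser.text))
```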
31
XML
Extensible Markup Language
Three basic definitions:
– XML for representing data and documents
– XLink and XPointer for representing inter-document linking
– XSL for representing presentation
XML is a near-subset of SGML.
32
XML (Cont.)
Two classes of XML documents:
– Valid XML documents: well-formed documents that also conform to a specific supplied DTD
– Well-formed documents: only satisfy a simple default grammar, without conforming to a specific DTD
XML has become a cornerstone of electronic commerce because it allows businesses to exchange electronic documents in standard XML-based formats.
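Python's standard `xml.etree.ElementTree` is a non-validating parser. It can therefore test only the first, weaker property: a document is well-formed when it has a single root element and properly nested, closed tags. Checking validity against a DTD needs a validating parser instead, for example `lxml`'s `DTD` class. The memo element names below are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    # ParseError means a violation of XML's basic syntax (e.g. an unclosed tag
    # or two root elements). No DTD validation is performed here.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = ("<memo><head><sender>Kate M.</sender><receiver>John D.</receiver></head>"
        "<body><paragraph>I have finished Stage B of the design.</paragraph></body></memo>")
bad = "<memo><head><sender>Kate M.</head></memo>"  # <sender> is never closed
print(is_well_formed(good), is_well_formed(bad))    # True False
```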
33
PostScript
– Developed by Adobe
– For representing documents that are to be printed (mainly on laser printers)
– A page description language optimized for printing text, images and graphics
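Because PostScript is a programming language, a "document" is a program that draws the page. This sketch generates a one-page program that prints the memo's lines in Times-Roman. The layout numbers and function names are hypothetical. Coordinates are in points (72 per inch) measured from the bottom-left corner of the page. Long lines are not wrapped.

```python
def ps_string(text: str) -> str:
    # PostScript string literals are delimited by parentheses; escape
    # backslashes and parentheses so the text cannot end the literal early.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"

def memo_to_postscript(lines: list[str], font_size: int = 12) -> str:
    # Select a font, then draw each line 1.5 font sizes below the previous one,
    # starting 1 inch from the left edge and 10 inches from the bottom.
    program = ["%!PS", f"/Times-Roman findfont {font_size} scalefont setfont"]
    y = 720
    for line in lines:
        program.append(f"72 {y} moveto {ps_string(line)} show")
        y -= int(font_size * 1.5)
    program.append("showpage")  # emit the finished page
    return "\n".join(program)

print(memo_to_postscript([
    "John D., 7/8/98",
    "I have finished Stage B of the design. Could you take a look at it?",
    "Kate M.",
]))
```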
34
Portable Document Format (PDF)
– Developed by Adobe
– A page description language for representing text, graphics and images
– A PDF file contains presentation information on pages, annotations, links, fonts, etc.
– Supports delivery of electronic documents exactly as they would appear in printed form
– Not designed for editing or for document format exchange
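A document management system often has to tell these description languages apart when files arrive. A well-formed file normally starts with a fixed signature: "%PDF-" for PDF, "%!PS" for PostScript and "{\rtf" for RTF. This sketch checks only those leading bytes. Real tools also tolerate variations, such as junk before a PDF header, so `sniff_format` is a hypothetical helper, not a robust detector.

```python
# Leading signatures of some of the description languages above.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
    (b"{\\rtf", "RTF"),
]

def sniff_format(data: bytes) -> str:
    # Compare the start of the data with each known signature.
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

for sample in [b"%PDF-1.4\n", b"%!PS-Adobe-3.0\n", b"{\\rtf1\\ansi }", b"Hello"]:
    print(sample[:8], sniff_format(sample))
```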