WMES3103 : INFORMATION RETRIEVAL
TEXT AND MULTIMEDIA LANGUAGES AND PROPERTIES
INTRODUCTION
Text is the main form of communicating data and information.
Text is also supplemented with multimedia elements to make the contents of an information retrieval system (IRS) more attractive and interactive.
A website that combines text and multimedia will be visited by more people than one that is text-based only.
In an IRS, text and multimedia are represented via special languages.
Metadata
A new concept in information – metadata
Information about the organization of the data, the data domains, and the relationship between them
Data about data
2 types – descriptive and semantic
Descriptive metadata – metadata that describes a document or a unit of information
Commonly used descriptive metadata: authors, date of publication, source of publication, length of document, type of document
Metadata
Semantic metadata – describes the subject matter, which can be obtained from the contents of the document
Examples: subject headings, keywords, Library of Congress (LC) codes
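The two kinds of metadata can be sketched as one record attached to a document. This is only an illustration: the class name, field names, helper function and sample values are all hypothetical, not part of any standard or of the original slides.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentMetadata:
    # Descriptive metadata: facts about the document itself.
    authors: list[str]
    publication_date: str
    source: str
    length_pages: int
    doc_type: str
    # Semantic metadata: what the document is about.
    subject_headings: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    lc_code: str = ""  # placeholder for a Library of Congress code


def has_keyword(meta: DocumentMetadata, term: str) -> bool:
    """Case-insensitive lookup of a term among the semantic keywords."""
    return term.lower() in (k.lower() for k in meta.keywords)


# Sample record with made-up values, for illustration only.
record = DocumentMetadata(
    authors=["A. Author"],
    publication_date="2000-01-01",
    source="Example Press",
    length_pages=12,
    doc_type="article",
    subject_headings=["Information retrieval"],
    keywords=["Metadata", "Indexing"],
)
print(has_keyword(record, "metadata"))
```

Keeping the two groups in one record lets a retrieval system filter on descriptive fields (for example, document type) and search on semantic fields (for example, keywords) in the same query.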
TEXT
With computers, text must be coded into binary digits.
First coding schemes – EBCDIC (8 bits per symbol) and ASCII (7 bits per symbol)
ASCII was later extended to 8 bits to accommodate other languages, accents and diacritical marks
Oriental languages – Unicode, originally designed as a 16-bit code
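To make the coding schemes concrete, the sketch below prints the bit patterns that a few characters receive under different encodings. The helper name `to_bits` is hypothetical. The codec names (`ascii`, `cp037` for one EBCDIC variant, `latin-1` for one 8-bit extension of ASCII, `utf-16-be` for 16-bit Unicode units) are Python's, chosen here as representative examples rather than the only options.

```python
def to_bits(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


# The same letter has different codes in ASCII and EBCDIC.
print(to_bits("A", "ascii"))           # 01000001 (fits in 7 bits)
print(to_bits("A", "cp037"))           # 11000001 (EBCDIC uses all 8 bits)

# An accented letter needs the 8th bit in an extended ASCII code.
print(to_bits("\u00e9", "latin-1"))    # 11101001

# A Chinese character takes one 16-bit unit in UTF-16.
print(to_bits("\u4e2d", "utf-16-be"))  # 01001110 00101101
```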
TEXT Formats
There is no single format for a text document.
A good IRS should be able to retrieve information from any format.
Initially, an IRS would convert each document to an internal format, but this had many disadvantages.
Now, many new formats have been developed for document interchange.
TEXT
RTF – Rich Text Format, for word processing
PDF – Portable Document Format, for displaying and printing documents
PostScript – powerful programming language for drawing
MIME – Multipurpose Internet Mail Extensions, for encoding e-mail
File compression – Compress (Unix), ARJ (PCs), ZIP
Converting binary files to ASCII text – uuencode/uudecode, BinHex
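The last two items, compression and binary-to-ASCII conversion, can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: `zlib` stands in for the compression tools named above (it implements DEFLATE, the method commonly used inside ZIP files), and the helper names `uuencode_lines` and `uudecode_lines` are hypothetical wrappers around `binascii.b2a_uu` and `binascii.a2b_uu`.

```python
import binascii
import zlib


def uuencode_lines(data: bytes) -> list[bytes]:
    """Encode binary data as uuencoded lines (b2a_uu takes at most 45 bytes)."""
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]


def uudecode_lines(lines: list[bytes]) -> bytes:
    """Reverse uuencode_lines."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


original = bytes(range(256)) * 4          # arbitrary binary payload
compressed = zlib.compress(original)      # step 1: compress
encoded = uuencode_lines(compressed)      # step 2: make it ASCII-safe

# Every encoded byte is a 7-bit ASCII character, so it survives text-only channels.
print(all(byte < 128 for line in encoded for byte in line))

# Decoding and decompressing restores the original bytes.
restored = zlib.decompress(uudecode_lines(encoded))
print(restored == original)
```

Compressing before encoding matters: the ASCII encoding enlarges the data, so it is applied to the already compressed bytes.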
MARKUP LANGUAGES
Markup = extra textual syntax that can be used to describe formatting actions, structure information, text semantics, attributes, etc.
Formal markup languages are more structured
Marks = tags – an initial and an ending tag surrounding the marked text
Standard metalanguage = SGML (Standard Generalized Markup Language)
New metalanguage for the Web = XML (eXtensible Markup Language) = subset of SGML
Most popular markup language used for the Web = HTML (HyperText Markup Language)
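As a small illustration of tags surrounding marked text, the sketch below parses an XML fragment with Python's `xml.etree.ElementTree`. The element names (`document`, `title`, `body`, `em`) and the sample text are invented for this example.

```python
import xml.etree.ElementTree as ET

# Each piece of marked text sits between an initial tag and an ending tag.
source = (
    "<document>"
    "<title>Information Retrieval</title>"
    "<body>Text is the <em>main</em> form of communication.</body>"
    "</document>"
)

root = ET.fromstring(source)

# Structure information: the tags, in document order.
tags = [element.tag for element in root.iter()]
print(tags)

# The marked text of one element.
title = root.find("title").text
print(title)

# The plain text of the body with the markup stripped away.
plain_body = "".join(root.find("body").itertext())
print(plain_body)
```

An IRS can index the plain text while still using the tags to weight or restrict matches, for example by treating words in `title` differently from words in `body`.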
MULTIMEDIA
Applications that handle different types of digital data originating from distinct types of media
Text, sound, images, video
The digital data differ in volume, format, and processing requirements
Different types of formats are necessary for storing each type of media
MULTIMEDIA
Different formats are commonly used on the Web and in digital libraries for: images, audio, moving images, textual images, graphics and virtual reality
IMAGES
XBM, BMP, PCX – direct representation of a bit-mapped (or pixel-based) image
GIF (Graphics Interchange Format) – includes compression; good for black-and-white images or images with a small number of colours or gray levels (256)
JPEG (Joint Photographic Experts Group) – includes compression
TIFF (Tagged Image File Format) – used to exchange documents between different applications and different computer platforms
TGA (Truevision Targa image file) – associated with video game boards
Various other image formats
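Because each image format has its own layout, a system that stores several of them usually needs to tell them apart. One common approach, sketched here, is to check the first bytes of a file against known signatures. The function name `sniff_image_format` is hypothetical, and the signature values are properties of the formats themselves rather than something stated in the slides. Only four of the formats above are covered.

```python
def sniff_image_format(header: bytes) -> str:
    """Guess an image format from the first bytes of a file."""
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith((b"II*\x00", b"MM\x00*")):
        return "TIFF"  # little-endian or big-endian byte order
    if header.startswith(b"BM"):
        return "BMP"
    return "unknown"


print(sniff_image_format(b"GIF89a" + b"\x00" * 10))    # GIF
print(sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 10))  # JPEG
print(sniff_image_format(b"plain text"))               # unknown
```

In practice the header would come from reading the first few bytes of a file opened in binary mode.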
AUDIO
Must be digitized before storage
AU, MIDI (standard format to interchange music between electronic instruments and computers), WAVE – for small pieces of digital audio
Audio libraries – RealAudio or CD formats
Animation or moving pictures
MPEG (Moving Picture Experts Group) – related to JPEG
Others – AVI, FLI, QuickTime
TEXTUAL IMAGES
Images that contain mainly typed or typeset text
Obtained by scanning the documents
For archival purposes
Saved as images but with further compression
Textual and non-textual parts are stored and compressed separately and, when needed, can be combined and displayed together
GRAPHICS AND VIRTUAL REALITY
3-dimensional graphics found on the Web
CGM (Computer Graphics Metafile) standard
Metafile = collection of elements
The CGM standard specifies which elements are allowed to occur in which positions in a metafile
VRML (Virtual Reality Modeling Language) – file format for describing interactive 3D objects and worlds; a universal interchange format for 3D graphics and multimedia that can be used for various applications
MULTIMEDIA DOCUMENTS MARKUP
HyTime = Hypermedia/Time-based Structuring Language – standard defined for multimedia documents markup
An SGML architecture that specifies the generic hypermedia structure of documents