Download presentation
Presentation is loading. Please wait.
Published byOphelia Lane Modified over 9 years ago
1
Structured-Document Processing Languages (3 cu), Spring 2002 Pekka Kilpeläinen University of Kuopio Department of CS & Applied Math Pekka.Kilpelainen@cs.uku.fi
2
SDPL 2002Notes 1: Introduction2 1 Introduction First: Overview and Arrangements What this course is about? 1.1 Structured Documents Review of basic notions
3
SDPL 2002Notes 1: Introduction3 Goals of the Course n To familiarize with the most important models and languages for –manipulating –representing –transforming and –querying structured documents (or XML) n “Core XML processing technology” –little about specific XML applications or commercial systems
4
SDPL 2002Notes 1: Introduction4 NOT an Exhaustive Survey n Bias in selecting course topics: –estimated usefulness/value »centrality (implying longer-lasting value) »maturity: Stable specifications? Existing implementations? –Lecturer up-to-date? n Emphasis on active formalisms (for describing processes on documents) instead of describing documents/data
5
SDPL 2002Notes 1: Introduction5 Motivation? n Practical relevance: “eBusiness” is HOT! n Academic interest on models of information processing XML Internet order invoice
6
SDPL 2002Notes 1: Introduction6 Preliminary Outline 1 Introduction Overview and Arrangements 1.1 Structured Documents 2 Document Instances and Grammars 2.1 Trees and their Grammars 2.2 Review of XML basics: DTDs, Namespaces, Schemas 3 Programmatic Manipulation of Structured Documents (XML APIs) 3.1 SAX 3.2 DOM
7
SDPL 2002Notes 1: Introduction7 Preliminary Outline (2) 4 Styling Structured Documents I 4.1 Essentials of Cascading Style Sheets 5 Transforming Structured Documents 5.1 Addressing: XPath 5.2 XSLT 6 Styling Structured Documents II: XSL 7 XML Web-Site Architectures 8 XML wrapping (or translating data to XML) 9 Querying Structured Documents 9.1 Region Algebra and sgrep 9.2 XML Query Languages
8
SDPL 2002Notes 1: Introduction8 Methodological Goals n Some central professional skills –consulting of technical specifications –experimenting with SW implementations n Ability to think…? –to find out relationships –to apply knowledge in new situations n ("Pidgin English" for scientific communication)
9
SDPL 2002Notes 1: Introduction9 Administration n An elective graduate-level (laudatur) special course –suitable for all specialisation lines (esp. CS/SWE) n 3 cu ( 120 hours of work) n Lectures Mar 11 - May 8, Microteknia MT2 –Lecturer: Pekka.Kilpelainen@uku.fi n Assistant: Sami.Komulainen@uku.fi
10
SDPL 2002Notes 1: Introduction10 Administration: Exercises n Exercises Mar 20 - May 15(?), MT2 –essential for familiarising with the subject –mainly normal homework assignments, solutions discussed in class –1 or 2 groups, depending on attendance n + a few (1-3) "mini-projects" »reading and summarising tasks? »hands-on experimentation? »to be handed-in to lecturer –credited like other exercises (scaled based on quality by a factor in [0, 1.5])
11
SDPL 2002Notes 1: Introduction11 Administration: Grading n Course examination on Wed, May 22, in the Great Lecture hall (SL) –minimum of 50% of exam points to pass the course Grade = (32*Exam/MaxExam + 12*HomeWork/MaxHomeWork - 8)/3 Grade = (32*Exam/MaxExam + 12*HomeWork/MaxHomeWork - 8)/3 n Opportunity to retake the exam –June 12 ( 50% to pass, grade with/without homework credits)
12
SDPL 2002Notes 1: Introduction12 Material n No single textbook n Reports, articles n Course home page –http://www.cs.uku.fi/~kilpelai/RDK02/ –lecture notes, exercises, reference material, announcements, … n Recommended (but not required) text: Deitel, Deitel, Nieto, Lin & Sadhu: XML - How to Program. Prentice Hall, 2001.
13
SDPL 2002Notes 1: Introduction13 Background Check n Basic knowledge of structured documents and document standards –Course "Document standards"? –HTML? n Programming languages and concepts –OO programming, Java? –Unix/Linux \ Windows? n Formal language theory –Theory of Computation / "Ohjelmoinnin ja laskennan teoria"? –regular expressions, automata? –context-free grammars, parse trees?
14
SDPL 2002Notes 1: Introduction14 Course Expectations?
15
SDPL 2002Notes 1: Introduction15 1.1. Structured Documents n Document: –a structured representation of information on some medium ( message) –normally for a human reader »memos, manuals, articles, books, … –also application-to-application messages »EDI (electronic data interchange) –"prose-oriented XML" vs "data-oriented XML" –possibly non-permanent, dynamically generated –processable or conceivable as a unit »(a web page vs a web site)
16
SDPL 2002Notes 1: Introduction16 Text-Based Documents n We concentrate on textual or text-based documents –character data major constituent of information content –as opposed to, say multimedia documents n Next: Presentation vs Structure
17
SDPL 2002Notes 1: Introduction17 Presentation vs Structure n Presentation informs the human reader about the meaning of text and the role of its parts n Markup: indicating the presentation or the meaning of different parts of text –originally hand-written annotations for the typesetter –nowadays primarily codes embedded in digital documents
18
SDPL 2002Notes 1: Introduction18 Markup n Procedural markup –formatting commands (start boldface, produce an empty line, indent 5 mm,...) –proprietary word processor formats, nroff, TeX,... n Descriptive or generic markup –indicating the logical structure of text using chosen names –LaTeX: \begin{abstract}... \end{abstract} –HTML:.... –HTML:.... n Markup language –a fixed set of markup notations (e.g. nroff, TeX, HTML, SVG, …)
19
SDPL 2002Notes 1: Introduction19 Structured documents? Most liberally, any document is structured (punctuation, words, sentences, fields, …) but especially descriptively marked-up documents... especially if they adhere to a rigorous specification of structure.
20
SDPL 2002Notes 1: Introduction20 Structure in documents n Hierarchy or nesting is ubiquitous –chapters of books, warnings in maintenance manuals,... n Linear order essential in prose documents –less important in documents representing data objects n Hypertext and cross-references n We'll be mainly dealing with manipulation of hierarchical, or tree-like document structures Next: How these are modelled?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.