1
Document Engineering
  Robin Burke
  ECT 360
2
Outline
  Admin
  Quiz + Answers
  Document Engineering
  In-Class Exercise
3
Admin
  Project Milestone #2
    identify domain for project
    supposed to be due last week
    no submission link
    will be due today
  Project Milestone #3
    document analysis
    due 10/10
4
Quiz
  30 minutes
5
Document Engineering
  Glushko and McGrath coined this term
    "a new discipline for specifying, designing and implementing the documents that serve as the interfaces to business processes."
  Topic much larger than XML
    XML provides a mechanism for the results of such an engineering activity
  Central insight
    The concept of "document" is very stable and central to many business processes
    XML technologies allow systems to consume and produce documents
6
Tasks in Document Engineering
  1. Analyzing the Context
       what is the problem to be solved?
  2. Analyzing/Designing Business Processes
       express processes at the level where we can identify documents as input and output
  3. Analyzing/Designing Business Transactions
       examining the boundaries of processes and seeing what documents go in and out
  4. Analyzing Documents & Components
       collecting documents and analyzing their contents
       extracting components
  5. Assembling Components & Models
       put components together into data models and document models
  6. Implementation
       writing XML schemas
       writing code that accepts, manipulates and outputs XML
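A minimal sketch of the last task, code that accepts, manipulates and outputs XML. It uses Python's standard xml.etree.ElementTree module; the element names (album, title, length, minutes) and the add_minutes helper are invented for illustration and do not come from any course schema.

```python
import xml.etree.ElementTree as ET

# Invented input document; <length> is assumed to hold whole seconds.
incoming = "<album><title>Example Album</title><length>2712</length></album>"


def add_minutes(xml_text: str) -> str:
    """Accept an XML document, add a derived element, and return XML."""
    root = ET.fromstring(xml_text)            # accept
    seconds = int(root.findtext("length"))    # read a component
    minutes = ET.SubElement(root, "minutes")  # manipulate
    minutes.text = str(seconds // 60)
    return ET.tostring(root, encoding="unicode")  # output


outgoing = add_minutes(incoming)
print(outgoing)
```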
7
In this class
  Mostly interested in #6
    defining XML languages
    writing code
  But languages must come from somewhere
    process and content analysis to derive requirements
8
For your project
  You will select a domain
    in which you can find existing documents
    assume that the first three steps are complete
    you know that these are the important documents to represent
  You will try to figure out what about these documents needs to be represented
    document analysis
9
Document Analysis
  Collect representative documents
  Examine documents
    Identify information-bearing components
    Identify their role in the relevant business process
    Name them
    Type them
10
Components
  Any piece of information that
    has a unique label or identifier is a candidate component
    is self-contained and comprehensible on its own is a candidate component
  A component is a logical unit, with no presentation implied
    may be organized structurally
11
Components
  Just because information is presented as a unit
    doesn't mean it is one component
    Example
      "Robert J. Glushko and Tim McGrath"
  Just because information is not presented together
    doesn't mean the components should be separate
    Example
      DePaul University
      School of CTI
      243 S. Wabash Ave.
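A rough sketch of the first example: one presented unit that holds several components. It assumes names follow a simple "given [middle] family" pattern joined by " and "; the Person type and split_byline helper are illustrative names, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class Person:
    given: str
    middle: str
    family: str


def split_byline(byline: str) -> list[Person]:
    """Split a byline such as 'A and B' into one Person per author."""
    people = []
    for name in byline.split(" and "):
        parts = name.split()
        # Assumption: first token is the given name, last token is the
        # family name, anything in between is a middle name or initial.
        people.append(Person(parts[0], " ".join(parts[1:-1]), parts[-1]))
    return people


authors = split_byline("Robert J. Glushko and Tim McGrath")
for person in authors:
    print(person)
```

Real names break this pattern often, so a production analysis would need a more careful rule than whitespace splitting.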
12
Hints for Components
  Spatial features of documents
    whitespace
    rules
    boxes
    layout patterns
  Typography
    font sizes and styles
    not always
  Proximity
    figures and captions
  Structure
    be careful!
    document may not have the right structure
    better to pull out internal information components
    see if the structure emerges from the analysis
13
What to record
  Tentative name
    must be tentative; names change frequently
  Type of data
14
Example
15
Example 2
16
Example 3
17
Component Harvest
  For each document
    extract components
  Do so independently of other documents
    lets you identify differences in representation and contents
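One possible way to record a harvest, sketched in Python. Each document gets its own table, built without looking at the other. The Component fields follow the "tentative name" and "type of data" items above; the sample column, the harvest helper, and all of the data values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str       # tentative name; expect it to change
    data_type: str  # tentative type of data
    sample: str     # value as found in the document


def harvest(rows):
    """Build the harvest table for one document from (name, type, sample) rows."""
    return [Component(*row) for row in rows]


# Invented sample data: each document is harvested on its own.
harvest_a = harvest([
    ("Artist", "string", "Example Quartet"),
    ("Length", "mm:ss", "45:12"),
])
harvest_b = harvest([
    ("Performer", "string", "Example Quartet"),
    ("Duration", "seconds", "2712"),
])

names_a = {c.name for c in harvest_a}
names_b = {c.name for c in harvest_b}
# Names found in only one document point to differences in representation.
only_in_one = sorted(names_a ^ names_b)
print(only_in_one)
```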
18
Harvesting
19
Component Consolidation
Examine different sets of harvested components Look for similarities and differences Try to resolve differences Renaming Structural reorganization Develop detailed type information value standardization
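A small sketch of consolidation by renaming. It assumes each harvest table is a mapping from tentative name to tentative type, and that the analyst has already decided which names are synonyms. The tables, the renames map, and the consolidate helper are all hypothetical.

```python
# Invented harvest tables: tentative name -> tentative type.
harvest_a = {"Artist": "string", "Title": "string", "Length": "mm:ss"}
harvest_b = {"Performer": "string", "Title": "string", "Duration": "seconds"}

# Assumption: the analyst judged these pairs to be synonyms.
renames = {"Performer": "Artist", "Duration": "Length"}


def consolidate(tables, renames):
    """Merge harvest tables under canonical names, collecting every type seen."""
    merged = {}
    for table in tables:
        for name, data_type in table.items():
            canonical = renames.get(name, name)
            merged.setdefault(canonical, set()).add(data_type)
    return merged


consolidated = consolidate([harvest_a, harvest_b], renames)
# More than one type under a name means the type information still
# needs to be resolved, for example by standardizing the values.
conflicts = {name: types for name, types in consolidated.items() if len(types) > 1}
print(sorted(consolidated))
print(conflicts)
```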
20
Standardizing Values
  Assists in writing schema/DTD
  Assists in document processing
  BUT
    value space not likely to remain constant
    too many choices doesn't help
      180 countries
      but do you do business with all of them?
    too few choices is also a problem
      if the distinctions important to your process can't be captured
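A sketch of a standardized value space, using a Python Enum limited to the countries a hypothetical business actually deals with. The three countries and the standardize_country helper are assumptions made for the example.

```python
from enum import Enum


class Country(Enum):
    # Assumption: the process only involves these three countries.
    US = "United States"
    CA = "Canada"
    MX = "Mexico"


def standardize_country(raw: str) -> Country:
    """Map a raw string to a standardized value, or reject it."""
    key = raw.strip().lower()
    for country in Country:
        if key in (country.name.lower(), country.value.lower()):
            return country
    raise ValueError(f"{raw!r} is outside the standardized value space")


print(standardize_country(" canada "))
print(standardize_country("us"))
```

Rejected values are a useful signal: if they appear often, the value space has too few choices for the process and needs to be revisited.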
21
Naming
  Names are critical
    they communicate what each part of the document is for
  Names are the most dynamic part of the analysis
    expect them to change several times
    useful to have a dictionary nearby
  In consolidation
    we need to merge synonyms
    come up with new names for homonyms
    usually best to rename all homonyms
22
Example
  title (in a lecture series)
    Is it the title of the talk?
    The job title of the speaker?
    The name of the lecture series?
  longer names needed to specify
    Talk Title
    Series Title
    Job Title
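The homonym problem in this example can be sketched as follows. It assumes each harvested component is recorded as a (name, meaning) pair; the find_homonyms helper and the "speaker" row are invented, while the three longer names are the ones given on the slide.

```python
from collections import defaultdict

# (tentative name, what it means in its source document)
harvested = [
    ("title", "title of the talk"),
    ("title", "job title of the speaker"),
    ("title", "name of the lecture series"),
    ("speaker", "person giving the talk"),
]


def find_homonyms(pairs):
    """Return names that are used with more than one meaning."""
    meanings = defaultdict(set)
    for name, meaning in pairs:
        meanings[name].add(meaning)
    return {name: found for name, found in meanings.items() if len(found) > 1}


homonyms = find_homonyms(harvested)

# Rename every use of the homonym, not just some of them.
renames = {
    ("title", "title of the talk"): "Talk Title",
    ("title", "job title of the speaker"): "Job Title",
    ("title", "name of the lecture series"): "Series Title",
}
resolved = [renames.get(pair, pair[0]) for pair in harvested]
print(resolved)
```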
23
Exercise
  Each group to get liner notes documents
  Produce harvest tables
  Produce consolidated table
  Switch documents
    see how they fit
24
Next week
  Schemas
  Next project milestone