Documentverwerking (Document Processing) P11: Document Management
Prof. Dr. ir. Patrick P. Bergmans, Faculteit Ingenieurswetenschappen (Faculty of Engineering), Universiteit Gent
2 Document Management (1)
- Document Management controls the “life cycle” of documents in an organization, i.e. how they are
  - Created
  - Reviewed
  - Published
  - Consumed
  - Disposed of, or retained
- Management implies a “top-down” approach, but Document Management Systems are not always implemented top-down
  - Documents must reflect the culture of a company
  - But they must also introduce a more formal approach to document usage
3 Document Management (2)
- Tools used for Document Management should be
  - Formal and rigorous when required
  - Yet as flexible as possible
- Document Management Systems should
  - Promote finding and sharing information easily
  - Organize content in a logical, retrievable way
  - Standardize the representation of content
  - Help an organization meet its legal responsibilities
  - Support systems for collaboration
  - Provide systems for efficient archiving
4 Document Management (3)
- An effective Document Management System specifies:
  - What types of documents and other content can be created within an organization
  - What templates to use for each type of document
  - What metadata to provide for each type of document
  - Where to store documents at each stage of a document's life cycle
  - How to control access to a document at each stage of its life cycle
  - How to move documents within the organization as team members contribute to the documents' creation, review, approval, publication, and disposition
5 Document Management (4)
- An effective Document Management System specifies (continued):
  - What policies to apply to documents so that document-related actions are audited, documents are retained or disposed of properly, and content important to the organization is protected
  - How documents are converted as they transition from one stage to another during their life cycles
  - How documents are treated as corporate records, which must be retained according to legal requirements and corporate guidelines
6 Planning Document Management
- Planning steps
  - Identify Document Management participants and stakeholders
  - Analyze document usage
  - Plan document libraries
  - Plan content types
  - Plan versioning, content approval, check-out procedures
  - Plan workflows for documents
  - Plan information management policies
7 Document Management Stakeholders
- Identify Document Management participants and stakeholders (who?)
  - Who creates documents in the organization
  - Who reviews documents
  - Who edits documents
  - Who uses documents
  - Who approves the publication of documents
  - Who designs Web sites used for hosting documents
  - Who manages “records”
  - Who deploys and maintains document servers
8 Analyze Document Usage
- Analyze Document Usage
  - Document types
  - Specific and detailed description of the usage of a document, or of classes of documents
    - Simplify documents with little use
  - Author of all documents
  - Format of the document; format conversions should also be recorded
  - Describe users of all documents (individuals, teams, departments)
  - Location of the documents
9 Plan Document Libraries
- Plan Document Libraries
  - Library in a team site
    - Less formal documents, ideas, proposals
  - Library in a portal area (Intranet site)
    - Legal documents, templates, active contracts
  - Library in a Document Center site
    - Centrally managed documents; best practices
  - Library in a Records Repository
    - Document archival, long-term legal requirements, corporate records
  - Translation management document library
  - Slide and presentation library
10 Plan Content Types
- What are content types?
  - Properties of the document type
  - Workflows associated with the type
  - Information management policies associated with the type
  - Document templates
  - Document conversions
  - Custom features
- When creating a “new” document of a specific type, all properties of the type are automatically inherited
- The document type is stored in the document and cannot be changed by the document author
  - May be enforced by a DTD or XML Schema approach
11 Plan Versioning, Content Approval, Check-out
- Types of versioning
  - None
    - Only the last version of the document is kept, and no log of changes or edits is available
    - Only use for unimportant documents
  - Major versions only
    - Uses a simple versioning scheme (1, 2, 3, ...)
    - All versions are normally read-accessible to all stakeholders
  - Major and minor versions
    - Most often a two-part version scheme: [major.minor] (1.2, 2.5, ...)
    - Allows a storage policy based on both numbers
    - For example: retain all current minor releases, and 2 former major releases (5.0, 6.0, 7.0, 7.1, 7.2, ...)
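The retention example on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: versions are modelled as (major, minor) tuples, "former major releases" is read as the x.0 version of the preceding majors, and the function name `versions_to_retain` is hypothetical, not part of any product.

```python
def versions_to_retain(versions, former_majors=2):
    """Return the versions to keep under the example policy.

    versions: iterable of (major, minor) tuples.
    Keeps every version of the current (highest) major release, plus the
    x.0 version of the `former_majors` preceding major releases.
    """
    versions = sorted(set(versions))
    if not versions:
        return []
    current_major = versions[-1][0]
    older_majors = sorted({major for major, _ in versions if major < current_major})
    # Guard against former_majors == 0: a slice of [-0:] would keep everything.
    kept_majors = set(older_majors[-former_majors:]) if former_majors > 0 else set()
    return [
        (major, minor)
        for major, minor in versions
        if major == current_major or (major in kept_majors and minor == 0)
    ]


history = [(4, 0), (5, 0), (5, 1), (6, 0), (6, 1), (6, 2), (7, 0), (7, 1), (7, 2)]
print(versions_to_retain(history))
# [(5, 0), (6, 0), (7, 0), (7, 1), (7, 2)]
```

Everything else (here 4.0, 5.1, 6.1 and 6.2) would be a candidate for disposal or archiving.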
12 Plan Versioning, Content Approval, Check-out
- Content Approval: before making a version “official”, its content must be approved.
  - No version control: while an (approved) document is being edited, or has been edited but its content has not yet been approved, no approved (earlier) version is available
  - Major version control: an approved version may be edited, and pending the approval of the new version, the previous (major) version remains available
  - Minor version control: an approved version may be edited, and the author has the choice to create a new major or minor version; the official approved version can be either the latest minor or the latest major version
13 Document Workflows
- A document workflow describes the various steps a document goes through during its life cycle
  - With a formal workflow model (graphic, procedural, scripted)
  - Includes creation, iterative edits, and approvals
  - Authorizations and signatures
  - Collection of document metadata
  - Checking in and checking out
  - Includes starts, pauses and stops in the workflow
  - Messaging (sending e-mails)
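A scripted workflow model of the kind listed above can be approximated by a small state machine. The sketch below is an assumption-laden illustration: the state names, the action names and the `Workflow` class are hypothetical, and a real system would add pauses, signatures and notifications.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    APPROVED = "approved"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.IN_REVIEW,
    (State.IN_REVIEW, "reject"): State.DRAFT,       # iterative edits
    (State.IN_REVIEW, "approve"): State.APPROVED,
    (State.APPROVED, "publish"): State.PUBLISHED,
    (State.PUBLISHED, "archive"): State.ARCHIVED,
}


class Workflow:
    def __init__(self):
        self.state = State.DRAFT
        self.log = []  # audit trail of (actor, action, new state)

    def apply(self, actor, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"'{action}' is not allowed in state '{self.state.value}'")
        self.state = TRANSITIONS[key]
        self.log.append((actor, action, self.state))
        return self.state


wf = Workflow()
wf.apply("author", "submit")
wf.apply("reviewer", "reject")
wf.apply("author", "submit")
wf.apply("reviewer", "approve")
print(wf.state.value)  # approved
```

Keeping the transitions in a table rather than in code makes the model easy to audit and to change without touching the engine.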
14 Information Management Policies
- Implementation of Information Access Rights as they apply to documents
  - Create, modify, read rights
    - By individuals
    - By department
  - Print rights
- Control and management of confidential information contained in documents
  - Control of digital copies
  - Control of physical (paper) copies, with automatic registered numbering, overprinting of watermarks, etc.
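As a rough illustration of rights granted by individual and by department, the following sketch checks a requested right against both grants. The classes `User` and `AccessPolicy` and the right names are assumptions made for the example, not the model of a specific product.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    department: str


@dataclass
class AccessPolicy:
    # Rights granted to named individuals and to whole departments.
    by_user: dict = field(default_factory=dict)
    by_department: dict = field(default_factory=dict)

    def allows(self, user, right):
        """A right is granted if the user or the user's department holds it."""
        granted = self.by_user.get(user.name, set()) | self.by_department.get(
            user.department, set()
        )
        return right in granted


policy = AccessPolicy(
    by_user={"alice": {"create", "modify", "read", "print"}},
    by_department={"legal": {"read"}},
)
alice = User("alice", "engineering")
bob = User("bob", "legal")

print(policy.allows(alice, "print"))  # True
print(policy.allows(bob, "read"))     # True
print(policy.allows(bob, "print"))    # False
```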
15 Special Functional Components
- Metadata collection and generation
- Indexing
- Summarizing
- Auto-translation
- Terminology Control
- Conversions
- Integration with other systems
16 Metadata Collection and Generation
- Metadata are data that are associated with the document, but that are not part of a document's content
  - Name of the author
  - Date of creation and trail of revisions
  - Authoring application of the document
  - Statistical information about the document
  - Digital document rights
  - Custom structural data (XML)
  - Headers, footers, watermarks for printing
- Metadata can be indexed and searched
- Metadata can be input manually, or automatically generated by the authoring application
17 Indexing
- Documents may be indexed for efficient retrieval
- Keywords for the index may be manually added, or automatically generated
  - A few selected keywords, or
  - The whole document may be indexed
- Indexing should be matched or adapted to the search mechanism associated with the Document Management System
- Indexing may also be “semantic” (not just words but their general meaning is indexed, using categories)
  - Use of categorization systems
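Whole-document indexing is commonly implemented with an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal illustration of that idea; the function names and the sample documents are made up for the example, and it does no stemming or semantic categorization.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "contract-01": "Supplier contract for office furniture",
    "memo-07": "Memo about the new supplier",
    "manual-03": "Service manual for the office printer",
}
index = build_index(docs)
print(sorted(search(index, "supplier")))         # ['contract-01', 'memo-07']
print(sorted(search(index, "office supplier")))  # ['contract-01']
```

The tokenizer used at indexing time is reused at query time, which is one simple way of matching the index to the search mechanism.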
18 Summarizing
- Summarization of documents can be performed at the time a document is stored
  - Completely automatically
  - User assisted
- There are three types of automatic summarization
  - Statistical
  - Semantic
  - Mixed
- Summary may be
  - Included in the document as metadata
  - Saved on the system as a separate document, linked to the original
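One simple form of statistical summarization scores each sentence by how frequent its words are in the whole document and keeps the highest-scoring sentences. The sketch below illustrates only that idea; the stop-word list, the scoring rule and the function name `summarize` are assumptions chosen for brevity, not a description of a particular system.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "for", "it", "be"}


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average document-wide frequency of the sentence's content words.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


text = (
    "Documents are stored in libraries. "
    "Libraries organize documents by content type. "
    "The weather was nice."
)
print(summarize(text))  # Documents are stored in libraries.
```

The resulting string could be stored as a metadata field or as a separate linked document, as the slide notes.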
19 Auto-Translation
- Automatic translation of documents, or sections of documents, can be performed by some (experimental) Document Management Systems
- Essentially two approaches
  - Complete analysis of sentences, and synthesis in the other language
  - Translation memories, storing pre-translated snippets of documents
- Translation is mostly not perfect, but often usable
  - Especially if the document is very domain-specific
    - Pharmaceutical documents
    - User and service manuals
  - But not for legal texts
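The translation-memory approach can be illustrated with a lookup that returns the stored translation of the most similar known segment. This is a sketch under assumptions: similarity is measured with the standard library's `difflib.SequenceMatcher`, the 0.8 threshold is arbitrary, and the class and the sample segments are hypothetical.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    def __init__(self, threshold=0.8):
        self.entries = {}          # source segment -> stored translation
        self.threshold = threshold

    def add(self, source, target):
        self.entries[source] = target

    def lookup(self, segment):
        """Return (translation, similarity) of the best match, or (None, 0.0)."""
        best_source, best_ratio = None, 0.0
        for source in self.entries:
            ratio = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if ratio > best_ratio:
                best_source, best_ratio = source, ratio
        if best_source is None or best_ratio < self.threshold:
            return None, 0.0
        return self.entries[best_source], best_ratio


tm = TranslationMemory()
tm.add("Switch off the device before cleaning.",
       "Schakel het toestel uit voor het reinigen.")

translation, similarity = tm.lookup("Switch off the device before cleaning it.")
print(translation, round(similarity, 2))
print(tm.lookup("Replace the toner cartridge."))  # (None, 0.0)
```

A fuzzy match like the first lookup would normally be shown to a translator for correction rather than inserted unchanged, which fits the "not perfect, but often usable" remark above.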
20 Terminology Control
- Terminology Control or Management is used when the meaning of words used in the document must be accurately monitored
  - Delete = erase = scratch a file, in software manuals: which should be used (in all instances)?
  - Gas = gasoline = fuel = petrol tank: which one should be used in the user manual of a car, in the US? In the UK?
- Terminology management systems rely on a central terminology server with terminology usage rules
  - Terminology database maintained by dedicated utilities
  - Can be used interactively when authoring documents
  - Can be used to screen pre-authored texts for correct terminology usage
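Screening a pre-authored text against terminology rules can be sketched as a search for deprecated terms with a suggested replacement. The rule table below is hypothetical (it simply picks "delete" as the preferred term from the slide's example), and a real terminology server would hold far richer rules.

```python
import re

# Hypothetical terminology rules: deprecated term -> preferred term.
RULES = {"erase": "delete", "scratch": "delete"}


def check_terminology(text, rules):
    """Return (position, found term, preferred term) for each deprecated term."""
    findings = []
    for deprecated, preferred in rules.items():
        pattern = rf"\b{re.escape(deprecated)}\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((match.start(), match.group(0), preferred))
    return sorted(findings)


for position, found, preferred in check_terminology(
    "Erase the file, or scratch the folder.", RULES
):
    print(f"position {position}: use '{preferred}' instead of '{found}'")
# position 0: use 'delete' instead of 'Erase'
# position 19: use 'delete' instead of 'scratch'
```

The same function could back an interactive check in an authoring tool by running it on each paragraph as it is typed.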
21 Conversions
- Conversions between formats (see introduction)
  - ML → RTF → PDL → Bitmap
  - Bitmap → PDL → RTF → ML
  - Any format to HTML for viewing with a browser
- Also conversions at “constant semantic level” (jpg, tif, png, bmp, etc.)
  - Compression and decompression
- Simultaneous tracking of versions at several levels
  - Automatically delete the PDF version if the source (ML or RTF) is being edited
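The last point, deleting a derived PDF when its source is edited, amounts to invalidating renditions whenever a new source version is saved. The sketch below shows one way this could work; the `Repository` class and its method names are assumptions for illustration, and no actual conversion is performed.

```python
class Repository:
    """Stores source documents and the renditions derived from them."""

    def __init__(self):
        self.sources = {}     # name -> (version, content)
        self.renditions = {}  # (name, format) -> source version it was built from

    def save_source(self, name, content):
        version = self.sources.get(name, (0, None))[0] + 1
        self.sources[name] = (version, content)
        # Editing the source makes every derived rendition stale: drop them.
        for key in [k for k in self.renditions if k[0] == name]:
            del self.renditions[key]
        return version

    def publish(self, name, fmt):
        """Record that a rendition (e.g. a PDF) was built from the current source."""
        version, _ = self.sources[name]
        self.renditions[(name, fmt)] = version

    def has_rendition(self, name, fmt):
        return (name, fmt) in self.renditions


repo = Repository()
repo.save_source("report", "<doc>first draft</doc>")
repo.publish("report", "pdf")
print(repo.has_rendition("report", "pdf"))  # True

repo.save_source("report", "<doc>second draft</doc>")
print(repo.has_rendition("report", "pdf"))  # False
```

Recording the source version with each rendition also allows the alternative policy of keeping old renditions but flagging them as outdated.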
22 Integration with other Systems
- Many document management systems attempt to integrate document management directly into other applications
  - Authoring applications: so that users may access (read-modify-write) documents directly in the document management system repository, without leaving the application; such integration is commonly available for office suites and e-mail software.
  - Management applications, such as CRM systems, ERP systems, MRP systems, accounting packages, etc.
- Integration often uses open standards such as ODMA, LDAP, WebDAV and SOAP to allow integration with other software and compliance with internal controls.