Michel BIEZUNSKI Infoloom


1 Michel BIEZUNSKI Infoloom
Reflections from Advancing Topic Maps
Maintenance of Concept Integrity as Names Change
Auditability, Collaboration, Transparency
Michel BIEZUNSKI, Infoloom

2 Contents
1. “Information just the way people made it” versus “pre-organized information”
2. Information keeps changing, and gets transformed, all the time!
3. The transformations can be made a matter of record.
4. Names versus identifiers
5. Auditability of information

When information is collected as is, as opposed to being organized in advance, the names used to designate subjects show variations that make it impossible to conform to the equation "one name = one subject". The constant change to the documents that serve as input to the Tax Map system (including IRS publications, instructions, forms, and FAQs) makes the maintenance of such a knowledge base a moving target. The problem is made worse because topics are not only acquired bottom-up but also undergo transformations, both automatic and manual (by experts), to make them more intelligible to end users. At the end of the day, the names present in Tax Map differ somewhat from the initial names as they occur in the sources.

One of the basic requirements for the maintenance of this knowledge base is that the origin of each name used be traceable, even, and especially, if it has gone through a variety of processes. After a period of trial and error, the solution we found is to maintain the database of names used for topics as a network of interrelated information objects, in which all operations on names are recorded. The "Data Projection Model", a generic model for accountable information systems, serves as the conceptual foundation for this application.

3 Two Ways to Collect Information
Pre-organized (top-down): Instances conform to predefined schemas.
As is (bottom-up): Information is collected as people made it. It is organized after the fact. Organization of information constantly needs to be rethought.

4 Maintenance over time
Top-down systems: Anything that can't get in is rejected, unless we pretend it fits. Once the schema is obsolete, the system needs to be changed. Initial cost: high. Operating cost: low. Upgrade cost: high. Benefits: decrease over time. Concept integrity: simple (in appearance).
Bottom-up systems: Information comes from a variety of sources and is tweaked. Maintenance has to be done constantly. The system evolves dynamically as the information evolves. Initial cost: low. Operating cost: high. Upgrade cost: minimal. Benefits: increase over time. Concept integrity: complex.

5 An Example of a Bottom-Up Application
IRS Tax Map is an electronic research tool featuring topic-based navigation, used by assistors and taxpayers to browse publications, forms, and other documents by topic. It exists in several versions: Intranet, Small Business CD, Tax Products CD.
IRS Tax Map is a bottom-up application because all information used for navigation is collected on the fly, and it doesn't require any change to the workflow of the authoring process.
Demo

6 Automatic Processing
Acquiring Topic Names from Data Content
Formatting & Creation of Topic Pages
Tweaking of the Topic Names: normalized equivalents, plural / singular, etc.; useless headers removed (e.g. “Table...”, “Box ...”, “Line ...”)
Generating Relations Between Topics: permutation, containment, three words in common
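The tweaking of topic names can be illustrated with a minimal sketch. It is not the Tax Map implementation: the function names, the header pattern, and the naive plural rule are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for "useless" headers such as "Table 1", "Box 3", "Line 7".
USELESS_HEADER = re.compile(r"^(table|box|line)\b", re.IGNORECASE)


def normalize(name):
    """Return a normalized equivalent: lowercase, no punctuation, naive singular."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    singular = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return " ".join(singular)


def acquire_topic_names(headers):
    """Drop useless headers and group the remaining names by normalized form."""
    groups = {}
    for header in headers:
        if USELESS_HEADER.match(header):
            continue
        groups.setdefault(normalize(header), []).append(header)
    return groups
```

With this sketch, "Married persons." and "Married person" fall into the same group, while a header starting with "Table" or "Line" is dropped. A real system would need a far more careful singular/plural rule than stripping a trailing "s".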

7 Manual Processing
Applying a Knowledge Base maintained by Tax Experts:
Privileging certain topic names
Renaming topics
Merging topics by name
Deleting topics by name
Relating topics by name
Adding topics by name
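As a rough sketch of how such expert rules might be applied by name, the following assumes topics are held as a mapping from a topic name to the set of names it is related to, and that rules are simple tuples. Both representations, and the rule vocabulary, are assumptions for illustration. Privileging is left out, and renaming is treated the same way as merging.

```python
def apply_rules(topics, rules):
    """Apply expert rules to a dict mapping topic name -> set of related names."""
    topics = {name: set(related) for name, related in topics.items()}
    for action, *args in rules:
        if action == "add":
            topics.setdefault(args[0], set())
        elif action == "delete":
            topics.pop(args[0], None)
            for related in topics.values():
                related.discard(args[0])
        elif action in ("rename", "merge"):
            old, new = args
            # Move the old topic's relations onto the new name.
            topics.setdefault(new, set()).update(topics.pop(old, set()))
            # Redirect every relation that still points at the old name.
            for related in topics.values():
                if old in related:
                    related.discard(old)
                    related.add(new)
            topics[new].discard(new)  # a topic is not related to itself
        elif action == "relate":
            a, b = args
            topics.setdefault(a, set()).add(b)
            topics.setdefault(b, set()).add(a)
        else:
            raise ValueError(f"unknown rule: {action}")
    return topics
```

Note that a rule expressed by name only has the intended effect if the name still exists in that exact form when the rule runs, which is one reason the order and history of transformations matters.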

8 Integration Challenges
Managing topic names:
Married
Married and living apart
Married child.
Married Filing Jointly
Married Filing Separately
Married individuals.
Married persons.
Married Taxpayers

9 Maintenance of Tax Map
Tax law changes each year: Publications, Forms, Instructions, etc., are modified accordingly.
Updates: The application is based on a batch process that rebuilds Tax Map automatically from current materials each time it runs. A new edition of Tax Map is produced every two weeks, or more often during tax season.
Semi-annual knowledge base workshop: Handling new topic names as they emerge. Cleaning the mess (continuing activity).

10 Maintenance Problems Identified
5,000 to 8,000 topic names.
Because of bottom-up acquisition, names do not uniquely identify subjects.
Some names get transformed by automatic processing, by manual processing, or by both.
In the beginning, we couldn't easily trace the processes through which they had gone. Therefore, a modification did not always have the desired effect.
There was no way to assess how much work was done at a workshop.
Graph structure: no beginning, no end.

11 Help! Hierarchy? The Taxonomy Approach.
Harmonize: one subject per name. Organize them in a hierarchy. This seems to make sense...
HOWEVER...
Need to rethink the whole authoring process.
If categories or accepted terms don't fit, same problem as in an Integrated Voice Response System.
Improving indexes did not prove to be a practical solution for large-scale integration.
Several taxonomies may be applicable, and they can overlap or conflict.

12 Housekeeping Never Stops
The taxonomy approach doesn't work as long as new information keeps being collected from various sources, simply because (among other things):
We never know how anybody will end up naming a given thing.
We never know whether what we consider the same thing may be different things for different people.
Similar to housekeeping: as long as we live in a house, it gets messy and we need housekeeping.

13 Account for every process
Record every operation in which each topic name is involved. For example: its provenance; why it was deleted (if applicable); what it was replaced by, and under what rule; etc.
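One way to picture this kind of record keeping is sketched below. The `NameRecord` class, its fields, and the `rename` and `trace` helpers are hypothetical names chosen for illustration; the source does not describe the actual data structures used in Tax Map.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NameRecord:
    name: str
    provenance: str                       # where the name was first found
    operations: list = field(default_factory=list)
    replaced_by: Optional[str] = None
    deleted_reason: Optional[str] = None  # why it was deleted, if applicable

    def record(self, operation, rule, detail=""):
        """Append one (operation, rule, detail) entry to this name's history."""
        self.operations.append((operation, rule, detail))


def rename(registry, old, new, rule):
    """Replace `old` by `new`, keeping the old record so the change stays traceable."""
    record = registry[old]
    record.replaced_by = new
    record.record("rename", rule, f"{old} -> {new}")
    if new not in registry:
        registry[new] = NameRecord(new, provenance=f"derived from {old!r}")
    return registry[new]


def trace(registry, name):
    """Follow replaced_by links from a source name to the name finally displayed."""
    chain = [name]
    while registry[name].replaced_by is not None and registry[name].replaced_by not in chain:
        name = registry[name].replaced_by
        chain.append(name)
    return chain
```

The key design choice in this sketch is that a replaced name is never removed from the registry: it keeps its own record, so the question "where does this name come from?" can be answered by walking the links.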

14 Tax Map Audited: Income Earned Abroad

15 Tax Map Audited Living Abroad

16 Where does “Living Abroad” come from?

17 Containment Rule Results
If one topic name is entirely contained in another, the two are automatically related.
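A minimal sketch of such a rule follows. It assumes that "contained" means the words of the shorter name appear as a contiguous run inside the longer name, ignoring case and punctuation; the slide does not say whether the real rule works on words or on raw character strings, and the function names are hypothetical.

```python
import re


def words(name):
    """Lowercased words of a topic name, punctuation ignored."""
    return re.findall(r"[a-z0-9]+", name.lower())


def is_contained(inner, outer):
    """True if the words of `inner` appear as a contiguous run inside `outer`."""
    a, b = words(inner), words(outer)
    if not a or len(a) >= len(b):
        return False
    return any(b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1))


def containment_relations(names):
    """All (contained name, containing name) pairs among the given names."""
    return {(x, y) for x in names for y in names if is_contained(x, y)}
```

Under this assumption, "Married" would be related to both "Married Filing Jointly" and "Married persons.", which shows how quickly a short, general name accumulates relations.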

18 Synonyms Created by Tax Experts

19 Three Words In Common - Rule
Topics that have three words in common get automatically related.
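The rule can be sketched as a comparison of word sets. This is an illustration under assumptions: every word counts, including short ones such as "and" (the slide does not say whether common words are filtered out), and the function names are hypothetical.

```python
import re
from itertools import combinations


def word_set(name):
    """Distinct lowercased words of a topic name, punctuation ignored."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def words_in_common_relations(names, threshold=3):
    """Pairs of names that share at least `threshold` distinct words."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if len(word_set(a) & word_set(b)) >= threshold
    ]
```

For example, "Married Filing Jointly" and "Married Filing Separately" share only two words and would not be related by this rule, whereas a made-up name such as "Filing Jointly When Married" would be related to the first of them.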

20 Generalization Demo

21 Auditability and Accounting
Financial Auditability requires Accounting. Double Entry Bookkeeping: Money always comes from somewhere and always goes somewhere. Each transaction is a Credit to one account and a Debit to another account. The accounts are connected. Credits and debits need to be balanced. Auditability is the ability to navigate the graph linking any account to all other accounts.
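The bookkeeping analogy can be made concrete with a small sketch. The ledger layout (a list of debit account, credit account, amount tuples) and the account names are assumptions chosen for illustration.

```python
from collections import defaultdict


def balances(ledger):
    """Net balance per account: debits count positive, credits negative."""
    totals = defaultdict(int)
    for debit_account, credit_account, amount in ledger:
        totals[debit_account] += amount
        totals[credit_account] -= amount
    return dict(totals)


def connected_accounts(ledger, start):
    """Every account reachable from `start` by following transactions."""
    graph = defaultdict(set)
    for debit_account, credit_account, _ in ledger:
        graph[debit_account].add(credit_account)
        graph[credit_account].add(debit_account)
    seen, stack = {start}, [start]
    while stack:
        for neighbor in graph[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen
```

Because every transaction debits one account and credits another by the same amount, the balances always sum to zero, and the graph traversal is what "navigating from any account to the others" amounts to in this sketch.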

22 Auditable Information
No information is ever isolated. Every item is connected to another item through a given process. Every item, seen as a hub, can serve to reveal all its connections. Therefore information about any item can be traced back to its origins as well as to its various usages. Information about items becomes auditable.

23 Example of An Information Process: Name <---> Subject
A Name does not identify a Subject:
Variant names may be used to designate the same subject (synonyms, typographical variations).
One name may identify several subjects.
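The many-to-many relationship between names and subjects can be sketched as two indexes kept in step. The `NameIndex` class and the subject identifiers used with it are hypothetical; they only illustrate why a name cannot serve as an identifier.

```python
from collections import defaultdict


class NameIndex:
    """Two-way index between names (strings) and subject identifiers."""

    def __init__(self):
        self.subjects_by_name = defaultdict(set)
        self.names_by_subject = defaultdict(set)

    def designate(self, name, subject_id):
        """Record that `name` is used to designate `subject_id`."""
        self.subjects_by_name[name].add(subject_id)
        self.names_by_subject[subject_id].add(name)

    def variants(self, subject_id):
        """All names used for one subject (synonyms, typographical variations)."""
        return set(self.names_by_subject.get(subject_id, set()))

    def is_ambiguous(self, name):
        """True if the same name designates more than one subject."""
        return len(self.subjects_by_name.get(name, set())) > 1
```

Keeping a separate identifier per subject means that a name can change, or be shared, without the subject itself being lost.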

24 Names and Subjects

25 Strings Become Subjects

26 Connected Items

27 The Data Projection Model
Considers binary relations between information items: 2 operands, one operator: < x | o | y >.
Can be expressed as RDF statements.
All information can be represented using binary relations.
Once information is decomposed into binary relations, information views can be rebuilt according to various perspectives.
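As an illustration only, binary relations of the form < x | o | y > can be held as plain tuples and regrouped into different views. The operator names and the sample statements in the usage note are invented for this sketch; they are not taken from the Data Projection Model itself.

```python
from collections import defaultdict


def hub_view(statements, item):
    """Every statement in which `item` appears as either operand."""
    return [s for s in statements if item in (s[0], s[2])]


def perspective(statements, operator):
    """Rebuild one view: map each x to the y values linked to it by `operator`."""
    view = defaultdict(list)
    for x, o, y in statements:
        if o == operator:
            view[x].append(y)
    return dict(view)
```

Given statements such as ("Married individuals.", "normalized-to", "Married individuals") and ("Married individuals", "synonym-of", "Married persons"), `hub_view` returns both statements for "Married individuals", while `perspective` with "synonym-of" keeps only the expert synonyms. The same set of statements supports both views without being reorganized.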

28 Auditable Information Systems
Information systems can be made auditable the same way financial information is. Auditing Information = Viewing information within one given perspective. Auditing is no more than one particular way of navigating information.

29 More Information http://www.infoloom.com Contact: Michel Biezunski
Infoloom (718)

