InterOntology 2009
Applying Referent Tracking to the Use and Evolution of Websites
Keio University, Tokyo, Japan - February 28, 2009
Werner Ceusters & Shahid Manzoor
Ontology Research Group, Center of Excellence in Bioinformatics & Life Sciences, SUNY at Buffalo, NY
Presentation overview
- Foundations of referent tracking
- Referent Tracking Systems
- Referent Tracking enabled websites
Foundations of Referent Tracking
‘Referent Tracking’ in (computational) linguistics
Identifying which words or phrases denote the same entity throughout a discourse.
In the newspaper: "Obama gave another speech yesterday. The President said hard times are coming. But he was confident that he would come up with solutions." Here ‘Obama’, ‘The President’ and ‘he’ denote the same person.
Origin: the semantic / semiotic triangle (diagram; labels: concept, object, term, referent, reference)
How can we know what co-refers?
- prior mention
- mutual knowledge from shared experience
- frames/scripts/schemata: culturally established scenes with certain expectable parts (the handlebars from a mention of a bicycle; the waiter from a mention of a restaurant)
- uniqueness in the “universe of discourse” (e.g. ‘the sun’)
Important local negotiation aspect during human communication
Not fail-proof, so human communication relies on local negotiation:
- Requests for comprehension: ‘you know?’, ‘you remember?’
- Requests for clarification: ‘who do you mean?’
- Explanations
These tools are not available in isolated descriptions.
Yolk or white in which case?
From two recipes:
- For meringue: Take an egg, separate the yolk from the white, add sugar and start beating it gently.
- For sabayon: Take an egg, separate the yolk from the white, add sugar and white wine and start beating it gently over a low flame.
A medical example: morbidity reporting
Table columns: PtID | Date | ObsCode | Narrative
PtID values: 5572, 0939, 2309, 47804, 298
Date values: 04/07/1990, 12/07/1990, 24/12/1991, 21/03/1992, 03/04/1993, 17/05/1993, 22/08/1993, 01/04/1997, 20/12/1998
ObsCode | Narrative
26442006 | closed fracture of shaft of femur
81134009 | Fracture, closed, spiral
9001224 | Accident in public building (supermarket)
79001 | Essential hypertension
255174002 | benign polyp of biliary tract
58298795 | Other lesion on other specified region
2909872 | Closed fracture of radial head
255087006 | malignant polyp of biliary tract
The problem
Generic terms used to denote specific entities do not have enough referential capacity:
- usually enough to convey that some specific entity is denoted,
- not enough to be clear about which one in particular.
For many ‘important’ entities, unique identifiers are used:
- UPS parcels
- Patients in hospitals
- VINs on cars
- …
Fundamental goals of ‘our’ Referent Tracking explicit reference to the concrete individual entities relevant to the accurate description of some portion of reality, ... Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.
Method: numbers instead of words
Introduce an Instance Unique Identifier (IUI) for each relevant particular (individual) entity, e.g. 235, 78, 5678, 321, 322, 666, 427.
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.
Fundamental goals of ‘our’ Referent Tracking
Use these identifiers in expressions using a language that acknowledges the structure of reality, e.g. a yellow ball:
#1: the ball
#2: #1’s yellow
Then not:
ball(#1) and yellow(#2) and hascolor(#1, #2)
But:
instance-of(#1, ball, since t)
instance-of(#2, yellow, since t)
inheres-in(#1, #2, since t)
Strong foundations in realism-based ontology.
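The yellow-ball expressions can be sketched as plain data records. This is a minimal illustration, not the authors' implementation: the class names `InstanceOf` and `InheresIn` and the field names `quality` and `bearer` are assumptions of this sketch, chosen so that the direction of inheres-in is stated by name rather than by argument position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOf:
    particular: str  # IUI of the particular, e.g. "#1"
    universal: str   # name of the type it instantiates
    since: str       # time from which the relation holds


@dataclass(frozen=True)
class InheresIn:
    quality: str  # IUI of the dependent entity (assumed: the ball's yellow)
    bearer: str   # IUI of the entity it depends on (assumed: the ball)
    since: str


def describe_yellow_ball(t: str = "t"):
    """Return the three relational statements for 'a yellow ball'."""
    ball, yellow = "#1", "#2"
    return [
        InstanceOf(ball, "ball", t),
        InstanceOf(yellow, "yellow", t),
        InheresIn(quality=yellow, bearer=ball, since=t),
    ]


facts = describe_yellow_ball()
```

Both the ball and its yellow get their own identifier, so later statements (for instance that the ball was repainted) can refer to either one without ambiguity.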
Codes for types AND identifiers for instances
The morbidity-reporting table (PtID | Date | ObsCode | Narrative) extended with a column of instance identifiers: IUI-001, IUI-003, IUI-004, IUI-005, IUI-007, IUI-002, IUI-012, IUI-006
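Why instance identifiers matter next to type codes can be shown with a small count. The rows below are hypothetical (patients, visits and the pairing of codes with IUIs are invented for illustration and are not taken from the slide); only the code values and the IUI naming style come from the table.

```python
from collections import Counter

# Hypothetical rows: (patient, visit, type code, IUI of the disorder instance).
RECORDS = [
    ("patient-A", "visit-1", "26442006", "IUI-001"),
    ("patient-A", "visit-2", "26442006", "IUI-001"),  # same fracture seen again
    ("patient-B", "visit-1", "26442006", "IUI-002"),  # a different fracture
    ("patient-A", "visit-1", "79001", "IUI-003"),
]


def count_mentions(records):
    """Count rows per type code (what a code-only registry can report)."""
    return Counter(code for _, _, code, _ in records)


def count_instances(records):
    """Count distinct disorder instances per type code, using the IUIs."""
    distinct = {(code, iui) for _, _, code, iui in records}
    return Counter(code for code, _ in distinct)


mentions = count_mentions(RECORDS)
instances = count_instances(RECORDS)
```

With codes alone, the three rows for 26442006 could be one, two or three fractures; with IUIs the count of distinct fractures is unambiguous.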
The semantic triangle revisited (diagram; labels: Representation and Reference; First Order Reality; concepts; terms; objects; about)
Terminology versus Realist Ontology (diagram, built up in steps)
Terminology | Realist Ontology
concepts | cognitive units
terms | communicative units
objects | universals, particulars
Representation and Reference: representational units (cognitive units and communicative units), which are about First Order Reality (universals and particulars).
Three levels of reality in Realist Ontology
(1) First Order Reality: entities with objective existence which are not about anything (universals, particulars)
(2) Cognitive entities which are our beliefs about (1) (cognitive units)
(3) Representational units in various forms about (1), (2) or (3) (communicative units)
Levels (2) and (3) together make up Representation and Reference.
Representation and the three levels (diagram; labels: Level 1, 2 or 3; Level 2 or 3; Level 3; Level 1)
Possible mismatches reality / representations
Referent Tracking Systems
Referent Tracking System Environment
RTS farms (diagram)
- Information System A, Information System B, Information System C
- Referent Tracking System A: Referent Tracking Servers A1, A2, A3
- Referent Tracking System B: Referent Tracking Servers B1, B2, B3
- Referent Tracking System C: Referent Tracking Servers C1, C2, C3, …
- RTS Proxy Peer, RTS Server Proxy, …
Referent Tracking enabled Websites
Architecture
Some central ideas
- Informative websites are about portions of reality. If the latter change, so should the former.
- Synchronization should be auditable.
- Enforce responsibility of information providers and consumers, yet protect their integrity.
- Cross-fertilization with Information Artifact Ontology.
Some key insights
- Static versus dynamic pages.
- Web pages usually keep their name (URL), yet undergo changes.
- ‘Page’ versus ‘file’: a server file never ‘changes’; it is always replaced by a new file with the same name.
- Changes to a file do not always involve changes to the propositional content.
- A request to view a page does not cause the file on the server itself to be transmitted; a new copy of it is transmitted in each single case.
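The distinction between a changed file and changed content can be sketched with two checksums. This is a rough illustration only: hashing the whitespace-normalized visible text is a crude stand-in assumed here, since propositional content cannot really be captured by text normalization. The function names and sample pages are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags, ignoring the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def file_checksum(raw: bytes) -> str:
    """Checksum of the file exactly as stored on the server."""
    return hashlib.sha256(raw).hexdigest()


def text_checksum(raw: bytes) -> str:
    """Checksum of the visible text with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(raw.decode("utf-8"))
    parser.close()
    words = " ".join(parser.chunks).split()
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


old_file = b"<p>Office hours: Monday 9-11</p>"
new_file = b"<p class='note'>Office  hours: Monday 9-11</p>"

file_changed = file_checksum(old_file) != file_checksum(new_file)
text_changed = text_checksum(old_file) != text_checksum(new_file)
```

Here the file was replaced (different bytes) while the text a reader sees stayed the same, which is the case where a new content file does not come with new propositional content.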
Entities to assign IUIs to
- The content file of each page
- The content of each content file
- The propositional content of each content
- Each browser page
- Each checksum
- Each ontology and terminology used in RT-tuples
- Each RT-tuple (except D-tuples)
- The middleware component
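A minimal sketch of handing out IUIs for some of the entities listed above, assuming a simple in-memory registry. The class `IUIRegistry`, the function `register_page`, the `#n` numbering and the example URL are all hypothetical; a real Referent Tracking System would manage identifiers on a server.

```python
import itertools


class IUIRegistry:
    """Hands out fresh identifiers and remembers what each one denotes."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._denotes = {}

    def assign(self, description: str) -> str:
        iui = f"#{next(self._counter)}"  # each call yields a new identifier
        self._denotes[iui] = description
        return iui

    def describe(self, iui: str) -> str:
        return self._denotes[iui]


def register_page(registry: IUIRegistry, url: str) -> dict:
    """Assign separate IUIs to the page-related entities of one page."""
    kinds = ("content file", "content", "propositional content", "checksum")
    return {kind: registry.assign(f"{kind} of {url}") for kind in kinds}


registry = IUIRegistry()
page_iuis = register_page(registry, "http://example.org/index.html")
```

Keeping the file, its content and its propositional content under separate identifiers lets a later update replace one of them (a new file) while leaving another untouched (the same propositional content).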
Use of the CEN Time Standard for HIT
Tuple generations when adding a page
Tuple generations when updating a page
Tuple insertions: generating a browser page

A-tuples
n | IUIp | IUIa | tap | Key
1 | #24 | #2 | (EVENT("#24 assignment") has-occ AT TP(time-18)) | #25
3 | #27 | | (EVENT("#27 assignment") has-occ AT TP(time-20)) | #28
9 | #34 | | (EVENT("#34 assignment") has-occ AT TP(time-26)) | #35

D-tuples
n | IUId | IUIA | td | E | C | S | Key
2 | #2 | #25 | (EVENT("#25 inserted") has-occ AT TP(time-19)) | I | CE | | #26
4 | | #28 | (EVENT("#28 inserted") has-occ AT TP(time-21)) | | | | #29
6 | | #30 | (EVENT("#30 inserted") has-occ AT TP(time-23)) | | | | #31
8 | | #32 | (EVENT("#32 inserted") has-occ AT TP(time-25)) | | | | #33
10 | | #35 | (EVENT("#35 inserted") has-occ AT TP(time-27)) | | | | #36
12 | | #37 | (EVENT("#37 inserted") has-occ AT TP(time-29)) | | | | #38

PtoP-tuples
n | IUIa | ta | r | IUIo | P | tr | Key
5 | #2 | (EVENT("#30 is asserted") has-occ AT TP(time-22)) | MainContentCopyOf | #022 | #27, #12 | (EPISODE("#30 is true") has-occ SINCE TI(time-20)) | #30
7 | | (EVENT("#32 is asserted") has-occ AT TP(time-24)) | InstigatorOf | | #24, #27 | (EVENT("#32 is true") has-occ AT TP(time-18)) | #32
11 | | (EVENT("#37 is asserted") has-occ AT TP(time-28)) | ChecksumOf | | #34, #27 | (EPISODE("#37 is true") has-occ SINCE TI(time-26)) | #37
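The insertion order in the n column (an A-tuple or PtoP-tuple, each immediately followed by a D-tuple recording its insertion) can be sketched with two counters. This is an illustration under assumptions, not the middleware itself: the class `TupleLog` and its methods are hypothetical, the IUIo and S columns are omitted, and the variable name `instigator` for #24 is a guess based on the InstigatorOf row.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TupleLog:
    author: str = "#2"   # IUIa / IUId as shown on the slide
    next_iui: int = 24   # first free IUI in the slide's example
    next_time: int = 18  # first time index in the slide's example
    rows: List[Tuple] = field(default_factory=list)

    def _fresh_iui(self) -> str:
        iui = f"#{self.next_iui}"
        self.next_iui += 1
        return iui

    def _now(self) -> str:
        stamp = f"time-{self.next_time}"
        self.next_time += 1
        return stamp

    def _record_insertion(self, tuple_key: str) -> None:
        # D-tuple: who inserted which tuple and when (E="I", C="CE" as on the slide).
        when, key = self._now(), self._fresh_iui()
        self.rows.append(("D", self.author, tuple_key, when, "I", "CE", key))

    def assign(self) -> str:
        """A-tuple for a newly assigned IUI, followed by its D-tuple."""
        iuip, when, key = self._fresh_iui(), self._now(), self._fresh_iui()
        self.rows.append(("A", iuip, self.author, when, key))
        self._record_insertion(key)
        return iuip

    def relate(self, relation: str, particulars: Tuple[str, ...], true_at: str) -> str:
        """PtoP-tuple relating particulars, followed by its D-tuple."""
        when, key = self._now(), self._fresh_iui()
        self.rows.append(("PtoP", self.author, when, relation, particulars, true_at, key))
        self._record_insertion(key)
        return key


log = TupleLog()
instigator = log.assign()
page = log.assign()
log.relate("MainContentCopyOf", (page, "#12"), "SINCE time-20")
log.relate("InstigatorOf", (instigator, page), "AT time-18")
checksum = log.assign()
log.relate("ChecksumOf", (checksum, page), "SINCE time-26")
```

Run with the starting values above, the sketch produces twelve rows whose identifiers and time indices follow the same sequence as the tables: every tuple insertion is itself documented, which is what makes the synchronization auditable.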
You see this is ontology! I got a graph!
Challenges for the Information Artifact Ontology
- Ontological basis for various relationships that are currently too CS-ish:
  - MainContentCopyOf
  - InstigatorOf
  - DerivesFrom (applicable in this context?)
  - …
- Ontological nature of files, pages, content, propositional content
Future work
- Better automation
- Integration into popular web-design software
- RT-enabled vita-publisher
- Expansion to hard copies