ONTOLOGY FOR THE INTELLIGENCE COMMUNITY: Towards Effective Exploitation and Integration of Intelligence Resources Tracking Referents Columbia, MD.

1 ONTOLOGY FOR THE INTELLIGENCE COMMUNITY: Towards Effective Exploitation and Integration of Intelligence Resources
Tracking Referents
Columbia, MD, December 1, 2006
Werner CEUSTERS, Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, NY, USA

2 The word ‘Ontology’ has two meanings
Ontology: the science of what entities exist and how they relate to each other. An ontology: a representation of some domain which (1) is intelligible to a domain expert, and (2) is formalized in a way that allows it to support automatic information processing.

3 Within the context of ‘an ontology’, the word ‘domain’ has two meanings
For most computer scientists: an agreed-upon conceptualization about which man and machine can communicate using an agreed-upon vocabulary.
For philosophical ontologists: a portion of reality, still allowing for a variety of entities to be recognised by one school and refuted by another.

4 The concept view of ontology has sad consequences
Too much effort goes into the specification business: OWL, DL reasoners, translators and converters, syntax checkers, ...
Too little effort goes into the faithfulness of the conceptualizations to what they represent. Many ‘ontologies’ and ontology-like systems exhibit mistakes of various sorts.
For the intelligence analyst: fancy tools with no real added value.

5 The problem with ‘conceptualization’
“For example, consider an ontology describing traffic connections in Amsterdam, which includes such concepts as roads, cycle tracks, canals, bridges, and so on.” (Natalya F. Noy and Michel Klein. Ontology evolution: Not the same as schema evolution. Knowledge and Information Systems, 5, 2003.)

6 The problem with ‘conceptualization’
“For example, consider an ontology describing traffic connections in Amsterdam, which includes such concepts as roads, cycle tracks, canals, bridges, and so on.” (Natalya F. Noy and Michel Klein. Ontology evolution: Not the same as schema evolution. Knowledge and Information Systems, 5, 2003.)
Bridges and roads in Amsterdam are not concepts!

7 An image of the Kloveniersburgwal bridge in Amsterdam

8 An image of James Bond?

9 Or an image of Pierce Brosnan?

10 Concept orientation mixes up …
What is generic: airplane, philosopher, airport, idiot, …
What is specific: Enola Gay, Barry Smith, JFK, George Bush, …
What is in between: the people of Iraq, the flies in this room, …
Beliefs and fantasies: belief in God, James Bond, …
What does not exist: a unicorn, a prevented attack, the present king of France.
The intelligence analyst will be hampered in his work by this ambiguous concept-based paradigm.

11 The solution: Basic Formal Ontology
An ontology which is
Realist: reality and its constituents exist independently of our (linguistic, conceptual, theoretical, cultural) representations thereof;
Fallibilist: theories and classifications can be subject to revision;
Perspectivalist: there exists a plurality of alternative, equally legitimate perspectives on reality;
Adequatist: these alternative views are not reducible to any single basic view.
This corresponds to the mindset of a good analyst.

12 Static aspects

13 It is important to distinguish three fundamentally different levels of reality:
(1) the reality which exists ‘as it is’ prior to a cognitive agent’s perception thereof;
(2) the cognitive representations of this reality embodied in observations and interpretations on the part of cognitive agents;
(3) the publicly accessible concretizations of such cognitive representations in representational artifacts of various sorts, of which ontologies, terminologies and data repositories are examples.

14 But beware!
These concretizations are NOT supposed to be the representations of these cognitive representations; we should not be in the business of “concept representation”.

15 But beware! These concretizations are NOT supposed to be the representations of these cognitive representations; they are representations of that part of reality of which cognitive agents have built a cognitive representation. They are like the images taken by means of a high-quality camera;

16 They are not (or should not be) like the paintings of Salvador Dali
Non-canonical (although nice-looking) anatomy

17 Basic Formal Ontology & Granular Partition Theory
Think of it as Alberti’s grid

18 Representational artifacts
Ideally built out of representational units and relationships that mirror the entities and their relationships in reality.
Primarily about particulars: non-formalized, news reports; formalized, inventories and referent tracking databases.
Primarily about universals and defined classes: non-formalized, scientific theories and textbooks; formalized, ontologies and terminologies.
Intelligence

19 An “optimal” representational artifact (1)
Because representations, as conceived on realist terms, (1) are artifacts created for some purpose, (2) are at the same time intended to mirror reality, and (3) should allow reasoning which is efficient from a computational point of view, we argue that an optimal ontology/inventory should constitute a representation of all and only those portions of reality that are relevant for its purpose.

20 A realist view of the world
The world consists of
entities that are either particulars or universals, either occurrents or continuants, and either dependent or independent; and
relationships between these entities of the form
<particular, universal>, e.g. is-instance-of;
<particular, particular>, e.g. is-member-of;
<universal, universal>, e.g. isa (is-subtype-of).
This is the minimal ontology required for intelligence.
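A minimal Python sketch of the three relationship forms above, assuming a toy representation in which each entity is tagged only as particular or universal (the occurrent/continuant and dependent/independent axes are left out). The names `Kind`, `Entity`, `SIGNATURES` and `relate` are hypothetical; the point is only that each relationship type is admissible between the kinds of entity the slide names and no others.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PARTICULAR = "particular"
    UNIVERSAL = "universal"


@dataclass(frozen=True)
class Entity:
    name: str
    kind: Kind


# Hypothetical signature table: which kinds each relationship may connect.
SIGNATURES = {
    "is-instance-of": (Kind.PARTICULAR, Kind.UNIVERSAL),
    "is-member-of": (Kind.PARTICULAR, Kind.PARTICULAR),
    "isa": (Kind.UNIVERSAL, Kind.UNIVERSAL),
}


def relate(rel: str, a: Entity, b: Entity) -> tuple:
    """Return the triple (rel, a, b), rejecting pairs of the wrong kinds."""
    expected = SIGNATURES[rel]
    if (a.kind, b.kind) != expected:
        raise ValueError(f"{rel} requires <{expected[0].value}, {expected[1].value}>")
    return (rel, a, b)


barry = Entity("Barry Smith", Kind.PARTICULAR)
philosopher = Entity("philosopher", Kind.UNIVERSAL)
print(relate("is-instance-of", barry, philosopher)[0])  # is-instance-of
```

Calling `relate("isa", barry, philosopher)` raises `ValueError`, because isa holds only between universals.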

21 A realist view of the world (1)
Figure: the universals president, airplane, philosopher, airport and idiot, related by instance-of to the particulars Enola Gay, Barry Smith, JFK and George Bush.

22 A realist view of the world (2)
Figure: the occurrents flying and meeting; the continuants Enola Gay, Barry Smith, George Bush and JFK.

23 A realist view of the world (3)
Figure: the universals child, adult, philosopher and president; the particulars Barry Smith and George Bush, linked to them by instance-at t.

24 A realist view of the world (4)
Figure: the universals child, adult and president, linked by transformation-of and is-a; the particulars Barry Smith and George Bush, linked to them by instance-at at the times ta, tc and t.
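Slides 23 and 24 index instantiation by time: the same particular is an instance of child at one time and of adult at a later one. A minimal sketch, assuming integer time points and half-open intervals; `InstanceAt` and `instance_at` are hypothetical names, and the times used are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceAt:
    """A particular instantiates a universal during [start, end); end=None means ongoing."""
    particular: str
    universal: str
    start: int
    end: Optional[int] = None


def instance_at(facts: list, particular: str, universal: str, t: int) -> bool:
    """True if some recorded fact says `particular` instantiates `universal` at time t."""
    return any(
        f.particular == particular
        and f.universal == universal
        and f.start <= t
        and (f.end is None or t < f.end)
        for f in facts
    )


# Hypothetical time points 0 and 18; the transition from child to adult is at 18.
facts = [
    InstanceAt("Barry Smith", "child", 0, 18),
    InstanceAt("Barry Smith", "adult", 18),
]
print(instance_at(facts, "Barry Smith", "child", 5))   # True
print(instance_at(facts, "Barry Smith", "child", 30))  # False
```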

25 Connecting the dots

26 Connecting the dots

27 Connecting the dots
Figure: dots numbered 1 to 10.

28 Inadequate representational units
“JFK”, “Enola Gay”, “Barry Smith”, “George Bush”

29 Proposed Solution: Referent Tracking Now
That should clear up a few things around here!
Purpose: explicit reference to the concrete individual entities relevant to the accurate description of a scene.
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform Jun;39(3):

30 Numbers instead of words
Method: introduce an Instance Unique Identifier (IUI) for each relevant particular (individual) entity.
Figure: entities labelled with the numbers 235, 78, 5678, 321, 322, 666, 427.

31 Essentials of Referent Tracking
Generation of universally unique identifiers;
deciding what particulars should receive an IUI;
finding out whether or not a particular has already been assigned an IUI (each particular should receive at most one IUI);
using IUIs in intelligence reports, i.e. issues concerning the syntax and semantics of statements containing IUIs;
determining the truth values of statements in which IUIs are used;
correcting errors in the assignment of IUIs.

32 IUI generation
Universally Unique IDs: recently standardized through ISO/IEC :2004, which specifies format and generation rules enabling users to produce 128-bit identifiers that are either guaranteed or have a high probability of being globally unique.
Meaningless strings.
Central management or certification is not needed to guarantee uniqueness (but use as an IUI requires this).
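As a sketch only: Python's standard `uuid` module generates 128-bit UUIDs in the RFC 4122 format, and `uuid.uuid4()` produces random ones, i.e. identifiers with a high probability, not a guarantee, of being globally unique. The function name `new_iui` is hypothetical, and the central registration that use as an IUI requires is not shown.

```python
import uuid


def new_iui() -> str:
    """Return a fresh IUI candidate: a random (version 4) 128-bit UUID as a string.

    Uniqueness is probabilistic; registering the IUI centrally is not shown.
    """
    return str(uuid.uuid4())


iui = new_iui()
print(iui)  # 36 characters: 8-4-4-4-12 hexadecimal groups
```

The string carries no meaning about the particular it will label, which is exactly the property the slide asks for.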

33 IUI assignment is an act carried out by the first ‘cognitive agent’ that feels the need to acknowledge the existence of a particular it has information about, by labelling it with a UUID. A ‘cognitive agent’ is a person, an organisation, or a device or software agent, e.g. a bank note printer or image analysis software.

34 Criteria for IUI assignment (1)
The particular’s existence must be determined:
easy for persons in front of you, tools, ...;
easy for ‘planned acts’: they do not exist before the plan is executed! Only the plan exists, and possibly the statements made about the future execution of the plan;
more difficult: a subject’s intentions, emotions; but the statements observers make about them do exist!
However, there is no need to know what the particular exactly is, i.e. which universal it instantiates, and no need to be able to point to it precisely (e.g. a member of a specific organization). But this is not a matter of choice: not ‘any’ out of ...

35 Criteria for IUI assignment (2)
The particular’s existence ‘may not already have been determined as the existence of something else’: e.g. the morning star and the evening star; the Himalaya: two observers not knowing they observed the same thing.
The particular may not already have been assigned an IUI.
It must be relevant to do so: a personal decision, a (scientific) community guideline, ..., the possibilities offered by the EHR system.
If an IUI has been assigned by somebody, everybody else making statements about the particular should use it.
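The rule that each particular receives at most one IUI, and that an existing IUI must be reused, can be sketched as a lookup before assignment. Deciding whether two descriptions pick out the same particular (think of the morning star and the evening star) is a judgment of the cognitive agent, so this hypothetical `IuiRegistry` takes that judgment as a caller-supplied function `same_particular`; the string comparison used below is only a stand-in.

```python
import uuid
from typing import Callable


class IuiRegistry:
    """Hypothetical in-memory registry enforcing 'at most one IUI per particular'."""

    def __init__(self) -> None:
        self.descriptions = {}  # IUI -> identifying description

    def assign_or_reuse(self, description: str,
                        same_particular: Callable[[str, str], bool]) -> str:
        # If somebody already assigned an IUI to this particular, reuse it.
        for iui, known in self.descriptions.items():
            if same_particular(description, known):
                return iui
        # Otherwise this agent is the first to acknowledge the particular.
        iui = str(uuid.uuid4())
        self.descriptions[iui] = description
        return iui


def same(a: str, b: str) -> bool:
    return a == b  # stand-in for the agent's judgment


registry = IuiRegistry()
first = registry.assign_or_reuse("Kloveniersburgwal bridge", same)
again = registry.assign_or_reuse("Kloveniersburgwal bridge", same)
print(first == again)  # True
```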

36 Assertion of assignments
IUI assignment is an act whose execution has to be asserted in the IUI-repository as a triple <da, Ai, td>, where
da is the IUI of the registering agent;
Ai is the assertion of the assignment, itself a quadruple <pa, pp, tap, c>, where pa is the IUI of the author of the assertion, pp the IUI of the particular, tap the time of the assignment, and c an optional description for identification;
td is the time of registering Ai in the IUI-repository.
Neither td nor tap gives any information about when #pp started to exist! That might be asserted in statements providing information about #pp.
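The two tuples above map naturally onto record types. A sketch in Python, where the class names `Assignment` and `Registration` are hypothetical, placeholder strings stand in for real IUIs, and the check that `td` is not earlier than `tap` is an added assumption (an assignment cannot be registered before it is made), not part of the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    """Ai = <pa, pp, tap, c>."""
    pa: str                  # IUI of the author of the assertion
    pp: str                  # IUI of the particular
    tap: datetime            # time of the assignment
    c: Optional[str] = None  # optional description for identification


@dataclass(frozen=True)
class Registration:
    """<da, Ai, td>."""
    da: str                  # IUI of the registering agent
    ai: Assignment           # the assertion of the assignment
    td: datetime             # time of registering Ai in the IUI-repository

    def __post_init__(self) -> None:
        # Added assumption: an assignment cannot be registered before it is made.
        # Neither td nor tap says anything about when the particular began to exist.
        if self.td < self.ai.tap:
            raise ValueError("td must not precede tap")


# Placeholder strings stand in for real UUID-based IUIs.
a1 = Assignment("iui-author", "iui-bridge", datetime(2006, 12, 1, 9, 0), "the bridge")
r1 = Registration("iui-registrar", a1, datetime(2006, 12, 1, 9, 5))
```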

37 PTP statements - particular to particular
Ordered sextuples of the form <sa, ta, r, o, P, tr>, where
sa is the IUI of the author of the statement,
ta is a reference to the time when the statement is made,
r is a reference to a relationship (available in o) obtaining between the particulars referred to in P,
o is a reference to the ontology from which r is taken,
P is an ordered list of IUIs referring to the particulars between which r obtains, and
tr is a reference to the time at which the relationship obtains.
P contains as many IUIs as required by the arity of r. In most cases, P will be an ordered pair such that r obtains between the particular referred to by the first IUI and the one referred to by the second IUI. As with A statements, these statements must also be accompanied by a meta-statement capturing when the sextuple became available to the referent tracking system.
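A hypothetical Python rendering of the PtoP sextuple, assuming a toy arity table `ARITY` keyed by (ontology, relationship); the relationship names, including the ternary `between`, are invented for illustration. It enforces the rule that P contains exactly as many IUIs as the arity of r; the accompanying meta-statement is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical arity table: (ontology, relationship) -> number of IUIs in P.
ARITY = {("O", "part-of"): 2, ("O", "between"): 3}


@dataclass(frozen=True)
class PtoP:
    """<sa, ta, r, o, P, tr>."""
    sa: str        # IUI of the author of the statement
    ta: datetime   # when the statement is made
    r: str         # relationship, taken from ontology o
    o: str         # ontology from which r is taken
    P: tuple       # ordered IUIs of the particulars between which r obtains
    tr: datetime   # when the relationship obtains

    def __post_init__(self) -> None:
        # P must contain exactly as many IUIs as the arity of r.
        arity = ARITY[(self.o, self.r)]
        if len(self.P) != arity:
            raise ValueError(f"{self.r} needs {arity} IUIs, got {len(self.P)}")


t = datetime(2006, 12, 1)
s = PtoP("iui-author", t, "part-of", "O", ("iui-deck", "iui-bridge"), t)
```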

38 PTCL statements – particular to class
Ordered septuples of the form <sa, ta, inst, o, p, cl, tr>, where
sa is the IUI of the author of the statement,
ta is a reference to the time when the statement is made,
inst is a reference to an instance relationship available in o obtaining between p and cl,
o is a reference to the ontology from which inst and cl are taken,
p is the IUI referring to the particular whose inst relationship with cl is asserted,
cl is the class in o to which p stands in the inst relationship, and
tr is a reference to the time at which the relationship obtains.
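The PtoCL septuple can be sketched the same way. The class `PtoCL` and the helper `classes_of`, which collects every (ontology, class) pair asserted for one particular, are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PtoCL:
    """<sa, ta, inst, o, p, cl, tr>."""
    sa: str        # IUI of the author of the statement
    ta: datetime   # when the statement is made
    inst: str      # instance relationship available in o
    o: str         # ontology from which inst and cl are taken
    p: str         # IUI of the particular
    cl: str        # class in o
    tr: datetime   # when the relationship obtains


def classes_of(statements, p: str) -> set:
    """Return every (ontology, class) pair asserted for particular p."""
    return {(s.o, s.cl) for s in statements if s.p == p}


t = datetime(2006, 12, 1)
statements = [
    PtoCL("iui-author", t, "instance-of", "O1", "iui-7", "bridge", t),
    PtoCL("iui-author", t, "instance-of", "O1", "iui-8", "canal", t),
]
print(classes_of(statements, "iui-7"))  # {('O1', 'bridge')}
```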

39 Architecture of a Referent Tracking System (RTS)
An RTS is a system in which all statements referring to particulars contain the IUIs for those particulars judged to be relevant. It should ideally be set up as broadly as possible. Services:
IUI generator;
IUI repository: statements about assignments and reservations;
Referent Tracking ‘Database’ (RTDB): index (LSID) to statements relating instances to instances and classes.
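To make the division of services concrete, here is a minimal in-memory sketch with one method per service. Everything in it is an assumption: the class and method names are hypothetical, reservations and the LSID index are not modelled, and a real RTS would be a shared, secured service rather than a Python object.

```python
import uuid
from collections import defaultdict


class ReferentTrackingSystem:
    """Hypothetical in-memory sketch of the three RTS services."""

    def __init__(self) -> None:
        self.iui_repository = {}       # IUI repository: IUI -> assignment note
        self.rtdb = defaultdict(list)  # RTDB index: IUI -> statements mentioning it

    def generate_iui(self) -> str:
        """IUI generator service."""
        return str(uuid.uuid4())

    def assign(self, note: str) -> str:
        """Generate an IUI and record its assignment in the IUI repository."""
        iui = self.generate_iui()
        self.iui_repository[iui] = note
        return iui

    def add_statement(self, statement: tuple, iuis: list) -> None:
        """Index a statement under every IUI it mentions; all must be registered."""
        unknown = [i for i in iuis if i not in self.iui_repository]
        if unknown:
            raise KeyError(f"unregistered IUIs: {unknown}")
        for iui in iuis:
            self.rtdb[iui].append(statement)

    def statements_about(self, iui: str) -> list:
        return list(self.rtdb.get(iui, []))


rts = ReferentTrackingSystem()
deck = rts.assign("deck of the Kloveniersburgwal bridge")
bridge = rts.assign("Kloveniersburgwal bridge")
rts.add_statement(("part-of", deck, bridge), [deck, bridge])
```

Indexing each statement under every IUI it mentions is what lets the information registered about one particular be retrieved from any of the statements in which it takes part.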

40 Management of the IUI-repository
Adequate safety and security provisions: access authorisation, control, read/write, ..., pseudonymisation;
deletionless, but with facilities for correcting mistakes;
registration of the assertion ASAP after IUI assignment;
(virtual, e.g. LSID) central management with adequate search facilities.
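The “deletionless but correctable” requirement can be sketched as an append-only log in which a correction is a new entry pointing at the one it retracts, so that the history of mistakes stays available. The names `Entry`, `DeletionlessLog`, `retracts` and `reason` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    seq: int
    statement: tuple
    retracts: Optional[int] = None  # seq of the earlier entry this one corrects
    reason: Optional[str] = None    # why the correction was made


class DeletionlessLog:
    """Append-only store: mistakes are corrected by new entries, never by deletion."""

    def __init__(self) -> None:
        self._entries = []

    def add(self, statement: tuple, retracts: Optional[int] = None,
            reason: Optional[str] = None) -> int:
        if retracts is not None and not 0 <= retracts < len(self._entries):
            raise IndexError("can only retract an existing entry")
        seq = len(self._entries)
        self._entries.append(Entry(seq, statement, retracts, reason))
        return seq

    def current(self) -> list:
        """Statements that no later entry has retracted."""
        retracted = {e.retracts for e in self._entries if e.retracts is not None}
        return [e.statement for e in self._entries if e.seq not in retracted]

    def history(self) -> list:
        """Every entry ever made, including retracted ones."""
        return list(self._entries)


log = DeletionlessLog()
wrong = log.add(("iui-deck", "part-of", "iui-canal"))
log.add(("iui-deck", "part-of", "iui-bridge"), retracts=wrong, reason="encoding mistake")
```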

41 Pragmatics of IUIs in intelligence DBs
IUI assignment requires additional effort, but in principle no more (or just a little bit more) than using codes from concept-based systems directly:
a search for concept codes is replaced by a search for the appropriate IUI using exactly the same mechanisms: browsing, code-finder software, auto-coding software;
with that IUI comes a wealth of already registered information;
if different IUIs apply to the same person, the user must decide which one is the one under scrutiny, or whether it is again a new instance;
a transfer or reference mechanism makes the statements visible through the RTDB.

42 Other advantages
Mapping as a by-product of tracking: descriptions of the same particular using different ontologies/concept-based systems.
Quality control of ontologies and concept-based systems: systematic “inconsistent” descriptions within or across terminologies may indicate poor definition of the respective terms.
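Mapping as a by-product can be illustrated by grouping descriptions by IUI: whenever the same particular is described with a class from O1 and a class from O2, that pair of classes becomes a candidate correspondence, and its count indicates how systematic it is. A sketch, assuming descriptions arrive as (IUI, ontology, class) triples; the function name and the example data are hypothetical.

```python
from collections import Counter
from itertools import product


def candidate_mappings(descriptions, o1: str, o2: str) -> Counter:
    """Count how often a class of o1 and a class of o2 describe the same particular.

    `descriptions` is an iterable of (iui, ontology, class) triples.
    """
    by_particular = {}
    for iui, onto, cls in descriptions:
        by_particular.setdefault(iui, {}).setdefault(onto, set()).add(cls)
    pairs = Counter()
    for per_onto in by_particular.values():
        for c1, c2 in product(per_onto.get(o1, ()), per_onto.get(o2, ())):
            pairs[(c1, c2)] += 1
    return pairs


data = [
    ("iui-1", "O1", "bridge"), ("iui-1", "O2", "brug"),
    ("iui-2", "O1", "bridge"), ("iui-2", "O2", "brug"),
    ("iui-3", "O1", "canal"), ("iui-3", "O2", "gracht"),
]
print(candidate_mappings(data, "O1", "O2").most_common(1))  # [(('bridge', 'brug'), 2)]
```

A frequent pair suggests a mapping; an unexpected pair that recurs systematically is the kind of “inconsistent” description that may point to a poorly defined term.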

43 Dynamic aspects

44 Accept that everything may change:
changes in the underlying reality: particulars and universals come and go;
changes in our (scientific) understanding: the planet Vulcan does not exist;
reassessments of what is considered to be relevant for inclusion (notion of purpose);
encoding mistakes introduced during data entry or ontology development.

45 Reality versus beliefs, both in evolution
Figure: Reality and Belief, with the labels p3, IUI-#3, O-#0, O-#1 and O-#2; an arrow means “denotes”, which is what constitutes the meaning of representational units.
Therefore: O-#0 is meaningless.

46 An “optimal” representational artifact (2)
Each representational unit in such a representational artifact would designate (1) a single portion of reality (POR), which is (2) relevant to its purposes and such that (3) the authors intended to use this representational unit to designate this POR, and (4) there would be no PORs objectively relevant to these purposes that are not referred to in the representational artifact.

47 The Ontologist and the Intelligence Analyst

48 Sources of error
assertion errors: sources may be in error as to what is the case in their target domain;
relevance errors: sources and analysts may be in error as to what is objectively relevant to a given purpose;
encoding errors: they may not successfully encode their underlying cognitive representations, so that particular representational units fail to point to the intended PORs.

49 Key requirement for updating
Any change in an ontology or data repository should be associated with the reason for that change, so that one can later assess what kind of mistake has been made!

50 Example: the gender of a person (in this room)
In John Smith’s EHR: at t1, “male”; at t2, “female”. What are the possibilities?
Change in reality: transgender surgery, or a change in legal self-identification.
Change in understanding: it was female from the very beginning but interpreted wrongly.
Correction of a data entry mistake: it was understood as female, but wrongly transcribed as male.
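Following the key requirement of slide 49, each change can carry its reason. This sketch assumes that each of the four kinds of change listed on slide 44 corresponds to one of the three error sources of slide 48, with a change in reality implying no error at all; that correspondence is an interpretation, and the names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reason(Enum):
    """The four kinds of change listed on slide 44."""
    CHANGE_IN_REALITY = "change in the underlying reality"
    CHANGE_IN_UNDERSTANDING = "change in (scientific) understanding"
    RELEVANCE_REASSESSMENT = "reassessment of relevance"
    ENCODING_MISTAKE = "encoding mistake"


# Assumed correspondence with the error sources of slide 48.
ERROR_FOR_REASON = {
    Reason.CHANGE_IN_REALITY: None,  # the old entry was right for its own time
    Reason.CHANGE_IN_UNDERSTANDING: "assertion error",
    Reason.RELEVANCE_REASSESSMENT: "relevance error",
    Reason.ENCODING_MISTAKE: "encoding error",
}


@dataclass(frozen=True)
class Change:
    old: str
    new: str
    reason: Reason

    def error_type(self) -> Optional[str]:
        """Which kind of mistake, if any, the old entry embodied."""
        return ERROR_FOR_REASON[self.reason]


# John Smith's record: "male" at t1, "female" at t2, under two readings.
surgery = Change("male", "female", Reason.CHANGE_IN_REALITY)
typo = Change("male", "female", Reason.ENCODING_MISTAKE)
print(surgery.error_type(), typo.error_type())  # None encoding error
```

The same pair of values yields different verdicts on the t1 entry depending on the recorded reason, which is why the reason must be stored with the change.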

51 A realism-based metric for data quality
It must be able to deal with a variety of problems that have affected matching endeavors thus far:
different authors may have different though still veridical views on the same portion of reality;
authors may make mistakes, either when interpreting reality or when formulating their interpretations in their chosen representation language;
a matcher can never be sure to what the expressions in a repository actually refer (there is no God’s eye perspective);
if two ontologies are developed at different times, reality itself may have changed in the intervening period.

52 An example: merging data from two sources
Reality exists before any observation, and most structures in reality are there in advance too.
Figure: R.

53 Some portions of reality
The author of O1 acknowledges the existence of some portions of reality (PORs); some portions of reality escape his attention.
Figure: R, B1.

54 He considers only some of them relevant for O1, and thus represents only part of them, here with Int = R+.
Both RU1B1 and RU1O1 are representational units referring to #1; RU1O1 is NOT a representation of RU1B1; RU1O1 is created through concretization of RU1B1 in some medium.
Figure: R, #1, B1, RU1B1, O1, RU1O1.

55 Similarly for the author of O2
Figure: R, B1, B2, O1, O2.

56 Creation of the mapping
Figure: R, B1, B2, O1, O2, Om.

57 Two (out of many) possible configurations
(1) #1 was not considered to be relevant for O2, but is considered to be relevant for Om.
(2) The author of O1 made an encoding mistake, so that his ontology contains a reference to a non-intended referent, and this is copied into Om.

58 Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality

59 Valid presence in the representation
Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality: valid presence in the representation; valid absence in the representation.

60 Unjustified presence in the representation
Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality: unjustified presence in the representation; unjustified absence in the representation.
But sometimes you get lucky …

61 The original beliefs are usually not accessible
Figure: R, Om.

62 The original beliefs are usually not accessible
But if the ontologies are well documented and their representations intelligible, then many such beliefs can be inferred and mistakes found.
Figure: R, O1, O2, Om.

63 For concept-based systems, there is also no reality
Figure: R, Om.

64 But what must hold if both ontologies are believed to be right can be believed to mirror reality
Figure: O1, O2, Om.

65 The principle of forced backward belief
A lot of information loss

66 A decision support tool for dealing with inconsistencies?
O1: holds that penguins are birds and that birds fly.
O2: holds that penguins are birds and that penguins don’t fly.
The problem for Om: which source ontology to believe? What might be the source of the inconsistency?
(1) O1 is right and penguins do fly;
(2) O1 is wrong, and either penguins are not birds or not all birds fly;
(3) both are right, but the representational units ‘penguin’, ‘bird’ and ‘fly’ do not refer to the same entities in reality.
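A checker can detect the penguin conflict but not resolve it. A minimal sketch, assuming each ontology is reduced to isa pairs plus sets of classes asserted to fly or not to fly, and that a flying assertion holds for every subtype (the strict reading of “birds fly”); the function names are hypothetical.

```python
def supertypes(cls: str, isa: set) -> set:
    """Return cls together with every class reachable from it via (sub, super) isa pairs."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in isa:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def conflicts(isa: set, flies: set, not_flies: set) -> set:
    """Classes asserted not to fly although they, or a supertype, are asserted to fly."""
    return {c for c in not_flies if supertypes(c, isa) & flies}


o1 = {"isa": {("penguin", "bird")}, "flies": {"bird"}, "not_flies": set()}
o2 = {"isa": {("penguin", "bird")}, "flies": set(), "not_flies": {"penguin"}}
merged = {key: o1[key] | o2[key] for key in o1}  # naive union, as Om would do

print(conflicts(**o1), conflicts(**o2), conflicts(**merged))  # set() set() {'penguin'}
```

Each source is consistent on its own; only the merge is not. Which of the three explanations applies still has to be decided by the analyst.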

67 Possible evolutions through updates

68 Possible evolutions through updates
Example: a relevant entity ceases to exist, but the representation is not updated:

69 Updating is an active process
Authors assume in good faith that all included representational units are of the P+1 type, and that all units they are aware of but did not include are of type A+1 or A+2. If they become aware of a mistake, they make a change under the assumption that the change also leads towards the P+1, A+1 or A+2 cases. Thus, at that time, given their belief about what the current entry is and the reason for the change, they know of what type the previous entry must have been.

70 This leads to a calculus …
Its aim is NOT to demonstrate how good an individual version of an ontology is, but rather to measure how much it has (hopefully) improved compared to its predecessors.
Principle: recursive belief revision.

71 Backward belief revision over time
Reality: a POR exists and is not relevant.
Figure: R, P; beliefs at t about t: -2.
At time t, an analyst correctly perceives the existence of some particular, but considers it relevant while it isn’t, and he makes an encoding error such that the representational unit does not refer. There is thus a -2 error with respect to reality, but this remains, of course, unknown.

72 Backward belief revision over time
Reality: a POR exists and is not relevant.
Figure: R, P; beliefs at t about t (-2), at t+1 about t+1, and at t+1 about t.
At t+1, he corrects the encoding mistake, which forces him to believe that at t, the unit-reality configuration was of type P-4 rather than P+1.

73 Backward belief revision over time
Reality: a POR exists and is not relevant.
Figure: R, P; beliefs at t about t (-2), at t+1 about t+1 (-1), and at t+1 about t (-1).
Although he believes that the current situation is P+1, it is in reality P-6, whereas it was P-7 before. The real error is now -1, while the perceived error with respect to t is also -1.

74 Backward belief revision over time
Reality: a POR exists and is not relevant.
Figure: R, P; beliefs at t about t (-2), at t+1 about t+1 (-1), and at t+1 about t (-1).
At t+2, he believes that the posited POR in fact does not exist.

75 Backward belief revision over time
Reality: a POR exists and is not relevant.
Figure: R, P; beliefs at t about t (-2); at t+1 about t+1 and about t (-1, -1); at t+2 about t+2, about t+1 and about t (-1, -3, -5).

76 Conclusion
Realist ontology is a powerful quality assurance tool for building high-quality ontologies AND high-quality databases.
Referent tracking, based on realist ontology, is a means to remove the ambiguity in data that cannot be resolved by realist ontology alone; it is a form of “adult” annotation.
Application of RT requires a globally accessible repository.
The use of “meaningless” IUIs allows very strict safety and security measures to be implemented.

