Towards a Logic Formalization of Taxonomic Concepts Dave Thau, Bertram Ludäscher, Shawn Bowers UC Davis thau@learningsite.com
5th International Conference on Ecological Informatics
Names are Confusing
[Figure, adapted from R. Peet. Treatments shown: Gray 1834, Chapman 1860, Kral 1998, Thau 2006. Names shown: Ranunculus plumosa, R. plumosa var intermedia, R. plumosa var plumosa, Ranunculus pinetcola, Ranunculus homunculus.]
Impact on Data Analysis
- Can’t find data
  - If $A \equiv B$, a search on A should retrieve B
  - Same if $A \supseteq B$
- Can’t aggregate data
  - If $A \subseteq B$, you should be able to combine data from A into B
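The retrieval rule above can be sketched in a few lines of Python. This is only an illustration: the function name `expand_query`, the relation labels, and the concepts A to E are hypothetical and do not come from the talk.

```python
def expand_query(concept, mappings):
    """Return `concept` plus every concept a search on it should also retrieve.

    `mappings` holds (a, relation, b) triples; the relation labels
    "congruent", "includes", and "included_in" are assumptions of this sketch.
    """
    hits = {concept}
    for a, relation, b in mappings:
        # concept is congruent to b, or concept includes b: b's data belong in the result
        if a == concept and relation in ("congruent", "includes"):
            hits.add(b)
        # a is congruent to concept, or a is included in concept: same reasoning
        elif b == concept and relation in ("congruent", "included_in"):
            hits.add(a)
    return hits


mappings = [
    ("A", "congruent", "B"),
    ("A", "includes", "C"),
    ("D", "included_in", "A"),
    ("A", "included_in", "E"),  # E is broader than A, so it is not retrieved
]
print(sorted(expand_query("A", mappings)))  # ['A', 'B', 'C', 'D']
```

The sketch follows mappings one step only; chaining mappings transitively would need a graph traversal.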
Where In Greece Can I Find Ranunculus aquatilis?
[Figure: R. aquatilis, R. trichophyllus]
Mapping Taxonomies
[Figure: Ranunculus aquatilis in Benson, 1948 and in FNA-03, 1997, linked by $\equiv$ mappings. Varieties shown: R.a. var calvescens, R.a. var capillaceus, R.a. var aquatilis, R.a. var diffusus, R.a. var hispidulus.]
Two concepts A and B can be related in five ways: A congruent to B, A included in B, A includes B, A and B overlap, A and B disjoint.
This results in $5^{12}$ (more than 240 million) possible sets of relationships.
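The count can be checked directly. In the sketch below, the relation labels are names chosen for illustration; the exponent 12 is taken from the slide.

```python
# Five possible relations between two concepts (labels are this sketch's own).
RELATIONS = ("congruent", "included_in", "includes", "overlaps", "disjoint")

n_choices = 12  # exponent given on the slide
n_combinations = len(RELATIONS) ** n_choices

print(f"{n_combinations:,}")  # 244,140,625
```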
Overview
- The problems: names change, experts disagree, data become incomparable
- The partial solution: Taxonomic Concepts
- Another part of the solution: Logic
- Representing taxonomy in logic
- Using the representation to detect inconsistencies and discover new relations
- Applications
Logic, why?
- Precise modeling language
- Solid mathematical basis
- Good tools for reasoning are available
- Explicit, “portable” representation (not buried in code)
Basic Taxonomy
- A rooted tree
- Only “isa” relations
[Figure: node A with children B and C, connected by isa edges]
$T = (N, E)$, $N = \{A, B, C\}$, $E = \{B \xrightarrow{isa} A,\ C \xrightarrow{isa} A\}$
$\Phi_{isa}(T) = \{\forall x\colon m(x) \rightarrow n(x) \mid m \xrightarrow{isa} n \in E,\ T = (N, E)\}$
In the basic taxonomy, $\Phi_T = \Phi_{isa}(T)$
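As a small illustration of how the isa edges of a taxonomy turn into logical axioms, the sketch below prints one implication per edge for the three-node example. The function name `isa_axioms` and the plain-text axiom syntax are assumptions made for this example, not notation from the talk.

```python
def isa_axioms(edges):
    """Produce one axiom per (child, parent) isa edge: every child element is a parent element."""
    return [f"forall x: {child}(x) -> {parent}(x)" for child, parent in edges]


N = {"A", "B", "C"}
E = [("B", "A"), ("C", "A")]  # B isa A, C isa A

for axiom in isa_axioms(E):
    print(axiom)
# forall x: B(x) -> A(x)
# forall x: C(x) -> A(x)
```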
Some Additional Constraints
- No empty nodes: all nodes have at least one element
  $\{\exists x\colon n(x) \mid n \in N,\ T = (N, E)\}$
- Disjointness: the children of a node are disjoint
  $\{\neg\exists x\colon n_1(x) \wedge n_2(x) \mid n_1 \xrightarrow{isa} m \in E,\ n_2 \xrightarrow{isa} m \in E,\ n_1 \neq n_2,\ T = (N, E)\}$
- Closed world: a node with children is defined as the union of those children
  This one’s formula is a bit long, trust me…
[Figure: node A with children B and C, connected by isa edges]
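The closed-world formula is not written out on the slide. One plausible way to state it, in the same set-builder style as the other constraints, is sketched below; this is a reconstruction under that assumption, not necessarily the authors' formula. The converse direction (each child is contained in its parent) is already supplied by the isa axioms.

```latex
\[
\Bigl\{\, \forall x\colon m(x) \rightarrow \bigvee_{n \,:\, n \xrightarrow{isa} m \,\in\, E} n(x)
\;\Bigm|\; m \in N \text{ has at least one child},\ T = (N, E) \Bigr\}
\]
% For the three-node example (B isa A, C isa A) this yields a single axiom:
\[
\forall x\colon A(x) \rightarrow B(x) \vee C(x)
\]
```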
Mapping Formulae
- Mappings between nodes in two different taxonomies have their own formulae
- In the slides and proofs to come I will use these symbols:
  - $A \subseteq B$: A is included in B
  - $A \supseteq B$: A includes B
  - $A \equiv B$: A and B are equivalent
Inferring Unstated Correspondences
[Figure: Benson, 1948: Ranunculus arizonicus with R.a. var chihuahua and R.a. var typicus; Kartesz, 2004: Ranunculus arizonicus]
Given (Peet, 2005):
- B.1948:R.a.typicus is included in K.2004:R. arizonicus
- B.1948:R. arizonicus is congruent to K.2004:R. arizonicus
We can demonstrate: how B.1948:R.a. var chihuahua relates to K.2004:R. arizonicus
Proving New Mappings
[Figure: Benson, 1948: A = Ranunculus arizonicus with children B = R.a. var chihuahua and C = R.a. var typicus; Kartesz, 2004: D = Ranunculus arizonicus; $A \equiv D$; the relation between B and D is marked “?”]
Show $B \subseteq D$ and $\neg(D \subseteq B)$
Formal Proof of Mapping
[Proof figures: Part 1, Part 2]
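The formal proof did not survive in this transcript, but the claim can be checked mechanically. The sketch below is not the authors' proof or tool. It assumes the Ranunculus arizonicus example (A with children B and C in Benson, D in Kartesz, A congruent to D, C included in D) and uses the isa, disjointness, and non-emptiness constraints. Because every axiom speaks about one element at a time, it is enough to enumerate "element types", meaning the 16 possible ways an element can belong to A, B, C, and D. All function names are hypothetical.

```python
from itertools import product

PREDICATES = ["A", "B", "C", "D"]


def implies(p, q):
    """Universal axiom 'every p is a q', expressed as a test on one element type."""
    return lambda t: (not t[p]) or t[q]


# Universal axioms: each must hold for every element type in a model.
UNIVERSAL = [
    implies("B", "A"),                     # isa: B is a child of A
    implies("C", "A"),                     # isa: C is a child of A
    lambda t: not (t["B"] and t["C"]),     # disjointness of the siblings B and C
    implies("A", "D"), implies("D", "A"),  # given mapping: A congruent to D
    implies("C", "D"),                     # given mapping: C included in D
]


def allowed_types(axioms):
    """All membership patterns over PREDICATES that violate no universal axiom."""
    types = (dict(zip(PREDICATES, bits))
             for bits in product([False, True], repeat=len(PREDICATES)))
    return [t for t in types if all(ax(t) for ax in axioms)]


def satisfiable(axioms):
    """Non-emptiness: every node needs at least one allowed type that belongs to it."""
    types = allowed_types(axioms)
    return all(any(t[p] for t in types) for p in PREDICATES)


def entails_subset(p, q):
    """p is included in q in every model (assumes the theory itself is satisfiable)."""
    return not any(t[p] and not t[q] for t in allowed_types(UNIVERSAL))


def entails_not_subset(p, q):
    """No model has p included in q: adding that axiom makes the theory unsatisfiable."""
    return not satisfiable(UNIVERSAL + [implies(p, q)])


print(satisfiable(UNIVERSAL))        # True
print(entails_subset("B", "D"))      # True
print(entails_not_subset("D", "B"))  # True
```

The second check succeeds because forcing D into B would leave no room for the non-empty node C, which is disjoint from B yet included in D.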
Inconsistent Mapping
[Figure: Ranunculus hydrocharoides in Benson, 1948 and in Kartesz, 2004, with three $\equiv$ mappings. Variety labels, in slide order: R.h. var natans, R.h. var stolonifer, R.h. var typicus, R.h. var stolonifer, R.h. var typicus.]
Peet, 2005:
- B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
- B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
- B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
Proving Inconsistency
[Figure: Ranunculus hydrocharoides in Benson, 1948 and in Kartesz, 2004, with three $\equiv$ mappings. Variety labels, in slide order: R.h. var natans, R.h. var stolonifer, R.h. var typicus, R.h. var stolonifer, R.h. var typicus.]
- Good. You could animate and ask “does someone see the problem?”, then (either way) show the reasoning.
- Do NOT show the formulas first; give an “abstract proof” instead.
- Have a formal proof as back-up (but essentially you’ll have to skip over it).
GOAL:
1. Convince the audience that the reasoning makes sense.
2. Convince the audience that there is an algorithm that could have done the reasoning for us.
Formal Proof of Inconsistency
Showing Inconsistency Using Popular Tools
[Figure: Benson, 1948: Ranunculus with Ranunculus macranthus, Ranunculus petiolaris, …; Kartesz, 2004: Ranunculus with Ranunculus petiolaris, …]
Peet, 2005:
- B.1948:R. macranthus contains K.2004:R. petiolaris
- B.1948:R. petiolaris is contained by K.2004:R. petiolaris
B.48:R. petiolaris $\subseteq$ K.04:R. petiolaris $\subseteq$ B.48:R. macranthus contradicts the constraint that B.48:R. macranthus and B.48:R. petiolaris are disjoint.
Resolving Inconsistencies
- The inconsistency arises from trying to simultaneously satisfy non-emptiness, disjointness, and the closed world
- Relaxing any of these makes the mapping consistent, giving us clues to hidden truths
- It turns out that Kartesz and Benson focus on different localities
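The effect of relaxing each constraint can be sketched for the Ranunculus hydrocharoides mapping shown earlier. This is an illustrative check, not the tool used in the talk. The node names are hypothetical, and the sketch assumes that var natans belongs to Benson's treatment, which the slide order suggests. It enumerates element types (which nodes a single element belongs to) and keeps those permitted by the universal axioms; the mapping is consistent if every node that must be non-empty appears in at least one permitted type.

```python
from itertools import product

# Hypothetical node names: B_* = Benson 1948, K_* = Kartesz 2004.
# Assumption: var natans appears only in Benson's treatment.
TREES = {
    "B_hyd": ["B_nat", "B_sto", "B_typ"],
    "K_hyd": ["K_sto", "K_typ"],
}
CONGRUENT = [("B_hyd", "K_hyd"), ("B_sto", "K_sto"), ("B_typ", "K_typ")]
NODES = sorted({n for parent, kids in TREES.items() for n in [parent, *kids]})


def consistent(non_empty=True, disjoint=True, closed_world=True):
    """Return True if the mapping has a model under the selected constraints."""
    witnessed = set()  # nodes that occur in at least one permitted element type
    for bits in product([False, True], repeat=len(NODES)):
        t = dict(zip(NODES, bits))
        ok = all(t[a] == t[b] for a, b in CONGRUENT)   # congruence mappings
        for parent, kids in TREES.items():
            present = [k for k in kids if t[k]]
            ok = ok and (not present or t[parent])     # isa: child implies parent
            if disjoint:
                ok = ok and len(present) <= 1          # siblings share no element
            if closed_world:
                ok = ok and (not t[parent] or bool(present))  # parent is the union of its children
        if ok:
            witnessed.update(n for n in NODES if t[n])
    # Without non-emptiness, the universal axioms alone are always satisfiable.
    return (not non_empty) or witnessed == set(NODES)


print(consistent())                    # False: all three constraints together
print(consistent(non_empty=False))     # True
print(consistent(disjoint=False))      # True
print(consistent(closed_world=False))  # True
```

With all three constraints on, no element can belong to var natans: it would also have to belong to stolonifer or typicus (closed world on the Kartesz side), which disjointness forbids.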
Summary
- Taxonomic Concepts are important
- Logic is a useful tool when reasoning about mappings between taxonomies
- We have the beginnings of a representation for taxonomies
- That representation can find unstated mappings
- And detect inconsistent mappings
Future Work
- Beefing up the representation
- Formalizing more constraints, such as rank
- Working in other factors, such as locality
- Adding ‘intelligence’ to tools which build mappings
- Using the representation in a workflow system to aid data integration
Thanks! Questions?
We would like to acknowledge:
- Bob Peet for the Ranunculus data set
- NSF, under SEEK awards 0225676, 0225665, 0225635, and 0533368