Mapping Regulations to Industry-Specific Taxonomies
  Chin Pang Cheng, Gloria T. Lau, Kincho H. Law
  Engineering Informatics Group, Stanford University
  June 5, 2007
Motivating Problem
  To Legal Practitioners:
    Hierarchical, well-structured
    Precise and concise
    Familiar with regulatory organization systems
  To Industry Practitioners:
    Voluminous
    Not trained to read regulations
    More familiar with industry-specific terminology and classification structure
Mapping Regulations to Taxonomies
  Possible Cases:
    One-Taxonomy-One-Regulation
    One-Taxonomy-N-Regulation
    N-Taxonomy-One-Regulation
    N-Taxonomy-N-Regulation
One-Taxonomy-One-Regulation
  Simple keyword latching task
  Stemming (e.g. piling → pile, disabled → disable)
  Word interval
    Concept: “fire alarm system”
    Regulation: “… fire alarm and detection system …”
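The two ideas on this slide, stemming and a word interval, can be sketched roughly as follows. This is only an illustration under stated assumptions: `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, and `max_gap` is a hypothetical parameter for how many extra words may sit between consecutive concept words. Neither name comes from the original system.

```python
import re

def crude_stem(word):
    """Very rough suffix stripping (assumption: a stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokens(text):
    """Split text into alphabetic words and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[A-Za-z]+", text)]

def matches(concept, text, max_gap=2):
    """True if the stemmed concept words occur in order in the text with at
    most max_gap extra words between consecutive concept words.
    The search is greedy (no backtracking), which is a simplification."""
    concept_toks = tokens(concept)
    text_toks = tokens(text)
    if not concept_toks:
        return False
    for start, tok in enumerate(text_toks):
        if tok != concept_toks[0]:
            continue
        pos, ok = start, True
        for want in concept_toks[1:]:
            window = text_toks[pos + 1 : pos + 2 + max_gap]
            if want in window:
                pos = pos + 1 + window.index(want)
            else:
                ok = False
                break
        if ok:
            return True
    return False

section = "a fire alarm and detection system shall be installed"
print(matches("fire alarm system", section))             # True
print(matches("fire alarm system", section, max_gap=0))  # False
```

With a gap of two words the concept “fire alarm system” is found in “fire alarm and detection system”; with no gap allowed it is not, which is the reason a word interval is needed at all.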
Each taxonomy concept is hyperlinked
“No Matched Sections” for non-matched OmniClass concepts
See other matched related concepts in that section
Inverted Regulations
One-Taxonomy-N-Regulation
  Alabama (AL) regulation
  Arizona (AZ) regulation
One Regulation as the Base (AL) (AZ)
Similarity Comparison on Sections
  Core from Lau, Law and Wiederhold (2005)
    Feature extraction (e.g. concepts, measurements)
    Comparison of shared features
    Consideration of hierarchical and referential information
  G. Lau, K. Law and G. Wiederhold. “Legal Information Retrieval and Application to E-Rulemaking,” In Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005), Bologna, Italy, pp , Jun 6-11,
  AL regulation
  AZ regulation
Inclusion of Regulation Hierarchy Terminological differences: revealed by neighbor inclusion
N-Taxonomy-One-Regulation
  Multiple taxonomies exist in a single industry
  Translation is unavoidable
  E.g. in the architecture, engineering and construction (AEC) industry
    Industry Foundation Classes (IFC)
    CIMsteel Integration Standards (CIS/2)
    Automating Equipment Information Exchange (AEX)
    UniFormat™, MasterFormat™, etc.
  Possible solution: Merging taxonomy
    Unfamiliar taxonomy
Proposed System
Proposed Methodology of Taxonomy Mapping
  [F] Alarms. Approved audible devices shall be connected to every automatic sprinkler system. Such sprinkler water-flow alarm devices shall be activated by water flow equivalent to the flow of a single sprinkler of the smallest orifice size installed in the system. Alarm devices shall be provided on the exterior of the building in an approved location. Where a fire alarm system is installed, actuation of the automatic sprinkler system shall actuate the building fire alarm system.
  Concept labels: sprinkler system; orifice (T1); fire alarm (T1); water flow (T2); fire alarm system (T2)
  Taxonomy Mapping:
    Mainly done manually nowadays
    Usually term matching (e.g. fire → fire alarm)
Demonstration in Construction Industry
  International Building Code, IBC
  Taxonomy 1 (OmniClass)
  Taxonomy 2 (ifcXML)
  IfcSlab
  steel
  Knowledge Corpus
  Corpus: carefully selected (in the same domain)
Relatedness Analysis on Concepts
  Notations:
    a pool of m concepts for a taxonomy
    a corpus of N regulation sections
    frequency vector $\vec{c}_i$ is an N-by-1 vector storing the occurrence frequencies of concept i among the N documents
    frequency matrix C is an N-by-m matrix in which the i-th column vector is $\vec{c}_i$
  Example: m = 4, N = 5, so C is a 5-by-4 matrix; the entry $C_{4,3} = 3$ means Concept 3 is matched to Section 4 three times
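A small sketch of how such a frequency matrix could be assembled, using plain case-insensitive substring counting as the matching rule. The function names `frequency_matrix` and `frequency_vector`, the toy sections, and the counting rule are all assumptions for illustration. Python indices are 0-based, so row s and column i here correspond to Section s+1 and Concept i+1 in the slide's notation.

```python
def frequency_matrix(sections, concepts):
    """Return an N-by-m nested list C where C[s][i] counts how often
    concept i occurs in section s (assumption: simple substring counting)."""
    matrix = []
    for text in sections:
        lowered = text.lower()
        matrix.append([lowered.count(c.lower()) for c in concepts])
    return matrix

def frequency_vector(matrix, i):
    """Return column i of the matrix: the frequency vector of concept i."""
    return [row[i] for row in matrix]

# Toy corpus of N = 3 sections and a pool of m = 3 concepts.
sections = [
    "Approved audible devices shall be connected to every automatic sprinkler system.",
    "Where a fire alarm system is installed, actuation of the automatic "
    "sprinkler system shall actuate the building fire alarm system.",
    "Alarm devices shall be provided on the exterior of the building.",
]
concepts = ["sprinkler system", "fire alarm", "water flow"]

C = frequency_matrix(sections, concepts)
print(C)                       # [[1, 0, 0], [1, 2, 0], [0, 0, 0]]
print(frequency_vector(C, 1))  # [0, 2, 0]
```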
Cosine Similarity Measure
  Common arithmetic measure of similarity to compare documents in text mining
  Finding the angle between two frequency vectors in N dimensions, $\vec{c}_i$ and $\vec{c}_j$, from Taxonomy 1 and 2 respectively
  Similarity score lies in [0, 1]
  Represented using dot product and magnitude, the similarity score is given by:
    $$\mathrm{Sim}(i, j) = \cos\theta = \frac{\vec{c}_i \cdot \vec{c}_j}{\|\vec{c}_i\| \, \|\vec{c}_j\|}$$
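A minimal sketch of the cosine score for two frequency vectors, written with the standard library only. The example vectors are made up, and returning 0 when a concept matches no section at all is an assumption, since the angle is undefined for a zero vector.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a concept matched to no section scores 0
    return dot / (norm_u * norm_v)

# Hypothetical frequency vectors over N = 5 sections.
c_i = [1, 0, 2, 0, 1]  # concept i from Taxonomy 1
c_j = [2, 0, 1, 0, 0]  # concept j from Taxonomy 2

print(round(cosine_similarity(c_i, c_j), 4))
```

Because frequencies are never negative, the dot product is never negative either, which is why the score stays within [0, 1].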
Jaccard Similarity Coefficient
  Statistical measure of the extent of overlap of two vectors in N dimensions, $\vec{c}_i$ and $\vec{c}_j$, from Taxonomy 1 and 2
  Defined as size of intersection divided by size of union of the vector dimension sets:
    $$\mathrm{Sim}(i, j) = \frac{N_{11}}{N_{11} + N_{10} + N_{01}}$$
  For concept relatedness analysis,
    $N_{11}$ = number of sections both concepts i and j are matched to
    $N_{10}$ = number of sections concept i is matched to but not concept j
    $N_{01}$ = number of sections concept j is matched to but not concept i
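The counts on this slide can be computed directly from two frequency vectors by treating any nonzero frequency as a match. The sketch below uses made-up vectors; returning 0 when neither concept matches any section is an assumption.

```python
def jaccard_similarity(u, v):
    """Jaccard coefficient of the sets of sections matched by two concepts."""
    n11 = sum(1 for a, b in zip(u, v) if a > 0 and b > 0)    # both matched
    n10 = sum(1 for a, b in zip(u, v) if a > 0 and b == 0)   # only i matched
    n01 = sum(1 for a, b in zip(u, v) if a == 0 and b > 0)   # only j matched
    union = n11 + n10 + n01
    return n11 / union if union else 0.0  # assumption: empty union scores 0

# Hypothetical frequency vectors over N = 5 sections.
c_i = [1, 0, 2, 0, 1]
c_j = [2, 0, 1, 0, 0]

# Sections 1 and 3 are shared, section 5 is matched by i only: 2 / 3.
print(jaccard_similarity(c_i, c_j))
```

Unlike the cosine score, this measure ignores how many times a concept occurs in a section and looks only at whether it occurs.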
Market Basket Model
  Probabilistic measure to find item-item correlation used in data mining
  Two main elements: (1) set of items; (2) set of baskets
  Association rule $\{i_1, i_2, \ldots, i_k\} \rightarrow j$ means a basket containing all the items $i_1, i_2, \ldots, i_k$ is very likely to contain item j
  Confidence of a rule = $P(j \mid i_1, i_2, \ldots, i_k)$
  Interest of a rule = $|\,\text{confidence} - P(j)\,|$
  Example: Coca-Cola → Pepsi: Low-confidence but high-interest
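Market Basket Model (cont’d)
  For concept relatedness analysis
    $N_{11}$ = number of sections both concepts i and j are matched to
    $N_{01}$ = number of sections concept j is matched to but not concept i
    $N_{10}$ = number of sections concept i is matched to but not concept j
    $N_{00}$ = number of sections both concepts i and j are NOT matched to
  Probability of concept j is
    $$P(j) = \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}}$$
  Confidence of association rule $\{i\} \rightarrow j$ is
    $$\frac{N_{11}}{N_{11} + N_{10}}$$
  Forward similarity of concept i and j is the interest as:
    $$\mathrm{Sim}(i, j) = \left| \frac{N_{11}}{N_{11} + N_{10}} - \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}} \right|$$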
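A rough sketch of the market basket score, treating each regulation section as a basket and each concept as an item. It assumes the interest is taken as the absolute difference between the confidence and the probability of concept j; the function names and example vectors are hypothetical, as is returning 0 when concept i matches no section.

```python
def contingency(u, v):
    """Count sections by match pattern: (N11, N10, N01, N00)."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(u, v):
        if a > 0 and b > 0:
            n11 += 1
        elif a > 0:
            n10 += 1
        elif b > 0:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def forward_similarity(u, v):
    """Interest of the rule {i} -> j, with u and v the vectors of i and j."""
    n11, n10, n01, n00 = contingency(u, v)
    total = n11 + n10 + n01 + n00
    if total == 0 or n11 + n10 == 0:
        return 0.0  # assumption: undefined confidence scores 0
    confidence = n11 / (n11 + n10)   # P(j | i)
    prob_j = (n11 + n01) / total     # P(j)
    return abs(confidence - prob_j)  # assumption: absolute difference

# Hypothetical frequency vectors over N = 5 sections.
c_i = [1, 0, 2, 0, 1]
c_j = [2, 0, 1, 0, 0]

print(round(forward_similarity(c_i, c_j), 4))  # rule {i} -> j
print(round(forward_similarity(c_j, c_i), 4))  # rule {j} -> i, arguments swapped
```

Swapping the arguments gives the score of the reverse rule, and the two values generally differ because the confidence is conditioned on a different concept.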
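Asymmetry of Market Basket Model
  Forward similarity:
    $$\mathrm{Sim}(i, j) = \left| \frac{N_{11}}{N_{11} + N_{10}} - \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}} \right|$$
  Backward similarity:
    $$\mathrm{Sim}(j, i) = \left| \frac{N_{11}}{N_{11} + N_{01}} - \frac{N_{11} + N_{10}}{N_{11} + N_{10} + N_{01} + N_{00}} \right|$$

  OmniClass concept i         IfcXML concept j           Sim(i, j)   Sim(j, i)
  curtain walls               IfcCurtainWall
  sound and signal devices    IfcSwitchingDeviceType
  roof decking                IfcSlab
  speakers                    IfcAlarmType
  gypsum board                IfcWallType
  concrete                    IfcSlab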
Evaluation of Accuracy
  Root Mean Square Error (RMSE): Difference between the true values and the predicted values
    For Taxonomy1 of m concepts and Taxonomy2 of n concepts:
      $$\mathrm{RMSE} = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \text{true}_{ij} - \text{predicted}_{ij} \right)^2}$$
  Precision: Fraction of predictions that are correct
  Recall: Fraction of correct matches that are predicted
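A hedged sketch of the three evaluation measures over an m-by-n table of concept pairs. The slide does not say how a similarity score is turned into a predicted match, so the cutoff `threshold` is purely an assumption, as are the function names and the toy values.

```python
import math

def rmse(true_scores, predicted_scores):
    """Root mean square error over all m-by-n concept pairs."""
    total, count = 0.0, 0
    for t_row, p_row in zip(true_scores, predicted_scores):
        for t, p in zip(t_row, p_row):
            total += (t - p) ** 2
            count += 1
    return math.sqrt(total / count)

def precision_recall(true_scores, predicted_scores, threshold=0.5):
    """Precision and recall after thresholding both tables.
    Assumption: a pair counts as a match when its score >= threshold."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true_scores, predicted_scores):
        for t, p in zip(t_row, p_row):
            predicted, actual = p >= threshold, t >= threshold
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: m = 2 concepts in Taxonomy 1, n = 2 concepts in Taxonomy 2.
true_scores = [[1, 0], [0, 1]]
predicted_scores = [[0.8, 0.6], [0.1, 0.3]]

print(round(rmse(true_scores, predicted_scores), 4))
print(precision_recall(true_scores, predicted_scores))  # (0.5, 0.5)
```

RMSE compares the raw scores, while precision and recall depend on the chosen cutoff, so a metric can rank differently on the two kinds of measure.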
Evaluation Results
  Cosine Similarity: Average performance among the three metrics
  Jaccard Similarity: NOT preferred (unacceptably low recall, though high precision)
  Market Basket Model: Preferred (lowest RMSE, highest recall)

               Cosine Similarity   Jaccard Similarity   Market Basket Model
  RMSE
  Precision
  Recall

  concepts from OmniClass, 20 concepts from ifcXML
Conclusion
  Mapping industry-specific taxonomy to regulation allows industry practitioners to retrieve regulations faster
  Four cases:
    1-Taxonomy-1-Regulation: simple keyword latching
    1-Taxonomy-N-Regulation: hierarchy of regulation sections considered
    N-Taxonomy-1-Regulation: three similarity analysis metrics introduced (cosine similarity, Jaccard similarity, market basket model)
    N-Taxonomy-N-Regulation: future step
~ Thank You ~