Mapping Regulations to Industry-Specific Taxonomies
Chin Pang Cheng, Gloria T. Lau, Kincho H. Law
Engineering Informatics Group, Stanford University
June 5, 2007


1 Mapping Regulations to Industry-Specific Taxonomies
Chin Pang Cheng, Gloria T. Lau, Kincho H. Law
Engineering Informatics Group, Stanford University
June 5, 2007

2 Motivating Problem
To Legal Practitioners:
- Hierarchical, well-structured
- Precise and concise
- Familiar with regulatory organization systems
To Industry Practitioners:
- Voluminous
- Not trained to read regulations
- More familiar with industry-specific terminology and classification structure

3 Mapping Regulations to Taxonomies
Possible Cases:
- One-Taxonomy-One-Regulation
- One-Taxonomy-N-Regulation
- N-Taxonomy-One-Regulation
- N-Taxonomy-N-Regulation

4 One-Taxonomy-One-Regulation
- Simple keyword latching task
- Stemming (e.g. piling → pile, disabled → disable)
- Word interval
  - Concept: "fire alarm system"
  - Regulation: "… fire alarm and detection system …"
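The stemming and word-interval ideas on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_stem` is a crude hypothetical suffix stripper (not the stemmer used by the authors), and `max_gap`, the number of extra words tolerated between consecutive concept words, is an invented parameter name.

```python
import re

def toy_stem(word):
    """Crude suffix stripper (hypothetical; a real system would use a proper stemmer)."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # Drop a trailing "e" so that "pile" and "piling" reduce to the same stem.
    return w[:-1] if w.endswith("e") and len(w) > 3 else w

def tokens(text):
    """Split text into alphabetic words and stem each one."""
    return [toy_stem(t) for t in re.findall(r"[A-Za-z]+", text)]

def matches(concept, section_text, max_gap=2):
    """True if the concept words occur in order, with at most max_gap
    intervening words between consecutive concept words (greedy search)."""
    c, s = tokens(concept), tokens(section_text)
    if not c:
        return False
    for start, tok in enumerate(s):
        if tok != c[0]:
            continue
        pos, ok = start, True
        for want in c[1:]:
            window = s[pos + 1 : pos + 2 + max_gap]
            if want not in window:
                ok = False
                break
            pos = pos + 1 + window.index(want)
        if ok:
            return True
    return False

text = "... fire alarm and detection system ..."
print(matches("fire alarm system", text, max_gap=2))
print(matches("fire alarm system", text, max_gap=0))
```

With a gap of two words the concept "fire alarm system" latches onto the regulation phrase; with no gap allowed it does not.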

5 Inverted Regulations
- Each taxonomy concept is hyperlinked
- "No Matched Sections" for non-matched OmniClass concepts
- See other matched related concepts in that section

6 One-Taxonomy-N-Regulation
Figure labels: Alabama (AL) regulation, Arizona (AZ) regulation

7 One Regulation as the Base (AL) (AZ)

8 Similarity Comparison on Sections
- Core from Lau, Law and Wiederhold (2005)
- Feature extraction (e.g. concepts, measurements)
- Comparison of shared features
- Consideration of hierarchical and referential information
Reference: G. Lau, K. Law and G. Wiederhold. "Legal Information Retrieval and Application to E-Rulemaking," in Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005), Bologna, Italy, pp. 146-154, Jun 6-11, 2005.
Figure labels: AL regulation, AZ regulation

9 Inclusion of Regulation Hierarchy Terminological differences: revealed by neighbor inclusion

10 N-Taxonomy-One-Regulation
Multiple taxonomies exist in a single industry
- Translation is unavoidable
- E.g. in the architectural, engineering and construction (AEC) industry:
  - Industry Foundation Classes (IFC)
  - CIMsteel Integration Standards (CIS/2)
  - Automating Equipment Information Exchange (AEX)
  - UniFormat™, MasterFormat™, etc.
Possible solution: Merging taxonomy
- Unfamiliar taxonomy

11 Proposed System

12 Proposed Methodology of Taxonomy Mapping
Example regulation section:
"[F] 903.4.2 Alarms. Approved audible devices shall be connected to every automatic sprinkler system. Such sprinkler water-flow alarm devices shall be activated by water flow equivalent to the flow of a single sprinkler of the smallest orifice size installed in the system. Alarm devices shall be provided on the exterior of the building in an approved location. Where a fire alarm system is installed, actuation of the automatic sprinkler system shall actuate the building fire alarm system."
Figure labels: sprinkler system; orifice (T1); fire alarm (T1); water flow (T2); fire alarm system (T2)
Taxonomy Mapping:
- Mainly done manually at present
- Usually term matching (e.g. fire → fire alarm)

13 Demonstration in Construction Industry
Figure labels: International Building Code (IBC); Taxonomy 1 (OmniClass); Taxonomy 2 (ifcXML); IfcSlab; steel; Knowledge Corpus
Corpus: carefully selected (in the same domain)

14 Relatedness Analysis on Concepts
Notations:
- a pool of m concepts for a taxonomy
- a corpus of N regulation sections
- frequency vector $\mathbf{c}_i$ is an N-by-1 vector storing the occurrence frequencies of concept i among the N documents
- frequency matrix $C$ is an N-by-m matrix in which the i-th column vector is $\mathbf{c}_i$
Example (m = 4, N = 5): $C_{4,3} = 3$ means Concept 3 is matched to Section 4 three times.
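As a rough illustration of the notation, the frequency matrix can be built by counting concept occurrences in each section. The sketch below assumes plain case-insensitive substring counting (a simplification, with no stemming or word-interval logic), and the sample sections and concepts are invented for the example.

```python
def frequency_matrix(sections, concepts):
    """Return C with N rows (sections) and m columns (concepts).
    C[s][i] is how often concept i occurs in section s."""
    return [[sec.lower().count(con.lower()) for con in concepts]
            for sec in sections]

def frequency_vector(C, i):
    """Column i of C: occurrence counts of concept i over all N sections."""
    return [row[i] for row in C]

# Hypothetical mini-corpus, for illustration only.
sections = [
    "A fire alarm shall sound. The fire alarm is tested.",
    "Sprinkler systems need water flow.",
    "No relevant terms here.",
]
concepts = ["fire alarm", "sprinkler", "water flow"]

C = frequency_matrix(sections, concepts)
print(C)
print(frequency_vector(C, 0))
```

Each column of `C` is the frequency vector of one concept, which is the input to the similarity measures on the following slides.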

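15 Cosine Similarity Measure
- Common arithmetic measure of similarity used to compare documents in text mining
- Finds the angle between two frequency vectors in N dimensions, $\mathbf{c}_i$ and $\mathbf{c}_j$, from Taxonomy 1 and Taxonomy 2 respectively
- Similarity score lies in [0, 1]
- Represented using dot product and magnitude, the similarity score is given by:
$$\mathrm{Sim}(i, j) = \frac{\mathbf{c}_i \cdot \mathbf{c}_j}{\lVert \mathbf{c}_i \rVert \, \lVert \mathbf{c}_j \rVert}$$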
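A minimal sketch of the cosine measure on two frequency vectors, written in plain Python. Returning 0.0 when either vector is all zeros is an assumption made here to avoid division by zero; the slides do not say how that case is handled.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a concept with no matches is unrelated to everything
    return dot / (norm_u * norm_v)

# Hypothetical frequency vectors over N = 5 sections.
c_i = [2, 0, 1, 0, 3]
c_j = [1, 1, 0, 0, 1]
print(cosine_similarity(c_i, c_j))
```

Because frequency counts are non-negative, the score stays within [0, 1].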

16 Jaccard Similarity Coefficient
- Statistical measure of the extent of overlap of two vectors in N dimensions, $\mathbf{c}_i$ and $\mathbf{c}_j$, from Taxonomy 1 and Taxonomy 2
- Defined as the size of the intersection divided by the size of the union of the vector dimension sets:
$$\mathrm{Sim}(i, j) = \frac{N_{11}}{N_{11} + N_{10} + N_{01}}$$
For concept relatedness analysis:
- $N_{11}$ = number of sections both concepts i and j are matched to
- $N_{10}$ = number of sections concept i is matched to but not concept j
- $N_{01}$ = number of sections concept j is matched to but not concept i
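The Jaccard coefficient can be sketched from the same frequency vectors by treating any non-zero count as "matched". The zero-union fallback of 0.0 is an assumption of this sketch, not something stated on the slide.

```python
def jaccard_similarity(u, v):
    """N11 / (N11 + N10 + N01) over the sections two concepts are matched to."""
    n11 = sum(1 for a, b in zip(u, v) if a > 0 and b > 0)
    n10 = sum(1 for a, b in zip(u, v) if a > 0 and b == 0)
    n01 = sum(1 for a, b in zip(u, v) if a == 0 and b > 0)
    union = n11 + n10 + n01
    return n11 / union if union else 0.0  # assumption: no matches at all gives 0

# Hypothetical frequency vectors over N = 5 sections.
c_i = [2, 0, 1, 0, 3]
c_j = [1, 1, 0, 0, 1]
print(jaccard_similarity(c_i, c_j))
```

Note that the counts themselves are discarded; only whether a concept is matched to a section matters.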

17 Market Basket Model
- Probabilistic measure to find item-item correlation, used in data mining
- Two main elements: (1) set of items; (2) set of baskets
- Association rule $\{i_1, i_2, \ldots, i_k\} \rightarrow j$ means a basket containing all the items $i_1, \ldots, i_k$ is very likely to contain item j
- Confidence of a rule = $\Pr(j \mid i_1, i_2, \ldots, i_k)$
- Interest of a rule = $\lvert \Pr(j \mid i_1, i_2, \ldots, i_k) - \Pr(j) \rvert$
- Example: Coca-Cola → Pepsi: low confidence but high interest

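18 Market Basket Model (cont'd)
For concept relatedness analysis:
- $N_{11}$ = number of sections both concepts i and j are matched to
- $N_{01}$ = number of sections concept j is matched to but not concept i
- $N_{10}$ = number of sections concept i is matched to but not concept j
- $N_{00}$ = number of sections both concepts i and j are NOT matched to
Probability of concept j is
$$\Pr(j) = \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}}$$
Confidence of the association rule $i \rightarrow j$ is
$$\mathrm{Conf}(i \rightarrow j) = \frac{N_{11}}{N_{11} + N_{10}}$$
Forward similarity of concepts i and j is the interest of the rule:
$$\mathrm{Sim}(i, j) = \left\lvert \frac{N_{11}}{N_{11} + N_{10}} - \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}} \right\rvert$$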

19 Asymmetry of Market Basket Model
Asymmetry of market basket model:
- Forward similarity Sim(i, j): interest of the rule i → j
- Backward similarity Sim(j, i): interest of the rule j → i

OmniClass concept i | IfcXML concept j | Sim(i, j) | Sim(j, i)
curtain walls | IfcCurtainWall | 0.992849 |
sound and signal devices | IfcSwitchingDeviceType | 0.998808 |
roof decking | IfcSlab | 0.802344 | 0.370313
speakers | IfcAlarmType | 0.883194 | 0.018024
gypsum board | IfcWallType | 0.568832 | 0.029939
concrete | IfcSlab | 0.119548 | 0.427615
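The asymmetry can be seen in a small sketch of the market basket similarity. It assumes that interest is the absolute difference between the confidence of the rule i → j and the probability of j; this matches the "low confidence but high interest" example, but the exact formula is not preserved in the transcript, so treat it as an assumption. The function names and sample vectors are hypothetical.

```python
def contingency(u, v):
    """Count sections by match status: returns (N11, N10, N01, N00)."""
    n11 = sum(1 for a, b in zip(u, v) if a > 0 and b > 0)
    n10 = sum(1 for a, b in zip(u, v) if a > 0 and b == 0)
    n01 = sum(1 for a, b in zip(u, v) if a == 0 and b > 0)
    n00 = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return n11, n10, n01, n00

def forward_similarity(u, v):
    """Interest of the rule i -> j, assumed to be |Conf(i -> j) - Pr(j)|."""
    n11, n10, n01, n00 = contingency(u, v)
    total = n11 + n10 + n01 + n00
    if total == 0 or n11 + n10 == 0:
        return 0.0  # assumption: concept i never matched, so no rule to score
    pr_j = (n11 + n01) / total
    confidence = n11 / (n11 + n10)
    return abs(confidence - pr_j)

# Hypothetical frequency vectors over N = 5 sections.
c_i = [1, 1, 0, 0, 0]
c_j = [1, 1, 1, 0, 0]
print(forward_similarity(c_i, c_j))  # Sim(i, j)
print(forward_similarity(c_j, c_i))  # Sim(j, i), computed by swapping the roles
```

Swapping the arguments changes both the confidence denominator and the baseline probability, which is why Sim(i, j) and Sim(j, i) generally differ.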

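20 Evaluation of Accuracy
Root Mean Square Error (RMSE):
- Difference between the true values and the predicted values
- For Taxonomy 1 of m concepts and Taxonomy 2 of n concepts, with $t_{ij}$ the true value and $p_{ij}$ the predicted value for concept pair (i, j):
$$\mathrm{RMSE} = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( t_{ij} - p_{ij} \right)^2}$$
Precision:
- Fraction of predictions that are correct
Recall:
- Fraction of correct matches that are predicted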
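The three accuracy measures can be sketched as below. The sketch assumes the true values are 0/1 labels per concept pair and that a pair counts as "predicted" when its similarity score reaches a cutoff; the `threshold` parameter and the sample matrices are hypothetical, since the slides do not state how scores were turned into predictions.

```python
import math

def rmse(true, pred):
    """Root mean square error over all m-by-n concept pairs."""
    m, n = len(true), len(true[0])
    squared = sum((true[i][j] - pred[i][j]) ** 2
                  for i in range(m) for j in range(n))
    return math.sqrt(squared / (m * n))

def precision_recall(true, pred, threshold=0.5):
    """Precision and recall, treating scores >= threshold as predicted matches."""
    m, n = len(true), len(true[0])
    predicted = {(i, j) for i in range(m) for j in range(n)
                 if pred[i][j] >= threshold}
    actual = {(i, j) for i in range(m) for j in range(n) if true[i][j] == 1}
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical 2-by-2 example: rows are Taxonomy 1 concepts, columns Taxonomy 2.
true = [[1, 0], [0, 1]]
pred = [[0.9, 0.6], [0.1, 0.2]]
print(rmse(true, pred))
print(precision_recall(true, pred))
```

Raising the threshold typically trades recall for precision, which is the kind of trade-off visible when comparing similarity metrics.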

21 Evaluation Results
- Cosine Similarity: ranks in the middle of the three metrics on every measure
- Jaccard Similarity: NOT preferred (unacceptably low recall, though high precision)
- Market Basket Model: preferred (lowest RMSE, highest recall)

Measure | Cosine Similarity | Jaccard Similarity | Market Basket Model
RMSE | 0.1000 | 0.1300 | 0.0825
Precision | 0.9130 | 1.0000 | 0.7955
Recall | 0.3559 | 0.1186 | 0.5932

Test set: 20 concepts from OmniClass, 20 concepts from ifcXML

22 Conclusion
Mapping industry-specific taxonomies to regulations allows industry practitioners to retrieve regulations faster.
Four cases:
- 1-Taxonomy-1-Regulation: simple keyword latching
- 1-Taxonomy-N-Regulation: hierarchy of regulation sections considered
- N-Taxonomy-1-Regulation: 3 similarity analysis metrics introduced (cosine similarity, Jaccard similarity, market basket model)
- N-Taxonomy-N-Regulation: future step

23 ~ Thank You ~

