Mapping Regulations to Industry-Specific Taxonomies Chin Pang Cheng, Gloria T. Lau, Kincho H. Law Engineering Informatics Group, Stanford University June 5, 2007

Presentation transcript:

Mapping Regulations to Industry-Specific Taxonomies Chin Pang Cheng, Gloria T. Lau, Kincho H. Law Engineering Informatics Group, Stanford University June 5, 2007

Motivating Problem
To Legal Practitioners:
- Hierarchical, well-structured
- Precise and concise
- Familiar with regulatory organization systems
To Industry Practitioners:
- Voluminous
- Not trained to read regulations
- More familiar with industry-specific terminology and classification structure

Mapping Regulations to Taxonomies
Possible Cases:
- One-Taxonomy-One-Regulation
- One-Taxonomy-N-Regulation
- N-Taxonomy-One-Regulation
- N-Taxonomy-N-Regulation

One-Taxonomy-One-Regulation
- Simple keyword latching task
- Stemming (e.g. piling → pile, disabled → disable)
- Word interval
  - Concept: “fire alarm system”
  - Regulation: “… fire alarm and detection system …”
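The matching step on this slide can be sketched as follows. This is only an illustration under stated assumptions: `stem` is a toy suffix stripper, not the stemmer used by the authors, and `max_gap` is a hypothetical name for the word interval, whose actual size the slides do not give.

```python
import re


def stem(word):
    """Toy suffix stripper (an assumption, not the authors' stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "pile" and "piling" share one stem.
    if len(word) > 3 and word.endswith("e"):
        word = word[:-1]
    return word


def tokens(text):
    """Split text into stemmed alphabetic words."""
    return [stem(w) for w in re.findall(r"[A-Za-z]+", text)]


def matches(concept, section_text, max_gap=2):
    """True if the concept words occur in order in the section text,
    with at most max_gap other words between consecutive concept words."""
    c = tokens(concept)
    s = tokens(section_text)
    if not c:
        return False
    for start, word in enumerate(s):
        if word != c[0]:
            continue
        pos, ok = start, True
        for target in c[1:]:
            window = s[pos + 1 : pos + 2 + max_gap]
            if target in window:
                pos = pos + 1 + window.index(target)
            else:
                ok = False
                break
        if ok:
            return True
    return False


found = matches("fire alarm system", "... fire alarm and detection system ...")
```

With the default interval of two words, "fire alarm system" is matched to the regulation phrase from the slide; with `max_gap=0` the same phrase is not matched, because "and detection" sits between "alarm" and "system".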

Inverted Regulations
- Each taxonomy concept is hyperlinked
- “No Matched Sections” for non-matched OmniClass concepts
- See other matched related concepts in that section

One-Taxonomy-N-Regulation
Alabama (AL) regulation | Arizona (AZ) regulation

One Regulation as the Base (AL) (AZ)

Similarity Comparison on Sections
Core from Lau, Law and Wiederhold (2005)
- Feature extraction (e.g. concepts, measurements)
- Comparison of shared features
- Consideration of hierarchical and referential information
G. Lau, K. Law and G. Wiederhold. “Legal Information Retrieval and Application to E-Rulemaking,” In Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005), Bologna, Italy, Jun 6-11.
AL regulation | AZ regulation

Inclusion of Regulation Hierarchy Terminological differences: revealed by neighbor inclusion

N-Taxonomy-One-Regulation
Multiple taxonomies exist in a single industry
- Translation is unavoidable
- E.g. in the architectural, engineering and construction (AEC) industry:
  - Industry Foundation Classes (IFC)
  - CIMsteel Integration Standards (CIS/2)
  - Automating Equipment Information Exchange (AEX)
  - UniFormat™, MasterFormat™, etc.
Possible solution: merging taxonomies
- Unfamiliar taxonomy

Proposed System

Proposed Methodology of Taxonomy Mapping
Example regulation section:
“[F] Alarms. Approved audible devices shall be connected to every automatic sprinkler system. Such sprinkler water-flow alarm devices shall be activated by water flow equivalent to the flow of a single sprinkler of the smallest orifice size installed in the system. Alarm devices shall be provided on the exterior of the building in an approved location. Where a fire alarm system is installed, actuation of the automatic sprinkler system shall actuate the building fire alarm system.”
Figure labels: sprinkler system; orifice (T1); fire alarm (T1); water flow (T2); fire alarm system (T2)
Taxonomy Mapping:
- Mainly done manually nowadays
- Usually term matching (e.g. fire → fire alarm)

Demonstration in Construction Industry
- Knowledge corpus: International Building Code (IBC)
- Taxonomy 1: OmniClass (e.g. steel)
- Taxonomy 2: ifcXML (e.g. IfcSlab)
- Corpus: carefully selected (in the same domain)

Relatedness Analysis on Concepts
Notations:
- a pool of m concepts for a taxonomy
- a corpus of N regulation sections
- frequency vector $\vec{c}_i$ is an N-by-1 vector storing the occurrence frequencies of concept i among the N documents
- frequency matrix $C$ is an N-by-m matrix in which the i-th column vector is $\vec{c}_i$
Example (m = 4, N = 5): $C_{4,3} = 3$ means Concept 3 is matched to Section 4 three times
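The notation above can be made concrete with a small sketch. The sections and concepts below are invented for illustration, and counting matches by case-insensitive substring occurrence is an assumption; the slides do not say how occurrences are counted.

```python
def frequency_matrix(sections, concepts):
    """Build the N-by-m matrix C: C[s][i] is the number of times
    concept i occurs in section s (substring counting is an assumption)."""
    return [
        [section.lower().count(concept.lower()) for concept in concepts]
        for section in sections
    ]


def concept_vector(C, i):
    """Return column i of C, i.e. the frequency vector of concept i."""
    return [row[i] for row in C]


# Hypothetical corpus with N = 5 sections and m = 4 concepts.
sections = [
    "Fire alarm required.",
    "Sprinkler system and fire alarm.",
    "No relevant terms.",
    "Steel beams, steel columns and steel decks.",
    "Sprinkler system near the orifice.",
]
concepts = ["fire alarm", "sprinkler system", "steel", "orifice"]

C = frequency_matrix(sections, concepts)
steel_vector = concept_vector(C, 2)
```

In this toy corpus the third concept ("steel") occurs three times in the fourth section, so `C[3][2]` (zero-based) is 3, mirroring the slide's example.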

Cosine Similarity Measure
- Common arithmetic measure of similarity to compare documents in text mining
- Finds the angle between two frequency vectors $\vec{c}_i$ and $\vec{c}_j$ in N dimensions, from Taxonomy 1 and 2 respectively
- Similarity score lies in [0, 1]
- Represented using dot product and magnitude, the similarity score is given by:
  $$\mathrm{Sim}(i, j) = \frac{\vec{c}_i \cdot \vec{c}_j}{\lVert \vec{c}_i \rVert \, \lVert \vec{c}_j \rVert}$$
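A minimal sketch of the cosine score for two concept frequency vectors. The example vectors are made up, and returning 0.0 when a concept is matched to no section is an assumption rather than something the slides state.

```python
import math


def cosine_similarity(ci, cj):
    """Cosine of the angle between two frequency vectors of equal length."""
    dot = sum(a * b for a, b in zip(ci, cj))
    norm = math.sqrt(sum(a * a for a in ci)) * math.sqrt(sum(b * b for b in cj))
    # Assumption: a concept that never occurs is treated as unrelated.
    return dot / norm if norm else 0.0


# Hypothetical frequency vectors over N = 5 sections.
ci = [1, 0, 2, 0, 1]
cj = [2, 0, 1, 0, 0]
score = cosine_similarity(ci, cj)  # 4 / sqrt(30)
```

Because frequencies are non-negative, the score stays within [0, 1]: it is 1 when the two vectors point in the same direction and 0 when the concepts never share a section.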

Jaccard Similarity Coefficient
- Statistical measure of the extent of overlap of two vectors $\vec{c}_i$ and $\vec{c}_j$ in N dimensions, from Taxonomy 1 and 2
- Defined as the size of the intersection divided by the size of the union of the vector dimension sets:
  $$\mathrm{Sim}(i, j) = \frac{N_{11}}{N_{11} + N_{10} + N_{01}}$$
- For concept relatedness analysis:
  - $N_{11}$ = number of sections both concepts i and j are matched to
  - $N_{10}$ = number of sections concept i is matched to but not concept j
  - $N_{01}$ = number of sections concept j is matched to but not concept i
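The counts N11, N10 and N01 can be read directly off two frequency vectors, as in this short sketch. The vectors are invented, and treating any non-zero frequency as "matched" is an assumption.

```python
def jaccard_similarity(ci, cj):
    """Jaccard coefficient over the sets of sections each concept is matched to."""
    n11 = sum(1 for a, b in zip(ci, cj) if a > 0 and b > 0)
    n10 = sum(1 for a, b in zip(ci, cj) if a > 0 and b == 0)
    n01 = sum(1 for a, b in zip(ci, cj) if a == 0 and b > 0)
    union = n11 + n10 + n01
    # Assumption: two concepts that match no section at all score 0.
    return n11 / union if union else 0.0


# Hypothetical frequency vectors over N = 5 sections.
ci = [1, 0, 2, 0, 1]
cj = [2, 0, 1, 0, 0]
score = jaccard_similarity(ci, cj)  # N11 = 2, N10 = 1, N01 = 0
```

Unlike the cosine measure, this score ignores how often a concept occurs in a section and only looks at whether it occurs.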

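Market Basket Model
- Probabilistic measure to find item-item correlation, used in data mining
- Two main elements: (1) set of items; (2) set of baskets
- Association rule $\{i_1, i_2, \ldots, i_k\} \rightarrow j$ means a basket containing all the items $i_1, i_2, \ldots, i_k$ is very likely to contain item j
- Confidence of a rule = probability of item j given $i_1, i_2, \ldots, i_k$
- Interest of a rule = amount by which its confidence differs from the overall probability of item j
- Example: Coca-cola → Pepsi: low confidence but high interest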

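Market Basket Model (cont’d)
For concept relatedness analysis:
- $N_{11}$ = number of sections both concepts i and j are matched to
- $N_{01}$ = number of sections concept j is matched to but not concept i
- $N_{10}$ = number of sections concept i is matched to but not concept j
- $N_{00}$ = number of sections both concepts i and j are NOT matched to
Probability of concept j is
  $$\Pr(j) = \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}}$$
Confidence of the association rule $i \rightarrow j$ is
  $$\mathrm{Conf}(i \rightarrow j) = \frac{N_{11}}{N_{11} + N_{10}}$$
Forward similarity Sim(i, j) of concepts i and j is the interest of the rule $i \rightarrow j$, i.e. the amount by which $\mathrm{Conf}(i \rightarrow j)$ differs from $\Pr(j)$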
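The market basket quantities can be sketched from the four section counts. Two points here are assumptions, not taken from the slides: the interest is computed as the signed difference between confidence and probability (the transcript does not preserve whether an absolute value is used), and a rule whose left-hand concept matches no section is given a score of 0.

```python
def contingency(ci, cj):
    """Count sections by whether concepts i and j are matched to them."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(ci, cj):
        if a > 0 and b > 0:
            n11 += 1
        elif a > 0:
            n10 += 1
        elif b > 0:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def forward_similarity(ci, cj):
    """Interest of the rule i -> j, taken here as Conf(i -> j) - Pr(j).
    The signed difference is an assumption about the exact formula."""
    n11, n10, n01, n00 = contingency(ci, cj)
    total = n11 + n10 + n01 + n00
    if total == 0 or n11 + n10 == 0:
        return 0.0  # assumption: undefined confidence is scored as 0
    pr_j = (n11 + n01) / total
    conf = n11 / (n11 + n10)
    return conf - pr_j


# Hypothetical frequency vectors over N = 8 sections.
ci = [1, 1, 1, 0, 0, 0, 0, 0]
cj = [1, 0, 0, 0, 0, 0, 0, 0]
forward = forward_similarity(ci, cj)   # 1/3 - 1/8
backward = forward_similarity(cj, ci)  # 1 - 3/8
```

Swapping the arguments gives the score for the reverse rule, and the two values differ in this example, which illustrates that the measure is not symmetric.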

Asymmetry of Market Basket Model
- Forward similarity: Sim(i, j), from the rule i → j
- Backward similarity: Sim(j, i), from the rule j → i

OmniClass concept i | ifcXML concept j | Sim(i, j) | Sim(j, i)
curtain walls | IfcCurtainWall | |
sound and signal devices | IfcSwitchingDeviceType | |
roof decking | IfcSlab | |
speakers | IfcAlarmType | |
gypsum board | IfcWallType | |
concrete | IfcSlab | |

Evaluation of Accuracy
Root Mean Square Error (RMSE):
- Difference between the true values and the predicted values
- For Taxonomy 1 of m concepts and Taxonomy 2 of n concepts:
  $$\mathrm{RMSE} = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(\mathrm{predicted}_{ij} - \mathrm{true}_{ij}\right)^2}$$
Precision:
- Fraction of predictions that are correct
Recall:
- Fraction of correct matches that are predicted
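The three accuracy measures can be sketched for an m-by-n table of predicted scores against true 0/1 matches. The `threshold` used to turn a score into a predicted match is a hypothetical parameter; the slides do not say how predictions are derived from scores, and the numbers below are invented.

```python
import math


def rmse(pred, true):
    """Root mean square error over all m * n concept pairs."""
    m, n = len(pred), len(pred[0])
    total = sum((pred[i][j] - true[i][j]) ** 2 for i in range(m) for j in range(n))
    return math.sqrt(total / (m * n))


def precision_recall(pred, true, threshold=0.5):
    """Precision and recall, counting a pair as predicted when its score
    reaches the (hypothetical) threshold."""
    pairs = [(i, j) for i in range(len(pred)) for j in range(len(pred[0]))]
    predicted = {p for p in pairs if pred[p[0]][p[1]] >= threshold}
    correct = {p for p in pairs if true[p[0]][p[1]] == 1}
    hits = len(predicted & correct)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical scores for m = 2 and n = 2 concepts.
pred = [[0.9, 0.2], [0.6, 0.1]]
true = [[1, 0], [0, 1]]
error = rmse(pred, true)
precision, recall = precision_recall(pred, true)
```

Here two pairs are predicted and two are truly related, with one pair in common, so precision and recall are both 0.5.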

Evaluation Results
- Cosine Similarity: average among the three metrics
- Jaccard Similarity: NOT preferred (unacceptably low recall, though high precision)
- Market Basket Model: preferred (lowest RMSE, highest recall)
Table columns: Cosine Similarity | Jaccard Similarity | Market Basket Model
Table rows: RMSE, Precision, Recall
concepts from OmniClass, 20 concepts from ifcXML

Conclusion
Mapping industry-specific taxonomies to regulations allows industry practitioners to retrieve regulations faster
Four cases:
- 1-Taxonomy-1-Regulation: simple keyword latching
- 1-Taxonomy-N-Regulation: hierarchy of regulation sections considered
- N-Taxonomy-1-Regulation: 3 similarity analysis metrics introduced (cosine similarity, Jaccard similarity, market basket model)
- N-Taxonomy-N-Regulation: future step

~ Thank You ~