REGNET Gloria Lau, Haoyi Wang, Kincho Law, Gio Wiederhold Stanford University May 16th, 2005 A Relatedness Analysis Approach for Regulation Comparison.

Presentation transcript:

REGNET Gloria Lau, Haoyi Wang, Kincho Law, Gio Wiederhold Stanford University May 16th, 2005 A Relatedness Analysis Approach for Regulation Comparison and E-Rulemaking Applications

1 Motivation: regulatory comparison  Multiple sources of regulations  Multiple jurisdictions: federal, state, local, etc.  Different formats, terminologies, contexts  Amending rules, conflicting ideas  Examples shown: UK DDA in HTML, ADAAG in HTML, IBC in PDF

2 Motivation: e-rulemaking  Increasing amount of electronic data in e-rulemaking  Example: the Alcohol and Tobacco Tax and Trade Bureau received over 14,000 comments in 7 months, the majority of which were e-mails, on a flavored malt beverages proposal  Originally in the Federal Register: “All comments posted on our Web site will show the name of the commenter but will not show street addresses, telephone numbers, or e-mail addresses.”  Later in the Federal Register: due to the “unusually large number of comments received,” the Bureau announced that it was difficult to remove all street addresses, telephone numbers, and e-mail addresses “in a timely manner.”

3 Relatedness analysis based on a regulatory repository  XML regulatory repository with features extracted  Shallow parser to consolidate regulations: HTML, PDF, plain text → XML regulations with features, references, etc.  Relatedness analysis to help understanding of regulations and the relationships between them: feature matching, structural matching  Application to e-rulemaking: comparisons of drafted regulations and public comments
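The shallow-parsing step on this slide can be illustrated with a small sketch. It is not the REGNET parser: the heading pattern, the element names (`regulation`, `section`, `text`), and the sample input are all assumptions made for illustration, and feature and reference extraction are left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading pattern: a dotted section number followed by a title.
SECTION_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")


def parse_regulation(text: str, name: str) -> ET.Element:
    """Build a nested XML tree from plain-text lines with dotted section numbers."""
    root = ET.Element("regulation", {"name": name})
    open_sections = {}  # section id -> XML element, used to locate parents
    current = root
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            sec_id, title = match.groups()
            # "4.1" nests under "4"; a top-level id nests under the root.
            parent_id = sec_id.rpartition(".")[0]
            parent = open_sections.get(parent_id, root)
            current = ET.SubElement(parent, "section", {"id": sec_id, "title": title})
            open_sections[sec_id] = current
        else:
            # Body text belongs to the most recently opened section.
            ET.SubElement(current, "text").text = line
    return root


# Invented sample input, loosely modeled on an accessibility code.
sample = """4 Accessible Elements
4.1 Minimum Requirements
Assembly areas with fixed seating shall comply ...
4.13 Doors
Doors shall have a clear opening width."""

tree = parse_regulation(sample, "SAMPLE")
print(ET.tostring(tree, encoding="unicode"))
```

A real parser would also need per-format front ends for HTML and PDF; the sketch only shows how the section hierarchy can be recovered once the text is available.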

4 Development of a Regulatory Repository

5 Feature Extraction in XML  [Figure: XML-tagged provision with extracted references and the reference parse tree; sample provision text: “Assembly areas with fixed seating shall comply ...”]

6 Relatedness analysis  Structural comparisons  Related elements: door and entrance  ADAAG 4.1.6(3)(d) Doors: “(i) Where it is technically infeasible to comply with clear opening width requirements of [...], a projection...”  UFAS, Minimum Number: “Entrances required to be accessible by 4.1 shall be part of an accessible route and shall comply with...”

7 Relatedness analysis  Goal: utilize the computational properties of regulations for a complete comparison  Measure: degree of relatedness, expressed as a similarity score f(A, U) ∈ (0, 1)  Nodes A and U are provisions from two different regulation trees

8 Base score f0 computation  Linear combination of feature-matching scores  F(A, U, i) = similarity score between Sections (A, U) based on feature i  N = total number of features  Each feature i has its own weighting coefficient  Feature matching  Based on the vector model, using cosine similarity between feature vectors  Non-Boolean features: a measurement of “2 inches max” can be a 70% match to “2 inches”  Synonyms exist, e.g., an ontology defined for chemicals  Perform vector-space transformation prior to cosine computation
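A minimal sketch of the base score, assuming it takes the form f0(A, U) = Σ w_i · F(A, U, i) over the N features, with F computed as plain cosine similarity over term-count vectors. The symbol `w_i`, the feature names, the weights, and the sample terms are assumptions for illustration; the non-Boolean matching and synonym handling mentioned on the slide are not implemented here.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def base_score(features_a: dict, features_u: dict, weights: dict) -> float:
    """Weighted sum over features of the per-feature cosine similarity."""
    return sum(
        w * cosine(Counter(features_a.get(name, [])), Counter(features_u.get(name, [])))
        for name, w in weights.items()
    )


# Invented feature lists for two provisions (not taken from ADAAG or UFAS).
section_a = {"concept": ["door", "opening", "width"], "measurement": ["32 inches"]}
section_u = {"concept": ["entrance", "accessible", "route", "door"], "measurement": ["32 inches"]}
weights = {"concept": 0.5, "measurement": 0.5}  # hypothetical weighting coefficients

f0 = base_score(section_a, section_u, weights)
print(round(f0, 4))
```

With weights that sum to 1 and non-negative counts, the score stays between 0 and 1, which matches the range stated for f(A, U).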

9 Score refinements based on regulation structure  Neighbor inclusion  Diffusion of similarity between clusters of nodes in the tree

10 Score refinements based on regulation structure  Reference distribution  Diffusion of similarity between referencing nodes and referenced nodes in the tree  E.g., f (A5.3, U6.4(a)) updates f (A2.1, U3.3)
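Both structural refinements (neighbor inclusion and reference distribution) diffuse similarity from linked node pairs into a pair's own score. The slides do not give the update rule, so the sketch below assumes a simple one: blend each pair's score with the average score of its linked pairs. The function name `diffuse`, the blending weight, and the assumption that A2.1 references A5.3 and U3.3 references U6.4(a) are all illustrative.

```python
def diffuse(scores: dict, links_a: dict, links_u: dict, weight: float = 0.2) -> dict:
    """Blend each pair score with the mean score of its linked pairs.

    scores  : {(node_a, node_u): similarity}
    links_a : node in tree A -> linked nodes (neighbors or referenced nodes)
    links_u : node in tree U -> linked nodes
    weight  : share of the refined score taken from linked pairs (assumed value)
    """
    refined = dict(scores)
    for (a, u), score in scores.items():
        pairs = [(la, lu) for la in links_a.get(a, []) for lu in links_u.get(u, [])]
        if pairs:
            linked_avg = sum(scores.get(p, 0.0) for p in pairs) / len(pairs)
            refined[(a, u)] = (1 - weight) * score + weight * linked_avg
    return refined


# Reference distribution, following the slide's example pair names.
scores = {("A2.1", "U3.3"): 0.4, ("A5.3", "U6.4(a)"): 0.9}
refs_a = {"A2.1": ["A5.3"]}      # assumed: A2.1 references A5.3
refs_u = {"U3.3": ["U6.4(a)"]}   # assumed: U3.3 references U6.4(a)

updated = diffuse(scores, refs_a, refs_u, weight=0.2)
print(updated[("A2.1", "U3.3")])
```

Passing parent, sibling, and child links instead of reference links gives a comparable sketch of neighbor inclusion.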

11 Performance evaluation  Conduct a user survey of similarity rankings  10 randomly chosen sections from the ADAAG and UFAS  Ranks 1 to 100 in order of relevance  Root mean square error (RMSE) computed between the user-generated ranking vector and the machine-predicted ranking vector
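The RMSE between the two ranking vectors can be computed as below. The function is the standard definition; the sample rankings are invented and are not survey data.

```python
import math


def rmse(user_ranks, machine_ranks) -> float:
    """Root mean square error between two equal-length ranking vectors."""
    if len(user_ranks) != len(machine_ranks) or not user_ranks:
        raise ValueError("ranking vectors must be non-empty and of equal length")
    squared = sum((u - m) ** 2 for u, m in zip(user_ranks, machine_ranks))
    return math.sqrt(squared / len(user_ranks))


# Invented rankings for four sections.
user = [1, 2, 3, 4]
machine = [2, 1, 3, 6]
print(rmse(user, machine))
```

A lower RMSE means the machine-predicted ranking is closer to the ranking produced by survey participants.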

12 Survey results: tabulated RMSEs  Compared our analysis to Latent Semantic Indexing (LSI)  Parameters: structural weighting coefficient and feature weighting coefficient  Average RMSE smaller than LSI  Measurement feature performs best  No improvement in results observed for structural comparison

13 Results of comparisons: ADAAG vs. UFAS  Related accessible elements: door and entrance  No ontological information  Neighbor inclusion reveals higher similarity  Content of neighbors implies similarity between Section 4.1.6(3)(d) in ADAAG and the corresponding entrance section in UFAS

14 Results of comparisons: UFAS vs. BS8300  Terminological differences revealed through neighbor inclusion

15 Results of comparisons: UFAS vs. Scottish Technical Standards  Terminological differences revealed through reference distribution  Example: stairs and ramps

16 Application to e-rulemaking  Application domain: comparison between a draft of rules and the associated public comments  ADAAG Chapter 11, rights-of-way draft: fewer than 15 pages  Over 1400 public comments received within 4 months  Comments ~10 MB in size; most are several pages long  A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed  Parsing of the draft and comments: from HTML to XML  Recreate the structure of the draft using our shallow parser  Extract features from the draft and comments  Treat individual comments as provisions

17 Application to E-Rulemaking Drafted regulations compared with public comments

18 Results from e-rulemaking application  Related section in draft and public comment

19 Results from e-rulemaking application  No related provisions identified  Concern not addressed in the draft

20 Results from e-rulemaking application  Related section in draft and public comment  Commenting per provision  Forward to right personnel

21 Results from e-rulemaking application  Related section in draft and public comment  Suggested revision cannot be located automatically  Linguistic analysis can potentially help

22 Results from e-rulemaking application  Comment on the general intent of the draft  Clustering of comments might help

23 Conclusions  Prototype for relatedness comparisons of regulations: contextual comparisons, domain knowledge, structural comparisons  Performance evaluation, results and applications: user survey and comparisons with LSI  Observations from comparisons between federal, state, and non-profit-organization-mandated codes and European standards  Application to e-rulemaking: compare drafted rules with public comments  Observations from comparisons based on a rights-of-way draft

24 Future research directions  Regulatory comparison: regulatory competition; cross-border data transfer laws, especially in the polyglot countries of the EU  Regulatory updates: track changes in updates; track cross-references between regulations  E-rulemaking: automated routing of comments to the person in charge; clustering of comments; Web portal for comment submission per provision, in addition to per draft; linguistic analysis to match patterns of suggested revisions embedded in comments

25 Thank You!