1 REGNET: An Infrastructure for Regulatory Information Management and Compliance Assistance Kincho H. Law Prof., Civil and Env. Engr. Jim Leckie Prof.,

1 REGNET: An Infrastructure for Regulatory Information Management and Compliance Assistance
Kincho H. Law, Prof., Civil and Env. Engr.
Jim Leckie, Prof., Civil and Env. Engr.
Barton Thompson, Prof., School of Law
Gio Wiederhold, Prof., Computer Science
Shawn Kerrigan, Bill Labiosa, Gloria Lau, Haoyi Wang, Jie Wang, Civil and Env. Engr.
Pooja Trivedi, Li Zhang, Liang Zhou (former students), Computer Science
Charles Heenan, Researcher, Law Student
Stanford University, Stanford, CA 94305

2 The Public and Scientific Problem
– Regulations are established to protect the public
– Regulations greatly constrain businesses’ actions
– Many organizations participate in setting and using regulations
– Interpretation of regulations is costly and inconsistent
– Regulations are voluminous, often incomplete, sometimes conflicting
– Regulations are written in natural language
– The objects and interests being regulated are often encoded
– There are many sources of supportive documents: interpretative documents, guidelines, etc.

3 Motivation
The complexity, diversity, and volume of federal and state regulations:
– Require considerable expertise to understand
– Increase the risk of companies failing to comply with environmental regulations
– Hinder public understanding of the government
How can IT help make the “applicable” regulations easily accessible and assist the parties involved in regulation compliance?

4 Objective
To enhance regulation management, access, and the regulatory compliance process through the use of information technology
Application Focus: Environmental Regulations
– Federal CFR Title 40: Protection of Environment
  – 40 CFR 279: Standards For The Management Of Used Oil
  – 40 CFR 141: National Primary Drinking Water Regulations
– Illinois Title 35: Environmental Protection
– New York Title 6: Environmental Conservation Rules and Regulations
– Others
REGNET Project (sponsored by the Digital Government Program, National Science Foundation)

5 REGNET Research Goals
Research questions
– What is an appropriate model for an information management system for compliance assistance?
– How can such a system be built?
– How should the conflicting objectives be dealt with?
Research goal
– Developing information management frameworks that facilitate public access to regulations, improve the efficiency of regulation compliance, and facilitate the compliance process.

6 Research Tasks
– Repositories: an infrastructure for an online repository of regulations, translating texts into a processable form to facilitate access
– Access Tools: access to the regulation text and related information
– Ontology Development: formalize terms and meanings to help develop logical rules about relationships within the regulations and among different regulations
– Integrated Access: retrieval of regulations based on their content or on the relationships between regulations
– Analysis Tools: validate and improve the quality of the ontology, and check the content of regulations within a domain or across the different domains of federal, state, and local regulations
– Compliance Checking Assistance: develop the means to interface the regulations with usage

10 Current Tasks
– Parsing unstructured documents into a “tagged”, processable format
– Investigating a methodology to establish concepts and classification structures in regulatory documents
– Developing a “logic-based” compliance assistance system
Purpose: feedback and suggestions

11 Document Repository and Access
Examples: Drinking Water Regulations: 40 CFR Part 141 and Background Documents
Bill Labiosa, Charles Heenan
Engineering Informatics Group, Stanford University

12 Overview: Drinking Water Background Information Search
Documents of interest for this example: web-available documents from … and the online 40 CFR Part 141
Current keyword search approach vs. concept categorization approach:
– USEPA Search Engine: file search for keywords
– SemioTagger: concept categorization approach, using two simple categorization hierarchies:
  – an index (an alphabetical list of concepts)
  – a regulated drinking water contaminant hierarchy

13 Current Approach: Using “EPA Search”
Search term: “radium removal”

15 The full 112-page document is returned...

16 More specific search using EPA Search
New search expression: “radium removal” AND “drinking water” AND “small systems”

17 Fewer results, but still full documents

18 “radium removal” AND “drinking water” AND “small systems” search confined to

20 Where do I begin?

21 Search problem
– Background documents, even when located, are voluminous.
– The user is forced to do keyword searches within documents, trying to find “the right part of the right document”: time-consuming and frustrating.
– When you don’t know which document you want, you can end up in the familiar “information overload” situation.
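
The difference between returning whole documents and returning the part of a document that actually matches can be sketched in a few lines of Python. This is a minimal illustration, not the EPA search engine or SemioTagger: it assumes documents are plain text split into passages at blank lines, and treats the query as a Boolean AND of quoted phrases. The function names and the sample text are hypothetical.

```python
import re


def split_passages(document: str) -> list[str]:
    """Split a document into passages at blank lines (an assumed passage boundary)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def matches_all(text: str, phrases: list[str]) -> bool:
    """Boolean AND: every phrase must occur in the text (case-insensitive)."""
    lowered = text.lower()
    return all(phrase.lower() in lowered for phrase in phrases)


def document_search(docs: dict[str, str], phrases: list[str]) -> list[str]:
    """Document-level search: returns the names of whole matching documents."""
    return [name for name, text in docs.items() if matches_all(text, phrases)]


def passage_search(docs: dict[str, str], phrases: list[str]) -> list[tuple[str, str]]:
    """Passage-level search: returns (document name, matching passage) pairs."""
    return [
        (name, passage)
        for name, text in docs.items()
        for passage in split_passages(text)
        if matches_all(passage, phrases)
    ]


# Hypothetical sample corpus with one multi-passage document.
docs = {
    "background.txt": (
        "Radium occurs naturally in some ground water.\n\n"
        "For small systems, radium removal from drinking water is discussed here.\n\n"
        "Appendix: cost tables."
    ),
}
query = ["radium removal", "drinking water", "small systems"]
print(document_search(docs, query))  # the whole document
print(passage_search(docs, query))   # only the relevant passage
```

A real system would also need ranking and better passage boundaries; the point is only that matching at passage granularity avoids handing the user an entire 112-page document.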

22 Concept Categorization Approach: SemioTagger
Two example hierarchies:
– “Index for Drinking Water Information” for web-available materials from OGWDW
– “Index to the National Primary Drinking Water Regulations” for 40 CFR 141, using a drinking water contaminant hierarchy
Both starting lists of concepts extracted by Semio were “cleaned” (irrelevant concepts deleted, important “compound word concepts” modified to meet the expectations of drinking water experts).
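
Browsing a concept hierarchy such as the contaminant hierarchy amounts to walking a tree whose nodes carry the documents tagged with each concept. The sketch below is a hypothetical in-memory version, assuming that selecting a concept should also return documents tagged with any narrower concept beneath it. The category names and document IDs are only illustrative; they are not the hierarchy built in the project.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a concept hierarchy (hypothetical structure)."""
    name: str
    children: list[Concept] = field(default_factory=list)
    documents: set[str] = field(default_factory=set)  # documents tagged with this concept

    def add(self, name: str) -> Concept:
        """Create a narrower concept under this one and return it."""
        child = Concept(name)
        self.children.append(child)
        return child

    def find(self, name: str) -> Concept | None:
        """Depth-first search for a concept by name."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None

    def all_documents(self) -> set[str]:
        """Documents tagged with this concept or with any concept below it."""
        docs = set(self.documents)
        for child in self.children:
            docs |= child.all_documents()
        return docs


root = Concept("Regulated contaminants")
radionuclides = root.add("Radionuclides")
radionuclides.add("Radium").documents.add("doc_1")
radionuclides.add("Uranium").documents.add("doc_2")
root.add("Inorganic chemicals").add("Arsenic").documents.add("doc_3")

print(sorted(root.find("Radionuclides").all_documents()))
```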

23 Concept Categorization Approach
noun phrase extraction → noun phrase co-occurrence cycles → hierarchy creation → document tagging → information retrieval interface
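
The first two stages of this pipeline, extracting candidate phrases and counting how often they co-occur, can be sketched as follows. This is not Semio's algorithm, which the slides do not describe: the phrase extractor here is a crude stand-in that splits sentences at stopwords, and co-occurrence is counted per sentence. Frequently co-occurring pairs are the kind of evidence a hierarchy-creation step could group on.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list; a real extractor would use part-of-speech tagging.
STOPWORDS = {
    "a", "an", "and", "are", "be", "by", "can", "for", "from", "in",
    "is", "may", "not", "of", "on", "or", "shall", "the", "to", "with",
}


def candidate_phrases(sentence: str) -> list[str]:
    """Crude noun-phrase stand-in: maximal runs of non-stopword words."""
    phrases: list[str] = []
    current: list[str] = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in STOPWORDS:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


def cooccurrence(text: str) -> Counter:
    """Count how often each pair of candidate phrases appears in the same sentence."""
    counts: Counter = Counter()
    for sentence in re.split(r"[.!?]", text):
        phrases = sorted(set(candidate_phrases(sentence)))
        for pair in combinations(phrases, 2):
            counts[pair] += 1
    return counts


sample = "Radium removal in small systems. Costs of radium removal for small systems."
print(cooccurrence(sample).most_common(1))
```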

25 Document Repository and Access: Demonstration Session I
Index for Drinking Water Information and a Contaminant Hierarchy for 40 CFR 141
Bill Labiosa
Engineering Informatics Group, Stanford University

26 Example Taxonomy: Drinking Water Contaminants

27 Regulatory Compliance Assistance
Shawn Kerrigan
Engineering Informatics Group, Stanford University

28 Background
Current state of compliance checking:
– Paper-based process
– Locating and interpreting the relevant regulations is complex, even with the help of supplementary information
– Small companies have difficulty conducting compliance checks due to lack of resources and knowledge
Vision for the future:
– Up-to-date regulations and compliance-checking assistance procedures available online
– Improved regulation and compliance-requirement transparency through clear presentation and linking

29 Research Questions
– How can we make the information and rules more accessible?
– How can we represent the information and rules in environmental regulations in a computer-interpretable format?
– How can we structure this information to assist with regulation compliance checking?

30 General Approach
– Information integration: formalization of meaning and relationships
– Regulation-centric: tie the information to the appropriate portion of the regulation

31 Regulation Assistance System (RAS)
– Provides a unifying web interface for the regulation documents and meta-data
– Demonstrates the usefulness of XML-structured regulation documents with meta-data
– Works with a logic-based compliance-checking assistance system to demonstrate web-based regulation services

32 Demonstration Session II
– Display regulations with meta-data
– Compliance example
– Non-compliance example

33 Regulation Parsing
– Need to transform plain-text/PDF regulations into XML
– Can structure the XML to represent the hierarchical structure of the regulation
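
A minimal sketch of this parsing step, assuming the section body is already available as plain text and that each subsection starts with a single-letter label such as "(a)" followed by a one-sentence title. The element names (`section`, `heading`, `subsection`, `title`, `text`) are hypothetical; the slides do not give REGNET's schema. Nested paragraphs such as (1) or (ii) would need additional patterns.

```python
import re
import xml.etree.ElementTree as ET

# A subsection label such as "(a) Surface impoundment prohibition." at the start of
# the body or right after a sentence ends; the title runs up to the first period.
SUBSECTION = re.compile(r"(?:^|(?<=\.\s))\(([a-z])\)\s+([^.]+)\.\s*")


def section_to_xml(heading: str, body: str) -> ET.Element:
    """Split a section body into labeled subsection elements."""
    section = ET.Element("section")
    ET.SubElement(section, "heading").text = heading
    matches = list(SUBSECTION.finditer(body))
    for i, match in enumerate(matches):
        # Subsection text runs until the next label (or the end of the body).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        sub = ET.SubElement(section, "subsection", label=match.group(1))
        ET.SubElement(sub, "title").text = match.group(2).strip()
        ET.SubElement(sub, "text").text = body[match.end():end].strip()
    return section


body = (
    "(a) Surface impoundment prohibition. Used oil shall not be managed in surface "
    "impoundments or waste piles unless the units are subject to regulation under "
    "parts 264 or 265 of this chapter. (b) Use as a dust suppressant. The use of "
    "used oil as a dust suppressant is prohibited."
)
print(ET.tostring(section_to_xml("Prohibitions", body), encoding="unicode"))
```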

34 HTML to XML
HTML → Regulation Parsing → XML Structured Document

35 Regulation Parsing
§ Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.

36 Adding Meta-Data to Regulations
Original XML document → Reference Extraction, Add Concepts, Add Logical Interpretation, Add Legal Interpretation → Regulation tagged with meta-data

37 Parsing References
PART 279—Standards For The Management Of Used Oil
Subpart B – Applicability
…
§ Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
(b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § (c).
(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices:
  (1) Industrial furnaces identified in § of this chapter;
  (2) Boilers, as defined in § of this chapter, that are identified as follows:
    (i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes;
    (ii) Utility boilers used to produce electric power, steam, heated or cooled air, or other gases or fluids for sale; or
    (iii) Used oil-fired space heaters provided that the burner meets the provisions of §
  (3) Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.
§ Used Oil Specification.
…..

38 Parsing References
Original XML document → Reference Extraction → XML with Reference List
Before: (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
After: (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
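
The reference-extraction step can be approximated with regular expressions. The sketch below is an assumption about how such a step might work, not the project's extractor: it only recognizes internal references of the form "parts 264 or 265" and "subpart O of parts 264 or 265", and it returns one record per referenced part so the records could be written into a reference list.

```python
import re

# "parts 264 or 265", optionally preceded by "subpart O of".
PART_REF = re.compile(
    r"\b(?:subpart\s+([A-Z]+)\s+of\s+)?parts?\s+(\d+)((?:\s*(?:,|or|and)\s*\d+)*)",
    re.IGNORECASE,
)


def extract_part_references(text: str) -> list[dict]:
    """Return one {"part", "subpart"} record per part number referenced in the text."""
    refs = []
    for match in PART_REF.finditer(text):
        subpart = match.group(1).upper() if match.group(1) else None
        # The first part number plus any "or 265", "and 266", ", 267" continuations.
        parts = [match.group(2)] + re.findall(r"\d+", match.group(3))
        refs.extend({"part": part, "subpart": subpart} for part in parts)
    return refs


provision = (
    "Used oil shall not be managed in surface impoundments or waste piles unless "
    "the units are subject to regulation under parts 264 or 265 of this chapter."
)
print(extract_part_references(provision))
```

A pattern-based extractor like this is easy to audit but brittle; section references whose numbers were lost in conversion (as with the bare "§" above) cannot be recovered this way.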

39 What is a “Concept”?
Examples:
– emission requirement
– leaked hazardous substance
– disposal of solvents
– principal hazardous constituent
Why are they useful?
– identify similar regulations even when they do not reference each other
– provide a “context” for the regulation provision

40 Regnet Taxonomy

41 Tagging with Concepts
Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
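
Tagging a provision with concepts, and then using shared concepts to relate provisions that do not reference each other, can be sketched with phrase matching and a set-overlap score. Both the concept vocabulary and the choice of Jaccard similarity are assumptions made for illustration; the REGNET taxonomy and its actual relatedness measure are not reproduced here.

```python
import re

# Hypothetical concept vocabulary, standing in for entries from a taxonomy.
CONCEPTS = ("used oil", "surface impoundment", "waste pile", "dust suppressant", "hazardous waste")


def tag_concepts(text: str, concepts: tuple[str, ...] = CONCEPTS) -> set[str]:
    """Return the concepts whose phrase (optionally plural) appears in the text."""
    lowered = text.lower()
    return {
        concept
        for concept in concepts
        if re.search(r"\b" + re.escape(concept) + r"s?\b", lowered)
    }


def concept_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two provisions' concept sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


p1 = ("Used oil shall not be managed in surface impoundments or waste piles unless "
      "the units are subject to regulation under parts 264 or 265 of this chapter.")
p2 = "The use of used oil as a dust suppressant is prohibited."
print(tag_concepts(p1), tag_concepts(p2))
print(concept_similarity(tag_concepts(p1), tag_concepts(p2)))
```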

42 XML Embedded Logic
Rule logic represents the rules specified by the regulation, e.g. 40 CFR … (b) Use as a dust suppressant: “The use of used oil as a dust suppressant is prohibited…”
  all _o (usedOil(_o) -> -(dustSuppressant(_o))).
Option elements define the user interface:
  Is the used oil used as a dust suppressant?
  Yes: (usedOil(oil1) & dustSuppressant(oil1)).
  No: (usedOil(oil1) & -(dustSuppressant(oil1))).
Control statements specify processing instructions for compliance-checking:
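
How a rule like this decides a case is easiest to see on a ground (variable-free) instance. The sketch below is a simplified stand-in for what Otter does, not Otter itself: it assumes the universally quantified rule has already been instantiated for `oil1`, writes it as the clause -usedOil(oil1) | -dustSuppressant(oil1), and runs propositional resolution until it either derives the empty clause (the facts contradict the rule, read here as a violation) or can derive nothing new. Otter works on first-order clauses with unification and far better search strategies.

```python
from itertools import combinations


def negate(literal: str) -> str:
    """Flip the sign of a literal written with Otter's "-" negation prefix."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def resolve(c1: frozenset[str], c2: frozenset[str]) -> list[frozenset[str]]:
    """All resolvents of two clauses (each clause is a disjunction of literals)."""
    return [
        (c1 - {lit}) | (c2 - {negate(lit)})
        for lit in c1
        if negate(lit) in c2
    ]


def refutes(clauses: list[set[str]]) -> bool:
    """Saturate by resolution; True if the empty clause (a contradiction) is derived."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:
                    return True
                new.add(resolvent)
        if new <= known:
            return False  # nothing new can be derived: no proof
        known |= new


# Ground instance of: all _o (usedOil(_o) -> -(dustSuppressant(_o))).
RULE = {"-usedOil(oil1)", "-dustSuppressant(oil1)"}
yes_answer = [RULE, {"usedOil(oil1)"}, {"dustSuppressant(oil1)"}]
no_answer = [RULE, {"usedOil(oil1)"}, {"-dustSuppressant(oil1)"}]

print(refutes(yes_answer))  # True: the facts contradict the rule
print(refutes(no_answer))   # False: no contradiction found
```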

43 RAS System Structure
– RASweb: provides the web interface and displays regulation information; passes user input to RCCsession and returns results or requests for more information to the user
– RCCsession: implements the compliance-checking procedure, drawing on the XML-based regulations, additional input files, and interactive user input; sends a logic input file to Otter* and receives “found proof” / “no proof found”
– Otter*: attempts to find a proof by contradiction from the input file
– Result: regulation compliance decision
* Otter is an automated-deduction program developed by William McCune at Argonne National Laboratory
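
The RCCsession side of this structure, combining the rule with the user's answer into an input file and interpreting whether a proof was found, might look roughly like the sketch below. It uses the rule and option formulas from the XML Embedded Logic slide, with the predicate spelled consistently as dustSuppressant. The Otter options (`set(auto).` and a single `formula_list(usable).`) are an assumed layout, not RCCsession's actual input; how RCCsession recognized a proof in Otter's output is not given in the slides, so that step is left as a hypothetical `prove` hook, exercised here with a stand-in.

```python
from typing import Callable

RULE = "all _o (usedOil(_o) -> -(dustSuppressant(_o)))."
OPTIONS = {
    "yes": "(usedOil(oil1) & dustSuppressant(oil1)).",
    "no": "(usedOil(oil1) & -(dustSuppressant(oil1))).",
}


def build_otter_input(formulas: list[str]) -> str:
    """Assemble an Otter input file: auto mode plus one list of first-order formulas.

    The option layout is an assumption made for illustration.
    """
    return "\n".join(["set(auto).", "formula_list(usable).", *formulas, "end_of_list."]) + "\n"


def check_answer(answer: str, prove: Callable[[str], bool]) -> str:
    """Combine the rule with the user's answer and ask a prover for a contradiction.

    `prove` is a hypothetical hook that runs a prover on the input text and
    reports whether a proof was found.
    """
    input_text = build_otter_input([RULE, OPTIONS[answer]])
    if prove(input_text):
        return "non-compliant: the answer contradicts the rule"
    return "no violation found"


def fake_prove(text: str) -> bool:
    """Stand-in prover for illustration only: "finds a proof" for the "yes" answer."""
    return OPTIONS["yes"] in text


print(build_otter_input([RULE, OPTIONS["no"]]))
print(check_answer("yes", fake_prove))
print(check_answer("no", fake_prove))
```

Keeping the prover behind a hook separates the compliance-checking procedure from the deduction engine, which mirrors the split between RCCsession and Otter in the structure above.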

44 Demonstration Session III
– Use of control elements
– Use of “I don’t know” to check multiple paths
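
One way to read “I don’t know” is that the checker explores every possible answer to the unknown questions and reports whether the outcome depends on them. The sketch below is a hypothetical illustration of that idea on the dust-suppressant prohibition, with the rule encoded directly as a Python predicate rather than through a prover; the question names are assumptions.

```python
from __future__ import annotations

from itertools import product


def violates(facts: dict[str, bool]) -> bool:
    """Hypothetical encoding of the dust-suppressant prohibition."""
    return facts["usedOil"] and facts["dustSuppressant"]


def check_with_unknowns(answers: dict[str, bool | None]) -> str:
    """Expand each "I don't know" (None) answer into both values and check every path."""
    unknown = [question for question, answer in answers.items() if answer is None]
    outcomes = set()
    for values in product([True, False], repeat=len(unknown)):
        facts = dict(answers)
        facts.update(zip(unknown, values))
        outcomes.add(violates(facts))
    if outcomes == {False}:
        return "compliant on every path"
    if outcomes == {True}:
        return "non-compliant on every path"
    return "depends on the unknown answers"


print(check_with_unknowns({"usedOil": True, "dustSuppressant": None}))
print(check_with_unknowns({"usedOil": False, "dustSuppressant": None}))
```

Enumerating all paths doubles the work for each unknown, so this only suits a handful of “I don’t know” answers per check.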

45 Summary
– Regulations can be decomposed into a structured XML document
– Adding rich meta-data about regulations enables more sophisticated interaction with the documents
– Automated assistance with environmental compliance checking may be possible

46 Thank You! Questions?

47 Discussion Questions
– How can we explain things better?
– How will such a system be useful?
– What are examples of how you could use such a system?
– What would make the system more useful?
– Do you have suggestions for people or fields we should contact that might be interested in what we are doing?
– How are the problems we address currently dealt with?
– What are some existing technologies we should investigate?
– What are your recommendations for issues we should address?
– What might be complementary tools to develop next?

49 Translate To Hierarchical Structure
PART 279—Standards For The Management Of Used Oil
Subpart B – Applicability
…
§ Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
(b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § (c).
(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices:
  (1) Industrial furnaces identified in § of this chapter;
  (2) Boilers, as defined in § of this chapter, that are identified as follows:
    (i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes;
    ….
§ Used Oil Specification.
…..
Hierarchy (each level contains the next): 40 CFR 279 → Subpart A, Subpart B, …, Subpart I → Section, Section, Section, … → Subsection (a), Subsection (b), Subsection (c), Subsection (d)
Example: (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.

50 Document Structures
– Plain text
– PDF
– HTML
– XML

51 Plain Text
– Unstructured text
– Cannot contain non-text elements
– Difficult for machines to process

52 PDF
– Allows images and other non-text elements
– Not an open standard
– Display-oriented: data content is not structured or tagged with meaning
– Poses the same information-extraction problem as plain text

53 HTML
– Open standard
– Allows incorporation of display formatting, images, sounds, and video
– Primarily a method for describing how data should be displayed
– Does not effectively represent the structure or meaning of data

54 XML
– XML does not improve the “viewability” of web pages
– XML puts the data in a format that allows us to do more powerful things with it:
  – Organized structure
  – Self-describing
  – Searching
  – Selective views
  – Add meta-data

55 XML
– In XML we are not limited to a predefined set of tags
– We can now tag the data according to content, rather than display format
HTML: Section General Requirements A generator who transports, or offers for transportation…
Example XML: Section General Requirements A generator who transports, or offers for transportation…
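
Tagging by content means a program can select parts of a regulation by what they are rather than by how they are displayed. In HTML the same heading would be an `<h3>` or `<b>` element, which says nothing about its role. The sketch below uses Python's standard `xml.etree.ElementTree`; the tag names (`regulation`, `section`, `title`, `provision`) are hypothetical and not taken from REGNET's schema.

```python
import xml.etree.ElementTree as ET

# Content-oriented XML with hypothetical tag names: each tag says what its text is.
XML_VERSION = """
<regulation>
  <section>
    <title>General Requirements</title>
    <provision>A generator who transports, or offers for transportation…</provision>
  </section>
</regulation>
""".strip()


def section_titles(xml_text: str) -> list[str]:
    """Select every section title by meaning, regardless of how it is displayed."""
    root = ET.fromstring(xml_text)
    return [section.findtext("title") for section in root.iter("section")]


def provisions(xml_text: str) -> list[str]:
    """Select the provision text of every section."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("provision")]


print(section_titles(XML_VERSION))
print(provisions(XML_VERSION))
```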

56 RCCsession – Otter Interaction
– RCCsession implements the compliance-checking procedure: it develops an FOPC input file with the appropriate logic sentences, then reads the proof-attempt output file and takes the appropriate action
– Otter attempts to find a proof by contradiction from the input file and writes the proof-attempt output file

57 Legal Interpretation
Interpretation of the provision by a legal expert familiar with the regulations
40 CFR 261.4(b)(1) The following solid wastes are not hazardous wastes: (1) Household waste, including household waste that has been collected, transported, stored, treated, disposed, recovered…
This provision has been upheld, but narrowed in scope by the U.S. Supreme Court. Household waste is generally not considered a hazardous waste. The court narrowed this provision when it decided that ash produced by incinerating household waste is regulated as a hazardous waste if it has hazardous characteristics. Thus, if an incineration facility burns household waste, it can be considered a generator of hazardous waste.