REGNET Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold Contact


1 REGNET: An E-Government Infrastructure for Regulation Parsing and Relatedness Analysis
Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold
http://eig.stanford.edu/regnet
Contact: glau@stanford.edu, http://eig.stanford.edu/glau

2 1 Motivation
- Multiple sources of regulations
  - Multiple jurisdictions: federal, state, local, etc.
  - Different formats, terminologies, contexts
  - Amending rules, conflicting ideas
[Figures: UK DDA in HTML; ADAAG in HTML; IBC in PDF]

3 2 Motivation
- Multiple sources of regulations
  - Multiple jurisdictions: federal, state, local, etc.
  - Different formats, terminologies, contexts
  - Amending rules, conflicting ideas
- Need for a repository
  - Locate relevant information
  - E.g., small business: penalty fees for violations
- Need for analysis tool
  - Complexity of regulations
  - Multiple jurisdictions
  - Understanding of regulations & their relationships

4 3 Example 1: Related Provisions
- ADAAG Appendix 4.6.3: … Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.
- CBC 1129B.4.3: … Ramps shall not encroach into any parking space. Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …
- CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.

5 4 Example 2: Related but Conflicting Provisions
- ADAAG 4.7.2 Slope. … Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes …
- CBC 1127B.5.5 Beveled lip. The lower end of each curb ramp shall have a ½ inch (13 mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.
- ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.

6 5 Scope
1. Overview
   - Examples of system capabilities
2. Repository development
3. Relatedness analysis

7 6 Overview of System Capabilities: Parsing
[Figures: Original 40CFR; 40CFR natural structure]

8 7 Overview of System Capabilities: Parsing
[Figures: IBC in 2-columned PDF; XML hierarchy]
… Assembly areas with fixed seating shall comply with Sections …

9 8 Overview of System Capabilities: Feature Parsing
[Figures: Extracted features; Usages of features]

10 9 Overview of System Capabilities: Comparisons
[Figure: Regulation comparison: 40CFR vs. 22CCR]

11 10 Overview of System Capabilities: E-rulemaking
[Figure: Drafted regulations compared with public comments]

12 11 Scope
1. Overview
   - Examples of system capabilities
2. Repository development
3. Relatedness analysis

13 12 Repository development

14 13 Shallow parser
- Data source
  - Americans with Disabilities Act Accessibility Guidelines (ADAAG), Uniform Federal Accessibility Standards (UFAS), Code of Federal Regulations Title 40 (40CFR), UK and Scottish Disability Discrimination Act, etc.
- Current standard: HTML, PDF, hardcopy, ...
- Our system standard: XML
- Unit of extraction: section/provision
[Example provision: Fixed or built-in seating, ...]

15 14 Shallow parser: PDF
- Basic XML format
[Example: 40cfr.279.12 (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.]

16 15 Shallow parser: HTML
- Basic XML format
[Example: ... Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.]

17 16 Shallow parser: extracting references
[Example: ... Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.]
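
A minimal sketch of how reference extraction could work on the example text above. The XML markup shown on the slides is not preserved in this transcript, so the tag names (`regElement`, `regText`, `reference`), the `40cfr.<part>` identifier scheme for references, and the regular expression are all assumptions made for illustration, not the REGNET parser itself.

```python
import re
import xml.etree.ElementTree as ET

# Matches "part 264" or "parts 264 or 265"; the \b keeps "subpart O" from matching.
PART_LIST = re.compile(r"\bparts?\s+(\d+(?:\s*(?:,|or|and)\s*\d+)*)")


def extract_part_refs(text, prefix="40cfr"):
    """Return referenced part identifiers in order of appearance, without duplicates."""
    refs = []
    for match in PART_LIST.finditer(text):
        for number in re.findall(r"\d+", match.group(1)):
            ref = f"{prefix}.{number}"
            if ref not in refs:
                refs.append(ref)
    return refs


def to_xml(section_id, text):
    """Wrap one provision in a hypothetical XML element with its references attached."""
    node = ET.Element("regElement", {"id": section_id})
    ET.SubElement(node, "regText").text = text
    for ref in extract_part_refs(text):
        ET.SubElement(node, "reference", {"id": ref})
    return node


incinerators = (
    "Hazardous waste incinerators subject to regulation under "
    "subpart O of parts 264 or 265 of this chapter."
)
used_oil = (
    "Used oil shall not be managed in surface impoundments or waste piles unless "
    "the units are subject to regulation under parts 264 or 265 of this chapter."
)

print(extract_part_refs(incinerators))
provision = to_xml("40cfr.279.12", used_oil)
print(ET.tostring(provision, encoding="unicode"))
```

The sketch resolves only part-level references; subpart and section references would need further patterns.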

18 17 Shallow parser: feature extraction
- Non-structural characteristics specific to a corpus
- To aid user retrieval of relevant materials
- For analysis purposes

19 18 Shallow parser: feature extraction
- Generic features
  - Concepts: noun phrases
  - Exceptions: negated provisions
  - Definitions: terminologies defined in regulations
- Domain-specific features
  - Glossary terms: definitions from reference guides
  - Author-prescribed indices: concepts from field handbooks
  - Measurements: e.g., 2 inches max, 4 ppm
  - Chemicals: list of drinking water contaminants from EPA
  - Effective dates: provision updates

20 19 Example of definition / glossary tags
Original section 3.5 from the ADAAG:
  3.5 DEFINITIONS. Accessible. Describes a site, building, facility, or portion thereof … Clear. Unobstructed.
Refined section 3.5 in XML format:
  accessible: Describes a site, building, facility, or portion thereof ...
  clear: Unobstructed.

21 20 Example of indexTerm, concept, measurement & exception tags
Original section 4.6.3 from the UFAS:
  4.6.3* PARKING SPACES. Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9). Parking access aisles shall be part of ... EXCEPTION: … an adjacent access aisle at least 96 in (2440 mm) wide complying with 4.5 ...
Refined section 4.6.3 in XML format:
  … Parking spaces for disabled people shall ... If accessible parking spaces for ...
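
As an illustration of the measurement feature, the sketch below pulls value, unit and qualifier triples out of the UFAS 4.6.3 text with a regular expression. The unit list, the qualifier list and the function name are hypothetical; the slides do not say how REGNET recognizes measurements.

```python
import re

# Hypothetical unit and qualifier lists; a real corpus would need fuller ones.
MEASUREMENT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>inches|inch|in|mm|ft|ppm)\b"
    r"(?:\s+(?P<qualifier>maximum|minimum|max|min)\b)?"
)


def extract_measurements(text):
    """Return (value, unit, qualifier) tuples in order of appearance."""
    return [
        (float(m.group("value")), m.group("unit"), m.group("qualifier"))
        for m in MEASUREMENT.finditer(text)
    ]


ufas_4_6_3 = (
    "Parking spaces for disabled people shall be at least 96 in (2440 mm) wide "
    "and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9)."
)

for measurement in extract_measurements(ufas_4_6_3):
    print(measurement)
```

A qualifier is captured only when it directly follows the unit, as in "2 inches max"; phrases such as "at least" or "wide minimum" are not handled, and a number followed by the word "in" used as a preposition would be a false positive.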

22 21 Usages of extracted features revisited
[Figures: Extracted features; Usages of features]

23 22 Scope
1. Overview
   - Examples of system capabilities
2. Repository development
3. Relatedness analysis

24 23 Relatedness analysis

25 24 Relatedness analysis
- To utilize the structure and referencing of regulations, together with domain knowledge, to obtain a better comparison
- Measure
  - Similarity score $f(A, U) \in (0, 1)$
  - Nodes A and U are provisions from two different regulation trees
[Figure label: $f \in (0, 1)$]

26 25 Base score $f_0$ computation
- Linear combination of feature matching
  - $F(A, U, i)$ = similarity score between Sections (A, U) based on feature i
  - N = total number of features
- Feature matching
  - Based on the vector model, using cosine similarity as the distance between feature vectors
  - Similarity between two documents M and N = $\cos(d_M, d_N) = \frac{d_M \cdot d_N}{\|d_M\| \, \|d_N\|}$
  - $d_M$ and $d_N$ are document vectors
  - Cosine is normalized, so it always lies between 0 and 1
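
The base score can be sketched as a weighted sum of per-feature cosine similarities. The exact linear combination used for $f_0$ is not preserved here, so the per-feature weights (assumed non-negative and summing to 1), the dictionary layout and the function names below are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {axis: weight} dicts."""
    dot = sum(weight * v.get(axis, 0.0) for axis, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def base_score(features_a, features_u, weights):
    """Weighted sum of F(A, U, i) over the features named in `weights` (assumed form)."""
    return sum(
        weight * cosine(features_a.get(name, {}), features_u.get(name, {}))
        for name, weight in weights.items()
    )


# Toy provisions: identical concept vectors, disjoint measurement vectors.
provision_a = {
    "concept": {"curb ramp": 1.0, "parking space": 1.0},
    "measurement": {"96 in": 1.0},
}
provision_u = {
    "concept": {"curb ramp": 1.0, "parking space": 1.0},
    "measurement": {"60 in": 1.0},
}
equal_weights = {"concept": 0.5, "measurement": 0.5}

print(base_score(provision_a, provision_u, equal_weights))
```

With non-negative vector entries each cosine lies in [0, 1], so weights that sum to 1 keep the combined score in the same range.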

27 26 Example of feature vectors
- Traditional term match
  - Each index term i is assigned a positive, non-binary weight $w_{i,M}$ in each document vector $d_M$
  - Weight selection
    - Frequency of term, or
    - tf × idf model
      - tf = term frequency; term density
      - idf = inverse document frequency = $\log(n / n_i)$; term rarity
  - Excluding stopwords
- Feature = concept
  - Concept vectors are formed per provision based on concept frequency in each provision
  - F(provision 1, provision 2, feature = concept) = cosine between the two concept vectors
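
A small sketch of the tf × idf weighting described above, using raw term counts for tf and $\log(n / n_i)$ for idf, where n is the number of documents and $n_i$ the number of documents containing term i. The stopword list, the whitespace tokenizer and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "shall", "be", "in", "to"}


def tf_idf_vectors(documents):
    """Return one {term: tf * idf} dict per document."""
    tokenized = [
        [token for token in doc.lower().split() if token not in STOPWORDS]
        for doc in documents
    ]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n / doc_freq[term]) for term, count in tf.items()}
        )
    return vectors


docs = ["the ramp slope of the ramp", "ramp lip", "door handle"]
vectors = tf_idf_vectors(docs)
for vector in vectors:
    print(vector)
```

A term that occurs in every document gets idf = log(1) = 0, so it contributes nothing to the cosine, which is the "term rarity" effect named on the slide.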

28 27 Axis dependency: non-Boolean matching
- Vector model assumes mutual independence between axes
- Domain experts do not necessarily agree
  - A measurement of “2 inches max” can be a 70% match to “2 inches”
  - Synonyms exist, e.g., ontology defined for chemicals
- Limitation observed
  - Need flexibility to model domain knowledge, such as a 0, 50%, 75% and 100% measurement match

29 28 Proposed non-Boolean matching model
- Define a feature matching matrix E
  - $E_{ij}$ = % match between features i and j
  - E.g., a 3-dimensional vector space using “2 ppm”, “2 ppm max” and “2 ft” as the first, second and third measurement axes
- Vector space transformation
  - Map feature vectors onto an alternate space via matrix D
  - Cosines are computed on the consolidated frequency vectors
  - E.g., similarity based on measurements

30 29 Vector space transformation
- Define D such that $E = D^T D$ is fulfilled
- Cosine between the consolidated frequency vectors $D d_M$ and $D d_N$: $\frac{d_M^T E \, d_N}{\sqrt{d_M^T E \, d_M} \, \sqrt{d_N^T E \, d_N}}$
- Reduces to a Boolean cosine when $E = I$
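
Because $(D a) \cdot (D b) = a^T D^T D \, b = a^T E \, b$, the transformed cosine can be computed from E directly without factoring out D. The sketch below does that for the three measurement axes named above. The entries of E are not preserved in this transcript, so the 75% match between “2 ppm” and “2 ppm max” is a made-up illustrative value.

```python
import math


def matched_cosine(a, b, E):
    """Cosine between D*a and D*b, computed via E = D^T D as a^T E b / (|Da| |Db|)."""

    def bilinear(x, y):
        return sum(
            x[i] * E[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
        )

    denominator = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denominator if denominator else 0.0


# Axes: "2 ppm", "2 ppm max", "2 ft". Off-diagonal value is hypothetical.
E = [
    [1.00, 0.75, 0.00],
    [0.75, 1.00, 0.00],
    [0.00, 0.00, 1.00],
]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

provision_1 = [1.0, 0.0, 0.0]  # mentions "2 ppm" once
provision_2 = [0.0, 1.0, 0.0]  # mentions "2 ppm max" once

print(matched_cosine(provision_1, provision_2, E))
print(matched_cosine(provision_1, provision_2, IDENTITY))
```

With the identity matrix the two provisions share no axis and score 0; with the partial-match matrix they score the assumed 75%. For a matrix D with $E = D^T D$ to exist, E has to be symmetric positive semidefinite, which the example matrix is.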

31 30 Score refinements based on regulation structure
- Neighbor inclusion
  - Diffusion of similarity between clusters of nodes in the tree
  - Self vs. parent-sibling-child (psc): $f_{s\text{-}psc}$
  - psc vs. psc: $f_{psc\text{-}psc}$

32 31 Neighbor inclusion: psc vs. psc
- Take a linear combination of neighboring pair scores
- Formulate a neighbor structure matrix N
- Define score matrix $\Pi$
- We have $\Pi_{psc\text{-}psc} = N_A \, \Pi_0 \, N_U^T$

33 32 Neighbor inclusion: self vs. psc
- Take a linear combination of neighbor vs. self scores
- Formulate a neighbor structure matrix N
- Define score matrix $\Pi$
- We have $\Pi_{s\text{-}psc} = \frac{1}{2} \left( \Pi_0 \, N_U^T + N_A \, \Pi_0 \right)$
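
The two neighbor-inclusion products can be sketched with plain nested lists. How the neighbor structure matrix N is built is not spelled out on the slides; the version below assumes a row-normalized matrix in which each node spreads equal weight over its parent, siblings and children. The toy trees and base scores are invented for illustration.

```python
def matmul(X, Y):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(row) for row in zip(*X)]


def neighbor_matrix(parents):
    """Assumed construction: N[i][j] = 1/k for each of the k psc neighbors j of node i.

    parents[i] is the index of node i's parent, or None for the root.
    """
    n = len(parents)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        psc = set()
        for j in range(n):
            if j == i:
                continue
            is_parent = parents[i] == j
            is_child = parents[j] == i
            is_sibling = parents[i] is not None and parents[j] == parents[i]
            if is_parent or is_child or is_sibling:
                psc.add(j)
        for j in psc:
            matrix[i][j] = 1.0 / len(psc)
    return matrix


# Tree A: root 0 with children 1 and 2. Tree U: root 0 with child 1.
n_a = neighbor_matrix([None, 0, 0])
n_u = neighbor_matrix([None, 0])
pi_0 = [[0.2, 0.4], [0.6, 0.8], [0.0, 1.0]]  # invented base scores, rows = A, cols = U

pi_psc_psc = matmul(matmul(n_a, pi_0), transpose(n_u))
left = matmul(pi_0, transpose(n_u))
right = matmul(n_a, pi_0)
pi_s_psc = [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]

print(pi_psc_psc)
print(pi_s_psc)
```

Each entry of the psc-psc matrix is the average base score over all pairs of neighbors of the two nodes, which is how similarity between surrounding provisions diffuses onto the pair being compared.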

34 33 Score refinements based on regulation structure
- Reference distribution
  - Diffusion of similarity between referencing nodes and referenced nodes in the tree
  - E.g., f(A5.3, U6.4(a)) updates f(A2.1, U3.3)

35 34 Reference distribution: s-ref and ref-ref
- Take a linear combination of reference vs. self and reference vs. reference scores
- Formulate a reference structure matrix R
- Define score matrix $\Pi$
- We have $\Pi_{ref\text{-}ref} = R_A \, \Pi_0 \, R_U^T$ and $\Pi_{s\text{-}ref} = \frac{1}{2} \left( \Pi_0 \, R_U^T + R_A \, \Pi_0 \right)$
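
A sketch of reference distribution on the slide's example pair. It assumes that A2.1 references A5.3 and U3.3 references U6.4(a), and that row i of R spreads equal weight over the nodes that node i references; neither the direction of the references nor the construction of R is stated on the slides, and the base scores are invented.

```python
def matmul(X, Y):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(row) for row in zip(*X)]


def reference_matrix(node_ids, references):
    """Assumed construction: R[i][j] = 1/k when node i references k nodes including j."""
    index = {node_id: position for position, node_id in enumerate(node_ids)}
    matrix = [[0.0] * len(node_ids) for _ in node_ids]
    for source, targets in references.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1.0 / len(targets)
    return matrix


a_nodes = ["A2.1", "A5.3"]
u_nodes = ["U3.3", "U6.4(a)"]
r_a = reference_matrix(a_nodes, {"A2.1": ["A5.3"]})
r_u = reference_matrix(u_nodes, {"U3.3": ["U6.4(a)"]})

# Invented base scores: the referenced pair (A5.3, U6.4(a)) is highly similar.
pi_0 = [[0.1, 0.0], [0.0, 0.9]]

pi_ref_ref = matmul(matmul(r_a, pi_0), transpose(r_u))
left = matmul(pi_0, transpose(r_u))
right = matmul(r_a, pi_0)
pi_s_ref = [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]

print(pi_ref_ref[0][0])  # contribution to f(A2.1, U3.3) from f(A5.3, U6.4(a))
print(pi_s_ref)
```

Under these assumptions the ref-ref entry for (A2.1, U3.3) equals the base score of the pair they reference, which matches the slide's statement that f(A5.3, U6.4(a)) updates f(A2.1, U3.3).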

36 35 Example of results: UFAS vs BS8300
- Phrasing difference between American and British regulations
  - ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …
  - bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …
- Neighbor similarities imply similarity between the nodes of interest

37 36 Example of results: almost identical provisions
[Figure: Regulation comparison: 40CFR vs. CCR]

38 37 Example of results: e-rulemaking
- Application domain: e-rulemaking
  - Comparison between draft of rules and the associated public comments
- ADAAG Chapter 11, rights-of-way draft
  - Less than 15 pages
  - Over 1400 public comments received within 4 months
  - Comments ~10 MB in size; most are several pages long
- A new regulation draft can easily generate a large volume of data that needs to be reviewed and analyzed

39 38 Example of results: e-rulemaking
[Figure: Regulations compared with public comments]

40 39 Example of results: e-rulemaking
- Related draft section and public comment
  - Adaag.1105.4.1: Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …
  - Deborah Wood, October 29, 2002: … This often means walk lights that are so short in duration that by the time a person who is blind realizes …
- No identified related section
  - Donna Ring, September 6, 2002: If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …
  - Concern not addressed in the draft

41 40 Conclusions
- An infrastructure for
  - Repository for regulations
    - Shallow parser
    - Feature extractions
  - Similarity comparison
    - Base score
    - Score refinements
- Results
  - Comparisons between Federal codes, European codes
  - Application to e-rulemaking
- Future directions
  - Extension of application to other domains of semi-structured documents
  - Conflict analysis?

42 41 Thank You!

