1
REGNET: An E-Government Infrastructure for Regulation Parsing and Relatedness Analysis. Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold. http://eig.stanford.edu/regnet Contact: glau@stanford.edu, http://eig.stanford.edu/glau
2
1 Motivation Multiple sources of regulations Multiple jurisdictions: federal, state, local, etc. Different formats, terminologies, contexts Amending rules, conflicting ideas Figures: UK DDA in HTML; ADAAG in HTML; IBC in PDF
3
2 Motivation Multiple sources of regulations Multiple jurisdictions: federal, state, local, etc. Different formats, terminologies, contexts Amending rules, conflicting ideas Need for a repository Locate relevant information E.g., small business: penalty fees for violations Need for analysis tool Complexity of regulations Multiple jurisdictions Understanding of regulations & their relationships
4
3 Example 1: Related Provisions ADAAG Appendix 4.6.3 … Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries. CBC 1129B.4.3 … Ramps shall not encroach into any parking space. Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces … CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.
5
4 Example 2: Related but Conflicting Provisions ADAAG 4.7.2 Slope. …Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes… CBC 1127B.5.5 Beveled lip. The lower end of each curb ramp shall have a ½ inch (13mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments. ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.
6
5 Scope 1. Overview: examples of system capabilities 2. Repository development 3. Relatedness analysis
7
6 Overview of System Capabilities: Parsing Figures: Original 40CFR; 40CFR natural structure
8
7 Overview of System Capabilities: Parsing Figures: IBC in 2-columned PDF; XML hierarchy (… Assembly areas with fixed seating shall comply with Sections …)
9
8 Overview of System Capabilities: Feature Parsing Figures: Extracted features; Usages of features
10
9 Regulation comparison: 40CFR vs. 22CCR Overview of System Capabilities: Comparisons
11
10 Drafted regulations compared with public comments Overview of System Capabilities: E-rulemaking
12
11 Scope 1. Overview: examples of system capabilities 2. Repository development 3. Relatedness analysis
13
12 Repository development
14
13 Shallow parser Data Source: Americans with Disabilities Act Accessibility Guidelines (ADAAG), Uniform Federal Accessibility Standards (UFAS), Code of Federal Regulations Title 40 (40CFR), UK and Scottish Disability Discrimination Act, etc. Current standard: HTML, PDF, hardcopy... Our system standard: XML Unit of extraction: section/provision Fixed or built-in seating,...
15
14 Shallow parser: PDF Basic XML format 40cfr.279.12 (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
16
15 Shallow parser: HTML Basic XML format...... Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.
17
16 Shallow parser: extracting references...... Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter.
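The reference-extraction step can be illustrated with a small sketch. The slides do not show the parser itself, so the regular expression, the function name `extract_references`, and the identifier format for subparts (`40cfr.264.O`) are all assumptions made for illustration; only the `40cfr.<part>` prefix style is borrowed from the identifier `40cfr.279.12` shown on an earlier slide.

```python
import re

# Hypothetical pattern: matches "part 264" or "parts 264 or 265",
# optionally preceded by "subpart O of".
REF_PATTERN = re.compile(
    r"(?:subpart\s+(?P<subpart>[A-Z]+)\s+of\s+)?"
    r"parts?\s+(?P<parts>\d+(?:\s*(?:,|or|and)\s*\d+)*)",
    re.IGNORECASE,
)


def extract_references(text, title="40cfr"):
    """Return one identifier per referenced part, e.g. '40cfr.264.O'."""
    refs = []
    for match in REF_PATTERN.finditer(text):
        subpart = match.group("subpart")
        # A phrase such as "parts 264 or 265" yields one reference per number.
        for part in re.findall(r"\d+", match.group("parts")):
            ref = f"{title}.{part}"
            if subpart:
                ref += f".{subpart}"  # assumed suffix format for subparts
            refs.append(ref)
    return refs


provision = ("Hazardous waste incinerators subject to regulation under "
             "subpart O of parts 264 or 265 of this chapter.")
print(extract_references(provision))  # ['40cfr.264.O', '40cfr.265.O']
```

A pattern this simple only covers the phrasing in the example provision; real regulation text would need many more reference forms.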
18
17 Shallow parser: feature extraction Non-structural characteristics specific to a corpus To aid user retrieval of relevant materials For analysis purpose
19
18 Shallow parser: feature extraction Generic features: Concepts - noun phrases; Exceptions - negated provisions; Definitions - terminologies defined in regulations. Domain-specific features: Glossary terms - definitions from reference guides; Author-prescribed indices - concepts from field handbooks; Measurements - e.g., 2 inches max, 4 ppm; Chemicals - list of drinking water contaminants from EPA; Effective dates - provision updates
20
19 Example of definition / glossary tags Original section 3.5 from the ADAAG 3.5 DEFINITIONS. Accessible. Describes a site, building, facility, or portion thereof … Clear. Unobstructed. Refined section 3.5 in XML format accessible Describes a site, building, facility, or portion thereof... clear Unobstructed.
21
20 Example of indexTerm, concept, measurement & exception tags Original section 4.6.3 from the UFAS 4.6.3* PARKING SPACES. Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9). Parking access aisles shall be part of... EXCEPTION: … an adjacent access aisle at least 96 in (2440 mm) wide complying with 4.5... Refined section 4.6.3 in XML format … Parking spaces for disabled people shall... If accessible parking spaces for...
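As a rough illustration of how a measurement feature could be pulled from provision text such as the UFAS section above, the sketch below uses a regular expression. The unit list, the qualifier list, and the function name `extract_measurements` are assumptions for illustration and are not taken from the REGNET parser.

```python
import re

# Hypothetical unit and qualifier lists; a real parser would need far more.
MEASUREMENT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>inches|inch|in|mm|ft|ppm)\b"
    r"(?:\s+(?P<qualifier>max|min|maximum|minimum)\b)?"
)


def extract_measurements(text):
    """Return (value, unit, qualifier) tuples; qualifier is None if absent."""
    return [
        (m.group("value"), m.group("unit"), m.group("qualifier"))
        for m in MEASUREMENT.finditer(text)
    ]


ufas = ("Parking spaces for disabled people shall be at least 96 in (2440 mm) "
        "wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum")
for measurement in extract_measurements(ufas):
    print(measurement)
```

Note that the qualifier is only captured when it directly follows the unit ("2 inches max"); the trailing "wide minimum" in the UFAS text is missed, and a bare "in" can produce false matches in ordinary prose.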
22
21 Usages of extracted features revisited Figures: Extracted features; Usages of features
23
22 Scope 1. Overview: examples of system capabilities 2. Repository development 3. Relatedness analysis
24
23 Relatedness analysis
25
24 Relatedness analysis Goal: utilize the structure and referencing of regulations, together with domain knowledge, to obtain a better comparison. Measure: similarity score $f(A, U) \in (0, 1)$, where nodes A and U are provisions from two different regulation trees
26
25 Base score $f_0$ computation Linear combination of feature matching: $F(A, U, i)$ = similarity score between Sections (A, U) based on feature i; N = total number of features. Feature matching: based on the Vector model, using cosine similarity as the distance between feature vectors. Similarity between two documents M and N = $\frac{d_M \cdot d_N}{\lVert d_M \rVert \, \lVert d_N \rVert}$, where $d_M$ and $d_N$ are document vectors. Cosine is normalized => always between 0 and 1
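The base score can be sketched in a few lines. The cosine follows the standard vector-model definition named on the slide; the weights of the linear combination are not visible in this transcript, so equal weights of 1/N are assumed here, and the function names and the dict-based vector layout are illustrative only.

```python
import math


def cosine(d_m, d_n):
    """Cosine similarity between two sparse vectors stored as {axis: weight}."""
    dot = sum(w * d_n.get(axis, 0.0) for axis, w in d_m.items())
    norm_m = math.sqrt(sum(w * w for w in d_m.values()))
    norm_n = math.sqrt(sum(w * w for w in d_n.values()))
    if norm_m == 0 or norm_n == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_m * norm_n)


def base_score(features_a, features_u, weights=None):
    """f_0(A, U): weighted sum of the per-feature scores F(A, U, i).

    features_a / features_u map a feature name (e.g. "concept") to the
    provision's vector for that feature. Equal weights 1/N are an assumption.
    """
    names = sorted(set(features_a) | set(features_u))
    if not names:
        return 0.0
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(
        weights[name] * cosine(features_a.get(name, {}), features_u.get(name, {}))
        for name in names
    )


provision_a = {"concept": {"parking space": 2, "access aisle": 1},
               "measurement": {"96 in": 1}}
provision_u = {"concept": {"parking space": 1, "ramp": 1},
               "measurement": {"96 in": 1}}
print(round(base_score(provision_a, provision_u), 4))
```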
27
26 Example of feature vectors Traditional term match: each index term i is assigned a positive and non-binary weight $w_{i,M}$ in each document vector $d_M$. Weight selection: frequency of term, or the tf-idf model; tf = term frequency (term density); idf = inverse document frequency = $\log(n / n_i)$ (term rarity); excluding stopwords. Feature = concept: concept vectors are formed per provision based on concept frequency in each provision; F(provision 1, provision 2, feature = concept) = cosine between the two concept vectors
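A minimal sketch of the tf-idf weighting described above, assuming raw counts for tf and idf = log(n / n_i) as on the slide. The stopword list and the function name `tf_idf_vectors` are placeholders for illustration.

```python
import math
from collections import Counter

# Placeholder stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "shall", "be"}


def tf_idf_vectors(documents):
    """documents: list of token lists. Returns one {term: weight} dict each."""
    n = len(documents)
    counts = [Counter(t for t in doc if t not in STOPWORDS) for doc in documents]
    # n_i: number of documents that contain term i.
    doc_freq = Counter()
    for count in counts:
        doc_freq.update(count.keys())
    return [
        {term: tf * math.log(n / doc_freq[term]) for term, tf in count.items()}
        for count in counts
    ]


docs = [["the", "curb", "ramp", "ramp"], ["curb", "lip"], ["door"]]
for vector in tf_idf_vectors(docs):
    print(vector)
```

With this idf definition, a term that appears in every document receives weight 0, so it no longer contributes to the cosine.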
28
27 Axis dependency: non-Boolean matching Vector model assumes mutual independence between axes; domain experts do not necessarily agree. A measurement of “2 inches max” can be a 70% match to “2 inches”. Synonyms exist, e.g., ontology defined for chemicals. Limitation observed: need flexibility to model domain knowledge, such as a 0, 50%, 75% and 100% measurement match
29
28 Proposed non-Boolean matching model Define a feature matching matrix E: $E_{ij}$ = % match between features i and j. E.g., a 3-dimensional vector space using “2 ppm”, “2 ppm max” and “2 ft” as the first, second and third measurement axes: E = … Vector space transformation: map feature vectors onto an alternate space via matrix D; cosines are computed on the consolidated frequency vectors. E.g., similarity based on measurements = …
30
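29 Vector space transformation Define D such that $E = D^T D$ is fulfilled. Cosine between the consolidated frequency vectors $D d_M$ and $D d_N$: $\frac{(D d_M)^T (D d_N)}{\lVert D d_M \rVert \, \lVert D d_N \rVert} = \frac{d_M^T E \, d_N}{\sqrt{d_M^T E \, d_M} \, \sqrt{d_N^T E \, d_N}}$. Reduces to a Boolean cosine when E = I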
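The non-Boolean cosine can be sketched without computing D explicitly, because the cosine between the transformed vectors depends only on E = DᵀD. The entries of E below are hypothetical: the 0.7 is borrowed from the “70% match” example on an earlier slide, and the matrix actually shown in the talk is not available in this transcript.

```python
import math


def quad(x, E, y):
    """Compute x^T E y for plain Python lists."""
    return sum(x[i] * E[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def matched_cosine(d_m, d_n, E):
    """Cosine between D d_m and D d_n, written in terms of E = D^T D."""
    denom = math.sqrt(quad(d_m, E, d_m) * quad(d_n, E, d_n))
    return quad(d_m, E, d_n) / denom if denom else 0.0


# Axes: "2 ppm", "2 ppm max", "2 ft". Values are assumed for illustration.
E = [[1.0, 0.7, 0.0],
     [0.7, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

d_m = [1.0, 0.0, 0.0]  # provision mentioning "2 ppm"
d_n = [0.0, 1.0, 0.0]  # provision mentioning "2 ppm max"
print(matched_cosine(d_m, d_n, IDENTITY))  # independent axes: no match
print(matched_cosine(d_m, d_n, E))         # partial match through E
```

For a D with E = DᵀD to exist, E has to be symmetric and positive semidefinite, which holds for the assumed matrix above.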
31
30 Score refinements based on regulation structure Neighbor inclusion: diffusion of similarity between clusters of nodes in the tree. Self vs. parent-sibling-child (psc), $f_{s\text{-}psc}$; psc vs. psc, $f_{psc\text{-}psc}$
32
31 Neighbor inclusion: psc vs. psc Take a linear combination of neighboring pair scores. Formulate a neighbor structure matrix N. Define score matrix $\Pi$. We have $\Pi_{psc\text{-}psc} = N_A \Pi_0 N_U^T$
33
32 Neighbor inclusion: self vs. psc Take a linear combination of neighbor vs. self scores. Formulate a neighbor structure matrix N. Define score matrix $\Pi$. We have $\Pi_{s\text{-}psc} = \tfrac{1}{2} ( \Pi_0 N_U^T + N_A \Pi_0 )$
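The two neighbor-inclusion products can be sketched with plain lists. The slides do not say how the neighbor structure matrix is filled in, so this example assumes that row i spreads equal weight over the parent, siblings and children of node i. The helper names and the small base-score matrix (`pi0`, one row per node of tree A and one column per node of tree U) are hypothetical.

```python
def neighbor_matrix(parents):
    """Row i gives equal weight to each psc node of i (assumed weighting).

    parents[i] is the index of node i's parent, or None for the root.
    """
    n = len(parents)
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        psc = set()
        if parents[i] is not None:
            psc.add(parents[i])  # parent
            psc.update(j for j in range(n)
                       if j != i and parents[j] == parents[i])  # siblings
        psc.update(j for j in range(n) if parents[j] == i)  # children
        for j in psc:
            N[i][j] = 1.0 / len(psc)
    return N


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def psc_psc(N_a, pi0, N_u):
    """N_A * Pi_0 * N_U^T"""
    return matmul(matmul(N_a, pi0), transpose(N_u))


def s_psc(N_a, pi0, N_u):
    """1/2 * (Pi_0 * N_U^T + N_A * Pi_0)"""
    left = matmul(pi0, transpose(N_u))
    right = matmul(N_a, pi0)
    return [[0.5 * (a + b) for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


N_a = neighbor_matrix([None, 0, 0])  # tree A: a root with two children
N_u = neighbor_matrix([None, 0])     # tree U: a root with one child
pi0 = [[0.2, 0.0], [0.0, 0.8], [0.0, 0.4]]  # made-up base scores
print(s_psc(N_a, pi0, N_u))
print(psc_psc(N_a, pi0, N_u))
```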
34
33 Score refinements based on regulation structure Reference distribution Diffusion of similarity between referencing nodes and referenced nodes in the tree E.g., f (A5.3, U6.4(a)) updates f (A2.1, U3.3)
35
34 Reference distribution: s-ref and ref-ref Take a linear combination of reference vs. self and reference vs. reference scores. Formulate a reference structure matrix R. Define score matrix $\Pi$. We have $\Pi_{ref\text{-}ref} = R_A \Pi_0 R_U^T$ and $\Pi_{s\text{-}ref} = \tfrac{1}{2} ( \Pi_0 R_U^T + R_A \Pi_0 )$
36
35 Example of results: UFAS vs BS8300 Phrasing difference between American and British regulations: ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy … bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip … Neighbor similarities imply similarity between the nodes of interest
37
36 Example of results: almost identical provisions Regulation comparison: 40CFR vs. CCR
38
37 Example of results: e-rulemaking Application domain: e-rulemaking. Comparison between draft of rules and the associated public comments. ADAAG Chapter 11, rights-of-way draft: less than 15 pages; over 1400 public comments received within 4 months; comments ~ 10MB in size, most are several pages long. A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
39
38 Example of results: e-rulemaking Regulations compared with public comments
40
39 Example of results: e-rulemaking Related draft section and public comment: Adaag.1105.4.1 Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians … Deborah Wood, October 29, 2002 … This often means walk lights that are so short in duration that by the time a person who is blind realizes … No identified related section: Donna Ring, September 6, 2002 If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills … Concern not addressed in the draft
41
40 Conclusions An infrastructure for Repository for regulations Shallow parser Feature extractions Similarity comparison Base score Score refinements Results Comparisons between Federal codes, European codes Application to e-rulemaking Future Directions Extension of application to other domains of semi-structured documents Conflict analysis?
42
41 Thank You!