

An eRulemaking Corpus: Identifying Substantive Issues in Public Comments
Claire Cardie (CS+IS), Cynthia Farina (Law), Matt Rawding (IS), Adil Aijaz (CS)
CeRI (Cornell eRulemaking Initiative), Cornell University

Plan for the Talk  Background –E-rulemaking  CeRI FTA Grant Circulars Corpus  Text Categorization Experiments

Rulemaking  E-Rulemaking  Rulemaking: one of the principal methods of making regulatory policy in the US -~4000+ per year  “notice and comment” rulemaking: formal public participation phase –10 – 500,000 comments per rule –comment length: 1 sentence – 10’s of pages –agency legally bound to respond to all substantive issues  E-rulemaking = e-notice and e-comment

Current Agency Practice

Goals of Our Current Work
▪ Determine the degree to which automatic issue categorization can facilitate comment analysis by identifying and categorizing “relevant issues”.
▪ Framed as a text categorization task: given a comment set, the automated system should determine, for each sentence in each comment, which of a set of predefined issue categories it raises, if any.
▪ Builds on the work of Kwon & Hovy (2007) and Kwon et al. (2006).
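The task framing above (every sentence gets zero, one, or several predefined issues) can be pinned down as a small interface. The sketch below is only an illustration under assumptions: comments are taken to be pre-split into sentences, the `Comment` type, `categorize_comment_set` function, and the keyword-based `toy_classifier` are hypothetical, and an empty prediction is recorded as the explicit NONE category.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

NONE = "NONE"  # category for sentences that raise none of the predefined issues


@dataclass
class Comment:
    comment_id: str
    sentences: List[str]  # assumption: comments are already split into sentences


def categorize_comment_set(
    comments: List[Comment],
    classify_sentence: Callable[[str], Set[str]],
) -> Dict[str, List[Set[str]]]:
    """Return, for each comment, one label set per sentence.

    classify_sentence may return several issues (multi-label) or none;
    an empty prediction is recorded as {NONE}.
    """
    results: Dict[str, List[Set[str]]] = {}
    for comment in comments:
        results[comment.comment_id] = [
            set(classify_sentence(s)) or {NONE} for s in comment.sentences
        ]
    return results


# Hypothetical keyword "classifier", for illustration only.
def toy_classifier(sentence: str) -> Set[str]:
    keywords = {"funding": "funding", "planning": "planning"}
    return {label for kw, label in keywords.items() if kw in sentence.lower()}


comments = [Comment("c1", ["Funding and planning are both unclear.", "Thank you."])]
print(categorize_comment_set(comments, toy_classifier))
```

Any real classifier (SVM, Maxent, Naïve Bayes, or a cascade) can be dropped in as `classify_sentence` without changing the surrounding bookkeeping.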

Plan for the Talk  Background  CeRI FTA Grant Circulars Corpus –Difficulties –Interannotator agreement results  Text Categorization Experiments

FTA Grant Circulars Rule  Topic: guidance to public and private transportation providers applying for federal aid for elderly, disabled and low income persons  267 comments shortest: 1 sentence longest: 1420 sentences  11,094 sentences total

FTA Grant Circulars Issue Set
▪ 17 top-level issues
▪ 39 fine-grained issues

Kwon & Hovy (2007) vs.

Difficulties for Text Categorization  Large, hierarchical issue set

FTA Grant Circulars Issue Set 17 top- level issues 39 fine-grained issues

Difficulties for Text Categorization  Large, hierarchical issue set  “NONE” category  Skewed distribution across issues –87% of the sentences are from 6 categories –13% of the sentences are from 33 categories  Potentially multiple issues per sentence.  Even long sentences contain few words.  Variation in comment quality, scope, vocabulary and form.

The Annotators

Interannotator Agreement  146 comments used for the study  6 annotators  2.66 annotators per comment  41.5 sentences per comment  Overlap agreement measure

Category-by-Category IAG Results
(Chart categories: funding, planning, procedural, JARC, evaluation)

Plan for the Talk  Background –E-rulemaking –Public comment analysis  CeRI FTA Grant Circulars Corpus –Difficulties –Interannotator agreement results  Text Categorization Experiments

 Fine-grained issues (39)  Coarse-grained issues (17) Standard Text Categorization Algorithms Standard (flat) text categorization methods Hierarchical text categorization methods SVMs (0/1 loss) Maxent Naïve Bayes cascaded classification Dumais & Chen (2000)

Cascaded Categorization Some

Cascaded Categorization
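A cascade exploits the 17-coarse / 39-fine hierarchy by deciding on top-level issues first and consulting fine-grained classifiers only beneath the issues that pass. The sketch below is a minimal thresholded two-level cascade in the spirit of Dumais & Chen (2000); it is not the method from these slides, and the scorer type, threshold values, function name, and toy scorers are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A scorer maps a sentence to per-category confidence scores in [0, 1].
Scorer = Callable[[str], Dict[str, float]]


def cascaded_classify(
    sentence: str,
    coarse_scorer: Scorer,
    fine_scorers: Dict[str, Scorer],
    coarse_threshold: float = 0.5,
    fine_threshold: float = 0.5,
) -> List[Tuple[str, Optional[str]]]:
    """Two-level cascade over an issue hierarchy.

    Stage 1 keeps every coarse issue scoring at least coarse_threshold.
    Stage 2 runs only the fine-grained scorers under the kept issues.
    Returns (coarse, fine) pairs; fine is None when a coarse issue has no
    fine-grained children. An empty list means the sentence is NONE.
    """
    predictions: List[Tuple[str, Optional[str]]] = []
    for coarse, c_score in coarse_scorer(sentence).items():
        if c_score < coarse_threshold:
            continue  # prune this whole subtree of the hierarchy
        fine_scorer = fine_scorers.get(coarse)
        if fine_scorer is None:
            predictions.append((coarse, None))
            continue
        for fine, f_score in fine_scorer(sentence).items():
            if f_score >= fine_threshold:
                predictions.append((coarse, fine))
    return predictions


# Toy scorers with fixed outputs, for illustration only.
coarse = lambda s: {"funding": 0.9, "planning": 0.2}
fine = {
    "funding": lambda s: {"formula": 0.8, "local match": 0.3},
    "planning": lambda s: {"coordination": 0.99},
}
print(cascaded_classify("Any sentence", coarse, fine))  # [('funding', 'formula')]
```

The toy run shows the characteristic trade-off: the “planning” branch is pruned at the coarse level even though its fine-grained scorer is confident, so coarse-level misses propagate down the hierarchy.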

Gold Standard Data Set  Simulate agency comment analysis process –One analyst / rule  Six data sets –One data set / annotator

SVM Results with tf.idf Features
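The slides do not specify which tf.idf variant was used. As an assumption, the sketch below uses a common one: raw term count times log(N / df), with each sentence vector scaled to unit L2 length. Vectors of this kind would then be passed to a linear SVM (for example, from an ML library); that training step is not shown, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse tf.idf vectors for tokenized sentences.

    Weight = raw term count * log(N / df), then each vector is scaled to unit
    L2 length. Terms occurring in every document get idf 0 and are dropped.
    """
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        weights = {
            t: c * math.log(n / df[t])
            for t, c in Counter(tokens).items()
            if df[t] < n
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


docs = [["funding", "is", "low"], ["planning", "is", "slow"]]
print(tfidf_vectors(docs)[0])
```

Short sentences yield very sparse vectors under this scheme, which is one concrete form of the “even long sentences contain few words” difficulty noted earlier.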

Best-Performing Fine-Grained Issues (Annotator 1)

Progress and Plans
▪ Promising initial results for rule-specific issue categorization of public comments
  – Annotate comments for more rules
  – Expert (rulewriter) vs. law student annotation
  – Integrate automatic text categorization into the annotation interface
    – Active learning (Purpura, Cardie & Simons, dg.o 2008)
    – Collaboration with HCI colleagues in InfoSci
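Active learning inside an annotation interface typically asks the human to label the examples the current model is least sure about. The sketch below shows generic uncertainty sampling (pick the sentences whose scores lie closest to the decision threshold, such as an SVM margin near 0); it is not the specific method of Purpura, Cardie & Simons, and the function name and parameters are assumptions.

```python
import heapq
from typing import Callable, List


def select_for_annotation(
    unlabeled: List[str],
    score: Callable[[str], float],
    batch_size: int,
    threshold: float = 0.0,
) -> List[str]:
    """Uncertainty sampling: return the batch_size sentences whose scores lie
    closest to the decision threshold, most uncertain first."""
    return heapq.nsmallest(batch_size, unlabeled, key=lambda s: abs(score(s) - threshold))


# Toy classifier scores (e.g. signed SVM margins), for illustration only.
toy_scores = {"s1": 2.0, "s2": -0.1, "s3": 0.3, "s4": -1.5}
print(select_for_annotation(list(toy_scores), toy_scores.__getitem__, 2))  # ['s2', 's3']
```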

The End  For more on –the hierarchical text categorization method Cardie et al. (dg.o 2008) –a new structural learning approach for hierarchical classification Purpura et al. (in preparation) –active learning methods for hierarchical text categorization Purpura, Cardie & Simons (dg.o 2008)

Minimizing the Costliest Errors**
**Underinclusive errors are the most costly.
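Since the agency must respond to every substantive issue, an underinclusive error (failing to flag a sentence that does raise an issue, i.e. a false negative) is the expensive one. One simple way to act on this, offered here as an assumption rather than the authors' approach, is to tune the decision threshold on held-out data: keep the highest threshold that still reaches a target recall, so recall is protected while as few sentences as possible are flagged. The function name and toy data are hypothetical.

```python
from typing import List, Optional


def threshold_for_recall(
    scores: List[float], gold: List[bool], target_recall: float
) -> Optional[float]:
    """Highest threshold t (predict positive when score >= t) whose recall on
    held-out data is at least target_recall; None if there are no positives."""
    positives = sum(gold)
    if positives == 0:
        return None
    # Scan candidate thresholds from strictest to most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, g in zip(scores, gold) if g and s >= t)
        if tp / positives >= target_recall:
            return t
    return None


# Toy held-out scores and gold labels, for illustration only.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
gold = [True, False, True, False, True]
print(threshold_for_recall(scores, gold, 0.6))  # 0.4
```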

The “Sophisticated Commenter”

At the other extreme…
"I am disabled and take medications and fear flying because of the new government conditions on Air Marshalls to determine if someone looks suspicious behaviour but what if passengers take psychiatric medications and have side effects such as execisive sweating or shallow breathing due to medications? I hope my concern be properly addressed where Airlines can also increase the seating on plans to provide additional information of medicaitons and items which can accomodate passengers when flying instead of assume and act without knowing of there history of medications. I guess disabled people are being forced to give up privacy just to avoid any problems from Air Marshalls. Jon"