An eRulemaking Corpus: Identifying Substantive Issues in Public Comments
Claire Cardie (CS+IS), Cynthia Farina (Law), Matt Rawding (IS), Adil Aijaz (CS)
CeRI (Cornell eRulemaking Initiative), Cornell University

Plan for the Talk
Background
  – E-rulemaking
CeRI FTA Grant Circulars Corpus
Text Categorization Experiments

Rulemaking and E-Rulemaking
Rulemaking: one of the principal methods of making regulatory policy in the US
  – ~4,000+ per year
“Notice and comment” rulemaking: formal public participation phase
  – 10 to 500,000 comments per rule
  – comment length: 1 sentence to tens of pages
  – agency legally bound to respond to all substantive issues
E-rulemaking = e-notice and e-comment

Current Agency Practice

Goals of Our Current Work
Determine the degree to which automatic issue categorization can facilitate analysis of comments by identifying and categorizing “relevant issues”.
Framed as a text categorization task: given a comment set, the automated system should determine, for each sentence in each comment, which of a group of pre-defined issue categories it raises, if any.
Builds on the work of Kwon & Hovy (2007) and Kwon et al. (2006).
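The task framing above can be made concrete with a small sketch of the input and output shapes: a comment goes in, and each sentence comes out paired with a (possibly empty) set of issue labels. The cue words and the `keyword_classifier` stand-in are hypothetical and only illustrate the interface; they are not the models used in this work.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical cue words per issue, for illustration only.
ISSUE_CUES: Dict[str, Set[str]] = {
    "funding": {"funding", "funds", "grant"},
    "planning": {"plan", "planning", "coordinated"},
}


def split_sentences(comment: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]


def keyword_classifier(sentence: str) -> Set[str]:
    """Stand-in classifier: an issue is raised if any of its cue words appears."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return {issue for issue, cues in ISSUE_CUES.items() if words & cues}


def categorize_comment(
    comment: str,
    classify: Callable[[str], Set[str]] = keyword_classifier,
) -> List[Tuple[str, Set[str]]]:
    """Label every sentence; an empty set means no issue was raised."""
    return [(s, classify(s)) for s in split_sentences(comment)]


if __name__ == "__main__":
    text = "We need more funding for rural areas. The planning process is unclear. Thank you."
    for sentence, issues in categorize_comment(text):
        print(sorted(issues) or ["NONE"], "-", sentence)
```

Returning a set per sentence keeps room for sentences that raise several issues at once, and for sentences that raise none.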

Plan for the Talk
Background
CeRI FTA Grant Circulars Corpus
  – Difficulties
  – Interannotator agreement results
Text Categorization Experiments

FTA Grant Circulars Rule
Topic: guidance to public and private transportation providers applying for federal aid for elderly, disabled and low-income persons
267 comments
  – shortest: 1 sentence
  – longest: 1,420 sentences
11,094 sentences total

FTA Grant Circulars Issue Set
17 top-level issues
39 fine-grained issues

Kwon & Hovy (2007) vs.

Difficulties for Text Categorization
Large, hierarchical issue set

FTA Grant Circulars Issue Set
17 top-level issues
39 fine-grained issues

Difficulties for Text Categorization
Large, hierarchical issue set
“NONE” category
Skewed distribution across issues
  – 87% of the sentences are from 6 categories
  – 13% of the sentences are from 33 categories
Potentially multiple issues per sentence
Even long sentences contain few words
Variation in comment quality, scope, vocabulary and form

The Annotators

Interannotator Agreement
146 comments used for the study
6 annotators
2.66 annotators per comment
41.5 sentences per comment
Overlap agreement measure
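The slide names an overlap agreement measure without defining it. As a rough sketch of one plausible reading, the code below assumes that two annotators agree on a sentence when their label sets share at least one issue, or when both assign no issue at all, and then averages over all annotator pairs. That definition, the function names and the toy data are assumptions for illustration, not the measure actually used in the study.

```python
from itertools import combinations
from typing import Dict, List, Set


def labels_overlap(a: Set[str], b: Set[str]) -> bool:
    """Assumed rule: agree if both are empty (NONE) or share at least one issue."""
    if not a and not b:
        return True
    return bool(a & b)


def overlap_agreement(annotations: Dict[str, List[Set[str]]]) -> float:
    """Fraction of (annotator pair, sentence) cases whose label sets overlap.

    `annotations` maps an annotator id to per-sentence label sets, with all
    annotators labeling the same sentences in the same order.
    """
    agree = 0
    total = 0
    for ann_a, ann_b in combinations(sorted(annotations), 2):
        for labels_a, labels_b in zip(annotations[ann_a], annotations[ann_b]):
            total += 1
            agree += labels_overlap(labels_a, labels_b)
    return agree / total if total else 0.0


if __name__ == "__main__":
    toy = {
        "annotator_1": [{"funding"}, {"planning", "funding"}, set()],
        "annotator_2": [{"funding"}, {"planning"}, {"JARC"}],
    }
    print(overlap_agreement(toy))
```

An overlap rule of this kind is more lenient than exact-match agreement, which matters when a sentence may carry several issues.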

Category-by-Category IAG Results
[Chart; categories shown: funding, planning, procedural, JARC, evaluation]

Plan for the Talk
Background
  – E-rulemaking
  – Public comment analysis
CeRI FTA Grant Circulars Corpus
  – Difficulties
  – Interannotator agreement results
Text Categorization Experiments

Standard Text Categorization Algorithms
Issue sets
  – Fine-grained issues (39)
  – Coarse-grained issues (17)
Standard (flat) text categorization methods
  – SVMs (0/1 loss)
  – Maxent
  – Naïve Bayes
Hierarchical text categorization methods
  – cascaded classification, Dumais & Chen (2000)

Cascaded Categorization Some

Cascaded Categorization
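As a minimal sketch of the cascaded idea, a top-level classifier first picks a coarse-grained issue, and a second classifier trained only on that issue's sentences then picks the fine-grained issue. The `WordCountClassifier` below is a toy stand-in for an SVM, Maxent or Naïve Bayes model, and the fine-grained label names and training sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class WordCountClassifier:
    """Toy model: scores a label by how often the sentence's words
    occurred in that label's training sentences."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, texts: List[str], labels: List[str]) -> "WordCountClassifier":
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        return self

    def predict(self, text: str) -> str:
        words = tokens(text)
        scores = {lab: sum(c[w] for w in words) for lab, c in self.counts.items()}
        # Iterate labels in sorted order so ties break deterministically.
        return max(sorted(scores), key=lambda lab: scores[lab])


class CascadedClassifier:
    """Stage 1 predicts the coarse issue; stage 2 predicts a fine issue
    using a model trained only on sentences of that coarse issue."""

    def fit(self, texts: List[str], coarse: List[str], fine: List[str]) -> "CascadedClassifier":
        self.top = WordCountClassifier().fit(texts, coarse)
        grouped: Dict[str, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
        for text, c, f in zip(texts, coarse, fine):
            grouped[c][0].append(text)
            grouped[c][1].append(f)
        self.children = {
            c: WordCountClassifier().fit(ts, fs) for c, (ts, fs) in grouped.items()
        }
        return self

    def predict(self, text: str) -> Tuple[str, str]:
        c = self.top.predict(text)
        return c, self.children[c].predict(text)


# Hypothetical training data: (sentence, coarse issue, fine-grained issue).
TRAIN = [
    ("formula funding should go to rural areas", "funding", "funding/formula"),
    ("the local match requirement for funding is too high", "funding", "funding/match"),
    ("the coordinated plan needs stakeholder input", "planning", "planning/coordination"),
    ("planning deadlines are too short", "planning", "planning/deadlines"),
]

if __name__ == "__main__":
    texts, coarse, fine = (list(col) for col in zip(*TRAIN))
    model = CascadedClassifier().fit(texts, coarse, fine)
    print(model.predict("the match requirement is a burden"))
```

A known trade-off of this design is that a mistake at the top level cannot be recovered at the second stage, since only the chosen issue's child classifier is consulted.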

Gold Standard Data Set
Simulate agency comment analysis process
  – One analyst / rule
Six data sets
  – One data set / annotator

SVM Results with tf.idf Features
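For readers unfamiliar with tf.idf features, here is a small sketch that treats each sentence as a document and weights each term by its count times log(N / df). The slides do not say which tf.idf variant was used, so this particular weighting, the tokenizer and the example sentences are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per sentence.

    weight = raw term count in the sentence * log(N / document frequency),
    where N is the number of sentences.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # count each term once per sentence
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


if __name__ == "__main__":
    vectors = tfidf_vectors([
        "funding for rural transit",
        "funding for urban transit",
        "rural planning",
    ])
    for vec in vectors:
        print({term: round(w, 3) for term, w in vec.items()})
```

Terms that occur in every sentence get weight zero under this variant, so rarer, more issue-specific words dominate the vectors passed to the classifier.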

Best-Performing Fine-Grained Issues (Annotator 1)

Progress and Plans
Promising initial results for rule-specific issue categorization of public comments
  – Annotate comments for more rules
  – Expert (rulewriter) vs. law student annotation
  – Integrate automatic text categorization into annotation interface
Active learning (Purpura, Cardie & Simons, dg.o 2008)
Collaboration with HCI colleagues in InfoSci

The End
For more on
  – the hierarchical text categorization method: Cardie et al. (dg.o 2008)
  – a new structural learning approach for hierarchical classification: Purpura et al. (in preparation)
  – active learning methods for hierarchical text categorization: Purpura, Cardie & Simons (dg.o 2008)

Minimizing the Costliest Errors
Underinclusive errors are the most costly
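If underinclusive errors are read as missed issues (false negatives), one common way to reflect their higher cost in evaluation is an F-beta score with beta greater than 1, which weights recall above precision. The slides do not say that this measure was used here; the sketch below only illustrates the idea, with made-up counts.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from true positives, false positives and false negatives.

    beta > 1 penalizes false negatives (missed issues) more heavily than
    false positives; beta = 1 gives the usual balanced F1.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Two hypothetical systems with mirrored error profiles.
    overinclusive = dict(tp=8, fp=8, fn=2)   # flags too much, misses little
    underinclusive = dict(tp=8, fp=2, fn=8)  # flags little, misses a lot
    print("F1:", f_beta(**overinclusive, beta=1.0), f_beta(**underinclusive, beta=1.0))
    print("F2:", f_beta(**overinclusive), f_beta(**underinclusive))
```

With these counts the two systems tie on F1, while F2 prefers the one that misses fewer issues.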

The “Sophisticated Commenter”

At the other extreme… I am disabled and take medications and fear flying because of the new government conditions on Air Marshalls to determine if someone looks suspicious behaviour but what if passengers take psychiatric medications and have side effects such as execisive sweating or shallow breathing due to medications? I hope my concern be properly addressed where Airlines can also increase the seating on plans to provide additional information of medicaitons and items which can accomodate passengers when flying instead of assume and act without knowing of there history of medications. I guess disabled people are being forced to give up privacy just to avoid any problems from Air Marshalls. Jon
