Legislative Influence Detector

Legislative Influence Detector Sunlight Foundation State laws are not as unique as you may think. Legislators often copy text from other states' bills or from bills drafted by interest groups. For the past 14 weeks, we have been working with the Sunlight Foundation to build a tool that detects, in real time, where state legislators take text from.

State laws matter. States spend more than $1.5 trillion on programs and services and pass 75 times more bills than Congress.

A month ago, Wisconsin passed a bill limiting abortion rights. Journalists immediately observed that it was similar to other bills. In fact, many bills contained the exact same passages.

The highlighted text in this Louisiana bill also appears in the Wisconsin bill.

Similarly, the highlighted text in this Kansas bill also appears in the Wisconsin bill.

Similarly, Kansas had very similar language. Copying text in this way is extremely common because legislators lack the time, expertise, and staff to write their own bills. However, finding these similar bills is laborious: journalists must search for passages from the Wisconsin bill with a search engine such as Google, read through the resulting documents manually, and pick out the shared text.

Our tool, the Legislative Influence Detector, or LID for short, automates the process of finding shared text by using text mining and machine learning algorithms. LID works as follows.

The user inserts a portion of a query bill into the system.

LID then outputs a list of candidate bills that are likely to have shared text.

The user then clicks on one of the candidate bills and LID shows the shared text between the two documents. To do this last step, it uses an algorithm from bioinformatics for finding similar regions in DNA sequences.

To test the usefulness of our system, we inserted the text from the Wisconsin bill. We quickly found matches between (say the names of the states slowly).

In a matter of minutes, we found that 41 bills shared a significant amount of text with the Wisconsin bill. Because LID is a real-time system, it enables users to answer questions that previously would have required months, if not years, of manually reading documents. This summer we decided to use LID to examine the influence of interest groups on state politics. It is already well known that interest groups have success in writing legislation and lobbying for it in states. But how much success?

Number of Introduced Bills Drafted by ALICE 2010-2015 (Total: 960) Number of Introduced Bills Drafted by ALEC 2010-2015 (Total: 1816) [Figure: two charts, each with a "Number of Bills" scale.] We collected thousands of model bills from the websites of ALICE (a liberal group) and ALEC (a conservative group). Using LID, we were able to determine how much success these groups have had in each state in the past 5 years. We found that ALICE had introduced 960 bills and passed 84 and that ALEC had introduced 1816 and passed 163. As this analysis shows, LID enables researchers, journalists, and concerned citizens to better understand where bills come from. We believe that LID has the capacity to help increase the transparency of state-level politics, helping journalists and citizens to keep government accountable. Segue in: for a given query, LID is capable of finding similar bills with shared text in a matter of seconds on our corpus of more than 500,000 state bills. The real-time nature of our tool enables users to conduct large-scale analysis that, if done by hand (even with the aid of Google), would take months and lots of tedious work. For example, this summer we wanted to understand...

Pipeline

Indexing Bills Bill text: "(A) the federal land manager of each such area shall develop a plan for evaluating visibility". Word sequences sent to ElasticSearch: "the federal", "federal land", "land manager"; "the federal land", "federal land manager"; "the federal land manager", "federal land manager of".
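The indexing slide suggests that each bill is broken into overlapping word sequences before it is sent to ElasticSearch. The sketch below shows one way to generate such sequences. The function name `word_ngrams` and the length range of 2 to 4 words are assumptions made for illustration, not details confirmed by the slides, and no ElasticSearch call is made.

```python
def word_ngrams(text, n_min=2, n_max=4):
    """Return overlapping word n-grams of length n_min..n_max.

    Hypothetical helper: the length range is an assumption,
    chosen to cover the fragments visible on the slide.
    """
    words = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


sample = "the federal land manager of each such area"
grams = word_ngrams(sample)
print(grams[:3])
```

Each resulting string could then be stored as a searchable term, so that a query bill sharing many of these sequences with an indexed bill surfaces as a candidate.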

Definition of Alignment “I love the New York Knicks” “I like the Knicks” maybe need transition

An Optimal Alignment Scoring of alignment: match score, mismatch score, gap score. Goal: given two texts, find the alignment of two subsequences that has the optimal score.
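To make the scoring concrete, here is a minimal sketch of how one particular alignment could be scored word by word. The default values (match 2, mismatch -2, gap -0.5) are the ones shown on the example slides; the function name `alignment_score`, the `GAP` marker, and the specific alignment chosen are illustrative assumptions.

```python
GAP = None  # marker for a gap in either text (illustrative convention)


def alignment_score(pairs, match=2.0, mismatch=-2.0, gap=-0.5):
    """Sum the per-position scores of an alignment given as word pairs."""
    total = 0.0
    for a, b in pairs:
        if a is GAP or b is GAP:
            total += gap        # one word is aligned against nothing
        elif a == b:
            total += match      # identical words
        else:
            total += mismatch   # different words aligned to each other
    return total


# One possible alignment of the two sentences from the definition slide.
pairs = [
    ("I", "I"),
    ("love", "like"),
    ("the", "the"),
    ("New", GAP),
    ("York", GAP),
    ("Knicks", "Knicks"),
]
score = alignment_score(pairs)
print(score)
```

This alignment has three matches, one mismatch, and two gaps. Many other alignments of the same two texts are possible, and the goal stated on the slide is to find the one with the best score.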

Example I like the Knicks ? love New York Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.

Example I like the Knicks 2 love New York Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.

Example I like the Knicks 2 ? love New York Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.

Example I like the Knicks 2 1.5 love New York Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.

Example I like the Knicks 2 1.5 ? love New York Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.

Example I like the Knicks 2 1.5 1 love New York Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.

Example I like the Knicks 2 1.5 1 0.5 love 3 2.5 New York 4 Match Score: 2 Mismatch Score: -2 Gap Score: -0.5 This is my example text. This is an example of text matching; change the sequence below as you describe it.
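The table filled in across the example slides can be reproduced with a short dynamic-programming sketch. The slides do not name the algorithm, so the code below assumes a word-level Smith-Waterman local alignment, a standard bioinformatics method for finding similar regions in two sequences. It uses the slide's scores (match 2, mismatch -2, gap -0.5); the function name `local_alignment` and its return format are illustrative.

```python
def local_alignment(a, b, match=2.0, mismatch=-2.0, gap=-0.5):
    """Word-level Smith-Waterman: return (best score, aligned pairs, table)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Best of: restart at 0, align the two words, or open a gap.
            H[i][j] = max(0.0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned = []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            aligned.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned.append((a[i - 1], None))   # gap in the second text
            i -= 1
        else:
            aligned.append((None, b[j - 1]))   # gap in the first text
            j -= 1
    aligned.reverse()
    return best, aligned, H


first = "I love the New York Knicks".split()
second = "I like the Knicks".split()
best, aligned, H = local_alignment(first, second)
print(best)
for pair in aligned:
    print(pair)
```

Under these assumptions, the first row of the table comes out as 2, 1.5, 1, 0.5, the cells for "the" as 3 and 2.5, and the final cell for "Knicks" as 4, which are the values visible on the slides. All scores are multiples of 0.5, so the exact floating-point comparisons in the traceback are safe here; with arbitrary scores a tolerance would be needed.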

Is LID useful for Social Science? A new tool for measuring legislative influence. Variable in a regression? Networks of legislators? Limitations.

How can computer science help SS? Well-known quantitative tools of SS: econometrics, statistics. Less well-known quantitative tools of SS: algorithms and machine learning.

Advice for learning more: online courses, projects.

Thank you