1 Sarah Cohen (Public Policy, Duke U.), Chengkai Li (CSE, U. Texas Arlington), Jun Yang (CS, Duke U.), Cong Yu (Google Inc.). CIDR, January 2011.

2 Quis custodiet ipsos custodes? (Who will guard the guardians?)

3 Democratizing data: more data are becoming publicly available
Computation has a proven track record with big data
→ Computational journalism
Lower cost
Increase effectiveness
Broaden participation: democratizing data analysis

4 Fact-checking is absurdly difficult, even if you know SQL and the databases are cleansed and documented
→ U-check: a relational investigative tool for you
No knowledge of schema or SQL required
But is this simply natural language querying (NLQ)?
“… (Lincoln) Davis voted with Nancy Pelosi 94 percent of the time…”
“… For 36 months in a row, our district has maintained the lowest unemployment rate among our neighboring five districts…”
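As a rough illustration of what checking the first claim above involves, the sketch below computes a vote-agreement rate with SQLite from Python. The `votes` table, its columns, the member labels, and the rows are all invented for this example; they are not from the talk or from real voting records. Even this toy version forces a choice the claim leaves open: here agreement is measured only over bills on which both members voted.

```python
import sqlite3

# Toy rows invented for illustration only; NOT real voting records.
# Columns: (member, bill, vote)
ROWS = [
    ("A", 1, "yea"), ("B", 1, "yea"),
    ("A", 2, "nay"), ("B", 2, "yea"),
    ("A", 3, "yea"), ("B", 3, "yea"),
    ("A", 4, "nay"), ("B", 4, "nay"),
    ("A", 5, "yea"),  # B did not vote on bill 5, so it is not counted
]


def agreement_rate(conn, m1, m2):
    """Percent of bills that both members voted on where their votes match.

    Returns None when the two members share no bills.
    """
    row = conn.execute(
        """
        SELECT 100.0 * SUM(a.vote = b.vote) / COUNT(*)
        FROM votes a JOIN votes b ON a.bill = b.bill
        WHERE a.member = ? AND b.member = ?
        """,
        (m1, m2),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (member TEXT, bill INTEGER, vote TEXT)")
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)", ROWS)

rate = agreement_rate(conn, "A", "B")
print(rate)
```

Counting missed votes as disagreements, or restricting to a subset of bills, would give a different percentage for the same data, which is part of why such claims are hard to check.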

5 In the 2007 Republican presidential debate, Giuliani claimed that “adoptions went up 65 to 70 percent” in New York when he was in office
Administration for Children’s Services was created in

6 Claims often are vague and/or involve complex queries
Users don’t expect one-click fact-checking with instant gratification
Clarifying a claim and tweaking the way it presents data are instructive in their own right
→ An interactive interface that relies on user feedback
Suggest possible SQL queries for user to choose
To help user choose, show English translations, preview answers, ask questions…
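The idea of suggesting candidate queries, each with an English translation and a previewed answer, can be sketched as follows. This is only an assumed shape for such an interface, not U-check itself: the `counts` table, the two candidate readings, and the numbers are hypothetical and chosen solely to show that a vague "went up N percent" claim can hold under one reading and fail under another.

```python
import sqlite3

# Hypothetical yearly counts, invented for illustration only.
YEARLY = [(1, 100), (2, 90), (3, 150), (4, 140)]

# Each candidate pairs an English reading of the claim with a SQL query.
CANDIDATES = [
    (
        "Percent change from the first year to the last year",
        """
        SELECT 100.0 * (l.n - f.n) / f.n
        FROM counts f, counts l
        WHERE f.year = (SELECT MIN(year) FROM counts)
          AND l.year = (SELECT MAX(year) FROM counts)
        """,
    ),
    (
        "Percent change from the lowest year to the highest year",
        "SELECT 100.0 * (MAX(n) - MIN(n)) / MIN(n) FROM counts",
    ),
]


def preview(conn, candidates):
    """Run every candidate and return (english_reading, rounded_answer) pairs."""
    return [
        (english, round(conn.execute(sql).fetchone()[0], 1))
        for english, sql in candidates
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (year INTEGER, n INTEGER)")
conn.executemany("INSERT INTO counts VALUES (?, ?)", YEARLY)

previews = preview(conn, CANDIDATES)
for english, answer in previews:
    print(f"{english}: {answer}%")
```

Showing both readings side by side lets the user pick the interpretation that matches what the speaker plausibly meant, instead of the system guessing.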

7 Test how robust a claim is
See if similar claims hold for different settings
Monitor a claim over time
→ Allow reuse of expertise/effort beyond a single story
“… For 36 months in a row, our district has maintained the lowest unemployment rate among our neighboring five districts…”
What’s the margin? Did it change over time?
What if we compare with six instead of five districts?
How does my district do in a similar comparison?
How about median income instead of unemployment rate?
What if we revisit the comparison a year later?
Can we get an alert when the streak is broken?
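A minimal sketch of such robustness checks, assuming monthly rates are available as one dictionary per month (oldest first). The district labels, the numbers, and the function names `lowest_streak` and `margin` are hypothetical; the point is only that changing the comparison set is a one-argument change once the claim is expressed as a function.

```python
def lowest_streak(rates, district, neighbors):
    """Count the most recent consecutive months in which `district` has a
    strictly lower rate than every district in `neighbors`."""
    streak = 0
    for month in reversed(rates):  # walk from the newest month backwards
        if all(month[district] < month[n] for n in neighbors):
            streak += 1
        else:
            break
    return streak


def margin(rates, district, neighbors):
    """Gap between `district` and its best neighbor in the latest month.
    A positive value means `district` is lower."""
    latest = rates[-1]
    return min(latest[n] for n in neighbors) - latest[district]


# Invented monthly rates (oldest first); NOT real unemployment data.
RATES = [
    {"D0": 5.0, "D1": 4.8, "D2": 6.0, "D3": 6.5},
    {"D0": 4.9, "D1": 5.1, "D2": 6.1, "D3": 4.7},
    {"D0": 4.7, "D1": 5.0, "D2": 5.9, "D3": 4.6},
]

# The claim as stated, then the same claim with one more neighbor added.
narrow = lowest_streak(RATES, "D0", ["D1", "D2"])
wide = lowest_streak(RATES, "D0", ["D1", "D2", "D3"])
gap = margin(RATES, "D0", ["D1", "D2"])
print(narrow, wide, round(gap, 2))
```

Re-running `lowest_streak` whenever a new month is appended to `RATES` is one simple way to raise an alert when the streak is broken.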

8 U-check allows us to build up a “library” of datasets, queries leading to claims, and stories using them
→ A Reporters’ Black Box
Learn “standard” query templates from the library and human experts
Run all templates on new/updated data to find claims that hold
Rank claims for further investigation by journalists
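The run-templates-then-rank loop can be sketched as below. Everything here is an assumption made for illustration: the two templates, the `Claim` type, the data layout, and especially the scoring, which simply reuses each template's own margin and is not claimed to be how the authors rank claims.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Claim(NamedTuple):
    text: str
    score: float  # crude "interestingness" heuristic (an assumption)


Dataset = Dict[str, List[float]]  # entity -> values over time, oldest first
Template = Callable[[Dataset], Optional[Claim]]


def biggest_rise(data: Dataset) -> Optional[Claim]:
    """Template: which entity rose the most from first to last value?"""
    name, vals = max(data.items(), key=lambda kv: kv[1][-1] - kv[1][0])
    rise = vals[-1] - vals[0]
    if rise <= 0:
        return None  # nothing rose, so the claim does not hold
    return Claim(f"{name} rose the most ({rise:+.1f})", rise)


def current_lowest(data: Dataset) -> Optional[Claim]:
    """Template: which entity is strictly lowest in the latest period?"""
    ranked = sorted(data, key=lambda k: data[k][-1])
    if len(ranked) < 2:
        return None
    gap = data[ranked[1]][-1] - data[ranked[0]][-1]
    if gap <= 0:
        return None  # tie for lowest, so the claim does not hold
    return Claim(f"{ranked[0]} is currently lowest (margin {gap:.1f})", gap)


TEMPLATES: List[Template] = [biggest_rise, current_lowest]


def find_claims(data: Dataset, templates: List[Template] = TEMPLATES) -> List[Claim]:
    """Run every template and return the claims that hold, best score first."""
    found = [c for c in (t(data) for t in templates) if c is not None]
    return sorted(found, key=lambda c: c.score, reverse=True)


# Invented values; NOT real data.
DATA = {"X": [3.0, 4.0, 6.5], "Y": [5.0, 5.5, 5.0], "Z": [7.0, 6.0, 5.8]}
claims = find_claims(DATA)
for c in claims:
    print(c.text)
```

Scores from different templates are not really comparable; a practical ranking would need a shared notion of newsworthiness, which this sketch does not attempt.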

9 Cloud: aggregate/share computing resources
Large-scale, real-time data analysis
E.g., map/reduce for machine translation, information extraction, reporters’ black box, etc.
Crowd: aggregate/share data, tools, and insights
Leverage the crowd in simpler and more effective ways
An “optimizer” for the investigative process with crowdsourcing support

11 The investigative process is difficult to plan
Can our system help plan it intelligently (incl. directing the crowd), in a goal-driven fashion, like a query optimizer?
Specify tasks declaratively
Identify mini-tasks that can be crowdsourced
Quantify cost-benefit of mini-tasks
Match mini-tasks to users
Coordinate/reprioritize execution of mini-tasks
…
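One very simple way to picture the cost-benefit step is a greedy planner that picks mini-tasks by benefit per unit cost until a budget runs out. This is a stand-in heuristic chosen for brevity, not the optimizer the talk envisions; the `MiniTask` fields, the task names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MiniTask:
    name: str
    cost: float     # assumed: estimated crowd effort, must be > 0
    benefit: float  # assumed: estimated value to the investigation


def plan(tasks: List[MiniTask], budget: float) -> List[str]:
    """Greedily choose tasks with the best benefit/cost ratio that still fit
    in the remaining budget. Returns task names in execution order."""
    chosen = []
    remaining = budget
    for task in sorted(tasks, key=lambda t: t.benefit / t.cost, reverse=True):
        if task.cost <= remaining:
            chosen.append(task.name)
            remaining -= task.cost
    return chosen


# Hypothetical mini-tasks with invented estimates.
TASKS = [
    MiniTask("transcribe_documents", cost=4, benefit=8),
    MiniTask("verify_names", cost=2, benefit=6),
    MiniTask("tag_photos", cost=5, benefit=5),
]

schedule = plan(TASKS, budget=7)
print(schedule)
```

Re-running `plan` whenever estimates change gives a crude form of reprioritization; matching tasks to particular users would need additional inputs that this sketch leaves out.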

12 The need to save watchdog journalism is pressing
You and I may hold the key
Journalism is not only a consumer of technology; it can also drive computer science
Our paper discusses more ideas and relevant research areas, but we have barely scratched the surface
Don’t miss out on working on something with a cause!

15 Attract the crowd with incentives
Provide useful and usable tools for investigation
Cater to users’ willingness to do good for things they care about
Accumulate knowledge from usage
Improve our system by incorporating user feedback and outcomes from using it
→ Next: one example of such a tool