1 Co-Directors
- Yigal Arens, USC / Information Sciences Institute
- Judith Klavans, Columbia University
2 The purpose of DGRC: To Make Digital Government Happen
- Advance information systems research
- Bring the benefits of cutting-edge IS research to government systems
- Help educate government and the community
- Learn needs from government partners to drive next-stage system development
- Build pilot systems as part of new infrastructure
3 The problem and the solution
Problem: FedStats has thousands of databases in over seventy Government agencies
- Data is duplicated and near-duplicated
- Even Government officials and specialists cannot find it
Solution: Create a system to provide easy, standardized access
- Need a multi-database access engine
- Need a powerful user interface
- Need a terminology standardization mechanism
4 The Vision: Ask the Government...
We’re thinking of moving to Denver...
- What are the schools like there?
- How have property values in the area changed over the past decade?
- How many people had breast cancer in the area over the past 30 years?
- Is there an orchestra? An art gallery? How far are the nightclubs?
[Figure labels: Census; Labor Stats]
5 Research challenges
- Scale to incorporate many databases
  … build data models automatically
- Process large and disparate data efficiently
  … develop fast processing techniques
  … create aggregation and substitution operators
- Integrate data models across sources and agencies
  … take a large ontology and link the models into it automatically
  … develop ways to automatically harvest glossary data for building ontologies
- Develop new ways to interact with data
  … use language processing tools for question-answering
- Display complex information from distributed sources
  … develop and evaluate new presentation techniques
6 The Energy Data Consortium
EDC members
- Information Sciences Institute, USC
- Columbia University
Government partners
- Energy Information Admin. (EIA)
- Bureau of Labor Statistics (BLS)
- Census Bureau
Research challenge
- Make accessible in a standardized way the contents of thousands of data sets, represented in many different ways (webpages, PDF, MS Access, text…)
7 The Vision: Ask the Government...
We’re thinking of moving to Cambridge…
- How much does gas cost there?
- Are alternative energy sources any cheaper to use?
- Which state has the highest oil production?
- How long has the nuclear plant been in service?
[Figure labels: Census; Labor Stats]
8 Data Integration
[Diagram labels: Labor; EPA; EIA; Census; Heterogeneous Data Sources; User Interface; Information Access; Definition Ontology; query]
9 From Phase I to Phase II
Phase One
- Terminology/ontology
- Information integration and in-memory data analysis
- New interfaces for complex human-computer interaction
Phase Two
- Question-answering
- Usability testing and evaluation
- Privacy portal
10 Data Integration
[Diagram labels: Labor; EPA; EIA; Census; Heterogeneous Data Sources; User Interface; Information Access; Definition Ontology; Trade; Main Memory Query Processing; Question-Answer Access; User Evaluation; Task-based Evaluation; query]
12 Data Integration
[Diagram labels: ???; EPA; EIA; Census; Heterogeneous Data & Meta-data Sources; User Interface; Information Access; Data Definitions (Ontology); interface; query; Labor; definitions]
- Metadata mediates
13 Recent example EIA problem
http://www.eia.doe.gov/emeu/states/main_ca.html
- Data cleared for publication is grouped together across states
- Also need data gathered by state separately
- Need general ability to ungroup and reaggregate data
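The need to ungroup and reaggregate can be illustrated with a minimal Python sketch. It is not the DGRC implementation. It assumes state-level records are available as (state, value) pairs, and the function name `reaggregate`, the region names, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def reaggregate(records, grouping):
    """Sum state-level values under an arbitrary grouping.

    records:  iterable of (state, value) pairs (assumed input shape).
    grouping: dict mapping a state to a group label; states missing
              from the dict are kept as their own group ("ungrouped").
    """
    totals = defaultdict(float)
    for state, value in records:
        totals[grouping.get(state, state)] += value
    return dict(totals)

# Made-up figures, for illustration only.
records = [("CA", 10.0), ("OR", 4.0), ("WA", 6.0), ("NV", 2.0)]

# Grouped view: several states published together.
by_region = reaggregate(
    records,
    {"CA": "Pacific", "OR": "Pacific", "WA": "Pacific", "NV": "Mountain"},
)

# Ungrouped view: an empty grouping leaves every state separate.
by_state = reaggregate(records, {})
```

Keeping the finest-grained records and treating each grouping as a separate mapping lets the same data be published both grouped and by state.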
14 Main Memory
- Achievements on large data manipulation: optimization for efficiency and speed
- New input for visualization with dials that the user can manipulate
- Applications with electoral boundaries
15 GetGloss: The Identification of Glossaries in High Fan-out Websites
- Large sites with many links
- Glossaries hidden all over
- No coherent view within and across sites
- No way to determine who is defining what and how
16 Glossary Finding Function
- Function to compute a best-guess score
- Ranked list
- Higher is better
- Evaluation to determine how likely it is that a high score will be associated with a (large) glossary
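A best-guess score with a ranked list might look like the Python sketch below. The slides do not say which features the actual function uses, so the two features here (a glossary-like word in the URL and the share of lines shaped like "Term: definition"), their weights, and the names `glossary_score` and `rank_pages` are all assumptions made for illustration.

```python
import re

# Assumed heuristic: a line that looks like "Term: definition text".
DEFINITION_LINE = re.compile(r"^\s*[A-Z][\w /()-]{1,60}\s*[:\-]\s+\S")

def glossary_score(url, text):
    """Return a best-guess score; higher means more glossary-like."""
    score = 0.0
    # Feature 1 (assumed): the URL mentions a glossary-like word.
    if re.search(r"gloss|definition|terms", url, re.IGNORECASE):
        score += 5.0
    # Feature 2 (assumed): fraction of non-empty lines shaped like definitions.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines:
        hits = sum(1 for ln in lines if DEFINITION_LINE.match(ln))
        score += 10.0 * hits / len(lines)
    return score

def rank_pages(pages):
    """Sort (url, text) pairs from most to least glossary-like."""
    return sorted(pages, key=lambda p: glossary_score(*p), reverse=True)

# Hypothetical pages, for illustration only.
pages = [
    ("http://example.org/about.html",
     "Welcome to our agency.\nContact us today."),
    ("http://example.org/glossary.html",
     "Barrel: A unit of volume.\nBtu: A unit of energy."),
]
ranked = rank_pages(pages)
```

The evaluation step on the slide would then check, against hand-labeled pages, how often a high score actually corresponds to a (large) glossary.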
17 ParseGloss
- Once a glossary is found, how can individual definitions be analyzed?
- Once a definition is analyzed into components, how can these be loaded into the ontology?
[Diagram labels: GetGloss; ParseGloss; Ontology]
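The two questions above can be made concrete with a small Python sketch. It is not the ParseGloss method. It assumes glossary entries of the form "Term: A/An/The <head noun> of/that/which/used/for ...", and the names `ParsedDefinition`, `parse_definition`, and `load_into_ontology`, as well as the dict-based ontology, are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class ParsedDefinition(NamedTuple):
    term: str         # the word being defined
    genus: str        # the head noun phrase ("unit" in "A unit of ...")
    differentia: str  # the rest of the definition

# Assumed entry shape: "Term: A <genus> of|that|which|used|for ..."
PATTERN = re.compile(
    r"^(?P<term>[^:]+):\s*(?:An?|The)\s+(?P<genus>.+?)\s+"
    r"(?P<rest>(?:of|that|which|used|for)\b.*)$",
    re.IGNORECASE,
)

def parse_definition(entry: str) -> Optional[ParsedDefinition]:
    """Split one glossary entry into components, or return None."""
    m = PATTERN.match(entry.strip())
    if m is None:
        return None
    return ParsedDefinition(
        term=m.group("term").strip(),
        genus=m.group("genus").strip().lower(),
        differentia=m.group("rest").strip().rstrip("."),
    )

def load_into_ontology(ontology: dict, parsed: ParsedDefinition) -> None:
    """Record the term under its genus (a simple is-a link)."""
    ontology.setdefault(parsed.genus, []).append(parsed.term)

ontology = {}
parsed = parse_definition("Barrel: A unit of volume equal to 42 U.S. gallons.")
if parsed is not None:
    load_into_ontology(ontology, parsed)
```

Real glossary entries vary far more than this pattern allows, so entries that do not match return `None` rather than a guessed parse.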
18 Evaluation: New Effort
- Peter Sommer, Director of Education, Center for New Media Teaching and Learning
- Focus on purposeful use of emerging technologies for researchers, students, teachers, analysts…
- Funded by NSF and BLS
19 Privacy Portal
- Increasing multiple access to databases creates a security problem
- Original DGRC proposal included a component on privacy
- Newly funded NSF SGER proposal
- Columbia: Computer Science and School of Business (Stolfo and Johnson)
20 Privacy and Government Websites
- What are user fears?
- What are their preferences?
- What are their perceptions of privacy issues?
- What are the implications for design of systems and interfaces?
21 Social Science Research
- Explorations of “dial manipulation” application for health databases for dynamic querying
- Useful for interactive mapping for redistricting
- Use statistics on neighborhoods, e.g. CPS (long and wide)
- Census summary data is another source: tables compiled for various levels
- Joint with ISERP Social Science Research Center
22 Proposals
SGER proposal funded
- Topic: Urban transportation study, new methods for freight tracking in LA by comparing across databases
- Grant awarded to USC, shared by ISI and USC’s Dept of Policy and Planning
White paper to DoT
- Topic: Searching for patterns in freight traffic
- Submitted by USC campus people and Jose Luis Ambite
ITR proposal submitted
- Topic: Semi-automated topic hierarchy creation
- Partners: Eduard Hovy communicated with EPA group
- If funded, will use EPA’s CARAT ontology as starting point and evaluation standard
23 Digital Government is Here!
- An increasing quantity and variety of information is available in digital form
- Government agencies already collect much digital information
- Government is a holder and provider of often unique data and services
- Access to information/services by industry and citizen-users must be facilitated, while limiting cost and risk
24 Well, Not Quite...
- Expectations are very high due to the pervasiveness of Web/Internet information technology
- Government IT/IS is behind best practices
- Legacy, stovepipe systems designed for trusted staff
- Failed very large modernization efforts
- A disconnect exists between the research community and government IS
25 The purpose of DGRC: To Make Digital Government Happen
- Advance information systems research
- Bring the benefits of cutting-edge IS research to government systems
- Help educate government and the community
- Learn needs from government partners to drive next-stage system development
- Build pilot systems as part of new infrastructure
26 Thank you! Any questions?