Data-gov Wiki: Towards Linking Government Data Li Ding, Dominic DiFranzo, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness and Jim Hendler.

Presentation transcript:

Data-gov Wiki: Towards Linking Government Data Li Ding, Dominic DiFranzo, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness and Jim Hendler Tetherless World Constellation March 22, 2010

Outline
Background
– Open Government Data Initiative
– data.gov
The Data-gov Wiki
– Making Government Linkable
– Linking and Using Government Data
– Provenance Issues

Background

Open Government Data Initiative
Principles: Transparency, Participation, Collaboration
Open Government Directive (Dec 8, 2009)
– Publish Government Information Online
– Improve the Quality of Government Information
– Create and Institutionalize a Culture of Open Government
– Create an Enabling Policy Framework for Open Government

data.gov, data.gov.uk and beyond
What’s next?
– More datasets
– More links
– More provenance
£30 million to fund "Institute of Web Science"

Statistics about data.gov
50 participating agencies: USDA, DOC, DOD, ED, DOE, HHS, DHS, HUD, DOI, DOJ, DOL, STATE, DOT, TREAS, VA, EPA, GSA, NASA, NSF, NRC, OPM, SBA, SSA, USAID, BBG, CFTC, CNS, EXIM, EOP, FCC, FDIC, FEC, FRB, IMLS, MSPB, NARA, NEA, NEH, NLRB, NTSB, OSHRC, ONHIR, OPIC, PBGC, RRB, SEC, SSS, TVA, CPSC, EEOC
Source: http:// (accessed March 21, 2010)

The Data-gov Wiki

About the data-gov Wiki
Mission: The data-gov project investigates the role of semantic web technologies, especially linked data, in producing, processing and utilizing government data found in data.gov.
Objectives
– Support linked government data publishing, applications and provenance using semantic technologies
– Educate potential developers and users
– Enable social collaborations on linked government data
This project is run by the Tetherless World Constellation at RPI, headed by Professors Jim Hendler and Deborah McGuinness and led by Li Ding. Other team members include: Dominic DiFranzo, Sarah Magidson, James Michaelis, Alvaro Graves, Adam Bell, Jin Guang Zheng, Xian Li, Tim Lebo, Gregory Todd Williams, and Peter Coons.

Data-gov Wiki Architecture
[Diagram labels: Data Web; Linked Data; LGD in RDF; Enhancement; Conversion; Knowledge; Provenance; …; Usage]
LGD: Linked government data

Data-gov Cloud (Oct 2009)
[Diagram: linked data cloud. Datasets: US-COMMUNITY, CASTNET (1990 – Present), RECS (2005), GOV-BUDGET, TOXIC-RELEASE, EARTHQUAKE (Present), STATE-LIB, PUBLIC-LIB, MED-COST, LABOR-STAT (19xx-Present), DATA-GOV-CATALOG (present), USAspending, GeoNames. Other labels: Government, Community, Services, Environment, CASTNET sites, RECS code, US agency, US location, Linked Data.]

data.gov + uk gov data + NY times + DBpedia

From Open Government Data (OGD) to Linked Government Data (LGD)

Make government data linkable
Raw Data (columns: Account name | Agency name): Donations, Donations for the Official Residence of the Vice President | Executive Office of the President
RDF Conversion
– Minimal and extensible
– Web accessible
Raw RDF: Donations, Donations for the Official Residence of the Vice President … Executive Office of the President
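A minimal sketch of what a "minimal and extensible" conversion could look like, assuming each row becomes one subject and each column header becomes one property. The `example.org` namespaces and the helper names (`slug`, `escape`, `csv_to_ntriples`) are hypothetical and are not the project's actual converter or URI scheme.

```python
import csv
import io
import re

# Hypothetical namespaces, chosen only for illustration.
ENTRY_NS = "http://example.org/dataset/entry/"
PROP_NS = "http://example.org/dataset/property/"


def slug(header: str) -> str:
    """Turn a column header such as 'Account name' into 'account_name'."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")


def escape(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_ntriples(text: str) -> list[str]:
    """Emit one triple per non-empty cell; every value stays a plain literal."""
    triples = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        subject = f"<{ENTRY_NS}{row_number}>"
        for header, value in row.items():
            if header is not None and value:
                triples.append(
                    f'{subject} <{PROP_NS}{slug(header)}> "{escape(value)}" .'
                )
    return triples


# The single row shown on the slide.
raw = (
    "Account name,Agency name\n"
    '"Donations, Donations for the Official Residence of the Vice President",'
    "Executive Office of the President\n"
)
for line in csv_to_ntriples(raw):
    print(line)
```

Keeping every cell as a literal makes the conversion cheap to run on a new dataset; links to other resources can then be layered on later without touching the raw output.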

Linking at Conversion Time
Reuse Property
Raw RDF: Defense Vessel Transfer Receipt Account … Department of Defense--Military
Raw RDF: Donations, Donations for the Official Residence of the Vice President … Executive Office of the President

Linking using Semantic Wiki: enrich ontology definition
Property Definition: [[rdfs:subPropertyOf::Property:rdfs:label]]
92/title … Property Definition:

Linking using Semantic Wiki: connect entities using owl:sameAs
✗ Wrong Wikipedia Name
✓ Correct Wikipedia Name

Incremental Data Enhancement ….. Enhance raw RDF with links: Link to DBpedia: Executive Office of the President ……
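One way to picture incremental enhancement is as a separate layer of link triples derived from the raw literals, so the raw RDF itself is never modified. The sketch below assumes a hand-curated name-to-URI table; the property names, the `example.org` subject, and the DBpedia-style URI are illustrative placeholders that have not been verified against DBpedia.

```python
def normalise(name: str) -> str:
    """Collapse whitespace and ignore case so near-identical names match."""
    return " ".join(name.split()).casefold()


def enhance(raw_triples, literal_property, link_property, name_to_uri):
    """Return new link triples for literals found in the curated table.

    The raw triples are left untouched; the result is an extra layer
    that can be published alongside them.
    """
    lookup = {normalise(name): uri for name, uri in name_to_uri.items()}
    links = []
    for subject, predicate, obj in raw_triples:
        if predicate == literal_property:
            target = lookup.get(normalise(obj))
            if target is not None:
                links.append((subject, link_property, target))
    return links


# Hypothetical raw triple and curated mapping (URI is a placeholder).
raw_triples = [
    ("http://example.org/dataset/entry/1", "ex:agency_name",
     "Executive Office of the President"),
]
curated = {
    "Executive Office of the President":
        "http://dbpedia.org/resource/Executive_Office_of_the_President",
}
links = enhance(raw_triples, "ex:agency_name", "ex:agency", curated)
print(links)
```

Because the links live in their own layer, a wrong mapping can be corrected or withdrawn without re-running the conversion.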

Runtime Linking in Applications
Link datasets by common literal value
Link datasets by overlapping time
– Align multiple time series
– Support users to comment on time series data
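The two runtime linking strategies can be sketched as a join on a shared literal and an intersection of time points. This is an illustrative sketch only: the record fields and every number below are made up, and the function names are hypothetical.

```python
def normalise(value: str) -> str:
    """Collapse whitespace and ignore case before comparing literals."""
    return " ".join(value.split()).casefold()


def join_on_literal(left, right, key):
    """Pair records from two datasets whose `key` literals match."""
    index = {}
    for record in right:
        index.setdefault(normalise(record[key]), []).append(record)
    return [
        (l_rec, r_rec)
        for l_rec in left
        for r_rec in index.get(normalise(l_rec[key]), [])
    ]


def align_series(series_a, series_b):
    """Keep only the time points present in both series, in time order."""
    shared = sorted(series_a.keys() & series_b.keys())
    return {t: (series_a[t], series_b[t]) for t in shared}


# Made-up records from two datasets that share a state-name literal.
libraries = [{"state": "New York", "libraries": 3},
             {"state": "Ohio", "libraries": 5}]
sites = [{"state": " new  york", "sites": 2}]
pairs = join_on_literal(libraries, sites, "state")

# Made-up yearly series that overlap only in 2006 and 2007.
aligned = align_series({2005: 1.0, 2006: 2.0, 2007: 3.0},
                       {2006: 10, 2007: 20, 2008: 30})
print(pairs)
print(aligned)
```

Literal joins like this work best when the context of the column is known (for example, that both columns hold US state names), since the same string can name different things in different datasets.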

Provenance Issues

Provenance Annotation
[Diagram labels: Descriptions; Relations; Dataset; Demo; Agency]

Provenance Events
[Diagram labels: CSV2RDF; SemDiff; Archive; Enhance; visualize; derive; create; derive; revision]

Results from Revision Provenance
– The number of datasets published at data.gov has tripled since July 2009.
– Dataset updates on data.gov are not limited to additions.
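The observation that updates are not limited to additions is the kind of result a revision diff surfaces. The sketch below is not the project's SemDiff tool, whose algorithm the slides do not describe; it assumes each revision is a set of ground triples (no blank nodes) and uses plain set difference. The catalog entries are invented.

```python
def diff_revisions(old_triples, new_triples):
    """Compare two revisions of a dataset as sets of ground triples."""
    old_set, new_set = set(old_triples), set(new_triples)
    return {
        "added": new_set - old_set,
        "removed": old_set - new_set,
        "unchanged": old_set & new_set,
    }


# Invented catalog snapshots: one title changes, one dataset disappears,
# one dataset is new.
july = {
    ("ex:dataset1", "dc:title", "Old title"),
    ("ex:dataset2", "dc:title", "Withdrawn dataset"),
}
march = {
    ("ex:dataset1", "dc:title", "New title"),
    ("ex:dataset3", "dc:title", "Brand new dataset"),
}
changes = diff_revisions(july, march)
print(len(changes["added"]), "added;", len(changes["removed"]), "removed")
```

A changed value shows up as one removal plus one addition, so modifications and deletions are both visible even when the total number of datasets grows.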

Conclusion

Conclusion - Observations
– Minimal and extensible RDF conversion is useful for generating linked government data in a timely fashion.
– Literal names are still useful in linking data, especially if we know the context of the data.
– Social semantic web technologies can help distribute high-cost tasks, e.g. mapping entity names, to the crowd.
– Provenance is a growing requirement for the transparency of open data applications.

Conclusions – Ongoing Work: build hub datasets
[Diagram: hub datasets linked via skos:altLabel and owl:sameAs. Labels: GOV-BUDGET; PUBLIC-LIB; CASTNET sites; US agency; US location; USAspending; Employment statistics; Medicare cost; IRS annual Tax report; DATA-GOV-CATALOG (present); US Census; State population; …]

Conclusions – Ongoing Work: Making Sense of LGD
AI + CI!
To appear in Web Sci 2010 conference – co-located with WWW 2010

Conclusions – Ongoing Work: incremental knowledge on social semantic web
– A social semantic web site can substantially promote collaboration on knowledge accumulation (ontology as well as instance linkage).
– We need a trade-off between costly high-quality conversion and ugly minimal conversion.
#a dgp92:title “my title”
dgp92:title rdfs:subPropertyOf rdfs:label
#a rdfs:label “my title”
#a skos:prefLabel “my title”
#a foaf:name “my title” ?
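The step from `#a dgp92:title “my title”` to `#a rdfs:label “my title”` is ordinary RDFS subproperty entailment. A minimal sketch, assuming triples are plain tuples of prefixed names and applying only the two RDFS rules for `rdfs:subPropertyOf` (transitivity, rdfs5, and property inheritance, rdfs7); it is not a full RDFS reasoner.

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def infer_subproperty(triples):
    """Apply rdfs5 and rdfs7 repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        hierarchy = {(s, o) for s, p, o in inferred if p == SUBPROPERTY}
        new = set()
        for child, parent in hierarchy:
            # rdfs5: subPropertyOf is transitive.
            for other_child, grandparent in hierarchy:
                if parent == other_child:
                    new.add((child, SUBPROPERTY, grandparent))
            # rdfs7: a statement made with the child property also
            # holds for the parent property.
            for s, p, o in inferred:
                if p == child:
                    new.add((s, parent, o))
        if new <= inferred:
            return inferred
        inferred |= new


# The two triples from the slide: one raw statement, one wiki-added mapping.
asserted = {
    ("#a", "dgp92:title", "my title"),
    ("dgp92:title", SUBPROPERTY, "rdfs:label"),
}
closure = infer_subproperty(asserted)
print(("#a", "rdfs:label", "my title") in closure)
```

This is why a single mapping added on the wiki is enough: the minimal conversion keeps its dataset-specific property, and generic tools that only understand `rdfs:label` still find the title.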

Conclusions – Ongoing Work: provenance is everywhere
Evaluate issues on exposing provenance data and improve semantic-difference computation.
– provenance vocabulary
– provenance awareness
– provenance reasoning
– provenance mining
– …

Ok, it is really the final conclusion
– The data-gov project does not use much AI for now (mostly on the representation side), but even a little semantics goes a long way.
– The massive knowledge accumulated in this project now raises a number of challenges for AI (especially on the computation side).
– Semantic technologies are within reach: undergraduate students can build a demo quickly!

BTW… Questions?
Shameless self-promotions
– Link: “Browsing and Finding Linked Data” by Shangguan this afternoon
– See us at the demo/poster session; we have more exciting demos to show you
– IPAW 2010 (June 2010, Troy, NY) will be looking for late-breaking news from you!