
1 Harmonization and Integration of Semi-Structured Data Through Wikis and Controlled Tagging. E. M. Robinson, R. B. Husar, Washington University, St. Louis, MO

2 Abstract: The contents of cyberspace are increasingly generated and distributed by individuals. This is manifested by the explosive growth of web-based social software such as wikis, media-sharing services and blogs. This architectural, technological and cultural transformation of the Internet, commonly referred to as Web 2.0, is good news for the Earth Science community, since it offers new possibilities for sharing and harvesting community-provided content as well as for collaboratively creating new knowledge products. One key feature of all of these new tools is the end user's ability to add tags, which adds value by extending the metadata of the tagged object. Ad hoc tagging (folksonomy) gives a rich description of internet resources, but it has the disadvantage of providing only a fuzzy schema. The semantic uniformity of internet resources can be improved by controlled tagging, which applies a consistent namespace and consistent tag combinations to diverse objects. We have used both tagging approaches to gather internet resources pertaining to air quality events. Initial event analysis of the southern Georgia fires, which burned in April and May 2007, began with filtering and harvesting user-contributed web content. A Google Blog Search for 'Florida smoke' returned several thousand entries, many of them unrelated to the wildfires. Visually scanning the blog entries yielded a number of interesting posts, which were given the controlled tags '070508+Florida+Smoke' in the social bookmarking tool del.icio.us. Additional smoke photos were found in the photo-sharing service Flickr and given the same set of controlled tags. Together, these tools yielded a rich but only qualitative description of the Georgia fires. Because they shared a common set of controlled tags, these web objects (i.e., links and photos) could be harvested into a wiki environment, which also contained links to quantitative air quality analysis based on satellite and surface observations.

3 Goal:
– Cross-leverage the shared resources on the Web while maintaining the autonomy of the different services
– Better apply decision-support material in research, regulation and policy
– Amplify and connect minds
Approach:
– Harvest and aggregate Web content
– Use collaborative wiki workspaces
– Create knowledge products through communal and individual analysis

4 User-Generated Content
Web 2.0 software allows users to easily add objects to the web:
– Links
– Photos
– Video
– Presentations
– Blogs/Wikis
Structured metadata (date, user, type) is already encoded on these types of data.
All objects have a URL.

5 Wiki
Wikis were originally used just for collaborative writing.
Features:
– Editable by web users
– Tags
– Discussion pages
– Versioning
Now they are dynamic workspaces, able to embed web objects from disparate sources. They:
– Add additional context and facilitate collaborative analysis
– Allow two-way transfer of knowledge
[Diagram labels: Discuss, Edit, View, Collaborate]

6 Tags
Keywords added to web objects by either the provider or the user.
Pro:
– Tags can be added by anyone, to any URL
– Allow for multiple types of categorization, not just one hierarchy
– Can tag in any service
Con:
– Uncontrolled number of tags
– Multiple words with the same meaning
– Can tag in any service
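The "multiple words with the same meaning" problem can be illustrated with a small sketch. This is not the authors' tooling; the synonym table and the function name `normalize_tags` are assumptions made up for illustration. It maps free-form tags onto a small controlled vocabulary and keeps unknown tags unchanged.

```python
# Hypothetical synonym table: maps ad hoc spellings to one controlled tag.
# The entries are illustrative only, not a vocabulary used in the presentation.
CONTROLLED = {
    "smoke": "Smoke",
    "smoky": "Smoke",
    "haze": "Smoke",
    "fire": "Fire",
    "wildfire": "Fire",
    "wildfires": "Fire",
}


def normalize_tags(raw_tags):
    """Return the sorted, de-duplicated controlled tags for free-form tags.

    Tags with no entry in CONTROLLED are kept as typed (whitespace trimmed),
    so no user-supplied information is lost.
    """
    result = set()
    for tag in raw_tags:
        cleaned = tag.strip()
        result.add(CONTROLLED.get(cleaned.lower(), cleaned))
    return sorted(result)


if __name__ == "__main__":
    print(normalize_tags(["Smoky", "haze", "Wildfires", "Georgia"]))
```

A lookup table like this only works when a group agrees on the vocabulary in advance, which is the trade-off between folksonomy and controlled tagging described on this slide.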

7 Controlled Tag-based Mediation
Users can be mediators of web-based content by “wrapping” it with a unique controlled tag (or set of tags) in two ways:
– Use Del.icio.us to homogenize the heterogeneous objects
– Create a wiki page as the web object and add semantic tags
Create a wiki page that harvests queries and adds context to create emergent, reusable knowledge.
[Diagram: Controlled Tag-based Connectivity]

8 Communal Event Analysis: Southern California Fire Smoke
Given the high density and short response time of user-generated content about air pollution events, it has been said that the Earth has now acquired a "skin" for detecting changes in the environment.

9 Control Tag: 071022SoCalSmoke
Quantitative:
– Harvest links and relevant datasets
– Controlled tagging in the wiki (datasets) and in Del.icio.us (links)
– Query/RSS from Del.icio.us and the wiki into the EventSpace wiki page
Qualitative (Blogs, Flickr, YouTube):
– Use each service to perform coarse filtering
– Controlled tagging in del.icio.us
– RSS feed from del.icio.us into the EventSpace
[Diagram labels: Datasets, Links]
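The harvesting step above can be sketched as follows, under stated assumptions. The control tag appears to combine a two-digit year, month and day with a short event label; that reading, the function names, and the sample feed are all assumptions. The feed is a made-up RSS 2.0 fragment, not the actual del.icio.us feed format, and no network access is involved.

```python
import xml.etree.ElementTree as ET
from datetime import date


def control_tag(event_date, label):
    """Build a control tag as YYMMDD followed by a short event label (assumed scheme)."""
    return event_date.strftime("%y%m%d") + label


# Made-up RSS 2.0 sample standing in for a bookmarking-service feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Smoke over San Diego</title><link>http://example.org/a</link>
<category>071022SoCalSmoke</category><category>photo</category></item>
<item><title>Unrelated post</title><link>http://example.org/b</link>
<category>cooking</category></item>
</channel></rss>"""


def harvest(feed_xml, tag):
    """Return (title, link) pairs for feed items carrying the given tag."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        tags = {c.text for c in item.findall("category")}
        if tag in tags:
            hits.append((item.findtext("title"), item.findtext("link")))
    return hits


if __name__ == "__main__":
    tag = control_tag(date(2007, 10, 22), "SoCalSmoke")
    for title, link in harvest(SAMPLE_FEED, tag):
        print(tag, title, link)
```

Because the filter matches the whole tag exactly, items tagged only with generic words such as "photo" are ignored, which is the point of using a unique control tag.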

10 Data System Profiles
[Diagram labels: Data System; Semantic Tags; Multiple Wiki Views; Wrap Data System Metadata]

11 A consistent description of multiple, autonomous data systems was needed.
All of the data systems were web-based; however, the metadata about them was distributed.
Semantic tagging in the ESIP wiki was used to wrap distributed, heterogeneous data system metadata into a homogeneous view for easy comparison of systems.
Semantic tags are sets of tags with a specific type and attribute:
– Type determines the kind of response that can be given (text, enumeration, date, location)
– Attribute is the semantic tag name
Querying semantic tags returns a filtered list.
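One way to picture a semantic tag with a type and an attribute is the sketch below. It does not reproduce the ESIP wiki's implementation; the class `SemanticTag`, the field names, the system names and the sample values are hypothetical. The type restricts what value may be stored, and a query over the attribute returns a filtered list of pages.

```python
from dataclasses import dataclass
from datetime import date

# The four response kinds named on the slide, mapped to Python types (an assumption).
ALLOWED_KINDS = {"text": str, "enumeration": str, "date": date, "location": tuple}


@dataclass(frozen=True)
class SemanticTag:
    attribute: str  # the semantic tag name, e.g. "access"
    kind: str       # one of the keys of ALLOWED_KINDS
    value: object

    def __post_init__(self):
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown tag type: {self.kind}")
        if not isinstance(self.value, ALLOWED_KINDS[self.kind]):
            raise TypeError(f"{self.attribute} expects a {self.kind} value")


# Hypothetical wiki pages, each wrapping one data system with semantic tags.
PAGES = {
    "SystemA": [SemanticTag("access", "enumeration", "WCS"),
                SemanticTag("updated", "date", date(2007, 10, 22))],
    "SystemB": [SemanticTag("access", "enumeration", "FTP")],
    "SystemC": [SemanticTag("access", "enumeration", "WCS")],
}


def query(pages, attribute, value):
    """Return the sorted names of pages that carry attribute == value."""
    return sorted(
        name for name, tags in pages.items()
        if any(t.attribute == attribute and t.value == value for t in tags)
    )


if __name__ == "__main__":
    print(query(PAGES, "access", "WCS"))
```

Validating the value against the declared type when the tag is created is what keeps the resulting view homogeneous across otherwise independent data systems.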

12 Community Data Sharing - ‘DataSpaces’
[Diagram labels: Catalog – Find Dataset; Dataset Reuse; Meta; Multiple Views; DataSpaces; Wrap metadata with Semantic Tags]

13 Two parts:
– Semantic Tags: Structured
– User-added content: Unstructured
Semantic Tags:
– Define common features of all datasets (BBox, time range, provider)
– Can be queried within the wiki to show a subset of datasets
– Ready for export/harvesting with RDF feeds for use by registries, catalogs and XSLT transformations
User-added Content:
– Feedback/FAQs from users about datasets
– Tagged, relevant papers about the dataset
– Dataset lineage
– …
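The structured half of a DataSpace page can be sketched as follows, assuming a bounding box given as (west, south, east, north) and a time range given as whole years. The dataset names, providers, the `http://example.org/` namespace and the choice of N-Triples as the RDF serialization are illustrative assumptions, not details from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    provider: str
    bbox: tuple        # (west, south, east, north) in degrees
    time_range: tuple  # (first_year, last_year), inclusive


def overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find(datasets, bbox, year):
    """Return names of datasets covering the given area and year."""
    return [d.name for d in datasets
            if overlaps(d.bbox, bbox) and d.time_range[0] <= year <= d.time_range[1]]


BASE = "http://example.org/dataspace/"  # hypothetical namespace


def to_ntriples(d):
    """Serialize the semantic tags of one dataset as N-Triples lines."""
    subject = f"<{BASE}{d.name}>"
    bbox = ",".join(str(v) for v in d.bbox)
    return [
        f'{subject} <{BASE}provider> "{d.provider}" .',
        f'{subject} <{BASE}bbox> "{bbox}" .',
        f'{subject} <{BASE}timeRange> "{d.time_range[0]}/{d.time_range[1]}" .',
    ]


# Hypothetical catalog entries.
DATASETS = [
    Dataset("SurfaceObs", "ProviderA", (-125, 24, -66, 50), (2000, 2008)),
    Dataset("EuropeSat", "ProviderB", (-10, 35, 30, 70), (2005, 2008)),
]

if __name__ == "__main__":
    print(find(DATASETS, (-120, 32, -114, 35), 2007))
    print("\n".join(to_ntriples(DATASETS[0])))
```

The unstructured user-added content (feedback, papers, lineage) is deliberately left out of the sketch, since only the semantic tags are meant to be queried and exported.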

14 Summary
– By adding unique tags, groups can collaboratively curate lists of resources.
– By harvesting unique tags, the wiki brings together seemingly unrelated information from distributed web objects.
– Tagging within the wiki allows emergent structure to evolve.

15 Future Work
– Continue to learn how to add structure with tagging
– Continue to mash structured tagging with the wiki ‘canvas’
– Use tagging as a way to allow feedback from user to provider
– Facilitate community tagging and collaboration

