Growing the Semantic Web
By Charla Woodbury
June 11, 2004
INTERNET to SEMANTIC WEB
The present internet is too large to search precisely in its present format.
The Semantic Web holds the promise of a much richer and more easily searchable information resource.
Most current research targets small areas of Semantic Web development rather than looking at the whole process and showing its advantages.
What is needed is a working example of the Semantic Web that demonstrates the advantages and minimizes the problems, so that webpages for the Semantic Web can start to grow.
High-volume Information Publishers should be the first TARGET
The old adage: to change a lake’s water, deal with the new water coming in rather than the water already in the lake.
By starting with high-volume information publishers, the nature of the internet lake would change very quickly.
Obituary Prototype
[Diagram labels: Newspaper Publisher; Daily News HOME PAGE; Daily News obituaries; Embedded Obituary Ontology; Obituary vocabulary; WordNet]
Once the faucet is turned on, the pool of Semantic Webpages would grow very quickly.
Thesis Statement
A cost/benefit analysis, using an obituary prototype, will show that populating the Semantic Web by building an embedded OWL ontology and the corresponding specialized vocabulary on top of WordNet for EACH information publisher is practical and cost effective.
ADVANTAGES
Each information publisher
    The ontology is built only once and used many times
    The specialized vocabulary is built only once and accessed many times
    The ontology and vocabulary belong to the publisher, who can change them as the format and vocabulary of the obituaries they produce change (deletion discouraged)
    Most of the cost would be incurred in setting up the ontology and the specialized vocabulary
ADVANTAGES
Information extraction would be done by an agent, with no other contact with the publisher
There would be no need to index the information once the information retrieval portion was in place
HTML information is easy to store and maintain
HTML files are much smaller than the digitized microfilm presently used
METHODS
Each Newspaper
    Contact selected newspapers to produce semantic obituary webpages
    Learn how they archive the HTML version of the newspaper
    Get estimates of the cost to the newspaper to index, microfilm, and store their archives
    Ask an obituary reporter to list the specialized vocabulary, then build the vocabulary and the OWL ontology to be embedded
    Train a newspaper employee to test and edit the ontology and vocabulary
    Test the vocabulary and ontology to make sure they are sufficiently inclusive
    Compare the time needed to set up the first newspaper with the time needed for subsequent ones
METHODS
Organizations using Obituary information
    Contact family history businesses, genealogical societies, and government agencies that would use obituary information
    Find out how they get their obituary information now and how much that costs in time and money
    Measure their future interest in using agents to retrieve obituary information instead
    Discover which parts of the obituary information they consider the minimum needed for their work and which additional information would be desired and optimal
    Present the results of the obituary prototype and re-measure their future interest in using agents to retrieve obituary information
PROBLEMS
The first problem is how to entice publishers to start the process.
The basic problem is a semantic one: how will regional burial practices and language differences impact the process?
But the biggest problem is how to maintain the ontology and vocabulary with the least amount of human intervention.
FIRST PROBLEM
How to entice publishers to start the process of making semantic webpages?
Find grants, research money, and/or corporate sponsorship from companies that would profit from the information
Petition for government support
    Office of Internet Semantic Information (e.g. Library of Congress)
Demonstrate by prototype: Obituaries
    Process works well (Electric lights in large cities)
    Specific information is far more easily found
    Their information is more available
    The maintenance process is minimal
    The rewards are maximal
    Everyone else is doing it
SECOND PROBLEM
The basic problem is a semantic one: how will regional burial practices and language differences impact the process?
The basic format of the specialized vocabulary would be the same as WordNet, with rich word relationships (e.g. interred, interment, buried, burial as synonyms and related word forms)
Regional and language differences would be expressed by adding rich vocabulary as deemed necessary by the individual publisher
Fine-tune and test the vocabulary and the ontology
Teach the computer to speak obituary language
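The idea of a shared base vocabulary that each publisher extends with regional terms can be sketched in a few lines. This is only an illustration under stated assumptions: the class `ObituaryVocabulary`, its methods, and the regional terms ("laid to rest", the Spanish "sepultado") are hypothetical, and a real vocabulary would be built on WordNet rather than on a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ObituaryVocabulary:
    """Hypothetical stand-in for a WordNet-style specialized vocabulary."""

    # Maps a concept name to the set of lowercase terms that express it.
    concepts: Dict[str, Set[str]] = field(default_factory=dict)

    def add_terms(self, concept: str, *terms: str) -> None:
        """Attach one or more terms to a concept, creating it if needed."""
        self.concepts.setdefault(concept, set()).update(t.lower() for t in terms)

    def concept_for(self, term: str) -> Optional[str]:
        """Return the concept a term belongs to, or None if it is unknown."""
        wanted = term.lower()
        for concept, terms in self.concepts.items():
            if wanted in terms:
                return concept
        return None


# Base relationships taken from the slide.
vocabulary = ObituaryVocabulary()
vocabulary.add_terms("burial", "interred", "interment", "buried", "burial")

# Regional and language additions made by an individual publisher
# (illustrative assumptions, not terms from the original proposal).
vocabulary.add_terms("burial", "laid to rest", "sepultado")

print(vocabulary.concept_for("Interment"))
print(vocabulary.concept_for("sepultado"))
print(vocabulary.concept_for("wedding"))
```

Keeping the publisher's additions in the same structure as the base terms means an agent only ever asks one question, "which concept does this word express?", regardless of region or language.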
THIRD PROBLEM
How to simplify and automate the testing and maintenance of the ontology and vocabulary?
TESTING and SIMPLE MAINTENANCE
    Install a tool for creating and editing an OWL ontology that is as automated as possible
    Set up procedures for how often to test the ontology (e.g. new reporter, new obituary template, a set length of time)
    Write a program that tests how effective the ontology is and lists words in the obituaries that are not in the vocabulary, for review and addition to the vocabulary
    Teach the machine to add those words automatically to the vocabulary if possible
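The testing program described on this slide could start out as something like the sketch below. It is not the project's tool: the function names `tokenize` and `coverage_report`, the stopword list, the tiny vocabulary, and the second sample sentence are assumptions made for illustration. It checks only the vocabulary, not the OWL ontology itself.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words, keeping simple apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage_report(
    obituaries: Iterable[str],
    vocabulary: Set[str],
    stopwords: FrozenSet[str] = frozenset(),
) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (share of words found in the vocabulary, unknown words by count)."""
    unknown = Counter()
    known = 0
    total = 0
    for text in obituaries:
        for word in tokenize(text):
            if word in stopwords:
                continue  # common words are not counted either way
            total += 1
            if word in vocabulary:
                known += 1
            else:
                unknown[word] += 1
    coverage = known / total if total else 1.0
    return coverage, unknown.most_common()


# Made-up sample input for the sketch.
sample_vocabulary = {"interred", "survived", "born", "died"}
sample_obituaries = [
    "Charles Lambert died and was interred.",
    "Richard was entombed.",
]
coverage, unknown_words = coverage_report(
    sample_obituaries, sample_vocabulary, stopwords=frozenset({"and", "was"})
)
print(f"coverage: {coverage:.0%}")
for word, count in unknown_words:
    print(word, count)
```

Proper names show up in the unknown list alongside genuinely missing terms such as "entombed", which is one reason the slide calls for human review before words are added automatically.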
Evaluation
Cost/benefit analysis, in time and money, comparing the original process with the new Semantic Web process
Survey those testing and maintaining the Semantic Webpages about the process and the tools provided
Compare the surveys given to prospective information retrievers before and after the demonstration of the obituary prototype
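One simple way to frame the cost/benefit comparison is a break-even calculation: the one-time setup cost of the ontology and vocabulary against the monthly saving over the current indexing, microfilming, and storage process. The function below is a sketch only; `break_even_months` and its parameters are hypothetical, and the figures in the usage lines are placeholders, not measured costs.

```python
import math
from typing import Optional


def break_even_months(
    setup_cost: float,
    monthly_maintenance: float,
    monthly_current_cost: float,
) -> Optional[int]:
    """Months until the one-time setup cost is recovered, or None if never."""
    monthly_saving = monthly_current_cost - monthly_maintenance
    if monthly_saving <= 0:
        return None  # the new process never pays for itself
    return math.ceil(setup_cost / monthly_saving)


# Placeholder figures for illustration only.
print(break_even_months(setup_cost=12000, monthly_maintenance=200, monthly_current_cost=1200))
print(break_even_months(setup_cost=5000, monthly_maintenance=500, monthly_current_cost=400))
```

The estimates gathered from each newspaper under METHODS would supply the real inputs.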
CONTRIBUTIONS
A working model of the Semantic Web
A growing pool of semantic webpages for future information extraction & retrieval
As new standards emerge, adjustments to the process could be made immediately and only once for everyone
A replacement for costly human indexing of the information
Future Work
How will agents interpret many different obituary ontologies and vocabularies?
[Diagram: multiple Newspaper Publishers, each with its own Daily News obituaries, Embedded Obituary Ontology, and Obituary vocabulary]
Future Work
Should there be one global obituary ontology and/or one global burial vocabulary? (All languages and burial practices)
[Diagram label: GLOBAL Obituary Ontology]
Future Work
Or will the agent be smart enough to traverse the associated vocabulary for the correct information?
[Diagram: an AGENT traversing multiple Obituary vocabularies]
Future Work
How will the agents deliver the extracted obituary information?
Obituary Extracted Database
    Daily News || 26 Jan 2004 || Charles Lambert || b. 12 June 1911 || d. 24 Jan 2004
HTML REPORT
    All Obituaries with surname LAMBERT
    URLs to the actual Newspaper Obituaries
    Charles Lambert d. 24 Jan 2004
    Richard Greaves Lambert d. 17 Oct 2003
[Diagram labels: Embedded Obituary Ontology; Daily News obituaries]
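The two delivery formats on this slide, a delimited database record and an HTML report by surname, can be sketched as follows. This is an assumption-laden illustration: the `ObituaryRecord` class, the functions `parse_record` and `surname_report`, the five-field layout inferred from the sample record, and the `example.com` URL are all hypothetical.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class ObituaryRecord:
    newspaper: str
    published: str
    name: str
    born: str
    died: str
    url: str = ""  # link to the actual newspaper obituary (hypothetical field)


def _strip_label(value: str, label: str) -> str:
    """Remove a leading label such as 'b.' or 'd.' if present."""
    return value[len(label):].strip() if value.startswith(label) else value


def parse_record(line: str, url: str = "") -> ObituaryRecord:
    """Parse one '||'-delimited record in the layout shown on the slide."""
    parts = [part.strip() for part in line.split("||")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    newspaper, published, name, born, died = parts
    return ObituaryRecord(
        newspaper, published, name,
        _strip_label(born, "b."), _strip_label(died, "d."), url,
    )


def _surname(name: str) -> str:
    """Treat the last word of the name as the surname (a simplification)."""
    words = name.split()
    return words[-1].upper() if words else ""


def surname_report(records: List[ObituaryRecord], surname: str) -> str:
    """Build a small HTML report of all records with the given surname."""
    wanted = surname.upper()
    items = "\n".join(
        f'  <li><a href="{escape(r.url, quote=True)}">{escape(r.name)}</a> '
        f"d. {escape(r.died)}</li>"
        for r in records
        if _surname(r.name) == wanted
    )
    return (
        f"<h1>All Obituaries with surname {escape(wanted)}</h1>\n"
        f"<ul>\n{items}\n</ul>"
    )


record = parse_record(
    "Daily News || 26 Jan 2004 || Charles Lambert || b. 12 June 1911 || d. 24 Jan 2004",
    url="https://example.com/obituaries/1",  # placeholder URL
)
print(surname_report([record], "Lambert"))
```

Taking the last word as the surname is a deliberate simplification; compound or non-Western surnames would need the ontology to mark the surname explicitly.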
Future Work
Will it be necessary to hire and pay obituary indexers?
Will the newspapers continue to be microfilmed or just stored in HTML?
Will storage space be an issue?
Will the whole process, including information retrieval, be cost effective?
QUESTIONS? COMMENTS?