Finding knowledge, data and answers on the Semantic Web


1 Finding knowledge, data and answers on the Semantic Web
Tim Finin University of Maryland, Baltimore County Joint work with Li Ding, Anupam Joshi, Cynthia Parr, Joel Sachs, Andriy Parafiynyk and Lushan Han  This work was partially supported by DARPA contract F , NSF grants CCR and IIS

2 This talk Motivation Semantic Web background
Swoogle Semantic Web search engine Use cases and applications Social Semantic Web Conclusions

3 Google has made us smarter
Software agents will need something similar to maximize the use of information on the semantic web.

4 But what about our agents?
Software agents will need something similar to maximize the use of information on the semantic web. Agents still have a very minimal understanding of text and images.

5 But what about our agents?
Software agents will need something similar to maximize the use of information on the semantic web. Swoogle. A Google for knowledge on the Semantic Web is needed by software agents and programs.

6 This talk Motivation Semantic Web background
Swoogle Semantic Web search engine Use cases and applications Social Semantic Web Conclusions

7 Brief history of the Semantic Web
Tim Berners-Lee’s original 1989 WWW proposal described a web of relationships among named objects, unifying many information management tasks.
Guha’s MCF (~94)
XML+MCF=>RDF (~96)
Semantic Web coined (~97)
RDF+OO=>RDFS (~99)
RDFS+KR=>DAML+OIL (00)
W3C’s SW activity (01)
W3C’s OWL (03)
SPARQL (06)
Rules, RDFa, ….

8 Interest is high Interest in industry, government and VCs is high
RDF is in Adobe’s products, Oracle 10g and 11g, Microsoft Vista, and Yahoo’s food portal. Several high-visibility startups use RDF: Joost (internet TV), Teranode (bioinformatics), Garlik (personal info monitoring). And, if you want more evidence that interest is high …

9 $1795 $695 CD Only

10 What do we mean by “Semantic Web”
“a smarter Google” “NLP” PowerSet explicit semantics topic maps ad hoc approaches Microformats Tags Folksonomies XML KR based other structured Freebase Google Base RDF+OWL

11 RDF is the first SW language
RDF is a simple language for building graph-based representations
Grounded in web standards
With terms to support ontologies, description logic, rules and much of first order logic
The RDF data model has three views:
Graph: good for human viewing
XML encoding (<rdf:RDF ……..> <….> </rdf:RDF>): good for machine processing
Triples: good for reasoning
stmt(docInst, rdf_type, Document)
stmt(personInst, rdf_type, Person)
stmt(inroomInst, rdf_type, InRoom)
stmt(personInst, holding, docInst)
stmt(inroomInst, person, personInst)
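The triple view on this slide can be sketched in a few lines of Python. This is only an illustration of the data model, not how any RDF library stores data: the tuple layout and the helper names `match` and `instances_of` are assumptions made for the example, while the five statements are the ones on the slide.

```python
# The five statements from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("docInst", "rdf_type", "Document"),
    ("personInst", "rdf_type", "Person"),
    ("inroomInst", "rdf_type", "InRoom"),
    ("personInst", "holding", "docInst"),
    ("inroomInst", "person", "personInst"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def instances_of(triples, cls):
    """Subjects that are declared, via rdf_type, to be instances of cls."""
    return [s for s, _, _ in match(triples, p="rdf_type", o=cls)]


if __name__ == "__main__":
    print(instances_of(TRIPLES, "Person"))
    print(match(TRIPLES, s="personInst", p="holding"))
```

Because every fact has the same three-part shape, a single wildcard matcher is enough to answer simple questions, which is roughly what "good for reasoning" means here.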

12 IMHO Better NLP will help search engines, but it’s a long-term, incremental project. We need a well-defined and extensible representation system for explicit knowledge. It should be backed by open, non-proprietary standards supported by industry, government and other interested parties. The W3C approach is not perfect, but “The perfect is the enemy of the good.” “Semantic Web” vs. “semantic web”

13 This talk Motivation Semantic Web background
Swoogle Semantic Web search engine Use cases and applications Social Semantic Web Conclusions

14 Running since summer 2004 2.1M RDF docs, 420M triples, 10K ontologies, 15K namespaces, 1.5M classes, 185K properties, 49M instances, 800 registered users

15 Swoogle Architecture
Diagram with four stages: Discovery, Analysis, Index and Search Services. Labeled components: candidate URLs, bounded Web crawler, Google crawler, SwoogleBot, SWD classifier, ranking, SWD indexer, IR indexer, document cache, Semantic Web metadata, Web server (HTML, for humans) and Web service (RDF/XML, for machines). The legend marks information flow and Swoogle’s web interface.

16 A Hybrid Harvesting Framework
Diagram: seeds (M, H, R) come from the Swoogle sample dataset, submissions and pings, and an inductive learner. They feed three harvesting methods: meta crawling (Google API calls), bounded HTML crawling, and RDF crawling of the Web.

17 Performance – Site Coverage
SW06MAR - Basic statistics (Mar 31, 2006)
1.3M SWDs from 157K websites
268M triples
61K SWOs, including >10K of high quality
1.4M SWTs using 12K namespaces
Significance
Compare with existing work (DAML crawler, scutter)
Compare SW06MAR with Google’s estimated SWDs
Chart: SWDs per website, by website

18 Performance – crawlers’ contribution
High SWD ratio: 42% of URLs are confirmed as SWDs. Consistent growth rate: SWDs per day. RDF crawler: best harvesting method. HTML crawler: best accuracy. Meta crawler: best in detecting websites. Chart axis: # of documents.

19 This talk Motivation Semantic Web background
Swoogle Semantic Web search engine Use cases and applications Social Semantic Web Conclusions

20 Applications and use cases
1. Supporting Semantic Web developers: ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
2. Searching specialized collections
Spire: aggregating observations and data from biologists
InferenceWeb: searching over and enhancing proofs
SemNews: Text Meaning of news stories
3. Supporting SW tools
Triple shop: finding data for SPARQL queries

21 1

22 80 ontologies were found that had these three terms
By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size. Let’s look at this one
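The three orderings mentioned on this slide can be pictured as sorting the same result list by different keys. The sketch below assumes each hit carries a rank score, a last-modified date and a triple count; the `OntologyHit` record, its field names and the sample values are invented for illustration and are not Swoogle's actual result format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OntologyHit:
    """Hypothetical search hit; all fields are illustrative assumptions."""
    url: str
    rank: float          # stand-in for a popularity score
    last_modified: date  # stand-in for recency
    triple_count: int    # stand-in for size


SORT_KEYS = {
    "popularity": lambda hit: hit.rank,
    "recency": lambda hit: hit.last_modified,
    "size": lambda hit: hit.triple_count,
}


def order_hits(hits, by="popularity"):
    """Return hits sorted in descending order of the chosen criterion."""
    return sorted(hits, key=SORT_KEYS[by], reverse=True)


# Made-up sample data on a placeholder domain.
HITS = [
    OntologyHit("http://example.org/a.owl", 0.9, date(2005, 1, 1), 100),
    OntologyHit("http://example.org/b.owl", 0.5, date(2007, 6, 1), 5000),
    OntologyHit("http://example.org/c.owl", 0.7, date(2006, 3, 1), 300),
]

if __name__ == "__main__":
    for criterion in SORT_KEYS:
        print(criterion, [h.url for h in order_hits(HITS, by=criterion)])
```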

23 Basic Metadata hasDateDiscovered:   hasDatePing:   hasPingState:  PingModified type:  SemanticWebDocument isEmbedded:  false hasGrammar:  RDFXML hasParseState:  ParseSuccess hasDateLastmodified:   hasDateCache:   hasEncoding:  ISO hasLength:  18K hasCntTriple:  311.00 hasOntoRatio:  0.98 hasCntSwt:  94.00 hasCntSwtDef:  72.00 hasCntInstance:  8.00

24 Who uses this ontology and how do they access it?

25 rdfs:range was used 41 times to assert a value.
owl:ObjectProperty was instantiated 28 times time:Cal… defined once and used 24 times (e.g., as range)

26 All of this is available in RDF form for the agents among us.
These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.

27 Here’s what the agent sees
Note the swoogle and wob (web of belief) ontologies.

28 We can also search for terms (classes, properties) like terms for “person”.

29 10K terms associated with “person”! Ordered by use.
Let’s look at foaf:Person’s metadata

30 Metadata stored for a term is information about its definition – both what and by whom

31 10K terms associated with “person”! Ordered by use.

32 How do other terms use foaf:Person?
100 documents assert that foaf:publication is a property of a foaf:Person

33 87K documents used foaf:gender with a foaf:Person instance as the subject

34 3K documents used dc:creator with a foaf:Person instance as the object

35 Swoogle’s archive saves every version of a SWD it’s seen.

36

37 2 An NSF ITR collaborative project with
University of Maryland, Baltimore County University of Maryland, College Park U. Of California, Davis Rocky Mountain Biological Laboratory

38 An invasive species scenario
Nile Tilapia fish have been found in a California lake. Can this invasive species thrive in this environment? If so, what will be the likely consequences for the ecology? So…we need to understand the effects of introducing this fish into the food web of a typical California lake

39 Food Webs
A food web models the trophic (feeding) relationships between organisms in an ecology.
Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species.
A location’s food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
Goal: automatically construct a food web for a new location using existing data and knowledge.
ELVIS: Ecosystem Location Visualization and Information System

40 East River Valley Trophic Web
The web structure in the image is organized vertically, with node color representing trophic level. Red nodes represent basal species, such as plants and detritus, orange nodes represent intermediate species, and yellow nodes represent top species or primary predators. Links characterize the interaction between two nodes, and the width of the link attenuates down the trophic cascade (i.e. a link is thicker at the predator end and thinner at the prey end).

41 Species List Constructor
Click a county, get a species list

42 The problem We have data on what species are known to be in the location and can further restrict and fill in with other ecological models. But we don’t know which of these the Nile Tilapia eats or who might eat it. We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.

43

44 Food Web Constructor
Predict food web links using database and taxonomic reasoning. In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
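One simple form of taxonomic reasoning is to borrow the known diet of the closest relative that has one, then keep only the prey that actually occur at the location. The sketch below shows that idea and nothing more: it is not the ELVIS algorithm, the function names are hypothetical, and apart from the Nile Tilapia's own lineage the species and diets are invented toy data.

```python
def shared_depth(a, b):
    """Number of leading taxonomic ranks two lineages have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def predict_prey(target, lineage, known_prey, local_species):
    """Guess the target's prey from its closest relative with a known diet."""
    relatives = [s for s in known_prey if s != target and s in lineage]
    if not relatives:
        return set()
    closest = max(
        relatives, key=lambda s: shared_depth(lineage[target], lineage[s])
    )
    if shared_depth(lineage[target], lineage[closest]) == 0:
        return set()  # no shared rank at all, so no basis for a prediction
    return known_prey[closest] & local_species


# Lineages as (class, family, genus). Only the tilapia row is real taxonomy;
# cichlid_x and fish_y are placeholders.
LINEAGE = {
    "nile_tilapia": ("Actinopterygii", "Cichlidae", "Oreochromis"),
    "cichlid_x": ("Actinopterygii", "Cichlidae", "GenusX"),
    "fish_y": ("Actinopterygii", "FamilyY", "GenusY"),
}
KNOWN_PREY = {  # invented diets for the placeholder species
    "cichlid_x": {"algae", "detritus"},
    "fish_y": {"insects"},
}
LOCAL_SPECIES = {"algae", "insects", "ostracods"}

if __name__ == "__main__":
    print(predict_prey("nile_tilapia", LINEAGE, KNOWN_PREY, LOCAL_SPECIES))
```

A predicted link of this kind is only a hypothesis, which is why the next slide's Evidence Provider matters: the user needs to see which relative and which source record the prediction came from.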

45 Evidence Provider
Examine evidence for predicted links.

46 Status
ELVIS (Ecosystem Location Visualization and Information System) is built as an integrated set of web services for constructing food webs for a given location.
Background ontologies:
SpireEcoConcepts: concepts and properties to represent food webs and ELVIS-related tasks, inputs and outputs
ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species, derived from data in the Animal Diversity Web and other taxonomic sources; 250K classes on plants and animals
Under development:
Connect to visualization software
Connect to Triple Shop to discover more data

47 3 Supporting SW Tools Semantic Web applications can access Swoogle through a REST-based Web interface or via SQL. Two examples: A system to help scientists construct datasets from RDF documents on the Web Tools to manage Semantic Web data in Blogs and other forms of social media

48 UMBC Triple Shop http://sparql.cs.umbc.edu/
Online SPARQL RDF query processing with several interesting features. Automatically finds SWDs for given queries using the Swoogle backend database. Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc. RDF datasets are first-class objects: they can be stored on our server or downloaded, and can be materialized in a database or (soon) as a Jena model.

49 What’s SPARQL?
SPARQL is the standard language (& protocol) for querying RDF graphs. Think: SQL for RDF.
PREFIX rdf: <
PREFIX foaf: <
SELECT ?person ?name ?
FROM <
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  OPTIONAL { ?person foaf:mbox ? } .
}
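To make the shape of this query concrete, here is a hand-rolled Python sketch of what it asks for: every foaf:Person with a name, plus a mailbox when one exists. It is not a SPARQL engine. The sample triples are invented, and `mbox` is a placeholder name for the third variable, which is cut off in the transcript.

```python
# Invented sample graph on a placeholder "ex:" namespace.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:doc1", "rdf:type", "foaf:Document"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def people(triples):
    """Rows of (person, name, mbox); mbox is None when no mailbox is given."""
    rows = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "foaf:Person":
            for name in objects(triples, s, "foaf:name"):
                # OPTIONAL: keep the row even if there is no foaf:mbox.
                for mbox in objects(triples, s, "foaf:mbox") or [None]:
                    rows.append((s, name, mbox))
    return rows


if __name__ == "__main__":
    for row in people(TRIPLES):
        print(row)
```

The `or [None]` fallback is the whole point of OPTIONAL: Bob still appears in the results with an empty mailbox instead of being dropped.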

50 The Fractal nature of SW systems
A SPARQL endpoint can make any Web data source look like an RDF graph that can be queried. Give a graph as a query, get a graph as a result.

51 Web-scale semantic web data access
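Participants: agent, data access service, the Web
The data access service indexes RDF data from the Web
Search vocabulary: the agent asks (“person”); the service searches URIrefs in the SW vocabulary and informs (“foaf:Person”)
Compose query: the agent asks (“?x rdf:type foaf:Person”); the service searches URLs in the SWD index and informs (doc URLs)
Populate RDF database: the docs are fetched from the Web
Query local RDF database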
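The exchange on this slide can be sketched end to end with in-memory stand-ins. Nothing below calls Swoogle: `VOCAB_INDEX`, `DOC_INDEX` and `WEB` are toy dictionaries that play the roles of the vocabulary index, the SWD index and the Web, and every function name and URL is a hypothetical chosen for the example.

```python
# Toy stand-ins for the service's two indexes and for the Web itself.
VOCAB_INDEX = {"person": ["foaf:Person"]}
DOC_INDEX = {
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}
WEB = {
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
}


def search_vocabulary(keyword):
    """ask("person") -> inform("foaf:Person")"""
    return VOCAB_INDEX.get(keyword.lower(), [])


def search_documents(term):
    """ask("?x rdf:type foaf:Person") -> inform(doc URLs)"""
    return DOC_INDEX.get(term, [])


def fetch(url):
    """Fetch one document; here a dictionary lookup replaces HTTP."""
    return WEB.get(url, [])


def find_instances(keyword):
    """Run the full loop and query the locally populated store."""
    terms = set(search_vocabulary(keyword))
    store = set()
    for term in terms:
        for url in search_documents(term):
            store.update(fetch(url))  # populate the local RDF database
    return sorted(s for s, p, o in store if p == "rdf:type" and o in terms)


if __name__ == "__main__":
    print(find_instances("person"))
```

The design point is that the search service only returns terms and document URLs. The agent does the fetching and the final query itself, so the service never has to evaluate arbitrary queries over the whole Web.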

52 Who knows Anupam Joshi? Show me their names, addresses and pictures

53 The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles

54 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
FROM ???
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
No FROM clause!

55 log in specify dataset Enter query w/o FROM clause!

56 We want to create a reusable dataset

57 Find RDF data using terms found in the query
That also satisfy some simple constraints (e.g., for trust)
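Finding data "using terms found in the query" implies pulling the vocabulary terms out of the query text first. The sketch below does that with regular expressions for the FOAF query from slide 54, with its punctuation normalized. It is a rough approximation and not the Triple Shop implementation: the function name is hypothetical, and the patterns ignore parts of SPARQL syntax such as comments and the `a` shorthand.

```python
import re

PREFIX_RE = re.compile(r"PREFIX\s+([A-Za-z][\w-]*):\s*<([^>]*)>", re.IGNORECASE)
QNAME_RE = re.compile(r"\b([A-Za-z][\w-]*):([A-Za-z_][\w-]*)")

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
"""


def query_terms(query):
    """Return the sorted full URIs of the prefixed names used in a query."""
    prefixes = dict(PREFIX_RE.findall(query))
    body = re.sub(r"<[^>]*>", " ", query)  # drop IRIs such as <http://...>
    body = re.sub(r'"[^"]*"', " ", body)   # drop string literals
    terms = set()
    for prefix, local in QNAME_RE.findall(body):
        if prefix in prefixes:             # skip names with undeclared prefixes
            terms.add(prefixes[prefix] + local)
    return sorted(terms)


if __name__ == "__main__":
    for term in query_terms(QUERY):
        print(term)
```

Each extracted URI can then be looked up in a document index, and the "simple constraints" from the slide applied as filters on the candidate documents.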

58 302 RDF documents were found that might have useful data.

59 We’ll select them all and add them to the current dataset.

60 We’ll run the query against this dataset to see if the results are as expected.

61 The results can be produced in any of several formats

62

63 Looks like a useful dataset
Let’s save it and also materialize it in the TS triple store. An extension will let us ask that it be automatically updated when constituents change.

64

65 We can also annotate, save and share queries.

66 This talk Motivation Semantic Web background
Swoogle Semantic Web search engine Use cases and applications Social Semantic Web Conclusions

67 Social media sites have become the biggest source of new content on the Web
Blogs, Wikis, Photo sites, forums, etc. Accounting for ~1/3 of new Web content

68 It’s a global phenomenon
Japanese is now the most common language

69 Social media sites have embraced new ways of letting users add semantic information
Showing users the potential of semantics

70 Social Media and the Semantic Web
Many are exploring how Semantic Web technology can work with social media. Social media like blogs are typically temporally organized and valued for their timely and dynamic information! If static pages form the Web’s long term memory, then the Blogosphere is its stream of consciousness. Maybe we can (1) help people publish data in RDF on their blogs and (2) mine social media sites for useful information.

71 The OWL icon links to the data in RDF
A BioBlitz involves going out to an area and recording every organism you see.

72

73 A good Semantic Web opportunity
We want to make it easy for scientists to enter and collect information from social media. Professionals, students and amateurs! Two early examples: SPOTter, a tool to add Semantic Web data to blogs, and Splickr, a system to mine Flickr for images of organisms.

74 SPOTter: SPire Observation Tool
We’ve developed some simple components to help people add RDF data to blogs and ping Swoogle to get it indexed. SPOTter is an initial prototype that uses the ETHAN ontology and is being used in some BioBlitz activities with students. We’re working toward a version that uses Twitter so that people can make the blog entries from their cell phones via SMS. The SPOTter agent will get the entries (via RSS) and index the data.

75 SPOTter button Once entered, the data is embedded into the blog post and Swoogle is pinged to index it

76 Prototype SPOTter Search engine
We can draw a bounding box on the map and find observations. An RSS feed is provided for each query.

77 Flickr The Flickr “photo sharing” site has millions of photographs
Many are of plants and animals. Most of them have descriptions, timestamps, tags and even geo-tags. Flickr has even introduced “machine tags” that can be mapped into RDF. Any Flickr user (human or bot) can add comments and annotations. There’s a good API. It could be a good source of ecological information.

78 Splickr is an AJAX-based application that uses the Flickr API to query Flickr’s database of publicly available pictures. Pictures have tags (e.g. names of animals) and geographical coordinates, so we can determine the locations of invasive species. Results can be delivered in forms for people and machines.
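Once photos carry both tags and coordinates, locating a species reduces to a tag match plus a bounding-box test. The sketch below shows only that filtering step on invented records and makes no Flickr API calls. The `Photo` type, the function names and the sample data are hypothetical, the California box is approximate, and boxes that cross the 180° meridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    """Hypothetical photo record; not the Flickr API's response format."""
    photo_id: str
    tags: frozenset
    lat: float
    lon: float


def in_box(photo, south, west, north, east):
    """True if the photo's coordinates fall inside the bounding box."""
    return south <= photo.lat <= north and west <= photo.lon <= east


def sightings(photos, tag, south, west, north, east):
    """IDs of photos that carry the tag and lie inside the box."""
    return [
        p.photo_id
        for p in photos
        if tag in p.tags and in_box(p, south, west, north, east)
    ]


# Invented sample records.
PHOTOS = [
    Photo("p1", frozenset({"tilapia", "lake"}), 38.5, -121.7),
    Photo("p2", frozenset({"tilapia"}), 39.3, -76.6),
    Photo("p3", frozenset({"heron"}), 37.0, -120.0),
]

# Rough bounding box around California (south, west, north, east).
CALIFORNIA = (32.5, -124.5, 42.0, -114.1)

if __name__ == "__main__":
    print(sightings(PHOTOS, "tilapia", *CALIFORNIA))
```

The same box test fits the SPOTter search prototype, where a user draws a bounding box on the map to find observations.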

79

80 Results for people and machines

81 This talk Motivation Semantic Web background
Swoogle Semantic Web search engine Use cases and applications Social Semantic Web Conclusions

82 Conclusion
The web will contain the world’s knowledge in forms accessible to people and computers.
We need better ways to discover, index, search and reason over SW knowledge.
SW search engines address different tasks than HTML search engines, so they require different techniques and APIs.
Swoogle-like systems can help create consensus ontologies and foster best practices.
Social media provide new challenges and opportunities for the Semantic Web.

83 For more information Annotated in OWL

