Knowledge Streams: Stream Processing of Semantic Web Content. Mike Dean, Principal Engineer, Raytheon BBN Technologies

1 Knowledge Streams: Stream Processing of Semantic Web Content
Mike Dean
Principal Engineer
Raytheon BBN Technologies
mdean@bbn.com

2 Assumptions
Technology: Intermediate
  – Familiarity with RDF and OWL
Interest in
  – Stream processing
  – Scalability

3 Presenter Background
Principal Engineer at Raytheon BBN Technologies (1984-present)
Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005)
  – Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL
Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present)
Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups
  – Co-editor of the W3C OWL Reference
Local co-chair for ISWC2009
Other SemTech presentations
  – Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher)
  – Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher)
  – Use of SWRL for Ontology Translation (2008)
  – Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler)
  – How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009)
  – Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)

4 Outline
Motivation
Vision
Building Blocks
Demonstration

5 Motivations
Timeliness
Performance

6 Timeliness
Streaming minimizes latency
  – Processing elements see events as they occur
  – Resources are expended only when an event occurs
This is in contrast to polling
  – Latency averages half the polling interval (for events arriving at uniformly random times)
  – Resources are expended on every poll, whether or not anything has changed
  – Popular web syndication mechanisms such as RSS and Atom involve polling
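
The "half the polling interval" figure can be checked with a small simulation. This is a minimal Python sketch. It assumes polls happen at exact multiples of the interval, uses a hypothetical workload of one event per second over ten minutes, and models idealized streaming delivery with zero network or processing delay.

```python
import math

def polling_latencies(event_times, interval):
    """Delay before a poller notices each event.

    Polls are assumed to happen at 0, interval, 2*interval, ...; an event
    at time t is noticed at the first poll at or after t.
    """
    return [math.ceil(t / interval) * interval - t for t in event_times]

def streaming_latencies(event_times):
    """Idealized push delivery: each event is seen as it occurs."""
    return [0.0 for _ in event_times]

# Hypothetical workload: one event per second (offset by half a second so no
# event lands exactly on a poll), over 10 minutes, polled every 60 seconds.
interval = 60.0
events = [i + 0.5 for i in range(600)]

poll = polling_latencies(events, interval)
print(sum(poll) / len(poll))  # 30.0, i.e. half the polling interval
print(max(poll))              # 59.5, an event just missed the previous poll
print(sum(streaming_latencies(events)) / len(events))  # 0.0
```

The uniform spread of event times is what makes the average come out at exactly half the interval. Bursty or periodic event arrivals would shift it.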

7 Performance
Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access
  – Analogous to XML SAX vs. DOM
For suitable applications, streaming can be 10x faster than loading all statements into memory or into a knowledge base (KB)
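
The SAX-vs-DOM contrast can be illustrated in Python. The sketch below computes predicate counts while holding only the running totals in memory, whereas the model-style function materializes every statement first. The line-based N-Triples pattern is a deliberate simplification, not a conforming parser. Real tools such as Jena's streaming parsers handle the full syntax.

```python
import io
import re
from collections import Counter

# Simplified N-Triples line pattern: subject and predicate contain no
# whitespace; the object is everything up to the final " ." (so literals
# containing spaces still work). Not a conforming N-Triples parser.
TRIPLE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')

def stream_triples(lines):
    """Yield (s, p, o) one statement at a time, SAX-style."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m:
            yield m.groups()

def count_predicates(lines):
    """Streaming analysis: only the running counts stay in memory."""
    counts = Counter()
    for _s, p, _o in stream_triples(lines):
        counts[p] += 1
    return counts

def load_model(lines):
    """Model-style (DOM-like) access: materialize every statement first."""
    return list(stream_triples(lines))

data = """\
# hypothetical sample data
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice Smith" .
<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""

counts = count_predicates(io.StringIO(data))
print(counts['<http://xmlns.com/foaf/0.1/name>'])  # 2
print(len(load_model(io.StringIO(data))))         # 3
```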

8 Two Streaming Stories
dumpont of OpenCyc (circa 2003)
  – The HTML-based ontology visualization tool periodically bogged down the daml.org server
  – Reimplementation using the event-based Jena ARP parser yielded 10x performance and scalability improvements
Billion Triples Challenge 2009
  – Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk
  – Compare to loading 10-20K statements/sec on a server
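
A back-of-the-envelope calculation puts the two rates in perspective. It uses a hypothetical corpus of one billion statements, a round number chosen for illustration rather than the exact size of the 2009 corpus. Rates are written in statements per second (s^{-1}):

```latex
\begin{aligned}
t_{\text{stream}} &= \frac{10^{9}}{1.03 \times 10^{5}\,\text{s}^{-1}} \approx 9.7 \times 10^{3}\,\text{s} \approx 2.7\,\text{h} \\
t_{\text{load}} &= \frac{10^{9}}{2 \times 10^{4}\,\text{s}^{-1}} \;\text{to}\; \frac{10^{9}}{10^{4}\,\text{s}^{-1}} = 5 \times 10^{4} \;\text{to}\; 10^{5}\,\text{s} \approx 14 \;\text{to}\; 28\,\text{h} \\
\frac{t_{\text{load}}}{t_{\text{stream}}} &\approx \frac{1.03 \times 10^{5}}{2 \times 10^{4}} \;\text{to}\; \frac{1.03 \times 10^{5}}{10^{4}} \approx 5 \;\text{to}\; 10
\end{aligned}
```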

9 Stream Processing Examples
Unix pipes
Dataflow architectures
StreamBase
IBM System S/InfoSphere Streams
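
The Unix-pipe style of composition can be sketched with Python generators. Each stage is a hypothetical function that consumes an iterator and yields results, so every item passes through the whole chain before the next item is read. The `tap` stage records that order. The stage names (`grep`, `upper`, `tap`) are illustrative only.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable], Iterator]

def pipeline(source: Iterable, *stages: Stage) -> Iterator:
    """Chain stages like `cat | grep | tr`: each stage lazily consumes the
    iterator produced by the stage before it."""
    return reduce(lambda upstream, stage: stage(upstream), stages, iter(source))

def grep(pattern: str) -> Stage:
    """Pass through only items containing `pattern`."""
    def stage(items):
        for item in items:
            if pattern in item:
                yield item
    return stage

def upper(items):
    """Transform every item."""
    for item in items:
        yield item.upper()

log = []

def tap(name: str) -> Stage:
    """Record when each item passes this point in the chain."""
    def stage(items):
        for item in items:
            log.append((name, item))
            yield item
    return stage

result = list(pipeline(["rdf a", "owl b", "rdf c"], grep("rdf"), upper))
print(result)  # ['RDF A', 'RDF C']

list(pipeline(["x", "y"], tap("first"), tap("second")))
print(log)  # [('first', 'x'), ('second', 'x'), ('first', 'y'), ('second', 'y')]
```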

10 Vision: Knowledge Streams
[Diagram with three columns. Data Sources: CEP, NLP, sensor network, imagery, RSS, IM, gazetteer, sensor, Semantic Web, database. Distribution and Processing Elements: aggregation, persistent queries, augmentation, context filter, alerts, correlation, translation, inference, distribution. Users: User 1, User 2, User 3, Community of Interest 1, Community of Interest 2, Archive.]
Persistent pipelines
  – Streams of statements comprising object subgraphs
  – URI naming allows drill-down
  – Provenance, timestamps
Processing elements
  – Consume and produce subgraphs
  – Multiple functions may be combined
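
One way to cut a flat statement stream into the object subgraphs described above is to group consecutive statements by subject. Each group is then wrapped with provenance and a timestamp. This Python sketch assumes statements about the same subject arrive contiguously. The `SubgraphMessage` type, its fields, and the example URIs are hypothetical.

```python
import time
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

@dataclass(frozen=True)
class SubgraphMessage:
    subject: str                 # URI of the described object; a consumer can look it up to drill down
    triples: Tuple[Triple, ...]  # the statements making up this object's subgraph
    source: str                  # provenance: which data source produced the statements
    timestamp: float             # when this message was emitted

def to_subgraphs(statements: Iterable[Triple], source: str) -> Iterator[SubgraphMessage]:
    """Emit one message per run of statements sharing a subject.

    Assumes statements about the same subject arrive contiguously; a
    subject that reappears later starts a new message.
    """
    for subject, group in groupby(statements, key=lambda t: t[0]):
        yield SubgraphMessage(subject, tuple(group), source, time.time())

stream = [
    ("http://example.org/obs/1", "http://example.org/value", '"21.5"'),
    ("http://example.org/obs/1", "http://example.org/sensor", "http://example.org/sensor/7"),
    ("http://example.org/obs/2", "http://example.org/value", '"22.0"'),
]
msgs = list(to_subgraphs(stream, source="hypothetical-sensor-feed"))
print(len(msgs))                                # 2
print(msgs[0].subject, len(msgs[0].triples))    # http://example.org/obs/1 2
```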

11 Goals
Web-scale
  – Decentralized among multiple sites
  – Heterogeneous implementations
Long-lived, persistent connections
  – User accountability
Introspection over the processing network for control and optimization
  – E.g., aggregating subscriptions
  – Balance with security, privacy, and autonomy concerns
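
Aggregating subscriptions can be sketched as follows. A node that can see what its users subscribe to can forward each distinct subscription upstream once and fan matching statements out locally. This Python sketch is hypothetical. It keys subscriptions on a single predicate URI, which is much simpler than a real persistent query, and it ignores the security and privacy concerns noted above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Callback = Callable[[Triple], None]

class SubscriptionAggregator:
    """Hypothetical aggregator: identical subscriptions from many users
    cost one upstream subscription; matching statements fan out locally."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, predicate: str, callback: Callback) -> bool:
        """Register interest in statements using `predicate`.

        Returns True only when a new upstream subscription is needed."""
        is_new = predicate not in self._subscribers  # `in` does not create keys
        self._subscribers[predicate].append(callback)
        return is_new

    def upstream_subscriptions(self) -> List[str]:
        """The distinct subscriptions this node must forward upstream."""
        return sorted(self._subscribers)

    def deliver(self, triple: Triple) -> int:
        """Fan one upstream statement out to every matching local subscriber."""
        callbacks = self._subscribers.get(triple[1], [])
        for cb in callbacks:
            cb(triple)
        return len(callbacks)

NAME = "http://xmlns.com/foaf/0.1/name"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
received = []

agg = SubscriptionAggregator()
print(agg.subscribe(NAME, lambda t: received.append(("user1", t))))   # True
print(agg.subscribe(NAME, lambda t: received.append(("user2", t))))   # False
print(agg.subscribe(KNOWS, lambda t: received.append(("user3", t))))  # True
print(len(agg.upstream_subscriptions()))                              # 2
print(agg.deliver(("http://example.org/a", NAME, '"A"')))             # 2
```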

12 Building Blocks
RDF content
Existing stream processing frameworks
Workflow systems
Publish/subscribe message-oriented middleware

13 RDF Payloads
Malleable data
  – Standards-based graph structure
  – Can easily add, remove, and transform statements
Self-describing
  – Unique naming via URIs
  – References to vocabularies and ontologies
Potential for inference
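
Because an RDF payload is just a set of statements, adding, removing, and transforming content are plain set operations that need no schema change. The Python sketch below models a graph as a set of triples. The foaf:name to rdfs:label rewrite is only an illustrative vocabulary transformation, and the example.org URIs are hypothetical.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ALICE = "http://example.org/alice"  # hypothetical resource

def add(graph: Graph, *triples: Triple) -> Graph:
    """Add statements; duplicates collapse because a graph is a set."""
    return graph | set(triples)

def remove(graph: Graph, matches: Callable[[Triple], bool]) -> Graph:
    """Drop every statement for which `matches` is true."""
    return {t for t in graph if not matches(t)}

def rename_predicate(graph: Graph, old: str, new: str) -> Graph:
    """Transform statements by mapping one predicate to another."""
    return {(s, new if p == old else p, o) for s, p, o in graph}

g: Graph = {
    (ALICE, FOAF_NAME, '"Alice"'),
    (ALICE, "http://example.org/internalId", '"42"'),
}
g = add(g, (ALICE, FOAF_KNOWS, "http://example.org/bob"))
g = remove(g, lambda t: t[1] == "http://example.org/internalId")
g = rename_predicate(g, FOAF_NAME, RDFS_LABEL)
print(len(g))  # 2
```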

14 Workflow Systems
Graphical environments for developing processing pipelines
  – Yahoo Pipes, DERI Pipes, SPARQLMotion
  – Nice user interfaces for development and execution
http://pipes.deri.org

15 Semantic Complex Event Processing
Complex Event Processing (CEP)
  – One of the leading edges of rules technology
  – Formal specification of higher-level events in terms of lower-level events, e.g., alert if the moving average increases 15% within a 10-minute window
  – Engine can be compiled/optimized for a specific rule set
  – High-volume deployments in finance and other industries
  – Most implementations focus on self-contained tuples
Semantic Complex Event Processing
  – Enrich CEP using Semantic Web technology
  – Emerging topic at recent conferences
Early implementations
  – Wrappers around open-source CEP engines
  – Native implementation
Provides a powerful set of operators and engines for Knowledge Streams
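
The example rule can be hand-coded as a single stateful operator. The sketch below is a Python illustration, not a CEP engine. The slide does not define the moving average or the comparison, so two assumptions are made here. First, the moving average is a simple mean of the last `n` readings. Second, "increases 15% within a 10-minute window" is read as the current average being at least 1.15 times the lowest average seen in the past 600 seconds.

```python
from collections import deque
from typing import Deque, Optional, Tuple

class MovingAverageRiseDetector:
    """Alert when the moving average rises by at least `threshold`
    relative to its minimum within the last `window` seconds.

    Assumptions: simple mean of the last `n` readings; rise measured
    against the lowest average still inside the time window.
    """

    def __init__(self, n: int = 5, window: float = 600.0, threshold: float = 0.15):
        self.readings: Deque[float] = deque(maxlen=n)           # last n raw values
        self.history: Deque[Tuple[float, float]] = deque()      # (timestamp, average)
        self.window = window
        self.threshold = threshold

    def update(self, timestamp: float, value: float) -> Optional[str]:
        self.readings.append(value)
        avg = sum(self.readings) / len(self.readings)
        # Forget averages that have slid out of the time window.
        while self.history and self.history[0][0] < timestamp - self.window:
            self.history.popleft()
        self.history.append((timestamp, avg))
        low = min(a for _, a in self.history)
        if low > 0 and avg >= low * (1 + self.threshold):
            return f"alert at t={timestamp}: average {avg:.2f} is up {avg / low - 1:.0%} from {low:.2f}"
        return None

d = MovingAverageRiseDetector(n=2)
for t, v in [(0, 100), (60, 100), (120, 120), (180, 130)]:
    alert = d.update(t, v)
    if alert:
        print(alert)  # alert at t=180: average 125.00 is up 25% from 100.00
```

A real CEP engine would compile many such rules together and share state among them. Here one rule is written by hand only to make the window semantics concrete.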

16 Implementation Approach
Well-defined APIs for implementing operators
Operator execution containers
  – Could encapsulate existing engines
Start with manual processing network configuration, then automate
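
An operator API, an execution container, and a manually configured network might fit together as in the Python sketch below. Everything here is hypothetical: the `Operator` interface, the `Container` counters (one possible hook for the introspection goal), the two example operators (a filter and a namespace-rewriting translation), and the linear chain standing in for a hand-wired processing network.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

class Operator(ABC):
    """Hypothetical operator API: consume one statement, emit zero or more."""
    @abstractmethod
    def process(self, triple: Triple) -> Iterable[Triple]: ...

class PredicateFilter(Operator):
    """Pass through only statements with the given predicate."""
    def __init__(self, predicate: str) -> None:
        self.predicate = predicate

    def process(self, triple: Triple) -> Iterable[Triple]:
        if triple[1] == self.predicate:
            yield triple

class NamespaceRewrite(Operator):
    """Translation operator: move subjects from one namespace to another."""
    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def process(self, triple: Triple) -> Iterable[Triple]:
        s, p, o = triple
        if s.startswith(self.old):
            s = self.new + s[len(self.old):]
        yield (s, p, o)

class Container:
    """Hosts one operator and keeps counters that monitoring tools could read."""
    def __init__(self, name: str, operator: Operator) -> None:
        self.name, self.operator = name, operator
        self.seen = self.emitted = 0

    def handle(self, triple: Triple) -> List[Triple]:
        self.seen += 1
        out = list(self.operator.process(triple))
        self.emitted += len(out)
        return out

def run_chain(chain: List[Container], triples: Iterable[Triple]) -> List[Triple]:
    """Push each statement through a manually configured linear chain."""
    results: List[Triple] = []
    for t in triples:
        batch = [t]
        for container in chain:
            batch = [o for i in batch for o in container.handle(i)]
        results.extend(batch)
    return results

NAME = "http://xmlns.com/foaf/0.1/name"
chain = [
    Container("filter", PredicateFilter(NAME)),
    Container("translate", NamespaceRewrite("http://old.example/", "http://new.example/")),
]
data = [
    ("http://old.example/a", NAME, '"A"'),
    ("http://old.example/a", "http://xmlns.com/foaf/0.1/knows", "http://old.example/b"),
]
out = run_chain(chain, data)
print(out)  # [('http://new.example/a', 'http://xmlns.com/foaf/0.1/name', '"A"')]
print([(c.name, c.seen, c.emitted) for c in chain])  # [('filter', 2, 1), ('translate', 1, 1)]
```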

17 Use Cases
Dissemination of metadata for new satellite imagery
Social network changes
Alerting of friends’ new publications
…

18 Demo
Processing using DERI Pipes with new operators
  – Ingest of #SemTechBiz tweets using the Twitter Streaming API
  – Conversion of JSON to RDF
  – Mapping to the SIOC vocabulary using SWRL rules
  – Enrichment by matching Twitter @handles with contacts
  – Persistent buffering using Java Message Service (JMS)
  – Monitoring
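
The conversion and enrichment steps can be approximated in plain Python, as in the sketch below. It is not the demo's implementation. The demo mapped to SIOC with SWRL rules inside DERI Pipes, whereas this sketch writes SIOC statements directly. It also makes several assumptions:

- The tweet JSON fields `id_str`, `text`, and `user.screen_name` follow Twitter's classic tweet format.
- The minted URIs under example.org are hypothetical.
- The `mentionsContact` predicate and the contact table are hypothetical.

```python
import json
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SIOC = "http://rdfs.org/sioc/ns#"
EX = "http://example.org/"  # hypothetical namespace for minted URIs

def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (minimal escaping)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def tweet_to_rdf(raw_json: str, contacts: Dict[str, str]) -> List[Triple]:
    """Map one tweet (assumed classic Twitter JSON) to SIOC statements."""
    tweet = json.loads(raw_json)
    post = f"<{EX}tweet/{tweet['id_str']}>"
    author = f"<{EX}user/{tweet['user']['screen_name']}>"
    triples = [
        (post, RDF_TYPE, f"<{SIOC}Post>"),
        (post, f"<{SIOC}content>", literal(tweet["text"])),
        (post, f"<{SIOC}has_creator>", author),
        (author, RDF_TYPE, f"<{SIOC}UserAccount>"),
    ]
    # Enrichment: link each @handle in the text to a known contact, if any.
    for handle in re.findall(r"@(\w+)", tweet["text"]):
        contact = contacts.get(handle.lower())
        if contact:
            triples.append((post, f"<{EX}mentionsContact>", f"<{contact}>"))
    return triples

def to_ntriples(triples: List[Triple]) -> str:
    """Serialize statements as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

raw = json.dumps({"id_str": "1001",
                  "text": "Slides from @Bob_Example at #SemTechBiz",
                  "user": {"screen_name": "alice"}})
contacts = {"bob_example": "http://example.org/contacts/bob"}
triples = tweet_to_rdf(raw, contacts)
print(len(triples))  # 5
print(to_ntriples(triples))
```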