Copyright Antidot™ 1 Linked Enterprise Data LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT ISWC 2012 – BOSTON FABRICE LACROIX –
2 Antidot – who we are. French-based software vendor since 1999 | Paris, Lyon, Aix-en-Provence. Information access | Data management. Mission: provide our customers with innovative, customizable solutions that help them create value with their data and make their employees better informed and more efficient.
3 Clients: Publishing, Healthcare, Enterprises, E-commerce.
4 Unstructured documents: files, ECM, collaborative spaces; intranet, extranet, Web sites; e-mails, instant messaging.
5 Structured data: CRM, ERP, directory; knowledge bases; business applications (production, support).
6 Information systems are bloated: 1 practice => 1 need => 1 application => 1 silo. The information system is driven by processes. Data are numerous, varied and scattered.
7 Solutions or workarounds? BI, MDM, SOA, Search.
8 Solutions and workarounds. Enterprise Search brings little value to users: it is document-oriented and does not solve real business problems. Google-like, Verity-like.
10 What we want: LDAP, CRM, Production, ERP, ECM, Files, Support.
11 Changing the paradigm: switching from an application-centric view to a data-centric way of thinking.
12 Bring out the implicit: build the Giant Enterprise Graph.
13 LED (Linked Enterprise Data): the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure.
14 What works for the Web… Federating silos on the Web.
15 …can’t always be used in a corporate IS. Legacy apps can’t be "SPARQL’ed". 80% of the data is un- or semi-structured and doesn’t fit the model as such. Defining vocabularies/ontologies for silos is too complex and expensive. Companies don’t want RDF per se, but valuable information. External data is available in XML/JSON through Web Services. Staff are trained for RDB, XML and Web apps. No-risk, stability-first strategy: SemWeb technology is considered new and immature.
16 The RDF/storage approach: setting up a global RDF repository does not work either. IT departments are afraid of the "RDF everywhere" activists.
17 Semantic Web technology is still the right solution in a corporate environment, BUT it is not an end in itself: JUST use it as a means.
18 Just do it. Think of it as a stream paradigm: build new objects using existing data, without interfering with the existing infrastructure, with SemWeb somewhere under the hood.
19 Enterprise Graph HowTo. Construct the graph: generate triples from data; create triples from documents. Leverage the graph: enrich; infer. Browse the graph: select resources; build objects. Trash the graph.
20 How: extract & normalize. Harvest and normalize as in an ETL: fetch, clean, transform… Normalize records (names, IDs) to prepare the linking step. For databases: db2triples, an RDB2RDF implementation by Antidot (open source, W3C validated).
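A minimal sketch of the "normalize records (names, IDs)" step, assuming records arrive as plain strings from different silos. The function names and the sample values are hypothetical and only illustrate why normalization has to happen before linking; this is not how db2triples or any Antidot product does it.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return an accent-free, punctuation-free, lowercase form of a name."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Remove punctuation, collapse runs of whitespace, lowercase.
    without_punct = re.sub(r"[^\w\s]", "", without_accents)
    return " ".join(without_punct.lower().split())


def normalize_id(raw: str) -> str:
    """Keep only ASCII letters and digits, uppercased."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


# The same person as (hypothetically) spelled in two different silos.
directory_entry = {"name": "  Hélène   DUPONT ", "id": "emp-00 42"}
crm_entry = {"name": "helene dupont", "id": "EMP0042"}

same_person = (
    normalize_name(directory_entry["name"]) == normalize_name(crm_entry["name"])
    and normalize_id(directory_entry["id"]) == normalize_id(crm_entry["id"])
)
print(same_person)
```

Once both silos agree on a normalized key, that key can be used to mint one shared URI per entity, which is what makes the later linking step possible.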
21 How: semantize. Don’t transform everything into RDF: cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts (interesting == needed for linking or inferring).
22 How: semantize. Triples generation. Be smart: avoid upfront ontology design, use small vocabularies. Be pragmatic: transform XML tags and field names into predicates. Be agile: only insert what you need, and when you need more, add more. Semantic Web fuels the modeling, linking and information building process.
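The "cherry-pick fields" and "transform XML tags to predicates" advice can be sketched as follows. The namespace `http://example.org/`, the sample record and the chosen fields are all assumptions made for illustration; triples are kept as plain Python tuples so the sketch needs no RDF library.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical record exported from a directory silo.
RECORD_XML = """
<employee id="E42">
  <name>Helene Dupont</name>
  <unit>BU-Sales</unit>
  <phone>555-0100</phone>
  <badge_color>blue</badge_color>
</employee>
"""

# Cherry-picked fields: only what linking or inferring needs.
WANTED_FIELDS = {"name", "unit"}


def record_to_triples(xml_text, wanted):
    """Turn the wanted child elements of one record into (s, p, o) tuples."""
    root = ET.fromstring(xml_text.strip())
    subject = f"{BASE}{root.tag}/{root.get('id')}"
    triples = []
    for child in root:
        if child.tag in wanted and child.text:
            # The XML tag name is reused directly as the predicate name.
            predicate = f"{BASE}vocab/{child.tag}"
            triples.append((subject, predicate, child.text.strip()))
    return triples


triples = record_to_triples(RECORD_XML, WANTED_FIELDS)
for triple in triples:
    print(triple)
```

Adding a field later only means adding its name to `WANTED_FIELDS`, which matches the "when you need more, add more" stance: no ontology has to be revised first.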
24 How: semantize. Unstructured documents. Extract metadata and transform it as needed into RDF. ➡ Ex: author => dc:creator. Use text mining to extract named entities: people, organizations, products… ➡ generate the entity lists from the data sources: directory for employees, CRM for companies and people, ERP for products ➡ create triples like doc_URI quotes entity_URI.
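A rough sketch of this step, assuming the entity lists have already been pulled from the directory, CRM and ERP into a simple label-to-URI dictionary. The `example.org` URIs, the `quotes` predicate URI and the entity names are hypothetical; the `dc:creator` URI is the standard Dublin Core one. Real text mining would be more robust than the whole-word matching used here.

```python
import re

BASE = "http://example.org/"  # hypothetical namespace
QUOTES = BASE + "vocab/quotes"

# Metadata field => predicate, e.g. author => dc:creator.
METADATA_PREDICATES = {
    "author": "http://purl.org/dc/elements/1.1/creator",
}

# Hypothetical entity lists standing in for the directory, CRM and ERP.
ENTITIES = {
    "Helene Dupont": BASE + "employee/E42",
    "Acme Corp": BASE + "company/C7",
    "TurboWidget": BASE + "product/P3",
}


def semantize_document(doc_uri, metadata, text):
    """Create metadata triples plus one 'quotes' triple per entity found."""
    triples = []
    for field, value in metadata.items():
        if field in METADATA_PREDICATES:
            triples.append((doc_uri, METADATA_PREDICATES[field], value))
    for label, entity_uri in ENTITIES.items():
        # Whole-word, case-insensitive lookup of the entity label.
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            triples.append((doc_uri, QUOTES, entity_uri))
    return triples


doc_triples = semantize_document(
    BASE + "doc/1",
    {"author": "Helene Dupont", "pages": "3"},
    "Acme Corp ordered ten TurboWidget units.",
)
for triple in doc_triples:
    print(triple)
```

Because the entity URIs come from the structured silos, each `quotes` triple links a document directly to a resource that already exists in the graph.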
25 How: semantize. Unstructured documents. Compare documents using various dedicated algorithms: ➡ is the same ➡ is included ➡ is similar ➡ is related. Generate new triples: ➡ create triples like is_sub_version_of.
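The slides do not say which comparison algorithms are used. As one possible illustration, the sketch below uses word shingles with set containment and Jaccard similarity, a common stand-in technique, to decide between "same", "included" and "similar". The predicate names and the 0.5 threshold are assumptions, and "is related" is left out because it would need a different signal (shared topics or entities, for example).

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def compare(text_a, text_b, similar_threshold=0.5):
    """Return a hypothetical predicate name relating text_a to text_b, or None."""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return None
    if a == b:
        return "is_same_as"
    if a <= b:
        # Every shingle of A also occurs in B.
        return "is_included_in"
    jaccard = len(a & b) / len(a | b)
    if jaccard >= similar_threshold:
        return "is_similar_to"
    return None


v1 = "the quarterly report covers sales in europe"
v2 = v1 + " and adds a section on asia"

relation = compare(v1, v2)
if relation:
    print(("doc:v1", relation, "doc:v2"))
```

Each non-empty result becomes one more triple between two document URIs, so document similarity ends up in the same graph as the structured data.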
27 How: enrich. Enrich the graph: run specific algorithms to generate more links and triples (classifiers, topic detection, …); insert external data gathered from the LOD or other external datasets or APIs.
28 How: infer. Create new knowledge: add rules according to your needs. IF a coworker is quoted in documents AND this coworker belongs to a business unit THEN the business unit is bound to the documents.
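The rule on this slide can be sketched as a single join over an in-memory set of triples. The predicate names (`quotes`, `belongs_to`, `bound_to`) and the sample resources are hypothetical; a real deployment would more likely express the rule in a rule language or as a SPARQL query against the triplestore.

```python
QUOTES = "quotes"
BELONGS_TO = "belongs_to"
BOUND_TO = "bound_to"


def infer_unit_bindings(triples):
    """IF doc quotes coworker AND coworker belongs_to unit
    THEN unit bound_to doc. Returns only the newly inferred triples."""
    inferred = {
        (unit, BOUND_TO, doc)
        for doc, p1, person in triples
        if p1 == QUOTES
        for member, p2, unit in triples
        if p2 == BELONGS_TO and member == person
    }
    return inferred - set(triples)


graph = {
    ("doc:1", QUOTES, "emp:helene"),
    ("doc:2", QUOTES, "emp:bob"),        # bob has no known business unit
    ("emp:helene", BELONGS_TO, "unit:sales"),
}

new_triples = infer_unit_bindings(graph)
graph |= new_triples
print(sorted(new_triples))
```

Only `doc:1` gets bound to a business unit here, because both conditions of the rule must hold for the conclusion to be added.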
30 How: build. Select the resources corresponding to object seeds (using SPARQL queries); for each seed, follow links smartly in order to create basic objects.
31 How: build. Finalize: decorate the new knowledge objects with the data that was set apart (not loaded in the triplestore). Now we have rich, user-actionable objects.
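The build and finalize steps can be sketched over a small in-memory list of triples. Here a plain type filter stands in for the SPARQL seed query, and a dictionary stands in for the data kept outside the triplestore; all URIs, predicate names and values are hypothetical.

```python
TRIPLES = [
    ("emp:E42", "type", "Employee"),
    ("emp:E42", "name", "Helene Dupont"),
    ("emp:E42", "belongs_to", "unit:sales"),
    ("unit:sales", "type", "BusinessUnit"),
    ("unit:sales", "label", "Sales"),
    ("doc:1", "type", "Document"),
    ("doc:1", "quotes", "emp:E42"),
]

# Data set apart during semantization (never loaded as triples).
SIDE_DATA = {"emp:E42": {"phone": "555-0100"}}


def select_seeds(triples, wanted_type):
    """Stand-in for a SPARQL SELECT returning all resources of one type."""
    return sorted(s for s, p, o in triples if p == "type" and o == wanted_type)


def properties_of(uri, triples):
    """Group the outgoing triples of one resource by predicate."""
    props = {}
    for s, p, o in triples:
        if s == uri:
            props.setdefault(p, []).append(o)
    return props


def build_object(seed, triples, follow, side_data):
    """Build one object: seed properties, followed links, then decoration."""
    obj = {"uri": seed, **properties_of(seed, triples)}
    for predicate in follow:
        # Replace each linked URI by a nested object, one level deep.
        obj[predicate] = [
            {"uri": target, **properties_of(target, triples)}
            for target in obj.get(predicate, [])
        ]
    # Finalize: decorate with the data that was kept out of the graph.
    obj.update(side_data.get(seed, {}))
    return obj


objects = [
    build_object(seed, TRIPLES, ["belongs_to"], SIDE_DATA)
    for seed in select_seeds(TRIPLES, "Employee")
]
print(objects[0])
```

The resulting dictionaries no longer depend on the graph, which is consistent with the "trash the graph" step: once objects are built and decorated, they can be indexed or stored elsewhere.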
33 How: expose. Make the new information available to users and to the entire IS. [Diagram labels. Processing steps: Harvest, Normalize, Semantize, Classify, Annotate, Enrich. Outputs: Indexation (AFS search engine), RDF Triplestore (Linked Data), Relational DB.]
34 Conclusion. It works! The triples we create and the inference rules we add are dictated by the goal / application ➡ usage and value oriented. We benefit from the lazy, flexible, dynamic modeling of RDF-RDFS-OWL ➡ we are agile. What matters is the graph, but the graph is not the triplestore ➡ storage independent.
35 There’s an app for that: Antidot Information Factory, a software solution designed specifically to leverage structured and unstructured data, enable large-scale processing of existing data, and automate publishing of enriched or newly created information. Harvest, Normalize, Semantize, Enrich, Build, Expose.
36 The Giant Enterprise Graph. Now we have a path to let SemWeb enter the enterprise.
37 Thanks for your attention. Questions? Discuss, Understand, Learn, Exchange.