Language Technology in Industrial Applications (Språkteknologi i industrielle anvendelser)
Or: How we have commercialized linguistic technologies
Jon Atle Gulla, Norwegian University of Science and Technology, Trondheim, Norway
Språkteknologi og innovasjon (Language Technology and Innovation)
  1. Linguistics in search
  2. Semantics for interoperability
  3. Ontologies in process mining
  4. Linguistics in news reporting
Who am I?
Professor, Information Systems group, IDI/NTNU
Education:
  Siv.ing./dr.ing. (information systems, NTH)
  Cand.philol. (linguistics, AVH)
  MSc (management, London Business School)
Work experience:
  Fast Search & Transfer, Munich (linguistics in search)
  Norsk Hydro, Brussels (enterprise systems)
  GMD, Darmstadt (information retrieval)
Field of research:
  Search technologies
  Semantic Web
  Social Web
  Sentiment analysis and recommendations
Jon Atle Gulla, ICEIS 2008
1. The FAST Alltheweb.com site
2000:
  Alltheweb.com was one of the largest search engines on the Internet
  FAST acquired Elexir Sprachtechnologie in Munich
  Intended to add linguistics to the search engine
[Figure labels: Query; Retrieved documents]
Linguistic Techniques in FAST
Linguistics in search:
  Categorizing techniques (reduced search space)
    Documents -> Categories of documents
    Search options: Category-based selection / All selected
    Techniques: Language identification, Spam detection, Topic categorization
  Transformational techniques (increased semantics)
    Relevant documents -> Transformed documents
    Query -> Transformed query
    Content-based search / Keyword-based search
    Techniques: Lemmatization, Phrasing, Anti-phrasing
  Presentational techniques (improved transparency)
    List of documents -> Presentation of document list
    Content-based access / Title-based access
    Techniques: Clustering
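The transformational techniques named on this slide can be illustrated with a small query-rewriting sketch. It is not FAST's implementation: the phrase list, the lemma table, and the function names `strip_anti_phrases`, `lemmatize`, and `transform_query` are all hypothetical, and a production engine would rely on large language-specific dictionaries rather than toy lookups. Anti-phrasing is taken here to mean removing query wording that carries no search intent, and lemmatization to mean mapping inflected forms to a base form.

```python
# Toy linguistic resources (assumptions for illustration only).
ANTI_PHRASES = ("where can i find", "how do i", "what is")
LEMMAS = {"engines": "engine", "searching": "search", "documents": "document"}


def strip_anti_phrases(query: str) -> str:
    """Drop a leading filler phrase that adds nothing to the search."""
    q = query.lower().strip()
    for phrase in ANTI_PHRASES:
        if q.startswith(phrase + " "):
            q = q[len(phrase):].strip()
    return q


def lemmatize(query: str) -> str:
    """Map each known inflected token to its base form; keep others as-is."""
    return " ".join(LEMMAS.get(token, token) for token in query.split())


def transform_query(query: str) -> str:
    """Query -> transformed query: anti-phrasing first, then lemmatization."""
    return lemmatize(strip_anti_phrases(query))


if __name__ == "__main__":
    print(transform_query("Where can I find search engines"))
```

Applying the same lemmatization to the indexed documents is what lets a transformed query match a transformed document, which is the "increased semantics" goal the slide attaches to this group of techniques.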
The FAST Experience
  Linguistics a small part of a large system
  Linguistics as behind-the-scenes technology
  Linguistics not a major breakthrough
  Linguistics is not easy:
    Data-intensive
    Only statistical approaches feasible at the time
What happened to FAST?
  2003: Internet part sold to Overture (Yahoo)
  2008: Enterprise part sold to Microsoft
2. Semantics in Interoperability
Semantic Web: Adding semantics to data/services so that humans and computers can communicate better
Ontology: Explicit representation of a shared conceptualization (domain terminology model)
Semantic markup languages for ontology building (OWL, RDF)
2003: Petromax IIP project for construction of an ontology for the oil & gas sector (based on ISO 15926)
2011: EU LinkedDesign project for use of ontologies in manufacturing processes
Silly Semantic Conflicts Prevent Data Harmonization
Even simple terms are misunderstood
OWL petroleum ontology
…
CHRISTMAS TREE: An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment, that is connected to the top of a wellhead and is intended for control of fluid from a well.
…
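To show how a shared ontology entry removes this kind of term conflict, here is a minimal sketch that uses plain Python data structures as a stand-in for OWL. The `Concept` class, the `ex:ChristmasTree` identifier, the `"petroleum"` domain key, and the `resolve` function are hypothetical; only the label and the definition text come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    uri: str         # hypothetical identifier, not taken from the real ontology
    label: str
    definition: str


# One entry keyed by (normalized term, domain); a real ontology holds thousands.
ONTOLOGY = {
    ("christmas tree", "petroleum"): Concept(
        uri="ex:ChristmasTree",
        label="CHRISTMAS TREE",
        definition=(
            "An artefact that is an assembly of pipes and piping parts, "
            "with valves and associated control equipment that is connected "
            "to the top of a wellhead and is intended for control of fluid "
            "from a well."
        ),
    ),
}


def resolve(term: str, domain: str) -> Optional[Concept]:
    """Look up the agreed meaning of a term within one domain."""
    return ONTOLOGY.get((term.strip().lower(), domain))


if __name__ == "__main__":
    concept = resolve("Christmas Tree", "petroleum")
    print(concept.label if concept else "no shared definition")
```

Two systems that exchange data about a "Christmas tree" can both point to the same concept identifier, so neither has to guess what the other means by the term.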
Semantic Web Lessons Learned
Data integration and harmonization improved in the sector
But:
  Demanding and complex technologies
  Semantic Web technologies still immature and expensive
  So far few commercial solutions using semantic technologies
  (Some work on ontology-driven search applications)
3. Ontologies in Process Mining
Process mining:
  Techniques and tools for discovering process flow, control, data, organizational and social structures from enterprise systems' event logs
  Dynamic reporting for exposing real business flows and explaining interesting transaction patterns
Semantic process mining:
  Using ontologies to improve the interpretation of event logs and the construction of business flows
Semantic Process Mining
[Figure labels: Detected process flow; Ontology (formal definition of process terminology)]
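As a rough illustration of the idea, the sketch below derives a directly-follows relation from an event log after translating raw system codes into ontology terms. It is not the method used in the talk's projects: the event log, the activity codes, the term mapping, and the function name `directly_follows` are all invented for the example, and the directly-follows count is just one simple, widely used starting point for process discovery.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, timestamp, raw activity code).
EVENT_LOG = [
    ("c1", 1, "PO_CREATE"),
    ("c1", 2, "PO_APPROVE"),
    ("c1", 3, "GR_POST"),
    ("c2", 1, "PO_CREATE"),
    ("c2", 2, "GR_POST"),
]

# Hypothetical mapping from system codes to shared ontology terms.
TERMS = {
    "PO_CREATE": "Create purchase order",
    "PO_APPROVE": "Approve purchase order",
    "GR_POST": "Receive goods",
}


def directly_follows(log, terms):
    """Count how often one activity is immediately followed by another."""
    traces = defaultdict(list)
    # Order events per case by timestamp, translating codes on the way.
    for case, _timestamp, code in sorted(log, key=lambda e: (e[0], e[1])):
        traces[case].append(terms.get(code, code))
    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its successor inside the same case.
        counts.update(zip(activities, activities[1:]))
    return counts


if __name__ == "__main__":
    for (source, target), n in directly_follows(EVENT_LOG, TERMS).items():
        print(f"{source} -> {target}: {n}")
```

Translating the codes before counting means the detected flow is expressed in the business terminology defined by the ontology rather than in system-specific transaction names.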
Commercialization of Technology
2004: Businesscape founded
Ongoing work on Enterprise Visualization Suite:
  Combines two challenging technologies (data mining and Semantic Web)
  Substantial improvement over traditional process mining (and traditional reporting tools)
However:
  Difficult to explain the complexity and capability of the solution to customers
  Few customers competent enough to distinguish process mining from traditional reporting
4. Linguistics in News Reporting
Semantic approaches to news reporting:
  Extract content from news articles
  Validate content of articles
  Opinion mining from news articles and social sites
  Model user preferences for news recommendation
  Combine/aggregate knowledge from heterogeneous sources
Commercial potential uncertain
Conclusions
Linguistics often a supporting technology
Good linguistic resources tedious and expensive to develop
Not always easy to justify inclusion of linguistics
Linguistics in our projects:
  Enable new services and products
  Enhance existing services and products