Semantics Rule, Keywords Drool J. Brooke Aker CEO Expert System USA February 2010.

Presentation transcript:

Semantics Rule, Keywords Drool J. Brooke Aker CEO Expert System USA February 2010

Corporate background
- Most accurate, largest, fastest-growing semantics company worldwide
- 100+ customers, including large corporations and government, in:
  - business intelligence
  - enterprise search & data extensibility
  - market sentiment
  - customer care
- 100+ dedicated engineers focused on core semantic technology, applications, tools and services:
  - 200 man-years in the development of COGITO over the last 10 years
- 20 years old, private & profitable
  - FY2008: $13.5M, 110+ employees, 30% growth in each of the last 3 years
  - Offices in Connecticut, California, UK, Italy, & Germany

Why Do Keywords Drool?
3 Problems with Search Technology:
1. Same Word, Different Meanings
   - Jaguar (animal) / Jaguar (car)
2. Different Words, Same Meaning
   - Disability Legislation / Equal Opportunity Law
3. Different Words, Related Meaning
   - Organization / Company
   - Organization / Charity
   - Organization / Trade Union

Results in Declining Productivity
[Chart: Productivity of Search vs. Amount of Information. Technologies plotted: Databases, Files & Folders, Directories, Keyword Search (Google), Tagging, Natural Language Search, Semantic Search. Eras: Desktop (PC Era), World Wide Web (Web 1.0), Social Web (Web 2.0), Semantic Web (Web 3.0).]

Information Tasks In Business
[Diagram: grid with axes Query Well Formed / Query Not Well Formed and Sources Known / Sources Not Known. Task labels: Discovery, Analysis, Exploration, Search.]

Information Measures In Business
1. Precision: Retrieving a high level of accurate results relevant to your search query (a measure of exactness)
2. Recall: Retrieving a high percentage of relevant documents (a measure of completeness)
[Chart: Recall vs. Precision, each axis running from low to high. Plotted: PowerSet, Keywords, Statistics, Semantics.]
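
The two measures above can be made concrete with a few lines of Python. This is a minimal sketch, not anything from the deck: the function name `precision_recall` and the document IDs are hypothetical, and it assumes results and relevance judgments are available as plain collections of IDs.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]        # what the search engine returned
relevant = ["d1", "d3", "d5", "d6", "d7"]   # what a human judged relevant

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Here two of the four returned documents are relevant (exactness of 0.50), but only two of the five relevant documents were found (completeness of 0.40).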

What Business Wants IT to Provide
Semantics plays a role in all of these except perhaps the last 2.
Source: AMR Research

So What Then is the Semantic Web?
- Web 1.0: One Producer, Many Consumers
- Web 2.0: Everyone Produces, Everyone Consumes
- Web 3.0: Everyone Produces, Pinpoint Consumption (semantics)

COGITO®: deep analysis
4 Approaches

Morphological Analysis
  Definition: understand word forms
  Example: dog, dogs, and dog-catcher are closely related

Grammatical Analysis
  Definition: understand the parts of speech
  Example: "There are 40 rows in the table" uses rows as a noun, vs. "She rows 5 times a week" uses rows as a verb

Logical Analysis
  Definition: understand how words relate to other words
  Example: "Jeffrey Skilling, represented by Attorney Daniel Petrocelli, is married to Rebecca Carter". Rebecca is married to Jeffrey, not Daniel.

Semantic Analysis (disambiguation)
  Definition: understand the context of key words
  Example: "I used beef broth for my soup stock" uses stock in the context of food, vs. "The company keeps lots of stock on hand" uses stock in the context of inventory.

Technology that understands the real meaning of the words, based on theories of human comprehension
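
To illustrate the disambiguation idea with the slide's own "stock" sentences, here is a toy sketch that picks a sense by counting context words that overlap with hand-written cue words. It is a deliberately simplified overlap heuristic, not COGITO's method; the `SENSES` table and the `disambiguate` function are hypothetical and exist only for this example.

```python
# Hypothetical cue words per sense; a real semantic network would be far richer.
SENSES = {
    "stock": {
        "food": {"broth", "soup", "beef", "simmer", "cook"},
        "inventory": {"company", "keeps", "hand", "warehouse", "supply"},
    }
}


def disambiguate(word, sentence):
    """Pick the sense of `word` whose cue words overlap most with the sentence."""
    tokens = {t.strip('.,"').lower() for t in sentence.split()}
    scores = {sense: len(tokens & cues) for sense, cues in SENSES[word].items()}
    # On a tie, max() returns the first sense listed in SENSES.
    return max(scores, key=scores.get)


print(disambiguate("stock", "I used beef broth for my soup stock"))       # food
print(disambiguate("stock", "The company keeps lots of stock on hand"))   # inventory
```

A keyword index sees the same token "stock" in both sentences; only the surrounding context separates the two meanings.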

The solution is Semantics
Using human comprehension for machine understanding of text.
Machine understanding of text needs:
- A semantic network
- A parser to trace each text back to its basic elements
- A linguistic engine to query the semantic network
- A system to eliminate ambiguity
Steps to establish meaning
[Diagram: numbered steps 1 to 3. Labels: Semantic Network, Parse, Eliminate Ambiguity, Order & Priority, Linguistic Query Engine.]

COGITO® is generic and horizontal, and can transform unstructured information into structured data that can be managed with standard databases.

What is a Semantic Network?
The heart of semantic technology: the quality of results derives from the complexity and richness of the network.
- Includes all definitions of all words.
- Includes relationships among all words.
COGITO® English Semantic Network:
- 350,000 words
- 2.8m relationships

Semantic Networks
Traditional technologies can only guess the meaning using keywords, shallow linguistics, & statistics.
Semantic Networks instead identify: Connections, Concepts, Terms, Abbrev., Phrases, Meanings, Domains
- San Jose is an American city
- San Jose is a geographic part of California
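
The two San Jose statements above can be read as labeled connections between concepts. The sketch below stores them in a tiny in-memory graph. The `SemanticNetwork` class and the relation names `is_a` and `part_of` are assumptions made for this example and say nothing about how COGITO represents its network internally.

```python
from collections import defaultdict


class SemanticNetwork:
    """A toy concept graph: (subject, relation) -> set of related concepts."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def related(self, subject, relation):
        # .get() avoids creating empty entries for unknown lookups.
        return sorted(self._edges.get((subject, relation), set()))


net = SemanticNetwork()
net.add("San Jose", "is_a", "American city")   # relation names are hypothetical
net.add("San Jose", "part_of", "California")

print(net.related("San Jose", "is_a"))      # ['American city']
print(net.related("San Jose", "part_of"))   # ['California']
```

A query for cities in California can follow the `part_of` connection even when a document never mentions the word "California".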

Technology Stack
[Diagram labels:
- Semantic Network (one per language): English, Arabic, Italian, German, Other Middle Eastern
- Linguistic Query Engine: 1. Morphology, 2. Grammatical, 3. Logic, 4. Disambiguation
- Development Studio: Develop & Add Custom Rules
- Precision levels shown: 80% Precision, 90%+ Precision]

100% Semantic Technology
[Diagram labels:
- Semantic Intelligence: Linguistic rules, Sentence analysis, Semantic Network
- Shallow text analytics: Statistics, Heuristic rules, Morphological recognition
- Keyword-based technologies
- Disambiguation, Entity extraction, Categorization, Natural lang. UI, Semantic Search, Discovery, Sentiment]

Superior Performance
- Semantic text analysis processing speed (one CPU): 60 KB/sec
- Scalability in number of CPUs: Virtually unlimited
- Typical time of access to a concept in the semantic net: less than 10^-6 sec
- Number of concepts in English semantic net: 350,000
- Hyponyms and hypernyms
- Hypernyms and troponyms
- Average # of attributes for each concept
- Number of relations in semantic net (English): 2,800,000
- Software memory footprint (semantic net and engine): 50 MB

Expert System Unique Feature #1
Expanded Definition Sets: captures all possible ways of expressing a concept, beyond the use of a single word:
- Compound word, like blackbird or cookbook
- Collocation, like overhead projector or landing field
- Idiomatic expression, like "to fly off the handle" or "to weigh anchor"
- Locutions: groups of words that express simple concepts that cannot be expressed by a single word
- Verbal lemmas, such as a verb in the infinitive form, e.g. to write, or verbal collocations, e.g. to sneak away
Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats "to fly off the handle" as separate words, not as a concept.
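
One simple way to keep an expression such as "fly off the handle" together as a single unit is greedy longest-match lookup against a list of known multi-word entries. The sketch below shows that generic technique; the `LEXICON` entries are taken from the slide's examples, while the `chunk` function is hypothetical and is not a description of how COGITO does it.

```python
# Hypothetical multi-word lexicon built from the slide's examples.
LEXICON = {
    ("fly", "off", "the", "handle"),
    ("overhead", "projector"),
    ("landing", "field"),
    ("sneak", "away"),
}
MAX_LEN = max(len(entry) for entry in LEXICON)


def chunk(tokens):
    """Group tokens into concepts, preferring the longest lexicon match."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible window first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LEXICON:
                out.append(" ".join(candidate))
                i += n
                break
        else:
            # No multi-word entry starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


print(chunk("he will fly off the handle again".split()))
# ['he', 'will', 'fly off the handle', 'again']
```

The match is exact and lowercase-only, so an inflected form such as "flew off the handle" would need the morphological analysis described earlier before this lookup could recognize it.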

Expert System Unique Feature #2
Expanded Semantic Relations: an expanded set (65) of relations between concepts, found by looking at their use within the text. Answers questions like "Who did what to whom?", often called a triple or a subject-action-object. WordNet, for example, contains only 5 relation types.
- Verb / Subject
- Verb / Direct Object
- Adjective / Class
- Syncon / Class
- Syncon / Corpus
- Syncon / Geography
- Fine Grain / Coarse Grain
- Supernomen / Subnomen
- Omninomen / Parsnomen
Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats "RIM sued Verizon" as the same thing as "Verizon sued RIM".
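
The RIM / Verizon example can be reproduced directly: a bag-of-words view cannot tell the two sentences apart, while a subject-action-object triple can. This is a minimal sketch; `naive_triple` is a hypothetical helper that assumes a three-word sentence in subject, verb, object order, which stands in for the real parsing a linguistic engine would do.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    action: str
    obj: str


def bag_of_words(sentence):
    """Keyword view: word counts only, order discarded."""
    return Counter(sentence.lower().split())


def naive_triple(sentence):
    """Assumes exactly three words in subject-verb-object order (illustration only)."""
    subject, action, obj = sentence.split()
    return Triple(subject, action, obj)


a, b = "RIM sued Verizon", "Verizon sued RIM"

print(bag_of_words(a) == bag_of_words(b))   # True: keywords see no difference
print(naive_triple(a) == naive_triple(b))   # False: who sued whom is preserved
print(naive_triple(a))                      # Triple(subject='RIM', action='sued', obj='Verizon')
```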

Expert System Unique Feature #3
Categories of Attributes: every concept in the semantic network also contains attributes, which are organized into a hierarchy of categories. The attributes and categories are assigned to maximize similarities and differences between concepts as an aid in disambiguation.
Categories: object, animals, plants, people, concepts, places, time, natural phenomena, states, quantity, groups
Keyword / Statistical and Shallow Semantic Tech Fails Here: it can't tell you which portions of a document are related categorically, e.g. it only points to words, not to sections within a long document, as a first cut.

Thank you
Brooke Aker, CEO of Expert System US