Published by Cathleen Simon; modified over 9 years ago
1
Powerset
Natural Language and the Semantic Web
Barney Pell, Ph.D.
Founder and CTO
ISWC 2007
2
© 2007 Powerset
Outline
Natural language helps the semantic web
Powerset natural language search
Demos
Vision for ecosystem acceleration
3
Semantic Web Chicken and Egg
Semantic markup and resources are costly
People will invest in them at scale only when there are valuable applications
Applications are not viable without semantic markup and resources
Hence, the semantic web is slow to realize its potential
4
NLP Addresses the Chicken and Egg
NLP reduces semantic development effort
  ► Create annotations from unstructured text
  ► Generate ontologies
Natural language search is a great application
  ► Consuming semantic web information
  ► Exposing semantic web services in response to natural language queries
5
Powerset: A natural language search company
Our goal is to enable people to interact with information and services as naturally and effectively as possible
We combine deep NL and scalable search technology
How we do it: natural language search
  ► Interpret the web
  ► Index
  ► Interpret the query
  ► Search… Match
Our system creates and uses Semantic Web information in multiple ways
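The four steps on this slide can be sketched as a toy pipeline. This is a minimal illustration under loud assumptions: `interpret`, `build_index`, `search`, and the tiny `LEMMAS`/`STOPWORDS` tables are hypothetical stand-ins, and lemmatized keyword matching is far shallower than the deep NLP the slides describe. It only shows where interpretation happens on each side: once over the documents at index time, and once over the query at search time.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for deep NLP: a tiny lemma table plus a stopword list.
LEMMAS = {"died": "die", "dies": "die", "dying": "die", "killed": "kill"}
STOPWORDS = {"the", "a", "of", "from", "did", "what", "who", "how"}


def interpret(text: str) -> set[str]:
    """'Interpret' a sentence or query: tokenize, lemmatize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index: map each interpreted term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in interpret(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Interpret the query, then match: return docs containing every query term."""
    terms = interpret(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {"d1": "Sir Edward Heath died from pneumonia.", "d2": "Heath was Prime Minister."}
index = build_index(docs)
print(search(index, "How did Heath die?"))  # {'d1'}
```

Because both sides pass through the same `interpret` function, "died" in the document and "die" in the query meet at the same index term.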
6
What’s next in web search?
Goal: matching query intents with document intents
Changes to the document model drive the largest innovations:
  ► Proximity: shift from “doc as bag-of-keywords” to “doc as vector-of-keywords”
  ► Anchor text: adding off-page text to the doc
What unexploited aspect of the document is next?
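To make the two document-model changes concrete, here is a minimal sketch. Both example documents contain the query terms, so a bag-of-keywords score cannot separate them, while a position-aware score can. The minimum covering window (`min_span`) is just one common proximity measure, assumed here for illustration, and `with_anchor_text` is a hypothetical helper that appends off-page anchor text to a document's tokens.

```python
from collections import Counter


def bag_score(doc_tokens, query_tokens):
    """Doc as bag-of-keywords: count query-term occurrences, ignoring order."""
    counts = Counter(doc_tokens)
    return sum(counts[q] for q in query_tokens)


def min_span(doc_tokens, query_tokens):
    """Doc as sequence of keywords: length of the smallest window containing
    every query term, or None if some term never appears (O(n^2) sketch)."""
    needed = set(query_tokens)
    best = None
    for start in range(len(doc_tokens)):
        seen = set()
        for end in range(start, len(doc_tokens)):
            if doc_tokens[end] in needed:
                seen.add(doc_tokens[end])
            if seen == needed:
                width = end - start + 1
                if best is None or width < best:
                    best = width
                break
    return best


def with_anchor_text(doc_tokens, inbound_anchors):
    """Anchor text: append tokens from links on other pages pointing at this doc."""
    return doc_tokens + [t for anchor in inbound_anchors for t in anchor.split()]


a = "new restaurants open in york".split()
b = "a restaurant in new york".split()
q = ["new", "york"]
print(bag_score(a, q), bag_score(b, q))  # 2 2
print(min_span(a, q), min_span(b, q))    # 5 2
```

A smaller span means the terms sit closer together, so document `b` would rank above `a` for the query "new york" even though their bag-of-keywords scores tie.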
7
Linguistic Structure
Documents are loaded with linguistic structure
Currently this is mostly discarded and ignored
  ► Largely unaddressed on the document side due to computational cost and complexity
  ► Most previous work focuses on the query only
But this structure has immense value! It is how the document’s intent is encoded.
8
Powerset’s Semantic Index
The semantic indexer cracks the linguistic structure to extract meaning
Applies deep NLP to the entire corpus to build a rich representation
This new kind of index is a platform for innovation that allows for greatly expanded capabilities
9
Natural language for consumer search
Economics
  ► Costs came down: Moore’s law, tech speedups
  ► Revenue came up: search ad monetization
User experience can be transformed with current NL capabilities
  ► Perfection not required
  ► Robust broad-coverage integrated systems
  ► Multilingual core platform, languages as plug-ins
Change user behavior
  ► Change to something easier
  ► Already happening (voice, mobile, Yahoo Answers, Shortcuts)
  ► Can give results even with today’s behavior
10
Converging trends
Language technology
  ► (Relatively) efficient and mature language computation (XLE)
  ► Large-scale grammars (ParGram)
Lexical and ontological knowledge resources
  ► Existing: WordNet, VerbNet, FrameNet, SUMO…
  ► Data-driven acquisition methods
Moore’s law
  ► Computing and storage cheap enough now, getting cheaper
Open source software
  ► Cluster management, map-reduce, big-table
  ► Effect: can now download Google/Yahoo core competencies
Commodity computing: Amazon EC2, S3, SDS
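Map-reduce, listed above among the open source building blocks, is one way to spread index construction over many commodity machines. The sketch below simulates its three phases (map, shuffle, reduce) in a single Python process to build an inverted index. It is not the API of Hadoop or any other framework; `map_doc`, `shuffle`, `reduce_postings`, and `build_inverted_index` are hypothetical names. On a real cluster the map and reduce calls would run in parallel on different machines, and the shuffle would move data over the network.

```python
from collections import defaultdict
from itertools import chain


def map_doc(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_postings(term, doc_ids):
    """Reduce: turn one term's doc ids into a sorted postings list."""
    return term, sorted(doc_ids)


def build_inverted_index(docs):
    """Run map over every document, shuffle, then reduce each term's group."""
    mapped = chain.from_iterable(map_doc(d, t) for d, t in docs.items())
    return dict(reduce_postings(k, v) for k, v in shuffle(mapped).items())


index = build_inverted_index({"d1": "Heath died", "d2": "Heath was Prime Minister"})
print(index["heath"])  # ['d1', 'd2']
```

The map and reduce functions only see one document or one term at a time, which is what lets the framework distribute them freely.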
11
Powerset reads each sentence
Sir Edward Heath died from pneumonia...
[Figure: parse of the example sentence. The verb die (expanded to killed) takes subj Sir Edward Heath (noun; name; expanded to UK Prime Minister, politician) and a from/by argument pneumonia (noun; expanded to disease).]
  ► Parses each sentence on the page
  ► Extracts entities & semantic relationships
  ► Identifies and expands to similar entities, relationships & abstractions
  ► Indexes multiple facts for each sentence
12
Multiple queries retrieve the same “facts”
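The idea behind these two slides, one parsed sentence indexed as many expanded facts so that differently worded queries land on it, can be sketched as follows. Assumptions: the parse is supplied by hand as a (subject, relation, object) triple rather than produced by a parser, and the `EXPANSIONS` table is a hypothetical stand-in for the lexical and ontological resources the slides mention. `index_sentence` and `lookup` are illustrative names, not Powerset's.

```python
from itertools import product

# Hypothetical expansion table standing in for lexical/ontological resources.
EXPANSIONS = {
    "Sir Edward Heath": {"UK Prime Minister", "politician"},
    "pneumonia": {"disease"},
    "die from": {"be killed by"},
}


def expand(term):
    """Return the term itself plus any similar entities, relations or abstractions."""
    return {term} | EXPANSIONS.get(term, set())


def index_sentence(index, sentence_id, subj, rel, obj):
    """Index every combination of expanded subject, relation and object,
    so one parsed sentence yields many retrievable facts."""
    for fact in product(expand(subj), expand(rel), expand(obj)):
        index.setdefault(fact, set()).add(sentence_id)


def lookup(index, subj=None, rel=None, obj=None):
    """Match a query triple against the index; None is a wildcard slot."""
    hits = set()
    for (s, r, o), ids in index.items():
        if subj in (None, s) and rel in (None, r) and obj in (None, o):
            hits |= ids
    return hits


index = {}
# The parse of "Sir Edward Heath died from pneumonia" is assumed, not computed.
index_sentence(index, "s1", "Sir Edward Heath", "die from", "pneumonia")
# Three differently worded queries retrieve the same sentence:
print(lookup(index, "politician", "be killed by", "disease"))  # {'s1'}
print(lookup(index, "UK Prime Minister", "die from"))          # {'s1'}
print(lookup(index, obj="pneumonia"))                          # {'s1'}
```

Expanding at index time trades storage (here 3 × 2 × 2 = 12 facts for one sentence) for simple exact-match lookups at query time.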
13
Integrating Diverse Resources
Powerset’s natural language technology enables integrated search results based on the answer to the query
Can tap into multiple different sources, e.g.:
Websites, newsfeeds, blogs, archives, metadata, video, podcasts, databases
[Screenshot labels: From web results; From Freebase; From Blinx]
14
NL Database Query (Entertainment)
Natural language query converted into a database query
Results from the database drive further engagement
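A toy version of converting a natural language query into a database query is shown below, using Python's built-in `sqlite3`. It assumes a hypothetical `movies` table with three illustrative rows and two hand-written question templates. A regular-expression template matcher is a deliberately simple substitute for the parsing-based interpretation the slides describe, and it fails on any phrasing the templates do not cover. The `?` placeholders keep user text out of the SQL string itself.

```python
import re
import sqlite3

# Hypothetical question templates, each mapped to a parameterized SQL query.
TEMPLATES = [
    (re.compile(r"^(?:what )?movies (?:were )?directed by (?P<arg>.+?)\??$", re.I),
     "SELECT title FROM movies WHERE director = ? ORDER BY year"),
    (re.compile(r"^who directed (?P<arg>.+?)\??$", re.I),
     "SELECT director FROM movies WHERE title = ?"),
]


def to_sql(question):
    """Convert a question to (sql, params) via the first matching template, else None."""
    for pattern, sql in TEMPLATES:
        m = pattern.match(question.strip())
        if m:
            return sql, (m.group("arg"),)
    return None


def answer(conn, question):
    """Run the generated query and return the first column of each row."""
    query = to_sql(question)
    if query is None:
        return []
    sql, params = query
    return [row[0] for row in conn.execute(sql, params)]


# Illustrative in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, director TEXT, year INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Alien", "Ridley Scott", 1979),
    ("Blade Runner", "Ridley Scott", 1982),
    ("Jaws", "Steven Spielberg", 1975),
])
print(answer(conn, "Movies directed by Ridley Scott?"))  # ['Alien', 'Blade Runner']
```

Each answer (for example a director's name) could itself seed the next query, which is one way results from the database can drive further engagement.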
15
Language Technologies: Parc ⇔ Powerset
[Figure: multidimensional, multilingual architecture from long-term research. Labels: processes Parse, Generate, Select, Transfer, Interpret; layers Transfer/Glue semantics, LFG syntax, finite-state morphology; languages English, French, Japanese, German, Chinese, Norwegian; applications smart summaries, meaning-sensitive applications, translation, question answering, consumer search, relation extraction, dialog, entity recognition, note-taking; foundations theory, algorithms, mathematics, engineering, software, Tableware, stochastic models, ambiguity management, scale, community knowledge resources; systems XLE, ParGram.]
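The figure pairs ambiguity management with stochastic models: a grammar often licenses several parses of one sentence, and a statistical model chooses among them. A common choice for that selection step is a log-linear (maximum-entropy) ranker, sketched below as an assumption rather than a description of XLE's internals. The feature names, weights, and the prepositional-phrase attachment example are invented for illustration.

```python
import math


def log_linear_probs(candidates, weights):
    """P(parse | sentence) is proportional to exp(sum_f w_f * f(parse)),
    normalized over the candidate parses of one sentence."""
    scores = {name: sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for name, feats in candidates.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def select_parse(candidates, weights):
    """Return the most probable parse and the full distribution."""
    probs = log_linear_probs(candidates, weights)
    return max(probs, key=probs.get), probs


# Hypothetical features for "She saw the man with the telescope":
# does "with the telescope" attach to the verb or to the noun?
candidates = {
    "verb_attach": {"pp_on_verb": 1.0, "instrument_with": 1.0},
    "noun_attach": {"pp_on_noun": 1.0},
}
weights = {"pp_on_verb": 0.5, "instrument_with": 1.0, "pp_on_noun": 0.8}
best, probs = select_parse(candidates, weights)
print(best)  # verb_attach
```

Keeping the full distribution, not just the winner, lets later stages fall back to a lower-ranked reading when the top parse fails.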
16
We parse the Web.
17
Natural Language search architecture
18
Also required…
Scalability
  ► Coordinated computing on thousands of machines
  ► Serving millions of users
  ► Managing processors, storage, communication
System integration
  ► Robust combination of complex components
Data resources
  ► For guidance, training, testing
  ► Parse banks, relevance and ranking sets…
Knowledge resources
  ► Lexical and conceptual mappings
  ► Facts, inferences
  ► Acquisition strategies
    ■ Manual, semi-automatic, automatic, community resources
User experience: create expectations, change behavior
19
Ecosystem acceleration
Wisdom of crowds can accelerate the semantic web
Publishers
  ► Upload ontologies to get more traffic
  ► Get feedback on improving their content
Users
  ► Play games to create and improve resources
  ► Provide feedback to get better search
  ► Create ontologies for personalization and groups
Developers
  ► Package knowledge for specialized apps
So what starts as a broad platform gets deeper faster
Realizing a semantic web faster than expected
20
Community involvement plans
Powerset Open Platform
  ► API
  ► Access to technology to build mashups and applications
Powerset contributions
  ► Datasets
  ► Annotations
  ► Open source software
Partnerships & collaborations
Powerlabs
  ► Early access to Powerset technology
  ► Tuning product concept & design
  ► High-value feedback from users eager to help
  ► Sign up at www.powerset.com
21
Thank you