Facilitating Semantic Web Search with Embedded Grammar Tags (EGTs)
Gautham K. Dorai, Yaser Yacoob
Department of Computer Science
University of Maryland – College Park

The Future – A Forecast
[Figure labels: Speech Grammar based Search Engine; "What is the value of Nasdaq today?"; WWW; "The value is !!"; "???"]

Roadmap
- Forecast
- Problem Statement and Our Solution
- Related Work
- Demonstration
- Summary
- Future Work

Problem Statement (1)
- Web content is represented for human consumption
- Software agents do not have interpretive tools for semantic information recovery
- Hence agents cannot understand web content

Problem Statement (2)
- Web content does not support queries by natural language interaction
- E.g., for the query "What is the weather at College Park", searches lead only to links to pages with related subject content

Our Solution (1)
- We embed natural language queries in the web content
- Embedded Grammar Tags (EGTs) represent queries in a general (parseable) format
- Relevant responses are discovered by EGT matching

EGT – The Big Picture
[Figure labels: HTML Page; EGT Annotation; Internet; Web Search Engines; EGT Search; QUERY; NLP]

Our Solution (2)
- EGT uses the general BNF grammar format to represent queries
- E.g.: * [is] the temperature [is] at College Park
- Captures queries such as
  - What is the temperature at College Park?
  - Tell me what the temperature is at College Park?

Our Solution (3)
- EGT structure:
  - * can be replaced by any word or set of words
  - () mandatory words for an EGT match
  - [] optional words
- Web content is annotated with EGTs
- E.g.: * [is] the weather [is] at College Park [is] mostly sunny

Our Solution (4)
- EGTs, more examples:
  - Wind: * [is the] [wind] (speed|velocity) [of the] [wind] [is] at [is] 3mph
  - * [is] [Nasdaq *] [the] (value|quote|price) [of Nasdaq] *

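The EGT notation above can be illustrated with a minimal Python sketch that translates an EGT into a regular expression and tests queries against it. This is not the authors' implementation. The function names (`egt_to_regex`, `normalise`, `egt_matches`) are hypothetical, and several semantics are assumptions: `*` is taken to match zero or more words, `|` inside `()` is read as a choice between alternatives (inferred from the examples), matching ignores case and punctuation, and brackets are not nested.

```python
import re

# One EGT element: [optional], (mandatory|alternatives), *, or a plain word.
_ELEMENT = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|(\*)|([^\s\[\]()*|]+)")


def egt_to_regex(egt):
    """Translate an EGT into a regex over lowercase words.

    Every word in the regex consumes one trailing space, so the query
    must be normalised to space-terminated words before matching.
    """
    parts = []
    for optional, group, star, word in _ELEMENT.findall(egt):
        if star:
            parts.append(r"(?:\S+ )*")  # assumption: zero or more words
        elif word:
            parts.append(re.escape(word.lower()) + " ")
        elif optional:
            parts.append("(?:" + egt_to_regex(optional) + ")?")
        elif group:
            alternatives = [egt_to_regex(a) for a in group.split("|")]
            parts.append("(?:" + "|".join(alternatives) + ")")
    return "".join(parts)


def normalise(query):
    """Lowercase the query, drop punctuation, end every word with a space."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    return "".join(w + " " for w in words)


def egt_matches(egt, query):
    """Return True if the whole query is accepted by the EGT."""
    return re.fullmatch(egt_to_regex(egt), normalise(query)) is not None


TEMPERATURE_EGT = "* [is] the temperature [is] at College Park"
NASDAQ_EGT = "* [is] [Nasdaq *] [the] (value|quote|price) [of Nasdaq] *"

print(egt_matches(TEMPERATURE_EGT, "What is the temperature at College Park ?"))
print(egt_matches(TEMPERATURE_EGT, "Tell me what the temperature is at College Park ?"))
print(egt_matches(NASDAQ_EGT, "What is the value of Nasdaq today ?"))
print(egt_matches(NASDAQ_EGT, "What is the weather at College Park"))
```

Under these assumptions the first three queries are accepted and the last is rejected, because the Nasdaq EGT requires one of the mandatory words value, quote, or price.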
Related Work (1)
- Natural Language Processing (NLP): attempts to uncover meaning in HTML content
- DAML, RDF, SHOE, XML: add metadata to describe the web content
- Facilitate more efficient content search

Related Work (2)
- E.g.: Gautham Dorai - special tags (, ) to describe content
- Fine-grained natural language queries on the content require an expandable, universally available tag database

Why EGTs? (1)
- RDF triples can also be used, e.g.:
  - College Park - [weatherat] mostly sunny
  - Nasdaq - [quoteof]
- But EGTs are naturally expandable and more amenable to change
- EGTs reduce search engine complexity

Why EGTs? (2)
- EGTs describe content in an unconstrained format
- EGTs are already present in speech recognition technology
- Ease of transition from visual phone browsers

Demonstration
- We annotate a given home page with EGTs
- The user can query the content in natural language
- The search engine parses the web page for an EGT match and responds

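The demonstration flow (annotate a page, query it in natural language, respond from the matching EGT) might be sketched in Python as follows. The markup is an assumption: the slides do not show the actual tag syntax, so a hypothetical `<egt pattern="...">` element wraps the annotated content here. The names `EGTPageParser` and `answer` are also hypothetical, the matcher is a simplified regex translation (with `*` assumed to match zero or more words), and the temperature value on the sample page is invented for illustration.

```python
import re
from html.parser import HTMLParser

_ELEMENT = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)|(\*)|([^\s\[\]()*|]+)")


def egt_to_regex(egt):
    """Translate an EGT into a regex over space-terminated lowercase words."""
    out = []
    for optional, group, star, word in _ELEMENT.findall(egt):
        if star:
            out.append(r"(?:\S+ )*")  # assumption: zero or more words
        elif word:
            out.append(re.escape(word.lower()) + " ")
        elif optional:
            out.append("(?:" + egt_to_regex(optional) + ")?")
        elif group:
            out.append("(?:" + "|".join(egt_to_regex(a) for a in group.split("|")) + ")")
    return "".join(out)


class EGTPageParser(HTMLParser):
    """Collect (pattern, text) pairs from hypothetical <egt pattern="..."> elements."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._pattern = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "egt":
            self._pattern = dict(attrs).get("pattern")
            self._text = []

    def handle_data(self, data):
        if self._pattern is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "egt" and self._pattern is not None:
            self.annotations.append((self._pattern, "".join(self._text).strip()))
            self._pattern = None


def answer(page_html, query):
    """Return the content annotated by the first EGT that accepts the query."""
    parser = EGTPageParser()
    parser.feed(page_html)
    parser.close()
    words = re.findall(r"[a-z0-9']+", query.lower())
    normalised = "".join(w + " " for w in words)
    for pattern, text in parser.annotations:
        if re.fullmatch(egt_to_regex(pattern), normalised):
            return text
    return None  # no EGT on this page matches the query


# Sample page; the temperature value "72 F" is made up for this sketch.
PAGE = """
<html><body>
<p>Weather: <egt pattern="* [is] the weather [is] at College Park [is]">mostly sunny</egt></p>
<p>Temperature: <egt pattern="* [is] the temperature [is] at College Park">72 F</egt></p>
</body></html>
"""

print(answer(PAGE, "What is the weather at College Park?"))
print(answer(PAGE, "Tell me what the temperature is at College Park"))
```

Returning the annotated text directly is the simplest possible response. A fuller system would presumably build a sentence around the extracted content.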
EGT Annotator (Preliminary)
- Create a template page that is EGT-ready, i.e., EGTs are transparent to the user
- The template is for home pages at the CS Dept.
- The user can simply copy information from the HTML page onto the annotator

Summary
- EGTs enable software agents to respond to natural language queries
- An EGT search engine can be implemented on top of conventional content search engines
- Responses are constructed from the information extracted from an EGT match

Future Work
- Expandable Universal Grammar: universally available query grammar packages
- EGT Recognition Metrics: statistical analysis to search for EGT matches
- EGT Crawler: a crawler that parses EGT-annotated web pages