1
Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering
Andrew Zitzelberger
2
Problem
3
Constraint Based Queries
4
Queries
Test Queries
1) Find me a Wii game.
2) Find me a Honda for under 15 thousand dollars.
3) Roller Coaster more than 150 feet high
4) mountains at least 15K feet
5) games under $25
6) mountains less than 4 km
7) ps games < $40
8) coasters longer than 1000 feet
9) car for under 5 grand newer than 1990 with less than 115K miles
10) more than 15K miles under 5 grand newer than 2004
5
Keywords + Semantics
Semantic queries are computationally expensive
Keyword queries are fast and simple
o People are used to keyword queries
Synergistic solution:
o extract numerical constraints from the query
o use keywords to quickly narrow the search space
o use constraints as a filter
6
Data Frames
Price
  internal representation: Double
  external representation: \$[1-9]\d{0,2}(,\d{3})*|......
    right units: (K)?\s*(cents|dollars|[Gg]rand|...)
    canonicalization method: toUSDollars
  comparison methods:
    LessThan(p1: Price, p2: Price) returns (Boolean)
      external representation: (less than|<|under|...)\s*{p2}|......
end
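To make the Price data frame concrete, here is a minimal Python sketch of its value recognizer, canonicalization, and comparison. The names `PRICE`, `to_us_dollars`, and `less_than` are hypothetical stand-ins for the slide's external representation, `toUSDollars`, and `LessThan`. The slide elides part of each pattern, so the regular expression below is an assumption that covers only the alternatives visible on the slide.

```python
import re

# Assumed pattern: optional "$", a number, optional "K", optional unit.
# The slide's own patterns are truncated ("..."), so this is illustrative only.
PRICE = re.compile(
    r"(?P<dollar>\$)?"
    r"(?P<num>\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)"
    r"\s*(?P<k>K)?\s*(?P<unit>cents|dollars|[Gg]rand)?"
)


def to_us_dollars(text):
    """Canonicalize a price string to a float in US dollars, or None."""
    m = PRICE.fullmatch(text.strip())
    # A bare number such as "1990" is not treated as a price.
    if m is None or not (m.group("dollar") or m.group("unit")):
        return None
    value = float(m.group("num").replace(",", ""))
    if m.group("k"):
        value *= 1_000
    unit = m.group("unit")
    if unit == "cents":
        value /= 100
    elif unit in ("grand", "Grand"):
        value *= 1_000
    return value


def less_than(p1, p2):
    """Comparison method over canonical (internal) price values."""
    return p1 < p2
```

For example, `to_us_dollars("6 grand")` and `to_us_dollars("$6,000")` canonicalize to the same internal value, which is what lets a query phrased one way match a page phrased another way.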
7
Data Frame Library
8
Free Form Query
Car under 6 grand newer than 1990 with less than 115K miles
9
Step 1: Condition Extraction
Car under 6 grand newer than 1990 with less than 115K miles
Extracted Conditions
o (Price < 6000)
o (Year > 1990)
o (Distance < 115000)
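Condition extraction for this query could be sketched as follows. `RECOGNIZERS` and `extract_conditions` are hypothetical names, and the patterns are assumptions that cover only the phrasings in the example query; the actual data frame library is not shown on the slides.

```python
import re

# Hypothetical recognizers: (frame name, operator, pattern).
# Each pattern pairs a comparison phrase with a value and its unit.
RECOGNIZERS = [
    ("Price", "<",
     r"(?:less than|<|under)\s*\$?(?P<v>\d[\d,]*)\s*(?P<k>K)?\s*(?P<u>dollars|grand)\b"),
    ("Year", ">",
     r"newer than\s*(?P<v>(?:19|20)\d{2})\b"),
    ("Distance", "<",
     r"(?:less than|<|under)\s*(?P<v>\d[\d,]*)\s*(?P<k>K)?\s*(?P<u>miles)\b"),
]


def extract_conditions(query):
    """Return (frame, operator, canonical value) triples in query order."""
    found = []
    for frame, op, pattern in RECOGNIZERS:
        for m in re.finditer(pattern, query, flags=re.IGNORECASE):
            groups = m.groupdict()
            value = float(groups["v"].replace(",", ""))
            if groups.get("k"):  # "115K" means 115 thousand
                value *= 1000
            if (groups.get("u") or "").lower() == "grand":
                value *= 1000
            found.append((m.start(), frame, op, value))
    found.sort()  # order by position in the query
    return [(frame, op, value) for _, frame, op, value in found]


conditions = extract_conditions(
    "Car under 6 grand newer than 1990 with less than 115K miles"
)
```

Because the unit is part of each pattern, "less than 115K miles" is recognized as a Distance condition rather than a Price condition, even though both frames share the phrase "less than".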
10
Step 2: Remove Condition Values
Car under newer than with less than
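One way to remove the condition values is to cut their character spans out of the query. In this sketch, `remove_spans` is a hypothetical helper, and the spans are located by searching for the value strings directly; a real extractor would presumably report the spans itself.

```python
def remove_spans(text, spans):
    """Delete the given (start, end) character spans and tidy whitespace."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])
        cursor = end
    pieces.append(text[cursor:])
    # Collapse the gaps left behind by the removed values.
    return " ".join("".join(pieces).split())


query = "Car under 6 grand newer than 1990 with less than 115K miles"
# Assumption: these value strings were identified during condition extraction.
values = ["6 grand", "1990", "115K miles"]
spans = [(query.index(v), query.index(v) + len(v)) for v in values]
stripped = remove_spans(query, spans)
```

Only the values are removed at this stage; the comparison phrases ("under", "newer than", "less than") remain until stopword removal.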
11
Step 3: Remove Stopwords
Car
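Stopword removal can be sketched as a simple set lookup. The slides do not list the stopwords used, so `STOPWORDS` below is an assumed list, chosen so that the comparison phrases left over from the previous step are dropped.

```python
# Assumed stopword list; the actual list is not given in the slides.
STOPWORDS = frozenset({
    "a", "an", "the", "for", "me", "find", "with", "at",
    "than", "under", "over", "less", "more", "least",
    "newer", "older",
})


def remove_stopwords(text):
    """Return the remaining keywords, preserving their order and case."""
    return [word for word in text.split() if word.lower() not in STOPWORDS]


keywords = remove_stopwords("Car under newer than with less than")
```

The surviving keywords are what gets passed to the keyword search in the next step.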
12
Step 4: Perform Keyword Search
13
Step 5: Filter Document on Constraints
Keep page if every constraint is satisfied by at least one extracted value.
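The filtering rule can be written as an "all constraints, any value" check. In this sketch, `keep_page` is a hypothetical name, and a page is assumed to be represented as a mapping from data frame name to the canonical values extracted from it.

```python
import operator

OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def keep_page(constraints, extracted):
    """Keep a page if every constraint is met by at least one extracted value.

    constraints: list of (frame, op, bound) triples from the query.
    extracted:   dict mapping frame name to values found on the page.
    """
    return all(
        any(OPS[op](value, bound) for value in extracted.get(frame, []))
        for frame, op, bound in constraints
    )


constraints = [("Price", "<", 6000), ("Year", ">", 1990), ("Distance", "<", 115000)]
good_page = {"Price": [5500.0], "Year": [1998], "Distance": [98000]}
old_car = {"Price": [7500.0, 5000.0], "Year": [1985], "Distance": [98000]}
```

Here `keep_page(constraints, good_page)` keeps the page, while `old_car` is rejected because no extracted year is newer than 1990. A page with no extracted values for a constrained frame is also rejected, and a query with no constraints keeps every page, which reduces to plain keyword search.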
14
Experimental Setup
300 web documents
o 100 car+trucks pages from http://provo.craigslist.org
o 100 video gaming pages from http://provo.craigslist.org
o 50 mountain pages from http://en.wikipedia.org
o 50 roller coaster pages from http://en.wikipedia.org
10 queries
o 8 with usable conditions
2 data sets
o test-development
o blind test
15
Results Summary
Precision increase for 56% of queries
o 75% for test-dev, 50% for blind-test
Precision never worse than keyword query
Most effective for short, focused documents
16
Discussion
Issues:
1. inadequate narrowing or ranking of search space
2. noise caused by other numbers
Distance < 115000
17
Future Work
Scalability
o Indexing data frame extracted terms
Precision vs Recall trade-offs
Pay-as-you-go search construction
18
Related Work
Question-Answering Systems
Keyword search over databases and semantic stores
19
Questions?
20
Results (Test-Dev Set)
21
Results (Blind Test Set)