Keyword Searching Weighted Federated Search with Key Word in Context Date: 10/2/2008 Dan McCreary President Dan McCreary & Associates


1 Keyword Searching: Weighted Federated Search with Key Word in Context
Date: 10/2/2008
Dan McCreary, President, Dan McCreary & Associates
dan@danmccreary.com, (952) 931-9198
Metadata Solutions

2 Acknowledgements
Copyright 2008 Dan McCreary & Associates
Joe Wicentowski wrote the original keyword search examples. Joe’s work was based on the KWIC code written by Wolfgang Meier.

3 Note About Example and Functions
In an actual production system the code would be modularized into a series of functions. This example has the functions intentionally removed to make the process easier to follow. A version organized into functions will also be available for students to use in their production applications.

4 Motivation
You have a large, complex web site with many heterogeneous data collections:
– people, blogs, news stories, event calendar, etc.
You want a single search function that will find any item in any of these collections.
Each item has a different:
– Collection
– Title
– Item Viewer Function

5 Heterogeneous Items in a Collection
Search results come back as heterogeneous items in a sequence. Each hit item has a different structure. Each hit item has a document type, and the title is consistently at the same XPath expression for each item type.
[Figure: a sequence of hit items (person, blog, country, person, blog), each containing a title element (t).]

6 Detailed Steps
1. Gather search keywords
2. Construct scope (collections)
3. Execute query (generate hits)
4. Score and sort
5. Prepare summary results for top hits
6. Display top results

7 Basic Search Algorithm (pseudo-code)
    let $q := get-parameter("q", "")
    for $hit in $collection-list/type[contains(., $q)]
    return $hit
1. Get the search query.
2. Find the documents that match. The predicate in [ ] works like the SQL WHERE clause.
3. Return a short summary of the matching documents.
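The three steps can also be sketched outside XQuery. The following is a minimal Python illustration, not the code used in the presentation: the in-memory `collections` dictionary, the `title` and `text` fields, and the function name `basic_search` are assumptions made for the example, and a plain substring test stands in for the full-text predicate.

```python
def basic_search(query, collections):
    """Return (collection, title) summaries for documents whose text contains query."""
    if not query:
        return []  # Step 1: an empty query matches nothing
    needle = query.lower()
    hits = []
    for name, documents in collections.items():
        for doc in documents:
            # Step 2: this filter plays the role of the [ ] predicate (SQL WHERE).
            if needle in doc["text"].lower():
                # Step 3: keep a short summary, not the whole document.
                hits.append((name, doc["title"]))
    return hits


# Hypothetical sample data standing in for two database collections.
collections = {
    "articles": [{"title": "Search basics", "text": "Keyword search over XML data."}],
    "people": [{"title": "Jane Doe", "text": "Jane writes about gardening."}],
}
print(basic_search("keyword", collections))
```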

8 Collection Paths and Predicate
    for $hit in (collection('/db/test/articles')/article/body,
                 collection('/db/test/people')/person/biography)[. &= $q]
In a production system the list of collections would be stored in an XML file, and a function would return a sequence of the collections.

9 Sample HTML Search Form
[The form markup did not survive in this transcript. What remains is the text "Keyword Search" (twice) and the field label "Keyword Search:".]
Callout: the path to the XQuery REST service that your form uses.

10 Protection against injection attacks
    let $q := xs:string(request:get-parameter("q", ""))
    let $filtered-q := replace($q, "[&amp;""\-*;`~!@#$%^()_+=\[\]\{\}\|':/.,?]", "")
This removes from the input query any special characters that could be used in SQL-injection-style attacks. Inside an XQuery string literal the ampersand is written &amp; and a double quote is written as two double quotes.
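The same filtering idea, sketched in Python rather than XQuery. The character class below is modeled on the one in the slide, and the names `UNSAFE_CHARS` and `filter_query` are hypothetical; treat this as an illustration of the technique, not a vetted security filter.

```python
import re

# Blacklist modeled on the slide's character class (an assumption, not a standard).
UNSAFE_CHARS = re.compile(r"""[&"\-*;`~!@#$%^()_+=\[\]{}|':/.,?]""")


def filter_query(raw):
    """Remove blacklisted characters and trim surrounding whitespace."""
    return UNSAFE_CHARS.sub("", raw).strip()


print(filter_query("federated' or '1'='1"))
```

A whitelist of permitted characters (for example letters, digits, and spaces) is often easier to reason about than a blacklist, because anything not explicitly allowed is dropped.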

11 Create a Scope Sequence
    let $scope := (
        collection('/db/test/articles')/article/body,
        collection('/db/test/people')/people/person/biography
    )
A scope is the list of all the items that you will query against. Note that we will usually replace this “inline” scope variable with a function, xrx:get-searchable-collections(), to search for all collections in the future.

12 Scoring Each Hit
    let $keyword-matches := text:match-count($hit)
    let $hit-node-length := string-length($hit)
    let $score := $keyword-matches div $hit-node-length
text:match-count() returns the number of keyword matches in a hit. If a document has five occurrences of the keywords, the match count is 5. Once you have the sequence of hits, you can score each hit and return a new sequence of the top-scoring hits. In the example above the score is the number of matches within the hit divided by the length of the hit (here, the number of characters in the hit node's text), so a short document with a few matches can outrank a long document with the same number of matches.
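A worked example of the matches-per-character score, as a Python sketch. The `match_count` function here is a simple whole-word counter written for illustration; it is an assumption and does not reproduce the exact behavior of text:match-count().

```python
import re


def match_count(text, keywords):
    """Count whole-word, case-insensitive occurrences of any keyword in text."""
    words = re.findall(r"\w+", text.lower())
    wanted = {k.lower() for k in keywords}
    return sum(1 for w in words if w in wanted)


def score(text, keywords):
    """Matches divided by text length in characters; 0.0 for empty text."""
    if not text:
        return 0.0
    return match_count(text, keywords) / len(text)


short = "XML search"                      # 1 match in 10 characters
long_ = "XML " + "padding " * 10 + "XML"  # 2 matches in 87 characters
print(score(short, ["xml"]), score(long_, ["xml"]))
```

The short text scores 1/10 and the long text 2/87, so the short text ranks higher even though it contains fewer matches.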

13 Score and Sort
    let $sorted-hits :=
        for $hit in $hits
        let $keyword-matches := text:match-count($hit)
        let $hit-node-length := string-length($hit)
        let $score := $keyword-matches div $hit-node-length
        order by $score descending
        return $hit
Once you have the sequence of hits, you can score each hit and return the hits sorted with the top-scoring hits first.
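The ordering step can be sketched in Python as a sort on a computed key. The tuple layout (title, matches, length), the sample numbers, and the `limit` parameter are assumptions made for the example.

```python
def rank_hits(hits, limit=10):
    """Sort (title, matches, length) tuples by matches/length, highest first."""
    def score(hit):
        _title, matches, length = hit
        return matches / length if length else 0.0

    return sorted(hits, key=score, reverse=True)[:limit]


# Hypothetical hits: title, keyword matches, length in characters.
hits = [
    ("Long article", 5, 5000),  # score 0.001
    ("Short bio", 2, 400),      # score 0.005
    ("Blog post", 3, 1500),     # score 0.002
]
for title, _matches, _length in rank_hits(hits, limit=2):
    print(title)
```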

14 Result Pagination
    let $perpage := xs:integer(request:get-parameter("perpage", "10"))
    let $start := xs:integer(request:get-parameter("start", "0"))
    let $end := $start + $perpage
    let $results :=
        for $hit in subsequence($sorted-hits, $start + 1, $perpage)
The remainder of our example deals with iterating through the results N records at a time, where N is the number of results per page ($perpage). In this case $perpage and $start are both optional parameters to our search query. $end is the sum of the start and the number per page. Because XQuery sequences are numbered from 1 and $start defaults to 0, the page consists of items $start + 1 through $end; the subsequence() call on the sorted hits selects exactly those items to produce the final $results sequence to display on the page. The body of the for loop, which builds each displayed result, is covered on the following slides.
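A small Python sketch of the paging window, treating `start` as a zero-based offset as the "0" default suggests. The function name `page_of` and the sample hit labels are hypothetical. Note that Python slices are zero-based and exclude the end index, whereas XQuery sequences are numbered from 1.

```python
def page_of(sorted_hits, start=0, perpage=10):
    """Return the hits for one page; start is a zero-based offset."""
    if start < 0 or perpage < 1:
        raise ValueError("start must be >= 0 and perpage must be >= 1")
    end = start + perpage
    # Slicing past the end of the list is safe: the last page is simply shorter.
    return sorted_hits[start:end]


hits = [f"hit-{n}" for n in range(1, 26)]  # 25 hypothetical hits
first = page_of(hits)                      # hit-1 through hit-10
last = page_of(hits, start=20)             # hit-21 through hit-25
print(len(first), len(last))
```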

15 Showing Results With Highlighted Keyword in Context
We want to show each result as an HTML div element containing 3 components:
– the document title
– a summary with an excerpt of the hit showing the keywords highlighted in context
– a link to display the full document

16 Extracting the Collection and Document
    let $collection := util:collection-name($hit)
    let $document := util:document-name($hit)
We did not need to keep track of the original collection and document that each hit came from, because we can always recover them from the hit itself using these two functions.
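To illustrate the idea only: a stored resource path can be thought of as a collection path plus a document name. The Python sketch below splits such a path; the file name `article-1.xml` and the function name are hypothetical, and the exact strings that the two eXist functions return should be checked against the eXist documentation.

```python
import posixpath


def split_resource_path(path):
    """Split a database resource path into (collection path, document name)."""
    return posixpath.split(path)


collection, document = split_resource_path("/db/test/articles/article-1.xml")
print(collection, document)
```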

17 KWIC Functions
    let $summary := kwic:summarize($hit, $config)

18 Displaying the Keyword in Context
The word or words you used in your search should be highlighted in the context of the search results. You can customize how much of the surrounding text you want to display.
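The keyword-in-context idea can be sketched in a few lines of Python. This is an illustration of the technique, not the kwic module's implementation: the function name `kwic`, the `width` parameter (how much surrounding text to show), and the use of a `<b>` element for highlighting are all assumptions.

```python
import html
import re


def kwic(text, keyword, width=30):
    """Return the first match of keyword with up to `width` characters of
    context on each side, HTML-escaped, with the keyword wrapped in <b>."""
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None
    left = text[max(0, match.start() - width):match.start()]
    right = text[match.end():match.end() + width]
    return "{}<b>{}</b>{}".format(
        html.escape(left), html.escape(match.group(0)), html.escape(right)
    )


text = "Native XML databases store documents and support full-text search."
print(kwic(text, "documents", width=10))
```

A fuller version would cut the context at word boundaries and report every match rather than only the first.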

19 Calculating number of pages
    let $perpage := xs:integer(request:get-parameter("perpage", "10"))
    let $start := xs:integer(request:get-parameter("start", "0"))
    let $total-result-count := count($hits)
    (: last item shown on this page, never past the final hit :)
    let $end := min(($start + $perpage, $total-result-count))
    (: round up so that a partial last page is counted :)
    let $number-of-pages := xs:integer(ceiling($total-result-count div $perpage))
    let $current-page := xs:integer(($start + $perpage) div $perpage)
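The arithmetic can be checked with a short Python sketch. The function name `paging_info` is hypothetical, `start` is treated as a zero-based offset, and clamping `end` to the total on every page is an assumption about the intended behavior.

```python
import math


def paging_info(total, start=0, perpage=10):
    """Return (end, number_of_pages, current_page) for a result set."""
    if perpage < 1:
        raise ValueError("perpage must be >= 1")
    end = min(start + perpage, total)             # never past the final hit
    number_of_pages = math.ceil(total / perpage)  # a partial last page counts
    current_page = start // perpage + 1           # same as (start + perpage) div perpage, truncated
    return end, number_of_pages, current_page


# 25 hits, 10 per page, third page: items 21 to 25, page 3 of 3.
print(paging_info(25, start=20))
```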

20 Managing Federated Search
Each application you use needs to communicate the following items to the federated search tool:
– Collection name
– Collection data path
– Collection document path
– Collection title path
– Collection id path
– Collection viewer path

21 Sample App Config File
[The XML markup of the sample file did not survive in this transcript; only the values remain. They appear in the same order as the six items on the previous slide:]
– Articles
– /db/test/articles
– article/body
– article/title/text()
– article/id/text()
– /db/test/articles/views/view-article.xq?id=
If you create a file called app-info.xml in each collection that you want to search, you can dynamically create a list of the applications to search. If you do this, you can automate the installation of interoperable applications.
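A sketch of how such a per-collection file might be read, in Python. The element names (`app-info`, `name`, `data-path`, and so on) are hypothetical, since the original markup is not available; only the six values come from the slide. The helper names are also assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical app-info.xml; element names are invented for illustration.
APP_INFO = """
<app-info>
  <name>Articles</name>
  <data-path>/db/test/articles</data-path>
  <document-path>article/body</document-path>
  <title-path>article/title/text()</title-path>
  <id-path>article/id/text()</id-path>
  <viewer-path>/db/test/articles/views/view-article.xq?id=</viewer-path>
</app-info>
"""


def load_app_info(xml_text):
    """Parse one app-info document into a dict keyed by element name."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def viewer_url(info, item_id):
    """Build the link to the full document by appending the URL-encoded id."""
    return info["viewer-path"] + quote(item_id, safe="")


info = load_app_info(APP_INFO)
print(viewer_url(info, "42"))
```

Collecting one such dictionary per searchable collection gives the federated search tool everything it needs to query each collection and link each hit to its viewer.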

22 Thank You!
Please contact me for more information:
– Native XML Databases
– Metadata Management
– Metadata Registries
– Service Oriented Architectures
– Business Intelligence and Data Warehouse
– Semantic Web
Dan McCreary, President
Dan McCreary & Associates
Metadata Strategy Development
dan@danmccreary.com
(952) 931-9198

