XAIRA is an XML Aware Indexing and Retrieval Architecture


1 XAIRA is an XML Aware Indexing and Retrieval Architecture
● Developed from the British National Corpus Sara program, it provides:
– a platform-independent XML indexer and server components
– a toolkit and query interface for Windows
● All server functions and structures are defined by the Xaira Object Model: an open API for language corpus work
● Developed with funding from the Andrew W. Mellon Foundation, Xaira is an open source system released under the GPL
➔ http://xaira.sf.org

2 Yes, but what does it do?
● Xaira facilitates linguistic exploration of corpora with or without XML markup.
● Specifically, it produces
– word lists and lexica
– KWIC concordances
– collocation and colligation lists
– distribution patterns
● Intelligent search and retrieval, based on markup and marked-up structures

3 A word query
● Enter a stem or pattern
● See forms and frequencies
● See selected variants
● Look up hits
● Save wordlists
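To make the "enter a pattern, see forms and frequencies" step concrete, here is a minimal Python sketch. It is not Xaira's implementation or API: the names `tokenize` and `word_query` are hypothetical, the tokenizer is deliberately naive, and shell-style wildcards stand in for whatever pattern syntax Xaira actually accepts.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (naive assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_query(tokens, pattern):
    """Return (form, frequency) pairs for forms matching a shell-style pattern.

    Results are ordered by descending frequency, then alphabetically.
    """
    counts = Counter(t for t in tokens if fnmatchcase(t, pattern))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative sample text, not taken from any corpus.
sample = "She walks to work. He walked home. They walk and walk; walking is free."
forms = word_query(tokenize(sample), "walk*")
for form, freq in forms:
    print(f"{form}\t{freq}")
```

A real indexer would answer such a query from a prebuilt index rather than by rescanning the text; the sketch only shows the shape of the result (forms paired with frequencies).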

4 A Keyword in Context concordance
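As a rough illustration of what a Keyword in Context (KWIC) concordance contains, the Python sketch below lines up each hit with a few words of left and right context. The function name `kwic`, the `width` parameter and the sample sentence are assumptions made for this example; this is not how Xaira itself builds concordances.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, hit, right) tuples with up to `width` words of context per side."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Illustrative sample text, not taken from any corpus.
sample = "The weather was pretty cold, and the walk home felt pretty long."
hits = kwic(sample, "pretty", width=2)
for left, hit, right in hits:
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>20}  {hit}  {right}")
```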

5 A collocation list
● After any query...
● See word forms that collocate with it
● Adjust window, filter by frequency, select score
● Look up collocates
● Save wordlists
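The window, frequency filter and score settings can be pictured with a small Python sketch. Everything here is an assumption for illustration: the function name `collocates`, its parameters, and the choice of a simple pointwise-mutual-information variant as the score. The slides do not say which scores Xaira offers or how it computes them.

```python
import math
import re
from collections import Counter


def collocates(tokens, node, window=2, min_freq=1):
    """Score words that co-occur with `node` within `window` tokens on either side.

    Returns (collocate, co-occurrence count, score) tuples, best score first.
    The score is one possible choice, a simple PMI variant:
    log2(cooc * N / (freq(node) * freq(collocate))).
    """
    freq = Counter(tokens)
    n = len(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[tokens[j]] += 1
    rows = []
    for word, count in cooc.items():
        if count >= min_freq:  # frequency filter
            score = math.log2(count * n / (freq[node] * freq[word]))
            rows.append((word, count, score))
    return sorted(rows, key=lambda r: (-r[2], -r[1], r[0]))


# Illustrative sample text, not taken from any corpus.
sample = "it was pretty good and pretty cheap but rather slow and rather good"
tokens = re.findall(r"[a-z]+", sample.lower())
table = collocates(tokens, "pretty", window=1)
for word, count, score in table:
    print(f"{word}\t{count}\t{score:.2f}")
```

Calling the same function with "rather" as the node gives a second table that can be compared against the first.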

6 Adverbial "pretty" vs. "rather"

7 Let's get started!
(1) Start the xaira-tools program
(2) Select Index Wizard from the File menu
(3) Name your corpus
(4) Find some plain text files
(5) Index them
(6) Look at the results with the Xaira client

8 What did we just do?
[Diagram labels: Texts; corpus wizard; filelist.xml, bib.xml, corpus_header.xml, corpus_parameters.xml; indexer; Index]

