20/07/2000, HYPERGEO 1st technical verification, ARISTOTLE UNIVERSITY OF THESSALONIKI: NATURAL LANGUAGE REQUEST ANALYSIS COMPONENT


1 NATURAL LANGUAGE REQUEST ANALYSIS COMPONENT

2 Goal: to develop a document retrieval system based on statistical natural language processing.
Steps implemented:
- Corpus acquisition
- Corpus preprocessing and feature extraction
- Creation of word category map
- Development of baseline document retrieval system

3 Hypergeo Corpus profile

4 Hypergeo Corpus Themes

5 Corpus processing and feature vector extraction
Corpus Processing:
- Corpus checking & merging
- Text processing (HTML & plain text cleaning)
- Stemming (Porter with stop-list)
Feature Vector Extraction:
- Stem frequencies computation
- Vocabulary construction
- Bigram generation
- Manipulation of stem frequencies & bigram files
- Collection of contextual statistics (average context vector)
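The feature vector extraction steps listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes the documents have already been cleaned, stop-listed and stemmed, and it assumes that the "average context vector" of a stem is the relative frequency of its left and right neighbours over the vocabulary. The function names are hypothetical.

```python
from collections import Counter

def stem_frequencies(docs):
    """Count how often each stem occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts

def build_vocabulary(freqs, min_count=1):
    """Keep stems seen at least min_count times, in sorted order."""
    return sorted(stem for stem, count in freqs.items() if count >= min_count)

def bigram_counts(docs):
    """Count adjacent stem pairs within each document."""
    counts = Counter()
    for doc in docs:
        counts.update(zip(doc, doc[1:]))
    return counts

def average_context_vectors(bigrams, vocab):
    """Assumed definition: for each stem, concatenate the relative
    frequencies of its left neighbours and of its right neighbours."""
    index = {stem: i for i, stem in enumerate(vocab)}
    size = len(vocab)
    vectors = {stem: [0.0] * (2 * size) for stem in vocab}
    for (left, right), count in bigrams.items():
        if left not in index or right not in index:
            continue  # skip pairs that fall outside the vocabulary
        vectors[right][index[left]] += count         # left context of `right`
        vectors[left][size + index[right]] += count  # right context of `left`
    for vec in vectors.values():
        for start in (0, size):
            total = sum(vec[start:start + size])
            if total:
                for i in range(start, start + size):
                    vec[i] /= total
    return vectors
```

For example, with the stemmed documents `[["museum", "histor", "citi"], ["museum", "histor", "tour"]]`, the stem `histor` always follows `museum` and is followed equally often by `citi` and `tour`, and its context vector reflects exactly that.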

6 Word category map (first results)
Word Category Map Creation:
- Fast winner search
- Random projections for dimensionality reduction
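The two techniques named on this slide can be illustrated with a small sketch. The slide does not say which speed-up the "fast winner search" uses; the version below assumes one common shortcut, namely searching only the map nodes near a previously found winner, and falls back to an exhaustive search otherwise. The random projection uses a Gaussian random matrix, which is also an assumption. All names are hypothetical.

```python
import random

def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian random matrix mapping in_dim-dimensional vectors to out_dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def project(matrix, vector):
    """Reduce dimensionality by multiplying with the random matrix."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def find_winner(codebook, vector, near=None, radius=1):
    """Return the (row, col) node whose model vector is closest to `vector`.

    codebook maps (row, col) -> model vector. If `near` is given, only nodes
    within `radius` grid steps of it are examined (assumed shortcut).
    """
    if near is None:
        candidates = list(codebook)
    else:
        candidates = [
            node for node in codebook
            if abs(node[0] - near[0]) <= radius and abs(node[1] - near[1]) <= radius
        ] or list(codebook)  # fall back to a full search if nothing is nearby
    return min(candidates, key=lambda node: squared_distance(codebook[node], vector))
```

Restricting the search to a neighbourhood trades exactness for speed: the node returned is the best among the candidates examined, which is not guaranteed to be the best on the whole map.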

7 Future objectives; Work done so far

8 [Figure: axis tick values 256, 300, 350, 400, 450, 500, 512]

10 Some of the characteristic nodes on the network
(23,14) node: backpack born calahorra citi highland huelva marri mountain nice pseudo student victor
(24,15) node: anaya andov basqu beauti build doric extremadura fu goddess ibiza magic monasterio pulpit stone visit
(25,15) node: arrebatacapa bedroom danc mobil orlean peplo street therapeut torno tour
(22,4) node: artwork impoverish museo personalis rainer spindleruv tourism veranda vouli
(1,13) node: clarksburg divers glasgow hump monasteri part portug roman romn tel
(11,15) node: abdul arco belmaco creme fresco got gothic liter masstourist mediev mourn palac riunion rua splendid
(13,14) node: american central montreal perch process produc raymond romant scienc serv triumphal unusu upgrad venic
(9,13) node: burjasot catacomb centr cid comfort culmin histor miss novemb pelagio piedra pride silesia stit

11 Baseline document retrieval system
- Statistics: collection frequency, term frequency and document length.
- Query terms are given by the user.
- Stemming of the query terms (Simple and Porter stemmers).
- Lookup of each query term in the structure that holds the combined weight of each term-document pair.
- Document score calculation: sum of the combined weights of all the query terms in the specific document.
- Document ranking, as chosen by the user:
  a. according to the estimated score, or
  b. according to i) the number of query terms that appear in the document and ii) the estimated score.
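A rough sketch of the scoring and ranking steps above follows. The slide names the three statistics and a "combined weight" but gives no formula, so the sketch assumes the combined weight in the Robertson and Sparck Jones form, with the tuning constants `k1` and `b` chosen arbitrarily. Query terms are assumed to be stemmed already, and all function names are hypothetical.

```python
import math
from collections import Counter

def build_index(docs, k1=2.0, b=0.75):
    """docs: doc_id -> list of stems. Returns term -> {doc_id: combined weight}.

    Assumed formula: CW = CFW * TF * (k1 + 1) / (k1 * ((1 - b) + b * NDL) + TF)
    with CFW = log(N) - log(n) and NDL = document length / average length.
    """
    n_docs = len(docs)
    avg_len = sum(len(stems) for stems in docs.values()) / n_docs
    tf = {doc_id: Counter(stems) for doc_id, stems in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # number of documents containing each term
    index = {}
    for doc_id, counts in tf.items():
        ndl = len(docs[doc_id]) / avg_len
        for term, freq in counts.items():
            cfw = math.log(n_docs) - math.log(df[term])
            cw = cfw * freq * (k1 + 1) / (k1 * ((1 - b) + b * ndl) + freq)
            index.setdefault(term, {})[doc_id] = cw
    return index

def score(index, query_stems):
    """Sum combined weights per document and count matching query terms."""
    scores, matches = Counter(), Counter()
    for term in dict.fromkeys(query_stems):  # each distinct query term once
        for doc_id, cw in index.get(term, {}).items():
            scores[doc_id] += cw
            matches[doc_id] += 1
    return scores, matches

def rank(scores, matches, by_match_count=False):
    """Option a: by score. Option b: by number of matched terms, then score."""
    if by_match_count:
        return sorted(scores, key=lambda d: (-matches[d], -scores[d], d))
    return sorted(scores, key=lambda d: (-scores[d], d))
```

The two ranking options differ only when a document matching fewer query terms has a higher score than one matching more; option b then places the document with more matched terms first.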

12 Recall – Precision Graph for the query “museum”
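The points of a recall-precision graph like the one on this slide can be computed from a ranked result list and a set of documents judged relevant. The sketch below is a generic illustration with hypothetical names; the relevance judgements used in the original evaluation are not given in the transcript.

```python
def recall_precision_points(ranked_doc_ids, relevant_ids):
    """Return (recall, precision) at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    points = []
    hits = 0
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            recall = hits / len(relevant)   # share of relevant documents found so far
            precision = hits / position     # share of retrieved documents that are relevant
            points.append((recall, precision))
    return points
```

Plotting precision against recall for these points gives the curve; a ranking that places all relevant documents first keeps precision at 1.0 until recall reaches 1.0.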

