Published by Anastasia Wiggins. Modified over 9 years ago.
1
Cross-language experiments with the IR-n system
CLEF-2003
Departamento de Lenguajes y Sistemas Informáticos
2
IR-n system: Introduction
- The IR-n system is a passage retrieval system.
- It participated in CLEF-2001 and CLEF-2002 in the Spanish monolingual task.
- This year we participated in three tasks:
  - Monolingual (Spanish, French, German, Italian)
  - Bilingual (Italian-Spanish)
  - Multilingual (4 languages)
3
Index
- Passage Retrieval Systems
- IR-n System
- Monolingual Experiments
- Bilingual Experiments
- Multilingual Experiments
- Conclusions
4
Passage Retrieval Systems
5
Relevance measures
Sample document:
General Custer was Civil War Union Major soldier. One of the most famous and controversial figures in United States Military history. Graduated last in his West Point Class (June 1861). Spent first part of the Civil War as a courier and staff officer. Promoted from Captain to Brigadier General of Volunteers just prior to the Battle of Gettysburg, and was given command of the Michigan "Wolverines" Cavalry brigade. He helped defeat General Stuart's attempt to make a cavalry strike behind Union lines on the 3rd Day of the Battle (July 3, 1863), thus markedly contributing to the Army of the Potomac's victory (a large monument to his Brigade now stands in the East Cavalry Field in Gettysburg). Participated in nearly every cavalry action in Virginia from that point until the end of the war, always performing boldly, most often brilliantly, and always seeking publicity for himself and his actions. Ended the war as a Major General of Volunteers and a Brevet Major General in the Regular Army. Upon Army reorganization in 1866, he was appointed Lieutenant Colonel of the soon to be renowned 7th United States Cavalry. Fought in the various actions against the Western Indians, often with a singular brutality (exemplified by his wiping out of a Cheyenne village on the Washita in November 1868). His exploits on the Plains were romanticized by Eastern United States newspapermen, and he was elevated to legendary status in his time. The death of his friend, Lucarelli changed his life.
Query: The death of General Custer
Query terms: General, Custer, death
Relevance is measured from the terms shared between the document and the query.
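The idea of shared terms can be sketched in a few lines of Python. This is only an illustration, not the IR-n implementation: the stop-word set, the tokenizer and the function names are assumptions, and the document is an abridged excerpt of the sample text.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the CLEF list used by IR-n.
STOPWORDS = {"the", "of"}


def content_terms(text: str) -> set[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def shared_terms(query: str, document: str) -> set[str]:
    """Return the content terms that occur in both the query and the document."""
    return content_terms(query) & content_terms(document)


query = "The death of General Custer"
document = (
    "General Custer was Civil War Union Major soldier. "
    "The death of his friend, Lucarelli changed his life."
)
print(sorted(shared_terms(query, document)))  # ['custer', 'death', 'general']
```

All three query terms match, even though "death" refers to Custer's friend and sits far from "General Custer" in the full text. A measure that only counts shared terms over the whole document cannot tell the difference.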
6
Passage Retrieval Systems: IR systems based on the whole document
These IR systems treat the document as a whole when determining the similarity between a query and a document.
7
Passage Retrieval Systems: Definition
Passage retrieval systems use short fragments of documents, rather than whole documents, to evaluate relevance or similarity. These fragments are called passages. Each document is divided into passages before its relevance is calculated.
8
Passage Retrieval Systems: Definition (II)
Steps:
1. Define the passages.
2. Evaluate the relevance of each passage.
3. Evaluate the relevance of the document as a function of the relevance of its passages.
9
Passage Retrieval Systems: Advantages
- They add the concept of proximity to the calculation of similarity between document and query.
- They allow short relevant fragments to be located in otherwise non-relevant documents.
- They avoid the difficulties of comparing documents of different lengths.
10
IR-n System
11
IR-n system: Definition
Steps:
1. Define the passages.
2. Evaluate the relevance of each passage.
3. Evaluate the relevance of the document as a function of the relevance of its passages.
12
IR-n system: Passage concept
- The IR-n system uses sentences to define passages.
- Every passage has the same number of sentences.
- This number depends on:
  - the document collection
  - the size of the query
Why sentences:
- A sentence expresses an idea in the document.
- There are algorithms that identify sentences with high precision.
- Sentences are complete units, so they can present understandable information to users or pass it on to a subsequent system.
13
IR-n system: Passage concept (II)
The IR-n system defines passages in the following way:
1. Obtain the sentences of the document.
2. Define the passages on the basis of a fixed number of sentences (5 in this example).
[Figure: SENTENCE 1 to SENTENCE 15, grouped into Passage 1, Passage 2 and Passage 3]
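A minimal sketch of these two steps, assuming a naive punctuation-based sentence splitter (the splitter actually used by IR-n is not described here) and hypothetical function names:

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def fixed_passages(sentences: list[str], size: int) -> list[list[str]]:
    """Group consecutive sentences into non-overlapping passages of `size` sentences."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


# Fifteen sentences with a passage size of 5 give three passages.
sentences = [f"SENTENCE {n}" for n in range(1, 16)]
passages = fixed_passages(sentences, size=5)
print(len(passages))
print(passages[0])
```

When the number of sentences is not a multiple of the passage size, the last passage in this sketch is simply shorter.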
15
IR-n system: Similarity measure (query-passage)
This year we changed the original similarity measure of the IR-n system, which improved the results.
IR-n uses:
- the number of occurrences of each term in the query and in the passage
- the number of different documents that contain each term
IR-n does not use:
- normalization by passage size, because all passages have the same size (the same number of sentences)
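The slide lists the inputs of the measure but not its formula. The sketch below is therefore a generic tf-idf-style stand-in built from the same inputs, not the IR-n measure itself; the weighting, the function name and the example figures are assumptions.

```python
import math
from collections import Counter


def passage_score(
    query_terms: list[str],
    passage_terms: list[str],
    doc_freq: dict[str, int],
    num_docs: int,
) -> float:
    """Score a passage against a query without any length normalization.

    Assumed weighting: (count in query) * (count in passage) * log(N / df).
    """
    query_tf = Counter(query_terms)
    passage_tf = Counter(passage_terms)
    score = 0.0
    for term, q_count in query_tf.items():
        p_count = passage_tf.get(term, 0)
        df = doc_freq.get(term, 0)
        if p_count == 0 or df == 0:
            continue  # term absent from the passage or from the collection
        score += q_count * p_count * math.log(num_docs / df)
    return score


# Hypothetical collection statistics, for illustration only.
doc_freq = {"custer": 10, "death": 100}
score = passage_score(["custer", "death"], ["custer", "custer", "rode"], doc_freq, 1000)
print(round(score, 3))
```

Because every passage has the same number of sentences, the sketch leaves out the length-normalization factor that whole-document measures usually need.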
17
IR-n system: Similarity measure (document-query)
The similarity between a document and the query is based on the best similarity obtained by any of the document's passages.
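Read as "the document takes the score of its best passage", this step can be sketched as follows. The function names and the example scores are hypothetical.

```python
def document_score(passage_scores: list[float]) -> float:
    """A document is as relevant as its best passage (0.0 if it has no passages)."""
    return max(passage_scores, default=0.0)


def rank_documents(scores_by_doc: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank documents by their best passage score, highest first."""
    ranked = [(doc_id, document_score(scores)) for doc_id, scores in scores_by_doc.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = rank_documents({"d1": [0.2, 1.5, 0.1], "d2": [0.9, 0.8], "d3": []})
print(ranking)  # [('d1', 1.5), ('d2', 0.9), ('d3', 0.0)]
```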
18
IR-n system: Other aspects
The IR-n system uses:
- overlapping passages
- relevance feedback based on passages
19
IR-n system: Passage overlapping
- The IR-n system uses overlapping in the definition of passages.
- In this way, a fragment of a document can belong to more than one passage.
- The overlap is defined in terms of sentences.
20
IR-n system: Passage overlapping (II)
Definition of passages using overlapping:
1. Obtain the sentences of the document.
2. Define the passages using the passage size and the degree of overlap.
[Figure: SENTENCE 1 to SENTENCE 8 covered by the overlapping passages P1, P2, P3 and P4]
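Overlapping passages can be sketched as a sliding window over the sentences. The slide does not state the passage size or the degree of overlap, so the values below (size 5, step 1) are assumptions; they happen to produce four passages from eight sentences, like the P1 to P4 labels in the figure.

```python
def overlapping_passages(sentences: list[str], size: int, step: int) -> list[list[str]]:
    """Slide a window of `size` sentences forward by `step` sentences at a time.

    A step smaller than the size makes consecutive passages overlap.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be at least 1")
    if not sentences:
        return []
    if len(sentences) <= size:
        return [list(sentences)]
    last_start = len(sentences) - size
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra window so the final sentences are covered
    return [sentences[i:i + size] for i in starts]


sentences = [f"SENTENCE {n}" for n in range(1, 9)]
passages = overlapping_passages(sentences, size=5, step=1)
print(len(passages))
print(passages[1][0], "to", passages[1][-1])
```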
21
IR-n system: Passage overlapping (III)
Defining passages in this way increases the number of passages to evaluate. However, the architecture of the IR-n system allows overlapping passages to be used without a considerable increase in processing time.
22
IR-n system: Relevance feedback using passages
[Figure: relevance feedback diagram with the labels Query, IR-n, Documents, Expansion and Expanded Query. The retrieved documents are shown as repeated sample passages reading "A data structure is a collection of data that can be defined on the basis of its organization".]
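The diagram does not say how expansion terms are chosen. As a rough sketch under that caveat, the code below adds the most frequent new terms found in the top-ranked passages to the query; the selection rule, the function name and the example passages are all assumptions.

```python
from collections import Counter


def expand_query(
    query_terms: list[str],
    top_passages: list[list[str]],
    num_new_terms: int,
    stopwords: frozenset[str] = frozenset(),
) -> list[str]:
    """Append the most frequent unseen terms from the top passages to the query."""
    seen = set(query_terms)
    counts = Counter(
        term
        for passage in top_passages
        for term in passage
        if term not in seen and term not in stopwords
    )
    # Highest count first; ties are broken alphabetically to keep the result stable.
    candidates = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in candidates[:num_new_terms]]


expanded = expand_query(
    ["custer", "death"],
    [["custer", "cavalry", "battle"], ["cavalry", "custer", "washita"]],
    num_new_terms=2,
)
print(expanded)  # ['custer', 'death', 'cavalry', 'battle']
```

Taking terms from passages rather than from whole documents keeps the expansion close to the text that actually matched the query.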
23
Monolingual Experiments
24
Monolingual Experiments: Training
- Test collections: CLEF-2002
- Objective: determine the passage size for each collection
- Resources:
  - Stop-word lists (CLEF)
  - Stemmer (CLEF)
25
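Monolingual Experiments: Determining the best passage size

| Characteristic | Spanish | English | French | German | Italian |
|---|---|---|---|---|---|
| Best size (sentences) | 14 | 9 | 14 | 14 | 8 |
| Average sentences per document | 9.30 | 27.34 | 16.86 | 17.70 | 16.28 |
| Number of documents | 215,738 | 113,005 | 87,191 | 225,371 | 108,578 |

Note: the best sizes 14 and 9 are detached from their columns in the extracted slide; assigning 14 to Spanish and 9 to English follows the extraction order.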
26
Monolingual Experiments: Conclusions of training
First conclusions:
- Good results for Spanish, French and English
- Bad results for German and Italian
- Query expansion improves average precision (AvgP) by over 10%
- Problems with German
- Problems with Italian
27
Monolingual Experiments: Conclusions of training (problems with German)
- Problem: we do not have an algorithm to split compound nouns.
- Solution: use a list of the most frequent compound nouns (200,000 terms).
- Using this list improves AvgP by over 19.7%.
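A list-based workaround for German compounds could look roughly like the sketch below. The format of the 200,000-term list is not given, so the mapping from compound to parts, the two sample entries and the decision to keep the original compound next to its parts are all assumptions.

```python
# Assumption: the compound list maps each lowercased compound to its parts.
# These two entries are illustrative, not taken from the list used with IR-n.
COMPOUNDS: dict[str, list[str]] = {
    "weltkrieg": ["welt", "krieg"],
    "autobahn": ["auto", "bahn"],
}


def split_compounds(terms: list[str], compounds: dict[str, list[str]]) -> list[str]:
    """Keep every term and, for known compounds, also add the component words."""
    expanded: list[str] = []
    for term in terms:
        expanded.append(term)
        expanded.extend(compounds.get(term.lower(), []))
    return expanded


print(split_compounds(["Weltkrieg", "Ende"], COMPOUNDS))
# ['Weltkrieg', 'welt', 'krieg', 'Ende']
```

A lookup of this kind only helps for compounds that appear in the list, which is consistent with a splitting algorithm being listed as future work.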
28
Monolingual Experiments: Conclusions of training (problems with Italian)
- Problems with Italian: ? (cause not yet identified)
29
Bilingual Experiments
30
Bilingual Experiments: Training
- Test collections: CLEF-2002
- Objective: determine how to translate the queries
- Resources:
  - PowerTranslator
  - FreeTranslator
  - BabelFish
  - Google
31
Bilingual Experiments: Conclusions
- Google was the translator with the worst results.
- Power Translator was the translator with the best results.
- However, we obtained results 5% better than with Power Translator alone by using three translators (Power, Free and Babel) at the same time.
[Figure: Original Query -> Power, Free, Babel -> Final Query]
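One plausible reading of the diagram is that the final query is the concatenation of the three translations. The sketch below assumes exactly that; the translators are stand-in functions with invented outputs, because the real tools are external products, and keeping duplicate words is also an assumption.

```python
from typing import Callable, Iterable

# A translator is modelled as any function from a query string to its translation.
Translator = Callable[[str], str]


def combined_query(query: str, translators: Iterable[Translator]) -> str:
    """Translate the query with every translator and concatenate the results."""
    return " ".join(translate(query) for translate in translators)


# Hypothetical stand-ins for the three translation tools.
def power(query: str) -> str:
    return "muerte general custer"


def free(query: str) -> str:
    return "fallecimiento general custer"


def babel(query: str) -> str:
    return "muerte del general custer"


final_query = combined_query("morte del generale Custer", [power, free, babel])
print(final_query)
```

Because duplicates are kept, a word that several translators agree on appears more often in the final query and so carries more weight in a frequency-based similarity measure.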
32
Multilingual Experiments
33
Multilingual Experiments: Training
- Test collections: CLEF-2002
- Objective: determine how to generate the multilingual document list
34
Multilingual Experiments: Training
Method under development:
- We are working on a dictionary-based model to generate the main list, but we could not finish developing it in time for this conference.
Method used:
- Translate the original query (English) into each language, using three translators.
- Process each query separately.
- Merge the four document lists using a formula that normalizes each document. We tested several formulas and obtained the best results with: (formula not preserved in this transcript)
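The formula that gave the best results is not reproduced in the text, so the merging step can only be sketched with a stand-in. The code below divides each score by the top score of its own list before interleaving the lists; this normalization, the function name and the example scores are assumptions.

```python
def merge_lists(
    lists_by_language: dict[str, list[tuple[str, float]]],
    limit: int = 1000,
) -> list[tuple[str, float]]:
    """Merge per-language result lists after scaling each list by its top score."""
    merged: list[tuple[str, float]] = []
    for results in lists_by_language.values():
        if not results:
            continue
        top = max(score for _, score in results)
        if top <= 0:
            continue  # nothing meaningful to normalize against
        merged.extend((doc_id, score / top) for doc_id, score in results)
    # Stable sort: documents with equal normalized scores keep their input order.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:limit]


merged = merge_lists({
    "es": [("es1", 10.0), ("es2", 5.0)],
    "fr": [("fr1", 2.0), ("fr2", 1.5)],
})
print([doc_id for doc_id, _ in merged])  # ['es1', 'fr1', 'fr2', 'es2']
```

Without some normalization of this kind, a collection whose raw scores are systematically higher would dominate the merged list.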
35
Multilingual Experiments: Training (II)
Method:
- For each language we used two kinds of passage:
  - the best passage size found in the monolingual experiments
  - the same passage size for all collections (11 sentences)
- We obtained similar results (better with the first).
- We want to continue exploring the use of the same size for all collections, which may be better for comparing the collections.
36
Conclusions
37
Comparison with the CLEF average: Monolingual

| Language | Inc. % |
|---|---|
| Spanish | +8.75% |
| French | -2.06% |
| German | +5.88% |
| Italian | +7.48% |
38
Conclusions: Comparison with the CLEF average

| Task | Inc. % |
|---|---|
| Cross-lingual (Italian/Spanish) | +25.78% |
| Multilingual (4 languages) | +22.71% |
39
Conclusions: Work in progress
- Improve the efficiency of the IR-n system in the retrieval task.
- Develop an algorithm to split compound nouns (German).
- Continue developing our dictionary-based method for multilingual retrieval.
If you are interested in using the IR-n system, contact llopis@dlsi.ua.es.