LogCLEF 2009: Log Analysis for Digital Societies (LADS)
Thomas Mandl, Maristella Agosti, Giorgio Maria Di Nunzio, Alexander Yeh, Inderjeet Mani, Christine Doran, Julia Maria Schulz
LogCLEF Overview, LADS Task. October 1, 2009, Corfu, GR. CLEF Workshop
Overview: Task, Data, Participants, Results
The LADS Task
The aim of the LADS task is to analyze user behavior, with a focus on multilingual search.
User interaction with the portal at query time, e.g.:
–how users interact with the search interface and what kind of search they perform
–how many of them reformulate queries, browse results, or leave the portal to follow the search in a national library
The LADS Task
LADS deals with logs from The European Library (TEL).
TEL is a free service that offers access to the resources of 48 national libraries of Europe in 35 languages.
Resources can be both digital (e.g. books, posters, maps, sound recordings, videos) and bibliographical.
Quality and reliability are guaranteed by the 48 collaborating national libraries of Europe.
Goals
This task was open to diverse approaches, in particular data mining techniques, in order to extract knowledge from the data and find interesting user patterns:
–user session reconstruction (necessary)
–user interaction with the portal at query time
–multilinguality and query reformulation
–user context and user profile
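Session reconstruction, the one goal marked as necessary, can be sketched as follows. This is a minimal illustration, not the method of any participant: it assumes each record carries a user key (for example an IP address) and a timestamp, and it starts a new session after 30 minutes of inactivity. The 30-minute threshold is a common heuristic, not a value from the task description; the function name `reconstruct_sessions`, the parameter `gap`, the action names, and the sample records are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def reconstruct_sessions(records, gap=timedelta(minutes=30)):
    """Group (user_key, timestamp, action) tuples into sessions.

    A new session starts whenever the same user key has been idle
    for longer than `gap` (an assumed heuristic threshold).
    """
    by_user = defaultdict(list)
    for user_key, ts, action in records:
        by_user[user_key].append((ts, action))

    sessions = []
    for user_key, events in by_user.items():
        events.sort(key=lambda e: e[0])  # chronological order per user
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > gap:
                # Idle for too long: close the running session.
                sessions.append((user_key, current))
                current = []
            current.append(event)
        sessions.append((user_key, current))
    return sessions


# Invented sample records, for illustration only.
records = [
    ("10.0.0.1", datetime(2007, 1, 1, 9, 0), "search_sim"),
    ("10.0.0.1", datetime(2007, 1, 1, 9, 5), "view_full"),
    ("10.0.0.1", datetime(2007, 1, 1, 11, 0), "search_sim"),
    ("10.0.0.2", datetime(2007, 1, 1, 9, 1), "search_adv"),
]
sessions = reconstruct_sessions(records)
```

Grouping by IP address alone would merge different users behind a shared address, so a real analysis would probably combine several of the logged columns.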
TEL Environment
Data
The data used for the LADS task are search (“action”) logs of The European Library portal.
All the actions are logged and stored by TEL in a relational table:
–each record represents a user action.
The most significant columns of the table are:
–a numeric id identifying registered users, or “guest” otherwise;
–the user’s IP address;
–an automatically generated alphanumeric id identifying sequential actions of the same user (sessions);
–the query contents;
–the name of the action that the user performed;
–the corresponding collection’s alphanumeric id;
–the date and time of the action’s occurrence.
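The record structure above could be modeled as in the sketch below. The slides list the columns but not their names or the timestamp format, so the field names, CSV header, and date format here are assumptions, and the sample row is invented.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActionRecord:
    user_id: str        # numeric id of a registered user, or "guest"
    ip: str             # user's IP address
    session_id: str     # generated alphanumeric session identifier
    query: str          # query contents
    action: str         # name of the action performed
    collection_id: str  # alphanumeric id of the collection
    timestamp: datetime # date and time of the action


def read_records(handle):
    """Parse CSV rows into ActionRecord objects (header names are assumed)."""
    reader = csv.DictReader(handle)
    return [
        ActionRecord(
            user_id=row["user_id"],
            ip=row["ip"],
            session_id=row["session_id"],
            query=row["query"],
            action=row["action"],
            collection_id=row["collection_id"],
            timestamp=datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"),
        )
        for row in reader
    ]


# Invented one-row sample standing in for the distributed file.
sample = io.StringIO(
    "user_id,ip,session_id,query,action,collection_id,date\n"
    "guest,10.0.0.1,abc123,mozart,search_sim,a0001,2007-01-01 09:00:00\n"
)
records = read_records(sample)
```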
Data
Action logs distributed to the participants of the task cover the period from 1st January 2007 until 30th June
–1,866,330 records
PostgreSQL table, CSV file
Description of the collection
Participants
About 20 participants registered
4 participants submitted results
–University of Sunderland
–Trinity College Dublin
–University of Hildesheim
–CELI Research, Torino
Results
CELI: identifying translations of search queries.
–The result is a list of pairs of queries in two languages.
–Combined with session information, it is possible to check whether users translate their query within a session.
University of Sunderland: users rarely switch the query language during their sessions.
–They also found that queries are typically submitted in the language of the interface that the user selects.
Trinity College Dublin: thorough analysis of query reformulation, query length and activity sequence.
–Aimed at understanding the behavior of users from different linguistic or cultural backgrounds.
University of Hildesheim: sequences of interactions within the log file.
–Visualized in an interactive user interface that allows exploration of the sequences.
University of Amsterdam: gaining more context information.
–The limited knowledge about the user that is inherent in log files needs to be tackled.
–Semantic enrichment of the queries by linking them to digital objects [7].
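Two of the measures mentioned above, query length and activity sequence, can be illustrated with a short sketch. It is not the analysis performed by any participant: query length is taken here as the number of whitespace-separated terms, activity sequences are summarized as counts of consecutive action pairs, and the function names and sample data are hypothetical.

```python
from collections import Counter


def mean_query_length(queries):
    """Average number of whitespace-separated terms per non-empty query."""
    lengths = [len(q.split()) for q in queries if q.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0


def action_bigrams(session_actions):
    """Count consecutive action pairs across all sessions."""
    counts = Counter()
    for actions in session_actions:
        counts.update(zip(actions, actions[1:]))
    return counts


# Invented sample data, for illustration only.
queries = ["mozart", "mozart requiem score", "old maps"]
sessions = [["search", "view", "search"], ["search", "view"]]

avg = mean_query_length(queries)
bigrams = action_bigrams(sessions)
```

With the sample data, `avg` is 2.0 and the pair `("search", "view")` occurs twice.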
Conclusions
LogCLEF has provided an evaluation resource with log files of user activities in multilingual search environments:
–the Tumba! search engine and
–The European Library (TEL) Web site.
The results and approaches of the participants in the 2009 campaign will help to define a more formal task in the next LogCLEF.
Advertise better!
–Workshop on Query Log Analysis (TrebleCLEF 2009)
–Workshop on Understanding the User: Logging and Interpreting User Interactions in Information Search and Retrieval (SIGIR 2009)
Sharing resources and knowledge about log files: Collaborative User Log Analysis Pool
–Mailing list
–Web site