“The need for Semantic Desktop Dataset” L3S and University of Hannover, Germany Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, Xuan Zhou (chernov, iofciu, nejdl, zhou@l3s.de) Speaker: Sergey Chernov www.l3s.de

NEPOMUK: Social Semantic Desktop

NEPOMUK (Networked Environment for Personalized Ontology-based Management of Unified Knowledge) has about 20 participating organizations and a budget of over 11 million euros.

- Desktop: Help individuals manage information on their PC
- Semantic: Make content available to automated processing
- Social: Enable exchange across individual boundaries

[Figure: a personal semantic web linking Person, Email, Event, Topic, WebSite, Document, and Image nodes through relations such as friend, acquaintance, and colleague. Captions: "Personal Semantic Web: a semantically enlarged intimate supplement to memory"; "Social protocols and distributed search"; "NEPOMUK enabled peers".]

Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, Xuan Zhou 01/05/19

NEPOMUK: The Society-Scale Semantic Web Application

Today the necessary technologies and communities exist:
- Standardized metadata: Semantic Web
- Scalable distributed infrastructure: P2P Computing
- Knowledge articulation and interaction: Desktop Technology
- Human-centric information exchange: Online Social Networks

Visions this builds on:
- Memex (Vannevar Bush): A memex is "a device in which an individual stores all his books, records, and communications".
- Open Hypertext System (Doug Engelbart): "The open hyperdocument system is a standards-based, open source framework for developing collaborative, knowledge management applications."
- WWW (Tim Berners-Lee): "There was a second part of the dream […] we could then use computers to help us analyse it, make sense of what we're doing, where we individually fit in, and how we can better work together."

Challenge: Extension and merging of research streams

[Figure: roadmap over Phase 1, Phase 2, and Phase 3. Labels: Desktop/Wiki, Semantic Web, P2P networks, Social Networking, Semantic Desktop, Semantic P2P, Ontology driven Social Networking, Ontology driven distributed Social Networking, Social Semantic Desktop.]

A scenario for desktop search

Xuan can search for "the articles about multimedia conferences and workshops, which are titled 'call for papers' or 'upcoming events' and were recommended by Mounia".

Query: multimedia workshop /title upcoming events /receivedFrom Mounia

[Figure: metadata graph for the scenario. Nodes: the person uid:123 (fn "Mounia Lalmas", given "Mounia", family "Lalmas", affiliatedTo "Queen Mary Uni"), the message msgid:00465, the Web page http://inex.is.informatik.uni-duisburg.de/2005/index.html, and the file c:\inex1.8\xml\mu\1998\u40c2.xml. Edge labels: receivedFrom, accessedFrom, storedFrom, title, type, publishedIn, issn, year, text. Literal values: "Upcoming Events", "publication", "IEEE MULTIMEDIA", "1999", "1070-986X", "1998", and the text "Multimedia Computing and Networking 1999 (MMCN 99) … This conference … multimedia systems…".]
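
The query on this slide mixes free-text keywords with metadata constraints. A minimal Python sketch of how such a query could be parsed and matched is given here. It assumes a syntax of "keywords /field value /field value", inferred only from the single example on the slide. The `DesktopItem` class, the function names, and the sample items are hypothetical and are not part of NEPOMUK or INEX. Matching is plain case-insensitive substring matching, with no ranking.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DesktopItem:
    """A desktop resource: its full text plus metadata (hypothetical structure)."""
    text: str
    metadata: dict = field(default_factory=dict)


def parse_query(query):
    """Split 'keywords /field value ...' into a keyword list and field constraints."""
    head, *parts = re.split(r"(?:^|\s+)/", query.strip())
    keywords = head.lower().split()
    constraints = {}
    for part in parts:
        name, _, value = part.partition(" ")
        constraints[name.lower()] = value.strip().lower()
    return keywords, constraints


def matches(item, keywords, constraints):
    """True if every keyword occurs in the text and every constraint in its field."""
    text = item.text.lower()
    if not all(keyword in text for keyword in keywords):
        return False
    meta = {key.lower(): str(value).lower() for key, value in item.metadata.items()}
    return all(value in meta.get(name, "") for name, value in constraints.items())


def search(items, query):
    keywords, constraints = parse_query(query)
    return [item for item in items if matches(item, keywords, constraints)]


# Hypothetical sample data, loosely modelled on the scenario.
items = [
    DesktopItem(
        text="Upcoming multimedia workshop and conference deadlines",
        metadata={"title": "Upcoming Events", "receivedFrom": "Mounia Lalmas"},
    ),
    DesktopItem(
        text="Upcoming multimedia workshop and conference deadlines",
        metadata={"title": "Upcoming Events", "receivedFrom": "Another Sender"},
    ),
]

hits = search(items, "multimedia workshop /title upcoming events /receivedFrom Mounia")
```

Only the first item satisfies the `receivedFrom` constraint, so the sender metadata is what separates two otherwise identical documents.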

Using the INEX dataset for experiments with Desktop Search [1]

FULL-TEXT
1
XQL and Proximal Nodes
Ricardo Baeza-Yates
Gonzalo Navarro
We consider the recently proposed language …
Introduction
Searching on structured text is becoming more important …

XML SNIPPET
<paper id="1">
  <title> XQL and Proximal Nodes </title>
  <author> Ricardo Baeza-Yates </author>
  <author> Gonzalo Navarro </author>
  <abstract> We consider the recently proposed language … </abstract>
  <section name="Introduction"> Searching on structured text is becoming more important …
    <subsection name="Related Work"> The XQL language … </subsection>
  </section>
  …
  <cite xmlns:xlink="http://www.acm.org/www8/xmlql"> … </cite>
</paper>

METADATA FIELDS
TITLE: XQL and Proximal Nodes
CREATOR: Ricardo Baeza-Yates, Gonzalo Navarro
ABSTRACT: We consider the recently proposed language …
CITATION: http://www.acm.org/www8/xmlql …

Corresponding queries:
//*[about(.//au, Ricardo Baeza-Yates) and about(., xql proximal nodes)]
/creator Ricardo Baeza-Yates AND xql proximal nodes

[1] Sergey Chernov, Tereza Iofciu, and Wolfgang Nejdl, Integrating Metadata and Full-Text Search on the Desktop, Technical Report, 2005.
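
The slide's mapping from an XML paper to a full-text view and a set of metadata fields can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: the `FIELD_MAP` table and the function names are hypothetical, the sample document is a cleaned-up copy of the snippet with the elided parts left elided, and the CITATION field is omitted because the `cite` element is truncated in the source.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the slide's snippet; "..." marks text elided in the source.
SAMPLE = """<paper id="1">
  <title> XQL and Proximal Nodes </title>
  <author> Ricardo Baeza-Yates </author>
  <author> Gonzalo Navarro </author>
  <abstract> We consider the recently proposed language ... </abstract>
  <section name="Introduction"> Searching on structured text is becoming more important ...
    <subsection name="Related Work"> The XQL language ... </subsection>
  </section>
</paper>"""

# Hypothetical mapping from XML element names to desktop metadata fields.
FIELD_MAP = {"title": "TITLE", "author": "CREATOR", "abstract": "ABSTRACT"}


def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def extract_fields(xml_text):
    """Return {FIELD: [values]} for every mapped element found in the document."""
    root = ET.fromstring(xml_text)
    fields = {}
    for tag, name in FIELD_MAP.items():
        values = [normalize("".join(el.itertext())) for el in root.iter(tag)]
        if values:
            fields[name] = values
    return fields


def extract_full_text(xml_text):
    """Return all text content of the document with the markup removed."""
    root = ET.fromstring(xml_text)
    return normalize("".join(root.itertext()))


fields = extract_fields(SAMPLE)
full_text = extract_full_text(SAMPLE)
```

Keeping the two views separate lets a query such as `/creator Ricardo Baeza-Yates AND xql proximal nodes` check the author name against the CREATOR field and the remaining keywords against the full text.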

Desktop Metadata Missing from INEX

- StoredFrom: Web links from which publications were stored
- ReceivedFrom: Email activity information, such as emails containing publications
- EmailAnnotations: Email annotations (from the sender)
- SearchKeyword: Keywords that were used in a Web search engine to find the document
- OpenLast, MovedFrom: History of user actions on the publications
- Annotation: User annotations

Challenges for Designing a Dataset for Desktop

Data obtained through logging
- Pros: real data
- Cons: privacy issues, a high level of user cooperation is required, low scalability

Data created through simulations
- Pros: scalable, easy to modify, cheap, fewer privacy restrictions, existing datasets like INEX or Enterprise Track can be re-used
- Cons: can be based on wrong assumptions
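
As a rough illustration of the simulation option, the Python sketch below attaches randomly generated desktop metadata to documents from an existing collection. Everything in it is an assumption made for illustration: the function name, the 50/50 split between email and Web provenance, the `example.org` URLs, and the sample senders and keywords. Only the field names ReceivedFrom, StoredFrom, and SearchKeyword come from the list of metadata missing from INEX.

```python
import random


def simulate_metadata(doc_ids, senders, keywords, seed=0):
    """Attach simulated desktop metadata to each document id (hypothetical model)."""
    rng = random.Random(seed)  # fixed seed so the simulated dataset is reproducible
    records = {}
    for doc_id in doc_ids:
        record = {}
        # Assumption: about half of the documents arrive by email,
        # the rest are found through a Web search and stored from a URL.
        if rng.random() < 0.5:
            record["ReceivedFrom"] = rng.choice(senders)
        else:
            record["StoredFrom"] = f"http://example.org/{doc_id}"
            record["SearchKeyword"] = rng.choice(keywords)
        records[doc_id] = record
    return records


# Hypothetical inputs: document ids from a re-used collection, plus made-up values.
doc_ids = ["doc1", "doc2", "doc3", "doc4"]
senders = ["colleague-a", "colleague-b"]
keywords = ["multimedia workshop", "xml retrieval"]

dataset = simulate_metadata(doc_ids, senders, keywords, seed=42)
```

The hard-coded 50/50 split is exactly the kind of assumption that the listed drawback of simulations warns about; in practice it would need to be calibrated against real logged data.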

Our suggestion

- We invite everyone to collaborate in creating a dataset for desktop search
- It would be good to consider additional metadata when designing a new INEX collection