ALEXANDRIA Temporal Retrieval, Exploration and Analytics in Web Archives Wolfgang Nejdl L3S Research Center Hannover, Germany.

Presentation transcript:

Computer Science and interdisciplinary research on all aspects of the Web
Internet: Communication and Networks
Information: Accessing information and knowledge on and through the Web
Community: Supporting communities and groups on the Web, for research, education, production and entertainment
Society: Requirements (technological, social, legal) for the Web
Selected projects
Web L3S
LivingKnowledge: Diversity, opinion and bias on the Web
CUbRIK: Searching by computers and humans
Real-time data processing for finance predictions
Privacy, Property and Internet Governance
Cross-media analysis and interpretation
ForgetIT: Concise Preservation via Managed Forgetting
MAPPING

Spam Attack on Copts Gun running from Sudan
Are we losing the past of the Web?

Library of Congress
In April 2010, the LoC and Twitter signed an agreement to archive all tweets since 2006.
January 2013: It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data. The Library is pursuing partnerships to allow some limited access capability in reading rooms.
German National Library
Based on a law of June 22, 2006, the GNL should collect, enrich, catalog, and archive Web publications.
Internet Archive
Archiving the Web (10 petabytes) since 1996. Access is possible through the URL.
Relevant L3S projects
Web Archiving: LiWA, ARCOMEM, ForgetIT
Web Search: PHAROS, CUbRIK
Web and Stream Analytics: EUMSSI, Qualimaster
ERC Advanced Grant: ALEXANDRIA (2014–2018, 2.5 million euros)
Cooperations: German National Library, British Library, Internet Archive, Rutgers University, et al.

Looking back: The Austrian Socialist Party and Europe

What is missing? ALEXANDRIA Vision and 9 Research Questions

Evolution-Aware Entity-Based Enrichment and Indexing
Q1: How to link web archive content against multiple entity and event collections evolving over time?
Ioannou, E., Nejdl, W., Niederée, C. and Velegrakis, Y. LinkDB: A Probabilistic Linkage Database System. SIGMOD (New York, New York, USA, Jun. 2011)
Q2: How to maintain entity and event information and indexes for web-scale archives?
Papadakis, G., Ioannou, E., Niederée, C., Palpanas, T. and Nejdl, W. Beyond 100 million entities: large-scale blocking-based resolution for heterogeneous data. WSDM (New York, NY, USA, 2012), 53–62.
Papadakis, G., Ioannou, E., Palpanas, T., Niederée, C. and Nejdl, W. A Blocking Framework for Entity Resolution in Highly Heterogeneous Information Spaces. TKDE (2012).

Huge and Heterogeneous Information Spaces
Voluminous, (semi-)structured datasets:
  DBPedia 3.4: 36.5 million triples and 2.1 million entities
  BTC09: 1.15 billion triples and 182 million entities
Users are free to insert not only attribute values but also attribute names → high levels of heterogeneity:
  DBPedia 3.4: 50,000 attribute names
  Google Base: 100,000 schemata and 10,000 entity types
Large portion of data stemming from automatic information extraction → noise, tag-style values
... and this involves neither time nor entity evolution ...
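To illustrate why blocking suits data with this many attribute names, here is a minimal sketch of schema-agnostic token blocking. It is an illustrative simplification, not the implementation from the papers cited for Q2. The records, the function names token_blocking and candidate_pairs, and the tokenization rule are all assumptions made for the example. Each entity goes into one block per token found in any of its values, attribute names are ignored, and only entities that share a block become comparison candidates.

```python
from collections import defaultdict
from itertools import combinations
import re


def token_blocking(entities):
    """Map each token to the set of entity ids whose values contain it.

    Attribute names are ignored on purpose, so differently named
    attributes can still put two descriptions into the same block.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single entity yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Return the distinct entity pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records with different attribute names (illustration only).
entities = {
    "e1": {"name": "Wolfgang Nejdl", "affiliation": "L3S Hannover"},
    "e2": {"label": "W. Nejdl", "org": "L3S Research Center"},
    "e3": {"title": "Internet Archive", "location": "San Francisco"},
}

blocks = token_blocking(entities)
pairs = candidate_pairs(blocks)
```

In this toy input only e1 and e2 share tokens, so they form the single candidate pair and e3 is never compared with anything. On real data the same idea avoids comparing every entity with every other one.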

Aggregating Social Networks and Streams
Q3: How to archive complex and dynamic network structures from social media?
Siersdorfer, S., Chelaru, S., Nejdl, W. and San Pedro, J. How useful are your comments? Analyzing and Predicting YouTube Comments and Comment Ratings. WWW (New York, New York, USA, Apr. 2010), extended for TWEB (2014)
Risse, T., Dietze, S., Peters, W., Doka, K., Stavrakas, Y. and Senellart, P. Exploiting the Social and Semantic Web for guided Web Archiving. TPDL (Sep. 2012)
Q4: How to aggregate social media streams for archiving?
Minack, E., Siberski, W. and Nejdl, W. Incremental diversification for very large sets: a streaming-based approach. SIGIR (New York, New York, USA, Jul. 2011)
Diaz-Aviles, E., Drumond, L., Schmidt-Thieme, L. and Nejdl, W. Real-time top-n recommendation in social streams. RecSys (New York, New York, USA, 2012)

Using comment analysis to find relevant resources

Temporal Retrieval and Ranking
Q5: How to support time-sensitive and entity-based query formulation?
Kanhabua, N. and Nørvåg, K. Exploiting time-based synonyms in searching document archives. JCDL (New York, New York, USA, Jun. 2010)
Nguyen, T. and Kanhabua, N. Leveraging dynamic query subtopics for time-aware search result diversification. ECIR (Amsterdam, April 2014)
Q6: How to improve result ranking and clustering for time-sensitive and entity-based queries?
Kanhabua, N., Blanco, R. and Matthews, M. Ranking related news predictions. SIGIR (New York, New York, USA, Jul. 2011)
G. Demartini, C. Firan, T. Iofciu, R. Krestel, W. Nejdl: Why finding entities in Wikipedia is difficult, sometimes. Inf. Retr. 13(5) (2010)

Dynamic subtopic mining for query extension and ranking
Query: ncaa
Subtopics over time: march madness (began 14/03/2006); ncaa women tournament (began 18/03/...); final four (began .../04/2006)

Collaborative Exploration and Analytics
Q7: How to support collaborative and complex search and analysis processes?
Ivana Marenzi and Sergej Zerr: Multiliteracies and Active Learning in CLIL - The Development of LearnWeb2.0. IEEE Transactions on Learning Technologies (2012)
Q8: How to leverage (user) search and analysis processes to improve the web archive?
K. Bischoff, C. Firan, W. Nejdl, R. Paiu: Bridging the gap between tagging and querying vocabularies: Analyses and applications for enhancing multimedia IR. J. Web Sem. 8(2-3) (2010)
M. Georgescu, N. Kanhabua, D. Krause, W. Nejdl, S. Siersdorfer: Extracting Event-Related Information from Article Updates in Wikipedia. ECIR 2013

Peaks in Wikipedia update activity correlate with events
Figure: edit history for the Barack Obama article (monthly), with peaks annotated "Announced his candidacy February 10, 2007" and "won the 2009 Nobel Peace Prize"
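The observation that edit peaks line up with events can be sketched as a simple outlier test on monthly edit counts. This is a minimal sketch under stated assumptions, not the method of the cited ECIR 2013 paper. The counts below are made up for illustration and are not real Wikipedia data. The function name find_peaks and the rule "mean plus 1.5 standard deviations" are also assumptions.

```python
from statistics import mean, stdev


def find_peaks(monthly_counts, threshold=1.5):
    """Return the months whose edit count exceeds mean + threshold * stdev."""
    counts = list(monthly_counts.values())
    cutoff = mean(counts) + threshold * stdev(counts)
    return [month for month, count in monthly_counts.items() if count > cutoff]


# Made-up monthly edit counts (illustration only, not real Wikipedia data).
monthly_edits = {
    "2006-11": 120,
    "2006-12": 110,
    "2007-01": 130,
    "2007-02": 900,
    "2007-03": 140,
    "2007-04": 125,
}

peaks = find_peaks(monthly_edits)
```

With these invented numbers only the month with 900 edits clears the cutoff. A global threshold like this is the crudest option. A real edit history would likely need a sliding window or a trend-aware baseline, since overall activity on an article changes over the years.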

Trust, privacy, and privacy-preserving data mining
Q9: How to achieve privacy using privacy-preserving data publishing and data mining?
W. Nejdl, D. Olmedilla, M. Winslett: Peertrust: Automated trust negotiation for peers on the semantic web. Secure Data Management 2004
S. Zerr, D. Olmedilla, W. Nejdl, W. Siberski: Zerber+R: top-k retrieval from a confidential index. 12th Intl. Conference on Extending Database Technology, EDBT 2009, Saint Petersburg, Russia
S. Zerr, S. Siersdorfer, J. S. Hare, E. Demidova: Privacy-aware image classification and search. SIGIR 2012
N. Forgó, T. Krügel: Mit oder ohne Zustimmung? Soziale Netzwerke und der Datenschutz. FL 2011

Public and private photos: colors and edges
Figure panels: Public, Private

(Nikolaus Forgó)

By placing an order via this Web site on the first day of the fourth month of the year 2010 Anno Domini, you agree to grant Us a non transferable option to claim, for now and for ever more, your immortal soul. Should We wish to exercise this option, you agree to surrender your immortal soul, and any claim you may have on it, within 5 (five) working days of receiving written notification from gamestation.co.uk or one of its duly authorized minions. (Nikolaus Forgó)