Alexandria at JCDL 2017

The 2017 ACM/IEEE-CS Joint Conference on Digital Libraries will be held in Toronto (Ontario, Canada) on June 19-23. L3S will again be present at this annual venue with one full research paper and one poster, both conducted in the context of the ALEXANDRIA project. The works will be presented by Dr. Pavlos Fafalios and are co-authored by Prof. Wolfgang Nejdl.
The research paper, entitled “Building and Querying Semantic Layers for Web Archives” and authored by P. Fafalios, H. Holzmann, V. Kasturia, and W. Nejdl, introduces an RDF/S model and a distributed framework for building semantic layers that describe the contents of web archives. A semantic layer allows describing metadata about the archived documents, annotating them with useful semantic information (like entities, concepts and events), and publishing all this data on the Web as Linked Data. Such structured repositories offer advanced query and integration capabilities and make web archives directly exploitable by other systems and tools. A preprint of the article is available at this link.
In the same context, the poster paper (entitled “Towards a Ranking Model for Semantic Layers over Digital Archives” and authored by P. Fafalios, V. Kasturia, and W. Nejdl) focuses on the problem of ranking archived documents returned by structured (SPARQL) queries over semantic layers. The poster discusses the motivation for this work, formalizes the problem, and proposes a baseline model that combines the following three aspects: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the relations among the entities. A preprint is available at this link.
Following last year’s nomination for the ArchiveSpark paper, this is the second consecutive best paper nomination at JCDL for the Alexandria team.

CIKM 2016 with Alexandria contributions

The 2016 edition of the Conference on Information and Knowledge Management (CIKM) took place on October 24-29, 2016 in Indianapolis, IN, USA. CIKM is a major forum for the presentation and discussion of research on information and knowledge management. The Alexandria team of L3S was involved with two papers: one by Besnik Fetahu et al. on finding news citations for Wikipedia, and one by Jaspreet Singh et al. on a human-in-the-loop retrieval method for discovering entities.




Alexandria at JCDL 2016

On June 16-23, this year’s ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2016) took place in Newark, New Jersey. JCDL is a major international forum focusing on digital libraries and associated technical, practical, and social issues. L3S was involved in this important event with two research paper presentations by Helge Holzmann, showing results of research in the area of Web archiving carried out as part of the EU project Alexandria (an ERC Advanced Grant by Prof. Wolfgang Nejdl). The first paper, “The Dawn of Today’s Popular Domains: A Study of the Archived German Web over 18 Years”, was co-authored by Prof. Wolfgang Nejdl and Dr. Avishek Anand.

The second paper, “ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation”, again co-authored by Dr. Avishek Anand, was joint work with the Internet Archive. It presented the Web archive data processing framework ArchiveSpark, developed by Helge Holzmann (L3S) and Vinay Goel (Internet Archive). The paper was nominated for the best paper award of the conference, which reflects the need for this tool as well as the growing importance of Web archives in the Digital Libraries community.

ArchiveSpark is open-source and freely available under

Save the Date – 3rd International Alexandria Workshop in conjunction with TPDL 2016

Save the date to attend the 3rd International Alexandria Workshop. After the successful workshops in 2014 and 2015, we continue this year with the 3rd edition on 8/9 September 2016. This time the workshop will be organized in conjunction with the TPDL 2016 conference. Registration is already open via the TPDL registration system.

If you have questions don’t hesitate to contact us via email with the subject “Alexandria WS” to

Stay tuned for more information!!


Tempas – Temporal Archive Search Based on Tags presented at WWW2016

Limited search and access patterns over Web archives have been well documented. One of the key reasons is the lack of understanding of the user access patterns over such collections, which in turn is attributed to the lack of effective search interfaces. Current search interfaces for Web archives either (a) are purely navigational or (b) offer a sub-optimal search experience due to ineffective retrieval models or query modeling. We identify that external longitudinal resources, such as social bookmarking data, are crucial sources to identify important and popular websites in the past. To this end we present Tempas, a tag-based temporal search engine for Web archives.

Websites are posted at specific times of interest on several external platforms, such as bookmarking sites like Delicious. Attached tags not only act as relevant descriptors useful for retrieval, but also encode the time of relevance. With Tempas we tackle the challenge of temporally searching a Web archive by indexing tags and time. We allow temporal selections for search terms, rank documents based on their popularity and also provide meaningful query recommendations by exploiting tag-tag and tag-document co-occurrence statistics in arbitrary time windows. Finally, Tempas operates as a fairly non-invasive indexing framework. By not dealing with contents from the actual Web archive it constitutes an attractive and low-overhead approach for quick access into Web archives.
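
To make the idea of tag- and time-based search more concrete, here is a minimal Python sketch. It is not the Tempas implementation: the `Bookmark` record, the sample data, and the functions `search` and `recommend` are hypothetical names chosen for illustration. It assumes that popularity can be approximated by counting bookmarks inside the selected time window, and that query recommendations can be approximated by raw tag-tag co-occurrence counts in that same window.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical bookmarking record: a URL tagged on a given day."""
    url: str
    tags: frozenset
    day: date


def _matches(b, tag, start, end):
    # A bookmark matches if it carries the tag and falls inside the window.
    return tag in b.tags and start <= b.day <= end


def search(bookmarks, tag, start, end):
    """Rank URLs tagged with `tag` in [start, end] by bookmark count (assumed popularity proxy)."""
    counts = Counter(b.url for b in bookmarks if _matches(b, tag, start, end))
    # Sort by descending count, then by URL, so that ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def recommend(bookmarks, tag, start, end, k=3):
    """Suggest up to k tags that co-occur most often with `tag` in [start, end]."""
    co = Counter()
    for b in bookmarks:
        if _matches(b, tag, start, end):
            co.update(t for t in b.tags if t != tag)
    ranked = sorted(co.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Made-up sample data, for illustration only.
BOOKMARKS = [
    Bookmark("http://a.example", frozenset({"news", "politics"}), date(2006, 3, 1)),
    Bookmark("http://a.example", frozenset({"news"}), date(2006, 5, 9)),
    Bookmark("http://b.example", frozenset({"news", "sports"}), date(2006, 7, 2)),
    Bookmark("http://c.example", frozenset({"news", "politics"}), date(2011, 1, 15)),
]

if __name__ == "__main__":
    window = (date(2006, 1, 1), date(2006, 12, 31))
    print(search(BOOKMARKS, "news", *window))
    print(recommend(BOOKMARKS, "news", *window))
```

Note that the sketch only touches the bookmarking records and never the archived pages themselves, which mirrors the non-invasive character described above.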
