Save the Date – 3rd International Alexandria Workshop in conjunction with TPDL 2016

Save the date to attend the 3rd International Alexandria Workshop. After the successful workshops in 2014 and 2015, we continue this year with the 3rd edition on 8/9 September 2016. This time the workshop will be organized in conjunction with the TPDL 2016 conference. Registration is already open via the TPDL registration system.

If you have questions, don’t hesitate to contact us via email with the subject “Alexandria WS” to

Stay tuned for more information!!


Tempas – Temporal Archive Search Based on Tags presented at WWW2016

Limited search and access patterns over Web archives have been well documented. One of the key reasons is the lack of understanding of user access patterns over such collections, which in turn is attributed to the lack of effective search interfaces. Current search interfaces for Web archives are either (a) purely navigational or (b) offer a sub-optimal search experience due to ineffective retrieval models or query modeling. We identify external longitudinal resources, such as social bookmarking data, as crucial sources for identifying websites that were important and popular in the past. To this end we present Tempas, a tag-based temporal search engine for Web archives.

Websites are posted at specific times of interest on several external platforms, such as bookmarking sites like Delicious. The attached tags not only act as relevant descriptors useful for retrieval, but also encode the time of relevance. With Tempas we tackle the challenge of temporally searching a Web archive by indexing tags and time. We allow temporal selections for search terms, rank documents based on their popularity, and provide meaningful query recommendations by exploiting tag-tag and tag-document co-occurrence statistics in arbitrary time windows. Finally, Tempas operates as a fairly non-invasive indexing framework. Because it does not process content from the actual Web archive, it offers an attractive, low-overhead approach for quick access into Web archives.
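To make the idea of searching by tags and time more concrete, here is a minimal sketch in Python. It is not the Tempas implementation: the `Bookmark` record, the function names, the sample data, and the use of raw bookmark counts as the popularity score are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical record: one URL bookmarked with some tags on a given day."""
    url: str
    tags: frozenset
    day: date


def in_window(bookmark, start, end):
    """True if the bookmark was posted inside the inclusive window [start, end]."""
    return start <= bookmark.day <= end


def search(bookmarks, tag, start, end):
    """Rank URLs tagged with `tag` in the window by how often they were bookmarked."""
    counts = Counter(
        b.url for b in bookmarks
        if tag in b.tags and in_window(b, start, end)
    )
    return counts.most_common()


def related_tags(bookmarks, tag, start, end, k=3):
    """Suggest up to k tags that co-occur most often with `tag` in the window."""
    counts = Counter()
    for b in bookmarks:
        if tag in b.tags and in_window(b, start, end):
            counts.update(b.tags - {tag})
    return [t for t, _ in counts.most_common(k)]


# Made-up sample data, for illustration only.
BOOKMARKS = [
    Bookmark("a.example", frozenset({"election", "obama"}), date(2008, 11, 4)),
    Bookmark("a.example", frozenset({"election", "politics"}), date(2008, 11, 5)),
    Bookmark("b.example", frozenset({"election", "obama"}), date(2008, 11, 5)),
    Bookmark("c.example", frozenset({"election"}), date(2012, 11, 6)),
]

if __name__ == "__main__":
    year_2008 = (date(2008, 1, 1), date(2008, 12, 31))
    print(search(BOOKMARKS, "election", *year_2008))
    print(related_tags(BOOKMARKS, "election", *year_2008))
```

Note that the sketch only ever touches bookmark metadata (URL, tags, date) and never the archived page content, which mirrors the low-overhead property described above. Changing the window to 2012 would surface a different URL for the same tag.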

ArchEE Among the Top 3 Startups in Lower Saxony

When exploring news archives, a key requirement of historians is to get an overview of their search results initially. To address this problem we developed a novel retrieval model – HistDiv – which ranks articles according to historical relevance. The Archive Exploration Engine (ArchEE) system was built to showcase how HistDiv and various other state-of-the-art retrieval models coupled with time-lines and entity filters can help users better explore large news archives.

ArchEE has also been selected as one of the top 3 startups in Lower Saxony for the 2016 Going Global competition organized by Hannover Impuls.

A demo of ArchEE can be found here:

ALEXANDRIA Internet Archive Search Prototype

We are delighted to announce the first public release of our ALEXANDRIA Internet Archive Search Prototype:

ArchiveSearch provides, for the first time, entity-based search and exploration functionalities for the Web archive of the Internet Archive, allowing you to use (most of) the 1.9 million concepts of the German Wikipedia or (most of) the 5 million concepts of the English Wikipedia as search terms.

For these search terms, the current version provides the most important results from the Internet Archive, ranking resources based on Bing search, with more sophisticated re-ranking in future releases, as well as related entity suggestions for most of the queries.


Successful 2nd International Alexandria Workshop

The second Alexandria Workshop took place at the L3S Research Center on 2-3 November 2015. The workshop aimed to bring together communities involved in web archiving, digital preservation, digital humanities and information retrieval, and to encourage a closer dialogue between researchers from computer science, digital humanities and cultural heritage institutions. It was well attended, with participants ranging from national libraries and the humanities to computer scientists from disciplines such as information retrieval, natural language processing, database systems and distributed systems. The workshop, spanning two days, included two keynotes, several research talks, system demonstrations and a panel discussion on shortcomings, research infrastructures, and future directions.

