ALEXANDRIA Internet Archive Search Prototype

ArchiveSearch provides, for the first time, entity-based search and exploration functionality for the Web archive of the Internet Archive, allowing you to use (most of) the 1.9 million concepts of the German Wikipedia or (most of) the 5 million concepts of the English Wikipedia as search terms.

For these search terms, the current version returns the most important results from the Internet Archive, ranked using Bing search, along with related entity suggestions for most queries. More sophisticated re-ranking is planned for future releases.

ArchEE – Archive Exploration Engine

When exploring news archives, a key requirement of historians is to first get an overview of their search results. To address this problem, we developed a novel retrieval model, HistDiv, which ranks articles according to historical relevance. The Archive Exploration Engine (ArchEE) system was built to showcase how HistDiv and various other state-of-the-art retrieval models, coupled with timelines and entity filters, can help users better explore large news archives.


ArchEE has also been selected as one of the top 3 startups in Lower Saxony for the 2016 Going Global competition organized by Hannover Impuls.

Extraction of Evolution Descriptions from the Web

The evolution of named entities affects exploration and retrieval tasks in digital libraries. An information retrieval system that is aware of name changes can actively support users in finding former occurrences of evolved entities. However, current structured knowledge bases, such as DBpedia or Freebase, do not provide enough information about evolutions, even though the data is available in the resources they draw on, such as Wikipedia. Our Evolution Base prototype demonstrates how excerpts describing name evolutions can be identified on these websites with promising precision. The descriptions are classified by means of models that we trained based on a recent analysis of named entity evolutions on Wikipedia.


Tempas – Temporal Archive Search Based on Tags

Limited search and access patterns over Web archives have been well documented. One of the key reasons is the lack of understanding of user access patterns over such collections, which in turn is attributed to the lack of effective search interfaces. Current search interfaces for Web archives are either (a) purely navigational or (b) offer a sub-optimal search experience due to ineffective retrieval models or query modeling. We identify external longitudinal resources, such as social bookmarking data, as crucial sources for identifying websites that were important and popular in the past. To this end, we present Tempas, a tag-based temporal search engine for Web archives.

Websites are posted at specific times of interest on several external platforms, such as bookmarking sites like Delicious. The attached tags not only act as relevant descriptors useful for retrieval, but also encode the time of relevance. With Tempas we tackle the challenge of temporally searching a Web archive by indexing tags and time. We allow temporal selections for search terms, rank documents based on their popularity, and provide meaningful query recommendations by exploiting tag-tag and tag-document co-occurrence statistics in arbitrary time windows. Finally, Tempas operates as a fairly non-invasive indexing framework. Because it does not deal with content from the actual Web archive, it constitutes an attractive, low-overhead approach for quick access into Web archives.
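The idea of indexing tags together with time can be illustrated with a small sketch. This is not the Tempas implementation: the `Bookmark` record, the `TagTimeIndex` class, and its method names are hypothetical, and popularity is assumed here to be the number of bookmark posts for a URL inside the selected time window. Query recommendations are approximated by simple tag-tag co-occurrence counts over the same window.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical record of one bookmarking post: URL, tags, posting date."""
    url: str
    tags: frozenset
    posted: date


class TagTimeIndex:
    """Toy in-memory index over bookmarks (an assumption, not Tempas itself)."""

    def __init__(self, bookmarks):
        self.bookmarks = list(bookmarks)

    def _in_window(self, start, end):
        # Keep only bookmarks posted inside the inclusive time window.
        return [b for b in self.bookmarks if start <= b.posted <= end]

    def search(self, tag, start, end):
        # Rank URLs by how often they were bookmarked with `tag` in the window
        # (assumed popularity measure); ties are broken alphabetically by URL.
        counts = Counter(
            b.url for b in self._in_window(start, end) if tag in b.tags
        )
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    def related_tags(self, tag, start, end, k=3):
        # Recommend tags that co-occur most often with `tag` in the window.
        co_occurrence = Counter()
        for b in self._in_window(start, end):
            if tag in b.tags:
                co_occurrence.update(b.tags - {tag})
        ranked = sorted(co_occurrence.items(), key=lambda kv: (-kv[1], kv[0]))
        return [t for t, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    index = TagTimeIndex([
        Bookmark("a.example", frozenset({"news", "politics"}), date(2008, 1, 5)),
        Bookmark("a.example", frozenset({"news"}), date(2008, 2, 1)),
        Bookmark("b.example", frozenset({"news", "sports"}), date(2008, 3, 1)),
        Bookmark("c.example", frozenset({"news", "politics"}), date(2010, 1, 1)),
    ])
    print(index.search("news", date(2008, 1, 1), date(2008, 12, 31)))
    print(index.related_tags("news", date(2008, 1, 1), date(2008, 12, 31)))
```

Note that the sketch never touches archived page content: it only needs URLs, tags, and timestamps, which mirrors the low-overhead property described above. A real system would replace the linear scan with a proper inverted index over tags and time.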