CHI with Alexandria Contributions

At this year’s CHI conference in Glasgow, Scotland, members of the Alexandria project will present their work on mitigating crowd biases in subjective tasks.

  • Christoph Hube, Besnik Fetahu, and Ujwal Gadiraju: “Understanding and Mitigating Worker Biases in the Crowdsourced Collection of Subjective Judgments”

The paper examines subjective tasks and how inherent worker biases affect the collection of judgments for such tasks. A description of the work is shown below.


Crowdsourced data acquired from tasks that comprise a subjective component (e.g. opinion detection, sentiment analysis) is potentially affected by the inherent bias of crowd workers who contribute to the tasks. This can lead to biased and noisy ground-truth data, propagating the undesirable bias and noise when used in turn to train machine learning models or evaluate systems. In this work, we aim to understand the influence of workers’ own opinions on their performance in the subjective task of bias detection. We analyze the influence of workers’ opinions on their annotations corresponding to different topics. Our findings reveal that workers with strong opinions tend to produce biased annotations. We show that such bias can be mitigated to improve the overall quality of the data collected. Experienced crowd workers also fail to distance themselves from their own opinions to provide objective annotations.


WWW with Alexandria Contributions

Members of the Alexandria project will be presenting two of their works at this year’s The Web Conference (WWW ’19) in San Francisco, USA.

The works address two different topics. The first examines why and how people cite in collaborative environments like Wikipedia. The second shows how tabular information on the Web can be leveraged by aligning tables with fine-grained relations (e.g. subPartOf or equivalent).



The datasets and code for the TableNet project are available for further use and comparison.

WSDM with Alexandria Contributions

At the twelfth Web Search and Data Mining conference (WSDM), held in Melbourne, Australia, from 11 to 16 February, the Alexandria team and other L3S members presented three research publications. WSDM is a highly selective conference with participation from major universities around the world and has an acceptance rate of only 16%.


The three works covered diverse fields: language bias, efficient training of word embeddings, and crowdsourcing.


The works were well received by the community, leading to many fruitful discussions during the poster session at WSDM.


Award at TPDL conference

A paper contributed by Prof. Ewerth’s Visual Analytics Research Group received the “Honorable Mention Award” at TPDL 2018 (22nd International Conference on Theory and Practice of Digital Libraries), held in Porto (Portugal) from 10 to 13 September. In total, 51 “full papers” were submitted to TPDL 2018, of which 16 (31%) were invited for a full-length oral presentation at the conference. This year, research groups of the TIB (Data Science & Digital Libraries, Scientific Data Management, Open Science Lab) participated in TPDL with numerous contributions (five talks and two demo posters). The award-winning work was developed in the context of the ALEXANDRIA project (ERC Advanced Research Grant, Prof. Wolfgang Nejdl), which is conducted at the L3S Research Center and investigates innovative methods for temporal search in web archives.

In the Honorable Mention Award winning paper “Finding Person Relations in Image Data of News Collections in the Internet Archive”, the authors present a system that automatically recognises and identifies people in photos of news collections from the Internet Archive. The approach is based on state-of-the-art machine learning methods, so-called deep (i.e., very large) neural networks. Through an appropriate visualisation of results, users and researchers, e.g. historians, can quickly and effectively explore information about person relationships in photos, i.e. how often persons of public interest appeared together in the same photo or in photos of the same article within a certain period of time. The authors demonstrate the benefits of the system via two use cases for online news from politics and entertainment in 2013. The trained models as well as the demo of the use cases described in the article are freely available online (see corresponding links). In the future, the approach will be extended so that person recognition in images can be linked or compared with mentions in the corresponding news text.

Reference: Eric Müller-Budack, Kader Pustu-Iren, Sebastian Diering, and Ralph Ewerth: “Finding Person Relations in Image Data of News Collections in the Internet Archive”. In: Méndez E., Crestani F., Ribeiro C., David G., Lopes J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, Vol 11057. Springer, Cham. DOI: 10.1007/978-3-030-00066-0_20

Alexandria at JCDL 2018

The 2018 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) was held in Fort Worth (Texas, USA) from 3 to 6 June. The Alexandria project was again present at this annual venue with one full research paper. The paper, entitled “Ranking Archived Documents for Structured Queries on Semantic Layers”, was presented by Dr. Pavlos Fafalios and is co-authored by Prof. Wolfgang Nejdl.

The paper introduces the problem of ranking archived documents for structured queries on semantic layers and proposes two ranking models (a probabilistic one and a Random Walk-based one) which jointly consider: i) the relativeness of a document to the query entities, ii) the timeliness of a document’s publication date, and iii) the temporal relatedness of the query entities to other entities mentioned in the documents.

A preprint of the article is available at:

The presentation slides are available at:
