CIKM 2016 with Alexandria contributions

The 2016 edition of the Conference on Information and Knowledge Management (CIKM) took place on October 24-29, 2016 in Indianapolis, IN, USA. CIKM is a major forum for the presentation and discussion of research on information and knowledge management. The Alexandria team of L3S contributed two papers: one by Besnik Fetahu et al. on finding news citations for Wikipedia, and one by Jaspreet Singh et al. on a human-in-the-loop retrieval method for discovering entities:



Fetahu, Besnik and Markert, Katja and Nejdl, Wolfgang and Anand, Avishek; Finding News Citations for Wikipedia; In: Proc. of 25th ACM International on Conference on Information and Knowledge Management; CIKM ’16; ACM; 2016. DOI: 10.1145/2983323.2983808

An important editing policy in Wikipedia is to provide citations for added statements in Wikipedia pages, where statements can be arbitrary pieces of text, ranging from a sentence to a paragraph. In many cases citations are either outdated or missing altogether. In this work we address the problem of finding and updating news citations for statements in entity pages. We propose a two-stage supervised approach for this problem. In the first step, we construct a classifier to find out whether statements need a news citation or other kinds of citations (web, book, journal, etc.). In the second step, we develop a news citation algorithm for Wikipedia statements, which recommends appropriate citations from a given news collection. Apart from IR techniques that use the statement to query the news collection, we also formalize three properties of an appropriate citation, namely: (i) the citation should entail the Wikipedia statement, (ii) the statement should be central to the citation, and (iii) the citation should be from an authoritative source. We perform an extensive evaluation of both steps, using 20 million articles from a real-world news collection. Our results are quite promising, and show that we can perform this task with high precision and at scale.
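To make the two-stage structure described in the abstract more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the cue-word rule stands in for the trained stage-one classifier, simple token overlap stands in for entailment and centrality, and the `NEWS_CUES` set, the `authority` lookup table, and all function names are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class NewsArticle:
    source: str
    title: str
    body: str


# Hypothetical cue words; a stand-in for a trained stage-one classifier.
NEWS_CUES = {"announced", "reported", "yesterday", "election", "died", "won"}


def needs_news_citation(statement):
    """Stage 1 (toy): does the statement look like it needs a news citation?"""
    return bool(tokens(statement) & NEWS_CUES)


def overlap(statement, text):
    """Fraction of statement tokens that also occur in the text."""
    s = tokens(statement)
    return len(s & tokens(text)) / len(s) if s else 0.0


def score(statement, article, authority):
    """Stage 2 (toy): average of three crude proxies for the three properties."""
    # (i) entailment proxy: the article as a whole covers the statement
    entailment = overlap(statement, article.title + " " + article.body)
    # (ii) centrality proxy: the statement is reflected in the headline
    centrality = overlap(statement, article.title)
    # (iii) authority: assumed per-source score in [0, 1]
    auth = authority.get(article.source, 0.0)
    return (entailment + centrality + auth) / 3.0


def recommend(statement, articles, authority, k=1):
    """Run both stages and return the top-k candidate articles."""
    if not needs_news_citation(statement):
        return []
    ranked = sorted(
        articles, key=lambda a: score(statement, a, authority), reverse=True
    )
    return ranked[:k]


articles = [
    NewsArticle("trusted", "Mayor announced new bridge",
                "The mayor announced a new bridge project today."),
    NewsArticle("blog", "Weather update", "Rain expected tomorrow."),
]
authority = {"trusted": 1.0, "blog": 0.2}
best = recommend("The mayor announced a new bridge", articles, authority)
```

In this sketch a statement without any cue word yields no recommendation at all, which mirrors the idea that stage one filters out statements better served by other citation types.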


Singh, Jaspreet and Hoffart, Johannes and Anand, Avishek; Discovering Entities with Just a Little Help from You; In: Proc. of 25th ACM International on Conference on Information and Knowledge Management; CIKM ’16; ACM; 2016. DOI: 10.1145/2983323.2983798

Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only few people are interested in, and also at times lags behind until newly emerging entities are added. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can automatically be mined, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-egg problem. To address this, our approach automatically generates candidate occurrences of entities, prompting the user for feedback to decide if the occurrence refers to the actual entity in question. This feedback gradually improves the knowledge and allows our methods to provide better candidate suggestions to keep the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.
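The feedback loop in the abstract can be pictured with a small Python sketch. The abstract does not define gradient interleaving, so what follows is only one possible toy reading and not the authors' method: candidates are drawn from a diversity-oriented ranking and a textual-relevance ranking, and the share taken from the relevance ranking grows as the user confirms more occurrences. The function names and the `saturation` parameter are hypothetical.

```python
def relevance_share(num_confirmed, saturation=4):
    """Assumed schedule: rely more on textual relevance as feedback accumulates."""
    return min(1.0, num_confirmed / saturation)


def interleave(relevance_ranked, diverse_ranked, share, k):
    """Pick up to k distinct candidates, taking roughly `share` of them
    from the relevance ranking and the rest from the diverse ranking."""
    rel = list(relevance_ranked)
    div = list(diverse_ranked)
    picked = []
    rel_taken = 0
    while len(picked) < k and (rel or div):
        # Prefer the relevance list while it is below its target share.
        want_rel = rel_taken < share * (len(picked) + 1)
        if (want_rel and rel) or not div:
            source, from_rel = rel, True
        else:
            source, from_rel = div, False
        item = source.pop(0)
        if item in picked:
            continue  # skip candidates already chosen from the other list
        if from_rel:
            rel_taken += 1
        picked.append(item)
    return picked


relevance_ranked = ["a", "b", "c", "d"]
diverse_ranked = ["x", "b", "y", "z"]

# No feedback yet: explore with the diverse ranking only.
first_round = interleave(relevance_ranked, diverse_ranked, relevance_share(0), 4)
# After two confirmed occurrences: an even mix of both rankings.
later_round = interleave(relevance_ranked, diverse_ranked, relevance_share(2), 4)
```

Each round, the user's yes/no answers on the suggested occurrences would update the count of confirmed occurrences, and with it the mix used for the next batch of suggestions.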
