New Algorithms and Methodologies for Building Information Retrieval Collections

Author:
  1. Otero Freijeiro, David
Supervised by:
  1. Álvaro Barreiro García (Director)
  2. Javier Parapar (Director)

Defence university: Universidade da Coruña

Defence date: 05 April 2024

Committee:
  1. David Enrique Losada Carril (Chair)
  2. Paula López-Otero (Secretary)
  3. Maria Maistro (Committee member)

Type: Thesis

Abstract

Information retrieval systems play a crucial role in meeting users’ information needs by helping them explore vast collections of information. This thesis addresses a critical aspect of information retrieval research: evaluation. In particular, we propose new approaches for creating annotated test collections. Such collections are essential for evaluating the effectiveness of retrieval systems in controlled experiments, and reflecting real-world conditions accurately in them is pivotal for progress in the field. We aim to introduce techniques for assembling reliable test collections efficiently, facilitating broader research and development in information retrieval. The thesis first proposes a method for building pooled test collections without requiring costly evaluation campaigns, which simplifies the construction of new benchmarks and lowers its cost. Then, we introduce a novel, cost-effective adjudication method that determines which pooled documents warrant human judgment, with the goal of reducing the number of expert assessments required. Additionally, the thesis presents a new perspective on evaluating adjudication methods that emphasizes statistical significance, an aspect often overlooked in previous research on document adjudication. To demonstrate the methods explored in this thesis, we applied them to build a new test collection and describe its construction as an example of reduced-budget methods in practice. In summary, this thesis combines established information retrieval knowledge with new methodologies to create annotated collections that are both cost-effective and reliable, a combination that is crucial for developing more effective retrieval systems.