Algorithms and compressed data structures for information retrieval

  1. Ladra González, Susana
Supervised by:
  1. Gonzalo Navarro Badino Supervisor
  2. Nieves R. Brisaboa Supervisor

Defense university: Universidade da Coruña

Defense date: 8 April 2011

Committee:
  1. Isidro Ramos Salavert Chair
  2. Ricardo Baeza Yates Secretary
  3. Paolo Ferragina Member
  4. Alejandro López Ortiz Member
  5. Josep Díaz Cort Member
Department:
  1. Ciencias da Computación e Tecnoloxías da Información

Type: Thesis

Teseo: 306738

Abstract

In this thesis we address the problem of efficiency in Information Retrieval by presenting new compressed data structures and algorithms that can be used in different application domains and achieve attractive space/time trade-offs. We propose (i) a new variable-length encoding scheme for sequences of integers that enables fast direct access to the encoded sequence and outperforms other solutions used in practice, such as sampling methods, which add an undesirable space and time penalty to the encoding; (ii) a new self-indexed representation of the compressed text obtained by any word-based, byte-oriented compression technique, which allows fast searches for words and phrases over the compressed text while occupying the same space as that achieved by compressors of that type, and which performs better than classical inverted indexes when little space is used; and (iii) a new compact representation of Web graphs that supports efficient forward and reverse navigation over the graph using the smallest space reported in the literature, and that also allows extended functionality not usually considered in compressed graph representations. These data structures and algorithms can be used in several scenarios, and we experimentally show that they can successfully compete with other techniques commonly used in those domains.
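
The abstract does not describe how contribution (i) works internally. As a rough illustration of what "direct access to a variable-length encoded sequence" can look like, the Python sketch below splits each integer into fixed-size chunks, stores the chunks level by level, and keeps one continuation flag per chunk. A rank over the flags (the number of flags set before a position) maps a position at one level to its position at the next, so any element can be decoded without scanning its predecessors and without sampling. The class name `DirectAccessCodes`, the parameter `chunk_bits`, and the use of plain prefix sums for rank are assumptions made for this example; it is not the thesis's implementation, and it makes no attempt to be space-efficient.

```python
from itertools import accumulate


class DirectAccessCodes:
    """Illustrative sketch: non-negative integers stored as chunk levels.

    Level k holds the k-th chunk (least significant first) of every value
    that needs more than k chunks, plus a flag telling whether the value
    continues at level k + 1.
    """

    def __init__(self, values, chunk_bits=4):
        if chunk_bits <= 0:
            raise ValueError("chunk_bits must be positive")
        if any(v < 0 for v in values):
            raise ValueError("only non-negative integers are supported")
        self.chunk_bits = chunk_bits
        self.length = len(values)
        self.chunks = []  # chunks[k][j]: chunk stored at position j of level k
        self.flags = []   # flags[k][j]: 1 if that value continues at level k + 1
        self.ranks = []   # ranks[k][j]: number of flags set in flags[k][:j]
        mask = (1 << chunk_bits) - 1
        pending = list(values)
        while pending:
            level_flags = [1 if (v >> chunk_bits) else 0 for v in pending]
            self.chunks.append([v & mask for v in pending])
            self.flags.append(level_flags)
            # Plain prefix sums stand in for a succinct rank structure.
            self.ranks.append([0] + list(accumulate(level_flags)))
            # Only values with remaining high bits move on to the next level.
            pending = [v >> chunk_bits for v in pending if v >> chunk_bits]

    def access(self, i):
        """Decode the i-th value by following one chunk per level."""
        if not 0 <= i < self.length:
            raise IndexError("index out of range")
        value, shift, level, pos = 0, 0, 0, i
        while True:
            value |= self.chunks[level][pos] << shift
            if not self.flags[level][pos]:
                return value
            # Rank gives the position of this value at the next level.
            pos = self.ranks[level][pos]
            shift += self.chunk_bits
            level += 1


# Usage: every element is decoded independently of the others.
values = [5, 300, 0, 70000, 16]
dac = DirectAccessCodes(values, chunk_bits=4)
decoded = [dac.access(i) for i in range(len(values))]
```

In this sketch small values occupy a single chunk while large values spread over several levels, and the cost of `access` depends on the number of chunks of the requested value rather than on its position in the sequence.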