Algorithms and compressed data structures for information retrieval

  1. Ladra, Susana
Supervised by:
  1. Gonzalo Navarro Badino Supervisor
  2. Nieves R. Brisaboa Supervisor

University of defense: Universidade da Coruña

Date of defense: 8 April 2011

Committee:
  1. Isidro Ramos Salavert Chair
  2. Ricardo Baeza Yates Secretary
  3. Paolo Ferragina Member
  4. Alejandro López Ortiz Member
  5. Josep Díaz Cort Member
Department:
  1. Ciencias de la Computación y Tecnologías de la Información

Type: Thesis

Teseo: 306738

Abstract

In this thesis we address the problem of efficiency in Information Retrieval by presenting new compressed data structures and algorithms that can be used in different application domains and achieve interesting space/time properties. We propose (i) a new variable-length encoding scheme for sequences of integers that enables fast direct access to the encoded sequence and outperforms other solutions used in practice, such as sampling methods, which add an undesirable space and time penalty to the encoding; (ii) a new self-indexed representation of the compressed text obtained by any word-based, byte-oriented compression technique, which allows fast searches for words and phrases over the compressed text, occupies the same space as that achieved by compressors of this type, and performs better than classical inverted indexes when little space is used; and (iii) a new compact representation of Web graphs that supports efficient forward and reverse navigation over the graph using the smallest space reported in the literature, and also offers extended functionality not usually considered in compressed graph representations. These data structures and algorithms can be used in several scenarios, and we experimentally show that they compete successfully with other techniques commonly used in those domains.
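
The abstract does not spell out how contribution (i) is built, so the following is only a minimal sketch of one way to get direct access to variable-length codes without sampling: each integer is split into fixed-size chunks, the chunks are stored level by level, and a rank over per-level continuation bits maps a position in one level to its position in the next. The class name `DirectAccessCodes`, the `chunk_bits` parameter, and the use of plain Python lists with precomputed prefix sums (instead of succinct bitmaps) are assumptions made for illustration, not the thesis implementation.

```python
from typing import List


class DirectAccessCodes:
    """Illustrative level-wise layout of variable-length codes (hypothetical names)."""

    def __init__(self, values: List[int], chunk_bits: int = 4) -> None:
        if chunk_bits <= 0:
            raise ValueError("chunk_bits must be positive")
        if any(v < 0 for v in values):
            raise ValueError("only non-negative integers are supported")
        self.chunk_bits = chunk_bits
        self.n = len(values)
        self.chunks: List[List[int]] = []  # chunks[k][j]: k-th chunk of the j-th value reaching level k
        self.more: List[List[int]] = []    # more[k][j]: 1 if that value continues in level k + 1
        self.rank: List[List[int]] = []    # rank[k][j]: number of 1s in more[k][:j]
        mask = (1 << chunk_bits) - 1
        current = list(values)
        while current:
            level_more = [1 if (v >> chunk_bits) else 0 for v in current]
            prefix = [0]
            for bit in level_more:
                prefix.append(prefix[-1] + bit)
            self.chunks.append([v & mask for v in current])
            self.more.append(level_more)
            self.rank.append(prefix)
            # Only values with remaining high-order bits move to the next level.
            current = [v >> chunk_bits for v in current if v >> chunk_bits]

    def __len__(self) -> int:
        return self.n

    def access(self, i: int) -> int:
        """Decode the i-th value by visiting one chunk per level it occupies."""
        if not 0 <= i < self.n:
            raise IndexError("index out of range")
        value, shift, level, pos = 0, 0, 0, i
        while True:
            value |= self.chunks[level][pos] << shift
            if not self.more[level][pos]:
                return value
            # Rank of continuation bits gives the position in the next level.
            pos = self.rank[level][pos]
            shift += self.chunk_bits
            level += 1


if __name__ == "__main__":
    dac = DirectAccessCodes([3, 300, 17, 70000, 0], chunk_bits=4)
    print([dac.access(i) for i in range(len(dac))])
```

In this sketch, accessing a value costs one step per chunk it occupies, so small values are decoded in a single step and no preceding elements need to be decoded. A space-efficient version would replace the Python lists with packed arrays and a succinct rank structure.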