Consultas con Errores Ortográficos en RI Multilingüe: Análisis y Tratamiento

  1. Vilares Calvo, David
  2. Blanco González, Adrián
  3. Vilares, Jesús
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 25-32

Type: Article

Abstract

This paper studies the impact of misspelled queries on the performance of Cross-Language Information Retrieval systems and proposes two strategies for dealing with them: automatic spelling correction, and the use of character n-grams as both index terms and translation units, which exploits their inherent robustness. Our results demonstrate the sensitivity of these systems to such errors and the effectiveness of the proposed solutions. To the best of our knowledge, no similar work exists in the cross-language field.
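
The robustness of character n-grams mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the n-gram length of 4, and the use of the Dice coefficient as an overlap measure are all assumptions chosen for illustration.

```python
def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of a word.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_overlap(a: str, b: str, n: int = 4) -> float:
    """Dice coefficient between the n-gram sets of two words."""
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    correct, misspelled = "information", "infromation"
    # Whole-word matching fails outright on the misspelling...
    print(correct == misspelled)
    # ...while some n-grams far from the error still match.
    print(sorted(set(char_ngrams(correct)) & set(char_ngrams(misspelled))))
    print(dice_overlap(correct, misspelled))
```

In this sketch a transposition damages only the n-grams that span the error, so a misspelled query term still shares part of its index terms with the correct form, whereas exact word matching shares none.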

Bibliographic References

  • Bendersky, M. and W.B. Croft. 2009. Analysis of long queries in a large scale search log. In Proc. of WSCD’09, pp. 8–14. ACM.
  • Dale, R., H. Moisl, and H. Somers, eds. 2000. Handbook of Natural Language Processing. Marcel Dekker, Inc.
  • Di Nunzio, G.M., N. Ferro, T. Mandl, and C. Peters. 2006. CLEF 2006: Ad Hoc Track Overview. In Working Notes of the CLEF 2006 Workshop, pp. 21–34.
  • Graña, J., M.A. Alonso, and M. Vilares. 2002. A common solution for tokenization and part-of-speech tagging: One-pass Viterbi algorithm vs. iterative approaches. LNCS, 2448:3–10.
  • Graña, J., F.M. Barcala, and J. Vilares. 2002. Formal methods of tokenization for part-of-speech tagging. LNCS, 2276:240–249.
  • Guo, J., G. Xu, H. Li, and X. Cheng. 2008. A unified and discriminative model for query refinement. In Proc. of ACM SIGIR’08, pp. 379–386. ACM.
  • Jansen, B.J., A. Spink, and T. Saracevic. 2000. Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management, 36(2):207–227.
  • Koehn, P. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proc. of MT Summit X, pp. 79–86. Corpus available at http://www.statmt.org/europarl/.
  • Koehn, P., F.J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proc. of NAACL’03, pp. 48–54. ACL.
  • Kukich, K. 1992. Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR), 24(4):377–439.
  • Levenshtein, V.I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics-Doklady, 6:707–710.
  • Manning, C.D., P. Raghavan, and H. Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.
  • McNamee, P. and J. Mayfield. 2004a. Character N-gram Tokenization for European Language Text Retrieval. Information Retrieval, 7(1-2):73–97.
  • McNamee, P. and J. Mayfield. 2004b. JHU/APL experiments in tokenization and non-word translation. LNCS, 3237:85–97.
  • Nie, J.-Y. 2010. Cross-Language Information Retrieval, vol. 8 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
  • Och, F.J. and H. Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51. Tool available at http://code.google.com/p/giza-pp/.
  • Otero, J., J. Graña, and M. Vilares. 2007. Contextual Spelling Correction. LNCS, 4739:290–296.
  • Ounis, I., C. Lioma, C. Macdonald, and V. Plachouras. 2007. Research directions in Terrier: a search engine for advanced retrieval on the web. Novática/UPGRADE Special Issue on Web Information Access, 8(1):49–56. Toolkit available at http://www.terrier.org.
  • Rehm, G. and H. Uszkoreit, eds. 2011. META-NET White Paper Series. Springer. Available at http://www.meta-net.eu/whitepapers.
  • Robertson, A.M. and P. Willett. 1998. Applications of n-grams in textual information systems. Journal of Documentation, 54(1):48–69.
  • Savary, A. 2002. Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. LNCS, 2494:251–260.
  • Véronis, J. 1999. Multext-Corpora. An annotated corpus for five European languages. CD-ROM. Distributed by ELRA/ELDA.
  • Vilares, J., M.P. Oakes, and M. Vilares. 2007. A Knowledge-Light Approach to Query Translation in Cross-Language Information Retrieval. In Proc. of RANLP 2007, pp. 624–630.
  • Vilares, M., J. Otero, and J. Graña. 2004. On asymptotic finite-state error repair. LNCS, 3246:271–272.
  • Vilares, J., M. Vilares, and J. Otero. 2011. Managing Misspelled Queries in IR Applications. Information Processing & Management, 47(2):263–286.