Viability of Sequence Labeling Encodings for Dependency Parsing

Strzyz, Michalina María

Supervised by:

David Vilares Calvo Co-director
Carlos Gómez Rodríguez Co-director

Defence university: Universidade da Coruña

Defence date: 20 December 2021

Committee:

Joakim Nivre Chair
Miguel Á. Alonso Secretary
Reut Tsarfaty Committee member

Department:

Computer Science and Information Technologies

Type: Thesis

Teseo: 697413

Abstract

This thesis presents new methods for recasting dependency parsing as a sequence labeling task, yielding a viable alternative to the traditional transition-based and graph-based approaches. It shows that sequence labeling parsers offer several advantages for dependency parsing: (i) a good trade-off between accuracy and parsing speed, (ii) genericity, which enables running a parser in generic sequence labeling software, and (iii) pluggability, which allows full parse trees to be used as features for downstream tasks. The backbone of dependency parsing as sequence labeling is the encoding: a linearization method that maps a dependency tree into discrete labels, such that each token in a sentence is associated with one label. We introduce three encoding families, (i) head-selection, (ii) bracketing-based and (iii) transition-based encodings, which differ in how they represent a dependency tree as a sequence of labels. We empirically examine the viability of the encodings and analyze their different facets. Furthermore, we explore the feasibility of leveraging external complementary data to enhance parsing performance. To this end, we endow our sequence labeling parser with two kinds of representations. First, we exploit the complementary nature of the dependency and constituency parsing paradigms and enrich the parser with representations from both syntactic abstractions. Second, we use human language processing data to guide the parser with representations derived from eye movements. Overall, the results show that recasting dependency parsing as sequence labeling is a viable approach that is both fast and accurate, and that it provides a practical alternative for integrating syntax into NLP tasks.
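
To make the idea of an encoding concrete, the sketch below shows one simple head-selection-style linearization in Python. It is an illustrative assumption, not necessarily the exact scheme used in the thesis: each token's label combines the signed distance to its head with the dependency relation, and the root token gets a special `ROOT` marker. The function names `encode` and `decode` and the `offset_relation` label format are hypothetical.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Turn a dependency tree into one label per token.

    heads[i] is the 1-based index of the head of token i+1 (0 = root).
    Label format (an assumption for this sketch): "<signed offset>_<relation>".
    """
    labels = []
    for position, (head, rel) in enumerate(zip(heads, rels), start=1):
        if head == 0:
            labels.append(f"ROOT_{rel}")
        else:
            # Signed distance from the token to its head, e.g. "+1" or "-2".
            labels.append(f"{head - position:+d}_{rel}")
    return labels


def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Recover head indices and relations from the labels."""
    heads, rels = [], []
    for position, label in enumerate(labels, start=1):
        offset, rel = label.split("_", 1)
        heads.append(0 if offset == "ROOT" else position + int(offset))
        rels.append(rel)
    return heads, rels


# "She reads books": "reads" is the root, the other two tokens attach to it.
heads = [2, 0, 2]
rels = ["nsubj", "root", "obj"]
labels = encode(heads, rels)
print(labels)
```

Because every token receives exactly one label, any generic sequence labeling model can in principle be trained to predict such labels, and `decode` turns its predictions back into a tree. A real parser would also need to handle ill-formed predictions (for example, offsets that point outside the sentence), which this sketch omits.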