Viability of Sequence Labeling Encodings for Dependency Parsing
- Strzyz, Michalina María
- David Vilares Calvo Co-director
- Carlos Gómez Rodríguez Co-director
Defence university: Universidade da Coruña
Defence date: 20 December 2021
- Joakim Nivre Chair
- Miguel Á. Alonso Secretary
- Reut Tsarfaty Committee member
Type: Thesis
Abstract
This thesis presents new methods for recasting dependency parsing as a sequence labeling task, yielding a viable alternative to the traditional transition- and graph-based approaches. We show that sequence labeling parsers provide several advantages for dependency parsing: (i) a good trade-off between accuracy and parsing speed, (ii) genericity, which enables running a parser in generic sequence labeling software, and (iii) pluggability, which allows full parse trees to be used as features in downstream tasks. The backbone of dependency parsing as sequence labeling is the set of encodings, which serve as linearization methods that map dependency trees to discrete labels so that each token in a sentence is associated with one label. We introduce three encoding families, (i) head selection, (ii) bracketing-based and (iii) transition-based encodings, which differ in the way they represent a dependency tree as a sequence of labels. We empirically examine the viability of the encodings and analyze their different facets. Furthermore, we explore the feasibility of leveraging external complementary data to enhance parsing performance. To this end, our sequence labeling parser is endowed with two kinds of representations. First, we exploit the complementary nature of the dependency and constituency parsing paradigms and enrich the parser with representations from both syntactic abstractions. Second, we use human language processing data to guide the parser with representations derived from eye movements. Overall, the results show that recasting dependency parsing as sequence labeling is a viable approach that is fast and accurate and provides a practical alternative for integrating syntax into NLP tasks.
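To illustrate how a dependency tree can be linearized into one label per token, here is a minimal sketch of a head-selection style encoding. It assumes a relative-offset scheme in which each token's label is the signed distance from the token to its head, joined with the dependency relation. The function names `encode`/`decode`, the label format (e.g. `+1_det`) and the fallback rule for out-of-range predictions are hypothetical choices for illustration, not the exact label syntax or post-processing used in the thesis.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Map a dependency tree to one label per token (hypothetical format).

    heads[i] is the 1-based index of the head of token i+1; 0 is the root.
    Each label is the signed offset from the token to its head, joined with
    the dependency relation, e.g. "+1_det".
    """
    labels = []
    for position, (head, rel) in enumerate(zip(heads, rels), start=1):
        offset = head - position  # the root (head 0) yields offset -position
        labels.append(f"{offset:+d}_{rel}")
    return labels


def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Recover heads and relations from a predicted label sequence.

    Offsets pointing outside the sentence are attached to the root. This is a
    simplistic, assumed repair step: it does not by itself guarantee a
    well-formed tree (it may still produce several roots or cycles).
    """
    n = len(labels)
    heads, rels = [], []
    for position, label in enumerate(labels, start=1):
        offset_str, rel = label.split("_", 1)
        head = position + int(offset_str)
        if not 0 <= head <= n:
            head = 0  # out-of-range prediction: fall back to the root
        heads.append(head)
        rels.append(rel)
    return heads, rels


if __name__ == "__main__":
    # "The dog barked": The -> dog, dog -> barked, barked -> root
    heads = [2, 3, 0]
    rels = ["det", "nsubj", "root"]
    labels = encode(heads, rels)
    print(labels)  # ['+1_det', '+1_nsubj', '-3_root']
    print(decode(labels))  # ([2, 3, 0], ['det', 'nsubj', 'root'])
```

Because every token receives exactly one discrete label, a tree encoded this way can be learned by any generic sequence tagger, which is the genericity advantage mentioned above; the decoding step is where tree well-formedness has to be enforced.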