The Lemmatisation of Old English Weak Verbs on a Relational Database
- Tío Sáenz, Marta
- Francisco Javier Martín Arista Director
- Monika Kirner Ludwig Co-director
Defence university: Universidad de La Rioja
Defence date: 28 October 2019
- Javier Calle Martín Chair
- Roberto Torre Alonso Secretary
- Begoña Crespo Committee member
Type: Thesis
Abstract
This thesis deals with the verbal morphology of Old English in order to identify and lemmatise weak verbs in a corpus accessed through a lexical database. Lemmatisation is a pending task in historical linguistics, given the lack of comprehensive lemmatised corpora of this language. The focus of this doctoral dissertation is on the lemmatisation of the three classes of weak verbs, although Lexicography and Corpus Linguistics are also relevant to this research. The main aim is to identify the canonical and non-canonical realisations of Old English weak verbs and to assign each of them a lemma from a reference list of weak verbs. Achieving this aim involves, firstly, the use of the available sources of Old English in order to lemmatise and validate the results and, secondly, the design of a semi-automatic research methodology that combines automatic searches in the lexical database Nerthus with the manual revision of the results. The sources for this investigation are the inflectional forms attested in the Dictionary of Old English Corpus (DOEC), which are available in the lemmatiser Norna; the lexicographical sources published on Old English, mainly the Dictionary of Old English (DOE); and other textual sources, such as the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE) and an index of secondary sources of Old English. The methodology comprises four successive steps, each consisting of several tasks. The first step lemmatises the transparent forms of weak verbs by searching for specific query strings for each subclass of weak verbs in the lemmatiser Norna, where a type index of the DOEC, the most reliable source of information on the Old English language, is available.
The second step then validates the results against the DOE and adds to the analysis the non-canonical attestations of the weak verbs beginning with the letters A-H. Thirdly, the identification of the most recurrent non-canonical inflectional endings and stem vowels attested in weak verbs gives rise to lemmatisation patterns. Searching for these sets of correspondences, together with the list of non-canonical prefixes available in Norna, results in the lemmatisation of the non-canonical inflections of weak verbs. Finally, the results for the letters I-Y are validated with the syntactic parsing provided by the YCOE and the data retrieved from Freya, an index of secondary sources of Old English, which concludes the research methodology. The issues that arise throughout the lemmatisation process mainly concern the idiosyncrasy of the Old English writing system and the limitations of the lemmatisation by type that this investigation follows. A quantitative and qualitative discussion of the results of the analysis concludes this thesis. The main contributions of this thesis are the lists of weak lemmas and their lemmatised inflectional forms, especially those of the verbs I-Y, which are not yet available elsewhere, and the research methodology designed to identify these forms, including the sets of lemmatisation patterns for the non-canonical inflectional endings and stem vowels of weak verbs.
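The first step described above (searching an index of types for canonical endings and checking the candidates against a reference list of lemmas) can be illustrated with a minimal sketch. It is not the implementation used in Nerthus or Norna. The function names, the data structures and the small set of class 2 endings are assumptions made for illustration, and the sample forms are standard textbook inflections of lufian and bodian rather than data from the thesis.

```python
# Hypothetical, partial list of canonical endings for class 2 weak verbs
# (infinitive in -ian). The thesis uses its own query strings per subclass.
CLASS2_ENDINGS = ["odon", "odest", "ode", "iað", "ast", "að", "ie", "od", "ian"]


def candidate_lemmas(form, endings=CLASS2_ENDINGS):
    """Strip each matching canonical ending and rebuild an -ian infinitive."""
    candidates = []
    for ending in endings:
        if form.endswith(ending) and len(form) > len(ending):
            candidates.append(form[: -len(ending)] + "ian")
    return candidates


def lemmatise(forms, reference_lemmas):
    """Assign forms (types) to lemmas found in the reference list.

    Forms with no candidate in the reference list are returned separately,
    standing in for the forms that would need manual revision.
    """
    assigned = {}
    unassigned = []
    for form in forms:
        hits = [c for c in candidate_lemmas(form) if c in reference_lemmas]
        if hits:
            for lemma in hits:
                assigned.setdefault(lemma, []).append(form)
        else:
            unassigned.append(form)
    return assigned, unassigned


forms = ["lufode", "lufiað", "lufast", "bodode", "cyning"]
reference = {"lufian", "bodian"}
assigned, unassigned = lemmatise(forms, reference)
print(assigned)
print(unassigned)
```

Because the matching works on types rather than on tokens in context, a form that fits more than one lemma cannot be disambiguated automatically, which is one reason the sketch keeps a separate list for manual revision.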