Please use this identifier to cite or link to this item:
http://hdl.handle.net/10071/26157
Author(s): | Bico, M. I.; Baptista, J.; Batista, F.; Cardeira, E. |
Editor: | Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A. |
Date: | 2022 |
Title: | Early experiments on automatic annotation of Portuguese medieval texts |
Volume: | 13541 |
Book title/volume: | Linking theory and practice of digital libraries. Lecture Notes in Computer Science |
Pages: | 442 - 449 |
Event title: | 26th International Conference on Theory and Practice of Digital Libraries, TPDL 2022 |
Reference: | Bico, M. I., Baptista, J., Batista, F., & Cardeira, E. (2022). Early experiments on automatic annotation of Portuguese medieval texts. In G. Silvello, O. Corcho, P. Manghi, G. M. Di Nunzio, K. Golub, N. Ferro, & A. Poggi (Eds.), Lecture notes in computer science: Vol. 13541. Linking theory and practice of digital libraries (pp. 442-449). Springer. https://doi.org/10.1007/978-3-031-16802-4_44 |
ISSN: | 0302-9743 |
ISBN: | 978-3-031-16802-4 |
DOI (Digital Object Identifier): | 10.1007/978-3-031-16802-4_44 |
Keywords: | Automatic annotation; Lemmatization; Part-of-speech tagging; Old Portuguese |
Abstract: | This paper presents the challenges and solutions adopted to the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), to pave the way to the implementation of an automatic annotation of these Medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (∼155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision with a textual variant of the same text, and 67.4% with a new, unseen text. A second model was then trained with the data provided by the previous three texts and applied to two other unseen texts. The new model achieved a precision of 77.3% and 82.4%, respectively. |
Peer reviewed: | yes |
Access type: | Open Access |
Appears in Collections: | IT-CRI - Comunicações a conferências internacionais |
Files in This Item:
File | Size | Format |
---|---|---|
conferenceobject_90833.pdf | 428.17 kB | Adobe PDF |