Please use this identifier to cite or link to this item: http://hdl.handle.net/10071/26157
Author(s): Bico, M. I.
Baptista, J.
Batista, F.
Cardeira, E.
Editor: Silvello, G., Corcho, O., Manghi, P., Di Nunzio, G. M., Golub, K., Ferro, N., and Poggi, A.
Date: 2022
Title: Early experiments on automatic annotation of Portuguese medieval texts
Volume: 13541
Book title/volume: Linking theory and practice of digital libraries. Lecture Notes in Computer Science
Pages: 442 - 449
Event title: 26th International Conference on Theory and Practice of Digital Libraries, TPDL 2022
Reference: Bico, M. I., Baptista, J., Batista, F., & Cardeira, E. (2022). Early experiments on automatic annotation of Portuguese medieval texts. In G. Silvello, O. Corcho, P. Manghi, G. M. Di Nunzio, K. Golub, N. Ferro, & A. Poggi (Eds.), Lecture notes in computer science: Vol. 13541. Linking theory and practice of digital libraries (pp. 442-449). Springer. https://doi.org/10.1007/978-3-031-16802-4_44
ISSN: 0302-9743
ISBN: 978-3-031-16802-4
DOI (Digital Object Identifier): 10.1007/978-3-031-16802-4_44
Keywords: Automatic annotation
Lemmatization
Part-of-speech tagging
Old Portuguese
Abstract: This paper presents the challenges encountered and the solutions adopted in the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), paving the way for the automatic annotation of these medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (∼155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision on a textual variant of the training text and 67.4% on a new, unseen text. A second model was then trained on the data from all three texts and applied to two further unseen texts, achieving a precision of 77.3% and 82.4%, respectively.
Peer reviewed: yes
Access type: Open Access
Appears in Collections: IT-CRI - Comunicações a conferências internacionais

Files in This Item:
File: conferenceobject_90833.pdf
Size: 428.17 kB
Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.