Please use this identifier to cite or link to this item:

http://hdl.handle.net/10071/20833
Title: Automatic truecasing of video subtitles using BERT: a multilingual adaptable approach
Authors: Rei, Ricardo; Guerreiro, Nuno Miguel; Batista, F.
Editors: Lesot, Marie-Jeanne; Vieira, Susana; Reformat, Marek Z.; Carvalho, João Paulo; Wilbik, Anna; Bouchon-Meunier, Bernadette; Yager, Ronald R.
Issue Date: 2020
Publisher: Springer International Publishing
Abstract: This paper describes an approach for automatic capitalization of text without case information, such as spoken transcripts of video subtitles produced by automatic speech recognition systems. Our approach is based on pre-trained contextualized word embeddings, requires only a small portion of data for training compared with traditional approaches, and achieves state-of-the-art results. The paper reports experiments on both general written data from the European Parliament and video subtitles, showing that the proposed approach is suitable for capitalization not only in each domain but also in a cross-domain scenario. We have also created a versatile multilingual model, and the experiments show that good results can be achieved for both monolingual and multilingual data. Finally, we applied domain adaptation by fine-tuning models, initially trained on general written data, on video subtitles, revealing gains over other approaches not only in performance but also in computational cost.
Peer reviewed: yes
URI: http://hdl.handle.net/10071/20833
DOI: 10.1007/978-3-030-50146-4_52
ISBN: 978-3-030-50146-4
Ciência-IUL: https://ciencia.iscte-iul.pt/id/ci-pub-72401
Appears in Collections: IT-CRI - Comunicações a conferências internacionais

Files in This Item:
File: Rei2020_Chapter_AutomaticTruecasingOfVideoSubt.pdf
Description: Publisher's version
Size: 385.68 kB
Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.