Use this identifier to cite or link to this record: http://hdl.handle.net/10071/32083
Full record
DC Field | Value | Language
dc.contributor.author | Ramos, G. | -
dc.contributor.author | Batista, F. | -
dc.contributor.author | Ribeiro, R. | -
dc.contributor.author | Fialho, P. | -
dc.contributor.author | Moro, S. | -
dc.contributor.author | Fonseca, A. | -
dc.contributor.author | Guerra, R. | -
dc.contributor.author | Carvalho, P. | -
dc.contributor.author | Marques, C. | -
dc.contributor.author | Silva, C. | -
dc.date.accessioned | 2024-07-26T14:47:10Z | -
dc.date.available | 2024-07-26T14:47:10Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Ramos, G., Batista, F., Ribeiro, R., Fialho, P., Moro, S., Fonseca, A., Guerra, R., Carvalho, P., Marques, C., & Silva, C. (2024). Leveraging transfer learning for hate speech detection in Portuguese social media posts. IEEE Access, 12, 101374-101389. http://doi.org/10.1109/ACCESS.2024.3430848 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | http://hdl.handle.net/10071/32083 | -
dc.description.abstract | The rapid rise of social media has brought about new ways of digital communication, along with a worrying increase in online hate speech (HS), which, in turn, has led researchers to develop several Natural Language Processing methods for its detection. Although significant strides have been made in automating HS detection, research focusing on the European Portuguese language remains scarce (as it happens in several under-resourced languages). To address this gap, we explore the efficacy of various transfer learning models, which have been shown in the literature to have better performance for this task than other Deep Learning models. We employ BERT-like models pre-trained on Portuguese text, such as BERTimbau and mDeBERTa, as well as GPT, Gemini and Mistral generative models, for the detection of HS within Portuguese online discourse. Our study relies on two annotated corpora of YouTube comments and tweets, both annotated as HS and non-HS. Our findings show that the best model for the YouTube corpus was a variant of BERTimbau retrained with European Portuguese tweets and fine-tuned for the HS task, with an F-score of 87.1% for the positive class, outperforming the baseline models by more than 20% and with a 1.8% increase compared with base BERTimbau. The best model for the Twitter corpus was GPT-3.5, with an F-score of 50.2% for the positive class. We also assess the impact of using in-domain and mixed-domain training sets, as well as the impact of providing context in generative model prompts on their performance. | eng
dc.language.iso | eng | -
dc.publisher | IEEE | -
dc.relation | 101049306 | -
dc.rights | openAccess | -
dc.subject | Hate speech | eng
dc.subject | Transfer learning | eng
dc.subject | Transformer models | eng
dc.subject | Generative models | eng
dc.subject | Text classification | eng
dc.title | Leveraging transfer learning for hate speech detection in Portuguese social media posts | eng
dc.type | article | -
dc.pagination | 101374 - 101389 | -
dc.peerreviewed | yes | -
dc.volume | 12 | -
dc.date.updated | 2024-07-26T15:45:04Z | -
dc.description.version | info:eu-repo/semantics/publishedVersion | -
dc.identifier.doi | 10.1109/ACCESS.2024.3430848 | -
iscte.identifier.ciencia | https://ciencia.iscte-iul.pt/id/ci-pub-104797 | -
iscte.journal | IEEE Access | -
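
The abstract in this record reports results as the F-score for the positive (HS) class. As a reading aid, the sketch below shows how such a score is commonly computed from binary predictions. It is a minimal illustration only, not the authors' evaluation code: the function name `positive_class_f1` and the toy labels are assumptions made up for this example, and none of the numbers come from the paper.

```python
def positive_class_f1(y_true, y_pred, positive=1):
    """F-score (F1) restricted to the positive class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for this example: 1 = HS, 0 = non-HS.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = positive_class_f1(y_true, y_pred)
print(f"{score:.3f}")  # 0.750 (3 true positives, 1 false positive, 1 false negative)
```

Scoring only the positive class keeps a large non-HS majority from inflating the result, which is presumably why the abstract reports it that way.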
Appears in collections:
CIS-RI - Articles in international peer-reviewed scientific journals
CTI-RI - Articles in international peer-reviewed scientific journals
ISTAR-RI - Articles in international peer-reviewed scientific journals

Files in this record:
File | Size | Format
article_104797.pdf | 3.06 MB | Adobe PDF


All records in the repository are protected by copyright law, with all rights reserved.