Leveraging transfer learning for hate speech detection in Portuguese social media posts

Ramos, G.; Batista, F.; Ribeiro, R.; Fialho, P.; Moro, S.; Fonseca, A.; Guerra, R.; Carvalho, P.; Marques, C.; Silva, C.

doi:10.1109/ACCESS.2024.3430848

Use this identifier to cite this record: http://hdl.handle.net/10071/32083

Full record

DC field	Value	Language
dc.contributor.author	Ramos, G.	-
dc.contributor.author	Batista, F.	-
dc.contributor.author	Ribeiro, R.	-
dc.contributor.author	Fialho, P.	-
dc.contributor.author	Moro, S.	-
dc.contributor.author	Fonseca, A.	-
dc.contributor.author	Guerra, R.	-
dc.contributor.author	Carvalho, P.	-
dc.contributor.author	Marques, C.	-
dc.contributor.author	Silva, C.	-
dc.date.accessioned	2024-07-26T14:47:10Z	-
dc.date.available	2024-07-26T14:47:10Z	-
dc.date.issued	2024	-
dc.identifier.citation	Ramos, G., Batista, F., Ribeiro, R., Fialho, P., Moro, S., Fonseca, A., Guerra, R., Carvalho, P., Marques, C., & Silva, C. (2024). Leveraging transfer learning for hate speech detection in Portuguese social media posts. IEEE Access, 12, 101374-101389. http://doi.org/10.1109/ACCESS.2024.3430848	-
dc.identifier.issn	2169-3536	-
dc.identifier.uri	http://hdl.handle.net/10071/32083	-
dc.description.abstract	The rapid rise of social media has brought about new ways of digital communication, along with a worrying increase in online hate speech (HS), which, in turn, has led researchers to develop several Natural Language Processing methods for its detection. Although significant strides have been made in automating HS detection, research focusing on the European Portuguese language remains scarce (as is the case for several under-resourced languages). To address this gap, we explore the efficacy of various transfer learning models, which have been shown in the literature to perform better on this task than other Deep Learning models. We employ BERT-like models pre-trained on Portuguese text, such as BERTimbau and mDeBERTa, as well as GPT, Gemini and Mistral generative models, for the detection of HS within Portuguese online discourse. Our study relies on two corpora, one of YouTube comments and one of tweets, both annotated as HS or non-HS. Our findings show that the best model for the YouTube corpus was a variant of BERTimbau retrained with European Portuguese tweets and fine-tuned for the HS task, with an F-score of 87.1% for the positive class, outperforming the baseline models by more than 20% and with a 1.8% increase compared with base BERTimbau. The best model for the Twitter corpus was GPT-3.5, with an F-score of 50.2% for the positive class. We also assess the impact of using in-domain and mixed-domain training sets, as well as the impact of providing context in generative model prompts on their performance.	eng
dc.language.iso	eng	-
dc.publisher	IEEE	-
dc.relation	101049306	-
dc.rights	openAccess	-
dc.subject	Hate speech	eng
dc.subject	Transfer learning	eng
dc.subject	Transformer models	eng
dc.subject	Generative models	eng
dc.subject	Text classification	eng
dc.title	Leveraging transfer learning for hate speech detection in Portuguese social media posts	eng
dc.type	article	-
dc.pagination	101374 - 101389	-
dc.peerreviewed	yes	-
dc.volume	12	-
dc.date.updated	2024-07-26T15:45:04Z	-
dc.description.version	info:eu-repo/semantics/publishedVersion	-
dc.identifier.doi	10.1109/ACCESS.2024.3430848	-
iscte.identifier.ciencia	https://ciencia.iscte-iul.pt/id/ci-pub-104797	-
iscte.journal	IEEE Access	-
Appears in collections:	BRU-RI - Articles in international peer-reviewed scientific journals; CIS-RI - Articles in international peer-reviewed scientific journals; CTI-RI - Articles in international peer-reviewed scientific journals; ISTAR-RI - Articles in international peer-reviewed scientific journals

Files in this record:

File	Size	Format
article_104797.pdf	3.06 MB	Adobe PDF
