Comparing different approaches for detecting hate speech in online Portuguese comments

Matos, B. C.; Santos, R. B.; Carvalho, P.; Ribeiro, R.; Batista, F.

doi:10.4230/OASIcs.SLATE.2022.10

Use this identifier to cite or link to this record: http://hdl.handle.net/10071/25974

Full record

DC field	Value	Language
dc.contributor.author	Matos, B. C.	-
dc.contributor.author	Santos, R. B.	-
dc.contributor.author	Carvalho, P.	-
dc.contributor.author	Ribeiro, R.	-
dc.contributor.author	Batista, F.	-
dc.contributor.editor	Cordeiro, J., Pereira, M. J., Rodrigues, N. F., and Pais, S.	-
dc.date.accessioned	2022-08-02T14:17:06Z	-
dc.date.available	2022-08-02T14:17:06Z	-
dc.date.issued	2022	-
dc.identifier.isbn	978-3-95977-245-7	-
dc.identifier.issn	2190-6807	-
dc.identifier.uri	http://hdl.handle.net/10071/25974	-
dc.description.abstract	Online Hate Speech (OHS) has been growing dramatically on social media, which has motivated researchers to develop a variety of methods for its automated detection. However, the detection of OHS in Portuguese remains little studied. To fill this gap, we explored different models that have proved successful in the literature for this task. In particular, we explored transfer learning approaches based on existing BERT-like pre-trained models. The experiments were based on CO-HATE, a corpus of YouTube comments posted by the Portuguese online community that was manually labeled by different annotators. Among other categories, those comments were labeled regarding the presence of hate speech and the type of hate speech, specifically overt and covert hate speech. We assessed the impact of using annotations from different annotators on the performance of such models. In addition, we analyzed the impact of distinguishing overt and covert hate speech. The results show the importance of considering the annotator’s profile in the development of hate speech detection models. Regarding the hate speech type, the results do not allow any conclusion about which type is easier to detect. Finally, we show that pre-processing does not seem to have a significant impact on performance in this specific task.	eng
dc.language.iso	eng	-
dc.publisher	Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing	-
dc.relation	HATE Covid-19 (Proj. 759274510)	-
dc.relation	info:eu-repo/grantAgreement/FCT/3599-PPCDT/PTDC%2FCCI-CIF%2F32607%2F2017/PT	-
dc.relation	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50021%2F2020/PT	-
dc.relation.ispartof	OpenAccess Series in Informatics	-
dc.rights	openAccess	-
dc.subject	Hate speech	eng
dc.subject	Text classification	eng
dc.subject	Transfer learning	eng
dc.subject	Supervised learning	eng
dc.subject	Deep learning	eng
dc.title	Comparing different approaches for detecting hate speech in online Portuguese comments	eng
dc.type	conferenceObject	-
dc.event.title	11th Symposium on Languages, Applications and Technologies (SLATE 2022)	-
dc.event.type	Conferência	pt
dc.event.location	Covilhã	eng
dc.event.date	2022	-
dc.peerreviewed	yes	-
dc.volume	104	-
dc.date.updated	2022-08-02T15:14:09Z	-
dc.description.version	info:eu-repo/semantics/publishedVersion	-
dc.identifier.doi	10.4230/OASIcs.SLATE.2022.10	-
dc.subject.fos	Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação	por
dc.subject.fos	Domínio/Área Científica::Humanidades::Línguas e Literaturas	por
iscte.subject.ods	Paz, justiça e instituições eficazes	por
iscte.identifier.ciencia	https://ciencia.iscte-iul.pt/id/ci-pub-89930	-
Appears in collections:	IT-CRI - Comunicações a conferências internacionais

Files in this record:

File	Size	Format
conferenceobject_89930.pdf	516.26 kB	Adobe PDF
