Use this identifier to cite or link to this record: http://hdl.handle.net/10071/25973
Full record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Santos, R. B. | - |
| dc.contributor.author | Matos, B. C. | - |
| dc.contributor.author | Carvalho, P. | - |
| dc.contributor.author | Batista, F. | - |
| dc.contributor.author | Ribeiro, R. | - |
| dc.contributor.editor | Cordeiro, J., Pereira, M. J., Rodrigues, N. F., and Pais, S. | - |
| dc.date.accessioned | 2022-08-02T13:59:09Z | - |
| dc.date.available | 2022-08-02T13:59:09Z | - |
| dc.date.issued | 2022 | - |
| dc.identifier.isbn | 978-3-95977-245-7 | - |
| dc.identifier.issn | 2190-6807 | - |
| dc.identifier.uri | http://hdl.handle.net/10071/25973 | - |
| dc.description.abstract | With the increasing spread of hate speech (HS) on social media, it has become urgent to develop models that can help detect it automatically. Such models typically require large-scale annotated corpora, which are still scarce for languages such as Portuguese, and creating manually annotated corpora is expensive and time-consuming. To address this problem, we propose an ensemble of two semi-supervised models that can be used to automatically create a corpus representative of online hate speech in Portuguese. The first model combines Generative Adversarial Networks with a BERT-based model. The second model is based on label propagation: it propagates labels from existing annotated corpora to unlabeled data by exploiting the similarity between instances. We used the annotations of three existing corpora (CO-HATE, ToLD-BR, and HPHS) to automatically annotate FIGHT, a corpus of geolocated tweets produced in Portuguese territory. To select the best model and its setup, we tested different pre-trained embeddings, ran experiments on different training subsets labeled by annotators with different perspectives, and performed several active learning experiments. We also explore back translation as a means of automatically generating additional hate speech samples. The best results were achieved by combining all the labeled datasets, yielding an F1-score of 0.664 for the Hate Speech class in FIGHT. | eng |
| dc.language.iso | eng | - |
| dc.publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing | - |
| dc.relation | HATE Covid-19 (Proj. 759274510) | - |
| dc.relation | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50021%2F2020/PT | - |
| dc.relation | info:eu-repo/grantAgreement/FCT/3599-PPCDT/PTDC%2FCCI-CIF%2F32607%2F2017/PT | - |
| dc.relation.ispartof | OpenAccess Series in Informatics | - |
| dc.rights | openAccess | - |
| dc.subject | Hate speech | eng |
| dc.subject | Semi-supervised learning | eng |
| dc.subject | Semi-automatic annotation | eng |
| dc.title | Semi-supervised annotation of Portuguese hate speech across social media domains | eng |
| dc.type | conferenceObject | - |
| dc.event.title | 11th Symposium on Languages, Applications and Technologies (SLATE 2022) | - |
| dc.event.type | Conference | pt |
| dc.event.location | Covilhã | eng |
| dc.event.date | 2022 | - |
| dc.peerreviewed | yes | - |
| dc.volume | 104 | - |
| dc.date.updated | 2022-08-02T14:57:13Z | - |
| dc.description.version | info:eu-repo/semantics/publishedVersion | - |
| dc.identifier.doi | 10.4230/OASIcs.SLATE.2022.11 | - |
| dc.subject.fos | Domain/Scientific Area::Natural Sciences::Computer and Information Sciences | por |
| dc.subject.fos | Domain/Scientific Area::Humanities::Languages and Literature | por |
| iscte.subject.ods | Peace, justice and strong institutions | por |
| iscte.identifier.ciencia | https://ciencia.iscte-iul.pt/id/ci-pub-89928 | - |
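
The abstract only summarizes the label-propagation model. As a rough illustration of the general technique, and not the authors' implementation, the sketch below runs classic graph-based label propagation: annotated items keep fixed one-hot labels, and each unlabeled item repeatedly takes the similarity-weighted average of the other items' label distributions over a cosine-similarity graph. The function names, the toy two-dimensional "embeddings", the iteration count, and the 0 = non-hate / 1 = hate encoding are all hypothetical; in practice the vectors would presumably come from pre-trained text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate_labels(vectors, labels, n_classes, iterations=50):
    """Hypothetical helper: generic label propagation with clamped seeds.

    labels[i] is a class index for annotated items, or None for unlabeled ones.
    Returns the predicted class index for every item.
    """
    n = len(vectors)
    # Similarity graph: non-negative cosine similarities, no self-loops.
    weights = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    # Start from one-hot distributions (annotated) or uniform ones (unlabeled).
    dist = [[1.0 if lab == c else 0.0 for c in range(n_classes)]
            if lab is not None else [1.0 / n_classes] * n_classes
            for lab in labels]
    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0:
                # Clamp annotated items; isolated items keep their current distribution.
                new_dist.append(dist[i])
                continue
            new_dist.append([sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                             for c in range(n_classes)])
        dist = new_dist
    # Assign each item the class with the highest propagated score.
    return [max(range(n_classes), key=lambda c: d[c]) for d in dist]


# Toy usage: two annotated items (0 = non-hate, 1 = hate) and two unlabeled ones.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_labels = [0, 1, None, None]
print(propagate_labels(toy_vectors, toy_labels, n_classes=2))  # [0, 1, 0, 1]
```

Each unlabeled item ends up with the label of the annotated item it is most similar to, which is the intuition behind transferring annotations from existing corpora to unlabeled tweets.
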
Appears in collections: IT-CRI - International conference communications

Files in this record:

| File | Size | Format |
|---|---|---|
| conferenceobject_89928.pdf | 541.72 kB | Adobe PDF |

All records in the repository are protected by copyright, with all rights reserved.