Use this identifier to cite or link to this record: http://hdl.handle.net/10071/28844
Full record
DC Field | Value | Language
dc.contributor.author | Farkhari, H. | -
dc.contributor.author | Viana, J. | -
dc.contributor.author | Campos, L. M. | -
dc.contributor.author | Sebastião, P. | -
dc.contributor.author | Bernardo, L. | -
dc.contributor.editor | Fonseca, N. L. S. da., Marca, J. R. B. da., Bregni, S., and Granville, L. Z. | -
dc.date.accessioned | 2023-06-30T09:39:18Z | -
dc.date.available | 2023-06-30T09:39:18Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Farkhari, H., Viana, J., Campos, L. M., Sebastião, P., & Bernardo, L. (2022). New PCA-based category encoder for efficient data processing in IoT devices. In N. L. S. da Fonseca, J. R. B. da Marca, S. Bregni, & L. Z. Granville (Eds.), 2022 IEEE Globecom Workshops (GC Wkshps) (pp. 789-795). IEEE. https://doi.org/10.1109/GCWkshps56602.2022.10008757 | -
dc.identifier.isbn | 978-1-6654-5975-4 | -
dc.identifier.uri | http://hdl.handle.net/10071/28844 | -
dc.description.abstract | Increasing the cardinality of categorical variables might decrease the overall performance of machine learning (ML) algorithms. This paper presents a novel computational preprocessing method that converts categorical variables to numerical variables for ML algorithms. It uses a supervised binary classifier to extract additional context-related features from the categorical values. The method requires two hyperparameters: a threshold related to the distribution of categories in the variables and the PCA representativeness. This paper applies the proposed approach to the well-known cybersecurity NSLKDD dataset to select and convert three categorical features to numerical features. After choosing the threshold parameter, we use conditional probabilities to convert the three categorical variables into six new numerical variables. Next, we feed these numerical variables to the PCA algorithm and select all or a subset of the principal components (PCs). Finally, by applying binary classification with ten different classifiers, we measure the performance of the new encoder and compare it with 17 other well-known category encoders. The new technique achieves the highest performance in terms of accuracy and area under the curve (AUC) on high-cardinality categorical variables. We also define harmonic average metrics to find the best trade-off between train and test performance and to prevent underfitting and overfitting. Ultimately, the number of newly created numerical variables is minimal. This data reduction improves computational processing time in Internet of Things (IoT) devices connected to future networks. | eng
dc.language.iso | eng | -
dc.publisher | IEEE | -
dc.relation | info:eu-repo/grantAgreement/EC/H2020/813391/EU | -
dc.relation.ispartof | 2022 IEEE Globecom Workshops (GC Wkshps) | -
dc.rights | openAccess | -
dc.subject | Categorical encoders | eng
dc.subject | Dimensionality reduction | eng
dc.subject | Internet of things | eng
dc.subject | Feature selection | eng
dc.subject | Machine learning | eng
dc.subject | NSLKDD | eng
dc.subject | Principal component analyses | eng
dc.title | New PCA-based category encoder for efficient data processing in IoT devices | eng
dc.type | conferenceObject | -
dc.event.title | 2022 IEEE GLOBECOM Workshops, GC Wkshps 2022 | -
dc.event.type | Workshop | pt
dc.event.location | Rio de Janeiro, Brazil | eng
dc.event.date | 2022 | -
dc.pagination | 789 - 795 | -
dc.peerreviewed | yes | -
dc.date.updated | 2023-06-30T10:37:49Z | -
dc.description.version | info:eu-repo/semantics/acceptedVersion | -
dc.identifier.doi | 10.1109/GCWkshps56602.2022.10008757 | -
dc.subject.fos | Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação (Domain/Scientific Area::Natural Sciences::Computer and Information Sciences) | por
dc.subject.fos | Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática (Domain/Scientific Area::Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering) | por
iscte.identifier.ciencia | https://ciencia.iscte-iul.pt/id/ci-pub-96441 | -
iscte.alternateIdentifiers.scopus | 2-s2.0-85146906738 | -
Appears in collections: IT-CRI - Comunicações a conferências internacionais (Communications at international conferences)
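
Illustrative note: the abstract above describes converting each categorical variable into two numerical variables using conditional probabilities, after applying a threshold on the category distribution. The sketch below is one possible reading of that step, not the authors' implementation. The function names, the `min_share` threshold, the `__rare__` bucket, and the choice of the two features (P(label = 1 | category) and P(category | label = 1)) are all assumptions made for illustration. The subsequent PCA step is not shown; the resulting columns could be passed to any standard PCA implementation.

```python
from collections import Counter


def fit_conditional_encoder(categories, labels, min_share=0.05, rare="__rare__"):
    """Learn two numbers per category from binary labels (0 or 1).

    Assumed features (the abstract does not specify them):
      P(label = 1 | category) and P(category | label = 1).
    Categories whose share of rows is below `min_share` (a hypothetical
    threshold) are pooled into a single `rare` bucket.
    """
    n = len(categories)
    share = Counter(categories)
    grouped = [c if share[c] / n >= min_share else rare for c in categories]

    total = Counter(grouped)
    positive = Counter(c for c, y in zip(grouped, labels) if y == 1)
    n_pos = sum(positive.values())

    table = {}
    for c in total:
        p_pos_given_cat = positive[c] / total[c]
        p_cat_given_pos = positive[c] / n_pos if n_pos else 0.0
        table[c] = (p_pos_given_cat, p_cat_given_pos)
    return table


def transform(categories, table, rare="__rare__"):
    """Map each category to its learned pair of numbers.

    Pooled or unseen categories use the rare bucket if one was learned,
    otherwise (0.0, 0.0).
    """
    fallback = table.get(rare, (0.0, 0.0))
    return [table.get(c, fallback) for c in categories]


# Toy data: a protocol-like column with one infrequent value.
protocol = ["tcp"] * 5 + ["udp"] * 4 + ["icmp"]
attack = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

encoder = fit_conditional_encoder(protocol, attack, min_share=0.2)
encoded = transform(["tcp", "icmp", "gre"], encoder)
```

In this toy run, `icmp` falls below the 20% share threshold and is pooled, so both `icmp` and the unseen value `gre` receive the rare-bucket encoding. Fitting on training data only and reusing the learned table on test data avoids leaking test labels into the features.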

Files in this record:
File | Size | Format
conferenceobject_96441.pdf | 628.03 kB | Adobe PDF



All records in the repository are protected by copyright law, with all rights reserved.