Please use this identifier to cite or link to this item: http://hdl.handle.net/10071/37300
Full metadata record
DC Field | Value | Language
dc.contributor.author | Lopes, D. G. | -
dc.contributor.author | Gasiba, T. | -
dc.contributor.author | Sathwik, A. | -
dc.contributor.author | Pinto-Albuquerque, M. | -
dc.contributor.editor | Queirós, Ricardo | -
dc.contributor.editor | Pinto, Mário | -
dc.contributor.editor | Portela, Filipe | -
dc.contributor.editor | Simões, Alberto | -
dc.date.accessioned | 2026-05-19T08:36:39Z | -
dc.date.available | 2026-05-19T08:36:39Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Lopes, D. G., Gasiba, T., Sathwik, A., & Pinto-Albuquerque, M. (2025). Can open large language models catch vulnerabilities?. In R. Queirós, M. Pinto, F. Portela, & A. Simões (Eds.), 6th International Computer Programming Education Conference (ICPEC 2025). Schloss Dagstuhl. https://doi.org/10.4230/OASIcs.ICPEC.2025.4 | -
dc.identifier.isbn | 978-3-95977-393-5 | -
dc.identifier.issn | 1868-8969 | -
dc.identifier.uri | http://hdl.handle.net/10071/37300 | -
dc.description.abstract | As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic evaluation of three state-of-the-art LLMs - Llama3, Codestral, and Deepseek R1 - using a carefully filtered subset of the Big-Vul dataset annotated with eight representative Common Weakness Enumeration categories. Adopting a closed-world classification setup, we assess each model's performance in both identifying the presence of vulnerabilities and mapping them to the correct CWE label. Our findings reveal a sharp contrast between high detection rates and markedly poor classification accuracy, with frequent overgeneralization and misclassification. Moreover, we analyze model-specific biases and common failure modes, shedding light on the limitations of current LLMs in performing fine-grained security reasoning. These insights are especially relevant in educational contexts, where LLMs are being adopted as learning aids despite their limitations. A nuanced understanding of their behaviour is essential to prevent the propagation of misconceptions among students. Our results expose key challenges that must be addressed before LLMs can be reliably deployed in security-sensitive environments. | eng
dc.language.iso | eng | -
dc.publisher | Schloss Dagstuhl | -
dc.relation | info:eu-repo/grantAgreement/FCT/Concurso de avaliação no âmbito do Programa Plurianual de Financiamento de Unidades de I&D (2017%2F2018) - Financiamento Base/UIDB%2F04466%2F2020/PT | -
dc.relation.ispartof | 6th International Computer Programming Education Conference (ICPEC 2025) | -
dc.rights | openAccess | -
dc.subject | Large Language Models (LLMs) | eng
dc.subject | Secure coding | eng
dc.subject | CWE Classification | eng
dc.subject | Machine learning | eng
dc.subject | Software vulnerability detection | eng
dc.subject | Artificial intelligence | eng
dc.subject | Code analysis | eng
dc.subject | Big-Vul dataset | eng
dc.title | Can open large language models catch vulnerabilities? | eng
dc.type | conferenceObject | -
dc.event.type | Conferência | pt
dc.event.location | Porto | eng
dc.event.date | 2025 | -
dc.peerreviewed | yes | -
dc.volume | 133 | -
dc.date.updated | 2026-05-19T09:35:51Z | -
dc.description.version | info:eu-repo/semantics/publishedVersion | -
dc.identifier.doi | 10.4230/OASIcs.ICPEC.2025.4 | -
dc.subject.fos | Domínio/Área Científica::Ciências Naturais::Matemáticas | por
dc.subject.fos | Domínio/Área Científica::Ciências Sociais::Geografia Económica e Social | por
iscte.identifier.ciencia | https://ciencia.iscte-iul.pt/id/ci-pub-116621 | -
iscte.alternateIdentifiers.wos | WOS:001748591000004 | -
Appears in Collections: ISTAR-CRI - Comunicações a conferências internacionais

Files in This Item:
File | Size | Format
conferenceObject_116621.pdf | 712.05 kB | Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.