Can open large language models catch vulnerabilities?

Lopes, D. G.; Gasiba, T.; Sathwik, A.; Pinto-Albuquerque, M.

doi:10.4230/OASIcs.ICPEC.2025.4

Please use this identifier to cite or link to this item: http://hdl.handle.net/10071/37300

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lopes, D. G.	-
dc.contributor.author	Gasiba, T.	-
dc.contributor.author	Sathwik, A.	-
dc.contributor.author	Pinto-Albuquerque, M.	-
dc.contributor.editor	Queirós, Ricardo	-
dc.contributor.editor	Pinto, Mário	-
dc.contributor.editor	Portela, Filipe	-
dc.contributor.editor	Simões, Alberto	-
dc.date.accessioned	2026-05-19T08:36:39Z	-
dc.date.available	2026-05-19T08:36:39Z	-
dc.date.issued	2025	-
dc.identifier.citation	Lopes, D. G., Gasiba, T., Sathwik, A., & Pinto-Albuquerque, M. (2025). Can open large language models catch vulnerabilities? In R. Queirós, M. Pinto, F. Portela, & A. Simões (Eds.), 6th International Computer Programming Education Conference (ICPEC 2025). Schloss Dagstuhl. https://doi.org/10.4230/OASIcs.ICPEC.2025.4	-
dc.identifier.isbn	978-3-95977-393-5	-
dc.identifier.issn	1868-8969	-
dc.identifier.uri	http://hdl.handle.net/10071/37300	-
dc.description.abstract	As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic evaluation of three state-of-the-art LLMs - Llama3, Codestral, and Deepseek R1 - using a carefully filtered subset of the Big-Vul dataset annotated with eight representative Common Weakness Enumeration (CWE) categories. Adopting a closed-world classification setup, we assess each model's performance in both identifying the presence of vulnerabilities and mapping them to the correct CWE label. Our findings reveal a sharp contrast between high detection rates and markedly poor classification accuracy, with frequent overgeneralization and misclassification. Moreover, we analyze model-specific biases and common failure modes, shedding light on the limitations of current LLMs in performing fine-grained security reasoning. These insights are especially relevant in educational contexts, where LLMs are being adopted as learning aids despite their limitations. A nuanced understanding of their behaviour is essential to prevent the propagation of misconceptions among students. Our results expose key challenges that must be addressed before LLMs can be reliably deployed in security-sensitive environments.	eng
dc.language.iso	eng	-
dc.publisher	Schloss Dagstuhl	-
dc.relation	info:eu-repo/grantAgreement/FCT/Concurso de avaliação no âmbito do Programa Plurianual de Financiamento de Unidades de I&D (2017%2F2018) - Financiamento Base/UIDB%2F04466%2F2020/PT	-
dc.relation.ispartof	6th International Computer Programming Education Conference (ICPEC 2025)	-
dc.rights	openAccess	-
dc.subject	Large Language Models (LLMs)	eng
dc.subject	Secure coding	eng
dc.subject	CWE Classification	eng
dc.subject	Machine learning	eng
dc.subject	Software vulnerability detection	eng
dc.subject	Artificial intelligence	eng
dc.subject	Code analysis	eng
dc.subject	Big-Vul dataset	eng
dc.title	Can open large language models catch vulnerabilities?	eng
dc.type	conferenceObject	-
dc.event.type	Conferência	pt
dc.event.location	Porto	eng
dc.event.date	2025	-
dc.peerreviewed	yes	-
dc.volume	133	-
dc.date.updated	2026-05-19T09:35:51Z	-
dc.description.version	info:eu-repo/semantics/publishedVersion	-
dc.identifier.doi	10.4230/OASIcs.ICPEC.2025.4	-
dc.subject.fos	Domínio/Área Científica::Ciências Naturais::Matemáticas	por
dc.subject.fos	Domínio/Área Científica::Ciências Sociais::Geografia Económica e Social	por
iscte.identifier.ciencia	https://ciencia.iscte-iul.pt/id/ci-pub-116621	-
iscte.alternateIdentifiers.wos	WOS:001748591000004	-
Appears in Collections:	ISTAR-CRI - Comunicações a conferências internacionais

Files in This Item:

File	Size	Format
conferenceObject_116621.pdf	712,05 kB	Adobe PDF