Automation of information extraction and analysis of Portuguese judgments

Caetano, Igor da Cunha

Please use this identifier to cite or link to this item: http://hdl.handle.net/10071/36334

Author(s):	Caetano, Igor da Cunha
Advisor:	Barateiro, José Eduardo de Mendonça Tomás; Dias, João Miguel de Sousa de Assis
Date:	27-Nov-2025
Title:	Automation of information extraction and analysis of Portuguese judgments
Reference:	Caetano, I. da C. (2025). Automation of information extraction and analysis of Portuguese judgments [Master's dissertation, Iscte - Instituto Universitário de Lisboa]. Repositório Iscte. http://hdl.handle.net/10071/36334
Keywords:	Large Language Models; Retrieval augmented generation; Portuguese documents
Abstract:	Legal search entails thoroughly analyzing potentially relevant documents to identify the applicable legal principles, rules and legislation that can support legal decision-making, either to strengthen or to undermine a legal position. This demanding process requires experts to invest considerable time and labor, which is one of the challenges inherent in the field. The growth of Natural Language Processing (NLP) has spurred many researchers and developers to build tools for the legal domain that support legal experts in such tasks and allow them to work more efficiently. With the emergence of Large Language Models (LLMs), whose capabilities span many fields and tasks, many have been developing tools specific to the legal domain. However, the use of such technologies brings challenges related not only to the models' tendency to hallucinate but also to how these systems can be evaluated without annotated datasets. This study aims to explore, by building a proof-of-concept tool based on Retrieval Augmented Generation (RAG), how LLMs can assist legal experts in legal search while minimizing hallucinations, and to develop an evaluation methodology that can be applied when no annotated data is available to validate or evaluate such systems.
Department:	Departamento de Ciências e Tecnologias da Informação
Degree:	Mestrado em Sistemas Integrados de Apoio à Decisão
Peer reviewed:	yes
Access type:	Open Access
Appears in Collections:	T&D-DM - Dissertações de mestrado

Files in This Item:

File	Size	Format
master_igor_cunha_caetano.pdf	3.75 MB	Adobe PDF

This item is licensed under a Creative Commons License