Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures

Medeiros, H.; Batista, F.; Moniz, H.; Trancoso, I.; Nunes, L.

doi:10.4230/OASIcs.SLATE.2013.259

Use this identifier to cite this record: http://hdl.handle.net/10071/27854

Full record

DC field	Value	Language
dc.contributor.author	Medeiros, H.	-
dc.contributor.author	Batista, F.	-
dc.contributor.author	Moniz, H.	-
dc.contributor.author	Trancoso, I.	-
dc.contributor.author	Nunes, L.	-
dc.contributor.editor	Leal, J. P., Rocha, R., and Simões, A.	-
dc.date.accessioned	2023-02-13T11:07:07Z	-
dc.date.available	2023-02-13T11:07:07Z	-
dc.date.issued	2013-01-01	-
dc.identifier.citation	Medeiros, H., Batista, F., Moniz, H., Trancoso, I., & Nunes, L. (2013). Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures. In J. P. Leal, R. Rocha, & A. Simões (Eds.), 2nd Symposium on Languages, Applications and Technologies, SLATE 2013 (vol. 29, pp. 259-269). OASIcs. https://doi.org/10.4230/OASIcs.SLATE.2013.259	-
dc.identifier.isbn	978-3-939897-52-1	-
dc.identifier.issn	2190-6807	-
dc.identifier.uri	http://hdl.handle.net/10071/27854	-
dc.description.abstract	This paper presents a number of experiments focusing on assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions over speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods on the identification of the distinct structural disfluent regions. Reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese, containing about 32h of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced alignment corpus proved to be discriminant of the regions contained in the production of a disfluency. This work shows that using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs, where the best results achieved correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.	eng
dc.language.iso	eng	-
dc.publisher	OASIcs	-
dc.relation	info:eu-repo/grantAgreement/FCT/PIDDAC/SFRH%2FBD%2F44671%2F2008/PT	-
dc.relation	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/PEst-OE%2FEEI%2FLA0021%2F2011/PT	-
dc.relation	info:eu-repo/grantAgreement/FCT/3599-PPCDT/CMU-PT%2FHuMach%2F0039%2F2008/PT	-
dc.relation.ispartof	2nd Symposium on Languages, Applications and Technologies, SLATE 2013	-
dc.rights	openAccess	-
dc.subject	Machine learning	eng
dc.subject	Speech processing	eng
dc.subject	Prosodic features	eng
dc.subject	Automatic detection of disfluencies	eng
dc.title	Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures	eng
dc.type	conferenceObject	-
dc.event.title	2nd Symposium on Languages, Applications and Technologies, SLATE 2013	-
dc.event.type	Conferência	pt
dc.event.location	Porto	eng
dc.event.date	2013	-
dc.pagination	259 - 269	-
dc.peerreviewed	yes	-
dc.volume	29	-
dc.date.updated	2023-02-13T11:04:08Z	-
dc.description.version	info:eu-repo/semantics/publishedVersion	-
dc.identifier.doi	10.4230/OASIcs.SLATE.2013.259	-
dc.subject.fos	Domínio/Área Científica::Ciências Naturais::Matemáticas	por
dc.subject.fos	Domínio/Área Científica::Ciências Sociais::Geografia Económica e Social	por
iscte.identifier.ciencia	https://ciencia.iscte-iul.pt/id/ci-pub-12189	-
iscte.alternateIdentifiers.scopus	2-s2.0-84893245717	-
Appears in collections:	ISTAR-CRI - Comunicações a conferências internacionais; IT-CRI - Comunicações a conferências internacionais

Files in this record:

File	Size	Format
conferenceobject_12189.pdf	413 kB	Adobe PDF
