Please use this identifier to cite or link to this item: http://hdl.handle.net/10071/5355
Author(s): Medeiros, Henrique
Moniz, Helena
Batista, Fernando
Trancoso, Isabel
Nunes, Luís
Date: 30-Jul-2013
Title: Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures
Event title: Speech and Language Technology in Education (SLaTE 2013)
Keywords: Machine learning
speech processing
prosodic features
automatic detection of disfluencies
Abstract: This paper compares the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions in speech data. Several machine learning methods were applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48, and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods in identifying the distinct structural regions of disfluencies. The reported experiments are based on audio segmentation and prosodic features computed from a corpus of university lectures in European Portuguese, containing about 32 h of speech with about 7.7% disfluencies. The set of features automatically extracted from the force-aligned corpus proved to discriminate between the regions involved in the production of a disfluency. This work shows that, using fully automatic prosodic features, disfluency structural regions can be reliably identified with CARTs; the best results correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.
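The reported F-measure can be checked against the reported precision and recall. The abstract does not say which F variant was used; the sketch below assumes the balanced F1 score (the harmonic mean of precision and recall), which is consistent with the published figures. The function name `f_measure` is hypothetical and is used only for illustration.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the balanced F1 (harmonic mean of P and R).

    Assumption: the abstract's "F-measure" is F1; the paper may define it differently.
    """
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Best CART precision and recall as reported in the abstract
p, r = 0.815, 0.276
print(f"F1 = {f_measure(p, r):.3f}")  # F1 = 0.412
```

Because the harmonic mean is dominated by the smaller value, the low recall (27.6%) keeps the F-measure near 41% even though precision exceeds 80%.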
Peer reviewed: Yes
Access type: Restricted Access
Appears in Collections: CTI-CRI - Comunicações a conferências internacionais (international conference communications)

Files in This Item:
File: comparing-ml-disfs - camera ready.pdf
Size: 425.65 kB
Format: Adobe PDF
Access: Restricted Access
