Disfluency detection based on prosodic features for university lectures

Medeiros, H.; Moniz, H.; Batista, F.; Trancoso, I.; Nunes, L.

doi:10.21437/Interspeech.2013-605

Use this identifier to cite or link to this record: http://hdl.handle.net/10071/27819

Authors:	Medeiros, H.; Moniz, H.; Batista, F.; Trancoso, I.; Nunes, L.
Editors:	Bimbot, F., Cerisara, C., Fougeron, C., Gravier, G., Lamel, L., Pellegrino, F., and Perrier, P.
Date:	2013
Title:	Disfluency detection based on prosodic features for university lectures
Volume:	4
Book title and volume:	Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013)
Pages:	2629-2633
Event title:	14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013)
Bibliographic reference:	Medeiros, H., Moniz, H., Batista, F., Tjalve, M., Trancoso, I., & Nunes, L. (2013). Disfluency detection based on prosodic features for university lectures. In F. Bimbot, C. Cerisara, C. Fougeron, G. Gravier, L. Lamel, F. Pellegrino, & P. Perrier (Eds.), Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013) (vol. 4, pp. 2629-2633). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2013-605
ISSN:	2308-457X
ISBN:	978-1-62993-443-3
DOI (Digital Object Identifier):	10.21437/Interspeech.2013-605
Keywords:	Prosodic features; Automatic disfluency detection; Corpus of university lectures; Machine learning
Abstract:	This paper focuses on identifying disfluent sequences and their distinct structural regions using acoustic and prosodic features. The reported experiments are based on a corpus of university lectures in European Portuguese, totalling roughly 32 hours and containing a relatively high percentage of disfluencies (7.6%). The set of features automatically extracted from the corpus proved discriminative of the regions involved in producing a disfluency. Several machine learning methods were applied; the best results were achieved with Classification and Regression Trees (CART). The most informative features for cross-region identification include word duration ratios, word confidence scores, silence ratios, and pitch and energy slopes. Features such as the number of phones and syllables per word proved more useful for identifying the interregnum, whereas energy slopes were best suited to identifying the interruption point.
Peer reviewed:	yes
Access:	Open Access
Appears in collections:	ISTAR-CRI - International conference papers; IT-CRI - International conference papers

Files in this record:

File	Size	Format
conferenceobject_42668.pdf	219.68 kB	Adobe PDF