Human activity recognition and prediction in RGB-D videos

Jardim, David Walter Figueira

Use this identifier to cite or link to this record: http://hdl.handle.net/10071/19571

Author:	Jardim, David Walter Figueira
Supervisors:	Nunes, Luís Miguel Martins; Dias, Miguel Sales
Date:	31-Jan-2018
Title:	Human activity recognition and prediction in RGB-D videos
Bibliographic reference:	JARDIM, David Walter Figueira - Human activity recognition and prediction in RGB-D videos [Online]. Lisboa: ISCTE-IUL, 2018. Doctoral thesis. [Accessed Day Month Year]. Available at www:<http://hdl.handle.net/10071/19571>.
ISBN:	978-989-781-238-5
Keywords:	Kinect; RGB-D; Machine learning; Pattern recognition; Skeletal-tracking; Temporal segmentation; Labeling; Human motion analysis; Action recognition; Action prediction; Anticipation
Abstract:	Human activity recognition is a multidisciplinary research area that has attracted the interest of researchers specializing in machine learning, computer vision, and medicine. The area has many applications: surveillance systems, human-machine interaction, sports analysis, collaborative robots, healthcare, and self-driving cars. Capturing human activity poses technical difficulties such as occlusion, insufficient lighting, and erroneous tracking, as well as ethical concerns. Human motion can be ambiguous and carry multiple intentions. The way we interact with other humans and with objects creates a nearly infinite combination of variations in how we do things. The goal of this dissertation is to develop a system capable of recognizing and predicting human activity, using machine learning techniques to extract meaning from features computed from joints of the human body captured by the Kinect camera. We propose a hierarchical, modular architecture that performs temporal segmentation of action sequences, semi-supervised labeling of sub-activities using clustering techniques, real-time frame-by-frame sub-activity recognition using random decision forest binary classifiers from the very first instants of the action, and real-time activity prediction based on conditional random fields, which model the structure of action sequences to obtain future possibilities. We recorded a new dataset containing sequences of aggressive actions, with a total of 72 sequences and 360 samples of 8 distinct actions performed by 12 subjects. We ran extensive tests on two datasets, comparing the recognition performance of several supervised classifiers trained on manually labeled data or on data labeled in a semi-supervised manner. We learned how the quality of the training sets affects the results, which also depend on the complexity of the actions being recognized. We obtained better results than some of the existing approaches in the activity recognition literature, performed recognition early, and obtained encouraging results in activity prediction.
Human Activity Recognition is an interdisciplinary research area that has been attracting interest from several research communities specialized in machine learning, computer vision, and medical research. The potential applications range from surveillance systems, human-computer interfaces, sports analysis, and digital assistants to collaborative robots, health care, and self-driving cars. Capturing human activity presents technical difficulties such as occlusion, insufficient lighting, and unreliable tracking, as well as ethical concerns. Human motion can be ambiguous and have multiple intents. The complexity of our lives and of how we interact with other humans and objects leads to a nearly infinite combination of variations in how we do things. The focus of this dissertation is to develop a system capable of recognizing and predicting human activity, using machine learning techniques to extract meaning from features computed from relevant joints of the human body captured by the skeleton tracker of the Kinect sensor. We propose a modular framework that performs off-line temporal segmentation of sequences of actions, off-line semi-unsupervised labeling of sub-activities via clustering techniques, real-time frame-by-frame sub-activity recognition using random decision forest binary classifiers right from the very first frames of the action, and real-time activity prediction with conditional random fields, which model the sequential structure of sequences of actions to reason about future possibilities.
We recorded a new dataset containing long sequences of aggressive actions, with a total of 72 sequences and 360 samples of 8 distinct actions performed by 12 subjects. We experimented extensively with two different datasets and compared the recognition performance of several supervised classifiers trained with manually labeled data versus semi-unsupervised labeled data. We learned how the quality of the training data affects the results, which also depend on the complexity of the actions being recognized. We outperformed state-of-the-art activity recognition approaches, performed early action recognition, and obtained encouraging results in activity prediction.
Degree:	Doctorate in Information Science and Technology (Doutoramento em Ciências e Tecnologias da Informação)
Peer reviewed:	yes
Access:	Open Access
Appears in collections:	T&D-TD - Teses de doutoramento (Doctoral theses)

Files in this record:

File	Description	Size	Format
master_david_figueira_jardim.pdf		7.89 MB	Adobe PDF
