Ramadan, Mona
(2019)
Video Analysis by Deep Learning.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The tasks of automatically classifying the content of videos, or of predicting the outcome of a series of events occurring in a sequence of frames, may sound simple, yet they remain very challenging research areas despite vast improvements in computing hardware and easy access to large data sets. In our work, we extend machine learning techniques to comprehend videos by tackling three challenging tasks: video classification at the full-length video level; video classification both at the level of actions performed in certain frames and at the full-length video level; and prediction of upcoming actions.
Classification at the video level is a classic machine learning problem that has been addressed previously. We address it first with a standard deep learning approach, in which a deep convolutional neural network (CNN) is trained on video frames and a Long Short-Term Memory (LSTM) network then aggregates the features learned by the CNN into a single video label. We also introduce a different approach that uses still images from a data set independent of the video data set to train a CNN, which is later used to classify a selection of video frames and draw a conclusion about the video class. Our approach achieves a classification accuracy between 91% and 94% when processing only 10 to 300 frames, respectively, of each test video on a subset of the YouTube Sports-1M dataset.
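The frame-level approach above can be illustrated with a small sketch: a still-image CNN (assumed, not shown) classifies each sampled frame, and the per-frame predictions are then aggregated into a single video label. The dissertation abstract does not specify the aggregation rule, so majority voting is used here purely as one plausible choice; `frame_probs` stands in for the CNN's softmax outputs.

```python
import numpy as np

def classify_video(frame_probs):
    """Aggregate per-frame class probabilities into one video label.

    frame_probs: (num_frames, num_classes) array of softmax outputs
    from a still-image CNN applied to a selection of video frames.
    Returns the majority-vote class index.
    """
    per_frame_labels = frame_probs.argmax(axis=1)   # hard vote per frame
    counts = np.bincount(per_frame_labels, minlength=frame_probs.shape[1])
    return int(counts.argmax())

# Toy example: 5 sampled frames, 3 classes (numbers invented).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
])
print(classify_video(probs))  # → 0 (class 0 wins 4 of 5 frame votes)
```

Because each sampled frame is classified independently, accuracy can improve as more frames are processed, which is consistent with the 10-to-300-frame range reported above.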
Classification at both the action level and the video level is not a well-addressed problem. We tackle it with a hybrid CNN-Hidden Markov Model (HMM) system: a dictionary of actions is constructed from the training data and used to detect a sequence of actions in a video, and this action sequence is then mapped to a class for the entire video. Our approach detects the actions in videos of the Actions for Cooking Eggs (ACE) data set with 79% accuracy while classifying the videos with 100% accuracy.
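The CNN-HMM pipeline can be sketched in broad strokes: per-frame action probabilities (standing in for CNN outputs) are decoded into an action sequence via Viterbi decoding, and the collapsed sequence is looked up in an action dictionary to label the whole video. This is a minimal illustration under assumed names and numbers, not the dissertation's actual implementation; the transition matrix, action indices, and dictionary entries below are all invented.

```python
import numpy as np

def viterbi(emissions, trans, init):
    """Most likely action sequence given per-frame emission probabilities
    (T x S), an action transition matrix (S x S), and an initial
    distribution (S,). Computed in log space for numerical stability."""
    log_e = np.log(emissions + 1e-12)
    log_t = np.log(trans + 1e-12)
    T, S = emissions.shape
    score = np.log(init + 1e-12) + log_e[0]   # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)        # backpointers
    for t in range(1, T):
        cand = score[:, None] + log_t         # (prev, cur) candidate scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_e[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy setup: 2 hypothetical actions (0 = "crack egg", 1 = "stir").
emissions = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])  # CNN stand-in
trans = np.array([[0.7, 0.3], [0.3, 0.7]])
init = np.array([0.5, 0.5])

actions = viterbi(emissions, trans, init)            # decoded: [0, 0, 1]
collapsed = tuple(a for i, a in enumerate(actions)
                  if i == 0 or a != actions[i - 1])  # merge repeats
dictionary = {(0, 1): "scrambled-eggs"}              # invented mapping
print(dictionary.get(collapsed, "unknown"))          # → scrambled-eggs
```

Collapsing repeated frame-level detections into a compact action sequence before the dictionary lookup is one way such a sequence-to-class mapping could work; the real system may differ.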
Finally, we address the problem of next-action prediction by using the same hybrid CNN-HMM system to predict the next action to be performed when only part of the video is available. Our approach successfully predicts the first and second upcoming actions in a video stream with probability higher than 50% when 60% or more of the video is available for processing, and the prediction accuracy continues to increase as the system gains access to more frames.
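In the same spirit, once the HMM's transition structure is learned and the current action has been decoded from the partial video, predicting upcoming actions reduces to ranking the transition probabilities out of the current state. A hedged sketch (the matrix and action indices are invented for illustration):

```python
import numpy as np

def predict_next_actions(trans, current_action, k=2):
    """Return the k most probable upcoming actions as (index, probability)
    pairs, given an HMM action-transition matrix and the last decoded
    action in the partial video stream."""
    row = trans[current_action]
    top = row.argsort()[::-1][:k]        # indices sorted by descending prob
    return [(int(a), float(row[a])) for a in top]

# Invented 3-action transition matrix (rows sum to 1).
trans = np.array([
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],
])
print(predict_next_actions(trans, current_action=0))  # → [(1, 0.6), (2, 0.3)]
```

Ranking two steps ahead (the "first and second" upcoming actions) would additionally involve the two-step transition probabilities; this sketch shows only the one-step case.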
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 19 June 2019
Date Type: Publication
Defense Date: 14 November 2018
Approval Date: 19 June 2019
Submission Date: 18 March 2019
Access Restriction: No restriction; release the ETD for access worldwide immediately.
Number of Pages: 113
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Video classification, video prediction, deep learning
Date Deposited: 19 Jun 2019 15:00
Last Modified: 19 Jun 2019 15:00
URI: http://d-scholarship.pitt.edu/id/eprint/36069