
Video Analysis by Deep Learning

Ramadan, Mona (2019) Video Analysis by Deep Learning. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Although the tasks of automatically classifying the content of videos and predicting the outcome of a series of events occurring in a sequence of frames may sound simple, they remain very challenging research areas despite vast improvements in computing hardware and easy access to large data sets. In our work, we extend machine learning techniques to comprehend videos by tackling three challenging tasks: video classification at the full-length video level; video classification both at the level of actions performed in certain frames and at the full-length video level; and prediction of upcoming actions.

Classification at the video level is a classic machine learning problem that has been addressed previously. We address it in two ways. The first is a standard deep learning approach, in which a deep convolutional neural network (CNN) is trained on video frames and a Long Short-Term Memory (LSTM) network then aggregates the features learned by the CNN into a single video label. The second is a different approach that trains a CNN on still images from a data set independent of the video data set; this CNN is then used to classify a selection of video frames, from which the video class is inferred. Our approach achieves a classification accuracy ranging from 91% to 94% when processing only 10 and 300 frames of each test video, respectively, on a subset of the YouTube Sports-1M dataset.
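
The abstract does not state how the per-frame CNN predictions are combined into a single video label. As a minimal sketch, assuming a simple majority vote over the selected frames with ties broken by summed class probability (a hypothetical rule, not necessarily the one used in the dissertation), the frame-to-video step could look like this. The function name `classify_video` and the sport labels are illustrative placeholders.

```python
from collections import Counter
from typing import Sequence


def classify_video(frame_probs: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Infer a video label from per-frame class probabilities (hypothetical voting rule).

    Each inner sequence is one frame's probability vector (e.g. a CNN softmax
    output) over `labels`. Every frame votes for its most likely class; ties
    between classes are broken by their summed probability across frames.
    """
    if not frame_probs:
        raise ValueError("need at least one frame")
    votes: Counter = Counter()
    totals = [0.0] * len(labels)
    for probs in frame_probs:
        if len(probs) != len(labels):
            raise ValueError("probability vector length must match labels")
        best = max(range(len(labels)), key=lambda k: probs[k])
        votes[best] += 1
        for k, p in enumerate(probs):
            totals[k] += p
    top = max(votes.values())
    tied = [k for k, v in votes.items() if v == top]
    return labels[max(tied, key=lambda k: totals[k])]


if __name__ == "__main__":
    labels = ["basketball", "soccer", "tennis"]  # placeholder classes
    frames = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    print(classify_video(frames, labels))  # basketball
```

Voting keeps the cost proportional to the number of sampled frames, which fits the idea of classifying a video from only a selection of its frames; averaging the probability vectors instead would be an equally plausible choice.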

Classification at both the action level and the video class level is not a well-addressed problem. We tackle this challenge with a hybrid CNN-Hidden Markov Model (HMM) system: a dictionary of actions is constructed from the training data and used to detect the sequence of actions in a video, and this action sequence is then mapped to a class for the entire video. Our approach detects the actions in videos of the Actions for Cooking Eggs (ACE) data set with an accuracy of 79% and classifies the videos with 100% accuracy.
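
The general shape of a CNN-HMM pipeline can be sketched with Viterbi decoding: per-frame likelihoods (here assumed to come from a CNN) are combined with action-transition probabilities to find the most likely action for every frame, repeated labels are collapsed into an action sequence, and that sequence is matched to a video class. This is a sketch under stated assumptions, not the dissertation's implementation: the transition matrix, the integer action IDs, and the nearest-sequence lookup by edit distance (`video_class` with the placeholder classes `recipe_A`/`recipe_B`) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(log_emit: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Most likely action (hidden state) for every frame.

    log_emit[t][s]  : log-likelihood of frame t under action s (assumed CNN-derived)
    log_trans[r][s] : log P(action s at frame t+1 | action r at frame t)
    log_init[s]     : log P(action s at the first frame)
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    back: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor for reaching action s at frame t.
            prev = max(range(n), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final action.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def collapse(path: Sequence[int]) -> Tuple[int, ...]:
    """Merge runs of frames with the same action into a single action entry."""
    out: List[int] = []
    for s in path:
        if not out or out[-1] != s:
            out.append(s)
    return tuple(out)


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two action sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def video_class(actions: Sequence[int], references: Dict[str, Sequence[int]]) -> str:
    """Hypothetical mapping: the class whose reference action sequence is closest."""
    return min(references, key=lambda c: edit_distance(actions, references[c]))


if __name__ == "__main__":
    trans = [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]  # left-to-right actions
    frames = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1],
              [0.1, 0.7, 0.2], [0.1, 0.1, 0.8]]
    path = viterbi([[safe_log(p) for p in f] for f in frames],
                   [[safe_log(p) for p in row] for row in trans],
                   [safe_log(p) for p in (1.0, 0.0, 0.0)])
    actions = collapse(path)
    print(path, actions)  # [0, 0, 1, 1, 2] (0, 1, 2)
    print(video_class(actions, {"recipe_A": (0, 1, 2), "recipe_B": (0, 2)}))  # recipe_A
```

Decoding in the log domain avoids numerical underflow on long videos, and the edit-distance match tolerates a missed or spurious action, which an exact dictionary lookup would not.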

Finally, we address the problem of next-action prediction by using the same hybrid CNN-HMM system to predict the next action to be performed when only part of the video is available. Our approach predicts the first and second upcoming actions in a video stream with a probability higher than 50% when 60% or more of the video is available for processing, and the prediction accuracy continues to increase as the system gains access to more video frames.
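
One way an HMM can predict upcoming actions from a partial video is sketched below, assuming a standard formulation rather than the dissertation's exact method. The forward pass over the frames seen so far gives a belief over the current action; each row of the transition matrix, with the self-transition removed and the row renormalized, gives the distribution over the next *different* action; applying that "jump" step twice gives the second upcoming action. The transition values, frame likelihoods, and function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def normalize(v: Sequence[float]) -> List[float]:
    total = sum(v)
    if total <= 0:
        raise ValueError("observations have zero probability under the model")
    return [x / total for x in v]


def forward_belief(emit: Matrix, trans: Matrix, init: Sequence[float]) -> List[float]:
    """Filtered P(current action | frames observed so far), via the HMM forward pass."""
    n = len(init)
    alpha = normalize([init[s] * emit[0][s] for s in range(n)])
    for t in range(1, len(emit)):
        alpha = normalize([emit[t][s] * sum(alpha[r] * trans[r][s] for r in range(n))
                           for s in range(n)])
    return alpha


def jump_matrix(trans: Matrix) -> List[List[float]]:
    """P(next *different* action j | current action i) = A[i][j] / (1 - A[i][i])."""
    n = len(trans)
    rows = []
    for i in range(n):
        leave = 1.0 - trans[i][i]
        # An absorbing (final) action has no successor, so its row is all zeros.
        rows.append([0.0 if j == i or leave <= 0 else trans[i][j] / leave
                     for j in range(n)])
    return rows


def predict_next_actions(belief: Sequence[float], trans: Matrix,
                         k: int = 2) -> List[List[float]]:
    """Distributions over the 1st..k-th upcoming actions.

    Probability held by an absorbing action has no successor and is dropped, so
    a distribution may sum to less than 1; the shortfall is the probability
    that no further action occurs.
    """
    jump = jump_matrix(trans)
    n = len(belief)
    dists, current = [], list(belief)
    for _ in range(k):
        current = [sum(current[i] * jump[i][j] for i in range(n)) for j in range(n)]
        dists.append(current)
    return dists


if __name__ == "__main__":
    trans = [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]
    partial = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]  # frames seen so far
    belief = forward_belief(partial, trans, [1.0, 0.0, 0.0])
    first, second = predict_next_actions(belief, trans, k=2)
    print(max(range(3), key=lambda j: first[j]))  # 2
```

Because the belief sharpens as more frames are observed, predictions from this kind of model naturally become more reliable with a larger fraction of the video, which is consistent with the trend reported above.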




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Ramadan, Mona (Email: mhr23@pitt.edu; Pitt Username: mhr23; ORCID: 0000-0002-1999-0142)
ETD Committee:
Committee Chair: El-Jaroudi
Committee Member: Sejdic
Committee Member: Zhi-Hong
Committee Member: Akcakaya
Committee Member: Loughlin
Thesis Advisor: El-Jaroudi
Date: 19 June 2019
Date Type: Publication
Defense Date: 14 November 2018
Approval Date: 19 June 2019
Submission Date: 18 March 2019
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 113
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Electrical Engineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Video classification, video prediction, deep learning
Date Deposited: 19 Jun 2019 15:00
Last Modified: 19 Jun 2019 15:00

