Policicchio, Shauna (2015) Bulk Analysis of Malicious PDF Documents. Master's Thesis, University of Pittsburgh. (Unpublished)
Abstract
From 2007 onward, the PDF document has proven to be a successful vector for malware infections, making up 80% of all exploits found by Cisco ScanSafe in 2009 [1]. Creating new PDF documents is very easy and the volume of PDF documents identified as malicious has grown beyond the capabilities of security researchers to analyze by hand. The solution proposed by this thesis is to automatically extract features from the PDF documents to group and classify them, so that similar malware may be identified without manual analysis, thus reducing the workload of the malware analyst. These features may also be studied to identify trends within the PDF documents, such as similar exploits or obfuscation techniques. Our results show that the object graph structure of the PDF document is an effective way to create an initial grouping of malicious PDF documents.
Finding similarities in PDF documents reveals further information about a data set. In our first case study, we examine the entire data set to identify large groups of similar PDF documents and make conjectures about their origins. In our second case study, we use a PDF document of known origin to find similar PDF documents within a data set. Through the two case studies, we were able to identify 50.3% of our data set with very little manual analysis of the malicious PDF documents.
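As an illustration of grouping by object graph structure, the following is a minimal sketch and not the thesis's actual implementation. It assumes uncompressed PDF objects that can be located with regular expressions (object streams and encrypted files are not handled), and it uses a sorted list of per-object (out-degree, in-degree) pairs as a coarse stand-in for graph structure. The function names `object_graph`, `structure_signature`, and `group_by_structure` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Matches an indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# Matches an indirect reference: "<num> <gen> R".
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def object_graph(pdf_bytes):
    """Return {object number: set of object numbers it references}."""
    graph = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        obj_num = int(match.group(1))
        body = match.group(3)
        graph[obj_num] = {int(ref.group(1)) for ref in REF_RE.finditer(body)}
    return graph


def structure_signature(graph):
    """Hash the sorted (out-degree, in-degree) pairs of all objects.

    Assumption: the degree sequence is only a rough proxy for structure.
    It ignores object numbering, so renumbered copies of the same layout
    collide, but unrelated graphs can occasionally collide as well.
    """
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    degrees = sorted(
        (len(targets), in_degree[num]) for num, targets in graph.items()
    )
    return hashlib.sha256(repr(degrees).encode()).hexdigest()


def group_by_structure(documents):
    """Group {name: pdf_bytes} into sorted lists of names sharing a signature."""
    groups = defaultdict(list)
    for name, data in documents.items():
        groups[structure_signature(object_graph(data))].append(name)
    return sorted(sorted(names) for names in groups.values())
```

Calling `group_by_structure` on a dictionary mapping file names to raw PDF bytes returns lists of file names whose object graphs share the same signature. An analyst could then inspect one representative per group rather than every file.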
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Policicchio, Shauna
Date: 7 May 2015
Date Type: Publication
Defense Date: 11 March 2015
Approval Date: 7 May 2015
Submission Date: 17 April 2015
Access Restriction: 1 year (access restricted to the University of Pittsburgh for a period of 1 year)
Number of Pages: 62
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: PDF documents, malicious document analysis, malware
Date Deposited: 07 May 2015 15:45
Last Modified: 15 Nov 2016 14:27
URI: http://d-scholarship.pitt.edu/id/eprint/24955