
Bulk Analysis of Malicious PDF Documents

Policicchio, Shauna (2015) Bulk Analysis of Malicious PDF Documents. Master's Thesis, University of Pittsburgh. (Unpublished)



From 2007 onward, the PDF document has proven to be a successful vector for malware infections, making up 80% of all exploits found by Cisco ScanSafe in 2009 [1]. Creating new PDF documents is very easy and the volume of PDF documents identified as malicious has grown beyond the capabilities of security researchers to analyze by hand. The solution proposed by this thesis is to automatically extract features from the PDF documents to group and classify them, so that similar malware may be identified without manual analysis, thus reducing the workload of the malware analyst. These features may also be studied to identify trends within the PDF documents, such as similar exploits or obfuscation techniques. Our results show that the object graph structure of the PDF document is an effective way to create an initial grouping of malicious PDF documents.
Finding similarities in PDF documents reveals further information about a data set. In our first case study, we examine the entire data set to identify large groups of similar PDF documents and make conjectures about their origins. In our second case study, we use a PDF document of known origin to find similar PDF documents within a data set. Through the two case studies, we were able to identify 50.3% of our data set with very little manual analysis of the malicious PDF documents.
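As a rough illustration of grouping by object graph structure, the following is a minimal sketch and not the thesis's implementation. It assumes uncompressed indirect objects that a regular expression can find, and it uses a sorted list of (out-degree, in-degree) pairs as a stand-in structural signature. The function names, the signature choice, and the toy documents are all hypothetical; real malicious PDF documents often use object streams and obfuscation that this sketch does not handle.

```python
import re
from collections import defaultdict

# Assumption: objects appear in plain form "N G obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
# Indirect references have the form "N G R".
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def object_graph(pdf_bytes):
    """Map each indirect object number to the object numbers it references."""
    graph = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        number = int(match.group(1))
        body = match.group(3)
        graph[number] = {int(ref.group(1)) for ref in REF_RE.finditer(body)}
    return graph


def structure_signature(graph):
    """Sorted (out-degree, in-degree) pairs; independent of object numbering.

    This is a coarse signature, not a full graph-isomorphism test.
    """
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    return tuple(sorted((len(targets), in_degree[number])
                        for number, targets in graph.items()))


def group_by_structure(docs):
    """Group document names whose object graphs share a signature."""
    groups = defaultdict(list)
    for name, data in docs.items():
        groups[structure_signature(object_graph(data))].append(name)
    return list(groups.values())


# Toy inputs (hypothetical): b.pdf is a.pdf with renumbered objects,
# c.pdf adds an extra object referenced from the catalog.
docs = {
    "a.pdf": b"1 0 obj\n<< /Pages 2 0 R >>\nendobj\n"
             b"2 0 obj\n<< /Kids [3 0 R] >>\nendobj\n"
             b"3 0 obj\n<< /Parent 2 0 R >>\nendobj\n",
    "b.pdf": b"10 0 obj\n<< /Pages 20 0 R >>\nendobj\n"
             b"20 0 obj\n<< /Kids [30 0 R] >>\nendobj\n"
             b"30 0 obj\n<< /Parent 20 0 R >>\nendobj\n",
    "c.pdf": b"1 0 obj\n<< /Pages 2 0 R /OpenAction 4 0 R >>\nendobj\n"
             b"2 0 obj\n<< /Kids [3 0 R] >>\nendobj\n"
             b"3 0 obj\n<< /Parent 2 0 R >>\nendobj\n"
             b"4 0 obj\n<< /S /JavaScript >>\nendobj\n",
}
print(group_by_structure(docs))
```

Because the signature ignores object numbers, the two renumbered documents land in the same group while the one with the extra object is separated. A stricter comparison would need a graph-isomorphism check or richer per-object features.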




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Policicchio, Shauna (Email: smh137@pitt.edu; Pitt Username: SMH137)
ETD Committee:
Committee CoChair: Krishnamurthy, Prashant (Email: prashk@pitt.edu; Pitt Username: PRASHK)
Committee CoChair: Spring,
Committee Member: Metcalf,
Committee Member: Palanisamy, Balaji (Email: bpalan@pitt.edu; Pitt Username: BPALAN)
Committee Member: Pelechrinis, Konstantinos (Email: kpele@pitt.edu; Pitt Username: KPELE)
Date: 7 May 2015
Date Type: Publication
Defense Date: 11 March 2015
Approval Date: 7 May 2015
Submission Date: 17 April 2015
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 62
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: PDF documents, malicious document analysis, malware
Date Deposited: 07 May 2015 15:45
Last Modified: 15 Nov 2016 14:27

