Statistical analysis of infectious disease data on networks

Li, Xuan (2015) Statistical analysis of infectious disease data on networks. Master's Thesis, University of Pittsburgh. (Unpublished)

Infectious disease modeling has long helped researchers understand the complex spread patterns of infectious diseases. Social contact networks and agent-based models can be used to represent social contact patterns and the process by which an infectious disease spreads. The goal of this research is to use statistical analysis to investigate the relationship between network measurements and individual infection risk.
Public Health significance
This research will help build a better understanding of the factors that drive infection risk in a population. Identifying central individuals may help inform the design of efficient surveillance and prevention programs.
Three social contact network models were used in this thesis: an Erdős–Rényi network, a Barabási–Albert network, and a Jefferson County contact network built with the FRED platform. We simulated mild and severe epidemic outbreaks on each network and calculated the infection risk and infection speed of each individual. Six network measurements (degree, betweenness centrality, closeness centrality, eigenvector centrality, PageRank, and clustering coefficient) were evaluated on their ability to identify groups with different levels of infection risk and infection speed. Random forest and variable importance were used to estimate the most important factors in predicting infection risk.
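As a minimal sketch of the simulation step for the two synthetic networks (not the thesis's code and not FRED), the Python below builds an Erdős–Rényi G(n, p) graph and a Barabási–Albert preferential-attachment graph, runs a simple discrete-time SIR outbreak many times from random seed nodes, and estimates each node's infection risk as the share of runs in which it was infected. Several choices are assumptions for illustration only: mean infection time as the measure of infection speed, recovery after a single infectious step, the network sizes, and the transmission probabilities standing in for "mild" and "severe" outbreaks. It also computes the local clustering coefficient, one of the six measurements listed.

```python
import random
from collections import defaultdict


def erdos_renyi(n, p, rng):
    """G(n, p): each possible undirected edge is present independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def barabasi_albert(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree (requires n > m + 1)."""
    adj = {v: set() for v in range(n)}
    pool = []  # each node appears once per edge end, so uniform picks are degree-proportional
    for u in range(m + 1):  # seed with a complete graph on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            pool += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            pool += [new, t]
    return adj


def clustering(adj, v):
    """Local clustering coefficient: fraction of neighbour pairs that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def sir_outbreak(adj, beta, rng):
    """One discrete-time SIR run from a random seed (nodes assumed to be 0..n-1).
    Each infected node gets one chance to infect each susceptible neighbour
    (probability beta), then recovers. Returns {node: time step of infection}."""
    seed = rng.randrange(len(adj))
    infected_at = {seed: 0}
    frontier, t = [seed], 0
    while frontier:
        t += 1
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in infected_at and rng.random() < beta:
                    infected_at[v] = t
                    nxt.append(v)
        frontier = nxt
    return infected_at


def infection_risk(adj, beta, runs, rng):
    """Per-node infection risk (share of runs in which the node was infected) and
    mean infection time (a proxy for infection speed; None if never infected)."""
    hits, times = defaultdict(int), defaultdict(list)
    for _ in range(runs):
        for v, t in sir_outbreak(adj, beta, rng).items():
            hits[v] += 1
            times[v].append(t)
    risk = {v: hits[v] / runs for v in adj}
    mean_time = {v: sum(times[v]) / len(times[v]) if times[v] else None for v in adj}
    return risk, mean_time


if __name__ == "__main__":
    rng = random.Random(1)
    networks = {"ER": erdos_renyi(300, 0.02, rng), "BA": barabasi_albert(300, 3, rng)}
    for name, g in networks.items():
        for label, beta in [("mild", 0.05), ("severe", 0.3)]:  # hypothetical values
            risk, _ = infection_risk(g, beta, 200, rng)
            hub = max(g, key=lambda v: len(g[v]))
            print(f"{name} {label}: hub degree={len(g[hub])}, hub risk={risk[hub]:.2f}, "
                  f"mean risk={sum(risk.values()) / len(risk):.2f}")
```

Node-level measurements such as degree or clustering coefficient can then be paired with the estimated risks to form the predictor and outcome tables for a classifier.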
For the Barabási–Albert and Erdős–Rényi networks, centrality measurements are critical factors in identifying infection risk. Degree is the most important factor in the Barabási–Albert network, while in the Erdős–Rényi network closeness is the most important in the mild outbreak and degree in the severe outbreak. Results for the Jefferson County contact network in FRED highlight the importance of location sizes. The highly clustered structure of the location-based model makes betweenness centrality and clustering coefficient important in predicting infection risk.
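The importance rankings above come from random forest variable importance. The abstract does not say which importance measure was used; one common choice, mean decrease in accuracy, shuffles one predictor at a time and records how much prediction accuracy falls. The sketch below shows that permutation idea in a model-agnostic way, assuming rows of numeric features. It is not the thesis's code: the toy model is a hypothetical stand-in for a fitted random forest, the data are synthetic, and some random forest implementations (for example R's randomForest) compute this measure on each tree's out-of-bag samples, which this sketch omits.

```python
import random


def accuracy(predict, X, y):
    """Share of rows whose predicted class matches the true class."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when one feature column is randomly shuffled.
    `predict` is any fitted classifier's prediction function (row -> class)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: features are [degree, clustering coefficient];
    # by construction the "high risk" label depends only on degree.
    rng = random.Random(42)
    X = [[rng.randint(0, 10), rng.random()] for _ in range(200)]
    y = [row[0] > 5 for row in X]
    toy_model = lambda row: row[0] > 5  # stand-in for a fitted random forest
    print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores the second feature, shuffling it leaves accuracy unchanged and its importance is exactly zero, while shuffling degree causes a large drop.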
Different network structures and disease characteristics influence the importance of network measurements. Network structure also influences the correlations between network measurements. Random forest is a powerful tool for classifying infection risk, and centrality measurements may help identify people at high risk of infection.
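To illustrate how structure can change the correlation between two measurements, the sketch below computes closeness centrality by breadth-first search (scaled for unreachable nodes in the Wasserman–Faust style) and the Spearman rank correlation between degree and closeness. The abstract does not state which correlation coefficient was used, so Spearman is an assumption here, and the two five-node graphs are hypothetical examples rather than the thesis's networks.

```python
import math
from collections import deque
from statistics import mean


def closeness(adj):
    """Closeness centrality of every node via BFS shortest-path distances."""
    n = len(adj)
    out = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total, reach = sum(dist.values()), len(dist) - 1
        # (reach / total) scaled by the share of other nodes that are reachable
        out[s] = (reach / total) * (reach / (n - 1)) if total > 0 else 0.0
    return out


def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va, vb = sum((x - ma) ** 2 for x in a), sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else float("nan")


def degree_closeness_spearman(adj):
    """Spearman correlation = Pearson correlation of the ranks."""
    nodes = sorted(adj)
    clo = closeness(adj)
    return pearson(ranks([len(adj[v]) for v in nodes]), ranks([clo[v] for v in nodes]))


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print("path:", round(degree_closeness_spearman(path), 3))  # 0.913
    print("star:", round(degree_closeness_spearman(star), 3))  # 1.0
```

In the star, degree and closeness rank nodes identically; in the path, the three interior nodes share a degree but differ in closeness, so the correlation drops below 1.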


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Li, Xuan (Email: xul23@pitt.edu; Pitt Username: XUL23; ORCID: 0000-0001-7300-9960)
ETD Committee:
Committee Chair: Marsh, Gary (Email: gmarsh@pitt.edu; Pitt Username: GMARSH)
Committee Member: Grefenstette, John J. (Email: gref@pitt.edu; Pitt Username: GREF)
Committee Member: Guclu, Hasan (Email: guclu@pitt.edu; Pitt Username: GUCLU)
Committee Member: Kumar, Supriya (Email: supriya@pitt.edu; Pitt Username: SUPRIYA)
Date: 28 September 2015
Date Type: Publication
Defense Date: 29 June 2015
Approval Date: 28 September 2015
Submission Date: 24 July 2015
Access Restriction: 3 year -- Restrict access to University of Pittsburgh for a period of 3 years.
Number of Pages: 80
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Social Contact Network; Random Forest; Infectious Disease; Agent-Based Model; FRED
Date Deposited: 28 Sep 2015 18:32
Last Modified: 01 Sep 2018 05:15

