
A COMPARISON OF LOGISTIC REGRESSION TO RANDOM FORESTS FOR EXPLORING DIFFERENCES IN RISK FACTORS ASSOCIATED WITH STAGE AT DIAGNOSIS BETWEEN BLACK AND WHITE COLON CANCER PATIENTS

Geng, Ming (2006) A COMPARISON OF LOGISTIC REGRESSION TO RANDOM FORESTS FOR EXPLORING DIFFERENCES IN RISK FACTORS ASSOCIATED WITH STAGE AT DIAGNOSIS BETWEEN BLACK AND WHITE COLON CANCER PATIENTS. Master's Thesis, University of Pittsburgh. (Unpublished)


Abstract

Introduction: Colon cancer is one of the most common malignancies in America. According to the American Cancer Society, Black patients have a lower survival rate than White patients. Many previous studies have suggested that this is because Black patients are more likely to be diagnosed at a late stage. Hence, it is crucial to determine the factors associated with colon cancer stage at diagnosis.

Objectives: The objectives of this study are twofold: 1) to compare logistic regression modeling with Random Forests classification with respect to the variables selected and classification accuracy; and 2) to evaluate the factors related to colon cancer stage at diagnosis in a population-based study. Many studies have compared Classification and Regression Trees (CART) with logistic regression and found that the two have very similar power with respect to the proportion correctly classified and the variables selected. This study extends previous methodological research by comparing Random Forests classification with logistic regression modeling on a relatively small and incomplete dataset.

Methods and Materials: The data were drawn from the National Cancer Institute Black/White Cancer Survival Study, which included 960 cases of invasive colon cancer. Stage at diagnosis was used as the dependent variable, and logistic regression models and Random Forests classification were fitted to multiple potential explanatory variables, some of which had missing data.

Results: The odds ratio (Black vs. White) decreased from 1.628 (95% CI: 1.068-2.481) to 1.515 (95% CI: 0.920-2.493) after adjustment for patient delay in diagnosis, occupation, histology, and tumor grade. In the Random Forests, race was no longer important once these variables were entered. Both logistic regression and Random Forests identified these four variables as the most important variables associated with the racial disparity in colon cancer stage at diagnosis. The overall correct classification rate was 47.9% using logistic regression and 33.9% using Random Forests.

Conclusion: 1) Logistic regression and Random Forests had very similar power in variable selection. 2) Logistic regression had higher classification accuracy than Random Forests with respect to the overall correct classification rate.
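
The Results compare an unadjusted and an adjusted odds ratio. As a minimal sketch, assuming the reported 95% intervals are Wald intervals on the log-odds scale, exp(beta ± 1.96·SE) (the abstract does not say how they were computed), the log-odds estimate, its standard error, and an approximate two-sided p-value can be backed out from the published figures. The helper names `wald_from_ci` and `two_sided_p` are hypothetical and do not come from the thesis.

```python
import math
from statistics import NormalDist


def wald_from_ci(lower, upper, level=0.95):
    """Recover (beta, se) on the log-odds scale from an odds-ratio CI.

    Assumption: the interval is a symmetric Wald interval, exp(beta +/- z * se).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lo, hi = math.log(lower), math.log(upper)
    beta = (lo + hi) / 2          # midpoint of the log-scale interval
    se = (hi - lo) / (2 * z)      # half-width divided by z
    return beta, se


def two_sided_p(beta, se):
    """Two-sided Wald p-value for H0: log odds ratio = 0."""
    z = beta / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Odds ratios (Black vs. White) and 95% CIs as reported in the abstract.
reported = {
    "unadjusted": (1.628, 1.068, 2.481),
    "adjusted": (1.515, 0.920, 2.493),
}

for label, (or_, lower, upper) in reported.items():
    beta, se = wald_from_ci(lower, upper)
    p = two_sided_p(beta, se)
    excludes_one = lower > 1 or upper < 1
    print(f"{label}: OR~{math.exp(beta):.3f} (reported {or_}), "
          f"SE(log OR)={se:.3f}, p~{p:.3f}, CI excludes 1: {excludes_one}")
```

Under this assumption, the recovered odds ratios match the reported ones up to rounding, and only the unadjusted interval excludes 1. This is consistent with the abstract's account of the racial difference being attenuated once delay in diagnosis, occupation, histology, and tumor grade are included.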



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Geng, Ming | Email: Ming.Geng@med.va.gov
ETD Committee:
Committee Chair: Redmond, Carol K | Email: ckr3@pitt.edu | Pitt Username: CKR3
Committee Member: Ricci, Edmund M | Email: emricci@pitt.edu | Pitt Username: EMRICCI
Committee Member: Mazumdar, Sati | Email: maz1@pitt.edu | Pitt Username: MAZ1
Date: 1 June 2006
Date Type: Completion
Defense Date: 21 January 2006
Approval Date: 1 June 2006
Submission Date: 12 April 2006
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Graduate School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: colon cancer; polytomous logistic regression; proportional odds model; random forests; stage at diagnosis
Other ID: http://etd.library.pitt.edu/ETD/available/etd-04122006-102254/, etd-04122006-102254
Date Deposited: 10 Nov 2011 19:36
Last Modified: 15 Nov 2016 13:39
URI: http://d-scholarship.pitt.edu/id/eprint/7034
