Detection of influential observations in longitudinal multivariate mixed effects regression models
Ling, Yun (2014) Detection of influential observations in longitudinal multivariate mixed effects regression models. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
The purpose of this dissertation is to detect possible influential observations in longitudinal data with more than one observation per subject at each time point, that is, in multivariate longitudinal data. An influential observation is an observation that has a large effect on the parameter estimates of a given model. Influential observations are important because: (1) removal of the observation(s) from the data set can substantially change the values of the estimated parameters; (2) in multivariate longitudinal mixed effect models, influential observations can affect the population and subject-specific trajectories; (3) influential observation(s) of one response may affect the predicted effects of the other response within the same individual; (4) an influential observation may indicate an abnormal or misdiagnosed subject. This research was motivated by ophthalmological clinical research in glaucoma. In many ophthalmology studies, both eyes are repeatedly measured. Sometimes one eye is measured by different devices or measured for different quantities (retina thickness for different quadrants, OCT, VFI, etc.). For example, in one study considered in this dissertation, multivariate measurements (Retinal Nerve Fiber Layer (RNFL) thickness and Ganglion Cell Complex (GCC) thickness) were repeatedly taken on each eye, within each patient (cluster). When we detect influential observations in longitudinal ophthalmology data, our trajectory model must take into account three kinds of correlation: (1) correlation among different characteristics measured at the same time point within the same eye; (2) correlation among different time points; (3) correlation between characteristics in the two eyes. In the first part of my dissertation, we propose a multivariate conditional version of Cook's distance for multivariate mixed effect models.
Some research has shown that, in mixed effect models, influential observations that have a large effect on subject-specific parameters cannot always be detected by the original Cook's distance because of large between-subject variation. Hence, in the multivariate longitudinal setting, the influential observation problem is better approached by conditioning on subjects and characteristics. Repeated simulations within this dissertation show that the multivariate conditional Cook's distance detected 92.5% of the influential observations, whereas the unconditional Cook's distance detected only 7.5%. In the second part of the dissertation, we extend the multivariate conditional Cook's distance to a multilevel multivariate mixed effect model. In this model, two levels of random effects handle the subject-level and cluster-level correlations among different time points, and the residual covariance matrix handles correlations among different responses. Also, the two-level multivariate conditional Cook's distance can be decomposed into six parts, indicating the influences of the fixed effects, the first- and second-level random effects, and the co-variation between them, respectively. Examples are given to illustrate how an influential observation in one characteristic changes the effects of both characteristics. This research has public health implications because the influence of outliers can bias the results of any longitudinal study in public health. Hence, recognizing observations that have undue influence on study results helps ensure that reliable conclusions can be obtained in medical and public health research settings.
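For readers unfamiliar with the starting point, the following is a minimal sketch of the classical (unconditional) Cook's distance for a straight-line least squares fit. It is not the multivariate conditional Cook's distance proposed in the dissertation, whose formula is not given in this abstract. The function name `cooks_distance_simple`, the simulated data, and the size of the perturbation are all hypothetical choices made for illustration.

```python
import random


def cooks_distance_simple(x, y):
    """Classical Cook's distance for the least squares line y = a + b*x.

    Illustrative only: this is the ordinary, unconditional statistic,
    not the conditional mixed-model version described in the abstract.
    """
    n, p = len(x), 2  # p = number of fitted coefficients (intercept, slope)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Leverage of each point (diagonal of the hat matrix) for a simple line.
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2)
    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, lev)]


# Hypothetical data: a noisy line with one deliberately perturbed observation.
random.seed(0)
n = 30
x = [i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.1) for xi in x]
y[-1] += 3.0  # make the last observation influential

distances = cooks_distance_simple(x, y)
most_influential = max(range(n), key=distances.__getitem__)
```

In this toy setting the perturbed point stands out because there is a single regression line and no between-subject variation. The abstract's argument is that, in mixed effect models, large between-subject variation can mask such points unless the distance is computed conditionally on subjects and characteristics.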