State space model multiple imputation for missing data in non-stationary multivariate time series with application in digital psychiatry

Cunz 160 and Zoom

Xiaoxuan Cai, PhD, OSU

Xiaoxuan Cai, PhD
Assistant Professor
https://xiaoxuan-cai.github.io/
OSU Department of Statistics


Mobile technology (e.g., mobile phones and wearable devices) provides effective and scalable methods for collecting physiological and behavioral biomarkers in patients' naturalistic settings, as well as opportunities for therapeutic advances and scientific discoveries regarding the etiology of psychiatric illness. Continuous data collection yields a new type of data: entangled multivariate time series of outcome, exposure, and covariates. Missing data is a pervasive problem in biomedical and social science research, and Ecological Momentary Assessment (EMA) in psychiatric research via mobile devices is no exception. However, the complex structure of multivariate time series and their non-stationarity make missing data a major challenge for proper inference. Time series analyses typically include past values as explanatory variables to control for autocorrelation, which exacerbates the missing data problem and can make it infeasible to adjust appropriately for confounding. Most available imputation methods are designed either for longitudinal data with a limited number of follow-up times or for stationary time series. The limited work on non-stationary time series either focuses on missing exogenous information or ignores the complex relationships among the outcome, exposure, and covariate time series. How to handle missing data in complex non-stationary multivariate time series remains a key unresolved problem, and the performance of existing imputation methods has yet to be evaluated in the context of non-stationary mobile device data. We propose a novel imputation approach based on the state space model and multiple imputation to properly address missing data in non-stationary multivariate time series.
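To illustrate why conditioning on history makes missingness worse, consider a toy regression of y_t on a covariate x_t and p lagged outcomes y_{t-1}, ..., y_{t-p}. The sketch below counts how many design rows survive a complete-case analysis when a single outcome value is missing. The helper `complete_case_rows` and the toy series are hypothetical and serve only as illustration.

```python
from typing import List, Optional, Tuple

def complete_case_rows(y: List[Optional[float]], x: List[Optional[float]],
                       lags: int) -> Tuple[int, int]:
    """Return (rows in the lagged design, rows with no missing entry).

    Each row holds y_t, x_t and the lagged outcomes y_{t-1}..y_{t-lags};
    None marks a missing value (hypothetical toy encoding)."""
    n_rows = 0
    n_complete = 0
    for t in range(lags, len(y)):
        row = [y[t], x[t]] + [y[t - k] for k in range(1, lags + 1)]
        n_rows += 1
        if all(v is not None for v in row):
            n_complete += 1
    return n_rows, n_complete

# Toy series with one missing outcome at t = 2; the covariate is fully observed.
y = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [0.5] * len(y)
for lags in (0, 1, 2):
    print(lags, complete_case_rows(y, x, lags))
```

With no lags the single gap costs one row; with one lag it costs two, and with two lags it costs three, because the missing y_t reappears as a predictor in each of the next p rows.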
We demonstrate its advantages over other widely used imputation strategies by evaluating its theoretical properties and its empirical performance in simulations of both stationary and non-stationary time series under various missingness mechanisms. We apply the proposed method to investigate the association between digital social interaction and negative mood in a multi-year smartphone observational study of patients with bipolar disorder or schizophrenia.
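The two ingredients named above, a state space model and multiple imputation, can be illustrated in their simplest univariate form. This is a minimal sketch, not the speaker's method. It assumes a local-level model (a random walk observed with noise, a basic non-stationary state space model) whose state variance q and observation variance r are known. It draws each completed series by forward filtering, backward sampling (FFBS) and pools estimates with Rubin's rules. The names `ffbs_impute` and `rubin_pool`, the prior (a0, p0), and the toy series are hypothetical.

```python
import math
import random
from statistics import mean, variance
from typing import List, Optional, Tuple

def ffbs_impute(y: List[Optional[float]], q: float, r: float, rng: random.Random,
                a0: float = 0.0, p0: float = 1e6) -> List[float]:
    """One stochastic imputation of a local-level series:
    y_t = mu_t + eps_t (Var r), mu_t = mu_{t-1} + eta_t (Var q).
    q and r are ASSUMED known; None marks a missing y_t."""
    n = len(y)
    m = [0.0] * n  # filtered means E[mu_t | y_1..t]
    c = [0.0] * n  # filtered variances Var[mu_t | y_1..t]
    for t in range(n):
        # One-step prediction (vague prior N(a0, p0) at the first time point).
        a, p = (a0, p0) if t == 0 else (m[t - 1], c[t - 1] + q)
        if y[t] is None:
            m[t], c[t] = a, p          # missing: skip the update step
        else:
            k = p / (p + r)            # Kalman gain
            m[t] = a + k * (y[t] - a)
            c[t] = (1.0 - k) * p
    # Backward sampling: draw mu_n, then mu_t | mu_{t+1}, y_1..t.
    mu = [0.0] * n
    mu[-1] = rng.gauss(m[-1], math.sqrt(c[-1]))
    for t in range(n - 2, -1, -1):
        j = c[t] / (c[t] + q)
        mu[t] = rng.gauss(m[t] + j * (mu[t + 1] - m[t]), math.sqrt(c[t] * (1.0 - j)))
    # Keep observed values; fill missing ones with draws from y_t | mu_t.
    return [v if v is not None else rng.gauss(mu[t], math.sqrt(r))
            for t, v in enumerate(y)]

def rubin_pool(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance T = W + (1 + 1/M) B."""
    m_imp = len(estimates)
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor M - 1)
    return mean(estimates), w + (1.0 + 1.0 / m_imp) * b

# Usage: M = 20 completed copies of a toy series with gaps.
rng = random.Random(2024)
series = [0.1, 0.4, None, None, 1.3, 1.1, None, 2.0]
completed = [ffbs_impute(series, q=0.2, r=0.1, rng=rng) for _ in range(20)]
print(rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # ≈ (2.0, 1.833)
```

Each completed series would be analyzed separately and the resulting estimates and variances passed to `rubin_pool`. Holding q and r fixed across imputations understates between-imputation variability; a proper multiple imputation would also draw the model parameters from their posterior. The multivariate setting in the abstract additionally requires modeling the joint dynamics of outcome, exposure, and covariates.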