Longitudinal Data Analysis

Longitudinal data comprise a response measured repeatedly over time on each of a number of individual units in a study. Examples include measurements of serum proteins from nephrology patients, measurements of smoking behaviour in a smoking cessation trial or observational study, and measurements of air particulates in geographical regions (e.g., Khan, Chiu, and Dubin, 2009; Khan et al., 2012). When two or more distinct responses are each measured over time (e.g., three serum proteins such as albumin, C-reactive protein, and ceruloplasmin, or two measures of smoking activity such as self-report and exhaled carbon monoxide; Raffa and Dubin, 2015), the data are called multivariate longitudinal data. The responses may all be recorded at the same discrete set of time points (e.g., Dubin and Müller, 2005), or, more generally, each response may be recorded at its own set of time points. In the latter case, a smoothing step is typically required to “connect” the different responses prior to modeling (e.g., Xiong and Dubin, 2010).
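
To make the idea of "connecting" responses concrete, the following is a minimal sketch in Python, not the method of any of the papers cited above. It assumes a simple Gaussian kernel (Nadaraya-Watson) smoother as a stand-in for the smoothing step. The function name kernel_smooth, the bandwidth, and all numbers are hypothetical and chosen only for illustration. Two responses for one subject are observed at different times, and each is smoothed onto a common time grid so that the two can be compared at the same time points.

```python
import numpy as np


def kernel_smooth(times, values, grid, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel, evaluated on `grid`."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # weights[i, j] is the influence of observation j on grid point i.
    z = (grid[:, None] - times[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)
    # Each fitted value is a weighted average of the observed values.
    return (weights @ values) / weights.sum(axis=1)


# Hypothetical data for one subject: two responses measured on different days.
albumin_times = [0, 30, 65, 120]
albumin_values = [3.9, 3.7, 3.6, 3.4]
crp_times = [10, 45, 90]
crp_values = [4.0, 6.5, 9.0]

# Common grid restricted to the period in which both responses were observed.
start = max(min(albumin_times), min(crp_times))
stop = min(max(albumin_times), max(crp_times))
grid = np.linspace(start, stop, 5)

bandwidth = 25.0  # assumed value; in practice it would be chosen from the data

# One row per grid time: time, smoothed albumin, smoothed C-reactive protein.
aligned = np.column_stack([
    grid,
    kernel_smooth(albumin_times, albumin_values, grid, bandwidth),
    kernel_smooth(crp_times, crp_values, grid, bandwidth),
])
print(aligned)
```

After this step each row of aligned holds both responses at one shared time point, which is the form assumed by methods for responses recorded at the same set of time points. The choice of smoother and bandwidth affects the result and would need to be justified in a real analysis.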

We will apply a range of longitudinal data analysis techniques to the MIMIC-II database, both for single longitudinal outcomes and for multivariate longitudinal outcomes. We are also developing new methodology for such analyses as needed.