Instructions for running imputation from the command line in MATLAB.
James Carpenter, 1st December, 2011.
Queries and comments to: james.carpenter@lshtm.ac.uk
(Note, I cannot support new MATLAB users)
Should work with MATLAB release 2009b and subsequent releases.
To reproduce the results in the paper (up to Monte-Carlo error):
First, unzip the file
realcom-impute-matlab.zip
which contains the MATLAB code to reproduce the imputations and the
necessary input data.
To run this:
a) start matlab
b) on the matlab toolbar, change directory to where the files were unzipped
c) at the matlab command line, type
mcmcdriver
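The steps above can also be done entirely from the command line. The following is only a sketch: the target folder name is an assumption (the zip file may unpack into a differently named folder), so adjust targetDir to wherever mcmcdriver.m actually ends up.

```matlab
% Assumed locations; change these to match where the zip file was saved.
zipFile   = 'realcom-impute-matlab.zip';
targetDir = fullfile(pwd, 'realcom-impute-matlab');   % hypothetical folder name

if exist(zipFile, 'file') == 2
    unzip(zipFile, targetDir);          % extract the code and the input data
end

if exist(fullfile(targetDir, 'mcmcdriver.m'), 'file') == 2
    cd(targetDir);                      % same as changing directory on the toolbar
    mcmcdriver                          % run the master file
else
    disp('mcmcdriver.m not found; check where the files were unzipped.');
end
```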
Notes:
mcmcdriver.m is the master matlab file; all other .m files are
subroutines.
For simplicity, the program looks for all data files in the directory
that contains mcmcdriver.m.
This can be changed by uncommenting lines 88-89 of mcmcdriver.m;
the program will then prompt for the directory with the data files.
All input data files have names in capital letters (required on *nix platforms, where file names are case sensitive).
They are all ASCII.
The files are:
(a) Data files
========
Y - response variables in the imputation model, one per column, with -9.999e+029 denoting missing values
Dimension: N by P, where N is number of observations (at level 1) and P is number of variables
In this example, N=7394 and P = 5.
NCATS - number of categories of each of the response variables:
1 if continuous, otherwise number of categories.
NRTYPE - response type for each of the response variables:
1 - continuous, 2 - unordered categorical, 3 - ordered categorical
NVARS - number of response variables (P)
LEVEL - level of each of the response variables: 1 for level 1, and 2 for level 2.
X - covariates in the imputation model: typically just the constant, 1. Dimension: N by Q, where Q is the number of covariates.
XLEVEL - level at which each covariate enters the imputation model. Dimension: 1 by Q.
Z2 - level 2 random covariates, typically just the constant. Dimension: N by Q2, where Q2 is the number of random covariates at level 2.
N2 - number of covariates random at level 2 (Q2): typically 1, the constant.
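Before a long run it can help to check that the data files agree with each other. The sketch below is not part of the distributed code. It writes a small invented data set to a temporary folder, reads it back as plain ASCII, recodes the missing-value marker to NaN and checks the dimensions described above. Reading the files with load(..., '-ascii') is an assumption about their layout, not a statement of how mcmcdriver.m reads them.

```matlab
% Toy inputs written to a temporary folder so that this sketch runs on its own.
% All values are invented; only the file names and shapes follow the list above.
MISSING   = -9.999e29;                      % missing-value code used in Y
in.Y      = [1.2 0.5 1; MISSING 0.7 3; 0.9 MISSING 2; 1.1 0.4 MISSING];
in.NCATS  = [1 1 3];                        % two continuous, one 3-category response
in.NRTYPE = [1 1 2];                        % 1 = continuous, 2 = unordered categorical
in.NVARS  = 3;
in.LEVEL  = [1 1 1];
in.X      = ones(4, 1);                     % constant only, so Q = 1
in.XLEVEL = 1;
in.Z2     = ones(4, 1);                     % constant only, so Q2 = 1
in.N2     = 1;

dataDir = tempname;
mkdir(dataDir);
names = fieldnames(in);
for k = 1:numel(names)
    value = in.(names{k});
    save(fullfile(dataDir, names{k}), 'value', '-ascii', '-double');
end

% Read the files back as plain ASCII matrices.
rd = @(name) load(fullfile(dataDir, name), '-ascii');
Y = rd('Y');
Y(Y <= -9.99e29) = NaN;                     % recode the missing-value marker
[N, P] = size(Y);
nMissing = sum(isnan(Y), 1);                % missing count per response variable

% Dimension checks taken from the descriptions of the data files.
ok = P == rd('NVARS') ...
  && numel(rd('NCATS'))  == P ...
  && numel(rd('NRTYPE')) == P ...
  && numel(rd('LEVEL'))  == P ...
  && size(rd('X'), 1)  == N && numel(rd('XLEVEL')) == size(rd('X'), 2) ...
  && size(rd('Z2'), 1) == N && rd('N2') == size(rd('Z2'), 2);

% Continuous responses (NRTYPE 1) have NCATS 1; categorical ones have at least 2.
ncats  = rd('NCATS');
nrtype = rd('NRTYPE');
ok = ok && all(ncats(nrtype == 1) == 1) && all(ncats(nrtype ~= 1) >= 2);

fprintf('N = %d, P = %d, inputs consistent: %d\n', N, P, ok);
rmdir(dataDir, 's');                        % remove the temporary folder
```

To check real inputs, point dataDir at the directory with the data files and drop the lines that create and delete the toy files.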
(b) Hierarchy variables
==============
L2ID - N by 1, level 2 identifier. Note that the data must be sorted by level 2 identifier.
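If the data are not already sorted, one way to sort them is sketched below with invented values. The same row permutation has to be applied to every input with N rows (Y, X and Z2), not only to L2ID. Ascending order is assumed here to be an acceptable sort order.

```matlab
L2ID = [3; 1; 2; 1; 3; 2];               % invented level 2 identifiers, unsorted
Y    = [30; 10; 20; 11; 31; 21];         % invented response; row i belongs to L2ID(i)

if ~issorted(L2ID)
    [L2ID, order] = sort(L2ID);          % ascending; ties keep their original order
    Y = Y(order, :);                     % apply the same permutation to X and Z2 too
end

% Number of level 1 observations in each level 2 unit.
[ids, ~, group] = unique(L2ID);
clusterSizes = accumarray(group(:), 1);
disp([ids clusterSizes]);
```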
(c) Control variables
============
ITERPARS - 1 by 2: entry (1,1) is the number of burn-in updates and entry (1,2) is the number of post burn-in updates of the MCMC sampler.
IMPUTE - vector of post burn-in iterations at which data are imputed.
RANDSEED - if specified, random seed (useful for repeating results). If unspecified, initialised from clock.
ITEREPORT - if present, the number of updates of the MCMC sampler between progress reports.
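As an illustration of how the control variables fit together, the sketch below builds one possible set of values and checks them. All numbers are invented. It also assumes that the entries of IMPUTE count iterations from the end of the burn-in, so that they must lie between 1 and ITERPARS(1,2).

```matlab
burnIn    = 1000;                        % invented number of burn-in updates
nUpdates  = 5000;                        % invented number of post burn-in updates
M         = 10;                          % invented number of imputations

ITERPARS  = [burnIn nUpdates];           % 1 by 2
step      = nUpdates / M;
IMPUTE    = step:step:nUpdates;          % impute at every 500th post burn-in update
RANDSEED  = 12345;                       % any fixed value makes a run repeatable
ITEREPORT = 100;                         % report progress every 100 updates

% Basic sanity checks on the values above.
valid = isequal(size(ITERPARS), [1 2]) ...
     && all(IMPUTE == round(IMPUTE)) ...
     && all(IMPUTE >= 1 & IMPUTE <= ITERPARS(2)) ...
     && all(diff(IMPUTE) > 0);
fprintf('%d imputations requested, settings valid: %d\n', numel(IMPUTE), valid);
```

Spacing the imputations evenly keeps as many sampler updates as possible between successive imputed data sets.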
(d) Starting values for sampler
=================
L2VARSTART - row vector of starting values for the diagonal elements of the variance-covariance matrix at level 2. Typically all 1.
Its length is P when every response is continuous, but note that a K-category variable has (K-1) variance terms at level 2.
We have 4 continuous variables and one 4-category variable in Y, therefore we have 4 + (4-1) = 7 terms in this model.
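The count of 7 terms can be reproduced from NCATS. In this sketch the position of the categorical variable within NCATS is an assumption; only the counts (4 continuous variables and one with 4 categories) come from the text.

```matlab
NCATS = [1 1 1 1 4];                     % assumed order: 4 continuous, then 4 categories

% A continuous variable contributes 1 term; a K-category variable contributes K-1.
nTerms = sum(max(NCATS - 1, 1));
L2VARSTART = ones(1, nTerms);            % typical starting values, all 1

fprintf('L2VARSTART needs %d entries\n', nTerms);
```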
Output files:
=======
IMPUTEDRESPONSES - if we do M imputations, this is of dimension NM by P, with all missing values imputed: in other words the M imputed datasets, each of dimension N by P, are stacked on top of each other.
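Because the imputed datasets are stacked, dataset m occupies rows (m-1)N+1 to mN. The sketch below splits an invented stacked matrix into its M parts and takes the column means of each. It does not read the real output file; if IMPUTEDRESPONSES is ASCII like the input files (an assumption), it could be read with load('IMPUTEDRESPONSES', '-ascii') in place of the toy matrix.

```matlab
N = 3;                                   % observations per dataset (rows of Y)
set1 = [1 10; 2 20; 3 30];               % invented first imputed dataset
set2 = [1 10; 2 26; 3 30];               % invented second imputed dataset
stacked = [set1; set2];                  % same layout as IMPUTEDRESPONSES

M = size(stacked, 1) / N;                % number of imputations
assert(M == round(M), 'Row count is not a multiple of N.');

% Split into an M by 1 cell array, one N-row matrix per imputation.
imputed = mat2cell(stacked, repmat(N, 1, M), size(stacked, 2));

% Example use: column means of each imputed dataset.
colMeans = zeros(M, size(stacked, 2));
for m = 1:M
    colMeans(m, :) = mean(imputed{m}, 1);
end
disp(colMeans);
```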