stdGee

R Documentation

Regression standardization in conditional generalized estimating equations

Description

stdGee performs regression standardization in linear and log-linear fixed effects models, at specified values of the exposure, over the sample covariate distribution. Let Y, X, and Z be the outcome, the exposure, and a vector of covariates, respectively. The data are assumed to be clustered, with cluster indicator i. stdGee uses a fitted fixed effects model, with cluster-specific intercept a_i (see Details), to estimate the standardized mean θ(x)=E\{E(Y|i,X=x,Z)\}, where x is a specific value of X, and the outer expectation is over the marginal distribution of (a_i,Z).

Usage

stdGee(fit, data, X, x, clusterid, subsetnew)

Arguments

`fit`	an object of class `"gee"`, with argument `cond = TRUE`, as returned by the `gee` function in the drgee package. If arguments `weights` and/or `subset` are used when fitting the model, then the same weights and subset are used in `stdGee`.
`data`	a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in `fit`.
`X`	a string containing the name of the exposure variable X in `data`.
`x`	an optional vector containing the specific values of X at which to estimate the standardized mean. If X is binary (0/1) or a factor, then `x` defaults to all values of X. If X is numeric, then `x` defaults to the mean of X. If `x` is set to `NA`, then X is not altered. This produces an estimate of the marginal mean E(Y)=E\{E(Y|X,Z)\}.
`clusterid`	a mandatory string containing the name of the cluster identification variable. This must be identical to the `clusterid` variable used in the `gee` call.
`subsetnew`	an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.
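
The default rule for `x` described above can be sketched in base R. The helper `default_x` below is hypothetical and is not part of the package; it only mirrors the documented behaviour (all values for a binary or factor exposure, the sample mean for other numeric exposures), and the package's internal handling may differ in detail.

```r
# Hypothetical helper (not part of the package): mimics the documented
# defaults for the `x` argument.
default_x <- function(v) {
  if (is.factor(v)) {
    return(levels(v))                 # factor: all levels
  }
  if (!is.numeric(v)) {
    stop("exposure must be numeric or a factor")
  }
  vals <- sort(unique(v))             # sort() drops missing values
  if (all(vals %in% c(0, 1))) {
    return(vals)                      # binary 0/1: all observed values
  }
  mean(v, na.rm = TRUE)               # other numeric: the sample mean
}

x_binary  <- default_x(c(0, 1, 1, 0))
x_factor  <- default_x(factor(c("low", "high", "low")))
x_numeric <- default_x(c(2, 4, 6))
```

Passing `x` explicitly, as in the Examples section, bypasses any such default.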

Details

stdGee assumes that a fixed effects model

η\{E(Y|i,X,Z)\}=a_i+h(X,Z;β)

has been fitted. The link function η is assumed to be the identity link or the log link. The conditional generalized estimating equation (CGEE) estimate of β is used to obtain estimates of the cluster-specific intercepts:

\hat{a}_i=η(∑_{j=1}^{n_i}r_{ij}/n_i),

where

r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{β})

if η is the identity link, and

r_{ij}=Y_{ij}exp\{-h(X_{ij},Z_{ij};\hat{β})\}

if η is the log link. Here (X_{ij},Z_{ij}) is the value of (X,Z) for subject j in cluster i, j=1,...,n_i, i=1,...,n. The CGEE estimate of β and the estimate of a_i are then used to estimate the mean E(Y|i,X=x,Z):

\hat{E}(Y|i,X=x,Z)=η^{-1}\{\hat{a}_i+h(X=x,Z;\hat{β})\}.

For each x in the x argument, these estimates are averaged across all subjects (i.e. all observed values of Z and all estimated values of a_i) to produce estimates

\hat{θ}(x)=∑_{i=1}^n ∑_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_{ij})/N,

where N=∑_{i=1}^n n_i. The variance for \hat{θ}(x) is obtained by the sandwich formula.
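
The point estimate for the identity link can be reproduced by hand in base R, as the sketch below illustrates. It is not the package's implementation: the within-cluster (demeaned) least-squares fit is used as a stand-in for the CGEE estimate of β, which is an assumption, and all object names (`demean`, `beta_hat`, `h`, `a_hat`, `theta_hat`) are illustrative. The sandwich variance is not computed.

```r
set.seed(1)
n  <- 500                                # number of clusters
ni <- 2                                  # cluster size
id <- rep(seq_len(n), each = ni)
ai <- rep(rnorm(n), each = ni)           # true cluster-specific intercepts
Z  <- rnorm(n * ni)
X  <- rnorm(n * ni, mean = ai + Z)
Y  <- rnorm(n * ni, mean = ai + X + Z)   # identity link, h(X,Z;beta) = X + Z

# Assumption: within-cluster least squares stands in for the CGEE fit.
demean   <- function(v) v - ave(v, id)   # ave() returns cluster means by default
beta_hat <- unname(coef(lm(demean(Y) ~ 0 + demean(X) + demean(Z))))

# h(X, Z; beta_hat) and the residuals r_ij = Y_ij - h(X_ij, Z_ij; beta_hat)
h <- function(x, z) beta_hat[1] * x + beta_hat[2] * z
r <- Y - h(X, Z)

# a_hat_i: cluster mean of the residuals, repeated for each subject in the cluster
a_hat <- ave(r, id)

# theta_hat(x): set X = x for everyone and average over all N subjects
theta_hat <- function(x) mean(a_hat + h(x, Z))
est <- sapply(c(-1, 0, 1), theta_hat)

# Leaving X unaltered (the x = NA case) gives the marginal mean
theta_marginal <- mean(a_hat + h(X, Z))
```

In this linear model without interactions, `est[3] - est[2]` equals the coefficient of X, and `theta_marginal` equals `mean(Y)`, which gives two quick sanity checks on the algebra.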

Value

An object of class "stdGee" is a list containing

`call`	the matched call.
`input`	a list containing all input arguments.
`est`	a vector with length equal to `length(x)`, where element `j` is equal to \hat{θ}(`x[j]`).
`vcov`	a square matrix with `length(x)` rows, where the element on row `i` and column `j` is the (estimated) covariance of \hat{θ}(`x[i]`) and \hat{θ}(`x[j]`).
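
Because `vcov` holds the full covariance matrix, contrasts between standardized means can be given standard errors directly. The sketch below shows the arithmetic for differences against a reference level; the numbers in `est` and `V` are made up for illustration and do not come from a fitted model, and the code is not taken from the package's `summary` method.

```r
# Hypothetical values standing in for the `est` and `vcov` elements
est <- c(0.10, 0.55, 1.20)
V   <- matrix(c(0.040, 0.010, 0.005,
                0.010, 0.030, 0.008,
                0.005, 0.008, 0.050), nrow = 3, byrow = TRUE)

ref <- 2                                 # position of the reference value in x

# Contrast matrix: row j picks out theta(x[j]) - theta(x[ref])
C <- diag(length(est))
C[, ref] <- C[, ref] - 1

diff_est  <- as.vector(C %*% est)        # differences against the reference
diff_vcov <- C %*% V %*% t(C)            # covariance of the differences
diff_se   <- sqrt(diag(diff_vcov))

# Normal-approximation 95% confidence limits
lower <- diff_est - qnorm(0.975) * diff_se
upper <- diff_est + qnorm(0.975) * diff_se
```

The variance of each difference is var(θ̂(x[j])) + var(θ̂(x[ref])) minus twice their covariance, so ignoring the off-diagonal elements of `vcov` would misstate the standard errors.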

Note

The variance calculation performed by stdGee does not condition on the observed covariates \bar{Z}=(Z_{11},...,Z_{nn_n}). To see how this matters, note that

var\{\hat{θ}(x)\}=E[var\{\hat{θ}(x)|\bar{Z}\}]+var[E\{\hat{θ}(x)|\bar{Z}\}].

The usual parameter β in a generalized linear model does not depend on \bar{Z}. Thus, E(\hat{β}|\bar{Z}) is independent of \bar{Z} as well (since E(\hat{β}|\bar{Z})=β), so that the term var[E\{\hat{β}|\bar{Z}\}] in the corresponding variance decomposition for var(\hat{β}) becomes equal to 0. However, θ(x) depends on \bar{Z} through the average over the sample distribution for Z, and thus the term var[E\{\hat{θ}(x)|\bar{Z}\}] is not 0, unless one conditions on \bar{Z}.
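
The decomposition can be illustrated by simulation. The sketch below uses a deliberately simplified setting, an ordinary linear model without clusters fitted by `lm`, rather than stdGee itself; the sample size, coefficients, and object names are all assumptions chosen for illustration. In this model E\{\hat{θ}(x)|\bar{Z}\} is available in closed form, so its variance across repeated samples can be compared with the total variance of \hat{θ}(x).

```r
set.seed(2)
N     <- 200       # sample size per replication
reps  <- 1000      # number of simulated samples
beta  <- 1         # coefficient of X
gamma <- 2         # coefficient of Z
x0    <- 0         # exposure value at which to standardize

theta_hat <- numeric(reps)   # standardized mean estimate in each sample
cond_mean <- numeric(reps)   # E{theta_hat | Z-bar} in each sample

for (k in seq_len(reps)) {
  Z <- rnorm(N)
  X <- rnorm(N, mean = Z)
  Y <- rnorm(N, mean = beta * X + gamma * Z)
  b <- coef(lm(Y ~ X + Z))
  # Set X = x0 for everyone and average over the sample distribution of Z
  theta_hat[k] <- mean(b[1] + b[2] * x0 + b[3] * Z)
  # Least squares is conditionally unbiased, so this is the conditional mean
  cond_mean[k] <- beta * x0 + gamma * mean(Z)
}

total_var   <- var(theta_hat)   # var{theta_hat(x)}
between_var <- var(cond_mean)   # var[E{theta_hat(x) | Z-bar}], about gamma^2 / N
```

Here `between_var` is clearly positive and accounts for a large share of `total_var`, so a variance that conditions on \bar{Z} would understate the unconditional uncertainty in this toy setting.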

Author(s)

Arvid Sjolander.

References

Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.

Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.

Sjolander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054

Examples


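library(stdReg)  # provides stdGee
library(drgee)   # provides gee

set.seed(1)  # make the simulated data reproducible

# Simulate n clusters of size ni, each with a cluster-specific intercept ai
n <- 1000
ni <- 2
id <- rep(1:n, each=ni)
ai <- rep(rnorm(n), each=ni)
Z <- rnorm(n*ni)
X <- rnorm(n*ni, mean=ai+Z)
Y <- rnorm(n*ni, mean=ai+X+Z+0.1*X^2)
dd <- data.frame(id, Z, X, Y)

# Fit the fixed effects model by conditional GEE (cond=TRUE)
fit <- gee(formula=Y~X+Z+I(X^2), data=dd, clusterid="id", link="identity",
  cond=TRUE)

# Standardize at x = -3, -2.5, ..., 3
fit.std <- stdGee(fit=fit, data=dd, X="X", x=seq(-3,3,0.5), clusterid="id")

# Summarize as differences against the reference given by reference=2
print(summary(fit.std, contrast="difference", reference=2))
plot(fit.std)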