Generalized linear model function for WASH Benefits study. washb_glm

Usage

washb_glm(Y,tr,pair,W=NULL, forcedW=NULL, V=NULL, id,contrast,family="gaussian", pval=0.2, print=TRUE)

Arguments

Y: Outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
tr: Binary treatment group variable (ideally a factor), comparison group first
pair: Pair-matched randomization ID variable (in WASH Benefits: block)
W: Optional data frame that includes adjustment covariates (for adjusted estimates)
forcedW: Optional vector of variable names to force as adjustment covariates (no screening)
V: Optional variable name for subgroup analyses, which is interacted with 'tr'.
id: ID variable for independent units (cluster ID)
contrast: Vector of length 2 that includes the groups to contrast, e.g., c("Control","Water")
family: GLM model family (gaussian, binomial, poisson, and negative binomial). Use "binonial(link='log')" to return prevalence ratios instead of odds ratios when the outcome is binary. Use "neg.binom" for a Negative binomial model.
pval: The p-value threshold: any variables with a p-value from the lielihood ratio test below this threshold will be returned. Defaults to 0.2
print: Logical for whether to print function output, defaults to TRUE.
print: Logical for whether to print names and descriptions of returned list objects

Value

Returns a list of the risk ratios or risk differences, the variance-covariance matrix, and a vector indexing the rows of observations used to fit the glm model

Description

Note that in the WASH Benefits diarrheal disease primary outcome analysis, the glm model is fit with a log link, `family=binomial(link='log')`, to estimate prevalence ratios rather than with a logit link, `family="binomial"`, to estimate the odds ratio.

Details

Occasionally, a glm model with a non-canonical link function like `family=binomial(link='log')` will fail to converge. If this occurs, use a modified poisson regression to estimate prevalence ratio using the argument `family=poisson(link='log')`. See [Zou 2004](http://www.uvm.edu/~rsingle/stat380/F04/possible/Zou-AJE-2004_PoissonRegBinaryData.pdf) for details.

Examples


#washb_glm function applied to the Bangladesh diarrheal disease outcome to determine both unadjusted and adjusted
#prevalence ratios between intervention and control arms.


#Cleans and merge the enrollment and diarrhea data:
library(washb)
data(washb_bd_enrol)
data(washb_bd_diar)

# drop svydate and month because they are superceded in the child level diarrhea data
#washb_bd_enrol$svydate <- NULL
#washb_bd_enrol$month <- NULL

# merge the baseline dataset to the follow-up dataset
ad <- merge(washb_bd_enrol,washb_bd_diar,by=c("dataid","clusterid","block","tr"),all.x=F,all.y=T)

# subset to the relevant measurement
# Year 1 or Year 2
ad <- subset(ad,svy==1|svy==2)

#subset the diarrhea to children <36 mos at enrollment
### (exlude new births that are not target children)
ad <- subset(ad,sibnewbirth==0)
ad <- subset(ad,gt36mos==0)

# Exclude children with missing data
ad <- subset(ad,!is.na(ad$diar7d))

#Re-order the tr factor for convenience
ad$tr <- factor(ad$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH"))

#Ensure that month is coded as a factor
ad$month <- factor(ad$month)

#Sort the data for perfect replication when using V-fold cross-validation
ad <- ad[order(ad$block,ad$clusterid,ad$dataid,ad$childid),]

###Create vector of contrasts for each hypothesis to facilitate comparisons between arms.
#Hypothesis 1: Each intervention arm vs. Control
h1.contrasts <- list(
  c("Control","Water"),
  c("Control","Sanitation"),
  c("Control","Handwashing"),
  c("Control","WSH"),
  c("Control","Nutrition"),
  c("Control","Nutrition + WSH")
)



###Unadjusted GLM estimates for diarrheal disease outcome.
#As an example, the following code applies the washb_glm function to compare 7-day recall diarrheal disease prevalence between the sanitation and control arms. Notice that the glm model is fit with a log link, `family=binomial(link='log')`, to estimate prevalence ratios rather than with a logit link, `family="binomial"`, to estimate the odds ratio.

Diar.glm.C.S <- washb_glm(Y=ad$diar7d,tr=ad$tr,pair=ad$block, id=ad$clusterid, contrast=c("Control","Sanitation"), family=binomial(link='log'))


-----------------------------------------
 GLM Fit: Sanitation vs. Control 
-----------------------------------------
                   PR      2.5%    97.5%  Estimate Std. Error z value  Pr(>|z|)
trSanitation 1.173836 0.9345889 1.474329 0.1602771  0.1162886 1.37827 0.1681199

 Type "`modelname'$TR" to return the treatment effect.
 Type "`modelname'$fit" to return full glm model estimates.
 Type "`modelname'$vcv" to return the variance-covariance matrix.
 Type "`modelname'$rowdropped" to return the vector list of observations included in the model fit
 Type "`modelname'$glmModel" to return the glm model fit to be used with predict() to return model predictions of the outcome

#On top of the function's auto-printed output, the washb_glm function contains a number of objects. For example, `'objectname'$vcv` returns the variance-covariance matrix.


#All returned objects are:
#'objectname$fit` to return full glm model estimates.
#'objectname$vcv` to return the variance-covariance matrix.
#'objectname$rowdropped` to return the vector list of observations included in the model fit.
#'objectname$lincom` to return subgroup-specific conditional relative risk estimates if a subgroup V is specified.