Generalized linear model function for WASH Benefits study. washb_glm

Usage

washb_glm(Y,tr,pair,W=NULL, forcedW=NULL, V=NULL, id,contrast,family="gaussian", pval=0.2, print=TRUE)

Arguments

Y
Outcome variable (continuous, such as LAZ, or binary, such as diarrhea)
tr
Binary treatment group variable (ideally a factor), comparison group first
pair
Pair-matched randomization ID variable (in WASH Benefits: block)
W
Optional data frame that includes adjustment covariates (for adjusted estimates)
forcedW
Optional vector of variable names to force as adjustment covariates (no screening)
V
Optional variable name for subgroup analyses, which is interacted with 'tr'.
id
ID variable for independent units (cluster ID)
contrast
Vector of length 2 that includes the groups to contrast, e.g., c("Control","Water")
family
GLM model family (gaussian, binomial, poisson, and negative binomial). Use "binonial(link='log')" to return prevalence ratios instead of odds ratios when the outcome is binary. Use "neg.binom" for a Negative binomial model.
pval
The p-value threshold: any variables with a p-value from the lielihood ratio test below this threshold will be returned. Defaults to 0.2
print
Logical for whether to print function output, defaults to TRUE.
print
Logical for whether to print names and descriptions of returned list objects

Value

Returns a list of the risk ratios or risk differences, the variance-covariance matrix, and a vector indexing the rows of observations used to fit the glm model

Description

Note that in the WASH Benefits diarrheal disease primary outcome analysis, the glm model is fit with a log link, `family=binomial(link='log')`, to estimate prevalence ratios rather than with a logit link, `family="binomial"`, to estimate the odds ratio.

Details

Occasionally, a glm model with a non-canonical link function like `family=binomial(link='log')` will fail to converge. If this occurs, use a modified poisson regression to estimate prevalence ratio using the argument `family=poisson(link='log')`. See [Zou 2004](http://www.uvm.edu/~rsingle/stat380/F04/possible/Zou-AJE-2004_PoissonRegBinaryData.pdf) for details.

Examples

#washb_glm function applied to the Bangladesh diarrheal disease outcome to determine both unadjusted and adjusted #prevalence ratios between intervention and control arms. #Cleans and merge the enrollment and diarrhea data: library(washb) data(washb_bd_enrol) data(washb_bd_diar) # drop svydate and month because they are superceded in the child level diarrhea data #washb_bd_enrol$svydate <- NULL #washb_bd_enrol$month <- NULL # merge the baseline dataset to the follow-up dataset ad <- merge(washb_bd_enrol,washb_bd_diar,by=c("dataid","clusterid","block","tr"),all.x=F,all.y=T) # subset to the relevant measurement # Year 1 or Year 2 ad <- subset(ad,svy==1|svy==2) #subset the diarrhea to children <36 mos at enrollment ### (exlude new births that are not target children) ad <- subset(ad,sibnewbirth==0) ad <- subset(ad,gt36mos==0) # Exclude children with missing data ad <- subset(ad,!is.na(ad$diar7d)) #Re-order the tr factor for convenience ad$tr <- factor(ad$tr,levels=c("Control","Water","Sanitation","Handwashing","WSH","Nutrition","Nutrition + WSH")) #Ensure that month is coded as a factor ad$month <- factor(ad$month) #Sort the data for perfect replication when using V-fold cross-validation ad <- ad[order(ad$block,ad$clusterid,ad$dataid,ad$childid),] ###Create vector of contrasts for each hypothesis to facilitate comparisons between arms. #Hypothesis 1: Each intervention arm vs. Control h1.contrasts <- list( c("Control","Water"), c("Control","Sanitation"), c("Control","Handwashing"), c("Control","WSH"), c("Control","Nutrition"), c("Control","Nutrition + WSH") ) ###Unadjusted GLM estimates for diarrheal disease outcome. #As an example, the following code applies the washb_glm function to compare 7-day recall diarrheal disease prevalence between the sanitation and control arms. Notice that the glm model is fit with a log link, `family=binomial(link='log')`, to estimate prevalence ratios rather than with a logit link, `family="binomial"`, to estimate the odds ratio. Diar.glm.C.S <- washb_glm(Y=ad$diar7d,tr=ad$tr,pair=ad$block, id=ad$clusterid, contrast=c("Control","Sanitation"), family=binomial(link='log'))
----------------------------------------- GLM Fit: Sanitation vs. Control ----------------------------------------- PR 2.5% 97.5% Estimate Std. Error z value Pr(>|z|) trSanitation 1.173836 0.9345889 1.474329 0.1602771 0.1162886 1.37827 0.1681199 Type "`modelname'$TR" to return the treatment effect. Type "`modelname'$fit" to return full glm model estimates. Type "`modelname'$vcv" to return the variance-covariance matrix. Type "`modelname'$rowdropped" to return the vector list of observations included in the model fit Type "`modelname'$glmModel" to return the glm model fit to be used with predict() to return model predictions of the outcome
#On top of the function's auto-printed output, the washb_glm function contains a number of objects. For example, `'objectname'$vcv` returns the variance-covariance matrix. #All returned objects are: #'objectname$fit` to return full glm model estimates. #'objectname$vcv` to return the variance-covariance matrix. #'objectname$rowdropped` to return the vector list of observations included in the model fit. #'objectname$lincom` to return subgroup-specific conditional relative risk estimates if a subgroup V is specified.