The Calculus of M-estimation in R with geex

M-estimation, or estimating equation, methods are widely applicable for point estimation and asymptotic inference. In this paper, we present an R package that can find roots and compute the empirical sandwich variance estimator for any set of user-specified, unbiased estimating equations. Examples from the M-estimation primer by Stefanski and Boos (2002) demonstrate use of the software. The package also includes a framework for finite sample variance corrections and a website with an extensive collection of tutorials.


Introduction
M-estimation methods are a simple but general class of statistical procedures for carrying out point estimation and asymptotic inference (Boos and Stefanski, 2013). Also known as estimating equation or estimating function methods, M-estimation was originally developed in studying the large sample properties of robust statistics (Huber and Ronchetti, 2009). The general result from M-estimation theory states that if an estimator can be expressed as the solution to an unbiased estimating equation, then under suitable regularity conditions the estimator is asymptotically Normal and its asymptotic variance can be consistently estimated using the empirical sandwich estimator. Many estimators can be expressed as solutions to unbiased estimating equations, thus M-estimation has extensive applicability. The primer by Stefanski and Boos (2002) demonstrates how many statistics can be expressed as M-estimators, including the popular method of generalized estimating equations (GEE) for longitudinal data analysis (Liang and Zeger, 1986).
Despite the broad applicability of M-estimation, existing statistical software packages implement M-estimators specific to particular forms of estimating equations such as GEE.
This paper introduces the package geex for R (R Core Team, 2016), which can obtain point and variance estimates from any set of unbiased estimating equations. The analyst translates the mathematical expression of an estimating function into an R function that takes unit-level data and returns a function in terms of parameters. The package geex then uses numerical routines to compute parameter estimates and the empirical sandwich variance estimator. This paper is outlined as follows. Section 2 reviews M-estimation theory and outlines how geex translates mathematical expressions of estimating functions into R syntax. Section 3 implements three examples from Stefanski and Boos (2002) (hereafter SB) using geex plus an example of GEE. All of the SB examples and several more are available at the package website (https://bsaul.github.io/geex/). Section 4 compares geex to existing R packages. Section 5 demonstrates the finite sample correction feature of geex, and Section 6 concludes with a brief discussion of the software's didactic utility and pragmatic applications.

From M-estimation math to code
In the basic set-up, M-estimation applies to estimators of the p × 1 parameter θ = (θ_1, θ_2, ..., θ_p)ᵀ which can be obtained as solutions to an equation of the form

    ∑_{i=1}^m ψ(O_i, θ) = 0,   (1)

where O_1, ..., O_m are m independent and identically distributed (iid) random variables, and the function ψ returns a vector of length p and does not depend on i or m. See SB for the case where the O_i are independent but not necessarily identically distributed. The roots of (1) are referred to as M-estimators and denoted by θ̂. M-estimators can be solved for analytically in some cases or computed numerically in general. Under certain regularity conditions, the asymptotic properties of θ̂ can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem (Boos and Stefanski, 2013, sec. 7.2).
In brief, let θ_0 be the true parameter value defined by the population analogue of (1), E[ψ(O_1, θ_0)] = 0. Then under suitable regularity assumptions, θ̂ is consistent and asymptotically Normal, i.e.,

    √m (θ̂ − θ_0) →_d N(0, V(θ_0)) as m → ∞,

where V(θ_0) = A(θ_0)⁻¹ B(θ_0) {A(θ_0)⁻¹}ᵀ, A(θ_0) = E[−ψ̇(O_1, θ_0)] with ψ̇(O_i, θ) = ∂ψ(O_i, θ)/∂θᵀ, and B(θ_0) = E[ψ(O_1, θ_0) ψ(O_1, θ_0)ᵀ]. The empirical sandwich estimator of the variance of θ̂ plugs θ̂ and empirical averages into V:

    Σ̂ = A⁻¹ B {A⁻¹}ᵀ, where A = ∑_{i=1}^m A_i, B = ∑_{i=1}^m B_i, A_i = −ψ̇(O_i, θ̂), and B_i = ψ(O_i, θ̂) ψ(O_i, θ̂)ᵀ.

The geex package provides an application programming interface (API) for carrying out M-estimation. The analyst provides a function, called estFUN, corresponding to ψ(O_i, θ) that maps data O_i to a function of θ. Numerical derivatives approximate ψ̇ so that evaluating Σ̂ is entirely a computational exercise. No analytic derivations are required from the analyst.
Consider estimating the population mean θ = E[Y_i] using the sample mean θ̂ = m⁻¹ ∑_{i=1}^m Y_i. The estimator θ̂ can be expressed as the solution to the estimating equation (1) with ψ(Y_i, θ) = Y_i − θ. An estFUN is a translation of ψ into an R function whose first argument is data and which returns a function whose first argument is theta; for the sample mean of Y, the returned function simply computes Y − theta. The geex package exploits R as a functional programming language: functions can return and modify other functions (Wickham, 2014, ch. 10). If an estimator fits into the above framework, then the user need only specify estFUN. No other programming is required to obtain point and variance estimates. The remaining sections provide examples of translating ψ into an estFUN.
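The estFUN for the sample mean did not survive extraction; a minimal sketch (the function and column names here are illustrative) is:

```r
# Estimating function for the population mean: psi(Y_i, theta) = Y_i - theta.
# The outer function maps unit-level data to a function of theta.
mean_estfun <- function(data){
  Y <- data$Y
  function(theta){
    Y - theta
  }
}
```

Solving ∑_i (Y_i − θ) = 0 recovers the sample mean; geex differentiates this ψ numerically to form the sandwich variance estimator.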

Calculus of M-estimation Examples
The geex package can be installed from CRAN with install.packages("geex"). The data used in the examples below were simulated for m = 100 units (e.g., Y_2 ∼ N(2, 1)) and are included in the geexex dataset. The last example in this section introduces a GEE application, which is elaborated on in Section 5 to demonstrate finite sample corrections.

Example 1: Sample moments
The first example estimates the population mean (θ_1) and variance (θ_2) of Y_1. Figure 1 shows the estimating equations and corresponding estFUN code. The solution to the estimating equations in Figure 1 is θ̂_1 = Ȳ_1 and θ̂_2 = m⁻¹ ∑_{i=1}^m (Y_{1i} − Ȳ_1)². The primary geex function is m_estimate, which requires two inputs: estFUN (the ψ function) and data (the data frame containing O_i for i = 1, ..., m). The package defaults to rootSolve::multiroot (Soetaert and Herman, 2009; Soetaert, 2009) for estimating the roots of (1), though the solver algorithm can be specified in the root_control argument.
Starting values for rootSolve::multiroot are also passed via the root_control argument. The m_estimate function returns an object of the S4 class geex, which contains an estimates slot and a vcov slot for θ̂ and Σ̂, respectively. These slots can be accessed by the functions coef (or roots) and vcov. The point estimates obtained for θ_1 and θ_2 are analogous to the base R functions mean and var (after multiplying the latter by (m − 1)/m). SB gave closed forms for A(θ_0) (an identity matrix) and B(θ_0) (not shown here) and suggest plugging in sample moments to compute B(θ̂). The point and variance estimates using both geex and the analytic solutions are shown in Table 1. The maximum absolute difference between either the point or variance estimates is 4e-11, demonstrating excellent agreement between the numerical results obtained from geex and the closed form solutions for this set of estimating equations and data.
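The estFUN and m_estimate call for this example look roughly as follows (a sketch; the identifier SB1_estfun is illustrative, following the naming on the package website):

```r
library(geex)

# psi(Y_1i, theta) = (Y_1i - theta_1, (Y_1i - theta_1)^2 - theta_2)
SB1_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2])
  }
}

results <- m_estimate(
  estFUN = SB1_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1)))

coef(results)  # point estimates (theta_1, theta_2)
vcov(results)  # empirical sandwich variance estimate
```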

Example 2: Ratio estimator
This example calculates a ratio estimator (Figure 2) and illustrates the delta method via M-estimation. The estimating equations target the means of Y_1 and Y_2, labelled θ_1 and θ_2, as well as the estimand θ_3 = θ_1/θ_2. The solution to (1) for this ψ function yields θ̂_3 = Ȳ_1/Ȳ_2, where Ȳ_j denotes the sample mean of Y_{j1}, ..., Y_{jm} for j = 1, 2.
SB gave closed form expressions for A(θ_0) and B(θ_0), into which we plug appropriate estimates for the matrix components and compare to the results from geex. The point estimates again show excellent agreement (maximum absolute difference 4.4e-16), while the covariance estimates differ in the third decimal place (maximum absolute difference 1e-03).
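The ψ function of Figure 2 can be coded along these lines (a sketch with an illustrative name; note the third component θ_1 − θ_3 θ_2 has root θ_3 = θ_1/θ_2):

```r
library(geex)

# Ratio estimator: targets theta_1 = E[Y1], theta_2 = E[Y2], and
# theta_3 = theta_1 / theta_2 via the equation theta_1 - theta_3 * theta_2 = 0.
SB2_estfun <- function(data){
  Y1 <- data$Y1
  Y2 <- data$Y2
  function(theta){
    c(Y1 - theta[1],
      Y2 - theta[2],
      theta[1] - theta[3] * theta[2])
  }
}

results <- m_estimate(
  estFUN = SB2_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1, 1)))
```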

Example 3: Delta method continued
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean (θ 1 ) and variance (θ 2 ) of Y 1 , but also the standard deviation (θ 3 ) and the log of the variance (θ 4 ) of Y 1 .
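A sketch of the corresponding estFUN (illustrative name), which stacks the delta-method equations for √θ_2 and log θ_2 onto the equations of Example 1:

```r
library(geex)

# theta_3 = sqrt(theta_2) and theta_4 = log(theta_2) appear as
# additional components of psi stacked under Example 1's equations.
SB3_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2],
      sqrt(theta[2]) - theta[3],
      log(theta[2])  - theta[4])
  }
}

results <- m_estimate(
  estFUN = SB3_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1, 1, 0)))
```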

Example 4: Generalized Estimating Equations
In their seminal paper, Liang and Zeger (1986) introduced generalized estimating equations (GEE) for the analysis of longitudinal or clustered data. Let m denote the number of independent clusters. For cluster i, let n_i be the cluster size, Y_i be the n_i × 1 outcome vector, and X_i be the n_i × p matrix of covariates. Let µ(X_i; θ) = E[Y_i | X_i; θ] and assume µ(X_i; θ) = g⁻¹(X_i θ), where g is some user-specified link function. The generalized estimating equations are:

    ∑_{i=1}^m D_iᵀ V_i⁻¹ {Y_i − µ(X_i; θ)} = 0,   (3)

where D_i = ∂µ(X_i; θ)/∂θ, V_i = φ W_i^{1/2} R(α) W_i^{1/2} is the working covariance matrix, the matrix R(α) is the "working" correlation matrix, and φ is a scale parameter. The example below uses an exchangeable correlation structure with off-diagonal elements α. The matrix W_i is a diagonal matrix whose elements contain the variance function evaluated at µ(X_i; θ). The equations in (3) can be translated into an estFUN as:

gee_estfun <- function(data, formula, family){
  X <- model.matrix(object = formula, data = data)
  Y <- model.response(model.frame(formula = formula, data = data))
  n <- nrow(X)
  function(theta, alpha, psi){
    mu <- drop(family$linkinv(X %*% theta))
    Dt <- t(X) %*% diag(mu, nrow = n)
    W  <- diag(family$variance(mu), nrow = n)
    R  <- matrix(alpha, nrow = n, ncol = n)
    diag(R) <- 1
    V  <- psi * (sqrt(W) %*% R %*% sqrt(W))
    Dt %*% solve(V) %*% (Y - mu)
  }
}

This estFUN treats the correlation parameter α and the scale parameter φ (psi in the code) as fixed, though some estimation algorithms use an iterative procedure that alternates between estimating θ_0 and these parameters. By customizing the root finding function, such an algorithm can be implemented using geex [see vignette("v03_root_solvers") for more information].
We use this example to compare covariance estimates from geex with those obtained from the gee function (Carey, 2015), and so do not estimate roots using geex. To compute only the sandwich variance estimator, set compute_roots = FALSE and pass estimates of θ_0 via the roots argument. For this example, the estimated roots of (3), i.e. θ̂, and estimates for α and φ are extracted from the object returned by gee. This example shows that an estFUN can accept additional arguments to be passed to either the outer (data) function or the inner (theta) function. Unlike previous examples, the independent units are clusters (types of wool), which is specified in m_estimate by the units argument. By default, m equals the number of rows in the data frame.
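A sketch of the comparison described above, assuming the clustered data are the warpbreaks data (consistent with the wool-type clusters mentioned in the text) and that the gee fit exposes its working correlation and scale in the working.correlation and scale components:

```r
library(geex)
library(gee)

# Fit GEE with the gee package to obtain theta-hat, alpha-hat, and the scale.
gfit <- gee(breaks ~ tension, id = wool, data = warpbreaks,
            family = gaussian(), corstr = "exchangeable")

results <- m_estimate(
  estFUN = gee_estfun,            # estFUN defined in the text above
  data   = warpbreaks,
  units  = "wool",                # independent units are wool-type clusters
  roots  = coef(gfit),            # theta-hat from gee
  compute_roots = FALSE,          # variance estimation only
  outer_args = list(formula = breaks ~ tension, family = gaussian()),
  inner_args = list(alpha = gfit$working.correlation[1, 2],
                    psi   = gfit$scale))

vcov(results)  # compare to the robust variance reported by gee
```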

Comparison to existing software
The above examples demonstrate the basic utility of the geex package and the power of R's functional programming capability. To our knowledge, geex is the first R package to create an extensible API for any estimator that is the solution to estimating equations of the form (1). Existing R packages such as gee (Carey, 2015) and geepack (Halekoh et al., 2006) implement estimating equation methods only for particular models such as GEE. For computing a sandwich variance estimator, geex is similar to the popular sandwich package (Zeileis, 2004, 2006), which computes the empirical sandwich variance estimator from modelling methods such as lm, glm, gam, survreg, and others. For comparison to the exposition herein, the infrastructure of sandwich is explained in Zeileis (2006).

To compare sandwich and geex, consider estimating Σ̂ for the θ parameters in the following simple linear model of the geexex data:

    Y_4 = θ_1 + θ_2 X_1 + θ_3 X_2 + ε,  where ε ∼ N(0, 1).

The estimating equation for θ in this model is the least-squares score equation ∑_i x_iᵀ(Y_{4i} − x_i θ), which translates into an estFUN (here called lm_estfun) in the same way as the earlier examples. Then θ̂ and Σ̂ can be computed in geex:

results <- m_estimate(
  estFUN = lm_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(0, 0, 0)))

or from the lm and sandwich functions:

fm <- lm(Y4 ~ X1 + X2, data = geexex)
sand_vcov <- sandwich::sandwich(fm)

The results are virtually identical (maximum absolute difference 1.4e-12). The lm/sandwich option is computationally faster, but geex can be made faster by, for example, changing the options of the derivative function via deriv_control or computing Σ̂ using the parameter estimates from lm. The important difference is that geex lays bare the estimating function used, which is both a powerful didactic tool and a programming advantage when developing custom estimating functions.
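The definition of lm_estfun did not survive extraction; a sketch consistent with the lm(Y4 ~ X1 + X2) call (column names from geexex) is:

```r
library(geex)

# Least-squares estimating function: psi(O_i, theta) = x_i' (Y4_i - x_i theta),
# with x_i the design row (intercept, X1, X2) for unit i.
lm_estfun <- function(data){
  X <- cbind(1, data$X1, data$X2)  # design matrix with intercept
  Y <- data$Y4
  function(theta){
    t(X) %*% (Y - X %*% theta)
  }
}
```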

Finite sample corrections
The empirical sandwich variance estimator is known to underestimate V(θ_0) in small samples (Fay and Graubard, 2001). Particularly in the context of GEE, many authors (e.g., see Paul and Zhang, 2014; Li and Redden, 2015, and references therein) have proposed corrections that modify components of Σ̂ and/or that assume θ̂ follows a t (or F), as opposed to Normal, distribution with some estimated degrees of freedom. Many of the proposed corrections modify some combination of the A_i, A, B_i, or B matrices.
Users may specify functions that utilize these matrices to form corrections within geex.
A finite sample correction function requires at least the argument components, an S4 object with slots for the A (= ∑_i A_i) matrix, A_i (a list of all m A_i matrices), the B (= ∑_i B_i) matrix, and B_i (a list of all m B_i matrices). Additional arguments may also be specified, as shown in the example. The geex package includes the bias correction and degrees of freedom corrections proposed by Fay and Graubard (2001) in the fay_bias_correction and fay_df_correction functions, respectively. The following demonstrates the construction and use of the bias correction. Fay and Graubard (2001) proposed the modified variance estimator

    Σ̂_bc(b) = A⁻¹ { ∑_i H_i(b) B_i H_i(b)ᵀ } {A⁻¹}ᵀ,

where H_i(b) is a diagonal matrix with jth diagonal element {1 − min(b, {A_i A⁻¹}_jj)}^{−1/2}, and W_jj denotes the jjth element of a matrix W. When {A_i A⁻¹}_jj is close to 1, the adjustment to Σ̂_bc(b) may be extreme, so the constant b is chosen by the analyst to limit over-adjustment. The bias corrected estimator Σ̂_bc(b) can be implemented in geex as a function of components and b. In the geex output, the slot corrections contains a list of the results of computing each item in the corrections argument, which can be accessed with the get_corrections function. The corrections of Fay and Graubard (2001) are also implemented in the saws package (Fay and Graubard, 2001). Comparing the geex results to the results of the saws::geeUOmega function, the maximum absolute difference for any of the corrected estimated covariance matrices is 1.8e-09.
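A sketch of such a correction function, assuming geex accessors along the lines of grab_bread, grab_bread_list, and grab_meat_list for the components slots and a correction() helper for registering it (check these names against your installed version of geex):

```r
library(geex)

# Fay-Graubard (2001) bias-corrected sandwich estimator as a geex
# correction function: components holds A, the list of A_i, and the
# list of B_i; b bounds the per-element adjustment.
bias_correction <- function(components, b){
  A    <- grab_bread(components)       # A = sum_i A_i
  A_i  <- grab_bread_list(components)  # list of A_i
  B_i  <- grab_meat_list(components)   # list of B_i
  Ainv <- solve(A)

  Bbc <- Reduce(`+`, Map(function(Ai, Bi){
    # H_i(b): diagonal elements {1 - min(b, {A_i A^-1}_jj)}^(-1/2)
    H <- diag((1 - pmin(b, diag(Ai %*% Ainv)))^(-0.5), nrow = nrow(Ai))
    H %*% Bi %*% H
  }, A_i, B_i))

  Ainv %*% Bbc %*% t(Ainv)
}

# Registered via the corrections argument of m_estimate, e.g.:
# m_estimate(..., corrections =
#   list(bias_correction_.75 = correction(bias_correction, b = .75)))
```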

Summary
This paper demonstrates how M-estimators and finite sample corrections can be transparently implemented in geex. The package website (https://bsaul.github.io/geex/) showcases many examples of M-estimation including instrumental variables, sample quantiles, robust regression, generalized linear models, and more. A valuable feature of M-estimators is that estimating functions corresponding to parameters from multiple models may be combined, or "stacked," in a single set of estimating functions. The geex package makes it easy to stack estimating functions for the target parameters with estimating functions from each of the component models, as shown in the package vignette v06_causal_example. Indeed, the software was motivated by causal inference problems (Saul et al., 2017) where target causal parameters are functions of parameters in multiple models.
The theory of M-estimation is broadly applicable, yet existing R packages only implement estimating equation methods for particular models; geex provides a general and extensible alternative.