In matchit(), setting method = "optimal" performs optimal pair
matching. The matching is optimal in the sense that the sum of the absolute
pairwise distances in the matched sample is as small as possible. The method
functionally relies on optmatch::fullmatch().
Advantages of optimal pair matching include that the matching order is not required to be specified and it is less likely that extreme within-pair distances will be large, unlike with nearest neighbor matching. Generally, however, as a subset selection method, optimal pair matching tends to perform similarly to nearest neighbor matching in that similar subsets of units will be selected to be matched.
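The difference between greedy (nearest neighbor) and optimal pairing can be sketched with a toy example. The propensity score values below are made up for illustration, and the brute-force search merely stands in for the network-flow solver that optmatch actually uses; it is not how matchit() computes the match.

```r
# Hypothetical propensity scores for 2 treated and 3 control units
ps_treated <- c(t1 = 0.40, t2 = 0.50)
ps_control <- c(c1 = 0.45, c2 = 0.05, c3 = 0.99)

# Absolute pairwise distances: rows are treated units, columns are controls
d <- abs(outer(ps_treated, ps_control, "-"))

# Greedy 1:1 matching in data order: each treated unit takes the closest
# control that is still available
available <- rep(TRUE, ncol(d))
greedy_total <- 0
for (i in seq_len(nrow(d))) {
  j <- which(available)[which.min(d[i, available])]
  greedy_total <- greedy_total + d[i, j]
  available[j] <- FALSE
}

# Optimal 1:1 matching by brute force: try every assignment of distinct
# controls to the two treated units and keep the smallest total distance
pairs <- expand.grid(j1 = seq_len(ncol(d)), j2 = seq_len(ncol(d)))
pairs <- pairs[pairs$j1 != pairs$j2, ]
optimal_total <- min(d[1, pairs$j1] + d[2, pairs$j2])

greedy_total   # 0.5: t1 takes c1, leaving t2 with the distant c2
optimal_total  # 0.4: t1 takes c2 so that t2 can take c1
```

Here the greedy result depends on which treated unit is matched first, whereas the optimal total does not, which is why no matching order needs to be specified.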
This page details the allowable arguments with
method = "optimal". See
matchit() for an explanation of what each argument means in a general
context and how it can be specified.
Below is how
matchit() is used for optimal pair matching:
matchit(formula, data = NULL, method = "optimal", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", exact = NULL, mahvars = NULL, antiexact = NULL, discard = "none", reestimate = FALSE, s.weights = NULL, ratio = 1, min.controls = NULL, max.controls = NULL, verbose = FALSE, ...)
formula
a two-sided formula object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure.
data
a data frame containing the variables named in formula. If not found in data, the variables will be sought in the environment.
method
set here to "optimal".
distance
the distance measure to be used. See distance for allowable options. Can be supplied as a distance matrix.
link
when distance is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. See distance for allowable options with each option.
distance.options
a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to distance.
estimand
a string containing the desired estimand. Allowable options include "ATT" and "ATC". See Details.
exact
for which variables exact matching should take place.
mahvars
for which variables Mahalanobis distance matching should take place when distance corresponds to a propensity score (e.g., for caliper matching or to discard units for common support). If specified, the distance measure will not be used in matching.
antiexact
for which variables anti-exact matching should take place. Anti-exact matching is processed using optmatch::antiExactMatch().
discard
a string containing a method for discarding units outside a region of common support. Only allowed when distance is not "mahalanobis" and not a matrix.
reestimate
if discard is not "none", whether to re-estimate the propensity score in the remaining sample prior to matching.
s.weights
the variable containing sampling weights to be incorporated into propensity score models and balance statistics.
ratio
how many control units should be matched to each treated unit for k:1 matching. For variable ratio matching, see section "Variable Ratio Matching" in Details below.
min.controls, max.controls
for variable ratio matching, the minimum and maximum number of control units to be matched to each treated unit. See section "Variable Ratio Matching" in Details below.
verbose
logical; whether information about the matching process should be printed to the console. What is printed depends on the matching method. Default is FALSE for no printing other than warnings.
...
additional arguments passed to optmatch::fullmatch(). Allowed arguments include tol and solver. See the optmatch::fullmatch() documentation for details. In general, tol should be set to a low number (e.g., 1e-7) to get a more precise solution.
Arguments that do not apply to optimal matching, including m.order, are ignored with a warning.
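As a minimal sketch of how an extra argument travels through ... to the solver, the call below sets tol to the value suggested above. It assumes the MatchIt and optmatch packages are installed; the covariate set and the object name m.tol are chosen only for illustration.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# tol is not a formal argument of matchit(); it is forwarded through ...
# to optmatch::fullmatch(), where a smaller value requests a more
# precise solution
m.tol <- matchit(treat ~ age + educ + married + re74 + re75,
                 data = lalonde, method = "optimal", tol = 1e-7)
m.tol
```

A lower tolerance may lengthen the run time on large problems, so it is worth comparing balance with and without it before settling on a value.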
Mahalanobis distance matching can be done one of two ways:
If no propensity score needs to be estimated,
distance should be
"mahalanobis", and Mahalanobis distance matching will occur
using all the variables in
formula. Arguments to
mahvars will be ignored. For example, to perform simple Mahalanobis
distance matching, the following could be run:
matchit(treat ~ X1 + X2, method = "optimal", distance = "mahalanobis")
With this code, the Mahalanobis distance is computed using X1 and X2, and matching occurs on this distance. The
distance component of the
matchit() output will be empty.
If a propensity score needs to be estimated for common support with
discard, distance should be whatever method is used to
estimate the propensity score or a vector of distance measures, i.e., it
should not be "mahalanobis". Use
mahvars to specify the
variables used to create the Mahalanobis distance. For example, to perform
Mahalanobis distance matching after discarding units outside the common support of the
propensity score in both groups, the following could be run:
matchit(treat ~ X1 + X2 + X3, method = "optimal", distance = "glm", discard = "both", mahvars = ~ X1 + X2)
With this code, X1, X2, and X3 are used to estimate the propensity score (using the
"glm" method, which by default is logistic regression), which is used to identify the common support. The actual matching occurs on the Mahalanobis distance computed only using
X1 and X2, which are supplied to
mahvars. The estimated propensity scores will be included in the
distance component of the matchit() output.
The estimand argument controls whether control units are selected to be matched with treated units
(estimand = "ATT") or treated units are selected to be matched with
control units (
estimand = "ATC"). The "focal" group (e.g., the
treated units for the ATT) is typically made to be the smaller treatment
group, and a warning will be thrown if it is not set that way unless
replace = TRUE. Setting
estimand = "ATC" is equivalent to
swapping all treated and control labels for the treatment variable. When
estimand = "ATC", the
match.matrix component of the output
will have the names of the control units as the rownames and be filled with
the names of the matched treated units (opposite to when
estimand = "ATT"). Note that the argument supplied to
estimand doesn't necessarily correspond to the estimand actually targeted; it is merely a
switch to trigger which treatment group is considered "focal".
matchit() can perform variable
ratio matching, which involves matching a different number of control units
to each treated unit. When
ratio > 1, rather than requiring all
treated units to receive
ratio matches, the arguments to
max.controls and min.controls can be specified to control the
maximum and minimum number of matches each treated unit can have.
ratio controls how many total control units will be matched:
n1 * ratio control units will be matched, where
n1 is the number of
treated units, yielding the same total number of matched controls as fixed
ratio matching does.
Variable ratio matching can be used with any distance specification.
ratio does not have to be an integer but must be greater than 1 and
less than n0/n1, where n0 and n1 are the number of
control and treated units, respectively. Setting
ratio = n0/n1
performs a restricted form of full matching where all control units are
matched. If min.controls is not specified, it is set to 1 by default.
min.controls must be less than
ratio, and max.controls
must be greater than
ratio. See the Examples section of
method_nearest() for an example of their use, which is the same
as it is with optimal matching.
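A minimal sketch of variable ratio matching with method = "optimal" follows, assuming the MatchIt and optmatch packages are installed. The covariates and the particular values of ratio, min.controls, and max.controls are illustrative choices that satisfy the constraints above for the lalonde data (n0/n1 = 429/185, so ratio must be below about 2.32).

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Variable ratio matching: each treated unit receives between 1 and 3
# controls, and ratio = 2 sets the average number of controls per
# treated unit
m.var <- matchit(treat ~ age + educ + married + re74 + re75,
                 data = lalonde, method = "optimal",
                 ratio = 2, min.controls = 1, max.controls = 3)

# Number of controls in each matched set (unmatched units have NA
# subclass and are dropped by table())
controls_per_set <- table(m.var$subclass[lalonde$treat == 0])

# Distribution of matched-set sizes across treated units
table(controls_per_set)
```

Tabulating the set sizes is a quick way to confirm that the bounds were respected before assessing balance with summary().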
Optimal pair matching is a restricted form of optimal full matching
where the number of treated units in each subclass is equal to 1, whereas in
unrestricted full matching, multiple treated units can be assigned to the
same subclass. optmatch::pairmatch() is simply a wrapper for
optmatch::fullmatch(), which performs optimal full matching and is the
function used by method_full. In the same way,
matchit() uses optmatch::fullmatch() under the hood, imposing the restrictions that
make optimal full matching function like optimal pair matching (which is
simply to set
min.controls >= 1 and to pass
ratio to the
mean.controls argument). This distinction is not important for
regular use but may be of interest to those examining the source code.
The option "optmatch_max_problem_size" is automatically set to
Inf during the matching process, different from its default in
optmatch. This enables matching problems of any size to be run, but
may also let huge, infeasible problems get through and potentially take a
long time or crash R. See
optmatch::setMaxProblemSize() for more details.
All outputs described in
matchit() are returned with
method = "optimal". When
include.obj = TRUE in the call to
matchit(), the output of the call to
optmatch::fullmatch() will be
included in the output. When
exact is specified, this will be a list
of such objects, one for each stratum of the exact variables.
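As a hedged illustration of retrieving the underlying optmatch result, the sketch below requests it with include.obj = TRUE and reads it from the obj component of the returned object. It assumes the MatchIt and optmatch packages are installed; the covariates and the name m.obj are arbitrary.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Keep the object returned by optmatch::fullmatch() alongside the usual
# matchit() output
m.obj <- matchit(treat ~ age + educ + married + re74 + re75,
                 data = lalonde, method = "optimal",
                 include.obj = TRUE)

# With no exact argument this is a single optmatch result; with exact
# specified it would be a list with one element per stratum
class(m.obj$obj)
```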
In a manuscript, be sure to cite the following paper if using
method = "optimal":
Hansen, B. B., & Klopfer, S. O. (2006). Optimal Full Matching and Related Designs via Network Flows. Journal of Computational and Graphical Statistics, 15(3), 609–627. doi:10.1198/106186006X137047
For example, a sentence might read:
Optimal pair matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the optmatch package (Hansen & Klopfer, 2006).
data("lalonde")

#1:1 optimal PS matching with exact matching on race
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "optimal", exact = ~race)
#> Warning: Fewer control units than treated units in some `exact` strata; not all treated units will get a match.
m.out1
#> A matchit object
#>  - method: 1:1 optimal pair matching
#>  - distance: Propensity score
#>              - estimated with logistic regression
#>  - number of obs.: 614 (original), 232 (matched)
#>  - target estimand: ATT
#>  - covariates: age, educ, race, nodegree, married, re74, re75
summary(m.out1)
#>
#> Call:
#> matchit(formula = treat ~ age + educ + race + nodegree + married +
#>     re74 + re75, data = lalonde, method = "optimal", exact = ~race)
#>
#> Summary of Balance for All Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> distance          0.5774        0.1822          1.7941     0.9211    0.3774
#> age              25.8162       28.0303         -0.3094     0.4400    0.0813
#> educ             10.3459       10.2354          0.0550     0.4959    0.0347
#> raceblack         0.8432        0.2028          1.7615          .    0.6404
#> racehispan        0.0595        0.1422         -0.3498          .    0.0827
#> racewhite         0.0973        0.6550         -1.8819          .    0.5577
#> nodegree          0.7081        0.5967          0.2450          .    0.1114
#> married           0.1892        0.5128         -0.8263          .    0.3236
#> re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
#> re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
#>            eCDF Max
#> distance     0.6444
#> age          0.1577
#> educ         0.1114
#> raceblack    0.6404
#> racehispan   0.0827
#> racewhite    0.5577
#> nodegree     0.1114
#> married      0.3236
#> re74         0.4470
#> re75         0.2876
#>
#> Summary of Balance for Matched Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> distance          0.4972        0.4901          0.0324     1.0026    0.0054
#> age              25.6810       26.1379         -0.0639     0.5003    0.0688
#> educ             10.1207       10.3793         -0.1286     0.6544    0.0254
#> raceblack         0.7500        0.7500          0.0000          .    0.0000
#> racehispan        0.0948        0.0948          0.0000          .    0.0000
#> racewhite         0.1552        0.1552          0.0000          .    0.0000
#> nodegree          0.6724        0.6207          0.1138          .    0.0517
#> married           0.2414        0.2759         -0.0880          .    0.0345
#> re74           2782.5274     3090.7096         -0.0631     1.3821    0.0598
#> re75           1743.7041     1952.2387         -0.0648     1.5705    0.0647
#>            eCDF Max Std. Pair Dist.
#> distance     0.0776          0.0434
#> age          0.1810          1.2422
#> educ         0.0690          1.2863
#> raceblack    0.0000          0.0000
#> racehispan   0.0000          0.0000
#> racewhite    0.0000          0.0000
#> nodegree     0.0517          1.0239
#> married      0.0345          0.5283
#> re74         0.2845          0.6796
#> re75         0.1552          0.8397
#>
#> Sample Sizes:
#>           Control Treated
#> All           429     185
#> Matched       116     116
#> Unmatched     313      69
#> Discarded       0       0
#>

#2:1 optimal matching on the scaled Euclidean distance
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "optimal", ratio = 2,
                  distance = "scaled_euclidean")
m.out2
#> A matchit object
#>  - method: 2:1 optimal pair matching
#>  - distance: Scaled Euclidean
#>  - number of obs.: 614 (original), 555 (matched)
#>  - target estimand: ATT
#>  - covariates: age, educ, race, nodegree, married, re74, re75
summary(m.out2, un = FALSE)
#>
#> Call:
#> matchit(formula = treat ~ age + educ + race + nodegree + married +
#>     re74 + re75, data = lalonde, method = "optimal", distance = "scaled_euclidean",
#>     ratio = 2)
#>
#> Summary of Balance for Matched Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> age              25.8162       26.4757         -0.0922     0.4947    0.0658
#> educ             10.3459       10.2730          0.0363     0.5810    0.0286
#> raceblack         0.8432        0.2351          1.6726          .    0.6081
#> racehispan        0.0595        0.1216         -0.2629          .    0.0622
#> racewhite         0.0973        0.6432         -1.8422          .    0.5459
#> nodegree          0.7081        0.6243          0.1843          .    0.0838
#> married           0.1892        0.4432         -0.6487          .    0.2541
#> re74           2095.5737     4120.1633         -0.4143     0.7972    0.1577
#> re75           1532.0553     2107.1613         -0.1786     1.0608    0.0935
#>            eCDF Max Std. Pair Dist.
#> age          0.1405          0.5477
#> educ         0.0865          0.3750
#> raceblack    0.6081          1.6726
#> racehispan   0.0622          0.2629
#> racewhite    0.5459          1.8422
#> nodegree     0.0838          0.1843
#> married      0.2541          0.6487
#> re74         0.4081          0.4806
#> re75         0.2595          0.3198
#>
#> Sample Sizes:
#>           Control Treated
#> All           429     185
#> Matched       370     185
#> Unmatched      59       0
#> Discarded       0       0
#>