In `matchit()`, setting `method = "genetic"` performs genetic matching.
Genetic matching is a form of nearest neighbor matching where distances are
computed as the generalized Mahalanobis distance, which is a generalization
of the Mahalanobis distance with a scaling factor for each covariate that
represents the importance of that covariate to the distance. A genetic
algorithm is used to select the scaling factors. The scaling factors are
chosen to maximize a criterion related to covariate balance; the criterion
can be chosen by the user, and by default it is the smallest p-value in
covariate balance tests among the covariates. This method relies on and is a
wrapper for `Matching::GenMatch()` and `Matching::Match()`, which use
`rgenoud::genoud()` to perform the optimization using the genetic
algorithm.
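To make the role of the scaling factors concrete, the following base R sketch computes a generalized Mahalanobis distance between two units. It assumes the definition given in Diamond & Sekhon (2013), in which the covariate difference is transformed by the inverse Cholesky factor of the covariance matrix and then weighted by a diagonal matrix of scaling factors. The function name `gen_mahalanobis()` and the simulated covariates are hypothetical and are not part of *MatchIt* or *Matching*; the internal computations in `Matching::GenMatch()` may differ in detail.

```r
# Hypothetical helper: generalized Mahalanobis distance between two units.
# xi, xj: numeric covariate vectors; S: covariance matrix of the covariates;
# w: non-negative scaling factors (the diagonal of the weight matrix W).
gen_mahalanobis <- function(xi, xj, S, w) {
  L_inv <- solve(t(chol(S)))      # inverse of the lower Cholesky factor of S
  z <- drop(L_inv %*% (xi - xj))  # transformed covariate difference
  sqrt(sum(w * z^2))              # sqrt(z' W z) with W = diag(w)
}

# Simulated covariates, for illustration only
set.seed(1)
X <- cbind(age    = rnorm(50, 30, 8),
           educ   = rnorm(50, 11, 2),
           income = rexp(50, 1 / 4000))
S <- cov(X)

# With all scaling factors equal to 1, the result matches the ordinary
# Mahalanobis distance (mahalanobis() returns the squared distance)
d_equal <- gen_mahalanobis(X[1, ], X[2, ], S, w = c(1, 1, 1))
d_maha  <- sqrt(mahalanobis(X[1, ], X[2, ], S))
all.equal(d_equal, d_maha)

# Increasing one scaling factor gives that dimension more influence
d_up <- gen_mahalanobis(X[1, ], X[2, ], S, w = c(10, 1, 1))
```

In genetic matching, the vector `w` is what the genetic algorithm searches over; the sketch only evaluates the distance for a fixed choice of `w`.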

This page details the allowable arguments with `method = "genetic"`.
See `matchit()` for an explanation of what each argument means in a general
context and how it can be specified.

Below is how `matchit()` is used for genetic matching:

```
matchit(formula,
        data = NULL,
        method = "genetic",
        distance = "glm",
        link = "logit",
        distance.options = list(),
        estimand = "ATT",
        exact = NULL,
        mahvars = NULL,
        antiexact = NULL,
        discard = "none",
        reestimate = FALSE,
        s.weights = NULL,
        replace = FALSE,
        m.order = NULL,
        caliper = NULL,
        ratio = 1,
        verbose = FALSE,
        ...)
```

- formula
a two-sided formula object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure and is used to determine the covariates whose balance is to be optimized.

- data
a data frame containing the variables named in `formula`. If not found in `data`, the variables will be sought in the environment.

- method
set here to `"genetic"`.

- distance
the distance measure to be used. See `distance` for allowable options. When set to a method of estimating propensity scores or a numeric vector of distance values, the distance measure is included with the covariates in `formula` to be supplied to the generalized Mahalanobis distance matrix unless `mahvars` is specified. Otherwise, only the covariates in `formula` are supplied to the generalized Mahalanobis distance matrix to have their scaling factors chosen. `distance` *cannot* be supplied as a distance matrix. Supplying any method of computing a distance matrix (e.g., `"mahalanobis"`) has the same effect of omitting the propensity score but does not affect how the distance between units is computed otherwise.

- link
when `distance` is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. See `distance` for allowable options with each option.

- distance.options
a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to `distance`.

- estimand
a string containing the desired estimand. Allowable options include `"ATT"` and `"ATC"`. See Details.

- exact
for which variables exact matching should take place.

- mahvars
when a distance corresponds to a propensity score (e.g., for caliper matching or to discard units for common support), which covariates should be supplied to the generalized Mahalanobis distance matrix for matching. If unspecified, all variables in `formula` will be supplied to the distance matrix. Use `mahvars` to only supply a subset. Even if `mahvars` is specified, balance will be optimized on all covariates in `formula`. See Details.

- antiexact
for which variables anti-exact matching should take place. Anti-exact matching is processed using the `restrict` argument to `Matching::GenMatch()` and `Matching::Match()`.

- discard
a string containing a method for discarding units outside a region of common support. Only allowed when `distance` corresponds to a propensity score.

- reestimate
if `discard` is not `"none"`, whether to re-estimate the propensity score in the remaining sample prior to matching.

- s.weights
the variable containing sampling weights to be incorporated into propensity score models and balance statistics. These are also supplied to `GenMatch()` for use in computing the balance t-test p-values in the process of matching.

- replace
whether matching should be done with replacement.

- m.order
the order in which the matching takes place. The default is `"largest"` when `distance` corresponds to a propensity score and `"data"` otherwise. See `matchit()` for allowable options.

- caliper
the width(s) of the caliper(s) used for caliper matching. See Details and Examples.

- std.caliper
`logical`; when calipers are specified, whether they are in standard deviation units (`TRUE`) or raw units (`FALSE`).

- ratio
how many control units should be matched to each treated unit for k:1 matching. Should be a single integer value.

- verbose
`logical`; whether information about the matching process should be printed to the console. When `TRUE`, output from `GenMatch()` with `print.level = 2` will be displayed. Default is `FALSE` for no printing other than warnings.

- ...
additional arguments passed to `Matching::GenMatch()`. Potentially useful options include `pop.size`, `max.generations`, and `fit.func`. If `pop.size` is not specified, a warning from *Matching* will be thrown reminding you to change it. Note that the `ties` and `CommonSupport` arguments are set to `FALSE` and cannot be changed. If `distance.tolerance` is not specified, it is set to 0, whereas the default in *Matching* is 1e-5.
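
As a rough illustration of passing extra arguments through `...`, the call below forwards `pop.size`, `max.generations`, and `fit.func` to `Matching::GenMatch()`. It assumes the *MatchIt*, *Matching*, and *rgenoud* packages are installed. The specific values and the reduced formula are arbitrary choices for illustration rather than recommendations, and `"qqmean.max"` is assumed to be one of the `fit.func` options listed in the `Matching::GenMatch()` documentation.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Extra arguments are passed on to Matching::GenMatch() through `...`
m.extra <- matchit(treat ~ age + educ + re74 + re75,
                   data = lalonde,
                   method = "genetic",
                   pop.size = 50,            # population size of the genetic algorithm
                   max.generations = 20,     # cap on the number of generations
                   fit.func = "qqmean.max")  # balance criterion (assumed option name)
m.extra
```

Larger values of `pop.size` generally give the optimizer more candidates to evaluate at the cost of longer run time, so small values like the one above are only suitable for quick checks.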

In genetic matching, covariates play three roles: 1) as the variables on
which balance is optimized, 2) as the variables in the generalized
Mahalanobis distance between units, and 3) in estimating the propensity
score. Variables supplied to `formula` are always used for role (1), as
the variables on which balance is optimized. When `distance`
corresponds to a propensity score, the covariates are also used to estimate
the propensity score (unless it is supplied). When `mahvars` is
specified, the named variables will form the covariates that go into the
distance matrix. Otherwise, the variables in `formula` along with the
propensity score will go into the distance matrix. This leads to three ways
to use `distance` and `mahvars` to perform the matching:

- When `distance` corresponds to a propensity score and `mahvars` *is not* specified, the covariates in `formula` along with the propensity score are used to form the generalized Mahalanobis distance matrix. This is the default and most typical use of `method = "genetic"` in `matchit()`.

- When `distance` corresponds to a propensity score and `mahvars` *is* specified, the covariates in `mahvars` are used to form the generalized Mahalanobis distance matrix. The covariates in `formula` are used to estimate the propensity score and have their balance optimized by the genetic algorithm. The propensity score is not included in the generalized Mahalanobis distance matrix.

- When `distance` is a method of computing a distance matrix (e.g., `"mahalanobis"`), no propensity score is estimated, and the covariates in `formula` are used to form the generalized Mahalanobis distance matrix. Which specific method is supplied has no bearing on how the distance matrix is computed; it simply serves as a signal to omit estimation of a propensity score.

When a caliper is specified, any variables mentioned in `caliper`,
possibly including the propensity score, will be added to the matching
variables used to form the generalized Mahalanobis distance matrix. This is
because *Matching* doesn't allow for the separation of caliper
variables and matching variables in genetic matching.

The `estimand` argument controls whether control
units are selected to be matched with treated units (`estimand = "ATT"`)
or treated units are selected to be matched with control units
(`estimand = "ATC"`). The "focal" group (e.g., the treated units for
the ATT) is typically made to be the smaller treatment group, and a warning
will be thrown if it is not set that way unless `replace = TRUE`.
Setting `estimand = "ATC"` is equivalent to swapping all treated and
control labels for the treatment variable. When `estimand = "ATC"`, the
default `m.order` is `"smallest"`, and the `match.matrix`
component of the output will have the names of the control units as the
rownames and be filled with the names of the matched treated units (opposite
to when `estimand = "ATT"`). Note that the argument supplied to
`estimand` doesn't necessarily correspond to the estimand actually
targeted; it is merely a switch to trigger which treatment group is
considered "focal". Although `GenMatch()` and `Match()` support the ATE
as an estimand, `matchit()` supports only the ATT and
ATC for genetic matching.
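
The effect of `estimand = "ATC"` on the `match.matrix` component can be checked with a short sketch like the one below. It assumes the *MatchIt*, *Matching*, and *rgenoud* packages are installed and uses the `lalonde` data shipped with *MatchIt*; the reduced formula and the small `pop.size` are chosen only to keep the illustration fast. `replace = TRUE` is used because the control group is the larger group in these data.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Control units are the focal group, so treated units are matched to them
m.atc <- matchit(treat ~ age + educ + re74 + re75,
                 data = lalonde,
                 method = "genetic",
                 estimand = "ATC",
                 replace = TRUE,
                 pop.size = 10) # use a much larger pop.size in practice

# Rownames of match.matrix should be control units; entries are treated units
control_names <- rownames(lalonde)[lalonde$treat == 0]
all(rownames(m.atc$match.matrix) %in% control_names)
head(m.atc$match.matrix)
```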

All outputs described in `matchit()` are returned with
`method = "genetic"`. When `replace = TRUE`, the `subclass`
component is omitted. When `include.obj = TRUE` in the call to
`matchit()`, the output of the call to `Matching::GenMatch()` will be
included in the output.
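
A minimal sketch of retrieving the `Matching::GenMatch()` output follows. It assumes the required packages are installed and that the stored object is available as the `obj` component of the `matchit` object, as described for `include.obj` in the general `matchit()` documentation; the formula and `pop.size` are illustrative only.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

m.obj <- matchit(treat ~ age + educ + re74 + re75,
                 data = lalonde,
                 method = "genetic",
                 include.obj = TRUE,
                 pop.size = 10) # use a much larger pop.size in practice

# Inspect the top-level structure of the stored GenMatch() output
# (assumed to be in the `obj` component)
str(m.obj$obj, max.level = 1)
```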

In a manuscript, be sure to cite the following papers if using
`matchit()` with `method = "genetic"`:

Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932–945. doi:10.1162/REST_a_00318

Sekhon, J. S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R. Journal of Statistical Software, 42(1), 1–52. doi:10.18637/jss.v042.i07

For example, a sentence might read:

*Genetic matching was performed using the MatchIt package (Ho, Imai,
King, & Stuart, 2011) in R, which calls functions from the Matching package
(Diamond & Sekhon, 2013; Sekhon, 2011).*

`matchit()` for a detailed explanation of the inputs and outputs of
a call to `matchit()`.

`Matching::GenMatch()` and `Matching::Match()`, which do the work.

```
library(MatchIt) # attach the package that provides matchit() and lalonde
data("lalonde")
# 1:1 genetic matching with PS as a covariate
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "genetic",
                  pop.size = 10) #use much larger pop.size
m.out1
#> A matchit object
#>  - method: 1:1 genetic matching without replacement
#>  - distance: Propensity score
#>              - estimated with logistic regression
#>  - number of obs.: 614 (original), 370 (matched)
#>  - target estimand: ATT
#>  - covariates: age, educ, race, nodegree, married, re74, re75
summary(m.out1)
#>
#> Call:
#> matchit(formula = treat ~ age + educ + race + nodegree + married +
#> re74 + re75, data = lalonde, method = "genetic", pop.size = 10)
#>
#> Summary of Balance for All Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> distance          0.5774        0.1822          1.7941     0.9211    0.3774
#> age              25.8162       28.0303         -0.3094     0.4400    0.0813
#> educ             10.3459       10.2354          0.0550     0.4959    0.0347
#> raceblack         0.8432        0.2028          1.7615          .    0.6404
#> racehispan        0.0595        0.1422         -0.3498          .    0.0827
#> racewhite         0.0973        0.6550         -1.8819          .    0.5577
#> nodegree          0.7081        0.5967          0.2450          .    0.1114
#> married           0.1892        0.5128         -0.8263          .    0.3236
#> re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248
#> re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342
#>            eCDF Max
#> distance     0.6444
#> age          0.1577
#> educ         0.1114
#> raceblack    0.6404
#> racehispan   0.0827
#> racewhite    0.5577
#> nodegree     0.1114
#> married      0.3236
#> re74         0.4470
#> re75         0.2876
#>
#> Summary of Balance for Matched Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> distance          0.5774        0.3514          1.0259     0.6922    0.1693
#> age              25.8162       25.4216          0.0551     0.5403    0.0650
#> educ             10.3459       10.0270          0.1586     0.6164    0.0270
#> raceblack         0.8432        0.4703          1.0259          .    0.3730
#> racehispan        0.0595        0.2757         -0.9143          .    0.2162
#> racewhite         0.0973        0.2541         -0.5289          .    0.1568
#> nodegree          0.7081        0.6541          0.1189          .    0.0541
#> married           0.1892        0.2757         -0.2208          .    0.0865
#> re74           2095.5737     3518.1230         -0.2911     0.8316    0.1120
#> re75           1532.0553     1916.2742         -0.1194     1.1293    0.0664
#>            eCDF Max Std. Pair Dist.
#> distance     0.4216          1.0353
#> age          0.2108          0.7819
#> educ         0.0919          0.5619
#> raceblack    0.3730          1.0259
#> racehispan   0.2162          1.1886
#> racewhite    0.1568          0.5289
#> nodegree     0.0541          0.1189
#> married      0.0865          0.2208
#> re74         0.3243          0.4510
#> re75         0.2162          0.5542
#>
#> Sample Sizes:
#>           Control Treated
#> All           429     185
#> Matched       185     185
#> Unmatched     244       0
#> Discarded       0       0
#>
# 2:1 genetic matching with replacement without PS
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "genetic", replace = TRUE,
                  ratio = 2, distance = "mahalanobis",
                  pop.size = 10) #use much larger pop.size
m.out2
#> A matchit object
#>  - method: 2:1 genetic matching with replacement
#>  - distance: Mahalanobis
#>  - number of obs.: 614 (original), 307 (matched)
#>  - target estimand: ATT
#>  - covariates: age, educ, race, nodegree, married, re74, re75
summary(m.out2, un = FALSE)
#>
#> Call:
#> matchit(formula = treat ~ age + educ + race + nodegree + married +
#> re74 + re75, data = lalonde, method = "genetic", distance = "mahalanobis",
#> replace = TRUE, ratio = 2, pop.size = 10)
#>
#> Summary of Balance for Matched Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> age              25.8162       25.5108          0.0427     0.6553    0.0434
#> educ             10.3459       10.3730         -0.0134     0.9363    0.0085
#> raceblack         0.8432        0.8378          0.0149          .    0.0054
#> racehispan        0.0595        0.0595          0.0000          .    0.0000
#> racewhite         0.0973        0.1027         -0.0182          .    0.0054
#> nodegree          0.7081        0.6811          0.0594          .    0.0270
#> married           0.1892        0.1892          0.0000          .    0.0000
#> re74           2095.5737     1943.6668          0.0311     1.4145    0.0331
#> re75           1532.0553     1170.0871          0.1124     1.6028    0.0318
#>            eCDF Max Std. Pair Dist.
#> age          0.1784          0.5020
#> educ         0.0270          0.2796
#> raceblack    0.0054          0.0149
#> racehispan   0.0000          0.0000
#> racewhite    0.0054          0.0182
#> nodegree     0.0270          0.0713
#> married      0.0000          0.0000
#> re74         0.1838          0.2132
#> re75         0.0784          0.2708
#>
#> Sample Sizes:
#>               Control Treated
#> All            429.       185
#> Matched (ESS)   51.35     185
#> Matched        122.       185
#> Unmatched      307.         0
#> Discarded        0.         0
#>
# 1:1 genetic matching on just age, educ, re74, and re75
# within calipers on PS and educ; other variables are
# used to estimate PS
m.out3 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "genetic",
                  mahvars = ~ age + educ + re74 + re75,
                  caliper = c(.05, educ = 2),
                  std.caliper = c(TRUE, FALSE),
                  pop.size = 10) #use much larger pop.size
m.out3
#> A matchit object
#>  - method: 1:1 genetic matching without replacement
#>  - distance: Mahalanobis [matching]
#>              Propensity score [caliper]
#>              - estimated with logistic regression
#>  - caliper: <distance> (0.015), educ (2)
#>  - number of obs.: 614 (original), 218 (matched)
#>  - target estimand: ATT
#>  - covariates: age, educ, race, nodegree, married, re74, re75
summary(m.out3, un = FALSE)
#>
#> Call:
#> matchit(formula = treat ~ age + educ + race + nodegree + married +
#> re74 + re75, data = lalonde, method = "genetic", mahvars = ~age +
#> educ + re74 + re75, caliper = c(0.05, educ = 2), std.caliper = c(TRUE,
#> FALSE), pop.size = 10)
#>
#> Summary of Balance for Matched Data:
#>            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
#> distance          0.5003        0.4926          0.0348     1.0286    0.0115
#> age              25.7064       25.2110          0.0692     0.5675    0.0578
#> educ             10.2385       10.4404         -0.1004     0.5484    0.0299
#> raceblack         0.7339        0.7156          0.0505          .    0.0183
#> racehispan        0.1009        0.1101         -0.0388          .    0.0092
#> racewhite         0.1651        0.1743         -0.0310          .    0.0092
#> nodegree          0.6881        0.6147          0.1614          .    0.0734
#> married           0.2385        0.2110          0.0703          .    0.0275
#> re74           2732.5276     2173.0786          0.1145     1.8109    0.0480
#> re75           2042.2572     1497.7280          0.1691     2.1270    0.0391
#>            eCDF Max Std. Pair Dist.
#> distance     0.0826          0.0429
#> age          0.2018          0.9745
#> educ         0.0826          1.0221
#> raceblack    0.0183          0.1009
#> racehispan   0.0092          0.4267
#> racewhite    0.0092          0.2167
#> nodegree     0.0734          0.8879
#> married      0.0275          0.6325
#> re74         0.2202          0.5467
#> re75         0.1101          0.5774
#>
#> Sample Sizes:
#>           Control Treated
#> All           429     185
#> Matched       109     109
#> Unmatched     320      76
#> Discarded       0       0
#>
```