help synth_runner
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Title
synth_runner -- Automation for multiple Synthetic Control estimations.
Syntax
synth_runner depvar predictorvars , [ trunit(#) trperiod(#) d(varname) trends pre_limit_mult(real>=1) training_propr(real) gen_vars ci pvals1s max_lead(int) noenforce_const_pre_length
n_pl_avgs(string) parallel deterministicoutput pred_prog(string) drop_units_prog(string) xperiod_prog(string) mspeperiod_prog(string) noredo_tr_error aggfile_v(string) aggfile_w(string)
synthsettings ]
The dataset must be declared as a (balanced) panel using tsset. Variables specified in depvar and predictorvars must be numeric variables; abbreviations are not allowed. The command synth
(available in SSC) is required. Auxiliary commands for generating graphs post-estimation are shown in the examples below. Finally, the version of the package can be found by running
synth_runner version and checking r(version) or viewing the displayed output.
Description
synth_runner automates the process of running multiple synthetic control estimations by synth. It will run placebo estimates in-space (estimations for the same treatment period but on all the
control units). It will then provide inference (p-values) comparing the estimated main effect to the distribution of placebo effects. It handles the case where several units receive
treatment, possibly at different time periods. If there are multiple treatment periods, then effects are centered around the treatment period so as to be comparable. The maximum common
number of leads and lags that can be achieved in the data given the treated units is used for analysis. It provides facilities for automatically generating outcome predictors using a
training proportion of the pre-treatment period. It also provides diagnostics to assess fit. synth_runner is designed to accompany synth but not to supersede it. For more details about
single estimations (variable weights, observation weights, covariate balance, and synthetic control outcomes when there are multiple time periods) use synth directly. See synth and Abadie and
Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010, 2014) for more details.
Required Settings
depvar the outcome variable.
predictorvars the list of predictor variables. See synth for more details.
For specifying the unit and time-period of treatment, there are two methods. Exactly one of these is required.
trunit(#) and trperiod(#). This syntax (used by synth) can be used when there is a single unit entering treatment. Since synthetic control methods split time into pre-treatment and treated
periods, trperiod is the first of the treated periods (which are, slightly confusingly, also called post-treatment periods).
d(varname). The d variable should be a binary variable which is 1 for treated units in treated periods, and 0 everywhere else. This allows for multiple units to undergo treatment, possibly at
different times.
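As a minimal sketch of building and checking a d() variable, the following assumes the smoking data from the Examples section is loaded and tsset, and that no variable named D exists yet. The unit codes and years are the ones used in Example 3; the checks are only a suggestion and are not part of synth_runner.

```stata
* Treatment indicator: 1 for treated units in treated periods, 0 elsewhere
gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)

* D must be binary
assert inlist(D, 0, 1)

* Show the first treated period of each treated unit
preserve
keep if D==1
collapse (min) first_treated=year, by(state)
list, noobs
restore
```

The listing should contain one row per treated unit, which makes it easy to confirm the treatment dates before running the estimation.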
Options
trends will force synth to match on the trends in the outcome variable. It does this by scaling each unit's outcome variable so that it is 1 in the last pre-treatment period.
pre_limit_mult(real>=1) excludes a placebo effect from the pool used for inference if that control's pre-treatment match quality, measured by the Root Mean Squared Predictive Error (RMSPE), is
more than pre_limit_mult times that of the treated unit.
training_propr(0<=real<=1) instructs synth_runner to automatically generate the outcome predictors. The default (0) is to not generate any (the user then includes the desired ones in
predictorvars). If set to a number greater than 0, then that initial proportion of the pre-treatment period is used as a training period with the rest being the validation period.
Outcome predictors for every period in the training period will be added to the synth commands. Diagnostics of the fit for the validation period will be displayed. If the value is between 0
and 1, there will be at least one training period and at least one validation period. If it is set to 1, then all the pre-treatment period outcome variables will be used as predictors.
This will make other covariate predictors redundant.
ci outputs confidence intervals from randomization inference for raw effect estimates. These should only be used if the treatment is randomly assigned (conditional on covariates and
interactive fixed-effects). If treatment is not randomly assigned then these confidence intervals do not have a straight-forward interpretation (in contrast to p-values which do).
pvals1s outputs one-sided p-values in addition to the two-sided p-values.
gen_vars generates variables in the dataset from estimation. This is only allowed if there is a single period in which unit(s) enter treatment. If gen_vars is specified, it will generate the
following variables:
lead:
A variable that contains the respective time period relative to treatment. Lead=1 specifies the first period of treatment. This is to match Cavallo et al. (2013) and in
effect is the offset from the last non-treatment period.
depvar_synth:
A variable that contains the unit's synthetic control outcome for that time period.
effect:
A variable that contains the difference between the unit's outcome and its synthetic control for that time period.
pre_rmspe:
A variable, constant for a unit, containing the pre-treatment match quality in terms of RMSPE.
post_rmspe:
A variable, constant for a unit, containing a measure of the post-treatment effect (jointly over all post-treatment time periods) in terms of RMSPE.
depvar_scaled:
If the match was done on trends, this is the unit's outcome variable normalized so that its last pre-treatment period outcome is 1.
depvar_scaled_synth:
If the match was done on trends, this is the unit's synthetic control's (scaled) outcome variable.
effect_scaled:
If the match was done on trends, this is the difference between the unit's (scaled) outcome and its (scaled) synthetic control for that time period.
n_pl_avgs(string) controls the number of placebo averages to compute for inference. The total possible grows exponentially with the number of treated events. If omitted, the default behavior
is to cap the number of averages computed at 1,000,000 and, if the total is more than that, to sample (with replacement) from the full distribution. The option n_pl_avgs(all) can be used to
override this behavior and compute all the possible averages. The option n_pl_avgs(#) can be used to specify a specific number less than the total number of averages possible.
max_lead(int) will limit the number of post-treatment periods analyzed. The default is the maximum number of leads that is available for all treatment periods.
noenforce_const_pre_length keeps the maximal pre-treatment history for each estimation. When there are multiple treatment periods, estimations at later treatment dates will have more
pre-treatment history available. By default, these histories are trimmed on the early side so that all estimations have the same amount of history.
parallel will enable parallel processing if the parallel command is installed and configured. Version 1.18.2 is needed at a minimum (available via https://github.com/gvegayon/parallel/).
deterministicoutput eliminates displayed output that would vary depending on the machine (e.g. timers and number of parallel clusters) so that log files can be easily compared across runs.
pred_prog(string) is a method to allow time-contingent predictor sets. The user writes a program that takes as input a time period and outputs via r(predictors) a synth-style predictor
string. If one is not using training_propr then pred_prog could be used to dynamically include outcome predictors. See Example 3 for usage details.
drop_units_prog(string) is the name of a program that, when passed the unit to be considered treated, will drop other units that should not be considered when forming the synthetic control.
Commonly this is because they are neighboring or interfering units. See Example 3 for usage details.
xperiod_prog(string) allows for setting of synth's xperiod option that varies with the treatment period. The user-written program is passed the treatment period and should return, via
r(xperiod), a numlist suitable for synth's xperiod (the period over which generic predictor variables are averaged). See synth for more details on the xperiod option. See Example 3 for
usage details.
mspeperiod_prog(string) allows for setting of synth's mspeperiod option that varies with the treatment period. The user-written program is passed the treatment period and should return, via
r(mspeperiod), a numlist suitable for synth's mspeperiod (the period over which the prediction outcome is evaluated). See synth for more details on the mspeperiod option. See Example 3
for usage details.
noredo_tr_error By default, if synth fails when estimating on a treated unit, the estimation is rerun so that the output and error from synth can be seen by the user. Use this option to skip
the rerun.
aggfile_v(string) and aggfile_w(string) overwrite the named files with the variable weights and unit weights from all the estimations. Both must be specified or neither. aggfile_v will have
variables V1-Vk, tr_unit_varname, and tr_time_varname, and for each tr_unit_varname-tr_time_varname estimation there will be one observation. aggfile_w has variables _Co_Number, _W_Weight,
tr_unit_varname, and tr_time_varname, and for each tr_unit_varname-tr_time_varname estimation there will be a row for each control donor.
synthsettings pass-through options sent to synth. See help synth for more information. The following are disallowed: counit, figure, resultsperiod.
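As an illustration of the gen_vars output and of the rule behind pre_limit_mult, the following sketch assumes Example 1 below has just been run (outcome cigsale, treated unit state==3, gen_vars specified). The multiplier of 10 and the helper variable one_per_unit are illustrative only, and the check mimics the documented rule rather than reproducing the package's internal code.

```stata
* Assumes Example 1 was just run with gen_vars, so lead, effect,
* cigsale_synth, and pre_rmspe exist in the dataset.

* Per-period effects for the treated unit (lead==1 is the first treated period)
sort state year
list year lead cigsale cigsale_synth effect if state==3 & lead>=1 & !missing(lead), noobs

* Pre-treatment fit of the treated unit (pre_rmspe is constant within a unit)
quietly summarize pre_rmspe if state==3
local tr_rmspe = r(mean)

* Placebo units whose pre-treatment RMSPE exceeds 10 times the treated unit's
local mult = 10   // illustrative value only
egen byte one_per_unit = tag(state)
list state pre_rmspe if one_per_unit & state!=3 & !missing(pre_rmspe) & pre_rmspe > `mult'*`tr_rmspe', noobs
drop one_per_unit
```

If the second listing is empty, no placebo fits more than 10 times worse than the treated unit.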
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Saved Results
synth_runner returns the following scalars and matrices.
e(treat_control) :
A matrix with the average treatment outcome (centered around treatment) and the average of the outcome of those units' synthetic controls for the pre- and post-treatment periods.
e(b):
A vector with the per-period effects (unit's actual outcome minus the outcome of its synthetic control) for post-treatment periods.
e(n_pl):
The number of placebo averages used for comparison. For single treatment setups, this can be used to calculate purely randomized p-values.
e(pvals):
A vector of the proportions of placebo effects that are at least as large as the main effect for each post-treatment period.
e(pvals_std):
A vector of the proportions of placebo standardized effects that are at least as large as the main standardized effect for each post-treatment period.
e(pval_joint_post):
The proportion of placebos that have a post-treatment RMSPE at least as large as the average for the treated units.
e(pval_joint_post_std):
The proportion of placebos that have a ratio of post-treatment RMSPE over pre-treatment RMSPE at least as large as the average ratio for the treated units.
e(avg_pre_rmspe_p):
The proportion of placebos that have a pre-treatment RMSPE at least as large as the average of the treated units. A measure of fit. Concerning if significant.
e(failed_opt_targets):
Errors when constructing the synthetic controls for non-treated units are handled gracefully. If any are detected they will be listed in this matrix. (Errors when constructing the
synthetic control for treated units will abort the method.)
e(avg_val_rmspe_p):
When specifying training_propr, this is the proportion of placebos that have a RMSPE for the validation period at least as large as the average of the treated units. A measure of fit.
Concerning if significant.
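A short sketch of inspecting these results with standard Stata commands, assuming synth_runner has just finished running. It assumes the two joint p-values are stored as scalars; ereturn list shows how each result is actually stored.

```stata
* Overview of everything synth_runner left in e()
ereturn list

* Per-period p-values (raw and standardized) for the post-treatment periods
matrix list e(pvals)
matrix list e(pvals_std)

* Joint post-treatment p-values (assumed here to be scalars)
display "Joint post-treatment p-value: " e(pval_joint_post)
display "Joint standardized p-value:   " e(pval_joint_post_std)
```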
Examples
The following examples use data from the synth package. Ensure that synth was installed with ancillary files (e.g., ssc install synth, all). This panel dataset contains information for 39 US
States for the years 1970-2000 (see Abadie, Diamond, and Hainmueller (2010) for details). Note that the synth package's dataset might have a different name. It was originally uploaded as
smoking; for a while the installed dataset was incorrect (there was a name collision with another package); the dataset is now correct and named synth_smoking.
sysuse synth_smoking
tsset state year
Example 1 - Reconstruct the initial synth example plus graphs:
synth_runner cigsale beer(1984(1)1988) lnincome(1972(1)1988) retprice age15to24 cigsale(1988) cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) gen_vars
single_treatment_graphs, trlinediff(-1) effects_ylabels(-30(10)30) effects_ymax(35) effects_ymin(-35)
effect_graphs , trlinediff(-1)
pval_graphs
In this example, synth_runner conducts all the estimations and inference. Since there was only a single treatment period we can save the output into the dataset. Then we can create the
various graphs. Note the option trlinediff allows the offset of a vertical treatment line. Likely options include values in the range from (first treatment period - last post-treatment
period) to 0 and the default value is -1 (to match Abadie et al. 2010).
Example 2 - Same treatment, but a bit more complicated setup:
cap drop pre_rmspe post_rmspe lead effect cigsale_synth
gen byte D = (state==3 & year>=1989)
synth_runner cigsale beer(1984(1)1988) lnincome(1972(1)1988) retprice age15to24, d(D) trends training_propr(`=13/18') gen_vars pre_limit_mult(10)
single_treatment_graphs, scaled
effect_graphs , scaled
pval_graphs
Again there is a single treatment period, so output can be saved and merged back into the dataset. In this setting we (a) specify the treated units/periods with a binary variable, (b)
generate the outcome predictors automatically using the initial 13 periods of the pre-treatment era (the rest is the "validation" period), and (c) we match on trends.
Example 3 - Multiple treatments at different time periods:
cap drop pre_rmspe post_rmspe lead effect cigsale_synth
cap drop cigsale_scaled effect_scaled cigsale_scaled_synth D
cap program drop my_pred my_drop_units my_xperiod my_mspeperiod
program my_pred, rclass
args tyear
return local predictors "beer(`=`tyear'-4'(1)`=`tyear'-1') lnincome(`=`tyear'-4'(1)`=`tyear'-1')"
end
program my_drop_units
args tunit
if `tunit'==39 qui drop if inlist(state,21,38)
if `tunit'==3 qui drop if state==21
end
program my_xperiod, rclass
args tyear
return local xperiod "`=`tyear'-12'(1)`=`tyear'-1'"
end
program my_mspeperiod, rclass
args tyear
return local mspeperiod "`=`tyear'-12'(1)`=`tyear'-1'"
end
gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)
synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod)
effect_graphs
pval_graphs
We extend Example 2 by considering a control state now to be treated (Georgia in addition to California). No treatment actually happened in Georgia in 1988. Now that we have several
treatment periods we cannot merge in a simple file. Some of the graphs (of single_treatment_graphs) can no longer be made. We also show how predictors, unit dropping, xperiod, and
mspeperiod can be dynamically generated depending on the treatment year.
Development
If you encounter a bug in the program, please ensure you are running the most recent version from the GitHub site. If the problem persists, see if the bug has been previously reported at
https://github.com/bquistorff/synth_runner/issues. If not, file a new 'issue' there and list (a) the steps causing the problem (with output) and (b) the version of synth_runner used (found from
which synth_runner).
Contributions may also be made via a pull request from the GitHub page.
To be notified of new releases, subscribe to notifications of this issue.
Citation of synth_runner
synth_runner is not an official Stata command. It is a free contribution to the research community, like a paper. Please cite it as such:
Brian Quistorff and Sebastian Galiani. The synth_runner package: Utilities to automate synthetic control estimation using synth, August 2017. https://github.com/bquistorff/synth_runner.
Version 1.6.0.
And in bibtex format:
@Misc{QG17,
Title = {The synth\_runner Package: Utilities to Automate Synthetic Control Estimation Using synth},
Author = {Brian Quistorff and Sebastian Galiani},
Month = aug,
Note = {Version 1.6.0},
Year = {2017},
Url = {https://github.com/bquistorff/synth_runner}
}
References
Abadie, A., Diamond, A., and Hainmueller, J. 2014. Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., and Hainmueller, J. 2010. Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the
American Statistical Association 105(490): 493-505.
Abadie, A. and Gardeazabal, J. 2003. Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review 93(1): 113-132.
Cavallo, E., Galiani, S., Noy, I., and Pantano, J. 2013. Catastrophic Natural Disasters and Economic Growth. Review of Economics and Statistics 95(5): 1549-1561.
Authors
Brian Quistorff, brian-work@quistorff.com (corresponding author, see Development section for reporting bugs)
Bureau of Economic Analysis
Sebastian Galiani
University of Maryland