help parallel also see: miparallel
-------------------------------------------------------------------------------------------------------------------------------------------------
Title
parallel -- Stata module for Parallel computing
Index
Sections
1. Syntax Command syntax.
2. Description Command description.
3. Details How parallel works.
4. Parallel Append Using -parallel append- syntax.
5. Caveats Things to consider before using parallel.
6. Technical note Some details under the hood.
7. Examples Some examples using parallel.
8. Saved results A list of parallel's saved results.
9. Citation How to cite parallel.
10. Development Up-to-date version and bug reporting.
11. Source code parallel's (Mata) source code.
12. References Works cited in this help file.
13. Authors Authors behind parallel.
14. Contributors Notable contributors.
15. Also see Other modules related to parallel.
16. FAQs Frequently Asked Questions.
Available commands
1. parallel initialize Setting the number of child processes.
2. parallel numprocessors Getting the number of processors on the system.
3. parallel do Parallelizing a do-file.
4. parallel : (prefix) Parallelizing a Stata command (parallel prefix).
5. parallel bs Parallel bootstrapping.
6. parallel sim Parallel simulate.
7. parallel append Multiple file processing and appending.
8. parallel clean Removing auxiliary files.
9. parallel printlog Checking out child processes' log files.
10. parallel version Query parallel current version.
11. parallel citation How to cite parallel.
1. Syntax
---------------------------------------------------------------------------------------------------------------------------------------------
Setting the number of child processes (threads/processors)
parallel initialize [ # , force statapath(stata_path) includefile(filename) hostnames(string) ssh(string) procexec(int)]
---------------------------------------------------------------------------------------------------------------------------------------------
Getting the number of processors on the system
parallel numprocessors
---------------------------------------------------------------------------------------------------------------------------------------------
Parallelizing a do-file
parallel do filename [, by(varlist) force nodata setparallelid(pll_id) execution_options]
---------------------------------------------------------------------------------------------------------------------------------------------
Parallelizing a Stata command (parallel prefix)
parallel [, by(varlist) force keep nodata setparallelid(pll_id) execution_options]: command
---------------------------------------------------------------------------------------------------------------------------------------------
Parallel bootstrapping
parallel bs [, expression(exp_list) execution_options bs_options ] [: command]
---------------------------------------------------------------------------------------------------------------------------------------------
Parallel simulate
parallel sim [ , expression(exp_list) execution_options sim_options ] [: command]
---------------------------------------------------------------------------------------------------------------------------------------------
Multiple file processing and appending
parallel append [file(s)] , do(cmd|dofile) [in(in) if(if) expression(expand expression (see details)) execution_options ]
---------------------------------------------------------------------------------------------------------------------------------------------
Removing auxiliary files
parallel clean [, event(pll_id) all force]
---------------------------------------------------------------------------------------------------------------------------------------------
Checking out child processes' logfiles by printing the output.
parallel printlog [#] [, event(pll_id)]
Checking out child processes' logfiles by showing the output in a view window.
parallel viewlog [#] [, event(pll_id)]
---------------------------------------------------------------------------------------------------------------------------------------------
Query parallel current version
parallel version
parallel citation
options Description
-------------------------------------------------------------------------------------------------------------------------------------------
Setting the number of child processes
# The number of child processes. If omitted the default is max(floor(num_processors*0.75),1)
force Overrides the restriction on using more child processes than processors on your machine (see the WARNING in description).
This option is assumed when specifying hostnames.
statapath File path. parallel tries to automatically identify Stata's exe path. By using this option you will override this and
force parallel to use a specific path to stata.exe.
includefile File path. This file will be included before parallel commands are executed. The target purpose for this is to allow one
to copy over preferences that parallel does not copy automatically.
hostnames A space-delimited list of hostnames. For the local machine, use localhost. Work will be assigned in the order of the
list, and the list elements will be re-used if the number of child processes exceeds the length of the list. An example would be
localhost node2 node3. If this option is omitted or left blank, localhost is assumed (local execution).
ssh The command used to connect to remote machines. If none is provided, this will be ssh. This option is not needed for
local execution.
procexec On Windows, controls how child processes are spawned. The default value 2 launches them in a hidden desktop (they can
still be seen in the task manager) so that the child applications don't briefly steal the window focus. With value 1
the child processes are launched auto-minimized in the user's desktop, but they will still briefly steal the focus and
may briefly show their windows. Value 0 enables the deprecated shell-based method (see the Technical note section).
execution_options
keep Keeps auxiliary files generated by parallel. Use this and the next option with care, as there can be many files that take
up space.
keeplast Keeps auxiliary files and removes those saved the last time parallel ran during the current session.
programs A list of programs to be passed to each child process. To do this, parallel needs to echo the contents of those programs
to the output window. If parallel is being run from inside an ado (say my_cmd.ado) and you need to access local
subroutines (other programs defined in the ado beside the primary my_cmd), then you must pass their names in this option
as my_cmd.local_subroutine_name for them to be accessible.
mata If the algorithm needs to use Mata objects, this option passes every Mata object loaded in the current session
(including functions) to each child process. Note that when Mata objects are loaded into the child processes they will
have different locations and therefore pointers may no longer be accurate.
noglobal Avoids passing the current session's globals to the child processes.
seeds Numlist. With this option the user can pass a specific seed to be used within each child process.
randtype String. Tells parallel whether to use the current seed (-current-), the current datetime (-datetime-) or the random.org API
(-random.org-) to generate the seeds for each child process (please read the Description section).
processors Integer. If running on StataMP, sets the number of processors each child process should use. Default value is 1, to help
avoid the sum total of Stata processes across child instances being more than the number of physical processors (which
can severely limit performance).
timeout Integer. The time in seconds that parallel waits for a child process to start before assuming that there was a
connection error and that the child process won't start. Default value is 60.
outputopts A list of option names that are aggregating output options. parallel automatically aggregates main data from child
processes. Often, though, a program will aggregate more than one type of data. outputopts allows generic file-based
aggregation (appending). A sequential call such as my_prog, output1(outputfile.dta) can be converted to parallel,
outputopts(output1): my_prog, output1(outputfile.dta). parallel will execute each child process with its own file
passed to output1 and, at the end, append them all and save the result to outputfile.dta.
deterministicoutput
Eliminates displayed output that would vary depending on the machine (e.g. timers, seeds, and number of parallel
child processes) so that log files can be easily compared across runs. Errors are still printed.
Byable parallelization
by Varlist. Tells the command how the observations of the current dataset can be divided, so that no story (panel) is split
over two or more child processes. The semantics for by are not the same as for Stata. When Stata implements
by, the command that is run will only see a section of the data where the by-variables are the same. parallel's
semantics are that no observations with the same by-values will be in different child processes. It pools together
combinations when there are fewer child processes than by-var combinations. If you need Stata-style semantics, the
solution is to add by in the subcommand. For example, parallel, by(byvar): by byvar: egen x_max = max(x).
force When using by, parallel checks whether the dataset is properly sorted. By using force the command skips this check.
Parallel bootstrap
expression An exp_list to be passed to the bootstrap command.
bs_options Further options to be passed to the bootstrap command, including the optional reps() parameter.
Parallel simulate
expression An exp_list to be passed to the simulate command.
sim_options Further options to be passed to the simulate command, including the required reps() parameter.
Multiple file processing and appending
do Stata cmd or dofile. Note that parallel do does not support passing options to the do-file. If you need arguments then
use the prefix style.
files Explicit list of files to process.
expression String. Expression representing file names in the form of "%fmts, numlist1 [, numlist2 [, ...]]"
Removing auxiliary files
event String. Specifies which executed (and stored) event's files should be removed.
all Tells parallel to remove every remnant auxiliary file generated by it in the current directory.
force Forces the command to remove (apparently) in-use auxiliary files. Otherwise these will not get deleted.
Other options
event String. With printlog and viewlog this specifies which event's log files should be displayed.
setparallelid Programmers' option. Forces parallel to use a specific id (pll_id) (see the Technical note section).
nodata Tells parallel not to use loaded data and thus not to try splitting or appending anything.
2. Description
-parallel- implements parallel computing without requiring StataMP, substantially reducing computing time. Especially suitable for
bootstrapping and simulations, parallel includes out-of-the-box tools for implementing such algorithms.
To use -parallel-, the user must first set the number of child processes to work with. To do this,
use the -parallel initialize- syntax, replacing # with the desired number of child processes. Setting more child processes
than the computer has physical cores is not recommended (see the WARNINGS at the end of this section).
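As a sketch of how the default relates to the machine, the following lines query the number of processors and reproduce by hand the documented default, max(floor(num_processors*0.75),1). The sketch relies only on r(numprocessors), as listed in the Saved results section, and assumes that this is the processor count the default is based on; the value displayed depends on the machine.
. parallel numprocessors
. display max(floor(r(numprocessors)*0.75), 1)
. parallel initialize
On a machine reporting 8 processors, for instance, the expression evaluates to 6.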
-parallel do- is the equivalent of (a wrapper for) -do-. When using this syntax, parallel runs the do-file in as many child processes as
were specified by the user; that is, it starts $PLL_CHILDREN Stata instances in batch mode. By default the loaded dataset is split into
the number of child processes specified by -parallel initialize- and the do-file is executed independently over each of
the data chunks; once all the parallel instances stop, the resulting datasets are appended. To avoid loading the current
dataset in the child processes, the user should specify the -nodata- option.
-parallel :- (as a prefix) splits the loaded dataset and executes a stata_cmd over the specified number of data chunks in
order to speed up computations. As with -parallel do-, after all the parallel instances stop, the datasets are appended.
-parallel bs- and -parallel sim- are parallel wrappers for the commands -bootstrap- and -simulate-. These algorithms are especially well
suited to -parallel- because they are embarrassingly parallel. In terms of syntax, besides the command names, the only difference between
these two commands and their serial versions is how expressions are passed (please refer to the examples section for this).
Every time -parallel- runs, several auxiliary files are generated, which are automatically deleted when it finishes. If
the user sets -keep- or -keeplast-, the auxiliary files are kept, and the syntax -parallel clean- becomes handy. With -parallel clean- the
user can remove the last generated auxiliary files (default option), the files of a specific parallel instance (passing its pll_id
through the option event), or every stored auxiliary file (with -all-). For security reasons, in-use auxiliary files are not deleted
unless the user specifies the option force (see the Technical note section for more information about this).
Log files from the runs are stored in c(tmpdir) so that they can be inspected by the user. The user will likely want to delete these
periodically with parallel clean, all.
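A minimal sketch of this workflow, assuming the auto dataset and two child processes: run a command keeping the auxiliary files, print the log of the first child process, and then clean up. $LAST_PLL_ID is the global described in the Saved results section; the variable name price2 is illustrative only.
. sysuse auto, clear
. parallel initialize 2
. parallel, keep: gen price2 = price^2
. parallel printlog 1, event($LAST_PLL_ID)
. parallel clean, event($LAST_PLL_ID)
. parallel clean, all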
When handling multiple files (for example, a big dataset divided into multiple dta files), -parallel append-
becomes handy as it allows the user to process them simultaneously. Given a list of files and a Stata cmd or dofile, -parallel
append- opens each file, executes the cmd/dofile on it, stores the results and appends them into a single file. Also, if the
files to be processed share a name pattern, the user can provide -parallel append- with an expression representing the list of files to
be processed; for information on how to use this feature, see the section Parallel Append.
Given N child processes, within each child process -parallel- creates the macros pll_id (equal for all the child processes) and
pll_instance (ranging from 1 up to N, equaling 1 inside the first child process and N inside the last child process), both as global and
local macros. This allows the user to set different tasks/actions depending on the child process. The global macro PLL_CHILDREN (equal to
N) is also available within each child process. For an example using these macros, please refer to the Examples section.
Currently, -parallel- by default automatically identifies Stata's executable file path. This is necessary because the path is used to run
Stata in batch mode (the core mechanism of the module). According to some reports, however, the file path is not always correctly
identified; in that case the option -statapath- in -parallel initialize- can be used to set it manually.
For pseudo-random numbers, the module allows passing a different seed to each child process. Moreover, if the user does not
provide a numlist of seeds, -parallel- generates its own numlist of seeds using one of three methods: (1) based on the current seed;
(2) using the current datetime and user as a seed to generate each seed, restoring the original seed afterwards; or (3) using the
random.org API (requires internet connection) to directly generate each seed (also restoring the original seed afterwards). -parallel-
saves the seeds used in the r(pll_seeds) macro.
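As an illustration of the seeds and randtype options, assuming four child processes and the auto dataset (the seed values and the number of replications are arbitrary):
. sysuse auto, clear
. parallel initialize 4
. parallel bs, reps(200) seeds(101 202 303 404): regress price weight foreign
. parallel bs, reps(200) randtype(datetime): regress price weight foreign
The first call passes one seed per child process; the second lets -parallel- generate the seeds from the current datetime. In both cases the seeds actually used are the ones reported in r(pll_seeds).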
WARNINGS For each child process -parallel- starts a new Stata instance (thus running as many processes as child processes). Therefore,
if the user sets more child processes than the computer has cores, the computer may freeze.
3. Details
Inspired by the R library ``snow'' and designed for multicore CPUs, -parallel- implements parallel computing methods through the OS's
shell scripting (using Stata in batch mode) to speed up computations by splitting the dataset across a given number of child processes,
thereby implementing a data parallelism algorithm.
The number of child processes that can compute efficiently depends on the number of physical cores (CPUs) in your computer; e.g.,
if you have a quad-core computer, the correct child process setting should be four. In the case of simultaneous multithreading, such as
Intel's hyper-threading technology (HTT), setting -parallel- according to the number of processor threads, as expected,
hardly results in perfect speedup scaling. Nevertheless, after several tests on HTT-capable architectures, the results of
running -parallel- according to the machine's physical cores versus its logical cores show small though significant differences.
-parallel- is especially handy when it comes to implementing loop-based simulation models (or simply loops), Stata commands such as
reshape, or any job that (a) can be repeated across data blocks and (b) processes big datasets (see the append section).
Furthermore, the commands -parallel bs- and -parallel sim- are specially designed to easily implement bootstrapping and (Monte Carlo)
simulations in parallel fashion.
At this time -parallel- has been successfully tested in Windows, Unix and MacOS for Stata versions 11 to 14.
-parallel- does not change the RNG state (even if subcommands invoke randomization functions).
After several tests, it has been shown that, thanks to how -parallel- has been written, the algorithm can also be used
to implement other parallel computing techniques, such as Monte Carlo simulations, simultaneously running models, etc. An
extensive example through Monte Carlo Simulations is provided here.
To distribute work across different machines in a computer cluster, the machines need to be Linux/MacOS, share a global file-system (e.g.
NFS), and have a non-interactive way to remotely execute commands. The most common way to remotely execute commands is to use ssh with
keyfiles so that no password is needed. This is still a new feature, and synchronizing across machines in child processes can have odd
corner cases, so users may encounter some trouble getting this to work.
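For instance, using the host names from the description of the hostnames option (localhost, node2 and node3 are placeholders for machines that satisfy the requirements above), six child processes could be spread over three machines as follows:
. parallel initialize 6, hostnames(localhost node2 node3)
Since work is assigned in the order of the list and the list elements are re-used, child processes 1 and 4 would presumably run on localhost, 2 and 5 on node2, and 3 and 6 on node3. The force option is assumed when hostnames is specified.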
4. Parallel Append
Imagine we have several dta files named -income.dta- stored in a set of folders ranging from 2008_01 up to 2012_12, that is, a total of
60 files (12 months times 5 years) ordered by month, which may look something like this:
2008_01/income.dta
2008_02/income.dta
2008_03/income.dta
...more files...
2010_01/income.dta
2010_02/income.dta
2010_03/income.dta
...more files...
2012_10/income.dta
2012_11/income.dta
2012_12/income.dta
Now, imagine that for each and every one of those files we would like to execute the following program:
program def myprogram
    gen female = (gender == "female")
    collapse (mean) income, by(female) fast
end
Instead of writing a forval/foreach loop (which would be the natural solution for this situation), -parallel append- allows us to smoothly
solve this with the following command.
. parallel append, do(myprogram) prog(myprogram) ///
e("%g_%02.0f/income.dta, 2008/2012, 1/12")
Where element by element, we are telling parallel:
(1) do(myprogram): execute the command -myprogram-,
(2) prog(myprogram): -myprogram- is a user written program, and
(3) e("%g_%02.0f/income.dta, 2008/2012, 1/12"): this should process files 2008_01/income.dta up to 2012_12/income.dta.
Besides the simplicity of its syntax, the advantage of using -parallel append- lies in doing so in a parallel fashion; that is, instead
of processing one file at a time, -parallel- processes these files in groups of as many files as child processes are set.
Step-by-step, what this command does is:
1. Distributes groups of files across child processes.
Once each child process starts, for each dta file
2. Opens the file using [if] [in] according to the in and if options.
3. Executes the command/dofile specified by the user.
4. Stores the results in a temp dta file.
Finally, once all the files have been processed
5. Appends all the resulting files into a single one.
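For comparison, a sequential version of the same job could be sketched as the loop below. This is not how -parallel append- is implemented; it only illustrates the per-file steps (open, run, store, append) under the assumption that myprogram is defined as above and that the 60 files exist. The macro names y, m, mm, first and results are illustrative, and the order of the appended blocks is not meant to match that of -parallel append-.
tempfile results
local first = 1
forvalues y = 2008/2012 {
    forvalues m = 1/12 {
        // build the two-digit month, e.g. 1 -> 01
        local mm : display %02.0f `m'
        use "`y'_`mm'/income.dta", clear
        myprogram
        // stack the results of the files processed so far
        if (!`first') append using `results'
        save `results', replace
        local first = 0
    }
}
Run from a do-file, this leaves the combined results in memory, one file processed at a time.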
5. Caveats
When the -stata_cmd- or -do-file- saves results, none of them will be kept, because -parallel- runs Stata in batch mode. This is also
true for matrices, scalars, Mata objects, returns, or any other object other than data.
Although -parallel- passes through programs, macros and Mata objects, in the current version it is not capable of doing the same with
matrices or scalars. The tempname internal state is copied to the children, but the parent does not receive any of this state from the
children. That is, -parallel- advances the tempname (tempvar) sequence in the children so that it does not overlap with any names produced
by the parent.
If the number of tasks to be done is less than the number of child processes, parallel will temporarily reduce the number of child
processes. This is reported in the global $LAST_PLL_N.
Expressions run in the child processes that contain _n or _N will be evaluated locally in the child, not on the parent dataset. These
expressions may therefore give different results when run in parallel than without parallel.
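The following sketch illustrates the _N caveat, assuming the auto dataset (74 observations) and two child processes; the variable names n_parent and n_child are illustrative:
. sysuse auto, clear
. parallel initialize 2
. gen n_parent = _N
. parallel: gen n_child = _N
. tabulate n_child
n_parent equals 74 in every observation, whereas n_child holds the number of observations in the chunk handled by each child process, so the two variables differ.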
When executing Stata on separate machines via ssh, no environment variables except PWD and STATATMP are copied over.
6. Technical note
In order to protect a pll_id code (and thus its auxiliary files), once -parallel- is called it creates a new file called
__pll[pll_id]sandbox (stored at c(tmpdir)). This prevents -parallel clean- from deleting any auxiliary file
used by that process and reserves the pll_id so that no other call of -parallel- can use it. Once every child process has
finished, the sandbox file is removed, freeing the pll_id.
If for any reason the algorithm breaks due to a flaw or a crash of the system, the sandbox file and the rest of the auxiliary files will
not be deleted. To clean these up, the user can delete them manually (moving the file(s) to the OS recycle bin) or use the parallel
clean, all force syntax. This way all sandbox files in the c(tmpdir) folder and auxiliary files stored in the current directory will be
deleted.
In earlier versions of -parallel-, tempfile generation was not safe, because multiple Stata instances running simultaneously could
overwrite each other's tempfiles. Starting with version 1.14, this is no longer a problem as each Stata instance starts with a different
c(tmpdir) location. This way, the instances' tempfile management will not interfere with each other, making it safe to use commands or
algorithms that depend on tempfile generation (such as preserve and restore).
The option -setparallelid- is designed to let programmers recycle a parallel id (pll_id). Intended to be used with -parallel_sandbox-
(undocumented, please refer to the source code of -parallel_sandbox()-), this option allows calling parallel several times using the same
pll_id, which makes auxiliary file management far simpler. Take the following example:
program def mypllwrapper
    // Reserving a pll_id
    m: parallel_sandbox(5)
    // Using the generated pll_id
    save __pll`parallelid'_mypllwrapper, replace
    // Recycling the pll_id
    forval i=1/10 {
        parallel, setparallelid(`parallelid') keep: some_other_cmd
    }
    // Cleaning up and freeing the pll_id. This will remove all files
    // and folders named with prefix '__pll[parallelid]'
    parallel clean, e(`parallelid')
    m: parallel_sandbox(2,"`parallelid'")
end
For a real example of this, please see -parallel_bs.ado- and -parallel_sim.ado-.
Windows-shell: Spawning child processes with shell command on Windows (Deprecated)
Originally, child processes on Windows were spawned as they were on other platforms, using Stata's shell methods (e.g. winexec). This had
a number of problems (spawned processes stole the UI focus, failure to recover from killed child processes, difficulty in batch-mode), so
now Windows uses a plugin that launches the child processes directly using Win32 system calls. The original functionality is retained, but
deprecated. To enable it you must specify the procexec(0) option.
Since shell commands are ignored by Stata in batch-mode on Windows, a workaround is needed. The method is to have Stata write out the
commands to be executed to a file (called the gateway) and have a separate process read new inputs to this file and execute the commands.
This latter part requires the user to install Cygwin and run a few commands prior to starting Stata. In a Cygwin terminal, navigate to the
appropriate directory and do the following:
$ rm pll_gateway.sh
$ touch pll_gateway.sh
$ tail -f pll_gateway.sh | bash
Then you can execute your Stata script in batch-mode on Windows. The Cygwin tail process can stay running through multiple uses.
The default gateway file assumed is pll_gateway.sh. If you would like a different file modify the Cygwin script above and pass a new value
for gateway(gateway_path) to parallel initialize.
Since Cygwin is going to execute the commands to start the parallel Stata instances it needs a Cygwin-like Stata path. If the user does not
specify the Stata path then -parallel- will take the generated windows path and convert it to "/cygdrive/<drive letter>/...". If this does
not work you will need to specify the statapath explicitly.
In this mode, there is no automatic way for the parent process to stop the child processes in case the user has requested a break in
execution. The original (but now deprecated) parallel break can still be used (and its Mata equivalents parallel_break() and
_parallel_break()). This is a call that you write into the code that executes in the children; it queries whether the parent process has
requested a break. If this is not used appropriately, and a child process is executing for a long period (e.g. an endless loop), the user
must kill the child processes manually.
7. Examples
Example 1: using prefix syntax
In this example we'll generate a variable containing the maximum blood-pressure measurement (bp) by patient.
Setup for a quad-core computer
. sysuse bplong.dta
. sort patient
. parallel initialize 4
Computes the maximum of bp for each patient. We add the option by(patient) to tell parallel not to split stories.
. parallel, by(patient): by patient: egen max_bp = max(bp)
Which is the ``parallel way'' to do:
. by patient: egen max_bp = max(bp)
Giving you the same result.
Example 2: using -parallel do- syntax
Loop-based simulations are another use that may benefit greatly from -parallel-. Imagine that we have a model that requires looping
over each and every record of a panel-data dataset.
Using -parallel-, the proper way to do this would be using the ``parallel do'' syntax:
. use mybigpanel.dta, clear
. parallel initialize 4
. parallel do mymodel.do
. collapse ...
where mymodel.do would look something like this
----------------------------------- beginning of do-file --------
local maxiter = _N
forval i = 1/`maxiter' {
    ...some routine...
}
----------------------------------- end of do-file --------------
Or, in the case of using Mata, this would look something like this:
----------------------------------- beginning of do-file --------
mata:
N = c("N")
for (i = 1; i <= N; i++) {
    ...some routine...
}
end
----------------------------------- end of do-file --------------
Example 3: setting the right path
If -parallel- identifies the path to stata.exe incorrectly, the option -statapath- will correct the situation. So, if "C:\Archivos de
programa\Stata12/stata.exe" is the right path we only have to write:
. parallel initialize 2, s("C:\Archivos de programa\Stata12/stata.exe")
Example 4: Using -parallel bs-
In this example we'll evaluate a regression model using bootstrapping
Setup for a quad-core computer
. sysuse auto, clear
. parallel initialize 4
Running parallel bs.
. parallel bs: reg price c.weig##c.weigh foreign rep
Which is the ``parallel way'' to do:
. bs: reg price c.weig##c.weigh foreign rep
Example 5: Using -parallel sim-
Example from simulate
Setup for a quad-core computer
. parallel initialize 4
Experiment that will be performed
program define lnsim, rclass
    version 17
    syntax [, obs(integer 1) mu(real 0) sigma(real 1) ]
    drop _all
    set obs `obs'
    tempvar z
    gen `z' = exp(rnormal(`mu',`sigma'))
    summarize `z'
    return scalar mean = r(mean)
    return scalar Var = r(Var)
end
Running parallel sim.
. parallel sim, expr(mean=r(mean) var=r(Var)) reps(10000): lnsim, obs(100)
Which is the ``parallel way'' to do:
. simulate mean=r(mean) var=r(Var), reps(10000): lnsim, obs(100)
Example 6: Using -pll_instance- and -PLL_CHILDREN- macros
By using -pll_instance- and -PLL_CHILDREN- global macros the user can run -parallel- in such a way that each child process performs a
different task. Take the following example:
Setup for a quad-core computer
. parallel initialize 4
. sysuse auto, clear
program def myprog
    gen x = $pll_instance
    gen y = $PLL_CHILDREN
    // For the first child process
    if ($pll_instance == 1) gen z = exp(2)
    // For the second child process
    else if ($pll_instance == 2) {
        summ price
        gen z = r(mean)
    }
    // For the third and fourth child processes
    else gen z = 0
end
Running the program
. parallel, prog(myprog): myprog
Here, running with 4 child processes, the program -myprog- performs different actions depending on the value of -pll_instance-. For the
observations in the first child process, -parallel- generates -z- equal to exp(2); for those in the second child process it computes
-z- equal to the average price (over that child's chunk of the data); and for the rest of the child processes it generates -z- equal to
zero.
8. Saved results
-parallel- saves the following in r():
Scalars
r(pll_n) Number of parallel child processes last used
r(pll_t_fini) Time taken to append and clean up
r(pll_t_calc) Time taken to complete the parallel job
r(pll_t_setu) Time taken to set up (before the parallelization) and to finish the job (after the parallelization)
r(pll_errs) Number of child processes which stopped with an error.
Macros
r(pll_id) Id of the last parallel instance executed (needed to use parallel clean)
r(pll_dir) Directory where parallel ran and stored the auxiliary files.
r(pll_seeds) Seeds used within each child process.
-parallel bs- and -parallel sim- save the following in e():
Scalars
e(pll) 1.
-parallel version- saves the following in r():
Macros
r(pll_vers) Current version of the module.
-parallel numprocessors- saves the following in r():
Scalars
r(numprocessors) Number of logical processors on the system.
-parallel- saves the following global macros:
$LAST_PLL_DIR A copy of r(pll_dir).
$LAST_PLL_N A copy of r(pll_n).
$LAST_PLL_ID A copy of r(pll_id).
$PLL_LASTRNG Number of times that -parallel_randomid()- has been executed.
$PLL_STATA_PATH, $PLL_CLUSTERS (deprecated), $PLL_CHILDREN, $USE_PROCEXEC, $PLL_HOSTNAMES, $PLL_SSH
Internal usage.
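A short sketch of how these results might be inspected after a run, assuming the auto dataset and two child processes (price2 is an illustrative variable name):
. sysuse auto, clear
. parallel initialize 2
. parallel: gen price2 = price^2
. return list
. display "child processes: " r(pll_n) ", with errors: " r(pll_errs)
. display "last id: $LAST_PLL_ID, directory: $LAST_PLL_DIR"
The globals remain available for later calls such as parallel clean, event($LAST_PLL_ID), whereas the r() results are replaced by the next r-class command.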
9. Citation
When using parallel, please include the following:
Vega Yon GG, Quistorff B. parallel: A command for parallel computing. The Stata Journal. 2019;19(3):667-684. doi:10.1177/1536867X19874242
For a bibentry, check out the parallel citation command.
10. Development
The latest version of -parallel- is always available. One option is its GitHub repository (development source code):
https://github.com/gvegayon/parallel
Another is to install it directly from within Stata:
. net install parallel, from(https://raw.github.com/gvegayon/parallel/master/) replace
. mata mata mlib index
You can track new releases on GitHub or by following the RSS feed https://github.com/gvegayon/parallel/releases.atom
In the case of bug reporting, you can submit issues here:
https://github.com/gvegayon/parallel/issues
Please try the latest version to see if your problem has been solved. Include the steps to reproduce the issue and the output of the Stata
command -creturn list-.
11. mata source code
Most of -parallel- has been programmed in Mata. This means that, unlike typical ado files, -parallel- is distributed with
the lparallel Mata library (compiled code), and thus the source code cannot be read directly by users. For this reason, the help file
parallel_source.sthlp is included in the package; it contains the source code in a readable format.
To access the different sections of the source code you can follow these links:
Stops a child process after the user pressed break parallel_break.mata
Remove auxiliary files parallel_clean.mata
Distributes observations across child processes parallel_divide_index.mata
Export global macros parallel_export_globals.mata
Export programs parallel_export_programs.mata
Wait until a child process finishes parallel_finito.mata
(on development) parallel_for.mata
Normalize a filepath parallel_normalizepath.mata
Generate random alphanum parallel_randomid.mata
Launch simultaneous Stata instances in batch mode parallel_run.mata
Set of tools to protect parallel aux files parallel_sandbox.mata
Set the number of child processes parallel_initialize.mata
Set the Stata EXE directory parallel_setstatapath.mata
Write a ``diagnosis'' parallel_write_diagnosis.mata
Write a dofile to be parallelized parallel_write_do.mata
12. References
Luke Tierney, A. J. Rossini, Na Li and H. Sevcikova (2012). snow: Simple Network of Workstations. R package version 0.3-9.
http://CRAN.R-project.org/package=snow
R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0, URL http://www.R-project.org/.
George Vega Y (2012). Introducing PARALLEL: Stata Module for Parallel Computing. Chilean Pension Supervisor, Santiago de Chile, URL
http://fmwww.bc.edu/repec/bocode/p/parallel.pdf.
George Vega Y (2013). Introducing PARALLEL: Stata Module for Parallel Computing. Stata Conference 2013, New Orleans (USA), URL
http://ideas.repec.org/p/boc/norl13/4.html.
Haahr, M. (2006). Random.org: True random number service. Random.org. http://www.random.org/clients/http/.
13. Authors
George Vega Yon [cre,aut], University of Southern California. mailto:g.vegayon@gmail.com http://ggvy.cl/
Brian Quistorff [aut], Bureau of Economic Analysis. mailto:brian-work@quistorff.com http://quistorff.com
14. Contributors
Special thanks to: Elan P. Kugelmass (aka epkugelmass at github) [ctb], Timothy Mak (University of Hong Kong) (author of miparallel),
Damian C. Clarke (Oxford University, England), Felix Villatoro (Superintendencia de Pensiones, Chile), Eduardo Fajnzylber (Universidad
Adolfo Ibáñez, Chile), Eric Melse (CAREM, Netherlands), Tomás Rau (Universidad Católica, Chile), Research Division (Superintendencia de
Pensiones, Chile), attendees to the Stata conference 2013 (New Orleans), Philippe Ruh (University of Zurich), Michael Lacy (Colorado
State).
15. Also see
Manual: [GSM] Advanced Stata usage (Mac), [GSU] Advanced Stata usage (Unix), [GSW] Advanced Stata usage (Windows)
Online: Running Stata batch-mode in Mac, Unix and Windows
Project's wiki page of other examples.
16. FAQs
Here follows a list of Frequently Asked Questions:
1. I am getting error (608) file is read-only; cannot be modified or erased. What can I do to solve it?
As Stata suggests, you are either trying to run parallel in a read-only directory, or your program/dofile is trying to write (save a
dta file, for example) in a read-only directory. Try running parallel (or making your program write files) in a directory where you
have write privileges (where you can save files).
2. How can I create reproducible results between sequential and parallel execution when randomness is involved?
See our utility command seeding.