This report lists the candidate variables for the DataSchema variables of the construct marital.

(I) Exposition

This report is a record of interaction with a data transfer object (dto) produced by ./manipulation/0-ellis-island.R.

The next section recaps this script, exposes the architecture of the DTO, and demonstrates how to interact with it.

(I.A) Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

    1. Reads in raw data files from the candidate studies.
    2. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
    3. Augments raw metadata with instructions for renaming and classifying variables. The instructions are provided as manually entered values in ./data/shared/meta-data-map.csv and are used by automatic scripts in later harmonization and analysis.
    4. Combines unit data and metadata into a single DTO that serves as the starting point for all subsequent analyses.
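For readers without access to dto.rds, the shape of the object produced by these steps can be mimicked with toy data. This is a minimal sketch, not the code in 0-ellis-island.R: the data frames, their rows, and the name `dto_sketch` are hypothetical; only the four element names and the two file paths are taken from the output shown in this report.

```r
# Toy stand-ins for the raw study files (hypothetical rows, not real data)
unit_data <- list(
  alsa = data.frame(id = c(41, 42), MARITST = c("Married", "Widowed"),
                    stringsAsFactors = FALSE),
  lbsl = data.frame(id = c(4001026, 4012015), MSTAT94 = c("divorced", "widowed"),
                    stringsAsFactors = FALSE)
)

# Toy metadata: one row per raw variable, plus the manually added `construct` column
meta_data <- data.frame(
  study_name = c("alsa", "lbsl"),
  name       = c("MARITST", "MSTAT94"),
  construct  = c("marital", "marital"),
  stringsAsFactors = FALSE
)

# Assemble the DTO as a named list with the four elements described in this report
dto_sketch <- list(
  studyName = names(unit_data),
  filePath  = c("./data/unshared/raw/ALSA-Wave1.Final.sav",
                "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"),
  unitData  = unit_data,
  metaData  = meta_data
)

names(dto_sketch)
```

Keeping data and metadata in one list means later scripts need to load only a single file to get both.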
# load the product of 0-ellis-island.R,  a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath"  "unitData"  "metaData" 
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"       
# 3rd element - is a list object containing the following elements
names(dto[["unitData"]])
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]]) 
Source: local data frame [656 x 31]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
  year_born (dbl), female (lgl)

Meta

# 4th element - a data set of names and labels of raw variables plus added metadata for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>% 
  DT::datatable(
    class   = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
    filter  = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )

(I.B) Target-H

Everybody wants to be somebody.

We query the metadata set to retrieve all variables potentially tapping the construct marital. These are the candidates to enter the DataSchema and contribute to computing the harmonized variables.

NOTE: what is being retrieved depends on the manually entered values in the column construct of the metadata file ./data/shared/meta-data-map.csv. To specify a different group of variables, edit the metadata, not the script.
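The point of the note is that the query itself never changes; only the `construct` column does. The sketch below illustrates this with a base-R equivalent of the dplyr query that follows. The metadata rows, including the smoking item, and the helper name `candidates()` are hypothetical.

```r
# Toy metadata (hypothetical rows); only the `construct` column drives selection
meta <- data.frame(
  study_name = c("alsa", "lbsl", "lbsl", "tilda"),
  name       = c("MARITST", "MSTAT94", "SMOKE", "MAR_4"),
  construct  = c("marital", "marital", "smoking", "marital"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of dplyr::filter(construct %in% ...) + dplyr::arrange(construct, study_name)
candidates <- function(meta, constructs) {
  out <- meta[meta$construct %in% constructs, , drop = FALSE]
  out[order(out$construct, out$study_name), , drop = FALSE]
}

candidates(meta, "marital")$name   # three marital items

# "Edit the metadata, not the script": re-tag one row and rerun the same query
meta$construct[meta$name == "SMOKE"] <- "marital"
nrow(candidates(meta, "marital"))  # now four rows
```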

meta_data <- dto[["metaData"]] %>%
  dplyr::filter(construct %in% c('marital')) %>% 
  dplyr::select(study_name, name, construct, label_short, categories, url) %>%
  dplyr::arrange(construct, study_name)
knitr::kable(meta_data)
study_name  name        construct  label_short                   categories  url
alsa        MARITST     marital    Marital status                7
lbsl        MSTAT94     marital    Marital Status in 1994        6
satsa       GMARITAL    marital    What is your marital status?  5
share       DN0140      marital    Marital Status                9
tilda       SOCMARRIED  marital                                  2
tilda       CS006       marital                                  6
tilda       MAR_4       marital                                  4

View the report descriptives: marital for a closer examination of each candidate.

The responses to variables loading on the construct marital are as follows:

[Figure: marital raw]

After reorganizing the possible responses, the following clustering emerged:

[Figure: marital harmonized]

After reviewing the descriptives and relevant codebooks, we adopted the following operationalization of the harmonized variables for marital:

Target (1) : marital

  • -1 - mar_cohab - married or cohabiting
  • 0 - single - not married - REFERENCE
  • 1 - sep_divorced - separated or divorced
  • 2 - widowed - widowed

Target (2) : single

  • 0 - FALSE - Reference group
  • 1 - TRUE - Risk factor
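Both targets can be expressed compactly in R. This is a sketch under two assumptions: that the numeric codes of Target (1) are the ones listed above, and that `single` is TRUE for every category other than mar_cohab, which is what the frequency tables in the Development section show. The vector names are illustrative.

```r
# Harmonized marital categories, as produced by the h-rules (one NA for a missing response)
marital <- c("mar_cohab", "single", "sep_divorced", "widowed", NA)

# Target (1): numeric codes as listed above, with single = 0 as the reference
marital_codes <- c(mar_cohab = -1L, single = 0L, sep_divorced = 1L, widowed = 2L)
marital_num   <- unname(marital_codes[marital])   # NA input gives NA output

# For modelling, a factor whose first level is the reference category
marital_fct <- factor(marital,
                      levels = c("single", "mar_cohab", "sep_divorced", "widowed"))

# Target (2): TRUE for anyone not married or cohabiting; NA stays NA
single <- marital != "mar_cohab"
```

Putting `single` first among the factor levels makes it the default baseline for treatment contrasts in `lm()` and `glm()`.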

These variables will be generated next, in the Development section.

(II) Development

The particular goal of this section is to ensure that the schema for encoding the values of the marital variable is consistent across studies.

In this section we define the schema sets for harmonizing the marital construct (i.e., we specify which variables from which studies will contribute to computing the harmonized variables). Each schema set has a particular pattern of possible response values, which we export for inspection as .csv tables. We then manually edit these .csv tables, populating new columns that map the values of the harmonized variables to each response pattern of the schema set variables. Finally, we import the harmonization algorithms encoded in these .csv tables and apply them to compute the harmonized variables, producing a dataset that combines raw and harmonized variables for the marital construct across studies.
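The "pattern of possible response values" of a schema set is the list of every observed combination of its variables, with a count for each. A minimal base-R sketch, assuming a TILDA-like schema set of three items; the rows in `ds` are hypothetical.

```r
# Toy TILDA-like responses (hypothetical rows) for a three-variable schema set
ds <- data.frame(
  SOCMARRIED = c("Married", "Married", "Not married", "Not married", "Married"),
  MAR_4      = c("Married", "Married", "Widowed", "Never married", "Married"),
  CS006      = c("Married", "Married", "Widowed", "Single (never married)",
                 "Living with a partner as if married"),
  stringsAsFactors = FALSE
)

# Cross-tabulate all columns at once; useNA = "ifany" keeps missing values as a level
tab <- table(ds, useNA = "ifany")

# Flatten to one row per combination and keep only combinations actually observed
resp_profile <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
resp_profile <- resp_profile[resp_profile$n > 0, ]
resp_profile
```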

(II.A)

(1) Schema sets

With all potential variables in categorical format, we defined the sets of data schema variables as follows:

Each of these schema sets has a particular pattern of possible response values, for example:

We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.

You can examine them in `./data/meta/response-profiles-live/`.
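The export, manual edit, and re-import cycle can be sketched with base R's CSV functions. Assumptions: the file is written to a temporary directory rather than the project folder, the file name is hypothetical, and the manual editing step is simulated in code. The counts are the ALSA MARITST counts reported later in this document.

```r
# Response profile for ALSA's MARITST (counts as reported in this document)
resp_profile <- data.frame(
  MARITST = c("De facto", "Divorced", "Married", "Never married", "Separated", "Widowed"),
  n       = c(6L, 33L, 1361L, 76L, 16L, 594L),
  stringsAsFactors = FALSE
)

# Export; the project writes to ./data/meta/response-profiles-live/ instead
path <- file.path(tempdir(), "response-profile-marital-alsa.csv")  # hypothetical name
write.csv(resp_profile, path, row.names = FALSE)

# Simulated manual step: the analyst adds harmonized columns to the exported table
hrule <- read.csv(path, stringsAsFactors = FALSE)
hrule$marital <- c("mar_cohab", "sep_divorced", "mar_cohab", "single",
                   "sep_divorced", "widowed")
hrule$single  <- hrule$marital != "mar_cohab"
write.csv(hrule, path, row.names = FALSE)
```

Because the logic lives in a plain CSV, it can be reviewed and corrected without touching the R scripts.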

(II.B) marital

Target : marital

  • 1 - mar_cohab - married or cohabiting
  • 2 - sep_divorced - separated or divorced
  • 3 - single - not married
  • 4 - widowed - widowed

ALSA

Items that can contribute to generating values for the harmonized variable marital are:

dto[["metaData"]] %>%
  dplyr::filter(study_name=="alsa", construct %in% c("marital")) %>%
  dplyr::select(study_name, name, label,categories)
  study_name    name          label categories
1       alsa MARITST Marital status          7

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
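The source of `recode_with_hrule()` is not shown in this report, so the following is only a hypothetical stand-in illustrating the idea: treat the h-rule table as a lookup keyed on the raw variable(s) and copy the harmonized value onto each respondent. The function name `apply_hrule()` and the rows are illustrative assumptions.

```r
# Hypothetical h-rule table: one row per raw response, plus the harmonized value
hrule <- data.frame(
  MARITST = c("De facto", "Divorced", "Married", "Never married", "Separated", "Widowed"),
  marital = c("mar_cohab", "sep_divorced", "mar_cohab", "single", "sep_divorced", "widowed"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for recode_with_hrule(), NOT the project's implementation
apply_hrule <- function(data, hrule, variable_names, harmony_name) {
  # Build one lookup key per row from all contributing raw variables
  key_data <- do.call(paste, c(data[variable_names], sep = "\u001f"))
  key_rule <- do.call(paste, c(hrule[variable_names], sep = "\u001f"))
  # match() keeps the original row order; unmatched keys (including NA) give NA
  data[[harmony_name]] <- hrule[[harmony_name]][match(key_data, key_rule)]
  data
}

alsa <- data.frame(id = 1:3, MARITST = c("Widowed", "Married", NA),
                   stringsAsFactors = FALSE)
alsa <- apply_hrule(alsa, hrule, "MARITST", "marital")
alsa
```

Using `match()` rather than `merge()` leaves the row order of the study data untouched.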

study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("MARITST"), 
  harmony_name = "marital"
)
Source: local data frame [7 x 3]
Groups: MARITST [?]

        MARITST      marital     n
          (chr)        (chr) (int)
1      De facto    mar_cohab     6
2      Divorced sep_divorced    33
3       Married    mar_cohab  1361
4 Never married       single    76
5     Separated sep_divorced    16
6       Widowed      widowed   594
7            NA           NA     1
# verify
dto[["unitData"]][["alsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, MARITST, marital)
      id MARITST   marital
1   6891 Widowed   widowed
2   6981 Married mar_cohab
3   8212 Married mar_cohab
4  10781 Married mar_cohab
5  12902 Married mar_cohab
6  13291 Married mar_cohab
7  14081 Married mar_cohab
8  15041 Widowed   widowed
9  21641 Married mar_cohab
10 22881 Married mar_cohab
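Sampling ten ids gives a quick visual check, but the draw is random and not reproducible (calling `set.seed()` before `sample()` would fix that). A complementary, deterministic check cross-tabulates every raw value against its harmonized value and asserts that each raw value maps to exactly one category. The rows below are hypothetical and only mirror the ALSA categories.

```r
# Hypothetical harmonized rows mirroring the ALSA categories
ds <- data.frame(
  MARITST = c("Married", "De facto", "Divorced", "Separated", "Widowed", NA),
  marital = c("mar_cohab", "mar_cohab", "sep_divorced", "sep_divorced", "widowed", NA),
  stringsAsFactors = FALSE
)

# Raw values in rows, harmonized values in columns; NA kept as its own level
xt <- table(ds$MARITST, ds$marital, useNA = "ifany")
xt

# Every raw value (row) should have exactly one non-empty harmonized cell
stopifnot(all(rowSums(xt > 0) == 1))
```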

LBSL

Items that can contribute to generating values for the harmonized variable marital are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "lbsl", construct == "marital") %>%
  # dplyr::filter(name %in% c("MSTAT94")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name    name            label_short categories
1       lbsl MSTAT94 Marital Status in 1994          6

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("MSTAT94"), 
  harmony_name = "marital"
)
Source: local data frame [6 x 3]
Groups: MSTAT94 [?]

    MSTAT94      marital     n
      (chr)        (chr) (int)
1  divorced sep_divorced    73
2   married    mar_cohab   326
3 separated sep_divorced     4
4    single       single    22
5   widowed      widowed   134
6        NA           NA    97
# verify
dto[["unitData"]][["lbsl"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, MSTAT94, marital)
        id  MSTAT94      marital
1  4132095     <NA>         <NA>
2  4191087  married    mar_cohab
3  4191200  married    mar_cohab
4  4261081     <NA>         <NA>
5  4271074 divorced sep_divorced
6  4311082  married    mar_cohab
7  4421013     <NA>         <NA>
8  4452040     <NA>         <NA>
9  4541001 divorced sep_divorced
10 4562003     <NA>         <NA>

SATSA

Items that can contribute to generating values for the harmonized variable marital are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "satsa", construct == "marital") %>%
  # dplyr::filter(name %in% c("GMARITAL")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name     name                  label_short categories
1      satsa GMARITAL What is your marital status?          5

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("GMARITAL"), 
  harmony_name = "marital"
)
Source: local data frame [5 x 3]
Groups: GMARITAL [?]

                              GMARITAL      marital     n
                                 (chr)        (chr) (int)
1                             divorced sep_divorced   113
2 married /living together with person    mar_cohab   961
3                          Not married       single   149
4                        widow/widower      widowed   259
5                                   NA           NA    15
# verify
dto[["unitData"]][["satsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, GMARITAL, marital)
        id                             GMARITAL   marital
1    19612                        widow/widower   widowed
2   133522                        widow/widower   widowed
3   150011                        widow/widower   widowed
4   154632 married /living together with person mar_cohab
5   163402                        widow/widower   widowed
6   164321 married /living together with person mar_cohab
7  2212402                          Not married    single
8  2239662 married /living together with person mar_cohab
9  2432001 married /living together with person mar_cohab
10 2445412 married /living together with person mar_cohab

SHARE

Items that can contribute to generating values for the harmonized variable marital are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "share", construct == "marital") %>%
  # dplyr::filter(name %in% c("DN0140")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name   name    label_short categories
1      share DN0140 Marital Status          9

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "share"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-share.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("DN0140"), 
  harmony_name = "marital"
)
Source: local data frame [9 x 3]
Groups: DN0140 [?]

                                   DN0140      marital     n
                                    (chr)        (chr) (int)
1                                divorced sep_divorced   140
2                              don't know           NA     1
3 married and living together with spouse    mar_cohab  2039
4   married, living separated from spouse sep_divorced    19
5                           never married       single    51
6                                 refusal           NA     1
7                  registered partnership    mar_cohab    10
8                                 widowed      widowed   336
9                                      NA           NA     1
# verify
dto[["unitData"]][["share"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, DN0140, marital)
             id                                  DN0140      marital
1  2.505220e+12 married and living together with spouse    mar_cohab
2  2.505242e+12 married and living together with spouse    mar_cohab
3  2.505251e+12                                divorced sep_divorced
4  2.505259e+12 married and living together with spouse    mar_cohab
5  2.505262e+12 married and living together with spouse    mar_cohab
6  2.505267e+12 married and living together with spouse    mar_cohab
7  2.505268e+12 married and living together with spouse    mar_cohab
8  2.505268e+12 married and living together with spouse    mar_cohab
9  2.505276e+12 married and living together with spouse    mar_cohab
10 2.605219e+12 married and living together with spouse    mar_cohab

TILDA

Items that can contribute to generating values for the harmonized variable marital are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "tilda", construct == "marital") %>%
  # dplyr::filter(name %in% c("SOCMARRIED", "CS006", "MAR_4")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name       name label_short categories
1      tilda SOCMARRIED                      2
2      tilda      CS006                      6
3      tilda      MAR_4                      4

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
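With three contributing items, the rule is keyed on the combination of SOCMARRIED, MAR_4, and CS006, so a pattern the h-rule file does not list silently becomes NA. The sketch below uses base `merge()` with `all.x = TRUE` and then lists the uncovered patterns. The rows are hypothetical; the Sep/divorced pattern is deliberately left out of the rule to show the check.

```r
# Hypothetical h-rule rows for TILDA, keyed on all three raw items
hrule <- data.frame(
  SOCMARRIED = c("Married", "Married", "Not married", "Not married"),
  MAR_4      = c("Married", "Married", "Never married", "Widowed"),
  CS006      = c("Married", "Living with a partner as if married",
                 "Single (never married)", "Widowed"),
  marital    = c("mar_cohab", "mar_cohab", "single", "widowed"),
  stringsAsFactors = FALSE
)

# Hypothetical respondents; the last pattern is not covered by the rule above
tilda <- data.frame(
  id         = 1:3,
  SOCMARRIED = c("Married", "Not married", "Not married"),
  MAR_4      = c("Married", "Widowed", "Sep/divorced"),
  CS006      = c("Married", "Widowed", "Divorced"),
  stringsAsFactors = FALSE
)

keys <- c("SOCMARRIED", "MAR_4", "CS006")
# all.x = TRUE keeps respondents whose pattern has no rule (marital becomes NA)
out <- merge(tilda, hrule, by = keys, all.x = TRUE, sort = FALSE)
out <- out[order(out$id), ]  # merge() does not preserve the original row order

# Patterns present in the data but missing from the h-rule file
unmapped <- unique(out[is.na(out$marital), keys])
unmapped
```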

study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("SOCMARRIED", "MAR_4", "CS006"), 
  harmony_name = "marital"
)
Source: local data frame [6 x 5]
Groups: SOCMARRIED, MAR_4, CS006 [?]

   SOCMARRIED         MAR_4                               CS006      marital     n
        (chr)         (chr)                               (chr)        (chr) (int)
1     Married       Married Living with a partner as if married    mar_cohab   218
2     Married       Married                             Married    mar_cohab  5748
3 Not married Never married              Single (never married)       single   791
4 Not married  Sep/divorced                            Divorced sep_divorced   200
5 Not married  Sep/divorced                           Separated sep_divorced   352
6 Not married       Widowed                             Widowed      widowed  1195
# verify
dto[["unitData"]][["tilda"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, SOCMARRIED, MAR_4, CS006, marital)
                   id  SOCMARRIED         MAR_4                  CS006      marital
1  10321                  Married       Married                Married    mar_cohab
2  24611                  Married       Married                Married    mar_cohab
3  138652                 Married       Married                Married    mar_cohab
4  207651             Not married       Widowed                Widowed      widowed
5  224891             Not married Never married Single (never married)       single
6  243411             Not married  Sep/divorced              Separated sep_divorced
7  325612                 Married       Married                Married    mar_cohab
8  329612                 Married       Married                Married    mar_cohab
9  445201             Not married       Widowed                Widowed      widowed
10 475571                 Married       Married                Married    mar_cohab

(II.C) single

Target (2) : single

  • 0 - FALSE - Reference group
  • 1 - TRUE - Risk factor

ALSA

Items that can contribute to generating values for the harmonized variable single are:

dto[["metaData"]] %>%
  dplyr::filter(study_name=="alsa", construct %in% c("marital")) %>%
  dplyr::select(study_name, name, label,categories)
  study_name    name          label categories
1       alsa MARITST Marital status          7

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
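The same h-rule path (h-rules-marital-alsa.csv) is reused here with `harmony_name = "single"`, which suggests that one CSV holds a column per target and the argument selects which column is returned. That reading is an assumption; the sketch below illustrates it with hypothetical rows and a base-R lookup.

```r
# Hypothetical h-rule file for ALSA holding BOTH target columns
hrule <- data.frame(
  MARITST = c("De facto", "Divorced", "Married", "Never married", "Separated", "Widowed"),
  marital = c("mar_cohab", "sep_divorced", "mar_cohab", "single", "sep_divorced", "widowed"),
  single  = c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

raw <- c("Married", "Widowed", "Never married", NA)
idx <- match(raw, hrule$MARITST)  # NA where the raw value is missing

# The target name only selects which column of the rule is returned
hrule[["marital"]][idx]
hrule[["single"]][idx]

# Internal consistency of this rule: single is TRUE exactly when marital is not mar_cohab
stopifnot(identical(hrule$single, hrule$marital != "mar_cohab"))
```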

study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("MARITST"), 
  harmony_name = "single"
)
Source: local data frame [7 x 3]
Groups: MARITST [?]

        MARITST single     n
          (chr)  (lgl) (int)
1      De facto  FALSE     6
2      Divorced   TRUE    33
3       Married  FALSE  1361
4 Never married   TRUE    76
5     Separated   TRUE    16
6       Widowed   TRUE   594
7            NA     NA     1
# verify
dto[["unitData"]][["alsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, MARITST, single)
      id       MARITST single
1   4871       Married  FALSE
2   7992       Married  FALSE
3  10311       Widowed   TRUE
4  10391       Married  FALSE
5  11001       Married  FALSE
6  18611       Widowed   TRUE
7  21351 Never married   TRUE
8  21832       Married  FALSE
9  23712       Married  FALSE
10 36961       Married  FALSE

LBSL

Items that can contribute to generating values for the harmonized variable single are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "lbsl", construct == "marital") %>%
  # dplyr::filter(name %in% c("MSTAT94")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name    name            label_short categories
1       lbsl MSTAT94 Marital Status in 1994          6

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("MSTAT94"), 
  harmony_name = "single"
)
Source: local data frame [6 x 3]
Groups: MSTAT94 [?]

    MSTAT94 single     n
      (chr)  (lgl) (int)
1  divorced   TRUE    73
2   married  FALSE   326
3 separated   TRUE     4
4    single   TRUE    22
5   widowed   TRUE   134
6        NA     NA    97
# verify
dto[["unitData"]][["lbsl"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, MSTAT94, single)
        id MSTAT94 single
1  4051023 widowed   TRUE
2  4131200 married  FALSE
3  4202081 widowed   TRUE
4  4212083 married  FALSE
5  4232086 widowed   TRUE
6  4271073 married  FALSE
7  4402047 married  FALSE
8  4402048 married  FALSE
9  4421039 married  FALSE
10 4452038 married  FALSE

SATSA

Items that can contribute to generating values for the harmonized variable single are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "satsa", construct == "marital") %>%
  # dplyr::filter(name %in% c("GMARITAL")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name     name                  label_short categories
1      satsa GMARITAL What is your marital status?          5

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("GMARITAL"), 
  harmony_name = "single"
)
Source: local data frame [5 x 3]
Groups: GMARITAL [?]

                              GMARITAL single     n
                                 (chr)  (lgl) (int)
1                             divorced   TRUE   113
2 married /living together with person  FALSE   961
3                          Not married   TRUE   149
4                        widow/widower   TRUE   259
5                                   NA     NA    15
# verify
dto[["unitData"]][["satsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, GMARITAL, single)
        id                             GMARITAL single
1   152802 married /living together with person  FALSE
2   154522                        widow/widower   TRUE
3   158002 married /living together with person  FALSE
4   158901 married /living together with person  FALSE
5   159461 married /living together with person  FALSE
6   164022                        widow/widower   TRUE
7   173802                          Not married   TRUE
8   190121 married /living together with person  FALSE
9  2154892 married /living together with person  FALSE
10 2395002 married /living together with person  FALSE

SHARE

Items that can contribute to generating values for the harmonized variable single are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "share", construct == "marital") %>%
  # dplyr::filter(name %in% c("DN0140")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name   name    label_short categories
1      share DN0140 Marital Status          9

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "share"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-share.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("DN0140"), 
  harmony_name = "single"
)
Source: local data frame [9 x 3]
Groups: DN0140 [?]

                                   DN0140 single     n
                                    (chr)  (lgl) (int)
1                                divorced   TRUE   140
2                              don't know     NA     1
3 married and living together with spouse  FALSE  2039
4   married, living separated from spouse   TRUE    19
5                           never married   TRUE    51
6                                 refusal     NA     1
7                  registered partnership  FALSE    10
8                                 widowed   TRUE   336
9                                      NA     NA     1
# verify
dto[["unitData"]][["share"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, DN0140, single)
             id                                  DN0140 single
1  2.505215e+12                                 widowed   TRUE
2  2.505232e+12 married and living together with spouse  FALSE
3  2.505244e+12                                 widowed   TRUE
4  2.505248e+12 married and living together with spouse  FALSE
5  2.505250e+12 married and living together with spouse  FALSE
6  2.505259e+12 married and living together with spouse  FALSE
7  2.505268e+12 married and living together with spouse  FALSE
8  2.505271e+12 married and living together with spouse  FALSE
9  2.605254e+12 married and living together with spouse  FALSE
10 2.605286e+12 married and living together with spouse  FALSE

TILDA

Items that can contribute to generating values for the harmonized variable single are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "tilda", construct == "marital") %>%
  # dplyr::filter(name %in% c("SOCMARRIED", "CS006", "MAR_4")) %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name       name label_short categories
1      tilda SOCMARRIED                      2
2      tilda      CS006                      6
3      tilda      MAR_4                      4

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("SOCMARRIED", "MAR_4", "CS006"), 
  harmony_name = "single"
)
Source: local data frame [6 x 5]
Groups: SOCMARRIED, MAR_4, CS006 [?]

   SOCMARRIED         MAR_4                               CS006 single     n
        (chr)         (chr)                               (chr)  (lgl) (int)
1     Married       Married Living with a partner as if married  FALSE   218
2     Married       Married                             Married  FALSE  5748
3 Not married Never married              Single (never married)   TRUE   791
4 Not married  Sep/divorced                            Divorced   TRUE   200
5 Not married  Sep/divorced                           Separated   TRUE   352
6 Not married       Widowed                             Widowed   TRUE  1195
# verify
dto[["unitData"]][["tilda"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, SOCMARRIED, MAR_4, CS006, single)
                   id  SOCMARRIED         MAR_4                  CS006 single
1  68771                  Married       Married                Married  FALSE
2  147681             Not married       Widowed                Widowed   TRUE
3  335091                 Married       Married                Married  FALSE
4  361322                 Married       Married                Married  FALSE
5  378751                 Married       Married                Married  FALSE
6  453071                 Married       Married                Married  FALSE
7  558521                 Married       Married                Married  FALSE
8  564651             Not married       Widowed                Widowed   TRUE
9  582481             Not married Never married Single (never married)   TRUE
10 610111                 Married       Married                Married  FALSE

(III) Recapitulation

At this point the dto[["unitData"]] elements (raw data files for each study) have been augmented with the harmonized variables marital and single. We retrieve the harmonized variables to view frequency counts across studies:

dumlist <- list()
for(s in dto[["studyName"]]){
  ds <- dto[["unitData"]][[s]]
  dumlist[[s]] <- ds[,c("id","marital","single")]
}
ds <- plyr::ldply(dumlist,data.frame,.id = "study_name")
head(ds)
  study_name  id   marital single
1       alsa  41 mar_cohab  FALSE
2       alsa  42 mar_cohab  FALSE
3       alsa  61   widowed   TRUE
4       alsa  71   widowed   TRUE
5       alsa  91   widowed   TRUE
6       alsa 121   widowed   TRUE
ds$id <- seq_len(nrow(ds)) # id values may repeat across studies, so replace them with a unique row index
table( ds$marital, ds$study_name, useNA = "always")
              
               alsa lbsl satsa share tilda <NA>
  mar_cohab    1367  326   961  2049  5966    0
  sep_divorced   49   77   113   159   552    0
  single         76   22   149    51   791    0
  widowed       594  134   259   336  1195    0
  <NA>            1   97    15     3     0    0
table( ds$single, ds$study_name, useNA = "always")
       
        alsa lbsl satsa share tilda <NA>
  FALSE 1367  326   961  2049  5966    0
  TRUE   719  233   521   546  2538    0
  <NA>     1   97    15     3     0    0
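The two tables can be checked against each other: if `single` is TRUE exactly when marital is not mar_cohab, then each study's TRUE count must equal the sum of its sep_divorced, single, and widowed counts, and its FALSE count must equal its mar_cohab count. The sketch below copies the counts from the tables above and asserts both identities.

```r
studies <- c("alsa", "lbsl", "satsa", "share", "tilda")

# Counts copied from the marital table above (rows: harmonized category)
marital_tab <- rbind(
  mar_cohab    = c(1367, 326, 961, 2049, 5966),
  sep_divorced = c(49, 77, 113, 159, 552),
  single       = c(76, 22, 149, 51, 791),
  widowed      = c(594, 134, 259, 336, 1195)
)
colnames(marital_tab) <- studies

# Counts copied from the single table above
single_true  <- c(719, 233, 521, 546, 2538)
single_false <- c(1367, 326, 961, 2049, 5966)

# TRUE = every category except mar_cohab; FALSE = mar_cohab
implied_true <- colSums(marital_tab[c("sep_divorced", "single", "widowed"), ])
stopifnot(all(implied_true == single_true))
stopifnot(all(marital_tab["mar_cohab", ] == single_false))
```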

Finally, having added the newly created harmonized variables to the raw source objects, we save the data transfer object.

# Save as a compressed, binary R dataset. It is no longer readable with a text editor, but it preserves metadata (e.g., factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")
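The claim about preserved metadata can be verified with a round trip. A minimal sketch, assuming a toy list in place of the real DTO and a temporary file instead of ./data/unshared/derived/dto.rds:

```r
# Toy object standing in for the DTO (hypothetical content)
obj <- list(
  studyName = c("alsa", "lbsl"),
  unitData  = list(alsa = data.frame(
    id      = c(41, 42),
    marital = factor(c("mar_cohab", "widowed"),
                     levels = c("single", "mar_cohab", "sep_divorced", "widowed"))
  ))
)

path <- file.path(tempdir(), "dto-sketch.rds")
saveRDS(obj, file = path, compress = "xz")
back <- readRDS(path)

# Factor levels, including their order, survive the round trip (a CSV would lose them)
stopifnot(identical(back, obj))
levels(back$unitData$alsa$marital)
```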