This report lists the candidate variables for the DataSchema variables of the construct marital.
This report is a record of interaction with a data transfer object (dto) produced by ./manipulation/0-ellis-island.R.
The next section recaps this script, exposes the architecture of the DTO, and demonstrates the language of interacting with it.
All data land on Ellis Island.
The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:
./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath" "unitData" "metaData"
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav" "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"
[3] "./data/unshared/raw/SATSA-Q3.Final.sav" "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"
# 3rd element - a list object containing the following elements
names(dto[["unitData"]])
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])
Source: local data frame [656 x 31]
id AGE94 SEX94 MSTAT94 EDUC94 NOWRK94 SMK94 SMOKE
(int) (int) (int) (fctr) (int) (fctr) (fctr) (fctr)
1 4001026 68 1 divorced 16 no, retired no never smoked
2 4012015 94 2 widowed 12 no, retired no never smoked
3 4012032 94 2 widowed 20 no, retired no don't smoke at present but smoked in the past
4 4022004 93 2 NA NA NA NA never smoked
5 4022026 93 2 widowed 12 no, retired no never smoked
6 4031031 92 1 married 8 no, retired no don't smoke at present but smoked in the past
7 4031035 92 1 widowed 13 no, retired no don't smoke at present but smoked in the past
8 4032201 92 2 NA NA NA NA don't smoke at present but smoked in the past
9 4041062 91 1 widowed 7 NA no don't smoke at present but smoked in the past
10 4042057 91 2 NA NA NA NA NA
.. ... ... ... ... ... ... ... ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
(int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
year_born (dbl), female (lgl)
# 4th element - a dataset of names and labels of raw variables + added metadata for all studies
dto[["metaData"]] %>%
  dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>%
  DT::datatable(
    class = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
    filter = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )
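To make the architecture of the DTO concrete, here is a minimal sketch of a toy object with the same four elements, followed by the typical lookup pattern: find the candidate variables in the metadata, then pull the matching columns from the unit data. The study names (alpha, beta), variable names, and values are invented for illustration and do not come from the real dto.

```r
# Toy DTO mirroring the four elements listed above (all values are made up).
toy_dto <- list(
  studyName = c("alpha", "beta"),
  filePath  = c("./alpha.sav", "./beta.sav"),
  unitData  = list(
    alpha = data.frame(id = 1:3, MSTAT = c("married", "widowed", NA),
                       stringsAsFactors = FALSE),
    beta  = data.frame(id = 1:2, MARITAL = c("single", "married"),
                       stringsAsFactors = FALSE)
  ),
  metaData  = data.frame(
    study_name = c("alpha", "beta"),
    name       = c("MSTAT", "MARITAL"),
    construct  = c("marital", "marital"),
    stringsAsFactors = FALSE
  )
)

# Find the raw variable each study contributes to the construct,
# then pull that column from the matching unitData element.
md <- toy_dto[["metaData"]]
candidates <- md[md$construct == "marital", ]
raw_values <- lapply(seq_len(nrow(candidates)), function(i) {
  toy_dto[["unitData"]][[candidates$study_name[i]]][[candidates$name[i]]]
})
names(raw_values) <- candidates$study_name
```

Because the lookup is driven by the metadata table, changing which variables are retrieved means editing the metadata rather than the code.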
Everybody wants to be somebody.
We query the metadata set to retrieve all variables potentially tapping the construct marital. These are the candidates to enter the DataSchema and contribute to computing harmonized variables.
NOTE: what is being retrieved depends on the manually entered values in the column construct of the metadata file ./data/shared/meta-data-map.csv. To specify a different group of variables, edit the metadata, not the script.
meta_data <- dto[["metaData"]] %>%
dplyr::filter(construct %in% c('marital')) %>%
dplyr::select(study_name, name, construct, label_short, categories, url) %>%
dplyr::arrange(construct, study_name)
knitr::kable(meta_data)
study_name | name | construct | label_short | categories | url |
---|---|---|---|---|---|
alsa | MARITST | marital | Marital status | 7 | |
lbsl | MSTAT94 | marital | Marital Status in 1994 | 6 | |
satsa | GMARITAL | marital | What is your marital status? | 5 | |
share | DN0140 | marital | Marital Status | 9 | |
tilda | SOCMARRIED | marital | | 2 | |
tilda | CS006 | marital | | 6 | |
tilda | MAR_4 | marital | | 4 | |
View descriptives: marital for closer examination of each candidate.
The responses to variables loading on the construct marital are as such:
After reorganizing the possible responses, the following clustering has emerged:
marital harmonized
After reviewing descriptives and relevant codebooks, the following operationalization of the harmonized variables for marital has been adopted:
marital
-1
- mar_cohab
- married or cohabiting
0
- single
- not married - REFERENCE
1
- sep_divorced
- separated or divorced
2
- widowed
- widowed
single
0
- FALSE
- Reference group
1
- TRUE
- Risk factor
These variables will be generated next, in the Development section.
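The list above names single (not married) as the reference category of marital. As a hedged illustration only, one way to carry that choice into later modelling is to store the harmonized values as a factor whose first level is the reference. The input vector below is made up, and the dichotomy follows the pattern visible in the recoding tables of this report, where only mar_cohab maps to FALSE.

```r
# Made-up harmonized values; the level names come from the list above.
marital <- c("mar_cohab", "single", "widowed", "sep_divorced", NA, "mar_cohab")

# Put the reference category first so that default treatment contrasts
# compare every other level against "single".
marital_f <- factor(
  marital,
  levels = c("single", "mar_cohab", "sep_divorced", "widowed")
)

# Dichotomous companion variable: FALSE for married or cohabiting,
# TRUE otherwise; missing values stay missing.
single <- marital != "mar_cohab"
```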
The particular goal of this section is to ensure that the schema to encode the values for the marital variable is consistent across studies.
In this section we will define the schema sets for harmonizing the marital construct (i.e. specify which variables from which studies will contribute to computing harmonized variables). Each of these schema sets will have a particular pattern of possible response values to these variables, which we will export for inspection as .csv tables. We will then manually edit these .csv tables, populating new columns that map values of harmonized variables to the specific response pattern of the schema set variables. Finally, we will import the harmonization algorithms encoded in the .csv tables and apply them to compute harmonized variables in the dataset combining raw and harmonized variables for the marital construct across studies.
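The export step described above can be sketched as follows. This is a minimal base R illustration under stated assumptions, not the project's own script: the data frame, the variable name MSTAT, and the file name are invented, and a temporary directory stands in for the real output folder.

```r
# Made-up raw data for one hypothetical study.
raw <- data.frame(
  id    = 1:6,
  MSTAT = c("married", "widowed", "married", NA, "divorced", "widowed"),
  stringsAsFactors = FALSE
)

# Response profile: every observed value (including NA) with its count.
values  <- raw$MSTAT
profile <- data.frame(MSTAT = unique(values), stringsAsFactors = FALSE)
profile$n <- vapply(
  profile$MSTAT,
  function(v) sum(values %in% v),   # %in% matches NA to NA
  integer(1),
  USE.NAMES = FALSE
)

# Empty column to be populated by hand with harmonized values.
profile$marital <- NA_character_

# Export for manual editing (a temporary path replaces the real one).
profile_path <- file.path(tempdir(), "response-profile-sketch.csv")
write.csv(profile, profile_path, row.names = FALSE)

# Reading the file back shows what the person editing it starts from.
to_edit <- read.csv(profile_path, stringsAsFactors = FALSE)
```

Keeping the counts next to each response makes it easier to spot rare categories while filling in the harmonized column.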
Having all potential variables in categorical format, we have defined the sets of data schema variables thus:
Each of these schema sets has a particular pattern of possible response values, for example:
We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.
You can examine them in `./data/meta/response-profiles-live/`
marital
marital
1
- mar_cohab
- married or cohabiting
2
- sep_divorced
- separated or divorced
3
- single
- not married
4
- widowed
- widowed
Items that can contribute to generating values for the harmonized variable marital are:
dto[["metaData"]] %>%
dplyr::filter(study_name=="alsa", construct %in% c("marital")) %>%
dplyr::select(study_name, name, label,categories)
study_name name label categories
1 alsa MARITST Marital status 7
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
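The function recode_with_hrule() is defined elsewhere in the project and its internals are not shown in this report. As a rough sketch of the idea only, the hypothetical helper apply_hrule_sketch() below looks up each row's raw response pattern in a rule table and appends the harmonized value. The helper name, the data, and the rule table are all assumptions made for illustration.

```r
# Hypothetical stand-in for the project's recode_with_hrule();
# it is NOT the real implementation.
apply_hrule_sketch <- function(data, hrule, variable_names, harmony_name) {
  # One lookup key per row, built from the raw response pattern.
  # Caveat of this sketch: a missing value and the literal string "NA"
  # produce the same key.
  make_key <- function(d) {
    do.call(paste, c(unname(as.list(d[variable_names])), sep = "||"))
  }
  idx <- match(make_key(data), make_key(hrule))
  data[[harmony_name]] <- hrule[[harmony_name]][idx]
  data
}

# Made-up raw data with a two-variable response pattern.
raw <- data.frame(
  id      = 1:4,
  MARRIED = c("Married", "Not married", "Not married", "Married"),
  DETAIL  = c("Married", "Widowed", "Divorced", "Living with a partner"),
  stringsAsFactors = FALSE
)

# Made-up rule table, as it might look after manual editing of the .csv.
hrule <- data.frame(
  MARRIED = c("Married", "Married", "Not married", "Not married"),
  DETAIL  = c("Married", "Living with a partner", "Widowed", "Divorced"),
  marital = c("mar_cohab", "mar_cohab", "widowed", "sep_divorced"),
  stringsAsFactors = FALSE
)

recoded <- apply_hrule_sketch(raw, hrule, c("MARRIED", "DETAIL"), "marital")
```

Response patterns absent from the rule table come out as NA, which makes unmapped combinations easy to find.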
study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("MARITST"),
harmony_name = "marital"
)
Source: local data frame [7 x 3]
Groups: MARITST [?]
MARITST marital n
(chr) (chr) (int)
1 De facto mar_cohab 6
2 Divorced sep_divorced 33
3 Married mar_cohab 1361
4 Never married single 76
5 Separated sep_divorced 16
6 Widowed widowed 594
7 NA NA 1
# verify
dto[["unitData"]][["alsa"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "MARITST","marital")
id MARITST marital
1 6891 Widowed widowed
2 6981 Married mar_cohab
3 8212 Married mar_cohab
4 10781 Married mar_cohab
5 12902 Married mar_cohab
6 13291 Married mar_cohab
7 14081 Married mar_cohab
8 15041 Widowed widowed
9 21641 Married mar_cohab
10 22881 Married mar_cohab
Items that can contribute to generating values for the harmonized variable marital are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "lbsl", construct == "marital") %>%
# dplyr::filter(name %in% c("MSTAT94")) %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 lbsl MSTAT94 Marital Status in 1994 6
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("MSTAT94"),
harmony_name = "marital"
)
Source: local data frame [6 x 3]
Groups: MSTAT94 [?]
MSTAT94 marital n
(chr) (chr) (int)
1 divorced sep_divorced 73
2 married mar_cohab 326
3 separated sep_divorced 4
4 single single 22
5 widowed widowed 134
6 NA NA 97
# verify
dto[["unitData"]][["lbsl"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "MSTAT94", "marital")
id MSTAT94 marital
1 4132095 <NA> <NA>
2 4191087 married mar_cohab
3 4191200 married mar_cohab
4 4261081 <NA> <NA>
5 4271074 divorced sep_divorced
6 4311082 married mar_cohab
7 4421013 <NA> <NA>
8 4452040 <NA> <NA>
9 4541001 divorced sep_divorced
10 4562003 <NA> <NA>
Items that can contribute to generating values for the harmonized variable marital are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "satsa", construct == "marital") %>%
# dplyr::filter(name %in% c("GMARITAL")) %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 satsa GMARITAL What is your marital status? 5
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("GMARITAL"),
harmony_name = "marital"
)
Source: local data frame [5 x 3]
Groups: GMARITAL [?]
GMARITAL marital n
(chr) (chr) (int)
1 divorced sep_divorced 113
2 married /living together with person mar_cohab 961
3 Not married single 149
4 widow/widower widowed 259
5 NA NA 15
# verify
dto[["unitData"]][["satsa"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "GMARITAL", "marital")
id GMARITAL marital
1 19612 widow/widower widowed
2 133522 widow/widower widowed
3 150011 widow/widower widowed
4 154632 married /living together with person mar_cohab
5 163402 widow/widower widowed
6 164321 married /living together with person mar_cohab
7 2212402 Not married single
8 2239662 married /living together with person mar_cohab
9 2432001 married /living together with person mar_cohab
10 2445412 married /living together with person mar_cohab
Items that can contribute to generating values for the harmonized variable marital are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "tilda", construct == "marital") %>%
# dplyr::filter(name %in% c("SOCMARRIED", "MAR_4", "CS006")) %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 tilda SOCMARRIED 2
2 tilda CS006 6
3 tilda MAR_4 4
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("SOCMARRIED", "MAR_4", "CS006"),
harmony_name = "marital"
)
Source: local data frame [6 x 5]
Groups: SOCMARRIED, MAR_4, CS006 [?]
SOCMARRIED MAR_4 CS006 marital n
(chr) (chr) (chr) (chr) (int)
1 Married Married Living with a partner as if married mar_cohab 218
2 Married Married Married mar_cohab 5748
3 Not married Never married Single (never married) single 791
4 Not married Sep/divorced Divorced sep_divorced 200
5 Not married Sep/divorced Separated sep_divorced 352
6 Not married Widowed Widowed widowed 1195
# verify
dto[["unitData"]][["tilda"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "SOCMARRIED", "MAR_4", "CS006", "marital")
id SOCMARRIED MAR_4 CS006 marital
1 10321 Married Married Married mar_cohab
2 24611 Married Married Married mar_cohab
3 138652 Married Married Married mar_cohab
4 207651 Not married Widowed Widowed widowed
5 224891 Not married Never married Single (never married) single
6 243411 Not married Sep/divorced Separated sep_divorced
7 325612 Married Married Married mar_cohab
8 329612 Married Married Married mar_cohab
9 445201 Not married Widowed Widowed widowed
10 475571 Married Married Married mar_cohab
single
single
0
- FALSE
- Reference group
1
- TRUE
- Risk factor
Items that can contribute to generating values for the harmonized variable single are:
dto[["metaData"]] %>%
dplyr::filter(study_name=="alsa", construct %in% c("marital")) %>%
dplyr::select(study_name, name, label,categories)
study_name name label categories
1 alsa MARITST Marital status 7
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("MARITST"),
harmony_name = "single"
)
Source: local data frame [7 x 3]
Groups: MARITST [?]
MARITST single n
(chr) (lgl) (int)
1 De facto FALSE 6
2 Divorced TRUE 33
3 Married FALSE 1361
4 Never married TRUE 76
5 Separated TRUE 16
6 Widowed TRUE 594
7 NA NA 1
# verify
dto[["unitData"]][["alsa"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "MARITST","single")
id MARITST single
1 4871 Married FALSE
2 7992 Married FALSE
3 10311 Widowed TRUE
4 10391 Married FALSE
5 11001 Married FALSE
6 18611 Widowed TRUE
7 21351 Never married TRUE
8 21832 Married FALSE
9 23712 Married FALSE
10 36961 Married FALSE
Items that can contribute to generating values for the harmonized variable single are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "lbsl", construct == "marital") %>%
# dplyr::filter(name %in% c("MSTAT94")) %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 lbsl MSTAT94 Marital Status in 1994 6
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("MSTAT94"),
harmony_name = "single"
)
Source: local data frame [6 x 3]
Groups: MSTAT94 [?]
MSTAT94 single n
(chr) (lgl) (int)
1 divorced TRUE 73
2 married FALSE 326
3 separated TRUE 4
4 single TRUE 22
5 widowed TRUE 134
6 NA NA 97
# verify
dto[["unitData"]][["lbsl"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "MSTAT94", "single")
id MSTAT94 single
1 4051023 widowed TRUE
2 4131200 married FALSE
3 4202081 widowed TRUE
4 4212083 married FALSE
5 4232086 widowed TRUE
6 4271073 married FALSE
7 4402047 married FALSE
8 4402048 married FALSE
9 4421039 married FALSE
10 4452038 married FALSE
Items that can contribute to generating values for the harmonized variable single are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "satsa", construct == "marital") %>%
# dplyr::filter(name %in% c("GMARITAL")) %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 satsa GMARITAL What is your marital status? 5
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("GMARITAL"),
harmony_name = "single"
)
Source: local data frame [5 x 3]
Groups: GMARITAL [?]
GMARITAL single n
(chr) (lgl) (int)
1 divorced TRUE 113
2 married /living together with person FALSE 961
3 Not married TRUE 149
4 widow/widower TRUE 259
5 NA NA 15
# verify
dto[["unitData"]][["satsa"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "GMARITAL", "single")
id GMARITAL single
1 152802 married /living together with person FALSE
2 154522 widow/widower TRUE
3 158002 married /living together with person FALSE
4 158901 married /living together with person FALSE
5 159461 married /living together with person FALSE
6 164022 widow/widower TRUE
7 173802 Not married TRUE
8 190121 married /living together with person FALSE
9 2154892 married /living together with person FALSE
10 2395002 married /living together with person FALSE
Items that can contribute to generating values for the harmonized variable single are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "tilda", construct == "marital") %>%
# dplyr::filter(name %in% c("SOCMARRIED", "MAR_4", "CS006")) %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 tilda SOCMARRIED 2
2 tilda CS006 6
3 tilda MAR_4 4
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-marital-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("SOCMARRIED", "MAR_4", "CS006"),
harmony_name = "single"
)
Source: local data frame [6 x 5]
Groups: SOCMARRIED, MAR_4, CS006 [?]
SOCMARRIED MAR_4 CS006 single n
(chr) (chr) (chr) (lgl) (int)
1 Married Married Living with a partner as if married FALSE 218
2 Married Married Married FALSE 5748
3 Not married Never married Single (never married) TRUE 791
4 Not married Sep/divorced Divorced TRUE 200
5 Not married Sep/divorced Separated TRUE 352
6 Not married Widowed Widowed TRUE 1195
# verify
dto[["unitData"]][["tilda"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "SOCMARRIED", "MAR_4", "CS006", "single")
id SOCMARRIED MAR_4 CS006 single
1 68771 Married Married Married FALSE
2 147681 Not married Widowed Widowed TRUE
3 335091 Married Married Married FALSE
4 361322 Married Married Married FALSE
5 378751 Married Married Married FALSE
6 453071 Married Married Married FALSE
7 558521 Married Married Married FALSE
8 564651 Not married Widowed Widowed TRUE
9 582481 Not married Never married Single (never married) TRUE
10 610111 Married Married Married FALSE
At this point the dto[["unitData"]] elements (raw data files for each study) have been augmented with the harmonized variables marital and single. We retrieve the harmonized variables to view frequency counts across studies:
dumlist <- list()
for(s in dto[["studyName"]]){
ds <- dto[["unitData"]][[s]]
dumlist[[s]] <- ds[,c("id","marital","single")]
}
ds <- plyr::ldply(dumlist,data.frame,.id = "study_name")
head(ds)
study_name id marital single
1 alsa 41 mar_cohab FALSE
2 alsa 42 mar_cohab FALSE
3 alsa 61 widowed TRUE
4 alsa 71 widowed TRUE
5 alsa 91 widowed TRUE
6 alsa 121 widowed TRUE
ds$id <- 1:nrow(ds) # some ids values might be identical, replace
table( ds$marital, ds$study_name, useNA = "always")
alsa lbsl satsa share tilda <NA>
mar_cohab 1367 326 961 2049 5966 0
sep_divorced 49 77 113 159 552 0
single 76 22 149 51 791 0
widowed 594 134 259 336 1195 0
<NA> 1 97 15 3 0 0
table( ds$single, ds$study_name, useNA = "always")
alsa lbsl satsa share tilda <NA>
FALSE 1367 326 961 2049 5966 0
TRUE 719 233 521 546 2538 0
<NA> 1 97 15 3 0 0
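The two tables above can be cross-checked against each other. The sketch below re-enters the non-missing counts from the marital table by hand, computes the within-study distribution, and verifies that the TRUE counts of single equal the sum of all categories other than mar_cohab. The object names are illustrative; the counts are copied from this report.

```r
# Counts copied from the marital-by-study table above (missing values excluded).
counts <- matrix(
  c(1367, 326, 961, 2049, 5966,
      49,  77, 113,  159,  552,
      76,  22, 149,   51,  791,
     594, 134, 259,  336, 1195),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    marital = c("mar_cohab", "sep_divorced", "single", "widowed"),
    study   = c("alsa", "lbsl", "satsa", "share", "tilda")
  )
)

# Column-wise proportions: distribution of marital status within each study.
within_study <- round(prop.table(counts, margin = 2), 3)

# TRUE counts of `single` implied by the marital table.
single_true <- colSums(counts[rownames(counts) != "mar_cohab", ])

# Compare with the TRUE row of the `single` table above.
tables_agree <- all(single_true == c(719, 233, 521, 546, 2538))
```

Proportions rather than raw counts make the studies comparable despite their very different sample sizes.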
Finally, having added the newly created, harmonized variables to the raw source objects, we save the data transfer object.
# Save as a compressed, binary R dataset. It's no longer readable with a text editor, but it saves metadata (e.g., factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")
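As a small illustration of why the binary format is used, the sketch below writes a toy list to a temporary .rds file and reads it back, checking that the factor levels survive the round trip. The toy object and the temporary path are assumptions for the example; the real dto is not touched.

```r
# Toy object with a factor column (values are made up).
toy <- list(
  studyName = "alpha",
  unitData  = list(
    alpha = data.frame(
      id      = 1:3,
      marital = factor(
        c("single", "widowed", NA),
        levels = c("single", "mar_cohab", "sep_divorced", "widowed")
      )
    )
  )
)

# Write to a temporary file with the same compression as above, then read back.
path <- tempfile(fileext = ".rds")
saveRDS(toy, file = path, compress = "xz")
restored <- readRDS(path)

# The restored object should match the original, unused factor levels included.
round_trip_ok   <- identical(toy, restored)
restored_levels <- levels(restored[["unitData"]][["alpha"]]$marital)
```

A .csv export of the same data would keep the values but lose the level set and its ordering.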