This report lists the candidate variables for the DataSchema variables of the construct education.

(I) Exposition

This report is a record of interaction with a data transfer object (DTO) produced by ./manipulation/0-ellis-island.R.

The next section recaps this script, exposes the architecture of the DTO, and demonstrates the language of interacting with it.

(I.A) Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

    1. Reads in raw data files from the candidate studies.
    2. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
    3. Augments the raw metadata with instructions for renaming and classifying variables. The instructions are provided as manually entered values in ./data/shared/meta-data-map.csv and are used by automatic scripts in later harmonization and analysis.
    4. Combines unit data and metadata into a single DTO that serves as the starting point for all subsequent analyses.
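The four-part architecture described above can be mocked with a plain R list. This is a minimal sketch. Only the element names and file paths mirror the real DTO. The study rows, the toy object name `dto_toy`, and the consistency check are illustrative assumptions, not part of 0-ellis-island.R.

```r
# Toy stand-in for the DTO (hypothetical values; only the four element
# names and the file paths mirror the real object)
dto_toy <- list(
  studyName = c("alsa", "lbsl"),
  filePath  = c("./data/unshared/raw/ALSA-Wave1.Final.sav",
                "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"),
  unitData  = list(
    alsa = data.frame(id = c(1L, 2L), SCHOOL = c("Fifteen years", NA),
                      stringsAsFactors = FALSE),
    lbsl = data.frame(id = c(10L, 11L), EDUC94 = c(12L, 16L))
  ),
  metaData  = data.frame(
    study_name = c("alsa", "lbsl"),
    name       = c("SCHOOL", "EDUC94"),
    construct  = c("education", "education"),
    stringsAsFactors = FALSE
  )
)

# unitData is keyed by study name, so its names line up with studyName
print(identical(names(dto_toy$unitData), dto_toy$studyName))

# Every variable listed in metaData should exist in its study's unit data
found <- mapply(
  function(study, var) var %in% names(dto_toy$unitData[[study]]),
  dto_toy$metaData$study_name, dto_toy$metaData$name
)
print(all(found))
```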
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath"  "unitData"  "metaData" 
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"       
# 3rd element - a list object containing the following elements
names(dto[["unitData"]])
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]]) 
Source: local data frame [656 x 33]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
  year_born (dbl), female (lgl), marital (chr), single (lgl)

Meta

# 4th element - a data set of names and labels of raw variables, plus added metadata, for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>% 
  DT::datatable(
    class   = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
    filter  = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )

(I.B) Target-H

Everybody wants to be somebody.

We query the metadata set to retrieve all variables potentially tapping the construct education. These are the candidates to enter the DataSchema and contribute to computing harmonized variables.

NOTE: What is retrieved depends on the manually entered values in the construct column of the metadata file ./data/shared/meta-data-map.csv. To select a different group of variables, edit the metadata, not the script.

meta_data <- dto[["metaData"]] %>%
  dplyr::filter(construct %in% c('education')) %>% 
  dplyr::select(study_name, name, construct, label_short, categories, url) %>%
  dplyr::arrange(construct, study_name)
knitr::kable(meta_data)
| study_name | name     | construct | label_short                            | categories | url |
|:-----------|:---------|:----------|:---------------------------------------|-----------:|:----|
| alsa       | SCHOOL   | education | Age left school                        |          8 |     |
| alsa       | TYPQUAL  | education | Highest qualification                  |         10 |     |
| lbsl       | EDUC94   | education | Years of school completed              |         18 |     |
| satsa      | EDUC     | education | Education                              |          4 |     |
| share      | DN0100   | education | Edcuation                              |         13 |     |
| share      | DN012D01 | education | yeshiva, religious high institution    |         NA |     |
| share      | DN012D02 | education | nursing school                         |         NA |     |
| share      | DN012D03 | education | polytechnic                            |         NA |     |
| share      | DN012D04 | education | university, Bachelors degree           |         NA |     |
| share      | DN012D05 | education | university, graduate degree            |         NA |     |
| share      | DN012D09 | education | still in further education or training |         NA |     |
| share      | DN012DNO | education | no further education                   |         NA |     |
| share      | DN012DOT | education | other further education                |         NA |     |
| share      | DN012DRF | education | refused                                |         NA |     |
| share      | DN012DDK | education | dont know                              |         NA |     |
| tilda      | DM001    | education |                                        |          4 |     |

See the report "descriptives : education" for a closer examination of each candidate.

After reviewing the descriptives and the relevant codebooks, we adopted the following operationalization of the harmonized variable for education:

Target : educ3

  • -1 - less than high school
  • 0 - high school - REFERENCE group
  • 1 - more than high school
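The coding above can be carried in R as a labelled factor. This is a minimal sketch on invented codes. The variable names are illustrative, and `relevel()` is one way to make "high school" the reference group that treatment contrasts (for example in `lm()`) compare against.

```r
# Hypothetical codes for five respondents, using the educ3 scheme:
# -1 = less than high school, 0 = high school, 1 = more than high school
educ3_code <- c(0L, -1L, 1L, 0L, NA)

# Attach labels; the level order follows the numeric codes
educ3 <- factor(educ3_code,
                levels = c(-1L, 0L, 1L),
                labels = c("less than high school", "high school",
                           "more than high school"))

# Move "high school" to the first level so that treatment contrasts
# compare each other group against it
educ3_ref <- relevel(educ3, ref = "high school")
print(levels(educ3_ref))
print(table(educ3_ref, useNA = "ifany"))
```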

These variables will be generated next, in the Development section.

(II) Development

The particular goal of this section is to ensure that the schema encoding the values of the education variable is consistent across studies.

In this section we define the schema sets for harmonizing the education construct, that is, we specify which variables from which studies will contribute to computing the harmonized variables. Each schema set has a particular pattern of possible response values, which we export for inspection as .csv tables. We then manually edit these tables, populating new columns that map values of the harmonized variables to each response pattern of the schema set variables. Finally, we import the harmonization algorithms encoded in these .csv tables and apply them to compute harmonized variables, producing a data set that combines raw and harmonized variables for the education construct across studies.
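The first step of that workflow, deriving the response profile of a schema set, can be sketched in base R. The original pipeline appears to use dplyr grouping, since its profiles print with a `Groups:` header. This version uses `aggregate()` so it runs without extra packages. The rows are invented, and only the variable names SCHOOL and TYPQUAL come from ALSA.

```r
# Toy unit data for one study (invented rows, ALSA-style variable names)
unit <- data.frame(
  id      = 1:6,
  SCHOOL  = c("Fourteen years", "Fourteen years", "Sixteen years",
              "Under fourteen years", NA, "Sixteen years"),
  TYPQUAL = c(NA, "Trade or Apprenticeship", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

schema_set <- c("SCHOOL", "TYPQUAL")

# Response profile: every observed combination of the schema-set variables
# and how many respondents gave it. NA is recoded to the string "NA" first,
# because aggregate() would otherwise drop incomplete rows.
profile_input <- unit[schema_set]
profile_input[is.na(profile_input)] <- "NA"
profile_input$n <- 1L
profile <- aggregate(n ~ SCHOOL + TYPQUAL, data = profile_input, FUN = sum)
print(profile)
```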

(II.A)

(1) Schema sets

Having all potential variables in categorical format, we defined the sets of data schema variables as follows:

Each of these schema sets has a particular pattern of possible response values, for example:

We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.

You can examine them in `./data/meta/response-profiles-live/`.
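The export step can be sketched as a round trip through a .csv file with an empty column reserved for manual editing. This is an illustration only. The file name is hypothetical and is written to `tempdir()` rather than the project folder. The few rows mimic the LBSL profile, and `colClasses`/`na.strings` are set so that codes such as "12" and the literal "NA" survive re-import unchanged.

```r
# A few rows in the shape of a response profile for one schema set
profile <- data.frame(
  EDUC94 = c("8", "12", "16", "NA"),
  n      = c(16L, 170L, 62L, 102L),
  stringsAsFactors = FALSE
)

# Empty column to be filled in by hand with harmonized values
profile$educ3 <- ""

# Hypothetical file name; the project keeps these files under
# ./data/meta/response-profiles-live/
out_path <- file.path(tempdir(), "response-profile-education-lbsl.csv")
write.csv(profile, out_path, row.names = FALSE)

# Read everything back as text so that response codes are not converted
# to numbers and the string "NA" is not turned into a missing value
reread <- read.csv(out_path, colClasses = "character",
                   na.strings = character(0))
print(reread)
```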

(II.B) educ4

Target (1) : educ4

  • 1 - less than high school
  • 2 - high school at most
  • 3 - college
  • 4 - college plus

ALSA

Items that can contribute to generating values for the harmonized variable education are:

dto[["metaData"]] %>%
  dplyr::filter(study_name=="alsa", construct %in% c("education")) %>%
  dplyr::select(study_name, name, label, categories)
  study_name    name                 label categories
1       alsa  SCHOOL       Age left school          8
2       alsa TYPQUAL Highest qualification         10

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
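`recode_with_hrule()` is defined elsewhere in the project and is not shown here. The sketch below is a hypothetical stand-in, `apply_hrule()`. It illustrates the join logic the paragraph describes: the edited rule table is left-joined onto the unit data by the schema-set variables, so each respondent picks up the harmonized value for their response pattern. It assumes a unique `id` column and uses invented rows and a simplified one-variable rule.

```r
# Toy unit data (invented rows) using ALSA-style variable names
unit <- data.frame(
  id     = c(101L, 102L, 103L, 104L),
  SCHOOL = c("Fourteen years", "Under fourteen years", NA, "Sixteen years"),
  stringsAsFactors = FALSE
)

# A rule table as it might look after manual editing of the .csv:
# one row per response pattern plus the harmonized value
hrule <- data.frame(
  SCHOOL = c("Under fourteen years", "Fourteen years", "Sixteen years"),
  educ3  = c("less than high school", "high school", "more than high school"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the project's recode_with_hrule():
# left join that keeps every unit row, then restores the original row order
apply_hrule <- function(d, rule, by) {
  out <- merge(d, rule, by = by, all.x = TRUE, sort = FALSE)
  out <- out[match(d$id, out$id), c(names(d), setdiff(names(rule), by))]
  rownames(out) <- NULL
  out
}

harmonized <- apply_hrule(unit, hrule, by = "SCHOOL")
print(harmonized)
```

Patterns absent from the rule (here the missing SCHOOL value) come back as NA, which matches how unmatched combinations appear in the profiles printed below.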

study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-education-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("SCHOOL","TYPQUAL"), 
  harmony_name = "educ3"
)
Source: local data frame [50 x 4]
Groups: SCHOOL, TYPQUAL [?]

                   SCHOOL                                  TYPQUAL                 educ3     n
                    (chr)                                    (chr)                 (chr) (int)
1  Eighteen or more years Bachelor Degree or Post Graduate Diploma more than high school    31
2  Eighteen or more years                   Certificate or Diploma more than high school    34
3  Eighteen or more years                     Higher Qualification more than high school     6
4  Eighteen or more years                        No Formal Tuition more than high school     1
5  Eighteen or more years                    Primary School Course more than high school     1
6  Eighteen or more years                  Secondary School Course more than high school     1
7  Eighteen or more years                  Trade or Apprenticeship more than high school     6
8  Eighteen or more years                                       NA more than high school    33
9           Fifteen years          Adult Education or Hobby Course more than high school     1
10          Fifteen years Bachelor Degree or Post Graduate Diploma more than high school     6
11          Fifteen years                   Certificate or Diploma more than high school    71
12          Fifteen years                                    Other more than high school     1
13          Fifteen years                  Secondary School Course more than high school     8
14          Fifteen years                  Trade or Apprenticeship more than high school    52
15          Fifteen years                                       NA more than high school   243
16         Fourteen years          Adult Education or Hobby Course           high school     8
17         Fourteen years Bachelor Degree or Post Graduate Diploma           high school     6
18         Fourteen years                   Certificate or Diploma           high school    72
19         Fourteen years                     Higher Qualification           high school     5
20         Fourteen years                        No Formal Tuition           high school     2
21         Fourteen years                                    Other           high school     2
22         Fourteen years                  Secondary School Course           high school     1
23         Fourteen years                  Trade or Apprenticeship           high school   109
24         Fourteen years                                       NA           high school   614
25   Never went to school                   Certificate or Diploma less than high school     4
26   Never went to school                  Trade or Apprenticeship less than high school     1
27   Never went to school                                       NA less than high school    25
28        Seventeen years          Adult Education or Hobby Course less than high school     1
29        Seventeen years Bachelor Degree or Post Graduate Diploma more than high school    22
30        Seventeen years                   Certificate or Diploma more than high school    41
31        Seventeen years                     Higher Qualification more than high school     1
32        Seventeen years                                    Other more than high school     2
33        Seventeen years                  Secondary School Course more than high school     1
34        Seventeen years                  Trade or Apprenticeship more than high school     6
35        Seventeen years                                       NA more than high school    57
36          Sixteen years Bachelor Degree or Post Graduate Diploma more than high school    14
37          Sixteen years                   Certificate or Diploma more than high school    82
38          Sixteen years                     Higher Qualification more than high school     2
39          Sixteen years                  Secondary School Course more than high school     4
40          Sixteen years                  Trade or Apprenticeship more than high school    26
41          Sixteen years                                       NA more than high school   152
42   Under fourteen years          Adult Education or Hobby Course less than high school     1
43   Under fourteen years Bachelor Degree or Post Graduate Diploma less than high school     1
44   Under fourteen years                   Certificate or Diploma less than high school    27
45   Under fourteen years                                    Other less than high school     1
46   Under fourteen years                  Secondary School Course less than high school     2
47   Under fourteen years                  Trade or Apprenticeship less than high school    36
48   Under fourteen years                                       NA less than high school   238
49                     NA                   Certificate or Diploma                    NA     1
50                     NA                                       NA                    NA    25
# verify
dto[["unitData"]][["alsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, SCHOOL, TYPQUAL, educ3)
      id                 SCHOOL TYPQUAL                 educ3
1   2001          Fifteen years    <NA> more than high school
2   3491   Under fourteen years    <NA> less than high school
3   4892         Fourteen years    <NA>           high school
4   5622   Under fourteen years    <NA> less than high school
5  10091 Eighteen or more years    <NA> more than high school
6  12411          Sixteen years    <NA> more than high school
7  22411          Sixteen years    <NA> more than high school
8  23341   Under fourteen years    <NA> less than high school
9  30352          Sixteen years    <NA> more than high school
10 32572          Sixteen years    <NA> more than high school

LBSL

Items that can contribute to generating values for the harmonized variable education are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "lbsl", construct == "education") %>%
  # dplyr::filter(name %in% c("EDUC94")) %>%
  dplyr::select(study_name, name, label_short, categories)
  study_name   name               label_short categories
1       lbsl EDUC94 Years of school completed         18

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
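For LBSL, the mapping that appears in the profile output reduces to thresholds on years of schooling: 11 years or fewer means less than high school, exactly 12 means high school, and 13 or more means more than high school. As a cross-check, and not as a replacement for the .csv rule, the same mapping can be sketched with `cut()`. This assumes EDUC94 holds integer years, as its `(int)` type in the raw data suggests. The example values are invented.

```r
# Years of school completed for a few hypothetical respondents
EDUC94 <- c(4L, 11L, 12L, 13L, 20L, NA)

# right = TRUE gives the intervals (-Inf, 11], (11, 12], (12, Inf]
educ3 <- cut(EDUC94,
             breaks = c(-Inf, 11, 12, Inf),
             labels = c("less than high school", "high school",
                        "more than high school"),
             right  = TRUE)
print(data.frame(EDUC94, educ3))
```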

study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-education-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("EDUC94"), 
  harmony_name = "educ3"
)
Source: local data frame [18 x 3]
Groups: EDUC94 [?]

   EDUC94                 educ3     n
    (chr)                 (chr) (int)
1      10 less than high school    29
2      11 less than high school    18
3      12           high school   170
4      13 more than high school    40
5      14 more than high school    85
6      15 more than high school    37
7      16 more than high school    62
8      17 more than high school    15
9      18 more than high school    28
10     19 more than high school    10
11     20 more than high school    31
12     21 more than high school     1
13     23 more than high school     1
14      4 less than high school     1
15      7 less than high school     6
16      8 less than high school    16
17      9 less than high school     4
18     NA                    NA   102
# verify
dto[["unitData"]][["lbsl"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, EDUC94, educ3)
        id EDUC94                 educ3
1  4092016     14 more than high school
2  4101168     16 more than high school
3  4131061     NA                  <NA>
4  4141203     12           high school
5  4242074     13 more than high school
6  4291048     NA                  <NA>
7  4301087     18 more than high school
8  4312017      8 less than high school
9  4612001     12           high school
10 4612005     13 more than high school

SATSA

Items that can contribute to generating values for the harmonized variable education are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "satsa", construct == "education") %>%
  # dplyr::filter(name %in% c("EDUC")) %>%
  dplyr::select(study_name, name, label_short, categories)
  study_name name label_short categories
1      satsa EDUC   Education          4

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
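With a single categorical item such as SATSA's EDUC, a harmonization rule is essentially a lookup table. The sketch below encodes the four category-to-educ3 pairs shown in the SATSA profile output as a named character vector. The responses are invented, and any label missing from the rule maps to NA.

```r
# Category-to-educ3 pairs as listed in the SATSA profile output
satsa_rule <- c(
  "Elementary school"                           = "less than high school",
  "O-level or vocational school or folk school" = "less than high school",
  "gymnasium (A-level)"                         = "high school",
  "university or higher"                        = "more than high school"
)

# A few hypothetical responses, including a missing value
EDUC <- c("gymnasium (A-level)", "Elementary school", NA,
          "university or higher")

# Indexing the named vector by the responses performs the lookup
educ3 <- unname(satsa_rule[EDUC])
print(educ3)
```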

study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-education-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("EDUC"), 
  harmony_name = "educ3"
)
Source: local data frame [5 x 3]
Groups: EDUC [?]

                                         EDUC                 educ3     n
                                        (chr)                 (chr) (int)
1                           Elementary school less than high school   858
2                         gymnasium (A-level)           high school   121
3 O-level or vocational school or folk school less than high school   381
4                        university or higher more than high school   109
5                                          NA                    NA    28
# verify
dto[["unitData"]][["satsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, EDUC, educ3)
        id                                        EDUC                 educ3
1    24262 O-level or vocational school or folk school less than high school
2   132021                           Elementary school less than high school
3   138291                           Elementary school less than high school
4   150541                        university or higher more than high school
5   165662 O-level or vocational school or folk school less than high school
6   178602                           Elementary school less than high school
7   274701                           Elementary school less than high school
8   294801 O-level or vocational school or folk school less than high school
9   295001                           Elementary school less than high school
10 2151972 O-level or vocational school or folk school less than high school

SHARE

Items that can contribute to generating values for the harmonized variable education are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "share", construct == "education") %>%
  # dplyr::filter(name %in% c("DN0100")) %>%
  dplyr::select(study_name, name, label_short, categories)
   study_name     name                            label_short categories
1       share   DN0100                              Edcuation         13
2       share DN012D01    yeshiva, religious high institution         NA
3       share DN012D02                         nursing school         NA
4       share DN012D03                            polytechnic         NA
5       share DN012D04           university, Bachelors degree         NA
6       share DN012D05            university, graduate degree         NA
7       share DN012D09 still in further education or training         NA
8       share DN012DNO                   no further education         NA
9       share DN012DOT                other further education         NA
10      share DN012DRF                                refused         NA
11      share DN012DDK                              dont know         NA

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
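In SHARE, the schema set pairs DN0100 with checkbox items coded "selected"/"not selected". Before writing the rule, it can help to condense the checkboxes into a single flag. The sketch below is illustrative only. Treating DN012D04 and DN012D05 (the two university items) as the relevant boxes is an assumption made for the example, not the rule stored in h-rules-education-share.csv, and the two respondents are invented.

```r
# Two hypothetical respondents with SHARE-style checkbox items
share <- data.frame(
  DN0100   = c("full Academic secondary school (with matriculation)",
               "Elementary school"),
  DN012D04 = c("selected", "not selected"),
  DN012D05 = c("not selected", "not selected"),
  DN012DNO = c("not selected", "selected"),
  stringsAsFactors = FALSE
)

# Assumption for illustration: the university items mark a degree
university_items <- c("DN012D04", "DN012D05")

# Count ticked university boxes per respondent, then reduce to a flag
ticked <- unname(rowSums(share[university_items] == "selected"))
share$any_university <- ticked > 0
print(share[c("DN0100", "any_university")])
```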

study_name <- "share"
path_to_hrule <- "./data/meta/h-rules/h-rules-education-share.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name,
  variable_names = c("DN0100","DN012D01","DN012D02","DN012D03",
                     "DN012D04","DN012D05","DN012D09", "DN012DNO", "DN012DOT",
                     "DN012DRF", "DN012DDK"),
  harmony_name = "educ3"
)
Source: local data frame [74 x 13]
Groups: DN0100, DN012D01, DN012D02, DN012D03, DN012D04, DN012D05, DN012D09, DN012DNO, DN012DOT, DN012DRF, DN012DDK [?]

                                                        DN0100     DN012D01     DN012D02     DN012D03     DN012D04
                                                         (chr)        (chr)        (chr)        (chr)        (chr)
1                                                   don't know not selected not selected not selected not selected
2                                            Elementary school not selected not selected not selected not selected
3                                            Elementary school not selected not selected not selected not selected
4                                            Elementary school not selected not selected not selected     selected
5                                            Elementary school not selected not selected     selected not selected
6                                            Elementary school not selected     selected not selected not selected
7                                            Elementary school     selected not selected not selected not selected
8            full Academic secondary school (no matriculation) not selected not selected not selected not selected
9            full Academic secondary school (no matriculation) not selected not selected not selected not selected
10           full Academic secondary school (no matriculation) not selected not selected not selected not selected
11           full Academic secondary school (no matriculation) not selected not selected not selected not selected
12           full Academic secondary school (no matriculation) not selected not selected not selected     selected
13           full Academic secondary school (no matriculation) not selected not selected not selected     selected
14           full Academic secondary school (no matriculation) not selected not selected     selected not selected
15           full Academic secondary school (no matriculation) not selected     selected not selected not selected
16           full Academic secondary school (no matriculation) not selected     selected not selected     selected
17         full Academic secondary school (with matriculation) not selected not selected not selected not selected
18         full Academic secondary school (with matriculation) not selected not selected not selected not selected
19         full Academic secondary school (with matriculation) not selected not selected not selected not selected
20         full Academic secondary school (with matriculation) not selected not selected not selected not selected
21         full Academic secondary school (with matriculation) not selected not selected not selected not selected
22         full Academic secondary school (with matriculation) not selected not selected not selected     selected
23         full Academic secondary school (with matriculation) not selected not selected not selected     selected
24         full Academic secondary school (with matriculation) not selected not selected not selected     selected
25         full Academic secondary school (with matriculation) not selected not selected not selected     selected
26         full Academic secondary school (with matriculation) not selected not selected     selected not selected
27         full Academic secondary school (with matriculation) not selected not selected     selected     selected
28         full Academic secondary school (with matriculation) not selected     selected not selected not selected
29         full Academic secondary school (with matriculation) not selected     selected not selected not selected
30         full Academic secondary school (with matriculation)     selected not selected not selected not selected
31         full Academic secondary school (with matriculation)     selected not selected not selected not selected
32       full occipational secondary school (no matriculation) not selected not selected not selected not selected
33       full occipational secondary school (no matriculation) not selected not selected not selected not selected
34       full occipational secondary school (no matriculation) not selected not selected not selected not selected
35       full occipational secondary school (no matriculation) not selected not selected not selected not selected
36       full occipational secondary school (no matriculation) not selected not selected not selected     selected
37       full occipational secondary school (no matriculation) not selected not selected not selected     selected
38       full occipational secondary school (no matriculation) not selected not selected     selected not selected
39       full occipational secondary school (no matriculation) not selected not selected     selected     selected
40       full occipational secondary school (no matriculation) not selected     selected not selected not selected
41     full occipational secondary school (with matriculation) not selected not selected not selected not selected
42     full occipational secondary school (with matriculation) not selected not selected not selected not selected
43     full occipational secondary school (with matriculation) not selected not selected not selected not selected
44     full occipational secondary school (with matriculation) not selected not selected not selected not selected
45     full occipational secondary school (with matriculation) not selected not selected not selected     selected
46     full occipational secondary school (with matriculation) not selected not selected     selected not selected
47     full occipational secondary school (with matriculation) not selected     selected not selected not selected
48                                                        none not selected not selected not selected not selected
49                                                        none not selected not selected not selected not selected
50                                    other type (also abroad) not selected not selected not selected not selected
51                                    other type (also abroad) not selected not selected not selected not selected
52                                    other type (also abroad) not selected not selected not selected not selected
53                                    other type (also abroad) not selected not selected not selected     selected
54                                    other type (also abroad) not selected     selected not selected not selected
55                                    other type (also abroad)     selected not selected not selected not selected
56        partial Academic secondary school (no matriculation) not selected not selected not selected not selected
57        partial Academic secondary school (no matriculation) not selected not selected not selected not selected
58        partial Academic secondary school (no matriculation) not selected not selected not selected not selected
59        partial Academic secondary school (no matriculation) not selected not selected not selected     selected
60        partial Academic secondary school (no matriculation) not selected not selected     selected not selected
61        partial Academic secondary school (no matriculation) not selected     selected not selected not selected
62        partial Academic secondary school (no matriculation)     selected not selected not selected not selected
63 Partial occipational secondary school (did not graduate, no not selected not selected not selected not selected
64 Partial occipational secondary school (did not graduate, no not selected not selected not selected not selected
65 Partial occipational secondary school (did not graduate, no not selected not selected not selected     selected
66 Partial occipational secondary school (did not graduate, no not selected not selected     selected not selected
67                 yeshiva secondary school (no matriculation) not selected not selected not selected not selected
68                 yeshiva secondary school (no matriculation) not selected not selected not selected not selected
69                 yeshiva secondary school (no matriculation) not selected not selected not selected     selected
70                 yeshiva secondary school (no matriculation)     selected not selected not selected not selected
71                yeshiva secondary school (wih matriculation) not selected not selected not selected not selected
72                yeshiva secondary school (wih matriculation) not selected not selected not selected not selected
73                yeshiva secondary school (wih matriculation) not selected not selected not selected     selected
74                                                          NA           NA           NA           NA           NA
Variables not shown: DN012D05 (chr), DN012D09 (chr), DN012DNO (chr), DN012DOT (chr), DN012DRF (chr), DN012DDK (chr),
  educ3 (chr), n (int)
# verify
knitr::kable(dto[["unitData"]][["share"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, DN0100, DN012D01, DN012D02, DN012D03,
                DN012D04, DN012D05, DN012D09, DN012DNO, DN012DOT,
                DN012DRF, DN012DDK, educ3))
| id | DN0100 | DN012D01 | DN012D02 | DN012D03 | DN012D04 | DN012D05 | DN012D09 | DN012DNO | DN012DOT | DN012DRF | DN012DDK | educ3 |
|---:|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 2.505203e+12 | full Academic secondary school (with matriculation) | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | high school |
| 2.505216e+12 | full occipational secondary school (no matriculation) | not selected | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | high school |
| 2.505225e+12 | full Academic secondary school (with matriculation) | not selected | not selected | not selected | selected | not selected | not selected | not selected | not selected | not selected | not selected | more than high school |
| 2.505230e+12 | Elementary school | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | less than high school |
| 2.505234e+12 | full Academic secondary school (with matriculation) | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | not selected | not selected | more than high school |
| 2.505245e+12 | full occipational secondary school (no matriculation) | not selected | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | high school |
| 2.505270e+12 | Elementary school | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | less than high school |
| 2.605202e+12 | partial Academic secondary school (no matriculation) | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | less than high school |
| 2.605215e+12 | partial Academic secondary school (no matriculation) | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | less than high school |
| 2.605230e+12 | full Academic secondary school (no matriculation) | not selected | not selected | not selected | not selected | not selected | not selected | selected | not selected | not selected | not selected | high school |
dto[["unitData"]][["share"]] %>%
  dplyr::group_by(DN0100) %>%
  dplyr::summarize(count=n())
Source: local data frame [13 x 2]

                                                        DN0100 count
                                                        (fctr) (int)
1                                            Elementary school   501
2  Partial occipational secondary school (did not graduate, no   102
3        full occipational secondary school (no matriculation)   174
4      full occipational secondary school (with matriculation)   113
5         partial Academic secondary school (no matriculation)   219
6            full Academic secondary school (no matriculation)   274
7          full Academic secondary school (with matriculation)  1024
8                  yeshiva secondary school (no matriculation)     8
9                 yeshiva secondary school (wih matriculation)     6
10                                                        none   143
11                                    other type (also abroad)    32
12                                                  don't know     1
13                                                          NA     1

TILDA

Items that can contribute to generating values for the harmonized variable education are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "tilda", construct == "education") %>%
  dplyr::select(study_name, name, label_short, categories)
  study_name  name label_short categories
1      tilda DM001                      4

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
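The rule file works as a lookup table: each row pairs a source category with a harmonized value. The helper recode_with_hrule() is defined elsewhere in the project and its internals are not shown in this section, so the following is only a minimal sketch of the lookup idea in base R. It assumes a rule table with one column named after the source variable (DM001) and one named after the harmonized variable (educ3); the function apply_hrule_sketch() and the toy data are hypothetical, while the category assignments are copied from the TILDA output in this report.

```r
# Hypothetical sketch: apply a harmonization rule stored as a lookup table.
# Assumes the rule has one column named after the source variable and one
# named after the harmonized variable (here "DM001" and "educ3").
apply_hrule_sketch <- function(data, hrule, source_name, harmony_name) {
  # match() gives, for each observed value, the row of the rule that lists it;
  # values absent from the rule (including NA) get NA.
  idx <- match(as.character(data[[source_name]]), hrule[[source_name]])
  data[[harmony_name]] <- hrule[[harmony_name]][idx]
  data
}

# Toy rule with a few TILDA categories and their harmonized values
hrule <- data.frame(
  DM001 = c("Primary or equivalent",
            "Leaving certificate or equivalent",
            "Postgraduate/higher degree"),
  educ3 = c("less than high school",
            "high school",
            "more than high school"),
  stringsAsFactors = FALSE
)

# Toy unit data (hypothetical ids and responses)
ds_toy <- data.frame(
  id    = 1:4,
  DM001 = c("Postgraduate/higher degree", "Primary or equivalent",
            NA, "Leaving certificate or equivalent"),
  stringsAsFactors = FALSE
)

ds_toy <- apply_hrule_sketch(ds_toy, hrule, "DM001", "educ3")
print(ds_toy)
```

Using match() rather than merge() keeps the rows in their original order, which matters when the harmonized column is appended to an existing data set.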

study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-education-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("DM001"), 
  harmony_name = "educ3"
)
Source: local data frame [9 x 3]
Groups: DM001 [?]

                                                DM001                 educ3     n
                                                (chr)                 (chr) (int)
1                                 Diploma/certificate           high school  1335
2 Intermediate/junior/group certificate or equivalent less than high school  1971
3                   Leaving certificate or equivalent           high school  1460
4                                                None less than high school     9
5                          Postgraduate/higher degree more than high school   483
6                                      Primary degree less than high school   730
7                               Primary or equivalent less than high school  2232
8                         Some primary (not complete) less than high school   280
9                                                  NA                    NA     4
# verify: inspect source and harmonized values for 10 randomly sampled respondents
dto[["unitData"]][["tilda"]] %>%
  dplyr::filter(id %in% sample(unique(id), 10)) %>%
  dplyr::select(id, DM001, educ3)
                   id                                               DM001                 educ3
1  75911              Intermediate/junior/group certificate or equivalent less than high school
2  97071                                            Primary or equivalent less than high school
3  195521                                             Diploma/certificate           high school
4  252201                                      Postgraduate/higher degree more than high school
5  286431                                      Postgraduate/higher degree more than high school
6  463892                                           Primary or equivalent less than high school
7  477791                                                  Primary degree less than high school
8  492302             Intermediate/junior/group certificate or equivalent less than high school
9  493222                                                  Primary degree less than high school
10 564502                                             Diploma/certificate           high school
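Sampling ten respondents spot-checks the recode. A complementary check is to cross-tabulate every source category against the harmonized variable: a consistent rule places each source category in exactly one harmonized column. The helper check_one_to_one() below is hypothetical, not part of the project, and the toy data frame stands in for dto[["unitData"]][["tilda"]].

```r
# Hypothetical helper: TRUE if every observed source category maps to
# exactly one harmonized value (NA is tabulated as its own category).
check_one_to_one <- function(source, harmonized) {
  xt <- table(source, harmonized, useNA = "ifany")
  all(rowSums(xt > 0) == 1)
}

# Toy data (hypothetical values)
toy <- data.frame(
  DM001 = c("Primary degree", "Primary degree", "Diploma/certificate", NA),
  educ3 = c("less than high school", "less than high school", "high school", NA),
  stringsAsFactors = FALSE
)
ok_before <- check_one_to_one(toy$DM001, toy$educ3)
print(ok_before)

# A source category split across two harmonized values fails the check
toy$educ3[2] <- "more than high school"
ok_after <- check_one_to_one(toy$DM001, toy$educ3)
print(ok_after)
```

On the real data the same call would take the DM001 and educ3 columns of dto[["unitData"]][["tilda"]].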

(III) Recapitulation

At this point the dto[["unitData"]] elements (the raw data for each study) have been augmented with the harmonized variable educ3. We retrieve the harmonized variable from every study to view frequency counts across studies:

dumlist <- list()
for(s in dto[["studyName"]]){
  ds <- dto[["unitData"]][[s]]
  dumlist[[s]] <- ds[,c("id","educ3")]
}
ds <- plyr::ldply(dumlist,data.frame,.id = "study_name")
head(ds)
  study_name  id                 educ3
1       alsa  41 more than high school
2       alsa  42           high school
3       alsa  61           high school
4       alsa  71           high school
5       alsa  91 more than high school
6       alsa 121           high school
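The loop plus plyr::ldply() keeps one id/educ3 slice per study and records its origin in a study_name column. For readers without plyr, a minimal base-R sketch of the same stacking follows; the toy list unit_data is a hypothetical stand-in for dto[["unitData"]].

```r
# Toy stand-in for dto[["unitData"]]: one data frame per study (hypothetical values)
unit_data <- list(
  alsa  = data.frame(id = c(41, 42), educ3 = c("more than high school", "high school")),
  tilda = data.frame(id = c(75911), educ3 = c("less than high school"))
)

# Keep the same two columns from each study and tag the rows with the study name
slices <- lapply(names(unit_data), function(s) {
  data.frame(study_name = s, unit_data[[s]][, c("id", "educ3")])
})
ds_stacked <- do.call(rbind, slices)
rownames(ds_stacked) <- NULL
print(ds_stacked)
```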
ds$id <- 1:nrow(ds) # ids may repeat across studies, so replace them with a running index
# Map harmonized labels to numeric codes in a single step:
# less than high school = 0, high school = 1, more than high school = 2
ds$educ3 <- factor(
  ds$educ3,
  levels = c("less than high school",
             "high school",
             "more than high school"),
  labels = c(0,1,2)
)
table( ds$educ3, ds$study_name, useNA = "always")
      
       alsa lbsl satsa share tilda <NA>
  0       0    0     0     0     0    0
  1       0    0     0     0     0    0
  2       0    0     0     0     0    0
  <NA> 2087  656  1497  2598  8504    0
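When converting labels to numeric codes, the mapping should be applied exactly once. factor() with explicit levels and labels turns each listed label into its code, and any value not found among levels becomes NA; feeding values that are already codes ("0", "1", "2") into a label-based factor() call therefore yields nothing but NA. A small demonstration on toy values (hypothetical, not study data):

```r
educ_labels <- c("less than high school", "high school", "more than high school")
x <- c("high school", "more than high school", "less than high school", NA)

# One step: labels -> codes 0, 1, 2 (unlisted values and NA stay NA)
codes <- factor(x, levels = educ_labels, labels = c(0, 1, 2))
print(table(codes, useNA = "always"))

# Applying the label-based mapping a second time: factor() compares the codes
# "0", "1", "2" against educ_labels, finds no match, and returns all NA
twice <- factor(codes, levels = educ_labels, labels = c(0, 1, 2))
print(table(twice, useNA = "always"))
```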

Finally, having added the newly created harmonized variable to the raw data of each study, we save the data transfer object.

# Save as a compressed, binary R dataset. It's no longer readable with a text editor, but it preserves metadata (e.g., factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")
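Because saveRDS() serializes the R object itself, attributes such as factor levels and their order survive the trip to disk, which a plain-text .csv export would not preserve. A minimal round-trip sketch on a toy list loosely shaped like the DTO (hypothetical contents), written to a temporary file rather than to ./data/unshared/derived/dto.rds:

```r
# Toy object loosely shaped like the DTO (hypothetical contents)
toy_dto <- list(
  studyName = c("alsa", "tilda"),
  unitData  = list(
    alsa = data.frame(
      id    = 1:3,
      educ3 = factor(c("high school", "less than high school", "high school"),
                     levels = c("less than high school", "high school",
                                "more than high school"))
    )
  )
)

path <- tempfile(fileext = ".rds")
saveRDS(toy_dto, file = path, compress = "xz")
restored <- readRDS(path)
unlink(path)

# The restored object matches the original, including the unused factor level
print(identical(restored, toy_dto))
print(levels(restored$unitData$alsa$educ3))
```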