This report lists the candidate variables for the DataSchema variables of the construct alcohol.

(I) Exposition

This report is a record of interaction with a data transfer object (dto) produced by ./manipulation/0-ellis-island.R.

The next section recaps this script, exposes the architecture of the DTO, and demonstrates the language of interacting with it.

(I.A) Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

    1. Reads in raw data files from the candidate studies.
    2. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
    3. Augments raw metadata with instructions for renaming and classifying variables. The instructions are provided as manually entered values in ./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
    4. Combines unit-level data and metadata into a single DTO to serve as a starting point for all subsequent analyses.
# load the product of 0-ellis-island.R,  a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath"  "unitData"  "metaData" 
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"       
# 3rd element - is a list object containing the following elements
names(dto[["unitData"]])
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]]) 
Source: local data frame [656 x 35]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
  year_born (dbl), female (lgl), marital (chr), single (lgl), educ3 (chr), current_work_2 (lgl)
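
To make the four-element layout above concrete, here is a minimal sketch of how a list with the same element names could be assembled by hand. The two toy studies, their file paths, and all values are invented for illustration; the real object is built by 0-ellis-island.R, whose internals are not shown in this report.

```r
# Toy illustration only: study names, paths, and values are made up.
toy_unit <- list(
  study_a = data.frame(id = 1:3, ALC = c("never", "weekly", NA)),
  study_b = data.frame(id = 1:2, DRINK = c("yes", "no"))
)

# One metadata row per raw variable, mirroring a few columns of dto[["metaData"]].
toy_meta <- data.frame(
  study_name = c("study_a", "study_a", "study_b", "study_b"),
  name       = c("id", "ALC", "id", "DRINK"),
  construct  = c("id", "alcohol", "id", "alcohol"),
  stringsAsFactors = FALSE
)

# Assemble the list with the same four element names as the real DTO.
toy_dto <- list(
  studyName = names(toy_unit),
  filePath  = c("./toy/study_a.sav", "./toy/study_b.sav"),
  unitData  = toy_unit,
  metaData  = toy_meta
)

names(toy_dto)
sapply(toy_dto[["unitData"]], nrow)
```

Keeping unit data and metadata in one list means a later script needs only a single readRDS() call to obtain both.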

Meta

# 4th element - a dataset of names and labels of raw variables + added metadata for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>% 
  DT::datatable(
    class   = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
    filter  = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )

(I.B) Target-H

Everybody wants to be somebody.

We query the metadata set to retrieve all variables potentially tapping the construct alcohol. These are the candidates to enter the DataSchema and contribute to computing harmonized variables.

NOTE: what is being retrieved depends on the manually entered values in the column construct of the metadata file ./data/shared/meta-data-map.csv. To specify a different group of variables, edit the metadata, not the script.

meta_data <- dto[["metaData"]] %>%
  dplyr::filter(construct %in% c('alcohol')) %>% 
  dplyr::select(study_name, name, construct, label_short, categories, url) %>%
  dplyr::arrange(construct, study_name)
knitr::kable(meta_data)
| study_name | name                 | construct | label_short                                                | categories | url |
|:-----------|:---------------------|:----------|:-----------------------------------------------------------|-----------:|:----|
| alsa       | FR6ORMOR             | alcohol   | Frequency six or more drinks                               |          5 |     |
| alsa       | NOSTDRNK             | alcohol   | Number of standard drinks                                  |          5 |     |
| alsa       | FREQALCH             | alcohol   | Frequency alcohol                                          |          5 |     |
| lbsl       | ALCOHOL              | alcohol   | Alcohol use                                                |          7 |     |
| lbsl       | WINE                 | alcohol   | Number of glasses of wine last week                        |         17 |     |
| lbsl       | BEER                 | alcohol   | Number of cans/bottles of beer last week                   |         16 |     |
| lbsl       | HARDLIQ              | alcohol   | Number of drinks containing hard liquor last week          |         15 |     |
| satsa      | GALCOHOL             | alcohol   | Do you ever drink alcoholic beverages?                     |          2 |     |
| satsa      | GBEERX               | alcohol   | How much beer do you usually drink at a time?              |          7 |     |
| satsa      | GBOTVIN              | alcohol   | …more than 1 bottle                                        |          4 |     |
| satsa      | GDRLOTS              | alcohol   | How often more than 5 beers?                               |          8 |     |
| satsa      | GEVRALK              | alcohol   | Do you ever drink alcoholic drinks? - Yes                  |          3 |     |
| satsa      | GFREQBER             | alcohol   | How often do you drink beer (not light beer)?              |          9 |     |
| satsa      | GFREQLIQ             | alcohol   | How often do you usually drink hard liquor?                |          9 |     |
| satsa      | GFREQVIN             | alcohol   | How often do you usually drink wine (red or white)?        |          9 |     |
| satsa      | GLIQX                | alcohol   | How much hard liquot do you usually drink at time?         |          8 |     |
| satsa      | GSTOPALK             | alcohol   | Do you ever drink alcoholic drinks? -No I quit. When? 19__ |         32 |     |
| satsa      | GVINX                | alcohol   | How much wine do you usually drink at a time?              |          6 |     |
| share      | BR0100               | alcohol   | beverages consumed last 6 months                           |          7 |     |
| share      | BR0110               | alcohol   | freq more than 2 glasses beer in a day                     |          7 |     |
| share      | BR0120               | alcohol   | freq more than 2 glasses wine in a day                     |          8 |     |
| share      | BR0130               | alcohol   | freq more than 2 hard liquor in a day                      |          8 |     |
| tilda      | BEHALC.DRINKSPERDAY  | alcohol   | Standard drinks per day                                    |         35 |     |
| tilda      | BEHALC.DRINKSPERWEEK | alcohol   | Standard drinks a week                                     |        120 |     |
| tilda      | BEHALC.FREQ.WEEK     | alcohol   | Average times drinking per week                            |          7 |     |
| tilda      | SCQALCOFREQ          | alcohol   | Frequency of drinking alcohol                              |          7 |     |
| tilda      | SCQALCOHOL           | alcohol   | Alcoholic drinks                                           |          2 |     |
| tilda      | SCQALCONO1           | alcohol   | More than 2 drinks/day                                     |          7 |     |
| tilda      | SCQALCONO2           | alcohol   | How many drinks consumed on days drink taken               |         19 |     |
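
The NOTE above says that the retrieved set is governed by the construct column of the metadata rather than by the script. A minimal sketch of that idea follows, using invented toy metadata and a hypothetical helper named candidates(); it is written in base R so it runs without any packages.

```r
# Toy metadata; variable names and construct values are invented for illustration.
toy_meta <- data.frame(
  study_name = c("study_a", "study_a", "study_b"),
  name       = c("ALC", "SMK", "DRINK"),
  construct  = c("alcohol", "smoking", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the variables tagged with the target construct.
candidates <- function(meta, target) {
  meta[meta$construct %in% target, c("study_name", "name")]
}

candidates(toy_meta, "alcohol")   # only ALC is tagged at this point

# Tagging DRINK in the metadata (not in the script) changes what is retrieved.
toy_meta$construct[toy_meta$name == "DRINK"] <- "alcohol"
candidates(toy_meta, "alcohol")
```

The query itself never changes; only the metadata does, which is why edits belong in meta-data-map.csv.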

View descriptives : alcohol for closer examination of each candidate.

After reviewing descriptives and relevant codebooks, the following operationalization of the harmonized variable for alcohol has been adopted:

Target: current_drink

  • 0 - FALSE healthy choice - REFERENCE group
  • 1 - TRUE risk factor

These variables will be generated next, in the Development section.

(II) Development

The particular goal of this section is to ensure that the schema used to encode the values of the alcohol variable is consistent across studies.

In this section we define the schema sets for harmonizing the alcohol construct, i.e., we specify which variables from which studies contribute to computing the harmonized variables. Each schema set has a particular pattern of possible response values, which we export for inspection as .csv tables. We then manually edit these .csv tables, populating new columns that map values of the harmonized variables to the specific response patterns of the schema-set variables. Finally, we import the harmonization rules encoded in the .csv tables and apply them to compute the harmonized variables, producing a dataset that combines raw and harmonized variables for the alcohol construct across studies.

(II.A)

(1) Schema sets

With all potential variables in categorical format, we have defined the sets of data schema variables as follows:

schema_sets <- list(
  "alsa" = c("FREQALCH","NOSTDRNK","FR6ORMOR"),
  "lbsl" = c("ALCOHOL", "BEER","HARDLIQ","WINE"),
  "satsa" =  c("GALCOHOL","GEVRALK","GBEERX","GLIQX","GVINX" ),
  "share" = c("BR0100","BR0110", "BR0120","BR0130"), 
  "tilda" = c("SCQALCOHOL","BEHALC.DRINKSPERDAY","BEHALC.DRINKSPERWEEK") 
)
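
Because the schema sets are typed by hand, a misspelled variable name would only surface later as an error. The following sketch shows one way to catch that early by comparing each set against the names registered in the metadata. The helper missing_from_meta() and the toy inputs are hypothetical and not part of the project's scripts.

```r
# Hypothetical helper: report schema-set names that are absent from the metadata.
missing_from_meta <- function(schema_sets, meta) {
  out <- lapply(names(schema_sets), function(s) {
    known <- meta$name[meta$study_name == s]
    setdiff(schema_sets[[s]], known)
  })
  names(out) <- names(schema_sets)
  out[lengths(out) > 0]   # keep only studies with at least one unknown name
}

# Toy inputs (invented): "BEERS" is a deliberate misspelling of "BEER".
toy_sets <- list(study_a = c("ALC", "BEERS"), study_b = "DRINK")
toy_meta <- data.frame(
  study_name = c("study_a", "study_a", "study_b"),
  name       = c("ALC", "BEER", "DRINK"),
  stringsAsFactors = FALSE
)

missing_from_meta(toy_sets, toy_meta)
```

Applied to the real objects, the call would be missing_from_meta(schema_sets, dto[["metaData"]]); an empty list would indicate that every listed variable is registered in the metadata.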

Each of these schema sets has a particular pattern of possible response values, for example:

# view the profile of responses
dto[["unitData"]][["alsa"]] %>% 
  dplyr::group_by_("FREQALCH","NOSTDRNK","FR6ORMOR") %>% 
  dplyr::summarize(count = n()) 
Source: local data frame [51 x 4]
Groups: FREQALCH, NOSTDRNK [?]

                    FREQALCH      NOSTDRNK          FR6ORMOR count
                      (fctr)        (fctr)            (fctr) (int)
1                      Never            NA                NA   774
2            Monthly or less    One or two             Never   337
3            Monthly or less    One or two Less than monthly     6
4            Monthly or less Three or four             Never    18
5            Monthly or less Three or four Less than monthly     3
6            Monthly or less   Five or six Less than monthly     2
7            Monthly or less Seven to nine Less than monthly     1
8            Monthly or less            NA                NA     1
9  Two to four times a month    One or two             Never   132
10 Two to four times a month    One or two Less than monthly     4
..                       ...           ...               ...   ...

We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.

# define function to extract profiles
response_profile <- function(dto, h_target, study, varnames_values){
  ds <- dto[["unitData"]][[study]]
  varnames_values <- lapply(varnames_values, as.symbol)   # Convert character vector to list of symbols
  d <- ds %>% 
    dplyr::group_by_(.dots=varnames_values) %>% 
    dplyr::summarize(count = n()) 
  write.csv(d,paste0("./data/meta/response-profiles-live/",h_target,"-",study,".csv"))
}
# extract response profile for data schema set from each study
for(s in names(schema_sets)){
  response_profile(dto,
                   study = s,
                   h_target = 'alcohol',
                   varnames_values = schema_sets[[s]]
  )
}

You can examine them in ./data/meta/response-profiles-live/.
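
The function above relies on dplyr::group_by_() and its .dots argument, which are deprecated in current dplyr releases. As a sketch only, assuming dplyr 1.0.0 or later, the same profile could be computed with across() and all_of(). The name response_profile_v2(), its arguments, and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical rewrite of response_profile() for dplyr >= 1.0.0.
# Takes the data frame directly and an explicit output path.
response_profile_v2 <- function(ds, varnames, out_file) {
  d <- ds %>%
    dplyr::group_by(dplyr::across(dplyr::all_of(varnames))) %>%
    dplyr::summarize(count = dplyr::n(), .groups = "drop")
  write.csv(d, out_file, row.names = FALSE)
  invisible(d)
}

# Toy data (invented) standing in for dto[["unitData"]][[study]].
toy <- data.frame(
  FREQ = c("Never", "Never", "Weekly", NA),
  AMT  = c(NA, NA, "One or two", NA),
  stringsAsFactors = FALSE
)
out <- file.path(tempdir(), "alcohol-toy.csv")
profile <- response_profile_v2(toy, c("FREQ", "AMT"), out)
profile
```

Unlike the original, this sketch writes the file without row names, which saves removing an extra index column when the table is edited by hand; that is a design choice, not a requirement.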

(II.B) current_drink

Target (1) : current_drink

  • 0 - FALSE healthy choice
  • 1 - TRUE risk factor

ALSA

Items that can contribute to generating values for the harmonized variable current_drink (construct alcohol) are:

dto[["metaData"]] %>%
  dplyr::filter(study_name=="alsa", construct %in% c("alcohol")) %>%
  dplyr::select(study_name, name, label,categories)
  study_name     name                        label categories
1       alsa FR6ORMOR Frequency six or more drinks          5
2       alsa NOSTDRNK    Number of standard drinks          5
3       alsa FREQALCH            Frequency alcohol          5

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("FREQALCH", "NOSTDRNK", "FR6ORMOR"), 
  harmony_name = "current_drink"
)
Source: local data frame [51 x 5]
Groups: FREQALCH, NOSTDRNK, FR6ORMOR [?]

                    FREQALCH      NOSTDRNK              FR6ORMOR current_drink     n
                       (chr)         (chr)                 (chr)         (lgl) (int)
1  Four or more times a week   Five or six Daily or almost daily          TRUE    12
2  Four or more times a week   Five or six     Less than monthly          TRUE     2
3  Four or more times a week   Five or six               Monthly          TRUE     5
4  Four or more times a week   Five or six                 Never          TRUE     6
5  Four or more times a week   Five or six                Weekly          TRUE     6
6  Four or more times a week    One or two     Less than monthly          TRUE    49
7  Four or more times a week    One or two               Monthly          TRUE     7
8  Four or more times a week    One or two                 Never          TRUE   324
9  Four or more times a week    One or two                Weekly          TRUE     3
10 Four or more times a week Seven to nine Daily or almost daily          TRUE     8
11 Four or more times a week Seven to nine               Monthly          TRUE     2
12 Four or more times a week Seven to nine                 Never          TRUE     1
13 Four or more times a week   Ten or more Daily or almost daily          TRUE     1
14 Four or more times a week Three or four Daily or almost daily          TRUE     1
15 Four or more times a week Three or four     Less than monthly          TRUE    35
16 Four or more times a week Three or four               Monthly          TRUE    15
17 Four or more times a week Three or four                 Never          TRUE    52
18 Four or more times a week Three or four                Weekly          TRUE    15
19           Monthly or less   Five or six     Less than monthly          TRUE     2
20           Monthly or less    One or two     Less than monthly          TRUE     6
21           Monthly or less    One or two                 Never          TRUE   337
22           Monthly or less Seven to nine     Less than monthly          TRUE     1
23           Monthly or less Three or four     Less than monthly          TRUE     3
24           Monthly or less Three or four                 Never          TRUE    18
25           Monthly or less            NA                    NA          TRUE     1
26                     Never            NA                    NA         FALSE   774
27 Two to four times a month   Five or six     Less than monthly          TRUE     1
28 Two to four times a month   Five or six                 Never          TRUE     2
29 Two to four times a month   Five or six                Weekly          TRUE     1
30 Two to four times a month    One or two     Less than monthly          TRUE     4
31 Two to four times a month    One or two               Monthly          TRUE     2
32 Two to four times a month    One or two                 Never          TRUE   132
33 Two to four times a month Seven to nine               Monthly          TRUE     1
34 Two to four times a month Seven to nine                Weekly          TRUE     1
35 Two to four times a month Three or four     Less than monthly          TRUE     5
36 Two to four times a month Three or four                 Never          TRUE    18
37 Two to three times a week   Five or six Daily or almost daily          TRUE     1
38 Two to three times a week   Five or six     Less than monthly          TRUE     3
39 Two to three times a week   Five or six               Monthly          TRUE     1
40 Two to three times a week   Five or six                 Never          TRUE     1
41 Two to three times a week   Five or six                Weekly          TRUE     3
42 Two to three times a week    One or two     Less than monthly          TRUE    14
43 Two to three times a week    One or two               Monthly          TRUE     6
44 Two to three times a week    One or two                 Never          TRUE   149
45 Two to three times a week Seven to nine                 Never          TRUE     1
46 Two to three times a week Seven to nine                Weekly          TRUE     1
47 Two to three times a week   Ten or more                Weekly          TRUE     1
48 Two to three times a week Three or four     Less than monthly          TRUE     9
49 Two to three times a week Three or four                 Never          TRUE    23
50 Two to three times a week Three or four                Weekly          TRUE     1
51                        NA            NA                    NA            NA    20
# verify
dto[["unitData"]][["alsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select_("id", "FREQALCH", "NOSTDRNK", "FR6ORMOR", "current_drink")
      id                  FREQALCH      NOSTDRNK          FR6ORMOR current_drink
1   1331 Four or more times a week Three or four Less than monthly          TRUE
2   5101                     Never          <NA>              <NA>         FALSE
3   5302                     Never          <NA>              <NA>         FALSE
4   9591 Two to four times a month    One or two             Never          TRUE
5  12061                     Never          <NA>              <NA>         FALSE
6  12751 Two to three times a week    One or two             Never          TRUE
7  14541 Two to three times a week    One or two             Never          TRUE
8  24902                     Never          <NA>              <NA>         FALSE
9  35371 Four or more times a week Three or four             Never          TRUE
10 36351           Monthly or less    One or two             Never          TRUE
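
The function recode_with_hrule() is called above, but its definition is not shown in this section. Conceptually, applying a rule file amounts to looking up each respondent's response pattern in a table that has one row per pattern. The sketch below illustrates that lookup in base R under stated assumptions: apply_hrule(), the toy data, and the toy rule table are all hypothetical and should not be read as the project's implementation.

```r
# Toy unit data and toy rule table; all values are invented.
toy <- data.frame(
  id   = 1:5,
  FREQ = c("Never", "Weekly", "Monthly or less", NA, "Never"),
  stringsAsFactors = FALSE
)
# One row per response pattern, as a hand-edited rule file might contain.
toy_rule <- data.frame(
  FREQ          = c("Never", "Monthly or less", "Weekly", NA),
  current_drink = c(FALSE, TRUE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match each row's response pattern against the rule table.
# paste() turns NA into the string "NA", so missing patterns can be matched too.
# The "|" separator is assumed not to occur inside any response value.
apply_hrule <- function(ds, rule, by, harmony_name) {
  key_ds   <- do.call(paste, c(ds[by],   sep = "|"))
  key_rule <- do.call(paste, c(rule[by], sep = "|"))
  ds[[harmony_name]] <- rule[[harmony_name]][match(key_ds, key_rule)]
  ds
}

toy <- apply_hrule(toy, toy_rule, by = "FREQ", harmony_name = "current_drink")
toy
```

Using match() rather than merge() keeps the rows of the unit data in their original order, so the harmonized column can be appended directly.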

LBSL

Items that can contribute to generating values for the harmonized variable current_drink (construct alcohol) are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "lbsl", construct == "alcohol") %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name    name                                       label_short categories
1       lbsl ALCOHOL                                       Alcohol use          7
2       lbsl    WINE               Number of glasses of wine last week         17
3       lbsl    BEER          Number of cans/bottles of beer last week         16
4       lbsl HARDLIQ Number of drinks containing hard liquor last week         15

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("ALCOHOL", "BEER" ,   "HARDLIQ", "WINE" ), 
  harmony_name = "current_drink"
)
Source: local data frame [174 x 6]
Groups: ALCOHOL, BEER, HARDLIQ, WINE [?]

                  ALCOHOL  BEER HARDLIQ  WINE current_drink     n
                    (chr) (chr)   (chr) (chr)         (lgl) (int)
1   daily or almost daily     0       0    10          TRUE     1
2   daily or almost daily     0       0    14          TRUE     1
3   daily or almost daily     0       0    15          TRUE     2
4   daily or almost daily     0       0     6          TRUE     1
5   daily or almost daily     0       0     7          TRUE     4
6   daily or almost daily     0       0     8          TRUE     1
7   daily or almost daily     0       0     9          TRUE     1
8   daily or almost daily     0       1    10          TRUE     1
9   daily or almost daily     0       1     5          TRUE     1
10  daily or almost daily     0       1     6          TRUE     1
11  daily or almost daily     0      14     0          TRUE     2
12  daily or almost daily     0      14     1          TRUE     1
13  daily or almost daily     0      14    10          TRUE     1
14  daily or almost daily     0       2    10          TRUE     1
15  daily or almost daily     0       2    14          TRUE     3
16  daily or almost daily     0       2    15          TRUE     1
17  daily or almost daily     0       2     5          TRUE     1
18  daily or almost daily     0      25     0          TRUE     1
19  daily or almost daily     0       4    12          TRUE     1
20  daily or almost daily     0       4     4          TRUE     1
21  daily or almost daily     0       4     5          TRUE     1
22  daily or almost daily     0       5     0          TRUE     2
23  daily or almost daily     0       6     3          TRUE     1
24  daily or almost daily     0       7     0          TRUE     3
25  daily or almost daily     0       7     3          TRUE     1
26  daily or almost daily     0       7     5          TRUE     1
27  daily or almost daily     0       7     7          TRUE     2
28  daily or almost daily     0       8     0          TRUE     1
29  daily or almost daily     1      14     0          TRUE     1
30  daily or almost daily     1      14    12          TRUE     1
31  daily or almost daily     1      14     2          TRUE     1
32  daily or almost daily     1       2    15          TRUE     1
33  daily or almost daily     1       2     6          TRUE     1
34  daily or almost daily    10       0     0          TRUE     2
35  daily or almost daily    10       0     1          TRUE     1
36  daily or almost daily    10       2     0          TRUE     1
37  daily or almost daily    10       5    NA          TRUE     1
38  daily or almost daily    12       0     6          TRUE     2
39  daily or almost daily    14       3     7          TRUE     1
40  daily or almost daily     2      10     0          TRUE     1
41  daily or almost daily     2      14    10          TRUE     1
42  daily or almost daily     2       7     3          TRUE     1
43  daily or almost daily    25       3     0          TRUE     1
44  daily or almost daily     3       0    12          TRUE     1
45  daily or almost daily     3       0     7          TRUE     1
46  daily or almost daily     3       6     0          TRUE     1
47  daily or almost daily    30       0     0          TRUE     1
48  daily or almost daily     4      12     0          TRUE     1
49  daily or almost daily     4       3     2          TRUE     1
50  daily or almost daily     5       6     0          TRUE     1
51  daily or almost daily     6       0     0          TRUE     1
52  daily or almost daily     6       3     2          TRUE     1
53  daily or almost daily     7       0     0          TRUE     1
54  daily or almost daily     7       0     2          TRUE     1
55  daily or almost daily     7       2     0          TRUE     1
56  daily or almost daily     7       2     3          TRUE     1
57  daily or almost daily     7       7    NA          TRUE     1
58  daily or almost daily     8       0     0          TRUE     1
59  daily or almost daily     8       0     8          TRUE     1
60  daily or almost daily     9       0     0          TRUE     1
61  daily or almost daily    NA      14    NA          TRUE     1
62  daily or almost daily    NA      15     3          TRUE     1
63  daily or almost daily    NA      21    NA          TRUE     1
64  daily or almost daily    NA       7     7          TRUE     1
65  daily or almost daily    NA       7    NA          TRUE     3
66  daily or almost daily    NA      NA    21          TRUE     1
67  daily or almost daily    NA      NA     7          TRUE     1
68       few times a year     0       0     0          TRUE    69
69       few times a year     0       0     1          TRUE    10
70       few times a year     0       0     2          TRUE     2
71       few times a year     0       1     0          TRUE     1
72       few times a year     1       0     0          TRUE     5
73       few times a year     1       0     1          TRUE     1
74       few times a year     1       0     2          TRUE     2
75       few times a year     1      NA     1          TRUE     1
76       few times a year     1      NA    NA          TRUE     1
77       few times a year     2       0     0          TRUE     3
78       few times a year     2       0     1          TRUE     2
79       few times a year     2       0     5          TRUE     1
80       few times a year     3       0     0          TRUE     1
81       few times a year     9       9     9          TRUE     1
82       few times a year    NA      NA    NA          TRUE    43
83            never drank     0       0     0         FALSE    10
84            never drank    NA      NA    NA         FALSE    82
85       not in last year     0       0     0         FALSE    13
86       not in last year     1       0     0         FALSE     1
87       not in last year    NA      NA    NA         FALSE    78
88            once a week     0       0     0          TRUE     5
89            once a week     0       0     1          TRUE     2
90            once a week     0       0     2          TRUE     5
91            once a week     0       0     3          TRUE     1
92            once a week     0       0     4          TRUE     1
93            once a week     0       0     5          TRUE     1
94            once a week     0       0     6          TRUE     1
95            once a week     0       1     0          TRUE     2
96            once a week     0       1     1          TRUE     3
97            once a week     0       1     3          TRUE     1
98            once a week     0       2     0          TRUE     4
99            once a week     0       2     1          TRUE     1
100           once a week     0       4     1          TRUE     1
..                    ...   ...     ...   ...           ...   ...
# verify
dto[["unitData"]][["lbsl"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select_("id", "ALCOHOL", "BEER" ,   "HARDLIQ", "WINE", "current_drink")
        id                 ALCOHOL BEER HARDLIQ WINE current_drink
1  4042082   daily or almost daily    0       0    7          TRUE
2  4132042             never drank   NA      NA   NA         FALSE
3  4141201        not in last year    0       0    0         FALSE
4  4152089        few times a year    0       0    0          TRUE
5  4161060 once or twice per month    0       0    1          TRUE
6  4202081        not in last year   NA      NA   NA         FALSE
7  4212011        few times a year   NA      NA   NA          TRUE
8  4241078        not in last year   NA      NA   NA         FALSE
9  4261199                    <NA>   NA      NA   NA            NA
10 4432039                    <NA>   NA      NA   NA            NA
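
The random sample above checks a handful of respondents. A complementary check, sketched here on invented toy data, cross-tabulates a raw item against the harmonized variable to confirm that each raw category maps to a single harmonized value. It assumes the rule depends on that one item, which is what the rows printed above suggest for ALCOHOL but is not guaranteed by them.

```r
# Toy data (invented) standing in for two columns of a harmonized study.
toy <- data.frame(
  ALCOHOL       = c("never drank", "once a week", "not in last year",
                    "once a week", NA),
  current_drink = c(FALSE, TRUE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Cross-tabulate raw by harmonized values, keeping missing values visible.
tab <- table(toy$ALCOHOL, toy$current_drink, useNA = "ifany")
tab

# Each raw category should have counts in exactly one harmonized column.
one_to_one <- all(rowSums(tab > 0) == 1)
one_to_one
```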

SATSA

Items that can contribute to generating values for the harmonized variable current_drink (construct alcohol) are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "satsa", construct == "alcohol") %>%
  dplyr::select(study_name, name, label_short,categories)
   study_name     name                                                label_short categories
1       satsa GALCOHOL                     Do you ever drink alcoholic beverages?          2
2       satsa   GBEERX              How much beer do you usually drink at a time?          7
3       satsa  GBOTVIN                                        …more than 1 bottle          4
4       satsa  GDRLOTS                               How often more than 5 beers?          8
5       satsa  GEVRALK                  Do you ever drink alcoholic drinks? - Yes          3
6       satsa GFREQBER              How often do you drink beer (not light beer)?          9
7       satsa GFREQLIQ                How often do you usually drink hard liquor?          9
8       satsa GFREQVIN        How often do you usually drink wine (red or white)?          9
9       satsa    GLIQX         How much hard liquot do you usually drink at time?          8
10      satsa GSTOPALK Do you ever drink alcoholic drinks? -No I quit. When? 19__         32
11      satsa    GVINX              How much wine do you usually drink at a time?          6

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.

study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("GALCOHOL","GEVRALK","GBEERX","GLIQX" ,"GVINX" ), 
  harmony_name = "current_drink"
)
Source: local data frame [231 x 7]
Groups: GALCOHOL, GEVRALK, GBEERX, GLIQX, GVINX [?]

    GALCOHOL                                 GEVRALK           GBEERX                                     GLIQX
       (chr)                                   (chr)            (chr)                                     (chr)
1         No No, I have never drunk alcoholic drinks 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
2         No No, I have never drunk alcoholic drinks 1 bottle (33 cl)                                        NA
3         No No, I have never drunk alcoholic drinks  1 glass or less                                        NA
4         No No, I have never drunk alcoholic drinks  1 glass or less                                        NA
5         No No, I have never drunk alcoholic drinks               NA 4 cl (approx. a small shot or equivalent)
6         No No, I have never drunk alcoholic drinks               NA 4 cl (approx. a small shot or equivalent)
7         No No, I have never drunk alcoholic drinks               NA                                        NA
8         No No, I have never drunk alcoholic drinks               NA                                        NA
9         No                             No, I quit. 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
10        No                             No, I quit. 1 bottle (33 cl)                                        NA
11        No                             No, I quit.  1 glass or less 4 cl (approx. a small shot or equivalent)
12        No                             No, I quit.  1 glass or less                                        NA
13        No                             No, I quit.               NA                                      8 cl
14        No                             No, I quit.               NA                                        NA
15        No                             No, I quit.               NA                                        NA
16        No                             No, I quit.               NA                                        NA
17        No                                     Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
18        No                                     Yes  1 glass or less 4 cl (approx. a small shot or equivalent)
19        No                                     Yes  1 glass or less                                        NA
20        No                                     Yes  1 glass or less                                        NA
21        No                                     Yes               NA 4 cl (approx. a small shot or equivalent)
22        No                                     Yes               NA 4 cl (approx. a small shot or equivalent)
23        No                                     Yes               NA 4 cl (approx. a small shot or equivalent)
24        No                                     Yes               NA                                        NA
25        No                                     Yes               NA                                        NA
26        No                                     Yes               NA                                        NA
27        No                                      NA  1 glass or less 4 cl (approx. a small shot or equivalent)
28        No                                      NA  1 glass or less 4 cl (approx. a small shot or equivalent)
29        No                                      NA  1 glass or less                                        NA
30        No                                      NA               NA 4 cl (approx. a small shot or equivalent)
31        No                                      NA               NA 4 cl (approx. a small shot or equivalent)
32        No                                      NA               NA                                        NA
33        No                                      NA               NA                                        NA
34       Yes No, I have never drunk alcoholic drinks 1 bottle (33 cl)                                        NA
35       Yes No, I have never drunk alcoholic drinks  1 glass or less                                        NA
36       Yes No, I have never drunk alcoholic drinks  1 glass or less                                        NA
37       Yes No, I have never drunk alcoholic drinks  1 glass or less                                        NA
38       Yes No, I have never drunk alcoholic drinks        2 bottles 4 cl (approx. a small shot or equivalent)
39       Yes No, I have never drunk alcoholic drinks               NA 4 cl (approx. a small shot or equivalent)
40       Yes No, I have never drunk alcoholic drinks               NA                                        NA
41       Yes No, I have never drunk alcoholic drinks               NA                                        NA
42       Yes                             No, I quit. 1 bottle (33 cl)           6 cl (a big shot or equivalent)
43       Yes                             No, I quit. 1 bottle (33 cl)                                        NA
44       Yes                             No, I quit. 1 bottle (33 cl)                                        NA
45       Yes                             No, I quit.  1 glass or less                                        NA
46       Yes                             No, I quit.  1 glass or less                                        NA
47       Yes                                     Yes 1 bottle (33 cl)                                     12 cl
48       Yes                                     Yes 1 bottle (33 cl)                                     12 cl
49       Yes                                     Yes 1 bottle (33 cl)                                     12 cl
50       Yes                                     Yes 1 bottle (33 cl)                                     12 cl
51       Yes                                     Yes 1 bottle (33 cl)                                     12 cl
52       Yes                                     Yes 1 bottle (33 cl)                                     18 cl
53       Yes                                     Yes 1 bottle (33 cl)                                     18 cl
54       Yes                                     Yes 1 bottle (33 cl)                                     18 cl
55       Yes                                     Yes 1 bottle (33 cl)                                     18 cl
56       Yes                                     Yes 1 bottle (33 cl)                                     18 cl
57       Yes                                     Yes 1 bottle (33 cl)                     37 cl (half a bottle)
58       Yes                                     Yes 1 bottle (33 cl)                     37 cl (half a bottle)
59       Yes                                     Yes 1 bottle (33 cl)                     37 cl (half a bottle)
60       Yes                                     Yes 1 bottle (33 cl)                     37 cl (half a bottle)
61       Yes                                     Yes 1 bottle (33 cl)                     37 cl (half a bottle)
62       Yes                                     Yes 1 bottle (33 cl)                     37 cl (half a bottle)
63       Yes                                     Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
64       Yes                                     Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
65       Yes                                     Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
66       Yes                                     Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
67       Yes                                     Yes 1 bottle (33 cl)           6 cl (a big shot or equivalent)
68       Yes                                     Yes 1 bottle (33 cl)           6 cl (a big shot or equivalent)
69       Yes                                     Yes 1 bottle (33 cl)           6 cl (a big shot or equivalent)
70       Yes                                     Yes 1 bottle (33 cl)           6 cl (a big shot or equivalent)
71       Yes                                     Yes 1 bottle (33 cl)           6 cl (a big shot or equivalent)
72       Yes                                     Yes 1 bottle (33 cl)                    75 cl (1 whole bottle)
73       Yes                                     Yes 1 bottle (33 cl)                                      8 cl
74       Yes                                     Yes 1 bottle (33 cl)                                      8 cl
75       Yes                                     Yes 1 bottle (33 cl)                                      8 cl
76       Yes                                     Yes 1 bottle (33 cl)                                      8 cl
77       Yes                                     Yes 1 bottle (33 cl)                                      8 cl
78       Yes                                     Yes 1 bottle (33 cl)                                      8 cl
79       Yes                                     Yes 1 bottle (33 cl)                                        NA
80       Yes                                     Yes 1 bottle (33 cl)                                        NA
81       Yes                                     Yes 1 bottle (33 cl)                                        NA
82       Yes                                     Yes 1 bottle (33 cl)                                        NA
83       Yes                                     Yes 1 bottle (33 cl)                                        NA
84       Yes                                     Yes 1 bottle (33 cl)                                        NA
85       Yes                                     Yes  1 glass or less                                     12 cl
86       Yes                                     Yes  1 glass or less                                     12 cl
87       Yes                                     Yes  1 glass or less                                     12 cl
88       Yes                                     Yes  1 glass or less                                     12 cl
89       Yes                                     Yes  1 glass or less                                     18 cl
90       Yes                                     Yes  1 glass or less                                     18 cl
91       Yes                                     Yes  1 glass or less                                     18 cl
92       Yes                                     Yes  1 glass or less                                     18 cl
93       Yes                                     Yes  1 glass or less                     37 cl (half a bottle)
94       Yes                                     Yes  1 glass or less                     37 cl (half a bottle)
95       Yes                                     Yes  1 glass or less 4 cl (approx. a small shot or equivalent)
96       Yes                                     Yes  1 glass or less 4 cl (approx. a small shot or equivalent)
97       Yes                                     Yes  1 glass or less 4 cl (approx. a small shot or equivalent)
98       Yes                                     Yes  1 glass or less 4 cl (approx. a small shot or equivalent)
99       Yes                                     Yes  1 glass or less           6 cl (a big shot or equivalent)
100      Yes                                     Yes  1 glass or less           6 cl (a big shot or equivalent)
..       ...                                     ...              ...                                       ...
Variables not shown: GVINX (chr), current_drink (lgl), n (int)
# verify: inspect original items and the harmonized variable for 10 randomly sampled respondents
dto[["unitData"]][["satsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select_("id", "GALCOHOL","GEVRALK","GBEERX","GLIQX" ,"GVINX", "current_drink")
        id GALCOHOL                                 GEVRALK           GBEERX                                     GLIQX
1   133161      Yes                                     Yes  1 glass or less           6 cl (a big shot or equivalent)
2   146542       No                             No, I quit.             <NA>                                      <NA>
3   159831      Yes                             No, I quit.  1 glass or less                                      <NA>
4   174001       No                             No, I quit.             <NA>                                      <NA>
5   191272      Yes                                     Yes 1 bottle (33 cl)                                     12 cl
6   239602       No No, I have never drunk alcoholic drinks             <NA>                                      <NA>
7  2105692      Yes                             No, I quit. 1 bottle (33 cl)           6 cl (a big shot or equivalent)
8  2115422      Yes                                     Yes  1 glass or less 4 cl (approx. a small shot or equivalent)
9  2182371      Yes                                     Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
10 2225502      Yes                                     Yes 1 bottle (33 cl)                                     12 cl
                   GVINX current_drink
1   10 cl (1 wine glass)          TRUE
2                   <NA>         FALSE
3                   <NA>          TRUE
4                   <NA>         FALSE
5   10 cl (1 wine glass)          TRUE
6                   <NA>         FALSE
7  37 cl (half a bottle)          TRUE
8   10 cl (1 wine glass)          TRUE
9   10 cl (1 wine glass)          TRUE
10                 20 cl          TRUE
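
Because the verification step draws ids with sample(), each render of the report shows different respondents. The following is a minimal sketch of how the draw could be made reproducible, assuming a fixed seed is acceptable for this report. The toy data frame ds_toy and the seed value 42 are illustrative assumptions, not part of the original workflow.

```r
# Toy stand-in for dto[["unitData"]][["satsa"]]; all values are illustrative only
ds_toy <- data.frame(
  id            = c(101, 102, 103, 104, 105, 106),
  GALCOHOL      = c("Yes", "No", "Yes", "No", "Yes", "Yes"),
  current_drink = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Setting the seed immediately before sampling fixes which ids are drawn
set.seed(42)                          # arbitrary seed, an assumption of this sketch
ids_a <- sample(unique(ds_toy$id), 3)

set.seed(42)                          # same seed, same draw
ids_b <- sample(unique(ds_toy$id), 3)

# TRUE when both draws selected the same ids in the same order
same_draw <- identical(ids_a, ids_b)

# Rows selected for inspection
ds_check <- ds_toy[ds_toy$id %in% ids_a, ]
print(ds_check)
```

With a seed in place, the rows shown in a verification table stay the same between renders, which makes it easier to compare the output before and after a rule file is edited.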

SHARE

Items that can contribute to generating values for the harmonized variable alcohol are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "share", construct == "alcohol") %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name   name                            label_short categories
1      share BR0100       beverages consumed last 6 months          7
2      share BR0110 freq more than 2 glasses beer in a day          7
3      share BR0120 freq more than 2 glasses wine in a day          8
4      share BR0130  freq more than 2 hard liquor in a day          8

We encode the harmonization rule by manually editing the values in the corresponding .csv file in ./data/meta/h-rules/ (here, h-rules-alcohol-share.csv). We then apply the recoding logic it contains and append the resulting harmonized variable, current_drink, to the study's unit data.

study_name <- "share"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-share.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("BR0100", "BR0110", "BR0120", "BR0130"), 
  harmony_name = "current_drink"
)
Source: local data frame [168 x 6]
Groups: BR0100, BR0110, BR0120, BR0130 [?]

                             BR0100                          BR0110                          BR0120
                              (chr)                           (chr)                           (chr)
1                  almost every day                almost every day                almost every day
2                  almost every day                almost every day          less than once a month
3                  almost every day                almost every day            once or twice a week
4                  almost every day                almost every day            once or twice a week
5                  almost every day         five or six days a week         five or six days a week
6                  almost every day         five or six days a week not at all in the last 6 months
7                  almost every day          less than once a month          less than once a month
8                  almost every day          less than once a month          less than once a month
9                  almost every day          less than once a month not at all in the last 6 months
10                 almost every day          less than once a month           once or twice a month
11                 almost every day          less than once a month           once or twice a month
12                 almost every day not at all in the last 6 months                almost every day
13                 almost every day not at all in the last 6 months                almost every day
14                 almost every day not at all in the last 6 months                almost every day
15                 almost every day not at all in the last 6 months          less than once a month
16                 almost every day not at all in the last 6 months          less than once a month
17                 almost every day not at all in the last 6 months not at all in the last 6 months
18                 almost every day not at all in the last 6 months not at all in the last 6 months
19                 almost every day not at all in the last 6 months not at all in the last 6 months
20                 almost every day not at all in the last 6 months not at all in the last 6 months
21                 almost every day not at all in the last 6 months           once or twice a month
22                 almost every day not at all in the last 6 months            once or twice a week
23                 almost every day not at all in the last 6 months            once or twice a week
24                 almost every day not at all in the last 6 months            once or twice a week
25                 almost every day not at all in the last 6 months            once or twice a week
26                 almost every day not at all in the last 6 months       three or four days a week
27                 almost every day           once or twice a month                almost every day
28                 almost every day           once or twice a month                almost every day
29                 almost every day           once or twice a month not at all in the last 6 months
30                 almost every day           once or twice a month            once or twice a week
31                 almost every day           once or twice a month            once or twice a week
32                 almost every day            once or twice a week                almost every day
33                 almost every day            once or twice a week                almost every day
34                 almost every day            once or twice a week           once or twice a month
35                 almost every day            once or twice a week            once or twice a week
36                 almost every day       three or four days a week          less than once a month
37                 almost every day       three or four days a week       three or four days a week
38          five or six days a week         five or six days a week not at all in the last 6 months
39          five or six days a week          less than once a month          less than once a month
40          five or six days a week          less than once a month           once or twice a month
41          five or six days a week          less than once a month       three or four days a week
42          five or six days a week not at all in the last 6 months not at all in the last 6 months
43          five or six days a week not at all in the last 6 months           once or twice a month
44          five or six days a week           once or twice a month          less than once a month
45          five or six days a week           once or twice a month not at all in the last 6 months
46           less than once a month          less than once a month          less than once a month
47           less than once a month          less than once a month          less than once a month
48           less than once a month          less than once a month not at all in the last 6 months
49           less than once a month          less than once a month           once or twice a month
50           less than once a month          less than once a month           once or twice a month
51           less than once a month not at all in the last 6 months          less than once a month
52           less than once a month not at all in the last 6 months          less than once a month
53           less than once a month not at all in the last 6 months          less than once a month
54           less than once a month not at all in the last 6 months not at all in the last 6 months
55           less than once a month not at all in the last 6 months not at all in the last 6 months
56           less than once a month not at all in the last 6 months           once or twice a month
57           less than once a month not at all in the last 6 months           once or twice a month
58           less than once a month not at all in the last 6 months           once or twice a month
59           less than once a month not at all in the last 6 months            once or twice a week
60           less than once a month           once or twice a month          less than once a month
61           less than once a month           once or twice a month          less than once a month
62           less than once a month           once or twice a month not at all in the last 6 months
63           less than once a month           once or twice a month           once or twice a month
64           less than once a month           once or twice a month           once or twice a month
65           less than once a month            once or twice a week not at all in the last 6 months
66  not at all in the last 6 months                              NA                              NA
67            once or twice a month                almost every day          less than once a month
68            once or twice a month          less than once a month          less than once a month
69            once or twice a month          less than once a month          less than once a month
70            once or twice a month          less than once a month          less than once a month
71            once or twice a month          less than once a month not at all in the last 6 months
72            once or twice a month          less than once a month           once or twice a month
73            once or twice a month          less than once a month           once or twice a month
74            once or twice a month          less than once a month            once or twice a week
75            once or twice a month not at all in the last 6 months                      don't know
76            once or twice a month not at all in the last 6 months          less than once a month
77            once or twice a month not at all in the last 6 months          less than once a month
78            once or twice a month not at all in the last 6 months not at all in the last 6 months
79            once or twice a month not at all in the last 6 months not at all in the last 6 months
80            once or twice a month not at all in the last 6 months not at all in the last 6 months
81            once or twice a month not at all in the last 6 months not at all in the last 6 months
82            once or twice a month not at all in the last 6 months           once or twice a month
83            once or twice a month not at all in the last 6 months           once or twice a month
84            once or twice a month not at all in the last 6 months            once or twice a week
85            once or twice a month not at all in the last 6 months            once or twice a week
86            once or twice a month           once or twice a month          less than once a month
87            once or twice a month           once or twice a month          less than once a month
88            once or twice a month           once or twice a month not at all in the last 6 months
89            once or twice a month           once or twice a month           once or twice a month
90            once or twice a month           once or twice a month           once or twice a month
91            once or twice a month           once or twice a month           once or twice a month
92            once or twice a month           once or twice a month            once or twice a week
93            once or twice a month           once or twice a month            once or twice a week
94            once or twice a month            once or twice a week           once or twice a month
95            once or twice a month       three or four days a week          less than once a month
96             once or twice a week          less than once a month          less than once a month
97             once or twice a week          less than once a month          less than once a month
98             once or twice a week          less than once a month          less than once a month
99             once or twice a week          less than once a month          less than once a month
100            once or twice a week          less than once a month not at all in the last 6 months
..                              ...                             ...                             ...
Variables not shown: BR0130 (chr), current_drink (lgl), n (int)
# verify: inspect original items and the harmonized variable for 10 randomly sampled respondents
knitr::kable(dto[["unitData"]][["share"]] %>%
               dplyr::filter(id %in% sample(unique(id),10)) %>%
               dplyr::select_("id", "BR0100", "BR0110", "BR0120", "BR0130", "current_drink"))
id            BR0100                           BR0110                           BR0120                           BR0130                           current_drink
2.505226e+12  not at all in the last 6 months  NA                               NA                               NA                               FALSE
2.505226e+12  once or twice a week             not at all in the last 6 months  not at all in the last 6 months  not at all in the last 6 months  TRUE
2.505230e+12  once or twice a week             not at all in the last 6 months  not at all in the last 6 months  not at all in the last 6 months  TRUE
2.505244e+12  less than once a month           less than once a month           less than once a month           not at all in the last 6 months  TRUE
2.505248e+12  three or four days a week        once or twice a week             once or twice a week             not at all in the last 6 months  TRUE
2.505279e+12  not at all in the last 6 months  NA                               NA                               NA                               FALSE
2.605210e+12  not at all in the last 6 months  NA                               NA                               NA                               FALSE
2.605260e+12  not at all in the last 6 months  NA                               NA                               NA                               FALSE
2.605283e+12  not at all in the last 6 months  NA                               NA                               NA                               FALSE
2.705272e+12  not at all in the last 6 months  NA                               NA                               NA                               FALSE
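
The SHARE ids in the table above are printed in scientific notation, so distinct respondents can look identical (2.505226e+12 appears twice). The following is a small sketch of one way to print long numeric ids in full, using base R format(). The two id values are made up for illustration and are not taken from the SHARE data.

```r
# Hypothetical 13-digit ids that differ only in the last digit
id_num <- c(2505226000001, 2505226000002)

# Default formatting keeps 7 significant digits, so both ids collapse to the same string
id_default <- format(id_num)

# Fixed notation prints every digit of the integer part
id_full <- format(id_num, scientific = FALSE, trim = TRUE)

print(id_default)
print(id_full)
```

Converting the id column this way before passing the data frame to knitr::kable() would keep the rows distinguishable in the rendered table.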

TILDA

Items that can contribute to generating values for the harmonized variable alcohol are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "tilda", construct == "alcohol") %>%
  dplyr::select(study_name, name, label_short,categories)
  study_name                 name                                  label_short categories
1      tilda  BEHALC.DRINKSPERDAY                      Standard drinks per day         35
2      tilda BEHALC.DRINKSPERWEEK                       Standard drinks a week        120
3      tilda     BEHALC.FREQ.WEEK              Average times drinking per week          7
4      tilda          SCQALCOFREQ                Frequency of drinking alcohol          7
5      tilda           SCQALCOHOL                             Alcoholic drinks          2
6      tilda           SCQALCONO1                       More than 2 drinks/day          7
7      tilda           SCQALCONO2 How many drinks consumed on days drink taken         19

We encode the harmonization rule by manually editing the values in the corresponding .csv file in ./data/meta/h-rules/ (here, h-rules-alcohol-tilda.csv). We then apply the recoding logic it contains and append the resulting harmonized variable, current_drink, to the study's unit data.

study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("SCQALCOHOL" ,"BEHALC.DRINKSPERDAY","BEHALC.DRINKSPERWEEK"), 
  harmony_name = "current_drink"
)
Source: local data frame [167 x 5]
Groups: SCQALCOHOL, BEHALC.DRINKSPERDAY, BEHALC.DRINKSPERWEEK [?]

    SCQALCOHOL BEHALC.DRINKSPERDAY BEHALC.DRINKSPERWEEK current_drink     n
         (chr)               (chr)                (chr)         (lgl) (int)
1           no                   0                    0         FALSE  1803
2           no                  NA                   NA         FALSE     9
3          yes                   0                    0          TRUE    29
4          yes                   0                   NA          TRUE     3
5          yes                 0.5                    0          TRUE     1
6          yes                 0.5        0.05999999866            NA     3
7          yes                 0.5                 0.75          TRUE     3
8          yes                 0.5                 1.75          TRUE     1
9          yes      0.699999988079        0.08399999887            NA     1
10         yes                   1                    0          TRUE    23
11         yes                   1        0.11999999732            NA   181
12         yes                   1        0.34999999404            NA   152
13         yes                   1                  1.5          TRUE   150
14         yes                   1                  3.5          TRUE    47
15         yes                   1                  5.5          TRUE    28
16         yes                   1                  6.5          TRUE    45
17         yes                   1                   NA          TRUE     5
18         yes                 1.5                    0          TRUE     1
19         yes                 1.5        0.17999999225            NA    19
20         yes                 1.5        0.52499997616            NA    24
21         yes                 1.5                 2.25          TRUE    53
22         yes                 1.5                 5.25          TRUE    23
23         yes                 1.5                 8.25          TRUE     6
24         yes                 1.5                 9.75          TRUE    12
25         yes                 1.5                   NA          TRUE     1
26         yes                  10        1.19999992848            NA     6
27         yes                  10                   15          TRUE    35
28         yes                  10                  3.5          TRUE     9
29         yes                  10                   35          TRUE     9
30         yes                  10                   65          TRUE     7
31         yes                  10                   NA          TRUE     1
32         yes                  11                 16.5          TRUE     2
33         yes                  11                 71.5          TRUE     1
34         yes                  12        1.43999993801            NA     5
35         yes                  12                   18          TRUE    10
36         yes                  12        4.19999980927            NA     5
37         yes                  12                   42          TRUE     4
38         yes                  12                   NA          TRUE     4
39         yes                  13                   NA          TRUE     1
40         yes                  14        1.67999994755            NA     3
41         yes                  14                   21          TRUE     2
42         yes                  15                 22.5          TRUE     1
43         yes                  15                 5.25          TRUE     1
44         yes                  15                 52.5          TRUE     2
45         yes                  15                 82.5          TRUE     1
46         yes                  15                 97.5          TRUE     1
47         yes                  16                   24          TRUE     3
48         yes                  16                   56          TRUE     2
49         yes                  16                   88          TRUE     1
50         yes                  16                   NA          TRUE     2
51         yes                  18                   27          TRUE     2
52         yes                   2                    0          TRUE    20
53         yes                   2        0.23999999464            NA   170
54         yes                   2        0.69999998808            NA   223
55         yes                   2                   11          TRUE    52
56         yes                   2                   13          TRUE    94
57         yes                   2                    3          TRUE   445
58         yes                   2                    7          TRUE   186
59         yes                   2                   NA          TRUE    21
60         yes                 2.5                    0          TRUE     2
61         yes                 2.5        0.29999998212            NA    11
62         yes                 2.5                0.875          TRUE    26
63         yes                 2.5                13.75          TRUE    23
64         yes                 2.5                16.25          TRUE    22
65         yes                 2.5                 3.75          TRUE    77
66         yes                 2.5                 8.75          TRUE    38
67         yes                  20                    0          TRUE     1
68         yes                  20                  110          TRUE     1
69         yes                  20                  130          TRUE     1
70         yes                  20                   30          TRUE     7
71         yes                  20                    7          TRUE     3
72         yes                  20                   70          TRUE     1
73         yes                  20                   NA          TRUE     1
74         yes                  21        7.34999990464            NA     1
75         yes                  22                   77          TRUE     1
76         yes                  24                   36          TRUE     1
77         yes                  24                   84          TRUE     3
78         yes                  25                 37.5          TRUE     1
79         yes                   3                    0          TRUE     8
80         yes                   3         0.3599999845            NA    76
81         yes                   3        1.04999995232            NA   118
82         yes                   3                 10.5          TRUE   152
83         yes                   3                 16.5          TRUE    55
84         yes                   3                 19.5          TRUE    82
85         yes                   3                  4.5          TRUE   321
86         yes                   3                   NA          TRUE    11
87         yes                 3.5        0.41999998689            NA     6
88         yes                 3.5        1.22500002384            NA    12
89         yes                 3.5                12.25          TRUE    33
90         yes                 3.5                19.25          TRUE    10
91         yes                 3.5                22.75          TRUE    13
92         yes                 3.5                 5.25          TRUE    77
93         yes                 3.5                   NA          TRUE     1
94         yes                  30                 10.5          TRUE     1
95         yes                  30                  105          TRUE     2
96         yes                  30                   45          TRUE     1
97         yes                  30                   NA          TRUE     1
98         yes                  35                227.5          TRUE     1
99         yes                   4                    0          TRUE     4
100        yes                   4        0.47999998927            NA    59
..         ...                 ...                  ...           ...   ...
# verify: inspect original items and the harmonized variable for 10 randomly sampled respondents
dto[["unitData"]][["tilda"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select_("id","SCQALCOHOL" ,"BEHALC.DRINKSPERDAY","BEHALC.DRINKSPERWEEK","current_drink")
                   id SCQALCOHOL BEHALC.DRINKSPERDAY BEHALC.DRINKSPERWEEK current_drink
1  31381                     yes                   2                 0.24            NA
2  236321                     no                   0                 0.00         FALSE
3  314731                    yes                   2                13.00          TRUE
4  399031                    yes                   4                14.00          TRUE
5  500841                    yes                   2                 3.00          TRUE
6  520251                    yes                   2                13.00          TRUE
7  526891                     no                   0                 0.00         FALSE
8  545011                    yes                  12                 1.44            NA
9  578751                    yes                  NA                   NA          TRUE
10 627252                    yes                   0                 0.00          TRUE

(III) Recapitulation

At this point the dto[["unitData"]] elements (the raw data for each study) have been augmented with current_drink, the harmonized variable for the construct alcohol. We retrieve the harmonized variable from each study to view frequency counts across studies:

dumlist <- list()
for(s in dto[["studyName"]]){
  ds <- dto[["unitData"]][[s]]
  dumlist[[s]] <- ds[,c("id","current_drink")]
}
ds <- plyr::ldply(dumlist,data.frame,.id = "study_name")
head(ds)
  study_name  id current_drink
1       alsa  41          TRUE
2       alsa  42          TRUE
3       alsa  61          TRUE
4       alsa  71          TRUE
5       alsa  91          TRUE
6       alsa 121          TRUE
ds$id <- seq_len(nrow(ds)) # id values may repeat across studies, so replace them with a unique row index
table( ds$current_drink, ds$study_name, useNA="always")
       
        alsa lbsl satsa share tilda <NA>
  FALSE  774  185   537  1855  1812    0
  TRUE  1293  378   934   739  4048    0
  <NA>    20   93    26     4  2644    0
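
Because the studies differ considerably in size, the raw counts above can be easier to compare as within-study proportions. The following is a minimal sketch of that conversion using base R's prop.table(). The data frame ds_toy and its values are hypothetical stand-ins for the stacked ds object, used only so that the example runs without the DTO.

```r
# Hypothetical stand-in for the stacked data frame `ds` (illustrative values only)
ds_toy <- data.frame(
  study_name    = c("alsa", "alsa", "alsa", "lbsl", "lbsl", "tilda", "tilda", "tilda"),
  current_drink = c(TRUE, TRUE, FALSE, TRUE, NA, FALSE, TRUE, NA)
)

# Counts of current_drink by study, keeping missing values as their own row
counts <- table(ds_toy$current_drink, ds_toy$study_name, useNA = "ifany")

# Column-wise proportions: each study's column sums to 1
props <- prop.table(counts, margin = 2)
print(round(props, 2))
```

With margin = 2 the proportions are computed within each study, so the share of missing values in current_drink can be compared directly across studies.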

Finally, having added the newly created harmonized variable to the raw source objects, we save the data transfer object.

# Save as a compressed, binary R dataset.  It's no longer readable with a text editor, but it preserves metadata (e.g., factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")
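
One way to confirm that a save of this kind preserved the object is to read the file back and compare it with the original. The sketch below assumes a small hypothetical list, dto_toy, that mimics the structure of the DTO, and writes to a temporary file rather than to ./data/unshared/derived/dto.rds.

```r
# Hypothetical miniature DTO (structure mimics the real object; values are made up)
dto_toy <- list(
  studyName = c("alsa", "lbsl"),
  unitData  = list(
    alsa = data.frame(id = 1:3, current_drink = c(TRUE, FALSE, NA)),
    lbsl = data.frame(id = 1:2, current_drink = c(TRUE, TRUE))
  )
)

# Write to a temporary file with the same compression setting, then read back
path <- tempfile(fileext = ".rds")
saveRDS(dto_toy, file = path, compress = "xz")
dto_back <- readRDS(path)

# The round trip should reproduce the object, including the harmonized variable
round_trip_ok <- isTRUE(all.equal(dto_toy, dto_back))
has_var <- sapply(dto_back[["unitData"]], function(d) "current_drink" %in% names(d))
print(round_trip_ok)
print(has_var)
```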