This report lists the candidate variables for the DataSchema variables of the construct alcohol.
This report is a record of interaction with a data transfer object (dto) produced by ./manipulation/0-ellis-island.R.
The next section recaps this script, exposes the architecture of the DTO, and demonstrates the language of interacting with it.
All data land on Ellis Island.
The script 0-ellis-island.R is the first script in the analytic workflow. It accomplished the following:
./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath" "unitData" "metaData"
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav" "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"
[3] "./data/unshared/raw/SATSA-Q3.Final.sav" "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"
# 3rd element - is a list object containing the following elements
names(dto[["unitData"]])
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])
Source: local data frame [656 x 35]
id AGE94 SEX94 MSTAT94 EDUC94 NOWRK94 SMK94 SMOKE
(int) (int) (int) (fctr) (int) (fctr) (fctr) (fctr)
1 4001026 68 1 divorced 16 no, retired no never smoked
2 4012015 94 2 widowed 12 no, retired no never smoked
3 4012032 94 2 widowed 20 no, retired no don't smoke at present but smoked in the past
4 4022004 93 2 NA NA NA NA never smoked
5 4022026 93 2 widowed 12 no, retired no never smoked
6 4031031 92 1 married 8 no, retired no don't smoke at present but smoked in the past
7 4031035 92 1 widowed 13 no, retired no don't smoke at present but smoked in the past
8 4032201 92 2 NA NA NA NA don't smoke at present but smoked in the past
9 4041062 91 1 widowed 7 NA no don't smoke at present but smoked in the past
10 4042057 91 2 NA NA NA NA NA
.. ... ... ... ... ... ... ... ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
(int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
year_born (dbl), female (lgl), marital (chr), single (lgl), educ3 (chr), current_work_2 (lgl)
# 4th element - a dataset of names and labels of raw variables + added metadata for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>%
DT::datatable(
class = 'cell-border stripe',
caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
filter = "top",
options = list(pageLength = 6, autoWidth = TRUE)
)
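To make the architecture concrete, here is a minimal sketch of a toy list with the same four elements as the dto. The study names, paths, and values are invented for illustration and do not come from the project data.

```r
# Toy stand-in for the DTO: same four elements, invented contents.
toy_dto <- list(
  studyName = c("study_a", "study_b"),
  filePath  = c("./raw/study_a.sav", "./raw/study_b.sav"),  # hypothetical paths
  unitData  = list(
    study_a = data.frame(id = 1:3, DRINK = c("Never", "Weekly", NA)),
    study_b = data.frame(id = 1:2, ALC = c("yes", "no"))
  ),
  metaData  = data.frame(
    study_name = c("study_a", "study_b"),
    name       = c("DRINK", "ALC"),
    construct  = c("alcohol", "alcohol"),
    stringsAsFactors = FALSE
  )
)

# Unit data are addressed by study name; metadata by filtering rows.
names(toy_dto)
toy_dto[["unitData"]][["study_a"]]
alcohol_items <- subset(toy_dto[["metaData"]], construct == "alcohol")
alcohol_items
```

The same double-bracket addressing is what the rest of this report uses against the real dto.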
Everybody wants to be somebody.
We query the metadata set to retrieve all variables potentially tapping the construct alcohol. These are the candidates to enter the DataSchema and contribute to computing harmonized variables.
NOTE: what is retrieved depends on the manually entered values in the column construct of the metadata file ./data/shared/meta-data-map.csv. To specify a different group of variables, edit the metadata, not the script.
meta_data <- dto[["metaData"]] %>%
dplyr::filter(construct %in% c('alcohol')) %>%
dplyr::select(study_name, name, construct, label_short, categories, url) %>%
dplyr::arrange(construct, study_name)
knitr::kable(meta_data)
study_name | name | construct | label_short | categories | url |
---|---|---|---|---|---|
alsa | FR6ORMOR | alcohol | Frequency six or more drinks | 5 | |
alsa | NOSTDRNK | alcohol | Number of standard drinks | 5 | |
alsa | FREQALCH | alcohol | Frequency alcohol | 5 | |
lbsl | ALCOHOL | alcohol | Alcohol use | 7 | |
lbsl | WINE | alcohol | Number of glasses of wine last week | 17 | |
lbsl | BEER | alcohol | Number of cans/bottles of beer last week | 16 | |
lbsl | HARDLIQ | alcohol | Number of drinks containing hard liquor last week | 15 | |
satsa | GALCOHOL | alcohol | Do you ever drink alcoholic beverages? | 2 | |
satsa | GBEERX | alcohol | How much beer do you usually drink at a time? | 7 | |
satsa | GBOTVIN | alcohol | …more than 1 bottle | 4 | |
satsa | GDRLOTS | alcohol | How often more than 5 beers? | 8 | |
satsa | GEVRALK | alcohol | Do you ever drink alcoholic drinks? - Yes | 3 | |
satsa | GFREQBER | alcohol | How often do you drink beer (not light beer)? | 9 | |
satsa | GFREQLIQ | alcohol | How often do you usually drink hard liquor? | 9 | |
satsa | GFREQVIN | alcohol | How often do you usually drink wine (red or white)? | 9 | |
satsa | GLIQX | alcohol | How much hard liquot do you usually drink at time? | 8 | |
satsa | GSTOPALK | alcohol | Do you ever drink alcoholic drinks? -No I quit. When? 19__ | 32 | |
satsa | GVINX | alcohol | How much wine do you usually drink at a time? | 6 | |
share | BR0100 | alcohol | beverages consumed last 6 months | 7 | |
share | BR0110 | alcohol | freq more than 2 glasses beer in a day | 7 | |
share | BR0120 | alcohol | freq more than 2 glasses wine in a day | 8 | |
share | BR0130 | alcohol | freq more than 2 hard liquor in a day | 8 | |
tilda | BEHALC.DRINKSPERDAY | alcohol | Standard drinks per day | 35 | |
tilda | BEHALC.DRINKSPERWEEK | alcohol | Standard drinks a week | 120 | |
tilda | BEHALC.FREQ.WEEK | alcohol | Average times drinking per week | 7 | |
tilda | SCQALCOFREQ | alcohol | Frequency of drinking alcohol | 7 | |
tilda | SCQALCOHOL | alcohol | Alcoholic drinks | 2 | |
tilda | SCQALCONO1 | alcohol | More than 2 drinks/day | 7 | |
tilda | SCQALCONO2 | alcohol | How many drinks consumed on days drink taken | 19 |
View descriptives: alcohol for closer examination of each candidate.
After reviewing descriptives and relevant codebooks, the following operationalization of the harmonized variables for alcohol has been adopted:
current_drink
0 - FALSE: healthy choice - REFERENCE group
1 - TRUE: risk factor
These variables will be generated next, in the Development section.
The particular goal of this section is to ensure that the schema used to encode the values of the alcohol variable is consistent across studies.
In this section we define the schema sets for harmonizing the alcohol construct (i.e. specify which variables from which studies will contribute to computing harmonized variables). Each schema set has a particular pattern of possible response values to these variables, which we export for inspection as .csv tables. We then manually edit these .csv tables, populating new columns that map values of the harmonized variables to the specific response patterns of the schema set variables. Finally, we import the harmonization algorithms encoded in the .csv tables and apply them to compute harmonized variables in the dataset that combines raw and harmonized variables for the alcohol construct across studies.
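The three steps described above (export the response profile, edit it by hand, apply it) can be sketched on toy data. This is only an illustration under stated assumptions: the variable FREQ, its values, and the temporary file are invented, and the manual editing step is simulated with one line of code.

```r
library(dplyr)

# Toy raw data for one hypothetical study (values are invented).
raw <- data.frame(
  id   = 1:6,
  FREQ = c("Never", "Weekly", "Weekly", "Never", NA, "Monthly"),
  stringsAsFactors = FALSE
)

# Step 1: export the response profile (each observed pattern with its count).
profile <- raw %>% count(FREQ, name = "count")
path <- file.path(tempdir(), "alcohol-toy.csv")
write.csv(profile, path, row.names = FALSE)

# Step 2: stands in for manual editing of the .csv file.
# A new column maps each response pattern to the harmonized value.
hrule <- read.csv(path, stringsAsFactors = FALSE)
hrule$current_drink <- hrule$FREQ != "Never"  # NA pattern stays NA

# Step 3: apply the rule by joining it back onto the raw data.
harmonized <- raw %>%
  left_join(hrule %>% select(FREQ, current_drink), by = "FREQ")
harmonized
```

Keeping the rule in a table rather than in code means the mapping can be reviewed and changed without touching the script.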
With all candidate variables in categorical format, we have defined the sets of data schema variables thus:
schema_sets <- list(
"alsa" = c("FREQALCH","NOSTDRNK","FR6ORMOR"),
"lbsl" = c("ALCOHOL", "BEER","HARDLIQ","WINE"),
"satsa" = c("GALCOHOL","GEVRALK","GBEERX","GLIQX","GVINX" ),
"share" = c("BR0100","BR0110", "BR0120","BR0130"),
"tilda" = c("SCQALCOHOL","BEHALC.DRINKSPERDAY","BEHALC.DRINKSPERWEEK")
)
Each of these schema sets has a particular pattern of possible response values, for example:
# view the profile of responses
dto[["unitData"]][["alsa"]] %>%
dplyr::group_by_("FREQALCH","NOSTDRNK","FR6ORMOR") %>%
dplyr::summarize(count = n())
Source: local data frame [51 x 4]
Groups: FREQALCH, NOSTDRNK [?]
FREQALCH NOSTDRNK FR6ORMOR count
(fctr) (fctr) (fctr) (int)
1 Never NA NA 774
2 Monthly or less One or two Never 337
3 Monthly or less One or two Less than monthly 6
4 Monthly or less Three or four Never 18
5 Monthly or less Three or four Less than monthly 3
6 Monthly or less Five or six Less than monthly 2
7 Monthly or less Seven to nine Less than monthly 1
8 Monthly or less NA NA 1
9 Two to four times a month One or two Never 132
10 Two to four times a month One or two Less than monthly 4
.. ... ... ... ...
We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.
# define function to extract profiles
response_profile <- function(dto, h_target, study, varnames_values){
ds <- dto[["unitData"]][[study]]
varnames_values <- lapply(varnames_values, as.symbol) # Convert character vector to list of symbols
d <- ds %>%
dplyr::group_by_(.dots=varnames_values) %>%
dplyr::summarize(count = n())
write.csv(d,paste0("./data/meta/response-profiles-live/",h_target,"-",study,".csv"))
}
# extract response profile for data schema set from each study
for(s in names(schema_sets)){
response_profile(dto,
study = s,
h_target = 'alcohol',
varnames_values = schema_sets[[s]]
)
}
You can examine them in `./data/meta/response-profiles-live/`.
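The response_profile() function above relies on group_by_(), which dplyr has deprecated. A possible equivalent for current dplyr (1.0.0 or later, an assumption about the installed version) is sketched below. The function name response_profile_v2, its arguments, and the toy data are hypothetical and not part of the project code.

```r
library(dplyr)

# Hypothetical modernized variant: takes a data frame, a character vector
# of variable names, and an output file path (all names are illustrative).
response_profile_v2 <- function(ds, varnames, out_file) {
  d <- ds %>%
    group_by(across(all_of(varnames))) %>%       # group by names held in a vector
    summarize(count = n(), .groups = "drop")     # one row per response pattern
  write.csv(d, out_file, row.names = FALSE)
  invisible(d)
}

# Toy data standing in for one study's unit data.
toy <- data.frame(
  A = c("x", "x", "y", NA),
  B = c("1", "1", "2", "2"),
  stringsAsFactors = FALSE
)
out_file <- file.path(tempdir(), "alcohol-toy-profile.csv")
out <- response_profile_v2(toy, c("A", "B"), out_file)
out
```

Patterns containing NA are kept as their own rows, which matters because missing responses also need a harmonization rule.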
current_drink
0 - FALSE: healthy choice
1 - TRUE: risk factor
Items that can contribute to generating values for the harmonized variable current_drink are:
dto[["metaData"]] %>%
dplyr::filter(study_name=="alsa", construct %in% c("alcohol")) %>%
dplyr::select(study_name, name, label,categories)
study_name name label categories
1 alsa FR6ORMOR Frequency six or more drinks 5
2 alsa NOSTDRNK Number of standard drinks 5
3 alsa FREQALCH Frequency alcohol 5
We encode the harmonization rule by manually editing the values in the corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("FREQALCH", "NOSTDRNK", "FR6ORMOR"),
harmony_name = "current_drink"
)
Source: local data frame [51 x 5]
Groups: FREQALCH, NOSTDRNK, FR6ORMOR [?]
FREQALCH NOSTDRNK FR6ORMOR current_drink n
(chr) (chr) (chr) (lgl) (int)
1 Four or more times a week Five or six Daily or almost daily TRUE 12
2 Four or more times a week Five or six Less than monthly TRUE 2
3 Four or more times a week Five or six Monthly TRUE 5
4 Four or more times a week Five or six Never TRUE 6
5 Four or more times a week Five or six Weekly TRUE 6
6 Four or more times a week One or two Less than monthly TRUE 49
7 Four or more times a week One or two Monthly TRUE 7
8 Four or more times a week One or two Never TRUE 324
9 Four or more times a week One or two Weekly TRUE 3
10 Four or more times a week Seven to nine Daily or almost daily TRUE 8
11 Four or more times a week Seven to nine Monthly TRUE 2
12 Four or more times a week Seven to nine Never TRUE 1
13 Four or more times a week Ten or more Daily or almost daily TRUE 1
14 Four or more times a week Three or four Daily or almost daily TRUE 1
15 Four or more times a week Three or four Less than monthly TRUE 35
16 Four or more times a week Three or four Monthly TRUE 15
17 Four or more times a week Three or four Never TRUE 52
18 Four or more times a week Three or four Weekly TRUE 15
19 Monthly or less Five or six Less than monthly TRUE 2
20 Monthly or less One or two Less than monthly TRUE 6
21 Monthly or less One or two Never TRUE 337
22 Monthly or less Seven to nine Less than monthly TRUE 1
23 Monthly or less Three or four Less than monthly TRUE 3
24 Monthly or less Three or four Never TRUE 18
25 Monthly or less NA NA TRUE 1
26 Never NA NA FALSE 774
27 Two to four times a month Five or six Less than monthly TRUE 1
28 Two to four times a month Five or six Never TRUE 2
29 Two to four times a month Five or six Weekly TRUE 1
30 Two to four times a month One or two Less than monthly TRUE 4
31 Two to four times a month One or two Monthly TRUE 2
32 Two to four times a month One or two Never TRUE 132
33 Two to four times a month Seven to nine Monthly TRUE 1
34 Two to four times a month Seven to nine Weekly TRUE 1
35 Two to four times a month Three or four Less than monthly TRUE 5
36 Two to four times a month Three or four Never TRUE 18
37 Two to three times a week Five or six Daily or almost daily TRUE 1
38 Two to three times a week Five or six Less than monthly TRUE 3
39 Two to three times a week Five or six Monthly TRUE 1
40 Two to three times a week Five or six Never TRUE 1
41 Two to three times a week Five or six Weekly TRUE 3
42 Two to three times a week One or two Less than monthly TRUE 14
43 Two to three times a week One or two Monthly TRUE 6
44 Two to three times a week One or two Never TRUE 149
45 Two to three times a week Seven to nine Never TRUE 1
46 Two to three times a week Seven to nine Weekly TRUE 1
47 Two to three times a week Ten or more Weekly TRUE 1
48 Two to three times a week Three or four Less than monthly TRUE 9
49 Two to three times a week Three or four Never TRUE 23
50 Two to three times a week Three or four Weekly TRUE 1
51 NA NA NA NA 20
# verify
dto[["unitData"]][["alsa"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "FREQALCH", "NOSTDRNK", "FR6ORMOR", "current_drink")
id FREQALCH NOSTDRNK FR6ORMOR current_drink
1 1331 Four or more times a week Three or four Less than monthly TRUE
2 5101 Never <NA> <NA> FALSE
3 5302 Never <NA> <NA> FALSE
4 9591 Two to four times a month One or two Never TRUE
5 12061 Never <NA> <NA> FALSE
6 12751 Two to three times a week One or two Never TRUE
7 14541 Two to three times a week One or two Never TRUE
8 24902 Never <NA> <NA> FALSE
9 35371 Four or more times a week Three or four Never TRUE
10 36351 Monthly or less One or two Never TRUE
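The function recode_with_hrule() is called above, but its definition is not part of this report. The sketch below shows one way such a function could work, assuming the rule table holds one row per response pattern plus a column with the harmonized value. The name recode_with_hrule_sketch, its signature, and the toy data are hypothetical; the project's actual function may differ.

```r
library(dplyr)

# HYPOTHETICAL stand-in; not the project's recode_with_hrule().
recode_with_hrule_sketch <- function(ds, hrule, variable_names, harmony_name) {
  # Compare values as character so factors and strings match each other.
  ds_chr <- ds %>% mutate(across(all_of(variable_names), as.character))
  hrule_chr <- hrule %>%
    mutate(across(all_of(variable_names), as.character)) %>%
    select(all_of(c(variable_names, harmony_name)))
  joined <- left_join(ds_chr, hrule_chr, by = variable_names)
  # Assumes one rule row per pattern; otherwise the join would add rows.
  stopifnot(nrow(joined) == nrow(ds))
  ds[[harmony_name]] <- joined[[harmony_name]]
  ds
}

# Toy unit data and toy rule table (values are invented).
toy_ds <- data.frame(
  id = 1:4,
  FREQALCH = factor(c("Never", "Monthly or less", NA, "Never")),
  NOSTDRNK = factor(c(NA, "One or two", NA, NA))
)
toy_rule <- data.frame(
  FREQALCH = c("Never", "Monthly or less", NA),
  NOSTDRNK = c(NA, "One or two", NA),
  current_drink = c(FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)
toy_out <- recode_with_hrule_sketch(
  toy_ds, toy_rule,
  variable_names = c("FREQALCH", "NOSTDRNK"),
  harmony_name   = "current_drink"
)
toy_out
```

The original columns keep their types; only the harmonized column is appended.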
Items that can contribute to generating values for the harmonized variable current_drink are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "lbsl", construct == "alcohol") %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 lbsl ALCOHOL Alcohol use 7
2 lbsl WINE Number of glasses of wine last week 17
3 lbsl BEER Number of cans/bottles of beer last week 16
4 lbsl HARDLIQ Number of drinks containing hard liquor last week 15
We encode the harmonization rule by manually editing the values in the corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("ALCOHOL", "BEER" , "HARDLIQ", "WINE" ),
harmony_name = "current_drink"
)
Source: local data frame [174 x 6]
Groups: ALCOHOL, BEER, HARDLIQ, WINE [?]
ALCOHOL BEER HARDLIQ WINE current_drink n
(chr) (chr) (chr) (chr) (lgl) (int)
1 daily or almost daily 0 0 10 TRUE 1
2 daily or almost daily 0 0 14 TRUE 1
3 daily or almost daily 0 0 15 TRUE 2
4 daily or almost daily 0 0 6 TRUE 1
5 daily or almost daily 0 0 7 TRUE 4
6 daily or almost daily 0 0 8 TRUE 1
7 daily or almost daily 0 0 9 TRUE 1
8 daily or almost daily 0 1 10 TRUE 1
9 daily or almost daily 0 1 5 TRUE 1
10 daily or almost daily 0 1 6 TRUE 1
11 daily or almost daily 0 14 0 TRUE 2
12 daily or almost daily 0 14 1 TRUE 1
13 daily or almost daily 0 14 10 TRUE 1
14 daily or almost daily 0 2 10 TRUE 1
15 daily or almost daily 0 2 14 TRUE 3
16 daily or almost daily 0 2 15 TRUE 1
17 daily or almost daily 0 2 5 TRUE 1
18 daily or almost daily 0 25 0 TRUE 1
19 daily or almost daily 0 4 12 TRUE 1
20 daily or almost daily 0 4 4 TRUE 1
21 daily or almost daily 0 4 5 TRUE 1
22 daily or almost daily 0 5 0 TRUE 2
23 daily or almost daily 0 6 3 TRUE 1
24 daily or almost daily 0 7 0 TRUE 3
25 daily or almost daily 0 7 3 TRUE 1
26 daily or almost daily 0 7 5 TRUE 1
27 daily or almost daily 0 7 7 TRUE 2
28 daily or almost daily 0 8 0 TRUE 1
29 daily or almost daily 1 14 0 TRUE 1
30 daily or almost daily 1 14 12 TRUE 1
31 daily or almost daily 1 14 2 TRUE 1
32 daily or almost daily 1 2 15 TRUE 1
33 daily or almost daily 1 2 6 TRUE 1
34 daily or almost daily 10 0 0 TRUE 2
35 daily or almost daily 10 0 1 TRUE 1
36 daily or almost daily 10 2 0 TRUE 1
37 daily or almost daily 10 5 NA TRUE 1
38 daily or almost daily 12 0 6 TRUE 2
39 daily or almost daily 14 3 7 TRUE 1
40 daily or almost daily 2 10 0 TRUE 1
41 daily or almost daily 2 14 10 TRUE 1
42 daily or almost daily 2 7 3 TRUE 1
43 daily or almost daily 25 3 0 TRUE 1
44 daily or almost daily 3 0 12 TRUE 1
45 daily or almost daily 3 0 7 TRUE 1
46 daily or almost daily 3 6 0 TRUE 1
47 daily or almost daily 30 0 0 TRUE 1
48 daily or almost daily 4 12 0 TRUE 1
49 daily or almost daily 4 3 2 TRUE 1
50 daily or almost daily 5 6 0 TRUE 1
51 daily or almost daily 6 0 0 TRUE 1
52 daily or almost daily 6 3 2 TRUE 1
53 daily or almost daily 7 0 0 TRUE 1
54 daily or almost daily 7 0 2 TRUE 1
55 daily or almost daily 7 2 0 TRUE 1
56 daily or almost daily 7 2 3 TRUE 1
57 daily or almost daily 7 7 NA TRUE 1
58 daily or almost daily 8 0 0 TRUE 1
59 daily or almost daily 8 0 8 TRUE 1
60 daily or almost daily 9 0 0 TRUE 1
61 daily or almost daily NA 14 NA TRUE 1
62 daily or almost daily NA 15 3 TRUE 1
63 daily or almost daily NA 21 NA TRUE 1
64 daily or almost daily NA 7 7 TRUE 1
65 daily or almost daily NA 7 NA TRUE 3
66 daily or almost daily NA NA 21 TRUE 1
67 daily or almost daily NA NA 7 TRUE 1
68 few times a year 0 0 0 TRUE 69
69 few times a year 0 0 1 TRUE 10
70 few times a year 0 0 2 TRUE 2
71 few times a year 0 1 0 TRUE 1
72 few times a year 1 0 0 TRUE 5
73 few times a year 1 0 1 TRUE 1
74 few times a year 1 0 2 TRUE 2
75 few times a year 1 NA 1 TRUE 1
76 few times a year 1 NA NA TRUE 1
77 few times a year 2 0 0 TRUE 3
78 few times a year 2 0 1 TRUE 2
79 few times a year 2 0 5 TRUE 1
80 few times a year 3 0 0 TRUE 1
81 few times a year 9 9 9 TRUE 1
82 few times a year NA NA NA TRUE 43
83 never drank 0 0 0 FALSE 10
84 never drank NA NA NA FALSE 82
85 not in last year 0 0 0 FALSE 13
86 not in last year 1 0 0 FALSE 1
87 not in last year NA NA NA FALSE 78
88 once a week 0 0 0 TRUE 5
89 once a week 0 0 1 TRUE 2
90 once a week 0 0 2 TRUE 5
91 once a week 0 0 3 TRUE 1
92 once a week 0 0 4 TRUE 1
93 once a week 0 0 5 TRUE 1
94 once a week 0 0 6 TRUE 1
95 once a week 0 1 0 TRUE 2
96 once a week 0 1 1 TRUE 3
97 once a week 0 1 3 TRUE 1
98 once a week 0 2 0 TRUE 4
99 once a week 0 2 1 TRUE 1
100 once a week 0 4 1 TRUE 1
.. ... ... ... ... ... ...
# verify
dto[["unitData"]][["lbsl"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "ALCOHOL", "BEER" , "HARDLIQ", "WINE", "current_drink")
id ALCOHOL BEER HARDLIQ WINE current_drink
1 4042082 daily or almost daily 0 0 7 TRUE
2 4132042 never drank NA NA NA FALSE
3 4141201 not in last year 0 0 0 FALSE
4 4152089 few times a year 0 0 0 TRUE
5 4161060 once or twice per month 0 0 1 TRUE
6 4202081 not in last year NA NA NA FALSE
7 4212011 few times a year NA NA NA TRUE
8 4241078 not in last year NA NA NA FALSE
9 4261199 <NA> NA NA NA NA
10 4432039 <NA> NA NA NA NA
Items that can contribute to generating values for the harmonized variable current_drink are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "satsa", construct == "alcohol") %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 satsa GALCOHOL Do you ever drink alcoholic beverages? 2
2 satsa GBEERX How much beer do you usually drink at a time? 7
3 satsa GBOTVIN …more than 1 bottle 4
4 satsa GDRLOTS How often more than 5 beers? 8
5 satsa GEVRALK Do you ever drink alcoholic drinks? - Yes 3
6 satsa GFREQBER How often do you drink beer (not light beer)? 9
7 satsa GFREQLIQ How often do you usually drink hard liquor? 9
8 satsa GFREQVIN How often do you usually drink wine (red or white)? 9
9 satsa GLIQX How much hard liquot do you usually drink at time? 8
10 satsa GSTOPALK Do you ever drink alcoholic drinks? -No I quit. When? 19__ 32
11 satsa GVINX How much wine do you usually drink at a time? 6
We encode the harmonization rule by manually editing the values in the corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("GALCOHOL","GEVRALK","GBEERX","GLIQX" ,"GVINX" ),
harmony_name = "current_drink"
)
Source: local data frame [231 x 7]
Groups: GALCOHOL, GEVRALK, GBEERX, GLIQX, GVINX [?]
GALCOHOL GEVRALK GBEERX GLIQX
(chr) (chr) (chr) (chr)
1 No No, I have never drunk alcoholic drinks 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
2 No No, I have never drunk alcoholic drinks 1 bottle (33 cl) NA
3 No No, I have never drunk alcoholic drinks 1 glass or less NA
4 No No, I have never drunk alcoholic drinks 1 glass or less NA
5 No No, I have never drunk alcoholic drinks NA 4 cl (approx. a small shot or equivalent)
6 No No, I have never drunk alcoholic drinks NA 4 cl (approx. a small shot or equivalent)
7 No No, I have never drunk alcoholic drinks NA NA
8 No No, I have never drunk alcoholic drinks NA NA
9 No No, I quit. 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
10 No No, I quit. 1 bottle (33 cl) NA
11 No No, I quit. 1 glass or less 4 cl (approx. a small shot or equivalent)
12 No No, I quit. 1 glass or less NA
13 No No, I quit. NA 8 cl
14 No No, I quit. NA NA
15 No No, I quit. NA NA
16 No No, I quit. NA NA
17 No Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
18 No Yes 1 glass or less 4 cl (approx. a small shot or equivalent)
19 No Yes 1 glass or less NA
20 No Yes 1 glass or less NA
21 No Yes NA 4 cl (approx. a small shot or equivalent)
22 No Yes NA 4 cl (approx. a small shot or equivalent)
23 No Yes NA 4 cl (approx. a small shot or equivalent)
24 No Yes NA NA
25 No Yes NA NA
26 No Yes NA NA
27 No NA 1 glass or less 4 cl (approx. a small shot or equivalent)
28 No NA 1 glass or less 4 cl (approx. a small shot or equivalent)
29 No NA 1 glass or less NA
30 No NA NA 4 cl (approx. a small shot or equivalent)
31 No NA NA 4 cl (approx. a small shot or equivalent)
32 No NA NA NA
33 No NA NA NA
34 Yes No, I have never drunk alcoholic drinks 1 bottle (33 cl) NA
35 Yes No, I have never drunk alcoholic drinks 1 glass or less NA
36 Yes No, I have never drunk alcoholic drinks 1 glass or less NA
37 Yes No, I have never drunk alcoholic drinks 1 glass or less NA
38 Yes No, I have never drunk alcoholic drinks 2 bottles 4 cl (approx. a small shot or equivalent)
39 Yes No, I have never drunk alcoholic drinks NA 4 cl (approx. a small shot or equivalent)
40 Yes No, I have never drunk alcoholic drinks NA NA
41 Yes No, I have never drunk alcoholic drinks NA NA
42 Yes No, I quit. 1 bottle (33 cl) 6 cl (a big shot or equivalent)
43 Yes No, I quit. 1 bottle (33 cl) NA
44 Yes No, I quit. 1 bottle (33 cl) NA
45 Yes No, I quit. 1 glass or less NA
46 Yes No, I quit. 1 glass or less NA
47 Yes Yes 1 bottle (33 cl) 12 cl
48 Yes Yes 1 bottle (33 cl) 12 cl
49 Yes Yes 1 bottle (33 cl) 12 cl
50 Yes Yes 1 bottle (33 cl) 12 cl
51 Yes Yes 1 bottle (33 cl) 12 cl
52 Yes Yes 1 bottle (33 cl) 18 cl
53 Yes Yes 1 bottle (33 cl) 18 cl
54 Yes Yes 1 bottle (33 cl) 18 cl
55 Yes Yes 1 bottle (33 cl) 18 cl
56 Yes Yes 1 bottle (33 cl) 18 cl
57 Yes Yes 1 bottle (33 cl) 37 cl (half a bottle)
58 Yes Yes 1 bottle (33 cl) 37 cl (half a bottle)
59 Yes Yes 1 bottle (33 cl) 37 cl (half a bottle)
60 Yes Yes 1 bottle (33 cl) 37 cl (half a bottle)
61 Yes Yes 1 bottle (33 cl) 37 cl (half a bottle)
62 Yes Yes 1 bottle (33 cl) 37 cl (half a bottle)
63 Yes Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
64 Yes Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
65 Yes Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
66 Yes Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
67 Yes Yes 1 bottle (33 cl) 6 cl (a big shot or equivalent)
68 Yes Yes 1 bottle (33 cl) 6 cl (a big shot or equivalent)
69 Yes Yes 1 bottle (33 cl) 6 cl (a big shot or equivalent)
70 Yes Yes 1 bottle (33 cl) 6 cl (a big shot or equivalent)
71 Yes Yes 1 bottle (33 cl) 6 cl (a big shot or equivalent)
72 Yes Yes 1 bottle (33 cl) 75 cl (1 whole bottle)
73 Yes Yes 1 bottle (33 cl) 8 cl
74 Yes Yes 1 bottle (33 cl) 8 cl
75 Yes Yes 1 bottle (33 cl) 8 cl
76 Yes Yes 1 bottle (33 cl) 8 cl
77 Yes Yes 1 bottle (33 cl) 8 cl
78 Yes Yes 1 bottle (33 cl) 8 cl
79 Yes Yes 1 bottle (33 cl) NA
80 Yes Yes 1 bottle (33 cl) NA
81 Yes Yes 1 bottle (33 cl) NA
82 Yes Yes 1 bottle (33 cl) NA
83 Yes Yes 1 bottle (33 cl) NA
84 Yes Yes 1 bottle (33 cl) NA
85 Yes Yes 1 glass or less 12 cl
86 Yes Yes 1 glass or less 12 cl
87 Yes Yes 1 glass or less 12 cl
88 Yes Yes 1 glass or less 12 cl
89 Yes Yes 1 glass or less 18 cl
90 Yes Yes 1 glass or less 18 cl
91 Yes Yes 1 glass or less 18 cl
92 Yes Yes 1 glass or less 18 cl
93 Yes Yes 1 glass or less 37 cl (half a bottle)
94 Yes Yes 1 glass or less 37 cl (half a bottle)
95 Yes Yes 1 glass or less 4 cl (approx. a small shot or equivalent)
96 Yes Yes 1 glass or less 4 cl (approx. a small shot or equivalent)
97 Yes Yes 1 glass or less 4 cl (approx. a small shot or equivalent)
98 Yes Yes 1 glass or less 4 cl (approx. a small shot or equivalent)
99 Yes Yes 1 glass or less 6 cl (a big shot or equivalent)
100 Yes Yes 1 glass or less 6 cl (a big shot or equivalent)
.. ... ... ... ...
Variables not shown: GVINX (chr), current_drink (lgl), n (int)
# verify
dto[["unitData"]][["satsa"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id", "GALCOHOL","GEVRALK","GBEERX","GLIQX" ,"GVINX", "current_drink")
id GALCOHOL GEVRALK GBEERX GLIQX
1 133161 Yes Yes 1 glass or less 6 cl (a big shot or equivalent)
2 146542 No No, I quit. <NA> <NA>
3 159831 Yes No, I quit. 1 glass or less <NA>
4 174001 No No, I quit. <NA> <NA>
5 191272 Yes Yes 1 bottle (33 cl) 12 cl
6 239602 No No, I have never drunk alcoholic drinks <NA> <NA>
7 2105692 Yes No, I quit. 1 bottle (33 cl) 6 cl (a big shot or equivalent)
8 2115422 Yes Yes 1 glass or less 4 cl (approx. a small shot or equivalent)
9 2182371 Yes Yes 1 bottle (33 cl) 4 cl (approx. a small shot or equivalent)
10 2225502 Yes Yes 1 bottle (33 cl) 12 cl
GVINX current_drink
1 10 cl (1 wine glass) TRUE
2 <NA> FALSE
3 <NA> TRUE
4 <NA> FALSE
5 10 cl (1 wine glass) TRUE
6 <NA> FALSE
7 37 cl (half a bottle) TRUE
8 10 cl (1 wine glass) TRUE
9 10 cl (1 wine glass) TRUE
10 20 cl TRUE
Items that can contribute to generating values for the harmonized variable current_drink are:
dto[["metaData"]] %>%
dplyr::filter(study_name == "tilda", construct == "alcohol") %>%
dplyr::select(study_name, name, label_short,categories)
study_name name label_short categories
1 tilda BEHALC.DRINKSPERDAY Standard drinks per day 35
2 tilda BEHALC.DRINKSPERWEEK Standard drinks a week 120
3 tilda BEHALC.FREQ.WEEK Average times drinking per week 7
4 tilda SCQALCOFREQ Frequency of drinking alcohol 7
5 tilda SCQALCOHOL Alcoholic drinks 2
6 tilda SCQALCONO1 More than 2 drinks/day 7
7 tilda SCQALCONO2 How many drinks consumed on days drink taken 19
We encode the harmonization rule by manually editing the values in the corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-alcohol-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
dto,
study_name = study_name,
variable_names = c("SCQALCOHOL" ,"BEHALC.DRINKSPERDAY","BEHALC.DRINKSPERWEEK"),
harmony_name = "current_drink"
)
Source: local data frame [167 x 5]
Groups: SCQALCOHOL, BEHALC.DRINKSPERDAY, BEHALC.DRINKSPERWEEK [?]
SCQALCOHOL BEHALC.DRINKSPERDAY BEHALC.DRINKSPERWEEK current_drink n
(chr) (chr) (chr) (lgl) (int)
1 no 0 0 FALSE 1803
2 no NA NA FALSE 9
3 yes 0 0 TRUE 29
4 yes 0 NA TRUE 3
5 yes 0.5 0 TRUE 1
6 yes 0.5 0.05999999866 NA 3
7 yes 0.5 0.75 TRUE 3
8 yes 0.5 1.75 TRUE 1
9 yes 0.699999988079 0.08399999887 NA 1
10 yes 1 0 TRUE 23
11 yes 1 0.11999999732 NA 181
12 yes 1 0.34999999404 NA 152
13 yes 1 1.5 TRUE 150
14 yes 1 3.5 TRUE 47
15 yes 1 5.5 TRUE 28
16 yes 1 6.5 TRUE 45
17 yes 1 NA TRUE 5
18 yes 1.5 0 TRUE 1
19 yes 1.5 0.17999999225 NA 19
20 yes 1.5 0.52499997616 NA 24
21 yes 1.5 2.25 TRUE 53
22 yes 1.5 5.25 TRUE 23
23 yes 1.5 8.25 TRUE 6
24 yes 1.5 9.75 TRUE 12
25 yes 1.5 NA TRUE 1
26 yes 10 1.19999992848 NA 6
27 yes 10 15 TRUE 35
28 yes 10 3.5 TRUE 9
29 yes 10 35 TRUE 9
30 yes 10 65 TRUE 7
31 yes 10 NA TRUE 1
32 yes 11 16.5 TRUE 2
33 yes 11 71.5 TRUE 1
34 yes 12 1.43999993801 NA 5
35 yes 12 18 TRUE 10
36 yes 12 4.19999980927 NA 5
37 yes 12 42 TRUE 4
38 yes 12 NA TRUE 4
39 yes 13 NA TRUE 1
40 yes 14 1.67999994755 NA 3
41 yes 14 21 TRUE 2
42 yes 15 22.5 TRUE 1
43 yes 15 5.25 TRUE 1
44 yes 15 52.5 TRUE 2
45 yes 15 82.5 TRUE 1
46 yes 15 97.5 TRUE 1
47 yes 16 24 TRUE 3
48 yes 16 56 TRUE 2
49 yes 16 88 TRUE 1
50 yes 16 NA TRUE 2
51 yes 18 27 TRUE 2
52 yes 2 0 TRUE 20
53 yes 2 0.23999999464 NA 170
54 yes 2 0.69999998808 NA 223
55 yes 2 11 TRUE 52
56 yes 2 13 TRUE 94
57 yes 2 3 TRUE 445
58 yes 2 7 TRUE 186
59 yes 2 NA TRUE 21
60 yes 2.5 0 TRUE 2
61 yes 2.5 0.29999998212 NA 11
62 yes 2.5 0.875 TRUE 26
63 yes 2.5 13.75 TRUE 23
64 yes 2.5 16.25 TRUE 22
65 yes 2.5 3.75 TRUE 77
66 yes 2.5 8.75 TRUE 38
67 yes 20 0 TRUE 1
68 yes 20 110 TRUE 1
69 yes 20 130 TRUE 1
70 yes 20 30 TRUE 7
71 yes 20 7 TRUE 3
72 yes 20 70 TRUE 1
73 yes 20 NA TRUE 1
74 yes 21 7.34999990464 NA 1
75 yes 22 77 TRUE 1
76 yes 24 36 TRUE 1
77 yes 24 84 TRUE 3
78 yes 25 37.5 TRUE 1
79 yes 3 0 TRUE 8
80 yes 3 0.3599999845 NA 76
81 yes 3 1.04999995232 NA 118
82 yes 3 10.5 TRUE 152
83 yes 3 16.5 TRUE 55
84 yes 3 19.5 TRUE 82
85 yes 3 4.5 TRUE 321
86 yes 3 NA TRUE 11
87 yes 3.5 0.41999998689 NA 6
88 yes 3.5 1.22500002384 NA 12
89 yes 3.5 12.25 TRUE 33
90 yes 3.5 19.25 TRUE 10
91 yes 3.5 22.75 TRUE 13
92 yes 3.5 5.25 TRUE 77
93 yes 3.5 NA TRUE 1
94 yes 30 10.5 TRUE 1
95 yes 30 105 TRUE 2
96 yes 30 45 TRUE 1
97 yes 30 NA TRUE 1
98 yes 35 227.5 TRUE 1
99 yes 4 0 TRUE 4
100 yes 4 0.47999998927 NA 59
.. ... ... ... ... ...
# verify
dto[["unitData"]][["tilda"]] %>%
dplyr::filter(id %in% sample(unique(id),10)) %>%
dplyr::select_("id","SCQALCOHOL" ,"BEHALC.DRINKSPERDAY","BEHALC.DRINKSPERWEEK","current_drink")
id SCQALCOHOL BEHALC.DRINKSPERDAY BEHALC.DRINKSPERWEEK current_drink
1 31381 yes 2 0.24 NA
2 236321 no 0 0.00 FALSE
3 314731 yes 2 13.00 TRUE
4 399031 yes 4 14.00 TRUE
5 500841 yes 2 3.00 TRUE
6 520251 yes 2 13.00 TRUE
7 526891 no 0 0.00 FALSE
8 545011 yes 12 1.44 NA
9 578751 yes NA NA TRUE
10 627252 yes 0 0.00 TRUE
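In the TILDA output above, some response patterns with SCQALCOHOL equal to yes carry NA for current_drink (for example rows 6, 9, and 11 of the printed profile). Whether that is intended cannot be determined from this report. One way to list such patterns for review is sketched below on toy data; the values are invented and the object names are illustrative.

```r
library(dplyr)

# Toy data mimicking the shape of the TILDA columns (values are invented).
toy <- data.frame(
  SCQALCOHOL           = c("yes", "yes", "no", "yes", NA),
  BEHALC.DRINKSPERWEEK = c(0.24, 13, 0, 1.44, NA),
  current_drink        = c(NA, TRUE, FALSE, NA, NA),
  stringsAsFactors = FALSE
)

# Patterns where the screening item was answered but no harmonized
# value was assigned; these are candidates for a closer look at the rule.
unmapped <- toy %>%
  filter(!is.na(SCQALCOHOL), is.na(current_drink)) %>%
  count(SCQALCOHOL, BEHALC.DRINKSPERWEEK, name = "n_rows")
unmapped
```

Rows where every input is missing are excluded on purpose, since NA is the expected harmonized value there.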
At this point the dto[["unitData"]] elements (raw data files for each study) have been augmented with the harmonized variable current_drink. We retrieve the harmonized variable to view frequency counts across studies:
dumlist <- list()
for(s in dto[["studyName"]]){
ds <- dto[["unitData"]][[s]]
dumlist[[s]] <- ds[,c("id","current_drink")]
}
ds <- plyr::ldply(dumlist,data.frame,.id = "study_name")
head(ds)
study_name id current_drink
1 alsa 41 TRUE
2 alsa 42 TRUE
3 alsa 61 TRUE
4 alsa 71 TRUE
5 alsa 91 TRUE
6 alsa 121 TRUE
ds$id <- 1:nrow(ds) # some ids values might be identical, replace
table( ds$current_drink, ds$study_name, useNA="always")
alsa lbsl satsa share tilda <NA>
FALSE 774 185 537 1855 1812 0
TRUE 1293 378 934 739 4048 0
<NA> 20 93 26 4 2644 0
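The counts in the table above can be turned into per-study proportions. The sketch below re-enters the printed counts by hand (so it does not depend on the dto) and computes the share of missing values and the share of current drinkers among valid responses. The object names are illustrative.

```r
# Counts copied from the frequency table above; one column per study.
counts <- matrix(
  c( 774, 1293,   20,
     185,  378,   93,
     537,  934,   26,
    1855,  739,    4,
    1812, 4048, 2644),
  nrow = 3,
  dimnames = list(
    current_drink = c("FALSE", "TRUE", "NA"),
    study = c("alsa", "lbsl", "satsa", "share", "tilda")
  )
)

# Share of respondents with a missing harmonized value, per study.
prop_missing <- counts["NA", ] / colSums(counts)

# Share of current drinkers among respondents with a valid value.
valid <- counts[c("FALSE", "TRUE"), ]
prop_drink <- valid["TRUE", ] / colSums(valid)

round(100 * prop_missing, 1)
round(100 * prop_drink, 1)
```

As a consistency check, the lbsl column sums to 656, the number of rows shown for that study's data frame earlier in this report.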
Finally, having added the newly created harmonized variables to the raw source objects, we save the data transfer object.
# Save as a compressed, binary R dataset. It's no longer readable with a text editor, but it saves metadata (e.g., factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")
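A quick way to confirm that an object survives this kind of save is to read it back and compare. The sketch below uses a toy list and a temporary file instead of the real dto; the names are illustrative.

```r
# Toy object with a factor that has an unused level, to show that
# factor information is preserved by the binary format.
toy_dto <- list(
  unitData = list(
    study_a = data.frame(
      x = factor(c("no", "yes", "no"), levels = c("no", "yes", "maybe"))
    )
  )
)

path <- tempfile(fileext = ".rds")
saveRDS(toy_dto, file = path, compress = "xz")
restored <- readRDS(path)

# TRUE when the restored object matches the original exactly.
round_trip_ok <- identical(restored, toy_dto)
round_trip_ok
levels(restored$unitData$study_a$x)
```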