This report lists, for each study, the candidate variables for the DataScheme variables of the construct alcohol.
This report is meant to be compiled after executing the script ./manipulation/0-ellis-island.R, which prepares the necessary data transfer object (DTO). We begin with a brief recap of this script and the DTO it produces.
All data land on Ellis Island.
The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:
./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath" "unitData" "metaData"
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav" "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"
[3] "./data/unshared/raw/SATSA-Q3.Final.sav" "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"
# 3rd element - list objects with the following elements
names(dto[["unitData"]])
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])
Source: local data frame [656 x 27]
id AGE94 SEX94 MSTAT94 EDUC94 NOWRK94 SMK94 SMOKE
(int) (int) (int) (fctr) (int) (fctr) (fctr) (fctr)
1 4001026 68 1 divorced 16 no, retired no never smoked
2 4012015 94 2 widowed 12 no, retired no never smoked
3 4012032 94 2 widowed 20 no, retired no don't smoke at present but smoked in the past
4 4022004 93 2 NA NA NA NA never smoked
5 4022026 93 2 widowed 12 no, retired no never smoked
6 4031031 92 1 married 8 no, retired no don't smoke at present but smoked in the past
7 4031035 92 1 widowed 13 no, retired no don't smoke at present but smoked in the past
8 4032201 92 2 NA NA NA NA don't smoke at present but smoked in the past
9 4041062 91 1 widowed 7 NA no don't smoke at present but smoked in the past
10 4042057 91 2 NA NA NA NA NA
.. ... ... ... ... ... ... ... ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
(int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl)
# 4th element - a data set with the names and labels of raw variables, plus added metadata, for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>%
DT::datatable(
class = 'cell-border stripe',
caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
filter = "top",
options = list(pageLength = 6, autoWidth = TRUE)
)
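The four DTO elements printed above are expected to line up: one study name, one file path and one unit data set per study, with the unit data sets named after the studies. A minimal sketch of a consistency check, assuming only the element and column names shown in this report (studyName, filePath, unitData, metaData, study_name); the function check_dto() is hypothetical and not part of 0-ellis-island.R.

```r
# Hypothetical helper (not part of 0-ellis-island.R): verify that the DTO
# has the four expected elements and that they describe the same studies.
check_dto <- function(dto) {
  expected <- c("studyName", "filePath", "unitData", "metaData")
  missing_el <- setdiff(expected, names(dto))
  if (length(missing_el) > 0) {
    stop("DTO is missing element(s): ", paste(missing_el, collapse = ", "))
  }
  # one file path per study
  stopifnot(length(dto[["filePath"]]) == length(dto[["studyName"]]))
  # unit data sets are named after the studies, in the same order
  stopifnot(identical(names(dto[["unitData"]]), dto[["studyName"]]))
  # every study mentioned in the metadata has a unit data set
  stopifnot(all(unique(dto[["metaData"]][["study_name"]]) %in% dto[["studyName"]]))
  invisible(TRUE)
}

# Usage on the real object:
# dto <- readRDS("./data/unshared/derived/dto.rds")
# check_dto(dto)
```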
dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="FR6ORMOR") %>% dplyr::select(name,label)
name label
1 FR6ORMOR Frequency six or more drinks
dto[["unitData"]][["alsa"]]%>% histogram_discrete("FR6ORMOR")
dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("FR6ORMOR") %>% dplyr::summarize(n=n())
Source: local data frame [6 x 2]
FR6ORMOR n
(fctr) (int)
1 Never 1064
2 Less than monthly 134
3 Monthly 39
4 Weekly 32
5 Daily or almost daily 23
6 NA 795
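Every item in this report is examined with the same two steps: look up its label in the metadata, then count its values, including NA. A sketch of a helper that wraps both steps with base R only; the name describe_item() is hypothetical, and it assumes the metaData columns study_name, name and label used above.

```r
# Hypothetical helper: return the label of one item (from metaData) and
# the counts of its values (from the study's unit data), NA included.
describe_item <- function(dto, study, item) {
  meta <- dto[["metaData"]]
  row <- meta[meta[["study_name"]] == study & meta[["name"]] == item, , drop = FALSE]
  label <- if (nrow(row) > 0) as.character(row[["label"]][1]) else NA_character_
  values <- dto[["unitData"]][[study]][[item]]
  counts <- table(values, useNA = "ifany")
  list(label = label, counts = counts)
}

# Usage on the real object:
# describe_item(dto, "alsa", "FR6ORMOR")
```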
dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="FREQALCH") %>% dplyr::select(name,label)
name label
1 FREQALCH Frequency alcohol
dto[["unitData"]][["alsa"]]%>% histogram_discrete("FREQALCH")
dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("FREQALCH") %>% dplyr::summarize(n=n())
Source: local data frame [6 x 2]
FREQALCH n
(fctr) (int)
1 Never 774
2 Monthly or less 368
3 Two to four times a month 167
4 Two to three times a week 214
5 Four or more times a week 544
6 NA 20
dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="NOSTDRNK") %>% dplyr::select(name,label)
name label
1 NOSTDRNK Number of standard drinks
dto[["unitData"]][["alsa"]]%>% histogram_discrete("NOSTDRNK")
dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("NOSTDRNK") %>% dplyr::summarize(n=n())
Source: local data frame [6 x 2]
NOSTDRNK n
(fctr) (int)
1 One or two 1033
2 Three or four 195
3 Five or six 46
4 Seven to nine 16
5 Ten or more 2
6 NA 795
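The response options of FREQALCH, NOSTDRNK and FR6ORMOR are those of the three consumption questions of the WHO AUDIT (the AUDIT-C), where each item is scored 0 to 4 in the order printed above. If a single consumption score is wanted for ALSA, a sketch could look like this. It assumes the factor levels are stored in the printed order, and that respondents answering Never to FREQALCH were not asked the other two items; the NA counts above (795 for both follow-up items, against 774 Never plus 20 NA for FREQALCH) are consistent with this but do not confirm it. The function score_auditc() is hypothetical.

```r
# Hypothetical AUDIT-C style score for the three ALSA items.
# Assumes each item is a factor whose levels are in the order printed in
# this report, so the level position minus one gives the 0-4 item score.
score_auditc <- function(freq, quantity, binge) {
  item_score <- function(x) as.integer(x) - 1L  # factor code 1..5 -> 0..4
  s_freq <- item_score(freq)
  s_qty  <- item_score(quantity)
  s_bing <- item_score(binge)
  # assumption: non-drinkers skipped the follow-up items, so score them 0
  never <- !is.na(s_freq) & s_freq == 0L
  s_qty[never & is.na(s_qty)] <- 0L
  s_bing[never & is.na(s_bing)] <- 0L
  s_freq + s_qty + s_bing  # 0-12; NA if any item is still missing
}

# Usage on the real data:
# d <- dto[["unitData"]][["alsa"]]
# d$auditc <- score_auditc(d$FREQALCH, d$NOSTDRNK, d$FR6ORMOR)
```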
dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="ALCOHOL") %>% dplyr::select(name,label)
name label
1 ALCOHOL Alcohol use
dto[["unitData"]][["lbsl"]]%>% histogram_discrete("ALCOHOL")
dto[["unitData"]][["lbsl"]]%>% dplyr::group_by_("ALCOHOL") %>% dplyr::summarize(n=n())
Source: local data frame [8 x 2]
ALCOHOL n
(fctr) (int)
1 never drank 92
2 not in last year 92
3 few times a year 143
4 once or twice per month 59
5 once a week 35
6 two or three times weekly 59
7 daily or almost daily 82
8 NA 94
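If the DataScheme variable turns out to be a simple indicator of current drinking (comparable in spirit to SATSA GALCOHOL and TILDA SCQALCOHOL), the seven LBSL levels could be collapsed as below. Placing the cut between "not in last year" and "few times a year" is an assumption made for illustration, and the function name is hypothetical.

```r
# Hypothetical recode of LBSL ALCOHOL into a logical "drank in the past year".
# Level strings are copied from the frequency table above; any value not
# listed (including NA) stays NA.
lbsl_drank_past_year <- function(x) {
  x <- as.character(x)
  out <- rep(NA, length(x))
  out[x %in% c("never drank", "not in last year")] <- FALSE
  out[x %in% c("few times a year", "once or twice per month", "once a week",
               "two or three times weekly", "daily or almost daily")] <- TRUE
  out
}
```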
# requires categorization
dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="BEER") %>% dplyr::select(name,label)
name label
1 BEER Number of cans/bottles of beer last week
dto[["unitData"]][["lbsl"]]%>% histogram_continuous("BEER")
dto[["unitData"]][["lbsl"]]%>% dplyr::group_by_("BEER") %>% dplyr::summarize(n=n())
Source: local data frame [18 x 2]
BEER n
(int) (int)
1 0 242
2 1 31
3 2 22
4 3 8
5 4 6
6 5 4
7 6 5
8 7 5
9 8 2
10 9 2
11 10 7
12 12 2
13 14 1
14 15 1
15 18 1
16 25 1
17 30 1
18 NA 315
# requires categorization
dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="HARDLIQ") %>% dplyr::select(name,label)
name label
1 HARDLIQ Number of drinks containing hard liquor last week
dto[["unitData"]][["lbsl"]]%>% histogram_continuous("HARDLIQ")
dto[["unitData"]][["lbsl"]]%>% dplyr::group_by_("HARDLIQ") %>% dplyr::summarize(n=n())
Source: local data frame [17 x 2]
HARDLIQ n
(int) (int)
1 0 231
2 1 23
3 2 34
4 3 8
5 4 7
6 5 7
7 6 10
8 7 13
9 8 2
10 9 1
11 10 1
12 12 1
13 14 9
14 15 1
15 21 1
16 25 1
17 NA 306
# requires categorization
dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="WINE") %>% dplyr::select(name,label)
name label
1 WINE Number of glasses of wine last week
dto[["unitData"]][["lbsl"]]%>% histogram_continuous("WINE")
dto[["unitData"]][["lbsl"]]%>% dplyr::group_by_("WINE") %>% dplyr::summarize(n=n())
Source: local data frame [16 x 2]
WINE n
(int) (int)
1 0 189
2 1 45
3 2 28
4 3 17
5 4 15
6 5 6
7 6 10
8 7 10
9 8 3
10 9 2
11 10 7
12 12 3
13 14 4
14 15 4
15 21 1
16 NA 312
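BEER, HARDLIQ and WINE are all counts of drinks in the last week, so one way to categorize them is to add them into a weekly total first and then band the total. A sketch under stated assumptions: a missing count is treated as zero when at least one of the three counts is present (respondents may have skipped beverages they do not drink), the total is NA only when all three are missing, and the cut points and function names are hypothetical.

```r
# Hypothetical: total drinks last week from the three LBSL beverage counts.
# Assumption: a missing count counts as 0 if any other count is present.
lbsl_weekly_drinks <- function(beer, wine, hardliq) {
  counts <- cbind(beer, wine, hardliq)
  total <- rowSums(counts, na.rm = TRUE)
  total[rowSums(!is.na(counts)) == 0] <- NA  # all three missing -> NA
  total
}

# Hypothetical bands; intervals are (-Inf,0], (0,7], (7,14], (14,Inf]
categorize_weekly <- function(total, breaks = c(-Inf, 0, 7, 14, Inf),
                              labels = c("none", "1-7", "8-14", "15+")) {
  cut(total, breaks = breaks, labels = labels, right = TRUE)
}

# Usage on the real data:
# d <- dto[["unitData"]][["lbsl"]]
# d$drinks_week <- lbsl_weekly_drinks(d$BEER, d$WINE, d$HARDLIQ)
# table(categorize_weekly(d$drinks_week), useNA = "ifany")
```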
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GALCOHOL") %>% dplyr::select(name,label)
name label
1 GALCOHOL Do you ever drink alcoholic beverages?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GALCOHOL")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GALCOHOL") %>% dplyr::summarize(n=n())
Source: local data frame [3 x 2]
GALCOHOL n
(fctr) (int)
1 No 529
2 Yes 934
3 NA 34
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GBEERX") %>% dplyr::select(name,label)
name label
1 GBEERX How much beer do you usually drink at a time?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GBEERX")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GBEERX") %>% dplyr::summarize(n=n())
Source: local data frame [8 x 2]
GBEERX n
(fctr) (int)
1 1 glass or less 351
2 1 bottle (33 cl) 358
3 2 bottles 56
4 3 bottles (two 45 cl cans) 28
5 4 bottles 5
6 5 bottles 6
7 6 bottles or more 9
8 NA 684
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GBOTVIN") %>% dplyr::select(name,label)
name label
1 GBOTVIN ..more than 1 bottle, i.e.____bottles (state number of bottles): GBOTVIN
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GBOTVIN")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GBOTVIN") %>% dplyr::summarize(n=n())
Source: local data frame [5 x 2]
GBOTVIN n
(int) (int)
1 0 1
2 1 1
3 2 3
4 4 1
5 NA 1491
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GDRLOTS") %>% dplyr::select(name,label)
name
1 GDRLOTS
label
1 How often do you consume more than five bottles of beer or more than one bottle of wine or more than 1/2 bottle liquot at one occasion?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GDRLOTS")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GDRLOTS") %>% dplyr::summarize(n=n())
Source: local data frame [9 x 2]
GDRLOTS n
(fctr) (int)
1 Never 1002
2 1-3 times a year 110
3 4-6 times a year 41
4 approx. once a month 33
5 a few times a month 12
6 approx. once a week 8
7 a few times a week 6
8 almost daily 1
9 NA 284
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GEVRALK") %>% dplyr::select(name,label)
name label
1 GEVRALK Do you ever drink alcoholic drinks? - Yes
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GEVRALK")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GEVRALK") %>% dplyr::summarize(n=n())
Source: local data frame [4 x 2]
GEVRALK n
(fctr) (int)
1 Yes 961
2 No, I have never drunk alcoholic drinks 376
3 No, I quit. 78
4 NA 82
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GFREQBER") %>% dplyr::select(name,label)
name label
1 GFREQBER How often do you drink beer (not light beer)?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GFREQBER")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GFREQBER") %>% dplyr::summarize(n=n())
Source: local data frame [10 x 2]
GFREQBER n
(fctr) (int)
1 Never 300
2 Once a year or less 77
3 2-6 times a year 212
4 Once a month 122
5 2 times a month 117
6 Once a week 127
7 2 times a week 100
8 every other day 34
9 every day 29
10 NA 379
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GFREQLIQ") %>% dplyr::select(name,label)
name
1 GFREQLIQ
label
1 How often do you usually drink hard liquor? (e.g. aquavit, whiskey, gin, brandy, punsch. Also liquot in cocktails and long drinks)
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GFREQLIQ")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GFREQLIQ") %>% dplyr::summarize(n=n())
Source: local data frame [10 x 2]
GFREQLIQ n
(fctr) (int)
1 Never 278
2 Once a year or less 100
3 2-6 times a year 349
4 Once a month 138
5 2 times a month 131
6 Once a week 103
7 2 times a week 60
8 every other day 10
9 every day 5
10 NA 323
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GFREQVIN") %>% dplyr::select(name,label)
name label
1 GFREQVIN How often do you usually drink wine (red or white)?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GFREQVIN")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GFREQVIN") %>% dplyr::summarize(n=n())
Source: local data frame [10 x 2]
GFREQVIN n
(fctr) (int)
1 Never 261
2 Once a year or less 111
3 2-6 times a year 304
4 Once a month 123
5 2 times a month 126
6 Once a week 112
7 2 times a week 72
8 every other day 8
9 every day 4
10 NA 376
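GFREQBER, GFREQLIQ and GFREQVIN share the same nine-level scale. To compare them with the numeric frequency measures of other studies, the levels could be mapped to approximate occasions per year. The numeric values below are rough assumptions (for example the midpoint 4 for "2-6 times a year" and 182.5 for "every other day"), and the object names are hypothetical. The per-beverage values should not simply be summed into an overall frequency, since one occasion may involve more than one beverage.

```r
# Hypothetical mapping of the SATSA beverage frequency scale to approximate
# occasions per year; the values are illustrative assumptions.
satsa_freq_per_year <- c(
  "Never"               = 0,
  "Once a year or less" = 1,
  "2-6 times a year"    = 4,
  "Once a month"        = 12,
  "2 times a month"     = 24,
  "Once a week"         = 52,
  "2 times a week"      = 104,
  "every other day"     = 182.5,
  "every day"           = 365
)

to_per_year <- function(x) {
  # lookup by level label; unmatched labels and NA become NA
  unname(satsa_freq_per_year[as.character(x)])
}

# Usage on the real data:
# d <- dto[["unitData"]][["satsa"]]
# summary(to_per_year(d$GFREQVIN))
```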
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GLIQX") %>% dplyr::select(name,label)
name label
1 GLIQX How much hard liquot do you usually drink at time?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GLIQX")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GLIQX") %>% dplyr::summarize(n=n())
Source: local data frame [9 x 2]
GLIQX n
(fctr) (int)
1 4 cl (approx. a small shot or equivalent) 328
2 6 cl (a big shot or equivalent) 190
3 8 cl 135
4 12 cl 106
5 18 cl 65
6 37 cl (half a bottle) 57
7 60 cl 3
8 75 cl (1 whole bottle) 7
9 NA 606
# categorization may or may not be required
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GSTOPALK") %>% dplyr::select(name,label)
name label
1 GSTOPALK Do you ever drink alcoholic drinks? -No I quit. When? 19__
dto[["unitData"]][["satsa"]]%>% histogram_continuous("GSTOPALK")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GSTOPALK") %>% dplyr::summarize(n=n())
Source: local data frame [33 x 2]
GSTOPALK n
(int) (int)
1 34 1
2 40 2
3 41 1
4 44 1
5 46 1
6 47 1
7 50 3
8 58 2
9 59 1
10 60 2
.. ... ...
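The label of GSTOPALK asks for the year of quitting in the form 19__, so the stored two-digit values (34, 40, 41, ...) read as years in the 1900s. A sketch of the conversion; the function name is hypothetical, and deriving years since quitting would additionally need the SATSA interview year, which this report does not show.

```r
# Hypothetical: turn the two-digit GSTOPALK answer ("19__") into a
# four-digit calendar year. Values outside 0-99 are treated as invalid (NA).
satsa_quit_year <- function(x) {
  x <- as.integer(x)
  ifelse(!is.na(x) & x >= 0L & x <= 99L, 1900L + x, NA_integer_)
}

# Usage on the real data:
# d <- dto[["unitData"]][["satsa"]]
# table(satsa_quit_year(d$GSTOPALK), useNA = "ifany")
```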
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GVINX") %>% dplyr::select(name,label)
name label
1 GVINX How much wine do you usually drink at a time?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GVINX")
dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GVINX") %>% dplyr::summarize(n=n())
Source: local data frame [7 x 2]
GVINX n
(fctr) (int)
1 10 cl (1 wine glass) 330
2 20 cl 308
3 37 cl (half a bottle) 184
4 60 cl 14
5 75 cl (one whole bottle) 18
6 More than 1 whole bottle 2
7 NA 641
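The amount items GVINX and GLIQX encode a volume in centilitres at the start of most level labels (GBEERX uses bottles instead, so it is not covered here). A sketch that extracts that leading number so the levels can be treated as quantities; labels without a leading "<number> cl", such as "More than 1 whole bottle", return NA. The function name is hypothetical.

```r
# Hypothetical: extract the leading volume in centilitres from SATSA amount
# labels such as "37 cl (half a bottle)". Non-matching labels and NA -> NA.
label_to_cl <- function(x) {
  x <- as.character(x)
  has_cl <- grepl("^[0-9]+ cl", x)
  out <- rep(NA_real_, length(x))
  out[has_cl] <- as.numeric(sub("^([0-9]+) cl.*$", "\\1", x[has_cl]))
  out
}

# Usage on the real data:
# d <- dto[["unitData"]][["satsa"]]
# table(label_to_cl(d$GVINX), useNA = "ifany")
```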
# requires categorization
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="BEHALC.DRINKSPERDAY") %>% dplyr::select(name,label)
name label
1 BEHALC.DRINKSPERDAY BEHalc_drinksperday Standard drinks per day
dto[["unitData"]][["tilda"]]%>% histogram_continuous("BEHALC.DRINKSPERDAY")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("BEHALC.DRINKSPERDAY") %>% dplyr::summarize(n=n())
Source: local data frame [36 x 2]
BEHALC.DRINKSPERDAY n
(dbl) (int)
1 0.0 1835
2 0.5 8
3 0.7 1
4 1.0 631
5 1.5 139
6 2.0 1212
7 2.5 199
8 3.0 823
9 3.5 152
10 4.0 731
.. ... ...
# requires categorization
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="BEHALC.DRINKSPERWEEK") %>% dplyr::select(name,label)
name label
1 BEHALC.DRINKSPERWEEK BEHalc_drinksperweek Standard drinks a week
dto[["unitData"]][["tilda"]]%>% histogram_continuous("BEHALC.DRINKSPERWEEK")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("BEHALC.DRINKSPERWEEK") %>% dplyr::summarize(n=n())
Source: local data frame [121 x 2]
BEHALC.DRINKSPERWEEK n
(dbl) (int)
1 0.000 1893
2 0.060 3
3 0.084 1
4 0.120 181
5 0.180 19
6 0.240 171
7 0.300 11
8 0.350 152
9 0.360 76
10 0.420 6
.. ... ...
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="BEHALC.FREQ.WEEK") %>% dplyr::select(name,label)
name label
1 BEHALC.FREQ.WEEK BEHalc_freq_week Average times drinking per week
dto[["unitData"]][["tilda"]]%>% histogram_discrete("BEHALC.FREQ.WEEK")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("BEHALC.FREQ.WEEK") %>% dplyr::summarize(n=n())
Source: local data frame [8 x 2]
BEHALC.FREQ.WEEK n
(dbl) (int)
1 0.00 1935
2 0.12 667
3 0.35 865
4 1.50 2043
5 3.50 841
6 5.50 269
7 6.50 445
8 NA 1439
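The smallest non-zero values of BEHALC.DRINKSPERWEEK printed above are consistent with the product of BEHALC.FREQ.WEEK and BEHALC.DRINKSPERDAY (for example 0.12 × 1 = 0.12, 0.12 × 2 = 0.24, 0.12 × 3 = 0.36 and 0.35 × 1 = 0.35). Whether this holds for every respondent is an assumption worth checking before deciding which TILDA variable to carry forward. A sketch of such a check; the function name is hypothetical, and the tolerance allows for values stored with three decimals.

```r
# Hypothetical check: does drinks per week equal frequency per week times
# drinks per day? Returns the share of complete rows where it holds.
share_product_holds <- function(freq_week, drinks_day, drinks_week, tol = 1e-3) {
  ok <- !is.na(freq_week) & !is.na(drinks_day) & !is.na(drinks_week)
  if (!any(ok)) return(NA_real_)
  diff <- abs(drinks_week[ok] - freq_week[ok] * drinks_day[ok])
  mean(diff <= tol)
}

# Usage on the real data:
# d <- dto[["unitData"]][["tilda"]]
# share_product_holds(d$BEHALC.FREQ.WEEK, d$BEHALC.DRINKSPERDAY, d$BEHALC.DRINKSPERWEEK)
```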
# requires labelling factor levels
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="SCQALCOFREQ") %>% dplyr::select(name,label)
name label
1 SCQALCOFREQ SCQalcofreq frequency of drinking alcohol
dto[["unitData"]][["tilda"]]%>%
dplyr::filter(!SCQALCOFREQ %in% c(-867,-856,-845,-823,-812)) %>% histogram_discrete("SCQALCOFREQ")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("SCQALCOFREQ") %>% dplyr::summarize(n=n())
Source: local data frame [13 x 2]
SCQALCOFREQ n
(int) (int)
1 -856 2
2 -845 1
3 -823 2
4 -812 5
5 -1 1806
6 1 440
7 2 267
8 3 841
9 4 2042
10 5 863
11 6 667
12 7 129
13 NA 1439
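Before labelling the SCQALCOFREQ codes, note that several of its counts match those of BEHALC.FREQ.WEEK (667 for code 6 and for 0.12, 841 for code 3 and for 3.50, 2042 for code 4 against 2043 for 1.50). This suggests, without proving, that BEHALC.FREQ.WEEK is a recode of SCQALCOFREQ, in which case the recoded values would help label the codes. A sketch that lists, for each code, the distinct recoded values observed with it; the helper name is hypothetical.

```r
# Hypothetical helper: for each value of `code`, list the distinct values of
# `recoded` observed with it. A one-element entry means a one-to-one mapping.
code_mapping <- function(code, recoded) {
  keep <- !is.na(code) & !is.na(recoded)
  split_vals <- split(recoded[keep], code[keep])
  lapply(split_vals, function(v) sort(unique(v)))
}

# Usage on the real data (full cross-tabulation, including NA):
# d <- dto[["unitData"]][["tilda"]]
# table(d$SCQALCOFREQ, d$BEHALC.FREQ.WEEK, useNA = "ifany")
# code_mapping(d$SCQALCOFREQ, d$BEHALC.FREQ.WEEK)
```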
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="SCQALCOHOL") %>% dplyr::select(name,label)
name label
1 SCQALCOHOL SCQalcohol drink alcohol
dto[["unitData"]][["tilda"]]%>% histogram_discrete("SCQALCOHOL")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("SCQALCOHOL") %>% dplyr::summarize(n=n())
Source: local data frame [3 x 2]
SCQALCOHOL n
(fctr) (int)
1 yes 5349
2 no 1812
3 NA 1343
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="SCQALCONO1") %>% dplyr::select(name,label)
name label
1 SCQALCONO1 SCQalcono1 more than two drinks in a single day
dto[["unitData"]][["tilda"]]%>%
dplyr::filter(!SCQALCONO1 %in% c(-867,-856,-845,-823,-812)) %>% histogram_discrete("SCQALCONO1")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("SCQALCONO1") %>% dplyr::summarize(n=n())
Source: local data frame [12 x 2]
SCQALCONO1 n
(int) (int)
1 -867 1
2 -856 2
3 -812 1
4 -1 1806
5 1 272
6 2 207
7 3 664
8 4 1810
9 5 905
10 6 857
11 7 552
12 NA 1427
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="SCQALCONO2") %>% dplyr::select(name,label)
name label
1 SCQALCONO2 SCQalcono2 How many drinks consumed on days drink taken
dto[["unitData"]][["tilda"]]%>%
dplyr::filter(!SCQALCONO2 %in% c(-99, -1 )) %>% histogram_continuous("SCQALCONO2")
dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("SCQALCONO2") %>% dplyr::summarize(n=n())
Source: local data frame [21 x 2]
SCQALCONO2 n
(dbl) (int)
1 -99.0 405
2 -1.0 1803
3 0.0 41
4 1.0 631
5 1.5 139
6 2.0 1212
7 2.5 199
8 3.0 823
9 3.5 152
10 4.0 731
.. ... ...
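The TILDA items store non-substantive answers as negative codes, which the plotting calls above drop with dplyr::filter(). The codes differ by item (-867, -856, -845, -823 and -812 for SCQALCOFREQ and SCQALCONO1; -99 and -1 for SCQALCONO2), and their meanings are not documented in this report. The count of -1 in SCQALCOFREQ (1806) is close to the number of "no" answers to SCQALCOHOL (1812), which hints that -1 marks non-drinkers, so whether it should become NA or 0 is a decision left open here. A sketch that converts a chosen set of codes to NA; the helper name is hypothetical.

```r
# Hypothetical helper: replace item-specific special codes with NA.
# `codes` must be supplied per item; nothing is assumed about their meaning.
codes_to_na <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}

# Usage with the codes filtered out for SCQALCONO2 above:
# d <- dto[["unitData"]][["tilda"]]
# d$SCQALCONO2_clean <- codes_to_na(d$SCQALCONO2, codes = c(-99, -1))
```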
sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.1.0 knitr_1.12.3 magrittr_1.5
loaded via a namespace (and not attached):
[1] splines_3.2.5 lattice_0.20-33 colorspace_1.2-6 htmltools_0.3.5 mgcv_1.8-12
[6] yaml_2.1.13 chron_2.3-47 survival_2.38-3 nloptr_1.0.4 foreign_0.8-66
[11] DBI_0.4-1 RColorBrewer_1.1-2 plyr_1.8.3 stringr_1.0.0 MatrixModels_0.4-1
[16] munsell_0.4.3 gtable_0.2.0 htmlwidgets_0.6 evaluate_0.9 labeling_0.3
[21] latticeExtra_0.6-28 SparseM_1.7 extrafont_0.17 quantreg_5.21 pbkrtest_0.4-6
[26] parallel_3.2.5 markdown_0.7.7 highr_0.5.1 Rttf2pt1_1.3.3 Rcpp_0.12.5
[31] acepack_1.3-3.3 scales_0.4.0 DT_0.1.40 formatR_1.3 Hmisc_3.17-4
[36] jsonlite_0.9.20 lme4_1.1-12 gridExtra_2.2.1 testit_0.5 digest_0.6.9
[41] stringi_1.0-1 dplyr_0.4.3 grid_3.2.5 tools_3.2.5 lazyeval_0.1.10
[46] dichromat_2.0-0 Formula_1.2-1 cluster_2.0.3 tidyr_0.4.1 extrafontdb_1.0
[51] car_2.1-2 MASS_7.3-45 Matrix_1.2-4 rsconnect_0.4.2.1 data.table_1.9.6
[56] assertthat_0.1 minqa_1.2.4 rmarkdown_0.9.6 R6_2.1.2 rpart_4.1-10
[61] nnet_7.3-12 nlme_3.1-126