This report lists the candidate variables for the DataScheme variables of the construct health.
This report is meant to be compiled after executing the script ./manipulation/0-ellis-island.R, which prepares the necessary data transfer object (DTO). We begin with a brief recap of this script and the DTO it produces.
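Because the report depends on a file created by another script, it can help to fail early with a clear message when that file is missing. The sketch below is only an illustration: `load_dto()` is a hypothetical helper, not part of the project's scripts, and its default path is the one used later in this report.

```r
# Hypothetical helper: load the DTO, failing early with a pointer to the
# script that creates it.
load_dto <- function(path = "./data/unshared/derived/dto.rds") {
  if (!file.exists(path)) {
    stop(
      "DTO not found at '", path, "'. ",
      "Run ./manipulation/0-ellis-island.R first.",
      call. = FALSE
    )
  }
  readRDS(path)
}
```

With the helper defined, `dto <- load_dto()` would behave like the plain `readRDS()` call whenever the file exists.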
All data land on Ellis Island.
The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:
./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath" "unitData" "metaData"
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav" "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"
[3] "./data/unshared/raw/SATSA-Q3.Final.sav" "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"
# 3rd element - list objects with the following elements
names(dto[["unitData"]])
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])
Source: local data frame [656 x 27]
id AGE94 SEX94 MSTAT94 EDUC94 NOWRK94 SMK94 SMOKE
(int) (int) (int) (fctr) (int) (fctr) (fctr) (fctr)
1 4001026 68 1 divorced 16 no, retired no never smoked
2 4012015 94 2 widowed 12 no, retired no never smoked
3 4012032 94 2 widowed 20 no, retired no don't smoke at present but smoked in the past
4 4022004 93 2 NA NA NA NA never smoked
5 4022026 93 2 widowed 12 no, retired no never smoked
6 4031031 92 1 married 8 no, retired no don't smoke at present but smoked in the past
7 4031035 92 1 widowed 13 no, retired no don't smoke at present but smoked in the past
8 4032201 92 2 NA NA NA NA don't smoke at present but smoked in the past
9 4041062 91 1 widowed 7 NA no don't smoke at present but smoked in the past
10 4042057 91 2 NA NA NA NA NA
.. ... ... ... ... ... ... ... ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
(int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl)
# 4th element - a dataset with names and labels of raw variables + added metadata for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>%
DT::datatable(
class = 'cell-border stripe',
caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
filter = "top",
options = list(pageLength = 6, autoWidth = TRUE)
)
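The four-element structure described above can be mimicked with a toy object, which may be handy for testing downstream code without access to the unshared data. The sketch below is an illustration only: the study and variable names are borrowed from this report, but the file paths and data values are invented placeholders, and `check_dto()` is a hypothetical helper.

```r
# Toy DTO with the same four top-level elements as the real object.
# All values below are invented placeholders, not project data.
toy_dto <- list(
  studyName = c("alsa", "lbsl"),
  filePath  = c("./toy/alsa.sav", "./toy/lbsl.sav"),
  unitData  = list(
    alsa = data.frame(id = 1:3, HLTHLIFE = factor(c("Good", "Fair", NA))),
    lbsl = data.frame(id = 1:2, SRHEALTH = factor(c("good", "poor")))
  ),
  metaData  = data.frame(
    study_name = c("alsa", "lbsl"),
    name       = c("HLTHLIFE", "SRHEALTH"),
    label      = c("Self-rated health", "Self-reported health compared to age peers"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: verify that an object has the expected DTO layout.
check_dto <- function(dto) {
  expected <- c("studyName", "filePath", "unitData", "metaData")
  stopifnot(
    is.list(dto),
    identical(names(dto), expected),
    length(dto[["studyName"]]) == length(dto[["filePath"]]),
    identical(names(dto[["unitData"]]), dto[["studyName"]])
  )
  invisible(TRUE)
}

check_dto(toy_dto)
```

The check stops with an error as soon as one of the layout conditions fails, so it could also be run against the real `dto` right after loading it.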
dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="BTSM12MN") %>% dplyr::select(name,label)
name label
1 BTSM12MN Health comp with 12mths ago
dto[["unitData"]][["alsa"]] %>% histogram_discrete("BTSM12MN")
dto[["unitData"]][["alsa"]] %>% dplyr::group_by_("BTSM12MN") %>% dplyr::summarize(n=n())
Source: local data frame [5 x 2]
BTSM12MN n
(fctr) (int)
1 Better now 285
2 About the same 1173
3 Not as good now 621
4 Don t Know 2
5 NA 6
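The counts above come from `dplyr::group_by_()`, which matches the dplyr 0.4.3 listed in the session information at the end of this report but was deprecated in later dplyr releases. As a version-independent alternative, a similar frequency table can be sketched in base R. The function name `count_levels()` and the toy data are assumptions made for illustration.

```r
# Hypothetical helper: frequency table of one variable, keeping NA as a row.
count_levels <- function(d, varname) {
  counts <- table(d[[varname]], useNA = "ifany")
  out <- data.frame(
    level = names(counts),
    n     = as.integer(counts),
    stringsAsFactors = FALSE
  )
  names(out)[1] <- varname
  out
}

# Toy data (invented values), standing in for dto[["unitData"]][["alsa"]]
toy <- data.frame(
  BTSM12MN = factor(
    c("Better now", "About the same", "About the same", NA),
    levels = c("Better now", "About the same", "Not as good now")
  )
)

count_levels(toy, "BTSM12MN")
```

Note that `table()` also lists factor levels that have no observations, so the result can contain rows with a count of zero.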
dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="HLTHBTSM") %>% dplyr::select(name,label)
name label
1 HLTHBTSM Health compared to others
dto[["unitData"]][["alsa"]] %>% histogram_discrete("HLTHBTSM")
dto[["unitData"]][["alsa"]] %>% dplyr::group_by_("HLTHBTSM") %>% dplyr::summarize(n=n())
Source: local data frame [5 x 2]
HLTHBTSM n
(fctr) (int)
1 Better 1224
2 Same 646
3 Worse 147
4 Don t Know 51
5 NA 19
dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="HLTHLIFE") %>% dplyr::select(name,label)
name label
1 HLTHLIFE Self-rated health
dto[["unitData"]][["alsa"]] %>% histogram_discrete("HLTHLIFE")
dto[["unitData"]][["alsa"]] %>% dplyr::group_by_("HLTHLIFE") %>% dplyr::summarize(n=n())
Source: local data frame [7 x 2]
HLTHLIFE n
(fctr) (int)
1 Excellent 191
2 Very Good 599
3 Good 633
4 Fair 477
5 Poor 181
6 Don t Know 1
7 NA 5
dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="SRHEALTH") %>% dplyr::select(name,label)
name label
1 SRHEALTH Self-reported health compared to age peers
dto[["unitData"]][["lbsl"]] %>% histogram_discrete("SRHEALTH")
dto[["unitData"]][["lbsl"]] %>% dplyr::group_by_("SRHEALTH") %>% dplyr::summarize(n=n())
Source: local data frame [7 x 2]
SRHEALTH n
(fctr) (int)
1 very good 163
2 good 173
3 moderately good 177
4 moderately poor 41
5 poor 7
6 very poor 3
7 NA 92
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GGENHLTH") %>% dplyr::select(name,label)
name label
1 GGENHLTH How do you judge your general state of health?
dto[["unitData"]][["satsa"]] %>% histogram_discrete("GGENHLTH")
dto[["unitData"]][["satsa"]] %>% dplyr::group_by_("GGENHLTH") %>% dplyr::summarize(n=n())
Source: local data frame [4 x 2]
GGENHLTH n
(fctr) (int)
1 good 853
2 reasonable 587
3 bad 42
4 NA 15
dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GHLTHOTH") %>% dplyr::select(name,label)
name label
1 GHLTHOTH How do you judge your general state of health compared to other people your age?
dto[["unitData"]][["satsa"]] %>% histogram_discrete("GHLTHOTH")
dto[["unitData"]][["satsa"]] %>% dplyr::group_by_("GHLTHOTH") %>% dplyr::summarize(n=n())
Source: local data frame [5 x 2]
GHLTHOTH n
(fctr) (int)
1 better 362
2 about the same 1019
3 worse 91
4 UNDOCUMENTED CODE 1
5 NA 24
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="PH001") %>% dplyr::select(name,label)
name label
1 PH001 ph001 Now I would like to ask you some questions about your health. Would you say ?
dto[["unitData"]][["tilda"]] %>% histogram_discrete("PH001")
dto[["unitData"]][["tilda"]] %>% dplyr::group_by_("PH001") %>% dplyr::summarize(n=n())
Source: local data frame [6 x 2]
PH001 n
(fctr) (int)
1 Excellent 1360
2 Very good 2448
3 Good 2758
4 Fair 1517
5 Poor 420
6 NA 1
dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="PH009") %>% dplyr::select(name,label)
name label
1 PH009 ph009 In general, compared to other people your age, would you say your health is
dto[["unitData"]][["tilda"]] %>% histogram_discrete("PH009")
dto[["unitData"]][["tilda"]] %>% dplyr::group_by_("PH009") %>% dplyr::summarize(n=n())
Source: local data frame [6 x 2]
PH009 n
(fctr) (int)
1 Excellent 1799
2 Very good 2871
3 Good 2525
4 Fair 1021
5 Poor 273
6 NA 15
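The per-variable lookups in this report follow one pattern: filter the metadata to a study and variable name, then tabulate that variable in the study's raw data. A minimal sketch of the pattern as a loop is given below, assuming a DTO-shaped list. The helper `describe_candidate()` and the toy object are hypothetical, and the toy values and the "toy label" text are invented placeholders.

```r
# Hypothetical helper: return the label and level counts of one candidate.
describe_candidate <- function(dto, study, varname) {
  md <- dto[["metaData"]]
  label <- md$label[md$study_name == study & md$name == varname]
  counts <- table(dto[["unitData"]][[study]][[varname]], useNA = "ifany")
  list(study = study, name = varname, label = label, counts = counts)
}

# Toy DTO-shaped object (invented values, not project data)
toy_dto <- list(
  unitData = list(
    alsa  = data.frame(HLTHLIFE = factor(c("Good", "Fair", "Good"))),
    tilda = data.frame(PH001 = factor(c("Excellent", NA)))
  ),
  metaData = data.frame(
    study_name = c("alsa", "tilda"),
    name       = c("HLTHLIFE", "PH001"),
    label      = c("Self-rated health", "toy label"),
    stringsAsFactors = FALSE
  )
)

# Each candidate is a (study, variable) pair
candidates <- list(c("alsa", "HLTHLIFE"), c("tilda", "PH001"))
results <- lapply(candidates, function(p) describe_candidate(toy_dto, p[1], p[2]))
results[[1]]$counts
```

Keeping the candidates in one list makes it easier to add or drop a variable without repeating the lookup code for every study.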
sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.1.0 knitr_1.12.3 magrittr_1.5
loaded via a namespace (and not attached):
[1] splines_3.2.5 lattice_0.20-33 colorspace_1.2-6 htmltools_0.3.5 mgcv_1.8-12
[6] yaml_2.1.13 chron_2.3-47 survival_2.38-3 nloptr_1.0.4 foreign_0.8-66
[11] DBI_0.4-1 RColorBrewer_1.1-2 plyr_1.8.3 stringr_1.0.0 MatrixModels_0.4-1
[16] munsell_0.4.3 gtable_0.2.0 htmlwidgets_0.6 evaluate_0.9 labeling_0.3
[21] latticeExtra_0.6-28 SparseM_1.7 extrafont_0.17 quantreg_5.21 pbkrtest_0.4-6
[26] parallel_3.2.5 markdown_0.7.7 highr_0.5.1 Rttf2pt1_1.3.3 Rcpp_0.12.5
[31] acepack_1.3-3.3 scales_0.4.0 DT_0.1.40 formatR_1.3 Hmisc_3.17-4
[36] jsonlite_0.9.20 lme4_1.1-12 gridExtra_2.2.1 testit_0.5 digest_0.6.9
[41] stringi_1.0-1 dplyr_0.4.3 grid_3.2.5 tools_3.2.5 lazyeval_0.1.10
[46] dichromat_2.0-0 Formula_1.2-1 cluster_2.0.3 tidyr_0.4.1 extrafontdb_1.0
[51] car_2.1-2 MASS_7.3-45 Matrix_1.2-4 rsconnect_0.4.2.1 data.table_1.9.6
[56] assertthat_0.1 minqa_1.2.4 rmarkdown_0.9.6 R6_2.1.2 rpart_4.1-10
[61] nnet_7.3-12 nlme_3.1-126