This report lists the candidate variables for the DataScheme variables of the construct health.

Exposition

This report is meant to be compiled after running the script ./manipulation/0-ellis-island.R, which prepares the necessary data transfer object (DTO). We begin with a brief recap of this script and the DTO it produces.

Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

    1. Reads in raw data files from the candidate studies
    1. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
    1. Augments raw metadata with instructions for renaming and classifying variables. The instructions are provided as manually entered values in ./data/shared/meta-data-map.csv. They are used by automated scripts in later harmonization and analysis.
    1. Combines unit-level data and metadata into a single DTO that serves as the starting point for all subsequent analyses.
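
The four steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the contents of 0-ellis-island.R: the toy data frames, the file names, the helper `extract_meta()`, and the values in `meta_map` are hypothetical stand-ins, whereas the real script reads the raw study files and the manually edited CSV map from disk.

```r
# Step 1 (stand-in): toy "raw" data sets, one per study (hypothetical values)
unit_data <- list(
  alsa = data.frame(HLTHLIFE = c("Good", "Fair"), stringsAsFactors = TRUE),
  lbsl = data.frame(SRHEALTH = c("good", "poor"), stringsAsFactors = TRUE)
)
file_path <- c("alsa-toy.sav", "lbsl-toy.sav")  # hypothetical file names

# Step 2: extract variable names from every study into one metadata table
extract_meta <- function(study, d) {  # hypothetical helper
  data.frame(study_name = study, name = names(d), stringsAsFactors = FALSE)
}
meta_live <- do.call(rbind, Map(extract_meta, names(unit_data), unit_data))
rownames(meta_live) <- NULL

# Step 3: augment with manually entered classifications
# (stand-in for the contents of meta-data-map.csv)
meta_map <- data.frame(
  study_name = c("alsa", "lbsl"),
  name       = c("HLTHLIFE", "SRHEALTH"),
  construct  = c("health", "health"),
  stringsAsFactors = FALSE
)
meta_data <- merge(meta_live, meta_map, by = c("study_name", "name"), all.x = TRUE)

# Step 4: bundle unit-level data and metadata into a single list
dto_sketch <- list(
  studyName = names(unit_data),
  filePath  = file_path,
  unitData  = unit_data,
  metaData  = meta_data
)
names(dto_sketch)
```

The sketch yields a list with the same four top-level elements as the real DTO, which is loaded and inspected next.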
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath"  "unitData"  "metaData" 
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"       
# 3rd element - a list with the following elements
names(dto[["unitData"]])
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]]) 
Source: local data frame [656 x 27]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl)

Meta

# 4th element - a dataset of names and labels of raw variables + added metadata for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>% 
  DT::datatable(
    class   = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
    filter  = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )
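
Because the metadata carries a construct column, the candidate variables reviewed in the sections that follow could in principle be listed with a single filter. The sketch below uses a small hypothetical stand-in for dto[["metaData"]]. The column names are taken from the select() call above, but the coding of construct (the values "health" and "smoking") is an assumption made for illustration.

```r
# Hypothetical stand-in for dto[["metaData"]]; construct values are assumed
meta_sketch <- data.frame(
  study_name = c("alsa", "lbsl", "lbsl"),
  name       = c("HLTHLIFE", "SRHEALTH", "SMK94"),
  construct  = c("health", "health", "smoking"),
  stringsAsFactors = FALSE
)

# Keep only the variables classified under the construct of interest
health_candidates <- subset(
  meta_sketch,
  construct == "health",
  select = c(study_name, name)
)
health_candidates
```

Base R is used here so the sketch runs without extra packages; the same filter can be written with dplyr::filter() and dplyr::select(), as in the per-variable chunks of this report.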

ALSA

BTSM12MN

dto[["metaData"]]%>%dplyr::filter(study_name=="alsa", name=="BTSM12MN")%>%dplyr::select(name,label)
      name                       label
1 BTSM12MN Health comp with 12mths ago
dto[["unitData"]][["alsa"]]%>%histogram_discrete("BTSM12MN")

dto[["unitData"]][["alsa"]]%>%dplyr::group_by_("BTSM12MN")%>%dplyr::summarize(n=n())
Source: local data frame [5 x 2]

         BTSM12MN     n
           (fctr) (int)
1      Better now   285
2  About the same  1173
3 Not as good now   621
4      Don t Know     2
5              NA     6
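
The counts above come from a dplyr pipeline. As a cross-check, a similar tabulation can be produced in base R, which makes it explicit that missing values are counted rather than dropped. The vector below is a hypothetical toy sample, not ALSA data.

```r
# Hypothetical toy responses; NA marks a missing answer
x <- factor(
  c("Better now", "About the same", "About the same", NA),
  levels = c("Better now", "About the same", "Not as good now")
)

# useNA = "ifany" adds a count for NA when missing values are present
counts <- table(x, useNA = "ifany")
counts
```

Unlike the grouped summary, table() also lists factor levels that never occur (here "Not as good now" with a count of zero), which can help when comparing response scales across studies.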

HLTHBTSM

dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="HLTHBTSM") %>% dplyr::select(name,label)
      name                     label
1 HLTHBTSM Health compared to others
dto[["unitData"]][["alsa"]]%>% histogram_discrete("HLTHBTSM")

dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("HLTHBTSM") %>% dplyr::summarize(n=n())
Source: local data frame [5 x 2]

    HLTHBTSM     n
      (fctr) (int)
1     Better  1224
2       Same   646
3      Worse   147
4 Don t Know    51
5         NA    19

HLTHLIFE

dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="HLTHLIFE") %>% dplyr::select(name,label)
      name             label
1 HLTHLIFE Self-rated health
dto[["unitData"]][["alsa"]]%>% histogram_discrete("HLTHLIFE")

dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("HLTHLIFE") %>% dplyr::summarize(n=n())
Source: local data frame [7 x 2]

    HLTHLIFE     n
      (fctr) (int)
1  Excellent   191
2  Very Good   599
3       Good   633
4       Fair   477
5       Poor   181
6 Don t Know     1
7         NA     5

LBSL

SRHEALTH

dto[["metaData"]]%>%dplyr::filter(study_name=="lbsl", name=="SRHEALTH")%>%dplyr::select(name,label)
      name                                      label
1 SRHEALTH Self-reported health compared to age peers
dto[["unitData"]][["lbsl"]]%>%histogram_discrete("SRHEALTH")

dto[["unitData"]][["lbsl"]]%>%dplyr::group_by_("SRHEALTH")%>%dplyr::summarize(n=n())
Source: local data frame [7 x 2]

         SRHEALTH     n
           (fctr) (int)
1       very good   163
2            good   173
3 moderately good   177
4 moderately poor    41
5            poor     7
6       very poor     3
7              NA    92

SATSA

GGENHLTH

dto[["metaData"]]%>%dplyr::filter(study_name=="satsa", name=="GGENHLTH")%>%dplyr::select(name,label)
      name                                          label
1 GGENHLTH How do you judge your general state of health?
dto[["unitData"]][["satsa"]]%>%histogram_discrete("GGENHLTH")

dto[["unitData"]][["satsa"]]%>%dplyr::group_by_("GGENHLTH")%>%dplyr::summarize(n=n())
Source: local data frame [4 x 2]

    GGENHLTH     n
      (fctr) (int)
1       good   853
2 reasonable   587
3        bad    42
4         NA    15

GHLTHOTH

dto[["metaData"]]%>%dplyr::filter(study_name=="satsa", name=="GHLTHOTH")%>%dplyr::select(name,label)
      name                                                                            label
1 GHLTHOTH How do you judge your general state of health compared to other people your age?
dto[["unitData"]][["satsa"]]%>%histogram_discrete("GHLTHOTH")

dto[["unitData"]][["satsa"]]%>%dplyr::group_by_("GHLTHOTH")%>%dplyr::summarize(n=n())
Source: local data frame [5 x 2]

           GHLTHOTH     n
             (fctr) (int)
1            better   362
2    about the same  1019
3             worse    91
4 UNDOCUMENTED CODE     1
5                NA    24

SHARE

PH0020

dto[["metaData"]]%>%dplyr::filter(study_name=="share", name=="PH0020")%>%dplyr::select(name,label)
    name                          label
1 PH0020 health in general question v 1
dto[["unitData"]][["share"]]%>%histogram_discrete("PH0020")

dto[["unitData"]][["share"]]%>%dplyr::group_by_("PH0020")%>%dplyr::summarize(n=n())
Source: local data frame [6 x 2]

     PH0020     n
     (fctr) (int)
1 very good   314
2      good   358
3      fair   454
4       bad   150
5  very bad    43
6        NA  1279

PH0030

dto[["metaData"]]%>%dplyr::filter(study_name=="share", name=="PH0030")%>%dplyr::select(name,label)
    name                          label
1 PH0030 health in general question v 2
dto[["unitData"]][["share"]]%>%histogram_discrete("PH0030")

dto[["unitData"]][["share"]]%>%dplyr::group_by_("PH0030")%>%dplyr::summarize(n=n())
Source: local data frame [7 x 2]

      PH0030     n
      (fctr) (int)
1  excellent   129
2  very good   277
3       good   333
4       fair   320
5       poor   217
6 don't know     1
7         NA  1321

PH0520

dto[["metaData"]]%>%dplyr::filter(study_name=="share", name=="PH0520")%>%dplyr::select(name,label)
    name                          label
1 PH0520 health in general question v 2
dto[["unitData"]][["share"]]%>%histogram_discrete("PH0520")

dto[["unitData"]][["share"]]%>%dplyr::group_by_("PH0520")%>%dplyr::summarize(n=n())
Source: local data frame [7 x 2]

      PH0520     n
      (fctr) (int)
1  excellent   222
2  very good   311
3       good   382
4       fair   263
5       poor   138
6 don't know     2
7         NA  1280

PH0530

dto[["metaData"]]%>%dplyr::filter(study_name=="share", name=="PH0530")%>%dplyr::select(name,label)
    name                          label
1 PH0530 health in general question v 1
dto[["unitData"]][["share"]]%>%histogram_discrete("PH0530")

dto[["unitData"]][["share"]]%>%dplyr::group_by_("PH0530")%>%dplyr::summarize(n=n())
Source: local data frame [7 x 2]

      PH0530     n
      (fctr) (int)
1  very good   340
2       good   365
3       fair   400
4        bad   127
5   very bad    43
6 don't know     1
7         NA  1322

TILDA

PH001

dto[["metaData"]]%>%dplyr::filter(study_name=="tilda", name=="PH001")%>%dplyr::select(name,label)
   name                                                                                 label
1 PH001 ph001  Now I would like to ask you some questions about your health.  Would you say ?
dto[["unitData"]][["tilda"]]%>%histogram_discrete("PH001")

dto[["unitData"]][["tilda"]]%>%dplyr::group_by_("PH001")%>%dplyr::summarize(n=n())
Source: local data frame [6 x 2]

      PH001     n
     (fctr) (int)
1 Excellent  1360
2 Very good  2448
3      Good  2758
4      Fair  1517
5      Poor   420
6        NA     1

PH009

dto[["metaData"]]%>%dplyr::filter(study_name=="tilda", name=="PH009")%>%dplyr::select(name,label)
   name                                                                              label
1 PH009 ph009  In general, compared to other people your age, would you say your health is
dto[["unitData"]][["tilda"]]%>%histogram_discrete("PH009")

dto[["unitData"]][["tilda"]]%>%dplyr::group_by_("PH009")%>%dplyr::summarize(n=n())
Source: local data frame [6 x 2]

      PH009     n
     (fctr) (int)
1 Excellent  1799
2 Very good  2871
3      Good  2525
4      Fair  1021
5      Poor   273
6        NA    15
sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.1.0 knitr_1.12.3  magrittr_1.5 

loaded via a namespace (and not attached):
 [1] splines_3.2.5       lattice_0.20-33     colorspace_1.2-6    htmltools_0.3.5     mgcv_1.8-12        
 [6] yaml_2.1.13         chron_2.3-47        survival_2.38-3     nloptr_1.0.4        foreign_0.8-66     
[11] DBI_0.4-1           RColorBrewer_1.1-2  plyr_1.8.3          stringr_1.0.0       MatrixModels_0.4-1 
[16] munsell_0.4.3       gtable_0.2.0        htmlwidgets_0.6     evaluate_0.9        labeling_0.3       
[21] latticeExtra_0.6-28 SparseM_1.7         extrafont_0.17      quantreg_5.21       pbkrtest_0.4-6     
[26] parallel_3.2.5      markdown_0.7.7      highr_0.5.1         Rttf2pt1_1.3.3      Rcpp_0.12.5        
[31] acepack_1.3-3.3     scales_0.4.0        DT_0.1.40           formatR_1.3         Hmisc_3.17-4       
[36] jsonlite_0.9.20     lme4_1.1-12         gridExtra_2.2.1     testit_0.5          digest_0.6.9       
[41] stringi_1.0-1       dplyr_0.4.3         grid_3.2.5          tools_3.2.5         lazyeval_0.1.10    
[46] dichromat_2.0-0     Formula_1.2-1       cluster_2.0.3       tidyr_0.4.1         extrafontdb_1.0    
[51] car_2.1-2           MASS_7.3-45         Matrix_1.2-4        rsconnect_0.4.2.1   data.table_1.9.6   
[56] assertthat_0.1      minqa_1.2.4         rmarkdown_0.9.6     R6_2.1.2            rpart_4.1-10       
[61] nnet_7.3-12         nlme_3.1-126