This report lists the candidate variables for the DataScheme variables of the construct work status.

Exposition

This report is meant to be compiled after running the script ./manipulation/0-ellis-island.R, which prepares the necessary data transfer object (DTO). We begin with a brief recap of that script and the DTO it produces.

Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

    1. Reads in raw data files from the candidate studies.
    2. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
    3. Augments the raw metadata with instructions for renaming and classifying variables. The instructions are provided as manually entered values in ./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
    4. Combines unit-level data and metadata into a single DTO that serves as the starting point for all subsequent analyses.
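
The four steps above can be sketched with toy data. This is only an illustration of the general flow, not the contents of 0-ellis-island.R: the two tiny data frames, the use of a "label" attribute on each column, the in-memory meta_map standing in for meta-data-map.csv, and the names extract_meta and dto_sketch are all assumptions made for the example.

```r
# Step 1 (assumed): toy stand-ins for raw study files that would be read from disk
unit_data <- list(
  alsa = data.frame(RETIRED = c("Yes", "No", NA), stringsAsFactors = FALSE),
  lbsl = data.frame(NOWRK94 = c("no, retired", "yes, full time"),
                    stringsAsFactors = FALSE)
)
attr(unit_data$alsa$RETIRED, "label") <- "Are you retired from your last job?"
attr(unit_data$lbsl$NOWRK94, "label") <- "Working at present time?"

# Step 2: extract variable names and labels (if provided) from one study
extract_meta <- function(study) {
  d <- unit_data[[study]]
  labels <- vapply(
    d,
    function(x) {
      l <- attr(x, "label")
      if (is.null(l)) NA_character_ else l
    },
    character(1)
  )
  data.frame(study_name = study, name = names(d), label = unname(labels),
             stringsAsFactors = FALSE)
}
meta_live <- do.call(rbind, lapply(names(unit_data), extract_meta))

# Step 3: augment with manually entered classifications
# (hypothetical stand-in for the contents of meta-data-map.csv)
meta_map <- data.frame(
  study_name = c("alsa", "lbsl"),
  name       = c("RETIRED", "NOWRK94"),
  construct  = c("work_status", "work_status"),
  stringsAsFactors = FALSE
)
meta_data <- merge(meta_live, meta_map, by = c("study_name", "name"), all.x = TRUE)

# Step 4: combine unit data and metadata into one list object
# (the real DTO also carries a filePath element, omitted here)
dto_sketch <- list(
  studyName = names(unit_data),
  unitData  = unit_data,
  metaData  = meta_data
)
```

Using all.x = TRUE in the merge keeps every raw variable in the metadata even when the manual map has no entry for it yet, so unclassified variables show up with a missing construct rather than disappearing.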
# load the product of 0-ellis-island.R,  a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath"  "unitData"  "metaData" 
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"       
# 3rd element - a list object with the following elements
names(dto[["unitData"]])
[1] "alsa"  "lbsl"  "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]]) 
Source: local data frame [656 x 27]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl)

Meta

# 4th element - a dataset of names and labels of raw variables, plus added metadata, for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>% 
  DT::datatable(
    class   = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at ./data/shared/meta-data-map.csv",
    filter  = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )

ALSA

RETIRED

dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="RETIRED") %>% dplyr::select(name,label)
     name                               label
1 RETIRED Are you retired from your last job?
dto[["unitData"]][["alsa"]]%>% histogram_discrete("RETIRED")

dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("RETIRED") %>% dplyr::summarize(n=n())
Source: local data frame [3 x 2]

  RETIRED     n
   (fctr) (int)
1     Yes  1767
2      No   134
3      NA   186
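
Every candidate variable in this report is described with the same two steps: look up its label in the metadata, then count its values in the unit data. A minimal sketch of that pattern as a single helper, written in base R, is given here. The function name describe_candidate and the toy DTO are hypothetical and are not part of the project code.

```r
# Hypothetical helper: return a variable's label and its value counts
describe_candidate <- function(dto, study, variable) {
  meta  <- dto[["metaData"]]
  label <- meta[meta$study_name == study & meta$name == variable, "label"]
  # useNA = "ifany" keeps missing values as a category of their own
  counts <- as.data.frame(
    table(dto[["unitData"]][[study]][[variable]], useNA = "ifany"),
    stringsAsFactors = FALSE
  )
  names(counts) <- c(variable, "n")
  list(label = label, counts = counts)
}

# Toy DTO with invented values, only to show the call
toy_dto <- list(
  metaData = data.frame(study_name = "alsa", name = "RETIRED",
                        label = "Are you retired from your last job?",
                        stringsAsFactors = FALSE),
  unitData = list(alsa = data.frame(RETIRED = factor(c("Yes", "Yes", "No", NA))))
)
result <- describe_candidate(toy_dto, "alsa", "RETIRED")
result$label
result$counts
```

Called on the real DTO, for example describe_candidate(dto, "lbsl", "NOWRK94"), the helper would return the same label and counts that the report prints for each variable.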

CURRWORK

dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="CURRWORK") %>% dplyr::select(name,label)
      name             label
1 CURRWORK Currently working
dto[["unitData"]][["alsa"]]%>% histogram_discrete("CURRWORK")

dto[["unitData"]][["alsa"]]%>% dplyr::group_by_("CURRWORK") %>% dplyr::summarize(n=n())
Source: local data frame [3 x 2]

  CURRWORK     n
    (fctr) (int)
1      Yes    31
2       No  2038
3       NA    18

LBSL

NOWRK94

dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="NOWRK94") %>% dplyr::select(name,label)
     name                    label
1 NOWRK94 Working at present time?
dto[["unitData"]][["lbsl"]]%>% histogram_discrete("NOWRK94")

dto[["unitData"]][["lbsl"]]%>% dplyr::group_by_("NOWRK94") %>% dplyr::summarize(n=n())
Source: local data frame [9 x 2]

                 NOWRK94     n
                  (fctr) (int)
1         yes, full time   105
2         yes, part time    64
3 yes, more than one job     2
4            no, retired   318
5          no, homemaker    34
6         no, unemployed     7
7   no, not seeking work     7
8           no, disabled    14
9                     NA   105

SATSA

GAMTWORK

dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="GAMTWORK") %>% dplyr::select(name,label)
      name                                                                                      label
1 GAMTWORK Which of the following alternatives best describes your current work/retirement situation?
dto[["unitData"]][["satsa"]]%>% histogram_discrete("GAMTWORK")

dto[["unitData"]][["satsa"]]%>% dplyr::group_by_("GAMTWORK") %>% dplyr::summarize(n=n())
Source: local data frame [11 x 2]

                           GAMTWORK     n
                             (fctr) (int)
1                 old-age pensioner   778
2           pension due to sickness    77
3     On leave of absence from work     2
4                    work half-time   112
5                    work full-time   407
6    Unemployed (looking for a job)    13
7  Unemployed (not looking for job)     3
8                 full time student     3
9                housewife/houseman    22
10                           other'    58
11                               NA    22

SHARE

EP0050

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="EP0050") %>% dplyr::select(name,label)
    name                 label
1 EP0050 current job situation
dto[["unitData"]][["share"]]%>% histogram_discrete("EP0050")

dto[["unitData"]][["share"]]%>% dplyr::group_by_("EP0050") %>% dplyr::summarize(n=n())
Source: local data frame [10 x 2]

                                                         EP0050     n
                                                         (fctr) (int)
1                                                       Retired  1071
2  Employed or self-employed (including working for family busi   932
3                                       Unemployed, seeking job    64
4                                   Unemployed, not seeking job    64
5                                  Temporarily sick or disabled    46
6                                  Permanently sick or disabled    89
7                                                     Homemaker   289
8                                               Other (specify)    34
9                                                    Don't know     1
10                                                           NA     8

TILDA

WE001

dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="WE001") %>% dplyr::select(name,label)
   name                                                                   label
1 WE001 Which one of these would you say best describes your current situation?
dto[["unitData"]][["tilda"]]%>% histogram_discrete("WE001")

dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("WE001") %>% dplyr::summarize(n=n())
Source: local data frame [9 x 2]

                              WE001     n
                             (fctr) (int)
1                           Retired  3048
2                          Employed  2218
3 Self-employed (including farming)   923
4                        Unemployed   413
5      Permanently sick or disabled   395
6      Looking after home or family  1346
7          In education or training    55
8                   Other (Specify)   104
9                                NA     2

WE003

dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="WE003") %>% dplyr::select(name,label)
   name                                                                          label
1 WE003 Did you, nevertheless, do any paid work during the last week, either as an em?
dto[["unitData"]][["tilda"]]%>% histogram_discrete("WE003")

dto[["unitData"]][["tilda"]]%>% dplyr::group_by_("WE003") %>% dplyr::summarize(n=n())
Source: local data frame [3 x 2]

              WE003     n
             (fctr) (int)
1 UNDOCUMENTED CODE  3141
2               Yes   256
3                No  5107
sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.1.0 knitr_1.12.3  magrittr_1.5 

loaded via a namespace (and not attached):
 [1] splines_3.2.5       lattice_0.20-33     colorspace_1.2-6    htmltools_0.3.5     mgcv_1.8-12        
 [6] yaml_2.1.13         chron_2.3-47        survival_2.38-3     nloptr_1.0.4        foreign_0.8-66     
[11] DBI_0.4-1           RColorBrewer_1.1-2  plyr_1.8.3          stringr_1.0.0       MatrixModels_0.4-1 
[16] munsell_0.4.3       gtable_0.2.0        htmlwidgets_0.6     evaluate_0.9        labeling_0.3       
[21] latticeExtra_0.6-28 SparseM_1.7         extrafont_0.17      quantreg_5.21       pbkrtest_0.4-6     
[26] parallel_3.2.5      markdown_0.7.7      highr_0.5.1         Rttf2pt1_1.3.3      Rcpp_0.12.5        
[31] acepack_1.3-3.3     scales_0.4.0        DT_0.1.40           formatR_1.3         Hmisc_3.17-4       
[36] jsonlite_0.9.20     lme4_1.1-12         gridExtra_2.2.1     testit_0.5          digest_0.6.9       
[41] stringi_1.0-1       dplyr_0.4.3         grid_3.2.5          tools_3.2.5         lazyeval_0.1.10    
[46] dichromat_2.0-0     Formula_1.2-1       cluster_2.0.3       tidyr_0.4.1         extrafontdb_1.0    
[51] car_2.1-2           MASS_7.3-45         Matrix_1.2-4        rsconnect_0.4.2.1   data.table_1.9.6   
[56] assertthat_0.1      minqa_1.2.4         rmarkdown_0.9.6     R6_2.1.2            rpart_4.1-10       
[61] nnet_7.3-12         nlme_3.1-126