Exposition
- Ellis Island
  - Meta
ALSA
- SCHOOL
- TYPQUAL
LBSL
- EDUC94
SATSA
- EDUC
SHARE
- DN012D01
- DN012D02
- DN012D03
- DN012D04
- DN012D05
- DN012D09
- DN012DNO
- DN012DOT
- DN012DRF
- DN012DDK
- DN0100
TILDA
- DM001

This report lists, for each study, the candidate variables for the DataScheme variables of the construct "education".

Exposition

This report is meant to be compiled after the script ./manipulation/0-ellis-island.R has been run; that script prepares the necessary data transfer object (DTO). We begin with a brief recap of this script and the DTO it produces.
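Because the report depends on a DTO produced by another script, a guard that fails early with a clear message avoids confusing errors when the Ellis Island script has not been run. The sketch below is hypothetical: `load_dto()` is not part of the project. It assumes the DTO is an `.rds` list with the elements `studyName`, `filePath`, `unitData`, and `metaData` described below.

```r
# Hypothetical helper: read the DTO and fail early if it is missing or malformed.
load_dto <- function(path = "./data/unshared/derived/dto.rds",
                     required = c("studyName", "filePath", "unitData", "metaData")) {
  if (!file.exists(path)) {
    stop("DTO not found at '", path, "'. Run ./manipulation/0-ellis-island.R first.",
         call. = FALSE)
  }
  dto <- readRDS(path)
  missing_elements <- setdiff(required, names(dto))
  if (length(missing_elements) > 0) {
    stop("DTO is missing element(s): ", paste(missing_elements, collapse = ", "),
         call. = FALSE)
  }
  dto
}

# Usage with a throwaway DTO written to a temporary file
tmp <- tempfile(fileext = ".rds")
saveRDS(list(studyName = "demo",
             filePath  = "demo.sav",
             unitData  = list(demo = data.frame(id = 1:3)),
             metaData  = data.frame(study_name = "demo", name = "id", label = "ID")),
        tmp)
dto_demo <- load_dto(tmp)
names(dto_demo)
```

In the report itself, `dto <- load_dto()` would take the place of the bare `readRDS()` call.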

Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

1. Reads in raw data files from the candidate studies.
2. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
3. Augments the raw metadata with instructions for renaming and classifying variables. These instructions are entered manually in ./data/shared/meta-data-map.csv and are used by automatic scripts in later harmonization and analysis.
4. Combines unit data and metadata into a single DTO that serves as the starting point for all subsequent analyses.
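The four steps can be pictured with a minimal base-R sketch. It is not the code of 0-ellis-island.R. It assumes that each raw data frame carries its labels as a named character vector in a `variable.labels` attribute, which is how `foreign::read.spss()` returns SPSS variable labels. The helper `extract_meta()`, the toy data, the `construct` column of the map, and the placeholder paths are all hypothetical.

```r
# Hypothetical helper: one metadata row per variable (study, variable name, label).
extract_meta <- function(df, study_name) {
  labels <- attr(df, "variable.labels")   # assumed: named character vector or NULL
  label <- if (is.null(labels)) {
    rep(NA_character_, ncol(df))
  } else {
    unname(labels[names(df)])             # NA for variables without a label
  }
  data.frame(study_name = study_name, name = names(df), label = label,
             stringsAsFactors = FALSE)
}

# Step 1 (stand-in): toy data frames instead of reading .sav files
toy_a <- data.frame(id = 1:2, SCHOOL = c("Fourteen years", NA))
attr(toy_a, "variable.labels") <- c(id = "Identifier", SCHOOL = "Age left school")
toy_b <- data.frame(id = 1:2, EDUC94 = c(12L, 16L))   # no labels attached

# Step 2: extract, combine, and export the metadata (to a temp file here)
meta <- rbind(extract_meta(toy_a, "study_a"), extract_meta(toy_b, "study_b"))
live_csv <- tempfile(fileext = ".csv")
write.csv(meta, live_csv, row.names = FALSE)

# Step 3: join manually entered instructions; the column "construct" is hypothetical
meta_map <- data.frame(study_name = "study_a", name = "SCHOOL",
                       construct = "education", stringsAsFactors = FALSE)
meta <- merge(meta, meta_map, by = c("study_name", "name"), all.x = TRUE)

# Step 4: bundle unit data and metadata into one list
dto_sketch <- list(
  studyName = c("study_a", "study_b"),
  filePath  = c("a.sav", "b.sav"),                    # placeholder paths
  unitData  = list(study_a = toy_a, study_b = toy_b),
  metaData  = meta
)
```

`all.x = TRUE` keeps every variable from the live metadata, even those that have no instruction in the map yet.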

# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")

# the list is composed of the following elements
names(dto)

[1] "studyName" "filePath"  "unitData"  "metaData"

# 1st element - names of the studies as a character vector
dto[["studyName"]]

[1] "alsa"  "lbsl"  "satsa" "share" "tilda"

# 2nd element - file paths of the data files for each study as a character vector
dto[["filePath"]]

[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"

# 3rd element - a list of data sets, one per study, with the following names
names(dto[["unitData"]])

[1] "alsa"  "lbsl"  "satsa" "share" "tilda"

# each element is the raw data set of the corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])

Source: local data frame [656 x 27]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl)
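Only the LBSL data set is printed above. A compact overview of every data set in unitData can be useful before looking at individual variables. A minimal base-R sketch follows; `dataset_sizes()` is a hypothetical helper, and the toy list stands in for dto[["unitData"]].

```r
# Hypothetical helper: rows and columns of each data set in a unitData-style list
dataset_sizes <- function(unit_data) {
  data.frame(
    study  = names(unit_data),
    n_rows = unname(vapply(unit_data, nrow, integer(1))),
    n_cols = unname(vapply(unit_data, ncol, integer(1))),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for dto[["unitData"]]
toy_unit <- list(
  alsa = data.frame(id = 1:3, SCHOOL = NA),
  lbsl = data.frame(id = 1:2, EDUC94 = c(12L, 16L))
)
dataset_sizes(toy_unit)
```

Applied to the real DTO, `dataset_sizes(dto[["unitData"]])` would return one row per study.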

ALSA

SCHOOL

dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="SCHOOL") %>% dplyr::select(name,label)

    name           label
1 SCHOOL Age left school
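Every section below repeats the same metadata query for a single study and variable. An alternative, sketched here in base R, is to list all candidate variables once and join them to the metadata in a single step. The toy `meta` table copies labels printed in this report. The row "MISSPELLED" is a hypothetical entry that shows how a mistyped candidate surfaces as an NA label.

```r
# Toy metadata with the columns used above (labels copied from this report)
meta <- data.frame(
  study_name = c("alsa", "alsa", "lbsl", "satsa"),
  name       = c("SCHOOL", "TYPQUAL", "EDUC94", "EDUC"),
  label      = c("Age left school", "Highest qualification",
                 "Number of Years of school completed (1-20)", "Education"),
  stringsAsFactors = FALSE
)

# Candidate variables, one row per study/variable pair ("MISSPELLED" is hypothetical)
candidates <- data.frame(
  study_name = c("alsa", "alsa", "lbsl", "satsa", "tilda"),
  name       = c("SCHOOL", "TYPQUAL", "EDUC94", "EDUC", "MISSPELLED"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps candidates with no metadata row, so they appear with an NA label
candidate_labels <- merge(candidates, meta, by = c("study_name", "name"), all.x = TRUE)
candidate_labels
```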

dto[["unitData"]][["alsa"]] %>% histogram_discrete("SCHOOL")

dto[["unitData"]][["alsa"]] %>% dplyr::group_by_("SCHOOL") %>% dplyr::summarize(n=n())

Source: local data frame [8 x 2]

                  SCHOOL     n
                  (fctr) (int)
1   Never went to school    30
2   Under fourteen years   306
3         Fourteen years   819
4          Fifteen years   382
5          Sixteen years   280
6        Seventeen years   131
7 Eighteen or more years   113
8                     NA    26
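`histogram_discrete()` is a project helper whose definition is not shown in this report. The sketch below is a hypothetical stand-in written in base R, not the project's function. It tabulates a discrete variable with `table(useNA = "ifany")`, so missing values get their own row as in the summaries above, and it draws the counts with `barplot()`.

```r
# Hypothetical stand-in for histogram_discrete(): count levels (including NA) and plot them.
count_and_plot <- function(df, var, plot = TRUE) {
  counts <- table(df[[var]], useNA = "ifany", dnn = var)
  if (plot) {
    labs <- names(counts)
    labs[is.na(labs)] <- "NA"            # give missing values a visible bar label
    barplot(as.vector(counts), names.arg = labs, main = var, ylab = "n", las = 2)
  }
  invisible(as.data.frame(counts, responseName = "n", stringsAsFactors = FALSE))
}

# Toy data using a few of the SCHOOL levels shown above
toy <- data.frame(SCHOOL = factor(
  c("Fourteen years", "Fourteen years", NA, "Sixteen years"),
  levels = c("Never went to school", "Fourteen years", "Sixteen years")
))
tab <- count_and_plot(toy, "SCHOOL", plot = FALSE)
tab
```

`table()` also reports factor levels with zero counts (here "Never went to school"), which helps spot categories that no respondent chose.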

TYPQUAL

dto[["metaData"]] %>% dplyr::filter(study_name=="alsa", name=="TYPQUAL") %>% dplyr::select(name,label)

     name                 label
1 TYPQUAL Highest qualification

dto[["unitData"]][["alsa"]] %>% histogram_discrete("TYPQUAL")

dto[["unitData"]][["alsa"]] %>% dplyr::group_by_("TYPQUAL") %>% dplyr::summarize(n=n())

Source: local data frame [10 x 2]

                                    TYPQUAL     n
                                     (fctr) (int)
1                     Primary School Course     1
2                   Secondary School Course    17
3                   Trade or Apprenticeship   236
4                    Certificate or Diploma   332
5  Bachelor Degree or Post Graduate Diploma    80
6                      Higher Qualification    14
7           Adult Education or Hobby Course    11
8                                     Other     6
9                         No Formal Tuition     3
10                                       NA  1387
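TYPQUAL has far more missing values than SCHOOL. Re-entering the printed counts makes the difference concrete and confirms that both summaries cover the same number of ALSA respondents. The calculation uses only the numbers shown above; whether the missing TYPQUAL values reflect a skip pattern or nonresponse cannot be determined from this table.

```r
# Counts re-entered from the ALSA summaries above; the last value in each is the NA row
school_n  <- c(30, 306, 819, 382, 280, 131, 113, 26)
typqual_n <- c(1, 17, 236, 332, 80, 14, 11, 6, 3, 1387)

n_total <- sum(typqual_n)
stopifnot(n_total == sum(school_n))   # both summaries cover the same respondents

# Percentage of respondents in the NA row
pct_missing <- function(n) 100 * n[length(n)] / sum(n)
round(c(SCHOOL = pct_missing(school_n), TYPQUAL = pct_missing(typqual_n)), 1)
# SCHOOL about 1.2, TYPQUAL about 66.5 (percent of 2087 respondents)
```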

LBSL

EDUC94

dto[["metaData"]] %>% dplyr::filter(study_name=="lbsl", name=="EDUC94") %>% dplyr::select(name,label)

    name                                      label
1 EDUC94 Number of Years of school completed (1-20)

dto[["unitData"]][["lbsl"]] %>% histogram_discrete("EDUC94")

dto[["unitData"]][["lbsl"]] %>% dplyr::group_by_("EDUC94") %>% dplyr::summarize(n=n())

Source: local data frame [18 x 2]

   EDUC94     n
    (int) (int)
1       4     1
2       7     6
3       8    16
4       9     4
5      10    29
6      11    18
7      12   170
8      13    40
9      14    85
10     15    37
11     16    62
12     17    15
13     18    28
14     19    10
15     20    31
16     21     1
17     23     1
18     NA   102
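The EDUC94 label documents a range of 1-20 years, yet the summary shows one respondent at 21 and one at 23. A small check against the documented range lists such values before harmonization. It uses the printed counts with the NA row omitted. How to treat these values (for example capping at 20 or setting them to NA) still has to be decided.

```r
# EDUC94 counts re-entered from the LBSL summary above (NA row omitted)
educ94 <- data.frame(
  years = c(4, 7:21, 23),
  n     = c(1, 6, 16, 4, 29, 18, 170, 40, 85, 37, 62, 15, 28, 10, 31, 1, 1)
)

documented_range <- c(1, 20)   # taken from the label "(1-20)"
outside <- educ94$years < documented_range[1] | educ94$years > documented_range[2]

educ94[outside, ]        # values not covered by the documented range
sum(educ94$n[outside])   # number of respondents affected
```

The non-missing counts sum to 554. Adding the 102 missing values gives 656, which matches the number of LBSL rows shown earlier.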

SATSA

EDUC

dto[["metaData"]] %>% dplyr::filter(study_name=="satsa", name=="EDUC") %>% dplyr::select(name,label)

  name     label
1 EDUC Education

dto[["unitData"]][["satsa"]] %>% histogram_discrete("EDUC")

dto[["unitData"]][["satsa"]] %>% dplyr::group_by_("EDUC") %>% dplyr::summarize(n=n())

Source: local data frame [5 x 2]

                                         EDUC     n
                                       (fctr) (int)
1                           Elementary school   858
2 O-level or vocational school or folk school   381
3                         gymnasium (A-level)   121
4                        university or higher   109
5                                          NA    28

SHARE

DN012D01

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012D01") %>% dplyr::select(name,label)

      name                               label
1 DN012D01 yeshiva, religious high institution

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012D01")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012D01") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012D01     n
        (fctr) (int)
1 not selected  2581
2     selected    16
3           NA     1

DN012D02

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012D02") %>% dplyr::select(name,label)

      name          label
1 DN012D02 nursing school

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012D02")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012D02") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012D02     n
        (fctr) (int)
1 not selected  2543
2     selected    54
3           NA     1

DN012D03

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012D03") %>% dplyr::select(name,label)

      name       label
1 DN012D03 polytechnic

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012D03")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012D03") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012D03     n
        (fctr) (int)
1 not selected  2482
2     selected   115
3           NA     1

DN012D04

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012D04") %>% dplyr::select(name,label)

      name                        label
1 DN012D04 university, Bachelors degree

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012D04")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012D04") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012D04     n
        (fctr) (int)
1 not selected  2213
2     selected   384
3           NA     1

DN012D05

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012D05") %>% dplyr::select(name,label)

      name                       label
1 DN012D05 university, graduate degree

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012D05")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012D05") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012D05     n
        (fctr) (int)
1 not selected  2358
2     selected   239
3           NA     1

DN012D09

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012D09") %>% dplyr::select(name,label)

      name                                  label
1 DN012D09 still in further education or training

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012D09")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012D09") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012D09     n
        (fctr) (int)
1 not selected  2588
2     selected     9
3           NA     1

DN012DNO

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012DNO") %>% dplyr::select(name,label)

      name                label
1 DN012DNO no further education

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012DNO")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012DNO") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012DNO     n
        (fctr) (int)
1 not selected  1077
2     selected  1520
3           NA     1

DN012DOT

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012DOT") %>% dplyr::select(name,label)

      name                   label
1 DN012DOT other further education

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012DOT")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012DOT") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012DOT     n
        (fctr) (int)
1 not selected  2320
2     selected   277
3           NA     1

DN012DRF

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012DRF") %>% dplyr::select(name,label)

      name   label
1 DN012DRF refused

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012DRF")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012DRF") %>% dplyr::summarize(n=n())

Source: local data frame [2 x 2]

      DN012DRF     n
        (fctr) (int)
1 not selected  2597
2           NA     1

DN012DDK

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN012DDK") %>% dplyr::select(name,label)

      name     label
1 DN012DDK dont know

dto[["unitData"]][["share"]] %>% histogram_discrete("DN012DDK")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN012DDK") %>% dplyr::summarize(n=n())

Source: local data frame [3 x 2]

      DN012DDK     n
        (fctr) (int)
1 not selected  2596
2     selected     1
3           NA     1
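Items DN012D01 through DN012DDK share the DN012 stem and are coded "selected" / "not selected". This suggests they are the separate indicator columns of one multiple-response question on further education, although that reading is an inference from the names and labels. One way to use them together is to count selections per respondent and flag contradictory answers, such as "no further education" selected alongside a university degree. The sketch below is hypothetical: the toy respondents, the subset of columns, and the consistency rule are illustrative assumptions.

```r
# Toy respondents with a subset of the SHARE indicator columns (values are illustrative)
share_toy <- data.frame(
  DN012D04 = c("selected",     "not selected", "not selected", NA),
  DN012D05 = c("not selected", "not selected", "selected",     NA),
  DN012DNO = c("not selected", "selected",     "selected",     NA),
  stringsAsFactors = FALSE
)

indicator_cols <- c("DN012D04", "DN012D05", "DN012DNO")
is_selected <- share_toy[indicator_cols] == "selected"   # logical matrix, NA stays NA

# Number of options each respondent selected (NA if every item is missing)
n_selected <- rowSums(is_selected, na.rm = TRUE)
n_selected[rowSums(!is.na(is_selected)) == 0] <- NA

# Contradiction: "no further education" selected together with any university degree
has_degree <- rowSums(is_selected[, c("DN012D04", "DN012D05"), drop = FALSE],
                      na.rm = TRUE) > 0
contradiction <- is_selected[, "DN012DNO"] %in% TRUE & has_degree
```

With the real data, `share_toy` would be replaced by dto[["unitData"]][["share"]] and `indicator_cols` extended to all DN012 items.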

DN0100

dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="DN0100") %>% dplyr::select(name,label)

    name                               label
1 DN0100 highest educational degree obtained

dto[["unitData"]][["share"]] %>% histogram_discrete("DN0100")

dto[["unitData"]][["share"]] %>% dplyr::group_by_("DN0100") %>% dplyr::summarize(n=n())

Source: local data frame [13 x 2]

                                                        DN0100     n
                                                        (fctr) (int)
1                                            Elementary school   501
2  Partial occipational secondary school (did not graduate, no   102
3        full occipational secondary school (no matriculation)   174
4      full occipational secondary school (with matriculation)   113
5         partial Academic secondary school (no matriculation)   219
6            full Academic secondary school (no matriculation)   274
7          full Academic secondary school (with matriculation)  1024
8                  yeshiva secondary school (no matriculation)     8
9                 yeshiva secondary school (wih matriculation)     6
10                                                        none   143
11                                    other type (also abroad)    32
12                                                  don't know     1
13                                                          NA     1

TILDA

DM001

dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="DM001") %>% dplyr::select(name,label)

   name                                                            label
1 DM001 dm001  What is the highest level of education you have completed

dto[["unitData"]][["tilda"]] %>% histogram_discrete("DM001")

dto[["unitData"]][["tilda"]] %>% dplyr::group_by_("DM001") %>% dplyr::summarize(n=n())

Source: local data frame [9 x 2]

                                                DM001     n
                                               (fctr) (int)
1                         Some primary (not complete)   280
2                               Primary or equivalent  2232
3 Intermediate/junior/group certificate or equivalent  1971
4                   Leaving certificate or equivalent  1460
5                                 Diploma/certificate  1335
6                                      Primary degree   730
7                          Postgraduate/higher degree   483
8                                                None     9
9                                                  NA     4
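In the DM001 factor, "None" is stored after "Postgraduate/higher degree", so the stored level order does not follow educational attainment. Before DM001 is treated as ordinal, for example when choosing cut points to align with other studies, the levels need reordering. The sketch re-enters the printed counts (NA excluded) and computes cumulative shares. Placing "None" below "Some primary (not complete)" is an assumption about the intended ordering.

```r
# Counts re-entered from the TILDA DM001 summary above, NA excluded
dm001 <- c(
  "Some primary (not complete)"                         = 280,
  "Primary or equivalent"                               = 2232,
  "Intermediate/junior/group certificate or equivalent" = 1971,
  "Leaving certificate or equivalent"                   = 1460,
  "Diploma/certificate"                                 = 1335,
  "Primary degree"                                      = 730,
  "Postgraduate/higher degree"                          = 483,
  "None"                                                = 9
)

# Assumed educational order: "None" first, then the stored order
ordered_levels <- c("None", setdiff(names(dm001), "None"))
dm001 <- dm001[ordered_levels]

# Cumulative share of respondents at or below each level
cum_share <- cumsum(dm001) / sum(dm001)
round(cum_share, 3)
```

The same check applies to SHARE DN0100, where "none", "other type (also abroad)", and "don't know" also come after the highest degree in the level order.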

sessionInfo()

R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.1.0 knitr_1.12.3  magrittr_1.5 

loaded via a namespace (and not attached):
 [1] splines_3.2.5       lattice_0.20-33     colorspace_1.2-6    htmltools_0.3.5     mgcv_1.8-12        
 [6] yaml_2.1.13         chron_2.3-47        survival_2.38-3     nloptr_1.0.4        foreign_0.8-66     
[11] DBI_0.4-1           RColorBrewer_1.1-2  plyr_1.8.3          stringr_1.0.0       MatrixModels_0.4-1 
[16] munsell_0.4.3       gtable_0.2.0        htmlwidgets_0.6     evaluate_0.9        labeling_0.3       
[21] latticeExtra_0.6-28 SparseM_1.7         extrafont_0.17      quantreg_5.21       pbkrtest_0.4-6     
[26] parallel_3.2.5      markdown_0.7.7      highr_0.5.1         Rttf2pt1_1.3.3      Rcpp_0.12.5        
[31] acepack_1.3-3.3     scales_0.4.0        DT_0.1.40           formatR_1.3         Hmisc_3.17-4       
[36] jsonlite_0.9.20     lme4_1.1-12         gridExtra_2.2.1     testit_0.5          digest_0.6.9       
[41] stringi_1.0-1       dplyr_0.4.3         grid_3.2.5          tools_3.2.5         lazyeval_0.1.10    
[46] dichromat_2.0-0     Formula_1.2-1       cluster_2.0.3       tidyr_0.4.1         extrafontdb_1.0    
[51] car_2.1-2           MASS_7.3-45         Matrix_1.2-4        rsconnect_0.4.2.1   data.table_1.9.6   
[56] assertthat_0.1      minqa_1.2.4         rmarkdown_0.9.6     R6_2.1.2            rpart_4.1-10       
[61] nnet_7.3-12         nlme_3.1-126

Describe: education