(I) Exposition
- (I.A) Ellis Island
  - Meta
- (I.B) Target-H
(II) Development
- (II.A)
  - (1) Schema sets
- (II.B) sedentary
  - ALSA
  - LBSL
  - SATSA
  - SHARE
  - TILDA
(III) Recapitulation

This report lists the candidate variables for the DataSchema variables of the construct physical activity.

(I) Exposition

This report is a record of interaction with a data transfer object (dto) produced by ./manipulation/0-ellis-island.R.

The next section recaps this script, exposes the architecture of the DTO, and demonstrates the language of interacting with it.

(I.A) Ellis Island

All data land on Ellis Island.

The script 0-ellis-island.R is the first script in the analytic workflow. It accomplishes the following:

1. Reads in raw data files from the candidate studies.
2. Extracts, combines, and exports their metadata (specifically, variable names and labels, if provided) into ./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
3. Augments raw metadata with instructions for renaming and classifying variables. The instructions are provided as manually entered values in ./data/shared/meta-data-map.csv. They are used by automated scripts in later harmonization and analysis.
4. Combines unit data and metadata into a single DTO to serve as a starting point for all subsequent analyses.
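
The four steps above can be sketched with toy data. This is a minimal illustration under stated assumptions, not the contents of 0-ellis-island.R: the two miniature data frames, the helper `make_meta()`, the placeholder file paths, and the object name `dto_sketch` are hypothetical. Only the element names of the resulting list (`studyName`, `filePath`, `unitData`, `metaData`) follow the DTO described in this report.

```r
# Step 1 (stand-in): two tiny "raw" data sets instead of files read from disk.
# All values here are invented for illustration.
unit_data <- list(
  alsa = data.frame(id = 1:3, WALK2WKS = c("Yes", "No", NA), stringsAsFactors = FALSE),
  lbsl = data.frame(id = 1:2, WALK94 = c(0L, 3L))
)

# Step 2: extract variable names from every study into one metadata table.
# make_meta() is a hypothetical helper.
make_meta <- function(study, ds) {
  data.frame(study_name = study, name = names(ds), stringsAsFactors = FALSE)
}
meta_live <- do.call(rbind, Map(make_meta, names(unit_data), unit_data))
rownames(meta_live) <- NULL

# Step 3: augment with manually entered classifications
# (this table stands in for the hand-edited meta-data-map.csv).
meta_map <- data.frame(
  study_name = c("alsa", "lbsl"),
  name       = c("WALK2WKS", "WALK94"),
  construct  = "physact",
  stringsAsFactors = FALSE
)
meta_data <- merge(meta_live, meta_map, by = c("study_name", "name"), all.x = TRUE)

# Step 4: combine unit data and metadata into a single list object.
dto_sketch <- list(
  studyName = names(unit_data),
  filePath  = paste0("<raw file for ", names(unit_data), ">"),  # placeholders
  unitData  = unit_data,
  metaData  = meta_data
)
names(dto_sketch)
```

Variables without an entry in the map (here, `id`) keep a missing `construct`, so they are not picked up by a later filter on that column.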

# load the product of 0-ellis-island.R,  a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")

# the list is composed of the following elements
names(dto)

[1] "studyName" "filePath"  "unitData"  "metaData"

# 1st element - names of the studies as character vector
dto[["studyName"]]

[1] "alsa"  "lbsl"  "satsa" "share" "tilda"

# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]

[1] "./data/unshared/raw/ALSA-Wave1.Final.sav"         "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav" 
[3] "./data/unshared/raw/SATSA-Q3.Final.sav"           "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"

# 3rd element - a list object containing the following elements
names(dto[["unitData"]])

[1] "alsa"  "lbsl"  "satsa" "share" "tilda"

# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])

Source: local data frame [656 x 36]

        id AGE94 SEX94  MSTAT94 EDUC94     NOWRK94  SMK94                                         SMOKE
     (int) (int) (int)   (fctr)  (int)      (fctr) (fctr)                                        (fctr)
1  4001026    68     1 divorced     16 no, retired     no                                  never smoked
2  4012015    94     2  widowed     12 no, retired     no                                  never smoked
3  4012032    94     2  widowed     20 no, retired     no don't smoke at present but smoked in the past
4  4022004    93     2       NA     NA          NA     NA                                  never smoked
5  4022026    93     2  widowed     12 no, retired     no                                  never smoked
6  4031031    92     1  married      8 no, retired     no don't smoke at present but smoked in the past
7  4031035    92     1  widowed     13 no, retired     no don't smoke at present but smoked in the past
8  4032201    92     2       NA     NA          NA     NA don't smoke at present but smoked in the past
9  4041062    91     1  widowed      7          NA     no don't smoke at present but smoked in the past
10 4042057    91     2       NA     NA          NA     NA                                            NA
..     ...   ...   ...      ...    ...         ...    ...                                           ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
  SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
  (int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
  year_born (dbl), female (lgl), marital (chr), single (lgl), educ3 (chr), current_work_2 (lgl), current_drink (lgl)

(I.B) Target-H

Everybody wants to be somebody.

We query the metadata set to retrieve all variables potentially tapping the construct physical activity. These are the candidates to enter the DataSchema and contribute to computing harmonized variables.

NOTE: what is being retrieved depends on the manually entered values in the column construct of the metadata file ./data/shared/meta-data-map.csv. To specify a different group of variables, edit the metadata, not the script.

meta_data <- dto[["metaData"]] %>%
  dplyr::filter(construct %in% c('physact')) %>% 
  dplyr::select(study_name, name, construct, label_short, categories, url) %>%
  dplyr::arrange(construct, study_name)
knitr::kable(meta_data)

study_name	name	construct	label_short	categories
alsa	EXRTHOUS	physact	Exertion around house	NA
alsa	HWMNWK2W	physact	Times walked in past two weeks	NA
alsa	LSVEXC2W	physact	Less vigor sessions last 2 weeks	NA
alsa	LSVIGEXC	physact	Less vigor past 2 weeks	NA
alsa	TMHVYEXR	physact	Time heavy physical exertion	NA
alsa	TMVEXC2W	physact	Vigor Time past 2 weeks	NA
alsa	VIGEXC2W	physact	Vigor Sessions in past 2 weeks	NA
alsa	VIGEXCS	physact	Vigorous exercise	NA
alsa	WALK2WKS	physact	Walking past 2 weeks	NA
lbsl	SPORT94	physact	Participant sports, number of hours	NA
lbsl	FIT94	physact	Physical fitness, number of hours each week	NA
lbsl	WALK94	physact	Walking, number of hours per week	NA
lbsl	SPEC94	physact	Spectator sports, number of hours spent per week	NA
lbsl	DANCE94	physact	Dancing	NA
lbsl	CHORE94	physact	Doing household chores (hrs/wk)	NA
lbsl	EXCERTOT	physact	Exercising for shape/fun (hrs/wk)	NA
lbsl	EXCERWK	physact	Exercised or played sports (oc/wk)	NA
satsa	GEXERCIS	physact	What option best describes your exercise on a yearly basis?	NA
share	BR0150	physact	sports or activities that are vigorous	NA
share	BR0160	physact	activities requiring a moderate level of energy	NA
tilda	BH101	physact	During the last 7 days, on how many days did you do vigorous physical activit?	NA
tilda	BH102	physact	How much time did you usually spend doing vigorous physical activities on one?	NA
tilda	BH102A	physact	How much time did you usually spend doing vigorous physical activities on one?	NA
tilda	BH103	physact	During the last 7 days, on how many days did you do moderate physical activit?	NA
tilda	BH104	physact	How much time did you usually spend doing moderate physical activities on one?	NA
tilda	BH104A	physact	How much time did you usually spend doing moderate physical activities on one?	NA
tilda	BH105	physact	During the last 7 days, on how many days did you walk for at least 10 minutes?	NA
tilda	BH106	physact	How much time did you usually spend walking on one of those days? HOURS	NA
tilda	BH106A	physact	How much time did you usually spend walking on one of those days? MINS	NA
tilda	BH107	physact	During the last 7 days, how much time did you spend sitting on a week day? HO?	NA
tilda	BH107A	physact	During the last 7 days, how much time did you spend sitting on a week day? MINS	NA
tilda	IPAQMETMINUTES	physact	Physical activity met (minutes)	NA
tilda	IPAQEXERCISE3	physact	Physical activity met (minutes)	NA

View descriptives: physical activity for a closer examination of each candidate.

After reviewing descriptives and relevant codebooks, the following operationalization of the harmonized variables for physical activity has been adopted:

Target (1) : `sedentary`

0 - FALSE
1 - TRUE

These variables will be generated next, in the Development section.

(II) Development

The particular goal of this section is to ensure that the schema used to encode the values of the physical activity variable is consistent across studies.

In this section we define the schema sets for harmonizing the physical activity construct (i.e., specify which variables from which studies contribute to computing harmonized variables). Each schema set has a particular pattern of possible response values to these variables, which we export for inspection as .csv tables. We then manually edit these .csv tables, populating new columns that map values of harmonized variables to the specific response patterns of the schema set variables. Finally, we import the harmonization algorithms encoded in the .csv tables and apply them to compute harmonized variables, producing a dataset that combines raw and harmonized variables for the physical activity construct across studies.
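
As a rough illustration of the "response pattern" idea, the sketch below counts every observed combination of values in a schema set and writes the table to a .csv file that could then be edited by hand. The data frame, the variable names `item_a` and `item_b`, and the temporary output path are hypothetical; the project's own export code is not shown in this report.

```r
# Hypothetical unit data for one study with two candidate items.
ds <- data.frame(
  item_a = c("No", "No", "Yes", "Yes", "No", NA),
  item_b = c("No", "Yes", "No", "No", "No", NA),
  stringsAsFactors = FALSE
)

# Cross-tabulate the items, treating missing values as a pattern of their own.
tab <- table(item_a = ds$item_a, item_b = ds$item_b, useNA = "ifany")

# Convert to a long table with one row per combination and a count column n.
profile <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)

# Keep only the combinations that actually occur in the data.
profile <- profile[profile$n > 0, ]

# Add an empty column to be filled in manually with the harmonized value.
profile$sedentary <- NA

# Export for manual editing (a temporary file is used so the sketch is runnable).
profile_path <- tempfile(fileext = ".csv")
write.csv(profile, profile_path, row.names = FALSE)
profile
```

Each row of `profile` is one response pattern, and the counts sum to the number of rows in `ds`.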

(II.A)

(1) Schema sets

Having all potential variables in categorical format, we have defined the sets of data schema variables thus:

Each of these schema sets has a particular pattern of possible response values, for example:

We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.

You can examine them in `./data/meta/response-profiles-live/`.

(II.B) `sedentary`

Target (1) : `sedentary`

0 - FALSE
1 - TRUE

ALSA

Items that can contribute to generating values for the harmonized variable sedentary are:

dto[["metaData"]] %>%
  dplyr::filter(study_name=="alsa", construct %in% c("physact")) %>%
  dplyr::select(study_name, name, label,categories)

  study_name     name                            label categories
1       alsa EXRTHOUS            Exertion around house         NA
2       alsa HWMNWK2W   Times walked in past two weeks         NA
3       alsa LSVEXC2W Less vigor sessions last 2 weeks         NA
4       alsa LSVIGEXC          Less vigor past 2 weeks         NA
5       alsa TMHVYEXR     Time heavy physical exertion         NA
6       alsa TMVEXC2W          Vigor Time past 2 weeks         NA
7       alsa VIGEXC2W   Vigor Sessions in past 2 weeks         NA
8       alsa  VIGEXCS                Vigorous exercise         NA
9       alsa WALK2WKS             Walking past 2 weeks         NA

We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
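
The definition of `recode_with_hrule()` is not shown in this report. As a hedged sketch of the general idea only, a rule table that maps each response pattern to a harmonized value can be matched back onto the unit data. Everything below (the toy data, the rule table, and the helper `apply_rule_sketch()`) is hypothetical and should not be read as the project's implementation.

```r
# Hypothetical unit data for one study.
ds <- data.frame(
  id     = 1:5,
  item_a = c("No", "No", "Yes", "Yes", NA),
  item_b = c("No", "Yes", "No", "Yes", NA),
  stringsAsFactors = FALSE
)

# Hypothetical rule table, as it might look after manual editing of a .csv file:
# one row per response pattern, plus the harmonized value assigned to it.
hrule <- data.frame(
  item_a    = c("No", "No", "Yes", "Yes", NA),
  item_b    = c("No", "Yes", "No", "Yes", NA),
  sedentary = c(TRUE, FALSE, FALSE, FALSE, NA),
  stringsAsFactors = FALSE
)

apply_rule_sketch <- function(ds, hrule, variable_names, harmony_name) {
  # Collapse each row's response pattern into one key; NA becomes the text "NA".
  key <- function(d) do.call(paste, c(d[variable_names], sep = "|"))
  # Look up each unit's pattern in the rule table (row order of ds is preserved).
  idx <- match(key(ds), key(hrule))
  # Append the harmonized variable; unmatched patterns receive NA.
  ds[[harmony_name]] <- hrule[[harmony_name]][idx]
  ds
}

ds <- apply_rule_sketch(ds, hrule, c("item_a", "item_b"), "sedentary")
ds
```

Matching on a key rather than calling `merge()` keeps the rows of the unit data in their original order, which makes the result easy to verify against the source.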

study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("WALK2WKS", "LSVIGEXC", "VIGEXCS", "EXRTHOUS"), 
  harmony_name = "sedentary"
)

Source: local data frame [18 x 6]
Groups: WALK2WKS, LSVIGEXC, VIGEXCS, EXRTHOUS [?]

   WALK2WKS LSVIGEXC VIGEXCS EXRTHOUS sedentary     n
      (chr)    (chr)   (chr)    (chr)     (lgl) (int)
1        No       No      No       No      TRUE   814
2        No       No      No      Yes     FALSE   113
3        No       No     Yes       No     FALSE    14
4        No       No     Yes      Yes     FALSE     6
5        No      Yes      No       No     FALSE   118
6        No      Yes      No      Yes     FALSE    18
7        No      Yes     Yes       No     FALSE     4
8        No      Yes     Yes      Yes     FALSE     4
9       Yes       No      No       No     FALSE   601
10      Yes       No      No      Yes     FALSE    98
11      Yes       No     Yes       No     FALSE    21
12      Yes       No     Yes      Yes     FALSE     8
13      Yes      Yes      No       No     FALSE   177
14      Yes      Yes      No      Yes     FALSE    39
15      Yes      Yes      No       NA     FALSE     1
16      Yes      Yes     Yes       No     FALSE    24
17      Yes      Yes     Yes      Yes     FALSE     4
18       NA       NA      NA       NA        NA    23

# verify
dto[["unitData"]][["alsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, WALK2WKS, LSVIGEXC, VIGEXCS, EXRTHOUS, sedentary)

      id WALK2WKS LSVIGEXC VIGEXCS EXRTHOUS sedentary
1    581      Yes      Yes      No       No     FALSE
2    761      Yes       No      No       No     FALSE
3   3032      Yes       No      No       No     FALSE
4   5771      Yes       No      No      Yes     FALSE
5  17712      Yes       No      No       No     FALSE
6  22601       No       No      No       No      TRUE
7  22941       No       No      No       No      TRUE
8  23161      Yes       No      No      Yes     FALSE
9  24401       No       No      No      Yes     FALSE
10 29611      Yes       No      No       No     FALSE

LBSL

Items that can contribute to generating values for the harmonized variable sedentary are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "lbsl", construct == "physact") %>%
  dplyr::select(study_name, name, label_short,categories)

  study_name     name                                      label_short categories
1       lbsl  SPORT94              Participant sports, number of hours         NA
2       lbsl    FIT94      Physical fitness, number of hours each week         NA
3       lbsl   WALK94                Walking, number of hours per week         NA
4       lbsl   SPEC94 Spectator sports, number of hours spent per week         NA
5       lbsl  DANCE94                                          Dancing         NA
6       lbsl  CHORE94                  Doing household chores (hrs/wk)         NA
7       lbsl EXCERTOT                Exercising for shape/fun (hrs/wk)         NA
8       lbsl  EXCERWK               Exercised or played sports (oc/wk)         NA

study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("WALK94","EXCERTOT"), 
  harmony_name = "sedentary"
)

Source: local data frame [133 x 4]
Groups: WALK94, EXCERTOT [?]

    WALK94 EXCERTOT sedentary     n
     (chr)    (chr)     (lgl) (int)
1        0        0      TRUE    39
2        0        1     FALSE     2
3        0       10     FALSE     2
4        0       12     FALSE     2
5        0       14     FALSE     1
6        0       16     FALSE     1
7        0        2     FALSE     5
8        0        3     FALSE     2
9        0        4     FALSE     9
10       0        5     FALSE     4
11       0        6     FALSE     4
12       0        7     FALSE     2
13       0        9     FALSE     1
14       0       NA      TRUE     1
15       1        0     FALSE    12
16       1        1     FALSE    18
17       1       14     FALSE     4
18       1        2     FALSE    12
19       1        3     FALSE    11
20       1        4     FALSE     3
21       1        5     FALSE     6
22       1        6     FALSE     1
23       1        7     FALSE     1
24       1        8     FALSE     1
25       1       NA     FALSE     3
26      10       10     FALSE     2
27      10       20     FALSE     2
28      10        4     FALSE     1
29      10        6     FALSE     2
30      10        7     FALSE     2
31      10        8     FALSE     1
32      11       10     FALSE     1
33      12       12     FALSE     3
34      12       15     FALSE     1
35      12       27     FALSE     1
36      14        4     FALSE     1
37      15        1     FALSE     1
38       2        0     FALSE    16
39       2        1     FALSE     5
40       2       10     FALSE     3
41       2       12     FALSE     2
42       2       14     FALSE     1
43       2       15     FALSE     1
44       2       18     FALSE     1
45       2        2     FALSE    19
46       2        3     FALSE     8
47       2        4     FALSE     7
48       2        5     FALSE     3
49       2        6     FALSE     8
50       2        7     FALSE     3
51       2       NA     FALSE     1
52      20       20     FALSE     1
53      20        4     FALSE     1
54       3        0     FALSE     5
55       3        1     FALSE     1
56       3       10     FALSE     1
57       3       11     FALSE     1
58       3       12     FALSE     1
59       3       18     FALSE     1
60       3        2     FALSE     4
61       3        3     FALSE    23
62       3        4     FALSE     4
63       3        5     FALSE     7
64       3        7     FALSE     5
65       3        8     FALSE     3
66       3       NA     FALSE     1
67      30        3     FALSE     1
68       4        0     FALSE     7
69       4       10     FALSE     4
70       4       15     FALSE     1
71       4       16     FALSE     1
72       4       18     FALSE     1
73       4        2     FALSE     4
74       4        3     FALSE     3
75       4        4     FALSE     9
76       4        6     FALSE     4
77       4        7     FALSE     1
78       4        8     FALSE     3
79       4        9     FALSE     1
80       4       NA     FALSE     1
81       5        0     FALSE     4
82       5       10     FALSE     3
83       5       12     FALSE     1
84       5       18     FALSE     1
85       5        2     FALSE     5
86       5        3     FALSE     1
87       5       35     FALSE     1
88       5        4     FALSE     5
89       5        5     FALSE     7
90       5        6     FALSE     2
91       5        7     FALSE     1
92       5        9     FALSE     1
93       5       NA     FALSE     1
94       6        0     FALSE     2
95       6        1     FALSE     1
96       6       10     FALSE     3
97       6       11     FALSE     1
98       6       15     FALSE     1
99       6        2     FALSE     3
100      6        3     FALSE     2
..     ...      ...       ...   ...

# verify
dto[["unitData"]][["lbsl"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, WALK94, EXCERTOT, sedentary)

        id WALK94 EXCERTOT sedentary
1  4051023     12       12     FALSE
2  4082091      1        1     FALSE
3  4141201     NA        0      TRUE
4  4181083      6        2     FALSE
5  4181091     NA        5     FALSE
6  4191084     11       10     FALSE
7  4221083      1       14     FALSE
8  4232084     NA        5     FALSE
9  4302016     NA       NA        NA
10 4321046      8       10     FALSE

SATSA

Items that can contribute to generating values for the harmonized variable sedentary are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "satsa", construct == "physact") %>%
  dplyr::select(study_name, name, label_short,categories)

  study_name     name                                                 label_short categories
1      satsa GEXERCIS What option best describes your exercise on a yearly basis?         NA

study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("GEXERCIS"), 
  harmony_name = "sedentary"
)

Source: local data frame [8 x 3]
Groups: GEXERCIS [?]

                          GEXERCIS sedentary     n
                             (chr)     (lgl) (int)
1   I don't get very much exercise      TRUE   394
2          I get a lot of exercise     FALSE    88
3            I get little exercise      TRUE   193
4    I get quite a lot of exercise     FALSE   430
5       I get very little exercise      TRUE   181
6         I get very much exercise     FALSE    17
7 I hardly get any exercise at all      TRUE   169
8                               NA        NA    25

# verify
dto[["unitData"]][["satsa"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, GEXERCIS, sedentary)

        id                       GEXERCIS sedentary
1   151061     I get very little exercise      TRUE
2   163902          I get little exercise      TRUE
3   167101 I don't get very much exercise      TRUE
4   172301     I get very little exercise      TRUE
5   181811  I get quite a lot of exercise     FALSE
6   191411 I don't get very much exercise      TRUE
7  2181602          I get little exercise      TRUE
8  2190512 I don't get very much exercise      TRUE
9  2191501     I get very little exercise      TRUE
10 2232641 I don't get very much exercise      TRUE

SHARE

Items that can contribute to generating values for the harmonized variable sedentary are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "share", construct == "physact") %>%
  dplyr::select(study_name, name, label_short,categories)

  study_name   name                                     label_short categories
1      share BR0150          sports or activities that are vigorous         NA
2      share BR0160 activities requiring a moderate level of energy         NA

study_name <- "share"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-share.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("BR0160" ,"BR0150"), 
  harmony_name = "sedentary"
)

Source: local data frame [19 x 4]
Groups: BR0160, BR0150 [?]

                       BR0160                     BR0150 sedentary     n
                        (chr)                      (chr)     (lgl) (int)
1                  don't know      more than once a week     FALSE     3
2       hardly ever, or never      hardly ever, or never      TRUE   553
3       hardly ever, or never      more than once a week     FALSE    58
4       hardly ever, or never                once a week     FALSE    31
5       hardly ever, or never one to three times a month     FALSE    15
6       more than once a week                 don't know     FALSE     3
7       more than once a week      hardly ever, or never     FALSE   329
8       more than once a week      more than once a week     FALSE   968
9       more than once a week                once a week     FALSE   165
10      more than once a week one to three times a month     FALSE    59
11                once a week      hardly ever, or never     FALSE   111
12                once a week      more than once a week     FALSE    46
13                once a week                once a week     FALSE   109
14                once a week one to three times a month     FALSE    28
15 one to three times a month      hardly ever, or never     FALSE    53
16 one to three times a month      more than once a week     FALSE    17
17 one to three times a month                once a week     FALSE    17
18 one to three times a month one to three times a month     FALSE    29
19                         NA                         NA        NA     4

# verify
knitr::kable(dto[["unitData"]][["share"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, BR0160, BR0150, sedentary))

id	BR0160	BR0150	sedentary
2.505206e+12	hardly ever, or never	hardly ever, or never	TRUE
2.505209e+12	more than once a week	more than once a week	FALSE
2.505249e+12	more than once a week	more than once a week	FALSE
2.505261e+12	more than once a week	more than once a week	FALSE
2.505266e+12	hardly ever, or never	hardly ever, or never	TRUE
2.505283e+12	more than once a week	more than once a week	FALSE
2.505284e+12	hardly ever, or never	hardly ever, or never	TRUE
2.605235e+12	once a week	once a week	FALSE
2.605282e+12	more than once a week	more than once a week	FALSE
2.705218e+12	hardly ever, or never	hardly ever, or never	TRUE

TILDA

Items that can contribute to generating values for the harmonized variable sedentary are:

dto[["metaData"]] %>%
  dplyr::filter(study_name == "tilda", construct == "physact") %>%
  dplyr::select(study_name, name, label_short,categories)

   study_name           name                                                                     label_short categories
1       tilda          BH101  During the last 7 days, on how many days did you do vigorous physical activit?         NA
2       tilda          BH102  How much time did you usually spend doing vigorous physical activities on one?         NA
3       tilda         BH102A  How much time did you usually spend doing vigorous physical activities on one?         NA
4       tilda          BH103  During the last 7 days, on how many days did you do moderate physical activit?         NA
5       tilda          BH104  How much time did you usually spend doing moderate physical activities on one?         NA
6       tilda         BH104A  How much time did you usually spend doing moderate physical activities on one?         NA
7       tilda          BH105  During the last 7 days, on how many days did you walk for at least 10 minutes?         NA
8       tilda          BH106         How much time did you usually spend walking on one of those days? HOURS         NA
9       tilda         BH106A          How much time did you usually spend walking on one of those days? MINS         NA
10      tilda          BH107  During the last 7 days, how much time did you spend sitting on a week day? HO?         NA
11      tilda         BH107A During the last 7 days, how much time did you spend sitting on a week day? MINS         NA
12      tilda IPAQMETMINUTES                                                 Physical activity met (minutes)         NA
13      tilda  IPAQEXERCISE3                                                 Physical activity met (minutes)         NA

study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name, 
  variable_names = c("BH105", "BH106", "BH101" ,"BH103"), 
  harmony_name = "sedentary"
)

Source: local data frame [1,029 x 6]
Groups: BH105, BH106, BH101, BH103 [?]

    BH105 BH106 BH101 BH103 sedentary     n
    (chr) (chr) (chr) (chr)     (lgl) (int)
1       0    -1     0     0      TRUE   819
2       0    -1     0     1      TRUE    40
3       0    -1     0     2      TRUE    47
4       0    -1     0     3      TRUE    47
5       0    -1     0     4      TRUE    28
6       0    -1     0     5      TRUE    40
7       0    -1     0     6      TRUE    13
8       0    -1     0     7      TRUE   114
9       0    -1     1     0      TRUE    21
10      0    -1     1     1      TRUE     1
11      0    -1     1     2      TRUE     5
12      0    -1     1     3      TRUE     1
13      0    -1     1     4      TRUE     2
14      0    -1     1     5      TRUE     1
15      0    -1     1     7      TRUE    14
16      0    -1     2     0      TRUE    20
17      0    -1     2     1      TRUE     1
18      0    -1     2     2      TRUE     5
19      0    -1     2     3      TRUE     3
20      0    -1     2     4      TRUE     3
21      0    -1     2     5      TRUE     5
22      0    -1     2     6      TRUE     1
23      0    -1     2     7      TRUE     4
24      0    -1     2    NA      TRUE     1
25      0    -1     3     0      TRUE    10
26      0    -1     3     1      TRUE     2
27      0    -1     3     2      TRUE     2
28      0    -1     3     3      TRUE     5
29      0    -1     3     4      TRUE     1
30      0    -1     3     5      TRUE     4
31      0    -1     3     7      TRUE     7
32      0    -1     4     0      TRUE     7
33      0    -1     4     3      TRUE     4
34      0    -1     4     5      TRUE     1
35      0    -1     4     7      TRUE     3
36      0    -1     5     0      TRUE     7
37      0    -1     5     1      TRUE     2
38      0    -1     5     2      TRUE     2
39      0    -1     5     3      TRUE     3
40      0    -1     5     4      TRUE     1
41      0    -1     5     5      TRUE     7
42      0    -1     5     6      TRUE     2
43      0    -1     5     7      TRUE     3
44      0    -1     6     0      TRUE     2
45      0    -1     6     1      TRUE     1
46      0    -1     6     2      TRUE     1
47      0    -1     6     6      TRUE     1
48      0    -1     6     7      TRUE     2
49      0    -1     7     0      TRUE    13
50      0    -1     7     1      TRUE     1
51      0    -1     7     2      TRUE     4
52      0    -1     7     3      TRUE     2
53      0    -1     7     4      TRUE     2
54      0    -1     7     5      TRUE     4
55      0    -1     7     6      TRUE     2
56      0    -1     7     7      TRUE    25
57      1     0     0     0      TRUE    93
58      1     0     0     1      TRUE    13
59      1     0     0     2      TRUE    15
60      1     0     0     3      TRUE     3
61      1     0     0     4      TRUE     2
62      1     0     0     5      TRUE     1
63      1     0     0     6      TRUE     1
64      1     0     0     7      TRUE    15
65      1     0     1     0      TRUE     7
66      1     0     1     1      TRUE     2
67      1     0     1     3      TRUE     1
68      1     0     1     4      TRUE     1
69      1     0     1     5      TRUE     2
70      1     0     1     7      TRUE     3
71      1     0     2     0      TRUE     5
72      1     0     2     1      TRUE     1
73      1     0     2     2      TRUE     2
74      1     0     2     3      TRUE     2
75      1     0     2     5      TRUE     2
76      1     0     3     0      TRUE     3
77      1     0     3     1      TRUE     1
78      1     0     3     2      TRUE     3
79      1     0     3     7      TRUE     1
80      1     0     4     0      TRUE     1
81      1     0     4     4      TRUE     1
82      1     0     4     6      TRUE     1
83      1     0     5     0      TRUE     1
84      1     0     5     5      TRUE     1
85      1     0     6     6      TRUE     1
86      1     0     6     7      TRUE     1
87      1     0     7     0      TRUE     1
88      1     0     7     2      TRUE     1
89      1     0     7     5      TRUE     1
90      1     0     7     7      TRUE     1
91      1     0    NA     0      TRUE     1
92      1     1     0     0     FALSE    44
93      1     1     0     1     FALSE     7
94      1     1     0     2     FALSE     6
95      1     1     0     3     FALSE     3
96      1     1     0     4     FALSE     4
97      1     1     0     5     FALSE     5
98      1     1     0     6     FALSE     1
99      1     1     0     7     FALSE     8
100     1     1     1     0     FALSE     4
..    ...   ...   ...   ...       ...   ...

# verify
dto[["unitData"]][["tilda"]] %>%
  dplyr::filter(id %in% sample(unique(id),10)) %>%
  dplyr::select(id, BH105, BH106, BH101, BH103, sedentary)

                   id BH105 BH106 BH101 BH103 sedentary
1  62911                  5     1     0     0     FALSE
2  97911                  3     1     0     0     FALSE
3  111841                 7     2     5     5     FALSE
4  163411                 7     1     4     7     FALSE
5  289631                 7     0     0     0     FALSE
6  302071                 0    -1     0     3      TRUE
7  329222                 5     1     0     1     FALSE
8  437862                 7     1     0     0     FALSE
9  470011                 7     0     0     0     FALSE
10 572322                 7     0     0     0     FALSE

(III) Recapitulation

At this point the dto[["unitData"]] elements (raw data files for each study) have been augmented with the harmonized variable sedentary. We retrieve harmonized variables to view frequency counts across studies:

dumlist <- list()
for(s in dto[["studyName"]]){
  ds <- dto[["unitData"]][[s]]
  dumlist[[s]] <- ds[,c("id","sedentary")]
}
ds <- plyr::ldply(dumlist,data.frame,.id = "study_name")
head(ds)

  study_name  id sedentary
1       alsa  41     FALSE
2       alsa  42     FALSE
3       alsa  61      TRUE
4       alsa  71     FALSE
5       alsa  91     FALSE
6       alsa 121      TRUE

ds$id <- seq_len(nrow(ds)) # id values may repeat across studies; replace them with a unique row index
table( ds$sedentary, ds$study_name, useNA="always")

       
        alsa lbsl satsa share tilda <NA>
  FALSE 1250  470   535  2041  6937    0
  TRUE   814   85   937   553  1562    0
  <NA>    23  101    25     4     5    0

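To compare studies of different sizes, the counts in the table above can be converted to proportions among non-missing cases. The sketch below re-enters the FALSE and TRUE counts from that table by hand; the object names `counts` and `props` are illustrative.

```r
# Counts copied from the frequency table above (rows with missing values omitted).
counts <- rbind(
  `FALSE` = c(alsa = 1250, lbsl = 470, satsa = 535, share = 2041, tilda = 6937),
  `TRUE`  = c(alsa = 814,  lbsl = 85,  satsa = 937, share = 553,  tilda = 1562)
)

# Column-wise proportions: within each study, the share of respondents
# classified as sedentary (TRUE) or not (FALSE) among non-missing cases.
props <- prop.table(counts, margin = 2)
round(props["TRUE", ], 3)
```

Because the missing-value row is excluded, each column of `props` sums to 1.
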
Finally, having added the newly created harmonized variable to the raw source objects, we save the data transfer object.

# Save as a compressed, binary R dataset.  It is no longer readable with a text editor, but it preserves metadata (e.g., factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")
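
The comment above notes that the binary format preserves R-specific attributes. A small round trip through a temporary file illustrates this. The toy list and the file location are hypothetical, while `saveRDS()` and `readRDS()` are base R functions.

```r
# A toy list that mimics the nesting of the DTO (invented values).
toy <- list(
  studyName = c("a", "b"),
  unitData  = list(
    a = data.frame(x = factor(c("lo", "hi"), levels = c("lo", "hi")))
  )
)

# Write to a temporary file with the same compression setting as above.
tmp <- tempfile(fileext = ".rds")
saveRDS(toy, file = tmp, compress = "xz")

# Read it back and compare with the original object.
restored <- readRDS(tmp)
all.equal(toy, restored)

# The factor levels and their order are kept as they were before saving.
levels(restored$unitData$a$x)
```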
