This report lists the candidate variables for the DataSchema of the construct physical activity.
This report is a record of interaction with a data transfer object (dto) produced by ./manipulation/0-ellis-island.R.
The next section recaps this script, exposes the architecture of the DTO, and demonstrates the language of interacting with it.
All data land on Ellis Island.
The script 0-ellis-island.R is the first script in the analytic workflow. It accomplished the following:
./data/shared/derived/meta-data-live.csv, which is updated every time the Ellis Island script is executed.
./data/shared/meta-data-map.csv. They are used by automatic scripts in later harmonization and analysis.
# load the product of 0-ellis-island.R, a list object containing data and metadata
dto <- readRDS("./data/unshared/derived/dto.rds")
# the list is composed of the following elements
names(dto)
[1] "studyName" "filePath" "unitData" "metaData"
# 1st element - names of the studies as character vector
dto[["studyName"]]
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# 2nd element - file paths of the data files for each study as character vector
dto[["filePath"]]
[1] "./data/unshared/raw/ALSA-Wave1.Final.sav" "./data/unshared/raw/LBSL-Panel2-Wave1.Final.sav"
[3] "./data/unshared/raw/SATSA-Q3.Final.sav" "./data/unshared/raw/SHARE-Israel-Wave1.Final.sav"
[5] "./data/unshared/raw/TILDA-Wave1.Final.sav"
# 3rd element - is a list object containing the following elements
names(dto[["unitData"]])
[1] "alsa" "lbsl" "satsa" "share" "tilda"
# each of these elements is a raw data set of a corresponding study, for example
dplyr::tbl_df(dto[["unitData"]][["lbsl"]])
Source: local data frame [656 x 36]
id AGE94 SEX94 MSTAT94 EDUC94 NOWRK94 SMK94 SMOKE
(int) (int) (int) (fctr) (int) (fctr) (fctr) (fctr)
1 4001026 68 1 divorced 16 no, retired no never smoked
2 4012015 94 2 widowed 12 no, retired no never smoked
3 4012032 94 2 widowed 20 no, retired no don't smoke at present but smoked in the past
4 4022004 93 2 NA NA NA NA never smoked
5 4022026 93 2 widowed 12 no, retired no never smoked
6 4031031 92 1 married 8 no, retired no don't smoke at present but smoked in the past
7 4031035 92 1 widowed 13 no, retired no don't smoke at present but smoked in the past
8 4032201 92 2 NA NA NA NA don't smoke at present but smoked in the past
9 4041062 91 1 widowed 7 NA no don't smoke at present but smoked in the past
10 4042057 91 2 NA NA NA NA NA
.. ... ... ... ... ... ... ... ...
Variables not shown: ALCOHOL (fctr), WINE (int), BEER (int), HARDLIQ (int), SPORT94 (int), FIT94 (int), WALK94 (int),
SPEC94 (int), DANCE94 (int), CHORE94 (int), EXCERTOT (int), EXCERWK (int), HEIGHT94 (int), WEIGHT94 (int), HWEIGHT
(int), HHEIGHT (int), SRHEALTH (fctr), smoke_now (lgl), smoked_ever (lgl), year_of_wave (dbl), age_in_years (dbl),
year_born (dbl), female (lgl), marital (chr), single (lgl), educ3 (chr), current_work_2 (lgl), current_drink (lgl)
# 4th element - a dataset of names and labels of raw variables + added metadata for all studies
dto[["metaData"]] %>% dplyr::select(study_name, name, item, construct, type, categories, label_short, label) %>%
  DT::datatable(
    class = 'cell-border stripe',
    caption = "This is the primary metadata file. Edit at `./data/shared/meta-data-map.csv`",
    filter = "top",
    options = list(pageLength = 6, autoWidth = TRUE)
  )
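To make the structure above concrete, here is a minimal sketch that assembles a toy list with the same four element names and queries it in the same way. The study names `studyA` and `studyB`, the file paths, and all values are made up for illustration; this is not the code of 0-ellis-island.R.

```r
# Toy DTO with the same four elements as the real object (all values are hypothetical)
toy_dto <- list(
  studyName = c("studyA", "studyB"),
  filePath  = c("./raw/studyA.sav", "./raw/studyB.sav"),
  unitData  = list(
    studyA = data.frame(id = 1:3, WALK = c(0, 2, NA)),
    studyB = data.frame(id = 1:2, EXER = c("a lot", "little"), stringsAsFactors = FALSE)
  ),
  metaData  = data.frame(
    study_name = c("studyA", "studyB"),
    name       = c("WALK", "EXER"),
    construct  = c("physact", "physact"),
    stringsAsFactors = FALSE
  )
)

names(toy_dto)                      # the four element names
toy_dto[["unitData"]][["studyA"]]   # raw data of a single study

# metadata rows tagged with a given construct
physact_items <- subset(toy_dto[["metaData"]], construct == "physact")
physact_items
```

Keeping unit data and metadata in one list means a single object can be passed between scripts, and every study is reached by name rather than by position.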
Everybody wants to be somebody.
We query the metadata set to retrieve all variables potentially tapping the construct physical activity. These are the candidates to enter the DataSchema and contribute to computing harmonized variables.
NOTE: what is retrieved depends on the manually entered values in the column construct of the metadata file ./data/shared/meta-data-map.csv. To specify a different group of variables, edit the metadata, not the script.
meta_data <- dto[["metaData"]] %>%
  dplyr::filter(construct %in% c('physact')) %>%
  dplyr::select(study_name, name, construct, label_short, categories, url) %>%
  dplyr::arrange(construct, study_name)
knitr::kable(meta_data)
study_name | name | construct | label_short | categories | url |
---|---|---|---|---|---|
alsa | EXRTHOUS | physact | Exertion around house | NA | |
alsa | HWMNWK2W | physact | Times walked in past two weeks | NA | |
alsa | LSVEXC2W | physact | Less vigor sessions last 2 weeks | NA | |
alsa | LSVIGEXC | physact | Less vigor past 2 weeks | NA | |
alsa | TMHVYEXR | physact | Time heavy physical exertion | NA | |
alsa | TMVEXC2W | physact | Vigor Time past 2 weeks | NA | |
alsa | VIGEXC2W | physact | Vigor Sessions in past 2 weeks | NA | |
alsa | VIGEXCS | physact | Vigorous exercise | NA | |
alsa | WALK2WKS | physact | Walking past 2 weeks | NA | |
lbsl | SPORT94 | physact | Participant sports, number of hours | NA | |
lbsl | FIT94 | physact | Physical fitness, number of hours each week | NA | |
lbsl | WALK94 | physact | Walking, number of hours per week | NA | |
lbsl | SPEC94 | physact | Spectator sports, number of hours spent per week | NA | |
lbsl | DANCE94 | physact | Dancing | NA | |
lbsl | CHORE94 | physact | Doing household chores (hrs/wk) | NA | |
lbsl | EXCERTOT | physact | Exercising for shape/fun (hrs/wk) | NA | |
lbsl | EXCERWK | physact | Exercised or played sports (oc/wk) | NA | |
satsa | GEXERCIS | physact | What option best describes your exercise on a yearly basis? | NA | |
share | BR0150 | physact | sports or activities that are vigorous | NA | |
share | BR0160 | physact | activities requiring a moderate level of energy | NA | |
tilda | BH101 | physact | During the last 7 days, on how many days did you do vigorous physical activit? | NA | |
tilda | BH102 | physact | How much time did you usually spend doing vigorous physical activities on one? | NA | |
tilda | BH102A | physact | How much time did you usually spend doing vigorous physical activities on one? | NA | |
tilda | BH103 | physact | During the last 7 days, on how many days did you do moderate physical activit? | NA | |
tilda | BH104 | physact | How much time did you usually spend doing moderate physical activities on one? | NA | |
tilda | BH104A | physact | How much time did you usually spend doing moderate physical activities on one? | NA | |
tilda | BH105 | physact | During the last 7 days, on how many days did you walk for at least 10 minutes? | NA | |
tilda | BH106 | physact | How much time did you usually spend walking on one of those days? HOURS | NA | |
tilda | BH106A | physact | How much time did you usually spend walking on one of those days? MINS | NA | |
tilda | BH107 | physact | During the last 7 days, how much time did you spend sitting on a week day? HO? | NA | |
tilda | BH107A | physact | During the last 7 days, how much time did you spend sitting on a week day? MINS | NA | |
tilda | IPAQMETMINUTES | physact | Physical activity met (minutes) | NA | |
tilda | IPAQEXERCISE3 | physact | Physical activity met (minutes) | NA | |
View descriptives: physical activity for closer examination of each candidate.
After reviewing descriptives and relevant codebooks, the following operationalization of the harmonized variables for physical activity has been adopted:
sedentary
0 - FALSE
1 - TRUE
These variables will be generated next, in the Development section.
The particular goal of this section is to ensure that the schema to encode the values for the physical activity variable is consistent across studies.
In this section we will define the schema sets for harmonizing the physical activity construct (i.e. specify which variables from which studies will contribute to computing harmonized variables). Each of these schema sets will have a particular pattern of possible response values to these variables, which we will export for inspection as .csv tables. We will then manually edit these .csv tables, populating new columns that map values of harmonized variables to the specific response patterns of the schema set variables. We will then import the harmonization algorithms encoded in the .csv tables and apply them to compute harmonized variables in a dataset combining raw and harmonized variables for the physical activity construct across studies.
Having all potential variables in categorical format, we have defined the sets of data schema variables thus:
Each of these schema sets has a particular pattern of possible response values, for example:
We output these tables into self-standing .csv files, so we can manually provide the logic of computing harmonized variables.
You can examine them in `./data/meta/response-profiles-live/`.
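As a rough illustration of what such a response-profile table might look like, the sketch below counts every observed combination of values for a set of schema variables and writes the result to a .csv file. The toy data, the variable names `walk` and `exer`, and the output file name are assumptions for illustration only; the project's own scripts may build these tables differently.

```r
# Toy data standing in for one study's raw variables (hypothetical values)
ds <- data.frame(
  walk = c("No", "No", "Yes", "Yes", "No", NA),
  exer = c("No", "Yes", "No", "No", "No", NA),
  stringsAsFactors = FALSE
)

# Count each response pattern, keeping missing values as their own level
profile <- as.data.frame(
  table(walk = ds$walk, exer = ds$exer, useNA = "ifany"),
  responseName = "n",
  stringsAsFactors = FALSE
)
# Keep only the patterns that were actually observed
profile <- profile[profile$n > 0, ]

# Empty column to be filled in by hand with the harmonized value
profile$sedentary <- NA

# Written to a temporary folder here; a real workflow would write into the project folder
out_file <- file.path(tempdir(), "response-profile-toy.csv")
write.csv(profile, out_file, row.names = FALSE)
```

Each row of the exported file is one response pattern with its frequency, so the manual step reduces to filling in one harmonized value per row.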
sedentary
0 - FALSE
1 - TRUE
Items that can contribute to generating values for the harmonized variable sedentary are:
dto[["metaData"]] %>%
  dplyr::filter(study_name == "alsa", construct %in% c("physact")) %>%
  dplyr::select(study_name, name, label, categories)
study_name name label categories
1 alsa EXRTHOUS Exertion around house NA
2 alsa HWMNWK2W Times walked in past two weeks NA
3 alsa LSVEXC2W Less vigor sessions last 2 weeks NA
4 alsa LSVIGEXC Less vigor past 2 weeks NA
5 alsa TMHVYEXR Time heavy physical exertion NA
6 alsa TMVEXC2W Vigor Time past 2 weeks NA
7 alsa VIGEXC2W Vigor Sessions in past 2 weeks NA
8 alsa VIGEXCS Vigorous exercise NA
9 alsa WALK2WKS Walking past 2 weeks NA
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "alsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-alsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name,
  variable_names = c("WALK2WKS", "LSVIGEXC", "VIGEXCS", "EXRTHOUS"),
  harmony_name = "sedentary"
)
Source: local data frame [18 x 6]
Groups: WALK2WKS, LSVIGEXC, VIGEXCS, EXRTHOUS [?]
WALK2WKS LSVIGEXC VIGEXCS EXRTHOUS sedentary n
(chr) (chr) (chr) (chr) (lgl) (int)
1 No No No No TRUE 814
2 No No No Yes FALSE 113
3 No No Yes No FALSE 14
4 No No Yes Yes FALSE 6
5 No Yes No No FALSE 118
6 No Yes No Yes FALSE 18
7 No Yes Yes No FALSE 4
8 No Yes Yes Yes FALSE 4
9 Yes No No No FALSE 601
10 Yes No No Yes FALSE 98
11 Yes No Yes No FALSE 21
12 Yes No Yes Yes FALSE 8
13 Yes Yes No No FALSE 177
14 Yes Yes No Yes FALSE 39
15 Yes Yes No NA FALSE 1
16 Yes Yes Yes No FALSE 24
17 Yes Yes Yes Yes FALSE 4
18 NA NA NA NA NA 23
# verify
dto[["unitData"]][["alsa"]] %>%
  dplyr::filter(id %in% sample(unique(id), 10)) %>%
  dplyr::select_("id", "WALK2WKS", "LSVIGEXC", "VIGEXCS", "EXRTHOUS", "sedentary")
id WALK2WKS LSVIGEXC VIGEXCS EXRTHOUS sedentary
1 581 Yes Yes No No FALSE
2 761 Yes No No No FALSE
3 3032 Yes No No No FALSE
4 5771 Yes No No Yes FALSE
5 17712 Yes No No No FALSE
6 22601 No No No No TRUE
7 22941 No No No No TRUE
8 23161 Yes No No Yes FALSE
9 24401 No No No Yes FALSE
10 29611 Yes No No No FALSE
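The internals of `recode_with_hrule()` are not shown in this report. As a hedged sketch of the general idea, a harmonization rule table can be applied by looking up each row's response pattern in the rule table and copying over the harmonized value. The function name `apply_hrule_sketch`, the toy rule table, and the toy data below are hypothetical and are not the project's implementation.

```r
# Hypothetical helper: look up the harmonized value for each row's response pattern
apply_hrule_sketch <- function(ds, hrule, variable_names, harmony_name) {
  # Build one key per row from the contributing variables.
  # Note: a missing value and the literal string "NA" produce the same key here.
  make_key <- function(d) do.call(paste, c(d[variable_names], sep = "|"))
  idx <- match(make_key(ds), make_key(hrule))
  ds[[harmony_name]] <- hrule[[harmony_name]][idx]
  ds
}

# Toy rule table (assumed): sedentary only when both answers are "No"
hrule <- data.frame(
  walk      = c("No", "No", "Yes", "Yes", NA),
  exer      = c("No", "Yes", "No", "Yes", NA),
  sedentary = c(TRUE, FALSE, FALSE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Toy unit data (hypothetical)
ds <- data.frame(
  id   = 1:4,
  walk = c("No", "Yes", "No", NA),
  exer = c("No", "No", "Yes", NA),
  stringsAsFactors = FALSE
)

ds <- apply_hrule_sketch(ds, hrule, c("walk", "exer"), "sedentary")
ds$sedentary
```

Because the rule lives in a table rather than in code, changing the operationalization only requires editing the .csv file and rerunning the script.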
Items that can contribute to generating values for the harmonized variable sedentary are:
dto[["metaData"]] %>%
  dplyr::filter(study_name == "lbsl", construct == "physact") %>%
  dplyr::select(study_name, name, label_short, categories)
study_name name label_short categories
1 lbsl SPORT94 Participant sports, number of hours NA
2 lbsl FIT94 Physical fitness, number of hours each week NA
3 lbsl WALK94 Walking, number of hours per week NA
4 lbsl SPEC94 Spectator sports, number of hours spent per week NA
5 lbsl DANCE94 Dancing NA
6 lbsl CHORE94 Doing household chores (hrs/wk) NA
7 lbsl EXCERTOT Exercising for shape/fun (hrs/wk) NA
8 lbsl EXCERWK Exercised or played sports (oc/wk) NA
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "lbsl"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-lbsl.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name,
  variable_names = c("WALK94", "EXCERTOT"),
  harmony_name = "sedentary"
)
Source: local data frame [133 x 4]
Groups: WALK94, EXCERTOT [?]
WALK94 EXCERTOT sedentary n
(chr) (chr) (lgl) (int)
1 0 0 TRUE 39
2 0 1 FALSE 2
3 0 10 FALSE 2
4 0 12 FALSE 2
5 0 14 FALSE 1
6 0 16 FALSE 1
7 0 2 FALSE 5
8 0 3 FALSE 2
9 0 4 FALSE 9
10 0 5 FALSE 4
11 0 6 FALSE 4
12 0 7 FALSE 2
13 0 9 FALSE 1
14 0 NA TRUE 1
15 1 0 FALSE 12
16 1 1 FALSE 18
17 1 14 FALSE 4
18 1 2 FALSE 12
19 1 3 FALSE 11
20 1 4 FALSE 3
21 1 5 FALSE 6
22 1 6 FALSE 1
23 1 7 FALSE 1
24 1 8 FALSE 1
25 1 NA FALSE 3
26 10 10 FALSE 2
27 10 20 FALSE 2
28 10 4 FALSE 1
29 10 6 FALSE 2
30 10 7 FALSE 2
31 10 8 FALSE 1
32 11 10 FALSE 1
33 12 12 FALSE 3
34 12 15 FALSE 1
35 12 27 FALSE 1
36 14 4 FALSE 1
37 15 1 FALSE 1
38 2 0 FALSE 16
39 2 1 FALSE 5
40 2 10 FALSE 3
41 2 12 FALSE 2
42 2 14 FALSE 1
43 2 15 FALSE 1
44 2 18 FALSE 1
45 2 2 FALSE 19
46 2 3 FALSE 8
47 2 4 FALSE 7
48 2 5 FALSE 3
49 2 6 FALSE 8
50 2 7 FALSE 3
51 2 NA FALSE 1
52 20 20 FALSE 1
53 20 4 FALSE 1
54 3 0 FALSE 5
55 3 1 FALSE 1
56 3 10 FALSE 1
57 3 11 FALSE 1
58 3 12 FALSE 1
59 3 18 FALSE 1
60 3 2 FALSE 4
61 3 3 FALSE 23
62 3 4 FALSE 4
63 3 5 FALSE 7
64 3 7 FALSE 5
65 3 8 FALSE 3
66 3 NA FALSE 1
67 30 3 FALSE 1
68 4 0 FALSE 7
69 4 10 FALSE 4
70 4 15 FALSE 1
71 4 16 FALSE 1
72 4 18 FALSE 1
73 4 2 FALSE 4
74 4 3 FALSE 3
75 4 4 FALSE 9
76 4 6 FALSE 4
77 4 7 FALSE 1
78 4 8 FALSE 3
79 4 9 FALSE 1
80 4 NA FALSE 1
81 5 0 FALSE 4
82 5 10 FALSE 3
83 5 12 FALSE 1
84 5 18 FALSE 1
85 5 2 FALSE 5
86 5 3 FALSE 1
87 5 35 FALSE 1
88 5 4 FALSE 5
89 5 5 FALSE 7
90 5 6 FALSE 2
91 5 7 FALSE 1
92 5 9 FALSE 1
93 5 NA FALSE 1
94 6 0 FALSE 2
95 6 1 FALSE 1
96 6 10 FALSE 3
97 6 11 FALSE 1
98 6 15 FALSE 1
99 6 2 FALSE 3
100 6 3 FALSE 2
.. ... ... ... ...
# verify
dto[["unitData"]][["lbsl"]] %>%
  dplyr::filter(id %in% sample(unique(id), 10)) %>%
  dplyr::select_("id", "WALK94", "EXCERTOT", "sedentary")
id WALK94 EXCERTOT sedentary
1 4051023 12 12 FALSE
2 4082091 1 1 FALSE
3 4141201 NA 0 TRUE
4 4181083 6 2 FALSE
5 4181091 NA 5 FALSE
6 4191084 11 10 FALSE
7 4221083 1 14 FALSE
8 4232084 NA 5 FALSE
9 4302016 NA NA NA
10 4321046 8 10 FALSE
Items that can contribute to generating values for the harmonized variable sedentary are:
dto[["metaData"]] %>%
  dplyr::filter(study_name == "satsa", construct == "physact") %>%
  dplyr::select(study_name, name, label_short, categories)
study_name name label_short categories
1 satsa GEXERCIS What option best describes your exercise on a yearly basis? NA
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "satsa"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-satsa.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name,
  variable_names = c("GEXERCIS"),
  harmony_name = "sedentary"
)
Source: local data frame [8 x 3]
Groups: GEXERCIS [?]
GEXERCIS sedentary n
(chr) (lgl) (int)
1 I don't get very much exercise TRUE 394
2 I get a lot of exercise FALSE 88
3 I get little exercise TRUE 193
4 I get quite a lot of exercise FALSE 430
5 I get very little exercise TRUE 181
6 I get very much exercise FALSE 17
7 I hardly get any exercise at all TRUE 169
8 NA NA 25
# verify
dto[["unitData"]][["satsa"]] %>%
  dplyr::filter(id %in% sample(unique(id), 10)) %>%
  dplyr::select_("id", "GEXERCIS", "sedentary")
id GEXERCIS sedentary
1 151061 I get very little exercise TRUE
2 163902 I get little exercise TRUE
3 167101 I don't get very much exercise TRUE
4 172301 I get very little exercise TRUE
5 181811 I get quite a lot of exercise FALSE
6 191411 I don't get very much exercise TRUE
7 2181602 I get little exercise TRUE
8 2190512 I don't get very much exercise TRUE
9 2191501 I get very little exercise TRUE
10 2232641 I don't get very much exercise TRUE
Items that can contribute to generating values for the harmonized variable sedentary are:
dto[["metaData"]] %>%
  dplyr::filter(study_name == "tilda", construct == "physact") %>%
  dplyr::select(study_name, name, label_short, categories)
study_name name label_short categories
1 tilda BH101 During the last 7 days, on how many days did you do vigorous physical activit? NA
2 tilda BH102 How much time did you usually spend doing vigorous physical activities on one? NA
3 tilda BH102A How much time did you usually spend doing vigorous physical activities on one? NA
4 tilda BH103 During the last 7 days, on how many days did you do moderate physical activit? NA
5 tilda BH104 How much time did you usually spend doing moderate physical activities on one? NA
6 tilda BH104A How much time did you usually spend doing moderate physical activities on one? NA
7 tilda BH105 During the last 7 days, on how many days did you walk for at least 10 minutes? NA
8 tilda BH106 How much time did you usually spend walking on one of those days? HOURS NA
9 tilda BH106A How much time did you usually spend walking on one of those days? MINS NA
10 tilda BH107 During the last 7 days, how much time did you spend sitting on a week day? HO? NA
11 tilda BH107A During the last 7 days, how much time did you spend sitting on a week day? MINS NA
12 tilda IPAQMETMINUTES Physical activity met (minutes) NA
13 tilda IPAQEXERCISE3 Physical activity met (minutes) NA
We encode the harmonization rule by manually editing the values in a corresponding .csv file located in ./data/meta/h-rules/. Then, we apply the recoding logic it contains and append the newly created, harmonized variable to the initial data set.
study_name <- "tilda"
path_to_hrule <- "./data/meta/h-rules/h-rules-physact-tilda.csv"
dto[["unitData"]][[study_name]] <- recode_with_hrule(
  dto,
  study_name = study_name,
  variable_names = c("BH105", "BH106", "BH101", "BH103"),
  harmony_name = "sedentary"
)
Source: local data frame [1,029 x 6]
Groups: BH105, BH106, BH101, BH103 [?]
BH105 BH106 BH101 BH103 sedentary n
(chr) (chr) (chr) (chr) (lgl) (int)
1 0 -1 0 0 TRUE 819
2 0 -1 0 1 TRUE 40
3 0 -1 0 2 TRUE 47
4 0 -1 0 3 TRUE 47
5 0 -1 0 4 TRUE 28
6 0 -1 0 5 TRUE 40
7 0 -1 0 6 TRUE 13
8 0 -1 0 7 TRUE 114
9 0 -1 1 0 TRUE 21
10 0 -1 1 1 TRUE 1
11 0 -1 1 2 TRUE 5
12 0 -1 1 3 TRUE 1
13 0 -1 1 4 TRUE 2
14 0 -1 1 5 TRUE 1
15 0 -1 1 7 TRUE 14
16 0 -1 2 0 TRUE 20
17 0 -1 2 1 TRUE 1
18 0 -1 2 2 TRUE 5
19 0 -1 2 3 TRUE 3
20 0 -1 2 4 TRUE 3
21 0 -1 2 5 TRUE 5
22 0 -1 2 6 TRUE 1
23 0 -1 2 7 TRUE 4
24 0 -1 2 NA TRUE 1
25 0 -1 3 0 TRUE 10
26 0 -1 3 1 TRUE 2
27 0 -1 3 2 TRUE 2
28 0 -1 3 3 TRUE 5
29 0 -1 3 4 TRUE 1
30 0 -1 3 5 TRUE 4
31 0 -1 3 7 TRUE 7
32 0 -1 4 0 TRUE 7
33 0 -1 4 3 TRUE 4
34 0 -1 4 5 TRUE 1
35 0 -1 4 7 TRUE 3
36 0 -1 5 0 TRUE 7
37 0 -1 5 1 TRUE 2
38 0 -1 5 2 TRUE 2
39 0 -1 5 3 TRUE 3
40 0 -1 5 4 TRUE 1
41 0 -1 5 5 TRUE 7
42 0 -1 5 6 TRUE 2
43 0 -1 5 7 TRUE 3
44 0 -1 6 0 TRUE 2
45 0 -1 6 1 TRUE 1
46 0 -1 6 2 TRUE 1
47 0 -1 6 6 TRUE 1
48 0 -1 6 7 TRUE 2
49 0 -1 7 0 TRUE 13
50 0 -1 7 1 TRUE 1
51 0 -1 7 2 TRUE 4
52 0 -1 7 3 TRUE 2
53 0 -1 7 4 TRUE 2
54 0 -1 7 5 TRUE 4
55 0 -1 7 6 TRUE 2
56 0 -1 7 7 TRUE 25
57 1 0 0 0 TRUE 93
58 1 0 0 1 TRUE 13
59 1 0 0 2 TRUE 15
60 1 0 0 3 TRUE 3
61 1 0 0 4 TRUE 2
62 1 0 0 5 TRUE 1
63 1 0 0 6 TRUE 1
64 1 0 0 7 TRUE 15
65 1 0 1 0 TRUE 7
66 1 0 1 1 TRUE 2
67 1 0 1 3 TRUE 1
68 1 0 1 4 TRUE 1
69 1 0 1 5 TRUE 2
70 1 0 1 7 TRUE 3
71 1 0 2 0 TRUE 5
72 1 0 2 1 TRUE 1
73 1 0 2 2 TRUE 2
74 1 0 2 3 TRUE 2
75 1 0 2 5 TRUE 2
76 1 0 3 0 TRUE 3
77 1 0 3 1 TRUE 1
78 1 0 3 2 TRUE 3
79 1 0 3 7 TRUE 1
80 1 0 4 0 TRUE 1
81 1 0 4 4 TRUE 1
82 1 0 4 6 TRUE 1
83 1 0 5 0 TRUE 1
84 1 0 5 5 TRUE 1
85 1 0 6 6 TRUE 1
86 1 0 6 7 TRUE 1
87 1 0 7 0 TRUE 1
88 1 0 7 2 TRUE 1
89 1 0 7 5 TRUE 1
90 1 0 7 7 TRUE 1
91 1 0 NA 0 TRUE 1
92 1 1 0 0 FALSE 44
93 1 1 0 1 FALSE 7
94 1 1 0 2 FALSE 6
95 1 1 0 3 FALSE 3
96 1 1 0 4 FALSE 4
97 1 1 0 5 FALSE 5
98 1 1 0 6 FALSE 1
99 1 1 0 7 FALSE 8
100 1 1 1 0 FALSE 4
.. ... ... ... ... ... ...
# verify
dto[["unitData"]][["tilda"]] %>%
  dplyr::filter(id %in% sample(unique(id), 10)) %>%
  dplyr::select_("id", "BH105", "BH106", "BH101", "BH103", "sedentary")
id BH105 BH106 BH101 BH103 sedentary
1 62911 5 1 0 0 FALSE
2 97911 3 1 0 0 FALSE
3 111841 7 2 5 5 FALSE
4 163411 7 1 4 7 FALSE
5 289631 7 0 0 0 FALSE
6 302071 0 -1 0 3 TRUE
7 329222 5 1 0 1 FALSE
8 437862 7 1 0 0 FALSE
9 470011 7 0 0 0 FALSE
10 572322 7 0 0 0 FALSE
At this point the dto[["unitData"]] elements (raw data files for each study) have been augmented with the harmonized variable sedentary. We retrieve harmonized variables to view frequency counts across studies:
dumlist <- list()
for(s in dto[["studyName"]]){
  ds <- dto[["unitData"]][[s]]
  dumlist[[s]] <- ds[, c("id", "sedentary")]
}
ds <- plyr::ldply(dumlist, data.frame, .id = "study_name")
head(ds)
study_name id sedentary
1 alsa 41 FALSE
2 alsa 42 FALSE
3 alsa 61 TRUE
4 alsa 71 FALSE
5 alsa 91 FALSE
6 alsa 121 TRUE
ds$id <- 1:nrow(ds) # some id values might be identical across studies, so replace them with a running index
table( ds$sedentary, ds$study_name, useNA="always")
alsa lbsl satsa share tilda <NA>
FALSE 1250 470 535 2041 6937 0
TRUE 814 85 937 553 1562 0
<NA> 23 101 25 4 5 0
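For a quick read of the table above, the counts can be converted to within-study proportions. The sketch below re-enters the FALSE and TRUE counts shown above by hand and leaves out the row of missing values; it is an illustration only and not part of the original script.

```r
# FALSE and TRUE counts copied from the frequency table above (missing values excluded)
counts <- matrix(
  c(1250, 470, 535, 2041, 6937,
     814,  85, 937,  553, 1562),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sedentary = c("FALSE", "TRUE"),
    study     = c("alsa", "lbsl", "satsa", "share", "tilda")
  )
)

# Share of non-missing respondents in each category, computed within study
props <- prop.table(counts, margin = 2)
round(props, 2)
```

Proportions computed within study make the prevalence of the harmonized variable comparable across samples of very different sizes.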
Finally, having added the newly created, harmonized variables to the raw source objects, we save the data transfer object.
# Save as a compressed, binary R dataset. It's no longer readable with a text editor, but it saves metadata (eg, factor information).
saveRDS(dto, file="./data/unshared/derived/dto.rds", compress="xz")