AnthonyFucci@Westat.com
AlexanderCates@Westat.com
summarizeNHTS
Wednesday, August 8, 2018
Workshop Goals
This presentation is intended for re-use
Presentation hotkeys
Key | Action |
---|---|
C | Show table of contents |
F | Toggles the display of the footer |
A | Toggles display of current vs all slides |
S | Make fonts smaller |
B | Make fonts larger |
Click here for latest installation instructions
Instructions provide explicit download links for
Once installed, open RStudio
Make sure you have the “summarizeNHTS” R software package installed
install.packages("devtools")
devtools::install_github("Westat-Transportation/summarizeNHTS")
And now that we have installed the necessary software, load the software
library(summarizeNHTS)
OK, we’re ready!
Before we begin, let’s make sure the summarizeNHTS
package is loaded.
library(summarizeNHTS)
# Not Run
download_nhts_data("2001")
download_nhts_data("2009")
download_nhts_data("2017")
download_nhts_data
function.
download_nhts_data
parameters
nhts_data <- read_data("2017", "C:/NHTS")
read_data
is a function that reads and compiles data from CSVs.
nhts_data
is the new object we created.
'2017'
to specify that we are working with the 2017 dataset.download_nhts_data
function).nhts_data
object we created.
$
summary
function to get an overview of the data structure.summary(nhts_data$data)
## Length Class Mode
## trip 60 data.table list
## person 87 data.table list
## household 59 data.table list
## vehicle 15 data.table list
data
element include four data.table
objects:
data.tables
are 2-dimensional data structures (rows X columns).
data.frame
structure (with enhanced functionality).nhts_data$data$vehicle
## HOUSEID VEHID ANNMILES FUELTYPE HFUEL HYBRID MAKE MODEL OD_READ
## 1: 30000007 01 1000 01 -1 02 49 49032 69000
## 2: 30000007 02 2500 01 -1 02 49 49442 164000
## 3: 30000007 03 900 01 -1 02 19 19014 120000
## 4: 30000007 04 500 01 -1 02 19 19021 -88
## 5: 30000007 05 10000 01 -1 02 20 20481 300000
## ---
## 256111: 40794301 01 9500 01 -1 02 37 37441 60976
## 256112: 40794301 02 9600 01 -1 02 49 49032 93388
## 256113: 40794301 03 6200 01 -1 02 35 35443 143616
## 256114: 40794301 04 -1 01 -1 02 49 49471 195078
## 256115: 40794301 05 1000 01 -1 02 35 XXXXX 167141
## VEHAGE VEHOWNED VEHOWNMO VEHTYPE VEHYEAR WHOMAIN
## 1: 10 01 -1 01 2007 03
## 2: 13 01 -1 02 2004 -8
## 3: 19 01 -1 01 1998 01
## 4: 20 01 -1 01 1997 02
## 5: 24 01 -1 04 1993 02
## ---
## 256111: 5 01 -1 02 2012 01
## 256112: 12 02 10 01 2005 04
## 256113: 21 01 -1 02 1996 03
## 256114: 26 01 -1 04 1991 02
## 256115: 27 01 -1 01 1990 01
# By position
nhts_data$data$vehicle[, c(1, 3)]
# By name (single variable)
nhts_data$data$vehicle$ANNMILES
# By name
nhts_data$data$vehicle[, list(HOUSEID, ANNMILES)]
# By row numbers (first 5 rows)
nhts_data$data$vehicle[1:5, ]
# By condition
nhts_data$data$vehicle[VEHTYPE == "01", ]
# By condition (multiple values)
nhts_data$data$vehicle[VEHTYPE %in% c("01","02"), ]
codebook_2001
, codebook_2009
, codebook_2017
# 2017 variables table
head(codebook_2017$variables)
## NAME TABLE TYPE
## 1: AGERANGE person character
## 2: AIRPORT household character
## 3: ALT_1 person character
## 4: ALT_16 person character
## 5: ALT_2 person character
## 6: ALT_23 person character
## LABEL
## 1: Age Range
## 2: Influence Choice of Airport
## 3: Alternative Mode of Transportation: Public Transportation
## 4: Alternative Mode of Transportation: Public Transportation or Taxi
## 5: Alternative Mode of Transportation: Passenger to Friend or Family Member
## 6: Alternative Mode of Transportation: Passenger to Friend/Family Member or Rental Car
# 2017 values table
head(codebook_2017$values)
## NAME VALUE LABEL
## 1: AGERANGE -8 I don't know
## 2: AGERANGE -7 I prefer not to answer
## 3: AGERANGE -1 Appropriate skip
## 4: AGERANGE 01 0-4 years old
## 5: AGERANGE 02 5-15 years old
## 6: AGERANGE 03 16-17 years old
summarize_data
functionsummarize_data
parameterssummarize_data
summarizeNHTS
package has built in functions for running complex queries on the NHTS datasetsummarize_data
is the workhorse function behind these queries.summarize_data(
data = nhts_data,
agg = "household_count"
)
## W E S N
## 1: 118208251 1.186759e-07 129696 129696
summarize_data
query will return these fields.summarize_data
Parameterssummarize_data(
data = nhts_data,
agg = "household_count"
)
data
- NHTS dataset object
read_data
.nhts_data
object.agg
- Aggregate function label
'household_count'
but agg
could be a number of other labels.by
parameter to group by metropolitan status.summarize_data(
data = nhts_data,
agg = "household_count",
by = "IS_METRO"
)
## IS_METRO W E S N
## 1: Metropolitan Area 100480095 37143.09 109250 109250
## 2: Not a Metropolitan Area 17728156 37143.09 20446 20446
by
parameter.summarize_data(
data = nhts_data,
agg = "household_count",
by = c("IS_METRO","HOMEOWN")
)
## IS_METRO HOMEOWN W E
## 1: Metropolitan Area I dont know 1728.364 1060.471
## 2: Metropolitan Area I prefer not to answer 31126.577 10432.815
## 3: Metropolitan Area Own 62154931.751 122801.012
## 4: Metropolitan Area Rent 37328199.616 124577.033
## 5: Metropolitan Area Some other arrangement 964108.222 87850.539
## 6: Not a Metropolitan Area I prefer not to answer 5866.829 4255.033
## 7: Not a Metropolitan Area Own 12363614.703 124022.494
## 8: Not a Metropolitan Area Rent 5135781.233 113252.628
## 9: Not a Metropolitan Area Some other arrangement 222893.706 36387.468
## S N
## 1: 3 3
## 2: 26 26
## 3: 81923 81923
## 4: 26568 26568
## 5: 730 730
## 6: 6 6
## 7: 16536 16536
## 8: 3700 3700
## 9: 204 204
'household_count'
, 'person_count'
, 'trip_count'
, 'vehicle_count'
# Person count
summarize_data(
data = nhts_data,
agg = "person_count"
)
## W E S N
## 1: 301599169 0.390838 264234 264234
prop
parameter
by
variable is specified# Proportion of persons by WORKER, worker status
summarize_data(
data = nhts_data,
agg = "person_count",
by = "WORKER",
prop = TRUE
)
## WORKER W E S N
## 1: Not ascertained 6.819779e-05 2.760915e-05 6.433691e-05 17
## 2: Appropriate skip 1.508504e-01 1.263185e-03 1.057320e-01 27938
## 3: Yes 5.205195e-01 1.690699e-03 4.855091e-01 128288
## 4: No 3.285619e-01 2.475022e-03 4.086946e-01 107991
'sum'
, 'avg'
, 'median'
agg_var
parameter
# Average TRPMILES, trip distance in miles
summarize_data(
data = nhts_data,
agg = "avg",
agg_var = "TRPMILES"
)
## W E S N
## 1: 10.70435 0.2003875 11.45057 922916
Notes
summarize_data
handles missing value (-1,-7,-8,-9) exclusion for numeric aggregates'household_trip_rate'
- Daily Person Trips per Household'person_trip_rate'
- Daily Person Trips per Person# Daily person Trips by worker status
summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = "WORKER"
)
## WORKER W E S N
## 1: Not ascertained 1.280018 1.11437619 1.647059 28
## 2: Appropriate skip 2.780250 0.03060904 2.838249 79295
## 3: Yes 3.773681 0.01752420 3.881454 497944
## 4: No 3.006374 0.02919963 3.206795 346305
summarize_data
subset
parameter.
# Distribution of social/recreational trips by travel day
summarize_data(
data = nhts_data,
agg = "trip_count",
by = "TRAVDAY",
prop = TRUE,
subset = "WHYTRP90 %in% c('07','08','10')"
)
## TRAVDAY W E S N
## 1: Sunday 0.1735797 0.003073281 0.1335907 33043
## 2: Monday 0.1179881 0.002790327 0.1299642 32146
## 3: Tuesday 0.1097795 0.001604585 0.1308981 32377
## 4: Wednesday 0.1123670 0.002679049 0.1350300 33399
## 5: Thursday 0.1243485 0.003190797 0.1427399 35306
## 6: Friday 0.1526481 0.003741184 0.1695486 41937
## 7: Saturday 0.2092892 0.004953279 0.1582284 39137
# Person trip rate by Sex (for millennials)
summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = "R_SEX",
subset = "R_AGE >= 18 & R_AGE <= 34"
)
## R_SEX W E S N
## 1: I dont know 0.9505731 1.09495843 1.333333 4
## 2: I prefer not to answer 3.4089751 0.63080999 3.647059 124
## 3: Male 3.1543275 0.04725515 3.262248 63591
## 4: Female 3.4506053 0.05616977 3.590901 77427
?summarize_data
R Documentation for summarize_data
Example Derived Variable Coding Scenario
Anthony: “I am interested in exploring how financial burden may affect travel.”
Alex: “Remember that question about walking to save money? I would include that in your analysis.”
WALK2SAVE: “I walk to places to save money.”
Values:
01 | Strongly agree |
02 | Agree |
03 | Neither Agree or Disagree |
04 | Disagree |
05 | Strongly disagree |
Look for potential other ways of maniupulating this variable for analysis
Create variable called WALK_FINANCE, a yes/no variable for the binary analysis question, “who does or does not walk to save money?”
A derived variable template file is included in summarizeNHTS
Create basic and complex variables with your own logic
Variables loaded automatically for you by read_data()
The derived variable template file preserves details of coding for your documentation
Derived variable file requirements
Item | Description |
---|---|
NAME | The name of the variable as it will appear in the dataset |
TABLE | The table level this variable is being computed for |
TYPE | Data type (numeric or character) |
DOMAIN | Logical expression that decides value assignment |
VALUE | A variable code value |
LABEL | Description of code value |
Using the Derived Variables file, create a variable with the following requirements:
NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
---|---|---|---|---|---|
HAS_VEHICLE | household | character | HHVEHCNT > 0 | 1 | Yes |
HAS_VEHICLE | household | character | HHVEHCNT == 0 | 2 | No |
Using the Derived Variables file, create a variable with the following requirements:
NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
---|---|---|---|---|---|
AGE_GROUP | person | character | R_AGE >= 0 & R_AGE <= 17 | 1 | Child |
AGE_GROUP | person | character | R_AGE >= 18 & R_AGE <= 44 | 2 | Young Adult |
AGE_GROUP | person | character | R_AGE >= 45 & R_AGE <= 65 | 3 | Middle Adult |
AGE_GROUP | person | character | R_AGE >= 66 | 4 | Older Adult |
Using the Derived Variables file, create a variable with the following requirements:
NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
---|---|---|---|---|---|
USES_TNC | person | character | RIDESHARE > 0 | 1 | Yes |
USES_TNC | person | character | RIDESHARE == 0 | 2 | No |
Using the Derived Variables file, create a variable with the following requirements:
NAME | TABLE | TYPE | DOMAIN | VALUE | LABEL |
---|---|---|---|---|---|
IS_METRO | household | character | MSACAT %in% c(‘01’,‘02’,‘03’) | 1 | Yes |
IS_METRO | household | character | MSACAT %in% c(‘04’) | 2 | No |
make_table
- Create report-ready, formatted tables.make_chart
- Create interactive bar charts.make_map
- Create interactive choropleth maps.summarize_data
to a new object.statistic <- summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = "WORKER"
)
make_table(statistic)
Person Trip Rate | ||||
---|---|---|---|---|
Weighted | MOE (95%) | Surveyed | N | |
WORKER | ||||
Not ascertained | 1.28 | 2.21 | 1.65 | 28 |
Appropriate skip | 2.78 | 0.06 | 2.84 | 79,295 |
Yes | 3.77 | 0.03 | 3.88 | 497,944 |
No | 3.01 | 0.06 | 3.21 | 346,305 |
statistic <- summarize_data(
data = nhts_data,
agg = "person_count",
by = c("TRAVDAY","OCCAT","EDUC"),
exclude_missing = TRUE
)
make_table(
tbl = statistic,
title = "Table 1: Distribution of Persons (%) by Travel Day, Job Category, and Educational Attainment",
output = c(W = "Weighted Percentage", N = "Sample Size"),
row_vars = c("EDUC","OCCAT")
)
Table 1: Distribution of Persons (%) by Travel Day, Job Category, and Educational Attainment | ||||||||||||||||||||
TRAVDAY | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | ||||||||||||||
EDUC by OCCAT |
Weighted Percentage | Sample Size | Weighted Percentage | Sample Size | Weighted Percentage | Sample Size | Weighted Percentage | Sample Size | Weighted Percentage | Sample Size | Weighted Percentage | Sample Size | Weighted Percentage | Sample Size | ||||||
Less than a high school graduate | ||||||||||||||||||||
Sales or service | 574,253.60 | 185 | 456,938.19 | 290 | 464,457.46 | 296 | 421,934.03 | 277 | 515,296.12 | 329 | 476,395.16 | 319 | 531,217.96 | 195 | ||||||
Clerical or administrative support | 24,191.30 | 19 | 52,107.51 | 23 | 43,121.83 | 22 | 16,618.23 | 19 | 11,019.01 | 21 | 37,910.96 | 27 | 11,626.80 | 7 | ||||||
Manufacturing, construction, maintenance, or farming | 346,452.77 | 126 | 360,873.85 | 181 | 259,282.42 | 178 | 401,220.90 | 204 | 387,728.36 | 222 | 315,426.80 | 190 | 313,535.91 | 124 | ||||||
Professional, managerial, or technical | 120,839.84 | 34 | 123,697.34 | 66 | 93,703.74 | 57 | 87,975.61 | 59 | 50,525.03 | 53 | 72,764.91 | 61 | 77,724.76 | 33 | ||||||
Something else | 210.64 | 1 | 15,959.54 | 5 | 160.85 | 1 | 3,835.75 | 1 | 2,581.47 | 2 | 3,218.39 | 5 | 950.45 | 1 | ||||||
High school graduate or GED | ||||||||||||||||||||
Sales or service | 1,507,054.46 | 777 | 1,532,467.67 | 1,077 | 1,680,817.18 | 1,107 | 1,671,275.11 | 1,067 | 1,662,584.43 | 1,141 | 1,530,220.84 | 1,128 | 1,681,702.76 | 743 | ||||||
Clerical or administrative support | 340,716.78 | 270 | 452,033.54 | 421 | 443,409.41 | 402 | 469,047.39 | 423 | 375,361.44 | 387 | 368,600.99 | 371 | 423,808.33 | 273 | ||||||
Manufacturing, construction, maintenance, or farming | 1,170,818.16 | 683 | 1,256,923.82 | 941 | 1,366,781.94 | 968 | 1,250,329.74 | 970 | 1,362,216.76 | 1,023 | 1,182,563.55 | 996 | 1,331,924.68 | 642 | ||||||
Professional, managerial, or technical | 636,681.78 | 399 | 620,937.35 | 595 | 585,906.37 | 572 | 656,343.34 | 563 | 800,063.56 | 565 | 737,028.11 | 556 | 702,597.58 | 408 | ||||||
Something else | 3,750.26 | 3 | 5,755.61 | 4 | 5,382.27 | 9 | 18,357.79 | 6 | 44,805.78 | 7 | 2,447.66 | 5 | 30,231.15 | 8 | ||||||
Some college or associates degree | ||||||||||||||||||||
Sales or service | 2,119,609.44 | 1,171 | 2,225,504.43 | 1,609 | 2,344,837.19 | 1,676 | 2,051,889.49 | 1,604 | 2,326,519.70 | 1,786 | 2,124,506.59 | 1,681 | 2,130,460.05 | 1,185 | ||||||
Clerical or administrative support | 1,059,863.00 | 689 | 1,054,056.58 | 939 | 1,104,790.26 | 917 | 1,009,295.31 | 906 | 917,592.62 | 969 | 957,458.75 | 897 | 1,151,759.59 | 649 | ||||||
Manufacturing, construction, maintenance, or farming | 1,129,054.98 | 688 | 1,153,404.53 | 937 | 1,092,654.99 | 967 | 1,111,667.24 | 921 | 1,100,853.72 | 934 | 1,404,031.34 | 943 | 967,864.82 | 643 | ||||||
Professional, managerial, or technical | 2,150,336.26 | 1,478 | 2,165,225.35 | 1,983 | 2,113,058.89 | 1,981 | 2,102,367.69 | 2,012 | 2,130,279.99 | 2,017 | 1,869,631.39 | 2,002 | 2,201,691.52 | 1,414 | ||||||
Something else | 3,370.23 | 7 | 6,270.17 | 10 | 11,620.51 | 13 | 16,103.53 | 14 | 5,910.66 | 6 | 16,839.89 | 9 | 18,037.26 | 9 | ||||||
Bachelors degree | ||||||||||||||||||||
Sales or service | 1,198,964.48 | 776 | 1,066,091.99 | 1,016 | 1,108,382.46 | 985 | 928,730.44 | 993 | 1,142,293.84 | 1,045 | 1,173,892.65 | 1,047 | 1,041,291.51 | 712 | ||||||
Clerical or administrative support | 550,344.86 | 396 | 485,124.59 | 479 | 554,515.12 | 511 | 640,136.79 | 533 | 553,331.09 | 535 | 687,831.93 | 558 | 603,274.08 | 381 | ||||||
Manufacturing, construction, maintenance, or farming | 327,650.06 | 223 | 278,962.62 | 312 | 326,049.57 | 337 | 376,779.26 | 301 | 311,670.17 | 302 | 386,643.87 | 321 | 342,461.46 | 221 | ||||||
Professional, managerial, or technical | 3,490,067.83 | 2,519 | 3,781,889.54 | 3,586 | 3,598,016.19 | 3,532 | 3,537,870.69 | 3,448 | 3,628,396.12 | 3,517 | 3,620,414.26 | 3,460 | 3,655,575.39 | 2,463 | ||||||
Something else | 1,569.75 | 3 | 10,904.59 | 3 | 9,240.48 | 6 | 8,391.68 | 10 | 4,766.86 | 3 | 1,524.71 | 8 | 4,093.29 | 4 | ||||||
Graduate degree or professional degree | ||||||||||||||||||||
Sales or service | 407,761.90 | 251 | 325,271.57 | 366 | 345,069.92 | 334 | 349,717.14 | 307 | 421,688.06 | 379 | 399,670.59 | 385 | 295,913.34 | 234 | ||||||
Clerical or administrative support | 133,270.32 | 125 | 162,821.19 | 164 | 197,150.23 | 180 | 206,896.40 | 163 | 244,780.52 | 178 | 189,128.77 | 165 | 197,210.40 | 131 | ||||||
Manufacturing, construction, maintenance, or farming | 90,665.02 | 70 | 110,044.45 | 117 | 101,388.94 | 108 | 170,316.50 | 107 | 82,921.29 | 102 | 103,042.94 | 110 | 97,717.60 | 76 | ||||||
Professional, managerial, or technical | 3,965,329.40 | 3,001 | 3,784,198.49 | 3,890 | 3,582,823.68 | 3,913 | 3,895,814.37 | 3,981 | 3,914,655.72 | 4,150 | 3,890,890.15 | 4,088 | 4,061,496.28 | 2,933 | ||||||
Something else | 5,054.21 | 6 | 5,947.14 | 4 | 20,441.16 | 3 | 2,146.62 | 3 | 2,033.31 | 7 | 613.61 | 2 | 507.95 | 2 |
statistic <- summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = "WORKER",
exclude_missing = TRUE
)
make_chart(statistic)
Person Trip Rate by Sex, Worker Status, and Travel Day of Week
statistic <- summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = c("R_SEX","WORKER","TRAVDAY"),
exclude_missing = TRUE
)
# Specify fill and facet
make_chart(
tbl = statistic,
fill = "WORKER",
facet = "TRAVDAY",
palette = "Accent"
)
statistic <- summarize_data(
data = nhts_data,
agg = "person_count",
by = "CENSUS_D"
)
make_map(statistic)
census_region_layer
census_division_layer
state_layer
/ state_tile_layer
cbsa_layer
Include a second table grouping by the original geography plus one variable.
statistic1 <- summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = "HHSTFIPS",
exclude_missing = TRUE
)
statistic2 <- summarize_data(
data = nhts_data,
agg = "person_trip_rate",
by = c("HHSTFIPS","WORKER"),
exclude_missing = TRUE
)
map <- make_map(
tbl = statistic1,
tbl2 = statistic2
)
map
digits
- Number of decimal places to usepercentage
- Treat proportions as percentages
scientific
- Use scientific notation
multiplier
- A value multiplier
make_table
statistic <- summarize_data(
data = nhts_data,
agg = "trip_count",
by = "PRMACT",
exclude_missing = TRUE
)
make_table(
tbl = statistic,
title = "Trip Count by Primary Activity (in Millions)",
output = c(W = "Trip Count (Millions)", E = "SE"),
digits = 0,
multiplier = 1000000
)
Trip Count by Primary Activity (in Millions) | ||
Trip Frequency | ||
---|---|---|
Trip Count (Millions) | SE | |
PRMACT | ||
Working | 187,651 | 1,637 |
Temporarily absent from a job or business | 8,764 | 558 |
Looking for work / unemployed | 11,419 | 848 |
A homemaker | 23,661 | 910 |
Going to school | 21,390 | 1,094 |
Retired | 53,644 | 1,146 |
Something else | 18,432 | 747 |