Exploring the NHTS with R

AnthonyFucci@Westat.com

AlexanderCates@Westat.com

summarizeNHTS

Wednesday, August 8, 2018

Overview

Workshop Goals

Topics

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

“House Cleaning”

This presentation is intended for re-use

 

Presentation hotkeys

Key Action
C Show table of contents
F Toggles the display of the footer
A Toggles display of current vs all slides
S Make fonts smaller
B Make fonts larger

Next Topic

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Setting Up an R Analysis Environment

Click here for latest installation instructions

Instructions provide explicit download links for

Once installed, open RStudio

Setting Up an R Analysis Environment (cont.)

Make sure you have the “summarizeNHTS” R software package installed

install.packages("devtools")
devtools::install_github("Westat-Transportation/summarizeNHTS")

And now that we have installed the necessary software, load the software

library(summarizeNHTS)

OK, we’re ready!

Next Topic

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Accessing the Data

Before we begin, let’s make sure the summarizeNHTS package is loaded.

library(summarizeNHTS)

Accessing the Data: Downloading NHTS Data

# Not Run
download_nhts_data("2001")
download_nhts_data("2009")
download_nhts_data("2017")

Accessing the Data: Reading the 2017 NHTS data

nhts_data <- read_data("2017", "C:/NHTS")

Accessing the Data: Summarizing the data object

summary(nhts_data$data)
##           Length Class      Mode
## trip      60     data.table list
## person    87     data.table list
## household 59     data.table list
## vehicle   15     data.table list

Accessing the Data: Snapshot of the vehicle data

nhts_data$data$vehicle
##          HOUSEID VEHID ANNMILES FUELTYPE HFUEL HYBRID MAKE MODEL OD_READ
##      1: 30000007    01     1000       01    -1     02   49 49032   69000
##      2: 30000007    02     2500       01    -1     02   49 49442  164000
##      3: 30000007    03      900       01    -1     02   19 19014  120000
##      4: 30000007    04      500       01    -1     02   19 19021     -88
##      5: 30000007    05    10000       01    -1     02   20 20481  300000
##     ---                                                                 
## 256111: 40794301    01     9500       01    -1     02   37 37441   60976
## 256112: 40794301    02     9600       01    -1     02   49 49032   93388
## 256113: 40794301    03     6200       01    -1     02   35 35443  143616
## 256114: 40794301    04       -1       01    -1     02   49 49471  195078
## 256115: 40794301    05     1000       01    -1     02   35 XXXXX  167141
##         VEHAGE VEHOWNED VEHOWNMO VEHTYPE VEHYEAR WHOMAIN
##      1:     10       01       -1      01    2007      03
##      2:     13       01       -1      02    2004      -8
##      3:     19       01       -1      01    1998      01
##      4:     20       01       -1      01    1997      02
##      5:     24       01       -1      04    1993      02
##     ---                                                 
## 256111:      5       01       -1      02    2012      01
## 256112:     12       02       10      01    2005      04
## 256113:     21       01       -1      02    1996      03
## 256114:     26       01       -1      04    1991      02
## 256115:     27       01       -1      01    1990      01

Accessing the Data: Subsetting

# By position
nhts_data$data$vehicle[, c(1, 3)]

# By name (single variable)
nhts_data$data$vehicle$ANNMILES

# By name
nhts_data$data$vehicle[, list(HOUSEID, ANNMILES)]
# By row numbers (first 5 rows)
nhts_data$data$vehicle[1:5, ]

# By condition
nhts_data$data$vehicle[VEHTYPE == "01", ]

# By condition (multiple values)
nhts_data$data$vehicle[VEHTYPE %in% c("01","02"), ]

Accessing the Data: Codebook objects

# 2017 variables table
head(codebook_2017$variables)
##        NAME     TABLE      TYPE
## 1: AGERANGE    person character
## 2:  AIRPORT household character
## 3:    ALT_1    person character
## 4:   ALT_16    person character
## 5:    ALT_2    person character
## 6:   ALT_23    person character
##                                                                                  LABEL
## 1:                                                                           Age Range
## 2:                                                         Influence Choice of Airport
## 3:                           Alternative Mode of Transportation: Public Transportation
## 4:                   Alternative Mode of Transportation: Public Transportation or Taxi
## 5:            Alternative Mode of Transportation: Passenger to Friend or Family Member
## 6: Alternative Mode of Transportation: Passenger to Friend/Family Member or Rental Car
# 2017 values table
head(codebook_2017$values)
##        NAME VALUE                  LABEL
## 1: AGERANGE    -8           I don't know
## 2: AGERANGE    -7 I prefer not to answer
## 3: AGERANGE    -1       Appropriate skip
## 4: AGERANGE    01          0-4 years old
## 5: AGERANGE    02         5-15 years old
## 6: AGERANGE    03        16-17 years old

Next Topic

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Generating Estimates

Generating Estimates: Introduction to summarize_data

summarize_data(
  data = nhts_data,
  agg = "household_count"
)
##            W            E      S      N
## 1: 118208251 1.186759e-07 129696 129696

Generating Estimates: Exploring summarize_data Parameters

summarize_data(
  data = nhts_data,
  agg = "household_count"
)

Generating Estimates: Grouping by Variables

summarize_data(
  data = nhts_data,
  agg = "household_count",
  by = "IS_METRO"
)
##                   IS_METRO         W        E      S      N
## 1:       Metropolitan Area 100480095 37143.09 109250 109250
## 2: Not a Metropolitan Area  17728156 37143.09  20446  20446
summarize_data(
  data = nhts_data,
  agg = "household_count",
  by = c("IS_METRO","HOMEOWN")
)
##                   IS_METRO                HOMEOWN            W          E
## 1:       Metropolitan Area            I dont know     1728.364   1060.471
## 2:       Metropolitan Area I prefer not to answer    31126.577  10432.815
## 3:       Metropolitan Area                    Own 62154931.751 122801.012
## 4:       Metropolitan Area                   Rent 37328199.616 124577.033
## 5:       Metropolitan Area Some other arrangement   964108.222  87850.539
## 6: Not a Metropolitan Area I prefer not to answer     5866.829   4255.033
## 7: Not a Metropolitan Area                    Own 12363614.703 124022.494
## 8: Not a Metropolitan Area                   Rent  5135781.233 113252.628
## 9: Not a Metropolitan Area Some other arrangement   222893.706  36387.468
##        S     N
## 1:     3     3
## 2:    26    26
## 3: 81923 81923
## 4: 26568 26568
## 5:   730   730
## 6:     6     6
## 7: 16536 16536
## 8:  3700  3700
## 9:   204   204

Generating Estimates: Frequencies/Proportions

# Person count
summarize_data(
  data = nhts_data,
  agg = "person_count"
)
##            W        E      S      N
## 1: 301599169 0.390838 264234 264234
# Proportion of persons by WORKER, worker status
summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = "WORKER",
  prop = TRUE
)
##              WORKER            W            E            S      N
## 1:  Not ascertained 6.819779e-05 2.760915e-05 6.433691e-05     17
## 2: Appropriate skip 1.508504e-01 1.263185e-03 1.057320e-01  27938
## 3:              Yes 5.205195e-01 1.690699e-03 4.855091e-01 128288
## 4:               No 3.285619e-01 2.475022e-03 4.086946e-01 107991

Generating Estimates: Numeric Aggregates

# Average TRPMILES, trip distance in miles
summarize_data(
  data = nhts_data,
  agg = "avg",
  agg_var = "TRPMILES"
)
##           W         E        S      N
## 1: 10.70435 0.2003875 11.45057 922916

Notes

Generating Estimates: Trip Rates

# Daily person Trips by worker status
summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER"
)
##              WORKER        W          E        S      N
## 1:  Not ascertained 1.280018 1.11437619 1.647059     28
## 2: Appropriate skip 2.780250 0.03060904 2.838249  79295
## 3:              Yes 3.773681 0.01752420 3.881454 497944
## 4:               No 3.006374 0.02919963 3.206795 346305

Generating Estimates: Subsetting in summarize_data

# Distribution of social/recreational trips by travel day
summarize_data(
  data = nhts_data,
  agg = "trip_count",
  by = "TRAVDAY",
  prop = TRUE,
  subset = "WHYTRP90 %in% c('07','08','10')"
)
##      TRAVDAY         W           E         S     N
## 1:    Sunday 0.1735797 0.003073281 0.1335907 33043
## 2:    Monday 0.1179881 0.002790327 0.1299642 32146
## 3:   Tuesday 0.1097795 0.001604585 0.1308981 32377
## 4: Wednesday 0.1123670 0.002679049 0.1350300 33399
## 5:  Thursday 0.1243485 0.003190797 0.1427399 35306
## 6:    Friday 0.1526481 0.003741184 0.1695486 41937
## 7:  Saturday 0.2092892 0.004953279 0.1582284 39137
# Person trip rate by Sex (for millennials)
summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "R_SEX",
  subset = "R_AGE >= 18 & R_AGE <= 34"
)
##                     R_SEX         W          E        S     N
## 1:            I dont know 0.9505731 1.09495843 1.333333     4
## 2: I prefer not to answer 3.4089751 0.63080999 3.647059   124
## 3:                   Male 3.1543275 0.04725515 3.262248 63591
## 4:                 Female 3.4506053 0.05616977 3.590901 77427

Generating Estimates: Documentation

?summarize_data
R Documentation for summarize_data

R Documentation for summarize_data

Next Topic

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Creating New Variables

Creating New Variables: Example Scenario

Example Derived Variable Coding Scenario

  1. Someone’s interested in querying the NHTS for a particular travel behavior

Anthony: “I am interested in exploring how financial burden may affect travel.”
Alex: “Remember that question about walking to save money? I would include that in your analysis.”

  1. Consider suggested variable’s usefulness for Anthony’s analysis:

WALK2SAVE: “I walk to places to save money.”

Values:

01 Strongly agree
02 Agree
03 Neither Agree or Disagree
04 Disagree
05 Strongly disagree
  1. Look for potential other ways of maniupulating this variable for analysis

  2. Create variable called WALK_FINANCE, a yes/no variable for the binary analysis question, “who does or does not walk to save money?”

Creating New Variables: Configuration

Creating New Variables: Configuration (cont.)

Derived variable file requirements

Item Description
NAME The name of the variable as it will appear in the dataset
TABLE The table level this variable is being computed for
TYPE Data type (numeric or character)
DOMAIN Logical expression that decides value assignment
VALUE A variable code value
LABEL Description of code value

Review File

Creating New Variables: Example 1 (Has/Has-not)

Using the Derived Variables file, create a variable with the following requirements:

 

NAME TABLE TYPE DOMAIN VALUE LABEL
HAS_VEHICLE household character HHVEHCNT > 0 1 Yes
HAS_VEHICLE household character HHVEHCNT == 0 2 No

Creating New Variables: Example 2 (Grouping)

Using the Derived Variables file, create a variable with the following requirements:

 

NAME TABLE TYPE DOMAIN VALUE LABEL
AGE_GROUP person character R_AGE >= 0 & R_AGE <= 17 1 Child
AGE_GROUP person character R_AGE >= 18 & R_AGE <= 44 2 Young Adult
AGE_GROUP person character R_AGE >= 45 & R_AGE <= 65 3 Middle Adult
AGE_GROUP person character R_AGE >= 66 4 Older Adult

Creating New Variables: Example 3 (Uses/Does-not-use)

Using the Derived Variables file, create a variable with the following requirements:

 

NAME TABLE TYPE DOMAIN VALUE LABEL
USES_TNC person character RIDESHARE > 0 1 Yes
USES_TNC person character RIDESHARE == 0 2 No

Creating New Variables: Example 4 (Is/Is-not)

Using the Derived Variables file, create a variable with the following requirements:

 

NAME TABLE TYPE DOMAIN VALUE LABEL
IS_METRO household character MSACAT %in% c(‘01’,‘02’,‘03’) 1 Yes
IS_METRO household character MSACAT %in% c(‘04’) 2 No

Creating New Variables: Summary

Next Topic

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Visualizing Estimates

Visualizing Estimates: Tables (Introductory)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER"
)

make_table(statistic)
Person Trip Rate
Weighted MOE (95%) Surveyed N
WORKER
        Not ascertained 1.28 2.21 1.65 28
        Appropriate skip 2.78 0.06 2.84 79,295
        Yes 3.77 0.03 3.88 497,944
        No 3.01 0.06 3.21 346,305

Visualizing Estimates: Tables (Advanced)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = c("TRAVDAY","OCCAT","EDUC"),
  exclude_missing = TRUE
)

make_table(
  tbl = statistic,
  title = "Table 1: Distribution of Persons (%) by Travel Day, Job Category, and Educational Attainment",
  output = c(W = "Weighted Percentage", N = "Sample Size"),
  row_vars = c("EDUC","OCCAT")
)
Table 1: Distribution of Persons (%) by Travel Day, Job Category, and Educational Attainment
TRAVDAY
Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
EDUC
by
OCCAT
Weighted Percentage Sample Size   Weighted Percentage Sample Size   Weighted Percentage Sample Size   Weighted Percentage Sample Size   Weighted Percentage Sample Size   Weighted Percentage Sample Size   Weighted Percentage Sample Size
Less than a high school graduate
        Sales or service 574,253.60 185   456,938.19 290   464,457.46 296   421,934.03 277   515,296.12 329   476,395.16 319   531,217.96 195
        Clerical or administrative support 24,191.30 19   52,107.51 23   43,121.83 22   16,618.23 19   11,019.01 21   37,910.96 27   11,626.80 7
        Manufacturing, construction, maintenance, or farming 346,452.77 126   360,873.85 181   259,282.42 178   401,220.90 204   387,728.36 222   315,426.80 190   313,535.91 124
        Professional, managerial, or technical 120,839.84 34   123,697.34 66   93,703.74 57   87,975.61 59   50,525.03 53   72,764.91 61   77,724.76 33
        Something else 210.64 1   15,959.54 5   160.85 1   3,835.75 1   2,581.47 2   3,218.39 5   950.45 1
High school graduate or GED
        Sales or service 1,507,054.46 777   1,532,467.67 1,077   1,680,817.18 1,107   1,671,275.11 1,067   1,662,584.43 1,141   1,530,220.84 1,128   1,681,702.76 743
        Clerical or administrative support 340,716.78 270   452,033.54 421   443,409.41 402   469,047.39 423   375,361.44 387   368,600.99 371   423,808.33 273
        Manufacturing, construction, maintenance, or farming 1,170,818.16 683   1,256,923.82 941   1,366,781.94 968   1,250,329.74 970   1,362,216.76 1,023   1,182,563.55 996   1,331,924.68 642
        Professional, managerial, or technical 636,681.78 399   620,937.35 595   585,906.37 572   656,343.34 563   800,063.56 565   737,028.11 556   702,597.58 408
        Something else 3,750.26 3   5,755.61 4   5,382.27 9   18,357.79 6   44,805.78 7   2,447.66 5   30,231.15 8
Some college or associates degree
        Sales or service 2,119,609.44 1,171   2,225,504.43 1,609   2,344,837.19 1,676   2,051,889.49 1,604   2,326,519.70 1,786   2,124,506.59 1,681   2,130,460.05 1,185
        Clerical or administrative support 1,059,863.00 689   1,054,056.58 939   1,104,790.26 917   1,009,295.31 906   917,592.62 969   957,458.75 897   1,151,759.59 649
        Manufacturing, construction, maintenance, or farming 1,129,054.98 688   1,153,404.53 937   1,092,654.99 967   1,111,667.24 921   1,100,853.72 934   1,404,031.34 943   967,864.82 643
        Professional, managerial, or technical 2,150,336.26 1,478   2,165,225.35 1,983   2,113,058.89 1,981   2,102,367.69 2,012   2,130,279.99 2,017   1,869,631.39 2,002   2,201,691.52 1,414
        Something else 3,370.23 7   6,270.17 10   11,620.51 13   16,103.53 14   5,910.66 6   16,839.89 9   18,037.26 9
Bachelors degree
        Sales or service 1,198,964.48 776   1,066,091.99 1,016   1,108,382.46 985   928,730.44 993   1,142,293.84 1,045   1,173,892.65 1,047   1,041,291.51 712
        Clerical or administrative support 550,344.86 396   485,124.59 479   554,515.12 511   640,136.79 533   553,331.09 535   687,831.93 558   603,274.08 381
        Manufacturing, construction, maintenance, or farming 327,650.06 223   278,962.62 312   326,049.57 337   376,779.26 301   311,670.17 302   386,643.87 321   342,461.46 221
        Professional, managerial, or technical 3,490,067.83 2,519   3,781,889.54 3,586   3,598,016.19 3,532   3,537,870.69 3,448   3,628,396.12 3,517   3,620,414.26 3,460   3,655,575.39 2,463
        Something else 1,569.75 3   10,904.59 3   9,240.48 6   8,391.68 10   4,766.86 3   1,524.71 8   4,093.29 4
Graduate degree or professional degree
        Sales or service 407,761.90 251   325,271.57 366   345,069.92 334   349,717.14 307   421,688.06 379   399,670.59 385   295,913.34 234
        Clerical or administrative support 133,270.32 125   162,821.19 164   197,150.23 180   206,896.40 163   244,780.52 178   189,128.77 165   197,210.40 131
        Manufacturing, construction, maintenance, or farming 90,665.02 70   110,044.45 117   101,388.94 108   170,316.50 107   82,921.29 102   103,042.94 110   97,717.60 76
        Professional, managerial, or technical 3,965,329.40 3,001   3,784,198.49 3,890   3,582,823.68 3,913   3,895,814.37 3,981   3,914,655.72 4,150   3,890,890.15 4,088   4,061,496.28 2,933
        Something else 5,054.21 6   5,947.14 4   20,441.16 3   2,146.62 3   2,033.31 7   613.61 2   507.95 2

Visualizing Estimates: Charts (Introductory)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "WORKER",
  exclude_missing = TRUE
)

make_chart(statistic)

Visualizing Estimates: Charts (Advanced)

Person Trip Rate by Sex, Worker Status, and Travel Day of Week

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = c("R_SEX","WORKER","TRAVDAY"),
  exclude_missing = TRUE
)
# Specify fill and facet
make_chart(
  tbl = statistic, 
  fill = "WORKER",
  facet = "TRAVDAY",
  palette = "Accent"
)

Visualizing Estimates: Maps (Introductory)

statistic <- summarize_data(
  data = nhts_data,
  agg = "person_count",
  by = "CENSUS_D"
)

make_map(statistic)

Visualizing Estimates: Maps - Built in Geography Layers

Visualizing Estimates: Maps (Advanced)

Include a second table grouping by the original geography plus one variable.

statistic1 <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = "HHSTFIPS",
  exclude_missing = TRUE
)

statistic2 <- summarize_data(
  data = nhts_data,
  agg = "person_trip_rate",
  by = c("HHSTFIPS","WORKER"),
  exclude_missing = TRUE
)

map <- make_map(
  tbl = statistic1, 
  tbl2 = statistic2
)
map

Formatting

statistic <- summarize_data(
  data = nhts_data,
  agg = "trip_count",
  by = "PRMACT",
  exclude_missing = TRUE
)

make_table(
  tbl = statistic,
  title = "Trip Count by Primary Activity (in Millions)",
  output = c(W = "Trip Count (Millions)", E = "SE"),
  digits = 0,
  multiplier = 1000000
)
Trip Count by Primary Activity (in Millions)
Trip Frequency
Trip Count (Millions) SE
PRMACT
        Working 187,651 1,637
        Temporarily absent from a job or business 8,764 558
        Looking for work / unemployed 11,419 848
        A homemaker 23,661 910
        Going to school 21,390 1,094
        Retired 53,644 1,146
        Something else 18,432 747

Next Topic

  1. Setting Up an R Analysis Environment
  2. Accessing the Data
  3. Generating Estimates
  4. Creating New Variables
  5. Visualizing Estimates
  6. Producing a Travel Analysis Report

Producing a Travel Analysis Report