OUHSC Statistical Computing User Group
Will Beasley, Dept of Pediatrics,
Biomedical and Behavioral Methodology Core (BBMC)
Combine three datasets that differ structurally and cosmetically. The state data comes from three different sources, each managed by a different agency.
File | Description |
---|---|
nurse-month-oklahoma.csv | one row per nurse per month for Oklahoma County |
month-tulsa.csv | one row per month for Tulsa County (ie, it's already aggregated) |
nurse-month-rural.csv | one row per nurse per month for the other 75 counties |
Issue | Oklahoma | Tulsa | Rural | Approach |
---|---|---|---|---|
Structure | one row per month per nurse | one row per month (it's already aggregated) | one row per month per nurse | dplyr's group_by() and summarize() |
Contains PHI | Yes | No | Yes | Hash |
Rename Fields | Yes | Yes | Yes | dplyr::rename() |
Missing Values | No | No | Yes | compare county holes |
Legit Holes | No | No | Yes | enumerate all combos and fill w/ zeros |
Right Censored | Maybe | Maybe | No | group, sort, and zoo::rollmedian() |
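
The "Structure" and "Legit Holes" rows can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data frame and column names (`ds_nurse_month`, `county_id`, `month`, `nurse_id`, `fte`) and the values are made up, and base R's `expand.grid()` is just one way to enumerate the combinations.

```r
library(dplyr)

# Hypothetical nurse-month data; names and values are assumptions for illustration.
ds_nurse_month <- data.frame(
  county_id = c(1L, 1L, 1L, 2L),
  month     = as.Date(c("2012-01-15", "2012-01-15", "2012-02-15", "2012-01-15")),
  nurse_id  = c("a", "b", "a", "c"),
  fte       = c(1.0, 0.5, 1.0, 0.25),
  stringsAsFactors = FALSE
)

# Collapse to one row per county per month (the structure Tulsa already has).
ds_county_month <- ds_nurse_month %>%
  group_by(county_id, month) %>%
  summarize(fte = sum(fte), .groups = "drop")

# Enumerate every county-month combination that could exist.
ds_combos <- expand.grid(
  county_id = unique(ds_county_month$county_id),
  month     = unique(ds_county_month$month)
)

# Left join the observed values onto the full grid; legitimate holes become zero.
ds_filled <- ds_combos %>%
  left_join(ds_county_month, by = c("county_id", "month")) %>%
  mutate(fte = coalesce(fte, 0)) %>%
  arrange(county_id, month)
```

Here county 2 has no nurses in February, so the join leaves an `NA` that `coalesce()` turns into an explicit zero. Filling with zero is only appropriate for legitimate holes; values that are missing because of a reporting gap should stay missing.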
Issue | Oklahoma | Tulsa | Rural | Approach |
---|---|---|---|---|
Date | Year & Month separate | 1/15/2009 | 06/2012 | as.Date() format parameter |
FTE Type | Proportion | Sum | Percentage | regex gsub() |
Requires Linking Counties | Sorta | Sorta | Yes | Lookup Table & left join |
Misspelled Counties | – | – | Yes | car::recode() or plyr::revalue() |
Counties to Drop | No | No | Yes | blacklist |
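
A rough sketch of the cosmetic fixes, under several assumptions: the column names, county IDs, the misspelling `"Adiar"`, and the `"Unknown"` blacklist entry are all invented for illustration; the rural percentage is assumed to arrive as text with a trailing `%`; and months are assumed to be pinned to the 15th. `dplyr::recode()` is swapped in here for `car::recode()` / `plyr::revalue()` only to keep the example to one package.

```r
library(dplyr)

# --- Dates: one format string per source ---
# Oklahoma: year and month are separate fields; the day (15) is an assumption.
date_oklahoma <- as.Date(paste(2009, 1, 15, sep = "-"), format = "%Y-%m-%d")
# Tulsa: month/day/year.
date_tulsa <- as.Date("1/15/2009", format = "%m/%d/%Y")
# Rural: month/year only, so prepend a day before parsing.
date_rural <- as.Date(paste0("15/", "06/2012"), format = "%d/%m/%Y")

# --- Hypothetical rural extract and county lookup table ---
ds_rural <- data.frame(
  county_name = c("Adair", "Adiar", "Cimarron", "Unknown"),
  fte_percent = c("50%", "25%", "100%", "10%"),
  stringsAsFactors = FALSE
)
ds_lookup <- data.frame(
  county_name = c("Adair", "Cimarron"),
  county_id   = c(1L, 2L),  # made-up IDs, not real county codes
  stringsAsFactors = FALSE
)
counties_to_drop <- c("Unknown")  # the blacklist

ds_rural_clean <- ds_rural %>%
  mutate(
    # Fix known misspellings before linking.
    county_name = recode(county_name, "Adiar" = "Adair"),
    # Strip the percent sign and convert to a proportion.
    fte = as.numeric(gsub("%", "", fte_percent)) / 100
  ) %>%
  filter(!(county_name %in% counties_to_drop)) %>%
  left_join(ds_lookup, by = "county_name") %>%
  select(county_id, county_name, fte)
```

Recoding before the join matters: a misspelled county would otherwise fail to match the lookup table and come back with a missing `county_id`.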