BBMC Validator: catch and communicate data errors

OUHSC Statistical Computing User Group

Will Beasley1, Geneva Marshall1, Thomas Wilson1,
Som Bohora2, & Maleeha Shahid2.

  1. Biomedical and Behavioral Methodology Core (BBMC)
  2. Center on Child Abuse and Neglect (CCAN)

November 1, 2016

Objectives for Validator Software

  1. catches & displays data entry errors,
  2. communicates problems to statisticians,
  3. communicates problems to data collectors and managers
    (who typically have some tech phobia),
  4. executes with automation, and
  5. produces a self-contained report file that can be emailed.

`validation_check` S3 object

validation_check <- function(
  name, error_message, priority, passing_test
) {
  # Construct an S3 object representing a single validation check.
  l <- list()
  class(l)         <- "validation_check"
  l$name           <- name
  l$error_message  <- error_message
  l$priority       <- priority
  l$passing_test   <- passing_test
  return( l )
}

Declare List of Checks

# Add to this list for new validators.
checks <- list(
  validation_check(
    name          = "record_id_no_white_space",
    error_message = "'record_id' contains white space.",
    priority      = 1L,
    passing_test  = function( d ) {
      !grepl("\\s", d$record_id, perl=TRUE)
    }
  ),
  validation_check(
    name          = "interview_started_set",
    error_message = "`interview_started` can't be missing.",
    priority      = 2L,
    passing_test  = function( d ) {
      !is.na(d$interview_started)
    }
  ),
  ...
)

Upcoming Features/Uses

  1. The report reruns every 10 minutes and is displayed in Shiny.
  2. Report-level checks will supplement the record-level checks.
    (e.g., “At least 30% of participants should be female.”)
  3. Graph performance of each data collector.
    (Suggested by Geneva Marshall.)
  4. The data collectors could check the report after their 3-hour interview, but before leaving the participant's home.
    (Suggested by Thomas Wilson.)
  5. Pull the reusable code into a package, leaving a file with only the checks and a few project-specific parameters.

Generalizable

  • We want this mechanism to be used in almost all our research that involves live data collection. We'll also make this publicly available.
  • Ideally, a single mechanism accommodates all these types of research.
  • How could this be modified/expanded to accommodate your type of research and human environments?

Feedback During Presentation

  • Mike Anderson: use a similar tool to create an action item report that fills the gap between (a) static REDCap scheduling and (b) “errors” of the Validator report.
  • Summer Frank: Building off of Thomas's idea, the data collectors could review only the top-priority violations before leaving the participant. Defer the checks that can be corrected later, so they don't eat up more of the participant's time.
  • Dwayne Geller: Use REDCap's DET (data entry trigger) to run a mini version of the validator that shows errors to the data collector in a REDCap text field. This would reduce a round trip between the Validator report and REDCap.