BBMC Validator: catch and communicate data errors

OUHSC Statistical Computing User Group

Will Beasley1, Geneva Marshall1, Thomas Wilson1,
Som Bohora2, & Maleeha Shahid2.

  1. Biomedical and Behavioral Methodology Core (BBMC)
  2. Center on Child Abuse and Neglect (CCAN)

November 1, 2016

Objectives for Validator Software

  1. catches & displays data entry errors,
  2. communicates problems to statisticians,
  3. communicates problems to data collectors and managers
    (who typically have some tech phobia),
  4. executes with automation, and
  5. produces a self-contained report file that can be emailed.

`validation_check` S3 object

validation_check <- function(
  name, error_message, priority, passing_test
) {
  # Construct an S3 object representing a single validation check.
  l <- list()
  class(l)         <- "validation_check"
  l$name           <- name
  l$error_message  <- error_message
  l$priority       <- priority
  l$passing_test   <- passing_test
  return( l )
}

Declare List of Checks

# Add to this list for new validators.
checks <- list(
  validation_check(
    name          = "record_id_no_white_space",
    error_message = "'record_id' contains white space.",
    priority      = 1L,
    passing_test  = function( d ) {
      !grepl("\\s", d$record_id, perl=TRUE)
    }
  ),
  validation_check(
    name          = "interview_started_set",
    error_message = "`interview_started` can't be missing.",
    priority      = 2L,
    passing_test  = function( d ) {
      !is.na(d$interview_started)
    }
  ),
  ...
)

Upcoming Features/Uses

  1. The report reruns every 10 minutes and is displayed in Shiny.
  2. Report-level checks will supplement the record-level checks.
    (e.g., “At least 30% of participants should be female.”)
  3. Graph performance of each data collector.
    (Suggested by Geneva Marshall.)
  4. The data collectors could check the report after their 3-hour interview, but before leaving the participant's home.
    (Suggested by Thomas Wilson.)
  5. Pull the reusable code into a package, leaving a file with only the checks and a few project-specific parameters.

Generalizable

  • We want this mechanism to be used in almost all our research that involves live data collection. We'll also make this publicly available.
  • Ideally, a single mechanism accommodates all these types of research.
  • How could this be modified/expanded to accommodate your type of research and human environments?

Feedback During Presentation

  • Mike Anderson: use a similar tool to create an action item report that fills the gap between (a) static REDCap scheduling and (b) “errors” of the Validator report.
  • Summer Frank: Building off of Thomas's idea, the data collectors could review only the top-priority violations before leaving the participant. Defer the checks that can be corrected later, so they don't eat up more of the participant's time.
  • Dwayne Geller: Use REDCap's DET (data entry trigger) to run a mini version of the validator that shows errors to the data collector in a REDCap text field. This would reduce a round trip between the Validator report and REDCap.