Text Manipulation with Regular Expressions Part 1

OUHSC Statistical Computing User Group

Will Beasley, Dept of Pediatrics,

Biomedical and Behavioral Methodology Core (BBMC)

February 2, 2016

Overview of Regular Expressions

A 'regex' is typically a carefully crafted string that describes a pattern of text. It can:

  • Extract components of the text,
  • Substitute components of the text, or
  • Determine if the pattern simply appears in the text.
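The three uses above can be sketched in a few lines of Python. This is only an illustration: the sample text and the date pattern are assumptions made up for this example, not part of the original slides.

```python
import re

text = "Visit 12 was on 2016-02-02"      # hypothetical sample text
pattern = r"(\d{4})-(\d{2})-(\d{2})"     # hypothetical pattern: year-month-day

# Extract components of the text.
match = re.search(pattern, text)
year, month, day = match.groups()

# Substitute components of the text (reorder to month/day/year).
reordered = re.sub(pattern, r"\2/\3/\1", text)

# Determine if the pattern simply appears in the text.
has_date = re.search(pattern, text) is not None
```

Here `reordered` becomes "Visit 12 was on 02/02/2016" and `has_date` is `True`.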

Generalization of Simple Wildcards

A regex is like the big brother of the wildcards you use to match filenames
(e.g., "*.R").
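As a rough comparison, the same filename filter can be written once as a wildcard and once as a regex. The filenames are hypothetical, and the sketch assumes Python's standard `fnmatch` and `re` modules.

```python
import fnmatch
import re

filenames = ["analysis.R", "notes.txt", "plot.R", "README"]  # hypothetical

# Wildcard: '*' means "any run of characters".
by_wildcard = [f for f in filenames if fnmatch.fnmatchcase(f, "*.R")]

# Regex equivalent: '.*' is "any run of characters"; '\.' is a literal dot.
by_regex = [f for f in filenames if re.fullmatch(r".*\.R", f)]
```

Both lists contain the same two files. The regex version is more verbose here, but it can express constraints (digits only, repetition counts, alternatives) that a simple wildcard cannot.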

[Image: Windows Explorer]

Simple Examples

Pattern    Matches
mike       “mike”, “smike”, “miked”, etc.
mike4      “mike4”, “smike4”, etc.
mike\d     “mike” followed by any single digit (e.g., “mike8”, “smike8”)
mike\d+    “mike” followed by one or more digits (e.g., “mike1234”, “smike8”)
^mike$     only “mike”
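The patterns in the table can be checked against a handful of candidate strings. The sketch below assumes Python's `re.search`, which looks for the pattern anywhere in the string; the helper name `matching` is made up for this illustration.

```python
import re

candidates = ["mike", "smike", "miked", "mike4", "smike8", "mike1234"]

def matching(pattern):
    """Return the candidates in which the pattern is found anywhere."""
    return [s for s in candidates if re.search(pattern, s)]

plain = matching(r"mike")         # every candidate contains "mike"
with_4 = matching(r"mike4")       # only strings containing "mike4"
one_digit = matching(r"mike\d")   # "mike" followed by a digit
digits = matching(r"mike\d+")     # "mike" followed by one or more digits
exact = matching(r"^mike$")       # anchored: the whole string must be "mike"

# \d and \d+ accept the same strings here, but they match different spans.
short_span = re.search(r"mike\d", "mike1234").group()
long_span = re.search(r"mike\d+", "mike1234").group()
```

The last two lines show the practical difference between `\d` and `\d+`: the first stops after one digit ("mike1"), while the second consumes all of them ("mike1234").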

Evaluate the Instructor

Look for an email invitation to a REDCap survey.

Potential Solutions for 3 & 4

Example 3

Find:    (,*\s*)"(\w+)"\s+=\s+"(\w+)"
Replace: $1"$3"    = "$2"
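To try the Example 3 substitution outside a text editor, here is a minimal Python sketch. The input line is a hypothetical stand-in, since the exercise text is not reproduced here. Note that Python writes the backreference `$1` as `\1`.

```python
import re

pattern = r'(,*\s*)"(\w+)"\s+=\s+"(\w+)"'
replacement = r'\1"\3"    = "\2"'        # $1, $3, $2 in editor syntax

line = ', "subject_id"  = "id"'          # hypothetical input line
swapped = re.sub(pattern, replacement, line)
```

The first group preserves any leading comma and whitespace, while the second and third groups (the two quoted words) trade places, giving `, "id"    = "subject_id"`.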

Example 4

Find:    library\((\w+),\s*quietly=(T|TRUE)\)
Replace: library($1)
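The Example 4 substitution can be sketched the same way in Python. The two R lines used as input are assumptions chosen for illustration.

```python
import re

pattern = r"library\((\w+),\s*quietly=(T|TRUE)\)"
replacement = r"library(\1)"             # $1 in editor syntax

# Hypothetical R source lines to clean up.
source = "library(dplyr, quietly=TRUE)\nlibrary(ggplot2,quietly=T)"
cleaned = re.sub(pattern, replacement, source)
```

The parentheses that belong to the R call are escaped (`\(` and `\)`) so they are treated as literal characters, while the unescaped parentheses capture the package name for reuse in the replacement.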

Potential Solutions for 5 & 6

Example 5

Find:    \b(\d)\b
Replace: 0$1
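A small Python sketch of the Example 5 substitution follows. The date string is a hypothetical input. The replacement is written as `0\g<1>`, Python's explicit form of group 1, to keep the literal zero visually separate from the group number.

```python
import re

pattern = r"\b(\d)\b"                    # a digit standing alone
replacement = r"0\g<1>"                  # 0$1 in editor syntax

padded = re.sub(pattern, replacement, "2016-2-2")   # hypothetical input
```

The word boundaries (`\b`) keep the pattern from touching digits inside longer numbers, so "2016" is left alone and only the lone digits are padded, giving "2016-02-02".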

Example 6

,c(\d{1,2})-(\d|nm),
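Only the search pattern for Example 6 appears above, so the sketch below is limited to showing what its two groups capture. The sample strings are hypothetical.

```python
import re

pattern = r",c(\d{1,2})-(\d|nm),"

# Hypothetical strings; the actual exercise text is not shown here.
first = re.search(pattern, "x,c12-nm,y").groups()
second = re.search(pattern, "x,c3-4,y").groups()
no_match = re.search(pattern, "x,c123-4,y")   # three digits: too many
```

The first group accepts one or two digits (`\d{1,2}`); the second accepts either a single digit or the literal text "nm".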