Text Manipulation with Regular Expressions Part 2

OUHSC Statistical Computing User Group

Will Beasley, Dept of Pediatrics,

Biomedical and Behavioral Methodology Core (BBMC)

May 3, 2016

Agenda

  • Review of Regex Part 1 from two meetings ago
  • Introduce a few more language-agnostic techniques
  • Apply them in a few languages

Environments

  • Text editors
    • Notepad++, Atom, or anything else halfway-serious
  • Languages
    • R, Python, SAS, & most others.
  • Databases
    • First-class support in Postgres (the succinct ~ operator), MySQL (REGEXP), and Oracle (REGEXP_LIKE, REGEXP_SUBSTR, and even REGEXP_REPLACE).
    • It's tricky, but possible with SQLite and SQL Server.
    • The standard/portable LIKE SQL operator might do everything you need anyway.
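
One way the SQLite case might look from Python is sketched below. It assumes a SQLite build that ships no regexp() function of its own, so the script registers one through the standard sqlite3 module. The table name visit and its codes are hypothetical, made up for the illustration. A plain LIKE query is included for comparison.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern arrives first.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table and values, for illustration only.
conn.execute("CREATE TABLE visit (code TEXT)")
conn.executemany(
    "INSERT INTO visit VALUES (?)",
    [("c12-3",), ("c7-nm",), ("x99",)],
)

# Regex match: 'c', one or two digits, a hyphen, then a digit or 'nm'.
regex_rows = conn.execute(
    "SELECT code FROM visit WHERE code REGEXP ? ORDER BY code",
    (r"^c\d{1,2}-(\d|nm)$",),
).fetchall()
regex_matched = [row[0] for row in regex_rows]

# Portable LIKE: coarser, but enough if "starts with c" is all that is needed.
like_rows = conn.execute(
    "SELECT code FROM visit WHERE code LIKE 'c%' ORDER BY code"
).fetchall()
like_matched = [row[0] for row in like_rows]

print(regex_matched)
print(like_matched)
```

Both queries return the same two rows here, which is the point of the last bullet: LIKE is often sufficient until the pattern needs counts or alternation.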

Overview of Regular Expressions

A 'regex' is typically a carefully crafted string that describes a pattern of text. It can:

  • Extract components of the text,
  • Substitute components of the text, or
  • Determine if the pattern simply appears in the text.
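
The three uses above can be sketched with Python's built-in re module. The sample sentence and the date pattern are made up for the illustration and do not come from the original exercises.

```python
import re

# Hypothetical sample text, for illustration only.
text = "Meeting on 2016-05-03 in room 142."
date_pattern = r"(\d{4})-(\d{2})-(\d{2})"

# Extract: capture the year, month, and day components.
match = re.search(date_pattern, text)
year, month, day = match.groups()

# Substitute: rearrange the captured components as month/day/year.
us_style = re.sub(date_pattern, r"\2/\3/\1", text)

# Detect: test only whether the pattern appears anywhere in the text.
has_date = re.search(date_pattern, text) is not None

print(year, month, day)
print(us_style)
print(has_date)
```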

Evaluate the Instructor

Look for an email invitation to a REDCap survey.

Potential Solutions for 3 & 4

Example 3

(,*\s*)"(\w+)"\s+=\s+"(\w+)" and
$1"$3"    = "$2"
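
A rough Python version of this swap is sketched below. Python's re module writes backreferences as \g<1> (or \1) rather than $1, so the replacement string is adapted accordingly. The input line is hypothetical, since the exercise text is not reproduced on this slide.

```python
import re

# Pattern from the slide; the replacement uses Python's \g<n> syntax instead of $n.
pattern = r'(,*\s*)"(\w+)"\s+=\s+"(\w+)"'
replacement = r'\g<1>"\g<3>"    = "\g<2>"'

# Hypothetical input line, for illustration only.
line = ', "new_name" = "old_name"'

# Swap the two quoted names while keeping the leading comma and whitespace.
swapped = re.sub(pattern, replacement, line)
print(swapped)
```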

Example 4

library\((\w+),\s*quietly=(T|TRUE)\) and 
library($1)
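
The same substitution might be run from Python as sketched here. The three input lines are hypothetical stand-ins for the exercise text. The last one has no quietly argument and should pass through unchanged.

```python
import re

# Pattern from the slide; the replacement uses Python's \g<n> syntax instead of $n.
pattern = r"library\((\w+),\s*quietly=(T|TRUE)\)"
replacement = r"library(\g<1>)"

# Hypothetical R source lines, for illustration only.
lines = [
    "library(dplyr, quietly=TRUE)",
    "library(knitr,quietly=T)",
    "library(ggplot2)",
]

# Drop the quietly argument wherever it appears.
cleaned = [re.sub(pattern, replacement, line) for line in lines]
print(cleaned)
```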

Potential Solutions for 5 & 6

Example 5

\b(\d)\b and 
0$1
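
A small Python sketch of this zero-padding follows. It assumes the goal is to pad lone digits such as day and month numbers. The dates are hypothetical examples. The replacement is written with \g<1> so the group number cannot be misread next to the literal 0.

```python
import re

# Pattern from the slide: a single digit with a word boundary on each side.
pattern = r"\b(\d)\b"
replacement = r"0\g<1>"

# Hypothetical date strings, for illustration only.
dates = ["5/3/2016", "12/25/2016", "1/15/2016"]

# Pad isolated single digits; multi-digit numbers are left alone.
padded = [re.sub(pattern, replacement, d) for d in dates]
print(padded)
```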

Example 6

,c(\d{1,2})-(\d|nm),