Text Manipulation with Regular Expressions Part 2

OUHSC Statistical Computing User Group

Will Beasley, Dept of Pediatrics,

Biomedical and Behavioral Methodology Core (BBMC)

November 5, 2019

(Based on the presentation from May 3, 2016)

Agenda

  • Review of Regex Part 1 from last month
  • Introduce a few more language-agnostic techniques
  • Apply in a few languages

Environments

  • Text editors
    • Notepad++, Atom, Visual Studio Code, or anything else halfway-serious
  • Languages
    • R, Python, SAS, & most others.
  • Databases
    • First-class support in Postgres with the succinct ~ operator, in MySQL with REGEXP, and in Oracle with REGEXP_LIKE, REGEXP_SUBSTR, and REGEXP_REPLACE.
    • It's tricky, but possible with SQLite and SQL Server.
    • The standard/portable LIKE SQL operator might do everything you need anyway, if you have only a simple comparison.
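
One way to get regex matching in SQLite is to register your own function from the host language. The sketch below uses Python's built-in sqlite3 module and assumes a made-up table named visit with a single code column; the table, rows, and helper name are hypothetical. SQLite rewrites `X REGEXP Y` as a call to `regexp(Y, X)`, so the helper receives the pattern first.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Return 1 when `pattern` is found in `value`, else 0 (NULLs never match)."""
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)  # makes the REGEXP operator usable

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE visit (code TEXT)")
conn.executemany(
    "INSERT INTO visit VALUES (?)",
    [("c12-3",), ("c7-nm",), ("x99",)],
)

# Regex comparison: a 'c', one or two digits, then a hyphen.
regex_rows = conn.execute(
    "SELECT code FROM visit WHERE code REGEXP ? ORDER BY code",
    (r"^c\d{1,2}-",),
).fetchall()

# The portable LIKE operator covers the simpler "starts with c" case.
like_rows = conn.execute(
    "SELECT code FROM visit WHERE code LIKE 'c%' ORDER BY code"
).fetchall()

print(regex_rows)
print(like_rows)
```

For a comparison this simple, LIKE returns the same rows without any extra setup.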

Overview of Regular Expressions

A 'regex' is typically a carefully crafted string that describes a pattern of text. It can:

  • Extract components of the text,
  • Substitute components of the text, or
  • Determine if the pattern simply appears in the text.
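
A minimal Python sketch of these three uses, assuming a made-up sentence that contains an ISO-style date (the sample text and variable names are illustrative, not taken from the exercises). Python's re module writes backreferences as \1 where most text editors use $1.

```python
import re

text = "Presented on 2019-11-05 in Oklahoma City."  # hypothetical sample text
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # three capture groups: year, month, day

# Extract components of the text.
match = re.search(pattern, text)
year, month, day = match.groups()

# Substitute components of the text (reorder to month/day/year).
reformatted = re.sub(pattern, r"\2/\3/\1", text)

# Determine if the pattern simply appears in the text.
has_date = re.search(pattern, text) is not None

print(year, month, day)
print(reformatted)
print(has_date)
```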

Proceed However You'd Like

  • Work by yourself or in pairs.
  • After you're done with these 6 exercises,
    • Invent new challenges
    • Help someone else
    • Check the solutions I thought of.

Potential Solutions for 3 & 4

Example 3

(,*\s*)"(\w+)"\s+=\s+"(\w+)" and
$1"$3"    = "$2"
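
The same swap can be tried outside a text editor. This is a sketch in Python, assuming input lines shaped like the ones below; the exercise's real input is not reproduced here, so the sample lines are hypothetical. The editor-style $1, $2, $3 become \1, \2, \3 in Python's replacement string.

```python
import re

pattern = r'(,*\s*)"(\w+)"\s+=\s+"(\w+)"'
replacement = r'\1"\3"    = "\2"'  # same layout as the editor version, with \N instead of $N

# Hypothetical input lines, for illustration only.
lines = [
    ', "subject_id" = "id"',
    '  "dob"   =   "birth_date"',
]

# Swap the two quoted names on either side of the equals sign.
swapped = [re.sub(pattern, replacement, line) for line in lines]

for line in swapped:
    print(line)
```

The first group keeps any leading comma and whitespace in place, so only the two quoted names trade positions.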

Example 4

library\((\w+),\s*quietly=(T|TRUE)\) and
library($1)
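
A possible Python equivalent, assuming a short R script held in a string (the package names are arbitrary examples). Lines that lack the quietly argument are left untouched because the pattern requires it.

```python
import re

pattern = r"library\((\w+),\s*quietly=(T|TRUE)\)"

# Hypothetical R script, for illustration only.
script = "\n".join([
    "library(dplyr, quietly=TRUE)",
    "library(knitr,quietly=T)",
    "library(ggplot2)",
])

# Keep only the package name captured by the first group.
cleaned = re.sub(pattern, r"library(\1)", script)

print(cleaned)
```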

Potential Solutions for 5 & 6

Example 5

\b(\d)\b and
0$1
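
Sketched in Python, assuming the goal is to zero-pad stand-alone single digits; the sample strings are hypothetical. The replacement uses \g<1> rather than \1 so the group reference cannot be confused with the literal 0 beside it.

```python
import re

pattern = r"\b(\d)\b"  # a single digit with a word boundary on each side

# Hypothetical inputs, for illustration only.
date_text = "9/5/2019"
time_text = "Visit 3 at 7:05"

# Prefix each stand-alone digit with a zero; multi-digit numbers are untouched.
padded_date = re.sub(pattern, r"0\g<1>", date_text)
padded_time = re.sub(pattern, r"0\g<1>", time_text)

print(padded_date)
print(padded_time)
```

The word boundaries are what protect 2019 and 05: no digit inside them has a boundary on both sides.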

Example 6

,c(\d{1,2})-(\d|nm),