OUHSC Statistical Computing User Group
Will Beasley, Dept of Pediatrics,
Biomedical and Behavioral Methodology Core (BBMC)
A 'regex' is typically a carefully crafted string that describes a pattern of text. It can:
It's like the big brother of wildcards you match filenames with
(eg, "*.R"
).
Pattern | Matches |
---|---|
mike |
“mike”, “smike”, “miked”, etc. |
mike4 |
“mike4”, “smike4”, etc. |
mike\d |
“mike” followed by any single digit (eg “mike8”, “smike8”) |
mike\d+ |
“mike” followed by one or more digits (eg “mike1234”, “smike8”) |
^mike$ |
only “mike” |
\b19(?=(1|2))(\d{2})\b
and 20\2
converts years in the 1910s and 1920s to the 2010s and 2020s
(but leaves later years as they are).
Today's as language agnostic as possible. SAS, R, and Python examples in Part 2.
Later, consider RegexBuddy for $40.
"\d\w"
) instead of "[:digit:][:alnum:]"
)1916-1918 subscales for a subject
1998-1914 subscales for a subject
subscales for a subject 1998-1920
1234
23
14a
1a3
234
1.39
In the right panel of bottom right panel of regex101.com. Especially these first:
. versus \.
\w and \d and \s (versus \W and \D and \S)
^ and \A
$ and \Z
? and * and + and things like {3,6}
Capturing
Character classes
"CL_ID" = "ClientID"
, "RMSEQ" = "RemovedSequence"
, "RMVL_BEGDT" = "RemovalBeginDate"
, "RMVL_ENDDT" = "RemovalEndDate"
, "END_TYP_CDE" = "EndTypeID"
, "REMOVED_FROM" = "RemovedFromTypeID"
, "CURR_RMVL_TYP" = "RemovalTypeCurrentID"
, "ORIG_RMVL_TYP" = "RemovalTypeOriginalID"
, "FMLY_STRUCTURE" = "FamilyStructureTypeID"
requireNamespace("dplyr", quietly=TRUE) #hadley/dplyr
requireNamespace("lubridate")
requireNamespace("OuhscMunge",quietly=T) #OuhscBbmc/OuhscMunge
9
4
34
3
62
43
1
Time,Gender,Genetype,Treatment,MouseID,OR-Recognition Index,FC-t-F %,FC-b-F %,FC-a-F %
4M,Male,WILD,Control,c9-1,0.32,11.9,0,25.7
4M,Male,WILD,Control,c13-2,0.47,23.7,0,11.
4M,Male,WILD,Prozac,c10-2,0.62,40.7,11.4,51.4
4M,Male,WILD,Prozac,c14-3,0.63,10.2,0,28.6
4M,Male,YFP,Control,c9-2,0.42,42.4,11.4,22.9
4M,Male,YFP,Control,c13-1,0.5,15.3,0,54.1
4M,Male,YFP,Control,c13-nm,1,27.1,0,31.4
4M,Male,YFP,Prozac,c10-1,0.65,20.3,17.1,54.3
(In some cases, you'd have to parse only the cell, not the entire line. But this is good practice.)
Look for an email invitation to a REDCap survey.
Example 3
(,*\s*)"(\w+)"\s+=\s+"(\w+)" and
$1"$3" = "$2"
Example 4
library\((\w+),\s*quietly=(T|TRUE)\) and
library($1)
Example 5
\b(\d)\b and
0$1
Example 6
,c(\d{1,2})-(\d|nm),