PeterMac Data Science’s modified version of material by the University of Cambridge (Mark Dunning, Suraj Menon and Aiora Zabala. Original material by Robert Stojnić, Laurent Gatto, Rob Foy, John Davey, Dávid Molnár and Ian Roberts)

3. R for data analysis

3 steps to Basic Data Analysis

  • In this short section, we show how the data manipulation steps we have just seen can be used as part of an analysis pipeline:
  1. Reading in data
    • read.table()
    • read.csv(), read.delim()
  2. Analysis
    • Manipulating & reshaping the data
      • perhaps dealing with “missing data”
    • Any maths you like
    • Diagnostic Plots
  3. Writing out results
    • write.table()
    • write.csv()

A simple walkthrough

  • We have data from 100 patients who have given consent for their data to be used in future studies
  • A researcher wants to undertake a study involving people who are overweight
  • We will walk through how to filter the data and write a new file with the candidates for the study

The Working Directory (wd)

  • Like many programs, R has a concept of a working directory
  • It is the place where, by default, R will look for files to read or execute and where it will save files
  • For this course we need to set the working directory to the location of the course scripts
  • In RStudio use the mouse and browse to the directory where you saved the Course Materials

  • Session → Set Working Directory → Choose Directory…
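
The same change can be made from the console with setwd(). The lines below are a minimal sketch: tempdir() is only a stand-in for the folder holding the course materials (an assumption for illustration), so substitute your own path.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() is only a stand-in; use the folder holding the course materials
setwd(tempdir())
getwd()

# Go back to where we started
setwd(old_wd)
```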

0. Locate the data

Before we even start the analysis, we need to be sure of where the data are located on our hard drive

  • Functions that import data need a file location as a character vector
  • The default location is the working directory
getwd()
[1] "/home/mdoyle/RStudio/training/IntroR/r-intro"
  • If the file you want to read is in your working directory, you can just use the file name
list.files()
  • The file.exists function does exactly what it says on the tin!
    • a good sanity check for your code
file.exists("patient-info.txt")
  • Otherwise you need the path to the file
    • you can get this using file.choose()
  • If you are unsure about specifying a file path at the command line, this online tutorial will give you hands-on practice
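
As a rough sketch of building and checking a path by hand, the lines below use file.path() to join folder and file names. The folder name course-data is hypothetical and is created inside tempdir() purely for illustration.

```r
# Hypothetical folder name, created in a temporary location for illustration
data_dir <- file.path(tempdir(), "course-data")
dir.create(data_dir, showWarnings = FALSE)

# Build the full path in a platform-independent way
patient_file <- file.path(data_dir, "patient-info.txt")
patient_file

# Create an empty file at that location, then confirm that R can see it
file.create(patient_file)
file.exists(patient_file)
```

The resulting character string can be passed to any of the import functions in place of a bare file name.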

1. Read in the data

  • The data are a tab-delimited file. Each row is a record, each column is a field. Columns are separated by tabs in the text
  • We need to read in the data and assign the result to an object (patients)
patients <- read.delim("patient-info.txt")

In the latest RStudio, there is the option to import data directly from the File menu. File -> Import Dataset -> From Csv

  • If the data are comma-separated, then use either the argument sep="," or the function read.csv():
  • You need to make sure you use the correct function
    • can you explain the output of the following lines of code?
tmp <- read.csv("patient-info.txt")
head(tmp)
  • For the full list of arguments:
?read.table
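
To illustrate the two routes for comma-separated data mentioned above, here is a small sketch using a toy file written to a temporary location. The file contents and the names toy_csv, via_csv and via_table are made up for this example.

```r
# Toy comma-separated file; the values are made up for illustration
toy_csv <- tempfile(fileext = ".csv")
writeLines(c("ID,Height,Weight",
             "P1,170,65",
             "P2,182,90"), toy_csv)

# Two routes to the same data frame
via_csv   <- read.csv(toy_csv)
via_table <- read.table(toy_csv, header = TRUE, sep = ",")

via_csv
```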

1b. Check the data

  • Always check the object to make sure the contents and dimensions are as you expect
  • R will sometimes create the object without error, but the contents may be un-usable for analysis
    • If you specify an incorrect separator, R will not be able to locate the columns in your data, and you may end up with an object with just one column
# View the first 10 rows to ensure import is OK
patients[1:10,]  
  • or use the View() function to get a display of the data in RStudio:
View(patients)
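
The effect of a wrong separator is easy to reproduce on a toy file. The sketch below uses made-up values and hypothetical object names (toy_file, good, bad); it is not the course data.

```r
# Toy tab-delimited file; the values are made up for illustration
toy_file <- tempfile(fileext = ".txt")
writeLines(c("ID\tHeight\tWeight",
             "P1\t170\t65",
             "P2\t182\t90"), toy_file)

good <- read.delim(toy_file)   # tab separator: matches the file
bad  <- read.csv(toy_file)     # comma separator: does not match

dim(good)   # 2 rows, 3 columns
dim(bad)    # 2 rows, 1 column

# head() is a shorter way of asking for the first few rows
head(good, n = 10)

# Stop early if the import does not have the expected shape
stopifnot(ncol(good) == 3)
```

A check such as stopifnot() near the top of a script means a bad import is caught before any analysis is run on it.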

1c. Understanding the object

  • Once we have read the data successfully, we can start to interact with it
  • The object we have created is a data frame:
class(patients)
  • We can query the dimensions:
ncol(patients)
nrow(patients)
dim(patients)
  • We can also examine the type of data in the frame:
str(patients)
  • The names of the columns are automatically assigned:
colnames(patients)
  • We can use any of these names to access a particular column:
    • and create a vector
    • TOP TIP: type the name of the object and hit TAB: you can select the column from the drop-down list!
patients$ID
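
As a small illustration of column access, the sketch below builds a toy data frame (made-up values, standing in for patients) and extracts one column in three equivalent ways.

```r
# Toy data frame standing in for 'patients'; the values are made up
toy <- data.frame(ID = c("P1", "P2", "P3"),
                  Height = c(170, 182, 165))

# Three equivalent ways to pull out one column as a vector
toy$Height
toy[["Height"]]
toy[, "Height"]

# The result is a plain vector, so vector functions apply
is.vector(toy$Height)
max(toy$Height)
```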

Word of warning

Like families, tidy datasets are all alike but every messy dataset is messy in its own way - (Hadley Wickham - RStudio chief scientist and author of dplyr, ggplot2 and others)

You will make your life a lot easier if you keep your data tidy and organised. Before blaming R, consider if your data are in a suitable form for analysis. The more manual manipulation you have done on the data (highlighting, formulas, copy-and-pasting), the less happy R is going to be to read it. Here are some useful links on some common pitfalls and how to avoid them:

  • http://www.datacarpentry.org/spreadsheet-ecology-lesson/
  • http://kbroman.org/dataorg/

Handling missing values

  • The data frame contains some NA values, which means the values are missing – a common occurrence in real data collection
  • NA is a special value that can be present in objects of any type (logical, character, numeric etc)
  • NA is not the same as NULL:
    • NULL is an empty R object.
    • NA is one missing value within an R object (like a data frame or a vector)
  • Often R functions will handle NAs gracefully, although the result may itself be NA:
length(patients$Height)
mean(patients$Height)
  • However, sometimes we have to tell the functions what to do with them.
  • R has some built-in functions for dealing with NAs, and functions often have their own arguments (like na.rm) for handling them:
    • annoyingly, different functions have different argument names to change their behaviour with regard to NA values. Always check the documentation
mean(patients$Height, na.rm = TRUE)

mean(na.omit(patients$Height))
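
The behaviour described above can be tried on a short vector. This is a sketch with made-up heights, not the course data; is.na() is the base R function for locating missing values.

```r
# Toy heights with one missing value; the numbers are made up
heights <- c(170, NA, 182, 165)

length(heights)        # NA still counts as an element: 4
is.na(heights)         # logical vector marking the missing entry
sum(is.na(heights))    # number of missing values: 1

mean(heights)          # NA, because one of the inputs is unknown
mean(heights, na.rm = TRUE)

# NULL, by contrast, is an empty object with no elements at all
length(NULL)           # 0
length(NA)             # 1
```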

2. Analysis (reshaping data and maths)

  • Our analysis involves identifying patients with extreme BMI
    • we will define this as being more than two standard deviations above the mean
# Create an index of results:
BMI <- (patients$Weight)/((patients$Height/100)^2)
upper.limit <- mean(BMI,na.rm = TRUE) + 2*sd(BMI,na.rm = TRUE)
upper.limit
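
To make the arithmetic concrete, here is the same calculation on four made-up patients. It assumes, as the division by 100 above suggests, that heights are recorded in centimetres and weights in kilograms; the names height_cm, weight_kg, bmi and cutoff are used only in this sketch.

```r
# Toy data: heights in cm and weights in kg (made-up values)
height_cm <- c(170, 180, 160, 175)
weight_kg <- c(65, 81, 50, 120)

# BMI = weight (kg) / height (m)^2, so convert cm to m first
bmi <- weight_kg / (height_cm / 100)^2
round(bmi, 1)

# Cut-off: mean plus two standard deviations
cutoff <- mean(bmi, na.rm = TRUE) + 2 * sd(bmi, na.rm = TRUE)
cutoff
```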
  • We can plot a simple chart of the BMI values
    • add a horizontal line to indicate the cut-off
    • plotting will be covered in detail shortly
plot(BMI)
# Add a horizontal line:
abline(h=upper.limit) 
  • It is also useful to save the variable we have computed as a new column in the data frame
round(BMI,1)
patients$BMI <- round(BMI,1)
head(patients)
  • To actually select the candidates, we can use a logical expression to test whether each value in the BMI vector is greater than the upper limit
    • if the second line looks a bit weird, remember that <- is doing an assignment. The value we are assigning to our new variable is the logical (TRUE or FALSE) vector given by testing each item in BMI against the upper.limit
BMI > upper.limit
candidates <- BMI > upper.limit

We have seen that a logical vector can be used to subset a data frame

  • However, in our case the result looks a bit funny
  • Can you think why this might be?
patients[candidates,]

The which function will take a logical vector and return the indices of the TRUE values

  • This can then be used to subset the data frame
which(BMI > upper.limit)
candidates <- which(BMI > upper.limit)
  • Let's try again:
patients[candidates,]
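
The difference between the two approaches comes down to missing values. The sketch below reproduces it on a toy data frame with one missing BMI; the values and the cut-off of 30 are arbitrary choices for illustration.

```r
# Toy data frame with one missing BMI; the values are made up
toy <- data.frame(ID = c("P1", "P2", "P3", "P4"),
                  BMI = c(22.5, NA, 39.2, 19.5))
limit <- 30   # arbitrary cut-off for the illustration

# The comparison gives NA where BMI is missing
is_high <- toy$BMI > limit
is_high

# Subsetting with an NA in the index produces a row filled with NA
toy[is_high, ]

# which() keeps only the positions that are TRUE and drops the NA
toy[which(is_high), ]
```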

3. Outputting the results

  • We write out a data frame of candidates (patients with BMI more than two standard deviations above the mean) as a ‘comma separated values’ text file (CSV):
write.csv(patients[candidates,], file="selectedSamples.csv")
  • The output file is directly readable by Excel
  • It’s often helpful to double check where the data has been saved. Use the get working directory function:
getwd()      # print working directory
list.files() # list files in working directory
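
One way to double check the output is to read the file straight back in. The sketch below uses a toy table and a temporary file (both made up for illustration); row.names = FALSE is an optional write.csv() argument that leaves out the extra column of row numbers.

```r
# Toy results table; the values are made up
toy <- data.frame(ID = c("P3", "P7"), BMI = c(39.2, 41.0))

# tempfile() is a stand-in for a real output path
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(toy, file = out_file, row.names = FALSE)

# Read the file back to confirm it contains what we expect
check <- read.csv(out_file)
check
```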

To recap, the set of R commands we have used is:

patients <- read.delim("patient-info.txt")
BMI <- (patients$Weight)/((patients$Height/100)^2)
upper.limit <- mean(BMI,na.rm = TRUE) + 2*sd(BMI,na.rm = TRUE)
plot(BMI)
# Add a horizontal line:
abline(h=upper.limit) 
patients$BMI <- round(BMI,1)
candidates <- which(BMI > upper.limit)
write.csv(patients[candidates,], file="selectedSamples.csv")

Exercise 3

  • A separate study is looking for patients who are underweight and also smoke
  • Modify the condition in our previous code to find these patients
  • e.g. having a BMI that is more than 2 standard deviations below the mean BMI
  • Write out a results file of the samples that match these criteria, and open it in a spreadsheet program
### Your Answer Here ### 