3. R for data analysis

3 steps to Basic Data Analysis

  • In this short section, we show how the data manipulation steps we have just seen can be used as part of an analysis pipeline:
  1. Reading in data
    • read.table()
    • read.csv(), read.delim()
  2. Analysis
    • Manipulating & reshaping the data
      • perhaps dealing with “missing data”
    • Any maths you like
    • Diagnostic Plots
  3. Writing out results
    • write.table()
    • write.csv()

A simple walkthrough

  • We have data from 100 patients who have given consent for their data to be used in future studies
  • A researcher wants to undertake a study involving people who are overweight
  • We will walk through how to filter the data and write a new file with the candidates for the study

The Working Directory (wd)

  • Like many programs, R has a concept of a working directory
  • By default, it is the place where R will look for files and where it will save files
  • For this course we need to set the working directory to the location of the course scripts
  • In RStudio use the mouse and browse to the directory where you saved the Course Materials

  • Session → Set Working Directory → Choose Directory…
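The same change can be made from code with setwd(), which is handy inside scripts. The lines below are only a sketch: tempdir() stands in for the course folder, and the variable names old.wd and course.dir are made up for illustration.

```r
# Remember the current working directory so it can be restored afterwards
old.wd <- getwd()

# tempdir() is only a stand-in; use the path to your Course Materials instead
course.dir <- tempdir()
setwd(course.dir)

# Confirm the change took effect (normalizePath() resolves symbolic links)
wd.changed <- normalizePath(getwd()) == normalizePath(course.dir)
wd.changed

# Go back to where we started
setwd(old.wd)
```

Restoring the previous directory at the end keeps the rest of the session unaffected.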

0. Locate the data

Before we even start the analysis, we need to be sure of where the data are located on our hard drive

  • Functions that import data need a file location as a character vector
  • The default location is the working directory
getwd()
[1] "/Users/dunnin01/work/git/r-intro"
  • If the file you want to read is in your working directory, you can just use the file name
list.files()
  • The file.exists function does exactly what it says on the tin!
    • a good sanity check for your code
file.exists("patient-info.txt")
[1] TRUE
  • Otherwise you need the path to the file
    • you can get this using file.choose()
  • If you are unsure about specifying a file path at the command line, this online tutorial will give you hands-on practice
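When the file is not in the working directory, one option is to build the path with file.path() and check it before reading. This is a minimal sketch under stated assumptions: the folder course-data and the file example.txt are hypothetical and are created in a temporary location so the code runs anywhere.

```r
# Hypothetical folder and file, created here only so the example is runnable
data.dir <- file.path(tempdir(), "course-data")
dir.create(data.dir, showWarnings = FALSE)
data.file <- file.path(data.dir, "example.txt")
writeLines(c("ID\tHeight", "P1\t170"), data.file)

# file.path() joins the pieces with the correct separator for your system
data.file

# Check before reading, and stop with a clear message if the file is missing
if (!file.exists(data.file)) {
  stop("Cannot find file: ", data.file)
}
example <- read.delim(data.file)
example
```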

1. Read in the data

  • The data are a tab-delimited file. Each row is a record, each column is a field. Columns are separated by tabs in the text
  • We need to read in the data and assign the result to an object (patients)
patients <- read.delim("patient-info.txt")

In recent versions of RStudio, there is also the option to import data directly from the File menu: File → Import Dataset → From Csv

  • If the data are comma-separated, then use either read.table() with the argument sep="," or the function read.csv()
  • You need to make sure you use the correct function
    • can you explain the output of the following lines of code?
tmp <- read.csv("patient-info.txt")
head(tmp)
  • For the full list of arguments:
?read.table

1b. Check the data

  • Always check the object to make sure the contents and dimensions are as you expect
  • R will sometimes create the object without error, but the contents may be unusable for analysis
    • If you specify an incorrect separator, R will not be able to locate the columns in your data, and you may end up with an object with just one column
# View the first 10 rows to ensure import is OK
patients[1:10,]  
  • or use the View() function to get a display of the data in RStudio:
View(patients)
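The effect of a wrong separator is easy to see on a small file. The sketch below writes a made-up three-column, tab-delimited file to a temporary location and reads it with both functions; the object names right and wrong are illustrative only.

```r
# A tiny tab-delimited file, written to a temporary location for illustration
tab.file <- tempfile(fileext = ".txt")
writeLines(c("ID\tHeight\tWeight",
             "P1\t170\t65",
             "P2\t182\t80"), tab.file)

right <- read.delim(tab.file)  # expects tabs: columns are split correctly
wrong <- read.csv(tab.file)    # expects commas: finds none

dim(right)  # 2 rows, 3 columns
dim(wrong)  # 2 rows, 1 column

# A simple guard you could add after any import
stopifnot(ncol(right) == 3)
```

Both calls finish without an error, which is why checking the dimensions is worth the extra line.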

1c. Understanding the object

  • Once we have read the data successfully, we can start to interact with it
  • The object we have created is a data frame:
class(patients)
[1] "data.frame"
  • We can query the dimensions:
ncol(patients)
[1] 10
nrow(patients)
[1] 100
dim(patients)
[1] 100  10
  • The names of the columns are taken automatically from the header line of the file:
colnames(patients)
 [1] "ID"     "Race"   "Sex"    "Smokes" "Height" "Weight" "State"  "Pet"    "Grade"  "Age"   
  • We can use any of these names to access a particular column:
    • and create a vector
    • TOP TIP: type the name of the object followed by $ and hit TAB: you can select the column from the drop-down list!
patients$ID
  [1] AC/AH/001 AC/AH/017 AC/AH/020 AC/AH/022 AC/AH/029 AC/AH/033 AC/AH/037 AC/AH/044 AC/AH/045 AC/AH/048 AC/AH/049 AC/AH/050
 [13] AC/AH/052 AC/AH/053 AC/AH/057 AC/AH/061 AC/AH/063 AC/AH/076 AC/AH/077 AC/AH/086 AC/AH/089 AC/AH/100 AC/AH/104 AC/AH/112
 [25] AC/AH/113 AC/AH/114 AC/AH/115 AC/AH/127 AC/AH/133 AC/AH/150 AC/AH/154 AC/AH/156 AC/AH/159 AC/AH/160 AC/AH/164 AC/AH/171
 [37] AC/AH/176 AC/AH/180 AC/AH/185 AC/AH/186 AC/AH/192 AC/AH/198 AC/AH/207 AC/AH/208 AC/AH/210 AC/AH/211 AC/AH/213 AC/AH/219
 [49] AC/AH/220 AC/AH/221 AC/AH/225 AC/AH/233 AC/AH/241 AC/AH/244 AC/AH/248 AC/AH/249 AC/SG/002 AC/SG/003 AC/SG/008 AC/SG/009
 [61] AC/SG/010 AC/SG/015 AC/SG/016 AC/SG/046 AC/SG/055 AC/SG/056 AC/SG/064 AC/SG/065 AC/SG/067 AC/SG/068 AC/SG/072 AC/SG/074
 [73] AC/SG/084 AC/SG/095 AC/SG/099 AC/SG/101 AC/SG/107 AC/SG/116 AC/SG/121 AC/SG/122 AC/SG/123 AC/SG/134 AC/SG/139 AC/SG/142
 [85] AC/SG/155 AC/SG/165 AC/SG/167 AC/SG/172 AC/SG/173 AC/SG/179 AC/SG/181 AC/SG/182 AC/SG/191 AC/SG/193 AC/SG/194 AC/SG/197
 [97] AC/SG/204 AC/SG/216 AC/SG/217 AC/SG/234
100 Levels: AC/AH/001 AC/AH/017 AC/AH/020 AC/AH/022 AC/AH/029 AC/AH/033 AC/AH/037 AC/AH/044 AC/AH/045 AC/AH/048 ... AC/SG/234
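As a quick illustration of inspecting a data frame and extracting a column, here is a sketch on a toy data frame. The column names mimic the course data but the values are made up.

```r
# Toy data frame; values are invented for illustration
toy <- data.frame(ID = c("P1", "P2", "P3"),
                  Height = c(170, 182, 165),
                  Smokes = c(FALSE, TRUE, FALSE))

# One line per column: name, type and the first few values
str(toy)

# Three equivalent ways to pull out the Height column as a vector
toy$Height
toy[["Height"]]
toy[, "Height"]

same <- identical(toy$Height, toy[["Height"]]) &&
  identical(toy$Height, toy[, "Height"])
same
```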

Word of warning

Like families, tidy datasets are all alike but every messy dataset is messy in its own way - (Hadley Wickham - RStudio chief scientist and author of dplyr, ggplot2 and others)

You will make your life a lot easier if you keep your data tidy and organised. Before blaming R, consider if your data are in a suitable form for analysis. The more manual manipulation you have done on the data (highlighting, formulas, copy-and-pasting), the less happy R is going to be to read it. Here are some useful links on some common pitfalls and how to avoid them

Handling missing values

  • The data frame contains some NA values, which means the values are missing – a common occurrence in real data collection
  • NA is a special value that can be present in objects of any type (logical, character, numeric etc)
  • NA is not the same as NULL:
    • NULL is an empty R object.
    • NA is one missing value within an R object (like a data frame or a vector)
  • Often R functions will handle NAs without raising an error, although the result may itself be NA:
length(patients$Height)
[1] 100
mean(patients$Height)
[1] NA
  • However, sometimes we have to tell the functions what to do with them.
  • R has some built-in functions for dealing with NAs, and functions often have their own arguments (like na.rm) for handling them:
    • annoyingly, different functions have different argument names to change their behaviour with regard to NA values. Always check the documentation
mean(patients$Height, na.rm = TRUE)
[1] 167.4969
mean(na.omit(patients$Height))
[1] 167.4969
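To see where the missing values are and how many there are, is.na() is a useful companion to na.rm. The sketch below uses made-up vectors rather than the patient data; cor() is included as one example of a function that uses a differently named argument (use) for the same purpose.

```r
# Made-up heights with two missing values
height <- c(170, NA, 182, 165, NA)

is.na(height)                    # TRUE where the value is missing
n.missing <- sum(is.na(height))  # TRUE counts as 1, so this counts the NAs
n.missing

mean(height)                # NA: the missing values propagate
mean(height, na.rm = TRUE)  # mean of the three observed values

# cor() has no na.rm argument; it uses 'use' instead
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)
r <- cor(x, y, use = "complete.obs")
r
```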

2. Analysis (reshaping data and maths)

  • Our analysis involves identifying patients with extreme BMI
    • we will define this as being more than two standard deviations from the mean
# Compute BMI (weight / height in metres, squared) and the upper cut-off:
BMI <- (patients$Weight)/((patients$Height/100)^2)
upper.limit <- mean(BMI,na.rm = TRUE) + 2*sd(BMI,na.rm = TRUE)
upper.limit
[1] 30.9533
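The calculation can be checked by hand on a few values. This sketch assumes weight is recorded in kilograms and height in centimetres (hence the division by 100); the numbers are made up and are not from the patient file.

```r
# Made-up measurements: weight in kg, height in cm (units are an assumption)
weight <- c(70, 95, 54)
height <- c(175, 180, 168)

# Convert height to metres, square it, and divide weight by the result
bmi <- weight / (height / 100)^2
round(bmi, 1)

# The cut-off is the mean plus two standard deviations
cutoff <- mean(bmi) + 2 * sd(bmi)
cutoff
```

Because R works on whole vectors, the same line computes the BMI for every patient at once.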
  • We can plot a simple chart of the BMI values
    • add a horizontal line to indicate the cut-off
    • plotting will be covered in detail shortly
plot(BMI)
# Add a horizontal line:
abline(h=upper.limit) 

  • It is also useful to save the variable we have computed as a new column in the data frame
round(BMI,1)
  [1] 22.9 25.1 26.4 30.6 26.5 27.9 26.3 25.6 23.4 28.2 28.2   NA 30.0 27.9 24.5 22.0 25.6 31.5 23.8   NA 23.5 26.7 31.4   NA
 [25] 24.6   NA 24.8 29.2   NA 24.1 25.1 28.0 29.4 28.2 23.6 26.4   NA 25.0 27.7 27.0 25.6 26.7 24.5 26.1 23.1 28.2 26.9   NA
 [49] 25.4 25.9   NA 24.8 28.2   NA 30.4 26.8 26.0 25.2 26.9 31.7 25.6   NA 26.7 27.8 28.4   NA 31.5 27.0 30.0 26.5 25.2   NA
 [73] 26.7 25.8   NA 27.6 29.1 26.6 26.6 26.9 27.6 26.4 27.8   NA 27.8 25.8 27.7 28.7 24.2 24.6 28.3 24.8 27.8 21.4 28.0 26.0
 [97] 26.2 26.4 27.7   NA
patients$BMI <- round(BMI,1)
head(patients)
  • To actually select the candidates we can use a logical expression to test the values of the BMI vector being greater than the upper limit
    • if the second line looks a bit weird, remember that <- is doing an assignment. The value we are assigning to our new variable is the logical (TRUE or FALSE) vector given by testing each item in BMI against the upper.limit
BMI > upper.limit
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE    NA FALSE FALSE FALSE FALSE FALSE  TRUE FALSE    NA
 [21] FALSE FALSE  TRUE    NA FALSE    NA FALSE FALSE    NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE    NA FALSE FALSE FALSE
 [41] FALSE FALSE FALSE FALSE FALSE FALSE FALSE    NA FALSE FALSE    NA FALSE FALSE    NA FALSE FALSE FALSE FALSE FALSE  TRUE
 [61] FALSE    NA FALSE FALSE FALSE    NA  TRUE FALSE FALSE FALSE FALSE    NA FALSE FALSE    NA FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE    NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE    NA
candidates <- BMI > upper.limit

We have seen that a logical vector can be used to subset a data frame

  • However, in our case the result looks a bit funny
  • Can you think why this might be?
patients[candidates,]

The which function will take a logical vector and return the indices of the TRUE values

  • This can then be used to subset the data frame
which(BMI > upper.limit)
[1] 18 23 60 67
candidates <- which(BMI > upper.limit)
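The difference between the two approaches shows up clearly on a toy example. In this sketch the data frame, the threshold of 30 and the names keep, n.logical and n.which are all made up for illustration.

```r
# Toy data with one missing BMI value
toy <- data.frame(ID = c("P1", "P2", "P3", "P4"),
                  BMI = c(22.5, NA, 31.8, 33.0))

# Comparing NA with a number gives NA, not FALSE
keep <- toy$BMI > 30
keep

toy[keep, ]         # the NA produces an extra row filled with NA
toy[which(keep), ]  # which() ignores the NA and returns rows 3 and 4 only

n.logical <- nrow(toy[keep, ])
n.which <- nrow(toy[which(keep), ])
c(n.logical, n.which)
```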

3. Outputting the results

  • We write out a data frame of candidates (patients with BMI more than two standard deviations above the mean) as a ‘comma separated values’ text file (CSV):
write.csv(patients[candidates,], file="selectedSamples.csv")
  • The output file is directly readable by Excel
  • It’s often helpful to double check where the data have been saved. Use the get working directory function:
getwd()      # print working directory
list.files() # list files in working directory

To recap, the set of R commands we have used is:

patients <- read.delim("patient-info.txt")
BMI <- (patients$Weight)/((patients$Height/100)^2)
upper.limit <- mean(BMI,na.rm = TRUE) + 2*sd(BMI,na.rm = TRUE)
plot(BMI)
# Add a horizontal line:
abline(h=upper.limit) 

patients$BMI <- round(BMI,1)
candidates <- which(BMI > upper.limit)
write.csv(patients[candidates,], file="selectedSamples.csv")

Exercise 3

  • A separate study is looking for patients who are underweight and also smoke:
  • Modify the condition in our previous code to find these patients
  • e.g. having a BMI that is more than 2 standard deviations below the mean BMI
  • Write out a results file of the samples that match these criteria, and open it in a spreadsheet program
### Your Answer Here ### 