Programming with data using R

To perform data analysis, we need a tool to help us.

R is such a tool: it was created for “Programming with Data”, and it will be covered in this lesson.

We will require each student to learn a lot on his or her own, as it is not possible to exhaustively cover every function or operation used in our examples. As you will see, the number of resources for learning R is practically endless (just type “introduction to R” in Google), which leads to a tyranny of choice and decision fatigue. We recommend just a few resources:

Rintro Rmanip

One strategy (for learning a new topic in general) is to skim the contents of many resources and identify:

We will present a few examples to initiate a user generally familiar with a limited amount of programming. Basic classifications to keep in mind.

Important “data types” (objects):

Everything in R is technically a “vector” with different or additional attributes such as storage mode, dimension, etc.

Important classes of operations:

Important features when working with real data:

Overview

One of R's core strengths is its collection of user-contributed libraries. You will find that R permits many ways to do the same thing. Find the way that works best for you (and your colleagues), and stick with it.

R for the MATLAB user

Superficial differences:

  • Comment character is # rather than %.
  • Arrows <- are assignment operators. (Can also use equal sign in most places.)
  • *, /, ^ operate element-wise (like .*, ./, .^ in MATLAB).
  • Functions do not have to be defined in separate files.
  • Most operations are performed by calling a function on arguments. Function calls are very forgiving, and arguments can be specified by 1) the order in which they are provided or 2) full or partial matching of argument names.

    > divide <- function(numerator,denominator) numerator / denominator
    > divide(1,2)
    [1] 0.5
    > divide(denominator=1,numerator=2)
    [1] 2
    > divide(d=1,2)
    [1] 2
  • Same control structures exist—if, for, while, etc.—but braces {} denote extent of expressions, rather than end statements.
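
As a small illustration of the element-wise operators listed above, here is a sketch with made-up values (the names a, b, and A are only for this example). The %*% operator for the matrix product is not mentioned in the list above; it is included here because MATLAB users will look for it.

```r
# Element-wise arithmetic on vectors (no dot prefix needed, unlike MATLAB)
a <- c(1, 2, 3)
b <- c(4, 5, 6)
a * b          # element-wise product
a ^ 2          # element-wise power

# For matrices, * is still element-wise; %*% is the matrix product
A <- matrix(1:4, ncol=2)
A * A          # element-wise
A %*% A        # matrix product
```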

Check out:

Basics

For the rest of the examples, we will assume the following libraries have been imported and options set:

library(dplyr)
library(reshape2)
library(chron)
library(ggplot2)
source("GRB001.R")
## Loading required package: outliers
Sys.setlocale("LC_TIME","C")
options(stringsAsFactors=FALSE)
options(chron.year.abb=FALSE)
theme_set(theme_bw()) # just my preference for plots

These libraries should be present on the machines in room GRB001. On your own machine, you can install necessary packages from the web:

install.packages("packagename",repos="http://stat.ethz.ch/CRAN/")

or, for multiple packages at once,

install.packages(c("packagename1","packagename2"),repos="http://stat.ethz.ch/CRAN/")

and so on.

In the assignment of values to symbols/variables, <- or = can be used. The former method is recommended.

```r
x <- 1   # scalar (a vector of length 1)
x = 1
x <- 1:5 # vector
x = 1:5
```

Functions accept a set of arguments (inputs) and return a value (output). Even binary operators can be written as a function in prefix notation – e.g., x+1 is equivalent to `+`(x,1).
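
A minimal sketch of the prefix notation, using a throwaway vector p (a name chosen only for this example). It also shows that an operator, being a function, can be handed to another function such as sapply().

```r
p <- 1:3
p + 1                          # usual infix form
`+`(p, 1)                      # the same call written in prefix form
identical(p + 1, `+`(p, 1))    # both forms give the same result
sapply(p, `+`, 1)              # an operator passed as a function argument
```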

Variables (data types, objects)

Variables are symbols (e.g., x, y) that represent a set of values. These values can take on one of several types.

  • simple (“atomic”) data types: logical (Boolean), numeric (integer, real/double), complex, character (string), and raw
  • complex structures can be composed of atomic data types: list, data.frame, factors

You can convert among data types with as: as.data.frame, as.list, as.numeric, as.character, and so on.
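
A few conversions, sketched with made-up values (num_strings is a name used only here):

```r
num_strings <- c("1", "2.5", "10")
as.numeric(num_strings)           # character -> numeric
as.character(1:3)                 # integer -> character
as.logical(c("TRUE", "false"))    # character -> logical
as.list(1:2)                      # vector -> list with one element per value
```

Values that cannot be converted (e.g., as.numeric("abc")) become NA, with a warning.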

Lists are like cell arrays or structures in MATLAB.

As with MATLAB, R has vectorized operations. Vectorized operations are applicable for atomic data types, but not for “list”. Example:

x <- 1:5
y <- x+1
print(y)
## [1] 2 3 4 5 6
mode(y)
## [1] "numeric"
typeof(y)
## [1] "double"

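You can check the type of an object with mode() or typeof(). typeof() reports the internal storage type (e.g., "double"), whereas mode() groups integer and double together as "numeric".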

Apart from the mode, objects can have dimensionality: 1-D (vector), 2-D (matrix), N-D (array).
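
The following sketch shows the same 24 values arranged in 1, 2, and 3 dimensions. The object names (vec1, mat2, arr3) are chosen for this example only.

```r
vec1 <- 1:24
dim(vec1)                              # NULL: a plain vector has no dim attribute
mat2 <- matrix(vec1, nrow=4)           # 2-D: 4 rows, 6 columns
arr3 <- array(vec1, dim=c(2, 3, 4))    # 3-D array
dim(mat2)
dim(arr3)
arr3[2, 3, 4]                          # last element of the array
```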

Special data types:

  • factor (categorical variable) (1-D)
  • data frame (relational table) (2-D)

Creating instances of vectors, matrices, lists, etc.

c concatenates elements and is often used to construct a vector.

(char <- c("ab","d"))      # concatenate elements (character)
## [1] "ab" "d"
(v <- c(1,3,5))           # concatenate elements (numeric)
## [1] 1 3 5

Other objects are created by functions that are named after the object.

(m <- matrix(1:6,ncol=2)) # create a matrix
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
(l <- list(1,v,m))        # define a list
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 1 3 5
## 
## [[3]]
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

In contrast with MATLAB, note that a single element of a character vector in R can contain multiple letters.

Objects

An object defines both 1) the data type and 2) the operations that are allowed on it, and is labeled by a “class”.

Check the class of an R object by class(), and the functions allowed with methods():

x <- 1
print(class(x))
## [1] "numeric"
methods(class=class(x))
## [1] all.equal     as.data.frame as.Date       as.POSIXct    as.POSIXlt   
## [6] as.raster     coerce        Ops           recode       
## see '?methods' for accessing help and source code

Alternatively, you can query what type of objects a function can operate on. For instance, we have a function called mean():

methods(mean)
## [1] mean.Date     mean.default  mean.difftime mean.POSIXct  mean.POSIXlt 
## [6] mean.times*  
## see '?methods' for accessing help and source code

In R, all variables are objects and all operations are functions. All functions are also objects.
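
Because functions are objects, they can be stored in a list and passed around like any other value. A small sketch (stat_fns and its element names are invented for this example):

```r
class(sum)                                  # functions have a class, too
stat_fns <- list(total=sum, average=mean, largest=max)
res <- sapply(stat_fns, function(f) f(c(2, 4, 9)))   # call each stored function
print(res)
```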

Labeling elements

Elements can have optional labels.

Vectors and lists can have names.

(v <- c(a=1, b=3, c=5))
## a b c 
## 1 3 5
v["a"]
## a 
## 1
names(l) <- names(v)
print(l)
## $a
## [1] 1
## 
## $b
## [1] 1 3 5
## 
## $c
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

Matrices can have column and row names.

colnames(m) <- letters[1:ncol(m)]
rownames(m) <- letters[1:nrow(m)]
print(m)
##   a b
## a 1 4
## b 2 5
## c 3 6

Text processing

Processing strings of characters is a relevant part of data analysis.

Manipulate strings:

date <- "2012.03.01"
strsplit(date, ".", fixed=TRUE)
## [[1]]
## [1] "2012" "03"   "01"
paste("2012", "03", "01", sep=".")
## [1] "2012.03.01"
sprintf("%d.%02d.%02d",2012, 3, 1)
## [1] "2012.03.01"

Search for strings:

dates <- c("2012.03.01","2013.03.01")
grep("2012", dates, fixed=TRUE)
## [1] 1
grepl("2012", dates, fixed=TRUE)
## [1]  TRUE FALSE
grep("2012", dates, value=TRUE, fixed=TRUE)
## [1] "2012.03.01"

To delve deeper, you will eventually want to look into regular expressions.

pattern <- "([0-9]{4})\\.([0-9]{2})\\.([0-9]{2})"
sub(pattern, "\\1", dates)
## [1] "2012" "2013"
sub(pattern, "\\2", dates)
## [1] "03" "03"
sub(pattern, "\\3", dates)
## [1] "01" "01"

Useful example:

data <- read.table("data/2013/LAU.csv", skip=5, sep=";", header=TRUE, check.names=FALSE)
names(data)
## [1] "Date/time"          "O3 [\xb5g/m\xb3]"   "NO2 [\xb5g/m\xb3]" 
## [4] "CO [mg/m\xb3]"      "PM10 [\xb5g/m\xb3]" "TEMP [\xb0C]"      
## [7] "PREC [mm]"          "RAD [W/m\xb2]"

Note the non-ASCII characters in the column names (printed as escapes such as \xb5).

We can delete everything after whitespace:

sub("[ ].*$","",names(data))
## [1] "Date/time" "O3"        "NO2"       "CO"        "PM10"      "TEMP"     
## [7] "PREC"      "RAD"

With such functions, you can relabel your data table columns without assigning them manually (note that fixed=TRUE indicates that the pattern is a fixed string and not a regular expression).

names(data) <- sub("[ ].*$","",names(data))
names(data) <- sub("Date/time", "datetime", names(data), fixed=TRUE)

Some text processing functions:

paste()
substr(); substring(); nchar()
strsplit()
sub(); gsub()
grep()
regexpr(); gregexpr()
match(); pmatch(); %in%; `==`
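
A brief sketch of some of the functions listed above that have not appeared in the examples so far. The string stamp is made up for illustration.

```r
stamp <- "31.12.2012 01:00"
nchar(stamp)                              # number of characters
substr(stamp, 7, 10)                      # extract characters 7 to 10 (the year)
gsub(".", "-", stamp, fixed=TRUE)         # replace every "." (sub() replaces only the first)
match("b", c("a", "b", "c"))              # position of the first exact match
pmatch("med", c("mean", "median"))        # partial match, must be unambiguous
pmatch("me", c("mean", "median"))         # ambiguous, so NA
"b" %in% c("a", "b")                      # membership test
```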

Set operations

newdate1 <- "2013.03.01"
newdate2 <- "2013.03.02"
union(newdate2, dates)
## [1] "2013.03.02" "2012.03.01" "2013.03.01"
intersect(newdate2, dates)
## character(0)
intersect(newdate1, dates)
## [1] "2013.03.01"
setdiff(dates, newdate1)
## [1] "2012.03.01"

Extract/replace

You can refer to elements by position (sequence number) or by label.

Vectors:

v[2]
## b 
## 3
v["b"]
## b 
## 3
v["b"] <- 10
print(v)
##  a  b  c 
##  1 10  5

Matrices:

m[1,2]
## [1] 4
m[,"b"]               # select column
## a b c 
## 4 5 6
m[,"b"] <- c(0,10,20) # replace column
print(m)
##   a  b
## a 1  0
## b 2 10
## c 3 20

Lists:

l[2]
## $b
## [1] 1 3 5
l[[2]]
## [1] 1 3 5
l[["b"]]
## [1] 1 3 5
l[["b"]] <- 10
print(l)
## $a
## [1] 1
## 
## $b
## [1] 10
## 
## $c
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

You may also see the $ operator, which extracts or assigns a single element of a list or data frame by name.

l$b
## [1] 10
l$b <- "a"
l
## $a
## [1] 1
## 
## $b
## [1] "a"
## 
## $c
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

Add to the collection:

c(v,d=3)
##  a  b  c  d 
##  1 10  5  3

Remove from the collection:

v[-2]
## a c 
## 1 5
v[!names(v) %in% "b"]
## a c 
## 1 5

Modify the values

v[2] <- "2"
print(v) # now converted to character string
##   a   b   c 
## "1" "2" "5"
  • “The most important distinction between [, [[ and $ is that the [ can select more than one element, whereas the other two select a single element.”
  • 2-D, 3-D, N-D: See x[i,j,...,drop=FALSE] to retain other dimensions.
  • Also, see x[[i,exact=FALSE]] to enable partial matching with [[ (partial matching is enabled for $ by default).
  • Each extraction operator has a corresponding replacement method.
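
The points above can be sketched as follows. The objects mm and ll are invented for this example so as not to interfere with the objects used elsewhere.

```r
mm <- matrix(1:6, ncol=2, dimnames=list(NULL, c("a", "b")))
mm[, "a"]                    # dimensions are dropped: result is a plain vector
mm[, "a", drop=FALSE]        # dimensions are retained: result is a 3 x 1 matrix

ll <- list(alpha=1, beta="two")
ll[c("alpha", "beta")]       # [ can select several elements (returns a list)
ll$al                        # $ matches names partially
ll[["al"]]                   # [[ matches exactly by default, so this is NULL
ll[["al", exact=FALSE]]      # [[ with partial matching enabled
```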

Arithmetic operations

The usual arithmetic operators/functions:

+, -, *, /,                      # binary operators
sum(), prod(), cumsum(), diff(), # apply to vector

Other functions:

exp(), log(), log10(), ^
%%  #(modulo)
%/% #(integer division)
floor(), ceiling(), round(), signif()
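
A quick sketch of these functions on a made-up vector (vals is a name used only here):

```r
vals <- c(3, 1, 4, 1, 5)
sum(vals)          # total of all elements
prod(vals)         # product of all elements
cumsum(vals)       # running total
diff(vals)         # differences between consecutive elements

17 %% 5            # remainder after division
17 %/% 5           # integer part of the quotient
round(2.567, 1)    # round to 1 decimal place
signif(123456, 2)  # keep 2 significant digits
floor(-1.5)        # next lower integer
ceiling(-1.5)      # next higher integer
```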

Control structures

For loop (note that for this example, you could also use vectorized addition):

x <- 1:5
y <- numeric(length(x))
for(i in 1:length(x)) {
  y[i] <- x[i] + 1
}
print(y)
## [1] 2 3 4 5 6

If-else:

i <- 3
for(i in 1:3) {
  if(i==1) {
    x <- 1
  } else if(i==2) {
    x <- 2
  } else {
    x <- 3
  }
}
print(x)
## [1] 3

Within a loop, you can use two additional statements:

  break # break out of the loop
  next  # skip to the next iteration
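
A small sketch of how the two statements behave inside a loop (the loop variable k and the vector odd_squares are invented for this example):

```r
odd_squares <- c()
for(k in 1:10) {
  if(k %% 2 == 0) next       # even number: skip to the next iteration
  if(k > 7) break            # leave the loop entirely once k exceeds 7
  odd_squares <- c(odd_squares, k^2)
}
print(odd_squares)           # squares of 1, 3, 5, 7
```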

Logical values

Logical values: TRUE or FALSE.

operator    description
|, ||       or
&, &&       and
<, >        less/greater than
<=, >=      less/greater than or equal to
==, !=      equal/not equal to
%in%        is in {collection}
!           not
any()       is any TRUE
all()       are all TRUE
  • ||, && return a single value and short-circuit: the second expression is evaluated only if the first does not already determine the result.
  • |, & operate element-wise and return a vector.

Negate any expression by prefixing with !.
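
A short sketch of these operators on a made-up vector (temps is a name used only for this example):

```r
temps <- c(-2, 5, 12, 30)
temps > 0 & temps < 20               # element-wise: one result per element
any(temps < 0)                       # is at least one element negative?
all(temps < 40)                      # are all elements below 40?
temps[!(temps > 10)]                 # negation used for selection
length(temps) > 0 && temps[1] < 0    # single value, e.g. for use in if()
5 %in% temps                         # membership
```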

Missing values

Missing values are encoded as NA.

R has very sophisticated facilities for handling missing values. Test for missing values in a vector with the following functions:

is.na()
is.nan()
is.finite()

You can test for non-missing values with !is.na(x).

Many common functions provide an na.rm=TRUE argument. E.g., mean(x,na.rm=TRUE).

You can remove missing values with na.omit(x) but be careful as this can change the length of the vector x.
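
The following sketch pulls these functions together on a made-up vector (obs is a name used only here):

```r
obs <- c(1.5, NA, 3.5, NaN)
is.na(obs)               # TRUE for both NA and NaN
is.nan(obs)              # TRUE only for NaN
is.finite(obs)           # TRUE only for ordinary numbers
mean(obs)                # result is missing because of the missing values
mean(obs, na.rm=TRUE)    # missing values removed before averaging
obs[!is.na(obs)]         # keep non-missing values
length(na.omit(obs))     # shorter than length(obs)
```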

Anatomy of a function

Example:

Foo <- function(x) {
  y <- 1
  x + y + 2*z
}
  • x is a bound variable, and its value is determined by the argument passed upon function invocation.
  • y is a local variable, which is defined only within the context of the function.
  • z is a free variable, and its value is found in the environment in which the function was defined.

The value of the last expression (x + y + 2*z in this case) is returned from the function.

z <- 2
y <- 2
m <- 1
n <- Foo(m)
print(m)    # remains unchanged
## [1] 1
print(y)    # remains unchanged
## [1] 2
print(n)    # value that is returned
## [1] 6

Flexibility of R function invocation

Example function:

Bar <- function(first, second = 3) {
  first + 2*second
}

Note that default values for arguments can be provided in the function definition, in which case they become optional arguments for the user to specify.

Explore the possibilities:

Bar(1)
## [1] 7
Bar(1, 3)
## [1] 7
Bar(3, 1)
## [1] 5
Bar(second=3, first=1)
## [1] 7
Bar(s=3, 1)
## [1] 7

Special types

Categorical variables (factor class)

The factor class in R is useful for representing categorical variables, which are discrete variables with a defined set of possibilities.

sites <- factor(c("Lausanne","Zurich"),                # values
                 levels=c("Bern","Lausanne","Zurich")) # set from which values are drawn
sites
## [1] Lausanne Zurich  
## Levels: Bern Lausanne Zurich
unclass(sites)
## [1] 2 3
## attr(,"levels")
## [1] "Bern"     "Lausanne" "Zurich"
sites == "Lausanne"
## [1]  TRUE FALSE
sites[1] <- "Fribourg"                                 # not in defined set of possibilities
## Warning in `[<-.factor`(`*tmp*`, 1, value = "Fribourg"): invalid factor
## level, NA generated
methods(class="factor")
##  [1] [             [[            [[<-          [<-           all.equal    
##  [6] as.character  as.data.frame as.Date       as.list       as.logical   
## [11] as.POSIXlt    as.vector     coerce        droplevels    escape       
## [16] format        initialize    is.na<-       length<-      levels<-     
## [21] Math          Ops           plot          print         recode       
## [26] relevel       relist        rep           show          slotsFromS3  
## [31] summary       Summary       type_sum      xtfrm        
## see '?methods' for accessing help and source code

Caution: factors are integer at heart.

(vec <- c(four=4, five=5))
## four five 
##    4    5
(fac <- factor(c("four","five")))
## [1] four five
## Levels: five four
vec["four"]
## four 
##    4
fac[2]
## [1] five
## Levels: five four
vec[fac[2]]
## four 
##    4

This behavior occurs because:

unclass(fac)
## [1] 2 1
## attr(,"levels")
## [1] "five" "four"

so fac[2] is stored as the integer 1, and vec[fac[2]] is evaluated as vec[1].
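
One common way around this pitfall is to convert the factor to a character string before using it as an index. A minimal sketch, with names (lookup, key) chosen only for this example:

```r
lookup <- c(four=4, five=5)
key <- factor(c("four", "five"))
lookup[key[2]]                  # indexed by the integer code: returns element "four"
lookup[as.character(key[2])]    # indexed by the label: returns element "five"
```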

Data tables (the R “data frame”)

Note that each column can have a different data type.

(dtable <- data.frame(label=c("a","b"),value=c(1,2)))
##   label value
## 1     a     1
## 2     b     2
ColClasses(dtable)
##       label   value
## 1 character numeric

Common data frame operations are defined in base R, with improvements in speed or usability provided by the reshape2 and dplyr packages. Many of these will be further demonstrated in context throughout the rest of the course.

Operation             base R                      dplyr/reshape2
subset rows           [, subset()                 filter()
select columns        [, subset()                 select()
modify column values  [<-, transform()            mutate()
rename columns                                    rename()
join tables           merge(), rbind(), cbind()   {full/inner/left/right}_join()
pivot frame           stack(), unstack()          melt(), dcast()
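
As a rough sketch of the base R column of this table, the following uses only functions that ship with R. The tables tab and extra_tab and their column names are made up for illustration.

```r
tab <- data.frame(label=c("a", "b", "c"), value=c(1, 2, 3))
extra_tab <- data.frame(label=c("b", "c"), flag=c(TRUE, FALSE))

rows1 <- subset(tab, value > 1)          # subset rows with subset()
rows2 <- tab[tab[, "value"] > 1, ]       # subset rows with [
cols  <- subset(tab, select=label)       # select columns
tab2  <- transform(tab, twice=2*value)   # add/modify a column
joined <- merge(tab, extra_tab)          # join on the shared column 'label'
print(joined)
```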

Chaining operations

Apply a series of functions in sequence. Subset rows where label is equal to “b”, and then change the value to 3.

Note the following sequence of operations:

mutate(filter(dtable,label=="b"),value=3)
##   label value
## 1     b     3

We can accomplish the same operation with pipes:

dtable %>% filter(label=="b") %>% mutate(value=3)
##   label value
## 1     b     3

The %>% operator (the “pipe”) is an infix operator that passes the object on its left (here, a data frame) as the first argument of the function on its right.

Modifying tables

Return new data frame (original is unmodified):

(newtable <- dtable %>% mutate(value2=NA))
##   label value value2
## 1     a     1     NA
## 2     b     2     NA
print(dtable)
##   label value
## 1     a     1
## 2     b     2

“In place” (modify original data frame):

dtable[,"value2"] <- NA
print(dtable)
##   label value value2
## 1     a     1     NA
## 2     b     2     NA

Merging example

Merging tables is a powerful feature.

dtable[,"value2"] <- NULL # delete column

print(dtable)
##   label value
## 1     a     1
## 2     b     2
(dtable2 <- data.frame(label="b", value2=1))
##   label value2
## 1     b      1
inner_join(dtable, dtable2)
## Joining, by = "label"
##   label value value2
## 1     b     2      1
full_join(dtable, dtable2)
## Joining, by = "label"
##   label value value2
## 1     a     1     NA
## 2     b     2      1
(dtable3 <- data.frame(label="c",value=1))
##   label value
## 1     c     1
inner_join(dtable, dtable3)
## Joining, by = c("label", "value")
## [1] label value
## <0 rows> (or 0-length row.names)
full_join(dtable, dtable3)
## Joining, by = c("label", "value")
##   label value
## 1     a     1
## 2     b     2
## 3     c     1

Reshaping/pivoting

Let us revisit an example from Lesson 1:

data <- read.table("data/2013/LAU.csv", sep=";", skip=6,
  col.names=c("datetime","O3","NO2","CO","PM10","TEMP","PREC","RAD"))

Convert to “long” format:

library(reshape2)                     # makes melt() available
lf <- melt(data[1:2,],                # data table
           id.vars="datetime",        # columns to keep fixed (collapse/stack all other variables)
           variable.name="variable",  # name of new variable column
           value.name="value")        # name of new value column

Convert back to “wide” format:

wf <- dcast(lf,                       # data table
            datetime~variable,        # keep 'datetime' fixed; unstack variables in 'variable' column
            value.var="value")        # column from which values should be taken to fill new wide-format table
head(wf)
##           datetime   O3  NO2  CO PM10 TEMP PREC  RAD
## 1 31.12.2012 01:00  7.8 56.3 0.5 16.1  3.8    0 -2.4
## 2 31.12.2012 02:00 22.4 38.0 0.4 11.6  4.1    0 -2.3

Split, apply, combine

Based on what we’ve learned so far, we can define a function for reading a csv file from the NABEL network:

ReadTSeries <- function(filename, timecolumn="datetime", timeformat="%d.%m.%Y %H:%M") {
  data <- read.table(filename, skip=5, header=TRUE, sep=";", check.names=FALSE)
  names(data) <- sub("[ ].*$","",names(data))
  names(data) <- sub("Date/time", timecolumn, names(data), fixed=TRUE)
  data[,timecolumn] <- as.chron(data[,timecolumn], timeformat)
  data
}
data <- ReadTSeries("data/2013/LAU.csv")
## Warning in strptime(x, format, tz = tz): unknown timezone 'zone/tz/2018c.
## 1.0/zoneinfo/Europe/Zurich'

Add month column:

data[,"month"] <- months(data[,"datetime"])
head(data)
##                datetime   O3  NO2  CO PM10 TEMP PREC  RAD month
## 1 (12/31/2012 01:00:00)  7.8 56.3 0.5 16.1  3.8    0 -2.4   Dec
## 2 (12/31/2012 02:00:00) 22.4 38.0 0.4 11.6  4.1    0 -2.3   Dec
## 3 (12/31/2012 03:00:00) 14.5 37.2 0.3 10.3  3.1    0 -2.1   Dec
## 4 (12/31/2012 04:00:00) 28.7 25.4 0.3 10.5  3.5    0 -2.2   Dec
## 5 (12/31/2012 05:00:00) 19.6 33.7 0.3  9.0  2.9    0 -2.2   Dec
## 6 (12/31/2012 06:00:00) 30.8 51.2 0.3  8.7  3.2    0 -2.3   Dec

Return single value (or series of individually computed values): use summarize().

data %>%
  group_by(month) %>%
    summarize(mean=mean(O3,na.rm=TRUE),
              sd=sd(O3,na.rm=TRUE))
## # A tibble: 12 x 3
##    month     mean       sd
##    <ord>    <dbl>    <dbl>
##  1   Jan 22.62880 15.54421
##  2   Feb 40.58125 17.06185
##  3   Mar 35.02078 21.76502
##  4   Apr 50.78187 23.39744
##  5   May 51.85189 18.45804
##  6   Jun 59.61657 24.25277
##  7   Jul 76.19515 31.87907
##  8   Aug 66.49771 23.86214
##  9   Sep 43.20181 21.75406
## 10   Oct 24.74307 16.98329
## 11   Nov 25.85577 17.67496
## 12   Dec 18.66697 18.18140

Return table: use do().

Statsfn <- function(subtable) {
  O3 <- subtable[["O3"]] # to select a single column in this case, use [[]] rather than [,]
  data.frame(mean=mean(O3,na.rm=TRUE), sd=sd(O3,na.rm=TRUE))
}
data %>%
  group_by(month) %>%
    do(Statsfn(.))
## Source: local data frame [12 x 3]
## Groups: month [12]
## 
## # A tibble: 12 x 3
##    month     mean       sd
##    <ord>    <dbl>    <dbl>
##  1   Jan 22.62880 15.54421
##  2   Feb 40.58125 17.06185
##  3   Mar 35.02078 21.76502
##  4   Apr 50.78187 23.39744
##  5   May 51.85189 18.45804
##  6   Jun 59.61657 24.25277
##  7   Jul 76.19515 31.87907
##  8   Aug 66.49771 23.86214
##  9   Sep 43.20181 21.75406
## 10   Oct 24.74307 16.98329
## 11   Nov 25.85577 17.67496
## 12   Dec 18.66697 18.18140

Suggestions on how to use do():

  • Write a function that accepts an argument: a subtable (a subset of a data table split according to variables passed to group_by()) or its variables, and returns a different table. E.g., let us call this function Foo().
  • Pass the table to do() using this syntax: do(Foo(.)) or do(Foo(.[["column"]])) or do(Foo(.[,c("column1", "column2")])).
    • . represents the table (data frame) itself
    • Foo(.) or its variants above should return a data frame to be merged back with the other results
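
To make the split, apply, and combine steps explicit, here is a sketch of the same idea in base R with split(), lapply(), and do.call(rbind, ...). This is not the dplyr approach used above but an equivalent written without packages; the toy table and the function StatsToy() are invented for this example.

```r
toy <- data.frame(group=c("g1", "g1", "g2", "g2", "g2"),
                  O3=c(10, 20, NA, 30, 50))

# Function that accepts a subtable and returns a one-row data frame
StatsToy <- function(subtable) {
  O3 <- subtable[["O3"]]
  data.frame(group=subtable[["group"]][1],
             mean=mean(O3, na.rm=TRUE),
             n=sum(!is.na(O3)))
}

pieces   <- split(toy, toy[, "group"])   # split: one subtable per group
results  <- lapply(pieces, StatsToy)     # apply: one result table per subtable
combined <- do.call(rbind, results)      # combine: stack the results
print(combined)
```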

Key-value pairs

Use the labeling feature of R to implement a key-value pair structure (“lookup table”, “associative list”, “hash table”).

seasons <- c(
  Dec="DJF",
  Jan="DJF",
  Feb="DJF",
  Mar="MAM",
  Apr="MAM",
  May="MAM",
  Jun="JJA",
  Jul="JJA",
  Aug="JJA",
  Sep="SON",
  Oct="SON",
  Nov="SON"
  )
dframe <- data.frame(month=c("Jan","Feb","Nov"))
dframe[,"season"] <- seasons[dframe[,"month"]]
print(dframe)
##   month season
## 1   Jan    DJF
## 2   Feb    DJF
## 3   Nov    SON

The same thing can be accomplished by merging data frames.

dframe[,"season"] <- NULL
seasons.df <- data.frame(month=names(seasons), season=seasons)
print(seasons.df)
##     month season
## Dec   Dec    DJF
## Jan   Jan    DJF
## Feb   Feb    DJF
## Mar   Mar    MAM
## Apr   Apr    MAM
## May   May    MAM
## Jun   Jun    JJA
## Jul   Jul    JJA
## Aug   Aug    JJA
## Sep   Sep    SON
## Oct   Oct    SON
## Nov   Nov    SON
dframe <- inner_join(dframe, seasons.df)
## Joining, by = "month"
print(dframe)
##   month season
## 1   Jan    DJF
## 2   Feb    DJF
## 3   Nov    SON

Interacting with files on your hard drive (I/O) and objects in working memory

Shell operations

Determine and set working directory:

  getwd()
  setwd("path/to/directory")

The working directory determines where you read files from and write files to; all relative path names are interpreted relative to this location.

Shell functions:

list.files()
file.copy()
file.rename()
file.remove()
dir.create()
file.info()
basename()
dirname()
...
system()

Reading/writing text files:

scan()
read.table()
write.table()
...

To save R objects:

save(); load()
saveRDS(); readRDS()
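
A minimal round-trip sketch using some of these functions. It writes to the session's temporary directory (via tempdir()) so that nothing in your working directory is touched; the file names and the table tab_out are made up for this example.

```r
tab_out <- data.frame(label=c("a", "b"), value=c(1.5, 2.5))

# Text file: write and read back
csvfile <- file.path(tempdir(), "example.csv")
write.table(tab_out, csvfile, sep=";", row.names=FALSE)
tab_in <- read.table(csvfile, sep=";", header=TRUE)

# R object: save and restore a single object
rdsfile <- file.path(tempdir(), "example.rds")
saveRDS(tab_out, rdsfile)
tab_rds <- readRDS(rdsfile)
identical(tab_out, tab_rds)       # the restored object is an exact copy

file.remove(csvfile, rdsfile)     # clean up
```

Unlike save() and load(), which restore objects under their original names, saveRDS() and readRDS() handle a single object that you assign to a name of your choice.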

Working memory

Check order of packages loaded into memory with search(). This list determines the order in which function definitions will be searched, much like the search path in MATLAB and your operating system. See Namespaces topic below.

To save objects in your memory (“workspace”), use save.image(). This command will create a file called “.RData” on your hard drive, which you can load with load(".RData"). Note that you will have to reload libraries (preferably in the order in which they were loaded previously) to continue seamlessly.

Namespaces

Namespaces can be thought of as containers which resolve conflicts among variables and functions with identical names. For instance, it is common to name your data tables as “data” or “df” (for data frame). These two names are actually functions predefined in R packages loaded at startup (which you can see with search()).

First, let us look at data:

exists("data")
## [1] TRUE
class(data)        # this is our data set we loaded previously
## [1] "data.frame"
rm(data)           # we can delete our object
exists("data")     # an object called 'data' still exists
## [1] TRUE
class(data)        # function to load available data sets
## [1] "function"
environment(data)  # this is loaded in the 'utils' package
## <environment: namespace:utils>

Next, we continue our example with df:

exists("df")       # density function of the F distribution
## [1] TRUE
environment(df)
## <environment: namespace:stats>
df
## function (x, df1, df2, ncp, log = FALSE) 
## {
##     if (missing(ncp)) 
##         .Call(C_df, x, df1, df2, log)
##     else .Call(C_dnf, x, df1, df2, ncp, log)
## }
## <bytecode: 0x7ff410219758>
## <environment: namespace:stats>

However, you can still assign values to these symbols:

df <- data.frame(x=1:5)

and these new objects will co-exist with the original objects in separate namespaces.

In this case, when df is used in your code, R will first look in the environment that you are working in, and then move on down the search() list for other definitions.

df
##   x
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5

If you want to use the original function, you can prepend the namespace with two colons:

stats::df
## function (x, df1, df2, ncp, log = FALSE) 
## {
##     if (missing(ncp)) 
##         .Call(C_df, x, df1, df2, log)
##     else .Call(C_dnf, x, df1, df2, ncp, log)
## }
## <bytecode: 0x7ff410219758>
## <environment: namespace:stats>

Or, more generally, use get:

get("df","package:stats")
## function (x, df1, df2, ncp, log = FALSE) 
## {
##     if (missing(ncp)) 
##         .Call(C_df, x, df1, df2, log)
##     else .Call(C_dnf, x, df1, df2, ncp, log)
## }
## <bytecode: 0x7ff410219758>
## <environment: namespace:stats>

If you want gory details, look here.

Graphics

There are several graphics “paradigms”. In base graphics, a graphic is specified by its primitive elements. In ggplot2, as illustrated earlier, a graphic is built by describing how the data relate to the elements of its grammar.

Anatomy of a graphic

Symbols, lines, and characters:

x <- 1:10
y <- 1:10

Build up with low-level elements:

  plot.new()
  plot.window(range(x),range(y))
  axis(1)
  axis(2)
  box()
  points(x, y, col="blue")
  lines(x, y, col="red")
  title(xlab="x",ylab="y")

Use high level function plot():

  plot(x, y, col="blue")
  lines(x, y, col="red")