In many data files, the date or time of day is an important variable. This introductory tutorial covers the basics of how R handles dates and times.

1. Using base R to work with dates

The international standard ISO 8601 writes dates and times in the format yyyy-mm-dd hh:mm:ss. We can use the base R command as.Date() to convert dates to this standard. The first argument is a vector with the date(s) to be converted, and the second argument, format, describes how those input dates are written:

as.Date(c("06-28-2015", "11-25-2016"), format = "%m-%d-%Y")
## [1] "2015-06-28" "2016-11-25"
as.Date(c("07/14/17"), format = "%m/%d/%y")
## [1] "2017-07-14"

The format codes used above, along with a few others:

Code   Value
%d     Day of month as number
%m     Month as number
%b     Month abbreviated
%B     Month full name
%y     Year, two digits
%Y     Year, four digits
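
The codes in the table work in both directions: as.Date() uses them to read text, and format() uses them to write a Date back out as text. The following is a minimal sketch with illustrative variable names. It sticks to the numeric codes because the month-name codes %b and %B are matched against the month names of the current locale, so their behavior can differ between systems.

```r
# Parse a date written as day.month.year with a four-digit year
d <- as.Date("14.07.2017", format = "%d.%m.%Y")

# Write the same date back out in other layouts
us_style  <- format(d, "%m/%d/%y")   # "07/14/17"
year_only <- format(d, "%Y")         # "2017"

# Text that does not match the format gives NA rather than an error
bad <- as.Date("2017-07-14", format = "%m/%d/%Y")
is.na(bad)                           # TRUE
```

Checking for NA after a conversion is a cheap way to catch rows whose dates were written in an unexpected layout.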

Basic operations on date objects include finding the number of days between dates,

x <- as.Date("May 12, 2017", format = "%B %d, %Y")
y <- as.Date("March 5, 2017", format = "%B %d, %Y")
x - y
## Time difference of 68 days

and creating a sequence of dates using, for instance, weekly increments.

seq(y, length = 6, by = "week")
## [1] "2017-03-05" "2017-03-12" "2017-03-19" "2017-03-26" "2017-04-02"
## [6] "2017-04-09"
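
Two variations on these operations may be useful. The sketch below recreates x and y in ISO format so that it runs on its own, then reports the same gap in weeks with difftime() and builds a sequence that stops at an end date instead of after a fixed number of values. The names gap_weeks and monthly are illustrative.

```r
x <- as.Date("2017-05-12")
y <- as.Date("2017-03-05")

# Same gap as x - y, reported in weeks instead of days
gap_weeks <- difftime(x, y, units = "weeks")
as.numeric(gap_weeks)            # 68 / 7, about 9.71

# Adding a number to a Date moves it by that many days
y + 30                           # "2017-04-04"

# A sequence can also run between two endpoints
monthly <- seq(y, x, by = "month")
monthly                          # "2017-03-05" "2017-04-05" "2017-05-05"
```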

We can also identify the day of the week, the month, or the quarter in which a date falls.

weekdays(y)
## [1] "Sunday"
months(y)
## [1] "March"
quarters(y)
## [1] "Q1"

Date objects are stored internally in R as the number of days since January 1, 1970.

as.numeric(x)
## [1] 17298
x - as.Date("1970-01-01")
## Time difference of 17298 days
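
The conversion also works in the other direction. As a small sketch, a day count can be turned back into a Date by naming the origin explicitly, and dates before the origin are simply negative counts.

```r
# Turn a day count back into a Date by supplying the origin
d <- as.Date(17298, origin = "1970-01-01")
d                                   # "2017-05-12"

# Dates before the origin are stored as negative numbers
before <- as.numeric(as.Date("1969-12-31"))
before                              # -1
```

This is worth knowing when a date column has been read in as plain numbers, although the correct origin depends on the software that wrote the file.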

2. The POSIXct class

In addition to the date, we may also have time information. The as.POSIXct command converts date and time information to the ISO 8601 standard format. Objects of class POSIXct also carry a time zone.

If you only have a date, then the syntax is similar to what we saw above:

w <- as.POSIXct(c("05/24/2017", "10/15/2019"), format = "%m/%d/%Y")
w
## [1] "2017-05-24 CDT" "2019-10-15 CDT"
data.class(w)
## [1] "POSIXct"
diff(w)
## Time difference of 874 days

The timezone in the output will be system dependent.

To add the time information:

u <- as.POSIXct("05/24/2017 06:13:10", format = "%m/%d/%Y %H:%M:%S")
u
## [1] "2017-05-24 06:13:10 CDT"
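
Because the default time zone comes from the system, the same call can print differently on another machine. One way around this, sketched below, is to pass tz explicitly. The example also shows that a POSIXct value is stored as a number of seconds since 1970-01-01 00:00:00 UTC, much as a Date is stored as a number of days. The names u_utc and secs are illustrative.

```r
# Passing tz makes the result the same on every machine
u_utc <- as.POSIXct("05/24/2017 06:13:10",
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
u_utc                      # "2017-05-24 06:13:10 UTC"

# POSIXct values are stored as seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(u_utc)
secs / 86400               # a little over 17310 days
```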

You can use the same commands on the POSIXct class that you used for the Date class:

weekdays(w)
## [1] "Wednesday" "Tuesday"
quarters(w)
## [1] "Q2" "Q4"
seq(u, length = 5, by = "months")
## [1] "2017-05-24 06:13:10 CDT" "2017-06-24 06:13:10 CDT"
## [3] "2017-07-24 06:13:10 CDT" "2017-08-24 06:13:10 CDT"
## [5] "2017-09-24 06:13:10 CDT"
seq(u, length = 5, by = "hours")
## [1] "2017-05-24 06:13:10 CDT" "2017-05-24 07:13:10 CDT"
## [3] "2017-05-24 08:13:10 CDT" "2017-05-24 09:13:10 CDT"
## [5] "2017-05-24 10:13:10 CDT"
w[1]-u
## Time difference of -6.219444 hours
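
Subtraction lets R choose the units of the result, which is why the difference above is reported in hours. If fixed units are needed, difftime() accepts a units argument. The sketch below uses two illustrative values, w1 and u1, set in UTC so that it does not depend on the local time zone.

```r
w1 <- as.POSIXct("2017-05-24 00:00:00", tz = "UTC")
u1 <- as.POSIXct("2017-05-24 06:13:10", tz = "UTC")

# Subtraction lets R pick the units (hours here)
auto_gap <- w1 - u1
units(auto_gap)            # "hours"

# difftime() lets the caller fix the units instead
gap_mins <- difftime(w1, u1, units = "mins")
as.numeric(gap_mins)       # about -373.17
```

Converting with as.numeric() after fixing the units avoids surprises when the differences are used in later calculations.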

If you specify just the time, then as.POSIXct will add the current date:

as.POSIXct("08:45", format = "%H:%M")
## [1] "2016-09-09 08:45:00 CDT"
Sys.time()              #current time
## [1] "2016-09-09 16:46:09 CDT"
as.POSIXct("08:45", format = "%H:%M", tz = "EST")  #specify time zone
## [1] "2016-09-09 08:45:00 EST"

The list of time zone names is not fixed by R but depends on the user’s operating system. Most operating systems recognize the names listed at http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
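
To see which names a particular installation accepts, base R provides OlsonNames(). The sketch below checks for one name and then displays a single instant in two zones with format(). It assumes that "America/Chicago" and "Asia/Tokyo" are available, which is typical but depends on the system's time zone database.

```r
# Time zone names known to this installation
zones <- OlsonNames()
"America/Chicago" %in% zones

# One instant, displayed in two different zones
t_utc   <- as.POSIXct("2017-05-24 06:13:10", tz = "UTC")
chicago <- format(t_utc, tz = "America/Chicago", usetz = TRUE)
tokyo   <- format(t_utc, tz = "Asia/Tokyo", usetz = TRUE)
chicago
tokyo
```

Location-based names such as these follow daylight saving rules automatically, unlike fixed abbreviations such as "EST".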

3. The lubridate package

The ‘lubridate’ package by Grolemund and Wickham is largely a wrapper around the base R date commands: that is, lubridate commands offer a more intuitive syntax for converting to the Date and POSIXct structures.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

Remark

The message following the library command tells us that lubridate contains a command called date which will supersede (mask) the identically named command date in base R.

The following commands assume the given dates have the format specified by the command name itself:

mdy("05/12/2014") #format of input is month/day/year
## [1] "2014-05-12"
dmy("21-06-1997") #format of input is day-month-year
## [1] "1997-06-21"
mdy_hms("05/12/2014 11:45:10") #format of input is month/day/year hours:minutes:seconds
## [1] "2014-05-12 11:45:10 UTC"
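
As a brief sketch of how these parsers behave, the date-only commands return a Date object while the date-time commands return POSIXct, and the date-time commands accept a tz argument when UTC is not what you want. The time zone name below is only an example.

```r
library(lubridate)

d  <- mdy("05/12/2014")                  # date only
dt <- mdy_hms("05/12/2014 11:45:10",     # date and time, explicit zone
              tz = "America/Chicago")

class(d)     # "Date"
class(dt)    # "POSIXct" "POSIXt"
tz(dt)       # "America/Chicago"

# ymd() reads year-month-day input and tolerates different separators
parsed <- ymd(c("2014-05-12", "2014/05/12", "20140512"))
parsed
```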

By default, the mdy_hms command represents time in UTC, Coordinated Universal Time, a standard defined by a recommendation of the International Telecommunication Union. Time zones are then expressed as positive or negative offsets from UTC.

To find the day of the week of a date, expressed either as a number (1 = Sunday) or as the name of the day, parse the string first and pass the result to wday:

wday(mdy_hms("05/12/2014 11:45:10"))
## [1] 2
wday(mdy_hms("05/12/2014 11:45:10"), label = TRUE)
## [1] Mon
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
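
The numbering returned by wday depends on which day is taken as the start of the week. The sketch below uses March 5, 2017, which weekdays(y) identified as a Sunday earlier, and assumes a lubridate version recent enough to support the week_start argument.

```r
library(lubridate)

d <- ymd("2017-03-05")                 # a Sunday

sunday_first <- wday(d)                # weeks start on Sunday by default
monday_first <- wday(d, week_start = 1)

sunday_first                           # 1
monday_first                           # 7
weekdays(d)                            # base R name of the day (locale dependent)
```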

Rounding dates

u
## [1] "2017-05-24 06:13:10 CDT"
round_date(u, "month")
## [1] "2017-06-01 CDT"
round_date(u, "hour")
## [1] "2017-05-24 06:00:00 CDT"
floor_date(u, "hour")
## [1] "2017-05-24 06:00:00 CDT"
ceiling_date(u, "month")
## [1] "2017-06-01 CDT"
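
In the output above, rounding and taking the ceiling to the month give the same answer because May 24 falls in the second half of the month. The following sketch uses other units where the three commands differ. The value u_utc is an illustrative stand-in for u, set in UTC so that the result is the same on any system.

```r
library(lubridate)

u_utc <- ymd_hms("2017-05-24 06:13:10")       # UTC by default

month_start <- floor_date(u_utc, "month")     # back to 2017-05-01
next_hour   <- ceiling_date(u_utc, "hour")    # forward to 07:00:00
nearest_day <- round_date(u_utc, "day")       # before noon, so 2017-05-24

month_start
next_hour
nearest_day
```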

Some common lubridate commands

Command        Output
today          Date with no time
now            Date with time
year           Year
month          Month
week           Week
yday           Day of year (number)
mday           Day of month (number)
hour           Hour
minute         Minute
second         Second
tz             Time zone
floor_date     Round down
ceiling_date   Round up
round_date     Round
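
A short sketch of several of these commands applied to a single date-time follows. The value x is illustrative and is parsed in UTC, the default for ymd_hms.

```r
library(lubridate)

x <- ymd_hms("2017-05-24 06:13:10")

year(x)      # 2017
month(x)     # 5
mday(x)      # 24
yday(x)      # 144
hour(x)      # 6
minute(x)    # 13
second(x)    # 10
tz(x)        # "UTC"

today()      # current date, class Date
now()        # current date-time, class POSIXct
```

Since each accessor returns a plain number, the results can be used directly for grouping or filtering, for example to select all records from a given month.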