In many data files, the date or time of day will be an important variable. In this introductory tutorial, we will learn some basics of how R handles dates.
There is an international standard, ISO 8601, for dates and times that uses the format yyyy-mm-dd hh:mm:ss. We can use the base R function as.Date() to convert dates to this standard. The first argument is a vector with the date(s) to be converted, and the format argument describes how these dates are written:
as.Date(c("06-28-2015", "11-25-2016"), format = "%m-%d-%Y")
## [1] "2015-06-28" "2016-11-25"
as.Date(c("07/14/17"), format = "%m/%d/%y")
## [1] "2017-07-14"
Code | Meaning |
---|---|
%d | Day of month as number |
%m | Month as number |
%b | Month name, abbreviated |
%B | Month name, full |
%y | Year, two digits |
%Y | Year, four digits |
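The format codes above work in both directions. The short sketch below (with arbitrary example dates) parses the same day from two layouts, turns a Date back into text with format(), and shows how %y handles two-digit years: according to R's strptime documentation, 00 to 68 are read as 20xx and 69 to 99 as 19xx.

```r
# Parse the same date written in two different layouts
d1 <- as.Date("14.07.2017", format = "%d.%m.%Y")
d2 <- as.Date("2017/07/14", format = "%Y/%m/%d")
d1 == d2
## [1] TRUE

# format() uses the same codes to turn a Date back into text
format(d1, "%d/%m/%Y")
## [1] "14/07/2017"

# %y: 00-68 are read as 20xx, 69-99 as 19xx
as.Date(c("07/14/68", "07/14/69"), format = "%m/%d/%y")
## [1] "2068-07-14" "1969-07-14"
```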
Basic operations on date objects include finding the number of days between dates,
x <- as.Date("May 12, 2017", format = "%B %d, %Y")
y <- as.Date("March 5, 2017", format = "%B %d, %Y")
x - y
## Time difference of 68 days
and creating a sequence of dates using, for instance, weekly increments.
seq(y, length.out = 6, by = "week")
## [1] "2017-03-05" "2017-03-12" "2017-03-19" "2017-03-26" "2017-04-02"
## [6] "2017-04-09"
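The difference x - y is a difftime object whose units R picks automatically. As a sketch, difftime() can fix the units, as.numeric() extracts the bare number, and seq() accepts an end date and multiples of a unit such as "2 weeks". Here x and y are re-created from ISO strings so the block runs on its own:

```r
x <- as.Date("2017-05-12")
y <- as.Date("2017-03-05")

# difftime() lets you choose the units of a difference
difftime(x, y, units = "weeks")
## Time difference of 9.714286 weeks

# as.numeric() drops the difftime class and keeps the number
as.numeric(x - y)
## [1] 68

# by can be a multiple of a unit; the second argument is the end date
seq(y, x, by = "2 weeks")
## [1] "2017-03-05" "2017-03-19" "2017-04-02" "2017-04-16" "2017-04-30"
```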
We can also identify the day of the week, the month, or the quarter in which a date falls.
weekdays(y)
## [1] "Sunday"
months(y)
## [1] "March"
quarters(y)
## [1] "Q1"
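weekdays() and months() return names in the language of the current locale. When numbers are more convenient, one option is to convert to the POSIXlt class, which stores each component separately. Note that its fields count weekdays from 0 = Sunday, months from 0, and years from 1900:

```r
y <- as.Date("2017-03-05")
lt <- as.POSIXlt(y)   # list-like representation with one field per component

lt$wday         # day of the week, counted from 0 = Sunday
## [1] 0
lt$mon + 1      # month, counted from 0 = January
## [1] 3
lt$year + 1900  # year, counted from 1900
## [1] 2017
format(y, "%j") # day of the year, as text
## [1] "064"
```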
Date objects are stored internally in R as the number of days since January 1, 1970.
as.numeric(x)
## [1] 17298
x - as.Date("1970-01-01")
## Time difference of 17298 days
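Because a Date is a count of days, adding a number to it moves it by that many days, and as.Date() can turn a day count back into a date. The origin argument is spelled out in this sketch because older versions of R require it:

```r
x <- as.Date("2017-05-12")

unclass(x)                              # the stored day count
## [1] 17298
as.Date(17298, origin = "1970-01-01")   # day count back to a Date
## [1] "2017-05-12"
x + 30                                  # adding a number adds days
## [1] "2017-06-11"
```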
In addition to the date, we may also have time information. The as.POSIXct() function converts date and time information to the ISO 8601 standard format. POSIXct objects also carry a time zone.
If you only have a date, then the syntax is similar to what we saw above:
w <- as.POSIXct(c("05/24/2017", "10/15/2019"), format = "%m/%d/%Y")
w
## [1] "2017-05-24 CDT" "2019-10-15 CDT"
data.class(w)
## [1] "POSIXct"
diff(w)
## Time difference of 874 days
The time zone shown in the output depends on your system settings.
To include the time of day, extend the format string:
u <- as.POSIXct("05/24/2017 06:13:10", format = "%m/%d/%Y %H:%M:%S")
u
## [1] "2017-05-24 06:13:10 CDT"
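Since the displayed time zone depends on the system, it can help to fix it with the tz argument. In this sketch the zone "America/Chicago" is an arbitrary choice for illustration. format() with tz shows the same instant in another zone, and as.numeric() reveals that POSIXct values are stored as seconds since 1970-01-01 00:00:00 UTC:

```r
# Fix the time zone explicitly so the result does not depend on the system
u_chi <- as.POSIXct("2017-05-24 06:13:10", tz = "America/Chicago")
u_chi
## [1] "2017-05-24 06:13:10 CDT"

# The same instant displayed in UTC
format(u_chi, tz = "UTC", usetz = TRUE)
## [1] "2017-05-24 11:13:10 UTC"

# Seconds since 1970-01-01 00:00:00 UTC
as.numeric(u_chi)
## [1] 1495624390
```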
The functions used above with Date objects also work with POSIXct objects:
weekdays(w)
## [1] "Wednesday" "Tuesday"
quarters(w)
## [1] "Q2" "Q4"
seq(u, length.out = 5, by = "months")
## [1] "2017-05-24 06:13:10 CDT" "2017-06-24 06:13:10 CDT"
## [3] "2017-07-24 06:13:10 CDT" "2017-08-24 06:13:10 CDT"
## [5] "2017-09-24 06:13:10 CDT"
seq(u, length.out = 5, by = "hours")
## [1] "2017-05-24 06:13:10 CDT" "2017-05-24 07:13:10 CDT"
## [3] "2017-05-24 08:13:10 CDT" "2017-05-24 09:13:10 CDT"
## [5] "2017-05-24 10:13:10 CDT"
w[1]-u
## Time difference of -6.219444 hours
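Subtracting date-times lets R choose the units (hours above). The sketch below uses difftime() to request specific units instead, which is safer when the number feeds into further calculations. The time zone is again fixed to "America/Chicago" only so that the result does not depend on the system:

```r
w1 <- as.POSIXct("2017-05-24", tz = "America/Chicago")
u  <- as.POSIXct("2017-05-24 06:13:10", tz = "America/Chicago")

u - w1                                   # units chosen automatically
## Time difference of 6.219444 hours
difftime(u, w1, units = "mins")          # units chosen explicitly
## Time difference of 373.1667 mins
as.numeric(difftime(u, w1, units = "secs"))
## [1] 22390
```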
If you specify only the time, as.POSIXct() fills in the current date:
as.POSIXct("08:45", format = "%H:%M")
## [1] "2016-09-09 08:45:00 CDT"
Sys.time() #current time
## [1] "2016-09-09 16:46:09 CDT"
as.POSIXct("08:45", format = "%H:%M", tz = "EST") #specify time zone
## [1] "2016-09-09 08:45:00 EST"
R does not maintain its own list of time zone names; the valid names depend on the user’s operating system. Most operating systems recognize the names listed here: http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
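To check which names your own system accepts, base R provides OlsonNames(), and Sys.timezone() reports the zone R is currently using. The output of both varies between machines, so none is shown in this sketch:

```r
# All time zone names known to this R installation
tz_names <- OlsonNames()
head(tz_names)

# Check a name before using it; TRUE on systems whose tz database includes it
"America/Chicago" %in% tz_names

# The time zone R is using in this session
Sys.timezone()
```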
The ‘lubridate’ package by Grolemund and Wickham builds on R's date-time classes (POSIXct and Date): its functions offer a more intuitive syntax for converting to these structures.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
Remark
The message following the library() command tells us that lubridate contains a function called date, which supersedes (masks) the function of the same name in base R.
Each of the following functions assumes that its input has the component order given by the function's name:
mdy("05/12/2014") #format of input is month/day/year
## [1] "2014-05-12"
dmy("21-06-1997") #format of input is day-month-year
## [1] "1997-06-21"
mdy_hms("05/12/2014 11:45:10") #format of input is month/day/year hours:minutes:seconds
## [1] "2014-05-12 11:45:10 UTC"
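These parsers care about the order of the components rather than the separators. A sketch, assuming lubridate is installed; ymd() and dmy_hm() follow the same naming pattern:

```r
library(lubridate)

# Different separators, same month/day/year order
mdy(c("05/12/2014", "05-12-2014"))
## [1] "2014-05-12" "2014-05-12"

ymd("2014-05-12")                 # year-month-day
## [1] "2014-05-12"
dmy_hm("12/05/2014 11:45")        # day/month/year hours:minutes
## [1] "2014-05-12 11:45:00 UTC"
```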
By default, mdy_hms() represents times in UTC (Coordinated Universal Time), a time standard defined by an International Telecommunication Union recommendation. Time zones are expressed as positive or negative offsets from UTC.
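To parse in another zone, pass the tz argument. lubridate also distinguishes with_tz(), which shows the same instant in a different zone, from force_tz(), which keeps the clock time and changes the zone. A sketch, with "America/Chicago" chosen only for illustration:

```r
library(lubridate)

t_utc <- mdy_hms("05/12/2014 11:45:10")                        # parsed as UTC
t_chi <- mdy_hms("05/12/2014 11:45:10", tz = "America/Chicago") # parsed as Chicago time
t_chi
## [1] "2014-05-12 11:45:10 CDT"

# with_tz(): same instant, different display zone
with_tz(t_utc, "America/Chicago")
## [1] "2014-05-12 06:45:10 CDT"

# force_tz(): same clock time, reinterpreted in another zone
force_tz(t_utc, "America/Chicago")
## [1] "2014-05-12 11:45:10 CDT"

t_chi - t_utc
## Time difference of 5 hours
```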
To find the day of the week of a date, expressed either as a number (by default, Sunday = 1) or as the name of the day, first parse the string so that its month/day/year order is known:
wday(mdy_hms("05/12/2014 11:45:10"))
## [1] 2
wday(mdy_hms("05/12/2014 11:45:10"), label = TRUE)
## [1] Mon
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
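The numbering can be changed with the week_start argument, which is present in recent lubridate versions. In this sketch, 12 May 2014 was a Monday:

```r
library(lubridate)

d <- mdy("05/12/2014")
wday(d)                  # default: Sunday = 1
## [1] 2
wday(d, week_start = 1)  # Monday = 1
## [1] 1
```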
lubridate also provides functions that round a date-time to a given unit:
u
## [1] "2017-05-24 06:13:10 CDT"
round_date(u, "month")
## [1] "2017-06-01 CDT"
round_date(u, "hour")
## [1] "2017-05-24 06:00:00 CDT"
floor_date(u, "hour")
## [1] "2017-05-24 06:00:00 CDT"
ceiling_date(u, "month")
## [1] "2017-06-01 CDT"
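The unit may also be a multiple, such as "15 minutes". In this sketch, u is re-created with an explicit time zone (an arbitrary choice) so the output does not depend on the system; weeks start on Sunday by default:

```r
library(lubridate)

u <- as.POSIXct("2017-05-24 06:13:10", tz = "America/Chicago")

floor_date(u, "15 minutes")
## [1] "2017-05-24 06:00:00 CDT"
ceiling_date(u, "15 minutes")
## [1] "2017-05-24 06:15:00 CDT"
floor_date(u, "week")    # back to the preceding Sunday
## [1] "2017-05-21 CDT"
```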
Some other useful lubridate functions:
Function | Returns |
---|---|
today | Current date, without time |
now | Current date and time |
year | Year |
month | Month |
week | Week of the year |
yday | Day of year (number) |
mday | Day of month (number) |
hour | Hour |
minute | Minute |
second | Second |
tz | Time zone |
floor_date | Round down |
ceiling_date | Round up |
round_date | Round to nearest |
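A short sketch of a few of these accessors, assuming lubridate is installed. Most accessors can also be used on the left of an assignment to change a component:

```r
library(lubridate)

tm <- mdy_hms("05/12/2014 11:45:10")
year(tm)
## [1] 2014
month(tm)
## [1] 5
mday(tm)
## [1] 12
yday(tm)
## [1] 132
tz(tm)
## [1] "UTC"

# Change the hour in place
hour(tm) <- 9
tm
## [1] "2014-05-12 09:45:10 UTC"
```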