Being careful not to repeat the year 1901 mistake, I set the TZ variable before I run R. I have the same set of data that I convert as follows:
    dates  <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")
    values <- c( 1, 2, 0.7, 0.1 )
    date1  <- as.Date( dates, "%d/%m/%Y" )
    date2  <- as.POSIXct( dates, "%d/%m/%Y" )
and then plot
    plot( date1, values )
    plot( date2, values )
To my surprise I end up with the following two graphs.
A number of things conspired against me here:
- positional parameters,
- default parameters.
Well, the main cause is that I did not read the manual and assumed the parameters for as.Date() are similar to those of as.POSIXct().
But ask yourself how many times you used a function without consulting
the manual because you thought you knew what the parameters were.
So let's look at the other causes.
The help for as.Date() shows the following possible parameters.
    as.Date(x, ...)
    ## S3 method for class 'character'
    as.Date(x, format = "", ...)
    ## S3 method for class 'numeric'
    as.Date(x, origin, ...)
    ## S3 method for class 'POSIXct'
    as.Date(x, tz = "UTC", ...)
Depending on the type of object you are trying to convert, different parameter lists apply. Let's focus on character objects.
as.Date(x, format = "", ...)
Two parameters are expected:
- x, an object to convert,
- format, a format string that specifies how the dates are formatted.
An example with both parameters given by position:
    as.Date('1492-11-29', '%Y-%m-%d')
    [1] "1492-11-29"
Notice that format also has a default value, "", which means we do not have to provide it.
This indeed works.
    as.Date('1492-11-29')
    [1] "1492-11-29"
Well, sort of.
If format equals "", R tries %Y-%m-%d and %Y/%m/%d.
It stops with an error when neither of them fits:
    as.Date('29-Nov-1492')
    Error in charToDate(x) : character string is not in a standard unambiguous format
But it fails without warning for
    as.Date('29-11-1492')
    [1] "29-11-14"
29-11-1492 is interpreted as the 29th year, 11th month and 14th day. The remaining string "92" is not used, but this is not reported.
Default parameters can save time but it is better to be explicit and say what you mean and specify the format of your data, so R does not have to guess, and you won't end up being surprised.
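As a minimal sketch of what being explicit looks like for the day-first string above (the variable names are only illustrative), the format can be spelled out. A string that does not match an explicit format comes back as NA instead of being silently reinterpreted:

```r
# Day-first string that the default formats misread.
x <- "29-11-1492"

# Explicit format: day, month and four-digit year, separated by hyphens.
d <- as.Date(x, format = "%d-%m-%Y")

# Format the result back to year-month-day to check what was parsed.
iso <- format(d, "%Y-%m-%d")
print(iso)

# With an explicit format, a string that does not match yields NA,
# which is easy to test for.
bad <- as.Date("29-Nov-1492", format = "%d-%m-%Y")
print(is.na(bad))
```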
Back to how to provide the values of the parameters. We have seen the positional method; the other one is by name. This would be:
    as.Date(x='1492-11-29', format='%Y-%m-%d')
    [1] "1492-11-29"
It even works the other way around now.
    as.Date(format='%Y/%m/%d', x='1492/11/29')
    [1] "1492-11-29"
It is more work to type this, and probably not worth it when you are just using R interactively. But if you are writing a script that is to be reused, this is the best way. It is very explicit, in a good way. You tell R exactly what you mean. In addition, you tell your future self what you meant when you wrote it. It also helps others understand what you are trying to say. This is good for reproducible research.
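For a reusable script, one possible shape is a small wrapper that names every argument and refuses to continue when something does not parse. The helper parse_dmy() and its sep argument are hypothetical names made up for this sketch; they are not part of R.

```r
# Hypothetical helper (not part of R): parse day/month/year strings
# and fail loudly instead of returning NA for unparseable input.
parse_dmy <- function(x, sep = "/") {
  # Build the format string, e.g. "%d/%m/%Y" for sep = "/".
  fmt <- paste("%d", "%m", "%Y", sep = sep)

  # Named arguments, so nothing depends on parameter order.
  out <- as.Date(x = x, format = fmt)

  if (anyNA(out)) {
    stop("could not parse: ", paste(x[is.na(out)], collapse = ", "))
  }
  out
}

parsed <- parse_dmy(x = c("11/11/1900", "01/01/1901"))
print(parsed)
```

Stopping on NA is a design choice for scripts: a wrong date that slips through silently is harder to notice than an error.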
Now what went wrong in the original plot?
My mistake was to assume that the parameters for as.POSIXct() appear in the same order as the ones for as.Date(). This is not the case however, as can be seen from the help pages.
    ## S3 method for class 'character'
    as.Date(x, format = "", ...)
    ## S3 method for class 'character'
    as.POSIXlt(x, tz = "", format, ...)
Therefore the conversion
    date2 <- as.POSIXct( dates, "%d/%m/%Y" )
ended up using "%d/%m/%Y" as the tz parameter. It did not find a value for the format parameter and therefore guessed one (%Y/%m/%d, since the dates are written with slashes).
The day numbers got interpreted as years, and the graph therefore shows the years 1, 11 and 30: about 2000 years ago, instead of the roughly 100 years ago given by the data.
I would have avoided the surprising graph had I used named parameters, even without reading the manual.
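A sketch of the corrected conversion with named parameters follows. Passing tz = "UTC" explicitly is an assumption made here so the example does not depend on the TZ environment variable; the original session set TZ before starting R instead.

```r
dates <- c("11/11/1900", "01/01/1901", "30/05/1901", "01/01/1902")

# Name the format argument so it cannot be mistaken for tz.
# tz = "UTC" is an assumption for this sketch, not taken from the post.
date1 <- as.Date(dates, format = "%d/%m/%Y")
date2 <- as.POSIXct(dates, format = "%d/%m/%Y", tz = "UTC")

# The years are now read from the last field of each string.
years <- format(date2, "%Y")
print(years)

# Both conversions should agree on the calendar date.
same <- all(as.Date(date2, tz = "UTC") == date1)
print(same)
```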
Conclusion
Be explicit and beware of implicit defaults in R.