What is an ISO 8601 Date?

by Rick Jelliffe

When you see a data field with text like 2007-07-05 you are probably looking at a date in ISO 8601 date format. Year, month, day: YYYY-MM-DD

IS 8601 is in an international standard which gives several standard syntaxes for representing Gregorian dates and times. The full English title is ISO 8601:2004 Data elements and interchange formats — Information interchange — Representation of dates and times. It is only about 33 pages long; you can purchase it from your local standards body or from ISO, and as is common practice for ISO standards, there are final drafts available for free on the Internet. It is maintained by ISO TC 154 who have the Dr Who-ish name of Time Task Force.

Before IS 8601 there were multiple other standards for dates and times. For example, IS 2711 which allowed formats like 5 Jan 2000 and dates using ordinals. 2711 was withdrawn as an ISO standard in 1988, superceded by 8601. However, other nations and bodies have continued to use many of the other ex-standard formats, because it is convenient to have time written according to local conventions. The difficulty, you see, with IS 8601 is that they managed to get a nice unambiguous format for dates by adopting a format that no-one non-technical used: year, month, day. (I won't be dealing with times in this blog.)

The second rub with IS 8601 is that it defines multiple formats. So as well as YYYY-MM-DD for years, such as 2000-01-01 you can also have the basic form 20000101. And the same date could also be represented as YYYY-DDD as 2000-001 with a basic form version of 2000001, where the DDD is an ordinal counting the number of days into the year. And it could be specified as a relative date too: you could specify it as a duration (from a base of, say, 1999-01-01) using the syntax PdddD where P means period and D means days: so P365D or even P12M where M is month. You can do the same with an explicit base and get the notations 19990101/P365D for example. The more exotic you get, the more chance that you have a need for what the standard calls "a mutual agreement" where the exchanging parties agree on what the notation means, because it could have several different meanings under the standard.

The third rub with IS 8601 is that it is based on the Gregorian calendar. This is unsurprising, in view of the dominance of the West and its ex-colonies in trade and standards adoption. However it imposes a conceptual, processing and formatting burden in places where other dates formats are used. It is not as uncommon as people think: I have lived in Taiwan and Japan, where non-Gregorian calendars are used for example. And obviously the Islamic calendar is in wide use.

XML Schema Standards and Dates



XML Schemas (XSD) is a Technical Recommendation made by an industry consortium W3C, which allows direct participation by representatives of fee-paying members. XML Schemas supports a wide range of ISO 8601-ish date formats. For most formats, there is no difference, and there is even an explicit appendix ISO 8601 Date and Time Formats which gives clear information.

XSD provides many different datatypes for dates, times and durations. However, it does not allow all ISO8601 syntaxes, and it does alter others. For example, ISO 8601 allow a year 0000. This is not allowed under the AC/DC calendar system, where you go straight from 1BC to 1AD. XSD's date types disallow the year 0. Most importantly, the date notation used is the extended one with the minuses, so 2000-01-01 not 20000101. XSD allows you to "derive types" to restrict dates to certain values or ranges.

When we were discussing date formats in the W3C XML Schemas Working Group, I tried to get localized date formats allowed: I think it is the same principle as IRIs: it is good if a human can author directly (or generate directly) in the form or notation that they use to think about the data. However, my brilliant idea was rejected by the XML Schema Working Group (with the MS representative taking quite a strong stand that neutral/standard formats should be used, not localized ones) probably because I did not have a proof of concept. Since then, Jenni Tennison's DTTL data typing language has come along, and is being adopted as part of the ISO DSDL multi-part standard. It is, I believe, exactly the right way to go: allow notations in the format that makes most sense to the stakeholders and application requirements, but provide a mapping to neutral/fixed-syntax formats.

In that sense, my personal belief is that ISO 8601 is a relic of a pre-markup and pre-schema mentality. That does not mean it is not valuable nor that it should not be maintained, nor indeed that it shouldn't be the first port of call when looking at date formats. But it pushes localization to be an application consideration whereas I think it is just as legitimate and feasible to make it a markup/parsing (i.e. schema) level issue. This is not only because localized formats (rigorously described with an appropriate declarative schema language) make it easier for humans to read and write, but also because where the consumers and generators of data are computers and humans are relatively unimportant in the pipeline or critical path, then data field notations localized (again, rigorously described) for optimal computer performance is entirely appropriate and smart.

Office Document Formats



ODF and Open XML both use ISO 8601 dates in the YYYY-MM-DD form throughout for all dates. (The ODF spec uses US date MM/DD/YYY formatting in palces in its text, but don't let that confuse you.)

ODF has quite a nice, basic and consistent approach to dates in spreadsheets: read and store them in a kind of ISO 8601 format but also allow a "null date" (such as 1899-12-31) to be specified to allow conversions of date into numbers. Spreadsheets very often actually store, manipulate or transfer dates as ordinal values from an index point: this makes calculations with dates very straightforwards. Representing dates as ordinals is also used in other ISO standards: for example, the SQL_DATE data type gives the number of days since January 1, 1841. (It gives this count as a simple integer.) See section 8.5.2 Calculation settings in ISO ODF for more information.

The draft specification for ISO Open XML, from Ecma, does have one oddity, which has attracted much controversy. In SpreadsheetML table cells only, dates are actually saved as durations, as ordinals. The base is set by an attribute on the workbook, and reflects the supported ranges of Excel on different platforms (on Windows, Excel does not support dates before 1900; on the Macintosh, Excel does not support dates before 1904; putting in such a date will be serialized out as 0 into SpreadsheetML.)

The reasons for saving as as duration rather than a date are obvious: it reflects the internal format directly, allows faster loading and save times to the XML, and allows faster loading and saving times when interfacing with an SQL system that uses SQL_DATE etc. The economic value of load/store times for Office documents is enormous, and it would be quite inappropriate to apply the criteria that one might use, say, for DOCBOOK documents, to standard office formats: I actually think that ODF gets it quite wrong here, and that best practice should dictate that optimized formats should be available. However, by the same token, I think that SpreadsheetML gets it wrong, and that it also should allow reading of data in ISO 8601 format as well as in its optimized notation.

The logical question that comes up is Should SpreadsheetML use ISO 8601 duration format rather than just raw ordinal integers. If the ISO 8601 standard notation was used, SpreadsheetML should use <v>P1D</v> to mark up the first day in the range, rather than <v>1</v>. However, the P and D are redundant, because the notation is clearly marked up by attributes (and documentation). This is the old issue of where the barriers should be between information in markup and information in embedded formats. I don't see that <v>P1D</v> has any benefits over <v>1</v> frankly: it would seem to be an exercise in nominalism and pointless compliance.

<digression>The additional difficulty here is that we are let down by XSD here, again: XSD doesn't allow the type of an element to be selected in part or whole by an attribute value on an ancestor, unlike ISO Schematron and ISO RELAX NG. XSD is completely deficient in support for these kinds of idioms, because the database mindset of its developers let them to conceive of attributes as merely funny kinds of elements rather than as metadata on an element, of the same importance and character as the element name. So XSD doesn't allow attributes to select type; therefore Open XML would have to compromise its design, where elements are highly generic (i.e. data values in spreadsheets are in a "v" for value element) in order to allow values to be typed; however then Open XML could declare the value to be an xsd:duration which would then require the P1D notation. Another approach in XSD would be to use xsi:type where the v element is a union of integers, durations, string etc. However, then we would need to consider how to fit shared string references into the datatyping framework. Too much work! </digression>

The second reason why the ordinal values for dates in SpreadsheetML are controversial is because of an out-by-one adjustment that is needed for some functions for the first two months in 1900. To me, this is just a silly edge case: remembering that spreadsheets from Mac Excel don't even get back to 1900, and on Windows they don't go before 1900: it is hardly the wholesale subversion of the Gregorian calendar that you might suspect from various comments on the Web. ODF perhaps punts the issue, by allowing date indexes to start on 1899-12-31 or on 1900-01-01 (examples they give) and so they leave it up to the application developer or document generator to figure out which one is appropriate.

In my blog last month on Principles for reviewing standards, I took the position, which I think is the most reasonable one, that for embedded data fields the standard forms should be provided and optimized forms may be provided. From that POV, Open XML should also allow ISO 8601 durations and/or dates as well as the simple duration ordinal. And ODF should allow duration ordinals as a matter of best practice,

12 Comments

Yoon Kit
2007-07-10 05:05:13
Hi Rick,


> The reasons for saving as as duration rather than a date are obvious:
> it reflects the internal format directly, allows faster loading
> and save times to the XML, and allows faster loading and saving
> times when interfacing with an SQL system that uses SQL_DATE etc.
> The economic value of load/store times for Office documents is enormous,
> and it would be quite inappropriate to apply the criteria that one
> might use, say, for DOCBOOK documents, to standard office formats:
> I actually think that ODF gets it quite wrong here, and that best
> practice should dictate that optimized formats should be available.


A little C programme shows that this is not the case. Yes, ISO Dates takes 51% longer to be converted compared to numeric date values, but in the real world, this only translates to an additional 0.5s to load up 5.5 million date values!


This "enormous" economic value for load time which you claim has to be justified.


http://www.openmalaysiablog.com/2007/06/will-readabil-1.html


> However, by the same token, I think that SpreadsheetML gets it wrong,
> and that it also should allow reading of data in ISO 8601 format as
> well as in its optimized notation.


Agreed. However why restrict one-self to an arbitrary (and variable) duration/number? Sticking to ISO 8601 as they did with Microsoft Office 2003 would provide tremendous benefits to users.


Regards!


yk.

Rick Jelliffe
2007-07-10 22:53:44
Yoon Kit: Good to see some code and numbers! Well done.


A fast format may be the result of many small optimizations.

Yoon Kit
2007-07-11 21:34:30
> A fast format may be the result of many small optimizations.


The point here is that as an end user, I would rather have humanly readable date-times in my XML spreadsheet rather than some arbitrary number. I am also confident (via the programme) that the performance hit on using ISO 8601 instead of internal numberings is extremely small and I am willing to pay for this "feature" by banking on the fact that computers will be faster.


If we really yearn for "performance" in a fileformat, then it would suite Ecma/Microsoft better if they were to standardise the Microsoft Office BINARY formats instead.


Confusing legacy, app-specific performance with future document encoding in XML is not the right path.


yk.

Rick Jelliffe
2007-07-12 05:51:43
Yoon Kit: No, actually you have demonstrated that a tight loop parses fast, for example because it gets good pipelining and cacheline effects.


'I am willing to pay for this "feature"': with respect, what you are actually saying is that you are willing for *us* to pay for that feature.


If you want their binary formats documented, what has that got to do with their XML format going through ISO?

Yoon Kit
2007-07-12 21:43:49
> Yoon Kit: No, actually you have demonstrated that a tight loop parses
> fast, for example because it gets good pipelining and cacheline effects.


Well, if you have a large spreadsheet with over a million datefields then pipelining and caching will also be used during the load process. Especially days like these where CPU cache is in megabytes.


> 'I am willing to pay for this "feature"': with respect, what you are
> actually saying is that you are willing for *us* to pay for that feature.


Likewise, You are willing to forfeit easy to read, and bug-free dates for a temporary "performance" issue?


What would the people 50 years from now think? What Im saying is that Im willing to invest in the ISO dates feature for future understandability and flexibility. Is that too much a price to pay, Rick?


> If you want their binary formats documented, what has that got to do
> with their XML format going through ISO?


Well, it has everything to do with the MSOOXML spec. On one hand, Microsoft touts MSOOXML as a means to describe its binary file format in XML for legacy support. On the other, its suppose to be future proof. Im just saying that these are two separate problems which should be addressed on their own.


1) If its a legacy issue, then fully describe the binary file formats (since Word 1 to Word 2007). Release this as an Open Standard. This solves that legacy problem fully. There is no reason to create an "abstraction" in XML because we all know that this will leak.


2) If its a future file format issue, then go ahead! You like it, Microsoft likes it. Just dont use the legacy issue to hinder this new XML format.


yk.

Theo
2007-07-14 13:07:08
> The difficulty, you see, with IS 8601 is that they managed to get a nice unambiguous
> format for dates by adopting a format that no-one non-technical used.


We have used the YYYY-MM-DD date format in Sweden for a long time. It is used in all places where dates are communicated in written text (although there is also the more archaic or casual format DD/MM-YY which is more in line with how you would say it in daily speech).


Our equivalent of social security numbers (called "person number", which sound really Orwellian in English), based on our date of birth, is on the format YYMMDD-NNNN, and that system has been in use since 1947.


2007-07-17 16:08:52
"...remembering that spreadsheets from Mac Excel don't even get back to 1900, and on Windows they don't go before 1900"


Those "don't"s are, and will remain, "can't"s as long as Microsoft, which wrote both applications, fails to enter the 21st century.


Time systems have long been a problem, moving from the cases where lunar, seasonal, dynastic, and other changes were important and particular dates less so, to the present, where nanosecond time systems allow sub-centimeter positioning.


These time systems have been based on numerous societies - Aztec, Egyptian, Chinese, Arabic, various European to name a few; and are all incompatible to the extent that conversion from one to another is subject to difficult to obtain and interpret rules (the reason for the 1-2-3 bug, by the by.)


By allowing localizing one suggests a great return to the days when everyone had their own time - less than 200 years ago most every city had noon at it's own time. Until the telegraph and the railroad no one knew different.


While no one is truly stopped from creating their own version because of a standard, it really isn't a standard if everyone can mold the interpretation of that standard into whatever they want. It degenerates the value of the standard by requiring users to first read the standard and then try to figure out what the imple-twister did with his private interpretation.


Finally, there was a time when there was no Excel 1900/1904 attribute. Spreadsheets moved from a PC to a Macintosh regularly showed schedules moving approximately 4 years one way or the other, requiring manual intervention to repair. That's from a single software supplier. Let's hope it can be dealt with in a uniform, multi-vendor way.

Dave S.
2007-07-17 16:09:34
"...remembering that spreadsheets from Mac Excel don't even get back to 1900, and on Windows they don't go before 1900"


Those "don't"s are, and will remain, "can't"s as long as Microsoft, which wrote both applications, fails to enter the 21st century.


Time systems have long been a problem, moving from the cases where lunar, seasonal, dynastic, and other changes were important and particular dates less so, to the present, where nanosecond time systems allow sub-centimeter positioning.


These time systems have been based on numerous societies - Aztec, Egyptian, Chinese, Arabic, various European to name a few; and are all incompatible to the extent that conversion from one to another is subject to difficult to obtain and interpret rules (the reason for the 1-2-3 bug, by the by.)


By allowing localizing one suggests a great return to the days when everyone had their own time - less than 200 years ago most every city had noon at it's own time. Until the telegraph and the railroad no one knew different.


While no one is truly stopped from creating their own version because of a standard, it really isn't a standard if everyone can mold the interpretation of that standard into whatever they want. It degenerates the value of the standard by requiring users to first read the standard and then try to figure out what the imple-twister did with his private interpretation.


Finally, there was a time when there was no Excel 1900/1904 attribute. Spreadsheets moved from a PC to a Macintosh regularly showed schedules moving approximately 4 years one way or the other, requiring manual intervention to repair. That's from a single software supplier. Let's hope it can be dealt with in a uniform, multi-vendor way.

Dave S.
2007-07-17 16:09:50
"...remembering that spreadsheets from Mac Excel don't even get back to 1900, and on Windows they don't go before 1900"


Those "don't"s are, and will remain, "can't"s as long as Microsoft, which wrote both applications, fails to enter the 21st century.


Time systems have long been a problem, moving from the cases where lunar, seasonal, dynastic, and other changes were important and particular dates less so, to the present, where nanosecond time systems allow sub-centimeter positioning.


These time systems have been based on numerous societies - Aztec, Egyptian, Chinese, Arabic, various European to name a few; and are all incompatible to the extent that conversion from one to another is subject to difficult to obtain and interpret rules (the reason for the 1-2-3 bug, by the by.)


By allowing localizing one suggests a great return to the days when everyone had their own time - less than 200 years ago most every city had noon at it's own time. Until the telegraph and the railroad no one knew different.


While no one is truly stopped from creating their own version because of a standard, it really isn't a standard if everyone can mold the interpretation of that standard into whatever they want. It degenerates the value of the standard by requiring users to first read the standard and then try to figure out what the imple-twister did with his private interpretation.


Finally, there was a time when there was no Excel 1900/1904 attribute. Spreadsheets moved from a PC to a Macintosh regularly showed schedules moving approximately 4 years one way or the other, requiring manual intervention to repair. That's from a single software supplier. Let's hope it can be dealt with in a uniform, multi-vendor way.

Dave S.
2007-07-17 16:11:40
Sorry for the multi-post.
Brenda Bargain
2007-08-28 15:01:07
ODF and Open XML both use ISO 8601 dates in the YYYY-MM-DD form throughout for all dates.


Great!


The draft specification for ISO Open XML, from Ecma, does have one oddity, which has attracted much controversy. In SpreadsheetML table cells only, dates are actually saved as durations, as ordinals.


oh. Except that 'Open' Office XML actually doesn't!


The second reason why the ordinal values for dates in SpreadsheetML are controversial is because of an out-by-one adjustment that is needed for some functions for the first two months in 1900. To me, this is just a silly edge case


So, having made a big deal out of the (trivial) time to parse ISO 8601 dates, you now say that an issue which intorduces extra checks on every single operation involving spreadsheet dates is 'just a silly edge case'. Presumably of the processing time used to work around this bug has less economic value for some mysterious reason.


This is purest FUD, Mr. Jeliffe!


Your proposal amounts to allowing dates either to be written according to an agreed (unambiguous) ISO standard ... or not!
If my application can't parse your date, who is violating the standard? This is not what is meant by 'open' or 'standard'.
Or perhaps you're suggesting that every implementation should be able to interpret dates whose format may be described in one or more as yet unspecified schema languages. Is the imposition of this unnecessary overhead this going to help developers? No! Is it going to reduce those economically oh-so-important load/save times? Absolutely not (unless, perhaps, what you actually mean by described with an appropriate declarative schema language is just to switch to a parser optimised to read some unspecified proprietary format)!



What you're really saying here is, "let's not have a clear or implementable standard, at all; let's just write dates how we feel like writing them." Fine, if that's what you want. But why on earth should the ISO be expected to rubber-stamp such a statement.

Rick Jelliffe
2007-08-29 01:25:29
Brenda: Yes, the user perception of load time versus processing time is indeed different. Reviewers benchmark products based on load time for example.


Why wouldn't your application be able to parse the date? There would be a clear attribute to say which notation was being used.


"Or perhaps you're suggesting that every implementation should be able to interpret dates whose format may be described in one or more as yet unspecified schema languages." Actually, the schema language that would be used is ISO 99757 - 4, DTTL, which allows arbitrary notations to be resolved into markup fragments. This standard is being developed by SC34, the committee that is looking after ODF and OOXML. I am a strong believer in schema languages that allow localized data, but you are entirely incorrect to think that this means adopting vague data notations.


The problem with your argument is that you use "implementor" to mean "implementor of systems that consumes OOXML" and exclude implementors of systems that generate OOOXML. Perhaps you work for a company that sells consuming applications such as office suites and so are not aware of the needs of system integrators much. For a generator, a format that allows multiple alternative notations (plurality) is more convenient than a format that allows only a single notation. There is less conversion work. For a format where you have more generators than acceptors, the relative additional burden on the receiving developer of multiple formats can be weighed against the absolute increase in ease-of-use for the generator-developers.