Data Is

by Glenn Vanderburg

I know that "data" is technically the plural of "datum". But I find it jarring when I read that "the data are transmitted" somewhere. In common usage (both speech and informal writing) the data "is transmitted."

It's not that "data" is singular; it's more like a nonspecific collective noun, like "air". It has come to mean -- and I'm going to really massacre the language here, just to emphasize the distinction I'm trying to make -- "some datums". We say "the data is corrupt" in the same way we say "the air is polluted."

At this point you may be thinking I'm just upset that the dictionary doesn't agree with the way I do things. For the record, though, I'm a careful speaker and writer who usually argues for the rules people have forgotten rather than the common, often sloppy usage. This time I think the change in usage has happened for good reasons.

One reason, I think, is that "datum" is so rarely a useful word. I'm not sure why, but we rarely need to distinguish between singular and plural with respect to data; it's almost never important to talk about a single datum.

A related reason is that it's unclear what constitutes a datum. Is it always a bit? Or some larger group of data? (See how slippery it is? Is it reasonable to say that a datum is composed of a group of smaller data?)

My "air" analogy illustrates that problem quite well. Is a molecule of oxygen also an "air molecule"? Air is a mixture, so identifying the smallest unit of air is a tricky thing.

There are contexts, perhaps, where data are discrete and well structured so that the distinction makes sense. But in most cases, data is complex, with an almost fractal structure, and the line between data and datum is almost impossible to draw. (This paragraph is a test, by the way. Which of those sentences seemed most natural to you?)

I think it's time to acknowledge that the old rule, in this case, is obsolete. Circumstance and usage have turned "data" into a collective, singular noun. It refers to "some data" -- and in the tradition of computer science, "some" can mean "zero or more". "Datum" can still be useful on the rare occasions where you need to emphasize a singular unit that can't be described as a bit, byte, octet, scalar, etc.

Update: a respondent, "gojomo", points out that the correct linguistic term for the common usage of "data" is "mass noun". Other examples of mass nouns include water, blood, light, money, and cheese.

Which way do you use the word "data"? Can you think of good reasons for the rules to stay the way they are?


2003-01-04 00:30:31
I agree completely. I find the distinction between 'data' and 'datum' annoying and often confusing. I often use 'data is', and have no plans to change. I usually think of a 'sea of data'... much like air or water.
2003-01-04 02:35:48
Yes and No
I think there is good reason for not diluting the language.

There was another discussion here not so many days ago, started by Andy Oram, that involved the nuances between raw data and processed data. I participated in that discussion. I do not believe, and have never believed, that it is possible to have useful discussions without precise and exact language.

In that discussion we talked about data. But at the core of it was the existence of the datum. Call it a "fact" if you like - most people don't know the singular form anyway; our schools are crappy.

You know what's a fact? A sentence. An expression. A datum, a single datum, _is_ an expression. It's noun, object, verb. All scientific data are of that form. So are all others. And they can be clearly identified as discrete facts. IMHO.


2003-01-04 04:31:24
'data' has become a 'mass noun'
With its origination as the plural of 'datum' largely forgotten, 'data' seems to have become, over time, a 'mass noun' in common usage and people's mental models.

For more info on mass nouns, see:

2003-01-04 07:54:35
'data' has become a 'mass noun'
Thanks! I knew there was probably a technical term for that kind of noun, but I didn't know what it was.
2003-01-04 08:00:11
Yes and No
I agree that we need to be precise, and I don't think I'm advocating dilution of the language. I don't want to abolish "datum"; it can still be a very useful word. But I do think that "data" has ceased to be strictly plural (and I don't think we've lost anything through that).

The previous post calling data a "mass noun" has it exactly right.

2003-01-04 10:39:47
Yes and No
I read that post; it's interesting, and I didn't know the term "mass noun" either. Learn something new every day. :-)

I happen not to disagree with your main point. But perhaps I mistook the degree of advocacy.

I think you're exactly right - that's how the word is used. But it's used that way because most folks don't even know that it's a plural, and they sure don't know the singular form. And I deplore that. It's language evolution through ignorance.

2003-01-04 12:16:36
language evolves
... in any case. I think this is one of them.

I had a professor who declared that "fun is an adjective, not a noun." Funk that.


2003-01-06 19:06:04
"Data" isn't alone
While I know "data" is technically plural, it just sounds so stilted to use it as such these days.

Besides, other plural Latin words have become singular in English. For example, "agenda" is from the Latin plural of "agendum," and "opera" is from the plural of "opus."

Here's another interesting take on the usage of "data":

Among other things, the author found:

"...the (traditional) meaning 'evidence used in experimental procedures' is most often plural, while the (more recent) meaning 'digital information stored or manipulated by a computer' is most often singular..."

Of course, you can Google search this topic to your heart's content.

2003-01-21 14:35:46
data is
I love it! I'm just getting over being annoyed at people using "data are..." Now I'm hearing "...the group are..." or "...the family are..." Love the analogy to agenda!
2003-03-28 08:48:12
"Data" isn't alone
I so agree with you. I am a database administrator and I tend to view "data" as a singular collective noun when I discuss the data in a database. I have been corrected by more than one manager when I say that the "data is ready." This was particularly annoying when I worked an entire weekend to prepare data, told my manager that the "data is ready", and all he could say was "the data are ready". That does not sound right to me.

However, I feel that when someone is talking about data in the sense of pieces of information, such as "the data tell us that more and more people are using data as a singular", that sounds okay.

It is nice to know that I am not alone!

2003-06-07 04:49:09
Rules is rules
If the whole world had historically decided that the rule-based systems that society adopted to communicate our understanding of our world were only acceptable if they sounded and felt right, the sun would still be spinning round the earth and relativity would be the ramblings of a madman.

I, like many people, deplore and rebel against what seem to be useless rules.

However, clarity of communication is essential to our understanding of our world. Degrading the hard-earned clarity of the English language, because it doesn't sound right, is degrading our view of the world.

I understand the views I have read here. Maybe we should get rid of all plurals and any words that are difficult to pronounce (this certainly should include all foreign language words, except of course excluding Greek, Latin and some French).

Ah, I've got the solution: let's get rid of all declamation in English!

2003-10-07 03:40:57
Rules is rules
I agree that it is better to be conservative about the rules rather than readily change them for whimsical reasons. But having said that, language has always been dynamic; language evolves. This is precisely how that "hard-earned clarity" came about. Surely we are not to stand still now?

I think "data is" is not a mere question of it sounding better (for esthetic reasons), but is preferred for conceptual reasons.

I learned about "datum" and "data" and compound nouns (mass nouns) at school, but I shall definitely be using "data is" in our cyber age:-)

P. Truyens, translator

2003-10-07 03:49:27
Rules is rules
Oops, I said compound nouns there, but what I meant was uncount nouns (of which mass noun is a special kind).
2004-05-07 18:33:28
My dictionary says the data has
According to my dictionary, Random House Webster's College Dictionary, published in 1992, "data" can be either plural or singular. It is plural when it refers to individual pieces of information, but singular when used as a mass noun, as in "the data has been read."

I guess the key here is that data should not be used as plural unless there is more than one.

2004-05-07 19:03:21
another thing
I think in many cases, as touched upon by an earlier post, the people who say "data are" are not thinking of "data" as the plural of "datum" but instead are confused about collective and mass nouns. The same people who say "the data are..." also say "NASA are studying...", "The U.S. Army are attacking...", "what do the board of directors do?", etc. The use of "are" instead of "is" is one of my pet peeves. Of course, it isn't just the verb "to be" that is often mangled in this fashion; it just seems to be the most common victim.

Another situation involves letting the predicate (or specifically, the object) determine the form of the verb instead of the subject. For example, an infamous infomercial stated "The secret are these retractable pins." Other examples would be "His favorite side dish are french fries." and "His best protection are his dogs."

Sometimes I just can't get through an entire newscast without shouting "IS" numerous times. News anchors and other talking heads are paid to talk; they should be able to do it right!

OK, I guess I'm getting a little off topic, so I won't get into dangling participles and the frequent misuse of "misnomer."

2004-05-07 19:05:43
I guess there is no way for me to edit that to correct my typo.