How do you pronounce => ?

by Jesse Liberty

Since Silverlight 2 is around the corner, and with it Managed Code and LINQ, I'm cross posting the following to both my Silverlight blog and my O'Reilly blog. Hope that isn't too annoying.

I believe it is a major stumbling block, when learning new technology, if you can't say the syntax "out loud in your head"; that is, if you can't read the code to yourself in a way that you can then translate into a meaningful sentence. For example, when learning C#, you might see

int result = employees.Add(237, new Employee("John Doe", theAddressRecord));

If you don't speak C#, you can't really read this to yourself without stumbling. What do you do with the dot between employees and Add? What do you do with the commas? The parentheses? Do you pronounce them ("employees dot add")?

If you "speak" C#, you can read this to yourself quite easily, perhaps without noticing: "Call the Add method on the Employees object, and pass in a new Employee object, initialized with two parameters, a string and an object, and return a value that you will assign to the local integer variable result."

In fact, you'd go much further, based on your knowledge of C#, and you'd read it complete with the logical inferences: "Add to the Employees dictionary a new Employee object, keyed to the integer value 237. The new Employee's constructor takes two parameters: the Employee's name as a string, and the Employee's address as an AddressRecord. The Add method of the Dictionary returns an integer indicating success or failure, which is assigned to a local variable named result."

Now, I posed the following question to Ian Griffiths: how do you pronounce this C# LINQ statement?

IEnumerable<Person> results = people.Where(p => p.LastName == "Liberty");

Ian Griffiths is a consultant, developer, speaker, author, blogger, and to my great fortune, he was one of the technical editors for the fifth edition of my book Programming C# 3.0.

Ian's first response to my question was: "IEnumerable of Person results equals people dot where p goes to p dot last name equals Liberty... or just people where last name equals Liberty. However, I've not spent any time trying to devise a way of saying LINQ that's necessarily comprehensible to anyone listening who doesn't have the source code to look at... I'm also wondering if I'm missing a trick question."

[Let me note now that I'm abbreviating both Ian's comments and mine to make this readable and keep to the essence of the discussion]
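
One way to make the "p goes to" reading concrete is to write the same test three ways. This is only a sketch: the Person class and the names LambdaReadingSketch and HasLastNameLiberty are hypothetical, introduced here for illustration and not taken from the discussion.

```csharp
using System;

// Hypothetical minimal Person type, assumed for illustration.
public class Person
{
    public string LastName { get; set; }
}

public static class LambdaReadingSketch
{
    // Lambda form: "p goes to p dot LastName equals Liberty".
    public static readonly Func<Person, bool> Lambda =
        p => p.LastName == "Liberty";

    // C# 2.0 anonymous-method form of the same test.
    public static readonly Func<Person, bool> AnonymousMethod =
        delegate(Person p) { return p.LastName == "Liberty"; };

    // Named-method form: given a Person p, report whether the last name matches.
    public static bool HasLastNameLiberty(Person p)
    {
        return p.LastName == "Liberty";
    }

    public static void Main()
    {
        var jesse = new Person { LastName = "Liberty" };
        Func<Person, bool> named = HasLastNameLiberty;

        // All three forms give the same answer for the same input.
        Console.WriteLine(Lambda(jesse) && AnonymousMethod(jesse) && named(jesse)); // prints True
    }
}
```

Read this way, the part to the left of => names the input and the part to the right is what you get back for it.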

I explained my reasons for wanting to teach how to pronounce it and suggested:

"Declare results as an instance of a collection that implements the generic interface IEnumerable of Person, and assign to results each member of the collection people (which we assume to be a collection of Person objects) that meets the condition given in the parentheses. The condition is: let p represent each member of people in turn, give me each p where p.LastName is equal to Liberty."


Ian objected to the two parts I marked in bold.

"First, results holds an instance of a collection, and I prefer to keep the distinction between the variable and the object clear."

After some back and forth, we agreed that it is better to say that results is a reference to an instance of a collection.

Ian's second objection was more substantive and nailed my misunderstanding. He wrote:

The language suggests that we will assign each matching Person into results, which isn't really true. It could be taken to suggest a process that looks like this:

List<Person> results = new List<Person>();
foreach (Person p in people)
{
    // Eagerly copy every match into the list.
    if (p.LastName == "Liberty") { results.Add(p); }
}

and while that might have the same effect, I think it's potentially misleading to think in those terms. (For one thing, the approach shown here will fail for infinite collections. But LINQ is quite happy to evaluate infinite collections lazily, so long as a) you use suitable enumerators and operators, and b) you never ask it to materialize the full results of the query.)
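
The point about infinite collections can be sketched as follows. NaturalNumbers and FirstEvens are hypothetical names assumed for this illustration; they are not part of the original exchange. A foreach that copies every match into a list would never finish on this source, while Where followed by Take pulls only as many items as it needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyWhereSketch
{
    // Hypothetical infinite source: 1, 2, 3, ... It never ends on its own.
    public static IEnumerable<int> NaturalNumbers()
    {
        int n = 1;
        while (true)
        {
            yield return n++;
        }
    }

    // Where is deferred: building the query enumerates nothing.
    public static List<int> FirstEvens(int count)
    {
        IEnumerable<int> evens = NaturalNumbers().Where(n => n % 2 == 0);

        // Take(count) stops pulling from the source after 'count' matches,
        // so only a finite prefix of the infinite sequence is ever examined.
        return evens.Take(count).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstEvens(3))); // prints 2, 4, 6
    }
}
```

Calling ToList() directly on the filtered sequence, without Take, would be asking LINQ to materialize the full results and would not terminate.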

To that end, part of me wants to pare it right down:

"Let 'results' be all the 'Person' objects in 'people' that have a LastName of "Liberty"."

For me that captures the essence, and avoids getting bogged down in details. I like it because it doesn't make many assumptions about what 'people' is. (Specifically...very specifically and somewhat pedantically in fact...it makes the assumption that when we examine people through the standard LINQ 'Where' operator, it appears to contain a set of objects. And I chose my wording very carefully there - that does not mean that 'people' is necessarily a collection of objects. I could for example write a LINQ to SwipeCard library; 'people' might actually be a SwipeCardReader, and I may have provided a Where extension method that can be applied to a SwipeCardReader that returns an enumerator that yields an object each time someone swipes a card that matches the Where predicate.

OK that'd be a slightly weird thing to do - I'm just illustrating that there are scenarios in which talking in terms of 'collections' doesn't fit. More pragmatically, in LINQ to SQL and LINQ to Entities, the Where clause ends up getting converted into SQL...so all you know is that it yields filtered output, and you can't talk about how it achieves this in object terms.)
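
A rough sketch of the SwipeCardReader idea, under loud assumptions: no such library exists as far as this post says, so every type and member below (SwipeCardReader, SimulateSwipe, TryReadSwipe, the Person class) is invented for illustration, and a queue stands in for real hardware. What it shows is that 'people' need not be a collection for people.Where(...) to compile.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Person type, assumed for illustration.
public class Person
{
    public string LastName { get; set; }
}

// Hypothetical device wrapper. Note that it does NOT implement IEnumerable<Person>.
public class SwipeCardReader
{
    private readonly Queue<Person> pending = new Queue<Person>();

    // Stand-in for a card swipe arriving from real hardware.
    public void SimulateSwipe(string lastName)
    {
        pending.Enqueue(new Person { LastName = lastName });
    }

    // Returns true and the next swiped person, or false if none is waiting.
    public bool TryReadSwipe(out Person person)
    {
        if (pending.Count > 0)
        {
            person = pending.Dequeue();
            return true;
        }
        person = null;
        return false;
    }
}

public static class SwipeCardReaderExtensions
{
    // A Where operator for SwipeCardReader. The compiler binds
    // reader.Where(...) here because this is the applicable Where method.
    public static IEnumerable<Person> Where(
        this SwipeCardReader reader, Func<Person, bool> predicate)
    {
        Person person;
        while (reader.TryReadSwipe(out person))
        {
            if (predicate(person))
            {
                yield return person;
            }
        }
    }
}

public static class SwipeCardSketch
{
    public static void Main()
    {
        var people = new SwipeCardReader();
        people.SimulateSwipe("Liberty");
        people.SimulateSwipe("Griffiths");

        // Same shape as the statement under discussion.
        IEnumerable<Person> results = people.Where(p => p.LastName == "Liberty");
        foreach (Person p in results)
        {
            Console.WriteLine(p.LastName); // prints Liberty
        }
    }
}
```

In this simplified version the enumeration ends when the queue is empty; a real reader would presumably wait for the next swipe instead.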

But that doesn't explain the individual pieces. If we want to say precisely what each bit of that code does, we need a more complete explanation. And for that...well I'm still in two minds. It depends on context - how much do we really know about 'people'? Given just that line of code, people could be anything, and we might well be building a query against a database here. The pattern presented is using one of the standard LINQ operators, so it's applicable to LINQ to Objects, LINQ to SQL, and LINQ to anything else that might spit out an object that might have a LastName property. (So I don't think this particular example would work directly with LINQ to XML. However, it's still possible that 'people' was a LINQ to XML query whose select clause happened to project the results into a .NET object. But that would make this LINQ to Objects...)

But if we can assume that 'people' really is a collection of objects, and that we're using LINQ to Objects, then my 'no stone unturned' version might look like:
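
Why the same lambda can end up as either in-memory filtering or a database query can be sketched with the two forms the C# compiler accepts for it: a delegate and an expression tree. The Person class and the names AsDelegate and AsExpression are hypothetical, and no real query provider is involved; this only shows that the lambda can be captured as data for a provider to inspect.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical minimal Person type, assumed for illustration.
public class Person
{
    public string LastName { get; set; }
}

public static class ExpressionTreeSketch
{
    // The lambda compiled to executable code, as LINQ to Objects uses it.
    public static readonly Func<Person, bool> AsDelegate =
        p => p.LastName == "Liberty";

    // The same lambda text captured as a data structure that a provider
    // could walk and translate (for example, into a SQL WHERE clause).
    public static readonly Expression<Func<Person, bool>> AsExpression =
        p => p.LastName == "Liberty";

    public static void Main()
    {
        var person = new Person { LastName = "Liberty" };

        Console.WriteLine(AsDelegate(person));             // prints True
        Console.WriteLine(AsExpression.Body.NodeType);     // prints Equal
        Console.WriteLine(AsExpression.Compile()(person)); // prints True
    }
}
```

Which form the compiler produces depends on the parameter type of the Where method it binds to, which is why the line alone does not tell you how the filtering is carried out.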

"Declare a variable 'results', which will hold an enumerable set of Person objects, calculated by invoking the 'Where' operation on 'people'. 'Where' is a standard LINQ operator that performs filtering. Since 'people' is a collection, the 'Where' operator is provided in this case by LINQ to Objects, and it is an extension method. (So we are really invoking Enumerable.Where here, even though the syntax makes 'Where' look like a member of the object referred to by 'people'.) The parameter to 'Where' is a lambda expression that will be evaluated for each Person 'p' in people. The 'Where' method includes all Person objects for which the expression evaluates to true (i.e., the ones whose LastName property is "Liberty") and excludes the rest. 'Where' returns all of the included objects as an IEnumerable<Person>, which is assigned into the 'results' variable."

Comprehensive and, I fear, unreadable...
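
The extension-method remark in the long version (that the call is really to Enumerable.Where) can be checked with a short sketch. The Person class, the sample data, and the method names here are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Person type, assumed for illustration.
public class Person
{
    public string LastName { get; set; }
}

public static class ExtensionCallSketch
{
    // Extension-method syntax: Where looks like a member of 'people'.
    public static IEnumerable<Person> ViaExtensionSyntax(IEnumerable<Person> people)
    {
        return people.Where(p => p.LastName == "Liberty");
    }

    // The same call written as the static method invocation it compiles to.
    public static IEnumerable<Person> ViaStaticCall(IEnumerable<Person> people)
    {
        return Enumerable.Where(people, p => p.LastName == "Liberty");
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { LastName = "Liberty" },
            new Person { LastName = "Griffiths" },
            new Person { LastName = "Liberty" }
        };

        // Both queries yield the same Person instances in the same order.
        bool same = ViaExtensionSyntax(people).SequenceEqual(ViaStaticCall(people));
        Console.WriteLine(same); // prints True
    }
}
```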


This was particularly powerful to me, because I had exactly that foreach loop in my head. I replied acknowledging that, and also highlighting his distinction between a set and a collection. I didn't much like the includes/excludes language, though, and asked about using:

the 'where' method yields all Person objects for which the expression evaluates to true.

We agreed that the problem with my language is that it doesn't quite make explicit that those that don't match are dropped on the floor, but on the other hand it is less ugly than saying:

the 'where' method yields all (and only) Person objects for which the expression evaluates to true

2 Comments

Adam
2008-02-06 08:49:41
You're missing the last paren: )
Matthias
2008-02-06 23:17:53
If the object is to read the statement (as opposed to "explain how it works"), conveying a model of what it does instead of how, I'm totally with that part of Ian that proposes the one-liner.
"Where" corresponds to the notation in mathematics for defining a subset by giving a predicate on the elements:
U(x) = { y \in S | d(x,y)<epsilon }


"U of x is the set of all points in S whose distance to x is smaller than epsilon".


The purpose of LINQ, as I understand it, is that you do not have to translate from that level of abstraction down to iterators and loops - the compiler does it for you.