oreilly.comSafari Books Online.Conferences.


AddThis Social Bookmark Button

Using Calculated DataColumns in ADO.NET

by Mike Gunderloy

If you have a background in database work, you have undoubtedly run into the rules of normalization. In particular, you probably internalized the rule that you should not store derived data in the database. For example, if your database includes an Order Details table containing columns for Price and Quantity, you should not put an Item Total column in that table. The item total can always be calculated from the price and quantity when it's needed. By not storing the calculated column, you avoid having to update it whenever the data in one of the other columns changes.

But even though ADO.NET's DataSet class is sometimes described as a "relational database in memory," it's important to remember that there are differences between databases and DataSets. In this particular case, the DataSets are designed to manage calculated columns. In this article I will show you the syntax for building calculated columns in your own DataSets, and discuss some of the pros and cons of using such columns.

A Simple Example

To demonstrate calculated DataColumns in action, I'll build a very simple C# Windows Forms application. The user interface consists entirely of a DataGrid control named dgMain. If you'd like to follow along, start by putting this code into the form's Load event handler:

// Connect to the database
SqlConnection cnn = new SqlConnection();
cnn.ConnectionString = "Data Source=\"(local)\";" +
	"Initial Catalog=Northwind;Integrated Security=SSPI";

// Set up the DataSet to hold all the data
DataSet dsMain = new DataSet();

// Load customers
SqlCommand cmdCustomers = cnn.CreateCommand();
cmdCustomers.CommandType = System.Data.CommandType.Text;
cmdCustomers.CommandText="SELECT * FROM Customers";
SqlDataAdapter daCustomers = new SqlDataAdapter();
daCustomers.SelectCommand = cmdCustomers;
daCustomers.Fill(dsMain, "Customers");

// Load orders
SqlCommand cmdOrders = cnn.CreateCommand();
cmdOrders.CommandType = System.Data.CommandType.Text;
cmdOrders.CommandText="SELECT * FROM Orders";
SqlDataAdapter daOrders = new SqlDataAdapter();
daOrders.SelectCommand = cmdOrders;
daOrders.Fill(dsMain, "Orders");

// Load order details
SqlCommand cmdOrderDetails = cnn.CreateCommand();
cmdOrderDetails.CommandType = System.Data.CommandType.Text;
cmdOrderDetails.CommandText="SELECT * FROM [Order Details]";
SqlDataAdapter daOrderDetails = new SqlDataAdapter();
daOrderDetails.SelectCommand = cmdOrderDetails;
daOrderDetails.Fill(dsMain, "OrderDetails");

// Relate tables

dsMain.Relations.Add(new DataRelation("relCustOrders", 
dsMain.Relations.Add(new DataRelation("relOrdersDetails", 

// Calculations will go here

// Bind data to the user interface
dgMain.DataSource = dsMain;
dgMain.DataMember = "Customers";

Assuming you've done work with ADO.NET, this code should look very familiar. It connects to the Northwind database on the local SQL Server, and then proceeds to use SqlCommand and SqlDataAdapter objects to retrieve information from three tables. It then builds relations between the tables and displays the result on the user interface. Assuming you've got SQL Server on the same computer as .NET, running the project should give you a DataGrid full of data, as shown in Figure 1.

Sample data on a DataGrid
Figure 1. Sample data on a DataGrid

You can drill down in the data by clicking on the + signs and the hyperlinks, which appear when you click on them. Figure 2 shows the result of drilling down twice to reach the order detail level. You can see that the data from the Northwind sample database includes Price, Quantity, and Discount columns for each line item on the order. Note that the header of the DataGrid provides information on the navigation path down to these details.

Line item detail on a DataGrid
Figure 2. Line item detail on a DataGrid

Adding a Calculated Column

Adding a calculated column to the DataSet requires only two lines of code. Insert these lines at the placeholder comment in the main code:

// Add a calculated column to the order details
dsMain.Tables["OrderDetails"].Columns.Add("ItemTotal", typeof(Double));
dsMain.Tables["OrderDetails"].Columns["ItemTotal"].Expression =
	"UnitPrice * Quantity * (1 - Discount)";

The first line of code here adds a new DataColumn named ItemTotal to the OrderDetails DataTable in the DataSet. The Add method takes two arguments in this particular overload: the name of the new column and the type of data that it will contain. The second line of code then uses the Expression property of the new column to tell ADO.NET how to compute the values that this column will contain. In this particular case, the column values are calculated by using other columns in the same DataTable together with standard mathematical functions. Figure 3 shows the results: the last column in the DataGrid displays the results of the calculation for each row.

Calculated column on a DataGrid
Figure 3. Calculated column on a DataGrid

Aggregate Calculated Columns

Another handy feature of ADO.NET is the ability to calculate an aggregate column. To see this in action, add a second calculation to the source code directly after the first:

// Add a calculated column to the orders
	typeof(Double), "Sum(Child.ItemTotal)");

In this case, I've used a different overload of the Add method; this one takes the column name, data type, and expression as three arguments, so I don't need to supply the expression separately. Child is a special keyword for the ADO.NET expression service. It refers to the child DataTable related to the current DataTable. So, the expression calculates the sum of the ItemTotal column (which is itself a calculated column) of all rows in the OrderDetails DataTable related to the current row in the Orders DataTable.

Finally, I can add an overall total column to the Customers DataTable, which shows the total of all orders for the customer, in this way:

// Add a calculated column to the customers
dsMain.Tables["Customers"].Columns.Add("CustomerTotal", typeof(Double), 

When a DataTable contains more than one relationship, you need to specify precisely the relation that should be used in determining the child table. This is not necessary in this case because the Customers DataTable has only a single related child table.

Although I have added only a single calculated DataColumn to each DataTable in this example, you can have multiple calculated DataColumns in a single DataTable. You need to be careful, though, not to create circular references (in which one column depends on another column, which in turn depends on the first column).

DataColumn Expressions

DataColumns support a fairly rich syntax for expressions. The table below shows the major pieces of this syntax.

Token Meaning
ColumnName Refer to columns by name. If the column name contains a special character, enclose the name in square brackets.
50 or 50.0 or 5E1 Numeric constants can be represented as integers, floating point, or in scientific notation.
#9/2/1959# Date constants should be quoted with pound signs.
'Polygon' String constants should be quoted with single quotes.
AND, OR, NOT Boolean operators
<, >, <=, >=, <>, =, IN, LIKE Comparison operators
+, -, *, /, % Arithmetic operators
+ String concatenation operator
* or % Wildcards for string comparison
Child.ColumnName or Child(RelationName).ColumnName Column in a child table
Parent.ColumnName Column in a parent table
Sum(), Avg(), Min(), Max(), StDev(), Var() Aggregate functions
CONVERT(expression, type) Convert an expression to a .NET type
LEN(string) Length of a string
ISNULL(expression, replacement) Returns the expression if it isn't Null, otherwise returns the replacement
IIF(expression, truepart, falsepart) Returns truepart or falsepart depending on whether the expression is true or false
TRIM(expression) Removes leading and trailing blanks
SUBSTRING(expression, start, length) Returns length number of characters from the specified starting point

To reset the expression value, assign an empty string or a null value to it. If you reset an expression and there is a default value specified for the column, then the default will propagate into all of the previously calculated rows.

Uses and Cautions

Related Reading

ADO.NET in a Nutshell
By Bill Hamilton, Matthew MacDonald

Now that you know about this nifty tool, when should you use it? In making the decision, you need to consider your application's user interface, potential performance penalties, and bugs.

On the user interface front, remember that the calculation is being done for all rows in the affected DataTable, whether anyone looks at them or not. Thus, if you're displaying only a few rows out of a very large set of data, it may be more useful to calculate derived values when you need them, instead of in advance. This will avoid loading up memory and processor time with calculations that are never used. On the other hand, if you're using a grid interface, it makes sense to calculate the values for all rows that will be seen in the grid, as I did in this article.

Performance is a tricky thing because you need to worry about both real performance and perceived performance. Suppose you end up displaying only 10 percent of the rows in your data, but you need to display the results of a time-consuming calculation with each row. Should you avoid the calculated DataColumn and instead perform the calculations at display time? Not necessarily. If you use a calculated DataColumn, the application's startup time may be longer, but individual rows will then display quickly because all of the calculations will be performed in advance. This is a situation where perceived performance is more important than actual performance, arguing for the use of the calculated DataColumn.

Finally, I did find one piece of unexpected behavior (that I personally would classify as a bug) when working up the example for this article. Fire up the form, drill down to the order detail level, and change the quantity value for an existing row of data. You'll find that the ItemTotal column for the row is recalculated as soon as you leave the row, but that the values for the OrderTotal and CustomerTotal (as displayed in the header rows) remain unchanged. Navigate back to the parent row (using the left-pointing arrow at the upper right of the DataGrid), or Alt-tab to another application and back (forcing a screen repaint) and the OrderTotal will be updated. But try as I might, I can't find a way to force the CustomerTotal to be updated without writing code. It appears that automatic recalculation only goes up one level the hierarchy.


DataColumn.Expression Documentation (for complete syntax of Expressions).

Mike Gunderloy is the lead developer for Larkware and author of numerous books and articles on programming topics.

Return to