advertisement

Print

Oracle Cross Tab Views: A Case for Code

by Tom Birch
02/19/2003

I try to avoid coding when assembling data for reporting. It's not that I can't write PL/SQL, but I find that there is more power in well-structured SQL than is generally assumed. Resorting to code just because a query is too complex is a brute-force methodology that I find distasteful. However, one situation that has on occasion beaten my purest intentions is when certain Cross tab functionality is needed. A Cross tab query is used to turn data from vertical to horizontal. MS Access has created this as a query type, but it doesn't technically exist in Oracle. It can, however, be created and work quite well.

In this article, I'll walk you through a scenario where brute force won out over finesse when I needed to create a Cross tab query for a report. Along the way, I'll also explore both the power and the limits of the Cross tab construction.

I work as a consultant in the Canadian forestry sector. Recently, my company worked on a project that required a report be designed to meet government specifications. The report had to show the species mix of trees growing within a defined area, referred to as a stratum. The database stored the tree species and percent for each stratum in a vertical table that looked like the following:

STRATUMID

SPECIES_CODE

SPECIES_PERCENT

43193

Fdi

60

43193

Sp

30

43193

C

10

The StratumId is the foreign key identifying the stand of trees growing inside the boundary of the stratum. The Species_Code is a one-to-three character code representing the tree species. In the above case, the species were Interior Douglas Fir (Fdi), Spruce (Sp), and Cedar (C). Finally, the Species_Percent is a three-digit number field which, when summed for a given StratumId, totals 100. What I needed to do was turn the table into a series of six species and percent columns for the government report. The result was to look as follows:

STRATUMID

SP1

SP%1

SP2

SP%2

SP3

SP%3

SP4

SP%4

SP5

SP%5

SP6

SP%6

43193

Fdi

60

Sp

30

C

10

 

 

 

 

 

 

The species needed to be ordered in the fields from the highest to the lowest percent, so that the lead species would be displayed first. In a case where more than six species had been defined for a stratum, the seventh species and all that followed were to be ignored, even though the total would be less than 100 percent. As this was required for a government report, there was no flexibility on the formatting.

My first thought was to use a Cross tab query to turn the data. To order the data by descending percentages, the Cross tab had to use the values from that field. This meant 100 statements in the query. However, I did not want to return 100 columns (SP100, SP99, SP98, and so on). To avoid this, I needed to build the entire species and percent part of the Cross tab into one long string for each StratumId and then split up the result into the desired set of six species. The Cross tab then looked like this:

SELECT STRATUMID,  
MAX(DECODE(SPECIES_PERCENT, 100, SPECIES_CODE || SPECIES_PERCENT, NULL)) || 
MAX(DECODE(SPECIES_PERCENT, 99, SPECIES_CODE || SPECIES_PERCENT, NULL)) || 
... 
MAX(DECODE(SPECIES_PERCENT, 1, SPECIES_CODE || SPECIES_PERCENT, NULL)) SP_LABEL 
FROM SPECIES_COMPOSITION 
GROUP BY STRATUMID;

Each line would return either a species code and a percent, or a null value. Concatenating these together allowed the nulls to drop out and only the contributing columns to remain. This gave me the entire label in the correct order. I added padding to the Species_Code and Species_Percent fields, so that each species and percent in the string was three characters long. I also added 30 blank spaces to the end of the string (there would always be at least one species and percent, so every value would now be a minimum of 36 characters long) and then used the SUBSTR() function to cut the entire label to 36 characters. This allowed only the first six species to report. I tested the performance on a production database with over 18,000 records in the table and the SQL ran in four seconds. So far, so good; I now had:

CREATE OR REPLACE VIEW STRATUM_LABEL AS  
SELECT STRATUMID,  
SUBSTR( 
MAX(DECODE(SPECIES_PERCENT, 100, RPAD(SPECIES_CODE,3,' ') 
   || RPAD(SPECIES_PERCENT,3,' '), NULL)) || 
... 
MAX(DECODE(SPECIES_PERCENT, 1, RPAD(SPECIES_CODE,3,' ') ||  
   RPAD(SPECIES_PERCENT,3,' '), NULL)) ||  
'                              ',1,36) SP_LABEL 
FROM SPECIES_COMPOSITION 
GROUP BY STRATUMID;

Results:

STRATUMID

SP_LABEL

43193

Fdi60 Sp 30 C 10

43194

C 100

43195

C 30 Sp 20 Sx 18 Fdi13 Pli7 Act6

The only remaining issue now was to cut the string into its 12 component fields. I turned my query into a view (as shown above) and then used it in another query to do the string manipulations. Since I had padded all of the parts of the label to be three characters long, it was a simple matter of using substr(SP_LABEL, 1, 3) and advancing through the label to get all of the required columns. Any blank columns would return a string of three blanks, which was acceptable. I now had a simple query:

SELECT STRATUMID, SUBSTR(SP_LABEL,1,3) SP1, SUBSTR(SP_LABEL,4,3) SP1_PER, 
SUBSTR(SP_LABEL,7,3) SP2, SUBSTR(SP_LABEL,10,3) SP2_PER, 
SUBSTR(SP_LABEL,13,3) SP3, SUBSTR(SP_LABEL,16,3) SP3_PER, 
SUBSTR(SP_LABEL,19,3) SP4, SUBSTR(SP_LABEL,22,3) SP4_PER, 
SUBSTR(SP_LABEL,25,3) SP5, SUBSTR(SP_LABEL,28,3) SP5_PER, 
SUBSTR(SP_LABEL,31,3) SP6, SUBSTR(SP_LABEL,34,3) SP6_PER 
FROM STRATUM_LABEL;

It is usually at this point, when I think that I am nearly done, that I realize there is a problem. This was no exception. The max(decode(field, value, expression, null)) syntax works well for creating Cross tab queries, but it requires that the field being decoded be made unique for each value in the group by clause, without losing any data. The uniqueness is created by the max() function, which could be replaced by sum()--or other functions--which will return a single value, depending on the data type. Because the value I was returning was text, I used the max function.

Related Reading

Mastering Oracle SQL
By Sanjay Mishra, Alan Beaulieu

Next I had to ensure that there was only one value being evaluated for any given percentage in a stratum, or else I would lose data. The unique index on the table was a composite key of StratumId and Species_Code. This meant that while the Species_Code was unique by StratumId, the Species_Percent was not. The result was that if there were two species with the same percentage, then the max() function would return the species whose name sorted the highest. For example, a stratum with 50 percent Fdi and 50 percent C would have a label of "Fdi 50," because both Fdi and C would sort to the 50 percent column, whereupon max() would discard the C species.

With this realization, I knew I was stuck. I could Cross tab by the species, which would fulfill the uniqueness requirement. However, this would have two other problems. It would not order by the percent, which was one of the reporting requirements. Also, when creating a Cross tab in Oracle, all possible values must be accounted for in the max(decode(field, value, expression, null)) statements.

For the Species_Percent, this was a fixed list of 100 values. The Species_Code, however, is user-defined and the list of possible values can change from time to time (for example, the government may change the standards for capitalization on the codes, and then everything could get updated). An Oracle Cross tab view will not adjust to changes in the value list, which means that they are hard to maintain if they are used with user-defined fields.

To solve the problem of user-defined codes, I would normally write a function to create my Cross tab query as a view. This is a fairly simple matter of creating a cursor that contains a distinct set of the codes and then looping through it. Each loop extracts the next code and adds it to a string that is then used to create the view. The downside to this is that the function must run before a report could be produced. This is the only way to ensure that the Cross tab contains the most up-to-date list of codes. But again, you can build this into a report that will run the function, and then use the resulting Cross tab view as its data source.

However, in this case even that would not solve the problem due to the requirement to order by descending percentages. Here I came to a case for the worst kind of code. The solution required a temporary reporting table. (It is always good to know when you are beat before you spend a lot of your client's money trying to build something that will never work.)

I wrote a function that created a temporary table and worked through a cursor ordered by StratumId and Percent, populating species and percent variables. As the StratumId changed, it inserted a new row into the temporary table. Then, for the first six species and percent values, it updated the row. As it moved to a new StratumId it inserted a new row, and so on. As it finished, it opened the report. My temporary table, when populated, was as follows:

STRATUMID

SP1

SP1_PER

SP2

SP2_PER

SP3

SP3_PER

SP4

SP4_PER

SP5

SP5_PER

SP6

SP6_PER

43193

Fdi

60

Sp

30

C

10

 

 

 

 

 

 

43194

C

100

 

 

 

 

 

 

 

 

 

 

43195

C

30

Sp

20

Sx

18

Fdi

13

Pli

7

Act

6

The final report took this table, TMP_SPECIES_COMPOSITION, and linked it to the STRATUM table to produce the final report:

SELECT S.STRATUM_NAME, S.AREA, S.NUMBER_TREES, 
   T.SP1, T.SP1_PER, T.SP2, T.SP2_PER, T.SP3, T.SP3_PER, T.SP4, 
T.SP4_PER, T.SP5, T.SP5_PER, T.SP6, T.SP6_PER 
FROM STRATUM S, TMP_SPECIES_COMPOSITION T 
WHERE S.STRATUMID = T.STRATUMID;

So much for finesse. The Oracle Cross tab only orders on the field that is being manipulated. It works when the entire value set is known and the field can be made unique to the group by values--without losing desired data. A dynamic value set can be handled by creating the view in a simple routine. However, if the field you must order by cannot be made unique, then the structure will not work. In these cases, the brute-force method is actually the right method.

Tom Birch has worked in the British Columbia forest industry since 1991. He is currently a User Support Specialist with Forsite Consultants Ltd., where he focuses on data management and client support.


Return to the O'Reilly Network.