Friday, May 18, 2012

Use the set operator UNION in PROC SQL

SQL syntax contains a few set operators, such as UNION, EXCEPT and INTERSECT. The UNION operator concatenates the results of multiple SQL queries vertically into a single table for all matching rows, which I found particularly useful in PROC SQL while using SAS to manage data. Here come two examples.

Example 1 – Transpose data
PROC SQL can transform a dataset to any desired structure, without referring to DATA step or PROC TRANSPOSE. For example, SASHELP.CLASS can be transposed from wide to long by the UNION ALL clause, and reversely from long to wide by the MAX function and the GROUP clause.

From wide to long

data wide;
set sashelp.class;
drop sex;
run;

proc sql;
create table long as
select name, 'Age' as var, age as col1
from wide
union all
select name, 'Weight' as var, weight as col1
from wide
union all
select name, 'Height' as var, height as col1
from wide
;quit;
From long to wide

proc sql;
create table wide as
select name,
max(case when var = 'Age' then col1 end) as Age,
max(case when var = 'Weight' then col1 end) as Weight,
max(case when var = 'Height' then col1 end) as Height
from long
group by name;
quit;
Example 2 – Aggregate data into a cube
In SAS, PROC SQL doesn’t support the ROLLUP or CUBE clause. However, we can apply multiple UNION operators to simulate such functionality. For example, we can create a cube table to list all possible summations of the ACTUAL variable by STATE, PRODUCT, YEAR in the SASHELP.PRDSAL2 dataset. Afterward we can easily query this multi-dimensional data structure to look for interesting aggregation information, without running any other aggregating procedure again.

data prdsal2;
set sashelp.prdsal2;
run;

proc sql;
create table cube as
select state,product, year, 'total by state, prodcut and year' as category,
sum(actual) as actual
from prdsal2
group by state, product, year
union
select state, product, ., 'total by state and prodcuct', sum(actual)
from prdsal2
group by state, product
union
select state,'', year, 'total by state and year', sum(actual)
from prdsal2
group by state, year
union
select '',product, year, 'total by product and year', sum(actual)
from prdsal2
group by product, year
union
select '' ,'', year, 'total by year', sum(actual)
from prdsal2
group by year
union
select state, '',. , 'total by state', sum(actual)
from prdsal2
group by state
union
select '', product, ., 'total by product', sum(actual)
from prdsal2
union
select '', '', ., 'grand total', sum(actual)
from prdsal2
order by state, product, year
;quit;

2 comments:

  1. Hi, Charlie,

    As always, really cool page!

    A quick question, hopefully. From wide to long, is there a quick way to do the union if i have many variables? Say, I have variables x1, x2 ..., x20 and i need to reshaping them to a single variable?

    Thank you much!!!!

    ReplyDelete
  2. Thanks from the UK

    ReplyDelete