Showing posts with label data preparation. Show all posts
Showing posts with label data preparation. Show all posts

Sunday, December 12, 2010

5D visualiztion: from SAS to Google Motion Chart



Three dimensions are usually regarded as the maximum for data presentation. With the opening of ODS from SAS 9.2 and its graph template language, 3D graphing is no longer a perplexing problem for SAS programmers. However, nowadays magnificent amount of data with multi-dimension structure needs more vivid and simpler way to be displayed.

The emerging of Google Motion Chart now provides a sound solution to visualize data in a more than three dimensions scenario. This web-based analytical technology originated from Dr. Hans Rosling’s innovation. Dr. Rosling and his Gapminder foundation invented a technology to demonstrate the relationship among multiple dimensions by animated bubbles. They developed a lot of bubble plots in Gapminder’s website to discover knowledge form a bulk of public information, especially for regional/national comparison. It soon attracted Google’s attention. In 2008 after an agreement between Dr. Rosling and Google’s two founders, Google launched its Motion Chart gadget. People could create motion chart by using Google Docs, an online alternative to Microsoft’s Office.

The combination between SAS and Google Motion Chart shows a handy and cheap way for up-to-five-dimension data visualization. For Motion Chart, it supports five variables all together in a plot. Commonly the data structure requires time(animation), var1(X axis), var2(Y axis), var3(color) and var4(bubble size). The correlation from var1 to var4 is expected: usually the bubbles with changing color and size tend to move along the diagonal line. Overall 5d visualization can be rendered within such a single plot. In this example, a SAS help dataset ‘SASHELP.SHOES’ is used. The data set has several regions to compare each other. Logged return money is Y-axis, while logged sale money is X-axis. A series of virtual time is given to each region, with inventory as bubble size and the store number as color. By SAS, the data structure in Motion Chart can be prepared quickly. Thus, once the CSV file is uploaded to Google Docs, a motion chart is ready to be published in any webpage. OK, it's time to sit and discover some interesting tendency...

Reference:
1.'Show me--New ways of visualising data’. The Economist. Feb 25th 2010.
2.‘Making data dance’. The Economist. Dec 11st 2010.
3. Google Docs online help center. 2010.

*********(1) Extract data from SASHELP.SHOES***********;
proc sql;
create table test as
select region, Sales, Inventory, Returns, Stores
from sashelp.shoes
order by region , sales desc
;quit;
********(2) Create a random variable for time************;
data test1;
do i=1 by 1 until (last.region);
set test;
by region;
time=today()-i+1;
mytime=put(time, mmddyy8.);
drop i;
output;
end;
run;
********(3) Transform some variables with log**********;
proc sql;
create table test2 as
select region, mytime, log(sales) as logsales, log(returns) as logreturn, Stores as storenum, Inventory
from test1
order by region, mytime
;quit;
********(4) Export data as CSV***************;
proc export data=test2 outfile='C:\Users\Yanyi\Desktop\test.csv' replace;
run;
*******(5) Upload CSV to Google Docs************;
******(6) Create Google Motion Chart manually**********;

**********END*********TEST PASSED 12DEC2010****************************;

Wednesday, April 14, 2010

Labeling variables by a macro in SAS

To rename the variables of a dataset in SAS is a daily routine. SAS or the programmer s would give an arbitrary name for any variable at the initial stage of data integration. Those names have to be modified afterward. Wensui [Ref.1] developed a macro to add prefixes to the variables . Vincent et al. [Ref. 2] extended his idea and added some parameters into the macros. However, giving a name to a variable in SAS has many restrictions regarding the length and the format. For better understanding and recognition, labeling variables instead of renaming them would be useful. In the example below, first comes with an integration of complicated text data. Proc Transpose generates a number of variables with the same prefix.  Then by invoking the label() macro, the dataset would be correctly labeled as desired.

References:
1. Wensui Liu. ‘How to rename many variables in SAS’. http://statcompute.blogspot.com/
2. Vincent Weng. Ying Feng. ‘Renaming in Batches’. SAS Global 2009.

****************(1) MODULE-BUILDING STEP******************;
%macro label(dsin = , dsout = , dslabel = );
/***********************************************************
* MACRO: label()
* GOAL: use a label dataset to label the variables
* of the target dataset
* PARAMETERS: dsin = input dataset
* dsout = output dataset
* dslabel = label dataset
*
***********************************************************/
data _tmp;
set &dslabel ;
num = _n_;
run;

ods listing close;
ods output variables = _varlist;
proc contents data = &dsin;
run;

proc sql;
select cats(a.variable, '="', b.labelname, '"')
into: labellist separated by ' '
from _varlist as a, _tmp as b
where a.num = b.num
;quit;

data &dsout;
set &dsin;
label &labellist;
run;

proc datasets;
delete _:;
quit;
ods listing;
%mend;

****************(2) TESTING STEP******************;
******(2.1) INTEGRATE COMPLICATED DATA*************;
data have;
infile datalines dlm = ',';
retain _row;
input _tmpvar $ @@ ;
if prxmatch("/10\d/", _tmpvar) ne 0 then _row + 1;
if missing(_tmpvar) then delete;
datalines;
100, Tom, 3,1,5,2,6
101, Marlene, 1,2,4
102, Jerry, 9,10,4,
5, 6
103, Jim,2 ,1, 2, 2,4
;
run;

proc transpose data=have out=want(drop = _:)
prefix = var;
by _row;
var _tmpvar;
run;

******(2.2) INPUT LABELS FOR USE*************;
data label;
input labelname $30.;
cards;
Patient ID
Patient last name
The 1st treatment
The 2nd treatment
The 3rd treatment
The 4th treatment
The 5th treatment
;
run;

******(2.3) INVOKE MACRO TO LABEL*************;
%label(dsin = want, dsout = want_labeled, dslabel = label);

****************END OF ALL CODING***************************************;