backup

Thursday, September 29, 2011

An easy solution for Multi-Sheet EXCEL reporting

Currently the only way to output SAS datasets as a multi-sheet EXCEL workbook for reporting is to use ExcelXP ODS tagset. I like this method a lot, because it can generate stylish multiple EXCEL sheets and is highly customizable. However, in practice it has some weaknesses. 1 - Running this tagset is resource-costly, since it depends on an 8k lines SAS codes - ExcelXP.sas. While dealing with a large SAS dataset, it always gets jammed. 2- It only allows one grouping variable by the BY statement inside the output procedures (PROC REPORT, PROC PRINT, etc.). 3 - The user often has to estimate the width for each column in EXCEL.

Actually we can use SAS macro and VBA macro together to obtain high-quality multi-sheet EXCEL workbook. The workflow is pretty simple: first a SAS macro splits a SAS dataset into many XLS files in a folder through ODS HTML targset. Second a VBA macro merges those single XLS files as sheets in to a workbook. For example, SAS shipped with a sample dataset SASHELP.PRDSAL2 with 23040 observations and 11 variables. If we want to generate a multi-sheet EXCEL workbook grouped by two variables such as ‘state’ and ‘year’, we can set up an empty directory in the hard disk and run a macro like below. As a result, we will have a number of small XLS files.


%macro split(data = , folder = , clsvar1 = , clsvar2 = );
   options nocenter nodate nonumber ps = 9000; 
   title; footnote;
   ods listing close;
   proc sql noprint;
      create table _tmp01 as 
      select &clsvar1, &clsvar2, count(*) as number
      from &data
      group by &clsvar1, &clsvar2
      order by &clsvar1, &clsvar2
   ;quit;
   data _tmp02;
      set _tmp01 nobs = nobs;
      where number gt 0;
      index = _n_;
      call symput('nobs', nobs);
   run;
   %do i = 1 %to &nobs;
      proc sql noprint;
         select &clsvar1, &clsvar2
            into:clsvar1name,:clsvar2name
         from _tmp02
         where index = &i
      ;quit;
      %let filepath = &folder\%sysfunc(dequote(&clsvar1name))_%sysfunc(dequote(&clsvar2name)).xls;
      ods html file = "&filepath " style = minimal; 
      proc print data = &data noobs label;
         where &clsvar1 = "&clsvar1name" and &clsvar2 = &clsvar2name;
      run;
   %end;
   ods listing; 
   ods html close;
%mend;
%split(data = sashelp.PRDSAL2, folder = C:\test1, clsvar1 = state , clsvar2 = year)

Then we can open EXCEL, press ALT+F11, paste the VBA code below and run it. Then we will be able to have a decent multi-sheet EXCEL workbook. The biggest strength for this method is that it is very fast – the overall process (running SAS macro and VBA macro) only takes less than a minute for this relatively large dataset SASHELP.PRDSAL2. And it can be expanded to many grouping variables by modifying the SAS macro a little. In conclusion, for big data EXCEL reporting, combining SAS macro and VBA macro together is a good alternative other than ExcelXP ODS tagset.
VBA Draft 3.1 Blogpost

Monday, September 19, 2011

A test to count missing values for large data

This morning Rick introduced how to count the missing frequencies for all variables in a dataset, including character and numeric variables. He provided two solutions by either PROC FREQ or PROC IML. I have a petty macro based on PROC SQL’s nmiss() function to do the same job. In this big data era, I am interested in those SAS codes’ efficiencies to check large data.

Then I simulated a 1e8 size dataset (5 numerical and 5 character variables) and tested the 3 methods. According to the results, PROC FREQ is slightly better than PROC SQL in both memory usage and speed. However, a macro may be needed to integrate a number of output tables by PROC FREQ for reporting (the demo macro by SAS Knowledge Base is an option and can be improved). Sadly, PROC IML used out all the 250MB memories the server allocated to me and can’t finish this test. Similarly, the memory volume should be considered for other matrix languages like R and Matlab.

P.S.

Thanks to Rick’s new codes, missing values were counted successfully by PROC IML finally. The real time spent by PROC IML was 150 seconds, which is almost equal to running the macro by PROC SQL. It is interesting to compare the functions from these two procedures: COUNT() and NMISS() functions in PROC SQL and COUNTN() and COUNTMISS() functions in PROC IML can work for both numeric and character variables.


**********************(1) SIMULATION STEP**************************************;
data test;
   array _c{*} $ c1-c5;
   array _n{*} n1-n5;
    do i = 1 to 1e8;
       do j = 1 to 5;
          if ranuni(1234)>0.2 then _c[j]= 'A';
          else call missing(_c[j]);
          if ranuni(4321)>0.3 then _n[j] = rannor(0);
          else call missing(_n[j]);
       end;
       output;
       drop i j;
    end;
run;

**********************(2) MODULE-BUILDING STEP*********************************;
%macro countmiss(data = );
   ods listing close;
   ods output variables = _varlist;
   proc contents data = &data;
   run;
   proc sql;
      alter table _varlist
      add nmiss num
      ;
      select count(*) into :nvar
      from _varlist
      ;
      select count(*) into :nobs
      from &data
      ;
   quit;
   %do i = 1 %to &nvar;
      proc sql;
         select cats('nmiss(', variable, ')') into :miss_var
         from _varlist
         where num = &i
         ;
         select &miss_var into: miss_num
         from &data
         ;
         update _varlist
         set nmiss = &miss_num
         where num = &i
         ;
      quit;
   %end;
   proc sort data = _varlist;
     by num;
   run;
   ods listing;
   proc report data = _varlist nowd split = '*';
      columns  type num variable label nmiss pcnmiss;
      define type / group;
      define variable / width = 15;
      define nmiss / display 'Missing*Number';
      define pcnmiss / computed format = percent8.3
                  'Missing*Percentage' width = 10;
      compute pcnmiss;
         pcnmiss = nmiss / &nobs;
      endcomp;
   run;
%mend countmiss();

proc format;
 value $missfmt ' '='Missing' other='Not Missing';
 value  missfmt  . ='Missing' other='Not Missing';
run;

**********************(3) TESTING STEP***************************************;
options fullstimer;
*******************METHOD(1): PROC SQL***************************************;
%countmiss(data = test);
 
*******************METHOD(2): PROC FREQ*************************************;
proc freq data = TEST; 
   format _CHAR_ $missfmt.; 
   tables _CHAR_ / missing missprint nocum nopercent;
   format _NUMERIC_ missfmt.;
   tables _NUMERIC_ / missing missprint nocum nopercent;
run;
*******************METHOD(3): PROC IML***************************************;
proc iml;
   use test;
   read all var _NUM_ into x[colname=nNames]; 
   n = countn(x,"col");
   nmiss = countmiss(x,"col");
   read all var _CHAR_ into x[colname=cNames]; 
   close one;
   c = countn(x,"col");
   cmiss = countmiss(x,"col");
   Names = cNames || nNames;
   rNames = {"    Missing", "Not Missing"};
   cnt = (cmiss // c) || (nmiss // n);
   print cnt[r=rNames c=Names label=""];
quit;
********************END OF ALL CODING*****************************************;

Sunday, September 11, 2011

Add 10 buttons to enhance SAS 9.3 environment

One striking change in SAS 9.3 is that everything is HTMLized. Adding user-defined buttons to the toolbar of SAS’s windowing environment would make coding more efficient.

1. Clear log and result
The most used SAS command is now prompted by a button.


output;clear;log;clear;wpgm;

2. Clear files left by results viewer
Lots of trash (mostly HTML files and PNG-formatted images by SAS 9.3's results viewer) would permanently remain in the current folder after each SAS session (the current folde is indicated on the Window Bar at the bottom). To avoid the possibility that someday the hard disk is jammed, clearing them out at times may be a good habit.


gsubmit "options noxsync noxwait;x 'del *.htm';x 'del *.png';"

3. Clear work directory
Everything in the WORK directory can be cleared out during a SAS session by such an approach.


gsubmit "proc catalog c=work.sasmacr kill force;proc datasets kill nolist;quit;"

4. Turn off ODS graphics
This code line closes ODS graphics for the statistics procedures if needed, and then saves resource expense.


gsubmit "ods graphics off;"

5. Separate html
SAS 9.3 stores all results in a very long html file that is what we see in the results viewer new html files. Ken Kleinman mentioned that running this code would generate a new html file.


gsubmit "ods html close; ods html;"

6. Listing on
One click to go back to the listing output if with output-intensive job.


gsubmit "ods _all_ close; ods listing;"

7. Increase DPI
The original image resolution in SAS 9.3 is set at low level for processing efficiency. Actually DPI (dots per inch) in SAS can be raised as high as 600. Graphs with 300 dpi would be decent for regular paperwork.


gsubmit"ods html image_dpi=300;"

8. Change style
The default style in SAS 9.3 is htmlblue, which is an elegant and smooth html template. However, it has weaknesses: while it draws scatter plots, the overlapping dots tend to be indistinguishable. My favorite for scatting plots is 'harvest'. Aligning them together would see the difference. SAS 9.3 has many other styles. I like 'navy' or 'money' for pictures that need high contrast. The styles 'journal/2/3' may be considered for publishing purpose.


gsubmit"ods html style=harvest;"

9. Recover formchar
If you see tables on listing separated by weird charaters, it is time to click this button.


gsubmit "options formchar='|----|+|---+=|-/\<>*';"

10. Community help
Searching the key words in SAS-L is probably the most efficient way to find answers for any SAS question. Another option is Lex Jansen that searches from total 15925 SAS conference proceeding papers.


wbrowse 'http://listserv.uga.edu/archives/sas-l.html'

Those user-defined buttons can be exported and imported by PROC CATALOG.


********EXPORT****************************;
proc catalog;
   copy in=sasuser.profile out=work.backup;
   select sasedit_main / et=toolbox;
quit;
********IMPORT****************************;
proc catalog;
   copy in=work.backup out=sasuser.profile;
   select sasedit_main / et=toolbox;
quit;

Sunday, August 28, 2011

My 10 wishes for SAS

No wonder that SAS 9.3 is one of SAS’s greatest products. Beyond it, I have 10 secret wishes and hope the far-away SAS 9.3.2 or SAS 9.4 might realize them someday.

1. Trigger for SAS dataset
A wonderful thing about SAS is that it can be used as a RDBMS with full functionality. Just one piece from the SQL language is missing in SAS - trigger. Adding triggers would bring more security for SAS datasets or data views, and automate some routine operations.

2. PROC PIVOT for pivot table
Pivot table is a huge business success, since I found that every boss loves to do point-and-click data aggregation in Excel (why not they just use the simple PROC FREQ of SAS?). I often spend many hours to painfully decorate a pivot table. A procedure that directly exports SAS’ dataset to pivot-table-contained Excel spreadsheet is going to be a big plus.

3. Visible hash objects
Hash object offers an efficient alternative to the hard disk based Data Step programming. Michele Burlew’s new book this fall would be a milestone for this emerging technology since SAS 9.1. If SAS windowing environment provides the views for the hash objects through the library explorer, matching or lookup on the hash objects would be better perceived.

4. A multi-threading LOGISTIC procedure
Last week Rick introduced how to open multiple workspace instances to make SAS/IML multi-threading. For many SAS users, PROC LOGISTIC is worthy of the half price they paid for SAS. It seems that SAS is developing a multi-threading HPLOGISTIC procedure for Teradata or in-database technology. However, at the age of big data, a multi-threading LOGISTIC procedure is still very much desired in SAS/STAT.

5. A system options for decimal digits
It is well known that to export the tables by ODS and then change the formats would allow displaying more decimal places for SAS’s outputs. However, a system option specifying the number of digits in the result would save coding time.



/*******************OPTION 1 ***********************************************

* From NOTE 37106 at support.sas.com/kb/37/106.html 

* How can I display more or fewer decimal places in procedure results

* -- Change the decimal places for a table 

****************************************************************************/

ods output parameterestimates = pe;

proc reg data = sashelp.class;

   model Weight = Height;

run; 

proc print data = pe label noobs; 

   format _numeric_ 12.8 ;

run;



/*******************OPTION 2 ***********************************************

* From Robin High at http://listserv.uga.edu/cgi-bin/wa?A2=ind0704b&L=sas-l&P=73559

* Re: How to display p value with more dicimal digits

* -- Change  the decimal places for all p-values 

****************************************************************************/

proc template;

   define column Common.PValue;

      format = pvalue12.8; 

   end;

run;

proc reg data = sashelp.class;

   model Weight = Height;

run;

6. Support vector machine in SAS/STAT
Although SAS Enterprise Miner 7.1 has two procedures: PROC SVM and PROC SVMSCORE, they seem primitive and only apply for bivariate response variable. A procedure for SVM in SAS/STAT alongside the robust GLM-based procedures there would relieve many desperate SAS coders who got projects to do SVM with SAS.

7. A trial version (or learning edition)
One friend of mine is still using SAS 6 and insists that it is the culminating product. If there is a trial edition of SAS 9.3 available he can taste, he probably will change the idea. A learning edition with no cost or little cost can attract more people to start to learn SAS.

8. PROC TEXTCLOUD
Text cloud is a fancy visualization tool, although it does nothing about statistics. For example, Rick summarized his first 100 blog posts by it. R also has a nice package ' wordcloud'. A text cloud procedure would definitely make SAS more fun.

9. Random forest in Enterprise Miner
Random forest is one of the popular classification methods, which has not been included in Enterprise Miner 7.1 yet. Hope in the future it could become one of the modeling nodes.

10. Reasonably priced SAS cloud
Amazon is earning a windfall of money by its cloud services. And many start-ups provide R clouds. A PC SAS or UNIX SAS cloud may be a lucrative business for SAS. And I will be happy to show SAS to my friends on iPad or Android phone.

Wednesday, August 24, 2011

A macro design pattern by PROC FCMP

We all heard horrible stories that someone tried to piece together a bunch of nice functioning macros for a big macro, and ended up with a messy and undebuggable system. Part of reasons can be about encapsulation: not all SAS programmers have the good habit to localize the macro variables by %local statement; the leaking macro variables may ruin the attempt to utilize multiple macros written by different SAS programmers. To solve this problem, Mark Tabladillo brought the concepts of encapsulation, polymorphism, and inheritance into nested macros. And he raised several design patterns for macros to emulate the object-oriented languages.

The FCMP procedure, besides its original purpose as a function compiler, could encapsulate macros by its RUN_MACRO function. The macro-based functions seem to be more safe modules than the macros themselves. Erin, Daniel and Himesh in their SAS Global 2011 paper showed an example to build a complicated reporting system for academic performances. Their principle is to construct a few macro-embedded functions by PROC FCMP and then incorporate them with an interface macro. Here I modified their codes a little to increase the number of macros and showed the relationship among the elements in the UML diagram above. The stucture is similar to the adapter pattern, one of the many OOP design patterns, with PROC FCMP as a wrapper.

Overall, functionalizing our macros or our colleagues’ macros by PROC FCMP is an alternative way to integrate them for a ‘big’ purpose.


 
/*******************READ ME*********************************************
* THE CODES BELOW ARE COPIED AND MODIFIED FROM ERIN LYNCH, DANIEL 
* O’CONNOR, HIMESH PATEL OF SAS INSTITUTE 
*
* THE ORIGINAL CODE AND RAW DATA CAN BE FOUND FROM THEIR PAPER 
* MY REPORTING REQUIRES A FULL STAFF—HELP! 
* PAPER 291, SAS GLOBAL FORUM 2011
* support.sas.com/resources/papers/proceedings11/291-2011.pdf
*
****************END OF READ ME******************************************/

%macro tabledata_prep;
   options topmargin=.125in bottommargin=.125in leftmargin=.25in rightmargin=.25in nodate nonumber;  
   title; footnote;
   ods escapechar="~"; 
   %let tabledata=%sysfunc(dequote(&tabledata.)); 
   data tabledata; 
      set &tabledata; 
      district=substr(district,1,8)||' '||substr(district,9,1); 
      school=substr(school,1,6)||' '||substr(school,7,1); 
   run; 
%mend;

%macro linedata_prep;
   %let linedata=%sysfunc(dequote(&linedata.)); 
   ods _all_ close; 
   data linedata; 
      set &linedata; 
      district=substr(district,1,8)||' '||substr(district,9,1); 
   run; 
   proc sort data= linedata out=sorted_linedata; 
      by district year; 
   run; 
   proc sort data= linedata out=districts(keep=district) nodupkey; 
      by district; 
   run; 
%mend;
                                       
proc template; 
   define style Styles.Charter; 
      parent = styles.printer; 
      style Body from Document 
         "Undef margins so we get the margins from the printer or SYS option"  / 
            marginbottom = _undef_ 
            margintop = _undef_ 
            marginright = _undef_ 
            marginleft = _undef_ 
            pagebreakhtml = html('PageBreakLine') 
            backgroundimage="Your.png"; 
   end; 
run; 
 
%macro Newfile; 
  %if &path ne '' %then %let pathopt= path=&path(url=none); 
   %else  %let pathopt=; 

   %if &gpath ne '' %then %let gpathopt= gpath=&gpath(url=none); 
   %else %let gpathopt=; 

   %let path=%sysfunc(dequote(&path.)); 
   %let gpath=%sysfunc(dequote(&gpath.)); 
   %let destination=%sysfunc(dequote(&destination.)); 
   %let file=%sysfunc(translate(%sysfunc(dequote(&file.)), "_", " ")); 
   %let extension=%sysfunc(dequote(&extension));      
 
   %if &styleparm ne '' %then %let styleopt= style=%sysfunc(dequote(&styleparm.)); 
   %else  %let styleopt=; 

    %if ( %upcase(&destination) eq PDF ) %then %do; 
       ods &destination file="&path.&file..&extension" notoc startpage=no 
       &styleopt; 
   %end; 
   %else %if (( %upcase(&destination) eq RTF ) or ( %upcase(&destination) eq TAGSETS.RTF )) %then %do; 
       ods &destination file="&path.&file..&extension" startpage=no &styleopt; 
   %end; 
   %else %if ( %upcase(&destination) eq HTML ) %then %do; 
       ods &destination file="&file..&extension" &pathopt &gpathopt &styleopt; 
   %end; 
%mend; 
                                                 
%macro Enrollment; 
   %let district=%sysfunc(dequote(&district.)); 
   ods text="~{newline 3}"; 
   ods text="~{style [width=100pct font_size=26pt background=CXf4e9c9] &district Enrollment By School Year}"; 
   ods text="~{newline 2}"; 
   ods text="~{style systemtitle [just=center]Enrollment by Year}"; 
   ods graphics / height=3in width=6in; 
   proc sgplot data=sorted_linedata(where=(district="&district")); 
        series x=year y=students / markers 
        lineattrs=(color=CX39828C pattern=SOLID thickness=3) 
        markerattrs=(color=CX0000FF symbol=STARFILLED) name='series'; 
   run; 
%mend; 

%macro District_Makeup; 
   %let district=%sysfunc(dequote(&district.)); 
   ods text="~{newline 6}"; 
   ods text="~{style [width=100pct font_size=26pt background=CXf4e9c9]Current Year Percentage Of Students By School}"; 
   proc report data=tabledata(where=(district="&district")) nowd 
         style(report)={frame=void font_size=12pt rules=none backgroundcolor=CXF4E9C9 
         cellpadding=0 cellspacing=0}; 
      define district / noprint; 
      define students / noprint; 
      define total_enrollment / noprint; 
      define school / '' style(column)={width=5.5in}; 
      define percent / '' style(column)={width=.5in} right; 
   run; 
%mend; 

%macro Closefile; 
   %let destination=%sysfunc(dequote(&destination.)); 
   ods &destination close; 
%mend; 

proc fcmp outlib=work.fncs.submit; 
   function tabledata_prep(tabledata $); 
      rc = run_macro('tabledata_prep', tabledata); 
      return(rc); 
   endsub; 
   function linedata_prep(linedata $); 
      rc = run_macro('linedata_prep', linedata); 
      return(rc); 
   endsub; 
   function Enrollment(district $); 
      rc = run_macro('Enrollment', district ); 
      return(rc); 
   endsub; 
   function District_Makeup(district $); 
      rc = run_macro('District_Makeup', district ); 
      return(rc); 
   endsub; 
   function Newfile( destination $, path $, gpath $, file $, extension $, styleparm $ ); 
      rc = run_macro('Newfile', destination, path, gpath, file, extension, styleparm ); 
      return(rc); 
   endsub; 
   function Closefile( destination $ ); 
      rc = run_macro('CloseFile', destination ); 
      return(rc); 
   endsub; 
run; quit; 
 
%macro Academic_Performance_Report (linedata =, tabledata = , destination=, path=, gpath=, extension=, style= );        
   options mlogic mprint; 
   %if ( "&extension" eq "" ) and ( &destination ne "" ) %then %let extension =&destination; 
   options cmplib=work.fncs;       
   data _null_;
      rc = tabledata_prep(symget('tabledata')); 
      rc = linedata_prep(symget('linedata'));
   run;
   data _null_;
      set districts;                    
      rc = Newfile( symget('destination'), symget('path'), symget('gpath'), 
         cats(district, "_Annual_Performance"), symget('extension'), symget('style'));                                 
      if ( rc eq 0) then do; 
         rc = Enrollment( district ); 
         rc = District_Makeup( district ); 
         rc = Closefile(symget('destination')); 
      end;  
   run; quit;
%mend; 
 
%Academic_Performance_Report(linedata = data1, tabledata = data2, destination=html, path=, gpath=, extension=, style=Charter );

Sunday, August 14, 2011

Using PROC COPULA in a more volatile market

The last week witnessed one of the wildest fluctuations in the market. Copula could measure the nonlinear dependence of multiple assets in a portfolio, and most importantly, is pronounced as \`kä-pyə-lə\(Thanks to the tip by Rick). The latest COPULA procedure in SAS 9.3 is one of the emerging tools to implement copulas.

To test it, I used R to download the daily return data for a few stock shares plus S&P500 index prices, since January 01, 2010. The six stocks are Citi group(C), JP Morgan(jpm), Pfizer(pfe), IBM(ibm), Apple(aapl), and Google(goog). I constructed an equally weighted portfolio by them. Until August 12, 2011, there are 406 observations. Therefore, in a composite plot by SAS, the stocks of banks show the greatest volatility, followed by pharmaceutical and high-tech companies.


#*********************(0) DOWNLOAD MARKET DATA***********************;
library("tseries")    
sp500= get.hist.quote(instrument="^gspc",start="2010-01-01",quote="AdjClose")
c    = get.hist.quote(instrument="c",    start="2010-01-01",quote="AdjClose") 
jpm  = get.hist.quote(instrument="jpm",  start="2010-01-01",quote="AdjClose") 
pfe  = get.hist.quote(instrument="pfe",  start="2010-01-01",quote="AdjClose")
ibm  = get.hist.quote(instrument="ibm",  start="2010-01-01",quote="AdjClose")
aapl = get.hist.quote(instrument="aapl", start="2010-01-01",quote="AdjClose")
goog = get.hist.quote(instrument="goog", start="2010-01-01",quote="AdjClose")

result=as.data.frame(diff(log(merge(sp500, c, jpm, pfe, ibm, aapl, goog))))
write.csv(result,file='c:/tmp/r2sas.csv')

**********************(1) INPUT RAW DATA*****************************;
options fullstimer; dm 'output;clear; log;clear;';
data raw;
   infile 'c:/tmp/r2sas.csv' delimiter = ',' missover dsd firstobs=2 ;
   informat date yymmdd10. sp500 c jpm pfe ibm aapl goog best32.; 
   format date date9.;
   input date sp500 c jpm pfe ibm aapl goog;
run;

**********************(2) PLOT STOCK RETURNS*************************;
proc transpose data = raw out = test01;
   by date;
   var c jpm pfe ibm aapl goog;
run;
data test02;
   merge test01 raw(keep=date sp500);
   by date;
run;
ods graphics / antialiasmax=2900;
proc sgpanel data = test02;
   panelby _name_ / spacing=5 columns = 3 rows = 2 novarname;
   series y=sp500 x=date / lineattrs=(color=red);
   series y=col1  x=date / lineattrs=(color=blue);
   refline 0/ axis=y lineattrs=(pattern=shortdash);
   rowaxis label = 'daily returns';
   label col1 = 'individual stock' ;
run;

I followed the online document of this procedure and also chose the t copula. The correlation plots of the fitted data are displayed above. It seems that PROC COPULA could only draw up to 5*5 matrix for scatter plots in my test. I don’t know if there is any parameter to activate since I have 6 stocks.


**********************(3) CALCULATE COPULA***************************;
proc copula data = raw(drop=sp500); 
   var c jpm pfe ibm aapl goog;
   fit t / marginals = empirical
           method = mle
           plots = (data = both matrix);
   simulate / ndraws = 10000
            seed = 20100822
            out  = sm_t;
run;

Then the kernel densities between stocks from the simulated dataset were calculated in SAS and plotted in R.

 
**********************(4) CALCULATE KERNEL DENSITIES*****************;
%macro kernel(var_list = );
   ods select none;
   %do i = 1 %to 5;
      %do j = %eval(&i + 1) %to 6;
         %let var1 = %scan(&var_list, &i);
         %let var2 = %scan(&var_list, &j);
         proc kde data= sm_t ;
            bivar &var1 &var2 / out = _tmp01;
         run;
         %if %eval(&i + &j) = 3 %then %do;
            data comb;
               set _tmp01;
            run;
         %end;
         %else %do;
            data comb;
               set comb _tmp01;
            run;
         %end;
      %end;
   %end;
   ods select all;
%mend;
%kernel(var_list = c jpm pfe ibm aapl goog);

data comb1;
   set comb;
   length gname $15;
   gname = cats('x=', var1, ';', 'y=', var2);
   keep value1 value2 gname density;
run;
proc export data = comb1 outfile = 'c:/tmp/sas2r.csv' replace;
run;

#*********************(5) PLOT DENSITY IN R**************************;
x = read.csv('c:/tmp/sas2r.csv')
library('lattice')
wireframe(density ~ value1 * value2 | gname , x, shade = TRUE,
          screen = list(z = -30, x = -50), lwd = 0.01, 
          xlab = "Stock X", ylab = "Stock Y",  
          zlab = "Density")

The simulated daily portfolio returns are likely to follow a normal distribution.


**********************(6) PLOT RETURN DISTRIBUTION*******************;
data port_ret (drop = i ret);
   set sm_t;
   array returns{6} c jpm pfe ibm aapl goog;
   ret =0;
   do i =1 to 6;
      ret = ret+ (1/6)*exp(returns[i]);
   end;
   port_ret = ret-1;
run;

proc kde data = port_ret;
   univar port_ret / percentiles = 1,2.5,5,10,90,95,99 plots=histdensity;
   ods output  percentiles = pcts;
   format port_ret percent8.3;
   label port_ret = 'portfolio return';
run;

Several predicted portfolio changes at given probabilities are given in a tilt plot. Hope this portfolio’s performance next day (August 15, 2011) would be within the expected ranges.


**********************(7) PLOT PORTFOLIO RETURNS*********************;
data prob;
   set pcts;
   percent = percent / 100;
   if percent > 0.5 then do ;
      percent  = 1 - percent ;
    result = put(percent , percent8.1)||
         'probability portfolio gains'|| put(port_ret, percent8.3);
   end;
    else result = put(percent , percent8.1)||
         'probability portfolio loses'|| put(port_ret, percent8.3);
run;

goptions device=javaimg ftitle="arial/bold" ftext="arial" 
htitle=.15in htext=.2in xpixels=600 ypixels=500;
proc gtile data = prob;
    tile Percent tileby = (result, Percent) / colorvar = port_ret;
   format port_ret 8.4;
   label port_ret = 'portfolio return';
run;

PROC COPULA supports five types of copulas(normal copula, t copula, clayton copula, Gumbel copula and Frank copula). Jan Chvosta described a ranking method to choose the best copula. I can easily apply the author's protocol.

Overall, PROC COPULA has much lower learning curve than the R package ‘copula’. Hope it grows to a dominating tool in analyzing copula.

Monday, August 1, 2011

10 interesting discoveries from SAS 9.3

1. SAS 9.2 and SAS 9.3 coexist
I thought that a new installation of SAS 9.3 would automatically uninstall the old SAS 9.22 on my laptop. Amazingly, the two editions live together peacefully. The directory for SAS 9.3 is SASHOME while SAS 9.22’s is SAS. I can even run them simultaneously, and they don't bother to interfere each other. Anyway, Enterprise Guide 4.3 replaced the old 4.2 edition after this upgrade.

2. Zero-configuration for Enterprise Miner 7.1
In SAS 9.2 era, installation of Enterprise Miner was a nightmare for me. I can’t imagine somebody other than a trained technician can install it properly. In SAS 9.3, Enterprise Miner 7.1 workstation becomes a part of the default installation (of course you should have the license). There seems no hassle to make it work well.

3. Default HTML output is surprisingly speedy
At the beginning, I seriously doubted on this change, since calling a browser to open HTML pages was intolerably slow in the SAS 9.2 windowing environment. However, HTML output in SAS 9.3 is impressively fast, and I didn’t feel any significant difference between the HTML and the listing. However, I suggest keeping the default HTMLBlue as the HTML style. I tried other styles and they are not that fast as HTMLBlue -- possibly it has been optimized for this feature.

4. Much faster installation
I used to spend an entire afternoon to install SAS 9.22 on my ancient laptop. The medium for SAS 9.22 was 4 DVD discs and I was so exhausted to change the discs from time to time. For SAS 9.3 I used only one hour with 3 discs. It is going to be a big relief for those SAS administrators or IT support.

5. New plotting statements come with no learning curve
I can draw bubble plot or waterfall plot simply according to the SG procedures’ syntax. Sanjay Matange told that SAS 9.3 can also do many other kinds of plots, such as heat map, attribute map, clustered graph, etc, which I will certainly explore. And I pretty expect to see his new book about ODS Graphics in SAS 9.3 later this year.

6. One click to Microsoft Office
Another reason I keep HTML as output destination is that it has really cool interface to send the pieces toward Microsoft Office. For example, I can right click on an output table by PROC PRINT to save it as an Excel sheet. Or I can right click on a graph toward OneNote. Previously in SAS 9.22 I have to write a few lines of codes to do this job.

7. The first question: ODS Graphics on or off
The first time I ran SAS 9.3, a window popped up and asked if I wanted to switch off ODS Graphics. Believe it or not, every statistical procedure could generate dazzling high-quality ODS graphs. However, the trade-off is that it costs more computer resources. A regression by PROC REG from weight to height on SASHELP.CLASS takes 44 seconds. Thus I opt for off.

8. Import spreadsheet from Excel 2007/2010 officially
In many years, to transform an xlsx spreadsheet to a CSV or xls format and import is a routine work for SAS users. In SAS 9.22, a hidden trick is to use PROC IMPORT to input Excel 2007/2010 formats. Now we can see that it has an option in the import menu. Finally SAS 9.3 shows friendship to Excel 2007/2010.

9. New tool in the radar: ODS Graphics Designer
This interface would allow users to create ODS graphs by point-and-click. It can also be invoked by the %sgdesign() command. With this nice facility, I am able to find and export the underlying Graph Template Language (GTL) codes from ODS Graphics.

10. Help documents have a multiple-tab head
Everything in SAS 9.3 has been HTMLized. I like the new multiple-tab head for each SAS procedure in the documents, which should be more helpful in locating desired information.

In conclusion, SAS 9.3 is the real athlete for production purpose, while SAS 9.22 looks like an experimental warm-up release.