backup: proc fcmp

Showing posts with label proc fcmp. Show all posts

Thursday, June 27, 2013

The complexity for Fibonacci numbers in SAS

Fibonacci numbers have a lot of applications. There have been a lot of interesting researches regarding Fibonacci numbers since 800 years ago. Mathematically they are the numbers from a sequence, which is defined as $F_n = F_{n-1} + F_{n-2}\hspace{5}with\hspace{5}F_0 =0, \hspace{2} F_1=1$ .

Experiment and Result

Fibonacci numbers can be derived from either the original definition that is a typical example of recursion, or a mathematically simplified form that is called closed-from expression. In SAS, a user-defined Fib1 function in PROC FCMP can realize the recursion expression, and another Fib2 function is created to express the closed-form expression. I insert both functions to a loop from 10 to 40 with step size of 5 and record the real time costed.

According to the figure above, both algorithms have no actually difference while N is relative small. When N becomes 30 or bigger, the recursion expression function spends a lot of system time. When N is even greater than 40, SAS enters a freezing status and I have to break from the command manually. Thus, the curve for the recursion expression seems to fit an exponential relationship fairly well. On the contrary, the algorithm of closed-from expression takes almost constant time for the execution,

Complexity

Recursion expression
$T(n) = T(n-1) + T(n-2) + O(1)$ , which may imply $O(2^{n-1}) + O(2^{n-2}) + O(1) = O(2^n)$ .
Closed-form expression
The equation is fairly straightforward and described as $F_n= {\phi^n-(-\phi)^{-n}\over\sqrt{5}}\hspace{2}with\hspace{2} \phi={(1+\sqrt{5})\over2}$ . Here $\phi$ is the well-known golden ratio. Its complexity is a constant, which suggests $O(n) = O(n-1) = O(1)$ .

Conclusion

Mathematics sometimes significantly decreases the complexity of the computation, in this case, from $O(2^n)$ to $O(1)$ . It is also useful for us to estimate how much resources are needed for specific purpose.


****(1) Bulid user-defined functions *****************;
proc fcmp outlib = work.test.fibonacci;  
   * Create the 1st function based on recursion; 
   function fib1(n);
      if n<0 then return(0);
      else if n = 1 or n = 2 then return(1);
      else return(fib1(n-1) + fib1(n-2));
   endsub;
   * Create the 2nd function based on closed form; 
   function fib2(n);
       phi = (1 + sqrt(5))/2;
       return(round((phi**n - (1-phi)**n) / sqrt(5)));
   endsub;
quit; 

****(2) Run the tests sequentially********************;
options cmplib = work.test fullstimer;
%macro test();
   * Create datasets for verification
   %do j = 10 %to 40 %by 5;
      data _fib1_&j;
         do i = 1 to &j;
            x = fib1(i);
            output;
         end;
         %put Recursion method at &j;
      run;
      data _fib2_&j;
         do i = 1 to &j;
            x = fib2(i);
            output;
         end;
         %put Closed form method at &j;
      run;
   %end;
%mend;

* Output SAS log to text file;
proc printto log = "c:\tmp\test.log" new;
run;
%test()
proc printto;
run;

****(3) Display the results***************************;
proc datasets nolist;
   delete _:;
quit;

data one;
  infile "c:\tmp\test.log" truncover;
  input @1 line $80.;
  if index(line, 'real') > 0;
run;

data two;
   set one;
   length method $20.;
   retain number 5;
   * Ignore the time used by PROC PRINTTO;
   if _n_ > 1; 
   if mod(_n_ , 2) = 0 then do; 
      method = "Recursion"; 
      number + 5;
   end;
   else method = "Closed form";
   time = input(substr(line, 21, 5), 5.2);
run;

proc sgplot data = two;
   series x = number y = time / group = method;
   yaxis label = "Time elapsed by seconds" grid;
run;

Thursday, March 8, 2012

Top 10 reasons for a modeler to learn PROC FCMP

10. New financial functions
34 pre-compiled financial functions were shipped with SAS for free, which can be called in DATA step.

9. A management GUI
There is a nice-looking Java-powered GUI to manage the user-defined functions: SAS FCmp function editor.

8. Dynamic parameters
Parameters from other statistical procedures in SAS can be passed to a function. We can use a macro to wrap PROC FCMP to create tons of data-driven functions (temporary or permanent).

7. Migration from VBA
Excel dominates every desktop. However, using VBA to process large data is just a pain, such as calculating a transition matrix for probability of default. PROC FCMP is built on the logic of VBA -- easy to port any Excel macro/functions to SAS.

6. Backtesting
PROC FCMP loads data into memory. For backtesting task, it is much more efficient than hard-disk based DATA step.

5. Function/subroutine compiler
A lot of functions/subroutines SAS doesn’t have can be built, such as Binomial tree function or an asset price simulation subroutine.

4. Encapsulation
To combine a lot of macros/nested macros for a complicated system, encapsulating the macro variables is a serious concern – PROC FCMP may be a rescue.

3. Monte Carlo simulation
The MCMC procedure, the designated procedure for Monte Carlo simulation in SAS, fully supports the syntax of PROC FCMP with its BEGINCNST and ENDCNST statements.

2. Deal with matrix
As part of SAS/BASE, PROC FCMP, allows all kinds of matrix operations everywhere with a SAS system.

1. Low level computation
The combination of PROC FCMP and PROC PROTO together wraps C functions in SAS. It is interesting to compare the efficiency between a C function and a SAS function.

Thursday, March 1, 2012

Rolling regressions for backtesting

Market always generates huge volume time series data with millions of records. Running regressions to obtain the coefficients in a rolling time window is common for many backtesing jobs. In SAS, writing a macro based on the GLM procedures such as PROC REG is not an efficient option. We can imagine the situations: calling PROC REG thousands of times in a big loop would easily petrify any system.

The better way is to go down to the bottom to re-implement the OLS clichés: inverse, transpose and multiply the vectors and matrices. We can do it in either PROC IML, DATA step array or PROC FCMP. For such attempts PROC IML is really powerful but needs extra license. DATA step array would require very high data manipulation skills, since it is not designed for matrix operations. PROC FCMP, a part of SAS/BASE, seems like a portable solution for SAS 9.1 or later. To test this method, I simulated a two-asset portfolio with 100k records, and under a 1000-obs long rolling window, eventually ran 99,001 regressions. The time cost was just 10 seconds on an old laptop. Overall, the speed is quite satisfying.


/* 1 -- Simulate a fund and russell2000 index */
data simuds;
   _beta0 = 0.2;
   _beta1 = 0.6;
   _mse = 0.8;
   format day date9.;
   do day = today() - 20000 to today();
     russell2000 = rannor(1234);
     myfund = _beta0 + _beta1*russell2000 + _mse*rannor(3421);
     output;
   end;
   drop _:;
run;
 
proc print data = simuds(obs = 100);
run;
 
/* 2 -- Decide length of rolling window */
%macro rollreg(data = , wlength = , benchmark = , fund = , out = , rfree = );
   data _1;
      set &data nobs = nobs;
      &benchmark = &benchmark - &rfree;
      &fund = &fund - &rfree;
      call symput('nloop', nobs - &wlength + 1);
      call symput('nobs', nobs);
   run;
   %put &nloop &nobs;
   /* 3 -- Manipulate matrices */
   proc fcmp;
      /* Allocate spaces for matrices */
      array input[&nobs, 2] / nosym;
      array y[&wlength] / nosym;
      array ytrans[1, &wlength] / nosym;
      array xtrans[2, &wlength] / nosym;
      array x[&wlength, 2] / nosym;
      array sscp[2, 2] / nosym;
      array sscpinv[2, 2] / nosym;
      array beta[2] / nosym;
      array result[&nloop, 4] / nosym;
      array beta_xtrans_y[1] / nosym;
      array xtrans_y[2, 1] / nosym;
      array ytrans_y[1] / nosym;
      
      /* Input simulation dataset */
      rc1 = read_array("_1", input, "&benchmark", "&fund");
 
      /* Calculate OLS regression coefficients and r square */
      do j = 1 to &nloop;
         ytotal = 0;
         do i = 1 to &wlength;  
            xtrans[2, i] = input[i+j-1, 1];
            xtrans[1, i] = 1;
            y[i] = input[i+j-1, 2];
          ytotal = ytotal + y[i];
         end;  
         call transpose(y, ytrans);
         call mult(ytrans, y, ytrans_y);
         call transpose(xtrans, x);
         call mult(xtrans, x, sscp);
         call inv(sscp, sscpinv);
         call mult(xtrans, y, xtrans_y);
         call mult(sscpinv, xtrans_y, beta);
         call mult(beta, xtrans_y, beta_xtrans_y);
         nymeansquare = ytotal**2 / &wlength;
         result[j, 1] = beta[1];
         result[j, 2] = beta[2];
         result[j, 3] = (beta_xtrans_y[1]-nymeansquare) / (ytrans_y[1]-nymeansquare);
         result[j, 4] = j;
      end;
 
      /* Output resulting matrix as dataset */
      rc2 = write_array("&out", result, 'beta0', 'beta1', 'rsquare', 'day');
      if rc1 + rc2 > 0 then put 'ERROR: I/O error';
      else put 'NOTE: I/O was successful';
    quit;
%mend;
 
%rollreg(data = simuds, wlength = 50, benchmark = russell2000, fund = myfund, out = result, rfree = 0.005);
 
ods graphics on / noborder;
/* 4 -- Visualize result */
proc sgplot data = result;
   needle x = day y = beta1;
   yaxis label = 'Systematic Risk';
run;
 
proc sgplot data = result;
   needle x = day y = beta0;
   yaxis label = "Jensen's alpha";
run;
 
proc sgplot data = result;
   needle x = day y = rsquare;
   yaxis label = "Coefficient of Determination";
run;

Thursday, February 9, 2012

Cholesky decomposition to "expand" data

Yesterday Rick showed how to use Cholesky decomposition to transform data by the ROOT function of SAS/IML. Cholesky decomposition is so important in simulation. For those DATA STEP programmers who are not very familiar with SAS/IML, PROC FCMP in SAS may be another option, since it has an equivalent routine CALL CHOL.

To replicate Rick’s example of general Cholesky transformation for correlates variables, I randomly chose three variables from a SASHELP dataset SASHELP.CARS and created a simulated dataset which shares the identical variance-covariance structure. A simulated dataset can be viewed as an “expanded’ version of the original data set.

Conclusion:
In PROC FCMP, for memory's sake, don’t allocate many matrices (or arrays). A better way is to use CALL DYNAMIC_ARRAY routine to resize a used matrix, which is similar to the ReDim statement in VBA. A VBA programmer can easily migrate to SAS through PROC FCMP.


proc corr data=sashelp.cars cov outp=corr_cov plots=scatter;
   var weight length mpg_city;
run;

data cov;
   set corr_cov;
   where _type_ = 'COV';
   drop _:;
run;

proc fcmp;  
   /* Allocate space for matrices*/
   array a1[3, 3] / nosymbols;
   array a2[3, 3] / nosymbols;
   array b1[3, 1000] / nosymbols;
   array b2[3, 1000] / nosymbols;

   /* Simulate a matrix by normal distribution*/
   do i = 1 to 3;
      do j = 1 to 1000;
         b1[i, j] = rannor(12345);
      end;
   end;

   /* Read the covariance matrix*/
   rc1 = read_array('cov', a1);
   call chol(a1, a2);
   put a2;
   call mult(a2, b1, b2);

   /* Output the result matrix*/
   call dynamic_array(b1, 1000, 3);
   call transpose(b2, b1);
   rc2 = write_array('result', b1);
quit;

proc corr data=result cov plots=scatter;
run;

Wednesday, August 24, 2011

A macro design pattern by PROC FCMP

We all heard horrible stories that someone tried to piece together a bunch of nice functioning macros for a big macro, and ended up with a messy and undebuggable system. Part of reasons can be about encapsulation: not all SAS programmers have the good habit to localize the macro variables by %local statement; the leaking macro variables may ruin the attempt to utilize multiple macros written by different SAS programmers. To solve this problem, Mark Tabladillo brought the concepts of encapsulation, polymorphism, and inheritance into nested macros. And he raised several design patterns for macros to emulate the object-oriented languages.

The FCMP procedure, besides its original purpose as a function compiler, could encapsulate macros by its RUN_MACRO function. The macro-based functions seem to be more safe modules than the macros themselves. Erin, Daniel and Himesh in their SAS Global 2011 paper showed an example to build a complicated reporting system for academic performances. Their principle is to construct a few macro-embedded functions by PROC FCMP and then incorporate them with an interface macro. Here I modified their codes a little to increase the number of macros and showed the relationship among the elements in the UML diagram above. The stucture is similar to the adapter pattern, one of the many OOP design patterns, with PROC FCMP as a wrapper.

Overall, functionalizing our macros or our colleagues’ macros by PROC FCMP is an alternative way to integrate them for a ‘big’ purpose.


 
/*******************READ ME*********************************************
* THE CODES BELOW ARE COPIED AND MODIFIED FROM ERIN LYNCH, DANIEL 
* O’CONNOR, HIMESH PATEL OF SAS INSTITUTE 
*
* THE ORIGINAL CODE AND RAW DATA CAN BE FOUND FROM THEIR PAPER 
* MY REPORTING REQUIRES A FULL STAFF—HELP! 
* PAPER 291, SAS GLOBAL FORUM 2011
* support.sas.com/resources/papers/proceedings11/291-2011.pdf
*
****************END OF READ ME******************************************/

%macro tabledata_prep;
   options topmargin=.125in bottommargin=.125in leftmargin=.25in rightmargin=.25in nodate nonumber;  
   title; footnote;
   ods escapechar="~"; 
   %let tabledata=%sysfunc(dequote(&tabledata.)); 
   data tabledata; 
      set &tabledata; 
      district=substr(district,1,8)||' '||substr(district,9,1); 
      school=substr(school,1,6)||' '||substr(school,7,1); 
   run; 
%mend;

%macro linedata_prep;
   %let linedata=%sysfunc(dequote(&linedata.)); 
   ods _all_ close; 
   data linedata; 
      set &linedata; 
      district=substr(district,1,8)||' '||substr(district,9,1); 
   run; 
   proc sort data= linedata out=sorted_linedata; 
      by district year; 
   run; 
   proc sort data= linedata out=districts(keep=district) nodupkey; 
      by district; 
   run; 
%mend;
                                       
proc template; 
   define style Styles.Charter; 
      parent = styles.printer; 
      style Body from Document 
         "Undef margins so we get the margins from the printer or SYS option"  / 
            marginbottom = _undef_ 
            margintop = _undef_ 
            marginright = _undef_ 
            marginleft = _undef_ 
            pagebreakhtml = html('PageBreakLine') 
            backgroundimage="Your.png"; 
   end; 
run; 
 
%macro Newfile; 
  %if &path ne '' %then %let pathopt= path=&path(url=none); 
   %else  %let pathopt=; 

   %if &gpath ne '' %then %let gpathopt= gpath=&gpath(url=none); 
   %else %let gpathopt=; 

   %let path=%sysfunc(dequote(&path.)); 
   %let gpath=%sysfunc(dequote(&gpath.)); 
   %let destination=%sysfunc(dequote(&destination.)); 
   %let file=%sysfunc(translate(%sysfunc(dequote(&file.)), "_", " ")); 
   %let extension=%sysfunc(dequote(&extension));      
 
   %if &styleparm ne '' %then %let styleopt= style=%sysfunc(dequote(&styleparm.)); 
   %else  %let styleopt=; 

    %if ( %upcase(&destination) eq PDF ) %then %do; 
       ods &destination file="&path.&file..&extension" notoc startpage=no 
       &styleopt; 
   %end; 
   %else %if (( %upcase(&destination) eq RTF ) or ( %upcase(&destination) eq TAGSETS.RTF )) %then %do; 
       ods &destination file="&path.&file..&extension" startpage=no &styleopt; 
   %end; 
   %else %if ( %upcase(&destination) eq HTML ) %then %do; 
       ods &destination file="&file..&extension" &pathopt &gpathopt &styleopt; 
   %end; 
%mend; 
                                                 
%macro Enrollment; 
   %let district=%sysfunc(dequote(&district.)); 
   ods text="~{newline 3}"; 
   ods text="~{style [width=100pct font_size=26pt background=CXf4e9c9] &district Enrollment By School Year}"; 
   ods text="~{newline 2}"; 
   ods text="~{style systemtitle [just=center]Enrollment by Year}"; 
   ods graphics / height=3in width=6in; 
   proc sgplot data=sorted_linedata(where=(district="&district")); 
        series x=year y=students / markers 
        lineattrs=(color=CX39828C pattern=SOLID thickness=3) 
        markerattrs=(color=CX0000FF symbol=STARFILLED) name='series'; 
   run; 
%mend; 

%macro District_Makeup; 
   %let district=%sysfunc(dequote(&district.)); 
   ods text="~{newline 6}"; 
   ods text="~{style [width=100pct font_size=26pt background=CXf4e9c9]Current Year Percentage Of Students By School}"; 
   proc report data=tabledata(where=(district="&district")) nowd 
         style(report)={frame=void font_size=12pt rules=none backgroundcolor=CXF4E9C9 
         cellpadding=0 cellspacing=0}; 
      define district / noprint; 
      define students / noprint; 
      define total_enrollment / noprint; 
      define school / '' style(column)={width=5.5in}; 
      define percent / '' style(column)={width=.5in} right; 
   run; 
%mend; 

%macro Closefile; 
   %let destination=%sysfunc(dequote(&destination.)); 
   ods &destination close; 
%mend; 

proc fcmp outlib=work.fncs.submit; 
   function tabledata_prep(tabledata $); 
      rc = run_macro('tabledata_prep', tabledata); 
      return(rc); 
   endsub; 
   function linedata_prep(linedata $); 
      rc = run_macro('linedata_prep', linedata); 
      return(rc); 
   endsub; 
   function Enrollment(district $); 
      rc = run_macro('Enrollment', district ); 
      return(rc); 
   endsub; 
   function District_Makeup(district $); 
      rc = run_macro('District_Makeup', district ); 
      return(rc); 
   endsub; 
   function Newfile( destination $, path $, gpath $, file $, extension $, styleparm $ ); 
      rc = run_macro('Newfile', destination, path, gpath, file, extension, styleparm ); 
      return(rc); 
   endsub; 
   function Closefile( destination $ ); 
      rc = run_macro('CloseFile', destination ); 
      return(rc); 
   endsub; 
run; quit; 
 
%macro Academic_Performance_Report (linedata =, tabledata = , destination=, path=, gpath=, extension=, style= );        
   options mlogic mprint; 
   %if ( "&extension" eq "" ) and ( &destination ne "" ) %then %let extension =&destination; 
   options cmplib=work.fncs;       
   data _null_;
      rc = tabledata_prep(symget('tabledata')); 
      rc = linedata_prep(symget('linedata'));
   run;
   data _null_;
      set districts;                    
      rc = Newfile( symget('destination'), symget('path'), symget('gpath'), 
         cats(district, "_Annual_Performance"), symget('extension'), symget('style'));                                 
      if ( rc eq 0) then do; 
         rc = Enrollment( district ); 
         rc = District_Makeup( district ); 
         rc = Closefile(symget('destination')); 
      end;  
   run; quit;
%mend; 
 
%Academic_Performance_Report(linedata = data1, tabledata = data2, destination=html, path=, gpath=, extension=, style=Charter );

Saturday, June 4, 2011

The eve of Lehman Brothers' demise

Recently Moody’s warned the US government to degrade its credit rating if the nation’s debt limit increase is not approved [Ref. 1]. The news came right after Standard & Poor’s lowered US’s sovereign rating from AAA to AA. Those rating changes suggest the accumulation of default risk and may cause some butterfly effect.

Before the start of this Great Recession, without much notice, Lehman Brothers’ default probability increased drastically according to a classic model by Merton [Ref. 2]. And this change failed to be disclosed by either Standard & Poor’s rating or the stock price. Here I translated Gunter and Peter’s VBA code for Merton’s model [Ref. 3] into SAS code and reproduced the trend plot. The result clearly shows that implementation of probability and statistics may catch alert within the narrow 'escape' window, and therefore help avoid or mitigate risks .

References:
1. ‘Moody’s Warns of Downgrade for U.S. Credit’. The New York Times. 02JUN2011
2. Merton, R.C. ‘On the pricing of corporate debt: The risk structure of interest rates’. The Journal of Finance. 29(2): 449-470. 1974
3. Gunter Loeffler and Peter Posch. ‘Credit Risk Modeling using Excel and VBA’. The 2nd edition. Wiley.


/*******************READ ME*********************************************
* - The eve of Lehman Brothers' demise -
*
* SAS VERSION:    9.2.2
* DATE:           04jun2011
* AUTHOR:         hchao8@gmail.com
****************END OF READ ME******************************************/

proc fcmp outlib = work.cg.func;
   function cg_ps(s, sigma_s, d, lambda, sigma_b, t);
      d1 = (s + lambda*d) * exp(sigma_b ** 2) / (lambda *d);
      alpha = (((sigma_s*s) / (s + lambda * d))**2 * t + sigma_b**2)**0.5;
      x = probnorm(-(alpha/2) + (log(d1)/alpha)) - 
          d1 * probnorm(-(alpha/2) - (log(d1) / alpha));
      return(x);
   endsub;
run;

data record;
   input @1 date $7. @11 sp 4.2 @18 dps 20.8 @32 Vol30d  4.2 @43 s_p $2.;
   cards;
   Q4 2003   36.11  69.95612419   25.26      A+
   /*To buy Gunter and Peter's book will have the full data*/
   Q2 2008   36.81        127.8   99.93      A  
;;;
run;

options cmplib = work.cg;
data scored;
   set record;
   global_rcv = 0.5;
   vol_barrier = 0.1;
   time = 1;
   vol30d = vol30d / 100;
   pd = 1 - cg_ps(sp, vol30d, dps,  global_rcv, vol_barrier, time);
   label sp = 'Lehman Brothers'' stock price'
         pd = 'Probability of default';
run;

proc sgplot data = scored;
   series x = date  y = sp;
   series x = date  y = pd / y2axis;
   yaxis values= (0 to 90 by 10) label = 'stock price($)';
   y2axis values= (0 to 0.4 by 0.05) grid 
          label = 'Default rating by Merton''s model';
   refline  'Q4 2005' / axis = x;
   inset "Time period when S&P rating as A" / position = topright ;
   inset "Time period when S&P rating as A+" / position = topleft;
run;

Sunday, May 29, 2011

Monte Carlo and quasi-Monte Carlo simulations for credit risk

In SAS, we can do quasi-Monte Carlo simulation by Halton’s low-discrepancy sequences, which are better than random variables from uniform distribution . The only procedure in SAS 9.2 to produce Halton’s sequence is probably the MDC procedure in SAS/ETS [Ref. 2] but hard to output. However, we can build a user-defined function by PROC FCMP to realize our goals. I showed an example by modifying the codes from Paolo Brandimarte’s Matlab book [Ref. 1] to estimate the area of an arbitrary surface. Overall, Halton’s method would converge with more sampling points, while uniform randomization will cause estimates fluctuating along the real value.

/*******************READ ME*********************************************
* - QUASI-MONTE CARLO SIMULATION BY FUNCTIONAL PROGRAMMING -
*
* VERSION:     SAS 9.2(ts2m0), windows 64bit
* DATE:        02apr2011
* AUTHOR:      hchao8@gmail.com
*
****************END OF READ ME*****************************************/

****************(1) MODULE-BUILDING STEP******************;
******(1.1) COMPILE USER-DEFINED FUNCTIONS*************;
proc fcmp outlib = sasuser.qmc.funcs;
   /***********************************************************
   *   FUNCTION:    h = halton(n, b)
   *      INPUT:    n = the nth number  |  b =  the base number
   *     OUTPUT:    h = the halton's low-discrepancy number
   ***********************************************************/
   function halton(n, b);
      n0 = n;
      h = 0;
      f = 1 / b;
       do while (n0 > 0);
         n1 = floor(n0 / b);
         r   = n0 - n1*b;
         h  = h + f*r;
         f  =  f / b;
         n0 = n1;
      end;
      return(h);
   endsub;

   /***********************************************************
   *   FUNCTION:    z = surface1(x, y)
   *      INPUT:    x, y = independent variables 
   *     OUTPUT:    z = response variable 
   ***********************************************************/
   function surface1(x, y);
      pi = constant("PI");
      z = exp((-x)*y) * (sin(6*pi*x) + cos(8*pi*y));
      return(z);
   endsub;
run;

******(1.2) BUILD A 3D PLOTTING MACRO*************;
%macro plot3d(ds = , x = , y = , z = , width = , height = , zangle = );
   ods html style = money;
   ods graphics / width = &width.px height = &height.px imagefmt = png;
   proc template;
     define statgraph surfaceplotparm;
       begingraph;
         layout overlay3d / cube = true rotate = &zangle;
           surfaceplotparm x = &x y = &y z = &z 
                        / surfacetype = fill surfacecolorgradient = &z;
         endlayout;
       endgraph;
     end;
   run;

   proc sgrender data = &ds
                 template = surfaceplotparm;
   run;
   ods graphics off;
   ods html close;
%mend;

****************END OF STEP (1)******************;

****************(2) PROGRAMMING STEP**********************;
******(2.1) IMPORT USER-DEFINED FUNCTIONS*************;
option cmplib = (sasuser.qmc);

******(2.2) DISTRIBUTIONS BETWEEN HALTON AND UNIFORM RANDOM NUMBERS*;
data test;
    do n = 1 to 100;
      do b = 2, 7;
         halton_random_number  = halton(n, b);
         uniform_random_number = ranuni(20110401);
         output;
      end;
   end;
run; 
   
proc transpose data = test out = test1;
   by n;
   var halton_random_number uniform_random_number;
   id b;
run;

******(2.3) GENERATE DATA FOR PLOTTING FUNCTION'S SURFACE****;
data surface;
   do x = 0 to 1 by 0.01;
      do y = 0 to 1 by 0.01;
         z = surface1(x, y);
         output;
      end;
   end;
run;

******(2.4) CONDUCT QUASI-MONTE CARLO SIMULATION***;
data simuds;
   do i = 1 to 50;
       do n = 1 to 100*i;
         do b = 2, 7;
            halton_random_number = halton(n, b);
            uniform_random_number = ranuni(20110401);
            output;
         end;
      end;
   end;
run; 

proc transpose data = simuds out = simuds_t;
   by i n;
   id b;
   var halton_random_number uniform_random_number;
run;

data simuds1;
   set simuds_t;
   x = surface1(_2, _7);
run;

proc sql;
   create table simuds2 as
   select i, _name_, mean(x) as area label = 'Area by simulation'
   from simuds1
   group by i, _name_
;quit;

****************END OF STEP (2)******************;

****************(3) VISUALIZATION STEP*******************************;
******(3.1) PLOT DATA BY STEP 2.2*************;
ods html style = money;
proc sgscatter data = test1;
   plot _2*_7 / grid group = _name_;
   label _2 = 'Sequences with 2 as base'
         _7 = 'Sequences with 7 as base' 
         _name_ = 'Methods';
run;
ods html close;

******(3.2) PLOT DATA BY STEP 2.3*************;
%plot3d(ds = surface, x = x , y = y, z = z, width = 800, height = 800, zangle = 60)

******(3.3) PLOT DATA BY STEP 2.4*************;
ods html style = ocean;
proc sgplot data = simuds2;
   series x = i y = area / group = _name_ ;
   refline 0.0199 / axis = y label = ('Real area');
   xaxis label = 'Random numbers --- ×100'; 
   label _name_ = 'Methods';
run;
ods html close;

****************END OF STEP (3)******************;

****************END OF ALL CODING***************************************;

First, we can use Monte Carlo simulation to evaluate the risk of a portfolio of credit instruments. As for a risk manager, the common question is to answer the probability of the losses for a portfolio over certain time period. To conduct a Monte Carlo simulation, four variables, such as Probability of default(PD), Loss given default(LGD), Exposure at default(EAD), Weight(correlations among individual credit events), are required. Here I simulated a portfolio with 100 obligors. Then I ran a 10000- loop for MC and calculated the loss at given percentiles. The histogram of the portfolio loss is displayed above.


data raw;
   format pd lgd percent8.;
   w = 0.3;
   do i = 1 to 50;
      PD = 0.01;
      LGD= 0.5;
      EAD= 100;
      output;
   end;
   do i = 51 to 100;
      PD = 0.05;
      LGD= 0.5;
      EAD= 100;
      output;
   end;
   label pd = 'Probability of default'
        lgd = 'Loss given default'
        ead = 'Exposure at default'
        w   = 'Weight';
run;

libname mem 'c:\tmp' memlib;
data mem.raw;
   set raw;
run;

%macro mc(cycle = );
   filename junk dummy;
   proc printto log=junk; run;
   %do i = 1 %to &cycle;
      %if %eval(&i) = 1 %then %do;
         proc sql;
            create table mem.result (sumloss num);
         quit;
      %end;   
      data mem._tmp01;
         set mem.raw;
         retain z;
         if _n_ = 1 then z = rannor(0);
         d = probit(pd);
         a = w*z + (1-w**2)**0.5*rannor(0);
         if a < d then loss = lgd*ead;
         else loss = 0;
      run;
      proc sql;
         insert into mem.result 
         select sum(loss) from mem._tmp01;
      quit;
   %end;
   proc univariate data = mem.result noprint;
      output out = mem.pct pctlpts = 90 95 99 99.5 pctlpre = loss;
   run;
   proc printto; run;
%mend;
%mc(cycle = 10000);

References:
1. Paolo Brandimarte. Numerical methods in finance. John Wiley & Sons. 2002.
2. SAS/ETS 9.2 user’s guide. SAS Publishing. 2008

Thursday, April 7, 2011

Using SAS to find the best k for k-Nearest Neighbor classification

Thursday, March 31, 2011

Optimize many-to-one mapping by user-defined functions

In many occasions, fast access into a lookup table to find desired value is necessary. In computer science, linked list, associative array, and hash table are widely used to construct the relationship between values and keys. Hash function, like value <-- index = Function(key), is essential to build such a hash table. Improving the hash function’s performance is pretty challenging and rewarding [Ref. 1]. In SAS, macro may be utilized to substitute function. However, macro would be failed in front of some cases, such as f(x1) + g(x2) or f(g(x)). Function or functional programming is still a better choice. With the user-defined function complier, Proc Fcmp, building reliable functions in SAS to map many keys to a single value looks promising.

However, SAS’s hash object is not applicable for coding interactive or reusable hash functions. The concept of hash object is introduced since SAS version 9, and it is ‘the first truly runtime dynamic, memory-resident DATA step structure’ [Ref. 2]. Evidence shows that it is robust, and most importantly, faster than other hard-disk based lookup solutions. And this technology is vigorously growing: SAS’s hash object has more methods, and can even accept duplicate keys [Ref. 3]. The biggest problem for SAS’s hash object is that it is transient and not savable. Hash object has to be declared within a data step to load data. If there is any other query, it has to be loaded again. If with frequently invocation, the loading factor will be formidable. As the result, SAS’s hash object is particularly useful for batch processing of lookup tasks, say, finding many key-value pairs simultaneously. That is probably why SAS programmers like to use hash object in merging datasets.

Thus, mapping many keys to one value by user-defined functions has to rely on other methods. Senior SAS programmers may prefer SAS arrays with help of the POINT option. However, SAS’s array is still transient. SQL and Proc Format are the options available. SQL is interactive and does not rely on any particular data type. Format is a unique data type in SAS created by Proc Format, which is permanent and exchangeable with a common SAS dataset. To test the efficiency of many-to-one mapping by user-defined functions, first a dataset of a million records with 2 keys and 1 value was simulated, and three functions were defined and complied by Proc Fcmp. Then those functions were repeatedly called for 10000 times. For SQL-based function, invoking each time is equivalent to querying the raw dataset once. On my notebook, the total time cost of running SQL-based function is more than 3 minutes, which is intolerable. Adding indexes to both keys of the lookup table will largely decrease the time expense to 1/6 of the original time. In addition, loading the lookup table into the memory, by the SASFILE statement, would improve the efficiency of the function a little further. Since Proc Format only allows one key, the two keys need to be concatenated as a composite key before finalizing the lookup format. Amazingly, calling of Format-based function 10000 times only spends less than 2 seconds. As the result, the Format-based function is the champion of this test. Understandably, the memory consumption is proportional to the functions’ efficiency: faster speed means more memory requirement. Much more than other methods, the Format-based function used 200 MB memories. I guess that the ‘format’ data type may be loaded into memory and stayed there during the processing time, which slashed the loading factor. In conclusion, Proc Format produced ‘format’ is not only an alternative solution to merge large datasets, but also is a reliable foundation to define key-value pairs by many-to-one mapping functions.

References:
1. Hash table on Wikimedia. http://en.wikipedia.org/wiki/Hash_table
2. Elena Muriel. Hashing Performance Time with Hash Tables. SAS Global 2007.
3. Robert Ray and Jason Secosky. Better Hashing in SAS 9.2. SAS Global 2008.

/*******************READ ME****************************************
* --- OPTIMIZE MANY-TO-ONE MAPPING BY USER-DEFINED FUNCTIONS ----
*
* HARDWARE:    a notebook with amd64 cpu 2.0g, 3g ram 
* SAS:         SAS 9.2(TS2M0), pc 64bit
* 
* TEST PASSED: 31mar2011
* AUTHOR:      hchao8@gmail.com
*
****************END OF READ ME************************************/

************(1) GENERATE TEST DATASET*********************;
******(1.1) SIMULATE A DATASET OF 1M RECORDS WITH 2 KEYS*********;
data raw;
  do key1 =1 to 1000;
    do key2 = 1 to 1000;
      value = ranuni(20100322) + rannor(20100322);
      output;
    end;
  end;
run;

******(1.2) DERIVE INDEXED DATASET*********;
proc sql;
  create table indexed as select * from raw;
  create index key1 on indexed(key1);
  create index key2 on indexed(key2);
quit;

******(1.3) TRANSFORM DATASET INTO FORMAT*********;
****(1.3.1) CAST VALUE FROM NUMBER TO CHARACTER*****;
proc format;
  picture key low-high = '0000'(fill = '0');
run;
****(1.3.2) COMBINE 2 KEYS TO 1 COMPOSITE KEY******;
data keycombined;
  set raw;
  length keystr $8;
  keystr = cats(put(key1, key.), put(key2, key.));
  drop key1 key2;
run;
****(1.3.3) DERIVE FORMAT DATASET*****;
data fmtds;
  set keycombined;
  rename keystr = start 
          value = label;
  fmtname = '$myfmt';
  type = 'C';
run;
****(1.3.4) INCORPORATE FORMAT DATASET TO BUILD FORMAT *****;
proc format cntlin = fmtds;
run;

**************END OF STEP (1)*****************;

**************(2) CREATE USER-DEFINED FUNCTIONS***********;
*******(2.1) BUILD THE FIRST FUNCTION************;
****(2.1.1) THE EMBEDDED MACRO************; 
option mstored sasmstore = sasuser;
%macro myfunc1_macro / store source;
  %let key1 = %sysfunc(dequote(&key1));
  %let key2 = %sysfunc(dequote(&key2));
   proc sql noprint;
      select value into: value
      from raw
      where key1 = &key1
          and key2 = &key2
  ;quit;
%mend;
****(2.1.2) CREATE THE FUNCTION AND OUTPUT***********;
proc fcmp outlib = sasuser.keyvalue.funcs;
   function myfunc1(key1, key2);
      rc = run_macro('myfunc1_macro', key1, key2, value);
      if rc eq 0 then return(value);
      else return(.);
   endsub;
run;

*******(2.2) BUILD THE SECOND FUNCTION************;
****(2.2.1) THE EMBEDDED MACRO************; 
option mstored sasmstore = sasuser;
%macro myfunc2_macro / store source;
  %let key1 = %sysfunc(dequote(&key1));
  %let key2 = %sysfunc(dequote(&key2));
   proc sql noprint;
      select value into: value
      from indexed
      where key1 = &key1
          and key2 = &key2
  ;quit;
%mend;
****(2.2.2) CREATE THE FUNCTION AND OUTPUT***********;
proc fcmp outlib = sasuser.keyvalue.funcs;
   function myfunc2(key1, key2);
      rc = run_macro('myfunc2_macro', key1, key2, value);
      if rc eq 0 then return(value);
      else return(.);
   endsub;
run;

*******(2.3) BUILD THE THIRD FUNCTION************;
proc fcmp outlib = sasuser.keyvalue.funcs;
  function myfunc3(key1, key2);
    key = put(cats(put(key1, key.), put(key2, key.)), $8.);
    value = put(key, $myfmt.);
    return(value);
   endsub;
run;

**************END OF STEP (2)*****************;

**************(3) TEST USER-DEFINED FUNCTIONS***********;
****(3.1) CREATE A TEST MACRO TO RUN FUNCTIONS 10000 TIMES******;
%macro test(num);
  option cmplib = (sasuser.keyvalue) mstored sasmstore = sasuser;
  data test;
     do x = 101 to 200;
        do y = 301 to 400;  
           z = myfunc&num(x, y);
           output;
        end;
     end;
  run;
%mend;

****(3.2) TEST THE FIRST FUNCTION********;
%test(1);

****(3.3) TEST THE SECOND FUNCTION********;
%test(2);

****(3.4) TEST THE SECOND FUNCTION WITH IN-MEMORY LOOKUP TABLE********;
sasfile indexed open;
%test(2);
sasfile indexed close;

****(3.5) TEST THE THIRD FUNCTION********;
%test(3);

**************END OF STEP (3)*****************;

*************************END OF ALL CODING***************************;

Friday, March 18, 2011

Array 2.0: matrix-friendly array in Proc Fcmp

Array is probably the only number-indexed data type in SAS. Interestingly, SAS and R, the major statistical softwares, use 1 instead of 0 to specify the first element. SAS programmers adopt array mostly for multiple-variable batch-processing. For example, longitudinal summation can be achieved by specifying a one-dimensional array and then adding all array elements together. On the contrary, general-purpose programming languages, like Java, Python, C++, widely use array or list to store and index data. The programmers who newly switch into SAS often feel confused about the limitation of SAS’ array, which is transient and need output to the parental data step to save result. Since data step does not memorize the array’s original structure, multiple dimensional arrays are almost useless in creating more complex data structures. In SAS, those multiple dimensional arrays are occasionally implemented to reshape data [Ref. 1]. If a matrix, such as a big one with random numbers of 5000 rows by 5000 columns [Example 1], is needed, Other programming languages would use int[][] to declare a matrix. In SAS, 2D array, such as a [5000, 5000] array, may be intuitively chosen to perform this task. However, the result is not desired, and the right way is still to apply a one-dimensional array.

Array in Proc Fcmp is an upgraded array data type other than data step array. This new version of array allows not only creating matrices but also indexing matrix. In addition, the communication between data step and Proc Fcmp is fast and convenient: this procedure supplies the READ_ARRAY() and WRITE_ARRAY() functions, which transform a dataset to an array and vice versa. For example, to test an interviewee’s knowledge on SAS’s data step array, a typical interview question is to ask her/him to write a 9X9 multiplication table [Example 2]. The expected solution is to place the OUTPUT statement between the inner Do-loop and outer Do-loop. Placing OUTPUT into the inner layer of Do-loops builds an 81*9 monster, while ignoring OUPTUT would only generate the last row of multiplication table. This task is much simpler in Proc Fcmp: just produce a matrix and write it to a dataset. Proc Fcmp is a nice alternative to SAS/IML, a specialized matrices language module in SAS. For instance, a typical SAS/IML kind of job, such as filling missing cells in a matrix (or a dataset) with its diagonal elements, can be fulfilled by arrays in Proc Fcmp [Example 3]. Proc Fcmp is shipped in SAS/BASE, which means that no extra out-of-pocket money is needed for another license. Another concern is the efficiency of SAS/IML module. It is reported that frequently calling of SAS/IML's functions would decrease system speed dramatically[Ref. 3].

The matrix feature of Proc Fcmp’s array benefits other SAS programming areas. For example, Proc Transpose and data step array usually reshape data structure from either long to wide or wide to long. However, in many cases that positions between observation and variable in a dataset have to be exchanged, the two methods may require several try-and-error steps. The TRANSPOSE() subroutine in Proc Fcmp solves such a problem easily [Example 4]. Currently there are 13 functions or subroutines available in Proc Fcmp for matrices operation[Ref. 2]. Some may complain they are still not enough for their particular need. Don’t forget: Proc Fcmp makes user-defined functions and subroutines! For example, to simulate set() function in Python, a deldup_array() function, based on encapsulating Proc SQL in a macro by RUN_MACRO(), can delete duplicate elements in array[Example 5]. Therefore, users of Proc Fcmp’s array can always construct and accumulate their tools to suit their purpose.

References: 1. UCLA Statistics Course. http://www.ats.ucla.edu/stat/sas/library/multidimensional_arrays.htm
2. SAS 9.2 Online Help. http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003193719.htm
3. Wang, Songfeng; Zhang, Jiajia. Developing User-Defined Functions in SAS®: A Summary and Comparison. SAS Global 2011.

******(1) EXAMPLE 1: CREATE A 5000X5000 RANDOM MATRIX****;
****(1.1) WRONG 2D ARRAY SOLUTION******;
data matrix1;
   array v[5000, 5000];
   do i = 1 to 5000;
     do j = 1 to 5000;
       v[i, j] = ranuni(0);
     end;
     output;
   end;
run;  

****(1.2) CORRECT 1D ARRAY SOLUTION*****;
data matrix2;
   array x[5000];
   do i = 1 to 5000;
     do j = 1 to 5000;
       x[j] = ranuni(0);
     end;
     output;
   end;
run;

******(2) EXAMPLE 2: CREATE A MULTIPLICATION TABLE******;
****(2.1) DATA STEP ARRAY SOLUTION ********;
data mt1;
   array a[9] a1-a9;   
   do row = 1 to 9;           
      do col= 1 to 9;    
         if row ge col then a[col]=row*col;  
      end;
     output;        
   end;
   drop row col;
run;

****(2.2) PROC FCMP ARRAY SOLUTION*****;
proc fcmp;
   array a[9, 9] / nosymbols;
   do row =1 to 9;
      do col = 1 to 9;
         if row ge col  then a[row,  col] = row*col;
      end;
   end;
   rc1 = write_array('mt2', a);
quit;

****(3) EXAMPLE 3: FILL MISSING CELL WITH DIAGONAL ELEMENT******;
****(3.0) INPUT RAW DATA****;
data have;
   input x1-x4;
   datalines;
   1 . . .
   2 1 . . 
   3 4 1 .
   7 6 5 1
;
run;

****(3.1) PROC FCMP TRANSPOSITION******;
proc fcmp;
   array a[4, 4] / nosymbols;
   rc1 = read_array('have', a);
   do i = 1 to 4;
      do j = 1 to 4;
         if missing(a[i, j]) = 1 then a[i, j] = a[j, i];
      end;
   end;
   rc2 = write_array('want', a);
quit;

*****(4) EXAMPLE 4: RESHAPE SQUARE-SHAPE DATA********;
****(4.0) INPUT RAW DATA********;
data have1;
   input x1-x5;
   cards;
   1 . 0 1 1
   0 1 . 0 0
   .  . 1 1 1
   0 0 0 1 .
   . 0 0 1 1
;
run;

****(4.1) PROC TRANSPOSE SOLUTION******;
data trps1;
   set have1;
   obs = _n_;
run;

proc transpose data = trps1 out = trps2;
   by obs;
   var x1-x5;
run;

proc sort data = trps2 out = trps3;
   by _name_;
run;

proc transpose data = trps3 out = want1_1;
   by _name_;
   var col1;
run;

****(4.2) PROC FCMP SOLUTION********;
proc fcmp;
   array a[5, 5] / nosymbols;
   rc1 = read_array('have1', a);
   array b[5, 5] ;
   call transpose(a, b);
   rc2 = write_array('want1_2', b);
quit;

******(5) EXAMPLE 5: FCMP DEDUPLICATION FUNCTION FOR FCMP ARRAY*******;
****(5.1) ENCAPSULATE DEDUPLICATIONA UTILITY OF PROC SQL IN A MACRO ******;
%macro deldup_array;
   %let dsname = %sysfunc(dequote(&dsname));
   %let arrayname = %sysfunc(dequote(&arrayname));
   /*(5.1.1) GENERATE UNIQUE DATASET AND ITS OBSERVATION NUMBER*/
   proc sql noprint;
      select count(unique(&arrayname.1)) into: obs_num
      from &dsname;
      create table _temp as
      select distinct *
      from &dsname;
   quit;
   /*(5.1.2) USE TEMP DATASET TO REPLACE RAW DATASET*/
   data &dsname;
      set _temp;
   run;
   /*(5.1.3) DELETE TEMP DATASET*/ 
   proc datasets;
      delete _temp;
   run;
%mend deldup_array; 

****(5.2) ENCAPSULATE MACRO ABOVE IN A FUNCTION*****;
proc fcmp outlib=work.func.practice;
   function deldup_array(dsname $, arrayname $);
      rc = run_macro('deldup_array', dsname, arrayname, obs_num);
      if rc eq 0 then return(obs_num);
      else return(.);
   endsub;
run;

****(5.3) USE THIS FUNCION TO DELETE DUPLICATES IN ARRAY*****;
option cmplib = (work.func) mlogic mprint symbolgen; 
proc fcmp;
   array a[1000] /nosymbols;
   do j = 1 to 1000;
      a[j] = ceil((ranuni(12345)*100) + rannor(12345));
   end;
   
   dsname = 'numbers'; 
   rc1 = write_array(dsname, a);
   
   n = deldup_array(dsname,  %sysfunc(quote(a)));
   
   call dynamic_array(a, n);
   rc2 = read_array(dsname, a);
   put a = ;
quit;

*****************END OF ALL CODING****************************;

Friday, February 11, 2011

Proc Fcmp(4): Binomial tree vs. Black-Scholes model

The very truth is that SAS has limited financial functions. Thanks to SAS Institute, they finally added some option pricing functions in the base module of SAS 9.2, such as Black-Scholes put/call functions, Garman-Kohlhagen put/call functions, etc. Thus, the number of financial functions in the SAS toolbox adds up to more than 20 now.

Functions made easy by Proc Fcmp. In the finance’s brave new world of functions, financial function is ammo in a war. A unique and resourceful stockpile of functions is desired. Previously, in SAS, macro may seem like a reasonable substitute. However, when we deal with situations like evaluating y=h(x1)*f(x2), macro is feeble. The new procedure debuted in SAS 9.2, Proc Fcmp, provides us an easy way to create and accumulate functions, which will benefit people who are struggling to use SAS to solve finance problems in Data Step.

Proc Fcmp is used for a Binomial tree function of European call option pricing in this example. Once set up, it is very handy to use a function in the common Data Step: the function just acts like an inherent one. In the example, the exercise price ‘E’ is 50, the time to maturation 't' is 5 months, the share price ‘S’ is 50 again, the risk-free interest rate ‘r’ during t is 0.05, and the volatility ‘sigma’ is 0.3. Then the SAS native Black-Scholes option call function ‘blkshclprc’ and the newly ‘manufactured’ Binomial tree function ‘Eurocall’ powered by Proc Fcmp are compared. With the increasing of the layers ‘n’ of the Binomial tree model or the expanding of the tree branches, the option price fluctuates and gets closer to the price by Black-Scholes model. Finally it may converge to the ‘correct’ price. The results show that more steps may provide more accurate result for a Binomial tree model.

Everything in Proc Fcmp is encapsulated. Typically in designing a macro, we concern about the local or global variables. While dealing with a function by Proc Fcmp, it is argument in and argument out: no variable in a function would leak. All arrays in Proc Fcmp are also temporary unless they are indicated as output. Those features are friendly in making a workable function rapidly.

Reference: 1. Paolo Brandimarte. Numerical Methods in Finance and Economics: A MATLAB-Based Introduction. Wiley-Interscience, 2006.
2. SAS 9.2 Language Reference: Dictionary, Third Edition. SAS Publishing. 2009.

proc fcmp outlib = myfunc.finance.price;
    function Eurocall(E, t, F, r, sigma, N);
        deltaT=T/N;
        u=exp(sigma*sqrt(deltaT));
        d=1/u;
        p=(exp(r*deltaT)-d)/(u-d);
        array tree[1]/ nosymbols;
        call dynamic_array(tree, n+1, n+1);
        call zeromatrix(tree);
        
        do i=0 to N;
            tree[i+1,N+1]=max(0 , E*(u**i)*(d**(N-i)) - F);
        end;
        do j=(N-1) to 0 by -1;
            do  i=0 to j by 1;
                tree[i+1,j+1] = exp(-r*deltaT)* 
                        (p * tree[i+2,j+2] + (1-p) * tree[i+1,j+2]);
            end;  
        end;
        price = tree[1,1];
        return(price);
    endsub;
run;


*****(2)Use Binomial tree model and Black-Scholes model functions *****;
options cmplib = (myfunc.finance);
data test;
    BSprice=blkshclprc(50, 5/12, 50, 0.05, 0.3);
    do n=1 to 100;
        Treeprice=eurocall(50, 5/12, 50, 0.05, 0.3, n);
        output;
    end;
run;

***********(3)Display the comparision between the two functions***************;
proc sgplot data=test;
    title 'The comparison between Black-Sholes model and Binomial tree model';
    needle x=n y=Treeprice/baseline=4;
    series x=n y=BSprice/ lineattrs=(color=red);
    yaxis label='Option price';
run;
***************END***************TEST PASSED 12DEC2010**************;

Macro embedded function finds AUC

As a routine practice to reuse codes, SAS programmers tend to contain procedures in a SAS macro and pass arguments to them as macro variables. The result could be anything by data set and SAS procedure: figure, dataset, SAS list, etc. Thus, macro in SAS is like module or class in other languages. However, if repeated calling of a macro is about to accumulate some key values based on different input variables, the design of such a macro could be tricky. My first thought is to use a nested macro (child macro within parent macro) to capture the invisible macro variables floating everywhere in the environment. The idea is pretty daunting, since I have to consider the scopes of macro variables from different macro layers and naming of temporary datasets.

Now with Proc Fcmp, macro embedded function allows the utilization of SAS’s procedure within Data Step. One of the unique features inProc Fcmp is that it could encapsulate macros by its RUN_MACRO function and RC handle. Implementation of macro embedded function has some advantages. First, since a self-defined function is called in a data step, all values would be automatically saved in the dataset of this data step. Second, we can just keep the exactly return values we want. The ODS statement first specifies the output dataset to be temporally saved. Then the exact value as a macro variable will be transported to function as return argument. Third, Proc Fcmp would provide an exception-handling mechanism, depending on the macro’s output. In this post, I showed one example with a SAS help file ‘SASHELP.CARS’ (code below). This dataset has the information of some car models. I would like to see which variables, such as car’s length, weight, price, etc, could predict whether this car is made by US or NON-US countries. Values of AUC (area under ROC curve) from logistic regression were obtained and compared by this strategy.

The user-defined function library is accessible and reusable. SAS provides a four-level directory to save user-defined functions (library-1st level directory- 2nd level directory –function’s name), and also a two-level directory for a complied macro (library-macro’s name). As for a macro embedded function, the complied macros and functions should be stored together at the same place. Then next time, by specifying the path in SAS option statement, those functions can be instantly invoked.

Reference: Stacey and Jacques. Adding Statistical Functionality to the DATA Step with PROC FCMP. SAS Global 2010.

******(1) CREATE COMPLIED MACRO AND FUNCTION******;
****(1.0) SET A PATH TO STORE MACRO AND FUNCTION***;
libname myfunc 'h:\myfun';
option  mstored sasmstore=myfunc;

****(1.1) CREATE THE EMBEDDED MACRO*****;
%macro auc_logis_macro() /store source des='auc_logis_macro';
    /* NOTE: DEQUOTE THE INPUT STRINGS */
 %let ds_in = %sysfunc(dequote(&ds_in));
 %let target = %sysfunc(dequote(&target));
 %let input = %sysfunc(dequote(&input));
 /* NOTE: USE PROC LOGISTIC TO GENERATE AUC */
 ods graphics;
 ods output Association= temp;
 ods select association roccurve;
 proc logistic data=&ds_in descending plots(only)=roc ;
  model &target = &input  ; 
 run;
    ods graphics off;
 /*NOTE: ONLY CHOOSE AUC TO OUTPUT  */
 proc sql noprint;
  select nValue2 into: auc_value
  from temp
  where Label2 = 'c'
 ;quit;
 /*NOTE: KILL THE INTERIM DATASET  */
 proc datasets; delete temp;run;
%mend auc_logis_macro;

****(1.2) INCORPORATE MACRO INTO FUNCTION ****;
proc fcmp outlib = myfunc.macro.functions;
   function auc_logis(ds_in $, target $ , input $);
      rc = run_macro('auc_logis_macro', ds_in, target, input, auc_value);
      if rc eq 0 then return(auc_value);
      else return(.);
   endsub;
run;
*******END OF STEP (1)***********;


*****(2) USE THE MACRO EMBEDDED FUNCTION*******;
****(2.0) FIND PATH FOR STORED FUNCTION AND MACRO****;
libname myfunc 'h:\myfun';
option cmplib = (myfunc.macro) mstored sasmstore=myfunc mprint symbolgen;

****(2.1) CREATE A BINARY TARGET VAR FROM A SAS HELP DATASET****;
data test;
 set sashelp.cars;
 length country  $6;
 if origin = 'USA' then country = 'US';
 else country = 'NON-US';
run;

****(2.2) INVOKE THE FUNCTION TO EVALUATE MULTIPLE VARS OR VARS' COMBINATION****;
data auc_ds;
  ds_in='test';
  target = 'country';
  length input $70;
 do input = ' Weight' , ' Length', 'Horsepower', 'EngineSize',  'MPG_City', 
'Wheelbase', 'Weight Length', ' EngineSize Weight Length MPG_City Wheelbase Horsepower MPG_Highway';
  auc=auc_logis(ds_in ,target,  input); OUTPUT;
 end;
run;

****(2.3) COMPARE ALL AUC VALUES****;
data auc_ds1;
 set auc_ds;
 label auc ='AUC';
 if _n_=8 then input ='All numeric variables';
run;
proc sgplot data=auc_ds1;
 vbar input/response=auc;
 yaxis grid;
 refline 0.5;
run;

****END OF STEP (2)****;
*****************END OF THE PROGRAM************;

Monday, December 6, 2010

Proc Fcmp(3): brute force for a distribution's pdf

Proc Fcmp can produce arbitrary distribution formulas. In this example, suppose that I don’t know too much statistics, what if I want to evaluate the pdf of the absolute value of the subtraction between two independent random variable from the uniform distributions? In this example, probably we can follow such procedures: (1) use Proc Kde to find the kernel distribution and make a guess about what this distribution is; (2) use Proc Fcmp to make the pdf of the distribution; (3) use Proc Model to fit the test dataset and find the parameters; (4) use the simulation to validate the hypothesis. Thus, such a method may be a useable solution to evaluate distribution of a random variable, if without stringent mathematical proof.

*************(1) The distribution of the absolute value of between two uniform distribution*****;
data distance;
do i=1 to 10000;
x1=ranuni(0);
x2=ranuni(0);
x=abs(x1-x2);
drop i;
output;
end;
run;
proc sgplot;
histogram x1/transparency=0.3;
histogram x2/transparency=0.7;
histogram x/transparency=0.8;
run;

************(2) Observe the kernel distribution ************;
ods graphics on;
proc kde data=distance;
univar x /plots=all out=myden;
run;
ods graphics off;

***********(3) Fit with Beta distribution*****************;
proc fcmp outlib = sasuser.myfunc.math;
function betadist(a,b,x);
density=(beta(a,b))**(-1)*x**(a-1)*(1-x)**(b-1);
return(density);
endsub;
run;

options cmplib = sasuser.myfunc nodate ls=80;
proc model data=myden(rename=(value=x));
density=betadist(a,b,x);
fit density;
quit;

************(4) Look at the distribution again***************;
data newdata;
do i=1 to 1E5;
y=rand('beta', 1, 2);
output;
end;
run;

proc sgplot data=newdata;
histogram y/fillattrs=(color=green) ;
run;
********************END********************;

Saturday, December 4, 2010

Proc Fcmp(2): a subroutine for Binomial-CRR model

Problems: Quote for six-month American style euro currency options on plain vanilla, Max[S-K,0]and 〖Max[S-K,0]〗^0.5. Exchange rate S_0=$1.3721 /euro
Six-month continuously compounded inter-bank rates: r=0.4472%,r_f=1.2840%.
Assumptions:The exchange rate for euro follows an iid log normal price changes and volatility is constant.
Methodology:Binomial Model is used to price American currency options on euros.
We calculate the payoffs at time T and discount payoffs to the prior time step. Under the risk neutral probability measure,
c_(t-1)=(q×c_u+(1-q)×c_d)/R
Since these two options are American styles, we need to check for optimal early exercise at each node of the binomial tree. For these two currency options, we check Max[S-K,c_(t-1),0] and Max[〖Max[S-K,0]〗^0.5,c_(t-1) ] . Matlab is the software used to implement the binomial model.
--Parameters:1. Time steps n
As the number of time steps n increases, we can apply CRR model and the binomial model approaches real world price changes. We choose n=80,h=T/n=0.5/80=0.00625.
2. u and dWe choose CRR model to define u and d for binomial model.
u=e^(σ√h) ; d=1/u=e^(-σ√h)
Where h is the length of the binomial times step and σ is the annualized log price change volatility.
3. Annualized log price change volatility σThe daily log changes and daily squared log changes for two year exchange rates from 11/5/2008 – 11/5/2010 are as follows. We consider the volatility to be constant since 05/01/2009. Thus, we choose the historical prices from 05/01/2009-11/5/2010 and apply the volatility of the daily log changes as an estimate. Then the annualized log price change volatility equals the square root of the trading days in one year (252 days) times the daily log price change volatility.
σ=√252×σ_daily
4. Risk Neutral Measure QOptions on currencies can be regarded as an asset providing a yield at the foreign risk-free rate of interest. Thus, the risk neutral probability measure Q :
q=(e^((r-r_f ) )-d)/(u-d) ;
1-q= 〖u- e〗^((r-r_f ) )/(u-d)
5. Strike Price KSet strike price K from $1.3000/euro to $1.5000/euro with $0.005 per euro increments.
Reference: 1. John C. Hull.Options, Futures and Other Derivatives, 7th edition. Prentice Hall. 2008.
2. Base SAS 9.2 Procedures Guide. SAS Publishing. 2009

**********************AUTHOR(DAPANGMAO)----HCHAO8@GMAIL.COM***********************************;
****************(1) CONSTRUCT € TO $ EXCHANGE RATIO VECTOR******************;
data vector;
attrib value informat=dollar10.4 format=dollar10.4 ;
StrikeS=1.3000;  
StrikeE=1.5000;
Increment=0.005;
do value=StrikeS to StrikeE by  Increment;
drop StrikeS StrikeE Increment;
output;
end;
run;

**************(2) BUILD THE FUNCTION TO EVALUATE OPTION PRICES********;
proc fcmp outlib = myfunc.finance.subrout;
subroutine mysubr(T, n, r, rf, s0, sigma, c1[*], c2[*]);
/*Two vectors are output arguments*/
outargs c1, c2;
/*Inside calculation from input arguments*/
dt=T/n;
length=dim(c1);
u=exp(sigma*sqrt(dt));
d=1/u;
q=(exp((r-rf)*dt)-d)/(u-d);
/*Announce 4 arrays -- 1 vector and 3 matrixes */
array k[1]/ nosymbols;
array s[1] /nosymbols;
array x1[1] /nosymbols;
array x2[1] /nosymbols;
n=n+1;
/*The sizes of the arrays are specified*/
call dynamic_array(s, n, n);
call dynamic_array(x1, n, n);
call dynamic_array(x2, n, n);
call dynamic_array(k, length);
/*Read the exchange ratio into function*/
rc=read_array('vector', k);
/*Assign values to S matrix */
call zeromatrix(s);
S[1,1]=S0;
do _col=2 to n ;
do _row=1 to n;
S[_row,_col]=S0*u**(_col-_row)*d**(_row-1);
if _row gt _col then S[_row, _col]=0;
end;
end;
/*Generate final option vectors */
do i=1 to length;
x=k[i];
call zeromatrix (x1);
call zeromatrix (x2);
do j=1 to n;
x1[j,n]=max(S[j,n]-x,0);   
x2[j,n]=max(S[j,n]-x,0)**0.5;     
end;
do _col=(n-1) to 1 by -1;
do _row=1 to (n-1) by 1;
h=exp(-r*dt)*(q*x1[_row,_col+1]+(1-q)*x1[_row+1,_col+1]);
h2=exp(-r*dt)*(q*x2[_row,_col+1]+(1-q)*x2[_row+1,_col+1]);
x1[_row,_col]=max(S[_row,_col]-X,h,0);    
x2[_row,_col]=max( max(S[_row,_col]-x ,0) **0.5, h2);  
end;           
c1[i]=x1[1,1];
c2[i]=x2[1,1];
end;
end;
endsub;
run;
quit;

***********SOME INITIAL VALUES*****************;
/*T=1/2;*/
/*n=80;*/
/*r=0.004472;    */
/*rf=0.01284;*/
/*S0=1.3721;*/
/*Sigma=0.1074*/

**************(3) MEASURE THE LENGTH OF PRICING VECTOR**************;
proc sql;
select count(*) into: vecnum from vector;
quit;

**************(4) USE THE SUBROUTINE TO GENERATE TWO VECTORS********************;
options cmplib = (myfunc.finance);
data final;
array c1[&vecnum] _temporary_;
array c2[&vecnum] _temporary_;
call mysubr(0.5, 80, 0.004472, 0.01284, 1.3721, 0.1074, c1, c2);   /*subroutine mysubr(T, n, r, rf, s0, sigma, c1[*], c2[*]);*/
do i=1 to dim(c1);
c1value= c1[i];
c2value=c2[i];
output;
end;
run;
****************TEST PASSED ----------- END****************************;

Friday, October 29, 2010

Proc Fcmp(1): from VBA to SAS

Why use SAS in finance: SAS is a distinguished software package in statistics with more than 40-year development history. Starting from as a scripting tool to do ANOVA for agricultural experimental design in North Carolina, SAS has been heavily built on generalized linear model. For example, SAS institute consistently improve linear model procedures, from Proc Anova, Proc Glm, Proc Mixed to the latest Proc Glimmix. In a summary, SAS is pretty good at processing and analyzing any linear or non-linear models. However, the foundation for finance model, such as fixed income products and derivatives, is continuous-time equations, such as Black-Scholes formula. Most likely, quantitative analysts tend to price the products by solving those equations. So, in a word, the finance analyst is always working with equations, or many equations. Obviously here SAS is not good at it. Yes, SAS has more than 900 functions. And they are still not enough to keep up with the fast-pace of Wall Street. That is why the quants use Matlab, C++ and Excel VBA, instead of SAS. Then how the quants need to create their own equations in SAS? And how they build their function library or include the 3rd party library? Proc Fcmp may be the rescue.
Why Proc Fcmp? Finally we have Proc Fcmp, an equation editor. Proc Fcmp is a formidable tool for building function and even function library. All self-built or third party functions are stored in customer-specified package for future usage. Like Excel VBA, Proc Fcmp can construct equivalent subroutine and function. The nice thing is that all the function-based variables are encapsulated without any explicit declaration ( I hate nested macros: the variables would surf around from here to there). In addition, SAS Function Editor is an excellent tool viewer to manage and check all functions.
Conclusion: Look at the codes below, you see that Excel VBA and SAS Proc Fcmp are quite similar. A VBA developer can switch to SAS developer very smoothly in a short period. Also many people can work with a function package simultaneously through a distant SAS server, while each of them builds individual function. The quants may feel more comfortable to use SAS than VBA. Another good thing is that, by using Proc Proto, C++ function can be introduced into Proc Fcmp. That means that even C++ developer can also explore the turf of SAS language. Given that SAS is also a wonderful database management software, I expect that more and more people would embrace SAS through Proc Fcmp in the finance area.

Reference: Jørgen Boysen Hansen. Using the new features of Proc Fcmp in risk management at dong energy A/S. DONG Energy A/S.

'USE EXCEL VBA TO TO GRADE THE SCORES OF 28 STUDENTS
Function Grade(score)
If IsGrade(score) Then
Select Case score
Case Is <= 60  Grade = "F"             
Case 60.5 To 70   Grade = "D"            
Case 70.5 To 80   Grade = "C"            
Case 80.5 To 90   Grade = "B"             
Case IS > 90      Grade = "A"
End Select
Else
Grade = " "
End If
End Function

/*USE PROC FCMP TO GENERATE THE FUNCTION */
proc fcmp outlib=sasuser.myfunction.grade;
function grade(score);
select;
when (Score GE 90) return ("A");
when (Score GE 80) return ("B");
when(Score GE 70) return ("C");
when (Score GE 60) return ("D");
when (Score NE .)   return ("F");
otherwise;
end;
endsub;
run;
quit;
/*APPLY THE FUNCTION TO GRADE THE SCORES OF 28 STUDENTS*/
options cmplib=sasuser.myfunction;
data exam_one_graded;
set exam_one;
Grade_one=grade(score);
run;