Thursday, March 1, 2012

Rolling regressions for backtesting




Markets generate huge volumes of time series data, often millions of records. Running regressions to obtain coefficients over a rolling time window is common in many backtesting jobs. In SAS, writing a macro around a regression procedure such as PROC REG is not an efficient option: calling PROC REG thousands of times in a big loop can bring almost any system to a crawl.

A better way is to go down to the basics and re-implement the OLS staples: invert, transpose, and multiply the vectors and matrices. This can be done in PROC IML, with DATA step arrays, or in PROC FCMP. PROC IML is really powerful but needs an extra license. DATA step arrays demand strong data manipulation skills, since the DATA step is not designed for matrix operations. PROC FCMP, part of Base SAS, seems like a portable solution for SAS 9.1 or later. To test this method, I simulated a two-asset portfolio with 100k records and, under a 1000-observation rolling window, ran 99,001 regressions. It took just 10 seconds on an old laptop. Overall, the speed is quite satisfying. (The code posted here uses a smaller setup: 20,001 simulated days and a 50-observation window, which gives 19,952 regressions.)
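For reference, the quantities the PROC FCMP step computes in each window can be written out as below. The notation is an assumption added for illustration, not part of the original code: X is the window's design matrix (a column of ones plus the benchmark excess return), y is the fund excess return, and n is the window length. The terms correspond to the arrays sscp (X'X), xtrans_y (X'y), ytrans_y (y'y) and the scalar nymeansquare.

```latex
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
R^{2} = \frac{\hat{\beta}^{\top}X^{\top}y - n\bar{y}^{2}}{y^{\top}y - n\bar{y}^{2}},
\qquad
n\bar{y}^{2} = \frac{\left(\sum_{i=1}^{n} y_{i}\right)^{2}}{n}
```

The numerator of R-square is the regression sum of squares and the denominator is the total sum of squares, both corrected for the mean.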


/* 1 -- Simulate a fund and russell2000 index */
data simuds;
_beta0 = 0.2;
_beta1 = 0.6;
_mse = 0.8;
format day date9.;
do day = today() - 20000 to today();
russell2000 = rannor(1234);
myfund = _beta0 + _beta1*russell2000 + _mse*rannor(3421);
output;
end;
drop _:;
run;

proc print data = simuds(obs = 100);
run;

/* 2 -- Define the rolling-regression macro; count windows from the window length */
%macro rollreg(data = , wlength = , benchmark = , fund = , out = , rfree = );
data _1;
set &data nobs = nobs;
&benchmark = &benchmark - &rfree;
&fund = &fund - &rfree;
call symputx('nloop', nobs - &wlength + 1);
call symputx('nobs', nobs);
run;
%put &nloop &nobs;
/* 3 -- Manipulate matrices */
proc fcmp;
/* Allocate spaces for matrices */
array input[&nobs, 2] / nosym;
array y[&wlength] / nosym;
array ytrans[1, &wlength] / nosym;
array xtrans[2, &wlength] / nosym;
array x[&wlength, 2] / nosym;
array sscp[2, 2] / nosym;
array sscpinv[2, 2] / nosym;
array beta[2] / nosym;
array result[&nloop, 4] / nosym;
array beta_xtrans_y[1] / nosym;
array xtrans_y[2, 1] / nosym;
array ytrans_y[1] / nosym;

/* Input simulation dataset */
rc1 = read_array("_1", input, "&benchmark", "&fund");

/* Calculate OLS regression coefficients and R-square for each window */
do j = 1 to &nloop;
ytotal = 0;
do i = 1 to &wlength;
xtrans[2, i] = input[i+j-1, 1];
xtrans[1, i] = 1;
y[i] = input[i+j-1, 2];
ytotal = ytotal + y[i];
end;
call transpose(y, ytrans);
call mult(ytrans, y, ytrans_y);
call transpose(xtrans, x);
call mult(xtrans, x, sscp);
call inv(sscp, sscpinv);
call mult(xtrans, y, xtrans_y);
call mult(sscpinv, xtrans_y, beta);
call mult(beta, xtrans_y, beta_xtrans_y);
nymeansquare = ytotal**2 / &wlength;
result[j, 1] = beta[1];
result[j, 2] = beta[2];
result[j, 3] = (beta_xtrans_y[1]-nymeansquare) / (ytrans_y[1]-nymeansquare);
result[j, 4] = j; /* window index (1 = first window), written out as 'day' */
end;

/* Output resulting matrix as dataset */
rc2 = write_array("&out", result, 'beta0', 'beta1', 'rsquare', 'day');
if rc1 + rc2 > 0 then put 'ERROR: I/O error';
else put 'NOTE: I/O was successful';
quit;
%mend;

%rollreg(data = simuds, wlength = 50, benchmark = russell2000, fund = myfund, out = result, rfree = 0.005);

ods graphics on / noborder;
/* 4 -- Visualize result */
proc sgplot data = result;
needle x = day y = beta1;
yaxis label = 'Systematic Risk';
run;

proc sgplot data = result;
needle x = day y = beta0;
yaxis label = "Jensen's alpha";
run;

proc sgplot data = result;
needle x = day y = rsquare;
yaxis label = "Coefficient of Determination";
run;
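A quick sanity check is to refit a single window with PROC REG and compare it with the matching row of the FCMP output. The sketch below assumes the macro call shown earlier (wlength = 50, rfree = 0.005); the scratch dataset names _check and _est are hypothetical and chosen only for this illustration.

```sas
/* Rebuild the first 50-observation window with the same excess-return adjustment */
data _check;
set simuds(obs = 50);
russell2000 = russell2000 - 0.005;
myfund = myfund - 0.005;
run;

/* Fit the same model once with PROC REG; RSQUARE adds _RSQ_ to the OUTEST= dataset */
proc reg data = _check outest = _est rsquare noprint;
model myfund = russell2000;
run;
quit;

/* Intercept, slope and R-square from PROC REG */
proc print data = _est;
var intercept russell2000 _rsq_;
run;

/* First row of the FCMP output: beta0, beta1, rsquare for window 1 */
proc print data = result(obs = 1);
run;
```

If the matrix algebra is right, the two printouts should agree up to floating-point rounding.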

1 comment:

  1. Hi Charlie,
    Awesome stuff. I am new to PROC FCMP. Can it replace PROC IML?

    What would be the difference between using PROC REG and the way you use PROC FCMP? Would they give the same results?

    Thanks,

    Amit
