Friday, October 29, 2010

Proc Fcmp(1): from VBA to SAS


Why use SAS in finance: SAS is a distinguished software package in statistics with more than 40-year development history. Starting from as a scripting tool to do ANOVA for agricultural experimental design in North Carolina, SAS has been heavily built on generalized linear model. For example, SAS institute consistently improve linear model procedures, from Proc Anova, Proc Glm, Proc Mixed to the latest Proc Glimmix. In a summary, SAS is pretty good at processing and analyzing any linear or non-linear models. However, the foundation for finance model, such as fixed income products and derivatives, is continuous-time equations, such as Black-Scholes formula. Most likely, quantitative analysts tend to price the products by solving those equations. So, in a word, the finance analyst is always working with equations, or many equations. Obviously here SAS is not good at it. Yes, SAS has more than 900 functions. And they are still not enough to keep up with the fast-pace of Wall Street. That is why the quants use Matlab, C++ and Excel VBA, instead of SAS. Then how the quants need to create their own equations in SAS? And how they build their function library or include the 3rd party library? Proc Fcmp may be the rescue.
Why Proc Fcmp? Finally we have Proc Fcmp, an equation editor. Proc Fcmp is a formidable tool for building function and even function library. All self-built or third party functions are stored in customer-specified package for future usage. Like Excel VBA, Proc Fcmp can construct equivalent subroutine and function. The nice thing is that all the function-based variables are encapsulated without any explicit declaration ( I hate nested macros: the variables would surf around from here to there). In addition, SAS Function Editor is an excellent tool viewer to manage and check all functions.
Conclusion: Look at the codes below, you see that Excel VBA and SAS Proc Fcmp are quite similar. A VBA developer can switch to SAS developer very smoothly in a short period. Also many people can work with a function package simultaneously through a distant SAS server, while each of them builds individual function. The quants may feel more comfortable to use SAS than VBA. Another good thing is that, by using Proc Proto, C++ function can be introduced into Proc Fcmp. That means that even C++ developer can also explore the turf of SAS language. Given that SAS is also a wonderful database management software, I expect that more and more people would embrace SAS through Proc Fcmp in the finance area.

Reference: Jørgen Boysen Hansen. Using the new features of Proc Fcmp in risk management at dong energy A/S. DONG Energy A/S.
'USE EXCEL VBA TO TO GRADE THE SCORES OF 28 STUDENTS
Function Grade(score)
If IsGrade(score) Then
Select Case score
Case Is <= 60 Grade = "F"
Case 60.5 To 70 Grade = "D"
Case 70.5 To 80 Grade = "C"
Case 80.5 To 90 Grade = "B"
Case IS > 90 Grade = "A"
End Select
Else
Grade = " "
End If
End Function

/*USE PROC FCMP TO GENERATE THE FUNCTION */
proc fcmp outlib=sasuser.myfunction.grade;
function grade(score);
select;
when (Score GE 90) return ("A");
when (Score GE 80) return ("B");
when(Score GE 70) return ("C");
when (Score GE 60) return ("D");
when (Score NE .) return ("F");
otherwise;
end;
endsub;
run;
quit;
/*APPLY THE FUNCTION TO GRADE THE SCORES OF 28 STUDENTS*/
options cmplib=sasuser.myfunction;
data exam_one_graded;
set exam_one;
Grade_one=grade(score);
run;

Tuesday, October 26, 2010

Proc Arboretum: a secret weapon in decision tree

Introduction: Decision tree, such as CHAID and CART, is a power predicative tool in statistical learning and business intelligence. Starting from SAS®9.1, the ARBORETUM procedure provided facilities to interactively build and deploy decision tress. Even though it is still an experiment procedure, the ARBORETUM procedure has comprehensive features for classification and predication. And the ARBORETUM procedure is also the foundation of decision tree node in SAS Enterprise Miner.
Method: A common SAS dataset ’sashelp.cars’ was divided into three parts of equal size: training, validation and scoring. Two methods were applied: the target variable ‘origin’ as nominal level and the target variable ’ MSRP’ as interval level.
Result: the codes below introduced how to use PROC RBORETUM to train, validate and score datasets based on decision tree. The generated DATA step codes were stored in two flat text files.
Conclusion: the ARBORETUM procedure is quick and versatile for applying decision tree for any size of dataset. It is really a secret weapon in the procedure stockpile of SAS.

Reference: Xiangxiang Meng. Using the SGSCATTER Procedure to Create High-Quality Scatter Plots. SAS Global Forum 2010.

/*DIVIDE THE ORIGINAL DATA INTO 3 PARTS: 1:1:1*/
data cars;
set sashelp.cars;
_index=_n_;
run;
proc sort data=cars;by origin;run;
proc surveyselect data=cars samprate=0.3333 out=train;
strata origin /alloc=prop ;
run;
proc sql;
create table cars2 as
select * from cars
where _index not in ( select _index from train)
;quit;
proc surveyselect data=cars2 samprate=0.5 out=validation;
strata origin /alloc=prop ;
run;
proc sql;
create table test as
select * from cars2
where _index not in ( select _index from validation)
;quit;
proc datasets;
delete cars2 cars;
run;

/*TARGET VARIABLE: NOMINAL*/
filename code_1 'C:\code_1.txt';
proc arboretum data=train;
target origin / level=nominal;
input MSRP Cylinders Length Wheelbase MPG_City MPG_Highway Invoice Weight Horsepower/ level=interval;
input EngineSize/level=ordinal;
input DriveTrain Type /level=nominal;
assess validata=validation;
code file=code_1;
score data=test out=scorecard outfit=scorefit;
save IMPORTANCE=imp1 MODEL=mymodel NODESTATS=nodstat1 RULES=rul1 SEQUENCE=seq1 SIM=sim1 STATSBYNODE= statb1 SUM=sum1
;
run;
quit;

/*TARGET VARIABLE: INTERVAL*/
filename code_2 'C:\code_2.txt';
proc arboretum data=train;
target MSRP / level=interval;
input Cylinders Length Wheelbase MPG_City MPG_Highway Weight Horsepower/ level=interval;
input EngineSize/level=ordinal;
input DriveTrain Type origin /level=nominal;
assess validata=validation;
code file=code_2;
score data=test out=scorecard2 outfit=scorefit2;
save IMPORTANCE=imp2 MODEL=mymode2 NODESTATS=nodstat2 RULES=rul2 SEQUENCE=seq2 SIM=sim2 STATSBYNODE= statb2 SUM=sum2
;
run;
quit;