Wednesday, April 14, 2010

Labeling variables by a macro in SAS

To rename the variables of a dataset in SAS is a daily routine. SAS or the programmer s would give an arbitrary name for any variable at the initial stage of data integration. Those names have to be modified afterward. Wensui [Ref.1] developed a macro to add prefixes to the variables . Vincent et al. [Ref. 2] extended his idea and added some parameters into the macros. However, giving a name to a variable in SAS has many restrictions regarding the length and the format. For better understanding and recognition, labeling variables instead of renaming them would be useful. In the example below, first comes with an integration of complicated text data. Proc Transpose generates a number of variables with the same prefix.  Then by invoking the label() macro, the dataset would be correctly labeled as desired.

References:
1. Wensui Liu. ‘How to rename many variables in SAS’. http://statcompute.blogspot.com/
2. Vincent Weng. Ying Feng. ‘Renaming in Batches’. SAS Global 2009.

****************(1) MODULE-BUILDING STEP******************;
%macro label(dsin = , dsout = , dslabel = );
/***********************************************************
* MACRO: label()
* GOAL: use a label dataset to label the variables
* of the target dataset
* PARAMETERS: dsin = input dataset
* dsout = output dataset
* dslabel = label dataset
*
***********************************************************/
data _tmp;
set &dslabel ;
num = _n_;
run;

ods listing close;
ods output variables = _varlist;
proc contents data = &dsin;
run;

proc sql;
select cats(a.variable, '="', b.labelname, '"')
into: labellist separated by ' '
from _varlist as a, _tmp as b
where a.num = b.num
;quit;

data &dsout;
set &dsin;
label &labellist;
run;

proc datasets;
delete _:;
quit;
ods listing;
%mend;

****************(2) TESTING STEP******************;
******(2.1) INTEGRATE COMPLICATED DATA*************;
data have;
infile datalines dlm = ',';
retain _row;
input _tmpvar $ @@ ;
if prxmatch("/10\d/", _tmpvar) ne 0 then _row + 1;
if missing(_tmpvar) then delete;
datalines;
100, Tom, 3,1,5,2,6
101, Marlene, 1,2,4
102, Jerry, 9,10,4,
5, 6
103, Jim,2 ,1, 2, 2,4
;
run;

proc transpose data=have out=want(drop = _:)
prefix = var;
by _row;
var _tmpvar;
run;

******(2.2) INPUT LABELS FOR USE*************;
data label;
input labelname $30.;
cards;
Patient ID
Patient last name
The 1st treatment
The 2nd treatment
The 3rd treatment
The 4th treatment
The 5th treatment
;
run;

******(2.3) INVOKE MACRO TO LABEL*************;
%label(dsin = want, dsout = want_labeled, dslabel = label);

****************END OF ALL CODING***************************************;

Tuesday, March 9, 2010

A macro for Jarque–Bera test

Jarque–Bera test is an important test for normality. This macro works with multivariates.

Note: PROC UNIVARIATE actually computes excess Kurtosis.

%macro jbtest(data = , var = );
   ods listing close;
   proc univariate data = &data ;
       var &var;
      ods output moments = _1;
   run;
 
   data _2;
      set _1;
      label = label1; value = nValue1; output;
      label = label2; value = nValue2; output;
      drop cvalue: label1 label2 nvalue:;
   run;

   proc transpose data = _2 out = _3;
      by varname notsorted;
      id label;
      var value;
   run;

   data _4;
      set _3;
      jb = (skewness**2 * n)/6 + ((kurtosis)**2 *n)/24;
      p = 1 - probchi(jb, 2);
      lable jb = 'JB Statistic' p = 'P-value'
         kurtosis = 'Excess Kurtosis';
   run;
   ods listing;
   proc print data = _4 label;
      title "Jarque-Bera test for variable &var from data &data";
      var varname n skewness kurtosis jb p;
   run;

   proc datasets nolist;
      delete _:;
   quit;
%mend;

/* Run a test using a data set from SAS's HELP library */
%jbtest(data = sashelp.class, var = age height weight); 

Monday, December 15, 2008

Vertical collapse by five methods

******************(1) INPUT STEP***********;
data have;
input id: $ string: $;
cards;
001 aaa
001 bbb
002 ccccc
002 dddd
002 eee
003 ffff
004 gggggg
;
run;

*******************(2) CONCATENATION STEP ***********;
***********(2.1) METHOD I: do-loop and substr()***********;
data want1(drop = string);
length newstring $50.;
do _n_ = 1 by 1 until(last.id);
set have;
by id notsorted;
substr(newstring,length(newstring) + 1) = string;
end;
run;

***********(2.2) METHOD II: Proc Transpose***********;
proc transpose data = have out = _tmp;
by id;
var string;
run;

data want2;
set _tmp;
newstring = cats(of col:);
drop _: col:;
run;

***********(2.3) METHOD III: retain statement***********;
data want3(drop = string);
set have;
by id notsorted;
length newstring $50.;
retain newstring ;
if first.id then newstring = string;
else newstring = cats(newstring, string);
if last.id;
run;

***********(2.4) METHOD IV: Hash table***********;
data _null_;
length newstring $50;
if _n_ =1 then do;
declare hash h();
h.defineKey('id');
h.defineData('id', 'newstring');
h.defineDone();
end;
set have end = eof ;
if h.find() ne 0 then do;
newstring = string;
h.add();
end;
else do;
newstring = cats(newstring, string);
h.replace();
end;
if eof then h.output(dataset: 'want4');
run;

***********(2.5) METHOD V: SQL and macro***********;
proc sql noprint;
select count(unique(id)) into: idnum
from have;
select distinct id into: allid separated by ', '
from have;
quit;

%macro concatenate();
%let id = scan("&allid", &i);
%do i = 1 %to &idnum;
proc sql noprint;
select string into: newstring separated by ''
from have
where id = &id;
quit;
%put &newstring;
%end;
%mend;
%concatenate();

*********************END OF ALL CODING******************************;
References:
1. Technique board. Mysas.net.

Saturday, December 13, 2008

Proc GLM v.s. IML in ANOVA

data thrust;
input speed100 speed250 speed400 speed550;
if _n_ le 15 then feed=0.015;
if _n_ le 10 then feed=0.010;
if _n_ le 5 then feed=0.005;
cards;
121 98 83 58
124 108 81 59
104 87 88 60
124 94 90 66
110 91 86 56
329 291 281 265
331 265 278 265
324 295 275 269
338 288 276 260
332 297 287 251
640 569 551 487
600 575 552 481
612 565 570 487
620 573 546 500
623 588 569 497
;
run;

data thrust2;
set thrust;
array speedarray(4) speed:;
do i = 1 to 4;
force = speedarray(i);
output;
end;
drop speed:;
run;

data thrust3;
set thrust2;
if i = 1 then speed = 100;
if i = 2 then speed = 250;
if i = 3 then speed = 400;
if i = 4 then speed = 550;
drop i;
run;

proc sort data=thrust3 out=thrust4;
by feed speed;
run;

proc glm data=thrust3;
title 'GLM solution';
class feed speed;
model force= feed speed feed*speed;
run;

proc iml;
title 'IML solution';
use thrust4;
read all var{force} into y;
Xzero=J(60,1);
Xfeed=I(3)@j(4,1)@j(5,1);
Xspeed=j(3,1)@I(4)@j(5,1);
Xint=I(3)@I(4)@j(5,1);
Xfull=Xzero||Xfeed||Xspeed||Xint;

SSE=y`*y-y`*Xfull*ginv(Xfull`*Xfull)*Xfull`*y;
SSTrtFeed=y`*((I(3)-(1/3)*J(3,3))@((1/4)*J(4,4))@((1/5)*J(5,5)))*y;
SSTrtSpeed=y`*(((1/3)*J(3,3))@(I(4)-(1/4)*J(4,4))@((1/5)*J(5,5)))*y;
SSInt=y`*((I(3)-(1/3)*J(3,3))@(I(4)-(1/4)*J(4,4))@((1/5)*J(5,5)))*y;
SSTotal=y`*(I(60)-(1/60)*J(60,60))*y;

print SSTrtFeed SSTrtSpeed SSInt SSE SSTotal;

MSTrtFeed=SSTrtFeed/2; MSTrtSpeed=SSTrtSpeed/3; MSInt=SSInt/6; MSE=SSE/48;
print MSTrtFeed MSTrtSpeed MSInt MSE ;

F_TrtFeed=MSTrtFeed/MSE; F_TrtSpeed=MSTrtSpeed/MSE; F_Int=MSInt/MSE;
Print F_TrtFeed F_TrtSpeed F_Int;

P_TrtFeed=1-probf(F_TrtFeed,2,48);
P_TrtSpeed=1-probf(F_TrtSpeed,3,48);
P_Int=1-probf(F_Int,6,48);
Print P_TrtFeed P_TrtSpeed P_Int;
run;quit;

Wednesday, December 3, 2008

Some tips about Proc Printto and SAS memory

Run SAS Proc Options would have the answers.
/* Specifies the limit on the total amount of memory to be used by the SAS Syste*/
proc options option=MEMSIZE; run;
/*Upper limit for data-dependent memory usage during summarization*/
proc options option=SUMSIZE; run;
/* Upper limit for memory during sorting*/
proc options option=SORTSIZE; run;


/*Output saslog and saslist to specified locations*/
proc printto log="C:\user\myname\mylog.log" print="C:\user\myname\mylist.lst" new;
run;
/*Open the files with notpads directly*/
proc printto log="C:\user\myname\mylog.txt" print="C:\user\myname\mylist.txt" new;
run;
/*A method to suppress log generation during execuation, especially for simulation*/
filename supress dummy;
proc printto log=supress;
run;
/*Reopen the default setting*/
proc printto;
run;

Monday, November 24, 2008

How to tune a SQL query



From: Russ Lavery. The SQL Optimizer Project: _Method and _Tree in SAS®9.1. SUGI 30