PROC HADOOP has been available since SAS 9.3M2 and bridges a Windows client and a Hadoop server. A particularly useful feature of this procedure is that it supports user-defined functions. Applying it takes several steps.
- Download Java SE and Eclipse on Windows
Java SE and Eclipse are free to download, and installation is fairly easy.
- Make a user-defined function on Windows
The most basic user-defined function is an upper-case function for a string that wraps Java’s native str.toUpperCase() method. Pig’s manual has a [detailed description][1] of it.
- Package the function as a JAR
There is a wonderful video tutorial on YouTube. Make sure that the version of the [Pig API][2] JAR on Windows, with a name such as pig-0.12.0.jar, is the same as the one running on the Hadoop cluster.
- Run PROC HADOOP commands
-- pig_code.txt: the Pig script submitted through PROC HADOOP
A = load 'test3.txt' as (f1: chararray, f2: chararray, f3: chararray, f4: chararray, f5: chararray);
describe A;
register myudfs.jar;
B = foreach A generate myudfs.UPPER(f3);
dump B;

Then we can run the SAS code with PROC HADOOP. As a result, field f3 of the text file on HDFS is converted to upper case.

filename cfg "C:\tmp\config.xml";
filename code "C:\tmp\pig_code.txt";
proc hadoop options=cfg username="myname" password="mypwd" verbose;
pig code=code registerjar="C:\tmp\myudfs.jar";
run;
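
For reference, the logic behind the upper-case function from the second step can be sketched in plain Java. This is only a minimal, Pig-free sketch so that it compiles without the Pig JAR: the class name `UpperSketch`, the method name `upper`, and the use of a `List<Object>` to stand in for a Pig tuple are all assumptions made for illustration. In an actual UDF, as in the Pig manual's example, the same null checks and the `toUpperCase()` call would sit inside the `exec(Tuple input)` method of a class in the `myudfs` package that extends `org.apache.pig.EvalFunc<String>`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UPPER UDF; not the Pig API itself.
class UpperSketch {

    // Upper-cases the first field of a row, returning null for
    // missing input, which mirrors how a Pig UDF usually treats nulls.
    static String upper(List<Object> row) {
        if (row == null || row.isEmpty() || row.get(0) == null) {
            return null;
        }
        // The wrapped call is Java's native String.toUpperCase().
        return row.get(0).toString().toUpperCase();
    }

    public static void main(String[] args) {
        // A one-field row, like the f3 field passed to myudfs.UPPER(f3).
        System.out.println(upper(Arrays.<Object>asList("hadoop"))); // prints HADOOP
        // A row whose first field is null yields null.
        System.out.println(upper(Arrays.<Object>asList((Object) null))); // prints null
    }
}
```

Returning null for empty or null input, rather than throwing, keeps one bad record from failing the whole job, which is why the sketch checks the input before calling `toUpperCase()`.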