Friday, July 19, 2013

Cluster analysis on a pivot table

The pivot table is available here

The growing dominance of JavaScript on both the server side and the client side is good news for statistical workers, who deal with data and models and therefore always live in the darkness. They finally have a relatively easy way to show off their hard work on the Web, the final destination of data. Here I show how to display the result of a cluster analysis in a web-based pivot table.
Back-end: cluster analysis
SAS has a FASTCLUS procedure, which implements a nearest centroid sorting algorithm and is similar to k-means. It has some time and space advantages over other more complicated clustering algorithms in SAS.
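To make the idea of nearest centroid sorting concrete, here is a minimal JavaScript sketch of a plain k-means loop. It is not what PROC FASTCLUS does internally (FASTCLUS has its own seed selection and options); the function names, the fixed iteration count, and the toy points are all assumptions made for illustration.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(p, q) {
  return p.reduce((sum, v, i) => sum + (v - q[i]) ** 2, 0);
}

// Index of the centroid closest to the given point.
function nearestCentroid(point, centroids) {
  let best = 0;
  for (let k = 1; k < centroids.length; k++) {
    if (squaredDistance(point, centroids[k]) < squaredDistance(point, centroids[best])) {
      best = k;
    }
  }
  return best;
}

// Plain k-means: alternate between assigning points to their nearest
// centroid and recomputing each centroid as the mean of its members.
// `seeds` are the initial centroids; `iterations` is a hypothetical fixed cap.
function kMeans(points, seeds, iterations = 10) {
  let centroids = seeds.map((s) => s.slice());
  let labels = [];
  for (let it = 0; it < iterations; it++) {
    labels = points.map((p) => nearestCentroid(p, centroids));
    centroids = centroids.map((old, k) => {
      const members = points.filter((_, i) => labels[i] === k);
      if (members.length === 0) return old; // keep the centroid of an empty cluster
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { labels, centroids };
}

// Toy (height, weight) pairs, made up for illustration only.
const toyPoints = [[51, 50], [53, 60], [57, 84], [66, 112], [69, 130], [72, 150]];
const result = kMeans(toyPoints, [toyPoints[0], toyPoints[5]]);
console.log(result.labels); // the three light points get label 0, the three heavy ones label 1
```

Each returned label plays the role of the CLUSTER variable, and the distance from a point to its own centroid corresponds to what FASTCLUS reports as DISTANCE.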
I again use the SASHELP.CLASS dataset and cluster the rows by weight and height. I specify 2 clusters and easily obtain the distances to the centroids with PROC FASTCLUS. The plot suggests that weight = 100 is roughly the boundary that separates the two clusters. Next, in a DATA step, I translate the SAS dataset to JSON format so that the browser can understand it.
************(1) Cluster the dataset*******;
proc fastclus data = sashelp.class maxclusters = 2 out = class;
    var height weight;
run;

proc sgplot data = class;
    scatter x = height y = weight / group = cluster;
    yaxis grid;
run;

************(2) Transform to JSON*********;
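data toJSON;
    set class;
    length line $200;
    /* The five numeric columns: Age, Height, Weight, CLUSTER, DISTANCE */
    array a[5] _numeric_;
    /* $40 leaves room for the longest name/value pair */
    array _a[5] $40;
    do i = 1 to 5;
        /* CATS strips blanks, giving pairs such as "Age":14, */
        _a[i] = cats('"', vname(a[i]), '":', a[i], ',');
    end;
    /* The two character columns; their values are quoted */
    array b[2] name sex;
    array _b[2] $40;
    do j = 1 to 2;
        _b[j] = cats('"', vname(b[j]), '":"', b[j], '",');
    end;
    /* One JSON object per row, followed by a comma */
    line = cats('{', cats(of _:), '},');
    /* Blank out the comma left just before the closing brace */
    substr(line, length(line)-2, 1) = ' ';
    keep line;
run;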
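The LINE values written by the DATA step are meant to be pasted into a JavaScript array. As a minimal sketch of how the browser side could consume them, the snippet below turns two such lines into objects. The Age, Height, Weight, Name, and Sex values come from SASHELP.CLASS; the CLUSTER and DISTANCE values are placeholders rather than actual FASTCLUS output, and the helper `linesToRecords` is hypothetical.

```javascript
// Two lines in the shape produced by the DATA step: one object per row,
// each followed by a comma. CLUSTER and DISTANCE values are placeholders.
const lines = [
  '{"Age":14,"Height":69,"Weight":112.5,"CLUSTER":1,"DISTANCE":0,"Name":"Alfred","Sex":"M" },',
  '{"Age":13,"Height":56.5,"Weight":84,"CLUSTER":2,"DISTANCE":0,"Name":"Alice","Sex":"F" },',
];

// Hypothetical helper: join the lines, drop the final trailing comma
// (strict JSON does not allow it), and wrap the result in brackets.
function linesToRecords(rows) {
  const body = rows.join("\n").trim().replace(/,$/, "");
  return JSON.parse("[" + body + "]");
}

const records = linesToRecords(lines);
console.log(records.length, records[0].Name);
```

If the lines are pasted straight into an array literal in a script, the trailing comma is harmless; it only matters when the text goes through `JSON.parse`.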
Front-end: pivot table
A pivot table is a nice way to present data, especially raw data. There are a few ways to build a pivot table on the web, such as Google Fusion Tables. Nicolas Kruchten developed a framework called PivotTable.js, hosted on GitHub, which is very popular.
I embed the JSON data in the same HTML file as PivotTable.js so that the page stays static, since Blogger does not provide an HTTP server.
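A minimal sketch of the script part of such a page might look like the following. It assumes that jQuery, jQuery UI, and PivotTable.js have already been loaded and that the page contains an empty element with the id `output`; the element id, the initial row and column choice, and the CLUSTER and DISTANCE values are assumptions for illustration, not the layout of the original page.

```javascript
// Rows embedded directly in the page, so no server request is needed.
// CLUSTER and DISTANCE values are placeholders, not real FASTCLUS output.
const classData = [
  { Name: "Alfred", Sex: "M", Age: 14, Height: 69, Weight: 112.5, CLUSTER: 1, DISTANCE: 0 },
  { Name: "Alice", Sex: "F", Age: 13, Height: 56.5, Weight: 84, CLUSTER: 2, DISTANCE: 0 },
];

// Assumed initial layout: clusters as rows, sex as columns.
// The viewer can drag other attributes in afterwards.
const pivotOptions = { rows: ["CLUSTER"], cols: ["Sex"] };

// Render only when jQuery and PivotTable.js are present (for example in a browser).
if (typeof $ !== "undefined" && $.fn && typeof $.fn.pivotUI === "function") {
  $(function () {
    $("#output").pivotUI(classData, pivotOptions);
  });
}
```

With the default settings, `pivotUI` shows a count of rows per cell, which is enough to see how the two clusters split by sex.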

Finally, we can view the cluster result in a pivot table, and the audience can play with the data interactively.
