Data extracted from The Do Loop
The official SAS blogs were restructured this summer. Since I can no longer find the XML button on the website, I rewrote my program to extract the data directly from the HTML and drive the KPI charts. Jiangtang Hu also created a program to extract data from The Do Loop, and mentioned that Rick is an incredibly productive writer.
%macro extract(page = );
    options mlogic mprint;
    %do index = 1 %to &page;
        /* Read one listing page and keep only the lines that describe posts */
        filename raw url "http://blogs.sas.com/content/iml/page/&index/";
        data _tmp01;
            infile raw lrecl = 550 pad;
            input record $550.;
            if find(record, 'id="post') gt 0 or find(record, 'class="post') gt 0;
        run;
        /* Number the kept lines and group them in threes */
        data _tmp02;
            set _tmp01;
            _n + 1;
            _j = int((_n + 2) / 3);
        run;
        /* One row per group: col1-col3 hold its three HTML lines */
        proc transpose data = _tmp02 out = _tmp03;
            by _j;
            var record;
        run;
        /* Cut title, time and page views out from between known HTML markers */
        data _&index;
            set _tmp03;
            array out[3] $100. title time pageview;
            array in[3] col1-col3;
            do i = 1 to 3;
                if i = 1 then do; _str1 = 'rel="bookmark">'; _str2 = "</a></"; end;
                if i = 2 then do; _str1 = '+0000">'; _str2 = '</abbr>'; end;
                if i = 3 then do; _str1 = '="postviews">'; _str2 = "</span>"; end;
                _len = length(compress(_str1));
                _start = find(in[i], compress(_str1)) + _len;
                _end = find(in[i], _str2, _start);
                out[i] = substr(in[i], _start, _end - _start);
            end;
            drop _: col: i;
        run;
    %end;
    /* Stack the per-page data sets and delete the temporaries */
    data out;
        set %do n = 1 %to &page;
            _&n
        %end;;
    run;
    proc datasets nolist;
        delete _:;
    quit;
%mend;
%extract(page = 20);
data out1;
    set out nobs = nobs;
    j + 1;
    n = nobs - j + 1;   /* reverse row number, labeled TOTAL POSTS below */
    length level $20.;
    label pageview1 = 'PAGEVIEW' time1 = 'TIME' n = 'TOTAL POSTS';
    pageview1 = input(pageview, 5.);
    /* Rebuild the text date as ddMONyyyy so that date9. can read it */
    _month = scan(time, 1);
    _date = scan(time, 2);
    _year = scan(time, 3);
    time1 = input(cats(_date, substr(_month, 1, 3), _year), date9.);
    weekday = weekday(time1);
    drop _:;
    format time1 date9.;
run;
ods html style = htmlbluecml;
/* Store the post count and the total page views in macro variables */
proc sql noprint;
    select count(*), sum(pageview1) into :nopost, :noview
    from out1;
quit;
proc gkpi mode=basic;
    dial actual = &nopost bounds = (0 100 200 300 400) /
        target=200 nolowbound
        afont=(f="Garamond" height=.6cm)
        bfont=(f="Garamond" height=.7cm);
proc gkpi mode=basic;
    dial actual = &noview bounds = (0 2e4 4e4 6e4 8e4) /
        afont=(f="Garamond" height=.6cm)
        bfont=(f="Garamond" height=.7cm);
quit;
What I learned
I collected all 195 titles, replaced or removed some words, and processed them with Wordle. As I expected, Rick’s blog is mainly about ‘Matrix’, ‘Statistics’ and ‘Data’. It is interesting to learn how to create a ‘Function’ in SAS/IML, which involves a lot of programming skills. I also enjoyed his topics about ‘Simulating’ and ‘Computing’ with ‘Random’ numbers. He also has exciting articles about how to deal with ‘Missing’ values and ‘Curve’.
/* Words to strip from the titles before they go to Wordle */
data word_remove;
    input word : $15. @@;
cards;
sas iml using use creating create proc blog versus
;;;
/* Build a quoted, comma-separated list of the words in upper case */
proc sql noprint;
    select quote(upcase(compress(word))) into :wordlist separated by ','
    from word_remove;
quit;
data _null_;
    set out(keep=title);
    /* Merge word variants so that they are counted together */
    title = tranwrd(upcase(title), 'MATRICES', 'MATRIX');
    title = tranwrd(upcase(title), 'FUNCTIONS', 'FUNCTION');
    title = tranwrd(upcase(title), 'STATISTICAL', 'STATISTICS');
    length i $8.;
    do i = &wordlist;
        title = tranwrd(upcase(title), compress(i), ' ');
    end;
    file 'c:\tmp\output1.txt';
    put title;
run;
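Wordle does the counting internally. To look at the raw word frequencies inside SAS as well, a minimal sketch could split each title into words and tabulate them. It assumes the `out` data set created by the macro above; the names `_words` and `word` are made up for illustration, and no stop words are removed here.

```sas
/* Sketch only: one row per word in each title (default SCAN delimiters) */
data _words;
    set out(keep = title);
    length word $40;
    do i = 1 to countw(title);
        word = scan(upcase(title), i);
        output;
    end;
    keep word;
run;

/* Most frequent words first */
proc freq data = _words order = freq;
    tables word / nocum;
run;
```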
When the number reaches 200
Except for the holidays (the gaps in the needle plot), Rick keeps a constant rate of writing articles (approximately 3 posts a week).
No doubt the OLS regression gives a straight line. It seems that the total number will hit the 200 target pretty soon: the week after next, I believe.
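/* Needle plot: one needle per post, height = running total */
proc sgplot data=out1;
    needle x = time1 y = n;
    yaxis grid max = 300;
run;
/* OLS fit of the running total against time, with the 200 target marked */
proc sgplot data = out1;
    reg x = time1 y = n;
    refline 200 / axis=y;
    yaxis max = 300;
run;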
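The date at which the fitted line crosses 200 can also be computed rather than read off the plot. The sketch below is one way to do it, assuming the `out1` data set from above; `_est` and `target_date` are names introduced here for illustration. It fits the same straight line with PROC REG and solves 200 = intercept + slope × date for the date. This is a plain linear extrapolation, so it is only as good as the constant-rate assumption.

```sas
/* Sketch only: fit n = Intercept + slope * time1 and save the estimates */
proc reg data = out1 outest = _est noprint;
    model n = time1;
run;
quit;

/* In the OUTEST= data set the slope is stored in a variable named time1 */
data _null_;
    set _est;
    target_date = (200 - Intercept) / time1;
    put 'Estimated date of the 200th post: ' target_date date9.;
run;
```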
What a SAS user likes to know
From my experience, clicks in a web browser mostly originate from search engines, while regular readers tend to use feeds instead. The page views recorded on The Do Loop can therefore reflect what SAS users are trying to find. Rick follows a pattern: introductory tips on Monday, intermediate techniques on Wednesday, and topics for experienced programmers on Friday. If we separate the page view trends into the three levels, we can see that the intermediate and advanced posts attract more page views than the basic ones.
data out2;
    /* First copy of each post: level assigned from the day of the week */
    set out1;
    if weekday = 2 then level = '1-Basic';
    else if weekday in (3, 4) then level = '2-Intermediate';
    else level = '3-Advanced';
    output;
    /* Second copy of each post for the overall panel */
    set out1;
    level = '4-Overall';
    output;
run;
proc sgpanel data = out2;
    panelby level / spacing=5 columns = 2 rows = 2 novarname;
    series x = time1 y = pageview1;
    rowaxis grid; colaxis grid;
run;
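The comparison between levels can also be checked with numbers instead of panels. A minimal sketch, assuming the `out2` data set built above, summarizes the page views per level:

```sas
/* Sketch only: post count, mean and median page views for each level */
proc means data = out2 n mean median maxdec = 0;
    class level;
    var pageview1;
run;
```

Older posts have had more time to collect page views, so the means are only a rough comparison.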
Conclusion
I agree with what Rick Wicklin said: blogging helps us to become more aware of what we know and what we don't know. I benefited a lot from his book and his resourceful blog in the past year. Cheers to Rick’s upcoming 200th post!
@Tricia - I agree. Wordle is a wonderful tool. Charlie
Thanks for your post. I’ve been thinking about writing a very comparable post over the last couple of weeks, I’ll probably keep it short and sweet and link to this instead if that's cool. Thanks.