As of 2011-12-12, CRAN hosts 3,483 R packages. The growth in the number of R packages over the past years is fitted very closely by a quadratic regression.
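A quadratic trend like this can be fitted with `lm()`. The sketch below assumes a hypothetical data frame `counts` with a `date` column (class `Date`) and an `n_packages` column. The post does not publish the counts behind its fit, so the function and column names here are illustrative only.

```r
# Sketch: fit a quadratic growth curve to package counts over time.
# Assumption: `counts` is a hypothetical data frame with columns
# `date` (class Date) and `n_packages` (numeric).
fit_quadratic_growth <- function(counts) {
  # Express time as years since the first observation so the
  # coefficients stay on a readable scale.
  counts$years <- as.numeric(counts$date - min(counts$date)) / 365.25
  # n_packages = b0 + b1 * years + b2 * years^2
  fit <- lm(n_packages ~ years + I(years^2), data = counts)
  list(
    model = fit,
    r_squared = summary(fit)$r.squared
  )
}
```

An R-squared close to 1 and a clearly positive `I(years^2)` coefficient is what "fitted very closely by a quadratic regression" amounts to in practice.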
I have always been curious about who maintains those packages, so I wrote an R script that extracts the metadata fields (such as the maintainer) listed on each package's CRAN page and stores them in a SQLite database. Most R developers maintain 1-3 packages, but some are remarkably productive. The ranking counts packages per maintainer entry, i.e. name plus email address, so a developer who has used more than one address (such as Simon Urbanek) can appear more than once. The top 50 R developers are:
Rank Developer
1 Kurt Hornik
2 Martin Maechler
3 Hadley Wickham
4 Rmetrics Core Team
5 Achim Zeileis
6 Henrik Bengtsson
7 Paul Gilbert
8 Brian Ripley
9 Roger D. Peng
10 Torsten Hothorn
11 Karline Soetaert
12 Philippe Grosjean
13 Robin K. S. Hankin
14 Charles J. Geyer
15 Matthias Kohl
16 Charlotte Maia
17 Mikis Stasinopoulos
18 Simon Urbanek (1)
19 Thomas Lumley
20 Arne Henningsen
21 Gregory R. Warnes
22 Jonathan M. Lees
23 Michael Hahsler
24 Peter Ruckdeschel
25 A.I. McLeod
26 Brian Lee Yung Rowe
27 Dirk Eddelbuettel
28 John Fox
29 Kaspar Rufibach
30 Korbinian Strimmer
31 Michael Friendly
32 Peter Solymos
33 Roger Bivand
34 Simon Urbanek (2)
35 Christopher Brown
36 David Meyer
37 ORPHANED 7
38 Revolution Analytics
39 Rob J Hyndman
40 Romain Francois
41 Ulrike Groemping
42 Christophe Genolini
43 Frank Schaarschmidt
44 G. Grothendieck
45 Hana Sevcikova
46 Jeffrey A. Ryan
47 Kjetil Halvorsen
48 Pei Wang
49 Trevor Hastie
50 Yihui Xie
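The ranking above comes from a `GROUP BY` on the maintainer field. As a minimal base-R sketch of the same tally, assuming a character vector of raw maintainer strings of the form `Name <address>`, the hypothetical helper `count_by_maintainer()` counts packages per entry and can optionally drop the address part so that one person with two addresses is counted once.

```r
# Count packages per maintainer, most productive first.
# Mirrors: SELECT v2 AS author, COUNT(v2) AS package ... GROUP BY v2
#          ORDER BY package DESC
# `maintainer`: one raw maintainer string per package (assumed format
# "Name <address>"); `by = "name"` merges entries that differ only in address.
count_by_maintainer <- function(maintainer, by = c("address", "name")) {
  by <- match.arg(by)
  key <- if (by == "name") {
    # Strip a trailing "<...>" email part and surrounding whitespace
    trimws(sub("<[^>]*>\\s*$", "", maintainer))
  } else {
    maintainer
  }
  counts <- table(key)
  out <- data.frame(
    author = names(counts),
    package = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Sort by package count (descending), then by author for stable ties
  out <- out[order(-out$package, out$author), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

With `by = "name"`, the two Simon Urbanek rows in the list would collapse into one, which is a design choice: grouping by full address is exact but splits people, while grouping by name merges them but could also merge two different people who share a name.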
### An R script to extract R package information and
### build a SQLite database, by hchao8@gmail.com
library(ggplot2)
library(reshape2) # provides melt(), used below
library(XML)
library(RSQLite)
# Create and connect to a SQLite database
conn <- dbConnect(RSQLite::SQLite(), dbname = "c:/Rpackage.db")
# Extract names of R packages available from web
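allPackageURL <-
  "http://cran.r-project.org/web/packages/available_packages_by_name.html"
# readHTMLTable() returns a list of tables; melt() stacks them into one
# data frame whose column V1 holds the package names
allPackage <- as.character(na.omit(melt(readHTMLTable(allPackageURL))[, "V1"]))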
# Extract individual package information from web and store data in SQLite
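for (i in seq_along(allPackage)) {
  packageName <- allPackage[i]
  packageURL <- paste0("http://cran.r-project.org/web/packages/", packageName,
                       "/index.html")
  # Stack every table on the package page into one long data frame:
  # V1 holds field labels such as "Maintainer:", V2 the values
  y <- melt(readHTMLTable(packageURL))
  # Overwrite melt()'s table index with the package name
  y$L1 <- packageName
  # Create the table on the first pass, append on later passes
  if (dbExistsTable(conn, "Rpackage")) {
    dbWriteTable(conn, "Rpackage", y, append = TRUE)
  } else {
    dbWriteTable(conn, "Rpackage", y)
  }
}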
# Pull out maintainer information from SQLite database
# dbGetQuery() sends the query, fetches all rows and frees the result set
all <- dbGetQuery(conn, "
  select v2 as author, count(v2) as package
  from rpackage
  where v1 = 'Maintainer:'
  group by v2
  order by package desc
;")
# Disconnect SQLite database
dbDisconnect(conn)
# Draw a histogram of the number of packages per maintainer
p <- ggplot(all, aes(x = package)) +
  geom_histogram(binwidth = 1) +
  labs(x = "R packages maintained by individual developer", y = "Frequency")
ggsave("c:/Rlist.png", plot = p)
# Find 50 most productive developers
head(all, 50)
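The histogram backs the observation that most developers maintain only a few packages. To put a number on it, here is a small sketch; the helper name `summarise_maintainer_load()` is hypothetical, and it assumes a data frame shaped like the query result `all`, with one row per maintainer and an integer `package` column.

```r
# Summarise how many packages each maintainer looks after.
# Assumption: `maintainers` has one row per maintainer and an integer
# column `package` (as in the query result `all` above).
summarise_maintainer_load <- function(maintainers, max_small = 3) {
  load <- maintainers$package
  list(
    # How many maintainers have exactly k packages, for each observed k
    distribution = table(load),
    # Fraction of maintainers with between 1 and `max_small` packages
    share_small = mean(load >= 1 & load <= max_small)
  )
}
```

Calling `summarise_maintainer_load(all)$share_small` would turn "most R developers maintain 1-3 packages" into a concrete proportion for a given snapshot of CRAN.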