The essential R packages

Much has been said about the richness of the system of packages for R, but where is one supposed to start?
The availability of a wide variety of packages has long been highlighted as one of the strengths of the R language. But the number is overwhelming (5000 is the last I've heard, and the growth is exponential) and the quality variable. When I talk about quality, I don't mean only "difficult to use", "buggy" or "slow", although that happens too. I also mean that some packages offer fundamental abstractions that you are likely to want in your toolset for one reason or another, whereas others have more specific goals: for instance, they implement a specialized class of models or are companions for books, classes and so forth. Like other developers, I could just list and praise the ones I use, or one could go for the crowdsourced solution of crantastic.

Here I would like to suggest a data-driven approach based on the dependencies between packages and graph analysis. A package listed by another as a dependency can be seen as receiving an endorsement of sorts from the developers of the dependent package. After all, they have decided that using that package is better than the alternatives. Also, an endorsement from the authors of a very important package can be seen as carrying more weight than one from the authors of a lesser package. You can guess here a recursive definition whereby being an important package means being a dependency of other important packages. If one considers the graph with packages as vertices and dependencies as directed edges, one can recognize the familiar notion of PageRank made popular by Google, whereby important sites are linked to by other important sites.

So after some CRAN scraping (the data set is a little old, from 12/2011) and using the package igraph, specifically its page.rank function, here are the top 100 dependency-ranked packages. I entered a brief description by hand for about the first half, then ran out of steam. Maybe we need a data-driven solution for that task too. Enjoy.

Rank	Package	PageRank score	Description
1	stats	0.0962312835109951	Distributions and other basic statistical stuff
2	methods	0.0732606540057392	Object oriented programming
3	graphics	0.0536687309266182	Of course, graphics
4	MASS	0.0283011225469996	Supporting material for Modern Applied Statistics with S
5	grDevices	0.0281639967024237	Graphical devices
6	utils	0.0224799288855229	In a snub to modularity, a little bit of everything, but very useful
7	lattice	0.0163861320305732	Graphics
8	grid	0.0126373607888249	More graphics
9	Matrix	0.0115594712568376	Matrices
10	mvtnorm	0.0108335460953897	Multivariate Normal and t Distributions
11	sp	0.00916721059561437	Spatial data
12	tcltk	0.00885654936181036	GUI development
13	splines	0.00871777304117854	Needless to say, splines
14	nlme	0.00603233299532761	Mixed effects models
15	survival	0.00590245542213706	Survival analysis
16	cluster	0.00569050414061241	Clustering
17	R.methodsS3	0.00536103360510169	Object oriented programming
18	coda	0.00525607637692928	MCMC
19	igraph	0.00510936911063866	Graphs (the combinatorial objects)
20	akima	0.00448891508477221	Interpolation of irregularly spaced data
21	rgl	0.00448697035750645	3D graphics (openGL)
22	rJava	0.00419658010963776	Interface with Java
23	RColorBrewer	0.00405898916813389	Palette generation
24	ape	0.00401423956752348	Phylogenetics
25	gtools	0.00390068663688166	Functions that didn't fit anywhere else, including macros
26	nnet	0.00372527822413159	Neural networks
27	quadprog	0.00346928434614538	Quadratic programming
28	boot	0.00339455733075856	Bootstrap
29	Hmisc	0.00321230956674779	Yet another miscellaneous package
30	car	0.00306687776780923	Companion to the Applied Regression book
31	lme4	0.00299902494303813	Linear mixed-effects models
32	foreign	0.00299020969373986	Data compatibility
33	Rcpp	0.00294488173058946	R C++ integration
34	robustbase	0.00292512759045668	Robust statistics
35	zoo	0.00291360656774946	Regular and irregular Time Series
36	ggplot2	0.00280061452368686	Graphics
37	iterators	0.00271022721728954	Iterators
38	XML	0.00268297000192895	XML
39	plyr	0.00260013798376819	In-memory data transformations
40	statmod	0.00255576796128438	Statistical modeling
41	tkrplot	0.00253629634469558	Plots as tk widgets
42	timeDate	0.00241854401215965	Time and date
43	fields	0.00229020477891645	Spatial data fitting
44	R.oo	0.00224897565304714	Object oriented programming
45	futile.paradigm	0.00208727007738248	Functional programming
46	abind	0.00203562002853031	Multidimensional array manipulation
47	rscproxy	0.00199899977662843	Interface to third party applications
48	scatterplot3d	0.00194982279122935	3D scatter plot
49	distr	0.00193739059491831	Object oriented distributions
50	codetools	0.00190284811878283	Code analysis
51	corpcor	0.00187713924111935	Efficient Estimation of Covariance and (Partial) Correlation
52	numDeriv	0.00186866167837909	Numerical derivatives
53	gdata	0.00186445901204259	Data manipulation
54	emulator	0.00186390193431536	Bayesian emulation of computer programs
55	KernSmooth	0.00183629272694307	Kernel smoothing
56	mgcv	0.00182832116584045	Generalized additive models and other generalized ridge regression
57	ade4	0.00182738399748524	Analysis of ecological data
58	foreach	0.00182632366989875	Alternative looping construct
59	e1071	0.00178029575562234	Support material for a class
60	splus2R	0.00176824350296979	Support for porting from Splus
61	plotrix	0.00174576155295491	More graphics
62	RGtk2	0.00172084829088438	GUI building with GTK
63	mclust	0.00171720012190246	Model-based clustering
64	colorspace	0.00170618665568823	Color Space manipulation
65	rgdal	0.00169086925766161	Geospatial data processing
66	gWidgets	0.00167347646713519	GUI building
67	tools	0.00166343776456814	Tools for package development
68	DBI	0.00165189537436299
69	class	0.00163669316246539
70	snow	0.00163581475562725
71	tframe	0.00162026150727402
72	pcaPP	0.00161552199090754
73	stats4	0.00158184928979309
74	vegan	0.00157719980281494
75	timeSeries	0.00155718601562939
76	rgenoud	0.00155684112512074
77	reshape	0.00155396309497494
78	RCurl	0.00151307683694413
79	rpart	0.00150199881687968
80	Rcmdr	0.00149432071343987
81	locfit	0.00146482502191925
82	RJSONIO	0.00146060707726276
83	maxLik	0.00145055642526326
84	startupmsg	0.0014445515325449
85	deSolve	0.00143101879661299
86	tseries	0.00140336389124161
87	gamlss	0.00139669657806558
88	lars	0.00139142435757209
89	caTools	0.00137676796617264
90	R.utils	0.00134070208104741
91	genetics	0.00133801968423769
92	proto	0.00132588926315005
93	np	0.00132017944858541
94	spatstat	0.00131066700412731
95	MCMCpack	0.00127549927255682
96	maptools	0.00127277095638128
97	rrcov	0.00126919936569582
98	lpSolve	0.00125502811609384
99	RcppArmadillo	0.00125049110788447
100	copula	0.00122860896379617
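
For readers who want to try the approach, here is a minimal sketch of the kind of computation behind the table. It is not the original script. It assumes the igraph package is installed; the deps data frame is a made-up toy example rather than the scraped CRAN data, and page_rank is the current igraph name for the function referred to above as page.rank. Edges point from the dependent package to its dependency, so that rank flows toward the packages being relied upon.

```r
library(igraph)

# Toy dependency table (illustrative only, NOT the scraped CRAN data):
# each row says that `package` lists `depends_on` as a dependency.
deps <- data.frame(
  package    = c("ggplot2", "ggplot2", "plyr", "lme4", "lme4", "Matrix", "lattice"),
  depends_on = c("plyr", "grid", "stats", "Matrix", "lattice", "methods", "grid"),
  stringsAsFactors = FALSE
)

# Directed graph: an edge goes from the dependent package to its dependency,
# so the "endorsement" flows toward the package being depended on.
g <- graph_from_data_frame(deps, directed = TRUE)

# Compute PageRank scores; `vector` holds one score per vertex, named by package.
pr <- page_rank(g, directed = TRUE)$vector

# Packages sorted by decreasing score.
ranking <- sort(pr, decreasing = TRUE)
print(ranking)
```

With real dependency data in place of the toy table, head(ranking, 100) would give a list in the spirit of the one above. The exact scores depend on the damping factor (igraph's default is 0.85) and on which dependency fields are counted as edges.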