• ## How plyrmr was ahead of the curve

I recently attended a talk by the always excellent Hadley Wickham about his latest work on creating and visualizing many models. I combine here two...
• ## Yet Another Pipe Operator in R to unify interactive and programming use

Prologue: The pipe operator, %>% in its latest incarnation, is all the rage in R circles. I first saw it in a less-well-known package called...
• ## Syntax Directed Diffs for R in R

Unsatisfied with general-purpose, syntax-oblivious diff tools, I take the first step towards syntax-directed diffs for R. Like many developers, I use git to manage...
• ## Delicious R Curry

In R, functional::Curry is a misnomer at best. Let’s implement currying in R. I’ve always wondered why the function Curry in package functional for the...
• ## 10 eigenmaps of the United States of America

An unbiased analysis of census data reveals not one but many maps of the United States. The original inspiration for this post comes from a...
• ## Can't someone else find those differences?

Use statistics and R instead of squinting at satellite images! You may have, like me, run into this article. Amazing stuff. A little startup pushing...
• ## The Greatest Sailing Race of All Time Seen Through Statistical Graphics

I don't know if you've been following the America's Cup. It's the oldest sailing competition and, by some accounts, the oldest international sporting event bar...
• ## Three microblogs: The Ascetic Programmer, Science in Crisis and Data Science Matters.

I've started three thematic microblogs you may be interested in. They are all link and quote microblogs that reflect side interests related to my work but...
• ## R anti-tips

Not all R tips are equally good. Let's set the record straight. Anti-tip #1: For loops are slower than functions in the apply family. Why should...
• ## The essential R packages

Much has been said about the richness of the system of packages for R, but where is one supposed to start? The availability of a...
• ## Mapreduce everywhere

Mapreduce could extend its reach beyond — or inside — the data center. Coming soon to a computer near you? The local Hadoop SF meetings cover...
• ## The connected components example, rewritten using RHadoop/rmr

My new implementation of random mate for mapreduce, using the package rmr from Revolution Analytics' open source project RHadoop. This story now has three episodes. First,...
• ## A map reduce algorithm for connected components: implementation

At long last, a complete implementation of the algorithm I described some time ago. You are kindly advised to go back and check the algorithm motivation...
• ## Bringing relational joins to Rhipe

Relational operations are a very common way to express map-reduce computations at a higher level, but Rhipe, an R package for mapreduce, doesn't have any....
• ## Let a million Twitters bloom

Why are some people uncomfortable with cloud computing? What are the limitations and is there a way forward? The recent sudden change in Twitter terms of...
• ## Looking for a map reduce language

On a quest for an elegant and effective map reduce language, I went through a number of options and put together some considerations. And the...
• ## Find the odd bag

From a job interview challenge, an interesting probability exercise in two parts. One of the themes here is pretty standard fare. You are given a...
• ## On lenses for small cameras: a data-driven counterargument

Andy Westlake of dpreview.com takes apart the current lens offering for lightweight interchangeable lens cameras (LILC) like the micro four thirds and related mirrorless designs,...
• ## Thoughts on A/B testing

A/B testing is part of a push towards software engineering as an experimental science, which I support, but there are plenty of open problems. I've been...
• ## An algorithm for sample quantiles in map reduce

A simple but common problem is computing sample quantiles, sometimes named top $k$ elements, in a large data set. Here I show a solution...
• ## A map reduce algorithm for connected components

In a recently published book about algorithms for the map reduce model of computation, a simple connected components algorithm based on label propagation is proposed,...
• ## Rapleaf Array Absurdity or On streaming problems in disguise

From the interview challenges of an up-and-coming web startup, three problems that range from the trivial to the impossible. The key to the...