  • Delicious R Curry

    In R, functional::Curry is a misnomer at best. Let’s implement currying in R. I’ve always wondered why the function Curry in package functional for the...
  • 10 eigenmaps of the United States of America

    An unbiased analysis of census data reveals not one but many maps of the United States. The original inspiration for this post comes from a...
  • Can't someone else find those differences?

    Use statistics and R instead of squinting at satellite images! You may have, like me, run into this article. Amazing stuff. A little startup pushing...
  • The Greatest Sailing Race of All Time Seen Through Statistical Graphics

    I don't know if you've been following the America's Cup. It's the oldest sailing competition and, by some accounts, the oldest international sporting event bar...
  • Three microblogs: The Ascetic Programmer, Science in Crisis and Data Science Matters.

    I've started three thematic microblogs you may be interested in. They are all link and quote microblogs that reflect side interests related to my work but...
  • R anti-tips

    Not all R tips are equally good. Let's set the record straight. Anti-tip #1: For loops are slower than functions in the apply family. Why should...
  • The essential R packages

    Much has been said about the richness of the system of packages for R, but where is one supposed to start? The availability of a...
  • Mapreduce everywhere

    Mapreduce could extend its reach beyond, or inside, the data center. Coming soon to a computer near you? The local Hadoop SF meetings cover...
  • The connected components example, rewritten using RHadoop/rmr

    My new implementation of random mate for mapreduce, using the package rmr from Revolution Analytics' open source project RHadoop. This story now has three episodes. First,...
  • A map reduce algorithm for connected components: implementation

    At long last, a complete implementation of the algorithm I described some time ago. You are kindly advised to go back and check the algorithm motivation...
  • Bringing relational joins to Rhipe

    Relational operations are a very common way to express map-reduce computations at a higher level, but Rhipe, an R package for mapreduce, doesn't have any....
  • Let a million Twitters bloom

    Why are some people uncomfortable with cloud computing? What are the limitations and is there a way forward? The recent sudden change in Twitter terms of...
  • Looking for a map reduce language

    On a quest for an elegant and effective map reduce language, I went through a number of options and put together some considerations. And the...
  • Find the odd bag

    From a job interview challenge, an interesting probability exercise in two parts. One of the themes here is pretty standard fare. You are given a...
  • On lenses for small cameras: a data-driven counterargument

    Andy Westlake of dpreview.com takes apart the current lens offering for lightweight interchangeable lens cameras (LILC) like the micro four thirds and related mirrorless designs,...
  • Thoughts on A/B testing

    A/B testing is part of a push towards software engineering as an experimental science, which I support, but there are plenty of open problems. I've been...
  • An algorithm for sample quantiles in map reduce

    A simple but often occurring problem is computing sample quantiles, sometimes named top $k$ elements, in a large data set. Here I show a solution...
  • A map reduce algorithm for connected components

    In a recently published book about algorithms for the map reduce model of computation, a simple connected components algorithm based on label propagation is proposed,...
  • Rapleaf Array Absurdity or On streaming problems in disguise

    From the interview challenges of an up-and-coming web startup, three problems that range from the trivial to the impossible. The key to the...
  • Facebook Illegal Wiretaps

    The formulation of this problem is quite creative, but overall it is just describing a matrix where the rows are workers and the columns are...
  • Facebook Prime Bits

    This is one of Facebook's job candidate puzzles. Given a range [a,b] of positive integers, test for the primality of the number of 1...
  • ProjectDescription - Lucene-hadoop Wiki

    An implementation of simple parallel computing, based on Google's map-reduce, that runs on Amazon's EC2. Supercomputing for the rest of us.