The Greatest Sailing Race of All Time Seen Through Statistical Graphics

I don't know if you've been following the America's Cup. It's the oldest sailing competition and, by some accounts, the oldest international sporting event bar none. This year, this time-honored contest has been thrust into the modern age with the adoption of foiling winged catamarans that skim the water of San Francisco Bay at 90 km/h. Not only that, the competition has also entered the Big Data age, with 30,000 data points per second generated by on-board sensors, not to mention the multiple video feeds, the enhanced reality visuals and more.

The prospect of having some of that data made available to the general public was mouth-watering. It turns out that, for one race, what is shared is a paltry 14,000 records, and all the columns corresponding to on-board instruments contain only zeros. America's Cup data engineers have confirmed that the omission is required by the rules. But what's there already tells the story of a race in quite some depth.

Mapreduce Everywhere

Mapreduce could extend its reach beyond — or inside — the data center. Coming soon to a computer near you?

Bringing Relational Joins to Rhipe

Relational operations are a very common way to express mapreduce computations at a higher level, but Rhipe, an R package for mapreduce, doesn’t have any. Let’s start to fix this with a basic join function.
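
To give a rough idea of what a join looks like as a mapreduce computation, here is a minimal sketch in plain Python rather than R. It does not use Rhipe or any of its API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `inner_join`) and the toy records are made up for illustration, and the `shuffle` function merely stands in for the grouping a real mapreduce framework would do. The technique shown is a reduce-side inner join: each record is tagged with the side it came from, records are grouped by join key, and the reducer pairs left records with right records.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, key_fn, tag):
    """Emit (join key, (tag, record)) so the reducer can tell the two sides apart."""
    for record in records:
        yield key_fn(record), (tag, record)


def shuffle(pairs):
    """Group mapped values by key; a stand-in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Pair every left record with every right record sharing the same key."""
    left = [rec for tag, rec in values if tag == "left"]
    right = [rec for tag, rec in values if tag == "right"]
    for l_rec, r_rec in product(left, right):
        yield key, l_rec, r_rec


def inner_join(left, right, left_key, right_key):
    """Run map, shuffle and reduce in sequence on two in-memory lists."""
    pairs = list(map_phase(left, left_key, "left"))
    pairs += list(map_phase(right, right_key, "right"))
    result = []
    for key, values in shuffle(pairs).items():
        result.extend(reduce_phase(key, values))
    return result


# Hypothetical toy inputs: (id, value) tuples joined on the first field.
left_table = [(1, "a"), (2, "b"), (3, "c")]
right_table = [(1, "x"), (1, "y"), (3, "z")]

joined = inner_join(left_table, right_table,
                    left_key=lambda rec: rec[0],
                    right_key=lambda rec: rec[0])
for row in joined:
    print(row)
```

Keys that appear on only one side, such as `2` above, produce no output, which is what makes this an inner join; an outer join would need the reducer to emit unmatched records as well.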