I don't know if you've been following the America's Cup. It's the oldest sailing competition and, by some accounts, the oldest international sporting event, bar none. This year, this time-honored contest has been thrust into the modern age with the adoption of foiling winged catamarans that skim the water of San Francisco Bay at 90 km/h. Not only that: the competition has also entered the Big Data age, with 30,000 data points per second generated by on-board sensors, not to mention multiple video feeds, enhanced reality visuals and more.

The prospect of having some of that data made available to the general public was mouth-watering. It turns out that what is shared for one race is a paltry 14,000 records, and all the columns corresponding to on-board instruments contain only zeros. America's Cup data engineers have confirmed that the omission is required by the rules. But what's there already tells the story of a race in quite some depth.

Here I focus on Race 10, a cliffhanger hailed by many as one of the greatest sailing races of all time. I wanted to build some high-density graphics that show the crucial events of the race. Some of the answers suggested by the graphics would need to be confirmed by more rigorous statistical methods, but we will stop short of that in this article. The first graphic is based on the course followed by the boats. This is what you get by simply plotting their longitude and latitude, captured five times per second:

[Figure: raw longitude and latitude tracks of the two boats]

Since the race goes back and forth three times between two gates, the overlap of the yachts' trajectories makes this first visualization hard to read, so I decided to mirror the longitude every time the sailboats reach a mark (sailing speak for a turnaround point). Imagine the race course as a folded piece of paper and the visualization as its unfolded version, with the creases running north-south at the marks.

Being a race, speed matters, so I decided to use color to represent it. Since in sailing “fast” is only relative to the wind, I use the ratio of boat speed to wind speed, a habit I picked up from America's Cup commentators and accomplished sailors Gary Jobson and Ken Read. In Race 10 this ratio ranged from 0.5 on some not-so-good tacks (upwind turns), when the boat is briefly traveling against the wind, to 2.7. Symbols identify the boats. A nice side effect is that the symbols show the position of the boats at regular time intervals, suggesting which one is ahead, particularly near a cross.

In sailing, speed is not everything. Equally important is the angle with respect to the wind, since sailboats can't go straight upwind, can go straight downwind only by paying a massive speed penalty, and the course is roughly aligned with the wind. The combination of speed and angle is called velocity made good, or VMG. As with raw speed, what counts as a good VMG depends on the strength of the wind, so it makes sense to take the ratio of VMG to wind speed, which I call relative VMG or RVMG. In the following graphic, this is expressed as the thickness of the line.

To summarize, the next graphic shows the trajectory of the two boats, with the twist that the race course is replicated three times and mirrored as needed to avoid overlaps. Color is speed and thickness is RVMG; symbols identify the boats and their position at regular time intervals.
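Before the plot, here is a minimal Python sketch of how these derived quantities might be computed. It is not the code behind the graphics. The function names, the argument layout and the convention that a wind angle of 0 degrees means straight upwind are all assumptions made for illustration, and taking the absolute value of VMG (so that progress counts on both upwind and downwind legs) is one possible choice.

```python
import math

def relative_speed(boat_speed, wind_speed):
    """Ratio of boat speed to wind speed (both in the same units)."""
    return boat_speed / wind_speed

def relative_vmg(boat_speed, wind_angle_deg, wind_speed):
    """Velocity made good divided by wind speed (RVMG).

    Assumption: wind_angle_deg is the angle between the boat's course and
    the direction the wind comes from (0 = straight upwind, 180 = straight
    downwind). The absolute value counts progress on both kinds of leg.
    """
    vmg = boat_speed * math.cos(math.radians(wind_angle_deg))
    return abs(vmg) / wind_speed

def unfold_longitudes(longitudes, marks_rounded, mark_longitudes):
    """Mirror longitudes at each mark, like unfolding a creased sheet.

    longitudes[i]      -- one longitude reading
    marks_rounded[i]   -- how many marks the boat had rounded at reading i
    mark_longitudes[k] -- longitude of the k-th mark rounded (the crease)
    """
    # Each leg gets an affine map x -> sign * x + offset.
    sign, offset = 1.0, 0.0
    transforms = [(sign, offset)]
    for mark in mark_longitudes:
        crease = sign * mark + offset          # crease in unfolded coordinates
        sign, offset = -sign, 2.0 * crease - offset  # reflect about the crease
        transforms.append((sign, offset))
    return [transforms[k][0] * x + transforms[k][1]
            for x, k in zip(longitudes, marks_rounded)]
```

With made-up marks at longitudes 10 and 0, a track that goes out, back and out again unfolds into a single left-to-right path, which is what removes the overlap in the plot.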

[Figure: unfolded race course; color is relative speed, line thickness is RVMG, symbols identify the boats]

If you watched this fantastic race, you can recognize all of its decisive moments in this graphic. The race starts with a very fast reach, with team New Zealand (NZL) pushing team USA (USA) wide at the first mark rounding. USA has a difficult jibe (a downwind turn). The two boats go in sync down to mark 2, with USA sailing a tighter angle for one tack; I'm not sure why. At the mark USA goes for a complicated maneuver to obtain a split, but the maneuver costs them dearly in boat speed. (A split is always preferred by the chaser, since these boats are so fast that they always leave a wake of disturbed air behind them, irrespective of the wind.) In the upwind leg, the boats seem happy to crisscross paths, as the leaders don't make a defensive tack over their opponents. The lead changes hands four times. Peaks in speed at the crossings show the boat on port tack taking evasive action to avoid a collision, while the one on starboard tack, which has the right of way, tries to make it hard on the other boat. Right before mark 3, the two boats part ways: USA ducks deeply behind NZL, whereas NZL slows down heading upwind and goes into the mark rounding with a zigzag. USA's speed seems to suffer after the last tack, and they go into the downwind leg with a slight deficit, but with the split. At the first cross, a decision looms between ducking and losing ground, or jibing and losing the split. As USA tactician Ainslie later explained, neither looks good, but they duck. NZL is clear ahead at their next crossing and, barring major errors, the race is over. USA also fails to keep the pressure up, with a poor last mark rounding.

Now I would like to move on to a different, more abstract visualization, but before doing that I need to show a version of the previous graphic with color representing time.

[Figure: unfolded race course with color representing time]

This isn't so interesting per se, but you need to keep an eye on it to read the next graphic, where time is represented by the same color scale. The next graphic focuses on speed and direction with respect to the wind, what sailors call point of sail. In polar coordinates, imagine the wind coming from above and the boat with its stern (back) in the center and its bow (front) pointing out. The labels are the traditional names for different points of sail: in irons (against the wind), close hauled (almost against the wind) and so forth. The distance from the center is boat speed relative to the wind. Color represents time, and by going back and forth between this and the previous graphic you can associate the different colors with the various phases of the race. Each point represents a speed and direction reading, taken five times per second, and each boat has a separate panel.

As you can see, the points are not randomly scattered. Boats tend to stay on a course that gives them the best VMG most of the time. The biggest exceptions are the blue and purple clusters, which are the beginning and final stretches of the race. These are oriented at almost 90 degrees to the wind, so VMG doesn't matter there, only speed. So the six main clusters are port and starboard tack, upwind and downwind, and the starting and final reaches. In between these we see connecting lines: roughly horizontally we have tacks (upper half) and jibes (lower half); vertically we have mark roundings. There are a few lines that don't fit any of the above: the acceleration from the start in red, and a few “tactical” situations such as NZL zigzagging before mark 3 (in blue-green) and USA ducking deep behind NZL (green), also before the same mark. Right-click on the graphic to see it at full size.

[Figure: polar plot of point of sail and relative speed, colored by time, one panel per boat]
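The polar layout described above amounts to a small coordinate change, sketched here in Python under stated assumptions: the function and parameter names are hypothetical, the course and wind direction are taken as compass bearings in degrees, and which tack lands on the left or right of the plot depends on a sign convention that this sketch picks arbitrarily.

```python
import math

def wind_relative_angle(course_deg, wind_from_deg):
    """Signed angle in [-180, 180) between the boat's course and the
    direction the wind comes from; 0 means pointing straight into the wind."""
    return (course_deg - wind_from_deg + 180.0) % 360.0 - 180.0

def polar_point(course_deg, wind_from_deg, boat_speed, wind_speed):
    """Plot position of one reading, with the wind coming from the top.

    The radius is boat speed relative to wind speed; the angle is measured
    from the top of the plot, so upwind readings fall in the upper half
    and downwind readings in the lower half.
    """
    theta = math.radians(wind_relative_angle(course_deg, wind_from_deg))
    radius = boat_speed / wind_speed
    return (radius * math.sin(theta), radius * math.cos(theta))
```

Wrapping the angle into a signed range keeps readings taken on either side of north from jumping by 360 degrees, so each tack stays in one contiguous cluster.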

By promoting speed from a color scale to a more perceptually precise spatial scale, we can gain new insights, like the remarkable speed difference between the two teams rounding mark 2 (in yellow), how scattered the final run is for USA compared to the tight cluster of points for NZL (in purple), and how much more consistent jibe speeds are for NZL. In favor of USA, we may see slightly faster speeds through the tacks, but a different graphic later doesn't confirm this.

The problem with this visualization lies in the tight clusters of points, which correspond to straight-line travel. It's hard to see the density of points, since they overlap. So let's now drop the individual data points. In the next graph, the intensity of the red is proportional to the time spent sailing at a certain point of sail and speed. It's a more static view of the race, with emphasis on the normal modes of sailing and less on the episodes and outliers.
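One simple way to turn overlapping points into "time spent" is to bin the readings on a grid and add up the sampling interval in each cell. The Python sketch below illustrates that idea only; it is not necessarily how the graph itself was produced, and the bin widths and names are assumptions. The 0.2-second interval follows from the five readings per second mentioned earlier.

```python
import math
from collections import Counter

SAMPLE_SECONDS = 0.2  # five readings per second

def time_density(readings, angle_bin_deg=10.0, speed_bin=0.25):
    """Seconds spent in each (angle, relative speed) cell.

    readings is an iterable of (wind_angle_deg, relative_speed) pairs.
    Returns a Counter keyed by (angle_bin_index, speed_bin_index).
    Bin widths are illustrative defaults, not values from the article.
    """
    seconds = Counter()
    for angle_deg, rel_speed in readings:
        cell = (math.floor(angle_deg / angle_bin_deg),
                math.floor(rel_speed / speed_bin))
        seconds[cell] += SAMPLE_SECONDS
    return seconds
```

Mapping each cell's total to color intensity then gives a picture in which long stretches of steady sailing stand out and brief maneuvers fade.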