Antonio Piccolboni

The Strong ML Hypothesis
Jun 4, 2019
Data and compute power availability are important in the resurgence of ML and AI, but two of the biggest innovations in neural networks (NN), convolutional and deep networks (CN and DN), are data- and compute-efficient ideas, which allow practitioners to do more with fewer resources. I think this observation deserves...
The softmedian
May 14, 2019
Why is there a softmax function, but not a softmedian? Let’s create not one, but a few of them.
altair_recipes: a Python package to generate essential statistical graphics for the web
Feb 15, 2019
If you don’t need the full power of the grammar of graphics to generate classical plots for the web, altair_recipes is the easy way. Check it out with pip install altair_recipes.
A Simple Loss Function for Multi-Task learning with Keras implementation, part 2
Apr 13, 2018
In this post, we show how to implement a custom loss function for multitask learning in Keras and perform a couple of simple experiments with it. TL;DR: this is the code:
A Simple Loss Function for Multi-Task learning with Keras implementation, part 1
Mar 8, 2018
In this post I walk through a recent paper about multi-task learning and fill in some mathematical details. Implementation and experiments will follow in a later post.
Tame the newsfeed with homemade AI
Nov 4, 2017
Back in the 60s it was called information overload and it affected so-called decision makers. Fast forward to today and the situation hasn’t improved.
Mathematical model sides with tennis players, not pundits, on serve selection
Jul 16, 2017
I was watching the current Wimbledon tennis tournament when I heard a comment by former champion and coach Boris Becker that got my attention. He complained that Canadian player Milos Raonic was not using the body serve, a shot aimed directly at the opponent that allegedly results in a weak...
A nutritional search engine with shiny and dplyr
Oct 24, 2016
TL;DR: try our shiny new nutritional search engine. Feedback welcome. “In the middle of our life’s journey, I found myself in a dark wood.” So starts Dante’s Inferno. My midlife doesn’t feel remotely as bleak, but for reasons that will be best left untold, I had to almost completely...
How plyrmr was ahead of the curve
Mar 31, 2016
I recently attended a talk by the always excellent Hadley Wickham about his latest work on creating and visualizing many models.
Yet Another Pipe Operator in R to unify interactive and programming use
Sep 20, 2015
Prologue: The pipe operator, %>% in its latest incarnation, is all the rage in R circles. I first saw it in a less-well-known package called vadr. Then one was added to dplyr, but I preferred my own implementation when working on plyrmr. Then a dedicated package emerged called magrittr and...
Syntax Directed Diffs for R in R
Sep 11, 2015
Unsatisfied with general-purpose, syntax-oblivious diff tools, I take the first step towards syntax-directed diffs for R. Like many developers, I use git to manage my source code and collaborate with others. One fundamental component of source code control is a tool to compare files, namely source code files. Most...
Delicious R Curry
Jul 21, 2015
In R, functional::Curry is a misnomer at best. Let’s implement currying in R. I’ve always wondered why the function Curry in package functional for the language R is named that way when it actually implements partial application. What it does is transforming a function into another one with a smaller...
10 eigenmaps of the United States of America
Aug 26, 2014
An unbiased analysis of census data reveals not one but many maps of the United States. The original inspiration for this post comes from a New York Times article. By combining 6 socio-economic observables at the county level, the author puts together a map that in his view describes...
Can't someone else find those differences?
Aug 13, 2014
Use statistics and R instead of squinting at satellite images! You may have, like me, run into this article. Amazing stuff. A little startup pushing satellite imaging to the next level. Full planet coverage at the resolution of a few feet every 24 hours, soon, and on a shoestring budget....
The Greatest Sailing Race of All Time Seen Through Statistical Graphics
Sep 23, 2013
I don't know if you've been following the America's Cup. It's the oldest sailing competition and, by some accounts, the oldest international sporting event bar none. This year, this time-honored contest has been thrust into the modern age with the adoption of foiling winged catamarans that skim the water...
Three microblogs: The Ascetic Programmer, Science in Crisis and Data Science Matters.
May 5, 2013
I've started three thematic microblogs you may be interested in. They are all link and quote microblogs that reflect side interests related to my work but that I don't want to force onto all of my twenty-five readers. My main microblog is focused on work-related matters, projects etc. and I plan on...
R anti-tips
Oct 18, 2012
Not all R tips are equally good. Let's set the record straight. Anti-tip #1: For loops are slower than functions in the apply family. Why should that be the case? Let's see what the R interpreter has to say about it. Let's get some numbers to chew on first: z =...
The essential R packages
May 10, 2012
Much has been said about the richness of the system of packages for R, but where is one supposed to start? The availability of a wide variety of packages has been long highlighted as one of the strengths of the R language. But the number is overwhelming — 5000 is...
Mapreduce everywhere
Dec 3, 2011
Mapreduce could extend its reach beyond (or inside) the data center. Coming soon to a computer near you? The local Hadoop SF meetings cover a variety of topics, mostly practical. But on one occasion the discussion took a speculative turn: does Hadoop have legs or is it a stop-gap...
The connected components example, rewritten using RHadoop/rmr
Sep 15, 2011
My new implementation of random mate for mapreduce, using the package rmr from Revolution Analytics' open source project RHadoop. This story now has three episodes. First, I got interested in how to compute connected components in map reduce in a way that works even for large diameter graphs and proposed an...
A map reduce algorithm for connected components: implementation
Apr 27, 2011
At long last, a complete implementation of the algorithm I described some time ago. You are kindly advised to go back and check the algorithm motivation and description in my older post, but the short of it is that it is a map reduce algorithm for connected components that is not...
Bringing relational joins to Rhipe
Apr 15, 2011
Relational operations are a very common way to express map-reduce computations at a higher level, but Rhipe, an R package for mapreduce, doesn't have any. Let's start to fix this with a basic join function. This is going to be a little dry and technical, in preparation for better things to...
Let a million Twitters bloom
Apr 11, 2011
Why are some people uncomfortable with cloud computing? What are the limitations and is there a way forward? The recent sudden change in Twitter's terms of service for developers (the consensus is, despite attempts to backtrack, that they are against third party clients) has unleashed a debate about the...
Looking for a map reduce language
Apr 7, 2011
On a quest for an elegant and effective map reduce language, I went through a number of options and put together some considerations. And the winner is … Update: since writing this post, I was approached by Revolution Analytics to write yet another map reduce library, this time for R, and...
Find the odd bag
Nov 29, 2010
From a job interview challenge, an interesting probability exercise in two parts. One of the themes here is pretty standard fare. You are given a clearly defined random procedure whose outcome is a mixture of two distributions. The problem is, given a certain set of outcomes, find which of the...
On lenses for small cameras: a data-driven counterargument
Sep 17, 2010
Andy Westlake of dpreview.com takes apart the current lens offering for lightweight interchangeable lens cameras (LILC) like the micro four thirds and related mirrorless designs, but I was unconvinced. Let's see what the data says. Andy Westlake is a photographer and camera reviewer at dpreview.com and his opinions carry some weight...
Thoughts on A/B testing
Sep 16, 2010
A/B testing is part of a push towards software engineering as an experimental science, which I support, but there are plenty of open problems. I've been mulling over these points for a long while, but, after running into this excellent and amusing post by John Moult, about the pains and perils...
An algorithm for sample quantiles in map reduce
Jul 27, 2010
A simple but frequently occurring problem is computing sample quantiles, sometimes named top $k$ elements, in a large data set. Here I show a solution for the MapReduce model of computation. The standard in-memory algorithm for this problem is similar to quicksort, with the main difference that only one branch...
A map reduce algorithm for connected components
Jul 19, 2010
In a recently published book about algorithms for the map reduce model of computation, a simple connected components algorithm based on label propagation is proposed, but its complexity depends on the diameter of the graph, which can be very large. It turns out we can get rid of that dependency...
Rapleaf Array Absurdity or On streaming problems in disguise
Jul 16, 2010
From the interview challenges of an up-and-coming web startup, three problems that range from the trivial to the impossible. The key to the solution is to recognize that the setting is close to that of streaming algorithms, which allows for very limited space resources compared to the...
Facebook Illegal Wiretaps
Nov 4, 2007
The formulation of this problem is quite creative, but overall it is just describing a matrix where the rows are workers and the columns are tasks. Workers have numbers and tasks have names and the job completion time depends on whether the worker is odd or even, the number of...
Facebook Prime Bits
Nov 2, 2007
This is one of Facebook's job candidate puzzles. Given a range [a,b] of positive integers, test for the primality of the number of 1 bits in the binary representation of each number, and do so in O(n) where n is b - a. Unfortunately the puzzle goes on to assume...
ProjectDescription - Lucene-hadoop Wiki
May 24, 2007
Implementation of simple parallel computing, based on Google's map-reduce, runs over Amazon's EC2. Supercomputing for the rest of us.