Complete, experienced software engineer and data scientist with industrial and academic backgrounds. Select work includes energy storage optimization, big data packages for R, information filtering, and social network analysis. His work has been cited more than 5,000 times in the scientific literature.


Software for big data analytics, map-reduce algorithms. Algorithm design and implementation, data analysis, machine learning. Speaker at professional meetings.

Work Experience

Data Scientist, Decision Patterns — 7/2019 – Present

Projects include the refactoring and optimization of large Hadoop ETL and machine learning jobs for a Fortune 500 client.

Senior Data Scientist, Stem Inc. — 12/2016 – 2/2018

Design and implementation of probabilistic load prediction algorithms using Spark, TensorFlow and scikit-learn on proprietary datasets combined with meteorological data. Contributed to the design of battery storage optimization algorithms.

Consultant, Self-employed — 12/2010 – 11/2016

Clients included Dataspora, Revolution Analytics and Adatao. Lead developer for the RHadoop open source project for big data analytics, creating the rmr package and others, in addition to several internal projects. rmr became the de facto standard for big data analytics in R and has been downloaded more than 1M times. Wrote technical and instructional documents and built community through speaking engagements and forum participation. Presented at Strata. Other projects include text analysis of user interactions for a software developer, using NLTK and scikit-learn, and a large data integration project with approximate user identification.

Principal, Rightload — 5/2010 – 12/2010

Rightload was an experiment in personal and group information filtering. Addressing the problem of information overload, it leveraged machine learning, web standards and a minimalistic UI to fit nimbly within the workflow of the information professional who uses a feed reader for her or his information needs.

Senior Software Engineer, hi5 Networks — 8/2008 – 9/2009

A/B testing design, implementation and advocacy for a top 20 web site by traffic. User behavior and user content analysis. Implemented at scale on Hadoop and using nonparametric statistics in R.

Inference Engineer, Quantcast — 1/2008 – 7/2008

Developed and implemented machine learning approaches to analyze web traffic data in very high volumes (4TB/day, implemented on Hadoop). Redesigned the reach estimation algorithm that determines ranking for 20 million web sites, with proven and significant accuracy gains.

Staff Bioinformatics Engineer, Affymetrix — 5/2002 – 12/2007

Led a small team of software engineers to design and implement a data analysis pipeline for an advanced research group. Developed core algorithms and a high performance computing backend. Took a leading or collaborative role in several papers published in high-impact journals, including one describing the highest resolution human transcriptome map to date, designing and implementing data analysis methods as needed. Named inventor in two patents. Contributed statistical modules to APT software for high volume customers.

Bioinformatics Engineer, ThermoFinnigan — 3/2001 – 5/2002

Algorithms and systems for the analysis of mass spectrometry data in proteomics applications.

Lecturer, University of California, Davis — 10/2000 – 12/2000

Taught Software Engineering/OOP/C++.

Post-graduate researcher, International Computer Science Institute — 10/1999 – 9/2000

Research on algorithms for computational biology and machine learning.

Post-graduate researcher, University of California, Davis — 10/1998 – 10/1999

Research on algorithms for computational biology (pedigree analysis).

Visiting Scientist, Sandia National Laboratories — 7/1998 – 8/1998

Research on algorithms for computational biology (protein folding, QSAR).


Education

University of Milan, Italy — PhD in Computer Science — 11/1993 – 5/1997

University of Milan, Milan, Italy — MS degree in Computer Science — 11/1987 – 7/1993