Complete, experienced software engineer and data scientist with industrial and academic backgrounds. Most recent work on energy storage optimization, big data packages for R, information filtering, and social network analysis. His work has been cited 5000+ times in the scientific literature.


Software for big data analytics, map-reduce algorithms. Algorithm design and implementation, data analysis, machine learning. Speaker at professional meetings.

Work Experience

Senior Data Scientist, Stem Inc. — 12/2016 – present

Design and implementation of load prediction and battery storage optimization algorithms.

Manager, Per Data LLC — 6/2013 – 11/2016

Design and development of scalable data analysis software and algorithms that combine data-scientist-friendly, high-level APIs with existing big data platforms (RHadoop project). Technical and instructional documentation, and community building through speaking engagements and forum participation. Research and analysis related to scalable data analysis.

Consultant, Self-employed — 12/2010 – 5/2013

Clients included Dataspora and Revolution Analytics. Lead developer for the RHadoop open source project, including creating the rmr, plyrmr, quickcheck and dplyr-spark packages in addition to several internal projects. rmr has become the de facto standard for big data analytics in R and has several thousand users. Presented at Strata.

Principal, Rightload — 5/2010 – 12/2010

Rightload was an experiment in personal and group information filtering. Addressing the problem of information overload, it leveraged machine learning, web standards and a minimalistic UI to fit nimbly within the workflow of the information professional who uses a feed reader for her or his information needs.

Senior Software Engineer, hi5 Networks — 8/2008 – 9/2009

A/B testing design, implementation and advocacy. User behavior and user content analysis. All of the above implemented at scale on Hadoop for a top 20 web site by traffic.

Inference Engineer, Quantcast — 1/2008 – 7/2008

Developed and implemented machine learning approaches to analyze very high volumes of web traffic data. Redesigned the reach estimation algorithm that determines ranking for 20 million web sites, with proven and significant accuracy gains.

Staff Bioinformatics Engineer, Affymetrix — 5/2002 – 12/2007

Led a small team of software engineers to design and implement a data analysis pipeline for an advanced research group. Developed core algorithms and a high-performance computing backend. Took a leading or collaborative role in several papers published in high-impact journals, including one describing the highest-resolution human transcriptome map to date, designing and implementing data analysis methods as needed. Named inventor on two patents. Contributed statistical modules to APT software for high-volume customers.

Bioinformatics Engineer, ThermoFinnigan — 3/2001 – 5/2002

Algorithms and systems for the analysis of mass spectrometry data in proteomics applications.

Lecturer, University of California, Davis — 10/2000 – 12/2000

Taught Software Engineering/OOP/C++.

Post-graduate researcher, International Computer Science Institute — 10/1999 – 9/2000

Research on algorithms for computational biology and machine learning.

Post-graduate researcher, University of California, Davis — 10/1998 – 10/1999

Research on algorithms for computational biology (pedigree analysis).

Visiting Scientist, Sandia National Laboratories — 7/1998 – 8/1998

Research on algorithms for computational biology (protein folding, QSAR).


University of Milan, Italy — PhD in Computer Science — 11/1993 – 5/1997

University of Milan, Milan, Italy — MS degree in Computer Science — 11/1987 – 7/1993