Complete, experienced software engineer and data scientist with industrial and academic backgrounds. Select work includes energy storage optimization, big data packages for R, information filtering, and social network analysis. His work has been cited 5000+ times in the scientific literature.
Software for big data analytics, map-reduce algorithms. Algorithm design and implementation, data analysis, machine learning. Speaker at professional meetings.
- Expert knowledge of machine learning
- Unrelenting troubleshooter, delights users
- Accomplished API designer, big data expert
Data Scientist, Decision Patterns — 7/2019 – Present
Projects include the refactor and optimization of large Hadoop ETL and machine learning jobs for a Fortune 500 client.
Senior Data Scientist, Stem Inc. — 12/2016 – 2/2018
Design and implementation of probabilistic load prediction algorithms using Spark, TensorFlow and scikit-learn on proprietary datasets combined with meteorological data. Contributed to the design of battery storage optimization algorithms.
Consultant, Self-employed — 12/2010 – 11/2016
Clients include Dataspora, Revolution Analytics and Adatao. Lead developer for the RHadoop open source project for big data analytics, including creating the rmr package and others in addition to several internal projects. rmr became the de facto standard for big data analytics in R and has been downloaded more than 1M times. Wrote technical and instructional documents and built the community through speaking engagements and forum participation. Presented at Strata. Other projects include text analysis of user interactions for a software developer, using NLTK and scikit-learn, and a large data integration project with approximate user identification.
Principal, Rightload — 5/2010 – 12/2010
Rightload was an experiment in personal and group information filtering. Addressing the problem of information overload, it leveraged machine learning, web standards and a minimalistic UI to fit nimbly within the workflow of the information professional who uses a feed reader for her or his information needs.
Senior Software Engineer, hi5 Networks — 8/2008 – 9/2009
A/B testing design, implementation and advocacy for a top 20 web site by traffic. User behavior and user content analysis. Implemented at scale on Hadoop and using nonparametric statistics in R.
Inference Engineer, Quantcast — 1/2008 – 7/2008
Developed and implemented machine learning approaches to analyze web traffic data in very high volumes (4TB/day, implemented on Hadoop). Redesigned reach estimation algorithm that determines ranking for 20 million web sites with proven and significant accuracy gains.
Staff Bioinformatics Engineer, Affymetrix — 5/2002 – 12/2007
Led small team of software engineers to design and implement data analysis pipeline for advanced research group. Developed core algorithms and high-performance computing backend. Took leading or collaborative role in several papers published in high-impact journals, including one describing the highest resolution human transcriptome map to date, designing and implementing data analysis methods as needed. Named inventor on two patents. Contributed statistical modules to APT software for high volume customers.
Bioinformatics Engineer, ThermoFinnigan — 3/2001 – 5/2002
Algorithms and systems for the analysis of mass spectrometry data in proteomics applications.
Lecturer, University of California, Davis — 10/2000 – 12/2000
Taught Software Engineering/OOP/C++.
Post-graduate researcher, International Computer Science Institute — 10/1999 – 9/2000
Research on algorithms for computational biology and machine learning.
Post-graduate researcher, University of California, Davis — 10/1998 – 10/1999
Research on algorithms for computational biology (pedigree analysis).
Visiting Scientist, Sandia National Laboratories — 7/1998 – 8/1998
Research on algorithms for computational biology (protein folding, QSAR).
University of Milan, Italy — PhD in Computer Science — 11/1993 – 5/1997
University of Milan, Milan, Italy — MS degree in Computer Science — 11/1987 – 7/1993