Complete, experienced software engineer and data scientist with industrial and academic backgrounds. Most recent work on big data packages for R, information filtering, social network analysis, web analytics. Cited 5000+ times in the scientific literature.
Software for big data analytics, map-reduce algorithms. Algorithm design and implementation, data analysis. Speaker at professional meetings.
Manager, Per Data LLC — 6/2013 – present
Design and development of scalable data analysis software and algorithms that combine data-scientist-friendly, high-level APIs with existing big data platforms (RHadoop project). Technical and instructional documents and community building through speaking engagements and forum participation. Research and analysis related to scalable data analysis.
Consultant, Self-employed — 12/2010 – 5/2013
Clients include Dataspora and Revolution Analytics. Lead developer for the RHadoop open source project, including creating the rmr, plyrmr, quickcheck and dplyr-spark packages in addition to several internal projects. rmr has become the de facto standard for big data analytics in R and has several thousand users. Presented at Strata
Principal, Rightload — 5/2010 – 12/2010
Rightload was an experiment in personal and group information filtering. Addressing the problem of information overload, it leveraged machine learning, web standards and a minimalistic UI to fit nimbly within the workflow of the information professional who uses a feed reader for her or his information needs.
Senior Software Engineer, hi5 Networks — 8/2008 – 9/2009
A/B testing design, implementation and advocacy. User behavior and user content analysis. All of the above implemented at scale on Hadoop for a top-20 web site by traffic.
Inference Engineer, Quantcast — 1/2008 – 7/2008
Developed and implemented machine learning approaches to analyze web traffic data in very high volumes. Redesigned the reach estimation algorithm that determines ranking for 20 million web sites, with proven and significant accuracy gains.
Staff Bioinformatics Engineer, Affymetrix — 5/2002 – 12/2007
Led a small team of software engineers to design and implement a data analysis pipeline for an advanced research group. Developed core algorithms and a high-performance computing backend. Took a leading or collaborative role in several papers published in high-impact journals, including one describing the highest resolution human transcriptome map to date, designing and implementing data analysis methods as needed. Named inventor on two patents. Contributed statistical modules to APT software for high-volume customers.
Bioinformatics Engineer, ThermoFinnigan — 3/2001 – 5/2002
Algorithms and systems for the analysis of mass spectrometry data in proteomics applications.
Lecturer, University of California, Davis — 10/2000 – 12/2000
Taught Software Engineering/OOP/C++.
Post-graduate researcher, International Computer Science Institute — 10/1999 – 9/2000
Research on algorithms for computational biology and machine learning
Post-graduate researcher, University of California, Davis — 10/1998 – 10/1999
Research on algorithms for computational biology (pedigree analysis)
Visiting Scientist, Sandia National Laboratories — 7/1998 – 8/1998
Research on algorithms for computational biology (protein folding, QSAR)
University of Milan, Italy — PhD in Computer Science — 11/1993 – 5/1997
University of Milan, Milan, Italy — MS degree in Computer Science — 11/1987 – 7/1993