David M. Andrzejewski

Applied machine learning and (un)natural language processing.

Technical Blogging

My blog
Mostly machine learning.
Sumo Logic Blog
Posts I've written at work about log management, distributed systems, The Cloud, Scala, etc.

Technical Talks

Privacy-aware data science in Scala with monads and type-level programming
Scale by the Bay (San Francisco), 2018
(slides, video)
Sumo Global Intelligence Insights For Better DevSecOps (with Bruno Kurtic)
Illuminate - Sumo Logic User Conference, 2018
Understanding Software System Behavior With ML and Time Series Data
QCon.ai (San Francisco), 2018
(slides, video)
How to eat ML/AI
CCSF Coders Club
Functional Programming for Machine Learning (panel)
Scale by the Bay (San Francisco), 2017
Sumo Logic Analytics Platform Integration
Illuminate - Sumo Logic User Conference, 2017
Economical machine learning via functional programming
Big Data Scala by the Bay (Oakland), 2015
(abstract, slides, video)
Graph mining for log data
Strata Conference (San Jose), 2015
(abstract, slides, video)
Mining human-scale insights from log data with machine learning
OC Big Data Meetup main talk
Machine learning for machine data
Strata Conference (Santa Clara), 2014
(abstract, slides, video)
Scala type classes and machine learning
Bay Area Scala Enthusiasts Meetup lightning talk
(slides, blog post)

Research Papers

Exploring Topic Coherence over Many Models and Many Topics
Keith Stevens, Philip Kegelmeyer, David Andrzejewski, and David Buttler.
Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012)
(pdf, code)
Latent Topic Feedback for Information Retrieval
David Andrzejewski and David Buttler.
Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011)
(pdf, slides, poster, code)
A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation using First-Order Logic
David Andrzejewski, Xiaojin Zhu, Mark Craven, and Benjamin Recht.
Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011)
(pdf, slides, poster, code, video)
Accelerated Gibbs Sampling for Infinite Sparse Factor Analysis
David Andrzejewski.
Lawrence Livermore National Laboratory Technical Report (LLNL-TR-499647)
(pdf, code)
Inferring Compositional Style in the Neo-plastic Paintings of Piet Mondrian by Machine Learning
David Andrzejewski, David G. Stork, Xiaojin Zhu, and Ron Spronk.
Electronic Imaging: Computer Image Analysis in the Study of Art (SPIE 2010)
(pdf, slides, data)
Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors
David Andrzejewski, Xiaojin Zhu and Mark Craven.
Proceedings of the 26th International Conference on Machine Learning (ICML 2009)
(pdf, slides, code, video)
Latent Dirichlet Allocation with Topic-in-Set Knowledge
David Andrzejewski and Xiaojin Zhu.
NAACL 2009 Workshop on Semi-supervised Learning for NLP (NAACL-SSLNLP 2009)
(pdf, code, parallel code)
May All Your Wishes Come True: A Study of Wishes and How to Recognize Them
Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson, and Xiaojin Zhu.
Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2009)
Statistical Debugging using Latent Topic Models
David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin Zhu.
Proceedings of the 18th European Conference on Machine Learning (ECML 2007)
(pdf, slides, code)
Improving Diversity in Ranking using Absorbing Random Walks
Xiaojin Zhu, Andrew B. Goldberg, Jurgen Van Gael, and David Andrzejewski.
Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2007)
(pdf, code)
Ranking Biomedical Passages for Relevance and Diversity: University of Wisconsin, Madison at TREC Genomics 2006
Andrew B. Goldberg, David Andrzejewski, Jurgen Van Gael, Burr Settles, Xiaojin Zhu, and Mark Craven.
Proceedings of the Fifteenth Text Retrieval Conference (TREC 2006)
Enhancement of Combustion by Drift in a Coupled Reaction-Diffusion Model
Lam Raga Markely, David Andrzejewski, Erick Butzlaff, and Alexander Kiselev.
Communications in Mathematical Sciences, Vol. 4, No. 1, pp. 213-225, 2006

PhD Thesis

Incorporating Domain Knowledge in Latent Topic Models
Advised by Mark Craven and Xiaojin (Jerry) Zhu.
Computer Sciences, University of Wisconsin-Madison (2010)
(pdf, slides)

Patents and Applications

System and method of drug identification through radio frequency identification (RFID)
United States Patent Application (11/465993)
Ronald Makin, Kyle Jansson, Silas Zirn, David Andrzejewski, Timothy Flink.
Visualization Tool for System Tracing Infrastructure Events
United States Patent (US8464221)
Alice X. Zheng, Trishul A. Chilimbi, Shuo-Hsien Hsiao, Danyel A. Fisher, David M. Andrzejewski.

Misc Technical Notes

Contrastive Divergence learning for binary Restricted Boltzmann Machines (RBMs) (pdf)
Expectation-Maximization (EM) algorithm (pdf)
Online optimization algorithms (pdf)