David M. Andrzejewski


PROFESSIONAL EXPERIENCE

Industry

Sumo Logic
Director of Engineering, AI Experiences
Redwood City, CA (2019-present)

Sumo Logic
Senior Engineering Manager, Data Insights
Redwood City, CA (2018-2019)

Sumo Logic
Engineering Manager, Unified Logs and Metrics
Redwood City, CA (2015-2018)
-Team responsible for new time-series data analysis platform

Sumo Logic
Tech Lead / Manager, Data Sciences Engineering
Redwood City, CA (2014-2015)
-Led the development of advanced analytics for machine-generated log data

Sumo Logic
Lead Data Sciences Engineer 
Redwood City, CA (2013-2014)

Sumo Logic
Data Sciences Engineer 
Mountain View, CA (2011-2013)

Lawrence Livermore National Laboratory
Postdoctoral Research Staff Member
Livermore, CA (2010-2011)
-Applied statistical modeling to knowledge discovery in text corpora

Microsoft Research 
Research Intern - Redmond, WA (Summer 2008)
-Developed analysis tool for investigating system performance anomalies

GE Healthcare 
Research & Development Engineer - Madison, WI (Summer 2004, Summer 2005)
-Developed software for drug identification system prototype 
-Created clinical pharmacokinetic modeling system prototype

GE Healthcare 
Software Engineer - Menomonee Falls, WI (Fall 2003)
-Developed testing tools for cardiac image analysis software
-Identified and resolved bugs in cardiac image analysis software

Academic

University of Wisconsin-Madison
Research Assistant (Professors Mark Craven and Xiaojin Zhu)
Madison, WI (2008-2010)
-Project: Knowledge-augmented topic models
-Developed new latent topic models that incorporate prior knowledge and user feedback
-Proposed, implemented, and conducted experiments on new models and techniques

University of Wisconsin-Madison
Computation and Informatics in Biology and Medicine predoctoral trainee
Madison, WI (2005-2008)
-Project: Biomedical text mining
-Applied text mining to assist biological researchers in understanding
 experimental results
-Incorporated structured knowledge sources into biomedical text analysis

University of Wisconsin-Madison Collaborative Undergraduate Research Lab
Undergraduate Researcher - Madison, WI (Fall 2004)
-Conducted computational experiments on reaction-diffusion equations


EDUCATION

University of Wisconsin-Madison
-PhD, Computer Sciences 2010
Research focus: Machine learning
Advisors: Mark Craven and Xiaojin Zhu
Thesis: Incorporating Domain Knowledge in Latent Topic Models
-MS, Computer Sciences 2007
-BS, Computer Engineering, Mathematics, Computer Sciences 2005


PUBLICATIONS

Keith Stevens, Philip Kegelmeyer, David Andrzejewski and David
Buttler.  Exploring topic coherence over many models and many
topics. In EMNLP-CoNLL 2012: Conference on Empirical Methods in
Natural Language Processing and Natural Language Learning, 2012.
Association for Computational Linguistics.  (18% of submissions
accepted for oral presentation)

David Andrzejewski and David Buttler. Latent topic feedback for
information retrieval. In KDD '11: Proceedings of the 17th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, 2011. Association
for Computing Machinery. (8% of submissions accepted for oral
presentation)

David Andrzejewski, Xiaojin Zhu, Mark Craven, and Benjamin Recht. A
framework for incorporating general domain knowledge into latent
Dirichlet allocation using first-order logic. In IJCAI '11:
Proceedings of the 22nd International Joint Conference on Artificial
Intelligence, 2011. AAAI Press. (17% of submissions accepted)

David Buttler, David Andrzejewski, Keith Stevens, David Anastasiu, and
Byron Gao. Rapid Exploitation and Analysis of Documents. Lawrence
Livermore National Laboratory Technical Report, 2011 (LLNL-TR-517731).

David Andrzejewski. Accelerated Gibbs Sampling for Infinite Sparse
Factor Analysis. Lawrence Livermore National Laboratory Technical
Report, 2011 (LLNL-TR-499647).

David Andrzejewski, David G. Stork, Xiaojin Zhu, and Ron
Spronk. Inferring compositional style in the neo-plastic paintings of
Piet Mondrian by machine learning. In David G. Stork, Jim Coddington,
and Anna Bentkowska-Kafel, editors, Computer Vision and Image Analysis
of Art, volume 7531, page 75310G, San Jose, California, USA,
2010. SPIE.

David Andrzejewski and Xiaojin Zhu. Latent Dirichlet allocation with
topic-in-set knowledge. In SemiSupLearn '09: Proceedings of the NAACL
HLT 2009 Workshop on Semi-Supervised Learning for Natural Language
Processing, pages 43-48, 2009. Association for Computational
Linguistics.

David Andrzejewski, Xiaojin Zhu, and Mark Craven. Incorporating domain
knowledge into topic modeling via Dirichlet forest priors. In ICML
'09: Proceedings of the 26th Annual International Conference on
Machine Learning, pages 25-32, 2009. Association for Computing
Machinery.  (25% of submissions accepted)

Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting
Xu, Bryan Gibson, and Xiaojin Zhu. May all your wishes come true: a
study of wishes and how to recognize them. In HLT-NAACL 2009:
Proceedings of the Human Language Technology Conference of the North
American Chapter of the Association for Computational Linguistics,
pages 263-271, 2009. Association for Computational Linguistics.  (29% of
submissions accepted)

David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin
Zhu. Statistical debugging using latent topic models. In ECML '07:
Proceedings of the 18th European Conference on Machine Learning, pages
6-17, 2007. Springer-Verlag.  (9% of submissions accepted)

Xiaojin Zhu, Andrew B. Goldberg, Jurgen Van Gael, and David
Andrzejewski. Improving diversity in ranking using absorbing random
walks. In HLT-NAACL 2007: Proceedings of the Human Language Technology
Conference of the North American Chapter of the Association for
Computational Linguistics, pages 97-104, 2007. The Association for
Computational Linguistics.  (24% of submissions accepted)

Andrew B. Goldberg, David Andrzejewski, Jurgen Van Gael, Burr Settles,
Xiaojin Zhu, and Mark Craven. Ranking biomedical passages for
relevance and diversity: University of Wisconsin, Madison at TREC
Genomics 2006. In Ellen M. Voorhees and Lori P. Buckland, editors,
TREC 2006: Proceedings of the Fifteenth Text REtrieval Conference,
volume Special Publication 500-272, 2006. National Institute of
Standards and Technology (NIST).

Lam Raga A. Markely, David Andrzejewski, Erick Butzlaff, and Alexander
Kiselev. Enhancement of combustion by drift in a coupled
reaction-diffusion model. Communications in Mathematical Sciences,
4(1):213-225, 2006.


TALKS

Getting more out of security log data. University of San Francisco -
Data Science Speaker Series (February 2023)

Ad hoc investigation of suspicious event logs with Behavior Insights
(with Bashyam Anant). SANS Institute Webcast (September 2022)

Improving Cyber Defense with Cloud MLOps and Cloud SIEM (with Girish
Bhat). (ISC)² Security Briefings (April 2022)

Leveraging Cloud MLOps to detect and investigate security threats with
confidence (with Girish Bhat). IBM Digital Developer Conference: Cloud
Security (December 2021)

Drive DevSecOps diagnosis with insights from Sumo Logic’s Global
Intelligence Service (with Shri Subramanian and Manpreet
Dhanjal). Illuminate - Sumo Logic User Conference (October 2020)

Reliable machine learning. Scale By the Bay, Oakland (November 2019)

Us vs Them: Global Intelligence to Action for Amazon GuardDuty (with
Bashyam Anant). Illuminate - Sumo Logic User Conference (September
2019)

Privacy-aware data science in Scala with monads and type level
programming. Scale By the Bay, San Francisco (November 2018)

Sumo Global Intelligence Insights For Better DevSecOps (with Bruno
Kurtic). Illuminate - Sumo Logic User Conference (September 2018)

Understanding Software System Behavior With ML and Time Series
Data. QCon.ai, San Francisco (April 2018)

How to Eat AI/ML. CCSF Coders Club (December 2017)

Functional Programming for Machine Learning (panel). Scale By the Bay,
San Francisco (November 2017)

Sumo Logic Analytics Platform Integration.  Illuminate - Sumo Logic
User Conference (September 2017)

Economical machine learning via functional programming. 
Big Data Scala by the Bay, Oakland (August 2015)

Graph mining for log data. Strata + Hadoop World, San Jose (February
2015)

Mining human-scale insights from log data with machine
learning. Orange County Big Data Meetup (September 2014)

Machine learning for machine data. Strata Conference, Santa Clara
(February 2014)

Scala type classes and machine learning. Bay Area Scala Enthusiasts
lightning talk (January 2013)

Latent Topic Feedback for Information Retrieval. ACM SIGKDD Conference
on Knowledge Discovery and Data Mining (August 2011)

A Framework for Incorporating General Domain Knowledge into Latent
Dirichlet Allocation using First-Order Logic.  International Joint
Conference on Artificial Intelligence (July 2011)

Machine Learning: An Overview.  LLNL Global Security Tech Talks
(May 2011)

Inferring compositional style in the neo-plastic paintings of Piet
Mondrian by machine learning. SPIE Computer Vision and Image Analysis
of Art (January 2010)

Incorporating domain knowledge into topic modeling via Dirichlet
forest priors. International Conference on Machine Learning (June
2009)

Data analysis with latent topic models: genes, bugs, and
art. UW-Madison CIBM Seminar (March 2008)

Statistical debugging using latent topic models. European Conference
on Machine Learning (September 2007)

Extracting information from the scientific literature to aid in
uncovering gene regulatory networks. NSF Symposium on Cyber-Enabled
Discovery and Innovation (September 2007)


SERVICE

-Co-organizer, SF Bay Area Machine Learning Meetup (> 7000 members)

-Reviewing
 Top 30% highest-scoring reviewer - NIPS 2018
 International Conference on Learning Representations (ICLR 2024, 2023, 2021)
 International Conference on Machine Learning (ICML 2024, 2023, 2021, 2018, 2016, 2015, 2014, 2013, 2010)
 International Conference on Artificial Intelligence and Statistics (AISTATS 2020, 2019, 2018, 2017)
 Neural Information Processing Systems (NIPS/NeurIPS 2023, 2020, 2019, 2018, 2017, 2015, 2011)
 International Joint Conference on Artificial Intelligence (IJCAI 2020, 2019, 2015, 2011)
 AAAI Conference on Artificial Intelligence (AAAI 2020, 2018)
 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013, 2012)
 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015, 2014)
 SIAM International Conference on Data Mining (SDM 2013)
 Deep Learning from Scratch (O'Reilly Media)
 Hands-on Machine Learning with Scikit-Learn and TensorFlow (O'Reilly Media)
 Fundamentals of Deep Learning (O'Reilly Media)
 Learning Spark (O'Reilly Media)
 Reactive Machine Learning Systems (Manning Publications)
 Scaling Up Machine Learning (Cambridge University Press)
 SoftwareMining-2015 (IEEE ASE 2015 Workshop)  
 SoftwareMining-2014 (ICDM 2014 Workshop)
 SoftwareMining-2013 (ASE 2013 Workshop)  
 SoftwareMining-2012 (KDD 2012 Workshop)
 IEEE International Conference on Development and Learning (ICDL 2010)
 Topic Models: Computation, Application, and Evaluation (NIPS 2013 workshop)
 Strata + Hadoop World New York 2014-2019
 Strata + Hadoop World 2014-2020
 Data Mining and Knowledge Discovery           
 Journal of Computer Science and Technology
 Journal of the American Society for Information Science and Technology
 Open Information Systems Journal
 Machine Learning

-Coordinated AI reading group (Fall 2009-Spring 2010)

-Organized and coordinated Math for Machine Learning reading group
 (Spring 2010)

-UW-Madison Computer Sciences graduate admissions committee (2009)


PATENTS AND APPLICATIONS

Clustering of structured log data by key schema
United States Patent (11321158)
Udit Saxena, Reetika Roy, Ryley Higa, David M. Andrzejewski, Bashyam TCA

Clustering of structured log data by key-values
United States Patent (11663066)
Udit Saxena, Reetika Roy, Ryley Higa, David M. Andrzejewski, Bashyam TCA

Cardinality of time series 
United States Patent (11182434)
Christian Friedrich Beedgen, David M. Andrzejewski, Weijia Che

Anomaly detection 
United States Patent (10445311B1)
Kumar Saurabh, David M. Andrzejewski, Yuchen Zhao, Christian Friedrich
Beedgen, Bruno Kurtic

Data enrichment and augmentation
United States Patent (11397726)
Christian Friedrich Beedgen, David M. Andrzejewski, Benjamin Everette Newton, Kumar Avijit, Stefan Christoph Zier

Logs to metrics synthesis
United States Patent (11042534) 
Christian Friedrich Beedgen, David M. Andrzejewski, Benjamin Everette
Newton, Kumar Avijit, Stefan Christoph Zier

Key name synthesis
United States Patent (11481383)
Christian Friedrich Beedgen, David M. Andrzejewski

Visualization tool for system tracing infrastructure events
United States Patent (8464221)
Alice X. Zheng, Trishul A. Chilimbi, Shuo-Hsien Hsiao, Danyel
A. Fisher, David M. Andrzejewski

System and method of drug identification through radio frequency
identification (RFID)
United States Patent Application (11/465993)
Ronald Makin, Kyle Jansson, Silas Zirn, David Andrzejewski, Timothy Flink


AWARDS

ICML student travel award (2009)
Computation and Informatics in Biology and Medicine (CIBM) traineeship
(2005-2008 NIH/NLM doctoral training award)