David M. Andrzejewski

PROFESSIONAL EXPERIENCE

Industry

Sumo Logic
Director of Engineering, AI Experiences
Redwood City, CA (2019-present)

Sumo Logic
Senior Engineering Manager, Data Insights
Redwood City, CA (2018-2019)

Sumo Logic
Engineering Manager, Unified Logs and Metrics
Redwood City, CA (2015-2018)
-Team responsible for new time-series data analysis platform

Sumo Logic
Tech Lead / Manager, Data Sciences Engineering
Redwood City, CA (2014-2015)
-Led the development of advanced analytics for machine-generated log data

Sumo Logic
Lead Data Sciences Engineer
Redwood City, CA (2013-2014)

Sumo Logic
Data Sciences Engineer
Mountain View, CA (2011-2013)

Lawrence Livermore National Laboratory
Postdoctoral Research Staff Member
Livermore, CA (2010-2011)
-Applied statistical modeling to knowledge discovery in text corpora

Microsoft Research
Research Intern
Redmond, WA (Summer 2008)
-Developed analysis tool for investigating system performance anomalies

GE Healthcare
Research & Development Engineer
Madison, WI (Summer 2004, Summer 2005)
-Developed software for drug identification system prototype
-Created clinical pharmacokinetic modeling system prototype

GE Healthcare
Software Engineer
Menomonee Falls, WI (Fall 2003)
-Developed testing tools for cardiac image analysis software
-Identified and resolved bugs in cardiac image analysis software

Academic

University of Wisconsin-Madison
Research Assistant (Professors Mark Craven and Xiaojin Zhu)
Madison, WI (2008-2010)
-Project: Knowledge-augmented topic models
-Developed new latent topic models to allow prior knowledge and user feedback
-Proposed, implemented, and conducted experiments on new models and techniques

University of Wisconsin-Madison
Computation and Informatics in Biology and Medicine predoctoral trainee
Madison, WI (2005-2008)
-Project: Biomedical text mining
-Applied text mining to assist biological researchers in understanding experimental results
-Incorporated structured knowledge sources into biomedical
text analysis

University of Wisconsin-Madison Collaborative Undergraduate Research Lab
Undergraduate Researcher
Madison, WI (Fall 2004)
-Conducted computational experiments on reaction-diffusion equations

EDUCATION

University of Wisconsin-Madison
-PhD, Computer Sciences 2010
  Research focus: Machine learning
  Advisors: Mark Craven and Xiaojin Zhu
  Thesis: Incorporating Domain Knowledge in Latent Topic Models
-MS, Computer Sciences 2007
-BS, Computer Engineering, Mathematics, Computer Sciences 2005

PUBLICATIONS

Keith Stevens, Philip Kegelmeyer, David Andrzejewski and David Buttler.
Exploring topic coherence over many models and many topics.
In EMNLP-CoNLL 2012: Conference on Empirical Methods in Natural Language Processing and Natural Language Learning, 2012. Association for Computational Linguistics. (18% of submissions accepted for oral presentation)

David Andrzejewski and David Buttler.
Latent topic feedback for information retrieval.
In KDD '11: Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011. Association for Computing Machinery. (8% of submissions accepted for oral presentation)

David Andrzejewski, Xiaojin Zhu, Mark Craven, and Benjamin Recht.
A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic.
In IJCAI '11: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, 2011. AAAI Press. (17% of submissions accepted)

David Buttler, David Andrzejewski, Keith Stevens, David Anastasiu, and Byron Gao.
Rapid Exploitation and Analysis of Documents.
Lawrence Livermore National Laboratory Technical Report, 2011 (LLNL-TR-517731).

David Andrzejewski.
Accelerated Gibbs Sampling for Infinite Sparse Factor Analysis.
Lawrence Livermore National Laboratory Technical Report, 2011 (LLNL-TR-499647).

David Andrzejewski, David G. Stork, Xiaojin Zhu, and Ron Spronk.
Inferring compositional style in the neo-plastic paintings of Piet Mondrian by machine learning.
In David G. Stork, Jim Coddington, and Anna Bentkowska-Kafel, editors, Computer Vision and Image Analysis of Art, volume 7531, page 75310G, San Jose, California, USA, 2010. SPIE.

David Andrzejewski and Xiaojin Zhu.
Latent Dirichlet allocation with topic-in-set knowledge.
In SemiSupLearn '09: Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing, pages 43-48, 2009. Association for Computational Linguistics.

David Andrzejewski, Xiaojin Zhu, and Mark Craven.
Incorporating domain knowledge into topic modeling via Dirichlet forest priors.
In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 25-32, 2009. Association for Computing Machinery. (25% of submissions accepted)

Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson, and Xiaojin Zhu.
May all your wishes come true: a study of wishes and how to recognize them.
In HLT-NAACL 2009: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 263-271. Association for Computational Linguistics. (29% of submissions accepted)

David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin Zhu.
Statistical debugging using latent topic models.
In ECML '07: Proceedings of the 18th European Conference on Machine Learning, pages 6-17, 2007. Springer-Verlag. (9% of submissions accepted)

Xiaojin Zhu, Andrew B. Goldberg, Jurgen Van Gael, and David Andrzejewski.
Improving diversity in ranking using absorbing random walks.
In HLT-NAACL 2007: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 97-104, 2007. The Association for Computational Linguistics. (24% of submissions accepted)

Andrew B. Goldberg, David Andrzejewski, Jurgen Van Gael, Burr Settles, Xiaojin Zhu, and Mark Craven.
Ranking biomedical passages for relevance and diversity: University of Wisconsin, Madison at TREC Genomics 2006.
In Ellen M. Voorhees and Lori P. Buckland, editors, TREC 2006: Proceedings of the Fifteenth Text REtrieval Conference, volume Special Publication 500-272, 2006. National Institute of Standards and Technology (NIST).

Lam Raga A. Markely, David Andrzejewski, Erick Butzlaff, and Alexander Kiselev.
Enhancement of combustion by drift in a coupled reaction-diffusion model.
Communications in Mathematical Sciences, 4(1):213-225, 2006.

TALKS

Getting more out of security log data.
University of San Francisco - Data Science Speaker Series (February 2023)

Ad hoc investigation of suspicious event logs with Behavior Insights (with Bashyam Anant).
SANS Institute Webcast (September 2022)

Improving Cyber Defense with Cloud MLOps and Cloud SIEM (with Girish Bhat).
(ISC)² Security Briefings (April 2022)

Leveraging Cloud MLOps to detect and investigate security threats with confidence (with Girish Bhat).
IBM Digital Developer Conference: Cloud Security (December 2021)

Drive DevSecOps diagnosis with insights from Sumo Logic's Global Intelligence Service (with Shri Subramanian and Manpreet Dhanjal).
Illuminate - Sumo Logic User Conference (October 2020)

Reliable machine learning.
Scale By the Bay, Oakland (November 2019)

Us vs Them: Global Intelligence to Action for Amazon GuardDuty (with Bashyam Anant).
Illuminate - Sumo Logic User Conference (September 2019)

Privacy-aware data science in Scala with monads and type level programming.
Scale By the Bay, San Francisco (November 2018)

Sumo Global Intelligence Insights For Better DevSecOps (with Bruno Kurtic).
Illuminate - Sumo Logic User Conference (September 2018)

Understanding Software System Behavior With ML and Time Series Data.
QCon.ai, San Francisco (April 2018)

How to Eat AI/ML.
CCSF Coders Club (December 2017)

Functional Programming for Machine Learning (panel).
Scale By the Bay, San Francisco (November 2017)

Sumo Logic Analytics Platform Integration.
Illuminate - Sumo Logic User Conference (September 2017)

Economical machine learning via functional programming.
Big Data Scala by the Bay, Oakland (August 2015)

Graph mining for log data.
Strata + Hadoop World, San Jose (February 2015)

Mining human-scale insights from log data with machine learning.
Orange County Big Data Meetup (September 2014)

Machine learning for machine data.
Strata Conference, Santa Clara (February 2014)

Scala type classes and machine learning.
Bay Area Scala Enthusiasts lightning talk (January 2013)

Latent Topic Feedback for Information Retrieval.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (August 2011)

A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation using First-Order Logic.
International Joint Conference on Artificial Intelligence (July 2011)

Machine Learning: An Overview.
LLNL Global Security Tech Talks (May 2011)

Inferring compositional style in the neo-plastic paintings of Piet Mondrian by machine learning.
SPIE Computer Vision and Image Analysis of Art (January 2010)

Incorporating domain knowledge into topic modeling via Dirichlet forest priors.
International Conference on Machine Learning (June 2009)

Data analysis with latent topic models: genes, bugs, and art.
UW-Madison CIBM Seminar (March 2008)

Statistical debugging using latent topic models.
European Conference on Machine Learning (September 2007)

Extracting information from the scientific literature to aid in uncovering gene regulatory networks.
NSF Symposium on Cyber-Enabled Discovery and Innovation (September 2007)

SERVICE

-Co-organizer SF Bay Area Machine Learning Meetup (> 7000 members)
-Reviewing
  Top 30% highest-scoring reviewer - NIPS 2018

  International Conference on Learning Representations (ICLR 2024, 2023, 2021)
  International Conference on Machine Learning (ICML 2024, 2023, 2021, 2018, 2016, 2015, 2014, 2013, 2010)
  International Conference on Artificial Intelligence and Statistics (AISTATS 2020, 2019, 2018, 2017)
  Neural Information Processing Systems (NIPS/NeurIPS 2023, 2020, 2019, 2018, 2017, 2015, 2011)
  International Joint Conferences on Artificial Intelligence (IJCAI 2020, 2019, 2015, 2011)
  AAAI Conference on Artificial Intelligence (AAAI 2020, 2018)
  Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013, 2012)
  Conference on Empirical Methods in Natural Language Processing (EMNLP 2015, 2014)
  SIAM International Conference on Data Mining (SDM 2013)

  Deep Learning from Scratch (O'Reilly Media)
  Hands-on Machine Learning with Scikit-Learn and TensorFlow (O'Reilly Media)
  Fundamentals of Deep Learning (O'Reilly Media)
  Learning Spark (O'Reilly Media)
  Reactive Machine Learning Systems (Manning Publications)
  Scaling Up Machine Learning (Cambridge University Press)

  SoftwareMining-2015 (IEEE ASE 2015 Workshop)
  SoftwareMining-2014 (ICDM 2014 Workshop)
  SoftwareMining-2013 (ASE 2013 Workshop)
  SoftwareMining-2012 (KDD 2012 Workshop)
  IEEE International Conference on Development and Learning (ICDL 2010)
  Topic Models: Computation, Application, and Evaluation (NIPS 2013 workshop)

  Strata + Hadoop World New York 2014-2019
  Strata + Hadoop World 2014-2020

  Data Mining and Knowledge Discovery
  Journal of Computer Science and Technology
  Journal of the American Society for Information Science and Technology
  Open Information Systems Journal
  Machine Learning
-Coordinated AI reading group (Fall 2009-Spring 2010)
-Organized and coordinated Math for
Machine Learning reading group (Spring 2010)
-UW-Madison Computer Sciences graduate admissions committee (2009)

PATENTS AND APPLICATIONS

Clustering of structured log data by key schema
United States Patent (11321158)
Udit Saxena, Reetika Roy, Ryley Higa, David M. Andrzejewski, Bashyam TCA

Clustering of structured log data by key-values
United States Patent (11663066)
Udit Saxena, Reetika Roy, Ryley Higa, David M. Andrzejewski, Bashyam TCA

Cardinality of time series
United States Patent (11182434)
Christian Friedrich Beedgen, David M. Andrzejewski, Weijia Che

Anomaly detection
United States Patent (10445311B1)
Kumar Saurabh, David M. Andrzejewski, Yuchen Zhao, Christian Friedrich Beedgen, Bruno Kurtic

Data enrichment and augmentation
United States Patent (11397726)
Christian Friedrich Beedgen, David M. Andrzejewski, Benjamin Everette Newton, Kumar Avijit, Stefan Christoph Zier

Logs to metrics synthesis
United States Patent (11042534)
Christian Friedrich Beedgen, David M. Andrzejewski, Benjamin Everette Newton, Kumar Avijit, Stefan Christoph Zier

Key name synthesis
United States Patent (11481383)
Christian Friedrich Beedgen, David M. Andrzejewski

Visualization tool for system tracing infrastructure events
United States Patent (8464221)
Alice X. Zheng, Trishul A. Chilimbi, Shuo-Hsien Hsiao, Danyel A. Fisher, David M. Andrzejewski

System and method of drug identification through radio frequency identification (RFID)
United States Patent Application (11/465993)
Ronald Makin, Kyle Jansson, Silas Zirn, David Andrzejewski, and Timothy Flink

AWARDS

ICML student travel award (2009)
Computation and Informatics in Biology and Medicine (CIBM) traineeship (2005-2008 NIH/NLM doctoral training award)