ZoMBI | Portfolio

Fast Optimization of Needle-in-a-Haystack Problems using [Zo]oming [M]emory [B]ased [I]initialization (ZoMBI)

My Contributions:

Conceived the idea
Validated the compute complexity theory with experiment (Python)
Wrote the machine learning code (Python)
Trained and tested all models (Python)
Created analytical test functions
Wrote the paper

Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only 0.82% out of 146k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms are not designed with the capabilities to find solutions to these challenging multidimensional Needle-in-a-Haystack problems, resulting in slow convergence to a global optimum or pigeonholing into a local minimum. In this paper, we present a Zooming Memory-Based Initialization algorithm, entitled ZoMBI, that builds on conventional Bayesian optimization principles to quickly and efficiently optimize Needle-in-a-Haystack problems in both less time and fewer experiments by addressing the common convergence and pigeonholing issues. ZoMBI actively extracts knowledge from the previously best-performing evaluated experiments to iteratively zoom in the sampling search bounds towards the global optimum "needle" and then prunes the memory of low-performing historical experiments to accelerate compute times.

ZoMBI Workflow

We demonstrate ZoMBI's optimization performance on three real-world datasets exhibiting Needle-in-a-Haystack problems that vary in dimensionality from 6D to 11D and further stress-test the algorithm's performance across an additional 174 analytical datasets that vary in optimum needle width, optimum distance to edges, dimensionality, and initialization conditions. The ZoMBI algorithm demonstrates compute time speed-ups of 400x compared to traditional Bayesian optimization as well as efficiently discovering optima in under 100 experiments that are up to 3x more highly optimized than those discovered by similar methods MiP-EGO, TuRBO, and HEBO.

ZoMBI performance across 3 real-world datasets

ZoMBI accelerated convergence on true target