High-Dimensional Hyperparameter Optimization via Adjoint Differentiation

Research output: Contribution to journal › Article › peer-review

Abstract

As an emerging machine learning task, high-dimensional hyperparameter optimization (HO) aims to enhance traditional deep learning models by simultaneously optimizing the neural network's weights and hyperparameters in a joint bilevel configuration. However, such nested objectives can impose nontrivial difficulties in obtaining the gradient of the validation risk with respect to the hyperparameters (a.k.a. the hypergradient). To tackle this challenge, we revisit the bilevel objective from the novel perspective of continuous dynamics and then solve the whole HO problem with adjoint state theory. The proposed HO framework, termed Adjoint Diff, scales naturally to very deep neural networks with high-dimensional hyperparameters because it requires only constant memory cost during training. Adjoint Diff is, in fact, a general framework: several existing gradient-based HO algorithms can be interpreted within it through simple algebra. In addition, we offer the Adjoint Diff+ framework, which incorporates the prevalent momentum learning concept into the basic Adjoint Diff for enhanced convergence. Experimental results show that our Adjoint Diff frameworks outperform several state-of-the-art approaches on three high-dimensional HO instances: designing a loss function for imbalanced data, selecting samples from noisy labels, and learning auxiliary tasks for fine-grained classification.
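The hypergradient computation described in the abstract can be illustrated on a toy bilevel problem. The sketch below is not the paper's Adjoint Diff implementation; the data, step size, and choice of an L2 regularization weight as the single hyperparameter are illustrative assumptions. It computes the gradient of a validation loss with respect to that hyperparameter by propagating an adjoint state backward through the unrolled inner gradient-descent dynamics:

```python
import numpy as np

# Hypothetical toy bilevel problem: the hyperparameter `lam` is an L2
# regularization weight, the inner problem is ridge regression trained by
# T gradient-descent steps, and the outer objective is a validation
# squared error. All names and values here are illustrative.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3)); b = rng.normal(size=8)   # training data
C = rng.normal(size=(6, 3)); d = rng.normal(size=6)   # validation data
eta, T = 0.02, 60                                     # inner step size / step count

def train(lam):
    """Inner dynamics: w_{t+1} = w_t - eta * grad_w L_train(w_t, lam)."""
    w, traj = np.zeros(3), []
    for _ in range(T):
        traj.append(w)
        w = w - eta * (2 * A.T @ (A @ w - b) + 2 * lam * w)
    return w, traj

def val_loss(w):
    r = C @ w - d
    return float(r @ r)

def hypergradient(lam):
    """d L_val / d lam via the adjoint (reverse-mode) recursion."""
    w_final, traj = train(lam)
    # Adjoint state initialized at the validation-loss gradient w.r.t. final weights.
    p = 2 * C.T @ (C @ w_final - d)
    # Jacobian of one inner step w.r.t. w (constant here, since the loss is quadratic).
    J = np.eye(3) - eta * (2 * A.T @ A + 2 * lam * np.eye(3))
    g = 0.0
    for w_t in reversed(traj):
        g += p @ (-2 * eta * w_t)  # accumulate p^T * d(step)/d(lam) = p^T * (-2*eta*w_t)
        p = J.T @ p                # propagate the adjoint state one step backward
    return g
```

Note that this naive discrete version stores the whole inner trajectory, so its memory grows with the number of inner steps; the abstract's constant-memory claim comes from the continuous-time adjoint formulation, which avoids that storage.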

Original language: English
Pages (from-to): 2148-2162
Number of pages: 15
Journal: IEEE Transactions on Artificial Intelligence
Volume: 6
Issue number: 8
DOIs
State: Published - 2025

Keywords

  • Adjoint state method
  • deep learning
  • high-dimensional hyperparameter optimization (HO)

