
Latent Variable Modeling for Controllable and Diverse Generation From Large Language Models

  • Jianfei Zhang, Bei Li, Zhuofan Chen, Chang Liu, Chen Li, Chenghua Lin, Wenge Rong*
  • *Corresponding author for this work
  • Affiliations: Beihang University, Meituan, University of Manchester

Research output: Contribution to journal › Article › peer-review

Abstract

Conditional variational auto-encoders (CVAEs) are a powerful deep generative framework that uses latent variables (explicitly modeled hidden states) to capture underlying factors and govern the generation process accordingly. However, this idea remains underexplored in the era of large language models (LLMs), owing both to structural differences between decoder-only LLMs and traditional CVAEs and to posterior collapse (latent variables degenerating into homogeneous, uninformative states). In this work, we present the first attempt to extend decoder-only LLMs into encoder-decoder CVAEs, aiming to endow existing LLMs with flexible control via low-dimensional latent vectors. To achieve this, we introduce a novel optimization objective for effective latent variable modeling and propose a gradient-only skip (G-Skip) connection, which improves generation controllability while preserving generation quality. Through experiments on AGNews, Yelp, and DailyDialog, we validate the effectiveness of our method in achieving latent modeling and latent-guided language generation on the basis of Llama3-8B. Specifically, we establish new state-of-the-art performance in dialogue generation on the DailyDialog dataset, achieving a BERTScore of 88.30 and a FED score of 5.49.
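The abstract mentions posterior collapse, where the KL term of the CVAE objective drives the latent posterior to match the prior and the latent becomes uninformative. As a minimal illustration (not the paper's actual objective), the sketch below computes the closed-form KL between a diagonal-Gaussian posterior and a standard-normal prior, plus a "free bits" floor, which is one standard mitigation for posterior collapse; the function names and the free-bits choice are assumptions for illustration only.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    summed over latent dimensions. Standard CVAE regularizer,
    shown here for illustration (not the paper's exact objective)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def free_bits_kl(mu, log_var, floor_per_dim=0.1):
    """'Free bits' variant: each dimension contributes at least
    `floor_per_dim` nats, so the optimizer has no incentive to
    collapse a dimension's KL all the way to zero."""
    per_dim = (0.5 * (m * m + math.exp(lv) - lv - 1.0)
               for m, lv in zip(mu, log_var))
    return sum(max(k, floor_per_dim) for k in per_dim)
```

With `mu = [0, 0]` and `log_var = [0, 0]` the plain KL is exactly 0 (posterior equals prior), while the free-bits version still reports the floor, 0.2 nats for two dimensions, which is precisely the property that keeps the latent from being regularized into irrelevance.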

Original language: English
Pages (from-to): 791-805
Number of pages: 15
Journal: IEEE Transactions on Artificial Intelligence
Volume: 7
Issue number: 2
DOIs
State: Published - 2026

Keywords

  • Conditional variational auto-encoders (CVAEs)
  • controllable language generation
  • large language models (LLMs)

