Abstract
In this paper, we address the challenge of enabling large language models (LLMs) to effectively follow domain-specific instructions, a critical requirement for their successful deployment across various industries. We propose a novel pipeline for constructing verifiable instructions tailored to specific domains. This pipeline consists of three key stages: the creation of meta-requirement templates, the generation of custom instructions using GPT-4 with seed prompts, and manual refinement to ensure clarity, precision, and relevance. A unique aspect of our approach is the incorporation of verifiability into the instruction-following tuning process. Specifically, we design a verified reward mechanism within the Direct Preference Optimization (DPO) framework. This mechanism leverages the ability to automatically verify whether the generated responses adhere to the given instructions. By integrating this verified reward, we enable more effective alignment of LLM behavior with domain-specific requirements, ensuring higher reliability and consistency in outputs. Our study also explores various strategies to enhance the instruction-following capabilities of LLMs, with a focus on fine-tuning methodologies and data augmentation techniques. We provide a comprehensive analysis of domain-specific requirements to better understand how LLMs can be adapted for practical, real-world applications. The efficacy of our approach is empirically validated on GPT-4 and the LLaMA2 series. Notably, the LLaMA-7B model demonstrates a significant performance improvement of over 19% compared to zero-shot settings, underscoring the effectiveness of our methods. This work contributes to the field by bridging the gap between the general capabilities of LLMs and the nuanced demands of domain-specific instruction following. Our findings pave the way for more reliable and adaptable LLM applications across diverse industries.
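The abstract does not spell out what the meta-requirement templates look like or how the verified reward is computed, so the following is only a minimal sketch of the general idea: instructions that carry a programmatic check, a reward derived from those checks, and preference pairs of the kind DPO consumes. All names here (`VerifiableInstruction`, `max_words`, `must_include`, `verified_reward`, `build_preference_pairs`) and the insurance-flavoured example text are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class VerifiableInstruction:
    """An instruction paired with a programmatic check (hypothetical structure)."""
    text: str
    check: Callable[[str], bool]


def max_words(limit: int) -> VerifiableInstruction:
    # Hypothetical meta-requirement template: a length constraint.
    return VerifiableInstruction(
        text=f"Answer in at most {limit} words.",
        check=lambda response: len(response.split()) <= limit,
    )


def must_include(term: str) -> VerifiableInstruction:
    # Hypothetical meta-requirement template: a required domain term.
    return VerifiableInstruction(
        text=f"Mention the term '{term}'.",
        check=lambda response: term.lower() in response.lower(),
    )


def verified_reward(response: str, instructions: List[VerifiableInstruction]) -> float:
    """Fraction of instructions the response satisfies (an assumed reward definition)."""
    if not instructions:
        return 0.0
    passed = sum(1 for inst in instructions if inst.check(response))
    return passed / len(instructions)


def build_preference_pairs(
    candidates: List[str], instructions: List[VerifiableInstruction]
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs where chosen has a strictly higher verified reward."""
    scored = [(verified_reward(c, instructions), c) for c in candidates]
    return [(a, b) for (ra, a), (rb, b) in product(scored, scored) if ra > rb]


# Usage with two made-up candidate responses.
instructions = [max_words(12), must_include("deductible")]
candidates = [
    "Your deductible is the amount you pay before coverage begins.",
    "You pay a fixed amount first.",
]
pairs = build_preference_pairs(candidates, instructions)
```

Under these assumptions, each `(chosen, rejected)` pair, together with the prompt that carries the instruction text, has the shape of a standard DPO preference example. How the paper actually folds verification into the DPO objective is not stated in the abstract.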
| Original language | English |
|---|---|
| Journal | Proceedings of the International Joint Conference on Neural Networks |
| State | Published - 2025 |
| Event | 2025 International Joint Conference on Neural Networks, IJCNN 2025 - Rome, Italy |
| Duration | 30 Jun 2025 → 5 Jul 2025 |
Keywords
- Domain Adaptation
- Instruction Following
- Large Language Models
- Verifiable Rewards
Title
Learning to Follow Domain-specific Instruction with Verifiable Rewards