REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Abstract

Recent advancements in large language models (LLMs) have significantly enhanced the ability of LLM-based systems to perform complex tasks through natural language processing and tool interaction. However, optimizing these LLM-based systems for specific tasks remains challenging, often requiring manual interventions like prompt engineering and hyperparameter tuning. Existing automatic optimization methods, such as textual feedback-based techniques (e.g., TextGrad), tend to focus on immediate feedback, analogous to using immediate derivatives in traditional numerical gradient descent. However, relying solely on such feedback can be limited when the adjustments made in response to this feedback are either too small or fluctuate irregularly, potentially slowing down or even stalling the optimization process.

To overcome these challenges, more adaptive methods are needed, especially in situations where the system’s response is evolving slowly or unpredictably. In this paper, we introduce REVOLVE, an optimization method that tracks how Responses EVOLVE across iterations in LLM systems. By focusing on the evolution of responses over time, REVOLVE enables more stable and effective optimization by making thoughtful, progressive adjustments at each step. We evaluate the effectiveness of REVOLVE across three tasks: prompt optimization, solution optimization, and code optimization. Experimental results demonstrate that REVOLVE outperforms competitive baselines, achieving a 7.8% improvement in prompt optimization, a 20.72% gain in solution refinement, and a 29.17% increase in code optimization. Additionally, REVOLVE converges in fewer iterations, resulting in significant computational savings. These advantages highlight its adaptability and efficiency, positioning REVOLVE as a valuable tool for optimizing LLM-based systems and accelerating the development of next-generation AI technologies.

Motivation

How can we unleash the full potential of LLMs? How do we achieve stable and efficient optimization in complex tasks? Traditional multi-agent optimization systems typically rely on immediate feedback from evaluators to adjust LLM outputs. However, these methods, based solely on local feedback, often face stagnation or unstable adjustments, especially when responses evolve slowly or fluctuate significantly.

We believe optimization goes beyond feedback! REVOLVE tracks the evolution of model responses over iterations, simulating second-order optimization effects. Specifically:

Applications

Problem solving

Using REVOLVE, you can enhance solutions to complex scientific problems, improving the zero-shot performance of LLMs. For example, on MMLU-Machine Learning benchmark, REVOLVE boosts the zero-shot accuracy of Llama-3.1-8B by 20.72% relevant performance gain compared with the best existing methods.

Reasoning

As a byproduct of our method, you can also optimize prompts to enhance LLM reasoning capabilities. For instance, REVOLVE refines prompts for reasoning tasks, helping improve SLMs' performance to approach that of GPT-4o in complex reasoning scenarios.

Coding

With REVOLVE, you can optimize solutions to complex coding problems. For example, on the LeetCodeHard benchmark, REVOLVE achieves a 29.17% performance gain in Llama-3.1-8B compared to the leading baseline.

UIUC DREAM Lab

Developing Reliable and Efficient AI for Medicine

Visit DREAM Lab

Join The Community

Join our community on Slack to discuss ideas, ask questions, and collaborate with new friends.

Join Slack

Share Your Research Insights

Provide us feedback by sharing insights or suggesting additional resources related to our study.

Fill This Form

Trustworthy ML Initiative

The Trustworthy ML Initiative (TrustML) addresses challenges in responsible ML by providing resources, showcasing early career researchers, fostering discussions and building a community.

More Info

BibTeX citation

    @misc{zhang2024revolveoptimizingaisystems,
      title={Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization}, 
      author={Peiyan Zhang and Haibo Jin and Leyang Hu and Xinnuo Li and Liying Kang and Man Luo and Yangqiu Song and Haohan Wang},
      year={2024},
      eprint={2412.03092},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.03092}, 
}