Can LLMs invent better ways to train LLMs?

Sakana AI explores whether Large Language Models (LLMs) can invent better ways to train themselves, a vision it calls LLM². The approach uses an LLM-driven evolutionary search: the model proposes candidate preference optimization objectives as code, the candidates are trained and evaluated, and the results are fed back to guide the next round of proposals. The latest report introduces Discovered Preference Optimization (DiscoPOP), a discovered objective that achieves state-of-the-art results across various tasks with minimal human intervention. The approach points toward a new paradigm of AI self-improvement, reducing the extensive trial-and-error traditionally required in AI research.
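To make the idea concrete, here is a minimal sketch of what a discovered preference optimization loss might look like in PyTorch. It reflects the published description of DiscoPOP as an adaptive blend of the standard DPO logistic loss and an exponential loss, gated by a sigmoid of the scaled log-ratio difference; the function name, default constants, and exact gating form below are illustrative assumptions, not Sakana AI's reference implementation.

```python
import torch
import torch.nn.functional as F

def discopop_style_loss(policy_chosen_logps, policy_rejected_logps,
                        ref_chosen_logps, ref_rejected_logps,
                        beta=0.05, tau=0.05):
    """Sketch of a DiscoPOP-style preference loss (assumed form).

    rho is the scaled difference of policy/reference log-ratios for the
    chosen vs. rejected responses (the same quantity DPO optimizes).
    The loss blends a logistic (DPO-like) term and an exponential term,
    with a sigmoid gate on rho/tau deciding the mix per example.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    rho = beta * (chosen_ratio - rejected_ratio)

    logistic_term = -F.logsigmoid(rho)   # standard DPO loss on rho
    exponential_term = torch.exp(-rho)   # exponential preference loss
    gate = torch.sigmoid(rho / tau)      # soft switch between the two terms

    return ((1 - gate) * logistic_term + gate * exponential_term).mean()
```

In the discovery loop described by the report, many such candidate loss functions are generated as code by an LLM, each is used to fine-tune a model on preference data, and the resulting evaluation scores are fed back into the prompt to steer the next generation of candidates.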

Visit Original Article →