SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

2024-06-02

The SWE-bench project investigates whether language models can automatically resolve real GitHub issues. Its dataset comprises 2,294 issue and pull request pairs drawn from 12 popular Python repositories, and each proposed fix is verified by running unit tests. The leaderboard ranks models by their performance on this task; Amazon Q Developer Agent led it at the time of writing.
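To make the unit-test-based scoring concrete, here is a minimal sketch of how a leaderboard-style score could be computed from per-issue test results. It is not SWE-bench's actual harness: the `InstanceResult` structure, the function names, and the rule that an issue counts as resolved only when every associated test passes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Test outcomes for one issue after applying a model's patch (hypothetical structure)."""
    instance_id: str
    test_outcomes: dict[str, bool]  # test name -> True if the test passed


def is_resolved(result: InstanceResult) -> bool:
    # Assumption: an issue is resolved only if it has at least one
    # associated test and every one of those tests passes.
    return bool(result.test_outcomes) and all(result.test_outcomes.values())


def resolve_rate(results: list[InstanceResult]) -> float:
    # Fraction of issues resolved; defined as 0.0 for an empty result list.
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)
```

Under these assumptions, a model that fully fixes one issue out of three would score roughly 0.33, regardless of how many individual tests pass on the other two.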

LanguageModels GitHub Automation MachineLearning Python
