SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

The SWE-bench project investigates whether language models can automatically resolve real GitHub issues. Its dataset comprises 2,294 issue/pull-request pairs drawn from 12 popular Python repositories, and each proposed fix is evaluated by running the repository's unit tests. The leaderboard compares how well various models perform on this task; at the time of writing, Amazon Q Developer Agent ranked first.
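
Test-based scoring works roughly like this. An issue counts as resolved only when two conditions hold after the model's patch is applied. First, the tests that the reference pull request makes pass (the dataset's FAIL_TO_PASS tests) must now pass. Second, the tests that already passed (PASS_TO_PASS) must still pass. The leaderboard then reports the percentage of resolved instances.

The sketch below illustrates that criterion. It assumes the per-test outcomes have already been collected. `TaskResult`, `is_resolved`, and `resolve_rate` are hypothetical names chosen for illustration, not the official evaluation harness's API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Hypothetical record of test outcomes for one issue after applying a model's patch."""
    instance_id: str
    # Tests that failed before the reference fix and must pass after the model's patch.
    fail_to_pass: dict[str, bool] = field(default_factory=dict)
    # Tests that already passed and must keep passing (i.e. no regressions).
    pass_to_pass: dict[str, bool] = field(default_factory=dict)


def is_resolved(result: TaskResult) -> bool:
    """Resolved only if every target test and every regression test passes."""
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolve_rate(results: list[TaskResult]) -> float:
    """Percentage of task instances resolved; returns 0.0 for an empty list."""
    if not results:
        return 0.0
    return 100.0 * sum(is_resolved(r) for r in results) / len(results)


if __name__ == "__main__":
    results = [
        TaskResult("example-1", {"test_fix": True}, {"test_old": True}),
        TaskResult("example-2", {"test_fix": False}, {"test_old": True}),
    ]
    print(resolve_rate(results))  # 50.0
```

A patch that fixes the target test but breaks an existing one is still counted as unresolved, because regressions are checked as strictly as the fix itself.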