Terminal-Bench: a benchmark for AI agents in terminal environments