I made a programming language to test how creative LLMs really are

Not because I needed to. Not because it’s efficient. But because current benchmarks feel like they were built to make models look smart, not prove they are.

So I wrote Chester: a purpose-built toy language inspired by Python and JavaScript. It’s readable (ish), strict (definitely), and forces LLMs to reason structurally instead of just regurgitating known patterns.

The idea? If a model can take C code and transpile it via RAG into working Chester code, then maybe it understands the algorithm behind the syntax, not just the syntax. In other words, the test is to translate the known into the unknown.

Then I benchmarked multiple LLMs on hallucination rate, translation quality, and whether the generated code actually executes.
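For anyone wondering what scoring along those three axes could look like, here is a minimal sketch in Python. It is not the actual harness: the `Attempt` record, its fields, the `summarize` helper, and the model names are all assumptions made up for illustration, and the demo records are not real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    """One C-to-Chester translation attempt (all fields are hypothetical)."""
    model: str
    hallucinated: bool  # output used a construct that Chester does not define
    ran: bool           # the Chester interpreter executed it without error
    correct: bool       # its output matched the original C program's output


def summarize(attempts):
    """Return per-model rates for hallucination, execution, and correctness."""
    by_model = defaultdict(list)
    for a in attempts:
        by_model[a.model].append(a)

    report = {}
    for model, rows in by_model.items():
        n = len(rows)
        report[model] = {
            "hallucination_rate": sum(r.hallucinated for r in rows) / n,
            "execution_rate": sum(r.ran for r in rows) / n,
            "correct_rate": sum(r.correct for r in rows) / n,
        }
    return report


# Made-up records purely to show the shape of the data; not real results.
demo = [
    Attempt("model-a", hallucinated=False, ran=True, correct=True),
    Attempt("model-a", hallucinated=True, ran=False, correct=False),
    Attempt("model-b", hallucinated=False, ran=True, correct=False),
]
print(summarize(demo))
```

Keeping "ran" and "correct" separate matters here: code that executes but prints the wrong answer is a different failure from code that leans on syntax the language never had.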

It’s weird. And it actually kinda works.

submitted by /u/Bruh-Sound-Effect-6