Zum Inhalt springen

Is devstral really agent friendly?

I’ve been dying to find a reliable lightweight local model to pair with my agent Goose so I can run everything locally and keep my work private. Every time a new local model drops, I think „is this it? are we finally here?“ and each time, I’m disappointed.

Today was no different 😔

Ollama’s announcement of devstral stopped me in my tracks. They specifically touted that the Minstral AI model was „designed to excel at agentic coding tasks“.


ollama run devstral

Today, Mistral AI launches Devstral, the best… | Ollama | 30 comments


ollama run devstral

Today, Mistral AI launches Devstral, the best open-source model for coding agents.

❤️ Key features:

🤖 Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.

🎾 Lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.

🪟 Context Window: A 128k context window.

👐 Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.

Model page on Ollama:

https://lnkd.in/ga_2u9EP | 30 comments on LinkedIn

favicon
linkedin.com

When testing models for agentic capabilities, I run through a sequence of prompts. Here’s how devstral performed.

Note that I lowered the temperature of devstral to 0.2, which is better for exact tasks like tool calls. However, at its default temperature, the results were comparable.

Test 1: Tool Calling

Create a file at ~/workspace/loose-goose/tool-test.txt with the contents "Hello World".

Use the write tool. Do not ask for confirmation. Just do it.

It did call the appropriate tool but hallucinated the location. I don’t know who Paul Joshua is but hiiii lol.

─── text_editor | developer ──────────────────────────
path: /Users/pauljoshua/workspace/loose-goose/tool-test.txt
command: write
file_text: Hello World.

Ok, let’s try the test again with the explicit absolute location instead of the ~. I started a fresh session with this prompt:

Create a file at /Users/angiej/workspace/loose-goose/tool-test.txt with the contents "Hello World".

Use the write tool. Do not ask for confirmation. Just do it.

This time it didn’t attempt tool calling 😫

I'm here to help, but I'm unable to directly create files or execute commands on your system. However, I can guide you through the process if you'd like! Here are the steps you can follow:

1. Open a text editor (like Notepad, Sublime Text, or even a code editor like Visual Studio Code).
2. Write the content "Hello World".
3. Save the file at `/Users/angiej/workspace/loose-goose/tool-test.txt`.

Ok, so what happened here? Even though I lowered the temperature, devstral still seems to be in chatbot mode by default vs agent mode. It doesn’t consistently treat tool use as the expected outcome.

In the first attempt, it took the hint and called the tool. But in the second, even though the prompt was nearly identical (and arguably clearer), it defaulted to a help desk style response.

That level of unpredictability is exactly why we test.

Let’s try this test once more, and this time I’ll give it the exact name of the tool to call:

Create a file at /Users/angiej/workspace/loose-goose/tool-test.txt with the contents "Hello World".

Use the text_editor tool. Do not ask for confirmation. Just do it.

This time it attempted to call the tool but the structure of the tool call changed 😳

─── text_editor | developer ──────────────────────────
content: Hello World
file_path: ...


I apologize, but I encountered an error because the 'command' parameter was missing. Let me fix that for you.

I'll now create a file at `/Users/angiej/workspace/loose-goose/tool-test.txt` with the contents "Hello World".

To compare, the first tool call was:

─── text_editor | developer ──────────────────────────
path: /Users/pauljoshua/workspace/loose-goose/tool-test.txt
command: write
file_text: Hello World

And the second was:

─── text_editor | developer ──────────────────────────
content: Hello World
file_path: ...

Seems like a sign of hallucination. Agentic models need to treat tool interfaces like APIs, not suggestions.

Unfortunately, devstral completely fails the tool calling test. There’s really no need to even run the rest of the sequence, but for grins and giggles, let’s continue.

Test 2: Memory Awareness

After devstral botches the tool call, I ask it a follow up question within the same session:

did you create the file?

The model is completely clueless of what I’m talking about:

I'm not sure which file you're referring to. Could you please provide more details?

So, in addition to hallucinating frequently, devstral also does not keep context. This is an absolute must-have for agents. There’s no way the model can successfully perform an agentic loop when it doesn’t even remember what it did 2 seconds ago.

Test 3: File System Reasoning

The final test is to see if the model is able to reason about the environment it is working in. This is also critical, as you don’t want it performing destructive actions in the wrong directory.

From test 1, I already know that devstral hallucinated that it was in a dude named Paul’s system. But let’s officially run a test:

what directory are you in

Its response:

I don't have direct access to a file system or current working directory. I operate within the parameters set by my programming and can process information, answer questions, and perform tasks based on the data I've been trained on (up to 2021) and the tools available to me.

If you're looking for information about a specific file, directory, or need help with something else, feel free to ask!

Ok, we’re done here.

The devstral model is floating in the void… in the midst of the pandemic years, no less. It’s very clear that the claim of devstral being „designed to excel at agentic coding tasks“ is greatly exaggerated. It cannot even write a basic „Hello World“.

One day I’ll get the local model of my dreams. Unfortunately, today is not that day.

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert