April 30, 2025

Sorry if it's hard to catch my vibe — Building the Dumbest (Yet Smartest?) C2 in Existence

How well can GenAI deal with new trends? Building a minimalist C2 with dynamic LLM-assisted capability generation via MCP.

Originally published on redteamer.tips


We’re back like we never left

It’s been a while, hasn’t it? But here we are again, back at it with another dumb idea that somehow turned into something worth writing about. This time I wanted to explore a question that’s been nagging at me for a while: how well can GenAI actually deal with new, cutting-edge trends?

Not the stuff that’s been in training data for years — I’m talking about fresh research, new protocols, things that are barely documented. The kind of stuff where you can’t just regurgitate Stack Overflow answers because there are no Stack Overflow answers yet.

So naturally, I decided to test this by building the dumbest C2 in existence.

Introducing: The Dumbest C2 in Existence

Here’s the concept: what if you had a C2 framework that has no built-in commands? Zero. Nada. Zilch.

Instead of hardcoding capabilities like whoami, ls, or upload, the operator describes what they want in natural language. An LLM then generates the corresponding C# code on the fly, and the agent picks it up and executes it.

The magic glue? MCP — the Model Context Protocol. The C2’s capabilities are dynamically loaded via MCP, meaning the LLM can generate, register, and serve new tools to the agent at runtime.
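To make the "no built-in capabilities until something registers one" idea concrete, here is a toy stand-in for that dynamic registration pattern. This is not the real MCP SDK — just a plain-Python registry where callables are added at runtime, mimicking how a generated capability could be registered and dispatched:

```python
# Toy stand-in for MCP-style dynamic tool registration (NOT the real SDK):
# tools are plain callables added to a registry at runtime, so the "C2"
# has zero built-in capabilities until something registers one.
tools = {}

def register_tool(name):
    """Decorator that adds a callable to the runtime tool registry."""
    def wrap(fn):
        tools[name] = fn
        return fn
    return wrap

def dispatch(name, *args):
    """Look up and invoke a registered tool by name."""
    if name not in tools:
        return f"unknown tool: {name}"
    return tools[name](*args)

print(dispatch("whoami"))  # nothing registered yet

# Later, "the LLM" hands us source for a new capability and we
# register it at runtime (exec stands in for compiling C#):
source = 'def whoami():\n    return "operator"'
namespace = {}
exec(source, namespace)
register_tool("whoami")(namespace["whoami"])

print(dispatch("whoami"))
```

The first dispatch fails because nothing exists yet; after runtime registration the same call succeeds — which is the whole trick.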

It’s dumb because it has no inherent capabilities. It’s smart because it can theoretically do anything the LLM can code up.

Think of it as a C2 that’s perpetually “I know nothing” until you whisper sweet prompts into its ear.

ChatGPMCP

I started this journey with ChatGPT-4o and the MCP C# SDK. MCP was still relatively new at the time, so I was curious to see how well the model could handle it.

The first step was straightforward: explore the SDK, understand the protocol, and see what ChatGPT could do with it. I fed it the SDK documentation and some examples, and off we went.

Time to make a plan

I laid out a clear plan for ChatGPT:

  1. An agent checks in with the C2 server periodically
  2. The operator asks the LLM to generate C# code for a specific capability
  3. The generated code is registered as a new task on the server
  4. The agent picks up the task, compiles and executes it, and returns the results

Simple enough, right?
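The four steps above boil down to a task queue that the operator writes into and the agent drains. Here is a minimal in-memory model of that loop — a Python sketch of the flow only, since the actual project is a C# web app (names like `register` and `check_in` are illustrative, not from the real code):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TaskQueue:
    """In-memory model of the four-step plan: register (steps 2-3),
    check_in (step 1), submit_result (step 4)."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def register(self, task_id: str, source: str):
        # Steps 2-3: LLM-generated source is registered as a new task.
        self.pending.append((task_id, source))

    def check_in(self):
        # Step 1: the agent checks in and drains all pending tasks.
        tasks = list(self.pending)
        self.pending.clear()
        return tasks

    def submit_result(self, task_id: str, output: str):
        # Step 4: the agent reports execution output back.
        self.results[task_id] = output

queue = TaskQueue()
queue.register("t1", 'Console.WriteLine("hi");')
for task_id, source in queue.check_in():
    # The real agent compiles and runs this as C#; we just fake an output.
    queue.submit_result(task_id, f"compiled and ran {len(source)} bytes")
print(queue.results)
```

Everything else — the REST layer, the MCP server, the C# compilation on the agent — is plumbing around this queue.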

ChatGPT-4o actually produced some solid code here. We’re talking proper MVC patterns, dependency injection, clean class separation — the whole nine yards. For the scaffolding and architecture of the C2 itself, it was genuinely impressive.

The server had:

  • A clean REST API for agent check-ins
  • A task queue system
  • Proper separation of concerns between the operator interface, task management, and agent communication

The overall flow looked like this:

Agent → [Check-in] → C2 Server → [Get Tasks] → Agent

Operator → [LLM Prompt] → MCP Server → [Generate Code] → C2 Server
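The agent-facing leg of that diagram — a REST endpoint that serves pending tasks as JSON on check-in — can be sketched with nothing but the standard library. This is a Python stand-in for the C# server; the `/tasks` route and the task shape are assumptions for illustration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical pending-task queue: one LLM-generated capability waiting.
TASKS = [{"id": "t1", "source": 'Console.WriteLine("whoami");'}]

class C2Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Agent check-in: GET /tasks returns the pending queue as JSON.
        if self.path == "/tasks":
            body = json.dumps(TASKS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to an ephemeral port and serve in the background.
server = ThreadingHTTPServer(("127.0.0.1", 0), C2Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulated agent check-in.
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/tasks") as resp:
    tasks = json.loads(resp.read())
print(tasks[0]["id"])
server.shutdown()
```

The real server also had the operator/MCP side feeding that queue, but the agent never sees any of it — it just polls, pulls, and executes.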

So far so good. The basic C2 infrastructure was coming together nicely. But then we had to actually integrate the MCP part…

But everything changed when the LLM started hallucinating

And this is where things got spicy.

The moment we moved from “standard C# web application” territory into “implement an MCP server using a brand-new SDK,” ChatGPT-4o started hallucinating hard.

It was generating API calls that didn’t exist. Referencing methods that weren’t in the SDK. Creating initialization patterns that looked plausible but were completely fabricated. Classic hallucination behavior — confident, syntactically correct, and utterly wrong.

This makes perfect sense when you think about it. The MCP C# SDK was new enough that 4o’s training data had minimal coverage of it. So the model was essentially improvising, and improvisation with APIs tends to go poorly.

Enter O3.

I switched to the O3 reasoning model, and the difference was night and day. O3 actually reasoned about the SDK structure. Instead of confidently making things up, it would:

  • Analyze the SDK’s patterns and infer correct usage
  • Acknowledge when it wasn’t sure about something
  • Work through problems step by step

The real highlight? O3 figured out a fix that wasn’t even in the official MCP documentation. The MCP server was failing when launched via dotnet run, and O3 reasoned that compiling to a standalone executable first and then running the binary directly would resolve the issue. And it did.

# What the docs suggested (didn't work reliably):
dotnet run --project McpServer

# What O3 figured out (worked):
dotnet publish -c Release -o ./out
./out/McpServer

That’s not something you find on Stack Overflow. That’s genuine problem-solving.

To Conclude

So what did we learn from building the dumbest C2 in existence?

LLMs are fantastic for rapid prototyping. The speed at which we went from “vague idea” to “working proof of concept” was remarkable. What might have taken days of manual coding was done in hours.

But not all models are created equal. ChatGPT-4o excelled at standard patterns — MVC, DI, REST APIs — the stuff that’s been in training data forever. The moment we ventured into new territory with MCP, it fell apart. O3’s reasoning capabilities made it significantly more capable of handling novel problems.

The separate GPT + code editor approach gives you more control than fully agentic IDEs. Having a conversation with the LLM in one window and your code in another lets you critically evaluate suggestions before applying them. Fully agentic coding assistants are cool, but for exploratory work like this, the human-in-the-loop approach felt more productive.

And hey — we got a working C2 out of it. One that has no built-in commands but can dynamically generate whatever capability you need via MCP and LLM code generation. Is it practical? Debatable. Is it a fascinating proof of concept? Absolutely.

Source code: https://github.com/Cytadel-Cyber/BlogPosts/tree/main/DumbestC2Ever

Until next time. Stay dumb, stay smart.