Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

MCP Sampling: When Your Tools Need to Think

12 January 2026 at 07:14

The following article originally appeared on Block’s blog and is being republished here with the author’s permission.

If you’ve been following MCP, you’ve probably heard about tools which are functions that let AI assistants do things like read files, query databases, or call APIs. But there’s another MCP feature that’s less talked about and arguably more interesting: sampling.

Sampling flips the script. Instead of the AI calling your tool, your tool calls the AI.

Let’s say you’re building an MCP server that needs to do something intelligent like summarize a document, translate text, or generate creative content. You have three options:

Option 1: Hardcode the logic. Write traditional code to handle it. This works for deterministic tasks, but falls apart when you need flexibility or creativity.

Option 2: Bake in your own LLM. Your MCP server makes its own calls to OpenAI, Anthropic, or whatever. This works, but now you’ve got API keys to manage and costs to track, and you’ve locked users into your model choice.

Option 3: Use sampling. Ask the AI that’s already connected to do the thinking for you. No extra API keys. No model lock-in. The user’s existing AI setup handles it.

How Sampling Works

When an MCP client like goose connects to an MCP server, it establishes a two-way channel. The server can expose tools for the AI to call, but it can also request that the AI generate text on its behalf.

Here’s what that looks like in code (using Python with FastMCP):

Using Python with FastMCP sampling

The ctx.sample() call sends a prompt back to the connected AI and waits for a response. From the user’s perspective, they just called a “summarize” tool. But under the hood, that tool delegated the hard part to the AI itself.

A Real Example: Council of Mine

Council of Mine is an MCP server that takes sampling to an extreme. It simulates a council of nine AI personas who debate topics and vote on each other’s opinions.

But there’s no LLM running inside the server. Every opinion, every vote, every bit of reasoning comes from sampling requests back to the user’s connected LLM.

The council has nine members, each with a distinct personality:

  • 🔧 The Pragmatist – “Will this actually work?”
  • 🌟 The Visionary – “What could this become?”
  • 🔗 The Systems Thinker – “How does this affect the broader system?”
  • 😊 The Optimist – “What’s the upside?”
  • 😈 The Devil’s Advocate – “What if we’re completely wrong?”
  • 🤝 The Mediator – “How can we integrate these perspectives?”
  • 👥 The User Advocate – “How will real people interact with this?”
  • 📜 The Traditionalist – “What has worked historically?”
  • 📊 The Analyst – “What does the data show?”

Each personality is defined as a system prompt that gets prepended to sampling requests.

When you start a debate, the server makes nine sampling calls, one for each council member:

Council of members 1

That temperature=0.8 setting encourages diverse, creative responses. Each council member “thinks” independently because each is a separate LLM call with a different personality prompt.

After opinions are collected, the server runs another round of sampling. Each member reviews everyone else’s opinions and votes for the one that resonates most with their values:

The council has voted

The server parses the structured response to extract votes and reasoning.

One more sampling call generates a balanced summary that incorporates all perspectives and acknowledges the winning viewpoint.

Total LLM calls per debate: 19

  • 9 for opinions
  • 9 for voting
  • 1 for synthesis

All of those calls go through the user’s existing LLM connection. The MCP server itself has zero LLM dependencies.

Benefits of Sampling

Sampling enables a new category of MCP servers that orchestrate intelligent behavior without managing their own LLM infrastructure.

No API key management: The MCP server doesn’t need its own credentials. Users bring their own AI, and sampling uses whatever they’ve already configured.

Model flexibility: If a user switches from GPT to Claude to a local Llama model, the server automatically uses the new model.

Simpler architecture: MCP server developers can focus on building a tool, not an AI application. They can let the AI be the AI, while the server focuses on orchestration, data access, and domain logic.

When to Use Sampling

Sampling makes sense when a tool needs to:

  • Generate creative content (summaries, translations, rewrites)
  • Make judgment calls (sentiment analysis, categorization)
  • Process unstructured data (extract info from messy text)

It’s less useful for:

  • Deterministic operations (math, data transformation, API calls)
  • Latency-critical paths (each sample adds round-trip time)
  • High-volume processing (costs add up quickly)

The Mechanics

If you’re implementing sampling, here are the key parameters:

Sampling parameters

The response object contains the generated text, which you’ll need to parse. Council of Mine includes robust extraction logic because different LLM providers return slightly different response formats:

Council of Mine robust extraction logic

Security Considerations

When you’re passing user input into sampling prompts, you’re creating a potential prompt injection vector. Council of Mine handles this with clear delimiters and explicit instructions:

Council of Mine delimiters and instructions

This isn’t bulletproof, but it raises the bar significantly.

Try It Yourself

If you want to see sampling in action, Council of Mine is a great playground. Ask goose to start a council debate on any topic and watch as nine distinct perspectives emerge, vote on each other, and synthesize into a conclusion all powered by sampling.

MCPs for Developers Who Think They Don’t Need MCPs

5 January 2026 at 06:01

The following article originally appeared on Block’s blog and is being republished here with the author’s permission.

Lately, I’ve seen more developers online starting to side eye MCP. There was a tweet by Darren Shepherd that summed it up well:

Most devs were introduced to MCP through coding agents (Cursor, VS Code) and most devs struggle to get value out of MCP in this use case…so they are rejecting MCP because they have a CLI and scripts available to them which are way better for them.

Fair. Most developers were introduced to MCPs through some chat-with-your-code experience, and sometimes it doesn’t feel better than just opening your terminal and using the tools you know. But here’s the thing…

MCPs weren’t built just for developers.

They’re not just for IDE copilots or code buddies. At Block, we use MCPs across everything, from finance to design to legal to engineering. I gave a whole talk on how different teams are using goose, an AI agent. The point is MCP is a protocol. What you build on top of it can serve all kinds of workflows.

But I get it… Let’s talk about the dev-specific ones that are worth your time.

GitHub: More Than Just the CLI

If your first thought is “Why would I use GitHub MCP when I have the CLI?” I hear you. GitHub’s MCP is kind of bloated right now. (They know. They’re working on it.)

But also: You’re thinking too local.

You’re imagining a solo dev setup where you’re in your terminal, using GitHub CLI to do your thing. And honestly, if all you’re doing is opening a PR or checking issues, you probably should use the CLI.

But the CLI was never meant to coordinate across tools. It’s built for local, linear commands. But what if your GitHub interactions happened somewhere else entirely?

MCP shines when your work touches multiple systems like GitHub, Slack, and Jira without you stitching it together.

Here’s a real example from our team:

Slack thread. Real developers in real time.

Dev 1: I think there’s a bug with xyz

Dev 2: Let me check… yep, I think you’re right.

Dev 3: @goose is there a bug here?

goose: Yep. It’s in these lines… [code snippet]

Dev 3: Okay @goose, open an issue with the details. What solutions would you suggest?

goose: Here are 3 suggestions: [code snippets with rationale]

Dev 1: I like Option 1

Dev 2: me too

Dev 3: @goose, implement Option 1

goose: Done. Here’s the PR.

All of that happened in Slack. No one opened a browser or a terminal. No one context-switched. Issue tracking, triaging, discussing fixes, implementing code in one thread in a five-minute span.

We’ve also got teams tagging Linear or Jira tickets and having goose fully implement them. One team had goose do 15 engineering days worth of work in a single sprint. The team literally ran out of tasks and had to pull from future sprints. Twice!

So yes, GitHub CLI is great. But MCP opens the door to workflows where GitHub isn’t the only place where dev work happens. That’s a shift worth paying attention to.

Context7: Docs That Don’t Suck

Here’s another pain point developers hit: documentation.

You’re working with a new library. Or integrating an API. Or wrestling with an open source tool.

The Context7 MCP pulls up-to-date docs, code examples, and guides right into your AI agent’s brain. You just ask and get answers to questions like:

  • How do I create a payment with the Square SDK?
  • What’s the auth flow for Firebase?
  • Is this library tree-shakable?

It doesn’t rely on stale LLM training data from two years ago. It scrapes the source of truth right now. Giving it updated…say it with me…CONTEXT.

Developer “flow” is real, and every interruption steals precious focus time. This MCP helps you figure out new libraries, troubleshoot integrations, and get unstuck without leaving your IDE.

Repomix: Know the Whole Codebase Without Reading It

Imagine you join a new project or want to contribute to an open source one, but it’s a huge repo with lots of complexity.

Instead of poking around for hours trying to draw an architectural diagram in your head, you just tell your agent: “goose, pack this project up.”

It runs Repomix, which compresses the entire codebase into an AI-optimized file. From there, your convo might go like this:

  • Where’s the auth logic?
  • Show me how API calls work.
  • What uses UserContext?
  • What’s the architecture?
  • What’s still a TODO?

You get direct answers with context, code snippets, summaries, and suggestions. It’s like onboarding with a senior dev who already knows everything. Sure, you could grep around and piece things together. But Repomix gives you the whole picture—structure, metrics, patterns—compressed and queryable.

And it even works with remote public GitHub repos, so you don’t need to clone anything to start exploring.

This is probably my favorite dev MCP. It’s a huge time saver for new projects, code reviews, and refactoring.

Chrome DevTools MCP: Web Testing While You Code

The Chrome DevTools MCP is a must-have for frontend devs. You’re building a new form/widget/page/whatever. Instead of opening your browser, typing stuff in, and clicking around, you just tell your agent: “Test my login form on localhost:3000. Try valid and invalid logins. Let me know what happens.”

Chrome opens, test runs, screenshots captured, network traffic logged, console errors noted. All done by the agent.

This is gold for frontend devs who want to actually test their work before throwing it over the fence.

Could you script all this with CLIs and APIs? Sure, if you want to spend your weekend writing glue code. But why would you want to do that when MCP gives you that power right out of the box… in any MCP client?!

So no, MCPs are not overhyped. They’re how you plug AI into everything you use: Slack, GitHub, Jira, Chrome, docs, codebases—and make that stuff work together in new ways.

Recently, Anthropic called out the real issue: Most dev setups load tools naively, bloat the context, and confuse the model. It’s not the protocol that’s broken. It’s that most people (and agents) haven’t figured out how to use it well yet. Fortunately, goose has—it manages MCPs by default, enabling and disabling as you need them.

But I digress.

Step outside the IDE, and that’s when you really start to see the magic.

PS Happy first birthday, MCP! 🎉

❌
❌