Introduction
Microsoft.Extensions.AI is a set of .NET abstractions that normalize how your code talks to AI models and related capabilities. Instead of binding your application tightly to one model vendor or SDK, you program against a small, testable interface surface that plugs into .NET’s dependency injection and hosting model. The result is cleaner composition, easier testing, and portability across providers. This article is a technical deep dive for developers who are comfortable with .NET 9 and C#, and want to use Microsoft.Extensions.AI as the primary way to invoke chat completions, enable tool calling, and consume structured outputs in production applications.
The focus here is purely on Microsoft.Extensions.AI: its mental model, core interfaces, configuration patterns, error handling, streaming, tool use, and structured output. The goal is to help you build reliable, evolvable AI features without vendor lock-in or brittle glue code.
Why an abstraction layer for AI?
Modern AI features often start small (a single chat endpoint) and evolve rapidly (multi-step workflows, tool use, and typed outputs). The challenge is keeping that growth from hardening into tight coupling with one SDK or a scattering of request/response shapes across your codebase. Microsoft.Extensions.AI solves this problem by introducing a unified way to: register AI clients with DI, submit requests, stream partial results, bind tool functions, and request typed responses. It emphasizes predictable lifetimes, cancellation, structured logs, and separation of concerns.
Thinking in .NET terms, Microsoft.Extensions.AI plays the same role for AI that HttpClientFactory and ILogger did for HTTP and logging. You code to interfaces, push variability into configuration, and centralize cross-cutting concerns—retries, timeouts, metrics—behind composable decorators.
Mental model and core concepts
Chat-centric programming model
At the center is the chat interaction. You send a sequence of messages (system, user, assistant). The AI returns content, optionally calling your tools, optionally shaping its output into a type you define. This model is general enough to cover most practical workloads: assistants, content generation, code review, data extraction, and planning.
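In code, that conversation is simply a list of ChatMessage values. A minimal example (the prompt text is illustrative):
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a concise technical assistant."),
    new(ChatRole.User, "Explain Span<T> in one paragraph.")
};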
Key interfaces
- IChatClient: The main entry point for sending chat requests. You’ll typically register one or more implementations and inject them where needed.
- ChatMessage and ChatRole: Represent the conversation state you send to the model.
- ChatOptions and ChatResponse: Per-request settings (temperature, maximum output tokens, tools, response format) and the response shape returned by GetResponseAsync.
- GetStreamingResponseAsync: Produces an async stream of ChatResponseUpdate parts for live updates.
- AIFunction (created via AIFunctionFactory.Create): Declares a callable function the model may invoke to accomplish tasks requiring external data or actions.
- GetResponseAsync<T>: Requests model output that conforms to a JSON schema derived from your C# type.
Provider-agnostic design
You pick a provider-specific adapter (for example, one that talks to a given model endpoint). Your application code never references the provider SDK directly; it consumes IChatClient. This makes it straightforward to switch providers, split traffic, or run A/B experiments without rewriting business logic.
Installing and configuring
You add provider packages that implement the abstractions, then register an IChatClient. All configuration should come from your app settings or secrets. A typical registration in Program.cs looks like this:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
var builder = WebApplication.CreateBuilder(args);
// Placeholder registration; replace with a concrete adapter for your provider.
builder.Services.AddSingleton<IChatClient>(sp =>
{
var cfg = sp.GetRequiredService<IConfiguration>();
var apiKey = cfg["AI:ApiKey"];
var model = cfg["AI:Model"];
return new SomeProviderChatClient(apiKey, model);
});
var app = builder.Build();
app.MapGet("/health", () => "ok");
app.Run();
Two principles guide configuration:
- All provider secrets and model names live in configuration, not code.
- Register clients behind interfaces and, where helpful, wrap them with decorators for logging, retry, and metering (a sketch follows this list).
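The library supports this pattern directly: AddChatClient returns a ChatClientBuilder, and its Use* extensions wrap the inner client with decorators. A minimal sketch, reusing the placeholder adapter from above:
builder.Services.AddChatClient(sp =>
{
    var cfg = sp.GetRequiredService<IConfiguration>();
    return new SomeProviderChatClient(cfg["AI:ApiKey"], cfg["AI:Model"]);
})
.UseFunctionInvocation() // automatic tool dispatch, used later in this article
.UseLogging();           // structured logs around every call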
Sending your first completion
Once registered, inject IChatClient where you need it. Microsoft.Extensions.AI uses a message list to represent the conversation state.
using Microsoft.Extensions.AI;
using Microsoft.AspNetCore.Mvc;
[ApiController]
[Route("api/chat")]
public sealed class ChatController : ControllerBase
{
private readonly IChatClient _chat;
public ChatController(IChatClient chat)
{
_chat = chat;
}
[HttpPost("complete")]
public async Task<IActionResult> Complete([FromBody] string prompt, CancellationToken ct)
{
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, "You are a concise technical assistant."),
            new(ChatRole.User, prompt)
        };
        var result = await _chat.GetResponseAsync(messages, new ChatOptions { Temperature = 0.1f }, ct);
        return Ok(new { content = result.Text, usage = result.Usage });
}
}
Keep the system message explicit and stable. Use low temperature for deterministic behavior when the task is factual or extractive. Always pass the request’s CancellationToken so deployed instances shut down quickly and long polls don’t leak.
Streaming partial responses
Streaming improves perceived latency and supports chat UIs. GetStreamingResponseAsync produces an async stream of partial updates. On the server, write them to the response as they arrive; on the client, append each chunk.
app.MapPost("/api/chat/stream", async (IChatClient chat, [FromBody] string prompt, HttpResponse res, CancellationToken ct) =>
{
res.Headers.Append("Content-Type", "text/event-stream");
await foreach (var part in chat.CompleteStreamAsync(new ChatCompletionRequest
{
Messages =
{
new ChatMessage(Role.System, "Stream concise answers as tokens."),
new ChatMessage(Role.User, prompt)
},
Temperature = 0.2
}, ct))
{
await res.WriteAsync(part.Content);
await res.Body.FlushAsync();
}
});
Stream handlers belong in their own endpoints. Treat the stream body as an event source that UI code can subscribe to. Keep stream messages small, and flush regularly.
Tool calling (function calling)
Tool calling lets the model request that your code perform a specific function—fetch a record, run a calculation, call an API—and feed the result back into the conversation. Define a tool with AIFunctionFactory.Create, giving it a name, a short description, and a delegate whose typed parameters become the argument schema. When the client pipeline includes UseFunctionInvocation(), the FunctionInvokingChatClient decorator binds the arguments and invokes your function whenever the model requests the tool.
public sealed class TimeTool
{
public record Args(string Zone);
public Task<string> InvokeAsync(Args args, CancellationToken ct)
{
var now = TimeZoneInfo.ConvertTime(DateTimeOffset.UtcNow, TimeZoneInfo.FindSystemTimeZoneById(args.Zone));
return Task.FromResult($"Time in {args.Zone}: {now:yyyy-MM-dd HH:mm}");
}
}
builder.Services.AddSingleton(new TimeTool());
app.MapPost("/api/chat/tools", async (IChatClient chat, TimeTool timeTool, [FromBody] string prompt, CancellationToken ct) =>
{
var tools = new[]
{
ToolDefinition.Create(
name: "get_time",
description: "Returns the current time in the specified IANA/Windows time zone.",
func: (TimeTool.Args a, CancellationToken t) => timeTool.InvokeAsync(a, t))
};
var result = await chat.CompleteAsync(new ChatCompletionRequest
{
Messages =
{
new ChatMessage(Role.System, "You can call tools when needed. Prefer ISO 8601 times."),
new ChatMessage(Role.User, prompt)
},
Tools = tools,
Temperature = 0.0
}, ct);
return Results.Ok(new { content = result.Content });
});
Good tool design is crucial. Provide a clear description, precise argument types, and deterministic behavior. Tool results should be short, structured, and safe to log. Tools are your seam to external systems; secure them and validate inputs as you would any public API (a hardened sketch follows).
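For illustration (ValidatedTimeTool is not a library type, just a hardened variant of the tool above), here is a sketch that validates input before touching system APIs; TryFindSystemTimeZoneById avoids throwing on unknown ids, so the model receives a short, readable error instead:
public sealed class ValidatedTimeTool
{
    public record Args(string Zone);
    public Task<string> InvokeAsync(Args args, CancellationToken ct)
    {
        // Reject obviously malformed input before touching the system API.
        if (string.IsNullOrWhiteSpace(args.Zone) || args.Zone.Length > 64)
            return Task.FromResult("error: invalid time zone id");
        // TryFindSystemTimeZoneById (available since .NET 8) avoids exception-driven control flow.
        if (!TimeZoneInfo.TryFindSystemTimeZoneById(args.Zone, out var tz))
            return Task.FromResult($"error: unknown time zone '{args.Zone}'");
        var now = TimeZoneInfo.ConvertTime(DateTimeOffset.UtcNow, tz);
        return Task.FromResult(now.ToString("O"));
    }
}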
Structured output (strongly typed responses)
Unstructured natural language is difficult to validate. Structured output asks the model to conform to a schema that maps to your C# type. This makes responses testable and transforms fragile prompts into robust contracts. The abstractions expose this through the GetResponseAsync<T> extension, which derives a JSON schema from your type and deserializes the model's response into it.
public sealed record Summary(string Title, string[] Bullets); // in Program.cs, declare types after all top-level statements
app.MapPost("/api/summarise", async (IChatClient chat, [FromBody] string text, CancellationToken ct) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "Summarise the text into a title and 3-5 bullet points."),
        new(ChatRole.User, text)
    };
    var completion = await chat.GetResponseAsync<Summary>(messages, new ChatOptions { Temperature = 0.2f }, cancellationToken: ct);
    return Results.Json(completion.Result);
});
Prefer structured output for extraction, classification, and routing. Keep the schema small and stable. Write unit tests that deserialize representative outputs and validate constraints (one is sketched below). Make the system prompt explicit about format and constraints; treat it like an API contract.
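A sketch of such a test, assuming xUnit and System.Text.Json; the payload is a hand-written representative output:
using System.Text.Json;
using Xunit;
public class SummaryContractTests
{
    [Fact]
    public void Representative_payload_deserializes_and_meets_constraints()
    {
        const string payload = """{"title":"Spans in .NET","bullets":["stack-only","zero-copy","slicing"]}""";
        // Case-insensitive matching maps the lowercase JSON keys onto the record's properties.
        var summary = JsonSerializer.Deserialize<Summary>(payload,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
        Assert.NotNull(summary);
        Assert.False(string.IsNullOrWhiteSpace(summary!.Title));
        Assert.InRange(summary!.Bullets.Length, 3, 5);
    }
}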
Error handling and resilience
AI failures come from transient provider errors, rate limits, timeouts, and invalid prompt shapes. Handle these in consistent layers: request-level policies, pipeline decorators, and endpoint-specific fallbacks.
- Timeouts: Pass cancellation tokens and set per-request timeouts where your provider adapter supports them.
- Retries: Retry only idempotent requests; avoid retrying tool calls that mutate state (a decorator sketch follows this list).
- Validation: Validate tool arguments and reject malformed requests before calling the model.
- Backpressure: Rate-limit at the edge for bursty chat features and enforce maximum tokens per request.
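One way to implement the retry policy is a decorator built on DelegatingChatClient, the base class the library provides for IChatClient middleware. This is a sketch: the attempt count and backoff are arbitrary, and the catch filter should be narrowed to your provider's transient exceptions.
using Microsoft.Extensions.AI;
public sealed class RetryingChatClient(IChatClient inner) : DelegatingChatClient(inner)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken cancellationToken = default)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await base.GetResponseAsync(messages, options, cancellationToken);
            }
            catch (Exception) when (attempt < 3) // narrow to transient provider errors in production
            {
                // Linear backoff before the next attempt; honors cancellation.
                await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt), cancellationToken);
            }
        }
    }
}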
Observability essentials
Instrument every call. Capture model name, request id, latency, token counts, and whether a tool was invoked. Log system messages and prompts in a redacted form. Emit metrics for failure rates, timeouts, and cost. Correlate AI spans with your HTTP request traces so developers can answer why a given chat response behaved unexpectedly.
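The library's built-in decorators cover much of this. A sketch that adds OpenTelemetry spans and structured logs around every completion, assuming your app already exports traces:
builder.Services.AddChatClient(sp =>
{
    var cfg = sp.GetRequiredService<IConfiguration>();
    return new SomeProviderChatClient(cfg["AI:ApiKey"], cfg["AI:Model"]);
})
.UseOpenTelemetry() // one span per completion, correlated with the enclosing HTTP activity
.UseLogging();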
Prompt design and lifecycle
Treat prompts as assets. Keep the system message short, authoritative, and versioned. Split business rules into reusable fragments. Write example pairs for non-trivial tasks. Add tests that assert properties of outputs under representative inputs. For multi-step tasks, make each step a distinct prompt with clear inputs and outputs. Prefer tool calls when the model needs fresh data or deterministic actions.
Security and safety
- Injection resistance: Never let user content redefine your system message. Keep tools opt-in and narrow in capability.
- Content controls: Post-filter model outputs for disallowed content when required.
- Secrets: Remove secrets from prompts. Strip tokens and credentials from logs.
- Data minimization: Send only the minimal context needed. Avoid including full transcripts when not necessary.
Cost control
- Cap tokens per request and per user/session.
- Cache embeddings or repeated completions where appropriate (see the sketch after this list).
- Store summaries and use them as context instead of raw large documents.
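For the caching bullet, the library ships a distributed-cache decorator. A sketch assuming the in-memory IDistributedCache; swap in Redis or similar for production:
builder.Services.AddDistributedMemoryCache();
builder.Services.AddChatClient(sp =>
{
    var cfg = sp.GetRequiredService<IConfiguration>();
    return new SomeProviderChatClient(cfg["AI:ApiKey"], cfg["AI:Model"]);
})
.UseDistributedCache(); // repeated identical requests are served from the cache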
Testing strategy
Testability is a primary reason to adopt the abstraction. Replace IChatClient with a deterministic stub in unit tests. For integration tests, use a local adapter or a record/replay harness. Focus on control flow: prompt creation, tool invocation, and structured output validation.
public sealed class FakeChatClient : IChatClient
{
    public Task<ChatResponse> GetResponseAsync(IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken cancellationToken = default)
    {
        return Task.FromResult(new ChatResponse(new ChatMessage(ChatRole.Assistant, "stubbed")));
    }
    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(IEnumerable<ChatMessage> messages, ChatOptions? options = null, [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        yield return new ChatResponseUpdate(ChatRole.Assistant, "stub");
        await Task.CompletedTask;
    }
    public object? GetService(Type serviceType, object? serviceKey = null) => null;
    public void Dispose() { }
}
Tests should not verify entire strings; prefer property-based checks and structural assertions. When using structured output, validate that required fields are present and within expected bounds.
Versioning and compatibility
Model and provider versions change. Centralize version and capability checks. Make your system message declare assumptions about output format explicitly. If you must roll models in place, add feature flags and canary endpoints. Keep your IChatClient usage in a small number of service classes to minimize blast radius when swapping providers or models.
Design patterns for maintainable AI code
Command pattern for prompts
Create a small request object that encapsulates the intent, inputs, and an executor that knows how to convert it into a message list and ChatOptions. This keeps prompts discoverable and testable.
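A sketch of the pattern; SummariseCommand and its members are illustrative names, not library types:
public sealed record SummariseCommand(string Text, int MaxBullets = 5)
{
    // The command owns its prompt, so tests can assert on it without calling a model.
    public IList<ChatMessage> ToMessages() =>
    [
        new(ChatRole.System, $"Summarise the text into a title and at most {MaxBullets} bullet points."),
        new(ChatRole.User, Text)
    ];
    public ChatOptions ToOptions() => new() { Temperature = 0.2f };
}
// Usage: the executor is the only code that touches IChatClient.
// var response = await chat.GetResponseAsync<Summary>(cmd.ToMessages(), cmd.ToOptions(), cancellationToken: ct);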
Tool-as-gateway
Treat tools as a boundary to external systems. Implement them as stateless services with strong validation. Return compact data structures that the model can reason over without hallucinating details.
Schema-first extraction
Use structured output (GetResponseAsync<T>) for extraction tasks. Keep schemas small and version them.
Adapter composition
Wrap IChatClient with decorators for metrics, retries, caching, and audits. Register the chain in DI via ChatClientBuilder so the rest of your app remains unaware of cross-cutting behavior.
End-to-end working example
The following is a self-contained minimal API illustrating synchronous completions, streaming, one tool, and structured output. Replace the provider-specific client constructor with a real adapter in your environment.
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
var builder = WebApplication.CreateBuilder(args);
// Replace with a concrete provider adapter.
builder.Services.AddChatClient(sp =>
{
    var cfg = sp.GetRequiredService<IConfiguration>();
    var apiKey = cfg["AI:ApiKey"];
    var model = cfg["AI:Model"];
    return new SomeProviderChatClient(apiKey, model);
})
.UseFunctionInvocation(); // enables automatic tool dispatch for endpoint 3 below
builder.Services.AddSingleton(new TimeTool());
var app = builder.Build();
app.MapGet("/health", () => "ok");
// 1) Basic completion
app.MapPost("/chat/complete", async ([FromServices] IChatClient chat, [FromBody] string prompt, CancellationToken ct) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "You are a concise technical assistant."),
        new(ChatRole.User, prompt)
    };
    var result = await chat.GetResponseAsync(messages, new ChatOptions { Temperature = 0.2f }, ct);
    return Results.Ok(new { result.Text });
});
// 2) Streaming
app.MapPost("/chat/stream", async ([FromServices] IChatClient chat, [FromBody] string prompt, HttpResponse res, CancellationToken ct) =>
{
    res.ContentType = "text/event-stream";
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "Stream partial tokens as they become available."),
        new(ChatRole.User, prompt)
    };
    await foreach (var update in chat.GetStreamingResponseAsync(messages, new ChatOptions { Temperature = 0.2f }, ct))
    {
        await res.WriteAsync($"data: {update.Text}\n\n", ct);
        await res.Body.FlushAsync(ct);
    }
});
// 3) Tool calling (relies on UseFunctionInvocation in the registration above)
app.MapPost("/chat/tools", async ([FromServices] IChatClient chat, [FromServices] TimeTool timeTool, [FromBody] string prompt, CancellationToken ct) =>
{
    var options = new ChatOptions
    {
        Tools = [AIFunctionFactory.Create(timeTool.InvokeAsync, name: "get_time", description: "Returns the time in the specified time zone.")],
        Temperature = 0f
    };
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "Use tools when you need accurate times."),
        new(ChatRole.User, prompt)
    };
    var result = await chat.GetResponseAsync(messages, options, ct);
    return Results.Ok(new { result.Text });
});
// 4) Structured output (Summary is declared below, after the top-level statements)
app.MapPost("/summarise", async ([FromServices] IChatClient chat, [FromBody] string text, CancellationToken ct) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.System, "Summarise the text into a title and 3-5 bullet points."),
        new(ChatRole.User, text)
    };
    var completion = await chat.GetResponseAsync<Summary>(messages, new ChatOptions { Temperature = 0.2f }, cancellationToken: ct);
    return Results.Json(completion.Result);
});
app.Run();
public sealed record Summary(string Title, string[] Bullets);
public sealed class TimeTool
{
public record Args(string Zone);
public Task<string> InvokeAsync(Args args, CancellationToken ct)
{
var tz = TimeZoneInfo.FindSystemTimeZoneById(args.Zone);
var now = TimeZoneInfo.ConvertTime(DateTimeOffset.UtcNow, tz);
return Task.FromResult($"{now:O}");
}
}
// Placeholder provider adapter for compilation; replace with a real one.
public sealed class SomeProviderChatClient : IChatClient
{
    private readonly string _apiKey;
    private readonly string _model;
    public SomeProviderChatClient(string apiKey, string model)
    {
        _apiKey = apiKey;
        _model = model;
    }
    public Task<ChatResponse> GetResponseAsync(IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken cancellationToken = default)
    {
        return Task.FromResult(new ChatResponse(new ChatMessage(ChatRole.Assistant, "stubbed response")));
    }
    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(IEnumerable<ChatMessage> messages, ChatOptions? options = null, [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        yield return new ChatResponseUpdate(ChatRole.Assistant, "stub");
        await Task.CompletedTask;
    }
    public object? GetService(Type serviceType, object? serviceKey = null) => null;
    public void Dispose() { }
}
Operational checklists
Prompt and tool governance
- Keep system prompts short and versioned.
- Write unit tests for tool argument validation.
- Return compact, machine-friendly tool results.
Runtime hygiene
- Propagate cancellation tokens everywhere.
- Cap tokens, set timeouts, and rate-limit.
- Emit structured logs with model name, latency, token usage, and tool calls.
Change management
- Introduce new models behind flags.
- Canary releases for high-traffic endpoints.
- Snapshot tests for structured output contracts.
Common pitfalls and remedies
- Tight coupling to one SDK: Always code to IChatClient; put provider specifics behind configuration.
- Prompt sprawl: Centralize prompts, version them, and write tests.
- Unbounded costs: Enforce token limits, cache where safe, and track per-user budgets.
- Opaque failures: Add correlation ids, tag every completion with model/version, and capture diagnostics.
- Overusing free-form text: Prefer structured output for anything you later parse or store.
Where to take it next
Once you’re comfortable with synchronous completions, streaming, tools, and structured output, the next steps are orchestration and evaluation. Orchestration means chaining prompts and tool calls to accomplish multi-step tasks. Evaluation means building feedback loops: telemetry, human ratings where appropriate, and automated checks for regressions in accuracy or format. The abstraction layer keeps these concerns modular: orchestration logic depends on IChatClient, while evaluation consumes your logs and typed outputs.
Conclusion
Microsoft.Extensions.AI gives .NET teams a pragmatic way to build AI features that are portable, testable, and production-friendly. You define prompts and tools as first-class code, ask for typed outputs when you need guarantees, and keep the rest of your application clean of provider-specific details. Start by wrapping one feature with the abstraction, add streaming and a tool where they add value, and adopt structured output for any response you plan to parse or store. As requirements grow, the same interfaces scale with you.
