The full index

All 306 Patterns

Every pattern in the catalog, by volume — name, intent, and a link straight to the full entry. Use ⌘K to search across all of them.

Augmented LLM Wrap a base model with retrieval, tools, and memory before composing it into anything else. Prompt Chaining Decompose a task into a fixed sequence of LLM steps, where each step consumes the output of the previous one. Routing Classify the input and dispatch it to a specialized downstream handler. Cascade / Fallback Try a cheaper or faster model first; escalate to a stronger one only when the cheaper attempt is unconfident or fails verification. Parallelization — Sectioning Split a task into independent subtasks, run them concurrently, and aggregate. Parallelization — Voting Run the same task N times and combine the results to get a more confident answer. Map-Reduce over Corpus Apply the same LLM operation to each document in a corpus in parallel, then combine the results in a final reduce call. Orchestrator–Workers A central LLM dynamically decomposes the task into subtasks, delegates each to a worker LLM, and synthesizes their outputs. Hybrid Pipeline Compose a workflow as a sequence of stages, where one or more stages internally fan out in parallel. Evaluator–Optimizer One LLM generates a candidate; a second LLM evaluates it and produces feedback; the generator revises. Loop until the evaluator accepts or a budget is exhausted. Generator–Verifier Generate a candidate; check it with a binary Pass/Fail verifier; on Fail, sample again. The verifier emits no feedback; the generator simply produces a fresh sample. Reflexion Evaluator–Optimizer extended with persistent verbal lessons: the agent reflects on each failure in writing, stores the reflection, and consults the store on the next attempt. Autoresearch (Ratchet Loop) Propose a change, run an experiment, measure a metric, keep the change only if the metric improved; otherwise roll back. Repeat indefinitely. ReAct Interleave the model's reasoning ("Thought") with tool calls ("Action") and tool results ("Observation") in a single rolling context until the task is done. Plan-and-Execute Produce a full step-by-step plan up front; then execute each step, often with a smaller and cheaper model. ReWOO Produce a plan with explicit data dependencies, execute the independent steps in parallel without intermediate model invocations, and combine the results in a final synthesis step. Tree-of-Thoughts Generate multiple reasoning branches at each step, evaluate the partial branches, expand the most promising, backtrack when a branch fails. LATS Combine Tree-of-Thoughts (branching search), ReAct (tool use), and Reflexion (verbal lessons) into a unified deliberation loop. Multi-Agent Debate Multiple solver agents independently answer a question; they then exchange and critique each other's answers across one or more rounds; an aggregator synthesizes the final answer. Hierarchical Supervisor A top-level supervisor agent delegates to mid-level supervisors, which in turn delegate to specialist workers, forming a tree of responsibility. Autonomous Agent An LLM operates a tool-use loop with no predetermined number of steps, deciding its own actions, observing the environment, and terminating when it judges the task complete. Human-in-the-Loop Checkpoint Pause an otherwise-autonomous workflow at defined points to obtain human approval, correction, or judgment before proceeding. Deep Research Agent Decompose an open-ended research question into parallel sub-investigations; each sub-investigation runs its own ReAct-style loop over external sources; a synthesizer combines the findings into a long-form report.
skill-creator Guide Claude through creating, validating, and optimizing new SKILL.md files for any domain. claude-api Build, debug, and optimize Claude API and Anthropic SDK applications, including migration between model versions and adoption of features like prompt caching, tool use, thinking, and Managed Agents. Document creation skills — docx / pdf / pptx / xlsx Create, edit, and analyze Microsoft Office and PDF documents from Claude, with full fidelity for tracked changes, comments, formulas, layouts, and binary attachments. frontend-design Instruct Claude to avoid generic "AI slop" aesthetics and make distinctive, considered visual decisions when producing frontend code, especially with React and Tailwind. mcp-builder Guide Claude through producing high-quality MCP servers — the integration layer between Claude (or any MCP-compatible agent) and external APIs or data sources. webapp-testing Test local web applications using Playwright for UI verification, debugging, and visual diffing. anthropics/claude-cookbooks — reference notebooks Provide canonical reference implementations of how to call the Claude API and compose agent workflows. critical-code-reviewer Conduct rigorous, adversarial code reviews that name security holes, lazy patterns, edge-case failures, and bad practices across Python, R, JavaScript/TypeScript, SQL, and front-end code. describe-design Research a codebase and create architectural documentation describing how features or systems work, with Mermaid diagrams and stable code references. testing-r-packages Write idiomatic R package tests using testthat 3+, with the appropriate use of fixtures, snapshots, mocking, and BDD-style describe/it blocks. release-post + create-release-checklist Streamline the release of an R or Python package: produce the changelog-driven release blog post and the per-release operational checklist as a GitHub issue. quarto-alt-text Generate accessible alt text for figures in Quarto documents using Amy Cesal's three-part formula: chart type, data description, key insight. cli (r-lib) Use the cli R package well: semantic messaging, inline markup, progress indicators, theming. workflow-orchestration-patterns Apply the right state-management and reliability patterns when writing or modifying long-running workflow code, specifically targeting Temporal and Saga-style patterns. sql-optimization-patterns Optimize SQL by reading the query plan first — parse EXPLAIN output, identify the actual bottleneck, then make targeted changes, rather than reaching for indexes or rewrites on instinct. stripe-integration + pci-compliance Integrate Stripe (or similar payment processors) with correct webhook validation, idempotent payment routing, and PCI-compliance discipline. stride-analysis-patterns Apply STRIDE threat modeling (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) to an architecture or design. attack-tree-construction Construct an attack tree for a target asset: the attacker's goal as root, the strategies that achieve it as branches, the sub-steps as leaves, with cost/feasibility estimates per path. context-driven-development Apply Context-Driven Development methodology: maintain product context, write specifications, and proceed in phased planning rather than ad-hoc code production. openskills CLI Install, sync, and read SKILL.md folders across any AI coding agent that reads AGENTS.md, with progressive disclosure preserved. n-skills marketplace A small, hand-curated collection of high-quality skills, installable via openskills. denote-core Teach Claude the Denote file-naming convention: timestamp-prefixed filenames with double-dash title separators and double-underscore tag separators, and the frontmatter conventions that go with them. denote-knowledge-graph Build and query a knowledge graph of inter-note links across a Denote vault, with hop-distance queries and centrality reporting. literate Operate org-babel literate-programming features correctly: :tangle for code extraction, :results for inline execution, :session for stateful evaluation. export Export Org-mode notes to Markdown, HTML, PDF, or other formats via pypandoc with the correct flags for the conversion. tdd Enforce strict red-green-refactor TDD with vertical slicing only — one test, one slice of behavior, one piece of implementation at a time. No horizontal slicing. grill-me Force the agent to ask detailed clarifying questions before starting work, surfacing the misalignment between what the user said and what they meant. diagnose Run a structured debugging loop: form a hypothesis about why the bug occurs, design an experiment that would prove or disprove it, run the experiment, iterate. improve-codebase-architecture Audit a codebase for architectural problems — shallow modules, exposed complexity, drift from the ubiquitous language — and recommend targeted refactors. git-guardrails-claude-code Configure Claude Code hooks that block dangerous git commands (push, reset –hard, clean -f, push -f) before they execute. Rosetta MCP server (ims-mcp) Serve instructions, skills, and workflows to any MCP-compatible agent on demand, with progressive disclosure and source-code isolation. rosetta-cli Initialize new repositories with Rosetta's conventions, manage instruction sets locally, and bridge between local-only and centralized deployments. The four-phase workflow (Prepare → Research → Plan → Act) Impose a consistent four-phase structure on agent work: load context, search for guidance, produce a reviewable plan, execute.
web_search Search the live web from inside a Claude turn, returning ranked results with snippets and source attributions. web_fetch Fetch the full contents of a specific URL, returning the text content (markdown-extracted by default) to the model. code_execution Execute Python code in a sandboxed environment with file persistence, network access, and pre-installed scientific libraries. bash_tool Execute shell commands in a Linux environment. Used heavily by Claude Code and by Anthropic-hosted agent containers. text_editor (view / str_replace / create) Read, modify, and create files via a focused editing API that maps naturally to how the model reasons about code changes. computer use Control a computer's screen, keyboard, and mouse as a human would — move the cursor, click, type, take screenshots, scroll. memory tool Give an agent durable cross-conversation memory — write notes, recall them in future sessions, and apply context editing to keep long-running conversations manageable. tool search Let Claude work with hundreds or thousands of tools by dynamically discovering and loading them on demand, instead of loading every tool definition into context upfront. advisor Let a faster, lower-cost executor model consult a higher-intelligence advisor model mid-generation for strategic guidance, without breaking out of the single request boundary. Tavily / Exa / Perplexity / Brave — the search backends Live web search via four distinctively different backends, each surfacing different parts of the web with different trade-offs. Firecrawl Industrial-strength web scraping — search, scrape, crawl, and extract structured data from JavaScript-heavy sites, returning clean markdown. Context7 Pull current, version-pinned documentation for 9,000+ libraries directly into the agent's context, so the model writes code against real APIs rather than from training-cutoff memory. Vector retrieval (Pinecone, Weaviate, embedded stores) Semantic search over a private corpus — the agent's knowledge base, internal documentation, customer support archives, codebases. E2B Open-source secure sandboxes for AI-generated code, running on Firecracker microVMs with sub-second boot times. Modal sandboxes General-purpose serverless compute that scales to AI sandbox workloads: GPUs, large memory, custom Docker images. Daytona / Blaxel — dev-environment sandboxes Sandbox products that double as full development environments, optimizing for long-lived state and fast resume rather than for ephemeral microVMs. Filesystem MCP Secure file operations — read, write, list, search — against a configurable directory tree, with access controls. Git MCP Read, search, and manipulate local Git repositories — status, diff, log, branch, commit, blame. GitHub MCP (vendor-maintained) Access the GitHub API as tools — search repos and code, read PRs and issues, create issues and PRs, comment on threads, get commit info. Anthropic Computer Use (cross-reference) Control a full desktop — mouse, keyboard, screenshots — as the universal automation fallback when no API exists. Playwright MCP Drive a browser through structured DOM operations — navigate, click elements, fill forms, take accessibility snapshots — rather than via screen coordinates. browser-use Browser automation built specifically for AI agents, with an agent-loop API rather than a low-level browser-control API. Slack Read and post to Slack channels, search message history, list channels and users, retrieve thread context. Gmail Read and search Gmail messages, draft and send messages, manage labels, search across history. Linear / Jira — issue trackers Manage issues, projects, sprints, and comments in Linear or Jira via natural-language agent interactions. Google Calendar Read, search, create, and modify calendar events; suggest meeting times; respond to invitations. Cloudflare MCP Operate Cloudflare resources — Workers, KV, R2, D1, DNS, Pages, Analytics — from inside an agent. AWS Knowledge Base / AWS MCPs Retrieve from AWS Bedrock Knowledge Bases via MCP; access broader AWS services through the AWS Labs MCP project. Kubernetes / kubectl MCP Inspect cluster state, read pod logs, apply manifests, manage deployments — the standard kubectl operations as agent tools. PostgreSQL MCP Query PostgreSQL databases from inside an agent — schema introspection, table queries, and (with care) limited mutations. DuckDB MCP Analytical SQL over local files (CSV, Parquet, JSON, Excel) or remote sources, with friendly SQL dialect and on-demand DuckDB CLI installation. Anthropic memory tool (cross-reference) Simple key-value memory across sessions, paired with context-editing strategies for long conversations. Mem0 A persistent memory layer for AI agents that automatically extracts, stores, and retrieves user preferences, facts, and conversation history with vector retrieval. Letta A memory-first agent framework with multi-tier memory architecture: core memory always in context, recall memory queried on demand, archival memory long-tail.
LangGraph Build resilient, long-running, stateful language agents as graphs of nodes and conditional edges, with durable execution and human-in-the-loop interrupts as first-class concerns. CrewAI Flows Provide explicit execution paths, conditional branching, and clean state management for event-driven agent workflows — the deterministic counterpart to CrewAI's autonomous, role-based Crews. LlamaIndex Workflows Build event-driven agent workflows as steps that emit and listen for typed events, with the runtime handling dispatch, parallelism, and durability. AutoGen Coordinate multiple agents through asynchronous message passing, with each agent following a conversation pattern (initiator, responder, group chat). Inngest Make long-running, event-driven workflows reliable by abstracting the durability concerns (queue, retries, idempotency, fan-out, scheduling) behind a TypeScript/Python function-style API. Temporal Run long-lived, fault-tolerant workflows across machines and deployments, with deterministic replay as the central guarantee. Restate Combine durable functions, durable communication, and durable state in one engine, with the operational footprint of a single binary rather than a cluster. Composio Agent Orchestrator Spawn parallel AI coding agents (Claude Code, Codex, Aider, OpenCode) on a single repo, each in its own git worktree, with declarative YAML rules for reacting to CI failures, review comments, and PR approvals. GitHub Actions as agent runner Use GitHub Actions' native event triggers (push, pull_request, schedule, workflow_dispatch, repository_dispatch) to invoke AI coding agents on a repository. Background coding agent platforms Provide background coding agents that watch a repository, react to issues / PR comments / mentions, and produce PRs without the team operating the orchestrator themselves. n8n Build event-driven workflows with a visual graph editor and a large library of pre-built integrations, including LLM and agent nodes that work as first-class steps. Zapier (with AI Agents) Connect SaaS apps via "Zaps" (trigger → actions) with a vast connector library, now extended with AI Agents that run multi-step LLM workflows. Pipedream / Activepieces Sit between Zapier (no-code, polished) and n8n (self-hosted, code-friendly): code-first workflow steps with managed infrastructure, full webhook surface, and a generous free tier. AWS EventBridge Route events between AWS services, SaaS apps, and custom applications via declarative pattern rules, with a schema registry that knows the shape of every event source. Apache Kafka Carry millions of events per second across many producers and consumers, with durable retention measured in days or weeks, and the strongest delivery semantics of any of the options in this section. Redis Streams / NATS Provide event streaming and pub/sub with much lower operational footprint than Kafka, suitable for agent systems that don't need Kafka's scale. Schedule and cron triggers Fire events on a recurring schedule or at a specific future time, so agents can run periodic work without an external system to nudge them. File-system and storage triggers Fire events when files or storage objects change, so agents can react to incoming documents without polling. Topic triggers (Microsoft Agent Academy taxonomy) Standardize the inventory of in-conversation events an agent should be able to react to: "user said something," "planner picked a topic," "user has been silent," "plan completed," "typed activity received." Policy and governance triggers Fire events when a permission gate, budget threshold, or governance rule is crossed, so an agent's action can be paused for review, escalated, or routed differently. LangSmith Trace, debug, evaluate, and continuously improve LLM and agent applications, with first-class support for LangGraph and LangChain. AgentOps Provide observability for agents built with any framework — LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, custom — with the agent run as the central unit and an MCP server option for in-IDE integration. Logfire / OpenTelemetry GenAI Use the same observability infrastructure as the rest of the application stack — OpenTelemetry traces, metrics, logs — with conventions that capture agent and LLM call semantics.
AWS Sample Agentic Fabric ("Arbiter") Demonstrate a complete governance-first agent fabric on AWS, with deterministic policy evaluation on every dispatch, dynamically-generated worker agents, and explicit authority modeling. AWS Sample Agentic Platform Demonstrate the operational shape of an agent platform across multiple containerized compute choices — ECS, EKS, Bedrock AgentCore — with documentation structures designed for coding agents to navigate. Microsoft Agent Framework Provide Microsoft's official building blocks for orchestrating and safely deploying production multi-agent workflows on the Azure stack. Azure Agentic Fabric App Sample Demonstrate the dev-container pattern for building agents that operate directly against Microsoft Fabric workloads: lakehouses, semantic models, notebooks, pipelines. E2B (Firecracker microVMs) Provide secure, fast-booting sandboxes for AI-generated code using Firecracker microVMs, suitable for executing untrusted code without compromising the host. GitHub Agentic Workflows substrate Shield enterprise environments from untrusted agent code execution by running agents in attested, kernel-enforced VM runners with hardened communication boundaries. Modal sandboxes (cross-reference) Provide AI-agent sandboxes with the same isolation tier as E2B but with the full Modal compute surface — GPUs, custom images, persistent volumes, scheduled functions. Strata Agent Fabric (Identity Orchestration) Provide an identity security control plane purpose-built for AI agents: discovery, registry, OAuth scope auditing, IDP binding, federated trust across clouds and platforms. SPIFFE / SPIRE workload identity Issue cryptographically-attested, short-lived identities to workloads (containers, VMs, agents) based on attested platform properties, enabling identity-based authentication without long-lived secrets. OAuth-for-agents patterns (PKCE, DCR, RFC 8693) Use existing OAuth 2.x standards — specifically PKCE, Dynamic Client Registration, and Token Exchange — to handle the agent's authentication and authorization story without inventing new protocols. Service mesh for agents (Istio / Linkerd / App Mesh) Use a standard service mesh (Istio, Linkerd, App Mesh, Consul Connect) to provide mTLS, traffic management, and observability for agent-to-agent and agent-to-service communication. Pilot Protocol (purpose-built agent networking) Issue agents encrypted tunnels and virtual addresses so they can communicate across networks and clouds without exposing routable IPs or requiring traditional service-discovery infrastructure. MCP Registry (cross-reference) Index verified, production-grade MCP servers and tools so agents can discover capabilities at runtime. Agent fabric registry (private) Maintain a source of truth for which agents exist in the deployment, their identity bindings, scopes, intent, TTL, audit trail, and risk level. KEDA (Kubernetes Event-driven Autoscaling) Scale Kubernetes deployments up and down in response to external event sources (queue depth, message backlog, HTTP request rate) rather than CPU/memory metrics alone. AWS Bedrock AgentCore Provide a managed runtime for AI agents on AWS with sandbox isolation, identity, observability, and integration with the AWS Bedrock model surface — the equivalent of a managed Kubernetes for agents. Personal AI Infrastructure (Daniel Miessler) Capture a single human's complete context (notes, calendar, communications, search history, browsing) into a unified data substrate that AI agents can operate against, with explicit memory lifecycles, context-priming pipelines, and background scripts that maintain the substrate over time. Fabric Prompts Framework Layer highly reusable AI structural contexts ("patterns") and workflows natively over command-line interfaces, with each pattern shipping as a self-contained markdown file. Awesome Agentic AI (mfornos) Catalog open-source standards, compute specifications, and coordination fabric infrastructures for the agent ecosystem. Awesome Agents (kyrolabs) Index the agent ecosystem with particular attention to networking, communication, and control-plane infrastructure.
LangGraph state and checkpointers (cross-reference) Model agent execution as a graph of nodes operating on a typed state object, with the framework managing state persistence, replay, and human-in-the-loop interrupts. Redis-backed chat history Persist conversation threads in Redis for sub-millisecond reads and writes, with TTL-based expiration and the option to back it with disk-persistent Redis variants for durability. Chroma Provide a vector database that runs in-process inside a Python application for fast prototyping, with a path to a client/server deployment when scale demands it. Qdrant Provide a production-grade vector database written in Rust, with distributed deployment, rich payload filtering, hybrid search, and the operational maturity to back agent memory at scale. Pinecone Provide vector search as a pure SaaS, with no operational responsibility for the deployer: API key in, vectors in, queries out, with serverless scaling and global availability. Weaviate Provide vector search with first-class support for multi-vector objects, multimodal data (text + image + audio embeddings on the same object), and graph-like relationships between objects. Mem0 Provide a memory layer that integrates with any agent framework, automatically extracts and stores user-relevant facts, handles dedup and update across a hybrid storage backend, and exposes a uniform add()/search() API. Letta (formerly MemGPT) Implement memory for LLM agents the way a computer's OS implements memory — with a small fast "main context" (RAM-like), large slower "archival memory" (disk-like), and paging logic that moves entries between tiers as needed. Zep Serve as a fast, scalable long-term memory store for AI assistant apps, with automatic summarization of past conversations to keep context windows clean and a knowledge-graph backend (Graphiti) for relationship-aware retrieval. AutoGen shared group-chat state Coordinate teams of agents through shared conversation transcripts, with structured database backends for persistence and replay. CrewAI memory abstractions Provide built-in, opinionated memory abstractions for multi-agent crews: short-term task memory (current task state), long-term memory (persistent across tasks), entity memory (facts about people, organizations, things). Supabase (Postgres + pgvector + auth + realtime) Provide Postgres plus pgvector plus auth plus real-time subscriptions as a coherent backend, suitable as the unified state-and-memory store for AI agent applications. pgvector (the Postgres extension itself) Add vector data types and similarity search operators to Postgres, so any Postgres deployment can serve as both a relational store and a vector memory store. OpenAI text-embedding-3 and Voyage AI Provide top-tier embedding quality through hosted APIs, with the operational advantages of a fully-managed service. BGE and the open-source embedding ecosystem Provide top-tier-quality embedding models that run on local hardware, with MTEB scores competitive with the closed proprietary leaders. GraphRAG (Microsoft Research) Build a knowledge graph from unstructured text by extracting entities and relationships via LLM, then use the graph structure to support retrieval that pure vector search would miss. Neo4j with LLM-augmented patterns Use the leading graph database as the storage substrate for agent memory, with LLM-driven extraction filling the graph from observations and Cypher queries serving retrieval. MTEB benchmark and leaderboard Rank embedding models on a comprehensive set of retrieval, classification, clustering, and pair classification tasks, providing a single comparable score that approximates real-world embedding quality. Mem0 memory-benchmarks and awesome lists Provide an open-source evaluation suite for agent memory systems plus community-maintained discovery indexes for the broader ecosystem.
LangGraph interrupts (static and dynamic) Let the agent author mark places in the graph where execution should pause for human input, and let the runtime persist the paused state for resumption when input arrives. CrewAI human_input Let the crew author mark specific agents or tasks as requiring human review of the output before the crew advances. AutoGen human_input_mode Configure each AutoGen agent's human-input behavior through one of three modes: ALWAYS (every message requires human review), NEVER (fully autonomous), TERMINATE (human input only when the conversation reaches a terminating state). Temporal (Signals for HITL) Provide durable execution for long-running workflows, with Signals as the mechanism by which external events (including human approvals) deliver into a running workflow regardless of how many process restarts have happened. AWS Step Functions wait-for-callback Provide AWS-native durable workflow orchestration, with the task-token callback pattern as the mechanism for pausing a workflow until an external event (including human approval) triggers resumption. LangSmith Provide the observability story for LangChain and LangGraph applications: granular nested traces of every agent run, dataset-based evaluation, prompt versioning, human-feedback capture, and the audit surface for production agent systems. Arize Phoenix Provide LangSmith-class observability features as open-source software, with OpenTelemetry as the trace transport and the option to run entirely on-premises or in a local notebook. Langfuse Provide an open-source observability and evaluation platform with first-class human-feedback capture (thumbs up/down, custom scores), prompt management, and dataset-based evaluation, deployable as self-hosted or as Langfuse Cloud. OpenInference Define a vendor-neutral standard for representing LLM calls, tool calls, agent steps, and retrieval operations as structured trace spans, with attribute conventions that make the resulting traces portable across observability backends. OpenTelemetry GenAI semantic conventions Extend OpenTelemetry's semantic conventions with LLM-specific attributes (model name, token counts, prompt and completion contents, agent step types) so that agent telemetry flows through the existing OpenTelemetry infrastructure used for general application observability. Chainlit Make it trivial to build a ChatGPT-style web UI for any Python agent, with first-class support for displaying the agent's internal step-by-step thought process and for human approval of tool calls mid-conversation. Streamlit as agent UI Serve as the agent UI when the broader Streamlit application surface (dashboards, multi-page apps, non-chat UI) is valuable and the chat-component subset is sufficient for the agent interaction. CopilotKit Provide headless React components and hooks that let an agent be added to an existing application as a chat panel, an inline action, or a mid-conversation form, with access to the host application's state. Vercel AI SDK + ai-sdk/react Provide a TypeScript SDK that abstracts over LLM providers, supports streaming, type-safe tool calls, and structured outputs, with React hooks (ai-sdk/react) for building chat UIs against the SDK's backend primitives. Anthropic Artifacts (as HITL surface) Render generated UI artifacts — HTML, React components, SVG, Mermaid diagrams, documents — inline in the Claude.ai conversation as interactive panels separate from the chat stream. Vercel AI SDK streaming UI Stream React Server Components from the agent to the React client, so the agent can render arbitrary custom UI components per turn rather than communicating only through text or pre-defined elements. Anthropic API tool_use ↔ tool_result loop Use the tool_use / tool_result handshake as the foundational pause-and-resume mechanism: when the model emits a tool_use, the application has full control over when and how to return a tool_result, including indefinite delays for human approval.
NVIDIA NeMo Guardrails Provide a programmable execution engine that enforces hard boundaries around conversation topics, prevents prompt injection, blocks toxic content, and steers conversations through canonical dialog flows defined in a domain-specific language called Colang. Guardrails AI Provide a Python framework where input and output validation is composed from reusable validators (Pydantic-style schema checks, PII detection, hallucination detection, profanity filtering, structural constraints), executed against LLM I/O at runtime. Guardrails Hub Provide a community-curated marketplace of plug-and-play validators (PII masking, hallucination detection, profanity filtering, topic adherence, jailbreak detection, and many more) that Guardrails AI applications can install and compose by name. Meta Llama Guard Provide a fast, classification-focused safety model that classifies inputs and outputs as safe or unsafe across a defined taxonomy of harm categories, deployable in front of a main agent LLM at sub-100ms latency. Managed classifier guardrails (Azure, AWS, Anthropic) Provide the Llama Guard pattern as a managed service: an API endpoint that classifies content against a vendor-defined safety taxonomy, with no model weights to download or infrastructure to run. DeepEval Provide a Python evaluation framework that feels like pytest, with assertions over LLM-specific metrics (hallucination, faithfulness, contextual relevancy, tool correctness, bias, toxicity), runnable in CI/CD pipelines, with first-class support for tracking metrics over time and detecting regressions. Ragas Provide the canonical evaluation framework for retrieval-augmented generation systems, with RAG-specific metrics (faithfulness, answer relevancy, context precision, context recall, answer correctness) computed correctly and with thoughtful handling of the multi-component nature of RAG pipelines. Promptfoo Provide a CLI tool and library for evaluating LLM outputs through side-by-side comparison of prompts, models, and parameters, with red-teaming capabilities for security testing and a web UI for inspecting results. OpenAI Evals Provide an open framework for creating, sharing, and running LLM evaluations, with a registry of community-contributed evaluations covering benchmarks, behavioral checks, and capability tests. HELM (Holistic Evaluation of Language Models) Provide a comprehensive, methodologically rigorous evaluation framework that compares language models across dozens of scenarios and metrics, with public leaderboards tracking model performance over time. LangSmith Evaluators Provide built-in evaluation capabilities tied to LangSmith traces: capture production runs into datasets, run evaluators against them, track metric trends over time, and integrate with the same UI that surfaces traces and prompt versions. Phoenix Evals (Arize) Provide LLM evaluation capabilities tightly integrated with Phoenix's trace UI, with built-in evaluators (hallucination, relevance, toxicity, code QA) and the workflow of capturing traces, running evals against them, and inspecting failures in the same notebook or self-hosted UI. NVIDIA Garak Provide a vulnerability scanner for LLM endpoints, with a catalog of probes that test for specific failure modes (prompt injection, jailbreaks, data leakage, encoding attacks, toxicity, malicious code generation), runnable from the CLI with structured reports. Microsoft PyRIT Provide a Python framework for composing and running adversarial attacks against LLM systems, with built-in attack strategies (jailbreaks, prompt injections, prompt converters), pluggable target endpoints, and scoring mechanisms to identify successful attacks. OWASP Top 10 for LLM Applications (2025) Identify the ten most critical security risks in LLM applications, with descriptions, example attack scenarios, and prevention strategies, updated annually based on community input and real-world incidents. NIST AI Risk Management Framework Provide a framework for organizations to manage AI risk throughout the AI lifecycle, organized around four core functions (Govern, Map, Measure, Manage) with detailed profiles for sectors (Generative AI Profile, July 2024) and use cases. MITRE ATLAS Catalog the tactics and techniques adversaries use against AI systems, in the same MITRE ATT&CK style used for general cybersecurity, providing security teams with structured threat intelligence specific to AI. Benchmark leaderboards and awesome lists Provide pointers to the current state of LLM evaluation, model performance, and the security ecosystem, updated by communities of practice rather than by single vendors.
MCP — Model Context Protocol Provide a JSON-RPC-based protocol for connecting LLM agents to external tools, data sources, and (increasingly) other agents, with a standardized server interface that any compatible client can consume. A2A — Agent-to-Agent Protocol Provide a JSON-over-HTTP protocol designed specifically for agent-to-agent communication, with first-class agent discovery (via Agent Cards), capability advertisement, task delegation with streaming, and a well-defined task lifecycle. ACP — Agent Communication Protocol (AGNTCY) Provide an alternative to A2A under Linux Foundation governance, with a multi-vendor consortium (Cisco, IBM, LangChain, Galileo) developing both the wire protocol and the surrounding ecosystem services (directory, identity, observability standards). LangGraph (multi-agent) Provide a state-graph framework where multi-agent systems are designed as explicit graphs with named agent nodes, typed state, and named transitions, with built-in patterns for supervisor (one manager dispatches to workers), hierarchical (managers of managers), and network (peer-to-peer) topologies. AutoGen / AG2 Provide a framework where multi-agent systems are designed as conversations among agents, with configurable conversation patterns (sequential, group chat, hierarchical), termination conditions, and the ability to mix LLM agents, code-executor agents, and human-input agents in the same conversation. CrewAI Provide a framework where multi-agent systems are designed as crews of role-defined agents executing structured tasks, with explicit roles, goals, and backstories per agent and clear task definitions including expected outputs. OpenAI Agents SDK Provide a lightweight framework where multi-agent systems are designed as networks of agents connected by hand-offs (agent-to-agent transitions implemented as tool calls), with the framework handling routing and the agents themselves being simple functions. Claude Agent SDK — subagents Provide a primitive where a main Claude agent can spawn specialized subagents — sub-instances of Claude with focused prompts and constrained tool sets — to handle delegated sub-tasks, with the subagent's context isolated from the main agent's and its output returned as a structured result. LlamaIndex Workflows Provide an event-driven workflow framework where steps (which may include agent invocations) react to typed events and emit new events, with the workflow structure emerging from the event flow rather than being explicitly graphed. Microsoft Magentic-One Provide a research-grade implementation of the manager-plus-specialists hierarchical pattern, with concrete specialists for web browsing (WebSurfer), file operations (FileSurfer), code execution (Coder, ComputerTerminal), and an Orchestrator that manages the team. The planner-executor pattern Separate task decomposition from task execution by using one agent (planner) to produce a structured plan and another agent (or set of agents) to execute each step, with the executor optionally able to surface failures back to the planner for replanning. The critic-and-reflection pattern Improve output quality by separating production (a writer or executor agent) from review (a critic agent) and iterating: the critic reviews the producer's output, identifies issues, and the producer revises based on the critique. The shared scratchpad pattern Avoid the hand-off problem by giving all agents read-and-write access to a common state object, so coordination happens through state changes rather than message-passing. The blackboard pattern Provide a structured shared workspace where each agent contributes typed artifacts and agent activation depends on which artifacts are present, supporting opportunistic and emergent coordination patterns. Multi-agent tracing across LangSmith, Phoenix, Langfuse Extend the trace-tree mental model from Volume 7 to multi-agent systems by adding agent identity to spans, correlating cross-agent calls, and capturing inter-agent metrics, with the major observability platforms providing native support through OpenInference's multi-agent attribute conventions. AGNTCY directory and ecosystem resources Provide the cross-organization agent directory (AGNTCY) and the community-curated awesome lists that track the multi-agent ecosystem as it evolves.
OpenSearch Provide a fully open-source search and analytics engine that supports BM25, dense vector retrieval (k-NN), learned sparse retrieval, learning-to-rank, and the orchestration to combine them into hybrid search pipelines. Elasticsearch Provide the original Elasticsearch implementation under Elastic's licensing, with first-class support for the full hybrid retrieval stack: BM25, dense vectors, ELSER learned sparse retrieval, RRF fusion, and the broader Elastic Stack (Kibana, Logstash, Beats, observability) for operational visibility. Vespa Provide a search platform with deeper programmability than Elasticsearch or OpenSearch: tensor-native data model, phased ranking with custom expressions at each phase, first-class support for late-interaction (ColBERT-style) retrieval, and the operational maturity of having run Yahoo's personalized news ranking at production scale for over a decade. Proprietary embedding APIs Provide high-quality embedding models as managed API services, eliminating the operational burden of self-hosting embedding model inference at the cost of vendor lock-in and per-token API charges. Open-weights embedding models Provide downloadable embedding model weights that run on local infrastructure, eliminating per-token API charges and external dependencies at the cost of operational responsibility for serving the model. Cross-encoder rerankers Provide cross-encoder rerankers that score query-document pairs with full transformer attention, producing higher-quality ranking over the candidate set returned by first-stage retrieval. ColBERT and late-interaction models Provide a middle ground between bi-encoders (single-vector embeddings, fast but limited representation) and cross-encoders (full cross-attention, expressive but slow): per-token embeddings with MaxSim aggregation, document representations precomputable and indexable, query-time computation lighter than cross-encoders. Unstructured.io Provide a unified document-processing library that handles PDFs, Word documents, HTML, images, emails, presentations, and many other formats, producing a structured element stream (Title, NarrativeText, Table, ListItem, etc.) that downstream chunking and indexing can consume intelligently. LlamaParse and the LLM-driven document parsers Provide LLM-driven document parsing that handles complex layouts — multi-column documents, nested tables, mathematical content, scientific papers, financial filings — with fidelity that heuristic-based parsers struggle to match. LlamaIndex Provide a Python framework whose core abstractions are organized around retrieval: documents become nodes; nodes feed indexes; indexes produce retrievers; retrievers compose into query engines; query engines compose into agents. The framework's opinionation matches the shape of production RAG. LangChain retrievers Provide a retriever abstraction (BaseRetriever) that composes uniformly with the rest of LangChain's ecosystem: any vector store can expose a retriever, retrievers compose via LCEL into chains, the chains plug into agents, and the agents plug into the broader LangChain runtime. Query transformation patterns (implementable across frameworks) Improve retrieval quality by transforming user input before retrieval: cleaning noisy inputs (rewrite), broadening vocabulary (expansion), bridging vocabulary mismatch (HyDE), breaking compound questions (decomposition), or hedging against any single phrasing being wrong (multi-query). Classical IR techniques in the LLM era Cover the classical information retrieval techniques — query parsing, synonym expansion, learned-to-rank reranking, faceted filtering, intent classification — that predate LLM-driven retrieval and remain valuable in domains where structured queries and curated synonyms outperform LLM transformations. Neo4j with vector indexes Represent the corpus as a property graph (nodes, relationships, properties), index node content with vector embeddings via native vector indexes, and retrieve via Cypher queries that combine graph traversal with vector similarity. Microsoft GraphRAG Automatically extract entities and relationships from an unstructured corpus into a knowledge graph using an LLM, identify communities of densely-related entities, produce hierarchical summaries for each community level, and retrieve via the community structure for queries that span the whole corpus rather than matching specific chunks. MTEB, BEIR, and benchmark resources Provide standardized measurement of retrieval components — MTEB for embedding models across 58+ tasks, BEIR for retrievers on a curated set of zero-shot retrieval benchmarks, RAGAS for end-to-end RAG quality — with public leaderboards that update as new models and methods are released.
NIST AI Risk Management Framework Provide a structured, voluntary framework for managing risks of AI systems across their lifecycle, organized around four core functions (Govern, Map, Measure, Manage) with detailed playbooks, profiles for specific contexts (Generative AI, Government Use), and crosswalks to other frameworks. ISO/IEC 42001 Provide an international standard specifying requirements for establishing, implementing, maintaining, and continually improving an AI management system within an organization, structured to align with other ISO management system standards (27001, 9001) and certifiable through accredited third-party audit. EU AI Act (Regulation 2024/1689) Provide a binding legal framework for AI systems placed on the market or put into service in the EU (or whose output is used in the EU), with risk-tiered obligations scaling from prohibited practices to minimal obligations, ex-ante conformity assessment for high-risk systems, post-market monitoring requirements, and significant fines for non-compliance. GPAI Code of Practice Provide voluntary implementation guidance for providers of General-Purpose AI models to demonstrate compliance with Articles 53-55 of the EU AI Act, developed through a multi-stakeholder process under the AI Office's coordination per Article 56 of the Act. Colorado: SB 24-205 and its SB 189 replacement Originally to impose comprehensive risk-management and impact-assessment obligations on developers and deployers of high-risk AI systems making consequential decisions affecting Colorado residents (SB 24-205); replaced in May 2026 by a narrower disclosure-and-consumer-rights framework (SB 189) ahead of the original law's effective date. NYC AEDT, California AB-2013, and other state AI laws Cover the patchwork of US state and local AI laws that, taken together, form the working US state regulatory environment for AI in specific domains (employment, transparency, training data, generative AI disclosures). FDA AI/ML guidance for medical devices Establish how the FDA regulates AI/ML-enabled medical devices throughout their lifecycle, with specific guidance on premarket submission, software changes (Predetermined Change Control Plans), and post-market monitoring of AI/ML medical devices. Financial services Model Risk Management (SR 11-7 and successors) Establish supervisory expectations for managing risk associated with quantitative models used by financial institutions, originally focused on credit and market risk models but extended through subsequent guidance to AI/ML models. The framework predates modern AI by over a decade but its discipline (model development, validation, ongoing monitoring, governance) maps onto AI/ML model management directly. HIPAA and AI in healthcare Apply existing HIPAA Privacy and Security Rule obligations to AI/ML systems that process protected health information (PHI), with additional considerations addressing AI-specific risks (training data PHI exposure, model memorization, output PHI disclosure). China generative AI measures and related frameworks Regulate generative AI services, algorithmic recommendation systems, and deep synthesis (synthetic media) provided to users in mainland China through specific, binding obligations enforced by the Cyberspace Administration of China and other regulators. UK, Singapore, Japan, and other emerging frameworks Cover the collection of national AI governance frameworks outside the EU, US, and China that shape industry practice in their markets through voluntary frameworks, sectoral guidance, or developing legislation. Model Cards and Data Sheets Provide standardized documentation formats summarizing what AI models do (Model Cards) and what data they were trained on (Data Sheets), with enough structure to be machine-readable and enough flexibility to apply across model types. AI Bills of Materials (AIBOM) Provide a structured inventory of components, dependencies, and provenance information for AI systems, modeled on Software Bills of Materials (SBOMs) used in software supply-chain security, adapted to capture AI-specific components (foundation models, fine-tuning data, embedding models, prompts as code). SOC 2 with AI considerations and emerging AI assurance services Provide audit-attested evidence of AI governance practices through SOC 2 reports with AI-specific Trust Services Criteria considerations or through standalone AI assurance services, in formats that enterprise procurement accepts as third-party validation. Regulatory tracking resources and discovery infrastructure Provide pointers to the tracking infrastructure that monitors AI regulatory developments across jurisdictions, sectors, and frameworks, updated continuously by communities of practice, law firms, advocacy organizations, and government agencies.
Workload identity systems (SPIFFE/SPIRE, cloud-provider patterns) Provide AI agents with cryptographic identity rooted in their deployment context (where they run, what image they run as, what cluster they're in) rather than in human-issued credentials, enabling fine-grained authentication and authorization without long-lived secrets. OAuth-for-agents and human-delegation flows Extend OAuth 2.1 patterns to AI agents acting on behalf of human users, providing scoped, time-limited, revocable delegation with explicit user consent for the specific operations the agent will perform. Policy engines for agent authorization (OPA, Cedar, SpiceDB) Make authorization decisions for AI agent operations through externalized policy engines, with policies written in domain-specific languages, evaluated outside the application code, and auditable independently of the enforcing application. Secrets management adapted for AI agents (Vault, AWS Secrets Manager, Doppler) Provide centralized, audited, controlled storage and distribution of secrets that AI agents need to function, with automatic rotation, fine-grained access control, dynamic secret generation where appropriate, and immutable audit logging. Code execution sandboxes (E2B, Modal, Daytona, Sandbox.do) Provide secure execution environments where AI agents can run arbitrary code (Python, JavaScript, shell commands, file operations) without compromising the host system, with each execution environment ephemeral, network-controlled, and bounded in resources. Browser automation sandboxes (Browserbase, Anthropic Computer Use, Anchor Browser) Provide isolated browser environments where AI agents can interact with web pages (clicking, scrolling, form filling, navigation) with the browser running in a controlled sandbox that bounds what the agent can access, what content can affect the host, and what state persists. MicroVM and container isolation substrates (Firecracker, gVisor, Kata Containers) Provide the substrate technology that production AI sandboxes are built on, with stronger isolation than container runtimes alone and lighter weight than full VMs, suitable for ephemeral per-task execution environments. Model provenance and signing (Sigstore, Hugging Face security) Apply software supply chain provenance disciplines (signing, verification, attestation) to AI model artifacts — foundation model weights, fine-tuned models, training datasets — producing verifiable evidence of where models came from and what was done to them. Dependency scanning adapted for AI (Snyk, Aikido, Garak indirect) Extend dependency scanning beyond traditional software components to cover the AI-specific stack: model files, embedding models, RAG components, agent frameworks, prompt templates, and the broader AI dependency surface where vulnerabilities may originate from data-handling rather than code execution. Immutable audit trails and SIEM integration for AI agents Provide tamper-resistant, regulator-grade audit logs for AI agent activity, integrated with SIEM platforms where security operations already work, capturing both AI-specific events (prompt injection attempts, jailbreak attempts, unusual tool-call patterns) and standard operational events (authentication, authorization decisions, data access). MITRE ATLAS Provide a comprehensive, structured catalog of adversarial techniques against AI systems with documented examples, mitigations, and detection strategies, modeled on the highly-adopted MITRE ATT&CK framework for general cybersecurity. OWASP AI Security and Privacy Guide and AI threat modeling patterns Cover the OWASP AI Security and Privacy Guide as a complementary resource to MITRE ATLAS, and the broader landscape of AI threat modeling patterns including STRIDE adaptations for AI systems. AI security communities and tracking resources Provide pointers to the active AI security communities and tracking resources that document new threats, defenses, and best practices as the field evolves, updated continuously by researchers, practitioners, and vendor security teams.
Token streaming UX Reduce perceived latency for agent responses by streaming tokens to the UI as they generate, producing the feeling of an agent typing in real time rather than waiting silently and then dumping a complete response. Intermediate state display (thinking, tool use, retrieval) Show users what the agent is doing during the response generation — not just the final answer, but the reasoning, tool calls, retrieved sources, and other process steps — in a way that supports trust and debugging without overwhelming with cognitive load. Anthropic Artifacts Provide a dedicated UI surface for substantial work products the agent generates — documents, code, interactive HTML, diagrams — displayed alongside the chat where the user iterates on them, distinct from the conversation history where chat-style interaction happens. OpenAI Canvas Provide a dedicated workspace for document and code editing within ChatGPT, with chat-driven and direct-manipulation editing both supported, designed to let users iterate on substantial work products without losing them in chat history. Vercel AI SDK generative UI Enable applications to stream interactive UI components (React, Svelte, Vue, Solid) from server-side AI responses rather than only text, so the agent can return structured interface elements (forms, cards, charts, custom widgets) that the user interacts with directly. Pre-action approval patterns Provide users with visibility into what the agent is about to do, with explicit opportunity to confirm, modify, or cancel before the action executes, scaled to the operation's significance and frequency. Undo-first design Let the agent act on routine operations without explicit approval, with a clear path to undo if the action is wrong. Reduce friction for the common case while preserving the user's control over outcomes. Citation patterns for AI responses Make the sources of agent claims visible and verifiable, so users can check the agent's work, understand which claims are supported by retrieved sources vs. model knowledge, and dig deeper into specific sources when needed. Confidence and uncertainty display Communicate to users when the agent is uncertain about claims, decisions, or actions, so users can apply appropriate skepticism and verify or override as needed. Status and progress UX for long-running agent tasks Communicate progress on tasks that take longer than reasonable synchronous wait times, in ways that let users monitor without forcing constant attention and that surface relevant state without overwhelming with raw trace data. Notification and resume patterns for sessions Let users start agent tasks and return to them later — across sessions, across devices, after the task completes — with the agent's state preserved and the user's context restored. Vercel AI SDK UI primitives Provide React, Vue, Svelte, and Solid hooks and components for building AI applications, with first-class support for streaming, tool calling, generative UI, and multi-turn conversations. assistant-ui Provide a polished, batteries-included React component library for building AI chat interfaces, with thread management, attachments, tool calls, message editing, and many other features that production chat UIs need. AI Elements (shadcn-style) and CopilotKit Cover two complementary substrates: AI Elements as shadcn-style composable AI components for teams already using shadcn, and CopilotKit as a framework for embedding AI inside existing applications rather than as standalone chat. Multi-agent UX patterns and visualization Provide users with visibility into multi-agent systems — what each agent is doing, how they're coordinating, where intervention is possible — and intervention surfaces appropriate to the system's complexity. AI UX design resources and community tracking Provide pointers to the active sources of AI UX design knowledge: vendor documentation of design decisions, design publications covering AI specifically, conferences where AI UX patterns are presented, and the broader design community as it engages with AI.
Foundation model providers Provide the foundation models on which all agent products and frameworks depend, with characteristic positioning, capabilities, pricing, and deployment options. Provider-native agent offerings Provide turnkey agent capabilities on top of foundation models, with deep integration that third-party products typically can't match, covering coding (Claude Code, Codex, Gemini Code Assist), computer use (Anthropic Computer Use, OpenAI Operator), and general assistance (ChatGPT, Claude.ai, Gemini). Claude Code, Cursor, Windsurf, GitHub Copilot — the supervised pair-programming tier Provide coding assistance that augments developer productivity through inline suggestions, multi-file edits, agentic task completion, and codebase-aware reasoning, with the developer reviewing each suggestion or task in real-time supervision mode. Devin, OpenAI Codex, Replit Agent — the autonomous async tier Provide async coding capability where the developer delegates entire tasks (a ticket, a bug fix, a feature implementation) and the agent plans, implements, tests, and submits a PR with minimal supervision, fitting workflow patterns where the alternative would be a junior engineer. Open-source coding agents (Aider, Cline, Continue, OpenCode) Provide credible open-source alternatives to commercial coding agents for teams or individuals who prioritize cost, inspectability, or customization over commercial polish. LangChain and LangGraph Provide a comprehensive Python and TypeScript framework for building LLM-powered applications and agents, with LangChain as the broader application framework and LangGraph as the specific orchestration layer for multi-step agent workflows. OpenAI Agents SDK and Anthropic Agent SDK Provide first-party agent frameworks from foundation model providers, with tight integration with the provider's underlying API capabilities and updates aligned with model releases. AutoGen, CrewAI, PydanticAI, Vercel AI SDK — the alternatives Cover the agent frameworks that hold meaningful positions in specific use cases or for specific developer preferences, alongside the dominant LangChain and provider-native options. Customer support agents (Decagon, Sierra, Intercom Fin) Automate customer support interactions through AI agents that handle inquiries autonomously, assist human agents with information retrieval and response drafting, and integrate with existing customer support infrastructure (Zendesk, Salesforce, custom ticketing). Sales and marketing agents (11x, Clay, Apollo AI) Automate sales and marketing workflows through AI agents that handle prospecting (identifying and researching leads), personalized outreach (drafting and sending emails at scale), and pipeline operations (qualifying leads, scheduling, follow-up). Legal, research, and other vertical agents (Harvey, Hebbia, Glean, Perplexity) Cover three additional vertical categories with established players: legal AI dominated by Harvey for large-firm work, enterprise research dominated by Hebbia and Glean for internal knowledge synthesis, and Perplexity for consumer and enterprise research with web-search emphasis. Anthropic Computer Use, OpenAI Operator, and Browserbase Enable AI agents to operate computers (clicking, typing, scrolling, navigating, executing tasks in arbitrary applications) and browsers (visiting URLs, filling forms, extracting data, interacting with web applications), with appropriate sandboxing and human oversight. Consumer AI assistants (ChatGPT, Claude.ai, Gemini, Perplexity, Grok) Provide consumer-facing AI assistants for general-purpose use — questions, writing assistance, coding help, research, creative work — with each product taking a distinct positioning on capability, integration, and content policy. Agent observability platforms (LangSmith, Phoenix, Langfuse, Helicone, Braintrust, Galileo) Provide the operational substrate for production agents: tracing of agent behavior, evaluation pipelines, prompt management, cost tracking, performance monitoring, and the operational tooling that distinguishes production deployments from prototypes. Tracking the agent product landscape Provide pointers to the tracking infrastructure that documents agent product developments — new products, acquisitions, repositioning, deprecation — with sufficient currency that procurement and architectural decisions reflect the actual state of the market.
System prompt design patterns Design system prompts that produce reliable agent behavior across the range of user inputs by establishing role, capabilities, constraints, and output guidelines in a structure the model handles well. Few-shot and N-shot prompting patterns Improve model performance on specific tasks by including 1–N examples of desired input→output pairs in the prompt, allowing the model to infer patterns from the examples rather than relying on instruction-following alone. Chain-of-thought and reasoning patterns Improve performance on complex reasoning tasks by prompting the model to think step-by-step before producing the final answer, and understand how reasoning-trained models have shifted the technique's applicability. Context selection and ordering at production scale Determine what content goes into the context window and in what order to maximize the model's ability to use it effectively, recognizing that larger context windows have not eliminated the need for this engineering. Context compression patterns Compress context content when the relevant material exceeds the available context budget, using summarization, extraction, hierarchical retrieval, or compression-specific models, with explicit awareness of the trade-offs each compression approach introduces. Structured output prompting and vendor features Get reliable, parseable structured output (JSON, XML, custom formats) from LLMs using vendor-provided structured output features where available, falling back to prompt engineering and validation patterns where they aren't. Prompt registries and versioning (LangChain Hub, PromptLayer, vendor playgrounds) Treat prompts as versioned artifacts with explicit history, rollback capability, and lifecycle management, using prompt registry infrastructure that separates prompts from application code while maintaining the connection between specific prompt versions and specific deployment versions. Prompt testing and evaluation patterns Apply systematic testing discipline to prompts: verify specific behaviors against test cases, detect regressions when prompts or models change, compare prompt variants quantitatively for A/B testing decisions. Model-specific prompting conventions across major providers Use the prompting conventions each major foundation model handles best, recognizing that cross-model portability is partial and production deployments need to be aware of which conventions they're using. Meta-prompting and prompt optimization patterns Use foundation models to generate, optimize, or refine prompts for other AI tasks, leveraging the model's knowledge of effective prompt patterns to accelerate prompt development. Resources for tracking prompting discipline Provide pointers to the active sources of prompting discipline knowledge: vendor documentation that captures current recommendations, practitioner publications that capture working knowledge, and academic literature that captures emerging techniques.
Per-task model selection patterns Match each subtask in an agent to the foundation model that best fits its specific requirements — capability, cost, latency, context window — rather than defaulting to one model for all subtasks. Prompt caching across providers (Anthropic, OpenAI, Google) Reduce inference costs and latency by reusing previously-processed prompt content across requests, using provider-native caching features that bill cached content at substantially reduced rates. Multi-model routing for cost Reduce average cost per request by routing easy tasks to cheaper models and reserving expensive models for tasks that genuinely need them, with a routing layer that classifies requests and selects the appropriate model. Latency engineering patterns for agent deployments Reduce the latency of agent responses through streaming (improves perceived latency), parallelization (reduces total latency for multi-step work), and batching (improves throughput for high-volume use cases). Multi-model orchestration patterns Implement the per-task model selection from Section A as a coherent architectural pattern: routing logic determines which model handles which subtask, fallback chains handle failures, and the orchestration layer abstracts these decisions from the agent's core logic. Fine-tuning techniques and when to use each Choose the right fine-tuning technique for the specific use case when fine-tuning is genuinely the right answer, recognizing that different techniques have different cost, complexity, and outcome profiles. Self-hosting vs. API economics Evaluate whether self-hosting open-weight models or consuming cloud-hosted foundation model APIs is more appropriate for a specific deployment, considering volume economics, deployment constraints, capability requirements, and operational capacity. Model versioning and migration patterns Handle the recurring cycle of model changes — new versions, deprecations, behavior drift — with patterns that minimize production disruption and maintain quality as models evolve. Resources for tracking LLMOps practice Provide pointers to the active sources of LLMOps practitioner knowledge: vendor documentation, practitioner blogs, conference talks, and adjacent communities.
TCO modeling for AI deployments Build a working model of all costs attributable to an AI deployment — inference, infrastructure, people, data, migration, risk — so that strategic decisions reflect the full cost picture rather than only the visible inference bill. Procurement strategy patterns for foundation model providers Move from commodity pay-as-you-go consumption to strategic procurement relationships when deployment scale justifies it, capturing volume discounts, negotiated terms, and provider commitments that improve TCO and reduce strategic risk. Build-vs-buy decision framework for AI stack layers Make conscious build-vs-buy decisions at each layer of the AI stack — application, agent logic, framework, foundation model, infrastructure — with rationale based on strategic value, differentiation potential, and capability fit rather than uniform defaulting. Modeling the latency-cost-quality frontier Make the latency-cost-quality trade-off explicit through modeling rather than implicit through engineering intuition, so deployment decisions about model selection, routing, and infrastructure reflect the actual frontier rather than assumed trade-offs. Performance benchmarking methodology for AI procurement Build benchmarks that answer procurement questions — which model represents best value for this use case, what are the strategic risks of each provider, how do the alternatives compare across all relevant dimensions — rather than benchmarks that only optimize engineering parameters. FinOps for AI — the organizational practice Apply FinOps practice to AI deployments — cross-functional accountability for cost, continuous visibility into spending, optimization driven by business value rather than engineering preference — with the discipline adapted for AI-specific cost patterns. Capacity planning patterns for AI workloads Provision AI capacity — API rate limits, reserved capacity, infrastructure for self-hosted models, GPU allocations — in ways that handle realistic demand variability without over-provisioning waste or under-provisioning failures. Tracking AI economics and FinOps discipline Provide pointers to the active sources of AI economics and FinOps knowledge as the discipline matures.