
Volume 17

The AI Agent Economics Catalog

Volume 17 of the Agentic AI Series

8 patterns draft-v0.1 2026-05 Non-Functional Concerns

The Cost and Performance Economics Catalog

A Catalog of Economic Discipline for Agentic AI --- TCO, Procurement, Capacity, FinOps

Draft v0.1

May 2026


About This Catalog

This is the seventeenth volume in a catalog of the working vocabulary of agentic AI, and the third “weaker candidate” after Volume 15 (Prompting and Context Engineering) and Volume 16 (LLMOps and Model Lifecycle). The honest framing matters even more this time. Volume 16’s Appendix F explicitly identified “cost engineering as a deeper standalone treatment” as a potential future weaker candidate and warned that the bar should rise for additional volumes. This volume is precisely that weaker candidate, written immediately after the warning. The honest question this volume must answer in its own front matter: does the content clear the higher bar Volume 16 set, or is this volume the pattern’s breaking point?

The answer hinges on whether the distinction from Volume 16 holds up substantively. Volume 16 covered cost as engineering discipline: the techniques agent engineers apply to reduce inference costs (token reduction, caching, routing) and the operational patterns around model lifecycle. This volume covers cost as economic discipline: the strategic questions that engineering leaders, FinOps practitioners, and procurement professionals face when AI deployments reach the scale where economic decisions dominate engineering ones. Total cost of ownership beyond inference. Procurement strategy and committed-spend contracts. Build-vs-buy decisions at each layer of the stack. Latency-cost-quality trade-off frontiers as explicit economic models. Performance benchmarking methodology for procurement rather than for engineering. FinOps practice for AI workloads. Capacity planning under demand uncertainty. The audience is different (leaders, FinOps, procurement vs. engineers); the artifacts are different (TCO models, contracts, capacity plans vs. caching code and routing logic); the discipline is different (economic reasoning vs. engineering implementation).

If that distinction holds up, the volume clears the bar. If the content turns out to overlap Volume 16 substantially despite the framing, the volume is the pattern’s breaking point and the series should not extend further. The reader will judge whether the distinction holds. This volume’s honesty rests on engaging with the question rather than papering over it. The bar test is real, and the volume submits to it explicitly.

Scope

Coverage:

  • Total Cost of Ownership (TCO) for AI deployments: all six components, meaning inference plus the five beyond it (infrastructure, people, data, migration, risk).

  • Procurement strategy and contracts: committed-spend agreements, volume discounts, enterprise contracts, multi-provider strategy from a procurement perspective.

  • Build-vs-buy at each layer: application, agent logic, framework, foundation model, infrastructure. Different answers per layer are normal.

  • Latency-cost-quality trade-off frontiers: modeling the trade-offs explicitly rather than optimizing one dimension blindly.

  • Performance benchmarking methodology: how to evaluate models for procurement, not just for engineering.

  • FinOps for AI: the practice of managing AI costs as an organizational discipline.

  • Capacity planning under demand uncertainty: provisioning for AI workloads with variable demand.

Out of scope (covered in Volume 16):

  • Token reduction techniques. Volume 16 Section B covers.

  • Prompt caching mechanics. Volume 16 Section B covers, with a code example.

  • Multi-model routing patterns for cost. Volume 16 Section B covers.

  • Streaming, parallelization, batching as latency techniques. Volume 16 Section C covers.

  • Self-hosting vs API economics as an engineering decision. Volume 16 Section F covers.

  • Fine-tuning cost trade-offs. Volume 16 Section E covers.

How to read this catalog

Part 1 (“The Narratives”) is conceptual orientation: the bar test this volume must pass, framed by the distinction between engineering and economic discipline; the TCO picture beyond inference; performance as an economic variable; build vs. buy at each layer; capacity planning under uncertainty. Four diagrams sit in Part 1.

Part 2 (“The Substrates”) is reference material organized by section, with a smaller entry count matching the weaker-candidate framing established in Volumes 15 and 16.

Part 1 — The Narratives

Five short essays orient the volume’s economic-discipline framing. The reference entries in Part 2 assume the perspective established here.

Chapter 1. The Bar Test

Volume 16’s Appendix F set a constraint: “Future weaker candidates would need to clear a rising bar: not every adjacent area of practitioner knowledge justifies its own consolidation.” The appendix specifically identified “cost engineering as a deeper standalone treatment” as one of the candidates that would need to clear the higher bar. This volume is exactly that candidate, written immediately after the warning. The bar test is real and the volume must engage with it honestly.

The bar test
Volume 16 covered the engineering discipline of cost. This volume covers the economic discipline. Whether the distinction holds up substantively determines whether the volume clears the bar.

Volume 16 covered cost as engineering discipline. Section B documented the cost engineering triangle (tokens × caching × routing); prompt caching mechanics with code examples for Anthropic’s cache_control; multi-model routing patterns. Section C documented latency engineering. Section E documented fine-tuning cost trade-offs. Section F documented self-hosting vs API economics from an engineering decision perspective. The audience throughout was agent engineers; the artifacts were caching configurations, routing classifiers, fine-tuning pipelines. Everything in Volume 16 was about what engineers do to reduce cost and improve performance.

This volume must cover something different to earn its place. The economic discipline is what engineering leaders, FinOps practitioners, and procurement professionals do when AI deployments reach scale. Total cost of ownership beyond the visible inference line item. Strategic procurement decisions with provider negotiations and committed-spend contracts. Build-vs-buy decisions at each layer of the stack, with different answers per layer. Latency-cost-quality trade-off frontiers as explicit economic models that inform deployment strategy rather than as engineering parameters. Performance benchmarking methodology designed for procurement decisions rather than engineering optimization. FinOps practice that surfaces costs to the organization and creates accountability. Capacity planning under demand uncertainty. The audience differs (engineering leaders, FinOps, procurement vs. engineers); the artifacts differ (TCO models, contracts, capacity plans vs. caching code and routing classifiers); the discipline differs (economic reasoning vs. engineering implementation).

If the distinction is real, the volume clears the bar. If readers find that this volume’s content overlaps Volume 16 substantially despite the framing, the volume is the pattern’s breaking point and the series should not extend further. The honest answer is that the distinction is real but the boundary is fuzzy: economic discipline informs engineering choices and vice versa. The volume’s contribution depends on whether the strategic and organizational dimensions are coherent enough to deserve their own treatment, separately from the engineering techniques Volume 16 covered. The volume submits to that test explicitly and lets the reader judge.

Chapter 2. The Total Cost of Ownership Picture

Inference costs are visible. The bill from Anthropic, OpenAI, or Google shows up monthly with specific numbers. Engineering teams optimize against the visible bill because it’s what they see. The full TCO picture has five additional components that compound, often invisibly, into the actual cost of running AI in production. Surfacing all six components is the first step in economic discipline; teams that optimize only the visible component miss most of the cost.

Total cost of ownership
Six components: inference, infrastructure, people, data, migration, risk. Shares vary by deployment size and industry.

Inference is the visible component: API charges from foundation model providers, or compute costs for self-hosted infrastructure. At production scale, inference typically runs 30–60% of total AI TCO. Engineering teams have learned to optimize this dimension through Volume 16’s engineering disciplines. The optimization is real and important; the mistake is thinking inference optimization is the whole problem.

Infrastructure beyond inference is the second component. Storage for embeddings, training data, eval data, retrieved corpora. Networking costs for data movement between systems. Observability platform subscriptions (LangSmith, Phoenix, Langfuse, Helicone, Braintrust, Galileo; Volume 14 covers the products). Vector databases and search infrastructure (Pinecone, Weaviate, Elastic). Authentication and access control systems for AI-specific use cases. Typically 10–20% of TCO; grows with deployment scale and complexity. The component is harder to optimize than inference because it’s distributed across multiple systems; each subsystem looks small until you total them.

People are the third component and dominate at small scale. Engineers building and operating AI systems. ML operations specialists. Prompt designers and evaluation engineers. Data engineers managing training and eval corpora. Security and compliance specialists for AI-specific concerns. Typically 20–40% of TCO. The share is highest at small scale because team size doesn’t scale linearly with inference volume: a small team can support a large deployment, but every deployment needs some people. The component is the hardest to optimize because the alternative to people is either products (which substitutes one cost for another) or absence (which sacrifices capability).

Data is the fourth component. Training data acquisition and curation for fine-tuning. Eval data construction and maintenance. RAG corpus curation, indexing, and updating. Annotation and labeling for evaluation. Typically 5–15% of TCO; higher for vertical agents with specialized domain corpora. The component is often misunderstood because data costs are spiky (large upfront acquisition costs followed by smaller maintenance costs), and the budget cycle doesn’t match the cost pattern.

Migration is the fifth component and is recurring. Each foundation model update requires eval re-runs, prompt rework, gradual rollout, and monitoring. Migration costs are 5–10% of TCO when modeled as continuous; spiky when modeled per release. Production teams that don’t budget for migration are surprised by the recurring cost; teams that budget for it allocate engineering time to handle it without disrupting feature delivery.

Risk is the sixth component and varies dramatically by industry. Compliance audits and documentation. Security assessments. Incident response capability. Audit trails and governance tooling. Insurance premiums for AI-specific exposures. Typically 5–20% of TCO; the high end is regulated industries (healthcare, financial services, legal) where the regulatory layer has substantial cost. The component is often outside the AI team’s budget but is real AI cost; surfacing it in TCO analysis matters for strategic decisions.

The shares are illustrative, not prescriptive. A specific deployment may have inference at 70% (high-volume consumer agent with little customization), people at 60% (small team building a complex enterprise integration), or risk at 30% (healthcare AI with extensive compliance burden). The economic discipline is modeling the actual TCO for the specific deployment, not assuming the inference bill is the whole cost. FinOps practice (Section F) builds the infrastructure for ongoing visibility into all six components.
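A six-component TCO model can start as nothing more than a table of annual figures. The Python sketch below is a minimal illustration under stated assumptions: the `TCOModel` class and every dollar figure are hypothetical, chosen only to show how the shares fall out once all six components are counted rather than just the inference bill.

```python
from dataclasses import asdict, dataclass


@dataclass
class TCOModel:
    """Hypothetical annual cost per TCO component, in USD."""
    inference: float
    infrastructure: float
    people: float
    data: float
    migration: float
    risk: float

    def total(self) -> float:
        # Sum of all six components, not just the visible inference bill.
        return sum(asdict(self).values())

    def shares(self) -> dict:
        # Each component's fraction of total cost of ownership.
        total = self.total()
        return {name: cost / total for name, cost in asdict(self).items()}


# Illustrative figures only; substitute the deployment's own attributed costs.
model = TCOModel(
    inference=600_000,
    infrastructure=150_000,
    people=450_000,
    data=100_000,
    migration=80_000,
    risk=120_000,
)

for component, share in model.shares().items():
    print(f"{component:>15}: {share:6.1%}")
```

With these made-up numbers the inference bill is 40% of the total, so a team tracking only that line would see well under half of what the deployment actually costs.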

Chapter 3. Performance as Economic Variable

Performance --- capability, latency, throughput --- is often treated as an engineering optimization. The economic discipline treats it as a variable to model alongside cost and quality. The latency-cost-quality frontier is the working model: improvements on any dimension typically cost something on the others; the efficient frontier is the set of trade-offs that aren’t dominated. Decisions about which corner of the frontier to optimize for shape the entire deployment’s economics.

The latency-cost-quality frontier
Three dimensions, three trade-offs. The frontier shifts as foundation models improve. Optimizing for all three simultaneously is the common failure.

Quality, cost, and latency interact. Higher quality typically means more capable (and more expensive) models, which often also have higher latency. Lower cost typically means cheaper models, which often have lower capability or higher variance. Lower latency typically means smaller models or simpler pipelines, which often sacrifice capability. The three-way trade-off is real; what’s changed through 2024–2026 is the speed at which the frontier shifts as foundation models improve.

The frontier shift is the key economic phenomenon. In 2023 and 2024, achieving good agent performance often required the top-tier model of the day. By 2026, much of what required Opus-tier models in 2024 runs adequately on Haiku-tier models or their equivalents at other providers, at a fraction of the cost and latency. The frontier shifts annually as new model releases push the boundary outward; tasks that sat at the edge of the frontier last year are now well inside it. The economic discipline involves re-evaluating each deployment’s position on the frontier on every major model release, not just engineering-optimizing the current position.

The common failure: optimizing for all three dimensions simultaneously. Production teams that demand maximum quality, lowest cost, and lowest latency at once produce confused engineering: routing to capable models for quality, then complaining about cost; using cheap models for cost, then complaining about quality; choosing fast models for latency, then complaining when reasoning suffers. The economic discipline: pick the corner you optimize for; accept what you give up on the others. Different parts of the agent may pick different corners; multi-model orchestration (Volume 16 Section D) implements per-corner choices.

Modeling the frontier explicitly matters. Production teams that draw the actual frontier for their workload --- plotting cost against latency for different quality levels --- make better deployment decisions than teams that intuit the trade-offs. The model doesn’t need to be precise; even rough plotting of the working frontier reveals choices the team should be making but isn’t. The discipline of explicit modeling is more important than the specific model’s accuracy.
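Drawing the frontier can be as simple as filtering out dominated configurations. The Python sketch below assumes each candidate configuration has already been measured on the team's own workload; the `Option` fields, the configuration names, and all numbers are hypothetical. A configuration is dominated when another is no worse on cost, latency, and quality and strictly better on at least one.

```python
from typing import List, NamedTuple


class Option(NamedTuple):
    name: str
    cost_per_1k: float    # USD per 1,000 requests (lower is better)
    p95_latency_s: float  # 95th-percentile latency in seconds (lower is better)
    quality: float        # eval score in [0, 1] (higher is better)


def dominates(a: Option, b: Option) -> bool:
    """True if a is no worse than b on every dimension and strictly better on one."""
    no_worse = (a.cost_per_1k <= b.cost_per_1k
                and a.p95_latency_s <= b.p95_latency_s
                and a.quality >= b.quality)
    strictly_better = (a.cost_per_1k < b.cost_per_1k
                       or a.p95_latency_s < b.p95_latency_s
                       or a.quality > b.quality)
    return no_worse and strictly_better


def efficient_frontier(options: List[Option]) -> List[Option]:
    """Keep only configurations that no other configuration dominates."""
    return [o for o in options
            if not any(dominates(other, o) for other in options)]


# Hypothetical measurements on the team's own workload.
options = [
    Option("large-model", 12.0, 6.0, 0.92),
    Option("mid-model", 3.0, 2.5, 0.88),
    Option("mid-model+long-prompt", 5.0, 3.5, 0.87),
    Option("small-model", 0.8, 1.0, 0.78),
    Option("router", 2.0, 2.0, 0.86),
]
frontier = efficient_frontier(options)
print([o.name for o in frontier])
```

The mid-tier configuration with a longer prompt costs more, runs slower, and scores lower than the plain mid-tier option, so it drops out. Choosing among the four survivors is the "pick your corner" decision described above.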

Chapter 4. Build vs. Buy at Each Layer

Build-vs-buy is a recurring strategic decision in technology procurement; for AI deployments it applies at multiple layers simultaneously, with potentially different answers at each layer. The economic discipline involves making the decisions consciously per layer rather than defaulting to uniformity. Mixed deployments --- buy infrastructure, buy models, build on a framework, build the application --- are the norm in 2026 production deployments.

Build vs buy at each layer
Different answers per layer are normal. Uniform answers are the anti-pattern.

The application layer is usually build. The user-facing product is where differentiation lives; buying a generic application typically means buying a generic outcome. Vertical agent products (Volume 14 Section D --- Decagon, Sierra, Harvey, Hebbia, Glean) are the exception: when a vertical product fits the use case closely, buying is the right answer because the alternative is rebuilding what the product provides. The decision criterion: does this layer represent differentiated value, or undifferentiated commodity work?

Agent logic is usually build, on a framework. The orchestration, tool definitions, prompts, and business logic are where the application’s specifics live; framework leverage (LangChain, OpenAI Agents SDK, Anthropic Agent SDK, Vercel AI SDK, PydanticAI --- Volume 14 Section C) provides the primitives without forcing the specifics. The build-on-framework pattern dominates because pure-build (everything from scratch on the foundation model API) requires too much reinvention, while pure-buy (a turnkey agent product) doesn’t fit most differentiated use cases.

Frameworks themselves are buy: use open source. Building a custom agent framework typically wastes engineering on undifferentiated infrastructure that other people have already built and battle-tested. The exception is large organizations with specific platform requirements that existing frameworks don’t meet; even then, building on an existing framework and extending is usually better than ground-up building.

Foundation models are mostly buy (API), with self-hosting (partial build) for specific volume or sovereignty cases. Volume 16 Section F covers the self-hosting vs. API decision from the engineering perspective; the economic perspective weighs the same decision with different emphasis. API costs scale linearly with usage; self-hosting requires upfront infrastructure investment in exchange for a lower marginal cost per token, so it becomes cheaper only above a breakeven volume. For most deployments, API is the right answer; for high-volume deployments or deployments with sovereignty requirements, partial self-hosting becomes economically rational.
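The API versus self-hosting breakeven is simple arithmetic once costs are expressed per unit of volume. A minimal Python sketch, assuming a blended API price per million tokens and a self-hosted cost split into a fixed monthly component (GPUs, plus the people and migration share that self-hosting adds) and a marginal per-token component; all figures are hypothetical.

```python
def breakeven_volume(api_price_per_m: float,
                     fixed_monthly: float,
                     self_marginal_per_m: float) -> float:
    """Monthly volume (millions of tokens) above which self-hosting is cheaper."""
    if api_price_per_m <= self_marginal_per_m:
        # Self-hosting never recovers its fixed cost.
        return float("inf")
    return fixed_monthly / (api_price_per_m - self_marginal_per_m)


def monthly_costs(volume_m: float,
                  api_price_per_m: float,
                  fixed_monthly: float,
                  self_marginal_per_m: float) -> tuple:
    """Return (API cost, self-hosted cost) for a month at the given volume."""
    api = volume_m * api_price_per_m
    self_hosted = fixed_monthly + volume_m * self_marginal_per_m
    return api, self_hosted


# Hypothetical inputs: $5 per million tokens via API; $40,000/month fixed
# and $1 per million tokens marginal when self-hosted.
print(breakeven_volume(5.0, 40_000, 1.0))        # 10000.0 million tokens/month
print(monthly_costs(5_000, 5.0, 40_000, 1.0))    # below breakeven: API cheaper
print(monthly_costs(20_000, 5.0, 40_000, 1.0))   # above breakeven: self-hosting cheaper
```

The economic version of the decision differs from the engineering one mainly in what goes into `fixed_monthly`: counting only GPU rental understates it, which pulls the apparent breakeven down.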

Infrastructure is buy, from cloud providers. Specialized AI infrastructure providers (Anyscale, RunPod, CoreWeave, Together, others) compete with hyperscalers (AWS, GCP, Azure) on specific dimensions: specialized GPU availability, pricing, AI-specific tooling. The choice among providers is itself a build-vs-buy decision at finer granularity. Building your own data center infrastructure for AI is rarely economically rational outside the very largest organizations.

The decisions compose. A typical production deployment: build the application (custom for the use case), build the agent logic on a framework (custom orchestration with framework leverage), buy the framework (open source), buy the foundation model (API with some self-hosting for specific workloads), buy the infrastructure (cloud). The compositional pattern reflects the reality: differentiated layers get built; commodity layers get bought; the boundary between them is where the strategic decisions concentrate.

The anti-patterns are uniform answers. “We build everything” wastes engineering on undifferentiated layers where buying would be cheaper and faster; the team ends up with mediocre versions of what better-resourced teams have built professionally. “We buy everything” leaves no defensible differentiation; the application becomes a thin wrapper around purchased components that competitors can equally easily wrap. The discipline is matching the build-vs-buy decision to each layer’s strategic importance, not imposing uniformity.

Chapter 5. Capacity Planning Under Uncertainty

Traditional capacity planning is hard. Capacity planning for AI deployments is harder. Demand is variable in ways that don’t match traditional software workloads: AI usage often correlates with user-driven activity in unpredictable ways; new agent capabilities can trigger order-of-magnitude usage shifts when users discover them; foundation model rate limits and pricing tiers introduce step functions in cost vs. volume curves. The economic discipline involves planning explicitly for the uncertainty rather than assuming the past predicts the future.

The basic capacity planning question: how much capacity to provision, when. For API-based deployments, this is about reserved capacity and committed-spend contracts. For self-hosted deployments, this is about infrastructure provisioning and GPU capacity. For mixed deployments, both questions apply. The wrong answer in either direction has costs: over-provisioning wastes money; under-provisioning produces user-facing failures and emergency scrambles.

Volatility patterns for AI workloads. Demand follows user activity patterns (business hours, time zones, seasonal patterns) similarly to other software, but with characteristic AI-specific features. New capabilities trigger usage spikes when users discover them; a feature launch can produce 10x usage in days. Reasoning-heavy tasks have variable inference time; a request that took 5 seconds yesterday may take 30 seconds today depending on what the model decides to reason about. Multi-step agent workflows compound variability: cost and latency accumulate across steps, and the number of steps itself varies from request to request.

Planning patterns. Buffer for variance (provision for the 95th or 99th percentile of demand, not the average). Reserved capacity for baseline (committed-spend agreements with providers reduce per-token costs in exchange for usage commitments). On-demand capacity for surges (pay a premium for the surge tier rather than provisioning permanent capacity for occasional spikes). Multi-provider strategy for diversification (single-provider deployments are exposed to single-provider rate limits and outages; multi-provider provides natural elasticity). Each pattern has trade-offs in cost and complexity; the right combination depends on the workload’s characteristics.
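The buffer and reserved-baseline patterns can be sketched from historical demand samples. This minimal Python example rests on assumptions: the demand figures are invented, the reserved tier is sized at the median, and the provisioned ceiling is the 95th percentile using the nearest-rank method. A real plan would choose those percentiles from the workload's own cost and reliability targets.

```python
import math
import statistics


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def capacity_plan(samples: list, reserved_pct: float = 50, buffer_pct: float = 95) -> dict:
    reserved = percentile(samples, reserved_pct)   # covered by committed spend
    ceiling = percentile(samples, buffer_pct)      # provisioned peak
    return {
        "reserved": reserved,
        "on_demand_headroom": ceiling - reserved,  # bought at surge-tier prices
        "mean": statistics.fmean(samples),
    }


# Hypothetical daily peak demand (requests/second) over 20 days,
# including one launch-day spike.
peak_rps = [40, 42, 45, 41, 39, 44, 43, 46, 40, 41,
            48, 50, 47, 44, 42, 45, 43, 120, 41, 44]

plan = capacity_plan(peak_rps)
print(plan)                      # {'reserved': 43, 'on_demand_headroom': 7, 'mean': 47.25}
print(percentile(peak_rps, 99))  # 120: the spike a surge tier or second provider absorbs
```

Reserving at the mean (47.25) would pay for capacity that sits idle on most days, because the single spike drags the average up; the 99th percentile, meanwhile, is set entirely by that spike. Deciding which slice to buy permanently and which to buy on demand is the economic choice.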

Forecasting AI workload demand is harder than forecasting traditional software demand. Time-series methods (ARIMA, Prophet) work for stable patterns but miss capability-driven shifts. Capacity planning under uncertainty requires explicit scenarios: baseline forecast, growth scenario, surge scenario; capacity plans that handle the range rather than betting on one point estimate. The discipline is humility about forecast accuracy combined with planning that doesn’t require accurate forecasts to succeed.
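A scenario-based plan replaces a single point forecast with a range. The Python sketch below is hypothetical throughout: three scenarios (5% monthly growth, 15% monthly growth, and 5% growth with a 10x capability-driven step in month 3), with commitments sized to the baseline scenario and the gap to the worst scenario in each month treated as headroom for on-demand or multi-provider capacity.

```python
from typing import Dict, List, Optional, Tuple


def project(start: float, monthly_growth: float, months: int,
            step_month: Optional[int] = None, step_factor: float = 1.0) -> List[float]:
    """Monthly demand with compounding growth and an optional one-time step change."""
    demand, level = [], start
    for month in range(months):
        if month == step_month:
            level *= step_factor  # e.g. a feature launch users discover
        demand.append(level)
        level *= 1 + monthly_growth
    return demand


def plan(scenarios: Dict[str, List[float]],
         commit_to: str = "baseline") -> Tuple[List[float], List[float]]:
    """Commit to one scenario; size headroom to the worst case in each month."""
    committed = scenarios[commit_to]
    headroom = [max(s[m] for s in scenarios.values()) - committed[m]
                for m in range(len(committed))]
    return committed, headroom


# Hypothetical demand in millions of tokens per month, six-month horizon.
scenarios = {
    "baseline": project(100, 0.05, 6),
    "growth": project(100, 0.15, 6),
    "surge": project(100, 0.05, 6, step_month=3, step_factor=10),
}
committed, headroom = plan(scenarios)
for month, (c, h) in enumerate(zip(committed, headroom)):
    print(f"month {month}: commit {c:8.1f}  headroom {h:8.1f}")
```

The surge scenario dominates the headroom from month 3 onward. That range is what a multi-provider strategy or an on-demand tier has to absorb, and it is the reason the commitment is not sized to it.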

Part 2 — The Substrates

Eight sections survey the economic discipline of AI cost and performance management as of mid-2026. Entry counts match the weaker-candidate framing established in Volumes 15 and 16.

Sections at a glance

  • Section A --- Total cost of ownership for AI deployments

  • Section B --- Procurement strategy and contracts

  • Section C --- Build-vs-buy at each layer

  • Section D --- Latency-cost-quality frontiers

  • Section E --- Performance benchmarking for procurement

  • Section F --- FinOps for AI

  • Section G --- Capacity planning under uncertainty

  • Section H --- Discovery and resources

Section A — Total cost of ownership for AI deployments

Modeling all six TCO components, not just the inference bill

TCO discipline starts with surfacing the components. The inference bill is visible because it arrives monthly from the foundation model provider. The other five components are distributed across the organization’s budget in ways that obscure their AI-attribution. The economic discipline involves attributing costs to the AI deployment that produces them rather than letting them remain hidden in general infrastructure, headcount, and overhead lines.

TCO modeling for AI deployments

Source: FinOps Foundation guidance adapted for AI; practitioner write-ups on AI cost analysis; vendor calculators (Anthropic, OpenAI, Google) for inference components

Classification: The practice of modeling total cost of ownership for AI deployments across all six components.

Intent

Build a working model of all costs attributable to an AI deployment --- inference, infrastructure, people, data, migration, risk --- so that strategic decisions reflect the full cost picture rather than only the visible inference bill.

Motivating Problem

Engineering teams optimize what they can see. The inference bill is highly visible; the other TCO components are distributed across budgets and harder to attribute. The result: engineering optimization focuses heavily on inference reduction while other cost components grow unchecked. Strategic decisions (build vs buy, fine-tune vs prompt, vendor selection, capacity planning) made on incomplete cost data produce suboptimal outcomes. TCO modeling surfaces the full picture and informs better decisions.

How It Works

Six-component inventory. Start with the six components from Chapter 2: inference, infrastructure, people, data, migration, risk. For each, identify the specific costs attributable to the AI deployment. Some costs are clear (the Anthropic invoice for inference); some require allocation (a fraction of the cloud bill, a fraction of the security team’s time, a fraction of compliance audit costs).

Attribution methodology. Direct attribution where possible: if an engineer spends 80% of their time on the AI deployment, their fully loaded cost × 0.80 is attributable. Allocation where direct attribution isn’t feasible: the security team handles AI alongside other concerns, so allocate based on time tracking or proportional headcount estimates. The discipline values reasonable estimates over precise but unattainable accuracy.
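The attribution rule combines direct and allocated costs. A minimal Python sketch for the people component, assuming time-tracking fractions for engineers who work on the deployment directly and a single estimated AI share for a shared team; the `DirectCost` type, roles, and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirectCost:
    role: str
    fully_loaded_annual: float  # salary, benefits, overhead (USD)
    ai_fraction: float          # share of time spent on the AI deployment


def attributed_people_cost(direct: List[DirectCost],
                           shared_team_cost: float,
                           shared_ai_share: float) -> float:
    # Direct attribution: each person's loaded cost times their AI time fraction.
    direct_total = sum(d.fully_loaded_annual * d.ai_fraction for d in direct)
    # Allocation: a shared team's cost times an estimated AI share
    # (from time tracking or proportional headcount).
    return direct_total + shared_team_cost * shared_ai_share


staff = [
    DirectCost("ml engineer", 250_000, 0.80),
    DirectCost("eval engineer", 200_000, 0.50),
]
print(attributed_people_cost(staff, shared_team_cost=1_200_000, shared_ai_share=0.10))
```

The same two-part shape (direct where measurable, allocated where not) applies to the infrastructure and risk components, with cloud-bill tags or audit hours in place of time tracking.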


Sensitivity analysis. Once the model exists, vary the inputs to see what dominates. If inference is 50% and people are 30%, doubling inference volume might double the inference component but only marginally change people. If a 10x growth scenario is plausible, the model shows which components grow proportionally vs. less than proportionally. The discipline informs strategic decisions about scaling.
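One way to run the sensitivity analysis is to give each component an assumed scaling exponent relative to volume: 1.0 for costs that grow proportionally, lower values for costs that grow more slowly. The exponents and base figures in this Python sketch are hypothetical placeholders, not benchmarks; the point is to see which components dominate under a doubling and a 10x scenario.

```python
from typing import Dict

# Hypothetical base-year costs (thousands of USD) and scaling exponents.
BASE = {"inference": 500, "people": 300, "infrastructure": 100,
        "data": 50, "migration": 25, "risk": 25}
EXPONENT = {"inference": 1.0, "people": 0.3, "infrastructure": 0.7,
            "data": 0.5, "migration": 0.2, "risk": 0.4}


def scale_tco(base: Dict[str, float], exponent: Dict[str, float],
              volume_multiplier: float) -> Dict[str, float]:
    """Scale each component by volume_multiplier ** exponent."""
    return {k: v * volume_multiplier ** exponent[k] for k, v in base.items()}


for multiplier in (1, 2, 10):
    scaled = scale_tco(BASE, EXPONENT, multiplier)
    total = sum(scaled.values())
    shares = ", ".join(f"{k} {v / total:.0%}" for k, v in scaled.items())
    print(f"{multiplier:>2}x volume: total {total:,.0f}k  ({shares})")
```

Under these assumed exponents the inference share rises from 50% at today's volume to nearly 80% at 10x, while people cost roughly doubles. A shift like that is what should move inference optimization and procurement up the priority list as the deployment scales.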

Comparison and benchmarking. TCO models become more useful when compared across deployments or benchmarked against industry. Internal comparison: how does this team’s TCO per unit of business value compare to that team’s? Industry benchmarking: how does our TCO compare to public information about similar deployments? Benchmarks are imperfect but inform reality checks.

Refresh cadence. TCO models age. Foundation model pricing changes; team composition shifts; the deployment grows or shrinks. Annual TCO refresh is the minimum cadence; quarterly is better for active deployments; continuous (with FinOps tooling, Section F) is best for high-stakes deployments.

When to Use It

Production AI deployments above the toy scale where economic decisions matter. Strategic decisions (vendor selection, build vs buy, scaling, capacity planning) where incomplete cost data leads to wrong answers. Organizations where AI cost is becoming a visible budget line and accountability matters.

Alternatives --- inference-only cost tracking for small-scale deployments where the other components don’t materially affect decisions. Industry benchmarks instead of bottom-up modeling for high-level strategic planning. The combination of bottom-up modeling and benchmark cross-checks is the working pattern for serious TCO discipline.

Sources

  • FinOps Foundation (finops.org) --- general TCO methodology adapted for AI

  • Vendor cost calculators (anthropic.com, openai.com, ai.google.dev pricing pages)

  • Industry analyst reports (Gartner, Forrester) on AI cost benchmarks

Section B — Procurement strategy and contracts

Committed spend, volume discounts, multi-provider strategy from the procurement perspective

AI procurement in 2026 has matured from pay-as-you-go API consumption to negotiated enterprise contracts at scale. The economic discipline involves engaging with procurement as a strategic function rather than treating provider relationships as commodity purchases. Committed-spend agreements, volume discounts, multi-provider strategy, and enterprise contract terms all become levers when AI deployment scale justifies the negotiation overhead.

Procurement strategy patterns for foundation model providers

Source: Vendor enterprise sales pages; practitioner reports on enterprise contract negotiations; analyst coverage of AI procurement trends

Classification: Strategic procurement approaches for foundation model and AI infrastructure providers.

Intent

Move from commodity pay-as-you-go consumption to strategic procurement relationships when deployment scale justifies it, capturing volume discounts, negotiated terms, and provider commitments that improve TCO and reduce strategic risk.

Motivating Problem

Foundation model providers publish list prices. At small scale, list pricing is what you pay. At enterprise scale, list pricing leaves money on the table: providers compete for enterprise relationships and offer significant discounts and concessions to customers who commit to volume. The economic discipline involves recognizing when scale justifies negotiation and engaging procurement as a strategic function rather than treating provider relationships as commodity purchases.

How It Works

Volume discounts. Most providers offer tiered pricing for enterprise customers, with discounts that grow with committed volume. The discount levels aren’t always public; engagement with provider sales is the path to understanding what’s available. Typical patterns: 10–20% discount for moderate commitment, 20–40% for substantial commitment, deeper discounts for the very largest customers.

Committed-spend agreements. Annual or multi-year commitments to specific spending levels in exchange for discounts and other concessions. The trade-off: discounted pricing in exchange for a spending floor. Risk: if actual usage falls below the floor, the full commitment is still owed. The model fits deployments with predictable growth; it is less appropriate for deployments with high demand uncertainty.
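The commitment trade-off can be evaluated against usage scenarios before signing. A minimal Python sketch, assuming a simple contract shape (a flat percentage discount off list price with an annual spend floor) and invented usage probabilities; real enterprise terms vary and are usually more complex.

```python
from typing import List, Tuple


def committed_cost(list_value: float, commitment: float, discount: float) -> float:
    """Annual cost under a committed-spend contract: discounted usage, floored at the commitment."""
    return max(commitment, list_value * (1 - discount))


def expected_costs(scenarios: List[Tuple[float, float]],
                   commitment: float, discount: float) -> Tuple[float, float]:
    """Return (expected pay-as-you-go cost, expected committed cost).

    scenarios: (probability, annual usage valued at list price) pairs.
    """
    payg = sum(p * usage for p, usage in scenarios)
    committed = sum(p * committed_cost(usage, commitment, discount)
                    for p, usage in scenarios)
    return payg, committed


# Hypothetical: $800k commitment at 20% off list; three usage outcomes.
usage_scenarios = [(0.2, 600_000), (0.5, 1_000_000), (0.3, 1_400_000)]
payg, committed = expected_costs(usage_scenarios, commitment=800_000, discount=0.20)
print(f"pay-as-you-go: {payg:,.0f}  committed: {committed:,.0f}")
```

Under this contract shape the commitment pays off only once list-price usage exceeds the commitment itself ($800k here). The low-usage scenario is the floor risk described above: it costs $200k more than pay-as-you-go would have.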

Enterprise contract terms beyond price. SLAs (specific availability and performance commitments). Data handling and privacy commitments (specific terms about training-data use, data residency). Indemnification (provider takes responsibility for specific risks). Custom rate limits and burst capacity. Access to upcoming features and preview programs. Each term has value separate from per-token pricing; the strategic negotiation captures the bundle.

Multi-provider procurement strategy. Concentrating spend with one provider gets the deepest discounts; diversifying across multiple providers gets resilience and capability optimization. The economic discipline involves modeling the trade-off explicitly: how much discount does sole-source commitment provide, and is the resilience and capability of multi-provider worth that discount?
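The sole-source versus multi-provider question can be framed as expected annual cost. The Python sketch below is a hypothetical model, not a standard formula: it charges each strategy its discounted spend, the expected business loss from provider outages that reach users, and any extra engineering overhead. Every parameter is an assumption to replace with the organization's own estimates.

```python
def annual_cost(list_spend: float, discount: float, outage_hours: float,
                loss_per_hour: float, outage_exposure: float,
                overhead: float = 0.0) -> float:
    """Expected annual cost of a sourcing strategy.

    outage_exposure: fraction of provider outage impact that reaches users
    (1.0 with no failover; lower when traffic fails over to another provider).
    """
    spend = list_spend * (1 - discount)
    outage_loss = outage_hours * loss_per_hour * outage_exposure
    return spend + outage_loss + overhead


def compare(loss_per_hour: float) -> dict:
    # Hypothetical: $2M/year at list; 30% sole-source vs 15% split discount;
    # 20 outage hours/year; failover absorbs 90%; $100k/year multi-provider overhead.
    return {
        "sole_source": annual_cost(2_000_000, 0.30, 20, loss_per_hour, 1.0),
        "multi_provider": annual_cost(2_000_000, 0.15, 20, loss_per_hour, 0.1,
                                      overhead=100_000),
    }


print(compare(loss_per_hour=5_000))   # low outage cost: sole source is cheaper
print(compare(loss_per_hour=30_000))  # high outage cost: multi-provider is cheaper
```

The answer flips on a single input, the business cost of an outage hour, which is why the trade-off has to be modeled per deployment rather than settled by policy.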

Provider relationship management. Strategic procurement isn’t a one-time transaction; it’s an ongoing relationship with the provider that affects pricing, feature access, and strategic alignment over time. Production teams at significant scale maintain explicit relationship management with their major providers: regular reviews, escalation paths, executive engagement on strategic alignment.

Negotiation timing. The right time to negotiate is when usage is growing toward levels that justify a contract --- not after growth has happened (when leverage is lower) and not before usage is significant (when there’s nothing to negotiate over). Production teams that watch usage growth and engage procurement proactively typically get better terms than teams that react after the inference bill has become alarming.

When to Use It

Production deployments approaching or exceeding enterprise scale (annual spend from the high five figures into seven figures, and growing). Organizations where AI is a strategic priority and provider relationships are worth managing strategically. Cases where the discount a committed-spend agreement offers justifies the commitment.

Alternatives --- pay-as-you-go consumption for small-scale deployments where negotiation isn’t justified. Procurement through cloud marketplace contracts, rather than directly with the model vendor, for organizations with existing cloud procurement relationships that can extend to AI services.

Sources

  • Vendor enterprise pricing pages (anthropic.com, openai.com, ai.google.dev)

  • Practitioner reports on AI contract negotiations

  • Cloud marketplace contracts (AWS, Azure, GCP) as procurement vehicles

Section C — Build-vs-buy at each layer

Decision frameworks for the application, agent logic, framework, model, and infrastructure layers

Chapter 4 established the layered build-vs-buy framework. The economic discipline involves applying it consciously rather than defaulting to uniformity. Production deployments typically combine build and buy across layers; the strategic question is which decisions at which layers, with what rationale.

Build-vs-buy decision framework for AI stack layers

Source: General build-vs-buy literature adapted for AI; practitioner write-ups on AI procurement decisions

Classification Decision framework for build-vs-buy applied separately to each layer of the AI stack.

Intent

Make conscious build-vs-buy decisions at each layer of the AI stack --- application, agent logic, framework, foundation model, infrastructure --- with rationale based on strategic value, differentiation potential, and capability fit rather than uniform defaulting.

Motivating Problem

Build-vs-buy is often treated as a single decision rather than as a per-layer decision. The result: teams that build everything waste engineering on undifferentiated layers; teams that buy everything leave no defensible value. The per-layer framework forces conscious decision-making and produces deployments where each layer’s strategic role determines its build-vs-buy answer.

How It Works

The decision criteria per layer. (1) Does this layer represent differentiated value? If yes, lean build. (2) Does buying provide capability the team would struggle to match? If yes, lean buy. (3) What’s the maintenance burden of building this layer? Lean buy if maintenance would be substantial. (4) What’s the lock-in cost of buying? Lean build if lock-in is unacceptable.
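
The four criteria can be encoded as an explicit tally, which mainly serves to force each answer to be written down per layer. The sketch below is a hypothetical illustration: it weights the criteria equally and treats each as a yes/no judgment, both simplifying assumptions a real review would likely refine. The example answers are illustrative, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class LayerAssessment:
    layer: str
    differentiated: bool       # (1) does this layer carry differentiated value?
    vendor_outperforms: bool   # (2) would buying give capability the team would struggle to match?
    heavy_maintenance: bool    # (3) would building it carry a substantial maintenance burden?
    lockin_unacceptable: bool  # (4) is the lock-in cost of buying unacceptable?


def lean(a: LayerAssessment) -> str:
    """Tally the four criteria with equal weight; ties are flagged for an explicit decision."""
    build_votes = int(a.differentiated) + int(a.lockin_unacceptable)
    buy_votes = int(a.vendor_outperforms) + int(a.heavy_maintenance)
    if build_votes > buy_votes:
        return "lean build"
    if buy_votes > build_votes:
        return "lean buy"
    return "tie: decide explicitly and record the rationale"


stack = [
    LayerAssessment("application", True, False, False, False),
    LayerAssessment("framework", False, True, True, False),
    LayerAssessment("foundation model", False, True, True, True),
]
for layer in stack:
    print(f"{layer.layer}: {lean(layer)}")
```

Returning a tie as a prompt for discussion, rather than breaking it silently, keeps the rationale visible when the criteria conflict.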

Application layer. The user-facing product is where the application’s differentiation lives. Almost always build (unless a vertical product fits closely, in which case buy that). The strategic question: is your value proposition really differentiated, or are you building a thin variant of what existing products provide?

Agent logic layer. The orchestration, prompts, tools, and business logic. Usually build, on a framework. The framework provides primitives; you provide specifics. Pure-build (everything from scratch on the foundation model API) requires too much reinvention; pure-buy (turnkey agent product) doesn’t fit differentiated use cases.

Framework layer. The agent SDK or orchestration framework. Buy (use open source). Building a custom framework typically wastes engineering on undifferentiated infrastructure that other people have built and battle-tested. The exception: large organizations with specific platform requirements that existing frameworks don’t meet; even then, extend an existing framework rather than build one from the ground up.

Foundation model layer. Buy (API) for most cases; partial-build (self-host open weights) for high-volume or sovereignty-required cases. Volume 16 Section F covers the engineering decision; the economic decision is similar with emphasis on TCO comparison rather than per-token comparison.

Infrastructure layer. Buy from cloud providers. Building data center infrastructure for AI is rarely economically rational outside the very largest organizations. The fine-grained decision: which cloud provider, with what specific services? Specialized AI infrastructure providers (Anyscale, RunPod, CoreWeave, Together AI) compete with hyperscalers on specific dimensions; the choice is itself a finer-grained build-vs-buy decision.

Composition of decisions. A typical production deployment: build application, build agent logic on framework, buy framework (open source), buy foundation model API, buy cloud infrastructure. The composition matches differentiated layers being built and commodity layers being bought. Teams that produce different compositions should be able to articulate why each layer’s decision differs from this baseline.

When to Use It

Any organization making AI procurement and build decisions at strategic scale. New deployments where the architecture is still being decided. Existing deployments under review for strategic alignment. Cases where uniform build or uniform buy is producing visible problems and the alternative is conscious layered decisions.

Alternatives --- vertical product adoption (Volume 14 Section D) for cases where a turnkey vertical fits the use case. Custom development on direct foundation model APIs for cases where framework abstraction interferes with specific requirements. The framework above is the working pattern for most production deployments; alternatives apply at the edges.

Sources

  • General build-vs-buy literature (Levitt, Stuckey)

  • AI-specific procurement write-ups in industry publications

Section D — Latency-cost-quality frontiers

Modeling the three-way trade-off explicitly to inform strategic decisions

Chapter 3 introduced the latency-cost-quality frontier as an economic model. This section makes the modeling discipline explicit. Production teams that draw the actual frontier for their workload make better deployment decisions than teams that intuit the trade-offs. The model doesn’t need to be precise; even a rough plot reveals decisions the team should be making but isn’t.

Modeling the latency-cost-quality frontier

Source: Engineering economics literature adapted for AI; practitioner reports on production AI optimization

Classification Explicit modeling of the three-way trade-off among latency, cost, and quality for deployment decisions.

Intent

Make the latency-cost-quality trade-off explicit through modeling rather than implicit through engineering intuition, so deployment decisions about model selection, routing, and infrastructure reflect the actual frontier rather than assumed trade-offs.

Motivating Problem

Teams optimize the dimension they’re measured on. Engineering teams measured on latency optimize latency. Finance teams measured on cost focus on cost. Product teams measured on quality emphasize quality. The three-dimensional trade-off is invisible to each function in isolation; explicit modeling surfaces it and supports decisions that balance the dimensions rather than optimizing each separately.

How It Works

Construct the frontier. For a representative workload, run experiments at different points: cheapest fast model, mid-tier model, frontier model. Measure quality (against the eval suite), cost (per request or per task), and latency (P50, P95, P99). Plot the results. The plot reveals which points are on the efficient frontier (not dominated by any alternative) and which are dominated (some other point is at least as good on all three dimensions and strictly better on at least one).
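
Identifying the efficient frontier is a Pareto-dominance check. The sketch below assumes each configuration has already been measured on the eval suite; the configuration names and numbers are hypothetical, and P95 stands in for the full latency distribution.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    name: str
    quality: float      # eval-suite score, higher is better
    cost: float         # cost per task, lower is better
    p95_latency: float  # seconds, lower is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every dimension and strictly better on one."""
    no_worse = a.quality >= b.quality and a.cost <= b.cost and a.p95_latency <= b.p95_latency
    better = a.quality > b.quality or a.cost < b.cost or a.p95_latency < b.p95_latency
    return no_worse and better


def efficient_frontier(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


measured = [
    Point("small", 0.72, 0.002, 1.1),
    Point("mid", 0.84, 0.010, 2.4),
    Point("mid-legacy", 0.80, 0.012, 3.0),
    Point("frontier", 0.91, 0.045, 6.0),
]
print([p.name for p in efficient_frontier(measured)])  # ['small', 'mid', 'frontier']
```

Here "mid-legacy" drops out because "mid" is better on all three dimensions; any deployment still running it is operating off the frontier.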

Identify the operating point. Where on the frontier is the deployment currently operating? Where should it operate based on business requirements? The gap between actual and desired operating points is the optimization opportunity. The gap takes one of two forms: the deployment may sit well inside the frontier at a dominated point, or near the frontier but at the wrong point for the use case.

Trade-off rates. The frontier’s slope reveals the trade-off rate. How much additional cost does moving from point A to point B incur per unit of latency improvement? How much quality is lost per unit of cost reduction? The rates inform local decisions: is this latency improvement worth the cost increase? Is this cost reduction worth the quality loss?
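
Trade-off rates are ratios between two measured points. The sketch below assumes quality on a 0-100 eval scale and cost measured per task; the points, including a hypothetical "mid on dedicated capacity" configuration that buys lower latency with higher cost, are illustrative.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    quality: float      # eval-suite score on a 0-100 scale
    cost: float         # cost per task
    p95_latency: float  # seconds


def cost_per_quality_point(a: Point, b: Point) -> float:
    """Extra cost per task paid for each quality point gained by moving from a to b."""
    gained = b.quality - a.quality
    if gained <= 0:
        raise ValueError("b must improve quality over a")
    return (b.cost - a.cost) / gained


def cost_per_second_saved(a: Point, b: Point) -> float:
    """Extra cost per task paid for each second of P95 latency saved by moving from a to b."""
    saved = a.p95_latency - b.p95_latency
    if saved <= 0:
        raise ValueError("b must reduce P95 latency relative to a")
    return (b.cost - a.cost) / saved


mid = Point("mid", 84, 0.010, 2.4)
frontier = Point("frontier", 91, 0.045, 6.0)
mid_fast = Point("mid on dedicated capacity", 84, 0.016, 1.2)

rate_q = cost_per_quality_point(mid, frontier)  # about 0.005 per task per quality point
rate_l = cost_per_second_saved(mid, mid_fast)   # about 0.005 per task per second saved
tasks_per_month = 1_000_000                     # hypothetical volume
print(f"{rate_q * tasks_per_month:,.0f} per month per quality point")
```

Multiplying a per-task rate by volume turns it into a budget-sized number: at the hypothetical one million tasks per month, the 7-point quality gain from "mid" to "frontier" costs about 35,000 per month, which is a question a product owner can actually answer.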

Frontier shifts. The frontier moves with each major foundation model release. A task that required an Opus-tier model in 2024 may run on a Haiku-tier model in 2026. Production teams re-run frontier experiments on major model releases to capture the shift. Migrations driven by frontier shifts often produce 2-5x improvements on the dimensions that dominate the decision, because the new frontier is meaningfully better.

Per-workload variation. Different workloads have different frontiers: a coding agent, a customer support agent, and a research agent each trace a different curve. The discipline involves modeling per workload rather than applying one frontier to all use cases. Multi-model orchestration (Volume 16 Section D) is the implementation of per-workload frontier choices.

Communication artifacts. The frontier plot is also a communication artifact. Engineering teams use it to explain trade-offs to non-engineering stakeholders. Product teams use it to articulate quality decisions. Finance uses it to understand why the most capable model isn’t always the right answer. The shared model reduces the function-specific optimization that produces unbalanced deployments.

When to Use It

Production deployments where latency, cost, and quality all matter materially. Strategic decisions (model selection, routing strategy, capacity planning) that involve trade-offs across the three dimensions. Organizations where different functions are optimizing different dimensions in ways that produce confusion or conflict.

Alternatives --- single-dimension optimization for cases where one dimension dominates the others. Vendor-recommended defaults for cases where the deployment is small enough that explicit modeling isn’t justified. The modeling discipline pays off at production scale; for small deployments, the engineering effort exceeds the strategic benefit.

Sources

  • Engineering economics literature (general)

  • Production AI optimization write-ups from practitioners

Section E — Performance benchmarking for procurement

Evaluating models for purchase decisions, not just for engineering

Engineering teams benchmark models for optimization (Volume 8 covers eval discipline). Procurement teams need benchmarks for purchase decisions, which is a different discipline with different requirements. The economic benchmarking question: which model represents the best value for this use case, considering quality, cost, latency, and the strategic dimensions (provider relationship, lock-in, capability evolution) that engineering benchmarks don’t capture.

Performance benchmarking methodology for AI procurement

Source: Adapted from general procurement benchmarking practice; AI-specific benchmark suites (Artificial Analysis, LMSYS Chatbot Arena, vendor benchmarks)

Classification Benchmark methodology designed to inform procurement decisions rather than engineering optimization.

Intent

Build benchmarks that answer procurement questions --- which model represents best value for this use case, what are the strategic risks of each provider, how do the alternatives compare across all relevant dimensions --- rather than benchmarks that only optimize engineering parameters.

Motivating Problem

Engineering benchmarks measure quality on specific tasks. Procurement benchmarks must measure value, which combines quality, cost, latency, and strategic factors. Using engineering benchmarks alone for procurement decisions produces incomplete answers: the best-quality model may not be the best-value model; the fastest model may not be the most strategically aligned. Procurement benchmarking discipline produces decision-supporting comparisons that reflect all relevant dimensions.

How It Works

Define the procurement question. The question isn’t “which model is most capable” --- it’s “which model best serves our deployment’s strategic objectives at acceptable cost and risk.” The procurement question shapes the benchmark methodology: what to measure, against what, with what weighting.

Workload-representative test set. Don’t use generic benchmarks alone; use a test set drawn from actual workload patterns. Specific tasks the agent handles, with quality assertions based on production requirements. Generic benchmarks (SWE-bench, MMLU, etc.) inform; workload-specific benchmarks decide.

Multi-dimensional scoring. Score each candidate model on quality (vs. workload test set), cost (per representative task), latency (P50, P95 on representative tasks), strategic factors (provider stability, capability evolution rate, contract terms available). Weight the dimensions by procurement priorities; surface the weighting explicitly so decisions can be revisited if priorities shift.
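
A weighted average over normalized dimensions is one common way to implement multi-dimensional scoring. The sketch below is an assumption-laden illustration: candidate names and measurements are hypothetical, the strategic factors are collapsed into a single 0-1 rubric score, and min-max normalization means the scores are only comparable within this candidate set.

```python
from typing import Dict

Candidates = Dict[str, Dict[str, float]]

# Hypothetical measurements; "strategic" is a 0-1 rubric score (provider
# stability, contract terms, capability roadmap) assigned by procurement.
candidates: Candidates = {
    "model-a": {"quality": 0.91, "cost": 0.045, "p95_latency": 6.0, "strategic": 0.8},
    "model-b": {"quality": 0.84, "cost": 0.010, "p95_latency": 2.4, "strategic": 0.6},
    "model-c": {"quality": 0.72, "cost": 0.002, "p95_latency": 1.1, "strategic": 0.4},
}
HIGHER_IS_BETTER = {"quality": True, "cost": False, "p95_latency": False, "strategic": True}


def normalize(cands: Candidates, dim: str) -> Dict[str, float]:
    """Min-max scale one dimension to 0-1 so that 1 is always the best value."""
    values = {name: m[dim] for name, m in cands.items()}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 1.0 for name in values}
    scaled = {name: (v - lo) / (hi - lo) for name, v in values.items()}
    if not HIGHER_IS_BETTER[dim]:
        scaled = {name: 1.0 - s for name, s in scaled.items()}
    return scaled


def weighted_scores(cands: Candidates, weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average of normalized dimensions; the weights are explicit inputs."""
    total = sum(weights.values())
    norm = {dim: normalize(cands, dim) for dim in weights}
    return {name: sum(weights[d] * norm[d][name] for d in weights) / total for name in cands}


weights = {"quality": 0.4, "cost": 0.3, "p95_latency": 0.1, "strategic": 0.2}
scores = weighted_scores(candidates, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['model-b', 'model-a', 'model-c']
```

Keeping `weights` as a named, top-level input is the point: when procurement priorities change, the decision is re-run by editing one line rather than re-arguing the whole comparison.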

Sensitivity analysis. How does the ranking change if cost weighting increases? If quality weighting increases? If provider stability matters more? The sensitivity analysis reveals robustness: rankings that hold across reasonable parameter changes are robust; rankings that flip dramatically with small changes are knife-edge and should be revisited.
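
One way to run the sensitivity analysis is to perturb each weight in turn and check whether the top-ranked candidate changes. The sketch below assumes scores already normalized to 0-1 (higher is better), uses hypothetical values, and tracks only the winner rather than the full ranking.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]


def winner(scores: Scores, weights: Dict[str, float]) -> str:
    """Top candidate under a weighted average of normalized scores."""
    total = sum(weights.values())
    return max(scores, key=lambda name: sum(weights[d] * scores[name][d] for d in weights) / total)


def sensitivity(scores: Scores, weights: Dict[str, float], step: float) -> Tuple[str, List[tuple]]:
    """Scale each weight down and up by `step` (relative) and report shifts that change the winner."""
    baseline = winner(scores, weights)
    flips = []
    for dim in weights:
        for sign in (-1, 1):
            shifted = dict(weights)
            shifted[dim] = weights[dim] * (1 + sign * step)
            new_winner = winner(scores, shifted)
            if new_winner != baseline:
                flips.append((dim, sign * step, new_winner))
    return baseline, flips


normalized = {  # hypothetical, already scaled so that 1.0 is best
    "model-a": {"quality": 1.0, "cost": 0.0, "p95_latency": 0.0, "strategic": 1.0},
    "model-b": {"quality": 0.63, "cost": 0.81, "p95_latency": 0.73, "strategic": 0.5},
    "model-c": {"quality": 0.0, "cost": 1.0, "p95_latency": 1.0, "strategic": 0.0},
}
weights = {"quality": 0.4, "cost": 0.3, "p95_latency": 0.1, "strategic": 0.2}

print(sensitivity(normalized, weights, 0.1))  # ('model-b', [])
print(sensitivity(normalized, weights, 0.5))
```

With these inputs the winner survives every 10% shift but flips to model-a under a 50% increase in quality weight or a 50% cut in cost weight: a robust result for small changes, and a clear statement of which priority shifts would reopen the decision.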

Public benchmark cross-reference. Public benchmarks (Artificial Analysis, LMSYS Chatbot Arena, vendor benchmarks) provide industry comparison points. The procurement benchmark should align with public benchmarks where they overlap; substantial misalignment suggests the procurement methodology or the public benchmarks have issues worth investigating.

Re-evaluation cadence. Procurement benchmarks have a shelf life. New model releases reshuffle the rankings. Annual re-evaluation is the minimum cadence; quarterly is better for high-stakes deployments; continuous (with automated benchmark runs) is best for organizations with the infrastructure for it.

When to Use It

Strategic model selection decisions where the choice between alternatives is significant. Vendor evaluation for committed-spend agreements. Re-evaluation when new models release and re-selection might improve outcomes. Cases where engineering benchmarks alone aren’t sufficient for the decision being made.

Alternatives --- engineering benchmarks alone for cases where procurement strategic dimensions don’t matter materially. Vendor recommendations for cases where the deployment is too small to justify procurement-grade benchmarking.

Sources

  • Artificial Analysis (artificialanalysis.ai)

  • LMSYS Chatbot Arena, now LMArena (lmarena.ai)

  • Vendor benchmark publications

Section F — FinOps for AI

The organizational practice of managing AI costs as a discipline

FinOps emerged in cloud computing as the practice of organizationally managing variable cloud costs through cross-functional accountability between finance, engineering, and product. FinOps for AI extends the practice to AI-specific concerns: foundation model pricing, multi-model deployments, fine-tuning lifecycle costs, and the TCO components from Section A. The discipline is organizational more than technical; the right tooling supports the practice but doesn’t replace it.

FinOps for AI — the organizational practice

Source: FinOps Foundation (finops.org); practitioner guidance on AI-specific FinOps; cost management platforms

Classification Organizational practice for managing AI costs across functions with accountability.

Intent

Apply FinOps practice to AI deployments --- cross-functional accountability for cost, continuous visibility into spending, optimization driven by business value rather than engineering preference --- with the discipline adapted for AI-specific cost patterns.

Motivating Problem

AI costs are variable, opaque, and growing. Engineering teams optimize their inference bills; finance tracks the totals; product makes decisions that drive usage. Without coordinated FinOps practice, the three functions optimize different things in ways that produce confusion (engineering reduces inference costs while product launches features that increase usage; finance asks why total cost grew despite engineering optimizations). FinOps practice creates accountability across functions and aligns the organization on AI cost management.

How It Works

The three FinOps phases. Inform (cost visibility, allocation, reporting). Optimize (continuous optimization across functions). Operate (operational discipline of cost-aware decisions). The phases compose; mature FinOps practice runs all three continuously.

Cost allocation methodology. Attribute AI costs to the business activities they support, not just to the engineering systems that consume them. A research agent’s costs allocate to the research function it serves; a customer support agent’s costs allocate to support. The allocation surfaces which business activities are economically rational and which aren’t.
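
At its simplest, allocation is a mapping from the systems that consume AI services to the business activities they serve. The sketch below assumes usage records already carry a system tag and a cost; the system names, mapping, and figures are hypothetical. An explicit "unallocated" bucket makes tagging gaps visible instead of silently dropping them.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical mapping from engineering systems to the business functions they serve.
SYSTEM_TO_FUNCTION = {
    "research-agent": "research",
    "support-agent": "customer-support",
    "triage-classifier": "customer-support",
}


def allocate(records: Iterable[dict], mapping: Dict[str, str]) -> Dict[str, float]:
    """Sum cost per business function; systems missing from the mapping go to 'unallocated'."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[mapping.get(rec["system"], "unallocated")] += rec["cost"]
    return dict(totals)


usage = [
    {"system": "research-agent", "cost": 1200.0},
    {"system": "support-agent", "cost": 800.0},
    {"system": "triage-classifier", "cost": 150.0},
    {"system": "prototype-x", "cost": 50.0},
]
totals = allocate(usage, SYSTEM_TO_FUNCTION)
print(totals)  # {'research': 1200.0, 'customer-support': 950.0, 'unallocated': 50.0}
```

The same per-function totals can feed either a showback report or a chargeback invoice; the allocation logic does not change, only what the organization does with the result.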

Anomaly detection. AI usage can spike unexpectedly (new feature triggering usage; user behavior shift; bug causing infinite-loop calls). FinOps practice includes monitoring and alerting on anomalous spending. The alerting catches issues before they produce surprises in the monthly bill.
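
A trailing-window threshold is one simple anomaly detector for daily spend. The sketch below flags days more than `k` standard deviations above the previous week's mean; the window, `k`, the floor on sigma, and the spend series are all hypothetical choices. A production detector would likely also need to handle weekly seasonality, which this sketch ignores.

```python
from statistics import mean, stdev
from typing import List


def spend_anomalies(daily_spend: List[float], window: int = 7, k: float = 3.0,
                    min_rel_sigma: float = 0.05) -> List[int]:
    """Indices of days whose spend exceeds the trailing-window mean by more than k sigma.

    Sigma is floored at min_rel_sigma * mean so that a perfectly flat history
    does not turn every small wobble into an alert.
    """
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_rel_sigma * mu)
        if daily_spend[i] > mu + k * sigma:
            flagged.append(i)
    return flagged


spend = [100, 104, 98, 101, 99, 103, 100, 102, 420, 101]  # hypothetical daily spend
print(spend_anomalies(spend))  # [8]
```

Day 8 (a runaway loop, say) is flagged the day it happens rather than at month end. The spike then inflates the next window's sigma, so a real system would typically exclude flagged days from the baseline.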

Optimization across functions. Engineering optimization (Volume 16) is one input; product optimization (which features drive cost; are they worth the cost) is another; finance optimization (provider negotiation, contract terms) is another. FinOps practice coordinates the inputs into a coherent optimization strategy rather than letting each function optimize in isolation.

Showback and chargeback. Showback: show business units the costs they’re generating without billing them. Chargeback: actually bill business units for their AI costs. The choice affects accountability: chargeback produces strong accountability but creates organizational friction; showback produces awareness without friction but accountability is weaker.

Tooling. Cost management platforms, some specialized for AI and others general FinOps tools adapted to it, provide cost visibility, allocation, anomaly detection, and optimization recommendations. Open-source options include OpenCost and OpenLLMetry; commercial options include general FinOps tools adapted for AI (Cloudability, CloudHealth, ProsperOps, Vantage) and specialized AI cost tools such as Helicone’s observability tier.

Organizational structure. FinOps practice typically requires named accountability --- a FinOps function or a designated person within finance and engineering organizations. Without ownership, the practice doesn’t happen; with ownership, the practice can develop into a real discipline.

When to Use It

Organizations where AI costs are material enough to warrant organizational discipline. Production deployments where cost variability is high and unmanaged. Cases where cross-functional misalignment about AI cost decisions is producing problems. Strategic-priority AI initiatives where cost discipline supports the strategic intent.

Alternatives --- informal cost tracking for small deployments where formal FinOps would be over-engineered. The transition point: when AI cost becomes visible in budget discussions, FinOps practice usually starts being valuable.

Sources

  • FinOps Foundation (finops.org)

  • AI-specific cost management platforms and observability tools (Volume 14 Section G coverage)

Section G — Capacity planning under uncertainty

Provisioning for AI workloads with variable and unpredictable demand

Chapter 5 introduced the capacity planning problem. This section makes the discipline explicit. AI workloads have characteristic volatility patterns that traditional capacity planning doesn’t handle well. The economic discipline involves planning explicitly for the uncertainty rather than betting on a single point forecast that actual demand will almost certainly exceed or fall short of.

Capacity planning patterns for AI workloads

Source: Traditional capacity planning literature adapted for AI; practitioner reports on production AI scaling

Classification Patterns for provisioning AI capacity under demand uncertainty.

Intent

Provision AI capacity --- API rate limits, reserved capacity, infrastructure for self-hosted models, GPU allocations --- in ways that handle realistic demand variability without over-provisioning waste or under-provisioning failures.

Motivating Problem

AI workload demand is harder to forecast than traditional software demand. New capabilities trigger usage spikes (10x in days when users discover a new feature). Reasoning-heavy tasks have variable inference time. Multi-step agent workflows compound variability. Capacity planning that assumes traditional software volatility patterns produces either over-provisioning (paying for capacity that doesn’t get used) or under-provisioning (user-facing failures and emergency scrambles). The discipline involves explicit planning for AI-specific volatility.

How It Works

Percentile-based provisioning. Provision for the 95th or 99th percentile of demand, not the average. The cost is over-provisioning for the long tail; the benefit is reliability when the tail materializes. The choice between 95th and 99th depends on the cost of failure: low-stakes deployments can run at the 95th percentile and accept occasional throttling; high-stakes deployments need 99th or higher.
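
Percentile-based provisioning starts from the observed demand distribution rather than its mean. The sketch below uses the nearest-rank percentile definition and a hypothetical series of per-interval peak request rates with a long tail.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical peak requests-per-minute observed over 100 intervals.
peak_rpm = [100] * 90 + [150] * 5 + [300] * 4 + [900]

print(sum(peak_rpm) / len(peak_rpm))  # 118.5 (the average)
print(percentile(peak_rpm, 95))       # 150
print(percentile(peak_rpm, 99))       # 300
```

In this series, provisioning at the average leaves 10% of intervals above capacity, the 95th percentile leaves 5%, and the 99th leaves 1% but needs twice the capacity of the 95th; that jump is the cost-of-failure decision the paragraph describes.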

Tiered capacity strategy. Reserved capacity for baseline (committed-spend or pre-provisioned infrastructure at discounted rates). On-demand capacity for surges (full-price but available immediately). The split optimizes cost (reserved is cheaper) while preserving elasticity (on-demand handles surges). Production deployments typically reserve 60-80% of expected baseline; on-demand handles the rest plus surges.
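
Choosing the reserved share can be framed as minimizing expected cost across demand scenarios. The sketch below is a hypothetical model: reserved units cost 0.75 of the on-demand rate and are paid whether used or not, the ten demand scenarios are invented and treated as equally likely, and the search is a coarse grid.

```python
from typing import List


def expected_cost(reserved: int, scenarios: List[int], reserved_rate: float,
                  on_demand_rate: float) -> float:
    """Average monthly cost over equally likely demand scenarios.

    Reserved units are paid whether or not they are used; demand above the
    reservation is billed at the on-demand rate.
    """
    total = 0.0
    for demand in scenarios:
        overflow = max(0, demand - reserved)
        total += reserved * reserved_rate + overflow * on_demand_rate
    return total / len(scenarios)


def best_reservation(scenarios: List[int], reserved_rate: float, on_demand_rate: float,
                     step: int = 50) -> int:
    """Grid-search the reservation level with the lowest expected cost."""
    candidates = range(0, max(scenarios) + step, step)
    return min(candidates, key=lambda r: expected_cost(r, scenarios, reserved_rate, on_demand_rate))


# Hypothetical monthly demand scenarios in capacity units; the mean is 1200.
demand = [600, 750, 900, 1000, 1000, 1100, 1200, 1500, 1800, 2150]
r = best_reservation(demand, reserved_rate=0.75, on_demand_rate=1.0)
print(r, r / (sum(demand) / len(demand)))  # 900 0.75
print(expected_cost(0, demand, 0.75, 1.0))  # 1200.0 with no reservation
print(expected_cost(r, demand, 0.75, 1.0))  # 1020.0
```

The optimum follows a newsvendor-style argument: each extra reserved unit costs 0.75 but saves 1.0 only when demand exceeds it, so reserving pays off while the probability of exceeding the reservation stays above 0.75. With these inputs that lands at 75% of mean demand, inside the range above.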

Multi-provider strategy for elasticity. Single-provider deployments are exposed to single-provider rate limits and outages. Multi-provider deployments have natural elasticity: when one provider rate-limits or fails, others handle the spillover. The architectural complexity (Volume 16 Section D) is justified by the elasticity benefit at production scale.
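
The spillover behavior can be sketched as an ordered fallback across providers. The `Provider` class and `RateLimited` exception below are hypothetical stand-ins, not any vendor SDK; a real router would also need to translate prompts and parameters between provider APIs, which this sketch omits.

```python
from typing import List


class RateLimited(Exception):
    """Raised by the toy provider when it is out of quota or unavailable."""


class Provider:
    """Hypothetical stand-in for a provider client with a per-minute request budget."""

    def __init__(self, name: str, requests_per_minute: int, available: bool = True):
        self.name = name
        self.remaining = requests_per_minute
        self.available = available

    def complete(self, prompt: str) -> str:
        if not self.available or self.remaining <= 0:
            raise RateLimited(self.name)
        self.remaining -= 1
        return f"{self.name}: ok"


def route(prompt: str, providers: List[Provider]) -> str:
    """Try providers in preference order, spilling over on rate limits or outages."""
    for provider in providers:
        try:
            return provider.complete(prompt)
        except RateLimited:
            continue
    raise RuntimeError("all providers exhausted")


primary = Provider("primary", requests_per_minute=2)
secondary = Provider("secondary", requests_per_minute=5)
print([route("hello", [primary, secondary]) for _ in range(4)])
# ['primary: ok', 'primary: ok', 'secondary: ok', 'secondary: ok']
```

Effective capacity becomes the sum of the providers' limits, which is the elasticity the paragraph describes; preference order is where the procurement decision (committed-spend provider first) shows up in code.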

Scenario planning. Build three capacity plans: baseline (most likely demand), growth (significant growth scenario), surge (capability-launch-driven spike). Each plan has different capacity, costs, and lead times. The discipline isn’t to bet on one scenario but to build the muscle to switch between plans as demand signals materialize.

Lead time management. Reserved capacity often requires lead time to provision. Self-hosted infrastructure requires GPU procurement and setup. Cloud-provider rate limit increases require negotiation. Production teams track capacity lead times and start the provisioning process before capacity is needed, not when it’s needed.
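
Lead-time management reduces to comparing when capacity runs out with how long provisioning takes. The sketch below assumes steady compound monthly growth and whole-month granularity; the demand, capacity, growth rate, and lead times are hypothetical.

```python
from typing import Optional


def months_until_exhausted(current_demand: float, capacity: float,
                           monthly_growth: float) -> Optional[int]:
    """Whole months until compounding demand first exceeds capacity; None if it never does."""
    if current_demand > capacity:
        return 0
    if monthly_growth <= 0 or current_demand <= 0:
        return None
    months, demand = 0, current_demand
    while demand <= capacity:
        demand *= 1 + monthly_growth
        months += 1
    return months


def start_provisioning_by(current_demand: float, capacity: float, monthly_growth: float,
                          lead_time_months: int) -> Optional[int]:
    """Month (from now) by which provisioning must begin; 0 means start immediately."""
    runway = months_until_exhausted(current_demand, capacity, monthly_growth)
    if runway is None:
        return None
    return max(0, runway - lead_time_months)


print(months_until_exhausted(400, 1000, 0.15))    # 7
print(start_provisioning_by(400, 1000, 0.15, 3))  # 4
print(start_provisioning_by(400, 1000, 0.15, 9))  # 0
```

At 15% monthly growth the runway is 7 months; a 3-month lead time means the request must start by month 4, and any lead time longer than the runway means it should already have started.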

Demand smoothing. Where possible, smooth demand to reduce peak provisioning requirements. Batch processing for offline workloads (Volume 16 Section C covers batch APIs at 50% cost reduction). Rate limiting at the application layer to spread load. Caching at the user-facing layer to absorb repeat requests. Each smoothing technique reduces peak demand and the corresponding capacity requirement.
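
Deferring batch work into off-peak hours can be sketched as water-filling: each unit of deferrable work goes to whichever hour currently has the lowest load. The hourly profile, the batch volume, and the assumption that batch work would otherwise land on the busiest hour are all hypothetical; the sketch also ignores the batch-API price discount noted above.

```python
import heapq
from typing import List


def defer_batch(interactive: List[int], batch_units: int) -> List[int]:
    """Place deferrable batch units one at a time into the currently least-loaded hour."""
    load = list(interactive)
    heap = [(units, hour) for hour, units in enumerate(load)]
    heapq.heapify(heap)
    for _ in range(batch_units):
        units, hour = heapq.heappop(heap)
        load[hour] = units + 1
        heapq.heappush(heap, (units + 1, hour))
    return load


interactive = [20, 10, 5, 5, 30, 60, 80, 70]  # hypothetical hourly interactive load
batch = 60                                    # deferrable units of batch work

unsmoothed_peak = max(interactive) + batch    # assumes batch would land on the busiest hour
smoothed = defer_batch(interactive, batch)
print(unsmoothed_peak, max(smoothed))         # 140 80
print(smoothed)                               # [25, 25, 25, 25, 30, 60, 80, 70]
```

The total work is unchanged, but the peak the deployment must provision for drops from 140 to 80 because the batch load fills the overnight trough instead of stacking on the interactive peak.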

Capacity-cost-quality trade-offs. Capacity planning interacts with the latency-cost-quality frontier. Larger capacity reservations cost more but protect the chosen operating point, because rate limits never force traffic onto slower or lower-quality fallback routes. Smaller reservations reduce cost but expose the deployment to latency or quality degradation under load. The decisions compose; thoughtful capacity planning supports the frontier choices the deployment has made.

When to Use It

Production deployments at scale where capacity decisions have material cost or reliability impact. Deployments with significant demand volatility where naive provisioning produces waste or failures. Strategic-priority deployments where reliability under load matters more than minimum-cost operation.

Alternatives --- pay-as-you-go consumption for small deployments where capacity planning is over-engineered. Vendor-recommended defaults for cases where the deployment doesn’t justify custom capacity strategy.

Sources

  • Traditional capacity planning literature

  • Vendor capacity planning guides (Anthropic, OpenAI, Google enterprise documentation)

  • Practitioner write-ups on AI scaling challenges

Section H — Discovery and resources

Where to track AI economics and FinOps discipline as it matures

AI economics is a young discipline relative to traditional FinOps. Sources are scattered across general FinOps literature, AI-specific practitioner writing, vendor pricing documentation, and industry analyst coverage. Staying current requires tracking multiple sources.

Tracking AI economics and FinOps discipline

Source: Various general FinOps and AI-specific sources

Classification Resources for staying current on AI economics practice.

Intent

Provide pointers to the active sources of AI economics and FinOps knowledge as the discipline matures.

Motivating Problem

The AI economics discipline is still consolidating. Vendor pricing changes affect strategy; new contract patterns emerge; FinOps tooling adapts to AI specifics. Production teams need ongoing tracking to keep economic decisions current.

How It Works

FinOps Foundation (finops.org). The organizational home for FinOps practice generally; AI-specific guidance is emerging through 2024-2026. Annual FinOps X conference; certifications; standards.

Vendor pricing pages. anthropic.com/pricing, openai.com/pricing, ai.google.dev pricing. The authoritative source for current pricing, updated as providers change prices. Following the pricing pages directly catches changes sooner than industry coverage.

Industry analysts. Gartner, Forrester, and IDC cover AI economics from an enterprise procurement perspective. Their reports lag the most current developments by months but provide structured comparison.

Practitioner writing. Eugene Yan, Hamel Husain, Simon Willison continue to be valuable for technical practitioner perspective. Specific to economics: write-ups from teams at companies that have publicly shared their AI cost optimization journeys (various tech blogs).

Cost management platforms. Helicone’s blog and product documentation surface AI cost patterns. Braintrust, Galileo, and others publish industry-trend content. The tooling vendors have visibility into industry patterns that individual deployments don’t have.

Conference proceedings. FinOps X (FinOps Foundation conference). AI Engineer Summit (covers economics among other topics). Vendor developer conferences sometimes cover pricing strategy and economics.

Public earnings calls. Some foundation model providers are publicly traded and discuss AI economics in earnings calls; others are private and hold no such calls. Disclosures from Microsoft, Google, and Amazon (including Anthropic-related commentary) all contain signal about AI cost trends from the supply side.

When to Use It

Teams responsible for AI economic decisions who need to maintain current knowledge. Procurement professionals working with AI vendors. FinOps practitioners adapting their practice to AI specifics. Organizations developing AI economic discipline as a function.

Alternatives --- specialized consultants for high-stakes economic decisions. Industry analyst subscriptions for organizations that prefer outsourced tracking.

Sources

  • finops.org

  • anthropic.com/pricing, openai.com/pricing, ai.google.dev

  • Gartner, Forrester, IDC analyst coverage

  • AI Engineer Summit, FinOps X, vendor conferences

Appendix A --- Pattern Reference Table

Cross-reference of the economic patterns covered in this volume, what each provides, and when to use each.

Pattern | Provides | When to use | Section
Six-component TCO model | Beyond inference-bill | Production deployments at scale | Section A
Volume discounts | Cost reduction at scale | Enterprise-scale spending | Section B
Committed-spend agreements | Discount + commitment | Predictable growth deployments | Section B
Per-layer build-vs-buy | Strategic stack decisions | All architectural decisions | Section C
Frontier modeling | Trade-off transparency | Multi-dimensional optimization | Section D
Procurement benchmarks | Value-based selection | Strategic model decisions | Section E
FinOps practice | Cross-functional cost discipline | Material AI cost organizations | Section F
Percentile provisioning | Reliability under variability | Production deployments | Section G
Multi-provider capacity | Elasticity through diversity | Surge-vulnerable deployments | Section G

Appendix B --- The Seventeen-Volume Series

This catalog joins the sixteen prior volumes. Three weaker-candidate consolidations (Volumes 15, 16, 17) now sit on top of the structural and discipline-adjacent volumes.

  • Volumes 1-10 --- Engineering substrate (Patterns, Skills, Tools, Events, Fabric, Memory, HITL, Eval & Guardrails, Multi-Agent, Retrieval).

  • Volumes 11-13 --- Complementary disciplines (Compliance, Infrastructure Security, Agent UX).

  • Volume 14 --- Products Survey (perishable snapshot).

  • Volume 15 --- Prompting and Context Engineering (weaker candidate --- talking to models).

  • Volume 16 --- LLMOps and Model Lifecycle (weaker candidate --- managing models).

  • Volume 17 --- Cost and Performance Economics (this volume; weaker candidate --- economic discipline).

Three weaker candidates is the inflection point. Volume 15 introduced the category. Volume 16 followed naturally. Volume 17 was specifically named in Volume 16’s Appendix F as a candidate that would need to clear a rising bar. Whether this volume cleared the bar depends on whether the economic-discipline distinction from Volume 16’s engineering-discipline coverage held up substantively. The reader’s judgment determines that.

More important: the bar should rise further for any additional volumes. The candidates mentioned in earlier volumes’ closing appendices --- model evaluation methodology beyond Volume 8, data engineering for AI, enterprise integration patterns --- should each face a stronger test than this volume faced. The series at seventeen volumes covers a substantial portion of the working vocabulary of agentic AI; further expansion risks producing sprawl rather than reference.

Appendix C --- The Bar Test Revisited

Chapter 1 established the bar test: did this volume’s content overlap Volume 16 substantially, in which case the volume shouldn’t exist, or did the economic-discipline framing produce content meaningfully distinct from Volume 16’s engineering-discipline content? This appendix revisits the test after the volume’s content is on the page.

The case for: the volume’s entries are substantively different from Volume 16’s. Total cost of ownership modeling (Section A) is not engineering practice. Procurement strategy and contracts (Section B) is not engineering practice. Build-vs-buy at each layer (Section C) is strategic decision-making, not implementation. Latency-cost-quality frontier modeling (Section D) is economic analysis, not engineering optimization. Performance benchmarking for procurement (Section E) is purchasing methodology, not engineering eval. FinOps for AI (Section F) is organizational practice, not technical implementation. Capacity planning under uncertainty (Section G) is provisioning strategy, not engineering scaling. Each section’s content is economic discipline rather than engineering practice; Volume 16’s overlap is at the conceptual level (cost matters; engineering and economics interact) but not at the substantive content level.

The case against: the boundary is fuzzier than the front-matter framing suggests. Procurement decisions inform engineering choices; engineering reality constrains procurement options. The TCO components include people and infrastructure costs that aren’t AI-specific; one could argue this volume’s content belongs in general engineering economics literature, not in an AI catalog. The strategic frame this volume adopts (FinOps practice, capacity planning) applies to cloud computing broadly with AI as a specific instance. Some of the strategic content is necessarily generic; the AI-specific adaptation is what makes the volume useful, but the genericity at points is real.

The honest assessment. The volume clears the bar but not by a wide margin. The substantive content is distinct from Volume 16; the strategic frame produces real value for the audience the volume is written for. But the boundary is fuzzy enough that another volume of similar substance and separation might not clear the bar without producing content that’s less distinct from what already exists. The series shouldn’t extend further on the same pattern; the bar must rise further for additional volumes. Specific potential future volumes mentioned in earlier appendices need to clear a higher test than this volume faced.

The reader’s judgment determines whether the bar was actually cleared. If the volume’s economic-discipline framing reads as substantively distinct and useful, the volume earns its place. If the volume reads as a re-packaging of Volume 16’s content with strategic terminology, the volume marks the pattern’s breaking point. The volume submits to the reader’s judgment explicitly.

Appendix D --- Discovery and Standards

Resources for tracking AI economics and FinOps discipline:

  • FinOps Foundation (finops.org) --- general FinOps practice and emerging AI-specific guidance.

  • Vendor pricing pages --- anthropic.com/pricing, openai.com/pricing, ai.google.dev pricing.

  • Industry analysts --- Gartner Magic Quadrants, Forrester Waves, IDC for enterprise procurement context.

  • Practitioner writing --- Eugene Yan, Hamel Husain, and others writing on AI economics topics.

  • Cost management platforms --- Helicone, Braintrust, Galileo as both products and knowledge sources.

  • Conferences --- FinOps X, AI Engineer Summit, and vendor developer conferences.

  • Public company disclosures --- earnings calls of publicly traded providers and major customers (Microsoft, Google), and Amazon’s disclosures about its Anthropic relationship.

Two practical recommendations. First, treat AI economics as a discipline that requires ongoing investment, not a one-time setup. Prices change, contract terms evolve, and optimization patterns mature, so static knowledge ages quickly. Second, the discipline pays off only when AI cost is material to the organization. Below that threshold, formal economic discipline is over-engineering. Above it, the discipline produces returns that easily justify the investment in the practice. The transition typically happens somewhere between high-five-figure and seven-figure annual AI spend.

Appendix E --- Omissions

This catalog covers about 9 substrates across 8 sections --- the smallest entry count of any volume so far, matching the weaker-candidate framing and the narrowed scope after Volume 16’s coverage of cost engineering. The wider AI economics discipline includes content not covered here:

  • Cost engineering techniques (token reduction, caching, routing). Volume 16 Section B covers these.

  • Specific provider pricing analysis. Pricing changes frequently; live pricing pages are the working reference.

  • Detailed contract terms and legal analysis. Beyond the volume’s scope; legal counsel handles specific contracts.

  • Specific FinOps tooling product comparison. Volume 14 (AI Agent Products Survey) covers ops platforms; FinOps-specific tooling is still at an early stage.

  • Industry benchmarks for AI cost per business outcome. Emerging analyst coverage; not consolidated yet.

  • Hardware accelerator economics in depth. Specialized topic; Volume 5 (Fabric) covers the infrastructure substrate.

  • Macro-economic forecasting for AI costs. Beyond the volume’s practitioner-focused scope.

  • Cost benchmarks across industries. Proprietary in most cases; public benchmarks are sparse.

  • Detailed analysis of model deprecation cost impact. Volume 16 Section G covers migration patterns; the economic dimension is only partly treated here.

  • Specific organizational structures for AI FinOps roles. Emerging practice; not yet consolidated.

Appendix F --- On Sprawl and the End of Weaker Candidates

This is the third weaker-candidate volume and likely should be the last. The pattern of weaker candidates has produced real content but is approaching a breaking point. Volume 15 introduced the category, and Volume 16 followed naturally. Volume 17 was named specifically as a candidate that would face a higher bar and, by the honest assessment in Appendix C, barely cleared it. A fourth weaker-candidate volume would face a still-higher bar. The candidates mentioned in earlier closing appendices (model evaluation methodology, data engineering for AI, enterprise integration patterns) would struggle to clear it without substantially overlapping the existing volumes.

The series at seventeen volumes covers a substantial portion of the working vocabulary of agentic AI as of mid-2026. Volumes 1–10 cover the engineering substrate. Volumes 11–13 cover complementary disciplines (compliance, security, design). Volume 14 covers the product landscape. Volumes 15–17 cover practitioner-discipline consolidations. The seventeen volumes together address the working concerns of agent engineering, governance, security, design, products, prompting, model lifecycle, and economics. The reader who engages with the full series gets a comprehensive working vocabulary; the reader who engages with the relevant subset for their role gets what they need.

Further expansion of the series risks producing sprawl rather than a reference work. Each additional volume adds maintenance burden, dilutes the series’ coherence, and faces a higher bar than the volume before it, because the easier candidates have already been written. The candidates mentioned in earlier appendices are not equally strong. Some (model evaluation methodology beyond Volume 8) might justify treatment. Others (data engineering for AI, enterprise integration patterns) overlap existing volumes more than this volume overlaps Volume 16. The discipline of stopping has real value; not every defensible volume should be written.

Whether to write further weaker candidates is the reader’s judgment as much as the writer’s. If the seventeen volumes feel comprehensive and additional volumes would feel like sprawl, the series should end here. If specific additional content is genuinely missing and the candidate-volumes mentioned in earlier appendices represent real gaps, the series can extend selectively with explicit framing about why each additional volume earns its place. The default should be stopping; extension should require positive justification rather than momentum.

Seventeen volumes. Patterns, Skills, Tools, Events, Fabric, Memory, Human-in-the-Loop, Evaluation & Guardrails, Multi-Agent Coordination, Retrieval & Knowledge Engineering, AI Compliance & Regulatory, AI Infrastructure Security, Agent UX Patterns, AI Agent Products Survey, Prompting and Context Engineering, LLMOps and Model Lifecycle, and now Cost and Performance Economics. The series at this size has the shape it has earned. The proposition still holds at seventeen volumes; whether it would hold at eighteen depends on whether the eighteenth volume’s content clears a bar higher than this volume cleared. The honest answer is that the bar is high enough that additional volumes probably shouldn’t be written. The series can end here without loss; further extension should require positive justification rather than reflexive continuation.

--- End of The Cost and Performance Economics Catalog v0.1 ---

--- The Seventeen-Volume Series ---