Week of Apr 28 – May 04 | Edition #10 | ~5 min read
Curated by Simon Brief

Compute Is the New Moat & The Agent Economy Lands at Stripe

TLDR

  • Compute is the moat. Hyperscalers are committing $725B to 2026 capex, with the industry total projected to reach $1T, and are still hitting capacity walls. Provisioned throughput is now table stakes, not a "nice to have."
  • The agent economy is real. Stripe is rebuilding payments around it; xAI's $60B Cursor deal and Anthropic's $1T secondary mark are redefining how AI companies get built.
  • The harness, not the model, is the product moat. Critique loops, evals, and multi-model orchestration turn a frontier API into a reliable product.
  • The Gemini Enterprise Agent Platform (FKA Vertex AI) is built for exactly that loop — and only on GCP can founders commit to Gemini + Anthropic together in a single relationship, the kind of partnership that protects optionality instead of costing it.
  • Cloud revenue up 63% YoY to $20B with backlog doubling. This is our time to win.

The Big Picture

Compute Scarcity: A Trillion-Dollar Buildout — and Why Provisioned Throughput Is Now a Necessity

The market is "power constrained": OpenAI is reportedly missing targets because of compute and power supply, not demand [Chamath Palihapitiya on All-In (81min, 16:11)]. Hyperscalers are responding with $725B in 2026 capex, projected to hit $1T industry-wide [Jason Calacanis on All-In (81min, 52:50)]. The strain is real even at the application layer: Baseten now deploys across 18 clouds and 90 clusters to access GPUs, citing "uncomfortably high utilization" everywhere [Tuhin Srivastava on No Priors (43min, 39:10)].

The implication for any AI product in production: on-demand inference is no longer reliable enough. When everyone is fighting for the same GPUs, pay-as-you-go quotas throttle, latency spikes, and your user-facing SLA breaks at exactly the wrong moment. The serious answer is provisioned throughput: pre-purchased, guaranteed inference capacity with predictable latency. It's moving from optional to mandatory for any product that promises real-time responses or runs an agent loop where each step blocks the next.

Your angle with founders: "If hyperscalers are pouring a trillion dollars in and still hitting capacity walls, where does your inference capacity come from when demand spikes? Are you running on shared on-demand quotas, or have you locked in provisioned throughput? How are you thinking about it for your next product launch?"

Stripe Rebuilds Payments for the Agent Economy

The agent economy stopped being theoretical this week. Stripe's Emily Sands says LLM traffic to Stripe docs is up 10× year-over-year; machines are now first-class users of developer infrastructure [Emily Sands on AI & I (54min, 0:28)]. Two consequences worth knowing:

  • Fraud is changing shape. "Fraudsters are stealing compute" — a different problem than card theft, and AI companies are dropping free trials and blocking virtual cards in response (8:44). "Free compute is the new CAC" (10:20).
  • Seat-based SaaS pricing is collapsing. "I suspect we will see seat-based disappear" (37:13). Stripe is also building shared payment tokens so merchants integrate once and become available to every agentic storefront.

This is exactly where Google Cloud's Model Armor and Security Command Center Enterprise (SCC Enterprise) come in. Model Armor screens prompts and model responses for prompt injection, jailbreaks, sensitive-data leakage, and abuse patterns before they hit your inference bill — directly addressing the "compute theft" Sands describes. SCC Enterprise extends that same posture management across the rest of the stack: misconfigurations, identity drift, and the new attack surface that AI agents create when they start calling APIs on behalf of users.

Your angle with founders: "Two things are flipping at once for AI products: pricing (seat-based is dying) and abuse (free compute is the new CAC). Have you mapped your monetization to usage instead of seats — and on the abuse side, how are you screening prompts and model responses before they consume tokens? Model Armor and SCC Enterprise are built for this exact moment."

Builder's Corner

The Harness Is the Moat — Not the Model

Agent performance is 95% the harness, 5% the base model: the prompts, scaffolding, and feedback loops around the model dominate the outcome [Yasser Elsaid on Latent Space (61min, 17:46)]. Boris Cherny, creator of Claude Code, identified 9 common prompting patterns that waste 73% of tokens [Boris Cherny]. Shopify's Mikhail Parakhin reinforces it: fewer agents with strong critique loops beat many parallel agents on code quality, even at higher latency [Mikhail Parakhin on Latent Space (75min, 0:52)]. And Karpathy: infrastructure and docs need to be agent-native, written for the agent first and humans second [Karpathy on Sequoia Capital (30min, 20:10)].

What "good harness" actually looks like in practice:

  • Critique loops: one model drafts, a second model critiques against an explicit rubric, the first model revises. Two passes beat one giant prompt almost every time.
  • Structured I/O at every step: JSON in, JSON out, schemas validated. No free-text chains.
  • Eval-driven development: every prompt change gets scored against a held-out test set before it ships. "Vibes-based prompting" is how teams burn 73% of their tokens.
  • Agent-native infrastructure: machine-readable docs (markdown, OpenAPI), structured logs, queryable state. If your agent has to scrape your own UI, you're paying for the harness twice.
  • Provisioned throughput on the inner loop: critique loops are sequential — each step blocks the next. On-demand quotas turn a 4-second loop into a 40-second loop the moment traffic spikes.
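
For a concrete picture of the first two bullets, here is a minimal sketch of a critique loop with validated JSON at every step. Everything named here is an assumption for illustration: the rubric, the field names, and the stub drafter and critic, which stand in for real model calls. It is not any particular vendor's API.

```python
import json
from typing import Callable

# Hypothetical rubric and schemas, chosen only for illustration.
RUBRIC = ["answers the question", "cites a source"]
DRAFT_FIELDS = {"answer": str}
CRITIQUE_FIELDS = {"passed": bool, "issues": list}


def parse_validated(raw: str, fields: dict) -> dict:
    """Parse a JSON object and check required keys and types.

    Raises ValueError on malformed JSON or a schema mismatch.
    """
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in fields.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return data


def critique_loop(task: str,
                  drafter: Callable[[str], str],
                  critic: Callable[[str], str],
                  max_rounds: int = 2) -> dict:
    """Draft, critique against the rubric, revise; each step blocks the next."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    issues: list = []
    for _ in range(max_rounds):
        draft = parse_validated(
            drafter(json.dumps({"task": task, "issues": issues})), DRAFT_FIELDS)
        verdict = parse_validated(
            critic(json.dumps({"draft": draft["answer"], "rubric": RUBRIC})),
            CRITIQUE_FIELDS)
        if verdict["passed"]:
            return draft
        issues = verdict["issues"]  # fed back into the next drafting pass
    return draft  # best effort after the final round


# Stub "models" so the sketch runs without any API; swap in real calls.
def stub_drafter(raw: str) -> str:
    request = json.loads(raw)
    answer = "Use provisioned throughput."
    if request["issues"]:
        answer += " (source: internal eval)"
    return json.dumps({"answer": answer})


def stub_critic(raw: str) -> str:
    request = json.loads(raw)
    ok = "source" in request["draft"]
    return json.dumps({"passed": ok, "issues": [] if ok else ["cites a source"]})


if __name__ == "__main__":
    print(critique_loop("How do we keep latency flat?", stub_drafter, stub_critic))
```

Because the drafter and critic are plain callables, the same loop can point at two different models, which is where the multi-model argument comes from. Note that the calls are strictly sequential, so any slowdown in one step is paid in full by the whole loop.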

Your angle with founders: "If the model is a commodity, your moat is the harness. Walk me through your critique loop, your eval set, and your inference reliability story — that's where the next 10× in product quality is hiding."

Recursive Inference: A New Scaling Law

A paper this week showed a 7M-parameter "Tiny Recursive Model" beating 100B+ parameter LLMs on hard reasoning tasks like Sudoku, mazes, and the ARC Prize [Francois Shaard on Lightcone YC (38min, 36:03)]. The trick: instead of one forward pass, the model loops on its own draft answer in latent space, refining it through deep supervision steps. Shaard frames chain-of-thought as a "hack" (reasoning bounded by human language) and predicts the real frontier is giant models for embedding + tiny recursive models for reasoning.

Why founders care: The "biggest model wins" assumption is breaking. Some hard reasoning workloads may need a tiny recursive model on cheap inference, not a frontier API. That changes infra choices and unit economics. Worth a 30-minute look before the next architecture decision.

Founder Watch

AIG: The Insurance Playbook for Vertical AI

AIG just announced a Gen AI transformation built on Anthropic + Palantir partnerships, with both Dario Amodei and Alex Karp on stage at their investor day [Peter Zaffino on Grit (57min, 30:57)]. The focused use case: underwriting workflow, getting an underwriter "perfect information in a fraction of the time" (41:40). What's notable for founders selling into regulated industries: AIG explicitly aligned at the top of the house with their AI partners and educated the board of directors before launching.

Conversation starter: "AIG didn't pilot AI in a sandbox — they put their CEO on stage with Anthropic's and Palantir's CEOs. If your enterprise pilots are stalling, is it because your buyer's CEO and board aren't bought in? What would it take to elevate this to a top-of-house conversation?"

Cursor + Anthropic: When Compute Becomes the Acquisition Currency

xAI is reportedly acquiring Cursor for ~$60B, while Anthropic is trading at $1T in secondary markets [20VC (111min, 3:31)]. The structural read: Cursor has revenue but "shitty gross margins" because it needs its own model and compute; xAI has compute and a model but no revenue. As Harry Stebbings put it, "a marriage made in heaven." Meanwhile Jason Calacanis is calling out a new B2B watch metric, "stealth churn," where customers still pay but stop using the product because AI alternatives are eating their workflow (28:43). The bigger meta-theme from the same conversation: enterprises increasingly need an "agent fabric they can trust," an orchestration layer that manages all the agents running across their stack (52:27).

Conversation starter: "Where does your gross margin land 12 months from now as inference usage scales? Cursor's reported $60B exit looks like a compute deal at its core: revenue intact, margins eaten by inference costs. The proactive play is locking in both committed compute and model flexibility on one platform: Gemini for price-performance workloads, Anthropic for frontier reasoning, both running through the Gemini Enterprise Agent Platform with provisioned throughput. That's the kind of partnership that protects your optionality, not the kind that costs it."

Our Play

Gemini Enterprise Agent Platform (FKA Vertex AI): The Foundation for Harness-Driven AI

This week's signal — that compute is the moat, harnesses are the differentiator, and the agent economy is real — points squarely at where Google Cloud is investing. The Gemini Enterprise Agent Platform (FKA Vertex AI) is built for the exact workflow founders are converging on. Concretely:

  • Provisioned throughput for Gemini lets founders pre-purchase guaranteed inference capacity with predictable latency — the answer to "where does my capacity come from when demand spikes." It's the difference between an agent loop that responds in 4 seconds every time and one that drifts to 40 seconds under load.
  • Agent Builder + Agent Engine give teams the orchestration primitives for the critique loops, multi-step reasoning chains, and tool-calling patterns described above. You don't have to hand-roll the harness — the platform handles state, retries, evaluation hooks, and observability.
  • Model Garden lets teams pick the right model per role in the loop — frontier Gemini for the critique step, a smaller Gemma for the cheap drafting step, and the recursive/specialized models from Lightcone's discussion as they mature. The harness pattern requires multi-model flexibility; Model Garden gives you that without re-platforming.
  • Built-in evaluation means every prompt change can be scored against a held-out test set as part of CI — eval-driven prompting, not vibes.
  • Model Armor and SCC Enterprise wrap the platform with the safety and posture management the Stripe story makes urgent: prompt-injection screening, abuse detection, and identity/agent posture across the stack.
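
To make the eval-driven bullet tangible in a founder conversation, here is a minimal, platform-agnostic sketch of an eval gate: a prompt change ships only if it scores at least as well as the current prompt on a held-out set. The test cases, the substring-match scoring rule, and the two stub prompt runners are all hypothetical; this does not show the platform's own evaluation API.

```python
from typing import Callable, List, Tuple

# Hypothetical held-out cases: (input, substring the output must contain).
HELD_OUT: List[Tuple[str, str]] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
]


def score(run_prompt: Callable[[str], str],
          cases: List[Tuple[str, str]]) -> float:
    """Fraction of held-out cases whose output contains the expected text."""
    hits = sum(1 for question, expected in cases
               if expected in run_prompt(question))
    return hits / len(cases)


def should_ship(candidate: Callable[[str], str],
                baseline: Callable[[str], str],
                cases: List[Tuple[str, str]],
                min_gain: float = 0.0) -> bool:
    """Gate a prompt change: ship only if it is no worse than the baseline."""
    return score(candidate, cases) >= score(baseline, cases) + min_gain


# Stub prompt runners standing in for "model + prompt version" calls.
def baseline_prompt(question: str) -> str:
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(question, "unsure")


def candidate_prompt(question: str) -> str:
    answers = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}
    return answers.get(question, "unsure")


if __name__ == "__main__":
    print("baseline:", score(baseline_prompt, HELD_OUT))
    print("candidate:", score(candidate_prompt, HELD_OUT))
    print("ship?", should_ship(candidate_prompt, baseline_prompt, HELD_OUT))
```

In CI, the same check would run on every prompt change and fail the build when `should_ship` returns False. Real teams would replace substring matching with a rubric or model-graded score, but the gate itself stays this simple.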

Replit's experience is the proof point: they specifically use Gemini for price-performance-sensitive tasks, at one point sending more tokens to Google than to Anthropic [Amjad Masad on 20VC (49min, 12:03)]. That's the Model Garden pattern in action: the right model for the right task, on infrastructure built for harness-driven AI.

Connect to this week: The shift from "biggest model wins" to "best harness wins" is a tooling shift as much as a mindset shift. The Gemini Enterprise Agent Platform is built for that loop — provisioned throughput for the reliability the loop needs, Agent Builder for the orchestration, Model Garden for the model flexibility, Model Armor for the abuse surface. And uniquely on GCP, founders can commit to Gemini and Anthropic together on one platform — the multi-model deal that gives them compute economics and model optionality in a single relationship, rather than fragmenting across providers.