Server-side GTM on Cloud Run: cost, scaling, and what to watch

Cloud Run is where your server GTM container runs. It’s also the part of the setup that generates the most anxiety — because it’s the part with a billing meter attached. Most of that anxiety is misplaced. Here’s what actually matters.

What it actually costs

For a typical site (under 10M hits/month), a single Cloud Run service running an sGTM container costs between $0 and $30/month. Most small-to-mid sites land under $10/month. That’s not a guess — it’s what the billing looks like when the container is configured correctly.

Cloud Run bills for CPU and memory in 100ms increments while an instance is processing requests, plus a per-request fee of roughly $0.40 per million requests. For sGTM, each request is fast (single-digit milliseconds of CPU time) and small (a few KB), so the compute portion of the bill stays low.
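
To make the arithmetic concrete, here is a minimal Python sketch of that pricing model. The rates are assumptions based on Cloud Run list prices for lower-priced regions at the time of writing, so check the current pricing page before relying on them. The function name and the billable_seconds_per_request parameter are illustrative, and the free tier is ignored.

```python
# Assumed list prices (USD); verify against the current Cloud Run pricing page.
CPU_PER_VCPU_SECOND = 0.000024
MEMORY_PER_GIB_SECOND = 0.0000025
REQUESTS_PER_MILLION = 0.40


def estimate_monthly_cost(requests, billable_seconds_per_request,
                          vcpu=1.0, memory_gib=0.5):
    """Rough monthly cost in USD before any free tier is applied.

    billable_seconds_per_request is the average instance time billed per
    request. When many concurrent requests share one instance, this is
    usually well below the wall-clock latency of a single request.
    """
    billable_seconds = requests * billable_seconds_per_request
    cpu_cost = billable_seconds * vcpu * CPU_PER_VCPU_SECOND
    memory_cost = billable_seconds * memory_gib * MEMORY_PER_GIB_SECOND
    request_cost = requests / 1_000_000 * REQUESTS_PER_MILLION
    return round(cpu_cost + memory_cost + request_cost, 2)


# Hypothetical site: 5M hits/month, 20 ms of billed instance time per hit.
print(estimate_monthly_cost(5_000_000, 0.020))
```

With these assumed inputs the request fee and the CPU charge are of similar size, and memory is a rounding error. The input that moves the result most is the billed time per request, which depends on how well requests pack onto each instance.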

The number that matters is not “what could this cost” — it’s “what does this cost at my traffic level with correct settings.” For most setups, the answer is very little.

The free tier

Cloud Run’s free tier covers 2 million requests/month, 180,000 vCPU-seconds, and 360,000 GiB-seconds of memory. For a low-traffic site, this means you may pay nothing. For a moderate site, it covers most of the baseline and you pay for the overage.

The free tier resets monthly and applies per billing account, not per service. If you’re running other Cloud Run services on the same billing account, they share the allocation.
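
The shared allocation is easy to misjudge, so here is a small sketch of how pooled usage eats into it. The allowance figures are assumptions taken from the published request-based free tier and should be verified. The Usage type and the example services are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    requests: int
    vcpu_seconds: float
    gib_seconds: float


# Assumed monthly free-tier allowances per billing account; verify before use.
FREE_TIER = Usage(requests=2_000_000, vcpu_seconds=180_000, gib_seconds=360_000)


def billable_usage(services):
    """Pool usage across all services, then subtract the shared free tier."""
    total_requests = sum(s.requests for s in services)
    total_vcpu = sum(s.vcpu_seconds for s in services)
    total_gib = sum(s.gib_seconds for s in services)
    return Usage(
        requests=max(0, total_requests - FREE_TIER.requests),
        vcpu_seconds=max(0, total_vcpu - FREE_TIER.vcpu_seconds),
        gib_seconds=max(0, total_gib - FREE_TIER.gib_seconds),
    )


# Hypothetical account: the sGTM service plus one unrelated API service.
sgtm = Usage(requests=1_500_000, vcpu_seconds=30_000, gib_seconds=15_000)
other_api = Usage(requests=1_000_000, vcpu_seconds=200_000, gib_seconds=100_000)
print(billable_usage([sgtm, other_api]))
```

On its own, the sGTM service in this example would fit inside the free tier. Pooled with the second service, the account pays for 500,000 requests and 50,000 vCPU-seconds.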

Scaling behavior

Cloud Run autoscales mainly on request concurrency and CPU utilization. When traffic increases, it spins up more container instances. When traffic drops, it scales back down. This is the right model for sGTM, because traffic follows user sessions, which are inherently bursty.

Min instances. The default is zero. That means when there’s no traffic, there are no running instances and no cost. The tradeoff is cold starts — the first request after idle takes 2-5 seconds while a new instance boots. For sGTM, this matters less than you’d think. GA4 requests are not user-facing page loads. A cold start delays the server-side hit, not the visitor’s browser. Set min instances to 1 if you want to eliminate cold starts; it costs roughly $5-10/month depending on region.
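
As a rough check on that figure, the sketch below prices one warm but idle instance for a 30-day month. The idle rates are assumptions (idle minimum instances are billed at a lower rate than instances actively serving requests) and vary by region, so treat the result as an order-of-magnitude estimate.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 for a 30-day month

# Assumed idle rates (USD) for a kept-warm instance; verify before use.
IDLE_CPU_PER_VCPU_SECOND = 0.0000025
IDLE_MEMORY_PER_GIB_SECOND = 0.0000025


def idle_min_instance_cost(min_instances=1, vcpu=1.0, memory_gib=0.5):
    """Monthly USD cost of keeping instances warm with no traffic at all."""
    per_second = (vcpu * IDLE_CPU_PER_VCPU_SECOND
                  + memory_gib * IDLE_MEMORY_PER_GIB_SECOND)
    return round(min_instances * per_second * SECONDS_PER_MONTH, 2)


# One instance with 1 vCPU and 512 MiB of memory.
print(idle_min_instance_cost())
```

Under these assumptions a single 1 vCPU, 512 MiB instance lands just under $10/month, and the cost grows linearly with the min instances setting.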

Max instances. The default cap is 100. For sGTM, you are unlikely to need more than 2-3 unless your site handles millions of daily hits. Set max instances to something reasonable (5-10) to prevent runaway scaling from bot traffic or misconfiguration.

Concurrency. Cloud Run defaults to 80 concurrent requests per instance, and the limit can be raised as high as 1,000. sGTM requests are fast and lightweight, so a single instance handles significant throughput. Don’t reduce concurrency below the default unless you have a specific reason.
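
To see how concurrency, latency, and the instance limits interact, here is a back-of-the-envelope sketch using Little's law. It is an approximation, not the autoscaler's actual algorithm, and the function name and traffic figures are hypothetical.

```python
import math


def instances_needed(requests_per_second, avg_latency_seconds, concurrency=80,
                     min_instances=0, max_instances=100):
    """Estimate steady-state instance count, clamped to the configured limits.

    Little's law: requests in flight = arrival rate * time each request takes.
    """
    in_flight = requests_per_second * avg_latency_seconds
    needed = math.ceil(in_flight / concurrency)
    return max(min_instances, min(needed, max_instances))


# Hypothetical burst: 2,000 requests/second at 125 ms each, capped at 5 instances.
print(instances_needed(2000, 0.125, max_instances=5))
```

Lowering concurrency raises the instance count for the same traffic, which is why the default is usually the right choice. The max_instances clamp is what keeps a bot flood from scaling the bill along with the traffic.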

Common cost mistakes

Leaving the preview server running at high min instances. The GTM preview server is a separate Cloud Run service. It only needs to run when you’re actively debugging. If you deployed it with min instances set to 1+ and forgot about it, it’s burning money 24/7 for no reason. Set preview server min instances to 0.

Over-provisioning CPU and memory. The sGTM container doesn’t need 2 vCPUs and 2 GiB of memory. 1 vCPU and 512 MiB is sufficient for most setups. Over-provisioning doesn’t make it faster — it makes it more expensive.

Wrong region. Deploy in a region close to your users, but also check regional pricing. Some regions cost 10-20% more than others for the same compute. us-central1 is in Cloud Run’s lowest-priced tier, so it is usually among the cheapest options.

Not setting a budget alert. Cloud Run costs are predictable until they’re not. A misconfigured tag that fires in a loop, a bot flood, or an autoscaling spike can run up a bill. Set a billing alert at $25 or $50 so you find out from email, not from your monthly invoice.

What to monitor

Set these up before you go live, not after something breaks:

Request count — confirms traffic is flowing through the server container. A sudden drop means something is broken upstream.
Request latency (p50 and p95) — sGTM requests should complete in under 200ms at p95. If latency spikes, check for downstream issues (GA4 endpoint, network, container resource limits).
Error rate (5xx responses) — should be near zero. Persistent 5xx errors mean the container is crashing or misconfigured. Check Cloud Run logs.
Instance count — shows scaling behavior. If instances are climbing when traffic isn’t, investigate.
Billing alerts — not optional. Set a threshold that’s 2-3x your expected monthly cost.

All of this is available in the Cloud Run console and Google Cloud Monitoring. No additional tooling required.
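
If you export request logs and want to sanity-check the dashboard numbers, the metrics above reduce to a few lines of arithmetic. This is a minimal sketch that assumes each log entry has already been parsed into a (latency in ms, HTTP status) pair. The function names and sample data are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests):
    """requests: non-empty list of (latency_ms, http_status) tuples."""
    latencies = [latency for latency, _ in requests]
    errors = sum(1 for _, status in requests if 500 <= status <= 599)
    return {
        "count": len(requests),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "error_rate": errors / len(requests),
    }


# Hypothetical window: 18 fast requests, one slow one, one server error.
window = [(40, 200)] * 18 + [(350, 200), (60, 502)]
print(summarize(window))
```

Note that the single 350 ms outlier does not move p95 in a window this small, which is why p50 and p95 are worth watching together over longer periods.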

When to worry vs when it’s fine

It’s fine: your bill is under $20/month, latency is stable, error rate is near zero, and request counts match expectations. This is where most setups land.

Investigate: latency is climbing, error rate is above 1%, or instance count is higher than expected for your traffic level. These usually point to configuration issues, not fundamental problems.

Act now: your bill spiked unexpectedly, 5xx errors are sustained, or request counts dropped to zero. Check Cloud Run logs, verify the container is healthy, and confirm your web GTM container is still sending traffic to the right endpoint.
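
The three tiers above can be sketched as a simple decision function. The 1% error rate comes from the text and the 200 ms p95 limit from the monitoring targets. The 3x bill multiplier used to define a spike and all parameter names are illustrative assumptions. Instance count and latency trends are left out for brevity.

```python
def triage(monthly_bill_usd, expected_bill_usd, error_rate, p95_ms,
           requests_last_hour, sustained_5xx=False):
    """Map a few health signals to 'fine', 'investigate', or 'act now'."""
    bill_spiked = monthly_bill_usd > 3 * expected_bill_usd  # assumed threshold
    if bill_spiked or sustained_5xx or requests_last_hour == 0:
        return "act now"
    if error_rate > 0.01 or p95_ms > 200:
        return "investigate"
    return "fine"


# Hypothetical readings for a healthy setup and a degraded one.
print(triage(8, 10, error_rate=0.001, p95_ms=120, requests_last_hour=5000))
print(triage(8, 10, error_rate=0.02, p95_ms=120, requests_last_hour=5000))
```

The most urgent conditions are checked first, so a setup that is both slow and sending no traffic is reported as "act now" rather than "investigate".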

The infrastructure is not the hard part of server-side GTM. Getting the tagging right is. Cloud Run just needs correct settings and basic monitoring.

The guide covers the full setup path including Cloud Run deployment. The assistant automates infrastructure checks and ongoing monitoring.