
Takeaway: You don’t need perfection, just a directionally correct cost per request/job.

Cost-to-serve (CTS) is the foundation of FinOps: understanding what each request, job, or user action actually costs. This guide shows you how to build a working CTS model in 30 minutes using BigQuery and your existing billing data.


Why Cost-to-Serve Matters

Traditional cloud cost reporting shows where you spend money (services, regions, projects), but not why (which features, endpoints, customers). Cost-to-serve bridges that gap by connecting costs to business metrics.

Business value:

  • Prioritize engineering work based on cost impact
  • Price products based on actual infrastructure costs
  • Identify waste (high-cost, low-value endpoints)
  • Forecast costs based on traffic growth
  • Optimize ROI by focusing on high-value, low-cost features

Example insight:

“Our /api/v1/recommendations endpoint costs €0.12 per request but only generates €0.08 in revenue. We should optimize it or deprecate it.”


Prerequisites

You need three things:

  1. Cloud billing export (may already be enabled for your billing account)
    • BigQuery billing export table
    • Or AWS Cost and Usage Report (CUR) in S3
  2. Traffic/request metrics (from your monitoring)
    • Request counts per service/endpoint
    • Job counts for batch workloads
    • User action counts for product metrics
  3. BigQuery access (or equivalent SQL engine)
    • Read access to billing export
    • Write access to create views/tables

Step 1: Set Up Billing Export (If Not Already Done)

For GCP:

Billing export to BigQuery is enabled in the Cloud Console (Billing → Billing export → BigQuery export); there is no gcloud command for the export itself. While you are in the console, you can optionally create a budget so alerts fire while you build the model:

# Optional: budget with alert thresholds at 50% and 90%
gcloud billing budgets create \
  --billing-account=YOUR_BILLING_ACCOUNT \
  --display-name="CTS Budget" \
  --budget-amount=1000USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9

# Once export is enabled, the table is created in your chosen dataset as:
# `your_dataset.gcp_billing_export_v1_XXXXXX`
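Before moving on, run a quick sanity check that data is landing (the table path is a placeholder; recent charges lag by roughly 24-48 hours):

-- Sanity check: is the billing export receiving data?
SELECT
  MIN(usage_start_time) AS first_record,
  MAX(usage_start_time) AS last_record,
  ROUND(SUM(cost), 2) AS total_cost
FROM
  `your-project.your_dataset.gcp_billing_export_v1_XXXXXX`;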

For AWS:

  • Enable Cost and Usage Report (CUR) in AWS Cost Management
  • Export to S3 bucket
  • Use AWS Glue/Athena or import to BigQuery (see the Athena sketch below)
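If you land the CUR in Athena, the daily cost-per-service aggregation looks roughly like this sketch; the column names assume the standard CUR schema, and cur_db.cost_and_usage is a placeholder for your Glue table:

-- Daily cost per AWS service from a CUR table in Athena (assumed schema)
SELECT
  line_item_product_code AS service,
  DATE(line_item_usage_start_date) AS usage_date,
  SUM(line_item_unblended_cost) AS total_cost
FROM
  cur_db.cost_and_usage  -- placeholder: your Glue/Athena CUR table
GROUP BY 1, 2
ORDER BY 2 DESC;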

Step 2: Collect Traffic Metrics

You need request/job counts per service. Here are common sources:

Option A: Application Metrics (Prometheus/Cloud Monitoring)

-- Example: Extract request counts from Cloud Monitoring
CREATE OR REPLACE VIEW traffic_metrics AS
SELECT
  EXTRACT(DATE FROM timestamp) AS date,
  JSON_EXTRACT_SCALAR(resource.labels, '$.service_name') AS service,
  JSON_EXTRACT_SCALAR(resource.labels, '$.method') AS method,
  JSON_EXTRACT_SCALAR(resource.labels, '$.endpoint') AS endpoint,
  SUM(CAST(value AS INT64)) AS request_count
FROM
  `your-project.monitoring.metrics`
WHERE
  metric.type = 'serviceruntime.googleapis.com/api/request_count'
  AND EXTRACT(DATE FROM timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  date, service, method, endpoint;

Option B: Application Logs (Cloud Logging/CloudWatch)

-- Example: Count requests from application logs
CREATE OR REPLACE VIEW traffic_from_logs AS
SELECT
  EXTRACT(DATE FROM timestamp) AS date,
  -- jsonPayload is a RECORD in BigQuery log exports, so use field access
  jsonPayload.service AS service,
  jsonPayload.endpoint AS endpoint,
  COUNT(*) AS request_count
FROM
  `your-project.logs.cloudfunctions_googleapis_com_cloud_functions`
WHERE
  jsonPayload.message LIKE '%Request%'
  AND EXTRACT(DATE FROM timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  date, service, endpoint;

Option C: Load Balancer Logs

-- Example: Extract from Cloud Load Balancer logs
CREATE OR REPLACE VIEW traffic_from_lb AS
SELECT
  EXTRACT(DATE FROM timestamp) AS date,
  -- Log sinks export LogEntry fields in camelCase
  httpRequest.requestMethod AS method,
  httpRequest.requestUrl AS endpoint,
  COUNT(*) AS request_count
FROM
  `your-project.logs.requests`
WHERE
  EXTRACT(DATE FROM timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  date, method, endpoint;

If you don’t have traffic metrics yet:

  • Start with service-level aggregation (total requests per service)
  • Add endpoint-level granularity later
  • Use application logs as a fallback
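One wiring note before Step 3: the queries below read from `your-project.analytics.traffic_metrics` and expect columns named service, endpoint, ts, and requests. If the option you chose produces different names (as the sketches above do), a thin adapter view bridges the gap; all paths here are placeholders:

-- Adapter: normalize the chosen traffic source into the schema Steps 3-4 expect
CREATE OR REPLACE VIEW `your-project.analytics.traffic_metrics` AS
SELECT
  service,
  endpoint,
  TIMESTAMP(date) AS ts,         -- Steps 3-4 filter and group on DATE(ts)
  request_count AS requests      -- Steps 3-4 aggregate SUM(requests)
FROM
  `your-project.analytics.traffic_from_logs`;  -- Option B; swap in A or C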

Step 3: Build the Cost-to-Serve Query

Here’s the complete BigQuery query that joins costs with traffic:

-- Cost-to-Serve: Cost per request by service
WITH cost_by_service AS (
  SELECT
    service.description AS service,
    DATE(usage_start_time) AS date,
    SUM(cost) AS total_cost_eur,
    SUM(usage.amount) AS total_usage
  FROM
    `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
  WHERE
    project.id IN ('prod-a', 'prod-b')
    AND DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, date
),
traffic_by_service AS (
  SELECT
    service,
    DATE(ts) AS date,
    SUM(requests) AS total_requests
  FROM
    `your-project.analytics.traffic_metrics`  -- Your traffic view/table
  WHERE
    DATE(ts) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, date
)
SELECT
  COALESCE(c.service, t.service) AS service,
  COALESCE(c.date, t.date) AS date,
  c.total_cost_eur,
  t.total_requests,
  -- Cost per request
  c.total_cost_eur / NULLIF(t.total_requests, 0) AS eur_per_request,
  -- Cost per 1k requests (common metric)
  (c.total_cost_eur / NULLIF(t.total_requests, 0)) * 1000 AS eur_per_1k_requests,
  -- Cost per 1M requests
  (c.total_cost_eur / NULLIF(t.total_requests, 0)) * 1000000 AS eur_per_1M_requests
FROM
  cost_by_service c
FULL OUTER JOIN
  traffic_by_service t
USING (service, date)
ORDER BY
  date DESC, service;

Save this as a view:

CREATE OR REPLACE VIEW `your-project.analytics.cost_to_serve` AS
-- (paste the query above)

Step 4: Add Endpoint-Level Granularity (Optional)

For more actionable insights, break down costs by endpoint:

-- Cost-to-Serve by endpoint (requires endpoint-level traffic data)
WITH traffic_by_endpoint AS (
  SELECT
    service,
    endpoint,
    DATE(ts) AS date,
    SUM(requests) AS total_requests
  FROM
    `your-project.analytics.traffic_metrics`
  WHERE
    DATE(ts) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, endpoint, date
),
service_cost_per_request AS (
  SELECT
    service,
    date,
    eur_per_request
  FROM
    `your-project.analytics.cost_to_serve`  -- The service-level view saved in Step 3
)
SELECT
  t.service,
  t.endpoint,
  t.date,
  t.total_requests,
  s.eur_per_request * t.total_requests AS estimated_endpoint_cost,
  s.eur_per_request AS cost_per_request
FROM
  traffic_by_endpoint t
JOIN
  service_cost_per_request s
USING (service, date)
ORDER BY
  estimated_endpoint_cost DESC;
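As in Step 3, save this as a view; the use-case queries later in this guide assume it exists:

CREATE OR REPLACE VIEW `your-project.analytics.cost_to_serve_by_endpoint` AS
-- (paste the query above)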

Step 5: Create Executive Dashboard

Use the CTS view to build dashboards:

Daily Cost per Service

SELECT
  service,
  date,
  eur_per_1k_requests,
  total_requests,
  total_cost_eur,
  -- Trend: compare to previous day
  LAG(eur_per_1k_requests) OVER (
    PARTITION BY service
    ORDER BY date
  ) AS prev_day_cost_per_1k,
  eur_per_1k_requests - LAG(eur_per_1k_requests) OVER (
    PARTITION BY service
    ORDER BY date
  ) AS cost_change
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY
  date DESC, total_cost_eur DESC;

Top Cost Drivers

SELECT
  service,
  SUM(total_cost_eur) AS total_cost_30d,
  SUM(total_requests) AS total_requests_30d,
  AVG(eur_per_1k_requests) AS avg_cost_per_1k,
  -- Traffic-weighted cost per 1k requests: lower is better
  SUM(total_cost_eur) / NULLIF(SUM(total_requests), 0) * 1000 AS weighted_cost_per_1k
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  service
ORDER BY
  total_cost_30d DESC
LIMIT 10;

Cost Anomaly Detection

-- Flag services with sudden cost increases
WITH daily_avg AS (
  SELECT
    service,
    date,
    eur_per_1k_requests,
    AVG(eur_per_1k_requests) OVER (
      PARTITION BY service
      ORDER BY date
      ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
    ) AS rolling_7d_avg,
    STDDEV(eur_per_1k_requests) OVER (
      PARTITION BY service
      ORDER BY date
      ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
    ) AS rolling_7d_stddev
  FROM
    `your-project.analytics.cost_to_serve`
  WHERE
    date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
SELECT
  service,
  date,
  eur_per_1k_requests,
  rolling_7d_avg,
  -- Flag if cost is 2 standard deviations above average
  CASE
    WHEN eur_per_1k_requests > rolling_7d_avg + 2 * rolling_7d_stddev
    THEN 'ANOMALY'
    ELSE 'NORMAL'
  END AS status
FROM
  daily_avg
WHERE
  -- A SELECT alias can't be referenced in WHERE, so repeat the condition
  eur_per_1k_requests > rolling_7d_avg + 2 * rolling_7d_stddev
ORDER BY
  date DESC;

Step 6: Correlate with Business Metrics

Connect costs to business value:

-- Cost-to-Serve with revenue/user metrics
WITH cts AS (
  SELECT * FROM `your-project.analytics.cost_to_serve`
),
revenue AS (
  SELECT
    service,
    DATE(transaction_time) AS date,
    SUM(amount) AS revenue_eur,
    COUNT(DISTINCT user_id) AS active_users
  FROM
    `your-project.analytics.transactions`
  WHERE
    DATE(transaction_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, date
)
SELECT
  c.service,
  c.date,
  c.eur_per_1k_requests AS cost_per_1k,
  r.revenue_eur / NULLIF(c.total_requests, 0) * 1000 AS revenue_per_1k,
  -- Profit margin per request
  (r.revenue_eur / NULLIF(c.total_requests, 0) * 1000) - c.eur_per_1k_requests AS profit_per_1k,
  -- ROI
  (r.revenue_eur / NULLIF(c.total_cost_eur, 0)) AS roi
FROM
  cts c
JOIN
  revenue r
USING (service, date)
ORDER BY
  profit_per_1k DESC;

Common Use Cases

1. Track Weekly Cost Trends

-- Weekly cost trend
SELECT
  DATE_TRUNC(date, WEEK) AS week,  -- robust across year boundaries
  service,
  AVG(eur_per_1k_requests) AS avg_cost_per_1k,
  SUM(total_cost_eur) AS weekly_cost
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
GROUP BY
  week, service
ORDER BY
  week DESC, weekly_cost DESC;

2. Flag High-Cost Endpoints

-- Endpoints costing more than €0.10 per request
SELECT
  service,
  endpoint,
  date,
  cost_per_request,
  total_requests,
  estimated_endpoint_cost
FROM
  `your-project.analytics.cost_to_serve_by_endpoint`
WHERE
  cost_per_request > 0.10
  AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY
  estimated_endpoint_cost DESC;

3. Forecast Future Costs

-- Simple linear forecast based on traffic growth
WITH historical AS (
  SELECT
    service,
    AVG(eur_per_1k_requests) AS avg_cost_per_1k,
    AVG(total_requests) AS avg_daily_requests,
    -- Traffic growth rate (simplified)
    (MAX(total_requests) - MIN(total_requests)) / COUNT(DISTINCT date) AS daily_growth
  FROM
    `your-project.analytics.cost_to_serve`
  WHERE
    date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service
)
SELECT
  service,
  avg_cost_per_1k,
  avg_daily_requests,
  -- Forecast 30 days ahead
  avg_daily_requests + (daily_growth * 30) AS forecasted_daily_requests,
  (avg_daily_requests + (daily_growth * 30)) * avg_cost_per_1k / 1000 * 30 AS forecasted_monthly_cost
FROM
  historical;
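If the linear sketch proves too crude, BigQuery ML can fit a time-series model per service directly on the CTS view. A sketch, assuming BigQuery ML is enabled in your project (the model name is a placeholder):

-- Fit one ARIMA_PLUS model per service on daily total cost
CREATE OR REPLACE MODEL `your-project.analytics.cts_cost_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'total_cost_eur',
  time_series_id_col = 'service'
) AS
SELECT service, date, total_cost_eur
FROM `your-project.analytics.cost_to_serve`
WHERE total_cost_eur IS NOT NULL;

-- Forecast the next 30 days per service
SELECT *
FROM ML.FORECAST(MODEL `your-project.analytics.cts_cost_forecast`,
                 STRUCT(30 AS horizon));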

Troubleshooting

Problem: No traffic data

Solution: Start with service-level aggregation from billing export only:

SELECT
  service.description AS service,
  DATE(usage_start_time) AS date,
  SUM(cost) AS total_cost
FROM
  `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
GROUP BY
  service, date;

Problem: Costs don’t match billing

Solution:

  • Check billing export delay (usually 24-48 hours)
  • Verify project filters
  • Include all cost components (compute, storage, network) and credits; see the sketch below
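A common culprit is credits: the export stores them in a repeated credits field, so SUM(cost) alone is the gross figure. A sketch of the net calculation (credit amounts are negative; the table path is a placeholder):

-- Net cost = gross cost plus (negative) credit amounts
SELECT
  DATE(usage_start_time) AS date,
  SUM(cost) AS gross_cost,
  SUM(cost) + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)) AS net_cost
FROM
  `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
GROUP BY
  date;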

Problem: Missing service labels

Solution:

  • Use resource.name or resource.type as fallback
  • Add custom labels in application code
  • Use Cloud Monitoring resource labels

Best Practices

  1. Start simple: Service-level CTS is enough to get started
  2. Iterate: Add endpoint-level granularity after service-level works
  3. Automate: Schedule daily queries to keep data fresh (see the sketch after this list)
  4. Alert: Set up alerts for cost anomalies (> 20% increase)
  5. Correlate: Connect costs to business metrics (revenue, users)
  6. Document: Keep a data dictionary of service/endpoint mappings
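For practice 3, the lightest automation is a BigQuery scheduled query that snapshots the view into a table dashboards can read cheaply; a sketch (the target table is a placeholder; configure the schedule in the BigQuery console under Scheduled queries):

-- Run daily: materialize the CTS view for dashboards and alerting
CREATE OR REPLACE TABLE `your-project.analytics.cost_to_serve_daily` AS
SELECT *
FROM `your-project.analytics.cost_to_serve`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);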

Expected Outcomes

With cost-to-serve in place, you should be able to:

  • Track €/1k requests in executive dashboards
  • Correlate costs with p95 latency and error rates
  • Flag endpoints whose cost increases but value decreases
  • Prioritize optimization work based on cost impact
  • Forecast costs based on traffic growth
  • Price products based on actual infrastructure costs

Outcome: engineering work aligns with the money, and decisions get faster and calmer.


Next Steps

  • Week 1: Set up service-level CTS
  • Week 2: Add endpoint-level granularity
  • Week 3: Correlate with business metrics
  • Week 4: Build executive dashboards and alerts

Want help setting this up?
