DR. ATABAK KH

Cloud Platform Modernization Architect specializing in transforming legacy systems into reliable, observable, and cost-efficient Cloud platforms.

Certified: Google Professional Cloud Architect, AWS Solutions Architect, MapR Cluster Administrator

Takeaway: You don’t need perfection; a directionally correct cost per request/job is enough.

Cost-to-serve (CTS) is the foundation of FinOps: understanding what each request, job, or user action actually costs. This guide shows you how to build a working CTS model in 30 minutes using BigQuery and your existing billing data.


Why Cost-to-Serve Matters

Traditional cloud cost reporting shows where you spend money (services, regions, projects), but not why (which features, endpoints, customers). Cost-to-serve bridges that gap by connecting costs to business metrics.

Business value:

  • Prioritize engineering work based on cost impact
  • Price products based on actual infrastructure costs
  • Identify waste (high-cost, low-value endpoints)
  • Forecast costs based on traffic growth
  • Optimize ROI by focusing on high-value, low-cost features

Example insight:

“Our /api/v1/recommendations endpoint costs €0.12 per request but only generates €0.08 in revenue. We should optimize it or deprecate it.”


Prerequisites

You need three things:

  1. Cloud billing export (configured per billing account; see Step 1 if not yet enabled)
    • BigQuery billing export table
    • Or AWS Cost and Usage Report (CUR) in S3
  2. Traffic/request metrics (from your monitoring)
    • Request counts per service/endpoint
    • Job counts for batch workloads
    • User action counts for product metrics
  3. BigQuery access (or equivalent SQL engine; see the sanity check after this list)
    • Read access to billing export
    • Write access to create views/tables
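
A quick sanity check covers items 1 and 3 at once: query the export table directly (the table name below is the placeholder used throughout this guide):

-- Sanity check: is billing data present, and how fresh is it?
SELECT
  MIN(DATE(usage_start_time)) AS earliest_day,
  MAX(DATE(usage_start_time)) AS latest_day,
  ROUND(SUM(cost), 2) AS total_cost
FROM
  `your-project.billing_export.gcp_billing_export_v1_XXXXXX`;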

Step 1: Set Up Billing Export (If Not Already Done)

For GCP:

Billing export to BigQuery is configured in the Cloud Console (Billing → Billing export → BigQuery export); there is no gcloud command for enabling it. Choose a target dataset, and the export table is created automatically at:

# `your_dataset.gcp_billing_export_v1_XXXXXX`
# (XXXXXX is your billing account ID)

While you are in the billing account, it is worth setting a budget with alert thresholds:

# Create a budget with 50% and 90% alert thresholds
gcloud billing budgets create \
  --billing-account=YOUR_BILLING_ACCOUNT \
  --display-name="Monthly Budget" \
  --budget-amount=1000USD \
  --threshold-rule=percent=50 \
  --threshold-rule=percent=90

For AWS:

  • Enable Cost and Usage Report (CUR) in AWS Cost Management
  • Export to S3 bucket
  • Use AWS Glue/Athena or import to BigQuery (Athena sketch below)
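
If you query the CUR with Athena instead of importing it, a minimal cost-by-service query looks like the sketch below; table and column names assume the standard CUR schema as Athena exposes it (adjust to your setup):

-- Athena: daily unblended cost by service from the CUR
SELECT
  line_item_product_code AS service,
  date(line_item_usage_start_date) AS usage_date,
  SUM(line_item_unblended_cost) AS total_cost
FROM
  cur_database.cur_table
GROUP BY
  1, 2
ORDER BY
  usage_date DESC, total_cost DESC;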

Step 2: Collect Traffic Metrics

You need request/job counts per service. Here are common sources:

Option A: Application Metrics (Prometheus/Cloud Monitoring)

-- Example: Extract request counts from Cloud Monitoring
-- (schema is illustrative; adjust the table and label paths to your metrics export)
CREATE OR REPLACE VIEW traffic_metrics AS
SELECT
  EXTRACT(DATE FROM timestamp) AS date,
  JSON_EXTRACT_SCALAR(resource.labels, '$.service_name') AS service,
  JSON_EXTRACT_SCALAR(resource.labels, '$.method') AS method,
  JSON_EXTRACT_SCALAR(resource.labels, '$.endpoint') AS endpoint,
  SUM(CAST(value AS INT64)) AS request_count
FROM
  `your-project.monitoring.metrics`
WHERE
  metric.type = 'serviceruntime.googleapis.com/api/request_count'
  AND EXTRACT(DATE FROM timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  date, service, method, endpoint;

Option B: Application Logs (Cloud Logging/CloudWatch)

-- Example: Count requests from application logs
-- (in BigQuery log-sink exports, jsonPayload is a nested RECORD,
--  so fields are addressed directly rather than via JSON functions)
CREATE OR REPLACE VIEW traffic_from_logs AS
SELECT
  EXTRACT(DATE FROM timestamp) AS date,
  jsonPayload.service AS service,
  jsonPayload.endpoint AS endpoint,
  COUNT(*) AS request_count
FROM
  `your-project.logs.cloudfunctions_googleapis_com_cloud_functions`
WHERE
  jsonPayload.message LIKE '%Request%'
  AND EXTRACT(DATE FROM timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  date, service, endpoint;

Option C: Load Balancer Logs

-- Example: Extract from Cloud Load Balancer logs
CREATE OR REPLACE VIEW traffic_from_lb AS
SELECT
  EXTRACT(DATE FROM timestamp) AS date,
  -- log-sink exports keep the LogEntry field names (camelCase)
  httpRequest.requestMethod AS method,
  httpRequest.requestUrl AS endpoint,
  COUNT(*) AS request_count
FROM
  `your-project.logs.requests`
WHERE
  EXTRACT(DATE FROM timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  date, method, endpoint;

If you don’t have traffic metrics yet:

  • Start with service-level aggregation (total requests per service)
  • Add endpoint-level granularity later
  • Use application logs as a fallback

Step 3: Build the Cost-to-Serve Query

Here’s the complete BigQuery query that joins costs with traffic:

-- Cost-to-Serve: Cost per request by service
WITH cost_by_service AS (
  SELECT
    service.description AS service,
    DATE(usage_start_time) AS date,
    SUM(cost) AS total_cost_eur,
    SUM(usage.amount) AS total_usage
  FROM
    `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
  WHERE
    project.id IN ('prod-a', 'prod-b')
    AND DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, date
),
traffic_by_service AS (
  SELECT
    service,
    date,
    SUM(request_count) AS total_requests
  FROM
    `your-project.analytics.traffic_metrics`  -- the traffic view/table from Step 2
  WHERE
    date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, date
)
SELECT
  COALESCE(c.service, t.service) AS service,
  COALESCE(c.date, t.date) AS date,
  c.total_cost_eur,
  t.total_requests,
  -- Cost per request
  c.total_cost_eur / NULLIF(t.total_requests, 0) AS eur_per_request,
  -- Cost per 1k requests (common metric)
  (c.total_cost_eur / NULLIF(t.total_requests, 0)) * 1000 AS eur_per_1k_requests,
  -- Cost per 1M requests
  (c.total_cost_eur / NULLIF(t.total_requests, 0)) * 1000000 AS eur_per_1M_requests
FROM
  cost_by_service c
FULL OUTER JOIN
  traffic_by_service t
USING (service, date)
ORDER BY
  date DESC, service;

Save this as a view:

CREATE OR REPLACE VIEW `your-project.analytics.cost_to_serve` AS
-- (paste the query above)
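
Once the view exists, a quick spot-check shows whether the join produces sensible numbers (using the latest available day, since billing data lags by a day or two):

-- Most expensive services per 1k requests on the latest day
SELECT
  service,
  total_requests,
  total_cost_eur,
  eur_per_1k_requests
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date = (SELECT MAX(date) FROM `your-project.analytics.cost_to_serve`)
ORDER BY
  eur_per_1k_requests DESC
LIMIT 10;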

Step 4: Add Endpoint-Level Granularity (Optional)

For more actionable insights, break down costs by endpoint:

-- Cost-to-Serve by endpoint (requires endpoint-level traffic data)
-- Allocation assumption: every request within a service costs the same,
-- so a service's cost is split across endpoints by request share.
WITH traffic_by_endpoint AS (
  SELECT
    service,
    endpoint,
    date,
    SUM(request_count) AS total_requests
  FROM
    `your-project.analytics.traffic_metrics`
  WHERE
    date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, endpoint, date
),
service_cost_per_request AS (
  -- Reuse the service-level view from Step 3
  SELECT
    service,
    date,
    eur_per_request
  FROM
    `your-project.analytics.cost_to_serve`
)
SELECT
  t.service,
  t.endpoint,
  t.date,
  t.total_requests,
  s.eur_per_request * t.total_requests AS estimated_endpoint_cost,
  s.eur_per_request AS cost_per_request
FROM
  traffic_by_endpoint t
JOIN
  service_cost_per_request s
USING (service, date)
ORDER BY
  estimated_endpoint_cost DESC;
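
As with the service-level query, save this as a view; the later examples in this guide reference it by this name:

CREATE OR REPLACE VIEW `your-project.analytics.cost_to_serve_by_endpoint` AS
-- (paste the endpoint query above)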

Step 5: Create Executive Dashboard

Use the CTS view to build dashboards:

Daily Cost per Service

SELECT
  service,
  date,
  eur_per_1k_requests,
  total_requests,
  total_cost_eur,
  -- Trend: compare to previous day
  LAG(eur_per_1k_requests) OVER (
    PARTITION BY service
    ORDER BY date
  ) AS prev_day_cost_per_1k,
  eur_per_1k_requests - LAG(eur_per_1k_requests) OVER (
    PARTITION BY service
    ORDER BY date
  ) AS cost_change
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY
  date DESC, total_cost_eur DESC;

Top Cost Drivers

SELECT
  service,
  SUM(total_cost_eur) AS total_cost_30d,
  SUM(total_requests) AS total_requests_30d,
  -- Cost efficiency: lower is better
  AVG(eur_per_1k_requests) AS avg_cost_per_1k
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  service
ORDER BY
  total_cost_30d DESC
LIMIT 10;

Cost Anomaly Detection

-- Flag services with sudden cost increases
WITH daily_avg AS (
  SELECT
    service,
    date,
    eur_per_1k_requests,
    AVG(eur_per_1k_requests) OVER (
      PARTITION BY service
      ORDER BY date
      ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
    ) AS rolling_7d_avg,
    STDDEV(eur_per_1k_requests) OVER (
      PARTITION BY service
      ORDER BY date
      ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
    ) AS rolling_7d_stddev
  FROM
    `your-project.analytics.cost_to_serve`
  WHERE
    date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
SELECT
  service,
  date,
  eur_per_1k_requests,
  rolling_7d_avg,
  'ANOMALY' AS status
FROM
  daily_avg
WHERE
  -- BigQuery does not allow SELECT aliases in WHERE, so the rule is
  -- applied directly: flag cost more than 2 standard deviations
  -- above the rolling average
  eur_per_1k_requests > rolling_7d_avg + 2 * rolling_7d_stddev
ORDER BY
  date DESC;

Step 6: Correlate with Business Metrics

Connect costs to business value:

-- Cost-to-Serve with revenue/user metrics
WITH cts AS (
  SELECT * FROM `your-project.analytics.cost_to_serve`
),
revenue AS (
  SELECT
    service,
    DATE(transaction_time) AS date,
    SUM(amount) AS revenue_eur,
    COUNT(DISTINCT user_id) AS active_users
  FROM
    `your-project.analytics.transactions`
  WHERE
    DATE(transaction_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service, date
)
SELECT
  c.service,
  c.date,
  c.eur_per_1k_requests AS cost_per_1k,
  r.revenue_eur / NULLIF(c.total_requests, 0) * 1000 AS revenue_per_1k,
  -- Profit margin per request
  (r.revenue_eur / NULLIF(c.total_requests, 0) * 1000) - c.eur_per_1k_requests AS profit_per_1k,
  -- ROI
  (r.revenue_eur / NULLIF(c.total_cost_eur, 0)) AS roi
FROM
  cts c
JOIN
  revenue r
USING (service, date)
ORDER BY
  profit_per_1k DESC;
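
If you save the query above as a view (the name `cts_with_revenue` below is hypothetical), flagging loss-making services becomes a one-liner, mirroring the recommendations example from the introduction:

-- Services whose infrastructure cost exceeds the revenue they generate
SELECT
  service,
  date,
  cost_per_1k,
  revenue_per_1k,
  roi
FROM
  `your-project.analytics.cts_with_revenue`
WHERE
  roi < 1.0
ORDER BY
  roi ASC;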

Common Use Cases

1. Track Weekly Cost Trends

-- Weekly cost trend
SELECT
  -- DATE_TRUNC keeps weeks distinct across year boundaries
  DATE_TRUNC(date, WEEK) AS week,
  service,
  AVG(eur_per_1k_requests) AS avg_cost_per_1k,
  SUM(total_cost_eur) AS weekly_cost
FROM
  `your-project.analytics.cost_to_serve`
WHERE
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
GROUP BY
  week, service
ORDER BY
  week DESC, weekly_cost DESC;

2. Flag High-Cost Endpoints

-- Endpoints costing more than €0.10 per request
SELECT
  service,
  endpoint,
  date,
  cost_per_request,
  total_requests,
  estimated_endpoint_cost
FROM
  `your-project.analytics.cost_to_serve_by_endpoint`
WHERE
  cost_per_request > 0.10
  AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
ORDER BY
  estimated_endpoint_cost DESC;

3. Forecast Future Costs

-- Simple linear forecast based on traffic growth
WITH historical AS (
  SELECT
    service,
    AVG(eur_per_1k_requests) AS avg_cost_per_1k,
    AVG(total_requests) AS avg_daily_requests,
    -- Traffic growth rate (simplified)
    (MAX(total_requests) - MIN(total_requests)) / COUNT(DISTINCT date) AS daily_growth
  FROM
    `your-project.analytics.cost_to_serve`
  WHERE
    date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY
    service
)
SELECT
  service,
  avg_cost_per_1k,
  avg_daily_requests,
  -- Forecast 30 days ahead
  avg_daily_requests + (daily_growth * 30) AS forecasted_daily_requests,
  (avg_daily_requests + (daily_growth * 30)) * avg_cost_per_1k / 1000 * 30 AS forecasted_monthly_cost
FROM
  historical;
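
If you want something sturdier than a linear extrapolation, BigQuery ML can fit a time-series model per service; a minimal sketch, assuming the Step 3 view exists:

-- Train one ARIMA_PLUS model per service on daily cost
CREATE OR REPLACE MODEL `your-project.analytics.cts_forecast_model`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'total_cost_eur',
  time_series_id_col = 'service'
) AS
SELECT service, date, total_cost_eur
FROM `your-project.analytics.cost_to_serve`
WHERE total_cost_eur IS NOT NULL;

-- Forecast the next 30 days per service
SELECT *
FROM ML.FORECAST(
  MODEL `your-project.analytics.cts_forecast_model`,
  STRUCT(30 AS horizon, 0.9 AS confidence_level)
);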

Troubleshooting

Problem: No traffic data

Solution: Start with service-level aggregation from billing export only:

SELECT
  service.description AS service,
  DATE(usage_start_time) AS date,
  SUM(cost) AS total_cost
FROM
  `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
GROUP BY
  service, date;

Problem: Costs don’t match billing

Solution:

  • Check billing export delay (usually 24-48 hours)
  • Verify project filters
  • Include all cost components (compute, storage, network, etc.); the reconciliation sketch below checks export totals against the invoice
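
A minimal reconciliation sketch: cost plus credits per invoice month should track the invoice total (credits is a repeated field in the export):

-- Compare export totals (cost + credits) to the monthly invoice
SELECT
  invoice.month AS invoice_month,
  ROUND(SUM(cost), 2) AS cost,
  ROUND(SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)), 2) AS credits,
  ROUND(SUM(cost) + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)), 2) AS total
FROM
  `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
GROUP BY
  invoice_month
ORDER BY
  invoice_month DESC;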

Problem: Missing service labels

Solution:

  • Use resource.name or resource.type as fallback
  • Add custom labels in application code (query sketch below)
  • Use Cloud Monitoring resource labels
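
Once labels exist, cost can be grouped by a custom label key straight from the billing export (the key `app` here is an assumed example):

-- Cost by a custom resource label (labels is a repeated key/value field)
SELECT
  (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = 'app') AS app,
  DATE(usage_start_time) AS date,
  SUM(cost) AS total_cost
FROM
  `your-project.billing_export.gcp_billing_export_v1_XXXXXX`
WHERE
  DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY
  app, date;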

Best Practices

  1. Start simple: Service-level CTS is enough to get started
  2. Iterate: Add endpoint-level granularity after service-level works
  3. Automate: Schedule daily queries to keep data fresh (see the sketch after this list)
  4. Alert: Set up alerts for cost anomalies (> 20% increase)
  5. Correlate: Connect costs to business metrics (revenue, users)
  6. Document: Keep a data dictionary of service/endpoint mappings
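
For item 3, one option is a BigQuery scheduled query that materializes the view daily; a sketch using the bq CLI (display name, schedule, and destination table are illustrative):

# Materialize the cost-to-serve view into a table every 24 hours
bq mk --transfer_config \
  --data_source=scheduled_query \
  --target_dataset=analytics \
  --display_name="Daily cost-to-serve snapshot" \
  --schedule="every 24 hours" \
  --params='{
    "query": "SELECT * FROM `your-project.analytics.cost_to_serve`",
    "destination_table_name_template": "cost_to_serve_daily",
    "write_disposition": "WRITE_TRUNCATE"
  }'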

Expected Outcomes

With cost-to-serve in place, you should be able to:

  • Track €/1k requests in executive dashboards
  • Correlate costs with p95 latency and error rates
  • Flag endpoints whose cost increases but value decreases
  • Prioritize optimization work based on cost impact
  • Forecast costs based on traffic growth
  • Price products based on actual infrastructure costs

Outcome: engineering work aligns with spend, and decisions get faster and calmer.


Next Steps

  • Week 1: Set up service-level CTS
  • Week 2: Add endpoint-level granularity
  • Week 3: Correlate with business metrics
  • Week 4: Build executive dashboards and alerts

Want help setting this up?
