It is essential to have a dashboard that shows how much data is sent within and outside of your infrastructure, because:

- Data sent between two different zones has a cost. E.g. regional GKE clusters.
- Data sent between two regions has a cost. E.g. GKE clusters communicating across regions.
- Data sent outside (to the internet) has a cost. E.g. API responses served at a high rate can incur a lot of egress cost.

See https://cloud.google.com/compute/network-pricing.
Solution
Export the VPC Flow Logs to BigQuery and visualize them in Grafana.
Enable VPC Flow Logs
Read and follow the guide here: https://cloud.google.com/vpc/docs/using-flow-logs#enabling_vpc_flow_logging.
Summary:
- VPC Flow Logs are enabled at the subnet level.
- VPC Flow Logs are sent to Stackdriver logs.
- Logs must be sampled to reduce cost. See https://cloud.google.com/stackdriver/pricing?#table. Start with a 90% sampling rate and a 15-minute aggregation interval.
If you are using terraform, this is what you may need:
```hcl
resource "google_compute_subnetwork" "my_subnet" {
  #
  # ... some code here ...
  #
  enable_flow_logs = true

  log_config {
    aggregation_interval = "INTERVAL_15_MIN"
    flow_sampling        = 0.9
    metadata             = "INCLUDE_ALL_METADATA"
  }
}
```
Go to Stackdriver logs and wait for the VPC Flow Logs to show up using this filter:

```
resource.type="gce_subnetwork"
logName="projects/<YOUR GOOGLE CLOUD PROJECT ID>/logs/compute.googleapis.com%2Fvpc_flows"
```
Export Logs to BigQuery
Create a new Logs Router sink and name it `vpc_flow_logs`.

Choose the option to automatically create a BigQuery dataset. If you follow the example, it will name the dataset `vpc_flow_logs`.

Use the same logs filter for the sink:

```
resource.type="gce_subnetwork"
logName="projects/<YOUR GOOGLE CLOUD PROJECT ID>/logs/compute.googleapis.com%2Fvpc_flows"
```
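If you prefer to manage the sink with terraform as well, here is a sketch using the `google_logging_project_sink` resource. It assumes the BigQuery dataset `vpc_flow_logs` already exists; the project ID is a placeholder, and the sink's writer identity still needs the BigQuery Data Editor role on the dataset.

```hcl
# Sketch: route VPC Flow Logs into an existing BigQuery dataset.
# Project ID and dataset name below are placeholders.
resource "google_logging_project_sink" "vpc_flow_logs" {
  name        = "vpc_flow_logs"
  destination = "bigquery.googleapis.com/projects/<YOUR GOOGLE CLOUD PROJECT ID>/datasets/vpc_flow_logs"

  filter = <<-EOT
    resource.type="gce_subnetwork"
    logName="projects/<YOUR GOOGLE CLOUD PROJECT ID>/logs/compute.googleapis.com%2Fvpc_flows"
  EOT

  # Creates a dedicated service account for the sink; grant it
  # write access (BigQuery Data Editor) on the dataset afterwards.
  unique_writer_identity = true
}
```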
Using BigQuery
Go to the BigQuery console and copy-paste the following example queries.
A little SQL knowledge is required. If the SQL queries are not working, please let me know in the comments section!
Query 1: How much internet bandwidth was transmitted from a VPC to a continent since 2020-04-01?
```sql
SELECT
  CONCAT(jsonPayload.src_vpc.vpc_name, " vpc to ", jsonPayload.dest_location.continent) AS vpc_dest,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE TIMESTAMP >= "2020-04-01 00:00:00 UTC"
  AND jsonPayload.dest_location.continent IS NOT NULL
  AND jsonPayload.dest_location.continent != ""
GROUP BY vpc_dest
ORDER BY bytes_sent DESC
```
Example result:

| vpc_dest | bytes_sent |
|---|---|
| foo vpc to Europe | 133674777320 |
| bar vpc to Asia | 3013921321 |
| baz vpc to America | 934811223 |
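Raw byte counts are hard to eyeball. A variant of Query 1 (a sketch using BigQuery standard SQL's `ROUND` and `POW` functions) reports the totals in GiB instead:

```sql
-- Same as Query 1, but reports GiB instead of raw bytes.
SELECT
  CONCAT(jsonPayload.src_vpc.vpc_name, " vpc to ", jsonPayload.dest_location.continent) AS vpc_dest,
  ROUND(SUM(CAST(jsonPayload.bytes_sent AS INT64)) / POW(2, 30), 2) AS gib_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE TIMESTAMP >= "2020-04-01 00:00:00 UTC"
  AND jsonPayload.dest_location.continent IS NOT NULL
  AND jsonPayload.dest_location.continent != ""
GROUP BY vpc_dest
ORDER BY gib_sent DESC
```

Multiplying `gib_sent` by your per-GB egress price from the network pricing page gives a rough cost estimate, though note that GCP bills per GB (10^9 bytes), not GiB.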
Query 2: How much internal infrastructure bandwidth was transmitted from one region or zone to another since 2020-04-01?
```sql
SELECT
  TIMESTAMP_TRUNC(TIMESTAMP, HOUR) AS timestamp,
  CONCAT(jsonPayload.src_vpc.vpc_name, " ", jsonPayload.src_instance.zone, " to ", jsonPayload.dest_vpc.vpc_name, " ", jsonPayload.dest_instance.zone) AS route,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE TIMESTAMP >= "2020-04-01 00:00:00 UTC"
  AND jsonPayload.src_instance.zone != ""
  AND jsonPayload.dest_instance.zone != ""
  AND jsonPayload.src_location IS NULL
  AND jsonPayload.dest_location IS NULL
GROUP BY timestamp, route
ORDER BY timestamp ASC
```
Example result:

| route | bytes_sent |
|---|---|
| foo europe-west1-b to foo europe-west1-a | 2132121 |
| bar us-west1-a to foo europe-west1-b | 765756 |
| baz us-west1-a to bar europe-west1-b | 634334 |
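To find out which instances are driving a costly route, the same data can be broken down per VM. This is a sketch that assumes the `src_instance.vm_name` and `dest_instance.vm_name` fields from the VPC flow record format:

```sql
-- Top 10 instance-to-instance flows by bytes sent since 2020-04-01.
SELECT
  CONCAT(jsonPayload.src_instance.vm_name, " (", jsonPayload.src_instance.zone, ") to ",
         jsonPayload.dest_instance.vm_name, " (", jsonPayload.dest_instance.zone, ")") AS vm_route,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE TIMESTAMP >= "2020-04-01 00:00:00 UTC"
  AND jsonPayload.src_instance.vm_name != ""
  AND jsonPayload.dest_instance.vm_name != ""
GROUP BY vm_route
ORDER BY bytes_sent DESC
LIMIT 10
```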
For a column-name reference, see the VPC flow record format and the Stackdriver log format documentation.
Using Grafana
Install and configure the BigQuery datasource plugin before proceeding.
Building the Graphs
Here, you need to change the example queries and make use of Grafana SQL macros.

Changes:

- `$__timeGroupAlias(TIMESTAMP, 1h)` is used to aggregate the `TIMESTAMP` column at 1-hour resolution.
- The column aliased as `metric` becomes the label that is shown or grouped within the Grafana graph.
- `$__timeFilter(TIMESTAMP)` applies the Grafana time range filter to the `TIMESTAMP` value.
- Numbers that represent the column order are used in `GROUP BY` and `ORDER BY` for simplification.
Build the Internet Egress graph
```sql
SELECT
  $__timeGroupAlias(TIMESTAMP, 1h),
  CONCAT(jsonPayload.src_vpc.vpc_name, " vpc to ", jsonPayload.dest_location.continent) AS metric,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE $__timeFilter(TIMESTAMP)
  AND jsonPayload.dest_location.continent IS NOT NULL
  AND jsonPayload.dest_location.continent != ""
GROUP BY 1, 2
ORDER BY 1, 2
```
Build the Inter VPC Egress graph
```sql
SELECT
  $__timeGroupAlias(TIMESTAMP, 1h),
  CONCAT(jsonPayload.src_vpc.vpc_name, " ", jsonPayload.src_instance.zone, " to ", jsonPayload.dest_vpc.vpc_name, " ", jsonPayload.dest_instance.zone) AS metric,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE $__timeFilter(TIMESTAMP)
  AND jsonPayload.src_instance.zone != ""
  AND jsonPayload.dest_instance.zone != ""
  AND jsonPayload.src_location IS NULL
  AND jsonPayload.dest_location IS NULL
GROUP BY 1, 2
ORDER BY 1, 2
```
The End
Ensure that you watch your Stackdriver logs ingestion rate to prevent unwanted cost!
Thanks for reading!
In my next post, I will write a guide on how to visualize GCP Billing data in Grafana.
References:
- Another good alternative to Grafana is Redash.
- https://cloud.google.com/blog/products/data-analytics/grafana-and-bigquery-together-at-last.