This page describes how to use a service load balancing policy to support advanced cost, latency, and resiliency optimizations for the following load balancers:
- Global external Application Load Balancer
- Cross-region internal Application Load Balancer
- Global external proxy Network Load Balancer
- Cross-region internal proxy Network Load Balancer
Cloud Service Mesh also supports advanced load balancing optimizations. For details, see Advanced load balancing overview in the Cloud Service Mesh documentation.
A service load balancing policy (serviceLbPolicy) is a resource associated with the load balancer's backend service. A service load balancing policy lets you customize the parameters that influence how traffic is distributed within the backends associated with a backend service:
- Customize the load balancing algorithm used to determine how traffic is distributed within a particular region or a zone.
- Enable auto-capacity draining so that the load balancer can quickly drain traffic from unhealthy backends.
- Set a failover threshold to determine when a backend is considered unhealthy. This lets traffic fail over to a different backend to avoid unhealthy backends.
Additionally, you can designate specific backends as preferred backends. These backends must be used to capacity before requests are sent to the remaining backends.
The following diagram shows how Cloud Load Balancing evaluates routing, load balancing, and traffic distribution.
Before you begin
Before reviewing the contents of this page, carefully review the Request distribution process described on the External Application Load Balancer overview page. For load balancers that are always Premium Tier, all the load balancing algorithms described on this page support spilling over between regions if a first-choice region is already full.
Supported backends
Service load balancing policies and all of the features described on this page require compatible backends that support a balancing mode. Supported backends are summarized in the following table:
Backend | Supported? |
---|---|
Instance groups | Zonal unmanaged and zonal managed instance groups are supported, but regional managed instance groups are not. |
Zonal NEGs (GCE_VM_IP_PORT endpoints) | Yes |
Zonal NEGs (GCE_VM_IP endpoints) | No. These types of NEGs are not supported by Application Load Balancers and Proxy Network Load Balancers. |
Hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints) | Yes |
Serverless NEGs | No |
Internet NEGs | No |
Private Service Connect NEGs | No |
Load balancing algorithms
This section describes the load balancing algorithms that you can configure in a service load balancing policy. If you don't configure an algorithm, or if you don't configure a service load balancing policy at all, the load balancer uses WATERFALL_BY_REGION by default.
Waterfall by region
WATERFALL_BY_REGION is the default load balancing algorithm. With this algorithm, in aggregate, all the Google Front Ends (GFEs) in the region closest to the user attempt to fill backends in proportion to their configured target capacities (modified by their capacity scalers).
Each individual second-layer GFE prefers to select backend instances or endpoints in a zone that's as close as possible (defined by network round-trip time) to the second-layer GFE. Because WATERFALL_BY_REGION minimizes latency between zones, at low request rates, each second-layer GFE might exclusively send requests to backends in the second-layer GFE's preferred zone.
If all the backends in the closest region are running at their configured capacity limit, traffic then starts to overflow to the next closest region while optimizing network latency.
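The regional fill-and-overflow behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the load balancer's implementation: it treats each region as one pool (ignoring each second-layer GFE's zone preference), uses hypothetical backend names and capacity units, and leaves any load beyond the capacity of every listed region unassigned.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend: an instance group or NEG in one region."""
    name: str
    region: str
    target_capacity: float        # illustrative units, e.g. requests per second
    capacity_scaler: float = 1.0  # multiplier applied to target_capacity

    @property
    def effective_capacity(self) -> float:
        return self.target_capacity * self.capacity_scaler


def waterfall_by_region(load: float, backends: list[Backend],
                        region_order: list[str]) -> dict[str, float]:
    """Fill the closest region first, overflowing to the next closest region.

    Within a region, load is split in proportion to effective capacity.
    Load beyond the capacity of every listed region is left unassigned.
    """
    assignment = {b.name: 0.0 for b in backends}
    remaining = load
    for region in region_order:  # ordered closest to farthest from the client
        if remaining <= 0:
            break
        members = [b for b in backends if b.region == region]
        region_capacity = sum(b.effective_capacity for b in members)
        if region_capacity <= 0:
            continue  # nothing to fill here; try the next region
        served = min(remaining, region_capacity)
        for b in members:
            assignment[b.name] = served * b.effective_capacity / region_capacity
        remaining -= served
    return assignment


if __name__ == "__main__":
    backends = [
        Backend("us-a", "us-central1", 100),
        Backend("us-b", "us-central1", 100, capacity_scaler=0.5),
        Backend("eu-a", "europe-west1", 200),
    ]
    order = ["us-central1", "europe-west1"]
    print(waterfall_by_region(120, backends, order))
    print(waterfall_by_region(250, backends, order))
```

With 120 units of load, the closer region absorbs everything (us-a gets 80 and us-b gets 40, because its capacity scaler halves its share). At 250 units, us-central1 is filled to its 150-unit capacity and the remaining 100 overflows to europe-west1.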
Spray to region
The SPRAY_TO_REGION algorithm modifies the individual behavior of each second-layer GFE to the extent that each second-layer GFE has no preference for selecting backend instances or endpoints that are in a zone as close as possible to the second-layer GFE. With SPRAY_TO_REGION, each second-layer GFE sends requests to all backend instances or endpoints, in all zones of the region, without preference for a shorter round-trip time between the second-layer GFE and the backend instances or endpoints.
Like WATERFALL_BY_REGION, in aggregate, all second-layer GFEs in the region fill backends in proportion to their configured target capacities (modified by their capacity scalers).
While SPRAY_TO_REGION provides more uniform distribution among backends in all zones of a region, especially at low request rates, this uniform distribution comes with the following considerations:
- When backends go down (but continue to pass their health checks), more second-layer GFEs are affected, though individual impact is less severe.
- Because each second-layer GFE has no preference for one zone over another, the second-layer GFEs create more cross-zone traffic. Depending on the number of requests being processed, each second-layer GFE might create more TCP connections to the backends as well.
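To make the cross-zone consideration concrete, the following sketch models one second-layer GFE using SPRAY_TO_REGION: its requests are split across every zone in proportion to capacity, with round-trip time ignored, so most of them leave the GFE's own zone. The zone names, capacities, and the simple proportional split are illustrative assumptions.

```python
def spray_to_region(requests: float, zone_capacity: dict[str, float]) -> dict[str, float]:
    """Split one GFE's requests across all zones in proportion to capacity.

    Round-trip time is deliberately not an input: the GFE has no zone preference.
    """
    total = sum(zone_capacity.values())
    if total <= 0:
        raise ValueError("region has no capacity")
    return {zone: requests * cap / total for zone, cap in zone_capacity.items()}


def cross_zone_fraction(gfe_zone: str, split: dict[str, float]) -> float:
    """Fraction of the GFE's requests sent to zones other than its own."""
    sent = sum(split.values())
    return (sent - split.get(gfe_zone, 0.0)) / sent if sent else 0.0


if __name__ == "__main__":
    capacity = {"us-central1-a": 100.0, "us-central1-b": 100.0, "us-central1-c": 100.0}
    split = spray_to_region(30.0, capacity)
    print(split)
    print(f"cross-zone share: {cross_zone_fraction('us-central1-a', split):.2f}")
```

With three equally sized zones, each zone receives 10 of the 30 requests, so about two thirds of this GFE's requests cross zones even at a low request rate.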
Waterfall by zone
The WATERFALL_BY_ZONE algorithm modifies the individual behavior of each second-layer GFE to the extent that each second-layer GFE has a very strong preference to select backend instances or endpoints that are in the closest-possible zone to the second-layer GFE. With WATERFALL_BY_ZONE, each second-layer GFE only sends requests to backend instances or endpoints in other zones of the region when the second-layer GFE has filled (or proportionally overfilled) backend instances or endpoints in its most favored zone.
Like WATERFALL_BY_REGION, in aggregate, all second-layer GFEs in the region fill backends in proportion to their configured target capacities (modified by their capacity scalers).
The WATERFALL_BY_ZONE algorithm minimizes latency with the following considerations:
- WATERFALL_BY_ZONE does not inherently minimize cross-zone connections. The algorithm is steered by latency only.
- WATERFALL_BY_ZONE does not guarantee that each second-layer GFE always fills its most favored zone before filling other zones. Maintenance events can temporarily cause all traffic from a second-layer GFE to be sent to backend instances or endpoints in another zone.
- WATERFALL_BY_ZONE can result in less uniform distribution of requests among all backend instances or endpoints within the region as a whole. For example, backend instances or endpoints in the second-layer GFE's most favored zone might be filled to capacity while backends in other zones are not filled to capacity.
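The per-GFE behavior of WATERFALL_BY_ZONE can be sketched the same way: the GFE fills zones strictly in order of round-trip time and moves on only when a closer zone is full. In this sketch, zone_capacity is a hypothetical share of each zone's capacity available to this one GFE, the RTT values are made up, and neither proportional overfilling nor maintenance events are modeled.

```python
def waterfall_by_zone(requests: float, zone_capacity: dict[str, float],
                      zone_rtt_ms: dict[str, float]) -> dict[str, float]:
    """Fill zones in order of round-trip time from this GFE.

    zone_capacity is the share of each zone's capacity available to this GFE
    (an assumption of the sketch). Requests above the total are left unassigned.
    """
    split = {zone: 0.0 for zone in zone_capacity}
    remaining = requests
    for zone in sorted(zone_capacity, key=lambda z: zone_rtt_ms[z]):
        if remaining <= 0:
            break
        take = min(remaining, zone_capacity[zone])
        split[zone] = take
        remaining -= take
    return split


if __name__ == "__main__":
    capacity = {"us-central1-a": 20.0, "us-central1-b": 20.0, "us-central1-c": 20.0}
    rtt = {"us-central1-a": 0.5, "us-central1-b": 1.2, "us-central1-c": 1.0}
    print(waterfall_by_zone(30.0, capacity, rtt))
```

Here the GFE fills us-central1-a, sends the remaining 10 requests to us-central1-c, and sends nothing to us-central1-b even though it has spare capacity, which is the less uniform distribution described above.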
Compare load balancing algorithms
The following table compares the different load balancing algorithms.
Behavior | Waterfall by region | Spray to region | Waterfall by zone |
---|---|---|---|
Uniform capacity usage within a single region | Yes | Yes | No |
Uniform capacity usage across multiple regions | No | No | No |
Uniform traffic split from load balancer | No | Yes | No |
Cross-zone traffic distribution | Yes. Traffic is distributed evenly across zones in a region while optimizing network latency. Traffic might be sent across zones if needed. | Yes | Yes. Traffic first goes to the nearest zone until it is at capacity. Then, it goes to the next closest zone. |
Sensitivity to traffic spikes in a local zone | Average; depends on how much traffic has already been shifted to balance across zones. | Lower; single zone spikes are spread across all zones in the region. | Higher; single zone spikes are likely to be served entirely by the same zone until the load balancer is able to react. |
Auto-capacity draining and undraining
Auto-capacity draining and undraining combine the concepts of health checks and backend capacity. With auto-capacity draining, health checks are used as an additional signal to set effective backend capacity to zero. With auto-capacity undraining, health checks are used as an additional signal to restore the effective backend capacity to its previous value.
Without auto-capacity draining and undraining, if you want to direct requests away from all backends in a particular region, you must manually set the effective capacity of each backend in that region to zero. For example, you can use the capacity scaler to do this.
With auto-capacity draining and undraining, health checks can be used as a signal to adjust the capacity of a backend, either by draining or undraining.
To enable auto-capacity draining and undraining, see Configure a service load balancing policy.
Auto-capacity draining
Auto-capacity draining sets the capacity of a backend to zero when both of the following conditions are true:
- Fewer than 25% of the backend's instances or endpoints pass health checks.
- The total number of backend instance groups or NEGs that are to be drained automatically doesn't exceed 50% of the total backend instance groups or NEGs. When calculating the 50% ratio, backends with zero capacity are not included in the numerator. However, all backends are included in the denominator.
Backends with zero capacity are the following:
- Backend instance groups with no member instances, where the instance group capacity is defined on a per instance basis
- Backend NEGs with no member endpoints, where the NEG capacity is defined on a per endpoint basis
- Backend instance groups or NEGs with capacity scalers set to zero
Automatically drained backend capacity is functionally equivalent to manually setting a backend's backendService.backends[].capacityScaler to 0, but without setting the capacity scaler value.
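The two draining conditions can be expressed as a small decision function. This is a sketch that assumes a literal reading of the 50% rule (if the backends that qualify for draining would exceed half of all backends, none are drained); this page doesn't describe how the load balancer chooses among candidates in that case. The BackendHealth type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Hypothetical health snapshot of one instance group or NEG."""
    name: str
    healthy: int                      # instances/endpoints passing health checks
    total: int                        # instances/endpoints in the backend
    capacity_scaler: float = 1.0
    per_member_capacity: bool = True  # capacity defined per instance/endpoint

    def is_zero_capacity(self) -> bool:
        # The zero-capacity cases listed above: no members (with per-member
        # capacity) or a capacity scaler of zero.
        return self.capacity_scaler == 0 or (self.per_member_capacity and self.total == 0)

    def healthy_fraction(self) -> float:
        return self.healthy / self.total if self.total else 0.0


def auto_drain(backends: list[BackendHealth]) -> set[str]:
    """Return the names of backends whose effective capacity drops to zero."""
    # Numerator: backends that would be drained, excluding zero-capacity ones.
    candidates = {
        b.name for b in backends
        if not b.is_zero_capacity() and b.healthy_fraction() < 0.25
    }
    # Denominator: every backend, including zero-capacity ones.
    if len(candidates) > 0.5 * len(backends):
        return set()
    return candidates


def effective_capacity_scaler(b: BackendHealth, drained: set[str]) -> float:
    # Draining acts like capacityScaler = 0 without changing the configured value.
    return 0.0 if b.name in drained else b.capacity_scaler
```

For example, with ig-a at 1 of 10 healthy, two healthy instance groups, and an empty NEG, only ig-a qualifies; the empty NEG counts toward the denominator but not the numerator, so one drained backend out of four stays within the 50% limit.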
Auto-capacity undraining
Auto-capacity undraining returns the capacity of a backend to the value controlled by the backend's capacity scaler when 35% or more of the backend instances or endpoints pass health checks for at least 60 seconds. The 60-second requirement reduces the chances of sequential draining and undraining when health checks fail and pass in rapid succession.
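Undraining adds hysteresis: the healthy fraction must stay at or above 35% for 60 seconds. The following sketch tracks that from periodic health snapshots; the snapshot-sampling model and the UndrainTracker name are assumptions for illustration, and any dip below 35% restarts the 60-second clock.

```python
from typing import Optional


class UndrainTracker:
    """Tracks one drained backend and reports when it may be undrained."""

    THRESHOLD = 0.35     # 35% or more of instances/endpoints healthy...
    HOLD_SECONDS = 60.0  # ...for at least 60 seconds

    def __init__(self) -> None:
        self._healthy_since: Optional[float] = None

    def observe(self, now: float, healthy: int, total: int) -> bool:
        """Record a health snapshot at time `now` (seconds); True means undrain."""
        fraction = healthy / total if total else 0.0
        if fraction < self.THRESHOLD:
            self._healthy_since = None  # streak broken: restart the clock
            return False
        if self._healthy_since is None:
            self._healthy_since = now
        return now - self._healthy_since >= self.HOLD_SECONDS


if __name__ == "__main__":
    tracker = UndrainTracker()
    snapshots = [(0, 4, 10), (30, 4, 10), (45, 2, 10), (50, 5, 10), (110, 5, 10)]
    for t, healthy, total in snapshots:
        print(t, tracker.observe(t, healthy, total))
```

In this run, undraining is allowed only at t=110: the dip to 20% healthy at t=45 restarted the clock, so the 60 seconds are counted from t=50.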
Failover threshold
The load balancer determines the distribution of traffic among backends in a multi-level fashion. In the steady state, it sends traffic to backends that are selected based on one of the previously described load balancing algorithms. These backends, called primary backends, are considered optimal in terms of latency and capacity.
The load balancer also keeps track of other backends that can be used if the primary backends become unhealthy and are unable to handle traffic. These backends are called failover backends. These backends are typically nearby backends with remaining capacity.
If instances or endpoints in the primary backend become unhealthy, the load balancer doesn't shift traffic to other backends immediately. Instead, the load balancer first shifts traffic to other healthy instances or endpoints in the same backend to help stabilize traffic load. If too many endpoints in a primary backend are unhealthy, and the remaining endpoints in the same backend are not able to handle the extra traffic, the load balancer uses the failover threshold to determine when to start sending traffic to a failover backend. The load balancer tolerates unhealthiness in the primary backend up to the failover threshold. After that, traffic is shifted away from the primary backend.
The failover threshold is a value between 1 and 99, expressed as a percentage of endpoints in a backend that must be healthy. If the percentage of healthy endpoints falls below the failover threshold, the load balancer tries to send traffic to a failover backend. By default, the failover threshold is 70.
If the failover threshold is set too high, unnecessary traffic spills can occur due to transient health changes. If the failover threshold is set too low, the load balancer continues to send traffic to the primary backends even though there are a lot of unhealthy endpoints.
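The failover rule reduces to a percentage comparison. This sketch, with hypothetical function and parameter names, uses the default threshold of 70 and shows how a high or low threshold changes the outcome for a given health picture. It models a single GFE's local view, because each GFE makes failover decisions independently.

```python
def should_fail_over(healthy: int, total: int, failover_threshold: int = 70) -> bool:
    """Return True when traffic should start moving to a failover backend.

    failover_threshold is the percentage of endpoints that must be healthy
    (1 to 99, default 70). Falling below it triggers failover.
    """
    if not 1 <= failover_threshold <= 99:
        raise ValueError("failover threshold must be between 1 and 99")
    if total <= 0:
        raise ValueError("backend has no endpoints")
    healthy_pct = 100.0 * healthy / total
    return healthy_pct < failover_threshold


if __name__ == "__main__":
    # Default threshold: 6 of 10 healthy (60%) is below 70, so fail over.
    print(should_fail_over(6, 10))
    # High threshold: a brief dip to 9 of 10 healthy already spills traffic.
    print(should_fail_over(9, 10, failover_threshold=95))
    # Low threshold: 4 of 10 healthy still keeps traffic on the primary backend.
    print(should_fail_over(4, 10, failover_threshold=30))
```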
Failover decisions are localized. Each local Google Front End (GFE) behaves independently of the others. It is your responsibility to make sure that your failover backends can handle the additional traffic.
Failover traffic can result in overloaded backends. Even if a backend is unhealthy, the load balancer might still send traffic there. To exclude unhealthy backends from the pool of available backends, enable the auto-capacity drain feature.
Preferred backends
Preferred backends are backends whose capacity you want to completely use before spilling traffic over to other backends. Any traffic over the configured capacity of preferred backends is routed to the remaining non-preferred backends. The load balancing algorithm then distributes traffic between the non-preferred backends of a backend service.
You can configure your load balancer to prefer and completely use one or more backends attached to a backend service before routing subsequent requests to the remaining backends.
Consider the following limitations when you use preferred backends:
- The backends configured as preferred backends might be further away from the clients and result in higher average latency for client requests. This happens even if there are other closer backends which could have served the clients with lower latency.
- The load balancing algorithms (WATERFALL_BY_REGION, SPRAY_TO_REGION, and WATERFALL_BY_ZONE) don't apply to backends configured as preferred backends.
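Preferred-backend spillover can be sketched as a two-stage fill: preferred backends are used to capacity first (lowest round-trip time first), and only the overflow reaches DEFAULT backends. In this sketch, a proportional split stands in for whichever load balancing algorithm is configured, and the backend names, capacities, and RTTs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateBackend:
    """Hypothetical backend with a capacity and a preference level."""
    name: str
    capacity: float
    preferred: bool
    rtt_ms: float


def assign_with_preferences(load: float, backends: list[CandidateBackend]) -> dict[str, float]:
    split = {b.name: 0.0 for b in backends}
    remaining = load
    # Stage 1: use preferred backends to capacity, lowest round-trip time first.
    for pb in sorted((b for b in backends if b.preferred), key=lambda b: b.rtt_ms):
        take = min(remaining, pb.capacity)
        split[pb.name] = take
        remaining -= take
    # Stage 2: overflow goes to DEFAULT backends. A proportional split stands in
    # for the configured algorithm; it can exceed their capacity if load is high.
    defaults = [b for b in backends if not b.preferred]
    default_capacity = sum(b.capacity for b in defaults)
    if remaining > 0 and default_capacity > 0:
        for b in defaults:
            split[b.name] = remaining * b.capacity / default_capacity
    return split


if __name__ == "__main__":
    backends = [
        CandidateBackend("far-preferred", 50.0, preferred=True, rtt_ms=30.0),
        CandidateBackend("near-default-1", 100.0, preferred=False, rtt_ms=5.0),
        CandidateBackend("near-default-2", 100.0, preferred=False, rtt_ms=8.0),
    ]
    print(assign_with_preferences(40.0, backends))
    print(assign_with_preferences(150.0, backends))
```

At 40 units, all traffic goes to the distant preferred backend while the closer DEFAULT backends stay idle, which is the latency trade-off listed above. At 150 units, the preferred backend is full at 50 and the remaining 100 is split between the DEFAULT backends.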
To learn how to set preferred backends, see Set preferred backends.
Configure a service load balancing policy
The service load balancing policy resource lets you configure the following fields:
- Load balancing algorithm
- Auto-capacity draining
- Failover threshold
To set a preferred backend, see Set preferred backends.
Create a policy
To create and configure a service load balancing policy, complete the following steps:
Create a service load balancing policy resource. You can do this either by using a YAML file or directly, by using gcloud parameters.
With a YAML file. You specify service load balancing policies in a YAML file. Here is a sample YAML file that shows you how to configure a load balancing algorithm, enable auto-capacity draining, and set a custom failover threshold:
name: projects/PROJECT_ID/locations/global/serviceLbPolicies/SERVICE_LB_POLICY_NAME
autoCapacityDrain:
    enable: True
failoverConfig:
    failoverHealthThreshold: FAILOVER_THRESHOLD_VALUE
loadBalancingAlgorithm: LOAD_BALANCING_ALGORITHM
Replace the following:
- PROJECT_ID: the project ID.
- SERVICE_LB_POLICY_NAME: the name of the service load balancing policy.
- FAILOVER_THRESHOLD_VALUE: the failover threshold value. This should be a number between 1 and 99.
- LOAD_BALANCING_ALGORITHM: the load balancing algorithm to be used. This can be SPRAY_TO_REGION, WATERFALL_BY_REGION, or WATERFALL_BY_ZONE.
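If you generate policy files from a script, a small helper can validate the placeholder values before you run the import command. This sketch assumes only what the sample above shows (the field names, the three algorithm values, and the 1 to 99 threshold range); render_policy_yaml and the project and policy names are hypothetical, and it builds plain text rather than depending on a YAML library.

```python
VALID_ALGORITHMS = {"SPRAY_TO_REGION", "WATERFALL_BY_REGION", "WATERFALL_BY_ZONE"}


def render_policy_yaml(project_id: str, policy_name: str, algorithm: str,
                       failover_threshold: int, auto_capacity_drain: bool = True) -> str:
    """Build policy file text in the same shape as the sample above."""
    if algorithm not in VALID_ALGORITHMS:
        raise ValueError(f"unknown load balancing algorithm: {algorithm}")
    if not 1 <= failover_threshold <= 99:
        raise ValueError("failover threshold must be between 1 and 99")
    return (
        f"name: projects/{project_id}/locations/global/serviceLbPolicies/{policy_name}\n"
        "autoCapacityDrain:\n"
        f"    enable: {auto_capacity_drain}\n"
        "failoverConfig:\n"
        f"    failoverHealthThreshold: {failover_threshold}\n"
        f"loadBalancingAlgorithm: {algorithm}\n"
    )


if __name__ == "__main__":
    print(render_policy_yaml("my-project", "my-policy", "WATERFALL_BY_ZONE", 70))
```

Save the output to the file that you pass to the import command with --source.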
After you create the YAML file, import the file to a new service load balancing policy.
gcloud network-services service-lb-policies import SERVICE_LB_POLICY_NAME \
    --source=PATH_TO_POLICY_FILE \
    --location=global
Replace PATH_TO_POLICY_FILE with the path to the YAML file that you created.
Without a YAML file. Alternatively, you can configure service load balancing policy features without using a YAML file.
To set the load balancing algorithm, enable auto-capacity draining, and set the failover threshold, use the following parameters:
gcloud network-services service-lb-policies create SERVICE_LB_POLICY_NAME \
    --load-balancing-algorithm=LOAD_BALANCING_ALGORITHM \
    --auto-capacity-drain \
    --failover-health-threshold=FAILOVER_THRESHOLD_VALUE \
    --location=global
Replace the following:
- SERVICE_LB_POLICY_NAME: the name of the service load balancing policy.
- LOAD_BALANCING_ALGORITHM: the load balancing algorithm to be used. This can be SPRAY_TO_REGION, WATERFALL_BY_REGION, or WATERFALL_BY_ZONE.
- FAILOVER_THRESHOLD_VALUE: the failover threshold value. This should be a number between 1 and 99.
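When the create command is assembled by a script, building it as an argument list avoids shell-quoting mistakes. This sketch uses only the flags shown in the command above; build_create_command is a hypothetical helper that leaves off --auto-capacity-drain when draining isn't wanted. It prints the command rather than running it.

```python
import shlex

VALID_ALGORITHMS = {"SPRAY_TO_REGION", "WATERFALL_BY_REGION", "WATERFALL_BY_ZONE"}


def build_create_command(policy_name: str, algorithm: str, failover_threshold: int,
                         auto_capacity_drain: bool = True) -> list[str]:
    """Return the gcloud create command as an argument list."""
    if algorithm not in VALID_ALGORITHMS:
        raise ValueError(f"unknown load balancing algorithm: {algorithm}")
    if not 1 <= failover_threshold <= 99:
        raise ValueError("failover threshold must be between 1 and 99")
    args = [
        "gcloud", "network-services", "service-lb-policies", "create", policy_name,
        f"--load-balancing-algorithm={algorithm}",
    ]
    if auto_capacity_drain:
        args.append("--auto-capacity-drain")
    args += [f"--failover-health-threshold={failover_threshold}", "--location=global"]
    return args


if __name__ == "__main__":
    cmd = build_create_command("my-policy", "WATERFALL_BY_ZONE", 70)
    print(shlex.join(cmd))
    # To run it instead: subprocess.run(cmd, check=True)
```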
Update a backend service so that its --service-lb-policy field references the newly created service load balancing policy resource. A backend service can only be associated with one service load balancing policy resource.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --service-lb-policy=SERVICE_LB_POLICY_NAME \
    --global
You can associate a service load balancing policy with a backend service while creating the backend service.
gcloud compute backend-services create BACKEND_SERVICE_NAME \
    --protocol=PROTOCOL \
    --port-name=NAMED_PORT_NAME \
    --health-checks=HEALTH_CHECK_NAME \
    --load-balancing-scheme=LOAD_BALANCING_SCHEME \
    --service-lb-policy=SERVICE_LB_POLICY_NAME \
    --global
Remove a policy
To remove a service load balancing policy from a backend service, use the following command:
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --no-service-lb-policy \
    --global
Set preferred backends
You can configure preferred backends by using either the Google Cloud CLI or the API.
gcloud
Add a preferred backend
To set a preferred backend, use the gcloud compute backend-services add-backend command to set the --preference flag when you're adding the backend to the backend service.
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    ... \
    --preference=PREFERENCE \
    --global
Replace PREFERENCE with the level of preference you want to assign to the backend. This can be either PREFERRED or DEFAULT.
The rest of the command depends on the type of backend you're using (instance group or NEG). For all the required parameters, see the gcloud compute backend-services add-backend command.
Update a backend's preference
To update a backend's --preference parameter, use the gcloud compute backend-services update-backend command.
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    ... \
    --preference=PREFERENCE \
    --global
The rest of the command depends on the type of backend you're using (instance group or NEG). The following example command updates a backend instance group's preference and sets it to PREFERRED:
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    --instance-group=INSTANCE_GROUP_NAME \
    --instance-group-zone=INSTANCE_GROUP_ZONE \
    --preference=PREFERRED \
    --global
API
To set a preferred backend, set the preference field on each backend by using the global backendServices resource.
Here is a sample that shows you how to configure the backend preference:
name: projects/PROJECT_ID/locations/global/backendServices/BACKEND_SERVICE_NAME
...
backends:
- group: BACKEND_1_NAME
  preference: PREFERRED
  ...
- group: BACKEND_2_NAME
  preference: DEFAULT
  ...
Replace the following:
- PROJECT_ID: the project ID
- BACKEND_SERVICE_NAME: the name of the backend service
- BACKEND_1_NAME: the instance group or NEG to use as the preferred backend
- BACKEND_2_NAME: the instance group or NEG to use as the default backend
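If you edit the backend service resource programmatically before sending it back to the API, the preference change is a small transformation of the backends list. This sketch assumes that the resource is already available as a Python dict and that each backend entry identifies its instance group or NEG in a group field, as in the Compute Engine backendServices resource; set_backend_preferences and the sample resource paths are hypothetical, and fetching or updating the resource is out of scope.

```python
import copy


def set_backend_preferences(backend_service: dict, preferred_groups: set[str]) -> dict:
    """Return a copy of backend_service with each backend's preference set.

    Backends whose group is in preferred_groups become PREFERRED; all others
    are set to DEFAULT. The input dict is not modified.
    """
    updated = copy.deepcopy(backend_service)
    for backend in updated.get("backends", []):
        in_preferred = backend.get("group") in preferred_groups
        backend["preference"] = "PREFERRED" if in_preferred else "DEFAULT"
    return updated


if __name__ == "__main__":
    service = {
        "name": "my-backend-service",
        "backends": [
            {"group": "projects/my-project/zones/us-central1-a/instanceGroups/ig-a"},
            {"group": "projects/my-project/zones/us-central1-b/instanceGroups/ig-b"},
        ],
    }
    updated = set_backend_preferences(service, {service["backends"][0]["group"]})
    for backend in updated["backends"]:
        print(backend["group"], backend["preference"])
```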
Troubleshooting
Traffic distribution patterns can change when you attach a new service load balancing policy to a backend service.
To debug traffic issues, use Cloud Monitoring to look at how traffic flows between the load balancer and the backend. Cloud Load Balancing logs and metrics can also help you understand load balancing behavior.
This section summarizes a few common scenarios that you might encounter after you configure a service load balancing policy or preferred backends.
Traffic from a single source is sent to too many distinct backends
This is the intended behavior of the SPRAY_TO_REGION algorithm. However, you might experience issues caused by wider distribution of your traffic. For example, cache hit rates might decrease because backends see traffic from a wider selection of clients. In this case, consider using other algorithms like WATERFALL_BY_REGION.
Traffic is not being sent to backends with lots of unhealthy endpoints
This is the intended behavior when autoCapacityDrain is enabled. Backends with a lot of unhealthy endpoints are drained and removed from the load balancing pool. If you don't want this behavior, you can disable auto-capacity draining. However, this means that traffic can be sent to backends with a lot of unhealthy endpoints and requests can fail.
Traffic is being sent to more distant backends before closer ones
This is the intended behavior if your preferred backends are further away than your default backends. If you don't want this behavior, update the preference settings for each backend accordingly.
Traffic is not being sent to some backends when using preferred backends
This is the intended behavior when your preferred backends have not yet reached capacity. The preferred backends are assigned first based on round-trip time latency to these backends.
If you want traffic sent to other backends, you can do one of the following:
- Update preference settings for the other backends.
- Set a lower target capacity setting for your preferred backends. The target capacity is configured by using either the max-rate or the max-utilization fields, depending on the backend's balancing mode.
Traffic is being sent to a remote backend during transient health changes
This is the intended behavior when the failover threshold is set to a high value. If you want traffic to keep going to the primary backends when there are transient health changes, set this field to a lower value.
Healthy endpoints are overloaded when other endpoints are unhealthy
This is the intended behavior when the failover threshold is set to a low value. When endpoints are unhealthy, the traffic intended for these unhealthy endpoints is instead spread among the remaining endpoints in the same backend. If you want the failover behavior to be triggered sooner, set this field to a higher value.
Limitations
- Each backend service can only be associated with a single service load balancing policy resource.