Google Kubernetes Engine (GKE)
GKE is Google’s managed version of Kubernetes. There are a lot of container orchestration engines, but K8s is the most popular.
- GKE sits between IaaS and PaaS
- GKE manages and maintains logging, health management, and monitoring of clusters
- Easy to update the Kubernetes clusters and you can choose which release channel you’d like based on update frequency
- Create regional GKE clusters to improve the availability and resilience of your apps. This will distribute GKE control plane components, nodes, and pods across zones in a region.
Container Review
- Popularized by Docker initially → write once, run (almost) anywhere
- Single package of everything needed to run an app including dependencies (other than external dependencies like a database)
- Supports consistency across dev/test/prod environments
- Loose coupling between application and OS layers
- Simpler to migrate between on-prem and cloud (including other clouds)
- Supports agile development and operations
- Great for microservices
- Artifact Registry (replacing Container Registry and its gcr.io/project-name repos) hosts Docker containers in a private repo. It supports both container images and non-container artifacts.
General Kubernetes (K8s) Notes
Helpful Supplemental Materials: Best Practices, Cluster Autoscaler, Networking Overview
- Containers run inside of Pods. A pod is a K8s object and the smallest deployable unit in GKE; a container will always be inside a pod. Many pods contain only a single container, but some include sidecars such as a proxy.
- A pod is a logical application-centric unit for hosting containers
- The containers inside of a pod should be tightly-coupled
- You should be creating objects declaratively using specification files in YAML
- Writing declaratively is opposed to imperative models like using Ansible or Chef, where you describe each step
- A Deployment is a declarative, desired state for Pods and/or ReplicaSets. This is the preferred object for deploying compute workloads in K8s.
- Logic for updating, rolling back, and scaling deployments
- Proportional scaling and error checking for rollouts
- Deployments allow you to perform rolling updates, creating a ReplicaSet of Pods with the new container image. Old Pods are not removed until a sufficient number of new Pods are Running, and new Pods are not created until a sufficient number of old Pods have been removed.
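As a sketch, a minimal Deployment manifest looks like this (names, labels, and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # desired state: three identical pods
  selector:
    matchLabels:
      app: web               # must match the pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25    # updating this image triggers a rolling update
```

Applying a new image tag to this spec is what drives the rolling-update behavior described above.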
- A ReplicaSet maintains a stable set of replica pods running at any given time. It is used to guarantee the availability of a specified number of identical pods
- StatefulSets are used to manage stateful applications by providing guarantees about the ordering and uniqueness of the pods. Unlike a Deployment, a StatefulSet maintains a persistent identity for each pod.
- Stable network identity and persistent storage, along with ordered graceful deployment, scaling, and updates of pods
- Persistent disks in StatefulSets are retained even if a Stateful Pod is removed and must be manually deleted
- When Pods in a StatefulSet are being deleted, they are terminated and removed in reverse order
- Used for applications like Elasticsearch or any other application that holds state
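A minimal StatefulSet sketch (names, image, and sizes are placeholders); the headless Service named in serviceName provides the stable network identity, and volumeClaimTemplates gives each replica its own persistent disk:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es
spec:
  serviceName: es            # headless Service that provides stable pod DNS names
  replicas: 3
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      containers:
      - name: es
        image: elasticsearch:8.7.0   # placeholder stateful workload
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:      # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

The pods are created as es-0, es-1, es-2 in order, and their claims survive pod deletion, matching the retention behavior noted above.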
- DaemonSets ensure that all (or some) nodes run a copy of a pod. Examples include a cluster storage tool, log collection tool, custom drivers for a GPU, or monitoring tool where a workload requires access to a service on every node
- One Pod per Node mode across the cluster or a subset of nodes
- If new nodes are added, new daemon pods are automatically created on them
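A DaemonSet manifest looks much like a Deployment, minus a replica count, since the number of nodes determines the number of pods (names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluentd:v1.16   # placeholder log-collection image
```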
- A Kubernetes CronJob creates Jobs on a repeating schedule
- A Job creates one or more pods to complete a task; the pods terminate when the job completes successfully
- A parallel job may have a fixed completion count running multiple pods to completion, parallelism is configurable
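A CronJob sketch with a fixed completion count and configurable parallelism (names, schedule, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"          # standard cron syntax: 02:00 every day
  jobTemplate:
    spec:
      completions: 3             # run three pods to successful completion
      parallelism: 3             # all three may run at once
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.36
            command: ["sh", "-c", "echo generating report"]
```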
- Init Containers are not related to Jobs or CronJobs, but they are a part of the containers array in a Pod spec and execute before applications or other containers in a pod.
- Jobs and CronJobs are not to be confused with Cloud Scheduler
- Tainting a node (e.g. with the NoSchedule effect) tells the scheduler not to place pods on it unless they carry a matching toleration
- The kubectl cordon and kubectl drain commands allow you to safely remove pods from a node and prevent further Pods from being scheduled there. This is also good for maintenance and upgrades
- Operators allow you to add automated logic to deployments and extend the Kubernetes API functionality with custom resource definitions (CRDs)
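As a sketch, a node could be tainted with kubectl taint nodes node-1 dedicated=gpu:NoSchedule (node name and key are hypothetical), and only pods carrying a matching toleration in their spec will be scheduled onto it:

```yaml
# Pod-spec fragment: tolerate a hypothetical dedicated=gpu:NoSchedule taint
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
```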
Automatically Scaling Deployments
- Horizontally scale Pods with HorizontalPodAutoscaler
- Auto-scales the number of Pod replicas in a ReplicaSet/Deployment
- CPU and Memory thresholds are observed by GKE to trigger scaling
- Custom, multiple, and external metrics (from Cloud Logging) can be used
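A minimal HorizontalPodAutoscaler sketch targeting a hypothetical Deployment on average CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60     # scale out above 60% average CPU
```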
- Vertically scale Pods with VerticalPodAutoscaler
- Newer feature that recommends or applies CPU and RAM requests, more suited for stateful deployments where horizontal scaling isn’t ideal for the workload
- Cannot work alongside the HorizontalPodAutoscaler
- Combine horizontal and vertical with multidimensional Pod autoscaling (currently in beta as of April 2023)
- Scale cluster nodes with Node-pool cluster autoscaling
- The Cluster Autoscaler adds new VMs when we need more pods but don’t have the capacity. Underutilized nodes are torn down. You specify a minimum and maximum number of nodes per node pool.
gcloud container …. --enable-autoscaling --min-nodes 1 --max-nodes 5
- Works best with Node Pools, can have autoscaling policies per node pool
- Node Pools are a group of nodes within a cluster that all have the same configuration
- Node pools should be designed around the specific requirements of workloads, then enable cluster and horizontal pod autoscaling when appropriate
- Supports preemptible VMs in clusters and node pools
You can pre-plan the consumption of your resources with CPU and Memory requests. Some interesting Kubernetes CPU math: 1 CPU = 1000 millicores, so 100m = 1/10 of a CPU. You can also set limits to terminate pods that exceed their resource usage.
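A container-spec fragment illustrating the math above (values are placeholders):

```yaml
# Container-spec fragment: requests reserve capacity, limits cap usage
resources:
  requests:
    cpu: "250m"        # 250 millicores = 1/4 of a CPU
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"    # exceeding the memory limit gets the container killed
```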
Services Overview
- Another Kubernetes object, a Service exposes a set of pods to the network and assigns a fixed IP to your pod replicas. Three main types:
- ClusterIP: An internal IP address for your pods, kind of like an internal load balancer
- NodePort: Exposes the service on each Node’s IP at a static port
- LoadBalancer: K8s cloud controller manager creates a GCP Network Load Balancer
- Configured with selectors, key-value pairs in object metadata
- Selectors search for group of labels, like “app=nginx”
- Any pod that matches a selector will become part of that service
- Also includes a built-in DNS name
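A minimal Service sketch using the app=nginx selector mentioned above (names and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP        # internal-only; LoadBalancer would provision a GCP load balancer
  selector:
    app: nginx           # any pod labeled app=nginx becomes a backend
  ports:
  - port: 80             # the Service's port
    targetPort: 8080     # the container port traffic is forwarded to
```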
Health Checks Overview
- Liveness Probes are checks performed by a kubelet, it can check an HTTP endpoint, TCP socket, or run a command
- Include it in a pod spec YAML
- Readiness Probes are similar, but define when a pod is ready to start serving traffic. Traffic won’t be directed until it’s ready. Includes an initial delay.
- Also included in a pod spec YAML
- Probes are performed by a handler:
- ExecAction
- TCPSocketAction
- HTTPGetAction – Any response code ≥ 200 and < 400 indicates success. Any other code indicates failure.
- initialDelaySeconds, periodSeconds, timeoutSeconds
- successThreshold, failureThreshold
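A container-spec fragment showing both probe types with the tuning fields above (paths and values are placeholders):

```yaml
# Container-spec fragment (endpoints are hypothetical)
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3      # restart the container after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10        # traffic is withheld until this probe succeeds
```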
Accessing External Services
- Service Endpoints are services with no selector, maps to an IP or FQDN
- Create a ClusterIP Service with corresponding Endpoint object or
- Create an ExternalName with FQDN
- Endpoints can point to multiple IP addresses
- Sidecars can provide a connection to the external service, essentially a proxy
Volumes & Persistent Storage Overview
- Reminder: Container storage is ephemeral and goes away when a container dies, kind of like a local SSD on a VM
- A PersistentVolume is like a persistent disk on a VM, it is a K8s object that defines a piece of storage in the cluster, configured with a Storage Class, and can be manually or dynamically provisioned
- Access Modes define how a volume may be accessed by multiple containers
- ReadWriteOnce – A Single node can mount and read/write
- ReadOnlyMany – Any node can mount, but read-only
- ReadWriteMany – Not supported by GCP persistent disks!
- If you create a PersistentVolumeClaim with a resource request, GKE will dynamically create a disk and volume using the Storage Class provisioner (which will be a persistent disk, not a GCS bucket)
- You need to consume the volume claim when defining volumes in a Pod spec
- Volumes are independent objects and are directories mounted in a container to access files, here are the most common types:
- emptyDir: scratch space that can be shared by multiple containers in the same pod. Deleted forever when a Pod is removed from a Node.
- gcePersistentDisk: A volume type native to GCP, must be created beforehand and can be pre-populated with data and mounted read-only by multiple consumers. Will be unmounted when a Pod is removed.
- PersistentVolumeClaim: Used to mount a PersistentVolume into a Pod, a way to “claim” durable storage, and it requires a matching PersistentVolume object
- You’ll probably see this most frequently
- StorageClass can be updated, GKE default is standard persistent disks, but you can also change from standard to SSD and set up regional availability
- Constraints
- All replicas in a Deployment share the same PersistentVolumeClaim, so it must be in ReadOnlyMany if more than one Pod needs access
- Before using volumes (and persistent volumes) ask yourself, “Does your data really need to be stored in a disk? Or can you abstract this to GCS and Databases?” For example, don’t mount a disk just to serve images on a website.
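When a disk is genuinely needed, a minimal PersistentVolumeClaim sketch (name and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard   # GKE default (standard persistent disk); SSD classes are also available
  resources:
    requests:
      storage: 20Gi            # GKE dynamically provisions a matching persistent disk
```

A Pod then consumes it by listing the claim under spec.volumes with persistentVolumeClaim.claimName: app-data and mounting that volume into a container.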
ConfigMaps & Secrets Overview
- Secrets are objects designed to obfuscate sensitive data and insert it at runtime. They can be consumed as environment variables or volumes. They are only base64-encoded, not encrypted. Cloud KMS can encrypt secrets for an added layer of actual security
- ConfigMaps objects decouple app configuration from image content. Created from files, directories, or literal values. They can be referenced as environment variables or mounted as a Volume.
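A minimal ConfigMap sketch (names and values are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  FEATURE_FLAG: "true"
```

A container can then pull in every key as environment variables with envFrom referencing configMapRef name app-config, or mount the ConfigMap as a volume so each key becomes a file.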
Kubernetes Deployment Patterns
Rolling Updates
- Rolling updates are the default update strategy
- Gradually replace Pods with an updated spec
- Control how many additional Pods may be created
- Specify threshold for failed pods to determine if it was successful
- Use kubectl set image for rolling updates
- Use kubectl rollout undo to roll back a deployment
- The RollingUpdate strategy allows you to confidently roll out a new version of an application. Defining a threshold for the maximum number of unavailable Pods will stall a rollout if new Pods do not become ready within a certain time, potentially catching any issues with the application update. A surge policy will allow slightly more Pods to be running than normal, so that the new rollout can be attempted without removing all of the existing Pods.
- If you use Strategy: Recreate, all existing pods are killed before the new ones are created. This is good if the pod needs to write to a persistent disk using ReadWriteOnce, but there may be some downtime between Pod versions
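A Deployment-spec fragment showing how the surge and unavailability thresholds are expressed (the percentages are illustrative; absolute pod counts also work):

```yaml
# Deployment-spec fragment controlling the rollout
strategy:
  type: RollingUpdate        # or Recreate to kill all old pods first
  rollingUpdate:
    maxSurge: 25%            # up to 25% extra pods may exist during the rollout
    maxUnavailable: 25%      # at most 25% of desired pods may be down at once
```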
Canary Deployments
- Combines multiple Deployments with a single Service
- Deploy updates to a small subset of traffic (distributing fewer replicas for a canary)
- Common way to test on production traffic
- Can be automated with Spinnaker
Blue-Green (or Red-Black)
- Maintain two versions of your application deployment
- Switch traffic from blue to green with the Service selector
- All traffic immediately sent to new deployment, just flip the selector back if necessary
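A sketch of the traffic flip, assuming the blue and green Deployments differ only by a hypothetical version label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: green     # was "blue"; changing this flips all traffic at once
  ports:
  - port: 80
```

The same flip can be done in place with something like kubectl patch service my-app -p '{"spec":{"selector":{"version":"green"}}}'.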
Helm: The Kubernetes Package Manager
- Helm is a standalone tool that packages Kubernetes object manifests and configurations into a Helm chart. It saves a lot of time writing extensive YAML files
- Maintains the lifecycle of a deployment to GKE
- There are public repos of helm charts for popular software, mostly OSS
- Other manifest-management solutions are available
Using Helm
- Install the helm tool
- Search for software in Artifact Hub (Kind of like Docker Hub)
- Add the necessary Helm repository
- helm repo add bitnami https://charts.bitnami.com/bitnami
- Install the Helm chart
- helm install my-wordpress bitnami/wordpress
- Helm applies the templated manifests from the chart to your cluster, can specify optional variables (use the -f values.yaml file to be applied to the underlying Chart)
Advanced Ingress Controls
- Ingress is a more customizable way to expose traffic than a cloud load balancer
- An Ingress object configures access to services from outside the cluster
- Designed for HTTP and HTTPS services (web traffic)
- Can provide SSL, name-based, and path-based routing, can also add rewrite rules
- Once you have an Ingress defined, you need an Ingress Controller:
- Ingress Controllers route traffic to services based on Ingress definitions
- Usually fronted by a Cloud Load Balancer
- Consolidates your routing through a single resource, you don’t want 10 load balancers for a website with 10 services, you want 1 with effective path-based routing
- NGINX is a very common ingress controller
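A minimal Ingress sketch with name- and path-based routing to two hypothetical services:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: example.com            # name-based routing (hypothetical host)
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc        # placeholder service names
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc
            port:
              number: 80
```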
Multi-cluster ingress can be used to replicate application deployments across multiple GKE clusters. kubemci creates a Global Cloud Load Balancer and can direct traffic to regions based on the lowest latency. (Need to install kubemci to use it.)
- This can also be used to route to multiple clusters using Anthos, see Part VIII for Anthos notes.
Running a Secure GKE Cluster
Remember high-level cloud security principles including VPC and IAM configuration and other GCP resources. Make sure you’re using trusted container images, secure code, and that you’re running the latest version of images with the latest updates.
Private Clusters are a type of VPC-native cluster that only uses internal IP addresses. Nodes, Pods, and Services in a private cluster require unique subnet IP address ranges.
Binary Authorization is a service on Google Cloud that provides centralized software supply-chain security for applications that run on Google Kubernetes Engine (GKE) and Anthos clusters on VMware. It ensures that only signed and authorized images are deployed in your environments. It supports signature-based verification and also allows listing images using name patterns from a repo, path, or set of images.
Role-Based Access Control (RBAC)
Granular method of regulating object access to cluster resources that can be applied at a namespace or cluster level. It grants a set of actions to specific API groups and resources and helps with applying least privilege principles. Can be used with Pod service accounts. There are Roles and ClusterRoles along with RoleBindings and ClusterRoleBindings.
Namespaces & Resource Restrictions
Virtual clusters used to isolate resources for multiple teams or projects. Cluster resources can be divided with resource quotas. The default and kube-system namespaces are set up automatically on new clusters. Namespaces are a scope for resource names, so object names need to be unique within a namespace but not across a cluster (that's why the namespace appears in DNS names).
Pod Security Policies
Cluster-level resource that controls security-sensitive aspects of a Pod spec, but are deprecated so you probably don’t need to study this. They were confusing for K8s devs so they’d be confusing for anyone trying to pass the PCA exam.
Network Policies
These are like firewalls for pods. Network Policies are objects that define ingress and egress rules for Pods using selectors (and port numbers/protocols) and restrict their incoming and outgoing traffic. They can be used to isolate traffic between namespaces and can get very granular. This setting should be enabled when the cluster is built.
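A sketch restricting ingress to a hypothetical database tier, so only frontend pods can reach it (labels and port are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
spec:
  podSelector:
    matchLabels:
      app: db               # the policy applies to pods labeled app=db
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend     # only frontend pods may connect
    ports:
    - protocol: TCP
      port: 5432
```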
Workload Identities
The recommended way to access GCP services from applications running within GKE because it is more secure and easier to manage. Remember, by default, VMs in GCP use service accounts. Workload identities allow you to map custom GCP service accounts to specific workloads in GKE. Kubernetes native service accounts can be mapped to GCP service accounts with the same name!
Service Mesh Overview (Also important for Anthos… which is covered in the next post.)
Service meshes allow you to confidently operate microservices at scale. They let you manage traffic, maintain the reliability and visibility of services, and handle dependency management.
- Data Plane – Sidecar container running a proxy (Envoy in Istio). It controls network traffic in and out of the Pod. The data plane communicates with a control plane to receive routing logic and send metrics
- Control Plane – In Istio, there are 3 primary components. Istio can be enabled as part of a GKE installation or can be installed via Helm into the istio-system namespace.
- Pilot: Configures the data plane, defines proxy rules and behavior
- Mixer: Collects traffic metrics and responds to authorization, access control, or quota checks
- Citadel: Assigns TLS certificates to each service and enables end-to-end encryption
- Makes traffic management easier (good for blue/green deployments), security between pods, collecting telemetry, visualization in topology graphs. Good for SLA teams who need to solve problems with distributed tracing
- Improves security by including a managed private certificate authority (Mesh CA) for issuing mTLS certs. Review full security overview.
Traffic Director is GCP’s fully managed traffic control plane for service mesh. Uses Envoy Proxy under the hood. Works for TCP and HTTP traffic now. For details, see this video. It works at a more global level, offering Envoy proxies for Virtual Machines instead of just pods.