K8s Architecture
(image source: https://kubernetes.io/docs/concepts/architecture/)
Kubernetes (K8s) architecture is designed to be distributed and modular, consisting of multiple components that work together to manage containerized applications across a cluster of nodes. Here’s an overview of the primary components of Kubernetes architecture:
1. Control Plane (Master Node Components)
The control plane manages the state of the Kubernetes cluster, making global decisions about the cluster, and detecting and responding to cluster events.
- API Server (kube-apiserver): The central management entity and the only component that interacts with the etcd store directly. It exposes the Kubernetes API and is the front end for the Kubernetes control plane.
- etcd: A consistent and highly available key-value store used for all cluster data, ensuring data persistence and state management.
- Scheduler (kube-scheduler): Assigns new pods to nodes based on resource availability, constraints, affinity/anti-affinity specifications, and so on; it also supports custom schedulers and rescheduling.
- Controller Manager (kube-controller-manager): Runs the controller processes that regulate the state of the system, managing node lifecycle, replication, endpoint creation, and other tasks. Its controllers include:
- Node Controller: Checks the status of nodes.
- Replication Controller: Maintains the correct number of pods for every replication group.
- Endpoints Controller: Joins Services & Pods.
- Service Account & Token Controllers: Create default accounts and API access tokens for new namespaces.
- Cloud Controller Manager (cloud-controller-manager): Links the cluster to the cloud provider’s API, managing the components that interact with underlying cloud services.
2. Node Components
Nodes are worker machines in Kubernetes, which can be either physical or virtual machines, depending on the cluster.
- Kubelet (kubelet): An agent running on each node that ensures containers are running in a pod. Its primary responsibilities include pod management, health checking, node registration, resource monitoring, pod lifecycle event generation, volume management, network setup, and communication with the container runtime via the CRI.
- Kube-Proxy (kube-proxy): A network proxy running on each node that maintains network rules and enables communication to your pods from network sessions inside or outside of your cluster.
- Container Runtime: The software responsible for running containers (e.g., Docker, containerd, CRI-O).
3. Add-ons
Kubernetes functionality is extended with add-ons, which are pods and services implementing cluster features.
- DNS: A DNS server for Kubernetes services, required by many examples.
- Web UI (Dashboard): A general-purpose, web-based UI for Kubernetes clusters.
- Container Resource Monitoring: Records generic time-series metrics about containers in a central database.
- Cluster-Level Logging: Responsible for saving container logs to a central log store with a search/browsing interface.
4. Networking
- Pod Networking: Each Pod is assigned a unique IP address. Pods on a node can communicate with all pods on all nodes without NAT.
- Service Networking: Provides a stable IP address and DNS name entry for managing access to a set of pods.
5. Storage
- Volumes: A directory, possibly with data in it, accessible to the containers in a pod.
- Persistent Volumes: An abstraction of storage resources (like AWS EBS, NFS, etc.) that provides a way to store data beyond the lifecycle of individual pods.
This architecture allows Kubernetes to efficiently manage containerized applications across a cluster of machines, handling scheduling, scaling, updating, and maintenance of applications in a reliable and automated manner.
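To see these components on a live cluster, a couple of kubectl commands are enough; this sketch assumes a typical setup where the control-plane components and add-ons run as pods in the kube-system namespace:
# Worker and control-plane machines
kubectl get nodes -o wide
# API server, scheduler, controller manager, etcd, kube-proxy, DNS, etc.
kubectl get pods -n kube-system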
K8s Controllers
Kubernetes includes a variety of built-in controllers, each designed to handle specific tasks within the cluster. This post introduces some key Kubernetes controllers, each with a brief explanation and a basic YAML example:
1. ReplicaSet Controller
- Explanation: Ensures that a specified number of pod replicas are running at any given time.
- YAML Example:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: nginx
        image: nginx
2. Deployment Controller
- Explanation: Manages ReplicaSets and provides declarative updates to Pods along with scaling and rollback functionality.
- YAML Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: nginx
        image: nginx
3. StatefulSet Controller
- Explanation: Manages stateful applications, maintaining a stable identity and order for each pod.
- YAML Example:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-statefulset
spec:
  serviceName: "example"
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: nginx
        image: nginx
4. DaemonSet Controller
- Explanation: Ensures that all (or some) nodes run a copy of a specified pod.
- YAML Example:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-daemonset
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: nginx
        image: nginx
5. Job Controller
- Explanation: Manages jobs that run to completion (e.g., batch jobs).
- YAML Example:
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      containers:
      - name: perl
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
6. CronJob Controller
- Explanation: Manages time-based jobs, similar to cron in Unix/Linux.
- YAML Example:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
- Service Controller: Manages the backend logic of Kubernetes services. It handles the network routing for load balancing and service discovery.
A Service in Kubernetes is an abstraction which defines a logical set of Pods and a policy by which to access them. Services allow your applications to receive traffic and enable communication between different parts of your application as well as with external applications.
- ClusterIP: This is the default Kubernetes Service type. It provides a service inside your cluster that only apps inside your cluster can access. The Service gets its own IP address and, when accessed, load-balances traffic to the underlying Pods, but it is not accessible from outside the cluster.
- LoadBalancer: This Service type integrates with your cloud provider’s load balancer. It forwards external traffic to the NodePort and ClusterIP services, effectively making the Service accessible from the internet.
- NodePort: This type of Service exposes the Service on each Node’s IP at a static port. It makes a Service accessible from outside the cluster using <NodeIP>:<NodePort>. A ClusterIP Service, to which the NodePort Service routes, is automatically created.
- ExternalName: This Service type maps a Service to an external DNS name rather than to a set of Pods chosen by a selector. (See: https://juejin.cn/post/7223981150775459897)
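To make these Service types concrete before moving on to the remaining controllers, here is a hedged NodePort example; the app: example selector and port numbers are illustrative assumptions, and changing type to ClusterIP or LoadBalancer switches the exposure model without changing the selector:
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  type: NodePort
  selector:
    app: example          # routes to Pods carrying this label
  ports:
  - protocol: TCP
    port: 80              # ClusterIP port inside the cluster
    targetPort: 80        # container port on the selected Pods
    nodePort: 30080       # static port on every node (default range 30000-32767)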
- Endpoint Controller: Populates the Endpoints object (that is, joins Services & Pods).
- Namespace Controller: Handles creation and deletion of namespaces.
In Kubernetes, a Namespace is a mechanism to partition cluster resources into multiple virtual clusters. It provides a scope for names, and using multiple Namespaces is a way to divide cluster resources between multiple users (via resource quotas). Namespaces provide organization and isolation, resource management, access control, network policies, and more.
- PersistentVolume Controller: Handles the binding of PersistentVolumeClaims (PVCs) to PersistentVolumes (PVs).
- Horizontal Pod Autoscaler (HPA) Controller: Automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU utilization (or other metrics).
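A minimal HorizontalPodAutoscaler sketch targeting the example-deployment defined earlier; the replica bounds and the 80% CPU target are illustrative assumptions, and CPU-based scaling requires the metrics-server add-on plus CPU requests set on the Pods:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80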
Each of these controllers is used to manage different aspects of pod lifecycle and behavior in a Kubernetes cluster. The YAML examples provide a basic template for deploying each kind of controller, which can be further customized based on specific use cases and requirements.
StatefulSet
A StatefulSet is a Kubernetes workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. This means that each Pod is created based on the same spec, but is not interchangeable; each has a persistent identifier maintained across any rescheduling.
Here are some key characteristics and uses of StatefulSets:
- Stable, Unique Network Identifiers: Each Pod in a StatefulSet derives its hostname from the name of the StatefulSet and the ordinal of the Pod. The pattern for the constructed hostname is $(statefulset name)-$(ordinal). For example, if a StatefulSet is named myapp, its Pods will have names like myapp-0, myapp-1, and so on. This is particularly useful for stateful applications like databases, where each instance needs to have a stable identity.
- Stable, Persistent Storage: StatefulSets allow each Pod to be associated with its own Persistent Volume (PV) and Persistent Volume Claim (PVC). This means that if a Pod is deleted and then rescheduled, it can reattach to its existing data, maintaining the state.
StatefulSets use Persistent Volume Claims to provide persistent storage to each Pod. These PVCs are also tied to the Pod’s identity: when a Pod is rescheduled or restarted, it reattaches to its PVC, ensuring data persistence. The PVCs are named in a way that associates them with the specific Pod, further tying the storage to the Pod’s stable identity.
- Ordered, Graceful Deployment and Scaling: When Pods are deployed or deleted, StatefulSets do so in a predictable and ordered manner. This is important for stateful applications where the startup, shutdown, and scaling order might be crucial.
- Ordered, Automated Rolling Updates: For updates, StatefulSets support automated rolling updates. You can update the container image, configuration, and other properties of the Pods, and these changes will be applied to each Pod sequentially, in order.
- Use Cases: StatefulSets are ideal for applications such as databases (like MySQL, PostgreSQL), clustered software (like Elasticsearch, Kafka), and any other application that requires stable network identifiers, stable storage, and ordered, graceful deployment and scaling.
Restarting a StatefulSet in Kubernetes does not change the network identifiers of the Pods within it. The network identifiers, which include the hostnames and potentially other network settings of the Pods, remain consistent through restarts. This stability is one of the key features of StatefulSets and is crucial for stateful applications that rely on a fixed identity.
Kubernetes Control Plane: The Kubernetes control plane ensures that the desired state of the StatefulSet is maintained. When a Pod in a StatefulSet is restarted, the control plane ensures that it is brought up with the same identity and configuration as before.
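To make the identity and storage guarantees concrete, here is a hedged sketch of a headless Service plus a StatefulSet with per-Pod storage via volumeClaimTemplates; the myapp name, nginx image, and 1Gi size are illustrative assumptions:
apiVersion: v1
kind: Service
metadata:
  name: myapp              # headless Service giving each Pod a stable DNS entry
spec:
  clusterIP: None
  selector:
    app: myapp
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp
spec:
  serviceName: "myapp"
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:    # one PVC per Pod: data-myapp-0, data-myapp-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
Each Pod is reachable at a stable DNS name of the form myapp-0.myapp, and its PVC survives rescheduling, which is exactly the sticky-identity behavior described above.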
Container Network Interface (CNI) in K8s
Kubernetes CNI (Container Network Interface) is an important concept in the realm of container orchestration and networking. It is a set of standards and libraries that facilitate the configuration of network interfaces for Linux containers.
CNI’s primary function is to connect container networking from different pods to the host network, ensuring seamless communication. CNI allows for a plug-and-play model, where various networking solutions can be used interchangeably without altering the core Kubernetes code.
The CNI project under the CNCF (Cloud Native Computing Foundation) standardizes how container networking should be configured, ensuring compatibility and ease of swapping networking providers. There are many CNI plugins available, like Flannel, Calico, Weave, etc., each offering different features and performance characteristics.
Istio extends Kubernetes networking to provide a full service mesh solution, offering advanced routing, resiliency, and security features for microservices communication. Cilium leverages eBPF, a powerful Linux kernel technology, to dynamically insert and run bytecode within the kernel, enabling high-performance networking and security policies.
Using Kubernetes CNI
- Network Policy Implementation: Kubernetes CNI plugins can be used to implement network policies to control the communication between different pods within a cluster (a minimal example follows this list).
- Multi-Host Networking: In a Kubernetes cluster spread across multiple hosts, CNI plugins enable the setup of an overlay network that allows pods on different hosts to communicate as if they were on the same physical network.
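Here is the minimal NetworkPolicy sketch referenced in the first bullet; the app labels and port are illustrative assumptions, and enforcement only happens if the installed CNI plugin supports NetworkPolicy (e.g., Calico or Cilium):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend          # Pods this policy applies to
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend     # only these Pods may connect
    ports:
    - protocol: TCP
      port: 8080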
How Kubernetes CNI Works
- Pod Lifecycle: When a pod is created or deleted, Kubernetes invokes the CNI plugins configured in the cluster.
- Network Namespace: CNI plugins attach network interfaces to the pod’s network namespace and configure IP addresses, routing rules, and DNS settings.
- Compatibility: CNI plugins are designed to be compatible with various network models like bridge, overlay, underlay, etc.
Kubernetes CNI and Istio
- Istio Integration: Istio, a service mesh, can be integrated with Kubernetes using CNI. This combination enhances network security, observability, and traffic management in Kubernetes environments.
- Sidecar Injection: With Istio, a sidecar proxy is injected into Kubernetes pods. The CNI ensures that network traffic is routed through this proxy for advanced monitoring and policy enforcement.
Kubernetes CNI and Cilium
- Cilium as a CNI Plugin: Cilium, which uses eBPF technology, can be used as a CNI plugin in Kubernetes. It offers advanced features like API-aware network security, load balancing, and visibility.
- Network and Security Policies: Cilium enhances Kubernetes by providing more granular network and security policy enforcement capabilities at the pod level.
Example CNI Configurations
Configuring CNI (Container Network Interface) plugins in Kubernetes involves defining the networking behavior for pods. Below are examples of configuration for some popular CNI plugins:
Flannel CNI Configuration
{
  "cniVersion": "0.3.1",
  "name": "flannel",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
Cilium CNI Configuration
{
  "cniVersion": "0.3.1",
  "name": "cilium",
  "type": "cilium-cni",
  "enable-debug": true,
  "eni": {
    "first-interface-index": 0
  },
  "ipam": {
    "type": "host-local",
    "subnet": "10.10.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
Overview of Pod Network Configuration Process
Pod Creation: When a new pod is scheduled for creation, the kubelet on the designated node initiates the pod creation process.
1. CNI Configuration Discovery:
- The kubelet reads the network configuration file, typically located in /etc/cni/net.d. This file specifies which CNI plugin to use along with its configuration settings.
- The identified CNI plugin is then loaded by the kubelet.
2. Executing the CNI Plugin:
- The kubelet invokes the CNI plugin’s “ADD” command, passing along the pod’s network configuration parameters.
- This command is executed with the binary of the CNI plugin, usually found in /opt/cni/bin.
3. Network Namespace and Interface Setup:
- The CNI plugin is responsible for creating a network namespace for the pod. This namespace is separate from other pods and the host network.
- Within this namespace, the plugin sets up network interfaces (such as a veth pair), configures IP addresses, routing, and firewall rules based on the parameters provided by the kubelet.
4. IP Address Management (IPAM):
- The CNI plugin often uses an IPAM plugin to allocate an IP address for the pod.
- This includes setting up necessary routes and DNS settings within the pod’s network namespace.
5. Saving Network Configuration:
- After the network is configured, the kubelet saves the actual network configuration parameters used by the CNI plugin.
- These parameters are stored in a file located in the pod’s network namespace, typically found in the /var/run/netns/ directory on the node.
6. Network Policy Enforcement:
- If network policies are defined in the cluster, they are applied at this stage to control the pod’s network traffic.
7. Cleanup on Pod Deletion:
- Upon pod deletion, the kubelet calls the CNI plugin with the ‘DEL’ command to clean up the network interfaces and IP allocations.
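On a node you can inspect both pieces directly; the exact configuration file name varies by plugin, so the one below is an assumption:
# CNI configuration the kubelet reads (name depends on the installed plugin)
cat /etc/cni/net.d/10-flannel.conflist
# CNI plugin binaries invoked with the ADD/DEL commands
ls /opt/cni/bin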
Key Points
- Flexibility and Modularity: Different CNI plugins may handle these steps with variations, offering a range of features like network policy enforcement or overlay networking.
- Robust Error Handling: Kubernetes ensures any failures in the process are handled gracefully, with attempts to roll back changes to maintain system consistency.
This integrated process highlights the critical role of CNI in Kubernetes networking, enabling a standardized yet flexible approach to network configuration and management in containerized environments.
Container Communications in K8s
Containers in the same pod in Kubernetes communicate with each other using inter-process communication (IPC) methods, as they share the same network namespace. This means that they can communicate using localhost networking. Here’s how it works:
- Shared Network Space: Containers in the same pod share the same IP address and port space, so they can communicate with each other using localhost as the hostname. For example, if one container is serving on port 5000, other containers in the same pod can access that service at localhost:5000.
- IPC Mechanisms: Since they are in the same pod, containers can also use standard inter-process communication mechanisms like System V IPC or POSIX message queues.
- Volume Sharing: Pods can also share data between containers through shared volumes. Kubernetes volumes can be mounted in multiple containers within the same pod, allowing those containers to share files.
- Environment Variables and ConfigMaps: Containers in the same pod can access shared configuration data, like environment variables or ConfigMaps, which can be used for communication settings or shared configuration data.
- No Need for External Networking: This internal communication doesn’t need to go through the Kubernetes networking layer, which means it’s typically faster and more secure since it doesn’t have to leave the pod.
- Service Discovery: While not used for direct container-to-container communication within the same pod, Kubernetes provides service discovery features that are useful for communication across different pods.
Remember, this close coupling of containers in a pod is by design, as Kubernetes considers a pod to be the smallest deployable unit that can be created, scheduled, and managed. It’s meant for containers that are tightly coupled and need to share resources closely.
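A hedged example combining two of these patterns, localhost networking and a shared volume, in a single Pod; the container names, images, and paths are illustrative assumptions:
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  volumes:
  - name: shared-data
    emptyDir: {}             # scratch volume shared by both containers
  containers:
  - name: web
    image: nginx             # listens on port 80; the sidecar can reach it at localhost:80
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: content-writer
    image: busybox
    command: ["/bin/sh", "-c", "echo hello from the sidecar > /data/index.html && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data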
SA in K8s
Service accounts in Kubernetes (K8s) are a fundamental concept for managing access to K8s resources and services. Understanding their role, creation, usage, integration with external systems like AWS IAM, and debugging methods is crucial for effective Kubernetes management.
What is a Service Account in Kubernetes?
A service account is a special type of user account that is intended for processes and pods running in a Kubernetes cluster, rather than for human users. Service accounts provide an identity for processes in a namespace to authenticate and authorize their access to the Kubernetes API.
Key Characteristics:
- Namespace Scoped: Service accounts are namespace-specific.
- Automated API Access: They’re primarily used for automated processes, like running pods or jobs in the cluster.
- Bearer Tokens: Service accounts are associated with a set of credentials (bearer tokens) automatically, which can be used for API authentication.
- Integration with RBAC: These accounts are often used in conjunction with RBAC to define their permissions.
Creating a Service Account:
- Basic Creation: You can create a service account using the kubectl command:
kubectl create serviceaccount [service-account-name]
- Using YAML Configuration: For more control, you can define a service account in a YAML file:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: [service-account-name]
And apply it using kubectl apply -f [filename].yaml.
Using a Service Account in Pods:
To use a service account in a pod, specify the service account name in the pod’s YAML:
apiVersion: v1
kind: Pod
metadata:
  name: [pod-name]
spec:
  serviceAccountName: [service-account-name]
  containers:
  - name: [container-name]
    image: [container-image]
When you specify a service account in a pod configuration, the service account’s token is automatically mounted into the pod at /var/run/secrets/kubernetes.io/serviceaccount.
Integrating with AWS IAM:
Service accounts can be integrated with AWS IAM roles for managing access to AWS resources. This is particularly useful for EKS (Elastic Kubernetes Service) clusters.
- IAM Roles for Service Accounts (IRSA): AWS supports associating an IAM role with a Kubernetes service account. This feature is called IAM Roles for Service Accounts (IRSA).
- Using Annotations: You annotate the Kubernetes service account with the ARN of the IAM role. The EKS cluster then provides temporary AWS credentials to the service account, which can be used by the pods to access AWS resources.
- Policy Binding: Ensure that the IAM role has the necessary policies bound to it for the required AWS services.
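A hedged sketch of an IRSA-annotated service account; the account name and role ARN are placeholders, and the cluster must have an OIDC provider configured for IRSA:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: default
  annotations:
    # IRSA annotation; the ARN is a placeholder for your own role
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-app-role
Pods that set serviceAccountName: my-app-sa then receive temporary AWS credentials through the projected web identity token.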
Debugging with Service Accounts:
- Check Token Mounting: Ensure that the service account token is properly mounted in the pod under /var/run/secrets/kubernetes.io/serviceaccount.
- Verify Permissions: If a pod is failing to access Kubernetes resources, check that the associated service account has the correct roles and permissions.
- Inspect API Server Logs: If there are issues with service account authentication, check the Kubernetes API server logs for authentication errors.
- Use kubectl describe: You can run kubectl describe serviceaccount [service-account-name] to inspect the service account and its bindings.
Best Practices:
- Principle of Least Privilege: Assign only the necessary permissions to a service account to minimize security risks.
- Avoid Default Service Account: Do not use the default service account for running workloads, as it may have broader permissions than necessary.
- Regular Auditing: Regularly audit service accounts and their associated roles and permissions.
- Secure Token Storage: Be cautious with how and where you store service account tokens, especially when integrating with external systems.
Maintenance and Upkeep:
- Update Policies: Regularly review and update the roles and permissions associated with service accounts.
- Rotate Credentials: Regularly rotate service account tokens for security.
- Monitor Usage: Monitor the usage of service accounts to detect any unusual activity or potential security breaches.
Service accounts are a powerful feature in Kubernetes, enabling secure and efficient interactions between your cluster and its workloads. Their integration with systems like AWS IAM enhances their utility in managing cloud resources, making them an essential part of Kubernetes cluster management and operations.
RBAC in K8s
Role-Based Access Control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an organization. In Kubernetes, RBAC is used to manage authorization decisions, allowing admins to dynamically configure policies through the Kubernetes API.
Configuring RBAC in Kubernetes:
1. Understand Kubernetes RBAC Objects:
- Role: Sets permissions within a specific namespace.
- ClusterRole: Sets permissions cluster-wide.
- RoleBinding: Assigns a Role to users or groups within a specific namespace.
- ClusterRoleBinding: Assigns a ClusterRole to users or groups cluster-wide.
2. Creating a Role:
- Define a Role in YAML with the required API Groups, Resources, and Verbs (actions allowed).
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
3. Creating a RoleBinding:
- Bind the Role to a specific user, group, or service account.
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
4. Applying the Configuration:
- Use kubectl apply -f <filename.yaml> to create the Role and RoleBinding.
5. Updating RBAC Rules:
- Edit your YAML files and reapply them.
- Use kubectl edit role <role-name> or kubectl edit rolebinding <rolebinding-name> for quick updates.
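You can then verify the effective permissions with kubectl auth can-i, impersonating the user jane from the example above:
# Should print "yes"
kubectl auth can-i list pods --as=jane --namespace=default
# Should print "no" (the Role does not grant delete)
kubectl auth can-i delete pods --as=jane --namespace=default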
Best Practices:
- Principle of Least Privilege: Only grant the minimum permissions necessary for a user or service to perform its tasks.
- Regular Auditing and Review: Regularly review and audit roles and bindings to ensure they are still necessary and correctly scoped.
- Namespace-Specific Roles: Prefer namespaced Roles and RoleBindings for specific needs rather than ClusterRoles, to minimize scope.
- Use Group Bindings: Bind roles to groups rather than individual users where possible, for easier management.
- Clear Naming Conventions: Use clear, descriptive names for Roles and Bindings to make their purpose obvious.
- Documentation: Document your RBAC policies, including why specific permissions are granted, for future reference and team understanding.
Maintenance:
- Monitoring: Regularly monitor and log RBAC events to detect any unauthorized access attempts or misconfigurations.
- Update and Migration: Stay updated with Kubernetes releases, as RBAC policies might need adjustments with new versions.
- Backup Policies: Backup your RBAC configurations as part of your cluster’s regular backup routine.
Additional Considerations:
- Security Contexts: In addition to RBAC, use Kubernetes Security Contexts for fine-grained security settings at the Pod or Container level.
- Testing: Test your RBAC rules in a development environment before applying them to production.
- Integration with External Identity Providers: For larger systems, consider integrating Kubernetes RBAC with external identity providers for centralized user management.
By following these steps and best practices, you can effectively configure, maintain, and update RBAC in Kubernetes to ensure a secure and well-managed environment.
Selectors in K8s
In Kubernetes, selectors are used to filter resources based on their labels. Selectors enable users to manage, view, and filter resources within the cluster. They are essential components of various Kubernetes objects, such as ReplicaSets, Deployments, Services, and more.
Here’s how selectors work:
Types of Selectors
- Equality-Based Selectors
- Use =, ==, or != to filter resources.
- Example: app=frontend, env!=production
- Set-Based Selectors
- Use in, notin, and exists to filter resources.
- Example: env in (production, qa), tier notin (backend), partition
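In workload manifests, set-based selectors are written as matchExpressions. Below is a hedged fragment of a Deployment/ReplicaSet spec using them; the label keys and values are illustrative assumptions, and the Pod template’s labels must still satisfy the selector:
spec:
  selector:
    matchExpressions:
    - key: env
      operator: In
      values: [production, qa]
    - key: tier
      operator: NotIn
      values: [backend]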
Using Selectors
1. Resource Querying
You can use selectors when querying resources via kubectl.
kubectl get pods -l 'environment=production,tier!=frontend'
This command gets all the pods that have the label environment=production and do not have the label tier with the value frontend.
2. Resource Configuration
In the YAML configuration of a resource like a ReplicaSet or Service, you specify the selector. Here is an example of a ReplicaSet definition:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
In this example, the ReplicaSet will manage Pods with the label app=frontend. It specifies this under spec.selector.matchLabels.
3. Service Routing
Services use selectors to route traffic to the right set of Pods. Here’s an example of a Service configuration:
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  selector:
    app: frontend
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
This Service will route traffic to Pods with the label app=frontend.
Selector Limitations
- Immutability: For some resource types, such as Deployments and ReplicaSets, the selector cannot be changed after creation, so make sure your selectors are correctly defined. (A Service’s selector, by contrast, can be updated.)
- Uniqueness: The same set of selectors should not be used across multiple resources that do the same thing (like Services), to avoid unwanted behavior.
- No OR logic: Kubernetes selectors don’t support OR logic directly. For example, you can’t have a selector that matches resources with label app=frontend OR app=backend.
Selectors are a powerful tool for managing resources in Kubernetes. Understanding how to use them effectively is key to operating Kubernetes efficiently.
Compare Label and Annotations
In Kubernetes (K8s), both labels and annotations are key-value pairs associated with Kubernetes objects, but they are used for different purposes and are subject to different operational constraints. Below are the primary distinctions:
Labels:
- Purpose: Labels are used to identify, select and group Kubernetes resources. They are commonly used to filter resources and can be used by users as well as by internal system components.
- Selectors: Labels can be used to select objects and resources. For example, you can select all pods with app=frontend using label selectors.
- Mutability: Labels can be updated dynamically for most Kubernetes objects.
- Constraints: The key-value pairs in labels must be in a certain format. For example, keys have a max length of 63 characters, and values have a max length of 63 characters. There are additional format constraints.
- API Search: Labels can be used to filter resources via Kubernetes API calls.
- Built-in Features: Labels are core to many built-in Kubernetes features, such as ReplicaSets and Services, which use label selectors to manage the pods they operate on.
Annotations:
- Purpose: Annotations are used to attach arbitrary non-identifying metadata to objects. They are more for storing additional information that is not essential for selection or grouping but can be useful for tools and libraries working within your cluster.
- Selectors: Annotations are not used for object selection, so they can’t be used in selectors like labels can.
- Mutability: Annotations can also be modified dynamically, but the system won’t automatically notice or react to changes.
- Constraints: Annotations are more flexible and can hold larger pieces of data, including but not limited to JSON blobs. There are fewer format restrictions.
- API Search: Annotations are not meant to be used to filter or search for resources via Kubernetes API calls.
- Built-in Features: Generally not used by Kubernetes’ built-in features, but they can be essential for third-party tools (e.g., Helm uses annotations to keep track of releases).
Summary:
- Labels are for grouping and selecting resources.
- Annotations are for adding additional metadata and information.
Both labels and annotations are defined in the metadata section of a Kubernetes object definition.
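A small hedged example showing both in the metadata of one object; the values are illustrative, although kubernetes.io/change-cause is a real annotation read by kubectl rollout history:
apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod
  labels:                        # used by selectors and kubectl -l queries
    app: frontend
    env: production
  annotations:                   # free-form metadata for humans and tooling
    kubernetes.io/change-cause: "upgrade nginx to 1.25"
    team-contact: "platform@example.com"
spec:
  containers:
  - name: nginx
    image: nginx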
PV and PVC in K8s
In Kubernetes, PV (Persistent Volume) and PVC (Persistent Volume Claim) are used for managing storage resources in the cluster. They abstract the details of how storage is provided and how it is consumed. Let’s break down what each component is for:
Persistent Volume (PV)
A Persistent Volume (PV) is a piece of storage that has been provisioned by an administrator or dynamically provisioned using Storage Classes. PVs are cluster-wide resources that can be used by any node in the Kubernetes cluster. They exist independently of Pods and have a lifecycle that is separate from the Pods that use them. They can be backed by various types of storage like block storage, file storage (like NFS), or even cloud storage (like AWS EBS, GCP Persistent Disk, Azure Disk, etc.).
A PV is defined using a YAML or JSON definition file that specifies properties such as capacity, access modes (ReadWriteOnce, ReadOnlyMany, ReadWriteMany), and more.
Persistent Volume Claim (PVC)
A Persistent Volume Claim (PVC) is a request for storage by a user. It is similar to a Pod: Pods consume node resources and PVCs consume PV resources. PVCs request a specific size and access modes for a Persistent Volume. Once a PVC is bound to a PV, that PV can’t be bound to another PVC (the binding is exclusive).
When a Pod wants to use a piece of durable storage, it will specify a PVC in its Pod definition. Kubernetes will then automatically mount the corresponding PV into the Pod.
How They Work Together
- Admin Provisioning: An administrator creates one or more PVs in the cluster, which represent actual storage resources.
- User Request: When a user (or a Pod) needs storage, they create a PVC.
- Binding: The Kubernetes control plane then binds an available PV to the PVC. The binding is a one-to-one mapping.
- Pod Uses PVC: A Pod can then specify the PVC as part of its specification to mount the bound PV as a filesystem.
- Pod Termination: When the Pod is terminated, the PVC still exists, and the data is retained in the PV.
- PVC Deletion: If a PVC is deleted, the corresponding PV can be reclaimed, released, or retained based on the reclaim policy.
This separation of PV and PVC allows for a flexible management system where storage resources can be provisioned, utilized, and monitored independently of the Pods and applications that use them.
Here are some examples showing how Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) can be used in Kubernetes. This example assumes that you have a Kubernetes cluster up and running.
1. Creating a Persistent Volume (PV)
Here is a sample YAML file that defines a Persistent Volume. Save this file as my-pv.yaml.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
Run the following command to create the Persistent Volume:
kubectl apply -f my-pv.yaml
2. Creating a Persistent Volume Claim (PVC)
Here is a sample YAML file that defines a Persistent Volume Claim. Save this file as my-pvc.yaml.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Run the following command to create the Persistent Volume Claim:
kubectl apply -f my-pvc.yaml
3. Using PVC in a Pod
Finally, let’s create a Pod that uses the PVC. Save this file as my-pod.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx
    volumeMounts:
    - name: my-storage
      mountPath: /usr/share/nginx/html
  volumes:
  - name: my-storage
    persistentVolumeClaim:
      claimName: my-pvc
Run the following command to create the Pod:
kubectl apply -f my-pod.yaml
This will create a Pod named my-pod that mounts the PVC my-pvc at the /usr/share/nginx/html directory. The PVC my-pvc is bound to the PV my-pv, so effectively the Pod is using the storage defined in my-pv.
4. Verifying
You can verify that everything is set up correctly by using the following commands:
- List Persistent Volumes: kubectl get pv
- List Persistent Volume Claims: kubectl get pvc
- List Pods: kubectl get pods
Note: The above example uses hostPath for the sake of simplicity. In production environments, you’ll likely use more advanced storage solutions like NFS, iSCSI, or cloud-based storage.
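For dynamic provisioning you would usually define a StorageClass rather than pre-creating PVs. A hedged example for the AWS EBS CSI driver follows; it assumes the ebs.csi.aws.com driver is installed, and the gp3 parameters are illustrative:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-storage
provisioner: ebs.csi.aws.com        # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
A PVC that sets storageClassName: gp3-storage then gets a matching volume provisioned on demand, with no manually created PV.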
CRD in K8s
CRD stands for Custom Resource Definition in Kubernetes (also known as k8s). It is an extension of the Kubernetes API that allows you to create your own Custom Resources. CRDs act as a type of blueprint that enables the Kubernetes API to handle the custom resource that you’re trying to add. With them, you’re able to create sophisticated, stable native applications that use the Kubernetes style declarative configuration and are maintained through Kubernetes tooling.
CRDs are essentially a way to extend the Kubernetes API and define new object types that are specific to your application or domain. Once you define a CRD, Kubernetes recognizes it as a new resource type and allows you to create, update, and delete instances of that resource using the Kubernetes API.
Here’s a high-level overview of how CRDs work:
- Define a CRD: You define a CRD by creating a CustomResourceDefinition object in Kubernetes. This object specifies the structure and behavior of the custom resource, including its API schema, validation rules, and any additional metadata or behavior you want to associate with it.
- Create instances: Once the CRD is defined, you can create instances of the custom resource by creating objects of the new resource type. These instances conform to the structure defined in the CRD.
- Controller logic: To manage the custom resources, you typically implement a custom controller that watches for changes in the custom resource instances. The controller reacts to these changes and performs actions based on the desired state of the resources. For example, the controller might create or delete other Kubernetes resources, interact with external systems, or perform any required operations to ensure the custom resources are properly managed.
- Kubernetes API integration: The custom resources created using CRDs can be managed using the Kubernetes API, just like any other native Kubernetes resources. This means you can use Kubectl or any Kubernetes client to interact with the custom resources, such as creating, updating, or deleting instances.
CRDs have gained popularity because they provide a way to extend Kubernetes in a structured and declarative manner. They allow you to define custom resources that align with your application’s specific requirements and provide a more intuitive and consistent interface for managing complex applications and services within the Kubernetes ecosystem.
Here’s a high-level process on how to use them:
Step 1: Defining your CRD
First, you need to define your custom resource. This definition will tell Kubernetes what your resource is and how it should handle it. Below is a very simple example of a CRD:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: foos.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              deploymentName:
                type: string
              replicas:
                type: integer
  scope: Namespaced
  names:
    plural: foos
    singular: foo
    kind: Foo
    shortNames:
    - f
In this example, a new Foo kind is being created. Any instances of this kind can now be created, read, updated, and deleted in Kubernetes like any other resource.
Step 2: Create an instance of your custom resource
With the CRD in place, you can now create an instance of your custom resource:
apiVersion: "samplecontroller.k8s.io/v1"
kind: Foo
metadata:
name: example-foo
spec:
deploymentName: example-foo
replicas: 3
This creates a Foo resource with the name example-foo.
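Once the CRD is registered, the new kind behaves like any built-in resource in kubectl (the short name f comes from the CRD above):
kubectl get foos                   # or: kubectl get f
kubectl describe foo example-foo
kubectl delete foo example-foo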
Here’s another quick example of what a CustomResourceDefinition YAML file might look like:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              image:
                type: string
              replicas:
                type: integer
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
    - ct
The actual custom resources would then look something like this:
apiVersion: "example.com/v1"
kind: CronTab
metadata:
name: my-new-cron-object
spec:
cronSpec: "* * * * */5"
image: my-cron-image
replicas: 1
Remember, these custom resources and their controllers run in user space, not in the Kubernetes API server itself. It’s up to the custom resource’s controller to validate and act on these resources. For example, you could have a controller that creates a pod for each CronTab resource, and it’s up to this controller to create the pod, not the Kubernetes API server.
How to update CRD
You can update the CRD just like you would any other Kubernetes resource, using kubectl apply. Please keep in mind that not all fields are mutable for a live object, so sometimes a full delete and re-create might be necessary.
Do we need to remove existing ones?
You don’t necessarily need to remove existing ones. Whether you do or not depends on your specific use case. If you’re just updating the structure of your custom resource, you can apply the changes with kubectl apply. However, be aware that if you delete a CRD, all instances of your custom resource in your cluster will also be deleted.
In Kubernetes 1.16 and onwards, structural schemas are required for CRDs. A structural schema is a CRD specification that contains a certain set of fields and does not include other unsupported types. This is intended to make CRDs more reliable and secure. Any non-structural CRDs will need to be updated to become structural, which might require deleting and recreating them. Be sure to back up any important data before doing so.
Here’s an example of how a CustomResourceDefinition (CRD) can be used in Kubernetes:
Let’s say you want to define a CRD for managing custom “Book” resources in your Kubernetes cluster.
- Define the CRD:
Create a YAML file named book-crd.yaml with the following content:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: books.sample.com
spec:
  group: sample.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:              # apiextensions.k8s.io/v1 requires a structural schema
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              title:
                type: string
              author:
                type: string
  scope: Namespaced
  names:
    plural: books
    singular: book
    kind: Book
    shortNames:
    - bk
- Create the CRD:
Apply the CRD definition to your Kubernetes cluster using the following command:
kubectl apply -f book-crd.yaml
- Create a custom Book resource:
Now that the CRD is defined, you can create instances of the custom Book resource. Create a YAML file named book-example.yaml with the following content:
apiVersion: sample.com/v1
kind: Book
metadata:
  name: my-book
spec:
  title: Kubernetes in Action
  author: John Doe
Apply the custom Book resource to your cluster:
kubectl apply -f book-example.yaml
- Verify the custom resource:
You can verify that the custom Book resource is created successfully:
kubectl get books
This will display the details of the custom Book resource, including its name, version, and other properties.
By using CRDs, you have defined a new resource type, “Book,” in Kubernetes. This allows you to create, update, and manage Book resources using the Kubernetes API, just like any other native Kubernetes resource. The CRD provides a structured way to define the properties, validation rules, and behavior of the Book resource, making it easier to work with custom resources within your Kubernetes cluster.
Reference: https://www.cncf.io/blog/2022/06/15/kubernetes-operators-what-are-they-some-examples/
Sidecar Pattern as a Proxy service in K8s
In Kubernetes, the sidecar pattern involves deploying an additional container alongside your main application container within the same Pod. This sidecar container can augment or enhance the main container by providing extra functionality that the main container might not inherently possess.
A common use-case for a sidecar proxy in Kubernetes is for service-mesh architectures like Istio, Linkerd, etc., where the sidecar proxies handle traffic management, telemetry, monitoring, and other cross-cutting concerns.
Here’s a step-by-step explanation of how a sidecar proxy works in K8s to process an HTTP request:
- Pod Deployment: First, you deploy your application in a Kubernetes Pod. A Pod is the smallest deployable unit in Kubernetes and can contain one or more containers. In this case, you’ll have at least two containers in the Pod: your main application container and the sidecar proxy container.
- Proxy Configuration: You configure the sidecar proxy with the necessary settings. This configuration can be done using environment variables, command-line arguments, or configuration files. Some common configuration options include specifying which services the proxy should route traffic to, defining routing rules, and enabling encryption or authentication.
- HTTP Request Initiation: An external client (e.g., a user’s web browser) or another service sends an HTTP request to your application. The request is typically sent to a Service in Kubernetes, which acts as a load balancer and forwards the request to one of the Pods running your application.
- Service Discovery: When the request arrives at the Kubernetes Service, it needs to be routed to one of the Pods running your application. This is where the sidecar proxy comes into play. The proxy uses service discovery mechanisms (e.g., DNS resolution or K8s service endpoints) to determine the target Pod’s IP address and port.
- Traffic Routing: The sidecar proxy intercepts the incoming HTTP request. It uses its configuration to decide how to route the request. This may involve applying routing rules, load balancing policies, or other advanced traffic management features. For example, you might configure the proxy to route requests based on HTTP headers or perform canary deployments.
- Proxying the Request: The sidecar proxy forwards the HTTP request to the appropriate instance of your application container. This forwarding can be done within the same Pod using localhost communication, making it efficient and fast.
- Application Processing: Your application container receives the HTTP request and processes it. It generates an HTTP response, which is then sent back to the sidecar proxy.
- Response Handling: The sidecar proxy intercepts the HTTP response from your application container. It may perform additional tasks such as response modification, response caching, or response compression based on its configuration.
- Returning the Response: The sidecar proxy sends the HTTP response back to the Kubernetes Service, which in turn forwards it to the client or the next service in the chain.
- Observability and Metrics: Throughout this process, the sidecar proxy can collect telemetry data and metrics about the traffic it’s handling. This information is valuable for monitoring and debugging purposes and can be sent to observability tools or dashboards.
- Security: The sidecar proxy can also enforce security policies, such as mutual TLS authentication between services, to ensure secure communication.
In summary, a sidecar proxy in Kubernetes serves as an intermediary between your application containers and the external world, providing advanced networking and communication features while allowing you to configure and manage traffic routing, security, and observability. It plays a crucial role in building resilient and scalable microservices architectures within Kubernetes clusters.
Let’s go step by step on how this works, using Istio’s Envoy proxy as an example.
1. Setup & Installation
To use a sidecar proxy in Kubernetes, you would first need to install and set up the service mesh control plane. For Istio, it involves deploying the Istio control plane to your cluster.
# istio-setup.yaml (This is a simplified example; the real setup might be more extensive)
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
# ... other Istio control plane components (e.g., istiod) would be installed here
After setting up Istio, you’d enable automatic sidecar injection for your namespace or add sidecars manually to each pod spec.
2. Enabling automatic sidecar injection for a namespace
# namespace-setup.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    istio-injection: enabled
3. Deploy an application
With the namespace configured for automatic sidecar injection, any new pod started in that namespace will have the sidecar injected.
# app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app-image:latest
When you deploy this application to my-namespace, a sidecar container (the Envoy proxy) will be added to each pod alongside the my-app container.
How a HTTP request is processed with the sidecar:
- Ingress: A client sends an HTTP request to your application.
- Sidecar Proxy: Before reaching your application, the request is intercepted by the sidecar proxy (Envoy) in the same pod.
- Routing: The proxy looks at the request and uses Istio’s routing rules to decide how to forward it. These rules can enforce policies, gather telemetry data, etc.
- To the Application: The request is then forwarded to your application container in the same pod.
- Response: After processing the request, your application sends a response back. This response is again intercepted by the sidecar.
- Post-Processing: The sidecar might modify the response or gather more telemetry data.
- To the Client: Finally, the response is sent back to the client.
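The routing rules consulted in step 3 are themselves Kubernetes resources. A hedged sketch of an Istio VirtualService that routes /api requests for my-app through the sidecars to the my-app Service; it assumes Istio is installed and a Service named my-app listens on port 80:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: my-namespace
spec:
  hosts:
  - my-app                    # the Kubernetes Service name
  http:
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: my-app
        port:
          number: 80
  - route:                    # default route for all other paths
    - destination:
        host: my-app
        port:
          number: 80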
Comments:
- The application code doesn’t need to be aware of the sidecar. It assumes it’s communicating directly with external clients.
- Sidecars bring a unified way to handle cross-cutting concerns like retries, timeouts, telemetry, and security without making changes to application code.
- While the example used Istio and Envoy, the basic principle applies to other service meshes and sidecars.
This is a high-level overview, and there’s a lot more that goes into a service mesh and sidecar proxy, like setting up routing rules, fault injection, securing communication, etc. But this should give you a foundational understanding of the flow and functionality.
Kubernetes Operators
A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application. An Operator builds upon the basic Kubernetes resource and controller concepts but includes domain or application-specific knowledge to automate common tasks. It’s designed to handle more complex, stateful applications which require additional operational knowledge.
How Kubernetes Operators Work
- Custom Resource Definitions (CRDs): Operators extend Kubernetes’ functionality using custom resources. A Custom Resource Definition (CRD) allows you to define a custom resource, which is essentially a new kind of Kubernetes object specific to your application.
- Controller: Each Operator has a specific controller that watches for changes to its custom resources and adjusts the state of the application accordingly. The controller is where the operational logic of the Operator resides.
- Operational Knowledge: Operators encode operational knowledge – how an application is deployed, upgraded, and managed in various scenarios – into software. They aim to automate complex tasks that typically require human intervention.
How to Use a Kubernetes Operator
- Install an Operator: You can find and install existing Operators from places like OperatorHub.io. Installation typically involves applying the Operator’s CRDs and then deploying the Operator itself.
- Create a Custom Resource (CR): After installing the Operator, you define your application instances by creating custom resources. This is often done through a YAML file that specifies the desired state of your application.
- Observe the Operator in Action: Once the CR is created, the Operator’s controller will detect it and take action to bring the application to the desired state. The Operator continues to monitor the application and reacts to any changes, either in the CR or in the state of the application itself.
- Update and Manage: To update or manage the application, you make changes to the CR. The Operator will respond to these changes by applying them to the application.
- Custom Operator Development: If no existing Operator meets your needs, you can develop your own using frameworks like Operator SDK or KubeBuilder. This involves writing custom controllers and defining CRDs that encapsulate operational knowledge of your application.
Let’s consider a basic example of a Kubernetes Operator for managing a simple database application like PostgreSQL.
Step 1: Define a Custom Resource Definition (CRD)
First, you define a CRD to create a new resource type in Kubernetes. This new resource type could be named PostgreSQL.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresqls.database.example.com
spec:
  group: database.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:              # apiextensions.k8s.io/v1 requires a structural schema
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              version:
                type: string
              storageSize:
                type: string
  scope: Namespaced
  names:
    plural: postgresqls
    singular: postgresql
    kind: PostgreSQL
    shortNames:
    - pg
Step 2: Develop an Operator’s Controller Logic
You then create the Operator’s controller logic. This controller will watch for events related to PostgreSQL custom resources and manage the lifecycle of a PostgreSQL database accordingly.
This involves tasks like:
- Creating a StatefulSet to run PostgreSQL instances.
- Setting up storage with Persistent Volumes.
- Managing configurations and secrets.
- Performing backups and restorations.
- Handling version upgrades.
This logic is typically written in Go and makes use of client libraries to interact with Kubernetes API.
Step 3: Deploy the Operator
Once you have the Operator’s code, you deploy it to your Kubernetes cluster. The Operator’s deployment manifest might look like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgresql-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: postgresql-operator
  template:
    metadata:
      labels:
        name: postgresql-operator
    spec:
      containers:
      - name: operator
        image: your-registry/postgresql-operator:latest
Step 4: Create a Custom Resource (CR)
After deploying the Operator, you create an instance of PostgreSQL by applying a CR.
apiVersion: database.example.com/v1
kind: PostgreSQL
metadata:
  name: my-postgresql
spec:
  version: "12.3"
  storageSize: "10Gi"
Step 5: Operator in Action
Upon creating the PostgreSQL CR, the Operator’s controller detects it and starts to create the necessary Kubernetes resources (like a StatefulSet, PVCs, and Services) to deploy a PostgreSQL instance according to the specifications in the CR.
The Operator will also monitor this PostgreSQL instance. If it detects a failure, it might automatically attempt to recover the database. If you update the CR (say, to upgrade the PostgreSQL version), the Operator will roll out the changes accordingly.
In this basic example, the PostgreSQL Operator automates the deployment and management of PostgreSQL instances in a Kubernetes environment. It illustrates the core concepts of how Kubernetes Operators work, including CRDs, custom resources, and the controller’s logic for managing the application’s lifecycle.