This document describes the requirements that must be met to correctly configure a Kubernetes cluster on a Plexus instance.
Note: In this article we refer to a user called plexus operating in a K8s namespace that is also called plexus. When applying these instructions, change both the user name and the namespace to the ones you are using in your cluster.
1. Provide the kubeconfig file for the user
The kubeconfig file contains the user certificate and key, both of which are obtained from the cluster's Certificate Authority.
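A minimal kubeconfig has roughly the following shape; the cluster and context names below are placeholders, and the actual endpoint and certificate data are issued for your cluster:

```yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <base64-encoded CA certificate>
    server: https://<cluster-api-endpoint>:6443
  name: my-cluster
contexts:
- context:
    cluster: my-cluster
    namespace: plexus
    user: plexus
  name: plexus@my-cluster
current-context: plexus@my-cluster
users:
- name: plexus
  user:
    client-certificate-data: <base64-encoded user certificate>
    client-key-data: <base64-encoded user key>
```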
1.1 S3 datasets support
To support mounting S3 datasets on the cluster, the Datashim plugin must be installed:
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml
If the cluster version is lower than 1.18, install this pinned release instead:
kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/ac4a34b48048a60ed143fbcca1c88f87aab08172/release-tools/manifests/dlf.yaml
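After applying the manifest, you can verify the installation; the dlf namespace is an assumption based on the default Datashim manifest, and the CRD group matches the com.ie.ibm.hpsys group used in the role below:

```shell
# Check that the Datashim components are running (the default manifest installs into the dlf namespace)
kubectl get pods -n dlf
# Verify that the Dataset custom resource definition is registered
kubectl get crd datasets.com.ie.ibm.hpsys
```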
2. Configure the user with the right role
To allow Plexus to deploy on K8s, we need to apply the right role to our user in each namespace. The following listing shows the required role:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: plexus
  name: plexus-manager
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["deployments", "replicasets", "pods", "pods/log", "pods/exec",
              "pods/portforward", "events", "deployments/status", "networkpolicies", "statefulsets"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services", "services/status"]
  verbs: ["*"]
- apiGroups: ["batch", "extensions"]
  resources: ["jobs", "jobs/status"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["resourcequotas", "limitranges"]
  verbs: ["list"]
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["*"]
- apiGroups: ["com.ie.ibm.hpsys"]
  resources: ["datasets"]
  verbs: ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: plexus-manager-binding
  namespace: plexus
subjects:
- kind: User
  name: plexus
  apiGroup: "rbac.authorization.k8s.io"
roleRef:
  kind: Role
  name: plexus-manager
  apiGroup: "rbac.authorization.k8s.io"
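One way to apply and sanity-check the Role and RoleBinding; the file name is illustrative, and kubectl's impersonation flag lets you confirm a sample permission as the plexus user:

```shell
# Apply the Role and RoleBinding saved as plexus-role.yaml (file name is illustrative)
kubectl apply -f plexus-role.yaml
# Impersonate the plexus user and check a permission granted by the role
kubectl auth can-i create deployments --as plexus --namespace plexus
```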
3. Configure the user with the right ClusterRole
To allow Plexus to obtain the required node and namespace resources, we need to bind the right cluster role to our user. Cluster roles are cluster-scoped, so this binding is made once rather than per namespace. In the following listing you can see the required cluster role. (Reminder: this is a definition for a user called plexus, but you can change the user name to match the one you have in your cluster.)
Note: The listing below shows the minimum required configuration – you
can also add more properties to the cluster role, like security policies and
others.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: plexus-manager
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: [""]
  resources: ["nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: plexus-manager-binding
subjects:
- kind: User
  name: plexus
  apiGroup: "rbac.authorization.k8s.io"
roleRef:
  kind: ClusterRole
  name: plexus-manager
  apiGroup: "rbac.authorization.k8s.io"
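After applying the ClusterRole and ClusterRoleBinding, impersonation can confirm the cluster-scoped read permissions:

```shell
# Confirm the plexus user can read cluster-scoped resources
kubectl auth can-i list nodes --as plexus
kubectl auth can-i list namespaces --as plexus
```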
4. Add Plexus labels to worker nodes
Plexus workloads are scheduled onto either GPU or CPU worker nodes, depending on whether or not they require GPU hardware.
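The exact label keys and values are defined by your Plexus installation; purely as an illustration of the labeling command, with a hypothetical key and value:

```shell
# Hypothetical label key/value -- substitute the ones your Plexus installation expects
kubectl label nodes <gpu-node-name> accelerator=gpu
```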
5. Add Plexus topology labels to nodes
To provide pod topology affinity, every node in the cluster must be labeled with topology.kubernetes.io/zone=<zone-id>, where <zone-id> is the node's zone identifier:
kubectl label nodes <node-name> topology.kubernetes.io/zone=<zone-id>
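You can then confirm the zone labels across the cluster with kubectl's label-column flag:

```shell
# List all nodes together with their zone label
kubectl get nodes -L topology.kubernetes.io/zone
```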
6. Provide a Storage Class
To provide data persistence and to share data across the pods involved in a workload execution, we need a storage class that supports ReadWriteMany.
The Kubernetes ReadWriteMany access mode allows a volume to be mounted as read-write by many nodes.
Plexus creates one persistent volume claim per user and mounts it into every pod of that user.
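A per-user claim against that storage class will look roughly like the following; the names and size are illustrative, and the storage class you provide must be able to satisfy the ReadWriteMany access mode:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: plexus-user-pvc        # illustrative name
  namespace: plexus
spec:
  accessModes:
  - ReadWriteMany              # the storage class must support this mode
  storageClassName: <your-rwx-storage-class>
  resources:
    requests:
      storage: 10Gi            # illustrative size
```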
7. The K8s cluster must have internet access.
8. Allow pods to copy files
Plexus copies internal files by using kubectl cp.
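For reference, kubectl cp copies files between the local filesystem and a running container; it relies on the pods/exec permission granted in the role above. The file and pod names here are illustrative:

```shell
# Copy a local file into a pod in the plexus namespace (names are illustrative)
kubectl cp ./config.json plexus/my-pod:/tmp/config.json
```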
9. Provide support for Deployments with api version apps/v1
Plexus only runs deployments that are specified using the api version:
apps/v1.
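A minimal deployment skeleton using that api version; the name, labels, and image are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example            # illustrative name
  namespace: plexus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: nginx:stable  # illustrative image
```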
10. Provide support for Jobs with api version batch/v1
Plexus only runs jobs that are specified using the api version:
batch/v1.
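A minimal job skeleton using that api version; the name, image, and command are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job          # illustrative name
  namespace: plexus
spec:
  template:
    spec:
      containers:
      - name: example
        image: busybox:stable   # illustrative image
        command: ["echo", "hello"]
      restartPolicy: Never
  backoffLimit: 2
```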
11. Support public exposure of pod ports
By default, Plexus uses NodePort or LoadBalancer K8s services to expose pod ports for public access.
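A NodePort service exposing an illustrative pod port looks like this; the names, selector, and port numbers are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service    # illustrative name
  namespace: plexus
spec:
  type: NodePort
  selector:
    app: example           # must match the target pod's labels
  ports:
  - port: 8080             # service port
    targetPort: 8080       # container port
    nodePort: 30080        # must fall in the cluster's NodePort range (default 30000-32767)
```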
Other approaches to port exposure can be implemented with support from the Plexus team.