Configure a K8s cluster to meet Plexus requirements

This document describes the requirements that must be met to correctly configure a Kubernetes cluster for a Plexus instance.

Note: In this article we refer to a user called plexus operating in a K8s namespace also called plexus. When applying these instructions, replace both the user and namespace names with the ones used in your cluster.

1. Provide the kubeconfig file for the user
A kubeconfig is a configuration file that contains the user certificate and key, both of which are issued by the cluster's Certificate Authority.
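As a sketch, a minimal kubeconfig for this setup could look like the following; the server address, cluster name, and context name are placeholders, not values prescribed by Plexus:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: my-cluster                      # placeholder cluster name
  cluster:
    server: https://203.0.113.10:6443   # placeholder API server address
    certificate-authority-data: <base64-encoded CA certificate>
users:
- name: plexus
  user:
    client-certificate-data: <base64-encoded user certificate>
    client-key-data: <base64-encoded user key>
contexts:
- name: plexus@my-cluster
  context:
    cluster: my-cluster
    namespace: plexus
    user: plexus
current-context: plexus@my-cluster
```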

1.1 S3 datasets support

In order to support mounting S3 datasets on the cluster, the Datashim plugin must be installed:

kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml

If the cluster runs a Kubernetes version lower than 1.18, install this pinned manifest instead:

kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/ac4a34b48048a60ed143fbcca1c88f87aab08172/release-tools/manifests/dlf.yaml
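After applying either manifest, it is worth checking that the Datashim components came up before moving on. Datashim installs into its own namespace (named dlf by default), and the CRD name below is inferred from the com.ie.ibm.hpsys API group used in the role later in this document; adjust both if your installation differs:

```
# Check that the Datashim pods are Running
kubectl get pods -n dlf

# Confirm that the Dataset CRD is registered
kubectl get crd datasets.com.ie.ibm.hpsys
```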

2. Configure the user with the right role
In order to allow Plexus to deploy on K8s, we need to apply the right role to our user in each namespace. The following listing shows the required role:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: plexus
  name: plexus-manager
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["deployments", "replicasets", "pods", "pods/log", "pods/exec", "pods/portforward", "events", "deployments/status", "networkpolicies", "statefulsets"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services", "services/status"]
  verbs: ["*"]
- apiGroups: ["batch", "extensions"]
  resources: ["jobs", "jobs/status"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["resourcequotas", "limitranges"]
  verbs: ["list"]
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["*"]
- apiGroups: ["com.ie.ibm.hpsys"]
  resources: ["datasets"]
  verbs: ["*"]

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: plexus-manager-binding
  namespace: plexus
subjects:
- kind: User
  name: plexus
  apiGroup: "rbac.authorization.k8s.io"
roleRef:
  kind: Role
  name: plexus-manager
  apiGroup: "rbac.authorization.k8s.io"
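Assuming the two listings above are saved in a file such as plexus-role.yaml (the filename is just an example), they can be applied and then spot-checked with kubectl auth can-i:

```
kubectl apply -f plexus-role.yaml

# Verify that the plexus user can manage pods in its namespace
kubectl auth can-i create pods --namespace plexus --as plexus
kubectl auth can-i get pods/log --namespace plexus --as plexus
```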


3. Configure the user with the right ClusterRole
In order to allow Plexus to read the required node and namespace resources, we need to bind the right cluster role to our user. The following listing shows the required cluster role. (Reminder: this is a definition for a user called plexus, but you can change the user name to match the one in your cluster.)
Note: The listing below shows the minimum required configuration – you can also add more properties to the cluster role, like security policies and others.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: plexus-manager
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: [""]
  resources: ["nodes", "namespaces"]
  verbs: ["get", "list", "watch"]

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: plexus-manager-binding
subjects:
- kind: User
  name: plexus
  apiGroup: "rbac.authorization.k8s.io"
roleRef:
  kind: ClusterRole
  name: plexus-manager
  apiGroup: "rbac.authorization.k8s.io"
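As with the namespaced role, the cluster-wide permissions can be verified after applying the manifests (saved here in a hypothetical plexus-clusterrole.yaml):

```
kubectl apply -f plexus-clusterrole.yaml

# Verify the cluster-scoped read permissions
kubectl auth can-i list nodes --as plexus
kubectl auth can-i watch namespaces --as plexus
```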


4. Add Plexus labels to worker nodes

Plexus workloads will be scheduled on either GPU or CPU worker nodes, depending on whether they require a GPU.

  • GPU worker nodes must be labeled with node-role.kubernetes.io/plexus-worker-type=plexus-gpu-worker
  • CPU worker nodes must be labeled with node-role.kubernetes.io/plexus-worker-type=plexus-cpu-worker
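These labels can be applied with kubectl label; gpu-node-1 and cpu-node-1 below are placeholder node names:

```
kubectl label node gpu-node-1 node-role.kubernetes.io/plexus-worker-type=plexus-gpu-worker
kubectl label node cpu-node-1 node-role.kubernetes.io/plexus-worker-type=plexus-cpu-worker

# List nodes with their Plexus worker type
kubectl get nodes -L node-role.kubernetes.io/plexus-worker-type
```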

5. Add Plexus topology labels to nodes

To provide pod topology affinity, every node in the cluster must be labeled with topology.kubernetes.io/zone=X, where X is the node zone identifier.

kubectl label nodes nodeName topology.kubernetes.io/zone=X

6. Provide a Storage Class
In order to provide data persistence and to share data across the pods involved in a workload execution, we need a storage class with support for ReadWriteMany.

The Kubernetes ReadWriteMany access mode allows a volume to be mounted read-write by many nodes.

Plexus creates one persistent volume claim per user and mounts it into every pod of that user.
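What counts as a suitable class depends on your storage backend; NFS-based provisioners are a common choice because they support ReadWriteMany. As an illustration only (the class name, claim name, and provisioner string below are examples, and the provisioner value is backend-specific, not something defined by Plexus), a class and a matching claim could look like:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: plexus-rwx             # example class name
provisioner: example.com/nfs   # replace with your backend's provisioner
reclaimPolicy: Delete
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: plexus-user-data       # example claim name
  namespace: plexus
spec:
  accessModes:
  - ReadWriteMany              # required so several pods can share the volume
  storageClassName: plexus-rwx
  resources:
    requests:
      storage: 10Gi
```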

7. The K8s cluster must have internet access.

8. Allow pods to copy files
Plexus copies internal files by using kubectl cp.
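This is the standard kubectl cp mechanism, so the user's role must permit pods/exec (as in the role listing above). An illustrative copy, with my-pod and the paths as placeholders, looks like:

```
# Copy a local file into a pod in the plexus namespace
kubectl cp ./model.dat plexus/my-pod:/data/model.dat

# Copy a file out of the pod
kubectl cp plexus/my-pod:/data/results.csv ./results.csv
```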

9. Provide support for Deployments with api version apps/v1
Plexus only runs Deployments that are specified using the api version apps/v1.
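A minimal Deployment using that API version looks like this; the names and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  namespace: plexus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: nginx:1.25   # placeholder image
```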

10. Provide support for Jobs with api version batch/v1
Plexus only runs Jobs that are specified using the api version batch/v1.
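Likewise, a minimal Job using batch/v1; the name, image, and command are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
  namespace: plexus
spec:
  template:
    spec:
      containers:
      - name: example
        image: busybox:1.36         # placeholder image
        command: ["echo", "done"]   # placeholder workload
      restartPolicy: Never
  backoffLimit: 2
```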

11. Support public exposure of pod ports
By default, Plexus uses NodePort or LoadBalancer Kubernetes services to expose pod ports for public access.

Other ways of exposing ports can be implemented with support from the Plexus team:

  • Plexus needs a way to be informed when ports are exposed through a different method, for example via annotations on the node or namespace.
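For reference, a NodePort service exposing port 8888 of pods labeled app: example could look like this; all names and port numbers are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service
  namespace: plexus
spec:
  type: NodePort
  selector:
    app: example
  ports:
  - port: 8888        # service port inside the cluster
    targetPort: 8888  # container port on the pod
    nodePort: 30088   # externally reachable port on every node (30000-32767)
```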