VMware VM PowerOff Experiment for LitmusChaos

ADARSH KUMAR
Published in Geek Culture
7 min read, Sep 8, 2021

In this blog, we will talk about the VMware VM Power Off chaos experiment for LitmusChaos.

LitmusChaos is an open-source Chaos Engineering platform that enables teams to identify weaknesses and potential outages in their infrastructure by inducing chaos tests in a controlled way. If you are new to LitmusChaos or Chaos Engineering, I recommend going through this blog first.

The VMware VM Power-Off experiment is one of the non-Kubernetes experiments in LitmusChaos. It powers off a VM running in vSphere for a specified chaos duration and then powers it back on. The experiment uses VMware APIs to stop and start the target VM. It helps you check the performance of the application or process running on the VMware server under this failure. Before jumping into the experiment, let’s first discuss the architecture of vSphere.

VMware vSphere is the name of VMware’s server virtualization product. It was formerly known as VMware Infrastructure, and it consists of ESXi, vCenter Server, and a few other important components such as the vSphere Client.

ESXi

ESXi is a Type 1 hypervisor and the core of the vSphere product suite. It provides a virtualization layer that abstracts the CPU, storage, memory, and networking resources of the physical host into multiple virtual machines.

vCenter Server

vCenter Server is an application that lets us manage our vSphere infrastructure from a centralized location. It acts as a central administration point for ESXi hosts and their respective virtual machines.

vSphere Client

The vSphere Client is an HTML5-based interface that lets users connect to vCenter remotely.

Pre-Requisites

Before getting into the experiment, make sure you meet the pre-requisites:

  1. vSphere version 6.5 or later.
  2. A Kubernetes cluster with Litmus 2.0 installed.

Let’s Begin!

Step 1: Creating the secret

First, we have to create a Kubernetes secret that holds the IP address, user name, and password of the vCenter, and then apply it:
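As a rough sketch, the secret.yaml for this step could be generated as shown below. The secret name vcenter-secret and the keys VCENTERSERVER, VCENTERUSER, and VCENTERPASS are assumptions based on the names the experiment appears to read, so check them against the experiment manifest before relying on them. The values are placeholders.

```shell
# Sketch: write a secret.yaml holding the vCenter connection details.
# The secret name and the key names are assumptions; the default values
# below are placeholders and must be replaced with real ones.
VCENTER_SERVER="${VCENTER_SERVER:-CHANGE_ME_VCENTER_IP}"
VCENTER_USER="${VCENTER_USER:-CHANGE_ME_USER}"
VCENTER_PASS="${VCENTER_PASS:-CHANGE_ME_PASSWORD}"

cat > secret.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: vcenter-secret
  namespace: litmus
type: Opaque
stringData:
  VCENTERSERVER: "${VCENTER_SERVER}"
  VCENTERUSER: "${VCENTER_USER}"
  VCENTERPASS: "${VCENTER_PASS}"
EOF
```

Using stringData keeps the file readable, since Kubernetes encodes the values itself when the secret is created.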

kubectl apply -f secret.yaml -n litmus

Step 2: Creating Workflow

After creating the secret, we will create a workflow from the portal.

Click the Schedule a Workflow button on the Workflows page, choose the self-agent for now, and click Next.

After selecting the agent, we have to choose a workflow. Here we will choose the default hub, i.e., Chaos Hub, and click Next.

Now we are on the workflow settings page. Here we have to provide the workflow name and description, and then click Next.

This is the important part: here we tune our workflow. First, click Add a new experiment.

Then search for vmware/vm-poweroff, select the vmware/vm-poweroff experiment, and click Done.

The experiment has now been added to the workflow, as you can see in the experiment graph.

Now we have to make some changes to the experiment manifest to specify the required resource details. Click Edit YAML.

After clicking Edit YAML, scroll to the experiment part. There you can see that the vCenter credentials we provided in the secret are passed to the experiment as environment variables.

Scroll further to the engine part. Here we have to provide two things: TOTAL_CHAOS_DURATION and APP_VM_MOID.

TOTAL_CHAOS_DURATION is the duration of the chaos experiment, i.e., the number of seconds for which the VM stays powered off. For now, we will power off the VM for 30 seconds.

APP_VM_MOID is the Managed Object ID (MOID) that vCenter assigns to the VM to uniquely identify it. You can find the MOID in the URL itself: click on the VM and look at the vCenter URL, where the MOID appears in the form vm-x. For now, we are targeting the LitmusTest VM, whose MOID is vm-17.
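If you prefer not to read the MOID by eye, a small helper can pull it out of a copied URL. This is only a sketch: it assumes the URL embeds the VM as VirtualMachine:vm-<number>, and the sample URL and host name are made up for illustration.

```shell
# Sketch: extract the MOID (vm-<number>) from a vSphere Client URL.
# Assumes the URL contains "VirtualMachine:vm-<number>"; the sample URL
# below is hypothetical.
extract_moid() {
  printf '%s\n' "$1" | sed -n 's/.*VirtualMachine:\(vm-[0-9][0-9]*\).*/\1/p'
}

url='https://vcenter.example.com/ui/app/vm;nav=v/urn:vmomi:VirtualMachine:vm-17:0000/summary'
extract_moid "$url"   # prints vm-17
```

If the URL does not match the assumed shape, the function prints nothing, which is a safe signal to look the MOID up manually.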

Now we need to provide these details in the engine part of the workflow.
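As a sketch of what the edited engine section ends up containing, the helper below prints the two env entries for a given duration and MOID. It assumes these entries sit under the experiment's spec.components.env list in the ChaosEngine; the helper name render_engine_env is hypothetical.

```shell
# Sketch: print the env entries to place in the engine part of the workflow.
# Assumes they belong under the experiment's spec.components.env list.
render_engine_env() {
  duration="$1"   # chaos duration in seconds
  moid="$2"       # VM MOID, for example vm-17
  cat <<EOF
env:
  - name: TOTAL_CHAOS_DURATION
    value: "${duration}"
  - name: APP_VM_MOID
    value: "${moid}"
EOF
}

render_engine_env 30 vm-17
```

Both values are quoted because env values in Kubernetes manifests must be strings, even when they look like numbers.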

After configuring the workflow, click Save Changes and then Next.

On the Reliability Score page, we will for now give the experiment the full weightage, i.e., 10, and click Next.

On the Schedule page, we want to run the workflow right away, so select Schedule now and click Next.

Now we are on the final page of workflow creation, where you can verify everything you have configured.

After verifying all the details, click Finish. You can see that we have successfully created the workflow.

On the Workflows page, we can now see that our newly created workflow is in a running state.

Step 3: Observing the Chaos

Click on the workflow name. Here we can observe the steps of the workflow graphically, from installing the chaos experiment and injecting chaos to the chaos revert.

As the workflow graph shows, chaos injection has started. We can verify the state of the VM from vCenter itself.

Here we can see that the LitmusTest VM was powered off and, after 30 seconds, powered on again.
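The same power-off, wait, power-on cycle can be reproduced by hand against the vCenter API, which is useful for checking credentials and the MOID outside of Litmus. The sketch below assumes the vSphere Automation REST endpoints (/rest/com/vmware/cis/session and /rest/vcenter/vm/<moid>/power) and that VCENTERSERVER, VCENTERUSER, and VCENTERPASS are set in the environment. It is not taken from the experiment's source, so verify the paths for your vSphere version.

```shell
# Sketch of a manual power cycle through the vCenter REST API.
# Endpoint paths are assumptions; function names are hypothetical.
# curl -k skips TLS verification and is meant for lab setups only.

power_url() {  # power_url <server> <moid> [stop|start]
  printf 'https://%s/rest/vcenter/vm/%s/power%s\n' "$1" "$2" "${3:+/$3}"
}

vm_power_cycle() {  # vm_power_cycle <moid> <seconds>
  # Log in and read the session token from a reply like {"value":"<token>"}.
  token=$(curl -sk -X POST -u "${VCENTERUSER}:${VCENTERPASS}" \
    "https://${VCENTERSERVER}/rest/com/vmware/cis/session" |
    sed -n 's/.*"value" *: *"\([^"]*\)".*/\1/p')
  auth="vmware-api-session-id: ${token}"

  curl -sk -X POST -H "$auth" "$(power_url "$VCENTERSERVER" "$1" stop)"   # power off
  sleep "$2"                                                              # chaos duration
  curl -sk -X POST -H "$auth" "$(power_url "$VCENTERSERVER" "$1" start)"  # power on
  curl -sk -H "$auth" "$(power_url "$VCENTERSERVER" "$1")"                # current power state
}

# Example (not run here): vm_power_cycle vm-17 30
```

Keeping the URL construction in its own function makes it easy to check the target before any request is sent.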

After some time, we can see that all the steps of the workflow have executed and the workflow has completed successfully.

Now we can also check the logs and the Chaos Result. In the Chaos Result, we can find the verdict and the ProbeSuccessPercentage of the experiment.
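The same fields can also be read from the command line. The sketch below assumes the result is stored as a ChaosResult resource named <engine-name>-<experiment-name> in the litmus namespace, with the verdict under status.experimentStatus. The engine name my-engine is hypothetical, so substitute the one from your workflow.

```shell
# Sketch: read the verdict and probe success percentage with kubectl.
# The naming scheme and field paths are assumptions to verify on your cluster.
chaos_result_name() {  # chaos_result_name <engine> <experiment>
  printf '%s-%s\n' "$1" "$2"
}

show_chaos_result() {  # show_chaos_result <engine> <experiment> [namespace]
  kubectl get chaosresult "$(chaos_result_name "$1" "$2")" -n "${3:-litmus}" \
    -o jsonpath='{.status.experimentStatus.verdict}{" "}{.status.experimentStatus.probeSuccessPercentage}{"\n"}'
}

# Example (needs cluster access): show_chaos_result my-engine vm-poweroff
```

If the name does not match, `kubectl get chaosresult -n litmus` lists the results that actually exist.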

Step 4: Analysing the Chaos

Now we can check the analytics of the workflow for better understanding.

Here you can see all the details of the workflow: the workflow name, the workflow ID, the namespace and agent it ran on, the total number of runs, and so on.

You can also see the Resilience score, the Passed vs Failed percentage, and the statistics of the experiment.

In our case, we ran only one experiment and it was successful, so its Resilience score is 100% and the Passed vs Failed percentage is also 100%.
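One way to see why a single passing experiment gives 100% is to treat the Resilience score as a weighted average of each experiment's probe success percentage, using the weightage chosen earlier. The sketch below assumes that formula rather than quoting the Litmus implementation, and it uses integer arithmetic, so results are rounded down.

```shell
# Sketch: weighted-average resilience score (assumed formula).
# Each argument is <weight>:<probe success percentage>.
resilience_score() {
  total=0
  weights=0
  for pair in "$@"; do
    w=${pair%%:*}   # weight given to the experiment (1 to 10)
    p=${pair##*:}   # probe success percentage (0 to 100)
    total=$((total + w * p))
    weights=$((weights + w))
  done
  [ "$weights" -gt 0 ] || { echo "no experiments given" >&2; return 1; }
  echo $((total / weights))
}

resilience_score 10:100   # our single experiment: prints 100
```

With a second, failing experiment of weight 5, the call `resilience_score 10:100 5:0` would print 66 under the same assumption.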

Conclusion

So, in this blog, we saw how to perform the VMware VM PowerOff chaos experiment using LitmusChaos 2.0.

LitmusChaos 2.0 makes chaos engineering more efficient for both individuals and teams and specifically enables scalability. It takes a cloud-native approach to create, manage and monitor chaos. The platform itself runs as a set of microservices and uses Kubernetes custom resources to define the chaos intent, as well as the steady state hypothesis.

If you have any doubts regarding LitmusChaos, or if Chaos Engineering excites you, join the Litmus community on Slack.

To join the community, follow these steps:

Step 1: Join the Kubernetes Slack using the following link: https://slack.k8s.io/

Step 2: Join the #litmus channel on the Kubernetes Slack, or use this link after joining the Kubernetes Slack: https://slack.litmuschaos.io/

Check out the LitmusChaos GitHub repository. To learn more about LitmusChaos, check out the LitmusChaos documentation.

Thank you!
