What do IO Limits look like?

How to configure

Starting with vSphere 6.5 it is possible to configure IO limits using Storage Policy Based Management (SPBM). Alternatively, you can edit the VM settings and configure the limit directly on the VMDKs.

To configure using SPBM:

  1. Go to Policies and Profiles in the Web Client
  2. Click VM Storage Policies
  3. Click Create VM Storage Policy
  4. Under 2a Common rules, check Use common rules in the VM storage policy
  5. Click Add component, select Storage I/O Control, then select Custom
  6. At this point you can enter the IOPS limit
  7. Continue through the rest of the wizard
  8. Select the VMs you want to apply the policy to, right-click the selection and select VM Policies and Edit VM Storage Policies. Select your policy and press OK. After this, all VMDKs of these VMs get the policy applied. If you want to exclude a VMDK from the policy, edit the VM settings and set the storage policy for that VMDK back to Datastore Default. A PowerCLI alternative is sketched below.
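
If you prefer scripting, the policy assignment can also be done with PowerCLI. A minimal sketch, assuming the policy created above is named "IOPS-Limit-2000" and the SPBM cmdlets of the PowerCLI storage module are available (verify the cmdlets against your PowerCLI version):

    # assign an existing VM storage policy to all VMDKs of a VM
    $policy = Get-SpbmStoragePolicy -Name "IOPS-Limit-2000"
    Get-HardDisk -VM (Get-VM "MyVM") |
        Get-SpbmEntityConfiguration |
        Set-SpbmEntityConfiguration -StoragePolicy $policy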

To configure limits by editing VM settings:

  1. Right-click the VM and click Edit Settings
  2. Expand the VMDK you want to limit and enter a value under Limit – IOPs
  3. Click OK.
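
The same setting can be scripted with PowerCLI. A minimal sketch, assuming a VM named "MyVM" and its disk "Hard disk 1" (a value of -1 sets the limit back to Unlimited):

    # set an IOPS limit of 2000 on one VMDK of the VM
    $vm   = Get-VM "MyVM"
    $disk = Get-HardDisk -VM $vm -Name "Hard disk 1"
    Get-VMResourceConfiguration -VM $vm |
        Set-VMResourceConfiguration -Disk $disk -DiskLimitIOPerSecond 2000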

Both ways can be done online; the changes get applied immediately.

How does it work

All the work is done by a component called mClock, running on the ESXi host. To enforce IO limits, mClock leverages kernel IO queuing. What is kernel IO queuing? It is something you do not want to see during normal operations. It happens when the IOs triggered by the VMs on a datastore exceed its device queue. When the device queue is full, the kernel IO queue steps in to hold these IOs and pushes them into the device queue when possible. For the VMs this behavior is fully transparent.

Because of this you get a downside of IO limits: higher latency!

During normal IO operations, kernel IO queues should not be used. Here is a screenshot of an IOmeter test in esxtop.

IO_limit_IO32_NOlimit_2
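
If you want to reproduce this view yourself, start esxtop in an ESXi shell:

    esxtop

Press u for the disk device view (shown here), d for the disk adapter view, v for the per-VM disk view and f to add or remove fields.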

Short column description:

  • DQLEN: device (LUN/volume) queue length
  • ACTV: current count of IOs in device queue
  • QUED: IOs in kernel queue
  • CMDS/s: IOs per second
  • DAVG/cmd: storage device latency
  • KAVG/cmd: kernel latency
  • GAVG/cmd: guest (VM) latency
  • QAVG/cmd: kernel queue latency

As you can see, no kernel queue is used, therefore device latency is the same as guest latency.

One example where the kernel queue is used is when a VM's virtual disk controller has a larger IO queue than DQLEN and uses this queue heavily. Here is an example of such a situation: the disk controller in the VM has a configured queue depth of 256, and I used IOmeter to run a test with 100 outstanding IOs:

IO_limit_Q100_2

You can see that QUED, KAVG/cmd and QAVG/cmd are > 0. You can also see that ACTV + QUED ≈ 100, which is the queue depth used in the VM. When there are no hardware problems in your SAN (cable, SFP), QAVG/cmd should be about the same as KAVG/cmd.

Settings for the next test:

  • IO limit for VMDK is set to 2000
  • IO-size in IOmeter is set to 32kb
  • Outstanding IOs: 16.

IO_limit_IO32_limit2k_2

You can see:

  • ACTV = 0 –> device queue is not used
  • QUED = outstanding IOs –> kernel IO queuing
  • CMDS/s ~ IO limit set
  • KAVG/cmd > 0 –> kernel IO queuing
  • GAVG/cmd = DAVG/cmd + KAVG/cmd

Conclusion: the kernel queue is used to throttle the number of IOs put into the device queue and thus slow down IO processing. Because IOs have to wait in the kernel queue, latency increases.
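
A rough way to estimate this added latency (treating the limited VMDK as a single simple queue, which is only an approximation) is Little's Law:

\text{latency} \approx \frac{\text{outstanding IOs}}{\text{IOPS limit}} = \frac{16}{2000}\,s = 8\,ms

So with 16 outstanding IOs and a limit of 2000, a guest latency of around 8 ms is expected, no matter how fast the underlying storage device is.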

About IO Size

You define an IO limit using an absolute value. But mClock does not count every IO as one IO; the size of an IO matters! mClock “normalizes” IOs to multiples of 32KB. In mathematical notation:

\text{achievable IOPS} = \frac{\text{limit}}{\left\lceil\frac{\text{IO size [KB]}}{32}\right\rceil}

\lceil \ \rceil means rounding up to the next integer (\lceil 1.01\rceil = 2). So an IO size in the range of 1-32KB counts as 1, 33-64KB counts as 2, and so on.
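
Applied to the tests below with a limit of 2000: \frac{2000}{\lceil 33/32 \rceil} = \frac{2000}{2} = 1000 IOPS for 33KB IOs, and \frac{2000}{\lceil 65/32 \rceil} = \frac{2000}{3} \approx 667 IOPS for 65KB IOs.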

To show this behavior, I used the same test as before. The only difference: 33KB instead of 32KB IO size:

IO_limit_IO33_limit2k_2

IOs cut in half.

Next 65KB IO size:

IO_limit_IO65_limit2k_2

…only one third.

Notes

  • Although you configure a Storage I/O Control (SIOC) policy when using SPBM, SIOC does not have to be enabled on the datastore.
  • When a storage policy is applied to a VM that already has a manually set IO limit, the lower limit wins and is enforced.
  • When using SPBM, the IO limits are not shown when editing the VM settings; you can only see the name of the applied policy there.
  • When the storage policy gets detached, the disk IO limits are reset to Unlimited.
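
Related to the third note: even though the limit is not visible in Edit Settings when SPBM is used, you can still read the effective per-disk limits with PowerCLI. A sketch, assuming the disk limits are exposed through Get-VMResourceConfiguration as on my systems:

    # list the effective IOPS limit for every disk of a VM (-1 means Unlimited)
    (Get-VMResourceConfiguration -VM (Get-VM "MyVM")).DiskResourceConfiguration |
        Select-Object Key, DiskLimitIOPerSecond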

 


Warning in VCSA Upgrade Wizard

A short (and funny) one. At my last upgrade of a VCSA from 6.0 to 6.5 I got a warning that says: “User vdcs does not have the expected uid 1006”. Furthermore, the warning refers to a KB article to read, but gives no hint which article it is talking about. This message has to do with the vCenter feature Content Library. According to VMware Support, when the Content Library is not used you can continue with upgrading the VCSA – and use the Content Library after the upgrade. When the feature is in use, call VMware Support.


How to find information to create device specific PSP rule

When you connect a storage device to your ESXi hosts, a Path Selection Policy (PSP) is selected based on a defined rule set. To help the host select the right policy for your device, you have a few options. I personally prefer to create a device-specific rule based on vendor and model. Below I describe four ways to get the information you need about your device.

Which PSP plugin? Get the information about the installed device from the vendor. In my opinion it is the vendor's task to provide best practices. You should at least get the information which PSP plugin (Most Recently Used, Fixed or Round Robin) should be used. You can also use the VMware Compatibility Guide for storage to determine the preferred plugin.

Find vendor and model. Here are four ways to get the vendor and model IDs. Based on the information you already have or the connection state of your device, take the way you need.

  1. Device is not connected
    You can use the VMware Compatibility Guide for storage to find your storage device. Under “Model Details” you can see “Vendor Id” and “Product Id”; “Product Id” stands for the model. I recommend checking these values against your running system.
  2. Device is connected – VMFS volume already created
    In this situation, run the following command in the ESXi console:
    esxcli storage vmfs extent list
    In the result, find your VMFS volume and the corresponding device ID. Use this device ID to run: esxcli storage core device list -d device_ID. In the lines “Model:” and “Vendor:” you find the details you need.
  3. Device is connected – no VMFS volume is created, WWN (WWPN or WWDN) of the device is known
    Run the following command to show path information for the device using your WWN:
    esxcli storage core path list | grep -i nn:mm -B 15
    In this command example, nn:mm is part of the WWN you are searching for. In the line starting with “Device:” you find the device ID of your device. Use this device ID to run: esxcli storage core device list -d device_ID. In the lines “Model:” and “Vendor:” you find the details you need.
  4. Device is connected – no VMFS volume is created, unknown WWN, VVOL capable
    Run the following command to show the VVOL Protocol Endpoints:
    esxcli storage core device list --pe-only
    You should find your device in the result of the command.

To create a simple PSP rule, you can use this command:

esxcli storage nmp satp rule add --vendor="vendor_name" --model="model_name" --description="Text" --satp="satp_plugin_name" --psp="psp_plugin_name" --psp-option="iops=1"

Description:

  • satp_plugin_name
    The SATP plugin defines your storage array type. The right one for your device can be found in the VMware Compatibility Guide for storage or by running: esxcli storage nmp device list -d device_ID. You can list the installed SATP plugins by running: esxcli storage nmp satp list.
  • psp_plugin_name
    Defines the policy for path selection during operation without failure. To list the available plugins, run: esxcli storage nmp psp list.
  • optional: --psp-option="iops=1"
    When psp_plugin_name is VMW_PSP_RR (Round Robin), the option “iops=1” can be used (default: iops=1000). With this option, every IO is sent down the next path. For some devices this option improves performance, but most often it is used to reduce the path-failover time when a failure occurs. If you cannot find information about this option from your vendor, at least test this setting.
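
As a concrete example, a rule for an HPE 3PAR array could look like this (3PARdata / VV are the commonly documented vendor and product IDs and VMW_SATP_ALUA the matching SATP; verify these values on your own system as described above):

esxcli storage nmp satp rule add --vendor="3PARdata" --model="VV" --description="HPE 3PAR custom rule" --satp="VMW_SATP_ALUA" --psp="VMW_PSP_RR" --psp-option="iops=1"

Afterwards you can check the result with: esxcli storage nmp satp rule list | grep -i 3par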

If there is an error when creating a rule, look at /var/log/vmkernel.log for more details. To remove a rule, run the same command and just replace “add” with “remove”.

To create PSP rules using PowerCLI, look here (example for creating a 3PAR rule).


3PAR – does CPG setsize to disk-count ratio matter?

In a 3PAR system it is a best practice that the setsize defined in a CPG is a divisor of the number of disks in the CPG. The setsize basically defines the size of a sub-RAID set within the CPG. For example, a setsize of 8 in RAID 5 means a sub-RAID set uses 7 data disks and one parity disk.

I did a few tests using an array of 8 disks. I created 4 CPGs, all using RAID 5 but with different setsizes of 3, 4, 6 and 8. To get more transparent results, I used fully provisioned virtual volumes (VV) of 100 GB each.

When a 3PAR VV is created, logical disks (LD) are created. To spread data across controller nodes, disks and ports, at least one LD per controller node is created, as you can see in the screenshot. To show the LDs of a VV, use the command showld -vv VV_name.

3par_showld

You can see the owner node of each LD, the size of the LD and how much of it is used. Because each VV is 100 GB in size, each LD holds half of it. The raw LD size, however, is different for each VV; this is a result of the setsize, as you will see shortly. In a 3PAR system RAID is implemented on chunklet level, and LDs are built on this level. To show all chunklets used by an LD, run the command showldch LD_name. To show the sub-RAID sets of an LD, run: showldch -lformat row LD_name. Here are the sub-RAID sets of one LD of each test VV.

3par_showldch_1

3par_showldch_2

In the column row, the sub-RAID sets are listed. The columns Ch0 – Chn show the physical disks (PD) used. So why does the size of an LD depend on the setsize of the CPG/VV? For LD test_ssz8.usr.0 we saw a size of 57344 MB. When we look at the chunklets used by this LD, we see 8 rows with 8 chunklets each. Because we use RAID 5, one chunklet per row is used for parity. As a result we get: 8 (rows) * 7 (data chunklets) * 1024 MB (chunklet size) = 57344 MB.
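
Generalized for this system (RAID 5 sub-RAID sets and 1024 MB chunklets, as used in these tests):

\text{usr LD size [MB]} = \text{rows} \times (\text{setsize} - 1) \times 1024

For RAID 6, two parity chunklets per row would have to be subtracted instead of one.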

We have seen which sub-RAID set of which LD uses which PD. To just show the chunklets used per PD for a VV, run: showvvpd VV_name. Let's first look at VV test_ssz8, whose setsize fits perfectly to 8 disks.

3par_showvvpd_ssz8

We see 16 chunklets on each disk. Perfectly distributed. Why 128 in sum? Because we saw 2 LDs for this VV, each using 8*8 chunklets (including RAID overhead) –> 2*8*8 = 128. What about VV test_ssz4?

3par_showvvpd_ssz4

136 = 2 (LDs) * 17 (rows) * 4 (chunklets per row). In the next screenshot you can see the chunklets used by both LDs and how the LDs spread across all disks.

3par_showldch_ssz4_both

Let's look at VV test_ssz3:

3par_showvvpd_ssz3

Not very well distributed! We can see a range from 15 to 25 chunklets per PD. And for VV test_ssz6?

3par_showvvpd_ssz6

Even worse: a range from 7 to 18 chunklets per PD. But is every VV created with a specific setsize laid out identically? That is, does it use the same number of LDs on the same PDs?

3par_showvvpd_ssz6_2

We see that the distribution is not good either, but it is totally different!

Fortunately such a situation is quite easy to fix. All you have to do is change the setsize in the CPG settings and tune the whole system or just the CPG. Command to tune a CPG: tunesys -cpg CPG_name. After setting the setsize to 8 in the CPG and tuning it, the distribution is equal.

3par_showvvpd_ssz6_tune
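
For reference, the fix boils down to two CLI commands (the CPG name is just a placeholder; check setcpg and tunesys in the Command Line Reference before running them on a production system):

setcpg -ssz 8 CPG_name
tunesys -cpg CPG_name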

IMHO

The setsize to disk-count ratio matters! It is responsible for how the sub-RAID sets are created. When you plan to create just one or two VVs on the system, you should take care that the setsize divides the disk count; otherwise the IO distribution across your disks can be unequal. But the more VVs you run, the smaller the impact will be. I have seen many systems with a non-fitting setsize (sometimes you cannot set a setsize that fits the disk count, for example with 38 disks), and most of them have equally distributed IOs. Just a few systems show poor IO distribution.


Remove a cage from a 3PAR array

Removing a cage can be a simple task. But it can also be impossible without help from HPE. Before starting at step 1, you should probably check step 3 first.

Generally I would recommend to do any of these steps ONLY when you

  • are very familiar with 3PAR systems!
  • know what you are doing!
  • know what the consequences of your actions are!

If you have doubts about any of the following steps, contact HPE! You should also use the 3PAR Command Line Reference Guide to check the commands used.

Here are the basic steps to remove a cage:

  1. Migrate/remove user data from the disks in the cage
    • When the disks are allocated by CPGs, use tune Virtual Volume (GUI or command tunevv) to move the data to a different CPG. You need a Dynamic Optimization license to tune VVs.
    • Use a disk filter in the CPG to “remove” the disks from the CPG. Use tune system (GUI or command tunesys) to remove the chunklets from the CPG afterwards. About disk filters see here and here.
    • When using Adaptive Optimization (AO), change the AO configuration to remove the CPG that contains the disks of the cage to be removed.
    • Re-create volumes on other disks. Use features like VMware Storage vMotion to move data away from volumes on the disks to be removed. Remove the VVs and the CPG afterwards.
  2. Remove the spare chunklets from each disk to be removed
    Command: removespare -f n:a
    n stands for the PD ID (physical disk ID)
  3. Run showpd -c. Every disk in the cage must show 0 GB of used data (column: Used|OK)! If this is not the case, you are in trouble! You can only continue when the disks are completely free! Check what data is still on a disk:
    showpdch n (n for the PD ID)
    When you see LD names like:
    admin* .srdata* pdsld* log*
    then system metadata is stored on the disks you want to remove. admin and .srdata are system VVs. You can check which disks these VVs are using by running showvvpd admin and showvvpd .srdata. In this case, contact HPE Support to migrate the data to other disks (or disk types) in the system. Especially pdsld* and log* can be hard to move!
  4. Run dismisspd m n
    m n stands for the list of PD IDs.
    This command removes the disks from the configuration. After this command the disks show up as “new”.
    If an error says that a disk is still in use, go back to step 3.
  5. Run
    • controlmag offloop -f cagen 0
    • controlmag offloop -f cagen 23
      cagen is the name of the cage
      These commands turn off the disks. This step is optional.
  6. Run servicecage remove -f cage
    This command removes the cage from the configuration. If there is an error, you can disconnect the cage (next step) and try the command again.
  7. Remove the cables from the cage
    At this point you should know the basics of cage cabling. Here you can find the Cabling Configuration Guide for the 8000 series. If you do not operate more than two additional cages, it is quite simple. If you use more cages on each data path, you have to be careful! Check the cages using showcage with every step you do.
  8. Run admithw
    This command re-calculates the spare chunklets.
  9. Run checkhealth

3PAR: Considerations when mixing sizes of same disk type

Generally it is supported to mix disks of different sizes of the same type within a 3PAR system. For example, you can use 900GB and 1.2TB FC disks within the same cage and even within the same CPG. When a disk fails, HPE sends a replacement disk. Some time ago the stock of 900GB FC disks seemed to run empty, so when a 900GB disk fails you will probably get a 1.2TB disk instead.

So how to handle different disk sizes? Here are a few points to consider:

    1. How does a 3PAR system handle different sizes within the same CPG? The system tries to put the same amount of data on every disk in a CPG, no matter whether there are different disk sizes. When the smaller disks are full, the larger disks continue to fill up. So replacing just a few disks within a CPG with larger disks does not matter, as long as the smaller disks are not running full. Once that happens, only the larger disks get new data, which can lead to a serious performance problem.
    2. When talking about SSDs, mixing different sizes will probably not be a problem, even considering point 1. But when your SSDs are near their performance maximum, you can also get a performance problem once the smaller SSDs are full.
    3. When you have different CPGs for different disk sizes (how this can be done you can read here), you must check before replacing a failed disk with a disk of a new size: will the replacement disk become part of the right CPG? If not, you should re-define your CPG disk filter. By the way, this cannot be done in SSMC any more! You need the CLI. See point 4.
    4. What about filtering disks for CPGs by cage or by position in the cage instead of by disk size? Since I know that HPE replaces 900GB disks with 1.2TB disks, this is my preferred option when different CPGs are desired.
      For example, you can use this command to change the disk filter for an existing CPG:
      setcpg -sdgs 32g -t r6 -ha mag -ssz 10 -p -devtype NL -cg 2 -saga "-ha -p -devtype NL -cg 2" NL_r6_cage2
      The meaning of the different parameters can be found here. (Option -devtype is mandatory for option -cg, which selects the cage(s). You can list more than one cage by separating them with commas (1,2), or define a range (1-3). Another option is to define the filter by disk positions. Check the 3PAR Command Line Reference for more information on the setcpg command.)

PowerCLI script to copy PortGroup between hosts

Here is a short PowerCLI script to copy vSwitch port groups from a source host to a target host. Security policies are copied too. The vSwitch on the target host has to exist already. The script takes inheritance into account; this means only settings changed at port-group level are copied, all other settings stay inherited.
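
The core of the script looks roughly like this (a simplified sketch, not the full script behind the link; it assumes both hosts use a vSwitch named vSwitch0 and shows the pattern only for the promiscuous-mode setting):

    # copy port groups (name, VLAN ID, non-inherited security settings) from source to target host
    $srcSwitch = Get-VirtualSwitch -VMHost (Get-VMHost "esx-source.lab.local") -Name "vSwitch0"
    $dstSwitch = Get-VirtualSwitch -VMHost (Get-VMHost "esx-target.lab.local") -Name "vSwitch0"

    foreach ($pg in (Get-VirtualPortGroup -VirtualSwitch $srcSwitch)) {
        # create the port group with the same name and VLAN ID on the target switch
        $newPg = New-VirtualPortGroup -VirtualSwitch $dstSwitch -Name $pg.Name -VLanId $pg.VLanId
        # copy only security settings that are overridden at port-group level
        $srcSec = Get-SecurityPolicy -VirtualPortGroup $pg
        if (-not $srcSec.AllowPromiscuousInherited) {
            Get-SecurityPolicy -VirtualPortGroup $newPg |
                Set-SecurityPolicy -AllowPromiscuous $srcSec.AllowPromiscuous | Out-Null
        }
        # ForgedTransmits and MacChanges follow the same pattern
    }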

Continue reading “PowerCLI script to copy PortGroup between hosts”
