VenomFix


This page is intended to serve as a resource for administrators who would like a rebootless procedure for patching the VENOM vulnerability. It has been used to patch thousands of Linux environments on varying distros (CentOS 5, CentOS 6, Ubuntu, Debian, and others) as an alternative to rebooting every instance across large swaths of virtualized hosts.

A CVE (CVE-2015-3456) was recently released that affects servers running Xen, KVM, and QEMU, and has apparently been present for the last 10 years or so (yikes!). The vulnerability could potentially allow code execution on the host environment from within a virtual machine.

This site is hosted to help admins patch their KVM virtual machines with as little downtime as possible.

After reviewing the patched code, we realized it would have been difficult to implement a live patch to the already running qemu processes. We ended up using common virsh commands and leveraging the speed of RAM to get the job done as quickly as possible.

We were able to get all of our instances up and running on the newly patched qemu without a full instance reboot, resulting in only about 10-20s of ‘locked time’ on each instance (caveat: hardware mileage may vary).

The quick way:

Step one is to make sure you have a patched qemu version in place. For Red Hat/CentOS that would be qemu-kvm-0.12.1.2-2.448 or later. This can be achieved by making sure your distro is up to date, as most distros have released the fix via their package manager.
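On Red Hat/CentOS, for example, you can confirm what is installed and pull in the patched build with the package manager (a quick sketch of our own; package names differ on other distros, e.g. the qemu-system-* packages on Debian/Ubuntu):

# rpm -q qemu-kvm

# yum update qemu-kvm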

After you’ve got your patched qemu in place, you can go ahead and “roll” your instances. The command below may be used to save each instance on a box to /dev/shm (RAM) and restore it to a running state, one at a time. The instances will be locked for less than a minute (in our experience mostly 30 seconds or less) and then return to normal function. We feel this is way better than rebooting everything. You’ll want to make sure you have enough space in memory, or change /dev/shm to a local disk.
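Before rolling anything, it’s worth sanity-checking that /dev/shm can hold a saved guest (the save file is roughly the size of the guest’s RAM). A rough way to eyeball it, as a sketch of our own using standard virsh output:

# df -h /dev/shm

for i in $(virsh list --all | grep running | awk '{print $2}') ; do printf '%s: ' "$i" ; virsh dominfo "$i" | awk '/Used memory/ {print $3, $4}' ; done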

Here’s the command we used to “restart” each instance without really rebooting:

for i in $(virsh list --all | grep running | awk '{print $2}') ; do instance=$i ; date ; time virsh save $instance /dev/shm/$instance ; date ; time virsh restore /dev/shm/$instance ; date ; rm -f /dev/shm/$instance ; done
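For readability, here is the same loop unrolled; it is functionally identical to the one-liner above, aside from the quoting and the TARGET variable, which are our own additions:

TARGET=/dev/shm    # point this at a local disk instead if RAM is tight

for instance in $(virsh list --all | grep running | awk '{print $2}') ; do
    date
    time virsh save "$instance" "$TARGET/$instance"    # guest is paused from here...
    date
    time virsh restore "$TARGET/$instance"             # ...until the restore completes on the new binary
    date
    rm -f "$TARGET/$instance"                          # free the RAM before the next guest
done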

We were able to confirm the process did indeed source the new binary by looking at the process and seeing the inode change:

[Screenshot: qemu-deleted.png]

[Screenshot: qemu_there.png]
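If you want to reproduce that check on your own hosts, one way is to look at each qemu process’s /proc/<pid>/exe link: before the roll it points at the old, now-replaced binary and shows ‘(deleted)’, and after the restore it points at the freshly installed file with a new inode. A sketch of our own (the pgrep pattern and binary path will vary by distro):

# for pid in $(pgrep -f qemu) ; do ls -l /proc/$pid/exe ; done

# ls -li /usr/libexec/qemu-kvm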


Gotcha:

QEMU/KVM guest instances that are using Anonymous (Transparent) huge pages may error on 'virsh restore' operations with 'alloc_mem_area: can't mmap hugetlbfs pages:'. To address this, you should assign huge pages explicitly instead of leaving them Anonymous (Transparent), shifting the allocation as instances are saved.

Instances using huge pages would have been booted with the qemu argument '-mem-path /dev/hugepages/', where the path is that of the 'hugetlbfs' mounted file system found in /proc/mounts:

# grep hugetlbfs /proc/mounts

hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
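If you’re not sure whether a given guest was started with hugepage backing, a couple of ways to check (INSTANCE_NAME is a placeholder; this is our own sketch, not part of the original procedure):

# ps -eo args | grep '[-]mem-path'

# virsh dumpxml INSTANCE_NAME | grep -A2 '<memoryBacking>'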

Hypervisors using Anonymous (Transparent) huge pages, where instances are running with hugepage support (see above), would have a value of '0' for HugePages_Total in /proc/meminfo:

# grep -i Huge /proc/meminfo

AnonHugePages: 6680576 kB

HugePages_Total: 0

HugePages_Free: 0

HugePages_Rsvd: 0

HugePages_Surp: 0

Hugepagesize: 2048 kB

If both of these conditions are true, instances will not gracefully ‘virsh save | restore’ due to the 'alloc_mem_area: can't mmap hugetlbfs pages' bug (https://bugzilla.redhat.com/show_bug.cgi?id=518099). In this situation, you would first save an instance with the 'virsh save' command and then allocate an equal or greater number (preferably greater) of pages, calculated as AnonHugePages / Hugepagesize, to HugePages_Total, similar to the following:

# echo 4096 > /proc/sys/vm/nr_hugepages

At a 2048 kB page size (Hugepagesize), this would allocate 4096 memory pages from Anonymous (Transparent) pages to HugePages_Total, or 8 GB of RAM. If an abundance of free memory exists, we would recommend allocating it to HugePages to avoid potential issues.
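To size that number for a particular guest rather than guessing, you can divide the guest’s memory by Hugepagesize; a small sketch of our own (INSTANCE_NAME and the variable names are placeholders):

guest_kb=$(virsh dominfo INSTANCE_NAME | awk '/Max memory/ {print $3}')
page_kb=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)
echo $(( (guest_kb + page_kb - 1) / page_kb ))    # huge pages needed, rounded up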

The caveat on this workaround is that allocated HugePages cannot be merged with Kernel Same-page Merging (KSM). That being the case, this workaround should be weighed against the memory usage on your hypervisors: if memory is oversold, then moving pages from Anonymous (Transparent) HugePages to allocated HugePages will likely leave some portion of guest instances unable to start due to insufficient memory. In that scenario, the only way to remediate the vulnerability is to stop/start all guest instances on the hypervisor.
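Before committing to the workaround, it may be worth checking how much KSM is actually saving you on a given hypervisor; if these counters are large, the workaround will cost real memory. A sketch of our own (sysfs paths as on recent kernels):

# grep . /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/pages_shared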

Hope this helps!

David Collins / Patrick Pelanne / Ryan MacDonald

Endurance International Group

Credit:

Endurance International Group would like to thank the team at CrowdStrike Inc. for their discovery and information disclosure at http://venom.crowdstrike.com

Disclaimer:

Our security team found a way to fix this exploit and we wanted to share it to help make the web a better place. However, we can't guarantee that the fix will work or is appropriate for your environment. You should assess your own environment and consult experts as necessary to address your specific situation. Your use of any information on this website is at your own risk. All information on this website is provided "AS IS" without any warranty of any kind. We expressly disclaim any liability for any loss or damages resulting directly or indirectly from your use of the information on this website.