Better Cloud Security Through VM Cloning

(Originally posted in SANS Cloud Security Blog)

While I was teaching 524 this week, the subject of VM cloning came up. Specifically, we were discussing how cloning changes the way we apply security to our servers. The folks in class enjoyed the topic enough that I thought I would cover it here as well.

Gen2 Server Build and Auditing

Back when we were building standalone servers, deployment was not all that different from how we copied books 1,000 years ago. An admin would configure the server, patch it, and lock it down, and then auditors would verify that the server had been built to company policy. Automated scripts might have been used, but there was always some amount of customization taking place, hence the need to audit the final product. As with books copied by hand, every copy could carry its own errors, so maintaining quality required an in-depth audit of every single one.

Gen3 Server Build and Auditing

With VM cloning, we can use a process that is more akin to a photocopier. With a photocopier, you audit the master. If it looks OK, you can then print copies en masse. Less auditing is required on the back end, because you are no longer hunting for minor typos in each copy. Process issues, on the other hand, will affect large portions of the final product, which makes them easier to detect. So less auditing is required than under the “Scribe” model, yet you still end up with a more consistent product.

With VM cloning, we start off by creating a VM that is considered our gold standard master. This is the VM that represents how we want all of our other servers to be configured. We could have a single gold master plus scripts that configure each copy for its role (web server, database server, etc.), or, more likely, we maintain a unique gold master for each server role within our environment.

Once the gold master is created, we audit it against our corporate standards. If the VM is deemed acceptable, we simply copy it every time we need a new server. What’s nice about this process is that whether we need ten servers or 1,000, we can be sure that every server will be an identical copy, and thus compliant with corporate standards.
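One way to confirm that the clones really are identical copies is to fingerprint each server's security-relevant configuration and compare it with the gold master's fingerprint. The sketch below is a minimal illustration, assuming the configuration has already been collected into a dictionary per host; the host names and settings are hypothetical. It hashes a canonical JSON serialization so that key order does not affect the result.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a SHA-256 digest of a canonical (key-sorted) JSON form of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_clones(master: dict, clones: dict) -> list:
    """Return the sorted names of clones whose config differs from the master."""
    expected = fingerprint(master)
    return sorted(name for name, cfg in clones.items()
                  if fingerprint(cfg) != expected)


# Hypothetical data: a gold master and three clones built from it.
gold_master = {"ssh_root_login": "no", "firewall": "enabled", "ntp": "time.example.com"}
clones = {
    "web-01": dict(gold_master),
    "web-02": dict(gold_master),
    "web-03": {**gold_master, "ssh_root_login": "yes"},  # someone changed it
}

print(drifted_clones(gold_master, clones))  # ['web-03']
```

Hashing makes the comparison cheap across 1,000 clones, but it only says *that* a clone drifted; diffing the two dictionaries would tell you *what* changed.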

Auditing Clones

Auditing clones opens up some interesting possibilities, because we are no longer auditing servers only against the corporate standard, but against each other as well. Consider the following example:

Suppose we have a gold standard master VM that has been cloned 10 times to generate production servers. Reviewing the logs, we see that someone logged in to one of those clones and entered the correct username and password on the first try. Should we be worried about this? Remember that all of the servers are supposed to be essentially identical. This means that if patches are required, or additional lockdown steps have been added to our policy, we are not going to administer each server manually; instead, we update the gold master and reburst copies. In other words, there is no need for anyone to log in to any of the production servers. So even though no password failures occurred, this login is still highly suspicious. Now ask yourself: if you were running a standalone server environment, would you even detect a suspicious login that got the password right on the first try?
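Because nobody is supposed to log in to a production clone at all, the audit rule can be very simple: every successful login on a clone is an exception, whether or not failed attempts came before it. The sketch below assumes authentication events have already been parsed, in chronological order, into records with hypothetical `host`, `user`, and `outcome` fields. It reports each successful login along with the number of failures that preceded it on that host, so a first-try success still shows up.

```python
from dataclasses import dataclass


@dataclass
class AuthEvent:
    host: str      # hypothetical field names for a parsed log record
    user: str
    outcome: str   # "success" or "failure"


def suspicious_logins(events, clone_hosts):
    """Flag every successful login on a clone, even with zero prior failures.

    Returns (host, user, prior_failures) tuples; events must be in time order.
    """
    failures = {}
    findings = []
    for ev in events:
        if ev.host not in clone_hosts:
            continue  # only clones are expected to have no logins at all
        if ev.outcome == "failure":
            failures[ev.host] = failures.get(ev.host, 0) + 1
        elif ev.outcome == "success":
            findings.append((ev.host, ev.user, failures.get(ev.host, 0)))
    return findings


# Hypothetical environment: ten clones named web-01 .. web-10.
clones = {f"web-{i:02d}" for i in range(1, 11)}
log = [
    AuthEvent("web-07", "admin", "success"),     # right on the first try
    AuthEvent("build-01", "deploy", "success"),  # not a clone: ignored
]
for host, user, prior_failures in suspicious_logins(log, clones):
    print(f"{host}: login by {user} after {prior_failures} failed attempts")
# web-07: login by admin after 0 failed attempts
```

A traditional brute-force rule keys on the failure count; here the failure count is reported only as context, and the success itself is the alert.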

Here’s another example:

Now suppose that nine of my VM clones, as well as the gold master, are missing three security patches that the vendor recently made available. Note, however, that one of my clones is fully patched and up to date. Again, this is suspicious, because our process is to patch the master first and then reburst. It is quite possible that the fully patched server has been compromised and that the attacker applied the missing patches to make sure they do not lose ownership of the server to another script kiddie.
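A patch comparison against the gold master catches this case, provided it checks both directions: a clone that is *ahead* of the master is as much an exception as one that is behind. The following is a minimal sketch under the assumption that each host's installed patches are available as a set of IDs; the IDs and host names are hypothetical.

```python
def patch_exceptions(master_patches, clone_patches):
    """Compare each clone's installed patches with the gold master's.

    Returns {clone: {"missing": [...], "extra": [...]}} for clones that differ.
    Extra patches matter as much as missing ones: the process is to patch the
    master first and then reburst, so no clone should ever be ahead of it.
    """
    master = set(master_patches)
    report = {}
    for name, patches in clone_patches.items():
        installed = set(patches)
        missing = master - installed
        extra = installed - master
        if missing or extra:
            report[name] = {"missing": sorted(missing), "extra": sorted(extra)}
    return report


# Hypothetical patch IDs; the three newly released patches are P-101..P-103.
base = {"P-001", "P-002"}
master = set(base)
fleet = {f"app-{i:02d}": set(base) for i in range(1, 11)}
fleet["app-04"] = base | {"P-101", "P-102", "P-103"}  # the one fully patched clone

print(patch_exceptions(master, fleet))
# {'app-04': {'missing': [], 'extra': ['P-101', 'P-102', 'P-103']}}
```

Note that this check says nothing about whether the master itself is current; that still requires comparing the master against the vendor's patch list.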

Note that our last two examples are what we call “positive exceptions”. In other words, we detected an event that is not necessarily bad, but because it is different from the rest of our environment, it is worth investigating further. Back when we were running standalone servers, for example, we would probably never have flagged a server just because it was fully patched. With cloning, however, it sticks out like a sore thumb.
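The same idea works even without consulting the gold master: because the clones are supposed to match each other, whatever value most of them share can serve as the baseline, and any host that differs from it is a positive exception. This is a minimal sketch, assuming each host has been summarized into one hashable value per attribute (here a hypothetical count of installed patches). If two values tie for most common, the choice of baseline is arbitrary, so this is only meaningful when a clear majority exists.

```python
from collections import Counter


def positive_exceptions(observations):
    """Flag hosts whose value differs from the value most of their peers share.

    observations maps host name -> a hashable summary of some attribute
    (patch count, open ports, package list, ...). The function does not judge
    whether a value is good or bad; being different is enough to investigate.
    """
    if not observations:
        return []
    baseline, _ = Counter(observations.values()).most_common(1)[0]
    return sorted(host for host, value in observations.items() if value != baseline)


# Hypothetical data: installed-patch counts reported by ten clones.
patch_count = {f"db-{i:02d}": 47 for i in range(1, 11)}
patch_count["db-03"] = 50  # fully patched, yet still an exception
print(positive_exceptions(patch_count))  # ['db-03']
```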

I hear many folks exclaim “the cloud is insecure!” and I always try to balance that by saying “the cloud is different and requires a rethink of security”. The power of exception auditing within a group of clones is an excellent example of how we can actually apply better security with a cloud environment. As a bonus, that better security is even easier to apply. You just have to look at things a bit differently.
