Multiple baselines – there’s more than one way to be right!

Our goal with file integrity monitoring is to check that a file hasn’t been tampered with.  To detect this, we remember the attributes of a perfect copy of that file, and as long as the version on the system we’re testing matches those, we’re happy.

However, with the capabilities that a public cloud architecture offers, you may want to have multiple generations of the same server or file, and roll out generations as is most useful for your system. In that case, there would be more than one “acceptable” version of a file – how could file integrity monitoring help you then? We’ve built multi-generational capabilities into Halo FIM to make sure it can work with this unique deployment method.

Adding new user accounts
Distributing new content
Updating packages on a running cloud server
Mail server running on multiple Linux distributions
Long term use of multiple distributions
Expiration – regular release of new files

Adding new user accounts

As part of looking for changes made by attackers, I want to keep an eye out for any accounts that show up unexpectedly.  One way to do this on Linux/Unix is to watch the file /etc/passwd, which has a single line for each user on the system.  If that file changes on, say, the machine web74, that machine may have been compromised and I need to investigate.

The first step is to make a policy that monitors /etc/passwd.  To identify the “perfect” copy of that file, I add a baseline to that policy using a golden master system that already has the correct user accounts installed on it.  (A golden master is a pristine cloud server image, already customized to your needs, that’s used to clone multiple machines.)

Once a night I compare each of my servers’ /etc/passwd files to that remembered pristine version, and if any of them are different, I raise an alert.
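To make the nightly comparison concrete, here is a minimal sketch in Python.  It is not Halo's implementation or API; it assumes that "remembering the attributes" of a file can be reduced to a SHA-256 hash of its contents, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_signature(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def matches_baseline(path, baseline_signature):
    """True when the file is byte-for-byte identical to the baselined copy."""
    return file_signature(path) == baseline_signature


if __name__ == "__main__":
    # On the golden master: record the signature once.
    baseline = file_signature("/etc/passwd")
    # On each server, once a night: compare and alert on a mismatch.
    if not matches_baseline("/etc/passwd", baseline):
        print("ALERT: /etc/passwd differs from the baseline")
```

A real FIM tool would typically also track attributes such as ownership and permissions; the hash alone is enough to show the idea.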

That’s fine as long as I never hire anyone, but what happens when a new web developer comes on board?  I need to add that user’s account to both the pristine system and each of the web servers and update the baseline to match the new passwd file.  But doesn’t that leave a short period where the systems being checked may not match the baseline?  Yes.

Let’s rethink this.  The problem is that for anywhere from a few minutes to several hours there may be two valid passwd files: the old one without the new developer’s account and the new one with it.  Why not keep the old baseline (so that any machines with the old file match the old baseline) and just add a second baseline with the new file (so that any machines with the new file match the new baseline)?

This approach of having multiple baselines acknowledges that there are times when there are legitimate reasons to have two or more correct versions of a file.  Should you find this useful, our file integrity monitoring module can handle scenarios like this.
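As a rough illustration of the two-baseline idea, the sketch below treats a file as acceptable when its hash appears in a set of baseline hashes.  The passwd contents, account names, and function names are invented for the example, and this is not how Halo stores baselines internally.

```python
import hashlib


def signature(content):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(content).hexdigest()


def is_acceptable(content, baseline_signatures):
    """A file passes if it matches ANY of the remembered baselines."""
    return signature(content) in baseline_signatures


# Hypothetical passwd contents before and after the new hire.
old_passwd = b"root:x:0:0:root:/root:/bin/bash\n"
new_passwd = old_passwd + b"newdev:x:1001:1001::/home/newdev:/bin/bash\n"

# Keep the old baseline and add a second one for the new file.
baselines = {signature(old_passwd), signature(new_passwd)}

# An account nobody approved matches neither baseline.
tampered = old_passwd + b"intruder:x:0:0::/root:/bin/bash\n"
```

With both baselines in place, servers that have not yet received the new account and servers that already have it both pass, while the tampered file still raises an alert.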

Distributing new content

In the previous example we had a single file that was being updated, and one could make the argument that it takes a short period of time to get all the systems updated.  What about a large tree of files distributed to a large number of systems?

Let’s say I have a tree of thousands of HTML files to send to each of my web servers.  The simple act of getting that many files onto the target systems could take hours or days.

If I load the new content onto my golden master system and steadily clone new copies of that while I retire the old cloud servers, I have the same problem as above – there will be cloud servers that perfectly match the old baseline and cloud servers that perfectly match the new baseline while I’m cloning and retiring.

If I push the content out to my existing servers with rsync, scp, unison, or some other tool that updates the tree file-by-file, I end up with an even odder problem.  While I’m sending files to a given system, some of the individual files match the old baseline while other files match the new baseline.

We’ve set up our multiple baseline feature to support both distribution approaches seamlessly.  Each file being tested has to match its corresponding file in any of the baselines, as opposed to requiring that all tested files on a server match the same baseline.  That should reduce or eliminate false positives when a FIM scan is run while files are being updated.
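The difference between per-file matching and whole-server matching is easier to see in code.  The sketch below is only an illustration under simple assumptions: each baseline is a dictionary mapping a path to a signature, the signatures are placeholder strings, and the function names are hypothetical rather than part of Halo.

```python
def failing_files(server_files, baselines):
    """Return paths whose signature matches that path in none of the baselines."""
    failures = []
    for path, sig in sorted(server_files.items()):
        if not any(baseline.get(path) == sig for baseline in baselines):
            failures.append(path)
    return failures


def matches_single_baseline(server_files, baselines):
    """Stricter rule: every file on the server must match the SAME baseline."""
    return any(server_files == baseline for baseline in baselines)


old = {"/var/www/index.html": "a1", "/var/www/about.html": "b1"}
new = {"/var/www/index.html": "a2", "/var/www/about.html": "b2"}

# Halfway through an rsync: one file updated, one not yet.
mid_sync = {"/var/www/index.html": "a2", "/var/www/about.html": "b1"}

# A file that matches neither generation.
tampered = {"/var/www/index.html": "a2", "/var/www/about.html": "zz"}
```

Under the per-file rule the half-synced server produces no alerts, whereas the stricter single-baseline rule would flag it as a false positive.  The tampered file is reported under either rule.  Files that are missing from the server are not handled in this sketch.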

Updating packages on a running cloud server

We run into a similar scenario when we update software on the system.  Let’s say that we monitor the “/usr/bin/” directory on our load balancers.  The baseline we ran against our golden master on October 22nd reflects the packages that were installed then.  When we decide to apply system patches, the replaced files will no longer match the baseline.

Much like the above, we can apply all our patches to our golden master first and add, say, a second “November 4” baseline.  From that point on any files on our cloud servers need to match either the original “October 22” baseline or the new “November 4” baseline, and the only alerts we’ll get are for files that match neither.

Once we have all cloud servers patched and deployed we can delete the “October 22” baseline.  That has the added bonus that we’ll get alerted from that point on if any cloud servers show up with the old software – perhaps because we accidentally launched a cloud server from an old golden master.
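Here is a small sketch of what retiring a baseline does, again using made-up signatures and a hypothetical alerts() helper rather than anything from Halo itself.  The baseline names follow the example above; the file path and signature strings are assumptions.

```python
def alerts(server_files, baselines):
    """List files that match none of the currently active baselines."""
    return [
        path
        for path, sig in sorted(server_files.items())
        if not any(files.get(path) == sig for files in baselines.values())
    ]


baselines = {
    "October 22": {"/usr/bin/openssl": "sig-old"},
    "November 4": {"/usr/bin/openssl": "sig-new"},
}

# A server that was launched from the old golden master.
unpatched_server = {"/usr/bin/openssl": "sig-old"}

# While both baselines are active, the unpatched server passes.
alerts_during_rollout = alerts(unpatched_server, baselines)

# After the rollout we delete the old baseline, and the same server now alerts.
del baselines["October 22"]
alerts_after_cleanup = alerts(unpatched_server, baselines)
```

The server itself did not change between the two checks; only the set of acceptable versions did.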

Many shops choose not to directly patch running cloud servers, but rather prefer to launch new cloud servers based off a new, fully patched and tested golden master, while retiring the old images.  This approach works fine for that scenario as well; there’s no functional difference between the two as far as Halo FIM is concerned.

Mail server running on multiple Linux distributions

In an ideal world all our cloud servers would run on the same underlying operating system, for example, Red Hat Enterprise Linux 6.1.  When we check critical system files, such as the “/sbin/” directory, all “/sbin/” files on our cloud servers should match our RHEL 6.1 golden master.

When it comes time to migrate to RHEL 6.2, we create the RHEL 6.2 golden master and add a baseline for it.  Now the “mail server” cloud systems will pass their FIM scans if they have either the RHEL 6.1 or the RHEL 6.2 files in /sbin/.  We leave both baselines active during the migration.

When the migration is done, we delete the RHEL 6.1 baseline, and just like above we get the benefit of being alerted in the future if a RHEL 6.1 cloud server shows up.

This technique also works for migrating to a completely different Linux distribution – perhaps moving from Fedora to Ubuntu.  Put in a baseline for each currently valid distribution during the migration, and when done, delete all baselines that are no longer used.

Hint: In any Halo FIM policies that look at files provided by the operating system vendor, add the target “/etc/*release*”.  Most Linux distributions use a file like “/etc/fedora-release” or “/etc/centos-release” whose content shows the distribution name and version.  By checking that file, you’ll get an alert if your cloud server doesn’t seem to match the operating system of any of your baselines.
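If you're curious what that wildcard picks up on a given machine, the following sketch expands the same pattern with Python's standard glob module.  It only illustrates the pattern; how Halo itself expands targets is not shown here, and the release_files() helper is a name invented for this example.

```python
import glob
import os


def release_files(etc_dir="/etc"):
    """Map each regular file matching *release* in etc_dir to its text content."""
    found = {}
    for path in sorted(glob.glob(os.path.join(etc_dir, "*release*"))):
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as handle:
                found[os.path.basename(path)] = handle.read().strip()
    return found


if __name__ == "__main__":
    # Print the distribution identification files found on this machine.
    for name, text in release_files().items():
        print(name)
        print(text)
```

Because the content of these files differs between distributions and versions, a server running an unexpected operating system will fail to match any baseline on this target.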

Long term use of multiple distributions

The above examples assume you’re migrating from one Linux distribution to another, with a near-term end date for the old systems.  While somewhat less common, it’s possible you have – and intend to keep – multiple Linux distributions for the long term.

Perhaps you don’t want to run a completely homogeneous environment, such as having all your servers based on Ubuntu 12.04 – even though administration is easier when they are identical.  In that case, you could reasonably run ¾ of your servers on Ubuntu 12.04 and ¼ on CentOS 6.2.  Should someone come up with an attack that crashes the web server on, say, Ubuntu 12.04, the different code base on CentOS 6.2 increases the chances that those servers will stay up.

Multiple baselines serve you well in the long term here too.  Create a baseline for each operating system.  Both stay active for as long as you have live cloud servers based on them.

Expiration – regular release of new files

Let’s say you know you send out new copies of your web scripts once a month on the 15th.  This is a perfect use of multiple baselines as there will be overlap while the new files are being loaded.

This is also a good time to consider the expiration date of a baseline.  When the baseline is created you can give it an expiration date: Never, 30 days, 90 days, 180 days, or custom.  For this example we’d want to add a few days at the beginning and end, so we’d end up with perhaps a 40-day custom expiration.  At the end of the 40-day period, when your cloud servers should be using the next month’s code, you’ll get alerts for any systems that haven’t been updated.
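The date arithmetic behind a 40-day expiration can be sketched as follows.  The year, the baseline names, and the rule that a baseline is still valid on its expiration day are all assumptions made for the example; this is not Halo's internal logic.

```python
from datetime import date, timedelta


def active_baselines(baselines, today):
    """Keep baselines that have not expired; an expiry of None means Never."""
    return [
        b["name"]
        for b in baselines
        if b["expires"] is None or today <= b["expires"]
    ]


baselines = [
    # Created on the 15th, each with a 40-day custom expiration.
    {"name": "November scripts", "expires": date(2023, 11, 15) + timedelta(days=40)},
    {"name": "December scripts", "expires": date(2023, 12, 15) + timedelta(days=40)},
]

# Five days after the December release both generations are still accepted.
during_overlap = active_baselines(baselines, date(2023, 12, 20))

# Once the November baseline has expired, only December's files pass.
after_expiry = active_baselines(baselines, date(2023, 12, 26))
```

Any server still running November's scripts after that baseline expires no longer matches an active baseline, which is what triggers the alert.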

The above scenarios all fall under one big umbrella: there are times when the files you compare with Halo FIM scans won’t all match a single baseline.  The reasons vary, and the time span where multiple baselines are needed could range from a few minutes to permanently, but Halo’s FIM module can handle them all.
