I’m working with a group of 30 cloud-based web servers, and one of them is a little different. Perhaps it has a few extra installed packages, an additional listening port, a new account I don’t recognize, or lots of network traffic to a remote host. What reaction would each of these elicit from me?
Something’s not right.
With servers built from the same process or server image and maintained in parallel, there are very few legitimate reasons why they should differ from each other. At the mundane end of the scale, another admin may have installed a package or started up a daemon to do some troubleshooting. More serious causes include problems with the build or patching process, or a server break-in.
We can use any differences as starting points for investigation: just why is this extra package installed on 2 of these servers, but not the rest? To get to that point, though, we first need a way to identify the differences.
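To make "identify the differences" concrete, here is a minimal sketch of one way to find items that some servers in a group have and others lack. This is not OOTT's own code: the server names and account lists are invented sample data, and `partial_items` is a hypothetical helper written just for this illustration.

```ruby
require 'set'

# Hypothetical inventory: server name => accounts found on that server.
# (Invented sample data; a real run would pull this from scan results.)
ACCOUNTS = {
  'web01' => %w[root www-data asmith],
  'web02' => %w[root www-data],
  'web03' => %w[root www-data asmith tjames]
}.freeze

# Return { item => [servers that have it] } for every item that is
# present on at least one server but not on all of them.
def partial_items(inventory)
  holders = Hash.new { |hash, key| hash[key] = [] }
  inventory.each do |server, items|
    # to_set guards against an item being listed twice for one server.
    items.to_set.each { |item| holders[item] << server }
  end
  holders.reject { |_item, servers| servers.size == inventory.size }
end

partial_items(ACCOUNTS).each do |account, servers|
  puts "Account #{account} exists on #{servers.size} of #{ACCOUNTS.size} servers: #{servers.join(', ')}"
end
```

Anything present on every server is filtered out, so what remains is exactly the list of questions worth asking.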
OOTT (One Of These Things is not like the others…)
While it sounds like the noise made by a cartoon character, OOTT is a Ruby program that collects information about your Halo-managed systems and presents an HTML view of each group of machines, plus one final report summarizing all of them. For a simple aspect of a system (such as “Account tjames exists”), it shows you how many systems have it.
To find out which servers have the asmith and tjames accounts, click on the number of servers to get the list of server names.
When the servers can have different observed values, such as installed package versions, the report summarizes those as well.
The report also shows any Configuration Policy rules and checks that fail.
The beauty here is that the report automatically summarizes all of the Configuration Policy rules and checks that need attention. Given the wide range of Configuration checks one can perform, this turns out to be a remarkably rich source of information.
And finally, I’m glad to report that these 2 servers do match in some of their aspects.
The connecting IP parent domain is a quick way to spot machines that are running in a network you hadn’t expected.
How does this help?
The real benefit of a report like this shows up when you manage more than a few systems. OOTT does the work of summarizing a large number of server aspects; you can look at a report like this and ask “Why do 10 of my main servers have postfix 2.5.4 installed, and 2 of them have postfix 2.3.1?” If you double the number of machines in a group, the report stays just as easy to review. This means less manual effort to identify outliers.
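The postfix question above comes down to grouping servers by the value they report. The sketch below shows one way to do that under stated assumptions: the version data is invented to echo the example, and `summarize_values` is a hypothetical name, not a function taken from OOTT.

```ruby
# Hypothetical observations: server name => installed postfix version.
# (Invented sample data, echoing the postfix example in the text.)
POSTFIX_VERSIONS = {
  'web01' => '2.5.4',
  'web02' => '2.5.4',
  'web03' => '2.3.1',
  'web04' => '2.5.4',
  'web05' => '2.3.1'
}.freeze

# Group servers by the value they report, most common value first.
# Returns an array of [value, [servers]] pairs.
def summarize_values(observations)
  observations
    .group_by { |_server, value| value }
    .map { |value, pairs| [value, pairs.map(&:first)] }
    .sort_by { |value, servers| [-servers.size, value] }
end

summary = summarize_values(POSTFIX_VERSIONS)
summary.each do |version, servers|
  puts "postfix #{version}: #{servers.size} server(s) (#{servers.join(', ')})"
end
puts 'All servers agree.' if summary.size == 1
```

The summary has one row per distinct value no matter how many servers are in the group, which is why doubling the group size does not make the report harder to read.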
The pages are organized from highest priority to lowest, so you know where to start. The top half of the report holds the aspects where the servers disagree. The bottom half holds the ones where they all agree. Within each half, aspects are listed starting with critical+bad, then bad, indeterminate, and finally good.
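That ordering is a two-level sort: disagreement first, then severity. Here is a small sketch of how such an ordering could be expressed. The `Aspect` struct, the severity symbols, and the sample rows are all assumptions made for illustration; OOTT's internal representation may look quite different.

```ruby
# Severity ranks in the order the text describes (worst first).
# The symbol names are this sketch's own labels, not OOTT identifiers.
SEVERITY_RANK = {
  critical_bad: 0,
  bad: 1,
  indeterminate: 2,
  good: 3
}.freeze

# One row of the report: an aspect, its severity, and whether every
# server in the group reported the same value for it.
Aspect = Struct.new(:name, :severity, :all_agree)

# Disagreements sort ahead of agreements; within each half, worse
# severities sort first. Name is a final tie-breaker for stable output.
def report_order(aspects)
  aspects.sort_by do |aspect|
    [aspect.all_agree ? 1 : 0, SEVERITY_RANK.fetch(aspect.severity), aspect.name]
  end
end

aspects = [
  Aspect.new('Kernel version', :good, true),
  Aspect.new('Account tjames exists', :bad, false),
  Aspect.new('Firewall rule check', :critical_bad, true),
  Aspect.new('postfix version', :indeterminate, false)
]

report_order(aspects).each do |aspect|
  half = aspect.all_agree ? 'agree' : 'differ'
  puts format('%-8s %-14s %s', half, aspect.severity, aspect.name)
end
```

Using `fetch` for the severity lookup means an unknown severity raises an error instead of silently sorting to an arbitrary position.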
As you work your way down the list, finding and fixing discrepancies, you can ask the Portal to run a new scan on the systems you’ve updated, then rerun the OOTT report. The aspects that have been fixed will fall off so you can focus on the next issues.
Giving it a try
OOTT and its associated readme and library are available at both the CloudPassage Toolbox on GitHub and http://www.stearns.org/halo-api/. The install takes just a few minutes and requires Ruby and a few Ruby support libraries.
We hope it’s useful to you. Let us know if you have suggestions for improving it!