Logs are great for historical searches, but when I’m wearing my sysadmin hat I want to know about certain types of problems immediately, and I’d like a way to put less important events into a ticketing system. Alerts fit the bill.
To start, we’ll create an Alert Profile (under Portal > Policies > Alert Profiles) with a list of people that need to be alerted. The profile you create will apply to all alerts created by a server group, so think about who needs to get those alerts. For example, your web server group might send alerts to the webmasters and the system administration team, while the SQL server farm sends alerts to the DBAs and the sysadmins.
As you select users to receive the alerts by checking off their names, think about whether they should get Critical alerts, Non-critical alerts, or both. The easiest way I’ve found to decide – and the yardstick I use when I need to decide whether a given Configuration Policy Rule or Special Event Policy entry needs to be critical or not – is to ask two questions: “Is it worth waking someone up at 2AM for this?” and “Are these the right people to wake up at 2AM?”
As you’re putting the Alert Profile together, you may want to add new accounts (under Settings > Site Administration > Users) for “special” alert recipients. For starters, if your organization’s internal ticketing system can automatically create a trouble ticket from an incoming email, give that email address a user account and add it to the Alert Profile. By the time your pager goes off, there’s already a trouble ticket created and you can add more information as you find it.
Speaking of pagers… 🙂 Because the Portal can send alerts to any email address, you can almost certainly have your alerts go to one or more pagers or cell phones. Pager companies can tell you what email address to use to reach a particular pager. For cell phones, there’s already a page that describes how to construct an email address to reach a cell phone from a given provider:
For both pagers and cell phones, be mindful that the Portal does not do rate limiting. If you get into a situation where a large number of alerts are going out – think an outage of a major data center – email accounts usually don’t mind, but you may end up getting charged a significant amount for pages or text messages over your limit. This has never happened to me *cough* *cough*.
If you find you’re getting a significant number of pages/text messages during a major or sustained outage, you can temporarily disable pagers and cell phones by just unchecking them in the Alert profile, and then re-enable them when things return to normal (make yourself a reminder!).
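If you relay alerts to pagers or phones through a mail script of your own, you could also put a cap on how many get through. What follows is only a minimal sketch of a sliding-window limiter in Python; it is not a Portal feature, and the class name `PageRateLimiter` and its parameters are made up for illustration.

```python
import time
from collections import deque


class PageRateLimiter:
    """Hypothetical helper: allow at most max_pages per window_seconds."""

    def __init__(self, max_pages, window_seconds, clock=time.monotonic):
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of recently forwarded pages
        self.suppressed = 0     # how many pages were held back

    def allow(self):
        """Return True if one more page may be forwarded right now."""
        now = self.clock()
        # Forget pages that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_pages:
            self.sent.append(now)
            return True
        self.suppressed += 1
        return False


# Example: forward at most 10 pages in any 15-minute stretch.
limiter = PageRateLimiter(max_pages=10, window_seconds=15 * 60)
```

Your relay would call `limiter.allow()` before forwarding each alert to the pager address and skip the forward when it returns `False`. The ticketing system and regular email accounts can still receive everything, so nothing is lost.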
When you have your alert profile set up with the users you want, press “Save” at the bottom. You also need to assign it to a group (or multiple groups) of servers. To do this, press “Servers” to get the server list, scroll down to the server group you want, click on the group and press “Edit Details”. Select the drop-down box next to “Alert Profiles” and select the profile you just created. Alerts from this Server Group will now go to all the people in this Alert Profile (but you still get to filter Critical messages to some of them and Non-critical to others). This profile can be assigned to as many server groups as you’d like.
If you don’t set up an alert profile for one or more groups, all alerts from that group go to all the site administrators on the account. In an organization where all your admins are responsible for all systems, this default behaviour may be exactly what you want.
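To summarize who ends up receiving an alert, here is a small Python model of the behaviour described above. It is a sketch of the routing rules as I understand them, not the Portal’s code or API; every name in it (`Recipient`, `AlertProfile`, `recipients_for`, the example addresses) is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Recipient:
    email: str
    critical: bool = True       # checked for Critical alerts
    non_critical: bool = True   # checked for Non-critical alerts


@dataclass
class AlertProfile:
    name: str
    recipients: list = field(default_factory=list)


def recipients_for(group, is_critical, group_profiles, site_admins):
    """Return the addresses that should receive an alert from a server group."""
    profile = group_profiles.get(group)
    if profile is None:
        # No Alert Profile assigned: everything goes to all site administrators.
        return list(site_admins)
    return [
        r.email
        for r in profile.recipients
        if (r.critical if is_critical else r.non_critical)
    ]


# Hypothetical example data.
web_profile = AlertProfile("Web servers", [
    Recipient("webmaster@example.com", critical=True, non_critical=True),
    Recipient("oncall-pager@example.com", critical=True, non_critical=False),
    Recipient("tickets@example.com", critical=False, non_critical=True),
])
group_profiles = {"web": web_profile}
site_admins = ["admin1@example.com", "admin2@example.com"]
```

With this data, a Critical alert from the `web` group reaches the webmaster and the on-call pager, a Non-critical one reaches the webmaster and the ticketing address, and any alert from a group with no profile falls through to both site administrators.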