In the past two weeks I have:
- rebuilt 2 MOM environments
- repaired 1 MOM data warehouse
- repaired 1 SCOM environment
And I have learnt that:
1. I really don’t like stress very much, but then, who really does?
2. You need to pay careful attention to your antivirus exclusion lists, especially if you are running McAfee in a non-default configuration. This is a pretty complete list of the exclusions required, both for MOM and SCOM. When we deployed SCOM in this environment, we added the exclusions, but for some reason McAfee did not exclude sub-directories until the exclusion was changed to:
*.*\Program files\System Center Operations Manager 2007\*.*
Once this was done, most of our problems with agents consuming excessive CPU went away.
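To make the difference concrete, here is a minimal sketch of "folder only" versus "folder and sub-directories" matching. It is only a model of the behaviour we saw, not McAfee's actual matching logic; the function name `is_excluded` and the sample file paths are made up for illustration.

```python
from pathlib import PureWindowsPath


def is_excluded(file_path: str, excluded_dir: str, include_subfolders: bool) -> bool:
    """Return True if file_path is covered by an exclusion on excluded_dir.

    Hypothetical model: a non-recursive exclusion covers only files that sit
    directly in excluded_dir; a recursive one covers everything below it.
    PureWindowsPath compares case-insensitively, as Windows paths do.
    """
    target = PureWindowsPath(file_path)
    root = PureWindowsPath(excluded_dir)
    if include_subfolders:
        return root in target.parents
    return target.parent == root


ROOT = r"C:\Program Files\System Center Operations Manager 2007"
NESTED = ROOT + r"\SubFolder\example.dat"  # hypothetical file in a sub-directory

print(is_excluded(NESTED, ROOT, include_subfolders=False))  # still scanned
print(is_excluded(NESTED, ROOT, include_subfolders=True))   # excluded
```

Anything the agent writes below the install directory falls into the first case unless the exclusion is recursive, which would fit the CPU symptoms we saw.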
3. If the health service on your SCOM RMS server dies, nothing else works. The first symptoms are fewer agents reporting back to SCOM and fewer alerts visible in the console, even though you haven’t changed anything. When you start looking at the agents not reporting back, you may see the following event in the event log:
Event Type: Information
Event Source: OpsMgr Connector
Event ID: 21024
OpsMgr's configuration may be out-of-date for management group <MG Group Name>, and has requested updated configuration from the Configuration Service. The current(out-of-date) state cookie is "<cookie>"
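If you have exported event descriptions to text and want to pick out which management group and state cookie are affected, a small parser along these lines may help. It is a sketch that assumes the description text matches the wording above; the function name `parse_21024` and the sample values are hypothetical.

```python
import re

# Matches the description text of event 21024 as quoted above.
EVENT_21024 = re.compile(
    r'out-of-date for management group (?P<group>.+?), and has requested '
    r'updated configuration.*?state cookie is "(?P<cookie>[^"]*)"',
    re.DOTALL,
)


def parse_21024(description: str):
    """Return {'group': ..., 'cookie': ...} or None if the text does not match."""
    match = EVENT_21024.search(description)
    return match.groupdict() if match else None


# Hypothetical sample values, for illustration only.
sample = (
    "OpsMgr's configuration may be out-of-date for management group MG1, "
    "and has requested updated configuration from the Configuration Service. "
    'The current(out-of-date) state cookie is "AB CD"'
)
print(parse_21024(sample))
```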
You may also see this event in the event log of the RMS server. Restarting services doesn’t help. Steve Rachui’s blog has some information about this event and how to resolve it if it only happens on your agents. But if it happens on your RMS server, try the following:
- Stop the System Center Management service on your RMS server
- Browse to the Health Service configuration folder in the install directory
- Delete (or move) all the files in this directory
- Start the System Center Management service
The RMS server will update its configuration, and then start dishing out configurations to the agents. We did notice that on Windows 2008 agents we needed to restart the agent service to force a configuration update, but within 30 minutes 95% of the 600 agents were reporting back happily and alerts started flowing in again.
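The stop, move, start steps above can be sketched as a small script. Treat it as a rough outline under a few assumptions: the service is addressed by the display name used in this post via `net stop` / `net start`, the folder paths are supplied by you (they are not hard-coded here), and the files are moved to a backup folder rather than deleted. The function name and the `run` parameter are my own, added so the logic can be tried without touching a real service.

```python
import shutil
import subprocess
from pathlib import Path

# Display name as used in this post; adjust if your service is named differently.
SERVICE = "System Center Management"


def flush_health_service_config(config_dir, backup_dir, run=subprocess.run, service=SERVICE):
    """Stop the service, move everything out of config_dir, start the service.

    config_dir and backup_dir are caller-supplied paths (assumptions, not
    values taken from the product). Returns the sorted names that were moved.
    """
    config_dir, backup_dir = Path(config_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    run(["net", "stop", service], check=True)
    moved = []
    try:
        # Move rather than delete, so the old state can be restored if needed.
        for entry in list(config_dir.iterdir()):
            shutil.move(str(entry), str(backup_dir / entry.name))
            moved.append(entry.name)
    finally:
        # Always try to bring the service back, even if a move failed.
        run(["net", "start", service], check=True)
    return sorted(moved)
```

Run it from an elevated prompt on the RMS server, passing the configuration folder and a backup location of your choice. Moving instead of deleting costs a little disk space but leaves a way back.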
4. It would seem that installing the 32-bit SCOM R2 agent on 64-bit servers may have a negative performance impact. Deploying the 64-bit agent instead resolved it instantly.