Monday, September 24, 2012

Hi Guys,

I ran into one more SCOM issue that took many hours of troubleshooting before I could identify the cause and resolve it, so I thought I would share it here. It applies to SCOM 2007 R2.

Microsoft SCOM 2007 R2 Enterprise - Entire Management Group randomly becomes greyed out

Setup - All nodes are virtual machines
OS - Microsoft Windows Server 2008 R2 Enterprise
App - Microsoft System Center Operations Manager 2007 R2 Enterprise
DB - Microsoft SQL Server 2008 R2

At random times, the entire management group becomes greyed out in the SCOM 2007 R2 environment. However, the health service on each monitored system is still healthy.

In this case, I followed the step-by-step troubleshooting below to identify the cause and the action needed to resolve it.

1. Start a perfmon capture with the necessary counters, or use the commands below.
Logman.exe create counter Perf-1Sec -f bincirc -max 500 -c "\LogicalDisk(*)\*" "\Memory\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*"  "\Server\*" "\System\*" "\Process(*)\*" "\Processor(*)\*"  "\Cache\*" -si 00:00:01 -o C:\PerfMonLogs\Perf-1Sec.blg
Logman.exe start Perf-1Sec   

2. Wait for the problem to occur again, then stop the perfmon log. You can use either the Perfmon console or "Logman.exe stop Perf-1Sec".

3. Create a dump file for the health service, on the RMS first.
A. Open Task Manager, right-click HealthService.exe, and create a dump file.
B. After the dump file is created, go to the temp folder and copy the dump file to a safe location. After an OS restart, these dump files could be cleaned up.

4. Get a SCOM trace.
A. Stop the health service. Try to stop it from services.msc first; if the process hangs while stopping, create another dump of it and then terminate it using Task Manager.
B. After the health service has stopped, open a command prompt, go to the "c:\program files\System Center Operations Manager 2007\Tools" folder, and run the following commands:
Del c:\windows\temp\OpsMgrTrace\*.*
StartTracing VER

NOTE: VER is case sensitive.
C. Start the health service and wait 10 minutes to check whether the service recovers.
D. If it does not, stop tracing and capture the trace:
E. Collect all the log files under c:\windows\temp\OpsMgrTrace.

5. Once all the data is captured and you are ready to reboot the system to recover, trigger a blue screen (bug check) on the system instead of restarting it normally. The system will then crash and dump all memory to C:\Windows\memory.dmp. This file records the state of the entire OS.

Note - If the dump cannot be captured using a bug check, please check whether this method works:
Click HERE

After the analysis, the suspicion was that the issue was not on the SCOM side but in SQL performance, in conflict with a backup job running at the same time.
So I also captured perfmon logs on all the database servers, and those logs showed a disk latency issue.
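
To make the disk latency check repeatable, here is a minimal Python sketch. It assumes the .blg log has first been converted to CSV with relog (for example "relog Perf-1Sec.blg -f csv -o Perf-1Sec.csv", adjusting the file name to yours). The 25 ms threshold, the server name SQL01 and the sample rows are my own assumptions for illustration, not values from this case.

```python
import csv

# Assumed threshold: 25 ms per I/O. Tune this for your own storage.
LATENCY_THRESHOLD_SEC = 0.025
LATENCY_COUNTERS = (
    "Avg. Disk sec/Read",
    "Avg. Disk sec/Write",
    "Avg. Disk sec/Transfer",
)


def find_latency_spikes(lines, threshold=LATENCY_THRESHOLD_SEC):
    """Return (timestamp, counter_path, seconds) for samples above threshold."""
    reader = csv.reader(lines)
    header = next(reader)
    # Column 0 holds the timestamp; keep only the disk latency columns.
    wanted = [i for i, name in enumerate(header)
              if i > 0 and name.endswith(LATENCY_COUNTERS)]
    spikes = []
    for row in reader:
        for i in wanted:
            if i >= len(row):
                continue
            try:
                value = float(row[i])
            except ValueError:
                continue  # blank or non-numeric sample, skip it
            if value > threshold:
                spikes.append((row[0], header[i], value))
    return spikes


# Hypothetical sample in the CSV layout relog produces (SQL01 is made up).
sample = [
    r'"(PDH-CSV 4.0) (UTC)(0)","\\SQL01\PhysicalDisk(1 D:)\Avg. Disk sec/Write","\\SQL01\Memory\Available MBytes"',
    '"09/24/2012 00:05:01.000","0.004","2048"',
    '"09/24/2012 00:05:02.000","0.180","2040"',
    '"09/24/2012 00:05:03.000"," ","2039"',
]

for when, counter, seconds in find_latency_spikes(sample):
    print(f"{when}  {counter}  {seconds * 1000:.0f} ms")
```

For a real log, pass an open file object instead of the sample list, for example find_latency_spikes(open("Perf-1Sec.csv", newline="")).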

Looking at the system, even though the management group is greyed out, it is the Health Service Watcher objects that are grey; the health service running on each monitored system is still healthy. Since the watcher objects run on the RMS, the problem is more likely related to the state of the RMS.
Looking at the problem history, although the issue is reported at random, it is most often reported around midnight. That is usually the time for backup tasks.
Before the RMS reports an error, the SCOM SDK service always reports an error connecting to the remote DB, so we checked the tasks on the SQL side. In most cases, the RMS error time matches the DB backup schedule. As a test, we disabled the midnight backup job, and the RMS problem disappeared. A further check of the SQL server found disk latency. We therefore believe the RMS issue is not caused by SCOM configuration; it is a victim of SQL performance.
After fixing the SQL latency problem, SCOM has been stable.
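
The check of error times against the backup schedule can also be done with a few lines of Python instead of by eye. This is only a sketch: the backup window and the error timestamps below are hypothetical, and in practice you would fill them in from the SQL Agent job schedule and the Operations Manager event log.

```python
from datetime import datetime, time


def in_window(ts, start, end):
    """True if the time of day of ts is in [start, end); the window may cross midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def share_in_backup_window(error_times, start, end):
    """Fraction of error timestamps that fall inside the backup window."""
    if not error_times:
        return 0.0
    hits = sum(1 for ts in error_times if in_window(ts, start, end))
    return hits / len(error_times)


# Hypothetical backup window (23:30 to 01:30) and SDK error times.
backup_start, backup_end = time(23, 30), time(1, 30)
errors = [
    datetime(2012, 9, 20, 0, 12),
    datetime(2012, 9, 21, 23, 48),
    datetime(2012, 9, 22, 14, 5),
    datetime(2012, 9, 23, 0, 40),
]

ratio = share_in_backup_window(errors, backup_start, backup_end)
print(f"{ratio:.0%} of SDK errors fall inside the backup window")
```

A high share does not prove the backup is the cause, but it is a good reason to run the same test we did: disable the job for a night and see whether the problem goes away.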

Reference - Here is the process to promote an MS to RMS:  Click HERE

Friday, September 14, 2012

How to check which GPOs are applied and which registry values a GPO changes

Hi Guys,
I am adding one more article here because I feel it will be beneficial for all of us who do administrative work on the Microsoft platform. Many of us have worked or are working with Group Policy. I have worked with it for many years myself, but interestingly, I had never looked at which registry values are changed when Group Policy Objects are applied to a server.

You can open the Run box from the Start menu and enter "RSOP.MSC", which opens a separate window for Resultant Set of Policy where you can see all the policies applied to the box.

Once the console opens you will be able to see which settings have been applied to your PC.
Note: Only settings that have been applied to your machine and user account will show up.

You can also use the command prompt, as many of us love it. When using the command line, note that you have to specify the scope of the results. To find all the policies that are applied to your user account, you would use the following command:
"gpresult /Scope User /v"  (You can save the output to a text file by appending >filename.txt)

Then, if you scroll down, you will see the Resultant Set Of Policies for User section.

If you are looking for all policies applied to your Computer, all you need to do is change the scope:
"gpresult /Scope Computer /v"

If you scroll down, you will now see that there is a Resultant Set Of Policies for Computer section.
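
If you saved the output to a text file, a small Python sketch like the one below can pull out just the GPO names so you do not have to scroll. It assumes the report lists the names as indented lines under an "Applied Group Policy Objects" heading and ends the list with a blank line; that layout and the sample GPO names are my assumptions, so check them against your own output.

```python
def applied_gpos(report_text):
    """Collect the names listed under each 'Applied Group Policy Objects' heading."""
    names = []
    lines = iter(report_text.splitlines())
    for line in lines:
        if line.strip() != "Applied Group Policy Objects":
            continue
        # Read the entries that follow the heading, up to the first blank line.
        for entry in lines:
            text = entry.strip()
            if text and set(text) == {"-"}:
                continue  # dashed underline below the heading
            if not text:
                break
            names.append(text)
    return names


# Hypothetical excerpt of saved gpresult output (GPO names are made up).
sample_report = """\
COMPUTER SETTINGS
------------------
    Applied Group Policy Objects
    -----------------------------
        Default Domain Policy
        Server Baseline

    The following GPOs were not applied because they were filtered out
    -------------------------------------------------------------------
        Local Group Policy
"""

print(applied_gpos(sample_report))
```

To use it on a real report, read the saved file first, for example applied_gpos(open("filename.txt").read()).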

Now the question is: how do we check which registry settings are changed when a Group Policy Object is modified? For that we can use a fantastic tool, Process Monitor.

You can download it from Microsoft Sysinternals.

Then extract and run it locally.
When Proc Mon opens, you will need to add a condition as follows:
"Process Name is mmc.exe then Include"
Then click the add button.

To get only the registry values that are changed, we need to add another one:
"Operation is RegSetValue then Include"
Then again click the add button.

Once the two rules have been added, you can go ahead and click OK.

Now go and open the Group Policy setting that you wish to edit.

Before you actually change the setting, switch back over to Proc Mon and clear the log.

Then go and change the GPO and click apply.

If you switch over to Proc Mon, you will see one or more registry entries there. Right-click one and select the Jump To… option from the context menu.

That will fire up Regedit and take you to the exact key that was modified.
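
If you want to keep a record of the change, you can also save the Proc Mon log as CSV and apply the same two filters offline. Here is a minimal Python sketch that assumes the default column names ("Process Name", "Operation", "Path", "Detail"); the registry path and values in the sample are hypothetical.

```python
import csv


def registry_writes(lines, process="mmc.exe"):
    """Return (path, detail) for RegSetValue events raised by the given process."""
    rows = csv.DictReader(lines)
    return [
        (row.get("Path", ""), row.get("Detail", ""))
        for row in rows
        if (row.get("Process Name") or "").lower() == process
        and row.get("Operation") == "RegSetValue"
    ]


# Hypothetical export; the key and values are made up for illustration.
sample_export = [
    '"Time of Day","Process Name","PID","Operation","Path","Result","Detail"',
    r'"10:15:01.1","mmc.exe","4242","RegSetValue","HKLM\SOFTWARE\Policies\Example\Setting","SUCCESS","Type: REG_DWORD, Length: 4, Data: 1"',
    r'"10:15:01.2","mmc.exe","4242","RegQueryValue","HKLM\SOFTWARE\Policies\Example\Setting","SUCCESS","Type: REG_DWORD, Length: 4, Data: 1"',
    r'"10:15:01.3","svchost.exe","900","RegSetValue","HKLM\SOFTWARE\Example\Other","SUCCESS","Type: REG_SZ, Length: 4, Data: x"',
]

for path, detail in registry_writes(sample_export):
    print(path, "->", detail)
```

When reading a real export, open the file with encoding="utf-8-sig" and newline="", since the file may start with a byte-order mark that would otherwise end up in the first column name.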