Wednesday, November 02, 2005

Possible file corruption for NTFS volumes on Windows Server 2003 Service Pack 1

I subscribe to the Windows Server 2003 Knowledge Base feed to see what issues have been discovered, whether my current environment might be impacted, and whether a fix is available. This has already saved me hours of troubleshooting, because I come across a KB article for something I know we have seen in our environment. Sometimes you get something scary, like this: "Potential file corruption problem on NTFS volumes during extensive stress tests in Windows Server 2003 Service Pack 1".

Microsoft explains in the KB article that the scenario that causes corruption is very rare in the real world. Still, this should scare anyone in a large environment using file services, because Microsoft does not say the probability is zero for the unlikely cases, just that it's low. Low! What does that mean? 1%, 5%? If you think you meet the likely criteria, you can either not install Windows Server 2003 SP1 (which is where the problem lies, though I think most organizations have already moved forward with deployment, since the "bake in" period is well over) or, uh, wait for MS to let you know when they have a hotfix.

I wonder if a root cause of the late discovery of this issue, relative to the release date for SP1, is the maintenance team building the service packs. I am sure they are all very bright people, but let's face it, the service pack team at Microsoft are not the starters. They're not the A-Team, they're the second string. Microsoft calls it the Windows Sustained Engineering group, at least as of 2003 in one reference I found. It is normally this group that produces hotfixes and service packs, but in extraordinary times, like for Windows XP SP2, the main Windows development team is brought back into the fold. Normally, think of this as two tracks: the main Windows development team, and the Sustained Engineering team for hotfixes and service packs. The people that wrote Windows are not usually the ones that fix Windows.
This isn't to say there is no interaction between the two teams. I don't know, of course, but it is a different team, without as much experience on the code when the work is done. Service packs are also not looked at as major Windows versions (XP SP2 excluded); they're just maintenance releases. I would put money on them being tested less rigorously, not by the test team, but by the people outside the project who really uncover the cruft in the code. Windows Server 2003 Service Pack 1 was based on XP SP2, but XP SP2 certainly wouldn't have been stress tested for NTFS corruption. The Windows XP KB feed has no mention of a potential NTFS corruption issue. I am not saying lack of testing or less experienced developers caused this issue, but I am saying NTFS stress tests didn't reveal any potential corruption in Windows Server 2003 RTM, and yet it's there in Service Pack 1.