mjfrank -> blocked tasks in Interix on Win 2K3 (Jan. 25, '06, 12:50:10 PM)
I have a system that performs batch processing for my company. It runs Windows Server 2003 with SFU 3.5 and, as far as I know, no service packs. (I'm a developer; the sysadmin is out of the office this week.) The source code is compiled under Interix 3.5.
The batch process runs in a ksh window and shells out to OS commands such as "rm", "mv", and "chmod" during the run, using the C "system()" call. The commands operate on files and directories that the batch process itself creates. The problem is that, at unpredictable intervals, these commands stop returning to the calling program. When that happens, the Windows Task Manager shows multiple ksh tasks where there was only one before, and "ps -eaf" in a ksh window shows the specific OS commands still sitting there as tasks (like zombies). At this point the batch process is unresponsive, waiting for system() to return.
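To make the call pattern concrete, here is a minimal sketch of how one of these calls could be wrapped so that the log shows which command was started and whether it ever came back. This is not our actual code: the helper name run_cmd and the log format are made up for illustration, and it assumes the standard system() and <sys/wait.h> status macros behave under Interix as they do on other UNIX systems.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper, not the real batch code: runs one shell command
 * through system() and returns its exit status, or -1 if the command
 * could not be started or did not exit normally. */
int run_cmd(const char *cmd)
{
    int status;

    /* Log before the call, so a hang leaves a "starting" line
     * with no matching "finished" line. */
    fprintf(stderr, "starting: %s\n", cmd);
    fflush(stderr);

    status = system(cmd);   /* blocks until the child shell exits */
    if (status == -1) {
        perror("system");
        return -1;
    }
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        fprintf(stderr, "finished: %s (exit %d)\n", cmd, code);
        return code;
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "killed: %s (signal %d)\n", cmd, WTERMSIG(status));
    }
    return -1;
}
```

A call such as run_cmd("chmod 644 somefile") would then leave a "starting" line in the log with no "finished" line if the command hangs, which at least identifies the command that blocked.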
If I kill the deepest child process, processing continues but then fails, because the hung command never actually completed.
The system has had this problem for at least a year that I am aware of, and has required reboots roughly once every couple of days. As of last week, however, it has suddenly gotten decidedly worse, and it can now only get through three or four jobs before the problem occurs. The only change I am aware of in that timeframe is that the Administrator password on our domain was changed, and this machine does log into the domain.
Any suggestions on where/how to attack this problem?
Any help you can provide is greatly appreciated.
Thanks,
Michael Frank