Thursday, 25 February 2010

A true HP-UX gem

Yesterday I stumbled across a tool that Pit and I wrote a few years ago. In honor of his birthday I decided to devote a blog entry to it (don't worry: the technical details are quite interesting).

The scenario: we were facing issues in our production environment with a multi-process server application that binds a UDP port to a multicast address on HP-UX. Every now and then the system would stop processing messages sent to its multicast address. After quite some time we realized that this occurred whenever one of the server application's processes terminated (i.e. crashed). Since this server application was a service provider on the corporate enterprise service bus, each such termination meant that the service bus had one fewer server (as in system) instance offering that particular service. That could turn into a real problem quickly, so we needed a solution that would allow quick recovery until the real issue was fixed.

The problem: HP-UX 11.11 had a bug in that it didn't account for multiple "participants" (processes bound to the same multicast address). In other words: if multiple processes on the server used the same multicast address and one of them terminated, HP-UX (incorrectly) instructed the network card to stop listening for incoming packets to that multicast address (well, to the corresponding MAC address). The remaining server processes would keep on waiting for incoming packets, but the network card just didn't accept them any more. The theoretical solution looked easy: as soon as a process bound itself to that multicast address, HP-UX would instruct the network card to listen for multicast packets again. The two obvious implementation alternatives weren't quite acceptable:

  • restart the entire process tree -> a restart took too long
  • start a dummy process that binds to the multicast address and puts itself to sleep (it must not terminate because that would reverse the intended effect) -> how many dummy processes would we have over the course of several months?

So, we needed another solution and we found one. HP-UX implements an interface named DLPI (Data Link Provider Interface), which allows low-level manipulation of network card state. We wrote a small tool that used DLPI to manually add the MAC address corresponding to the multicast address to the set of MAC addresses for which the network card should accept packets. This tool was run via cron every minute and the problem was "solved" until a fix became available for HP-UX (which happened with 11.20 if I remember correctly). Here's your birthday present, Pit: http://pastie.org/private/qeazuwrqnmar25hzo4reg (I am pretty sure you didn't have the sources for that tool any more)
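
For anyone wondering what "the corresponding MAC address" of a multicast address is: for IPv4 over Ethernet it is the fixed prefix 01:00:5e followed by the low 23 bits of the group address. The snippet below is only a minimal sketch of that mapping in Python, not the original tool. It does not touch DLPI at all, and the function name multicast_mac as well as the group address 239.1.2.3 are made up for illustration (the multicast address we actually used is not given here).

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Return the Ethernet MAC address that an IPv4 multicast group maps to.

    The mapping is 01:00:5e followed by the low 23 bits of the group address.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")

    # Keep only the low 23 bits; the top 9 bits of the IP address are dropped.
    low23 = int(addr) & 0x7FFFFF
    mac = bytes([
        0x01, 0x00, 0x5E,
        (low23 >> 16) & 0x7F,
        (low23 >> 8) & 0xFF,
        low23 & 0xFF,
    ])
    return ":".join(f"{octet:02x}" for octet in mac)


if __name__ == "__main__":
    # Hypothetical group address, chosen only for this example.
    print(multicast_mac("239.1.2.3"))  # prints 01:00:5e:01:02:03
```

Because only 23 bits survive, several group addresses share one MAC address (224.129.2.3 maps to the same value as 239.1.2.3), which is why the network card's filter works on MAC addresses rather than on IP multicast groups.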

Posted by Jürgen Pabel on 25 February 2010 at 00:00

 
