Discussion:
Cluster Aliases goes away
Marty Kuhrt
2008-08-06 21:34:27 UTC
Had an interesting problem pop up today. I restarted the MultiNet
server and the cluster aliases stopped working.

Process Software MultiNet V5.2 Rev A-X, AlphaServer DS10L 466 MHz,
OpenVMS AXP V7.3-2

All the patches up to around 31-MAR, the last reboot.

The OPCOM error message I received when trying to do a restart was:

%%%%%%%%%%% OPCOM 6-AUG-2008 14:00:44.94 %%%%%%%%%%%
Message from user MARTY on BLUE
MultiNet Server: Cluster Alias: Failed to register alias 172.17.17.238
status 49

Error 49? An odd error number? I tried to set the terminal width to
132, thinking a digit or two got chomped. Nope.

Eventually I rebooted (GASP) and it works fine now.

So what is an error 49?
Patrick Mahan
2008-08-07 04:30:15 UTC
look at errno.h (I don't recall where the bloody thing
is stored anymore)

EADDRNOTAVAIL - can't assign requested address.

Patrick
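
For readers without an errno.h at hand, the lookup can be sketched in a few lines of Python. The table below holds a subset of the BSD-derived socket error numbers; only the value 49 is confirmed by this thread, and assuming that MultiNet follows the same numbering for the neighbouring codes is exactly that, an assumption. Python's own `errno` module is deliberately not used, because it reports the numbers of the host platform (on Linux, for example, EADDRNOTAVAIL is 99, not 49). The function name `describe_status` is made up for illustration.

```python
# Subset of the 4.4BSD-derived errno values for socket errors.
# Only 49 is confirmed by the thread; the other entries are the BSD values
# and are ASSUMED to match what MultiNet reports.
BSD_SOCKET_ERRNO = {
    48: ("EADDRINUSE", "Address already in use"),
    49: ("EADDRNOTAVAIL", "Can't assign requested address"),
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    60: ("ETIMEDOUT", "Connection timed out"),
    61: ("ECONNREFUSED", "Connection refused"),
}


def describe_status(status: int) -> str:
    """Translate a numeric status into 'number: NAME - text'."""
    name, text = BSD_SOCKET_ERRNO.get(status, ("UNKNOWN", "not in this table"))
    return f"{status}: {name} - {text}"


if __name__ == "__main__":
    # The status from the OPCOM message in the original post.
    print(describe_status(49))
```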

Marty Kuhrt
2008-08-08 05:15:46 UTC
OK, but what does _that_ mean? All of a sudden .238 stopped responding
which made the name servers, web servers, time servers, and anyone else
using that alias, stop responding. Anything that was looking to .238 to
respond started chirping "host down".

Can't assign address, why?

And just an FYI, I restarted the MultiNet server to roll the smtp.log
file (to do some SMTP debugging). It seems you have to rename the log file
and then restart the main MU server to get it to stop writing to the old
log file.
Patrick Mahan
2008-08-08 20:43:45 UTC
It could be for several reasons:
1) It failed to set up the cluster lock, or the address is invalid for your
configuration (I'm not sure what your subnet mask, node IP, etc. are set to).
2) The address might already be in use on another node.
3) The ARP cache was corrupted, causing the IP address to fail validation.

I have not played on a MultiNet system for close to 10 years now, so I apologize
for not being more helpful (but I did help develop and maintain it for close to
8 years).

Good luck,

Patrick Mahan
nee Window Washer
Richard Whalen
2008-08-11 02:00:07 UTC
49 is EADDRNOTAVAIL, which means that it could not assign the requested
address. MultiNet 5.n uses the routing tables to determine which
interface should get the cluster alias assigned to it; if it cannot
find an interface that has the proper routing for the desired address,
then it will not assign the alias to any interface.
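
As a rough illustration of that selection step (not MultiNet's actual code), the sketch below picks the interface whose directly connected network contains the alias address and returns status 49 when none does. The interface names, the networks, and the function `pick_interface` are all hypothetical; a real routing table also holds gateway routes, which this sketch ignores.

```python
import ipaddress

EADDRNOTAVAIL = 49  # status value from the OPCOM message in this thread

# HYPOTHETICAL interface table: name -> directly connected network.
INTERFACES = {
    "se0": ipaddress.ip_network("172.17.17.0/24"),
    "se1": ipaddress.ip_network("10.0.0.0/8"),
}


def pick_interface(alias, interfaces=INTERFACES):
    """Return (interface_name, 0) for the most specific matching network,
    or (None, EADDRNOTAVAIL) when no interface has a route to the alias."""
    addr = ipaddress.ip_address(alias)
    matches = [
        (net.prefixlen, name)
        for name, net in interfaces.items()
        if addr in net
    ]
    if not matches:
        return None, EADDRNOTAVAIL
    # Longest prefix wins, as in an ordinary routing lookup.
    return max(matches)[1], 0


if __name__ == "__main__":
    print(pick_interface("172.17.17.238"))  # alias from the original post
    print(pick_interface("192.168.1.1"))    # no matching interface
```

With this table, the alias from the original post lands on the first interface, while an address outside both networks is rejected with status 49.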

Marty Kuhrt
2008-08-19 22:38:20 UTC
Post by Richard Whalen
49 is EADDRNOTAVAIL, which means that it could not assign the requested
address. MultiNet 5.n uses the routing tables to determine which
interface should get the cluster alias assigned to it; if it can not
find an interface that has the proper routing for the desired address,
then it will not assign it to an interface.
Why would restarting the MU server to roll the SMTP log file on
a single-node (for now) cluster cause this failure? How can it be
prevented in the future?
Richard Whalen
2008-08-20 13:03:58 UTC
The MultiNet master server maintains the lock that says which system in
the cluster is currently offering the alias address. The command file
that is used to restart the master server releases ownership of the
alias and lock, which allows another system to obtain it. When the
master server has restarted it should be queued for ownership of the
lock. It is necessary to tell the master server of the system that
currently maintains the cluster alias to release it so that the cluster
alias can roll over to another system.

Was there another cluster member that could have possibly taken over the
alias? The cluster alias was originally designed for UDP traffic.
Changes were made such that most TCP traffic can now work on it as
customers insisted on using it for TCP traffic even when told that it
wasn't designed for it and that either DNS load balancing or round-robin
DNS would be a better choice. BIND 9 has caused a number of
difficulties for customers that are using DNS load balancing and work is
being done to address these issues.


Marty Kuhrt
2008-08-20 18:29:28 UTC
Post by Richard Whalen
The MultiNet master server maintains the lock that says which system in
the cluster is currently offering the alias address. The command file
that is used to restart the master server releases ownership of the
alias and lock, which allows another system to obtain it. When the
master server has restarted it should be queued for ownership of the
lock. It is necessary to tell the master server of the system that
currently maintains the cluster alias to release it so that the cluster
alias can roll over to another system.
Was there another cluster member that could have possibly taken over the
alias?
It was a single node cluster.
Post by Richard Whalen
The cluster alias was originally designed for UDP traffic.
Changes were made such that most TCP traffic can now work on it as
customers insisted on using it for TCP traffic even when told that it
wasn't designed for it and that either DNS load balancing or round-robin
DNS would be a better choice. BIND 9 has caused a number of
difficulties for customers that are using DNS load balancing and work is
being done to address these issues.
Are there any handy instructions on how to get MU DNS configured to do
either?

Thanks,
Marty
Richard Whalen
2008-08-20 19:25:36 UTC
If you are running MASTER_SERVER-020_A052 or MASTER_SERVER-050_A051 (or
later), then the cluster alias shutdown should work correctly in a
single node cluster. Earlier patches will probably have a problem.

DNS load balancing and round-robin DNS don't create an additional
address, so you would have to create a PD interface for the address.

http://www.process.com/tcpip/mndocs52/ADMIN_GUIDE/Ch10.htm#E55E67
describes DNS load balancing.

The name server will automatically do round-robin when you have multiple
A records for a name. The O'Reilly DNS and BIND book gives the following
example:
Foo.bar.biz 60 IN A 192.1.1.1
Foo.bar.biz 60 IN A 192.1.1.2
Foo.bar.biz 60 IN A 192.1.1.3
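
To make the effect of those three records concrete, here is a small Python sketch that imitates a plain cyclic rotation of the answer list, one step per query. It is a toy model only: the class `RoundRobinName` is hypothetical, and the order a real name server returns depends on its own configuration and may not be a strict rotation.

```python
from collections import deque


class RoundRobinName:
    """Toy model of a name with several A records, rotated on each query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current record order, then rotate left by one."""
        current = list(self._addresses)
        self._addresses.rotate(-1)
        return current


if __name__ == "__main__":
    # Addresses taken from the zone file example above.
    foo = RoundRobinName(["192.1.1.1", "192.1.1.2", "192.1.1.3"])
    for _ in range(4):
        # Clients typically use the first address in the answer.
        print(foo.answer())
```

Because most clients try the first address they receive, successive queries are spread across the three hosts; the short TTL of 60 seconds in the example keeps resolvers from caching one ordering for long.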

Marty Kuhrt
2008-08-21 17:36:18 UTC
Post by Richard Whalen
If you are running MASTER_SERVER-020_A052 or MASTER_SERVER-050_A051 (or
later), then the cluster alias shutdown should work correctly in a
single node cluster. Earlier patches will probably have a problem.
I'm running...

Process Software MultiNet V5.2 Rev A-X, AlphaServer DS10L 466 MHz,
OpenVMS AXP V7.3-2

with the MASTER_SERVER-040_A052 patch (among others), so I should be OK
there.
Post by Richard Whalen
DNS load balancing and round-robin DNS don't create an additional
address, so you would have to create a PD interface for the address.
http://www.process.com/tcpip/mndocs52/ADMIN_GUIDE/Ch10.htm#E55E67
describes DNS load balancing.
The name server will automatically do round-robin when you have multiple
A records for a name, the O'Reilly DNS and BIND book gives the following
Foo.bar.biz 60 IN A 192.1.1.1
Foo.bar.biz 60 IN A 192.1.1.2
Foo.bar.biz 60 IN A 192.1.1.3
Thanks for the info, I'll look into it.

I was previously under the impression that stateless TCP stuff could use
the cluster alias address, while stateful stuff could not. I have my
outside world MX record pointing to a NAT address that translates to the
real machine that does mail. About the only other stuff this "cluster"
does, accessible to the outside world, is HTTP. I have that outside
world record pointing to a NAT address that translates to the cluster
alias. That way, I thought, if I ran multiple machines with a shared
Apache configuration, "the magic" would route it to whoever was available.

Looks like I may need a re-think.

Thanks, again.

Any helpful hints and doc pointers appreciated.

Marty
Richard Whalen
2008-08-22 13:48:14 UTC
From the network's point of view, TCP is always stateful, though some TCP
connections last longer than others.

The current (for over 10 years) implementation of HTTP keeps the
connection to an IP address open after transferring the HTML file,
because the HTML file often references other files that need to be
transferred from the same node. (I do not know what the criteria are
for closing the connection.)

I'm guessing, but I would say that the following scenario might have
caused the problem.
A connection was still open when the attempt to delete the address was
made. This prevented the address from actually being deleted, but the
lock was still released and the master server restarted. When the
master server restarted it found that it could obtain the lock, so it
tried to add the address to the list of those on the appropriate
interface, but it found the address was already present and hence the
unavailable error was returned.
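
The guessed sequence can be walked through with a small Python simulation. Everything in it is hypothetical (the `Interface` class, the `restart_master_server` function, and the rule that an address with an open connection cannot be deleted); it only models the scenario described above and says nothing about how MultiNet is actually implemented.

```python
EADDRNOTAVAIL = 49  # status value from the OPCOM message in this thread


class Interface:
    """Toy interface: a set of addresses plus open-connection counts."""

    def __init__(self):
        self.addresses = set()
        self.open_connections = {}  # address -> number of open connections

    def delete_address(self, addr):
        """Refuse to delete an address that still has open connections."""
        if self.open_connections.get(addr, 0) > 0:
            return False
        self.addresses.discard(addr)
        return True

    def add_address(self, addr):
        """Adding an address that is already present fails with status 49."""
        if addr in self.addresses:
            return EADDRNOTAVAIL
        self.addresses.add(addr)
        return 0


def restart_master_server(iface, alias):
    """Model of the guessed restart sequence on a single-node cluster."""
    # Step 1: try to delete the alias; in this scenario the result is ignored.
    iface.delete_address(alias)
    # Step 2: the lock is released regardless, and with no other node
    # queued for it, the restarted server obtains it again immediately.
    # Step 3: the restarted server tries to add the alias back.
    return iface.add_address(alias)


if __name__ == "__main__":
    alias = "172.17.17.238"
    busy = Interface()
    busy.addresses.add(alias)
    busy.open_connections[alias] = 1  # e.g. a lingering HTTP connection
    print("restart with open connection:", restart_master_server(busy, alias))

    idle = Interface()
    idle.addresses.add(alias)
    print("restart with no connections:", restart_master_server(idle, alias))
```

In this model the restart fails with 49 only when a connection to the alias is still open at shutdown, which would also be consistent with a reboot clearing the problem.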
