Problem with select() under VMS-8.3 & Mnet-5.2
y***@vms.huji.ac.il
2007-06-21 08:17:51 UTC
Hello,

I have a strange problem with select() and cannot find what I am doing
wrong. The problem started after upgrading from VMS 7.3-2 + Multinet 5.1 to
VMS-8.3 and Multinet 5.2 (with all the patches up to yesterday...).

Background: I am using VMS loginout callback API to verify the passwords
against an external server. I create a UDP session to the other side to
exchange the information. Sendto() and recvfrom() work ok but select() fails
only when it is called under DECW login (Telnet login works ok). From
intermediate printouts I see that the VMS channel number is different (usually
240 for Telnet, 896 for DECwindows login).

The code fragment is:

TableSize = getdtablesize(); /* For Select */
struct timeval TimeoutStruct; /* For Select's timeout */
fd_set ReadFds;

TimeoutStruct.tv_sec = UDP_READ_TIMEOUT; /* Set the timeout */
TimeoutStruct.tv_usec = 0;
FD_ZERO(&ReadFds);
FD_SET(s, &ReadFds);

status = multinet_select(s + 1, &ReadFds, NULL, NULL, &TimeoutStruct);
if(status <= 0) { /* Some error or timeout */
#ifdef DEBUG
    socket_perror("Select");
#endif
    return -1;
}

The error codes are: VMS errno: 8740, Unix errno: 5.

The same also happens when I change the first argument from "s + 1" to
TableSize. Any idea what I am doing wrong here?

Thanks! __Yehavi:
Richard Whalen
2007-06-21 13:58:55 UTC
Post by y***@vms.huji.ac.il
Hello,
I have a strange problem with select() and cannot find what I am doing
wrong. The problem started after upgrading from VMS 7.3-2 + Multinet 5.1 to
VMS-8.3 and Multinet 5.2 (with all the patches up to yesterday...).
Background: I am using VMS loginout callback API to verify the passwords
against an external server. I create a UDP session to the other side to
exchange the information. Sendto() and recvfrom() work ok but select() fails
only when it is called under DECW login (Telnet login works ok). From
intermediate printouts I see that the VMS channel number is different (usually
240 for Telnet, 896 for DECwindows login).
Channel numbers should be expressed in hexadecimal (and they are always a
multiple of 16).
240 = F0, 896 = 380
Post by y***@vms.huji.ac.il
TableSize = getdtablesize(); /* For Select */
struct timeval TimeoutStruct; /* For Select's timeout */
fd_set ReadFds;
TimeoutStruct.tv_sec = UDP_READ_TIMEOUT; /* Set the timeout */
TimeoutStruct.tv_usec = 0;
FD_ZERO(&ReadFds);
FD_SET(s, &ReadFds);
status = multinet_select(s + 1, &ReadFds, NULL, NULL,
&TimeoutStruct);
if(status <= 0) { /* Some error or timeout */
#ifdef DEBUG
socket_perror("Select");
#endif
return -1;
}
The error codes are: VMS errno: 8740, Unix errno: 5.
Unix errno 5 is EIO = I/O Error
VMS errno 8740, read as hexadecimal, translates to E8 (232 decimal): AND it
with %X7FFF, then shift right 3 bits (divide by 8).
232 is %SYSTEM-W-ILLEFC, illegal event flag cluster
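
The arithmetic above can be sketched in a few lines of C. This is only an illustration of the mask-and-shift step as described in this post; the helper name decode_vms_errno is made up for the example, and it assumes the reported value 8740 is hexadecimal (the calculation only yields 232 under that reading).

```c
#include <stdint.h>

/* Hypothetical helper (not part of Multinet): applies the decoding
   described above, i.e. AND with %X7FFF, then shift right by 3 bits. */
static uint32_t decode_vms_errno(uint32_t vmserrno)
{
    uint32_t masked = vmserrno & 0x7FFFu; /* drop the top bit: 0x8740 -> 0x0740 */
    return masked >> 3;                   /* divide by 8:      0x0740 -> 0xE8   */
}
```

Calling decode_vms_errno(0x8740) gives 0xE8, which is 232 decimal, the value looked up as ILLEFC above.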

The system calls in the code use an event flag assigned with lib$get_ef (the
code assumes that this will succeed). I can see a possible path through the
code that could end up not getting the event flag first, but in that case
vmserrno would not be 8740 and most likely the select would never have
returned.

Have you considered the VMS Authentication Module rather than writing your
own solution?
http://www.process.com/VMSauth/index.html
Post by y***@vms.huji.ac.il
The same happens also when I replace the first argument from "s + 1" to
TableSize. Any idea what I am doing wrong here?
y***@vms.huji.ac.il
2007-06-21 22:16:14 UTC
Post by Richard Whalen
Post by y***@vms.huji.ac.il
Hello,
I have a strange problem with select() and cannot find what I am doing
wrong. The problem started after upgrading from VMS 7.3-2 + Multinet 5.1 to
VMS-8.3 and Multinet 5.2 (with all the patches up to yesterday...).
...
Have you considered the VMS Authentication Module rather than writing your
own solution?
http://www.process.com/VMSauth/index.html
This is something that worked... At the time I wrote it, this was the only
supported way of doing external authentication. I'll take a look at the VAM to
see whether it fits my needs.


Thanks, __Yehavi:
Mark Berryman
2007-06-21 22:04:31 UTC
There is at least one major difference between select on Multinet V5.1
and earlier and V5.2. First, a brief explanation.

getdtablesize is supposed to return the number of files that a process
can open. The actual value returned on VMS seems to vary based on
various factors, not the least of which is the CHANNELCNT sysgen
parameter. However, I have multiple systems with similar CHANNELCNT
values and on one the call returns 600, on another it returns 4000 and
on another it returns 4096. Obviously, getdtablesize obtains this value
at runtime.

There is also a compile-time parameter, called FD_SETSIZE, that
determines the size of the various FD sets used by the select call. By
default, this value is 512 (it is 1024 in TCPIP services) and Multinet's
documentation recommends that it not be changed (although it is okay to
increase it if need be).

And finally, select only works on channels assigned to sockets. It
won't work on any other VMS device.

So now we reach a point where we've called select, passing the value
returned by getdtablesize and one or more fd_set data structures whose
size is governed by FD_SETSIZE. There is a good chance that the value
returned from getdtablesize is going to be greater than the value of
FD_SETSIZE which is going to cause select to read garbage as it tries to
determine what channels to check. In Multinet V5.1 and earlier, this
garbage was ignored whether it was an invalid channel or a valid channel
that pointed to a non-socket device. In V5.2, an error is returned.
Usually it will be EBADF, indicating an invalid channel. But it may
also be possible for it to return an I/O error.

If you cap the value returned by getdtablesize at FD_SETSIZE (that is, pass
the smaller of the two values) before passing it to select, you will prevent
this error from happening.
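
A minimal sketch of that cap, written against the standard <sys/select.h> header rather than Multinet's own headers (an assumption; include whichever header defines FD_SETSIZE in your build). The helper name clamp_nfds is hypothetical.

```c
#include <sys/select.h>

/* Hypothetical helper: limit the first argument of select() to the
   number of descriptors an fd_set can actually describe. */
static int clamp_nfds(int table_size)
{
    /* Pass the smaller of the two values, so select() never scans
       past the end of the fd_set structures. */
    return table_size > FD_SETSIZE ? FD_SETSIZE : table_size;
}
```

With the fragment from the original post, the call would then read something like multinet_select(clamp_nfds(TableSize), &ReadFds, NULL, NULL, &TimeoutStruct).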

In your case, it is very likely that you will hit this error when you
use the value returned by getdtablesize. Whether or not you encounter
this error when using "s" would depend on whether FD_SETSIZE has been
redefined or if the channel number being used is greater than FD_SETSIZE
(which appears to be the case for the DECWindows login). Try increasing
the value of FD_SETSIZE in your program to at least 1024 and see what
that does. You can also try increasing it to the current value of your
CHANNELCNT Sysgen parameter.
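
To see whether a given descriptor is the problem before changing FD_SETSIZE, a guard along these lines may help. It is only a sketch: it uses the standard <sys/select.h> macros, the helper names are made up for the example, and it assumes the value handed to FD_SET is the one that has to stay below FD_SETSIZE.

```c
#include <sys/select.h>

/* Hypothetical helper: 1 if fd can be represented in an fd_set built
   with the current FD_SETSIZE, 0 otherwise. */
static int fd_fits_in_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: add fd to the set only when it fits.
   Returns 0 on success, -1 if fd is out of range for this fd_set. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_in_set(fd))
        return -1;   /* FD_SET would write outside the structure */
    FD_SET(fd, set);
    return 0;
}
```

If checked_fd_set(s, &ReadFds) fails only under the DECwindows login, that would point at the descriptor value exceeding FD_SETSIZE, as suggested above.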

Mark Berryman
