(Someone on this list appears to be using an annoying and severely misconfigured spam prevention system called COSI, or something like that, which sends mail to everyone who posts to the list, asking them to respond with a cookie that goes to the original author. If people have to use such systems at all, please at least configure them properly, and have them include information on who is requesting the cookie, so I can consciously ignore it.)
nisse@lysator.liu.se (Niels Möller) writes:
I think I understand the problem. First, let me explain the connection_lock/connection_unlock thing. The point is to make sure that during the processing of one userauth message, further messages that are received from the client are not processed, just queued (in lshd or in kernel socket buffers) for processing when the current userauth message is finished. Userauth messages need to be serialized in the server, while the client is allowed to send any number of userauth messages without waiting for replies.
This doesn't matter much for userauth methods which result in an answer right away, without returning to the main select loop. The main reason for it is password authentication with a helper program, as the userauth handler will then spawn a process, set an exit callback on the process, and return to the main select loop. Later, when the exit callback is invoked, it passes a value or an exception back to the userauth code, which then unlocks the connection.
The connection is locked in server_userauth:do_handle_userauth, and unlocked in the continuation handler do_userauth_continuation (invoked when userauth succeeds) and in do_exc_userauth_handler, for exceptions of type EXC_USERAUTH or EXC_USERAUTH_SPECIAL.
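To make the serialization concrete, here is a minimal sketch of the idea, not lsh's actual implementation. The struct, the fixed-size queue, the integer packet ids and the functions start_userauth and receive_packet are all assumptions made for illustration; only the names connection_lock and connection_unlock come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 16

/* Hypothetical, simplified stand-in for lshd's per-connection state. */
struct connection {
    int locked;               /* nonzero while one userauth message is in progress */
    int pending[QUEUE_MAX];   /* packets (just ids here) received while locked */
    size_t npending;
    int handled[QUEUE_MAX];   /* order in which packets were started */
    size_t nhandled;
};

void connection_lock(struct connection *c)
{
    assert(!c->locked);
    c->locked = 1;
}

/* Start processing one userauth message: lock, record it, and return to
 * the main loop. The result arrives later (e.g. from an exit callback). */
void start_userauth(struct connection *c, int packet)
{
    connection_lock(c);
    assert(c->nhandled < QUEUE_MAX);
    c->handled[c->nhandled++] = packet;
}

/* Called by the main loop for every packet read from the client.
 * Returns 0 on success, -1 if the queue is full. */
int receive_packet(struct connection *c, int packet)
{
    if (c->locked) {
        if (c->npending == QUEUE_MAX)
            return -1;
        c->pending[c->npending++] = packet;   /* queue, do not process */
        return 0;
    }
    start_userauth(c, packet);
    return 0;
}

/* Called from the continuation or exception handler. Unlocks and, if a
 * packet was queued meanwhile, starts it (which locks again). */
void connection_unlock(struct connection *c)
{
    assert(c->locked);
    c->locked = 0;
    if (c->npending > 0) {
        int next = c->pending[0];
        c->npending--;
        memmove(c->pending, c->pending + 1,
                c->npending * sizeof c->pending[0]);
        start_userauth(c, next);
    }
}
```

With this model, a client that sends three requests back to back gets them handled strictly one at a time: the second starts only when connection_unlock is called for the first.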
For all earlier userauth methods, the client sends an SSH_MSG_USERAUTH, and then the server replies with a success or failure or some special message like PUBLICKEY_OK. GSS-API seems a little different, due to the SSH_MSG_USERAUTH_GSSAPI_TOKEN and SSH_MSG_USERAUTH_GSSAPI_EXCHANGE_COMPLETE which are sent by the client. The general server userauth code doesn't see these, so the connection isn't locked automatically, but the general userauth code still tries to unlock it when you invoke the continuation or exception handler.
I think the simplest way to solve the problem is to add calls to connection_lock at the start of the packet handlers you install.
Adding connection_lock(connection) to do_handle_gssapi_finish (i.e., before COMMAND_RETURN is invoked) solves the problem, and authentication succeeds and the login proceeds. However, if I add the lock to do_handle_gssapi_token, lsh stalls after responding with the GSSAPI_TOKEN to the client. It seems to never get the EXCHANGE_COMPLETE from the client. For now I'll just ignore this problem since it works, and I don't understand how to improve it.
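The lock/unlock imbalance can be illustrated with a toy model. Everything below is hypothetical and deliberately tiny: it is not lsh's code, and how lsh actually reacts to an unlock without a matching lock is not modelled, only counted. The last handler shows one possible reading of the stall, which nothing in this thread confirms: if the token handler locks and then simply returns after sending its reply, nothing ever unlocks, so the next packet from the client would stay queued.

```c
#include <assert.h>

/* Hypothetical toy model; none of this is lsh's actual code. */
struct connection {
    int locked;              /* nonzero while a userauth step is in progress */
    int unbalanced_unlocks;  /* unlocks seen while the connection was not locked */
    int queued;              /* packets held back because of the lock */
};

typedef void packet_handler(struct connection *c);

void connection_lock(struct connection *c)
{
    c->locked = 1;
}

void connection_unlock(struct connection *c)
{
    if (!c->locked)
        c->unbalanced_unlocks++;
    c->locked = 0;
}

/* Stand-in for the general continuation, which always unlocks. */
void userauth_continuation(struct connection *c)
{
    connection_unlock(c);
}

/* Main-loop dispatch: a locked connection queues instead of dispatching.
 * Returns 1 if the handler ran, 0 if the packet was queued. */
int dispatch(struct connection *c, packet_handler *handler)
{
    if (c->locked) {
        c->queued++;
        return 0;
    }
    handler(c);
    return 1;
}

/* EXCHANGE_COMPLETE handler that bypasses the general locking code. */
void finish_without_lock(struct connection *c)
{
    userauth_continuation(c);   /* unlocks a connection that was never locked */
}

/* Same handler with the lock added first, as suggested above. */
void finish_with_lock(struct connection *c)
{
    connection_lock(c);
    userauth_continuation(c);   /* lock and unlock are now balanced */
}

/* Token handler that locks, sends its reply, and returns. */
void token_with_lock(struct connection *c)
{
    connection_lock(c);         /* nothing on this path unlocks again */
}
```

In this model, dispatching finish_without_lock leaves unbalanced_unlocks at 1, finish_with_lock leaves it at 0, and after token_with_lock the following packet is queued rather than delivered.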
That may not get things exactly right. One also needs to consider a client that doesn't send SSH_MSG_USERAUTH_GSSAPI_EXCHANGE_COMPLETE as expected, but instead sends a new SSH_MSG_USERAUTH. Without having read the GSS-API spec, I guess that such client behaviour should either cancel the GSS-API exchange which is in progress, or raise a protocol error. Changes to the general server_userauth code may be needed to get that right, as there currently is no notion of a "subprotocol" being in progress, nor of cancelling such a subprotocol.
It should restart authentication, and the previous authentication attempt should be forgotten. I think this will work without much extra logic, since the newly invoked userauth_gssapi code installs fresh handlers for the GSSAPI messages every time. Probably, they could kill any existing handler to make sure resources are deallocated.
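A rough sketch of "install fresh handlers, and kill the old one so its resources are freed" might look like the following. The types gss_exchange and handler_slot, the counter live_exchanges, and all function names are invented for illustration; in real code the per-attempt state would presumably also hold a GSS-API security context that needs gss_delete_sec_context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-attempt state owned by the installed GSSAPI handlers. */
struct gss_exchange {
    int attempt;   /* which userauth request started this exchange */
};

/* Hypothetical slot holding the state of the currently installed handlers. */
struct handler_slot {
    struct gss_exchange *current;
};

int live_exchanges = 0;   /* for illustration: number of allocated exchanges */

void exchange_free(struct gss_exchange *x)
{
    if (x) {
        /* real code would release the GSS-API context here as well */
        free(x);
        live_exchanges--;
    }
}

/* Called for every new GSSAPI userauth request. Allocates fresh state and
 * discards any exchange still in progress. Returns 0, or -1 on allocation
 * failure (in which case the old exchange is left untouched). */
int start_gssapi_attempt(struct handler_slot *slot, int attempt)
{
    struct gss_exchange *fresh = malloc(sizeof *fresh);
    if (!fresh)
        return -1;
    fresh->attempt = attempt;
    live_exchanges++;

    exchange_free(slot->current);   /* forget the previous attempt */
    slot->current = fresh;
    return 0;
}

/* Called when the exchange completes or fails. */
void finish_gssapi_attempt(struct handler_slot *slot)
{
    exchange_free(slot->current);
    slot->current = NULL;
}
```

Starting a second attempt while the first is unfinished leaves exactly one live exchange, so an impatient client cannot make the server accumulate abandoned state.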
Thanks for the explanation.
maj_stat = gss_acquire_cred (&min_stat, GSS_C_NO_NAME, 0, GSS_C_NULL_OID_SET, GSS_C_ACCEPT, &cred, NULL, NULL);
Can gss_acquire_cred (or other gss-api functions) block, for example for contacting a Kerberos server? Then the entire lshd server will block too. If that is a problem, we need either a non-blocking "native" interface to gss-api, or put the gss-api code in a separate process.
GSS is generic, so anything can occur, but the specification says the function is NOT intended to do a network logon, and that if the operation takes time it can be delayed until gss_accept_sec_context(). For Kerberos 5, GSS in the server never does any network communication; it just reads a secret key from a file and parses and generates tokens using it. I wouldn't worry about this until someone has a problem. What is the worst problem this could cause anyway? I can only think of the lsh server core being delayed when responding to the client, but the client should kind of expect this anyway when it requests a GSS mechanism that takes a very long time to finish.
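If blocking ever does turn out to be a problem, the separate-process option could follow the same pattern as the password helper described earlier: spawn a child, return to the select loop, and pick up the result later. The sketch below is plain POSIX and only shows the plumbing. The functions slow_acquire and spawn_helper are hypothetical, and slow_acquire merely stands in for the GSS-API calls. Note that, as far as I know, a credential handle cannot simply be passed between processes, so a real helper would probably have to run the whole exchange and report only the outcome.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a possibly blocking GSS-API operation.
 * Returns 0 on success. */
int slow_acquire(void)
{
    return 0;
}

/* Runs work() in a child process. On success, stores the child's pid in
 * *pid_out and returns the read end of a pipe on which the child writes
 * a single status byte. Returns -1 on error. The caller would add the
 * returned descriptor to its select() set instead of blocking. */
int spawn_helper(int (*work)(void), pid_t *pid_out)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: do the slow work and report one status byte. */
        unsigned char status;
        ssize_t n;

        close(fds[0]);
        status = (unsigned char) work();
        n = write(fds[1], &status, 1);
        _exit(n == 1 ? 0 : 1);
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    *pid_out = pid;
    return fds[0];
}
```

The parent reads one byte from the returned descriptor once select() reports it readable, then calls waitpid on the child. Whether this is worth the complexity is exactly the question raised above; for Kerberos 5 it probably is not.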
Thanks again, Simon