Simon Josefsson <jas@extundo.com> writes:
Adding connection_lock(connection) to do_handle_gssapi_finish (i.e., before COMMAND_RETURN is invoked) solves the problem, and authentication succeeds and the login proceeds. However, if I add the lock to do_handle_gssapi_token, lsh stalls after responding with the GSSAPI_TOKEN to the client.
Hmm, that's natural, given that do_handle_gssapi_token uses C_WRITE to write the response, not EXC_USERAUTH_SPECIAL. The automatic locking and unlocking happens only when the flow of control passes through the general userauth code, which it doesn't do here.
That may not get things exactly right; one also needs to consider a client that doesn't send SSH_MSG_USERAUTH_GSSAPI_EXCHANGE_COMPLETE as expected, but instead sends a new SSH_MSG_USERAUTH. Without having read the GSS-API spec, I guess that such client behaviour should either cancel the GSS-API exchange which is in progress, or raise a protocol error. Changes to the general server_userauth code may be needed to get that right, as there currently is no notion of a "subprotocol" being in progress, and no way to cancel such a subprotocol.
It should restart authentication, and the previous authentication attempt should be forgotten. I think this will work without much extra logic, since the newly invoked userauth_gssapi code installs fresh handlers for the GSSAPI messages every time. Probably, they could kill any existing handler to make sure resources are deallocated.
I suspect some extra logic is needed, but perhaps not much. Consider a client sending
  SSH_MSG_USERAUTH "gssapi"      (starting a gssapi "session")
  SSH_MSG_USERAUTH "none"        (restarting authentication)
  SSH_MSG_USERAUTH_GSSAPI_TOKEN
Then the client should get a protocol error, but it might make contact with your old handler which is still installed. It might be simple to fix, by having the handler for SSH_MSG_USERAUTH reset all handlers for messages in the userauth range.
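A minimal sketch of that reset, assuming a per-connection dispatch table indexed by message number. All the names here (struct connection, reset_userauth_handlers, userauth_request and so on) are hypothetical and not taken from the lsh source; the message numbers are the ones assigned in the published SSH userauth and GSS-API specifications, where the message called SSH_MSG_USERAUTH above is named SSH_MSG_USERAUTH_REQUEST.

```c
#include <assert.h>
#include <string.h>

/* Message numbers as assigned by the SSH userauth and GSS-API specs. */
enum {
  SSH_MSG_USERAUTH_REQUEST      = 50,
  SSH_FIRST_USERAUTH_SPECIFIC   = 60,  /* method-specific range is 60..79 */
  SSH_MSG_USERAUTH_GSSAPI_TOKEN = 61,
  SSH_LAST_USERAUTH_SPECIFIC    = 79,
  SSH_MAX_MSG                   = 256
};

enum result { HANDLED, PROTOCOL_ERROR };

struct connection;
typedef enum result (*handler_fn)(struct connection *c);

/* Hypothetical connection state: one handler slot per message number. */
struct connection {
  handler_fn dispatch[SSH_MAX_MSG];
  int gssapi_tokens_seen;
};

/* Default handler: any message nobody asked for is a protocol error. */
static enum result reject(struct connection *c)
{
  (void) c;
  return PROTOCOL_ERROR;
}

/* Stand-in for the real GSSAPI token handler. */
static enum result gssapi_token(struct connection *c)
{
  c->gssapi_tokens_seen++;
  return HANDLED;
}

static void connection_init(struct connection *c)
{
  for (int i = 0; i < SSH_MAX_MSG; i++)
    c->dispatch[i] = reject;
  c->gssapi_tokens_seen = 0;
}

/* Forget whatever method-specific handlers an earlier attempt installed. */
static void reset_userauth_handlers(struct connection *c)
{
  for (int i = SSH_FIRST_USERAUTH_SPECIFIC; i <= SSH_LAST_USERAUTH_SPECIFIC; i++)
    c->dispatch[i] = reject;
}

/* Every new userauth request first resets the range, then lets the
   chosen method install its own handlers. */
static enum result userauth_request(struct connection *c, const char *method)
{
  reset_userauth_handlers(c);
  if (strcmp(method, "gssapi") == 0)
    c->dispatch[SSH_MSG_USERAUTH_GSSAPI_TOKEN] = gssapi_token;
  return HANDLED;
}

static enum result dispatch_packet(struct connection *c, int msg)
{
  assert(msg >= 0 && msg < SSH_MAX_MSG);
  return c->dispatch[msg](c);
}
```

With this arrangement, the sequence "gssapi", "none", GSSAPI_TOKEN ends in PROTOCOL_ERROR, because the "none" request has already cleared the slot the stale handler was sitting in. A real implementation would also have to free any GSS-API context held by the handler it replaces.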
Can gss_acquire_cred (or other GSS-API functions) block, for example to contact a Kerberos server? If so, the entire lshd server will block too.
What is the worst problem this could cause anyway? I can only think of the lsh server core being delayed when responding to the client, but the client should kind of expect this anyway, when it requests a GSS mechanism that takes a very long time to finish.
What will happen is that the lshd server will block and stop responding on *all* connections, for all users. So it's a denial of service attack on the other users of the system.
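To make the effect concrete, here is a toy model, assuming a single-threaded server that handles events strictly one after another. It does not use any lsh or GSS-API code; struct event and serve_in_order are made up for illustration, and the costs are simulated times rather than real calls.

```c
#include <assert.h>
#include <stddef.h>

/* One pending piece of work: which connection it belongs to, and how
   long its handler occupies the (single) server thread. */
struct event {
  int connection;
  long cost_ms;
};

/* Serve events in arrival order, the way a single-threaded loop must.
   finish_ms[i] receives the simulated time at which event i is done. */
static void serve_in_order(const struct event *ev, size_t n, long *finish_ms)
{
  long now = 0;

  assert(ev != NULL && finish_ms != NULL);
  for (size_t i = 0; i < n; i++) {
    now += ev[i].cost_ms;   /* nothing else runs while this handler does */
    finish_ms[i] = now;
  }
}
```

If connection 1 triggers a handler that blocks for 30 seconds and connections 2 and 3 each need 1 ms of work, the latter two are not answered until 30001 ms and 30002 ms into the run, even though they have nothing to do with the slow authentication.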
Regards,
/Niels