Speaking of umac, I'm also looking at the umac context structs, for potential micro-optimizations and fixes before they become part of the ABI.
Some fields, like nonce_length, index, and (for umac32 and umac64) nonce_low, fit in 16 or even 8 bits. So it might make sense to make them adjacent.
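To illustrate the packing idea, here is a rough sketch comparing two layouts. These structs are hypothetical and are not the actual umac context layout; only the field names nonce_length, index and nonce_low come from the discussion above, and the other members and all the types are assumptions made up for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT the real context struct): small fields
   stored as full-width unsigned and interleaved with 64-bit members,
   so the compiler may have to insert padding after each of them. */
struct ctx_scattered {
  uint64_t l2_state[2];   /* assumed member, for illustration only */
  unsigned nonce_length;
  uint64_t l3_key;        /* assumed member, for illustration only */
  unsigned index;
  uint64_t count;
  unsigned nonce_low;
};

/* Same information, with the small fields narrowed and made adjacent
   at the end of the struct. */
struct ctx_grouped {
  uint64_t l2_state[2];
  uint64_t l3_key;
  uint64_t count;
  uint16_t nonce_length;
  uint16_t index;
  uint8_t nonce_low;
};
```

On common ABIs the grouped variant should come out smaller, but the exact sizes depend on the platform's alignment rules, so I wouldn't rely on specific numbers.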
And on the other hand, the umac block count is currently unsigned, and will wrap around after 2^32 blocks or 2^42 bytes. Other hash functions typically support data sizes up to 2^64 (except sha512, which uses a 128-bit counter, which seems gross overkill).
For umac, the block counter is only needed to keep track of when to switch to different layer 2 hashing, and to keep track of odd and even blocks for poly128. So it could probably be made to work with only 16 bits and some saturation logic. But extending it to 64 bits seems simpler.
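A minimal sketch of what the saturation logic could look like, just to show why the 64-bit counter is the simpler option. The function names and the threshold value are assumptions for illustration, not taken from the existing code. The trick is that the counter must never wrap back below the switch threshold, and its low bit must keep alternating for the odd/even tracking.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative value only: the block count at which the layer 2
   hashing switches. It must be well below UINT16_MAX. */
#define SWITCH_THRESHOLD 0x4000u

/* Advance a 16-bit block counter without wrapping. At the top of the
   range it steps back by one, so the value then alternates between
   UINT16_MAX - 1 (even) and UINT16_MAX (odd): the parity stays
   correct and the counter stays above the threshold. */
static uint16_t
count_next_saturating(uint16_t count)
{
  if (count == UINT16_MAX)
    return UINT16_MAX - 1;
  return (uint16_t) (count + 1);
}

/* True once enough blocks have been processed to switch layer 2 mode. */
static int
count_past_threshold(uint16_t count)
{
  return count >= SWITCH_THRESHOLD;
}

/* The 64-bit alternative needs no special cases at all. */
static uint64_t
count_next_64(uint64_t count)
{
  return count + 1;
}
```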
It would also be nice if we could force 16-byte alignment for the l1_key array (this is important for assembly routines), which would then imply 16-byte alignment for the complete context struct. Could help x86 sse2 assembly. And could help also on ARM, but I'm not sure if the system (primarily linker and malloc) really makes 16-byte alignment possible there.
And it would also be good to get a reasonably large alignment for the block buffer.
In gcc, there's __attribute__ ((aligned (16))), but since this becomes part of the ABI, we can't use it in public headers unless we can specify the same alignment for *all* reasonable compilers for the given architecture.
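For comparison, C11 has a standard spelling of the same request, alignas from <stdalign.h>. The sketch below assumes a C11 compiler, which a public header can't necessarily take for granted, and the struct, its name and the array sizes are hypothetical rather than the real context layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define L1_KEY_WORDS 256   /* hypothetical size, for illustration */
#define BLOCK_SIZE 1024    /* matches 2^32 blocks = 2^42 bytes */

/* Hypothetical context struct, NOT the real layout. Aligning one
   member to 16 bytes raises the alignment of the whole struct. */
struct umac_ctx_sketch {
  alignas(16) uint32_t l1_key[L1_KEY_WORDS];
  uint64_t count;
  alignas(16) uint8_t block[BLOCK_SIZE];
};

/* Compile-time checks that the compiler honored the request. */
static_assert(alignof(struct umac_ctx_sketch) >= 16,
              "context struct should be 16-byte aligned");
static_assert(offsetof(struct umac_ctx_sketch, l1_key) % 16 == 0,
              "l1_key should start on a 16-byte boundary");
static_assert(offsetof(struct umac_ctx_sketch, block) % 16 == 0,
              "block buffer should start on a 16-byte boundary");
```

This only constrains the type. Whether a heap-allocated context actually ends up 16-byte aligned still depends on the allocator, which is the same open question as for ARM above.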
Regards, /Niels