Hi,
Nettle includes a function for side-channel silent modular inversion, which asymptotically is O(n^2), like binary gcd but slower by a pretty large constant factor.
For prime moduli, inversion can also be done via a^{-1} = a^{p-2} (mod p). That's asymptotically O(n^3) (if the underlying multiplies are done with the basic O(n^2) algorithm), but it may nevertheless be faster than the other method for numbers of the size used for Nettle's elliptic curves.
The code for curve25519 and curve448 has been using powering to invert for a long time. I've now spent some time writing specific powering code for the five secp curves as well. I've found fairly efficient addition chains where powering for a prime of n bits needs n-1 squarings and about a dozen multiplies. (I don't know what the *optimal* addition chains are, if you know of tools for that, let me know).
Code is on the branch optimize-ecc-invert. However, there's some risk the new code is slower on some platforms, in particular platforms with slow multiplication.
The main benchmark is
./examples/hogweed-benchmark ecdsa
To get numbers for just the changed function, one can run
./examples-ecc-benchmark
and look at the modinv column. Please compare performance between this branch and master, on the platforms that are important to you.
And to get relevant numbers, make sure to build using a recent GMP library; performance when built with mini-gmp is not so important.
I will merge these changes to master in a week or two, if no problems show up
I've benchmarked on one recent and one older x86_64 machine, and on raspberry-pi version 1 (the slowest ARM machine I had easy access to). I've seen improvements in ecdsa signing performance for all the secp curves, ranging from 5% up to 20% depending on curve and platform.
Not all inversions are rewritten. I haven't changed the modq inversion which is needed for ecdsa, and I haven't changed the code for the two supported gost curves.
Regards, /Niels