Re: ECC plans

23 Jan 2013

      On Fri, Jan 18, 2013 at 9:35 AM, Niels Möller nisse@lysator.liu.se wrote:
...
...
How does this algorithm compare to others?
If I read the gnutls code right, ecc_mulmod_cache makes the same number
of ecc_projective_dbl_point calls as the non-cached version? If that is
right, any comb-like algorithm should be able to beat it, since those
algorithms reduce the number of point doublings and the number of
iterations in the outer loop. Depending on selected table size, of
course.
In the main loop there is no difference in the wmnaf code, but outside
the loop the cached version does one doubling less and several
additions less.
...
Do you have any ecc benchmarking code I could borrow?
I use gnutls-cli's benchmark-tls-kx option. It measures TLS key
exchange time (one of them is ECDHE-ECDSA using an 192-bit curve).
...
...
Also for that part you don't need
constant time. [...] (in ECDH you also don't care unless it is fixed key ECDH).
In ECDH, any data dependent timing does leak some information about your
secret exponent. What you're saying, if I understand you correctly, is
that since you usually use each secret exponent only once, an attacker
will in most cases not be able to collect enough information to do any
damage.
Yes. That's the way ECDHE is used in gnutls. No key is reused and the
timing information from a single session isn't sufficient to recover
it.
...
...
I would put that of lower priority that signing. Servers (where
performance matters) rarely verify ECDSA keys.
Noted (although I'm not sure I agree it's less important than signing
performance).
Sorry, indeed, that's for TLS only. In TLS usually only the server is
authenticated. If there is client authentication too, then this is
also important.
...
...
Why is that? Did you see any issues on the timing resistant version of
the function?
Let me first explain what I mean when I talk about a "side-channel
silent" function. That means that if we call the function with operands
which are of the same size, but otherwise different, the function should
execute exactly the same sequence of instructions in both cases, and
access memory in exactly the same pattern.
If one then assumes that the underlying machine instructions have data
independent timing (true for most current cpus), we leak no side
information from timing or cache behaviour. We may still leak
information through power analysis, if, e.g., executing a multiplication
instructions consumes different amount of energy depending on the input
bit patterns.
You are correct, but power analysis is useful for dedicated devices
such as smart cards. A generic-purpose library will not be used there
anyway. Protection from timing analysis is important, and while it is
not precisely measured, the gnutls wmnaf code contains
counter-measures for that. If you check the main loop in
ecc_mulmod_cached_timing() there is an "else" case where a dummy
addition is being performed. That is to make the number of operations
used independent of the input data.
...
I'm not intimately familiar with wmnaf, but it looks like the function
ecc_wMNAF has a lot of branches depending on the exponent bits.
Could be, but ecc_wMNAF is a fixed conversion that doesn't depend on
any input from an attacker. That is the best you can do is associate
the key with a time needed for its wmNAF conversion. I don't think
this is better than the association with its public key.
...
Next, in the main ecc_mulmod functions, the timing resistant versions
calls the madd primitive the same number of times independent of the
values of the digits. But there are some small remaining leaks (by the
above pretty high standard):

If wmnaf_len depends on the exponent bit pattern, then so does the
number of iterations through the loop.

This is a fixed association again. An attack cannot control wmnaf_len
or the number of iterations. Timing attacks usually work by the
attacker trying different inputs and measuring the different timings.
In the cases you describe, as far as I understand, different input
will not provide additional timing information.
...

Branches depending on whether digit is > 0, < 0 or == 0. Those will
have slightly different timing. Since digit == 0 is unlikely, that
branch will be badly predicted. And it also affects instruction
cache.
For the data cache, accesses to the wmnaf table are obviously data
dependent. Also, data is written to different locations in memory (R
or T) depending on whether or not the current digit is non-zero.

Could be, these are very cpu dependent. If your argumentation is that
a really safe timing resistant multiplication version has to be used,
then I wouldn't disagree, if it's up to the level that is not so slow
that no-one uses it. However, there is no reason for the code that
does not need to be timing resistant, not to be as fast as possible.
...
...
...
In the point addition, it seems hard to avoid special cases when adding
a point to itself or to its negation, or adding the zero point.
Do you mean the infinity points or the point (0,0,0)?
The infinity point, which is the unit element of the group. I denote it
0 below (P + 0 = P for all P).
I've seen some implementations do not handle that at all, but I agree
that it has to be handled properly.
regards,
Nikos

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: ECC plans