Zoltan Fridrich zfridric@redhat.com writes:
Could you please review the attached patch and give me your feedback?
I've now had a first read, with one round of comments, see below. I'm tempted to try to write a minimal implementation of just SLH-DSA-SHAKE-128s from the spec, to get a better understanding.
Regards, /Niels
--- a/slh-dsa.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa.c 2024-12-02 17:00:07.386340199 +0100
+/* Generates SLH-DSA key pair
- Format sk: [SK_SEED || SK_PRF || PUB_SEED || root]
- Format pk: [root || PUB_SEED]
- */
Why different order for the public key? I would have expected PUB_SEED followed by the root (as in figure 16 in the spec, and same order as in the secret key). Does the NIST spec or some other standard clearly specify the serialization of the keys as byte strings?
For the Nettle API, I would also consider not including the pubkey as part of the secret key, but instead passing both pubkey and secret key to the secret-key functions, similar to, e.g., ed25519_sha512_sign.
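To be concrete, I'm imagining something along these lines (just a sketch; the name is a placeholder, and the sizes are the SLH-DSA-SHAKE-128s ones from the spec, n = 16):

  /* Secret part only, SK.seed || SK.prf. */
  #define SLH_DSA_SHAKE_128S_KEY_SIZE 32
  /* PK.seed || PK.root. */
  #define SLH_DSA_SHAKE_128S_PUB_SIZE 32
  #define SLH_DSA_SHAKE_128S_SIGNATURE_SIZE 7856

  void
  slh_dsa_shake_128s_sign (const uint8_t *pub, const uint8_t *key,
                           size_t length, const uint8_t *message,
                           uint8_t *signature);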
+void
+slh_dsa_generate_keypair(enum slh_dsa_mode mode,
uint8_t *pk, uint8_t *sk,
void *random_ctx,
nettle_random_func *random)
+{
- struct slh_dsa_params p = _nettle_slh_dsa_params_get(mode);
- uint8_t seed[3 * p.n];
- random(random_ctx, sizeof(seed), seed);
- slh_dsa_generate_keypair_from_seed(mode, seed, pk, sk);
+}
+void
+slh_dsa_sign(enum slh_dsa_mode mode, const uint8_t *sk,
void *random_ctx, nettle_random_func *random,
size_t message_size, const uint8_t *message,
uint8_t *signature)
+{
I think I would prefer separate top-level functions for the different algorithm variants, rather than the enum slh_dsa_mode. They could all call the same internal function, passing the appropriate params struct as argument.
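E.g., something like this (names hypothetical, and using the pubkey + secret key convention suggested above):

  void
  slh_dsa_shake_128s_sign (const uint8_t *pub, const uint8_t *key,
                           size_t length, const uint8_t *message,
                           uint8_t *signature)
  {
    _nettle_slh_dsa_sign (&_nettle_slh_dsa_shake_128s_params, pub, key,
                          length, message, signature);
  }

and similarly for the other variants, with the params structs being static const objects as I suggest below for slh-dsa-params.c.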
- struct slh_dsa_params p = _nettle_slh_dsa_params_get(mode);
- slh_dsa_ctx ctx;
- const uint8_t *sk_prf = sk + p.n;
- const uint8_t *pk = sk + 2 * p.n;
- uint8_t root[p.n];
- uint8_t optrand[p.n];
- uint8_t mhash[p.fors_msg_bytes];
I don't think variable size arrays are quite portable. You would need to either use the TMP_ALLOC macros, or, if you do separate top-level functions, each can allocate the needed constant amount of storage.
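For the TMP_ALLOC route, I mean the usual pattern from nettle-internal.h, with some constant upper bound over all parameter sets (the bound below is just a made-up placeholder):

  #include "nettle-internal.h"	/* TMP_DECL, TMP_ALLOC */

  /* Hypothetical bound: the largest fors_msg_bytes over all
     supported parameter sets. */
  #define SLH_DSA_MAX_FORS_MSG_BYTES 64

  static void
  example (const struct slh_dsa_params *p)
  {
    TMP_DECL(mhash, uint8_t, SLH_DSA_MAX_FORS_MSG_BYTES);
    TMP_ALLOC(mhash, p->fors_msg_bytes);
    /* ... use mhash just like the VLA version ... */
  }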
- uint32_t wots_addr[8] = {0};
- uint32_t tree_addr[8] = {0};
- uint32_t i, idx_leaf;
- uint64_t tree;
- memcpy(ctx.sk_seed, sk, p.n);
- memcpy(ctx.pub_seed, pk, p.n);
- _nettle_set_type(&p, wots_addr, SLH_DSA_ADDR_TYPE_WOTS);
- _nettle_set_type(&p, tree_addr, SLH_DSA_ADDR_TYPE_HASHTREE);
- /* Optionally, signing can be made non-deterministic using optrand.
This can help counter side-channel attacks that would benefit from
getting a large number of traces when the signer uses the same nodes. */
- if (random_ctx && random)
- random(random_ctx, sizeof(optrand), optrand);
- else
- memcpy(optrand, pk, sizeof(optrand));
So this function supports both random and deterministic signatures. Minor detail: I think it should be just if (random); it's fine for some randomness generators to use a NULL random_ctx (e.g., for an application where random state is global).
- /* Compute the digest randomization value. */
- p.gen_message_random(&p, signature, sk_prf, optrand, message, message_size);
- /* Derive the message digest and leaf index from R, PK and M. */
- p.hash_message(&p, mhash, &tree, &idx_leaf, signature, pk, message, message_size);
- signature += p.n;
- _nettle_set_tree_addr(&p, wots_addr, tree);
- _nettle_set_keypair_addr(&p, wots_addr, idx_leaf);
- /* Sign the message hash using FORS. */
- _nettle_fors_sign(&p, signature, root, mhash, &ctx, wots_addr);
- signature += p.fors_bytes;
- for (i = 0; i < p.d; i++)
- {
_nettle_set_layer_addr(&p, tree_addr, i);
_nettle_set_tree_addr(&p, tree_addr, tree);
_nettle_copy_subtree_addr(&p, wots_addr, tree_addr);
_nettle_set_keypair_addr(&p, wots_addr, idx_leaf);
One possible microoptimization, when we make lots of hashes with a common prefix, could be to copy the underlying hash context (e.g., sha3_256_ctx for shake256) after calling <hash>_update on that prefix. But maybe it doesn't matter much if the prefix is small compared to the hash function's block size.
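Roughly like this, for the shake-256 case (the helper and its arguments are made up just to show the pattern, not matching any particular function in the patch):

  static void
  hash_many_with_cached_prefix (const struct slh_dsa_params *p,
                                const slh_dsa_ctx *ctx,
                                unsigned count, const uint8_t *in,
                                const uint8_t *addr, uint8_t *out)
  {
    struct sha3_256_ctx prefix;
    unsigned i;

    /* Absorb the common PUB_SEED prefix once. */
    sha3_256_init (&prefix);
    sha3_256_update (&prefix, p->n, ctx->pub_seed);

    for (i = 0; i < count; i++)
      {
        /* Plain struct assignment works; sha3_256_ctx has no pointers. */
        struct sha3_256_ctx h = prefix;
        sha3_256_update (&h, p->addr_bytes, addr);
        sha3_256_update (&h, p->n, in + i * p->n);
        sha3_256_shake (&h, p->n, out + i * p->n);
      }
  }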
diff --color -ruNp a/slh-dsa-context.h b/slh-dsa-context.h
--- a/slh-dsa-context.h 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-context.h 2024-12-02 15:41:43.292643667 +0100
+typedef struct
- {
- uint8_t pub_seed[32];
- uint8_t sk_seed[32];
- } slh_dsa_ctx;
It could make some sense to collect all the internal declarations into a single slh-dsa-internal.h.
diff --color -ruNp a/slh-dsa-fors.c b/slh-dsa-fors.c
--- a/slh-dsa-fors.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-fors.c 2024-12-02 16:59:07.186076254 +0100
+void
+_nettle_fors_pk_from_sig(const struct slh_dsa_params *p,
uint8_t *pk, const uint8_t *sig,
const uint8_t *m, const slh_dsa_ctx* ctx,
const uint32_t fors_addr[8])
+{
- uint32_t indices[p->fors_trees];
- uint8_t roots[p->fors_trees * p->n];
- /* Hash horizontally across all tree roots to derive the public key. */
- p->thash(p, pk, roots, p->fors_trees, ctx, fors_pk_addr);
If we could call the underlying hash function's update method for each root as it is computed, we wouldn't need to allocate any array of roots.
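I.e., for the shake case, something like the following, with fors_gen_root being a hypothetical helper computing the root of the i:th FORS tree from the signature, and SLH_DSA_MAX_N a hypothetical constant bound:

  struct sha3_256_ctx h;
  uint8_t root[SLH_DSA_MAX_N];
  unsigned i;

  sha3_256_init (&h);
  sha3_256_update (&h, p->n, ctx->pub_seed);
  sha3_256_update (&h, p->addr_bytes, (const uint8_t *) fors_pk_addr);

  for (i = 0; i < p->fors_trees; i++)
    {
      fors_gen_root (p, root, sig, m, ctx, fors_addr, i);  /* hypothetical */
      sha3_256_update (&h, p->n, root);
    }
  sha3_256_shake (&h, p->n, pk);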
diff --color -ruNp a/slh-dsa.h b/slh-dsa.h
--- a/slh-dsa.h 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa.h 2024-12-02 15:34:51.201341870 +0100
+void
+slh_dsa_generate_keypair_from_seed(enum slh_dsa_mode mode,
const uint8_t *seed,
uint8_t *pk, uint8_t *sk);
+void
+slh_dsa_generate_keypair(enum slh_dsa_mode mode,
uint8_t *pk, uint8_t *sk,
void *random_ctx, nettle_random_func *random);
Not sure how to best think about the keypair and key generation. The secret key (or rather, the secret part of the secret key), SK.seed and SK.prf, are arbitrary random byte strings of the right size, analogous to ed25519. So we don't strictly need any function for generating a valid secret; just using the random byte generator is fine. We do need a function to derive the public key from such a secret, but then the public key includes its own randomization, which is unusual.
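One possible shape (name completely made up): let the application draw SK.seed, SK.prf and PK.seed from its randomness source, and provide only a function that computes the expensive part, PK.root:

  void
  slh_dsa_shake_128s_root (const uint8_t *public_seed,  /* PK.seed, n bytes */
                           const uint8_t *secret_seed,  /* SK.seed, n bytes */
                           uint8_t *root);              /* PK.root, n bytes */

Then no keypair generation function beyond the random byte generator would be strictly needed.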
diff --color -ruNp a/slh-dsa-hash.c b/slh-dsa-hash.c
--- a/slh-dsa-hash.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-hash.c 2024-12-02 16:53:03.160423482 +0100
+/* Computes the message-dependent randomness R,
- using a secret seed and an optional
- randomization value as well as the message.
- */
+void
+_nettle_gen_message_random_shake(const struct slh_dsa_params *p,
uint8_t *R, const uint8_t *sk_prf,
const uint8_t *optrand,
const uint8_t *m,
unsigned long long mlen)
Should be size_t, not unsigned long long. And the Nettle convention on argument order is to have the length followed by the pointer.
Also, these functions should either be static, or have slh somewhere in their names.
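E.g., something like this (just a sketch of the conventions, not a final name):

  static void
  slh_shake_gen_message_random (const struct slh_dsa_params *p,
                                uint8_t *R, const uint8_t *sk_prf,
                                const uint8_t *optrand,
                                size_t length, const uint8_t *message);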
diff --color -ruNp a/slh-dsa-params.c b/slh-dsa-params.c
--- a/slh-dsa-params.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-params.c 2024-12-02 16:58:28.762269523 +0100
+struct slh_dsa_params
+_nettle_slh_dsa_params_get(enum slh_dsa_mode mode)
+{
- struct slh_dsa_params p;
I think it is better to define one static const struct for each variant (a bit similar to struct ecc_curve).
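Along these lines, with the SLH-DSA-SHAKE-128s numbers from the spec filled in and everything else (including the function pointers) elided from the sketch:

  const struct slh_dsa_params
  _nettle_slh_dsa_shake_128s_params = {
    .n = 16, .full_height = 63, .d = 7, .tree_height = 9,
    .fors_height = 12, .fors_trees = 14, .wots_w = 16,
    /* ... */
  };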
diff --color -ruNp a/slh-dsa-params.h b/slh-dsa-params.h
--- a/slh-dsa-params.h 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-params.h 2024-12-02 16:51:06.559969811 +0100
+struct slh_dsa_params
- {
- unsigned n; /* Hash output length in bytes */
- unsigned full_height; /* Height of the hypertree */
- unsigned d; /* Number of subtree layer */
- /* FORS tree dimensions */
- unsigned fors_height;
- unsigned fors_trees;
- unsigned wots_w; /* Winternitz parameter */
- unsigned addr_bytes;
- /* WOTS parameters */
- unsigned wots_logw;
- unsigned wots_len1;
- unsigned wots_len2;
- unsigned wots_len;
- unsigned wots_bytes;
- unsigned wots_pk_bytes;
- unsigned tree_height; /* Subtree size */
- /* FORS parameters */
- unsigned fors_msg_bytes;
- unsigned fors_bytes;
- unsigned fors_pk_bytes;
- /* Resulting SLH-DSA sizes */
- unsigned bytes;
- unsigned pk_bytes;
- unsigned sk_bytes;
- /* Offsets of various fields in the address structure */
- unsigned offset_layer; /* The byte used to specify the Merkle tree layer */
- unsigned offset_tree; /* The start of the 8 byte field used to specify the tree */
- unsigned offset_type; /* The byte used to specify the hash type (reason) */
- unsigned offset_kp_addr; /* The start of the 4 byte field used to specify the key pair address */
- unsigned offset_chain_addr; /* The byte used to specify the chain address (which Winternitz chain) */
- unsigned offset_hash_addr; /* The byte used to specify the hash address (where in the Winternitz chain) */
- unsigned offset_tree_hgt; /* The byte used to specify the height of this node in the FORS or Merkle tree */
- unsigned offset_tree_index; /* The start of the 4 byte field used to specify
the node in the FORS or Merkle tree */
It's unfortunate that we have the "compressed" addresses, that seems to add complexity, and unclear for what gain.
- /* Hash specific functions */
- void (*thash)(const struct slh_dsa_params *,
uint8_t *, const uint8_t *,
uint32_t, const slh_dsa_ctx *,
uint32_t[8]);
- void (*prf_addr)(const struct slh_dsa_params *,
uint8_t *, const slh_dsa_ctx *,
const uint32_t[8]);
- void (*gen_message_random)(const struct slh_dsa_params *,
uint8_t *, const uint8_t *,
const uint8_t *,
const uint8_t *,
unsigned long long);
- void (*hash_message)(const struct slh_dsa_params *,
uint8_t *, uint64_t *,
uint32_t *, const uint8_t *,
const uint8_t *, const uint8_t *,
unsigned long long);
It would make sense to have typedefs for these function types.
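E.g. (name hypothetical):

  typedef void nettle_slh_thash_func (const struct slh_dsa_params *p,
                                      uint8_t *out, const uint8_t *in,
                                      uint32_t inblocks,
                                      const slh_dsa_ctx *ctx,
                                      uint32_t addr[8]);

and then the struct member becomes just nettle_slh_thash_func *thash;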
diff --color -ruNp a/slh-dsa-thash.c b/slh-dsa-thash.c
--- a/slh-dsa-thash.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-thash.c 2024-12-02 16:43:52.812842310 +0100
+/* Takes an array of inblocks concatenated arrays of N bytes.
- */
+void
+_nettle_thash_shake(const struct slh_dsa_params *p,
uint8_t *out, const uint8_t *in,
uint32_t inblocks, const slh_dsa_ctx *ctx,
uint32_t addr[8])
+{
- uint8_t buf[p->n + p->addr_bytes + inblocks * p->n];
- struct sha3_256_ctx hash_ctx;
- memcpy(buf, ctx->pub_seed, p->n);
- memcpy(buf + p->n, addr, p->addr_bytes);
- memcpy(buf + p->n + p->addr_bytes, in, inblocks * p->n);
There's no need for allocation and memcpy, just call sha3_256_update multiple times.
- sha3_256_init(&hash_ctx);
- sha3_256_update(&hash_ctx, sizeof(buf), buf);
- sha3_256_shake(&hash_ctx, p->n, out);
+}
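I.e., something like this for the shake case (it should be equivalent to the buffer version, since the memcpy of addr just copied its raw bytes):

  void
  _nettle_thash_shake (const struct slh_dsa_params *p,
                       uint8_t *out, const uint8_t *in,
                       uint32_t inblocks, const slh_dsa_ctx *ctx,
                       uint32_t addr[8])
  {
    struct sha3_256_ctx hash_ctx;

    sha3_256_init (&hash_ctx);
    sha3_256_update (&hash_ctx, p->n, ctx->pub_seed);
    sha3_256_update (&hash_ctx, p->addr_bytes, (const uint8_t *) addr);
    sha3_256_update (&hash_ctx, inblocks * p->n, in);
    sha3_256_shake (&hash_ctx, p->n, out);
  }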
diff --color -ruNp a/slh-dsa-utils.c b/slh-dsa-utils.c
--- a/slh-dsa-utils.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-utils.c 2024-12-02 16:59:18.728318592 +0100
+void
+_nettle_u32_to_bytes(uint8_t *out, uint32_t in)
+{
- out[0] = (uint8_t)(in >> 24);
- out[1] = (uint8_t)(in >> 16);
- out[2] = (uint8_t)(in >> 8);
- out[3] = (uint8_t)in;
+}
This is the same as Nettle's WRITE_UINT32. Somewhat related, it looked like the algorithm descriptions in the spec had some unneeded conversions, e.g., converting certain digest bytes to an integer, and then only writing that integer back into an ADDR, instead of directly copying the relevant bytes.
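I.e., the helper could be reduced to this (or dropped in favor of using the macro directly):

  #include "macros.h"	/* WRITE_UINT32 */

  void
  _nettle_u32_to_bytes (uint8_t *out, uint32_t in)
  {
    WRITE_UINT32 (out, in);
  }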
+/* Computes a root node given a leaf and an auth path.
- Expects address to be complete other than
- the tree_height and tree_index.
- */
+void
+_nettle_compute_root(const struct slh_dsa_params *p,
uint8_t *root, const uint8_t *leaf,
uint32_t leaf_idx, uint32_t idx_offset,
const uint8_t *auth_path,
uint32_t tree_height, const slh_dsa_ctx *ctx,
uint32_t addr[8])
+{
- uint32_t i;
- uint8_t buffer[2 * p->n];
- /* If leaf_idx is odd (last bit = 1),
- current path element is a right child
- and auth_path has to go left.
- Otherwise it is the other way around. */
- if (leaf_idx & 1)
- {
memcpy(buffer + p->n, leaf, p->n);
memcpy(buffer, auth_path, p->n);
- }
- else
- {
memcpy(buffer, leaf, p->n);
memcpy(buffer + p->n, auth_path, p->n);
- }
- auth_path += p->n;
I think all memcpy calls in this function could be eliminated.
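For instance, with a thash variant taking the two child nodes as separate pointers, the left/right ordering becomes a matter of argument order (shake case sketched below; the name thash_two is made up):

  static void
  thash_two (const struct slh_dsa_params *p, uint8_t *out,
             const uint8_t *left, const uint8_t *right,
             const slh_dsa_ctx *ctx, uint32_t addr[8])
  {
    struct sha3_256_ctx h;

    sha3_256_init (&h);
    sha3_256_update (&h, p->n, ctx->pub_seed);
    sha3_256_update (&h, p->addr_bytes, (const uint8_t *) addr);
    sha3_256_update (&h, p->n, left);
    sha3_256_update (&h, p->n, right);
    sha3_256_shake (&h, p->n, out);
  }

The loop would then swap the current node and auth_path arguments depending on the low bit of the leaf index, with no copying of the sibling nodes.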
+/* For a given leaf index, computes the authentication path
- and the resulting root node using Merkle's TreeHash algorithm.
- Expects the layer and tree parts of the tree_addr to be set,
- as well as the tree type (i.e. ADDR_TYPE_HASHTREE or ADDR_TYPE_FORSTREE).
- Applies the offset idx_offset to indices before building addresses,
- so that it is possible to continue counting indices across trees.
- */
+void
+_nettle_treehash(const struct slh_dsa_params *p,
uint8_t *root, uint8_t *auth_path,
const slh_dsa_ctx* ctx, uint32_t leaf_idx,
uint32_t idx_offset, uint32_t tree_height,
void (*gen_leaf)(
uint8_t* /* leaf */,
const slh_dsa_ctx* /* ctx */,
uint32_t /* addr_idx */,
const uint32_t[8] /* tree_addr */),
uint32_t tree_addr[8])
+{
I haven't read this in detail, but it was something like this I expected when I wrote about processing all leaves sequentially, rather than recursing. I like that. It also looks like it should be side-channel silent, i.e., no branches or memory addresses depending on the secret key?
diff --color -ruNp a/slh-dsa-utilsx1.c b/slh-dsa-utilsx1.c
--- a/slh-dsa-utilsx1.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-utilsx1.c 2024-12-02 16:41:37.290990469 +0100
+/* Generate the entire Merkle tree, computing the authentication path for
- leaf_idx, and the resulting root node using Merkle's TreeHash algorithm.
- Expects the layer and tree parts of the tree_addr to be set, as well as the
- tree type (i.e. SPX_ADDR_TYPE_HASHTREE or SPX_ADDR_TYPE_FORSTREE)
- This expects tree_addr to be initialized to the addr structures for the
- Merkle tree nodes
- Applies the offset idx_offset to indices before building addresses, so that
- it is possible to continue counting indices across trees.
- This works by using the standard Merkle tree building algorithm,
- */
+void
+_nettle_treehashx1(const struct slh_dsa_params *p,
uint8_t *root, uint8_t *auth_path,
const slh_dsa_ctx* ctx, uint32_t leaf_idx,
uint32_t idx_offset, uint32_t tree_height,
void (*gen_leaf)(
const struct slh_dsa_params *,
uint8_t* /* Where to write the leaves */,
const slh_dsa_ctx* /* ctx */,
uint32_t idx, void *info),
uint32_t tree_addr[8], void *info)
+{
It's unclear how this differs from _nettle_treehash further above. Or, in general, what's the meaning of the "x1" suffix on some functions?
diff --color -ruNp a/slh-dsa-wots.c b/slh-dsa-wots.c
--- a/slh-dsa-wots.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-wots.c 2024-12-02 16:59:28.665527230 +0100
+/* base_w algorithm as described in draft.
- Interprets an array of bytes as integers in base w.
- This only works when log_w is a divisor of 8.
- */
If log_w is always 4, there's no need for a more general conversion. (And maybe we don't even need an explicit conversion, if we can extract the right nibbles on the fly.) I think the checksum, as well as other parts of wots, could be simplified quite a lot for this special case.
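For the on-the-fly variant, I'm imagining something as simple as this (hypothetical helper):

  /* The i:th base-16 digit of the message, high nibble first. */
  static unsigned
  wots_digit (const uint8_t *msg, unsigned i)
  {
    return (i & 1) ? (msg[i / 2] & 0xf) : (msg[i / 2] >> 4);
  }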
--- a/slh-dsa-wotsx1.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-wotsx1.c 2024-12-02 16:59:44.395857500 +0100
+/* This generates a WOTS public key
- It also generates the WOTS signature if leaf_info
- indicates that we're signing with this WOTS key
- */
+void
+_nettle_wots_gen_leafx1(const struct slh_dsa_params *p,
uint8_t *dest, const slh_dsa_ctx *ctx,
uint32_t leaf_idx, void *v_info)
+{
- struct slh_dsa_leaf_info_x1 *info = v_info;
- uint32_t *leaf_addr = info->leaf_addr;
- uint32_t *pk_addr = info->pk_addr;
- unsigned int i, k;
- uint8_t pk_buffer[p->wots_bytes];
- uint8_t *buffer;
- uint32_t wots_k_mask;
- if (leaf_idx == info->wots_sign_leaf)
- {
/* We're traversing the leaf that's signing;
* generate the WOTS signature */
wots_k_mask = 0;
- }
- else
- {
/* Nope, we're just generating pk's;
* turn off the signature logic */
wots_k_mask = (uint32_t)~0;
- }
Not sure I get why wots_k_mask is used? Is it intended to improve side-channel silence or performance?