Zoltan Fridrich zfridric@redhat.com writes:
Could you please review the attached patch and give me your feedback?
I've now had a first read, with one round of comments, see below. I'm tempted to try to write a minimal implementation of just SLH-DSA-SHAKE-128s from the spec, to get a better understanding.
Regards, /Niels
--- a/slh-dsa.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa.c 2024-12-02 17:00:07.386340199 +0100
+/* Generates SLH-DSA key pair
- Format sk: [SK_SEED || SK_PRF || PUB_SEED || root]
- Format pk: [root || PUB_SEED]
- */
Why different order for the public key? I would have expected PUB_SEED followed by the root (as in figure 16 in the spec, and same order as in the secret key). Does the NIST spec or some other standard clearly specify the serialization of the keys as byte strings?
For the Nettle API, I would also consider not including the pubkey as part of the secret key, but instead passing both pubkey and secret key to the secret-key functions, similar to, e.g., ed25519_sha512_sign.
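To be concrete, I'm imagining something along these lines (just a sketch; the name is a placeholder, and the sizes are the SLH-DSA-SHAKE-128s ones from the spec, n = 16):

  /* Secret part only, SK.seed || SK.prf. */
  #define SLH_DSA_SHAKE_128S_KEY_SIZE 32
  /* PK.seed || PK.root. */
  #define SLH_DSA_SHAKE_128S_PUB_SIZE 32
  #define SLH_DSA_SHAKE_128S_SIGNATURE_SIZE 7856

  void
  slh_dsa_shake_128s_sign (const uint8_t *pub, const uint8_t *key,
                           size_t length, const uint8_t *message,
                           uint8_t *signature);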
+void
+slh_dsa_generate_keypair(enum slh_dsa_mode mode,
uint8_t *pk, uint8_t *sk,
void *random_ctx,
nettle_random_func *random)
+{
- struct slh_dsa_params p = _nettle_slh_dsa_params_get(mode);
- uint8_t seed[3 * p.n];
- random(random_ctx, sizeof(seed), seed);
- slh_dsa_generate_keypair_from_seed(mode, seed, pk, sk);
+}
+void
+slh_dsa_sign(enum slh_dsa_mode mode, const uint8_t *sk,
void *random_ctx, nettle_random_func *random,
size_t message_size, const uint8_t *message,
uint8_t *signature)
+{
I think I would prefer separate top-level functions for the different algorithm variants, rather than the enum slh_dsa_mode. They could all call the same internal function, passing the appropriate params struct as argument.
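E.g., something like this (names hypothetical, and using the pubkey + secret key convention suggested above):

  void
  slh_dsa_shake_128s_sign (const uint8_t *pub, const uint8_t *key,
                           size_t length, const uint8_t *message,
                           uint8_t *signature)
  {
    _nettle_slh_dsa_sign (&_nettle_slh_dsa_shake_128s_params, pub, key,
                          length, message, signature);
  }

and similarly for the other variants, with the params structs being static const objects as I suggest below for slh-dsa-params.c.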
- struct slh_dsa_params p = _nettle_slh_dsa_params_get(mode);
- slh_dsa_ctx ctx;
- const uint8_t *sk_prf = sk + p.n;
- const uint8_t *pk = sk + 2 * p.n;
- uint8_t root[p.n];
- uint8_t optrand[p.n];
- uint8_t mhash[p.fors_msg_bytes];
I don't think variable size arrays are quite portable. You would need to either use the TMP_ALLOC macros, or, if you do separate top-level functions, each can allocate the needed constant amount of storage.
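For the TMP_ALLOC route, I mean the usual pattern from nettle-internal.h, with some constant upper bound over all parameter sets (the bound below is just a made-up placeholder):

  #include "nettle-internal.h"	/* TMP_DECL, TMP_ALLOC */

  /* Hypothetical bound: the largest fors_msg_bytes over all
     supported parameter sets. */
  #define SLH_DSA_MAX_FORS_MSG_BYTES 64

  static void
  example (const struct slh_dsa_params *p)
  {
    TMP_DECL(mhash, uint8_t, SLH_DSA_MAX_FORS_MSG_BYTES);
    TMP_ALLOC(mhash, p->fors_msg_bytes);
    /* ... use mhash just like the VLA version ... */
  }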
- uint32_t wots_addr[8] = {0};
- uint32_t tree_addr[8] = {0};
- uint32_t i, idx_leaf;
- uint64_t tree;
- memcpy(ctx.sk_seed, sk, p.n);
- memcpy(ctx.pub_seed, pk, p.n);
- _nettle_set_type(&p, wots_addr, SLH_DSA_ADDR_TYPE_WOTS);
- _nettle_set_type(&p, tree_addr, SLH_DSA_ADDR_TYPE_HASHTREE);
- /* Optionally, signing can be made non-deterministic using optrand.
This can help counter side-channel attacks that would benefit from
getting a large number of traces when the signer uses the same nodes. */
- if (random_ctx && random)
- random(random_ctx, sizeof(optrand), optrand);
- else
- memcpy(optrand, pk, sizeof(optrand));
So this function supports both random and deterministic signatures. Minor detail: I think it should be just if (random); it's fine for some randomness generators to use a NULL random_ctx (e.g., for an application where random state is global).
- /* Compute the digest randomization value. */
- p.gen_message_random(&p, signature, sk_prf, optrand, message, message_size);
- /* Derive the message digest and leaf index from R, PK and M. */
- p.hash_message(&p, mhash, &tree, &idx_leaf, signature, pk, message, message_size);
- signature += p.n;
- _nettle_set_tree_addr(&p, wots_addr, tree);
- _nettle_set_keypair_addr(&p, wots_addr, idx_leaf);
- /* Sign the message hash using FORS. */
- _nettle_fors_sign(&p, signature, root, mhash, &ctx, wots_addr);
- signature += p.fors_bytes;
- for (i = 0; i < p.d; i++)
- {
_nettle_set_layer_addr(&p, tree_addr, i);
_nettle_set_tree_addr(&p, tree_addr, tree);
_nettle_copy_subtree_addr(&p, wots_addr, tree_addr);
_nettle_set_keypair_addr(&p, wots_addr, idx_leaf);
One possible microoptimization, when we make lots of hashes with a common prefix, could be to copy the underlying hash context (e.g., sha3_256_ctx for shake256) after calling <hash>_update on that prefix. But maybe it doesn't matter much if the prefix is small compared to the hash function's block size.
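Roughly like this, for the shake-256 case (the helper and its arguments are made up just to show the pattern, not matching any particular function in the patch):

  static void
  hash_many_with_cached_prefix (const struct slh_dsa_params *p,
                                const slh_dsa_ctx *ctx,
                                unsigned count, const uint8_t *in,
                                const uint8_t *addr, uint8_t *out)
  {
    struct sha3_256_ctx prefix;
    unsigned i;

    /* Absorb the common PUB_SEED prefix once. */
    sha3_256_init (&prefix);
    sha3_256_update (&prefix, p->n, ctx->pub_seed);

    for (i = 0; i < count; i++)
      {
        /* Plain struct assignment works; sha3_256_ctx has no pointers. */
        struct sha3_256_ctx h = prefix;
        sha3_256_update (&h, p->addr_bytes, addr);
        sha3_256_update (&h, p->n, in + i * p->n);
        sha3_256_shake (&h, p->n, out + i * p->n);
      }
  }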
diff --color -ruNp a/slh-dsa-context.h b/slh-dsa-context.h
--- a/slh-dsa-context.h 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-context.h 2024-12-02 15:41:43.292643667 +0100
+typedef struct
- {
- uint8_t pub_seed[32];
- uint8_t sk_seed[32];
- } slh_dsa_ctx;
It could make some sense to collect all the internal declarations into a single slh-dsa-internal.h.
diff --color -ruNp a/slh-dsa-fors.c b/slh-dsa-fors.c
--- a/slh-dsa-fors.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-fors.c 2024-12-02 16:59:07.186076254 +0100
+void
+_nettle_fors_pk_from_sig(const struct slh_dsa_params *p,
uint8_t *pk, const uint8_t *sig,
const uint8_t *m, const slh_dsa_ctx* ctx,
const uint32_t fors_addr[8])
+{
- uint32_t indices[p->fors_trees];
- uint8_t roots[p->fors_trees * p->n];
- /* Hash horizontally across all tree roots to derive the public key. */
- p->thash(p, pk, roots, p->fors_trees, ctx, fors_pk_addr);
If we could call the underlying hash function's update method for each root as it is computed, we wouldn't need to allocate any array of roots.
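I.e., for the shake case, something like the following, with fors_gen_root being a hypothetical helper computing the root of the i:th FORS tree from the signature, and SLH_DSA_MAX_N a hypothetical constant bound:

  struct sha3_256_ctx h;
  uint8_t root[SLH_DSA_MAX_N];
  unsigned i;

  sha3_256_init (&h);
  sha3_256_update (&h, p->n, ctx->pub_seed);
  sha3_256_update (&h, p->addr_bytes, (const uint8_t *) fors_pk_addr);

  for (i = 0; i < p->fors_trees; i++)
    {
      fors_gen_root (p, root, sig, m, ctx, fors_addr, i);  /* hypothetical */
      sha3_256_update (&h, p->n, root);
    }
  sha3_256_shake (&h, p->n, pk);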
diff --color -ruNp a/slh-dsa.h b/slh-dsa.h
--- a/slh-dsa.h 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa.h 2024-12-02 15:34:51.201341870 +0100
+void
+slh_dsa_generate_keypair_from_seed(enum slh_dsa_mode mode,
const uint8_t *seed,
uint8_t *pk, uint8_t *sk);
+void
+slh_dsa_generate_keypair(enum slh_dsa_mode mode,
uint8_t *pk, uint8_t *sk,
void *random_ctx, nettle_random_func *random);
Not sure how to best think about the keypair and key generation. The secret key (or rather, the secret part of the secret key), SK.seed and SK.prf, are arbitrary random byte strings of the right size, analogous to ed25519. So we don't strictly need any function for generating a valid secret; just using the random byte generator is fine. We do need a function to derive the public key from such a secret, but then the public key includes its own randomization, which is unusual.
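One possible shape (name completely made up): let the application draw SK.seed, SK.prf and PK.seed from its randomness source, and provide only a function that computes the expensive part, PK.root:

  void
  slh_dsa_shake_128s_root (const uint8_t *public_seed,  /* PK.seed, n bytes */
                           const uint8_t *secret_seed,  /* SK.seed, n bytes */
                           uint8_t *root);              /* PK.root, n bytes */

Then no keypair generation function beyond the random byte generator would be strictly needed.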
diff --color -ruNp a/slh-dsa-hash.c b/slh-dsa-hash.c
--- a/slh-dsa-hash.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-hash.c 2024-12-02 16:53:03.160423482 +0100
+/* Computes the message-dependent randomness R,
- using a secret seed and an optional
- randomization value as well as the message.
- */
+void
+_nettle_gen_message_random_shake(const struct slh_dsa_params *p,
uint8_t *R, const uint8_t *sk_prf,
const uint8_t *optrand,
const uint8_t *m,
unsigned long long mlen)
Should be size_t, not unsigned long long. And the Nettle convention on argument order is to have the length followed by the pointer.
Also, these functions should either be static, or have slh somewhere in their names.
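E.g., something like this (just a sketch of the conventions, not a final name):

  static void
  slh_shake_gen_message_random (const struct slh_dsa_params *p,
                                uint8_t *R, const uint8_t *sk_prf,
                                const uint8_t *optrand,
                                size_t length, const uint8_t *message);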
diff --color -ruNp a/slh-dsa-params.c b/slh-dsa-params.c
--- a/slh-dsa-params.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-params.c 2024-12-02 16:58:28.762269523 +0100
+struct slh_dsa_params
+_nettle_slh_dsa_params_get(enum slh_dsa_mode mode)
+{
- struct slh_dsa_params p;
I think it is better to define one static const struct for each variant (a bit similar to struct ecc_curve).
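Along these lines, with the SLH-DSA-SHAKE-128s numbers from the spec filled in and everything else (including the function pointers) elided from the sketch:

  const struct slh_dsa_params
  _nettle_slh_dsa_shake_128s_params = {
    .n = 16, .full_height = 63, .d = 7, .tree_height = 9,
    .fors_height = 12, .fors_trees = 14, .wots_w = 16,
    /* ... */
  };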
diff --color -ruNp a/slh-dsa-params.h b/slh-dsa-params.h
--- a/slh-dsa-params.h 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-params.h 2024-12-02 16:51:06.559969811 +0100
+struct slh_dsa_params
- {
- unsigned n; /* Hash output length in bytes */
- unsigned full_height; /* Height of the hypertree */
- unsigned d; /* Number of subtree layer */
- /* FORS tree dimensions */
- unsigned fors_height;
- unsigned fors_trees;
- unsigned wots_w; /* Winternitz parameter */
- unsigned addr_bytes;
- /* WOTS parameters */
- unsigned wots_logw;
- unsigned wots_len1;
- unsigned wots_len2;
- unsigned wots_len;
- unsigned wots_bytes;
- unsigned wots_pk_bytes;
- unsigned tree_height; /* Subtree size */
- /* FORS parameters */
- unsigned fors_msg_bytes;
- unsigned fors_bytes;
- unsigned fors_pk_bytes;
- /* Resulting SLH-DSA sizes */
- unsigned bytes;
- unsigned pk_bytes;
- unsigned sk_bytes;
- /* Offsets of various fields in the address structure */
- unsigned offset_layer; /* The byte used to specify the Merkle tree layer */
- unsigned offset_tree; /* The start of the 8 byte field used to specify the tree */
- unsigned offset_type; /* The byte used to specify the hash type (reason) */
- unsigned offset_kp_addr; /* The start of the 4 byte field used to specify the key pair address */
- unsigned offset_chain_addr; /* The byte used to specify the chain address (which Winternitz chain) */
- unsigned offset_hash_addr; /* The byte used to specify the hash address (where in the Winternitz chain) */
- unsigned offset_tree_hgt; /* The byte used to specify the height of this node in the FORS or Merkle tree */
- unsigned offset_tree_index; /* The start of the 4 byte field used to specify
the node in the FORS or Merkle tree */
It's unfortunate that we have the "compressed" addresses, that seems to add complexity, and unclear for what gain.
- /* Hash specific functions */
- void (*thash)(const struct slh_dsa_params *,
uint8_t *, const uint8_t *,
uint32_t, const slh_dsa_ctx *,
uint32_t[8]);
- void (*prf_addr)(const struct slh_dsa_params *,
uint8_t *, const slh_dsa_ctx *,
const uint32_t[8]);
- void (*gen_message_random)(const struct slh_dsa_params *,
uint8_t *, const uint8_t *,
const uint8_t *,
const uint8_t *,
unsigned long long);
- void (*hash_message)(const struct slh_dsa_params *,
uint8_t *, uint64_t *,
uint32_t *, const uint8_t *,
const uint8_t *, const uint8_t *,
unsigned long long);
It would make sense to have typedefs for these function types.
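E.g. (name hypothetical):

  typedef void nettle_slh_thash_func (const struct slh_dsa_params *p,
                                      uint8_t *out, const uint8_t *in,
                                      uint32_t inblocks,
                                      const slh_dsa_ctx *ctx,
                                      uint32_t addr[8]);

and then the struct member becomes just nettle_slh_thash_func *thash;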
diff --color -ruNp a/slh-dsa-thash.c b/slh-dsa-thash.c
--- a/slh-dsa-thash.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-thash.c 2024-12-02 16:43:52.812842310 +0100
+/* Takes an array of inblocks concatenated arrays of N bytes.
- */
+void
+_nettle_thash_shake(const struct slh_dsa_params *p,
uint8_t *out, const uint8_t *in,
uint32_t inblocks, const slh_dsa_ctx *ctx,
uint32_t addr[8])
+{
- uint8_t buf[p->n + p->addr_bytes + inblocks * p->n];
- struct sha3_256_ctx hash_ctx;
- memcpy(buf, ctx->pub_seed, p->n);
- memcpy(buf + p->n, addr, p->addr_bytes);
- memcpy(buf + p->n + p->addr_bytes, in, inblocks * p->n);
There's no need for allocation and memcpy, just call sha3_256_update multiple times.
- sha3_256_init(&hash_ctx);
- sha3_256_update(&hash_ctx, sizeof(buf), buf);
- sha3_256_shake(&hash_ctx, p->n, out);
+}
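I.e., something like this for the shake case (it should be equivalent to the buffer version, since the memcpy of addr just copied its raw bytes):

  void
  _nettle_thash_shake (const struct slh_dsa_params *p,
                       uint8_t *out, const uint8_t *in,
                       uint32_t inblocks, const slh_dsa_ctx *ctx,
                       uint32_t addr[8])
  {
    struct sha3_256_ctx hash_ctx;

    sha3_256_init (&hash_ctx);
    sha3_256_update (&hash_ctx, p->n, ctx->pub_seed);
    sha3_256_update (&hash_ctx, p->addr_bytes, (const uint8_t *) addr);
    sha3_256_update (&hash_ctx, inblocks * p->n, in);
    sha3_256_shake (&hash_ctx, p->n, out);
  }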
diff --color -ruNp a/slh-dsa-utils.c b/slh-dsa-utils.c
--- a/slh-dsa-utils.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-utils.c 2024-12-02 16:59:18.728318592 +0100
+void
+_nettle_u32_to_bytes(uint8_t *out, uint32_t in)
+{
- out[0] = (uint8_t)(in >> 24);
- out[1] = (uint8_t)(in >> 16);
- out[2] = (uint8_t)(in >> 8);
- out[3] = (uint8_t)in;
+}
This is the same as Nettle's WRITE_UINT32. Somewhat related, it looked like the algorithm descriptions in the spec had some unneeded conversions, e.g., converting certain digest bytes to an integer, and then only writing that integer back into an ADDR, instead of directly copying the relevant bytes.
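I.e., the helper could be reduced to this (or dropped in favor of using the macro directly):

  #include "macros.h"	/* WRITE_UINT32 */

  void
  _nettle_u32_to_bytes (uint8_t *out, uint32_t in)
  {
    WRITE_UINT32 (out, in);
  }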
+/* Computes a root node given a leaf and an auth path.
- Expects address to be complete other than
- the tree_height and tree_index.
- */
+void
+_nettle_compute_root(const struct slh_dsa_params *p,
uint8_t *root, const uint8_t *leaf,
uint32_t leaf_idx, uint32_t idx_offset,
const uint8_t *auth_path,
uint32_t tree_height, const slh_dsa_ctx *ctx,
uint32_t addr[8])
+{
- uint32_t i;
- uint8_t buffer[2 * p->n];
- /* If leaf_idx is odd (last bit = 1),
- current path element is a right child
- and auth_path has to go left.
- Otherwise it is the other way around. */
- if (leaf_idx & 1)
- {
memcpy(buffer + p->n, leaf, p->n);
memcpy(buffer, auth_path, p->n);
- }
- else
- {
memcpy(buffer, leaf, p->n);
memcpy(buffer + p->n, auth_path, p->n);
- }
- auth_path += p->n;
I think all memcpy calls in this function could be eliminated.
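For instance, with a thash variant taking the two child nodes as separate pointers, the left/right ordering becomes a matter of argument order (shake case sketched below; the name thash_two is made up):

  static void
  thash_two (const struct slh_dsa_params *p, uint8_t *out,
             const uint8_t *left, const uint8_t *right,
             const slh_dsa_ctx *ctx, uint32_t addr[8])
  {
    struct sha3_256_ctx h;

    sha3_256_init (&h);
    sha3_256_update (&h, p->n, ctx->pub_seed);
    sha3_256_update (&h, p->addr_bytes, (const uint8_t *) addr);
    sha3_256_update (&h, p->n, left);
    sha3_256_update (&h, p->n, right);
    sha3_256_shake (&h, p->n, out);
  }

The loop would then swap the current node and auth_path arguments depending on the low bit of the leaf index, with no copying of the sibling nodes.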
+/* For a given leaf index, computes the authentication path
- and the resulting root node using Merkle's TreeHash algorithm.
- Expects the layer and tree parts of the tree_addr to be set,
- as well as the tree type (i.e. ADDR_TYPE_HASHTREE or ADDR_TYPE_FORSTREE).
- Applies the offset idx_offset to indices before building addresses,
- so that it is possible to continue counting indices across trees.
- */
+void
+_nettle_treehash(const struct slh_dsa_params *p,
uint8_t *root, uint8_t *auth_path,
const slh_dsa_ctx* ctx, uint32_t leaf_idx,
uint32_t idx_offset, uint32_t tree_height,
void (*gen_leaf)(
uint8_t* /* leaf */,
const slh_dsa_ctx* /* ctx */,
uint32_t /* addr_idx */,
const uint32_t[8] /* tree_addr */),
uint32_t tree_addr[8])
+{
I haven't read this in detail, but it was something like this I expected when I wrote about processing all leaves sequentially, rather than recursing. I like that. It also looks like it should be side-channel silent, i.e., no branches or memory addresses depending on the secret key?
diff --color -ruNp a/slh-dsa-utilsx1.c b/slh-dsa-utilsx1.c
--- a/slh-dsa-utilsx1.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-utilsx1.c 2024-12-02 16:41:37.290990469 +0100
+/* Generate the entire Merkle tree, computing the authentication path for
- leaf_idx, and the resulting root node using Merkle's TreeHash algorithm.
- Expects the layer and tree parts of the tree_addr to be set, as well as the
- tree type (i.e. SPX_ADDR_TYPE_HASHTREE or SPX_ADDR_TYPE_FORSTREE)
- This expects tree_addr to be initialized to the addr structures for the
- Merkle tree nodes
- Applies the offset idx_offset to indices before building addresses, so that
- it is possible to continue counting indices across trees.
- This works by using the standard Merkle tree building algorithm,
- */
+void
+_nettle_treehashx1(const struct slh_dsa_params *p,
uint8_t *root, uint8_t *auth_path,
const slh_dsa_ctx* ctx, uint32_t leaf_idx,
uint32_t idx_offset, uint32_t tree_height,
void (*gen_leaf)(
const struct slh_dsa_params *,
uint8_t* /* Where to write the leaves */,
const slh_dsa_ctx* /* ctx */,
uint32_t idx, void *info),
uint32_t tree_addr[8], void *info)
+{
It's unclear how this differs from _nettle_treehash further above. Or, in general, what's the meaning of the "x1" suffix on some functions?
diff --color -ruNp a/slh-dsa-wots.c b/slh-dsa-wots.c
--- a/slh-dsa-wots.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-wots.c 2024-12-02 16:59:28.665527230 +0100
+/* base_w algorithm as described in draft.
- Interprets an array of bytes as integers in base w.
- This only works when log_w is a divisor of 8.
- */
If log_w is always 4, there's no need for a more general conversion. (And maybe we don't even need an explicit conversion, if we can extract the right nibbles on the fly.) I think the checksum, as well as other parts of wots, could be simplified quite a lot for this special case.
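For the on-the-fly variant, I'm imagining something as simple as this (hypothetical helper):

  /* The i:th base-16 digit of the message, high nibble first. */
  static unsigned
  wots_digit (const uint8_t *msg, unsigned i)
  {
    return (i & 1) ? (msg[i / 2] & 0xf) : (msg[i / 2] >> 4);
  }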
--- a/slh-dsa-wotsx1.c 1970-01-01 01:00:00.000000000 +0100
+++ b/slh-dsa-wotsx1.c 2024-12-02 16:59:44.395857500 +0100
+/* This generates a WOTS public key
- It also generates the WOTS signature if leaf_info
- indicates that we're signing with this WOTS key
- */
+void
+_nettle_wots_gen_leafx1(const struct slh_dsa_params *p,
uint8_t *dest, const slh_dsa_ctx *ctx,
uint32_t leaf_idx, void *v_info)
+{
- struct slh_dsa_leaf_info_x1 *info = v_info;
- uint32_t *leaf_addr = info->leaf_addr;
- uint32_t *pk_addr = info->pk_addr;
- unsigned int i, k;
- uint8_t pk_buffer[p->wots_bytes];
- uint8_t *buffer;
- uint32_t wots_k_mask;
- if (leaf_idx == info->wots_sign_leaf)
- {
/* We're traversing the leaf that's signing;
* generate the WOTS signature */
wots_k_mask = 0;
- }
- else
- {
/* Nope, we're just generating pk's;
* turn off the signature logic */
wots_k_mask = (uint32_t)~0;
- }
Not sure I get why wots_k_mask is used? Is it intended to improve side-channel silence or performance?