Here are some fixes I made while looking through the sources.
Two points for the code review:
1. #define NONNULL(...) __attribute__ ((nonnull(__VA_ARGS__))) uses a C99 feature (variadic macros, the "..."), but the code already contains C99 long long constants elsewhere (in case you care about C89 compliance).
2. In cast128.c I removed the wiping of t, l and r. Instead, I set t = 0 at the beginning of the loops, because t seemed to be used uninitialized in the F1 macro. Please have a quick look at this; maybe the wiping has some undocumented deeper purpose?
Regards, Tim