Salsa20 as a PRNG with streams
Can I use Salsa20 as a good non-cryptographic PRNG with different streams if I reduce the number of rounds to 8 and omit the addition step at the end? I want to omit the final step because I don't want to get all-zero outputs.
random-number-generator salsa20
asked Jan 21 at 10:52


Thorham
Related: crypto.stackexchange.com/q/57670/54184
– forest
Jan 21 at 11:07
Consider ChaCha8. ChaCha is faster and has more diffusion per round.
– Future Security
Jan 21 at 16:15
1 Answer
Reducing the rounds to 8 would give you Salsa20/8, which is not only a fast PRNG, running at 1.88 cycles per byte on a Core 2 Duo, but is still quite cryptographically secure: the best known attack requires approximately $2^{244}$ operations. Removing the final addition step would be a bad idea, though. Without it, the function is trivial to reverse, so the key and counter can be recovered from a single block of known plaintext. Keeping the addition will not give you all-zero outputs, so you should keep it.
You could cut the algorithm down to four rounds to roughly double the speed while completely sacrificing cryptographic security. Fewer than four rounds results in incomplete diffusion, leading to biased, non-uniform output. However, even at four rounds it will still be roughly twice as slow as the fastest dedicated non-cryptographic PRNG, XorShift128+ (an LFSR-based PRNG at 0.48 cycles per byte on Kaby Lake).
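To make the point about the final addition concrete, here is a minimal Python sketch of a Salsa20/8 block function. It is an illustration under stated assumptions, not a vetted implementation: the helper names (`permute`, `unpermute`, `initial_state`, `block`) are made up for this example, and using the 64-bit nonce as the stream ID is just one plausible way to get "streams". The 8-word key could be, for instance, a SHA-256 digest split into 32-bit words. The sketch shows that the rounds alone are a permutation that can be run backwards, which is what the feed-forward addition prevents.

```python
MASK = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter(y, a, b, c, d):
    """Salsa20 quarter-round, applied in place to words a, b, c, d."""
    y[b] ^= rotl((y[a] + y[d]) & MASK, 7)
    y[c] ^= rotl((y[b] + y[a]) & MASK, 9)
    y[d] ^= rotl((y[c] + y[b]) & MASK, 13)
    y[a] ^= rotl((y[d] + y[c]) & MASK, 18)


def unquarter(y, a, b, c, d):
    """Inverse of quarter(): the same four steps in reverse order."""
    y[a] ^= rotl((y[d] + y[c]) & MASK, 18)
    y[d] ^= rotl((y[c] + y[b]) & MASK, 13)
    y[c] ^= rotl((y[b] + y[a]) & MASK, 9)
    y[b] ^= rotl((y[a] + y[d]) & MASK, 7)


def permute(state, rounds=8):
    """Apply rounds/2 double rounds (column round, then row round)."""
    y = list(state)
    for _ in range(rounds // 2):
        for idx in COLUMNS + ROWS:
            quarter(y, *idx)
    return y


def unpermute(state, rounds=8):
    """Undo permute() by running every quarter-round backwards."""
    y = list(state)
    for _ in range(rounds // 2):
        for idx in reversed(COLUMNS + ROWS):
            unquarter(y, *idx)
    return y


def initial_state(key, stream_id, counter):
    """Build the 16-word input; the nonce slot holds the stream ID (assumption)."""
    n0, n1 = stream_id & MASK, (stream_id >> 32) & MASK
    c0, c1 = counter & MASK, (counter >> 32) & MASK
    return [SIGMA[0], key[0], key[1], key[2], key[3], SIGMA[1], n0, n1,
            c0, c1, SIGMA[2], key[4], key[5], key[6], key[7], SIGMA[3]]


def block(key, stream_id, counter, rounds=8, feedforward=True):
    """Return one 16-word output block of the reduced-round generator."""
    x = initial_state(key, stream_id, counter)
    y = permute(x, rounds)
    if feedforward:
        # The final addition of the input words makes the block one-way.
        y = [(a + b) & MASK for a, b in zip(x, y)]
    return y


key = list(range(1, 9))  # placeholder key, for illustration only
raw = block(key, stream_id=7, counter=3, feedforward=False)
recovered = unpermute(raw)  # equals initial_state(key, 7, 3), key included
```

Without the feed-forward, `recovered` contains the key, stream ID and counter. Note also that the rounds map an all-zero state to an all-zero state, but the fixed constants in `initial_state` mean the real input is never all zero.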
edited Jan 21 at 11:08
answered Jan 21 at 11:03


forest
Some other non-cryptographic algorithms are certainly faster, but they have a smaller state (I need room for a SHA256 hash) and I need streams. There doesn't seem to be much choice other than crypto algorithms if you have these requirements.
– Thorham
Jan 21 at 12:00
Is 0.48 c/b for a scalar implementation? You can run four XorShift128+ PRNGs in parallel in elements of an AVX2 vector. See AVX/SSE version of xorshift128+ for `__m256i xorshift128plus_avx2(struct rngstate256 *sp)`. 8 SIMD ALU uops per 32 bytes of results => about 12 bytes per cycle, or 0.0833 c/b on SKL / KBL. (I used it in my answer on What's the fastest way to generate a 1 GB text file containing random digits? which does > 8 bytes per cycle of space-separated ASCII decimal digits on SKL.)
– Peter Cordes
Jan 21 at 14:52
@Thorham: Would it work to use a SHA256 hash as the seed for two XorShift128+ PRNGs operating in parallel? If so, 2x 128-bit SIMD vectors will work, and let you generate 2x 64-bit random numbers in parallel. Or use 256-bit vectors to run 4 generators in parallel, requiring twice as much seed data. See my previous comment for C++ and C intrinsics implementations.
– Peter Cordes
Jan 21 at 14:58
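As a rough illustration of the seeding idea in the comment above, the following Python sketch splits one SHA-256 digest into four 64-bit words and uses them as the state of two scalar XorShift128+ generators. Everything here is an assumption made for the example: the class and function names are hypothetical, the shift triple 23/18/5 is one commonly published parameter set for xorshift128+, and the little-endian split of the digest is arbitrary. It is not the SIMD code referred to in the comments.

```python
import hashlib

MASK64 = (1 << 64) - 1


class XorShift128Plus:
    """Scalar xorshift128+ with shifts 23/18/5 (assumed parameter set)."""

    def __init__(self, s0, s1):
        if s0 == 0 and s1 == 0:
            # An all-zero state would stay zero forever.
            raise ValueError("state must not be all zero")
        self.s0, self.s1 = s0 & MASK64, s1 & MASK64

    def next(self):
        """Return the next 64-bit output and advance the state."""
        s1, s0 = self.s0, self.s1
        result = (s0 + s1) & MASK64
        self.s0 = s0
        s1 ^= (s1 << 23) & MASK64
        self.s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
        return result


def generators_from_digest(seed_material: bytes):
    """Seed two generators from the two 128-bit halves of one SHA-256 digest."""
    digest = hashlib.sha256(seed_material).digest()
    words = [int.from_bytes(digest[i:i + 8], "little") for i in range(0, 32, 8)]
    return XorShift128Plus(words[0], words[1]), XorShift128Plus(words[2], words[3])


gen_a, gen_b = generators_from_digest(b"example seed")
pairs = [(gen_a.next(), gen_b.next()) for _ in range(4)]
```

Seeding two generators from independent halves of a hash does not guarantee non-overlapping subsequences the way a counter-based design with a stream ID does; overlap is merely very unlikely.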
@PeterCordes, Thorham, xoshiro256**/+ are available too. (They use rotate, not just xor-shifts. May not be as suitable for vector implementations.) Seeding two instances of a 128-bit algorithm like that isn't too different from just truncating SHA-256 output to 128 bits.
– Future Security
Jan 21 at 16:09
@FutureSecurity: SSE2 / AVX2 xoshiro256** looks very possible. AVX512 even has SIMD rotates, making it even better. Other SIMD ISAs can emulate it with shift+shift+OR. SIMD integer multiply is not bad for 32-bit integers on Intel CPUs with SSE4.1, but requires extended-precision techniques for 64-bit integer elements (until AVX512), which is why I used xorshift+ instead of *. But xoshiro256** only multiplies by the constants `*5` and `*9`, which are both power-of-2 + 1 so are just left-shift+add. (In a scalar implementation, x86 can do that in one cycle with `lea rax, [rbx + rbx*8]`.)
– Peter Cordes
Jan 21 at 16:24
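The two tricks mentioned in the last comment (a rotate emulated with shift+shift+OR, and multiplication by 5 or 9 done as left-shift+add) can be checked with a small Python sketch of the xoshiro256** output scrambler, `rotl(x * 5, 7) * 9`. The function names are hypothetical and the code only covers the scrambler, not the state update; 64-bit wraparound is emulated with a mask because Python integers are unbounded.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, k):
    """64-bit rotate left built from two shifts and an OR."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def times5(x):
    """x * 5 as x + (x << 2), reduced modulo 2**64."""
    return (x + (x << 2)) & MASK64


def times9(x):
    """x * 9 as x + (x << 3), reduced modulo 2**64."""
    return (x + (x << 3)) & MASK64


def starstar(x):
    """The ** scrambler using only shifts, adds and ORs."""
    return times9(rotl64(times5(x), 7))


def starstar_reference(x):
    """The same scrambler written with ordinary multiplications."""
    return (rotl64((x * 5) & MASK64, 7) * 9) & MASK64


samples = [0, 1, 2, 0xDEADBEEF, MASK64, 0x8000000000000000]
agree = all(starstar(x) == starstar_reference(x) for x in samples)
```

Because both forms agree modulo 2^64, a SIMD version without a 64-bit multiply or rotate instruction can use the shift-based form lane by lane.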