This is less exciting now that CPUs have high quality, fast hardware random generators built in.
(Which can be safely mixed with other sources of randomness, because the NSA obviously backdoored Intel's generator)
Researchers have devised a method of generating random numbers that could shake up computer encryption. University of Texas computer science professor David Zuckerman and PhD student Eshan Chattopadhyay have found that a "high-quality" random number could be generated by combining two "low-quality" random sources. You can …
Careful or that thinking can lead to the following lol.
“So the OpenSSL codebase does “get the time, add it as a random seed” in a bunch of places inside the TLS engine, to try to keep entropy high. I wonder if their moto is “If you can’t solve a problem, at least try to do it badly”.” — theo
"Well, even if time() isn't random, your RSA private key is probably pretty random (explaining one of OpenSSL methods for getting random input)
Do not feed RSA private key information to the random subsystem as entropy. It might be fed to a pluggable random subsystem…. What were they thinking?!"
This post has been deleted by its author
To be fair https://www.openssl.org/policies/roadmap.html is fairly impressive what has been done already. Amazing what can be done when money and nerds combine. That said still glad I only have nss and Libressl for internet encryption on my machines even if both of plenty of CVEs as well.
Lots of people had already had the same idea, but the assumption from the intelligentsia was that there was no way to mathematically generate random numbers, so combining two non-random streams was merely going to give you a different, but still non-random, stream.
Lots of people had already had the same idea,
Obviously, since we have many widely-used procedures for combining both random sources and PRNG outputs. (It's not clear which you're referring to.) Assuming by "the same idea" you mean what the OP meant - the general, and rather obvious, idea of combining random or pseudorandom sources. In computing, that goes back at least to Von Neumann ("Various techniques used in connection with random digits", 1951).
Of course the specific mechanism described the paper is not entirely novel either; we've known about Ramsey graphs since, er, Ramsey (so sometime in the 1920s). This particular application does appear to be new, and worthy of respect, despite what the Reg brain trust believe.
but the assumption from the intelligentsia was that there was no way to mathematically generate random numbers
This remains true, and irrelevant, except as background, to the discussion at hand. And it's not an "assumption from the intelligentsia"; it's a fundamental consequence of deterministic processes, at least as long as causality remains intact.
so combining two non-random streams was merely going to give you a different, but still non-random, stream.
Still true.
The paper that the article refers to describes a mechanism for extracting a high-entropy bitstream from two sources, each of which have lower entropy. (It compresses the source streams, consuming multiple input bits to produce each output bit, obviously, since you can't create information entropy using a deterministic process.)
All you people had to do was click on the link and glance over the abstract. Even if you don't have the mathematical background to understand the details, phrases like "the best previous extractor" and "the best previous construction" would have clued you in: this is a new, significantly better solution to a problem that has seen considerable research. That's the sense in which it is a "breakthrough".
Absolutely.
I was generating some random values the other day for unit testing and hit upon the problem of random() not really being random and used the very same idea of combining two random number generators together.
I'm no genius and definitely no crypto expert, so if I can hit upon the idea, I'm sure others must have.
used the very same idea of combining two random number generators together
The very same idea? You used the combining mechanism Zuckerman and Chattopadhyay describe in their paper?
Or do you mean "I combined two [PRNGs, or more likely two instances of the same PRNG with different seeds] using some trivial mechanism"?
Because, shockingly, over the past several decades many researchers have examined various ways of combining PRNG output. We've known for a long time, for example, that any linear combination of LCRNGs is breakable.
I'm no genius and definitely no crypto expert
Clearly. Apparently you don't know enough about cryptography to understand the difference between what's being described (albeit poorly) in the article, and any half-assed algorithm for combining two PRNGs.
But then from the other comments we can see you're in good company.
Use the first source to fill a large array (65536 entries is good). Then use the second source to generate a 16-bit index into the array. The entry so found, you export, then replace with the first source's next output.
This is known as algorithm 'M'. It is mentioned in Cryptographic Shuffling of Random and Pseudorandom Sequences, a 2007 paper by Markus Dichtl.
Algorithm M is OK for some purposes, but you're leaking significant chunks of output from source A on each iteration, and 2**16 is not a lot of confusion for cryptographic purposes, particularly if an attacker can recover a sufficiently large stream. And in general it requires the two sources have decent statistical pseudorandomness.
The algorithm M construction is also worryingly similar to that of RC4, which makes me wonder if some of the correlations that break RC4 might also show up in M.
In any case, it's not terribly relevant to the paper cited in the article, unless and until someone does a similar analysis and shows that it has competitive entropy recovery, error (bias), resilience, etc.
Any* camera image, even heavily compressed jpeg will hold oodles of entropy. Even a "dark frame." It's the unsung silver lining of their epic shitness. Just feed it into a hash function if you want to do is distil out that entropy... but make sure it never gets stored/cached or used anywhere else... obviously.
*except a grossly over exposed snap of a white sheet of paper... perhaps...
"Entropy" is a measure of the randomness of an input. Just because you have a 64-bit number, doesn't mean that all the 64 bits are equally random. If the bits have some correlation between them, the effective randomness can be much less.
For example: English text has between 0.6 and 1.3 bits of entropy for each character. This means that on average, if you've seen the text written up to a certain point, you have roughly a 50/50 chance of guessing the next character (for 1 bit of entropy)
The same applies to a 64-bit random number. If each zero and one is not completely independent from the previous one, then having seen some of the bits you are able to predict with a better than 50/50 certainty what the next one will be. This gives you a head start in breaking many cryptographic algorithms.
But can't you just feed "a character of English" into a pseudo random generator for every 0.6 to 1.3 bits you take out of it?
It still seems obvious that you can have a simple "mixer" (i.e. a feedback shift register) where you add the raw bits from your source, and make sure you never pull out more entropy than you put in. So for every 32 bits from your bad random source, you add, for example 7 to a counter if you estimate that it'll have 7 bits of Entropy. The function to get a random number will have a parameter indicating the number of bits it wants, and for this you substract that number from your counter. If your counter would go below 0, you wait until you have enough entropy in your buffer.
Compute two 64-bit numbers seeded by different values, and now you have a random 128-bit number.
No. First, what you have is pseudorandom; and second, linear combinations of LCPRNGs and other common PRNGs are not suitable for cryptographic purposes. That was demonstrated decades ago.
I wonder how they do this
The link to the paper's right there in the article. Section 1.2 in particular describes most of the theory behind it. It's not trivial.
There are a few key ideas, such as Ramsey graphs, which we might say intuitively are graphs that are "sufficiently messy"; and resilient functions. A resilient function is a Boolean function of many inputs (bits), such that even if several of the inputs are fixed (influenced by an attacker), the output is still unpredictable with high probability. The key to their approach is a feasible method of generating a suitable resilient function.
I'd imagine that it is well known that as you add more and more streams of random numbers together the closer the resulting series becomes to being random drawn from a normal distribution.
That's not well known, because it's not true.
First: If you actually have "streams of random numbers", then combining them in some fashion (let's take "add" to mean any mechanism for combining them, and not just the arithmetical operation) could reduce entropy and produce a less statistically random output. That could be because the streams are correlated in some way, or because some of the streams have lower entropy, or because the combining mechanism is lossy.
Second, if in fact you mean pseudorandom streams, then some combining mechanisms can create output that has better statistical randomness; but no deterministic mechanism can create true randomness, and some combiners will, again, produce worse output, not better.
It's analogous to using salt and pepper with your hash.
The trick will be making sure that there is indeed no relation to either value. The same goes with the method of combining the two (hummmmm two random figures added together randomly? The mind boggles).
Also when combining the two how do they prevent lost values? (think having 2 dice, yes you can go upto 12 but you lose 1 in the process as well as having an increase likelihood of certain values occurring, nee 5,6,7 but I digress).
Argh argh argh.
The paper provides the mechanism for combining the two sources. That is what it is about.
It is not claiming that combining two weak entropy sources is a novel idea. In fact, as anyone who ever reads such things would expect, it goes into the extant research at considerable length.
When I was learning to program on a BBC micro and needed random numbers for a manufacturing simulation, I used the gap between certain user keystrokes as the seed for the BASIC random number generator function. This was to prevent the simulation running predictably as it would've done with a baked in seed value.
Then you cheated. Now rewrite the code to work without a human at a keyboard and you know the problem; every typical modern computer lacks a true random number generation system. Various devices exists that spew out randomness, using the variance in the properties of certain electronic components, but those are special use cases. The pseudo-random number generators are what we're left with. And if you've ever tried to spew some randomness out of your code you run into that very problem; the pseudo-random numbers ARE NOT random. Depending on timey things like ticks, or time, or the Epoch seconds just doesn't cut the mustard. Neither does using "weird things" like the number of files, kernel or network stats, or the hash of all the files in the system at a given time; none are unique within the same system.
That's why this is a big deal because it might be possible to craft two chunks of psudo-random and get something much closer to an actual random number. Doing that will extend the abilities of non-quantum systems to remain fairly viable with respect to security and other things that depend on randomness, like AI or games, etc.
"...every typical modern computer lacks a true random number generation system..."
Every typical modern computer is connected to the Internet. Plenty of randomness available, including web services that will return supposedly good random data.
e.g. random.org
Also, if you need only modest quantities, include a 'one-time pad'-like store of random bits as a file.
Also, if you need only modest quantities, include a 'one-time pad'-like store of random bits as a file.
But how do you generate those random bits? It's unlikely to be a modest quantity, either. You're going to need a lot of bits for the lifespan of each installation of your software. That could require many days of a human moving a mouse around at random (which wouldn't be a good method for this purpose anyway, for many reasons).
"Every typical modern computer is connected to the Internet. Plenty of randomness available, including web services that will return supposedly good random data. e.g. random.org"
That makes securing the connection between your source of randomness and your use of it in passwords and key materials a whole lot more complex and difficult than it already was. In the sixties IBM consultants told mainframe managers to generate master passwords using dice in a closed office, as this was the one thing they had full control over and could restrict external observation of.
Nothing on the Internet can really be trusted since it can be pwned or MITM'd, and a highly-resourceful adversary (such as a state) can even game network flow. As for a store of random bits, anything that can be stored can be read and cheated. That's why the Linux kernel never stores its entropy pool in known areas of memory and limits it to (last I checked) 4096 bits.
it might be possible to craft two chunks of psudo-random and get something much closer to an actual random number
No, that is not possible, assuming you're talking about a deterministic procedure and causality is still in force.
What is possible is combining two low-quality sources into a less-biased, less-predictable one. The goal is to do so with poor (low min-entropy) sources, in a practical fashion (i.e. while preserving as much entropy as possible, minimizing bias, using reasonable resources, etc). This paper represents a significant advance in that area.
@Nifty,
The BBC BASIC RAND() function was not random, but deterministic. If you seed it with the same number, it will return the same sequence of numbers. It is only once you seed it with a single random number that it becomes a pseudo-random generator. Normally, that seed is taken from user-generated timing data, but it could come from another source. But anyway, in the seeded pseudorandom generator, you are not combining two "poor" random sources -- random_number+algorithm = pseudorandom_source
So what's wrong with my method?
Pick a OS and programming language. Use it's random number generator. By seeding it with a time value you get a traceable sequence of numbers that appear random if you don't have that seed value.
I noticed that just a small variation of the OS or the language causes the random number sequence to be vastly different. Works for me.
It works in a computer game dice roll scenario but not a security scenario. Your possible seed values is minutely small because I have a high probability of guessing your clock time to "within seconds". The default system timer on windows has a resolution approximating 10ms (actually closer to 16ms but 10 makes my math easier). That leaves only 100 possible seed values per second. That is easily brute forced.
Now, take the number of files on the computer and use that number as a seed.
Then XOR the "random" number from your method with the "random" number from the files, and you have a decent random number. Even better: you can use the XOR result as a seed, and that should be an acceptably random result.
> Use it's random number generator. By seeding it with a time value you get a traceable sequence of numbers that appear random if you don't have that seed value
That's the problem: they only *appear* random. The only true "randomness" is in the time you started the program, and anyone who can guess that will be able to generate exactly the same sequence.
Let's say I see an encrypted E-mail coming out of your computer. It's fairly reasonable to guess that you started the program sometime in the previous 24 hours. That gives only 86400 seeds to try (if your clock has one-second resolution)
Even if your clock has milliseconds that's only 2^26 settings to try; and in practice not all of those are equally likely (e.g. it's much more likely you started the program within the 10 minutes before you sent the mail).
56 bits of entropy would give 2^56 bit combinations to try; that's still feasible by brute force.
What it boils down to is: if you're generating a random 128 bit key, then you need 128 bits of entropy to make all keys equally likely. Generating that much entropy is harder than it seems. Unless your machine has a hardware random number generator, then the OS does its best by measuring things like the times between keystrokes and the time for the hard drive to respond to I/O requests.
This is a serious problem for VMs, because you could start 1000 VMs from the same image in an identical state; if they are generating (say) ssh keys, then without an external source of randomness they will generate predictable keys.
The sad fact is that this result is not new at all. Back in the mid 80's I was R&D manager for a manufacturer of poker machines and one of the first pieces of advice we were given by the University Math/Stats consultant we retained was to use two random number generators and combine the outputs to create a single random number of much higher quality.
Not sure either how you get a paper out of this in 2016 or why it's being reported now.
But did you read the paper? I am guessing that about half the people on this thread didn't, and of those who opened the paper, 99.999% said 'stuff this - it's all number theory'.
And it is, very advanced number theory.
But I agree with you, There is definite prior art on this.
But I agree with you, There is definite prior art on this.
Care to cite it? Who has previously published a feasible approach for generating a two-source extractor with good properties, using sources with min-entropy lower than 0.499n (or the other previously-best approaches, such as Raz's where one source only needs min-entropy ~ log n, but the other needs better than n/2)? I'm sure we'd be glad to hear about this lost treasure.
It depends on what the paper actually covers, which I have not read. Prior may have been based on noticing that the results are more random but was it based on sound math? The paper may be giving a sound math basis for using and how to use 2 random number sources to make a better random number. This could be the difference between solid engineering (it works but we are completely sure why) and solid math explaining how it works.
Prior may have been based on noticing that the results are more random but was it based on sound math?
Not "more random". That's completely wrong. The goal of an extractor is to provide an output stream that meets various statistical criteria, preserves as much entropy of its inputs as possible, is feasible to implement and use, and is amenable to analysis proving these attributes. What an extractor cannot do is produce an output stream that is "more random" than its inputs. You can't create information entropy using a deterministic process; you can only move it around.
There is a vast amount of research in this area - the paper has four pages of citations, nearly all to relatively recent work. The approach described in the paper is significantly better than the best previously published results.
"...use two random number generators and combine the outputs to create a single random number of much higher quality."
One woūld have to careful about how they're combined. (Yeah, I know it's obvious...)
Bit-wise OR would be a very poor choice (75% '1's).
Bit-wise XOR might work.
Shuffling might be better.
In the past linear congruential RNGs (shudder) were often combined (it's even Numerical Recipes) by taking the "better" higher order bytes of two different ones and concatenating them, because the lower order bits were not very random at all (the least significant goes 0,1,0,1,0,1,0,1,0,1,0,1,0,1,.... if you have got it right, otherwise it goes 0,0,0,0,0,... or 1,1,1,1,1,...). This shows that the basic idea is certainly not new, and for certain (toy) problems it may well suffice. What apparently is new is a proof of method to combine two reliably (still have to read the paper properly). Proof is a thing that carries rather more weight in science than just a nice (and plausible, or even obvious) idea.
> Kaltern
If you are going to use an external sensor then it is much easier to use a hardware random generator. Most electronic circuits generate noise especially shot noise, this can be enhanced and it will generate a stream of random 1 and 0s.
Just google hardware random number generator for circuits.
The reason we often do not want truly random numbers is that we want to be able to generate the same sequence again with the same seed (or set of seeds). If we have truly random bits, and want to use that for encryption, we must pass the whole sequence to the receiver through a secure channel (i.e. we are using a one-time pad). Very secure if we can pass the pad safely to the receiver.
"Never really understood why a external sensor for say, air temp or barometric pressure, accurate to say, 10 significant figures couldn't be used. That would be constantly changing, and would be pretty much impossible to recreate accurately - and would be totally autonomous."
Don't be so sure. You have the consider the resourcefulness of high-level adversaries such as states that could well have methods to control the atmosphere for their purposes. Suppose they confiscate a machine and put it in a very-climate-controlled room?
"Cryptographic algorithms attempt to keep information secret by transforming it into a stream of random numbers, making it unintelligible"
I always thought that if you have a stream of truly random numbers, then all you have is a stream of random numbers. Not only is it unintelligible, but nothing you ever do to it will turn it back into information.
You forget the Million Monkeys principle. ANY random bitstream can be altered (say an XOR) with just the right bitstream to return intelligible information. That's one reason why the One-Time Pad is considered the only cryptographically strong system: because it relies on the principle that the end result depends completely on the key: another key can result in another cleartext of equal probability.
If they had reviewed the existing literature better, they would have learned that they only need a strong Brownian Motion producer (say a nice hot cup of tea).
Cryptographic algorithms attempt to keep information secret by transforming it into a stream of random numbers, making it unintelligible. These numbers, however, are not truly random since the algorithm uses a predetermined set of mathematical formulae to generate these numbers, so the information can be breached.
Wow. There is almost nothing correct in that entire paragraph.
While this paper mentions cryptographic applications of multiple-source extractors, those applications do not include "transforming [plaintext] into a stream of random numbers", except in the most tenuous sense - and that's if we gloss "random" as "pseudorandom". And "a predetermined set of mathematical formulae" is just an awkward definition for an algorithm, so that whole clause is tautological. And "the information can be breached", insofar as it means anything at all, is obvious; encryption is defined by the possibility of decryption.