Re: timing attacks - can these be protected against by simply inserting into a p...

tptacek · on Oct 19, 2014

Apart from the fact that adding noise just increases the measurement cost, the "add random sleeps" solution has two other problems, one small and one big.

The big one: it implies that you actually know what the side channel is, so you can sleep at a useful point. The virtue of relying solely on constant-time operations is that they're constant time by design. You don't have to worry about how the variable-time operations are implemented, because there aren't any.

The smaller problem is that you have to make sure that your noise function is not itself a target of attack, which can be a little tricky because your noise function must be extremely, extremely cheap to compute.

There's also the obvious "non-problem" that reducing a function to it's lowest common denominator of performance defeats the purpose of high-speed cryptography and variably-timed implementations in the first place.

kragen · on Oct 21, 2014

Random sleeps, as you say, have the problem that they don't eliminate the timing side channel, but just add noise to it. However, as robryk points out below, you can eliminate the timing side channel entirely, assuming you know what it is, by sleeping until a predetermined point. In pdkl95's example, you could sleep until 1.1 seconds after the cryptographic operation was initiated, then sending your response. The predetermined point could be determined randomly as long as the random number generator is constrained not to return a point that's so early that the crypto might still be running. But randomness does not help in this case.

tptacek · on Oct 21, 2014

You addressed the "apart from the fact" point in my comment, but not the meat of it. I point that out because someone else already brought up the measurement cost issue; my point was that there are other concerns.

Also, my comment directly addressed the notion of a lowest- common- denominator sleep.

I think maybe you meant to respond to a different comment?

kragen · on Oct 21, 2014

No, my response is attached to the correct comment of yours. I directly responded to the two "problems" you brought up, but I will now attempt to do so in more detail, in case the problem is that my comment was not clearly expressed.

You say that adding sleeps only helps if you know what the side channel is. That's true; for example, adding sleeps won't help at all against power analysis, acoustic analysis, or cache timing attacks. But that's also true of other side-channel countermeasures. For example, using constant-time operations (as Bernstein suggests and you endorse) probably won't help against attacks using those other side channels either. Most side-channel countermeasures are only effective against the side channels they're intended to protect, or not even those.

But pdkl95 was specifically talking about over-the-network timing side-channel attacks on a protocol. Adding sleeps will defend against that, even if you don't know which part of your code has variable timing. For example, Futoransky–Saura–Waissbain 2007 published "ND2DB", a feasible over-the-network timing side-channel attack on database indices. Sleeping so that the time in between network transactions does not depend on any of the data being transmitted will defend against ND2DB as well as any timing attack against your cryptosystems, while reimplementing your cryptosystems to use only constant-time operations will not defend against Futoransky's attack.

TheLoneWolfling pointed out correctly that you can only do this imperfectly in the presence of concurrency and caches, because the network response may take longer to send if, for example, your data has been swapped to disk in the interim. That is true, but I think that you can still reduce the information leakage by two to five orders of magnitude with this approach. For example, if you choose 1100ms as the time interval between receiving the request and sending the response, and 0.1% of the time your response is delayed until 1108ms because of paging in a single disk page, you are leaking about 0.01 bits to the attacker per transaction. If your timing varied over about a 50ms range and the attacker can measure latency to within 50μs, you would have been leaking about 10 bits to the attacker per transaction.

Second, you say that you have to make sure that your noise function itself is not a target of attack, which I take to mean that you must ensure that it doesn't leak information either; for example, the predictable cycling of the low-order bits of old "rand()" implementations could tell an attacker whether the process they connected to had handled transactions from other clients in between transactions to the attacker.

While this is true, I agree with you that it is not a big problem. Here's why. As I said, randomness does not help in this case, so you can simply pick a constant time interval, or one that only depends on data the attacker can already observe, such as the response size; or you can use any CSPRNG to generate a time to add to the constant time interval.

I didn't say anything about this before, but I also agree that it's (usually) a non-problem to make a transaction's average-case time the same as its worst-case time, but I don't agree that doing so "defeats the purpose of high-speed cryptography", because high-speed cryptography generally reduces the worst-case time to an acceptable time.

Is that better? Please correct me if I've said something incorrect above.

tptacek · on Oct 22, 2014

That's much clearer to me. I think we probably agree more than we disagree.

My point, as you've acknowledged, is simple. I think, for a bunch of reasons which are more and less easy to mitigate depending, it's a good rule of thumb to just avoid key-dependent side channels altogether. If you know what you're doing, you can shrink the risks. I'm dubious about whether they can be eliminated.

But even if one is the kind of person who can argue eloquently and accurately about those risks on a mailing list (nb: not alluding to you here), one is unlikely in my experience to be the kind of person who can actually implement countermeasures to these problems reliably. I've reviewed lots of expertly developed cryptosystems and found exploitable problems. My experience suggests that even good implementors do not tend to design systems from the vantage point of attackers, which squares with my experience of every other kind of software security as well.

A caveat to all of this is that it's easy for me to opine about what I think good conservative rules of thumb are because my technical depth doesn't extend to safely implementing variably-timed crypto operations. A subject matter expert might be able to refute a lot of what I think.

Because this is a fluffy comment, a quick shotgun blast of technical responses, none of which beg for rebuttals (but feel free):

* I can think of cases where randomized delays on one code path failed to eliminate timing leaks on other code paths that turned out to be relevant to attackers.

* I'm skeptical about exploitation of microscopic timing leaks; I don't even worry about memcmp timing, really. Other people are less skeptical, and there's a line of research pursuing statistical filtering and heuristics to bring those timing channels into reach. Also, there are some secrets that are attackable essentially indefinitely.

* My thoughts about worst-case timing are probably poisoned by client reactions to the idea of slowing down crypto operations at all.

kragen · on Oct 23, 2014

I agree, except that I'm not skeptical about exploitation of microscopic timing leaks.

pdkl95 · on Oct 20, 2014

Ok, thanks! That was as very clear explanation!

MichaelGG · on Oct 19, 2014

Adding random sleeps does not prevent timing attacks. It just adds noise making them a bit more work to pull off.

e12e · on Oct 19, 2014

I suppose one (not very good) approach to "patching-in" constant run-time, would be to limit the runtime of every iteration to be that of the upper bound (possibly +some constant). I suspect that's pretty hard to actually do, and even if accomplished, would open up to other side-channel/timming attacks (eg: check cpu load, and infer timing).

robryk · on Oct 19, 2014

But, if you have an upper bound on the time of the operation, you can wait until the bound elapses, so that your operation always takes that long.

TheLoneWolfling · on Oct 19, 2014

The problem is: good luck getting an upper bound. What happens if the process is paged to disk, for instance?

No matter what upper bound you set you'll still end up with it going over at some point - and as soon as you go over the upper limit you're leaking timing information again.

This also ups RAM requirements, drastically in some cases.

And it isn't even necessarily secure against timing attacks! Because the thread scheduling by the OS changes depending on how much work is being done, and so it could take less or more time to retrieve the saved response depending on how long another request takes to process.