Twelve Days 2013: Side Channel and Timing Attacks
Day Nine: Side Channel and Timing Attacks
TL/DR
Layer 7 attacks are becoming increasingly sophisticated and prevalent, and side channel attacks can break mathematically sound cryptography. They’re interesting in their own right but the same principles are highly relevant to real-time computing, which is why I’m writing about this. Anything that results in measurable latency jitter or nondeterministic behavior can be a problem from both a performance and security standpoint.
Disclaimer
Cryptography and security is just an interest—it’s not my day job. I’m presenting this in the context of low latency/real-time computing, and one of the best ways to illustrate potential problems in that space is to choose canonically bad data as an input. That happens to be the same thing that you would do for malicious purposes. The code that I’ve provided to illustrate these effects is inefficient by design. If you’re interested in this stuff, read about it from the guys who know what they’re doing, like Dan Boneh, Paul Kocher and Bruce Schneier to name a few.
Information Leakage and Low Hanging Fruit
Our world banks (literally) on cryptography being mathematically sound. If it’s not and someone knows that we have a big problem, so let’s just assume that it is. Given that, why are there still media worthy security breaches almost daily? Attacking the problem head on as it’s expressed in equation form might be very hard, and hopefully impossible given our modern computing capabilities. However, at the end of the day everything is implemented in software and code executes on hardware. Crypto is often used incorrectly, code has bugs, and hardware can leak information in surprising ways. There are lots of ways to cause big problems short of breaking the underlying algorithms.
Sources of Variance in Time
For modern desktop/server CPUs there is, by design, a huge disconnect between the code that’s written (even in assembly) and the computations carried out. When you layer an operating system on top of that, there’s an even larger disconnect between what you’ve written in userland and what happens on the bare metal. At the lowest level CPUs exploits instruction level parallelism to do as much work as possible per clock cycle. Processors such as the x86 have multiple execution units to accelerate arithmetic and add additional parallelism. As main memory access is slow relative to the CPU, speculative execution and branch prediction are used to “guess” what data will come back from a load instruction so that computation can continue based on this guess and rewind when the guess is wrong. Furthermore, a CPU’s cache tries to keep recently used (or likely to be used) data closer to the processor for fast access. When the CPU goes to read a value that’s not there it has to access main memory, which can be an order of magnitude slower. Cache misses, branch mispredictions, TLB page faults, and instruction level parallelism all betray the secrets of the computation going on behind the scenes.
Given the above, at times it’s possible to work out the instructions/data that were used as inputs via statistical analysis of something that’s hard to hide: time elapsed. Things get even worse if the attacker has physical access to the box. An analysis of the power used by the CPU, the sounds that the computer makes, or residual data left in RAM even after the computer powers down can be used to access data or recover a private encryption key. These techniques can even break virtual machine hypervisor isolation. On the application layer things are much simpler and the attacker has even more to work with. Studying variations in how long it takes to get a response to some request is enough to bleed information remotely. If the attacker wants to interfere with normal operations (denial of service), or statistically derive sensitive information, remote timings can help her/him do so.
Hash Collisions of Death
When cryptographers discuss hash collisions it’s usually in the context of cryptographic hash functions. For a cryptographically secure hash function, having two distinct inputs hash to the same value by chance is really, really bad. Figuring out how to compute the original value given the hash is worse. Being able to find two distinct inputs that result in a collision is the granddaddy of them all (the inverse and the collision are called the first and second preimage, respectively). Why is this bad? Well, among other things, websites (should always) store salted, hashed passwords in lieu of the original ones. That way, if their security is compromised the attacker still has to figure out what the user’s original password was before it does them any good. For strong passwords and a good hash function, doing so is computationally infeasible.
Cryptographic hashs can take thousands of times longer to compute than a simple one so they’re only used when necessary. However, non-cryptographic hash collisions can still pose a big problem. Hash tables use (as their name suggests) a hash function to quickly store or look-up a value given its key. Multiple keys may have the same hash, resulting in a collision. Hash tables usually resolve collision via probing or chaining. Probing hash tables work by walking over to the next available slot after a collision occurs, and chaining hash maps store a list-like structure at collision points. The average, look-up behavior of a hash table relies on keys having uniformly distributed hash values so that all available slots are filled with the same probability. When this breaks down you’re left with a structure that performs like a linked list— insertions, deletions, and look-ups.
This is where the connection to real-time computing comes in. Two things really ruin your day in this space: data structures performing at their worst-case time complexity, and memory allocation at an inopportune time. Hash collisions can result in both. Most high level OO languages try to hash built-in objects in a sensible fashion that minimizes collisions and in some cases provides security by design. However, it’s often possible to engineer collisions either intentionally or unintentionally.
A Case Study: Java Strings
As of writing (and for as long as I can remember) the Oracle JDK hashes a string S with the following function:
That’s a speedy and perfectly reasonable hash function. Unfortunately, we can use a very inefficient brute force method to find collisions on short strings, and much better methods exist. Given some admissible set of characters the computation looks something like:
private static final int[] permissibleCharacters;
static {
permissibleCharacters = new int[62];
int i = 0;
for(int v = '0'; v <= 'z'; ++v) {
if(isPermissibleCharacter(v)) {
permissibleCharacters[i++] = v;
}
}
}
private static boolean isPermissibleCharacter(int x) {
return (x >= '0' && x <= '9') ||
(x >= 'A' && x <= 'Z') ||
(x >= 'a' && x <= 'z');
}
private static List<String> generateCollisionsForString(String target,
int[] validCharacters) {
int n = validCharacters.length;
final List<String> outputList = new ArrayList<>();
final int targetHash = target.hashCode();
for(int k = 0; k <= target.length(); ++k) {
generateCollisionsForString(targetHash, validCharacters, "", n, k,
outputList);
}
return outputList;
}
private static void generateCollisionsForString(int targetHash,
int[] set,
String prefix,
int n,
int k,
List<String> outputList) {
if (k == 0) {
if(prefix.hashCode() == targetHash) {
outputList.add(prefix);
}
return;
}
for (int i = 0; i < n; ++i) {
generateCollisionsForString(targetHash, set, prefix + (char) set[i],
n, k - 1, outputList);
}
}
Restricting myself to ASCII [a-Z 0-9]
gives 35 collisions for the word “world”—which hashes to 113318802
. A few
samples are: “wormE, wosMd, and xPsNE”. What could we do with these colliding strings? If some remote service relied on
a hash table-like structure you might be able to cause problems by a) overloading the hash table with colliding values
resulting in performance, or b) glean information on other values in the collection, which is easier to do if
you have known, colliding values. It’s a big enough problem that starting with JDK 7.04 Oracle included a secondary
string hash for the hash based collections: sun.misc.Hashing.stringHash32()
. Note that generally speaking, in the
absence of secondary hashing, it’s actually easier to cause a collision in a hash table than a pure hash collision as we
only need to find a collision modulo the table width. Java collections apply both a secondary hash, and a randomized
hash seed to make this sort of reverse engineering harder to do. However, if you “roll your own” collection without
these features, or use something other than a string as a key, you’re still vulnerable.
So, how pronounced is the effect? Let’s write a little code to find out. You’ll need to run it on Java 6 given what I mentioned above.
private static void benchmarkLoadedHashMap(int randomKeys,
int stringLength,
List<String> collisions,
int samples) {
// Generate a hash map with 50% load
final Map<String, String> hashMap = new HashMap<String, String>(
2 * randomKeys);
final String[] randKeyArr = new String[randomKeys];
final String[] collisionKeys = new String[randomKeys];
final Random rand = new Random();
// Load the map with random keys
for(int i = 0; i < randomKeys; ++i) {
final String randomString = (new BigInteger(stringLength * 8,
rand)).toString();
randKeyArr[i] = randomString;
hashMap.put(randomString, randomString);
}
// Load the map with our colliding keys
for(String s : collisions) {
hashMap.put(s, s);
}
for(int i = 0; i < randomKeys; ++i) {
collisionKeys[i] = collisions.get(i % collisions.size());
}
// Warm up before benchmarking
for(int i = 0; i < 5; ++i) {
timeAccess(randKeyArr, hashMap, samples, rand);
timeAccess(collisionKeys, hashMap, samples, rand);
}
for(int i = 0; i < 5; ++i) {
System.out.printf("Iteration %d\n", i + 1);
System.out.printf("Random keys: %.4fs\n",
timeAccess(randKeyArr, hashMap, samples, rand));
System.out.printf("Collision keys: %.4fs\n",
timeAccess(collisionKeys, hashMap, samples, rand));
}
}
private static double timeAccess(String[] keys, Map<String, String> map,
int samples, Random random) {
final long startTime = System.nanoTime();
for(int i = 0; i < samples; ++i) {
poisonCache(keys, map, random);
// Pick a key at random to minimize cache effects in the benchmark
final String key = keys[random.nextInt(keys.length)];
map.get(key);
}
final long endTime = System.nanoTime();
return (double) (endTime - startTime) / Math.pow(10d, 9);
}
// Preform some basic cache busing between iterations to make sure that our
// benchmark isn't just showing cache effects (not that those aren't useful
// in their own right)
private static void poisonCache(String[] keys, Map<String, String> map,
Random random) {
for(int i = 0; i < 10; ++i) {
map.get(keys[random.nextInt(keys.length - 1)]);
}
}
public static void main(String[] args) {
final List<String> collisions = generateCollisionsForString("world",
permissibleCharacters);
benchmarkLoadedHashMap(10000, 3, collisions, 100000);
}
Running this code on Oracle JDK 1.6.37 gives me:
Iteration 1 Random keys: 0.0526s Collision keys: 0.1258s Iteration 2 Random keys: 0.0518s Collision keys: 0.1238s Iteration 3 Random keys: 0.0519s Collision keys: 0.1232s Iteration 4 Random keys: 0.0508s Collision keys: 0.1235s Iteration 5 Random keys: 0.0472s Collision keys: 0.1230s.
Ouch. There’s a very stable and obvious 2.5x timing differential between the colliding and random keys. Longer strings admit more collisions so it only gets worse from here.
Closing Thoughts
This is a contrived example but illustrates a real problem to the extent that Oracle addressed it in Java 7. Making strings collide is relatively hard—it’s much easier to create these effects (accidentally or deliberately) using a less general type of key. Hash map performance degrades significantly under load factor and collisions, so both are big problems for low latency computing.