How it works? Flush and Reload

All the illustrations used in this post were made with Excalidraw. If you want high quality, hand-drawn-looking illustrations in your own project, consider using it.

Welcome to the first edition of How it works? In this blog post we will look at one of the more popular side-channel attacks, called Flush and Reload. Flush and Reload exploits a shared cache and a timer to deduce a secret within a victim program.

Note: The illustration of Flush and Reload in this article is a naive one, meant to help beginners understand how the exploit works at its core. A real exploit takes more analysis and patience to carry out fruitfully.
If you prefer a video over an article about Flush and Reload, I suggest the video by Chester Rebeiro, IIT Madras - Flush and Reload Attack - created as part of an NPTEL online course.

Before we jump into how it works, we need to understand some terminologies:

Cache: A cache is a small chunk of memory close to the CPU that is used to fetch data with very low latency. Frequently used data (or data that a program may require soon) is placed in the cache so that it can be accessed as fast as possible. Caches are further hierarchically classified into L1, L2 and L3. The lower the level of the cache, the closer it is to the processing unit and the faster its data can be accessed. Most cache hierarchies are designed to be inclusive, i.e., a copy of any data present in the L1 cache also exists in the L2 cache and in the L3 cache.

Cache Miss: When data is requested and is not present in the L3 cache, we encounter a cache miss. This means the data needs to be brought into the cache from main memory before it is supplied to the processor.
Point to note - accessing cached data is noticeably faster than accessing data that is not cached and needs to be brought in from main memory.

Shared Cache: The L3 cache is typically shared by all the cores of a processor (the L1 and L2 caches are shared only by the two hardware threads of a single physical core). Data used by programs running on different cores goes through the same L3 cache.

Cache Line Mapping: The cache is a small area of memory, but it needs to bring in data from a much larger main memory, so there must be a policy for where incoming data is placed. There are several options - place the data anywhere, or map each address to exactly one line - and each method has its own pros and cons. If data can be placed anywhere, we need to search the entire cache to see if a particular value exists, and this is bad from a performance standpoint.
Consider a direct-mapped cache with 8 cache lines and 64 addresses that can be accessed. A given address d will be placed in the cache line numbered (d % 8).
Now suppose we access data at address 9 - it will be placed in cache line 1. A request for address 17 (17 % 8 = 1) will then evict the data in line 1 and replace it with the data at address 17.
If one needs to access address 20, one has to look only in cache line (20 % 8) = 4 to see whether the data from address 20 is present or not.
Point to note: With the above technique, if we repeatedly access addresses 9 and 17, we always get a cache miss, which is very bad for performance - this repeated missing is called thrashing. In reality, caches are set-associative: n addresses map to a group of m cache lines, any of which can hold the data. This is a good middle ground for performance - it avoids most thrashing, and a data access only needs to search a small subset of the cache rather than the whole thing. Such a subset is called a cache set.

Programmatic Data Eviction: There exists an instruction, CLFLUSH, that can be used to invalidate the data in a particular cache line. Once invalidated, the data has to be brought in from main memory again on the next access. CLFLUSH is not a privileged instruction and hence can be used by any program to invalidate cache lines.
You may notice here that if two programs share the same cache, one program can invalidate a cache line containing data used by the other program, thus affecting the other program's performance.

The actual Flush and Reload attack needs to find a secret within the victim program, not just slow the victim program down. To understand this completely, we need to look at some optimizations the operating system makes around memory allocation and memory management:

Copy on Write: Let us say two programs are using the C math library. If each program needed its own copy of this library, a significant chunk of memory would be wasted on duplication. To avoid this, the Linux memory manager uses a technique called Copy on Write - instead of each program holding a copy of the library, the page tables of all the programs map to the same single copy in physical memory. As long as we are only reading the memory and not modifying it, all the programs share that common copy. Suppose one program makes a change to this memory: the affected pages are copied to a separate location, modified there, and that program's mapping is updated to point to the new copy.
This is important because we can now deterministically tell where a given library (or library function) will be cached - every program using it shares the same physical memory, and hence the same cache lines.

Let's look at the exploit

A point to note: This doesn't leak everything from the victim's memory. It needs the victim program to run in a specific pattern that indirectly leaks the secret at the microarchitectural level.

Consider the program:

while (secret_copy > 0) {
        if (secret_copy % 2) {
                some_shared_library_function();
        }

        secret_copy = secret_copy >> 1;
}


The above program calls a particular shared library function based on the bits of a given secret: on each iteration, the function is called if and only if the current lowest bit is 1.

Now, as an attacker running alongside the victim, I flush the cache line containing some_shared_library_function(). The location of this function can be found beforehand: evict cache lines one at a time, call some_shared_library_function ourselves, and observe for which eviction the call takes longer to run. This tells us the cache set for this particular shared function.

We also need to know when the victim is running this code - most Flush and Reload attacks target victim code that runs very frequently. We flush the cache line containing some_shared_library_function and wait. Then we run some_shared_library_function ourselves and measure precisely how long it took.


A precise time can be measured using a high-resolution timer to detect how fast or slow the function call was processed. A fast call means the function was already in the cache - the victim must have called it since our last flush, so the corresponding secret bit was 1. A slow call means the victim did not call it, so the bit was 0.

After this, we flush the cache line again, wait, and time the function execution once more. As we continue this loop, we can see whether the specific bits are 1s or 0s and get a sample of the secret. Reading the entire secret is hard because the following scenarios can take place:

  1. The victim program can call the function multiple times between a single flush and reload, but we can only say for certain that it has been called at least once.
  2. If the reload coincides with the victim's function call, a bit may be misread.

The image below portrays the most likely scenario:



But these scenarios are also possible:

i) Multiple reads between a single flush and reload



ii) Attacker reloads just as the victim tries to load the shared library



Flush and Reload is an exploit that cannot be run blind - it depends on knowing the victim's code and on sharing libraries with the victim. Even so, it was a common exploit for extracting cryptographic keys, since the attacker had access to the public implementation of the algorithm in use, and that implementation was shared in production.

If you would like to know more about the consequences of Flush and Reload, I recommend watching the Cache Side Channel Attack: Exploitability and Countermeasures talk from Black Hat Asia 2017.

Rest assured, there have been mitigations to prevent it on shared systems like those in the cloud, but the fundamental design of inclusive cache hierarchies and the unprivileged nature of CLFLUSH allow any program to exploit this weakness of modern-day hardware.

Thank you for reading the first article of How it Works? I will be back with more content on microprocessors and computer architecture in the future.

References:

  1. CPU Security: Basics I by Mr. Sanket Kadam, a CPU Verification Engineer at Nvidia
  2. Wikipedia: Side Channel Attacks article
  3. Flush and Reload Attack - NPTEL open course by Chester Rebeiro, IIT Madras
  4. Information Security - 5 - Secure Systems Engineering playlist by NPTEL-NOC IITM
  5. Cache Side Channel Attack: Exploitability and Countermeasures talk from Black Hat Asia 2017
