I found the slides and notes that I had prepared for a workshop I gave last year and decided to turn them into a post. Here we will explore how to sandbox a process to protect it from memory-corruption vulnerabilities and malicious code injection. To be honest, I’m not really fluent in Windows development, so I worked from the official documentation and then simply tested the code. I can’t promise every best practice is followed there, but the code was tested and it works. We will look at the techniques in 3 different worlds: Linux, Windows, and FreeBSD. The C23 standard is used in the examples.

Attention!

Don’t forget to check return values: wrapping the sandboxing calls in [[nodiscard]] helper functions (a C23 attribute) makes it much harder to silently ignore their failures.

Windows

As I said before, I’m not really fluent in Windows development since I’m mostly a Linux (and sometimes FreeBSD) developer. In Windows, we have Windows Process Mitigation Policies, controlled by the SetProcessMitigationPolicy API. This interface applies constraints against memory-corruption exploitation. These mitigation policies must be applied immediately after the entry point of the program. The API is really heavy, and I spent a couple of days trying to understand it, but we will not dive too deep here (otherwise it would be a book instead of a post), and will just look at specific mechanics and check a simple code example.

What do we have then?

  • Data Execution Prevention (DEP) - marks all non-code memory as non-executable.
  • Address Space Layout Randomization (ASLR) - randomizes the layout of the address space, optionally with bottom-up and high-entropy randomization.
  • Control Flow Guard (CFG) - validates the targets of indirect calls at runtime.

These are 3 basic policies (more are available, but as I said, it’s too long a story) that we can apply to our process. In code, it will look something like this:

We use zero-initialization of the structures to clear all reserved/undocumented flags within them. To use all of this in production, of course, you should do a deep dive into MS documentation and read detailed information about this and other features in the SetProcessMitigationPolicy API. It has a lot of features, so maybe one day I will make a more detailed post for those who don’t want to read too much in the official docs.

Linux

In Linux, we have ptrace(). It lets one process control another process’s execution, inspect its registers, and read or modify its virtual memory. Cool for debugging (really, it’s hard to overrate this), bad for security. Sounds bad, but we have prctl() as well, which allows our program to declare itself undumpable, excluding it from core dumps and making it much harder to tamper with. To make our program undumpable, we clear the dumpable flag with prctl(PR_SET_DUMPABLE, 0) (the full example is shown below). Once the flag is cleared, ptrace attachment by unprivileged processes is denied.

Next, we will use seccomp (Secure Computing) to restrict which syscalls our process can invoke. Strict mode — enabled with prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) or the newer seccomp(SECCOMP_SET_MODE_STRICT, ...) syscall — tells the kernel to allow only read(), write(), _exit(), and sigreturn(), and to kill the process with SIGKILL if any other syscall is used.

The Linux capability system, usually managed through the libcap library, splits root’s privileges into discrete capabilities (CAP_NET_RAW, CAP_SYS_ADMIN, and so on) held in per-thread permitted, effective, and inheritable sets. Each capability can be granted or revoked independently. I’m sure I’ll make a dedicated post about it because it’s genuinely complex and a bit tricky (same as Capsicum in FreeBSD). For now, let’s just see the example.

One more thing is the Yama LSM. It’s a system-wide security module that the administrator configures through the kernel.yama.ptrace_scope sysctl to restrict who may ptrace whom. So, basically, it’s all on the end user: they pick the Yama level of their choice (0–3, from the classic rules up to “no attaching at all”). In critical systems, it really does make sense. Applications can’t reconfigure it, since it’s, again, system-wide; the one exception is that a process may whitelist a specific tracer for itself with prctl(PR_SET_PTRACER).
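For completeness, this is roughly what the administrator-side configuration looks like (a sketch; the sysctl name is real, the chosen level and file name are just examples):

```shell
# Inspect the current Yama level (0 = classic ptrace rules).
sysctl kernel.yama.ptrace_scope

# 1 = only ancestors (or PR_SET_PTRACER-whitelisted processes) may attach,
# 2 = attaching requires CAP_SYS_PTRACE, 3 = no attaching at all until reboot.
sudo sysctl -w kernel.yama.ptrace_scope=2

# Persist the setting across reboots.
echo "kernel.yama.ptrace_scope = 2" | sudo tee /etc/sysctl.d/10-ptrace.conf
```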

FreeBSD

That’s my favorite, for sure! FreeBSD has a really cool sandboxing framework at the kernel level named Capsicum.

How does it work? When a process executes the cap_enter() syscall, the kernel transitions the execution context into capability mode. In this state, the process is prohibited from using any syscalls that require global namespace lookups: open() by path, connect() to a named address, kill() by PID, and so on. Already-open file descriptors keep working — that’s exactly the point — but it means everything the process will ever need must be opened before cap_enter() is called, because there is no way to open files by path afterwards. And sockets… And everything else… It can be a bit of a problem during cross-platform development, and it can add significant refactoring overhead. Don’t forget, EVERY file the process uses must be pre-opened, including dependency files. It can be tricky sometimes…

Another problem with Capsicum: it will not work at all if the system runs a kernel built without options CAPABILITY_MODE in the kernel configuration. In this case, cap_enter() simply returns -1 with errno set to ENOSYS.

To limit what can be done with each file descriptor, they should be restricted with cap_rights_limit(), which trims the set of operations (read, write, seek, mmap, …) the descriptor permits. The cap_rights_limit(2) manual page describes the full rights vocabulary; below is an additional example.

Summary

Of course, it’s just impossible to cover such a big topic in one blog post, but I provided some basic information (and even a couple of links) to help you understand where to dig. Additionally, I have a saved draft for a dedicated blog post about Capsicum, and maybe it will be finished one day. :) I really hope this post will help developers who want to work with OS-level security features but had no idea where to begin. As I said at the beginning, the post was inspired by notes that I used at a workshop for students, so it’s kind of “entry-level”.