Capabilities in Linux
What Are Linux Capabilities?
Linux capabilities are a way to manage process privileges more securely. Instead of processes being either fully privileged (root) or not, capabilities break down root powers into smaller, specific permissions.
For example, a process might need to bind to a privileged port (like port 80) but shouldn't have full system control. Capabilities let you grant just that permission, reducing security risks.
Why Were Capabilities Needed?
Linux capabilities are a feature introduced in kernel version 2.2 to address the limitations of the traditional Unix security model, where processes are either privileged (running as root with UID 0) or unprivileged (running as a non-root user). This binary approach posed significant security risks, as processes with root privileges could perform any action, potentially allowing attackers to exploit vulnerabilities for privilege escalation.
Capabilities mitigate this by dividing superuser privileges into distinct, granular units, each corresponding to a specific privileged operation. This adheres to the principle of least privilege, allowing processes to perform necessary tasks without requiring full root access.
Capabilities are particularly crucial in modern Linux environments, such as containers, where minimizing privileges is essential for security. By granting specific permissions, such as binding to privileged ports or changing file ownership, capabilities reduce the attack surface and prevent unnecessary privilege escalations.
How Capabilities Work
Capabilities are per-thread attributes: each thread in a process carries its own capability sets, which allows even more granular control. A thread has five capability sets that together define the privileges it can use:
Permitted: The capabilities that the process is allowed to use, representing the full set of privileges it can potentially activate.
Effective: The capabilities that are currently active for the process, determining what privileged operations it can perform at any given time.
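Inheritable: The capabilities that are preserved across an execve(2). They are added to the new program's permitted set only if the executed file also lists them in its own inheritable set.
Bounding: A limiting superset for the capabilities that can be gained during an execve(2). A capability that is missing from the bounding set cannot be acquired from a file's permitted set and cannot be added to the inheritable set.
Ambient: Capabilities that are preserved across an execve(2) of a program that is not privileged (no set-user-ID or set-group-ID bit and no file capabilities). Introduced in Linux 4.3; a capability can be ambient only if it is both permitted and inheritable.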
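The five sets described above are visible for any process as hexadecimal bitmasks in /proc/<pid>/status (the CapInh, CapPrm, CapEff, CapBnd, and CapAmb fields). The following is a minimal sketch of how those fields could be parsed; the function names are illustrative and not part of any library.

```python
from pathlib import Path

# Field names printed in /proc/<pid>/status, mapped to the set they describe.
STATUS_FIELDS = {
    "CapInh": "inheritable",
    "CapPrm": "permitted",
    "CapEff": "effective",
    "CapBnd": "bounding",
    "CapAmb": "ambient",
}


def parse_capability_sets(status_text):
    """Return {set_name: bitmask} parsed from the text of a status file."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in STATUS_FIELDS:
            sets[STATUS_FIELDS[key]] = int(value.strip(), 16)
    return sets


def capability_numbers(mask):
    """List the bit positions (capability numbers) that are set in a mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # only present on Linux
        for name, mask in parse_capability_sets(status.read_text()).items():
            print(f"{name:12} {mask:#018x} {capability_numbers(mask)}")
```

Bit N in each mask corresponds to capability number N, so a process holding only CAP_NET_BIND_SERVICE (number 10) would show 0000000000000400 in the relevant fields.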
Additionally, capabilities can be associated with executable files. They are stored in the file's security.capability extended attribute.
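To make the on-disk representation more concrete, here is a hedged sketch that decodes a raw security.capability value. It assumes the layout defined in the kernel header linux/capability.h: a little-endian 32-bit magic_etc word (revision in the top byte, effective flag in the lowest bit), followed by one or two pairs of 32-bit permitted and inheritable masks, and for revision 3 a trailing 32-bit root user ID. The function name decode_file_caps is hypothetical.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

# revision tag -> (name, number of 32-bit mask pairs, has trailing rootid)
REVISIONS = {
    0x01000000: ("VFS_CAP_REVISION_1", 1, False),
    0x02000000: ("VFS_CAP_REVISION_2", 2, False),
    0x03000000: ("VFS_CAP_REVISION_3", 2, True),
}


def decode_file_caps(blob):
    """Decode the raw bytes of a security.capability extended attribute."""
    if len(blob) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", blob, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    if revision not in REVISIONS:
        raise ValueError(f"unknown revision {revision:#010x}")
    name, pairs, has_rootid = REVISIONS[revision]
    expected = 4 + 8 * pairs + (4 if has_rootid else 0)
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")

    permitted = inheritable = 0
    for i in range(pairs):
        low_p, low_i = struct.unpack_from("<II", blob, 4 + 8 * i)
        permitted |= low_p << (32 * i)    # pair 0 holds bits 0-31, pair 1 bits 32-63
        inheritable |= low_i << (32 * i)

    rootid = struct.unpack_from("<I", blob, 4 + 8 * pairs)[0] if has_rootid else None
    return {
        "revision": name,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

On Linux, the raw bytes could be obtained with os.getxattr(path, "security.capability"); the decoder itself is pure Python and does not need special privileges.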
When a file with capabilities is executed, the kernel combines the file's capability sets with those of the calling thread to compute the capabilities of the new program, enabling fine-grained privilege management at the file level. File capabilities are supported since Linux 2.6.24. The on-disk format has three revisions (VFS_CAP_REVISION_1, VFS_CAP_REVISION_2, and VFS_CAP_REVISION_3); the third, added in Linux 4.14, supports namespaced file capabilities. A full implementation needs three things: kernel checks of the effective set before every privileged operation, system calls for retrieving and changing a thread's capability sets, and filesystem support for attaching capabilities to executable files.
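How the thread's sets and the file's sets combine during execve(2) is documented as a set of formulas in capabilities(7). The sketch below restates those formulas in Python as an illustration only: it ignores the special handling of UID 0, the securebits flags, and other corner cases, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessCaps:
    permitted: int = 0
    effective: int = 0
    inheritable: int = 0
    bounding: int = 0
    ambient: int = 0


@dataclass(frozen=True)
class FileCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: bool = False   # the file effective set is a single flag
    privileged: bool = True   # file has capabilities or a setuid/setgid bit


def caps_after_execve(p, f):
    """Apply the capabilities(7) transformation rules (simplified)."""
    # Ambient capabilities are dropped when a privileged file is executed.
    ambient = 0 if f.privileged else p.ambient
    permitted = (
        (p.inheritable & f.inheritable)   # inherited only if the file agrees
        | (f.permitted & p.bounding)      # file grants, limited by bounding set
        | ambient
    )
    effective = permitted if f.effective else ambient
    return ProcessCaps(
        permitted=permitted,
        effective=effective,
        inheritable=p.inheritable,        # unchanged across execve
        bounding=p.bounding,              # unchanged across execve
        ambient=ambient,
    )
```

For example, an unprivileged thread with a full bounding set that executes a file carrying CAP_NET_BIND_SERVICE (bit 10) in its permitted set, with the effective flag set, ends up with exactly that one capability permitted and effective. If bit 10 is first removed from the bounding set, the same execution yields no capabilities at all.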
The /proc/sys/kernel/cap_last_cap file, available since Linux 3.2, exposes the numerical value of the highest capability supported by the running kernel, providing a way to query the system's capability support.
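As a small illustration, the value can be read like any other text file. The helper names below are hypothetical, and the path parameter exists only so that the function can be exercised on systems without /proc.

```python
from pathlib import Path

CAP_LAST_CAP_PATH = "/proc/sys/kernel/cap_last_cap"


def read_cap_last_cap(path=CAP_LAST_CAP_PATH):
    """Return the highest supported capability number, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, PermissionError):
        return None  # not Linux, kernel older than 3.2, or /proc not readable


def full_capability_mask(last_cap):
    """Bitmask with every capability from 0 to last_cap set."""
    return (1 << (last_cap + 1)) - 1


if __name__ == "__main__":
    last = read_cap_last_cap()
    if last is None:
        print("cap_last_cap is not available on this system")
    else:
        print(f"highest capability: {last}, full mask: {full_capability_mask(last):#x}")
```

A program can compare this number with the capabilities it knows about to detect a kernel that is newer or older than the one it was written for.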
Managing and Enumerating Capabilities
Capabilities can be managed and enumerated using specific tools and commands, providing administrators with the ability to audit and configure process privileges.
The following table summarizes key commands for managing capabilities:
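| Command | Description | Example |
| --- | --- | --- |
| getcap /path/to/file | View the capabilities of a file. | getcap /usr/bin/ping |
| setcap cap_name=ep file | Set capabilities on a file. | setcap cap_net_bind_service=ep /usr/sbin/apache2 |
| capsh --print | Print the capability sets of the current process (capsh itself, which starts with the state of the shell that launched it). | capsh --print |
| capsh --decode=<hex_mask> | Decode a hexadecimal capability mask into capability names. | capsh --decode=1 |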
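To show what decoding a mask involves, here is a rough Python equivalent of the decode step. It assumes the capability numbering 0 through 40 from the kernel header linux/capability.h; the table should be checked against the headers of the kernel in use, and the function name decode_mask is illustrative.

```python
# Capability names indexed by capability number (assumed to match
# linux/capability.h, CAP_CHOWN = 0 through CAP_CHECKPOINT_RESTORE = 40).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_mask(mask):
    """Return the names of all capabilities whose bit is set in mask."""
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Bits beyond the table get a placeholder name (our own convention).
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"unknown_{bit}")
    return names


if __name__ == "__main__":
    print(decode_mask(int("1", 16)))                 # ['cap_chown']
    print(decode_mask(int("0000000000000400", 16)))  # ['cap_net_bind_service']
```

The input is parsed as hexadecimal, matching the format of the Cap* fields in /proc/<pid>/status.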
Common Capabilities
| Capability | Description | Potential abuse | Example |
| --- | --- | --- | --- |
| CAP_AUDIT_CONTROL | Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules. | Disable auditing to hide malicious activities. | Use a binary with this capability to disable auditing. |
| CAP_AUDIT_READ | Allow reading the audit log via a multicast netlink socket. | Read audit logs to gain information about system activities. | Use a binary with this capability to read audit logs. |
| CAP_AUDIT_WRITE | Write records to the kernel auditing log. | Tamper with audit logs by writing fake records. | Use a binary with this capability to write to audit logs. |
| CAP_BLOCK_SUSPEND | Employ features that can block system suspend (epoll(7) EPOLLWAKEUP, /proc/sys/wake_lock). | Prevent system suspend to maintain access. | Use a binary with this capability to block suspend. |
| CAP_BPF | Employ privileged BPF operations; see bpf(2) and bpf-helpers(7). | Attach BPF programs to hooks for malicious purposes, like sniffing traffic. | Use a binary with this capability to attach a BPF program. |
| CAP_CHECKPOINT_RESTORE | Update /proc/sys/kernel/ns_last_pid; employ the set_tid feature of clone3(2); read /proc/pid/map_files for other processes. | Manipulate process states using checkpoint and restore. | Use a binary with this capability to checkpoint and restore processes. |
| CAP_CHOWN | Make arbitrary changes to file UIDs and GIDs (see chown(2)). | Take ownership of sensitive files. | Use a binary with this capability to change ownership of /etc/passwd. |
| CAP_DAC_OVERRIDE | Bypass file read, write, and execute permission checks. | Read or write sensitive files without permission. | Use a binary with this capability to read /etc/shadow. |
| CAP_DAC_READ_SEARCH | Bypass file read permission checks and directory read and execute permission checks; invoke open_by_handle_at(2); use linkat(2) AT_EMPTY_PATH. | Read sensitive files or traverse directories without permission. | Use tar with this capability to read /etc/shadow. |
| CAP_FOWNER | Bypass permission checks on operations that normally require the filesystem UID of the process to match the UID of the file; set inode flags; set ACLs; ignore the directory sticky bit; modify user extended attributes; specify O_NOATIME. | Change the mode or ACLs of sensitive files owned by other users. | Use a binary with this capability to chmod a file owned by another user (for example, to make /etc/shadow world-readable). |
| CAP_FSETID | Don't clear set-user-ID and set-group-ID mode bits when a file is modified; set the set-group-ID bit for a file whose GID does not match the filesystem GID or any supplementary GID of the calling process. | Maintain setuid/setgid bits on files after modification. | Modify a setuid binary without losing its setuid bit. |
| CAP_IPC_LOCK | Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)); allocate memory using huge pages. | Prevent memory from being swapped out, hiding malicious code. | Lock memory containing sensitive data. |
| CAP_IPC_OWNER | Bypass permission checks for operations on System V IPC objects. | Access or modify IPC objects owned by other users. | Read messages from a message queue owned by root. |
| CAP_KILL | Bypass permission checks for sending signals (see kill(2)); includes ioctl(2) KDSIGACCEPT. | Send signals to arbitrary processes, potentially killing or manipulating them. | Send SIGKILL to a process to terminate it. |
| CAP_LEASE | Establish leases on arbitrary files (see fcntl(2)). | Delay other processes that try to open or truncate a file until the lease is released or broken. | Lease a file to stall a process that tries to modify it. |
| CAP_LINUX_IMMUTABLE | Set FS_APPEND_FL and FS_IMMUTABLE_FL inode flags (see FS_IOC_SETFLAGS(2const)). | Make files immutable, preventing modifications. | Make /etc/passwd immutable. |
| CAP_MAC_ADMIN | Allow MAC configuration or state changes. Implemented for the Smack LSM. | Configure MAC to weaken security policies. | Change the MAC policy (for example, Smack rules) to allow more permissive settings. |
| CAP_MAC_OVERRIDE | Override Mandatory Access Control (MAC). Implemented for the Smack LSM. | Bypass MAC restrictions. | Perform actions that the Smack policy would otherwise forbid. |
| CAP_MKNOD | Create special files using mknod(2). | Create device files that grant access to hardware. | Create a character device node that exposes the same device as /dev/kmem. |
| CAP_NET_ADMIN | Perform various network-related operations: interface configuration, IP firewall, routing tables, transparent proxying, TOS, promiscuous mode, multicasting, setsockopt(2) options. | Manipulate network settings such as interfaces, routes, and firewall rules. | Set up IP forwarding or NAT rules to redirect traffic. |
| CAP_NET_BIND_SERVICE | Bind a socket to Internet domain privileged ports (port numbers less than 1024). | Run services on low ports without root privileges. | Run a web server on port 80. |
| CAP_NET_BROADCAST | (Unused) Make socket broadcasts, and listen to multicasts. | Send broadcast packets or listen to multicasts for network reconnaissance. | Send broadcast packets to discover hosts on the network. |
| CAP_NET_RAW | Use RAW and PACKET sockets; bind to any address for transparent proxying. | Sniff traffic or send arbitrary packets. | Use tcpdump with this capability to capture network traffic. |
| CAP_PERFMON | Employ performance-monitoring mechanisms, including perf_event_open(2) and BPF operations with performance implications. | Observe system-wide performance data; not a direct path to privilege escalation. | Use perf_event_open to monitor CPU performance. |
| CAP_SETGID | Make arbitrary manipulations of process GIDs and supplementary GID list; forge GID when passing socket credentials; write a group ID mapping in a user namespace. | Set group ID to 0 (root), leading to privilege escalation. | Use a binary with this capability to set the GID to 0. |
| CAP_SETFCAP | Set arbitrary capabilities on a file; since Linux 5.12, needed to map user ID 0 in a new user namespace. | Add capabilities to a binary, potentially making it exploitable. | Add CAP_SETUID to a binary. |
| CAP_SETPCAP | Add capabilities from the bounding set to the inheritable set; drop capabilities from the bounding set; make changes to the securebits flags (varies by file capabilities support). | Move capabilities from the bounding set into the inheritable set so that later executed programs can gain them. | Add CAP_SETUID to the inheritable set. |
| CAP_SETUID | Make arbitrary manipulations of process UIDs; forge UID when passing socket credentials; write a user ID mapping in a user namespace. | Set user ID to 0 (root), leading to privilege escalation. | Use Python with this capability: ./python3 -c 'import os; os.setuid(0); os.system("/bin/bash")' |
| CAP_SYS_ADMIN | Perform system administration operations: quotactl(2), mount(2), umount(2), pivot_root(2), swapon(2), swapoff(2), sethostname(2), setdomainname(2); various privileged operations (overloaded, see notes). | Perform administrative operations such as mounting filesystems or changing the hostname. | Use a binary with this capability to mount a filesystem. |
| CAP_SYS_BOOT | Use reboot(2) and kexec_load(2). | Reboot the system or load a new kernel, potentially bypassing security measures. | Use kexec_load to load a malicious kernel. |
| CAP_SYS_CHROOT | Use chroot(2); change mount namespaces using setns(2). | Escape from a chroot jail or manipulate namespaces. | Change mount namespaces to access files outside the current namespace. |
| CAP_SYS_MODULE | Load and unload kernel modules (init_module(2), delete_module(2)); in kernels before Linux 2.6.25, drop capabilities from the system-wide bounding set. | Load malicious kernel modules that grant root access. | Load a kernel module that executes arbitrary code. |
| CAP_SYS_NICE | Lower process nice value; set real-time scheduling policies; set CPU affinity; set I/O scheduling class and priority; apply migrate_pages(2), move_pages(2); use MPOL_MF_MOVE_ALL with mbind(2). | Gain control over system resources by setting process priorities. | Set a process to real-time scheduling to gain more CPU time. |
| CAP_SYS_PACCT | Use acct(2). | Enable process accounting to monitor system activity. | Use acct(2) to start process accounting. |
| CAP_SYS_PTRACE | Trace arbitrary processes using ptrace(2); apply get_robust_list(2); transfer data to/from memory of arbitrary processes; inspect processes using kcmp(2). | Attach to and control other processes, potentially injecting code. | Attach to a privileged process and inject code. |
| CAP_SYS_RAWIO | Perform I/O port operations; access /proc/kcore; employ FIBMAP ioctl(2); open MSR devices; update /proc/sys/vm/mmap_min_addr; map files in /proc/bus/pci; open /dev/mem and /dev/kmem; perform SCSI and device-specific operations. | Directly manipulate hardware or read/write kernel memory. | Read kernel memory to find sensitive information. |
| CAP_SYS_RESOURCE | Use reserved space on ext2 filesystems; override disk quota limits; increase resource limits; override RLIMIT_NPROC; allow more than 64 Hz interrupts from the real-time clock; raise msg_qbytes limit; bypass file descriptor limits; override pipe size limits; employ prctl(2) PR_SET_MM; set /proc/pid/oom_score_adj. | Override resource limits or exhaust system resources. | Raise the process limit (RLIMIT_NPROC) to launch a fork bomb. |
| CAP_SYS_TIME | Set system clock (settimeofday(2), stime(2), adjtimex(2)); set real-time clock. | Manipulate the system clock to bypass time-based security mechanisms. | Set the clock back to bypass time-based restrictions. |
| CAP_SYS_TTY_CONFIG | Use vhangup(2); employ privileged ioctl(2) operations on virtual terminals. | Manipulate terminal settings or hang up terminals. | Hang up a terminal to disconnect a user. |
| CAP_SYSLOG | Perform privileged syslog(2) operations; view kernel addresses via /proc when /proc/sys/kernel/kptr_restrict is 1. | Manipulate the kernel log or view kernel addresses. | Clear the kernel log buffer to hide evidence. |
| CAP_WAKE_ALARM | Trigger something that will wake the system (set CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM timers). | Wake the system at specific times for malicious purposes. | Set an alarm to wake the system when it's supposed to be asleep. |
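Many of the examples above start from "a binary with this capability", so an audit usually begins by finding such binaries. The sketch below walks a directory tree and reports every file that carries a security.capability extended attribute. It relies on os.getxattr, which Python provides only on Linux, and the function name files_with_capabilities is hypothetical.

```python
import os

XATTR_NAME = "security.capability"


def files_with_capabilities(root):
    """Yield (path, raw_attribute_bytes) for files under root that have file capabilities."""
    getxattr = getattr(os, "getxattr", None)  # absent on non-Linux platforms
    if getxattr is None:
        return
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                blob = getxattr(path, XATTR_NAME, follow_symlinks=False)
            except OSError:
                # Attribute not set, filesystem without xattr support,
                # or file not accessible: skip it.
                continue
            yield path, blob
```

A caller might iterate over files_with_capabilities("/usr/bin") and print each path together with blob.hex(), then review any binary that holds a capability listed above as a privilege-escalation path, such as CAP_SETUID or CAP_DAC_OVERRIDE.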