
Troubleshooting and Resolving ENOSPC Errors from fanotify_mark on Linux

Published on by The adllm Team. Last modified: . Tags: Linux fanotify ENOSPC System Administration Kernel Troubleshooting File Systems System Monitoring

The Linux fanotify API is a powerful tool for monitoring filesystem events, crucial for applications like security auditing tools, virus scanners, and real-time backup solutions. However, when using fanotify_mark to register filesystem objects (files, directories, mounts, or entire filesystems) for monitoring, developers and administrators can encounter the ENOSPC ("No space left on device") error. Despite the message, this usually has nothing to do with disk space. Here, ENOSPC signals that fanotify's limit on the number of marks has been exhausted. The issue is particularly common on systems with a large number of mount points, such as hosts running many containers.

This article provides a comprehensive guide to understanding, diagnosing, and effectively resolving ENOSPC errors stemming from fanotify_mark. We’ll explore kernel limits, privileged operations, strategic marking, and robust coding practices to ensure your fanotify-based applications operate reliably.

Understanding fanotify and the ENOSPC Error

Before diving into solutions, it’s essential to grasp the fundamentals of fanotify and why ENOSPC occurs.

What is fanotify?

The fanotify API, detailed in the fanotify(7) man page, allows applications to receive notifications for a wide range of filesystem events. Key system calls include:

  • fanotify_init(2): Creates an fanotify notification group and returns a file descriptor for reading events. See the fanotify_init(2) man page.
  • fanotify_mark(2): Adds, removes, or modifies marks on filesystem objects (files, directories, mounts, entire filesystems) to specify what should be monitored. See the fanotify_mark(2) man page.

fanotify can monitor individual files, directories, or, significantly, entire mount points (FAN_MARK_MOUNT) or filesystems (FAN_MARK_FILESYSTEM, since Linux 4.20).

The ENOSPC Culprit: Mark Limits

When fanotify_mark returns ENOSPC, the kernel has refused to create a new mark because the limit on the number of fanotify marks has been reached. Since Linux 5.13 this is a per-user limit. On earlier kernels it was a fixed limit of 8192 marks per fanotify group. Each mark consumes a small amount of non-swappable kernel memory, and the limit prevents any single unprivileged user from exhausting it.

Since Linux 5.13, the per-user limit is exposed as the /proc/sys/fs/fanotify/max_user_marks kernel parameter (the fs.fanotify.max_user_marks sysctl), documented in fanotify(7). If an application tries to create more marks than this value allows for its user ID, and its fanotify group was not created with FAN_UNLIMITED_MARKS, fanotify_mark fails with ENOSPC.

Systems with many mount points (e.g., hundreds or thousands, common in container hosts using Docker or Kubernetes, or systems with complex bind mount setups) are susceptible if an application tries to place an individual mark on each mount point without appropriate configuration or privileges.

Initial Checks: Verifying Current Limits

The first step in diagnosing an ENOSPC issue is to check the current fanotify mark limit for your system.

You can inspect the current value using cat:

cat /proc/sys/fs/fanotify/max_user_marks

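On Linux 5.13 and later, the default is computed at boot from the amount of RAM and falls between 8192 and 1048576. On older kernels the file does not exist, and the limit is a fixed 8192 marks per fanotify group. If your application needs to monitor more distinct filesystem entities than this limit allows (and is not using FAN_MARK_FILESYSTEM effectively or FAN_UNLIMITED_MARKS), this is likely the source of the ENOSPC errors.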
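You can also check these numbers programmatically before choosing a marking strategy. The small C preflight below reads the limit and counts mount entries, which is the number of marks a one-FAN_MARK_MOUNT-per-mount approach would need. This is a minimal sketch. The helper names parse_limit, read_max_user_marks, and count_mount_entries are our own, and treating a missing file as "kernel older than 5.13" is an assumption.

```c
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>

// Parses the contents of a /proc/sys integer file such as "8192\n".
// Returns the value, or -1 if the text is not a non-negative integer.
long parse_limit(const char *text) {
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || value < 0)
        return -1;
    while (*end == '\n' || *end == ' ')
        end++;  // allow trailing whitespace
    return *end == '\0' ? value : -1;
}

// Reads fs.fanotify.max_user_marks. Returns -1 if the file is missing
// (assumed: a kernel older than 5.13) or cannot be parsed.
long read_max_user_marks(void) {
    char buf[64];
    FILE *f = fopen("/proc/sys/fs/fanotify/max_user_marks", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line != NULL ? parse_limit(line) : -1;
}

// Counts entries in a mount table such as "/proc/self/mounts": the number
// of marks needed to place one FAN_MARK_MOUNT mark on every mount.
// Returns -1 if the table cannot be opened.
long count_mount_entries(const char *mount_table) {
    FILE *f = setmntent(mount_table, "r");
    if (f == NULL)
        return -1;
    long count = 0;
    while (getmntent(f) != NULL)
        count++;
    endmntent(f);
    return count;
}
```

For example, an agent might log a warning at startup when count_mount_entries("/proc/self/mounts") is close to read_max_user_marks(). On 5.13 and later the limit covers all of the user's fanotify marks, not only those created by one program.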

Strategies for Resolving and Preventing ENOSPC

Several strategies can be employed to prevent or resolve ENOSPC errors from fanotify_mark, ranging from privileged operations to kernel tuning and smarter application design.

1. Using FAN_UNLIMITED_MARKS (Privileged Operations)

If your application runs with sufficient privileges (specifically, CAP_SYS_ADMIN, detailed in the capabilities(7) man page), you can instruct the kernel to bypass the per-user mark limit by using the FAN_UNLIMITED_MARKS flag during fanotify_init.

#include <sys/fanotify.h>
#include <fcntl.h>   // O_RDONLY
#include <stdio.h>
#include <errno.h>
#include <unistd.h>  // close(), used in the example below

// This function attempts to initialize fanotify with unlimited marks.
// Returns the fanotify file descriptor on success, -1 on error.
int initialize_fanotify_unlimited(void) {
    int fan_fd;

    // FAN_CLOEXEC: Close fanotify fd on execve
    // FAN_NONBLOCK: Non-blocking fanotify fd
    // FAN_UNLIMITED_MARKS: Bypass the mark limit (requires CAP_SYS_ADMIN)
    // FAN_UNLIMITED_QUEUE: Bypass event queue limit (requires CAP_SYS_ADMIN)
    fan_fd = fanotify_init(FAN_CLOEXEC | FAN_NONBLOCK |
                               FAN_UNLIMITED_MARKS | FAN_UNLIMITED_QUEUE,
                           O_RDONLY); // Open flags for the fds passed with events

    if (fan_fd == -1) {
        // errno is EPERM if the caller lacks CAP_SYS_ADMIN
        perror("fanotify_init failed");
        return -1;
    }

    printf("fanotify_init successful with unlimited marks (fd: %d)\n", fan_fd);
    return fan_fd;
}

// Example usage (run as root or with CAP_SYS_ADMIN):
// int main(void) {
//     int fd = initialize_fanotify_unlimited();
//     if (fd != -1) {
//         // Proceed with fanotify_mark calls...
//         close(fd);
//     }
//     return 0;
// }

Note: Using FAN_UNLIMITED_MARKS should be done judiciously, as it allows a single application to potentially consume more kernel resources. It’s suitable for trusted system-level daemons.
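An application that may run either privileged or unprivileged could decide up front whether to request the unlimited flags. The sketch below reads the CapEff line of /proc/self/status (see proc(5)) and tests the CAP_SYS_ADMIN bit. The function names are hypothetical. Treat this as a heuristic: inside a user namespace, CapEff can list capabilities that do not apply to the initial namespace, so the authoritative answer is still whether fanotify_init fails with EPERM. Unprivileged groups (Linux 5.13+) also have further requirements, such as identifying files by handle with FAN_REPORT_FID, which this sketch does not handle.

```c
#include <sys/fanotify.h>
#include <linux/capability.h>  // CAP_SYS_ADMIN (capability number 21)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Returns 1 if capability number 'cap' is set in a hex mask such as the
// CapEff value in /proc/self/status, 0 otherwise.
int cap_bit_set(const char *hex_mask, int cap) {
    unsigned long long mask = strtoull(hex_mask, NULL, 16);
    return (int)((mask >> cap) & 1ULL);
}

// Heuristic: returns 1 if the process's effective capability set includes
// CAP_SYS_ADMIN, 0 otherwise (or if it cannot tell).
int has_cap_sys_admin(void) {
    char line[256];
    int result = 0;
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            result = cap_bit_set(line + 7, CAP_SYS_ADMIN);
            break;
        }
    }
    fclose(f);
    return result;
}

// Requests the unlimited flags only when the process looks privileged.
// Unprivileged groups need further flags (see fanotify(7)); not handled here.
unsigned int choose_init_flags(int privileged) {
    unsigned int flags = FAN_CLOEXEC | FAN_NONBLOCK;
    if (privileged)
        flags |= FAN_UNLIMITED_MARKS | FAN_UNLIMITED_QUEUE;
    return flags;
}
```

A caller would then use something like fanotify_init(choose_init_flags(has_cap_sys_admin()), O_RDONLY) and still handle EPERM, since the capability check is only a hint.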

2. Increasing max_user_marks System-Wide

If FAN_UNLIMITED_MARKS is not an option (for example, because the application does not run with CAP_SYS_ADMIN), or you prefer a system-wide limit, a system administrator can raise the fs.fanotify.max_user_marks kernel parameter (available since Linux 5.13).

Temporarily (Resets on Reboot):

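You can change the value at runtime with sysctl, or by writing to the /proc file (requires root privileges). Note that sudo echo 16384 > /proc/sys/fs/fanotify/max_user_marks does not work: the redirection is performed by your unprivileged shell, not by sudo.

# Example: Increase the limit to 16384
sudo sysctl -w fs.fanotify.max_user_marks=16384
# Equivalent, writing the /proc file directly:
echo 16384 | sudo tee /proc/sys/fs/fanotify/max_user_marks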

Verify the change:

cat /proc/sys/fs/fanotify/max_user_marks

Persistently (Survives Reboots):

To make the change permanent, add the setting to a file in /etc/sysctl.d/ (e.g., /etc/sysctl.d/99-fanotify.conf) or to /etc/sysctl.conf (see the sysctl.conf(5) man page):

# /etc/sysctl.d/99-fanotify.conf
# Increase the maximum number of fanotify marks per user
fs.fanotify.max_user_marks = 16384

Apply the changes without rebooting using the sysctl(8) utility:

sudo sysctl --system 
# or specific to the file: sudo sysctl -p /etc/sysctl.d/99-fanotify.conf

Increasing this limit system-wide affects all users. Choose a value appropriate for your system’s load and resources.

3. Leveraging FAN_MARK_FILESYSTEM (Linux Kernel 4.20+)

For monitoring all objects within an entire filesystem, regardless of how many times or where it is mounted, the FAN_MARK_FILESYSTEM flag (introduced in Linux 4.20) is highly efficient because it needs only a single mark for the whole filesystem. It requires CAP_SYS_ADMIN. In the example, passing AT_FDCWD as the dirfd argument means a relative pathname is resolved against the current working directory, the same convention used by openat(2). For an absolute pathname, dirfd is ignored.

#include <sys/fanotify.h>
#include <stdio.h>
#include <stdint.h>  // uint64_t
#include <string.h>
#include <errno.h>
#include <fcntl.h> // For AT_FDCWD
// Assume fan_fd is an initialized fanotify file descriptor

// Marks an entire filesystem for specified events.
// path_on_filesystem can be any path residing on the target filesystem.
int mark_filesystem(int fan_fd, const char *path_on_filesystem) {
    int ret;
    unsigned int mark_flags = FAN_MARK_ADD | FAN_MARK_FILESYSTEM;
    // Monitor for file access and modifications on the filesystem
    uint64_t event_mask = FAN_ACCESS | FAN_MODIFY | FAN_CLOSE_WRITE;

    ret = fanotify_mark(fan_fd,
                          mark_flags,
                          event_mask,
                          AT_FDCWD, // Directory fd for relative path
                          path_on_filesystem);

    if (ret == -1) {
        int err = errno; // perror/fprintf may overwrite errno
        perror("fanotify_mark (FAN_MARK_FILESYSTEM) failed");
        if (err == ENOSPC) {
            fprintf(stderr, "ENOSPC: Fanotify mark limit reached.\n");
        } else if (err == EINVAL) {
            fprintf(stderr, "EINVAL: Kernel older than 4.20 or invalid args?\n");
        } else if (err == EPERM) {
            fprintf(stderr, "EPERM: CAP_SYS_ADMIN is required.\n");
        }
        errno = err; // Let the caller inspect the original error
        return -1;
    }
    printf("Successfully marked filesystem containing '%s'\n",
           path_on_filesystem);
    return 0;
}

// Example:
// int fan_fd = fanotify_init(...); // See initialize_fanotify_unlimited
// if (fan_fd != -1) {
//     // Marks the whole filesystem that /var/lib/docker resides on
//     mark_filesystem(fan_fd, "/var/lib/docker");
//     // ... read events ...
//     // close(fan_fd);
// }

This is often the best approach for monitoring container storage drivers or large, distinct filesystems.

4. Strategic Use of FAN_MARK_MOUNT

If you need to monitor specific mount points (and FAN_MARK_FILESYSTEM is not suitable, e.g., you need different event masks for different mounts of the same filesystem), use FAN_MARK_MOUNT. However, be mindful of the total number of marks. Information about current mounts can often be found in /proc/mounts (see proc(5) man page).

#include <sys/fanotify.h>
#include <mntent.h>  // setmntent(), getmntent(), endmntent()
#include <stdio.h>
#include <stdint.h>  // uint64_t
#include <string.h>
#include <errno.h>
#include <fcntl.h> // For AT_FDCWD
// Assume fan_fd is an initialized fanotify file descriptor

// Marks a specific mount point for specified events.
// Returns 0 on success, -1 on failure with errno preserved for the caller.
int mark_single_mount(int fan_fd, const char *mount_point_path) {
    int ret;
    unsigned int mark_flags = FAN_MARK_ADD | FAN_MARK_MOUNT;
    uint64_t event_mask = FAN_OPEN | FAN_CLOSE_WRITE; // Example events

    ret = fanotify_mark(fan_fd,
                          mark_flags,
                          event_mask,
                          AT_FDCWD,
                          mount_point_path);
    if (ret == -1) {
        int err = errno; // perror/fprintf may overwrite errno
        perror("fanotify_mark (FAN_MARK_MOUNT) failed");
        if (err == ENOSPC) {
            fprintf(stderr,
                    "ENOSPC on mount %s. Mark limit likely reached.\n",
                    mount_point_path);
        }
        errno = err;
        return -1;
    }
    printf("Successfully marked mount point '%s'\n", mount_point_path);
    return 0;
}

// Example: iterate over the mount table and mark selected mounts.
// getmntent(3) splits each line and decodes octal escapes such as "\040"
// (space) in mount paths, which a plain sscanf() would not.
// should_monitor() stands for your own (hypothetical) selection logic.
// Returns the number of mounts marked, or -1 if the table cannot be read.
int mark_selected_mounts(int fan_fd, int (*should_monitor)(const char *)) {
    FILE *mounts = setmntent("/proc/self/mounts", "r");
    if (mounts == NULL) {
        perror("setmntent");
        return -1;
    }
    int marked = 0;
    struct mntent *ent;
    while ((ent = getmntent(mounts)) != NULL) {
        if (!should_monitor(ent->mnt_dir))
            continue;
        if (mark_single_mount(fan_fd, ent->mnt_dir) == 0) {
            marked++;
        } else if (errno == ENOSPC) {
            break; // Further marks would fail too: stop and log
        }          // Other errors: skip this mount and continue
    }
    endmntent(mounts);
    return marked;
}

If marking many individual mounts, ensure your chosen limits (max_user_marks or FAN_UNLIMITED_MARKS) can accommodate the load.

5. Efficient Application Design and Resource Management

  • Filter in Userspace: Instead of creating numerous fine-grained marks to exclude certain events or subdirectories, consider using broader marks (e.g., on a parent directory or mount) and then filtering unwanted events in your application’s userspace code. This can reduce the total number of kernel marks required.
  • Close File Descriptors: Close the fanotify file descriptor obtained from fanotify_init (close(fan_fd), see the close(2) man page) when it is no longer needed. Closing the last reference destroys the notification group and frees all of its marks. On Linux 5.13 and later, where the mark limit is per user, a leaked group keeps its marks charged against that limit, so leaks can contribute to ENOSPC. A leaked group also counts toward the separate per-user group limit, which makes fanotify_init fail with EMFILE.

Diagnosing ENOSPC Errors

Effective diagnosis involves checking error codes and using system utilities.

Checking errno in C/C++

After any fanotify_mark call, always check its return value. If it’s -1, inspect errno (see the errno(3) man page). The strerror() function, described in the strerror(3) man page, can convert errno values to human-readable strings.

#include <errno.h>
#include <stdio.h>
#include <string.h>

// Reports why fanotify_mark() failed. Call it immediately after
// fanotify_mark() returns -1 for 'pathname', before errno can change.
void report_mark_error(const char *pathname) {
    int err = errno; // Save errno before any other library call
    fprintf(stderr, "fanotify_mark for '%s' failed: %s (errno %d)\n",
            pathname, strerror(err), err);
    if (err == ENOSPC) {
        fprintf(stderr, "This is ENOSPC - mark limit exceeded!\n");
        // Implement recovery or logging strategy
    } else if (err == EPERM) {
        fprintf(stderr, "Permission denied. CAP_SYS_ADMIN needed?\n");
    }
    // Handle other errors like EINVAL, ENOENT, etc.
}

// Usage (fan_fd initialized elsewhere):
// if (fanotify_mark(fan_fd, flags, mask, dirfd, pathname) == -1)
//     report_mark_error(pathname);

Using strace

The strace utility (see strace(1) man page) can trace system calls made by a process. This is invaluable for seeing the exact arguments to fanotify_mark and the errors returned.

# Trace a running process by its PID
sudo strace -e trace=fanotify_mark,fanotify_init -p YOUR_PID

# Trace a new program execution
sudo strace -e trace=fanotify_mark,fanotify_init your_program your_args

Look for fanotify_mark calls returning -1 ENOSPC (No space left on device).

Inspecting /proc/[pid]/fdinfo/[fanotify_fd]

For a running process, you can inspect details about its fanotify file descriptors via the /proc filesystem (refer again to proc(5) man page).

  1. Find the process ID (PID).
  2. List its file descriptors: ls -l /proc/YOUR_PID/fd/
  3. Identify the fanotify file descriptor (its symlink target is anon_inode:[fanotify]).
  4. View its fdinfo: cat /proc/YOUR_PID/fdinfo/THE_FANOTIFY_FD

The output shows the group's flags on a line beginning with "fanotify flags:", followed by one line per mark. Inode marks appear as "fanotify ino:<inode> sdev:<device>", mount marks as "fanotify mnt_id:<id>", and filesystem marks as "fanotify sdev:<device>". Each mark line also lists the mark's mflags, mask, and ignored_mask.

# Example (replace 12345 with your PID and 10 with the fanotify fd)
sudo cat /proc/12345/fdinfo/10

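Counting the mark lines shows how many marks this group holds. Compare that count with max_user_marks, keeping in mind that on Linux 5.13 and later the limit applies across all of the user's fanotify groups.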
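Counting mark lines by hand gets tedious once a group holds thousands of marks. The sketch below tallies them by type. It relies on the line prefixes the kernel currently prints (fanotify ino:, fanotify mnt_id:, fanotify sdev:), so verify them against your kernel's output. The helper names are our own.

```c
#include <stdio.h>
#include <string.h>

// Mark kinds as they appear in /proc/PID/fdinfo/FD for a fanotify fd.
enum mark_kind { NOT_A_MARK, INODE_MARK, MOUNT_MARK, FILESYSTEM_MARK };

// Classifies one fdinfo line. The group header "fanotify flags:..." and the
// generic "pos:", "flags:", "mnt_id:" lines are not marks.
enum mark_kind classify_fdinfo_line(const char *line) {
    if (strncmp(line, "fanotify ino:", 13) == 0)
        return INODE_MARK;
    if (strncmp(line, "fanotify mnt_id:", 16) == 0)
        return MOUNT_MARK;
    if (strncmp(line, "fanotify sdev:", 14) == 0)
        return FILESYSTEM_MARK;
    return NOT_A_MARK;
}

struct mark_counts { long inode, mount, filesystem; };

// Tallies the marks listed in an fdinfo file, e.g. "/proc/12345/fdinfo/10".
// Returns 0 on success, -1 if the file cannot be opened.
int count_fdinfo_marks(const char *fdinfo_path, struct mark_counts *out) {
    char line[1024];
    FILE *f = fopen(fdinfo_path, "r");
    if (f == NULL)
        return -1;
    out->inode = out->mount = out->filesystem = 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        switch (classify_fdinfo_line(line)) {
        case INODE_MARK:      out->inode++;      break;
        case MOUNT_MARK:      out->mount++;      break;
        case FILESYSTEM_MARK: out->filesystem++; break;
        default:                                 break;
        }
    }
    fclose(f);
    return 0;
}
```

Reading another process's fdinfo generally requires the same privileges as the sudo cat example, so run the tool as root.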

Common Pitfalls

  • Forgetting FAN_UNLIMITED_MARKS: When running with CAP_SYS_ADMIN and needing to mark many items, not using FAN_UNLIMITED_MARKS in fanotify_init is a common cause of ENOSPC.
  • Ignoring fanotify_mark Return Values: Failing to check for -1 and errno == ENOSPC leads to applications silently failing to monitor intended targets.
  • Misunderstanding Mark Scope: Using FAN_MARK_MOUNT excessively where a single FAN_MARK_FILESYSTEM on the underlying storage would suffice.
  • Resource Leaks: Not closing fanotify file descriptors. A leaked group keeps its marks, which on Linux 5.13 and later still count against the per-user mark limit. It also counts toward the per-user group limit, which makes fanotify_init fail with EMFILE.

Considerations for Containerized Environments (Docker/Kubernetes)

Container hosts often have a very large number of mount points (e.g., for overlay filesystems, bind mounts, tmpfs instances per container). An agent attempting to monitor activity within all containers by placing a FAN_MARK_MOUNT on every container-related mount point can easily hit the default max_user_marks.

Strategies include:

  • Privileged Agents with FAN_UNLIMITED_MARKS: A monitoring agent running as a privileged container or host process can use FAN_UNLIMITED_MARKS.
  • Marking Underlying Filesystems (FAN_MARK_FILESYSTEM): More scalably, mark the host’s underlying filesystems that store container layers and volumes (e.g., /var/lib/docker or specific backing storage filesystems). Events can then be filtered and correlated to specific containers in userspace.
  • Adjusting max_user_marks on the Host: If per-mount monitoring by a less privileged agent is unavoidable, increasing max_user_marks on the container host might be necessary.
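To estimate how many FAN_MARK_FILESYSTEM marks could replace per-mount marks on a host, you could group mount points by device number. This sketch rests on an assumption: it uses st_dev from stat(2) as a stand-in for the filesystem. That holds for bind mounts of the same filesystem but can overcount on Btrfs, where subvolumes report distinct st_dev values. Calling stat on every mount point can also block on unresponsive network mounts. The helper names are hypothetical.

```c
#include <mntent.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

// Returns how many distinct values appear in devs[0..n-1]. The O(n^2)
// scan is simple and adequate for mount tables of a few thousand entries.
size_t count_distinct_devs(const dev_t *devs, size_t n) {
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && devs[j] != devs[i])
            j++;
        if (j == i)
            distinct++;  // first occurrence of this device number
    }
    return distinct;
}

// Collects st_dev for up to 'max' mount points listed in 'mount_table'
// (e.g. "/proc/self/mounts"), skipping mounts that cannot be stat'ed.
// Returns the number collected, or -1 if the table cannot be opened.
long collect_mount_devs(const char *mount_table, dev_t *devs, size_t max) {
    FILE *f = setmntent(mount_table, "r");
    if (f == NULL)
        return -1;
    struct mntent *ent;
    struct stat st;
    size_t n = 0;
    while (n < max && (ent = getmntent(f)) != NULL) {
        if (stat(ent->mnt_dir, &st) == 0)
            devs[n++] = st.st_dev;
    }
    endmntent(f);
    return (long)n;
}
```

If the distinct count is far smaller than the number of mount entries, filesystem marks are likely the cheaper option in terms of the mark limit.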

Advanced Topics

  • Filesystem-Specific Behaviors: Some filesystems, like Btrfs (see the Btrfs Wiki), with its subvolumes, can have nuanced interactions with fanotify_mark regarding fsid and how FAN_MARK_FILESYSTEM behaves across subvolumes. Testing is crucial for specific filesystem types. Linux kernel 6.8 introduced improvements and changes for Btrfs fsid reporting, which can affect fanotify.
  • Network Filesystems (NFS, SMBFS): Traditionally, fanotify primarily reports events triggered locally. Monitoring changes made remotely to network filesystems has limitations. While there is ongoing work to improve network filesystem change notifications in the kernel, behavior can vary.

Conclusion

Encountering ENOSPC from fanotify_mark is a common hurdle on Linux systems managing numerous filesystem entities, especially mount points. By understanding that this error signals a mark limit exhaustion, administrators and developers can take targeted action. Utilizing FAN_UNLIMITED_MARKS for privileged applications, strategically increasing max_user_marks, leveraging the efficient FAN_MARK_FILESYSTEM flag, and designing resource-conscious applications are key to building robust and scalable file monitoring solutions. Always combine these techniques with diligent error checking and diagnostic practices to ensure your fanotify-based tools perform reliably under demanding conditions.