Custom Bazel Rules: Mastering Code Generation from Obscure IDLs

Modern software development often involves integrating diverse components, sometimes defined by Interface Definition Languages (IDLs). While standard IDLs like Protocol Buffers (including the service definitions used by gRPC) have excellent Bazel support, many projects encounter proprietary, legacy, or simply “obscure” IDL formats. Manually managing code generation from these IDLs is error-prone, inefficient, and breaks the principles of reproducible, scalable builds. The solution lies in crafting custom Bazel rules.

This article provides a comprehensive walkthrough for developing a custom Bazel rule using Starlark to automate code generation from any text-based IDL. We’ll cover the fundamental concepts, rule implementation details, best practices for tool integration, and how to consume the generated code, empowering you to bring even the most arcane IDLs into your hermetic Bazel build process.

Why Custom Bazel Rules for Code Generation?

Before diving into implementation, let’s understand why a custom Bazel rule is the superior approach for IDL-based code generation compared to ad-hoc scripts or even Bazel’s generic genrule:

Hermeticity & Reproducibility: Custom rules require every input (IDL files, compiler tool) and output to be declared explicitly. Combined with sandboxed execution, this keeps builds self-contained and makes them produce the same result on every machine, provided the compiler itself is deterministic.
Incrementality: Bazel intelligently rebuilds only what’s necessary. If your IDL file or the code generator tool changes, only the affected generated code and its downstream dependencies will be rebuilt, saving significant time.
Scalability & Maintainability: As your project grows, custom rules provide a clean, modular way to manage code generation logic. They are far more maintainable than scattered scripts.
Type Safety & Richer API: Starlark rules offer typed attributes and structured providers, enabling better error checking during the analysis phase and clearer contracts between rules compared to genrule’s string-based commands.
Integration with the Build Graph: Generated code seamlessly becomes part of Bazel’s dependency graph, allowing other rules (e.g., cc_library, java_library) to depend on it directly and correctly.
Testability: Custom rules and the code they generate can be easily tested within the Bazel framework.

While genrule can be a quick starting point for simple tasks, it often becomes unwieldy for complex code generation scenarios involving multiple outputs, specific tool invocations, or conditional logic based on inputs. Custom rules offer far more power and structure.
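For comparison, a genrule version of the same task might look like the sketch below. The target and file names are hypothetical, and the flags are the ones this article assumes for its compiler wrapper. Every path is spliced into a single shell string, and the output names must be spelled out by hand for each target.

```python
# //my_project/BUILD (hypothetical genrule equivalent, shown for contrast)
genrule(
    name = "my_service_idl_genrule",
    srcs = ["my_service.obscure_idl"],
    outs = [
        "my_service_generated.cc",
        "my_service_generated.h",
    ],
    # The whole invocation is one shell string; flag mistakes surface only
    # when the command runs.
    cmd = " ".join([
        "$(execpath //tools/idl_compiler:compiler_wrapper)",
        "--input_idl $(execpath my_service.obscure_idl)",
        "--output_cc $(execpath my_service_generated.cc)",
        "--output_h $(execpath my_service_generated.h)",
        "--language cpp",
    ]),
    tools = ["//tools/idl_compiler:compiler_wrapper"],
)
```

This is workable for one file, but each new IDL file needs another copy of the block, which is the duplication a custom rule removes.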

Anatomy of a Custom Code Generation Rule

A custom Bazel rule is defined in a .bzl file using Starlark, a dialect of Python. Let’s outline the core components:

The .bzl File: This file houses your rule definition and its implementation logic. For instance, //tools/build_rules/my_idl.bzl.
rule() Function: This Starlark function declares your new rule, specifying its implementation function, attributes, and other properties.
Implementation Function: Usually named _my_rule_impl(ctx), this function contains the core logic: it receives inputs, defines actions (like running the IDL compiler), and returns providers that describe the rule’s outputs.
Attributes (attrs): These define the inputs your rule accepts (e.g., IDL source files, the IDL compiler tool, output language options). Each attribute has a type (e.g., attr.label_list for source files, attr.label for a single tool, attr.string for options).

Here’s a conceptual structure within a my_idl.bzl file:

# //tools/build_rules/my_idl.bzl

# Custom provider definition (optional, but good practice)
MyIdlInfo = provider(
    fields = {
        "generated_srcs": "Depset of generated source files",
        "generated_hdrs": "Depset of generated header files (if any)",
    },
)

def _my_obscure_idl_rule_impl(ctx):
    # Rule implementation logic goes here
    # 1. Get input IDL files from ctx.files.srcs
    # 2. Get the IDL compiler tool from ctx.executable._idl_compiler
    # 3. For each IDL file:
    #    a. Declare output files (e.g., foo_generated.cc, foo_generated.h)
    #    b. Construct arguments for the IDL compiler
    #    c. Register an action (ctx.actions.run) to invoke the compiler
    # 4. Collect all generated files
    # 5. Return providers (e.g., DefaultInfo, MyIdlInfo)
    pass # Placeholder

my_obscure_idl_rule = rule(
    implementation = _my_obscure_idl_rule_impl,
    attrs = {
        "srcs": attr.label_list(
            doc = "List of .obscure_idl source files",
            allow_files = [".obscure_idl"], # Restrict to specific file type
            mandatory = True,
        ),
        "_idl_compiler": attr.label(
            doc = "The IDL compiler tool",
            cfg = "exec", # Tool runs in execution configuration
            executable = True,
            allow_files = True, # Can be a script or binary
            default = Label("//tools/idl_compiler:compiler_wrapper"),
        ),
        "output_language": attr.string(
            doc = "Target language for code generation (e.g., 'cpp', 'java')",
            values = ["cpp", "java", "python"], # Enforce specific values
            default = "cpp",
        ),
        # Add other attributes as needed (e.g., include paths for IDL)
    },
    doc = "Generates code from .obscure_idl files.",
)

In this snippet, _idl_compiler is a private attribute (its name starts with _). Private attributes must have a default value and cannot be set from BUILD files, which makes them the standard way to declare an implicit dependency such as a tool. The cfg = "exec" setting ensures the tool is built for the execution platform, not the target platform. More on attributes can be found in the Bazel documentation.

The Rule Implementation Function (`_impl`)

The _my_obscure_idl_rule_impl(ctx) function is where the magic happens. The ctx (context) object is your interface to Bazel’s build information and actions.

Accessing Inputs

Source IDL Files: ctx.files.srcs provides a list of File objects for the IDL files listed in the srcs attribute.
IDL Compiler Tool: ctx.executable._idl_compiler gives you the File object for the executable compiler tool.
Attribute Values: ctx.attr.output_language would give the string value of the output_language attribute.

Declaring Output Files

For each input IDL file, you need to determine the names of the files that will be generated and declare them to Bazel. ctx.actions.declare_file() is used for this. It’s crucial to ensure unique output filenames, often by basing them on the input filename and adding a suffix or changing the extension.

# Inside _my_obscure_idl_rule_impl(ctx)
generated_srcs_list = []
generated_hdrs_list = [] # If your IDL generates headers

for idl_file in ctx.files.srcs:
    base_name = idl_file.basename[:-len(".obscure_idl")] # Remove extension

    # Example: Generating a .cc and .h file for C++
    if ctx.attr.output_language == "cpp":
        out_src_name = base_name + "_generated.cc"
        out_hdr_name = base_name + "_generated.h"

        generated_cc = ctx.actions.declare_file(out_src_name)
        generated_h = ctx.actions.declare_file(out_hdr_name)

        generated_srcs_list.append(generated_cc)
        generated_hdrs_list.append(generated_h)
        # ... (call ctx.actions.run later with these as outputs) ...
    # Add similar blocks for other output_language values

This ensures that Bazel knows about the files your rule will create.
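The per-language branches can also be driven by a lookup table, and the uniqueness requirement can be checked before anything is declared. The following is a minimal sketch that is valid both as Starlark and as plain Python. The helper names (`output_names`, `find_duplicates`) are hypothetical, and the `java` and `python` suffixes are assumptions, since only the C++ names appear above.

```python
IDL_EXTENSION = ".obscure_idl"

# Hypothetical naming scheme: only the "cpp" entries come from the article;
# the "java" and "python" suffixes are placeholders for illustration.
OUTPUT_SUFFIXES = {
    "cpp": ["_generated.cc", "_generated.h"],
    "java": ["_generated.java"],
    "python": ["_generated.py"],
}

def output_names(idl_basename, language):
    """Maps e.g. "foo.obscure_idl" to the generated file names for a language."""
    stem = idl_basename[:-len(IDL_EXTENSION)]
    return [stem + suffix for suffix in OUTPUT_SUFFIXES[language]]

def find_duplicates(names):
    """Returns the names that occur more than once, in first-seen order."""
    seen = {}
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen[name] = True
    return duplicates
```

Inside the implementation function, you could collect `output_names(f.basename, ctx.attr.output_language)` for every file in `ctx.files.srcs`, call `fail()` if `find_duplicates` returns anything, and only then pass each name to `ctx.actions.declare_file()`.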

Registering the Code Generation Action

The core of code generation is invoking the IDL compiler. This is done by registering an action, typically with ctx.actions.run().

# Inside the loop, after declaring outputs for an idl_file
# Continuing the C++ example: generated_cc and generated_h are declared

# Prepare arguments for the IDL compiler. Passing File objects instead of
# .path strings lets Bazel expand the paths when the action executes.
args = ctx.actions.args()
args.add("--input_idl", idl_file)
args.add("--output_cc", generated_cc)
args.add("--output_h", generated_h)
args.add("--language", ctx.attr.output_language)
# Add any other necessary flags for your compiler

ctx.actions.run(
    outputs = [generated_cc, generated_h],
    # `inputs` takes a list or depset of Files. The compiler and its runfiles
    # are added automatically because `executable` comes from ctx.executable.
    inputs = [idl_file],
    executable = ctx.executable._idl_compiler,
    arguments = [args],
    mnemonic = "ObscureIdlCompile",
    # Starlark has no f-strings; use % formatting instead.
    progress_message = "Compiling %s with obscure IDL compiler" % idl_file.short_path,
)

Key parameters for ctx.actions.run():

outputs: A list of the output File objects declared earlier.
inputs: A list or depset of the input files the action needs, here the idl_file itself. The compiler does not need to be listed: when executable comes from ctx.executable, Bazel adds the tool and its runfiles to the action automatically. Additional helper tools go in the tools parameter.
executable: The File object for the IDL compiler.
arguments: A list of command-line arguments. Using ctx.actions.args() is best practice: it accepts File objects and depsets directly and defers their expansion to execution time, which keeps analysis-phase memory usage low.
mnemonic: A short string used by Bazel to identify this type of action (e.g., in UI or performance profiles).
progress_message: A user-friendly message displayed during the build.

Consult the actions documentation for more details.
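If the IDL supports imports, the imported files must be declared as inputs too. The sketch below wraps the action registration in a helper to show this. The helper name `compile_idl`, the `include_files` parameter, and the `--include` flag are assumptions made for illustration; your compiler's interface will differ.

```python
def compile_idl(ctx, idl_file, generated_cc, generated_h, include_files = []):
    """Registers one compiler action; include_files are extra IDL inputs."""
    args = ctx.actions.args()
    args.add("--input_idl", idl_file)
    args.add("--output_cc", generated_cc)
    args.add("--output_h", generated_h)
    args.add("--language", ctx.attr.output_language)

    # Hypothetical flag: emits "--include <file>" once per imported IDL file.
    args.add_all(include_files, before_each = "--include")

    ctx.actions.run(
        outputs = [generated_cc, generated_h],
        # Every file the compiler reads must be listed here; undeclared files
        # are not visible to a sandboxed action.
        inputs = [idl_file] + include_files,
        executable = ctx.executable._idl_compiler,
        arguments = [args],
        mnemonic = "ObscureIdlCompile",
        progress_message = "Compiling %s" % idl_file.short_path,
    )
```

The include files would typically come from an extra `attr.label_list` on the rule, read through `ctx.files` in the same way as `srcs`.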

Returning Providers

After defining actions, the rule must tell Bazel about its outputs so other rules can use them. This is done by returning a list of provider instances.

DefaultInfo: This standard provider tells Bazel which files to build when the target is requested on the command line, and which files other targets receive when they list this target in an attribute such as srcs. For code generation rules, it typically exposes the generated source files.
Custom Providers: You can define your own providers (like MyIdlInfo shown earlier) to pass specific, structured information to consuming rules. This is useful if generated code has distinct categories (e.g., sources, headers, metadata files) that need to be handled differently.

# At the end of _my_obscure_idl_rule_impl(ctx)
# Assuming generated_srcs_list and generated_hdrs_list are populated

return [
    DefaultInfo(
        files = depset(generated_srcs_list + generated_hdrs_list),
    ),
    MyIdlInfo(
        generated_srcs = depset(generated_srcs_list),
        generated_hdrs = depset(generated_hdrs_list),
    ),
]

Using depset instead of plain lists matters for performance when many files flow through the build graph, because depsets can be merged without copying their contents.

Defining the Obscure IDL Compiler Tool

Your custom rule needs an actual compiler tool to invoke. This tool is responsible for parsing the obscure IDL and generating the target language code. Bazel needs to know about this tool as a build target itself.

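Pre-built Binary: If you have a pre-compiled IDL compiler, you can check it into your repository (or fetch it with a repository rule such as http_file) and either reference the file directly from the rule’s tool attribute or define an sh_binary that executes it. For a Java-based compiler distributed as a JAR, java_import combined with a java_binary is an option.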
Script-based Compiler: If your compiler is a script (e.g., Python, Shell), use py_binary or sh_binary.

A common practice is to create a wrapper script (e.g., an sh_binary) for your tool. This wrapper can handle complex argument parsing or set up an environment if needed.

# //tools/idl_compiler/BUILD

# Assume your actual compiler is 'obscure_compiler_binary'
# and it's checked in or built by another rule.
# For this example, let's say it's a script.
# You might also have data files the compiler needs.

sh_binary(
    name = "compiler_wrapper",
    srcs = ["compiler_wrapper.sh"],
    data = [
        ":actual_obscure_compiler_script",
        # Add any other data files the compiler needs at runtime
        # "//path/to:compiler_config_data",
    ],
    visibility = ["//visibility:public"],
)

# This could be your actual compiler script target
# For example, if it's a python script:
# py_binary(
#     name = "actual_obscure_compiler_script",
#     srcs = ["actual_compiler.py"],
#     ...
# )
# Or just a file if the wrapper script knows how to find it via runfiles
filegroup(
    name = "actual_obscure_compiler_script",
    srcs = ["actual_compiler.py"], # Assuming it's a python script
)

The compiler_wrapper.sh script would then locate the actual_compiler.py (or binary) in its runfiles directory and execute it with the provided arguments:

#!/bin/bash
# compiler_wrapper.sh
set -euo pipefail

# Data dependencies live in the runfiles tree, not next to this script.
# Bazel sets RUNFILES_DIR in some contexts; otherwise fall back to the
# "<binary>.runfiles" directory that sits beside the sh_binary.
RUNFILES_DIR="${RUNFILES_DIR:-$0.runfiles}"

# The first path component is the repository name: "_main" under Bzlmod,
# or the workspace name declared in WORKSPACE. Adjust it to match your setup.
# (For a more robust lookup, use the Bash runfiles library and rlocation.)
COMPILER_EXECUTABLE="$RUNFILES_DIR/_main/tools/idl_compiler/actual_compiler.py"

# Execute the actual compiler with all passed arguments.
# actual_compiler.py must be executable and start with a shebang line.
exec "$COMPILER_EXECUTABLE" "$@"

This wrapper is then referenced by the _idl_compiler attribute in your .bzl file’s rule definition.

Using the Custom Rule in a `BUILD` File

Once your .bzl file and the compiler tool are set up, using the rule in a BUILD file is straightforward:

Load the rule: Use the load statement at the top of your BUILD file.
Instantiate the rule: Call the rule function, providing a name, srcs, and any other attributes.

# //my_project/BUILD

load("//tools/build_rules:my_idl.bzl", "my_obscure_idl_rule")

# Generate C++ code from an IDL file
my_obscure_idl_rule(
    name = "my_service_idl_codegen",
    srcs = ["my_service.obscure_idl"],
    output_language = "cpp",
    # You might add other attributes like include paths for the IDL here
    # idl_includes = ["//common/idl_defs"],
)

# Consume the generated code in a C++ library
cc_library(
    name = "my_service_lib",
    srcs = [
        ":my_service_idl_codegen", # This refers to the generated sources
        "my_service_impl.cc",    # Your handwritten implementation
    ],
    hdrs = [
        # Generated headers already arrive through srcs above, because the
        # codegen target's DefaultInfo contains both the .cc and .h files.
        # Headers listed in srcs are private to this library. To export
        # them to dependents, list a target here that yields only the
        # generated headers.
    ],
    deps = [
        "//base:some_library",
    ],
)

When my_obscure_idl_rule is used in the srcs of cc_library, Bazel automatically uses the files provided by its DefaultInfo provider. Here that covers both the generated .cc and .h files, and cc_library treats headers listed in srcs as private headers of the library. Provider fields cannot be selected from a BUILD file: there is no label syntax for reading MyIdlInfo. A consuming rule reads it in its implementation function, for example ctx.attr.dep[MyIdlInfo].generated_hdrs. To feed only the generated headers into the hdrs attribute of cc_library, you need an intermediate target that exposes just those files, such as a small forwarding rule or, if the codegen rule provides output groups, a filegroup that selects one.

Bazel does not discover generated headers on its own. Every header a compile action reads must be a declared input, so the generated .h files have to reach the cc_library through srcs, hdrs, or deps. Having each generated .cc file include its corresponding .h file works only because both files are declared outputs of the codegen target.
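One way to expose only the generated headers is a small forwarding rule that reads MyIdlInfo from a dependency and republishes the header depset as its default outputs. This is a sketch, not the only approach: the file name my_idl_headers.bzl, the rule name my_idl_headers, and the attribute name dep are all hypothetical.

```python
# //tools/build_rules/my_idl_headers.bzl (hypothetical file)
load("//tools/build_rules:my_idl.bzl", "MyIdlInfo")

def _my_idl_headers_impl(ctx):
    # Providers are read from a dependency inside an implementation function.
    info = ctx.attr.dep[MyIdlInfo]
    return [DefaultInfo(files = info.generated_hdrs)]

my_idl_headers = rule(
    implementation = _my_idl_headers_impl,
    attrs = {
        "dep": attr.label(
            doc = "A my_obscure_idl_rule target",
            mandatory = True,
            # Reject dependencies that do not provide MyIdlInfo.
            providers = [MyIdlInfo],
        ),
    },
    doc = "Forwards only the generated headers of an IDL codegen target.",
)
```

In a BUILD file you would then declare `my_idl_headers(name = "my_service_idl_hdrs", dep = ":my_service_idl_codegen")` and list `":my_service_idl_hdrs"` in the `hdrs` of the `cc_library`.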

Advanced Considerations

Output Groups: For more complex scenarios where a rule generates different categories of files (e.g., sources, documentation, metadata), the OutputGroupInfo provider can be used to label these sets of files. Consumers can then request specific output groups.
Toolchains: If your IDL compiler has platform-specific versions or requires a complex environment (like a specific JDK for a Java-based compiler), Bazel toolchains provide a robust way to manage this. This decouples your rule logic from tool selection. Read more on toolchains.
Testing the Rule: Create *_test targets (e.g., sh_test, py_test, or language-specific tests such as cc_test) that depend on targets created with my_obscure_idl_rule, so that running the tests builds the generated code and checks it for correctness.
Aspects: For very advanced scenarios, like gathering information transitively from all IDL-generated targets in a dependency graph, aspects can be used. This is generally not needed for basic code generation.
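As a sketch of the output-group idea, the provider list returned by the implementation function could be extended as follows. The helper name and the group names sources and headers are illustrative choices, not a fixed convention.

```python
# Hypothetical helper, called at the end of _my_obscure_idl_rule_impl(ctx).
def _idl_providers(generated_srcs_list, generated_hdrs_list):
    return [
        # Default outputs: everything the rule generates.
        DefaultInfo(files = depset(generated_srcs_list + generated_hdrs_list)),
        # Named subsets that consumers can request individually.
        OutputGroupInfo(
            sources = depset(generated_srcs_list),
            headers = depset(generated_hdrs_list),
        ),
    ]
```

A `filegroup` with `srcs = [":my_service_idl_codegen"]` and `output_group = "headers"` would then yield only the generated headers, and `bazel build --output_groups=headers //my_project:my_service_idl_codegen` would build just that group from the command line.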

Debugging Your Custom Rule

Debugging Starlark rules can sometimes be tricky:

print() Statements: Use print() within your .bzl implementation function. These messages appear on the console during the analysis phase (bazel build //...).
bazel query: To inspect dependencies and rule attributes:
- bazel query 'kind(my_obscure_idl_rule, //...)' – find all instances.
- bazel query --output=build //my_project:my_service_idl_codegen – show the “evaluated” rule definition.
bazel aquery (Action Query): Crucial for inspecting the actual commands Bazel will run:
- bazel aquery 'mnemonic(ObscureIdlCompile, //my_project:my_service_idl_codegen)' This shows the exact command, inputs, outputs, and environment variables. Indispensable for debugging ctx.actions.run(). Learn about aquery.
--subcommands and --verbose_failures:
- bazel build --subcommands //my_project:my_service_idl_codegen prints executed commands.
- bazel build --verbose_failures //my_project:my_service_idl_codegen gives more detail on errors.
Examine bazel-out: Inspect generated files or action logs in the bazel-out directory.

Conclusion

Creating custom Bazel rules for code generation from obscure IDLs is a powerful technique to enhance build reliability, performance, and maintainability. By leveraging Starlark’s capabilities to define attributes, actions, and providers, you can seamlessly integrate any IDL-based code generation process into Bazel’s hermetic and incremental build system. Rule development has a learning curve, but the long-term benefits for large-scale projects are substantial: a potential source of build complexity becomes a well-defined, automated, and robust part of your development lifecycle.