Add documentation
Hydra nac3artiq-msys2 Hydra build #200160 of artiq:nac3:nac3artiq-msys2
Hydra nac3artiq-msys2-pkg Hydra build #200161 of artiq:nac3:nac3artiq-msys2-pkg
Hydra nac3artiq-profile Hydra build #200162 of artiq:nac3:nac3artiq-profile
Hydra nac3artiq Hydra build #200159 of artiq:nac3:nac3artiq
This commit was merged in pull request #750.
@@ -0,0 +1,129 @@
|
||||
# Architecture
|
||||
|
||||
NAC3 follows a classic compiler pipeline: parse, analyze, generate. The codebase is split into several Rust crates that separate concerns cleanly enough that `nac3core` contains nothing specific to ARTIQ.
|
||||
|
||||
## Crate Layout
|
||||
| Crate          | Purpose                                                                                       |
| -------------- | --------------------------------------------------------------------------------------------- |
| nac3ast        | Python AST node definitions (based on [RustPython](https://github.com/RustPython/RustPython)) |
| nac3parser     | Lexer + LALRPOP parser producing nac3ast trees                                                |
| nac3core       | Type checking, type inference, LLVM code generation                                           |
| nac3artiq      | ARTIQ frontend - Python/PyO3 integration, timeline, RPC                                       |
| nac3standalone | Minimal frontend - compiles a .py file to an object file                                      |
| nac3binutils   | Linker (nac3ld), symbolizer, DWARF utilities                                                  |
| runkernel      | Test harness that runs compiled ARTIQ kernels on the host                                     |
|
||||
|
||||
`nac3core` is where most of the compiler lives. It is intentionally frontend-agnostic: the two frontends (`nac3artiq` and `nac3standalone`) plug in through a small set of traits described below.
|
||||
|
||||
## Compilation Pipeline
|
||||
|
||||
A complete compilation proceeds in five stages. The frontends drive the first and last stages; `nac3core` owns everything in between.
|
||||
|
||||

|
||||
|
||||
### Stage 1: Parsing
|
||||
|
||||
`nac3parser` tokenizes Python source and feeds it into a LALRPOP-generated parser. The output is a `Vec<Stmt>`: a list of top-level AST statements from `nac3ast`. The parser is a lightly modified fork of RustPython's parser.
|
||||
|
||||
### Stage 2: Registration
|
||||
|
||||
The frontend walks the parsed statements and registers each class and function with `TopLevelComposer::register_top_level()`. This populates the global definition list with `TopLevelDef::Class` and `TopLevelDef::Function` entries, each identified by a `DefinitionId` (a plain `usize` index).
|
||||
|
||||
Assignments at module scope are handled separately by the frontend, typically to register `TypeVar` and `ConstGeneric` declarations or module-level constants.
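
The registration step can be pictured with a small sketch. It is not the real `TopLevelComposer` API: `Def`, `DefId`, and `Registry` are hypothetical stand-ins that only illustrate how an identifier can be a plain index into a growing definition list.

```rust
/// Hypothetical stand-in for a registered definition (not the real `TopLevelDef`).
#[derive(Debug, PartialEq)]
enum Def {
    Class { name: String },
    Function { name: String },
}

/// Identifier that is just an index into the definition list.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefId(usize);

/// Hypothetical registry holding the global definition list.
#[derive(Default)]
struct Registry {
    defs: Vec<Def>,
}

impl Registry {
    /// Appends a definition and returns its position as the identifier.
    fn register(&mut self, def: Def) -> DefId {
        self.defs.push(def);
        DefId(self.defs.len() - 1)
    }

    /// Looks a definition up by identifier.
    fn get(&self, id: DefId) -> Option<&Def> {
        self.defs.get(id.0)
    }
}
```

Because identifiers are indices, they stay cheap to copy and hash, and lookups are a single bounds-checked array access.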
|
||||
|
||||
### Stage 3: Type Analysis
|
||||
|
||||
`TopLevelComposer::start_analysis()` processes all registered definitions:
|
||||
|
||||
1. Resolves type annotations (inheritance, field types, method signatures).
|
||||
2. Runs the type inferencer on every function body.
|
||||
3. Unifies type constraints using a union-find based `Unifier`.
|
||||
|
||||
After this stage the AST is annotated: every expression node carries an `Option<Type>` indicating its inferred type. `Type` is a `UnificationKey`: a lightweight handle into the unification table, not a concrete description. To inspect what a `Type` actually is, you query the `Unifier` for its `TypeEnum`.
|
||||
|
||||
The important `TypeEnum` variants are:
|
||||
|
||||
- `TObj`: a class instance, carrying its `DefinitionId`, field map, and type parameter bindings.
|
||||
- `TFunc`: a function signature (argument types, return type, type variables).
|
||||
- `TVar`: an unconstrained or range-constrained type variable, resolved during unification.
|
||||
- `TRigidVar`: a type variable that must not be unified further (appears in generic class/function definitions).
|
||||
- `TTuple`, `TLiteral`, `TVirtual`, `TCall`: tuples, literal types, virtual dispatch wrappers, and unresolved call sites respectively.
|
||||
|
||||
### Stage 4: Code Generation
|
||||
|
||||
Code generation is parallel and demand-driven. The frontend creates a `WorkerRegistry` with N worker threads, each owning a `CodeGenerator` and an independent LLVM `Context`.
|
||||
|
||||
The entry point function is submitted as a `CodeGenTask`. When a worker picks up a task, it generates LLVM IR for that function. If the function calls another generic function with concrete type arguments that has not been compiled yet, a new `CodeGenTask` is created and placed on the shared work queue. This continues until no more tasks remain.
|
||||
|
||||
Each task carries:
|
||||
|
||||
- The function body (typed AST).
|
||||
- A `ConcreteTypeStore` with monomorphized types for this instantiation.
|
||||
- Type substitutions mapping type variables to their concrete types.
|
||||
- A `SymbolResolver` for looking up external names.
|
||||
|
||||
The per-function context is `CodeGenContext`, which holds the LLVM builder, variable assignments, type caches, and control-flow state (loop targets, unwind targets, return buffer). It derefs to `ModuleContext`, which holds the LLVM module and target-specific type information.
|
||||
|
||||
After all workers finish, the frontend links the per-worker LLVM modules together, links in the IRRT (runtime library), runs the LLVM optimization pipeline, and emits the final object file.
|
||||
|
||||
### Stage 5: Optimization and Linking
|
||||
|
||||
The merged LLVM module is run through LLVM's new pass manager. The pass string typically looks like `globaldce,strip-dead-prototypes,default<O2>`. After optimization, the target machine emits an object file. For ARTIQ, `nac3ld` performs final linking to produce an ELF suitable for loading onto the core device.
|
||||
|
||||
## Frontend Integration Points
|
||||
|
||||
Frontends customize the compiler through four traits and one callback:
|
||||
|
||||
**`SymbolResolver`**: maps identifiers to types and values. The frontend implements this to bridge its own name resolution (Python runtime objects in nac3artiq, a simple hash map in nac3standalone) into `nac3core`'s type system. Key methods: `get_symbol_type()`, `get_identifier_def()`, `get_symbol_value()`.
|
||||
|
||||
**`CodeGenerator`**: controls IR generation for expressions, statements, calls, and control flow. `DefaultCodeGenerator` provides the standard implementation; `ArtiqCodeGenerator` overrides `gen_with()` and `gen_call()` to handle `with parallel` blocks and timeline manipulation.
|
||||
|
||||
**`BuiltinRegistry`**: determines how AST expressions are matched to builtin type/function definitions. `DefaultBuiltinRegistry` matches by name strings; nac3artiq's `ArtiqBuiltinRegistry` matches by Python object identity (via PyO3).
|
||||
|
||||
**`TimeFns`**: (nac3artiq only) emits LLVM IR for `now_mu()`, `at_mu()`, and `delay_mu()`. Implementations differ by target ISA (VexRiscv with 32-bit or 64-bit data bus, or external function calls for host execution).
|
||||
|
||||
**`GenCall`**: a callback stored on `TopLevelDef::Function` that overrides code generation for specific functions. nac3artiq uses this for RPC stubs, where the generated code must serialize arguments and invoke the host runtime instead of calling a compiled function.
|
||||
|
||||
## nac3standalone
|
||||
|
||||
The standalone frontend is a command-line tool that compiles a single Python file to an object file. It expects a `run()` function as the entry point. The implementation is under 500 lines and serves as the reference for how to drive `nac3core`.
|
||||
|
||||
The compilation flow:
|
||||
|
||||
1. Parse the input file.
|
||||
2. Create a `TopLevelComposer` with `DefaultBuiltinRegistry`.
|
||||
3. Register all top-level definitions; handle `TypeVar`/`ConstGeneric` assignments separately.
|
||||
4. Run `start_analysis()`.
|
||||
5. Look up the `run` function, create a `CodeGenTask` for it.
|
||||
6. Spawn `WorkerRegistry` threads with `DefaultCodeGenerator`.
|
||||
7. Link modules, optimize, write `module.o`.
|
||||
|
||||
## nac3artiq
|
||||
|
||||
The ARTIQ frontend is a Python extension module (built as a `cdylib` via PyO3). It is loaded by the ARTIQ runtime and compiles `@kernel` functions on demand.
|
||||
|
||||
Key differences from the standalone frontend:
|
||||
|
||||
- **Python interop**: `InnerResolver` implements `SymbolResolver` by inspecting live Python objects through PyO3. Class fields, method signatures, and default parameter values are all extracted from the Python runtime.
|
||||
- **Decorators**: `@kernel`, `@portable`, `@rpc`, and `@extern` mark functions for different compilation strategies. `@rpc` functions get a `GenCall` callback that generates serialization/deserialization code instead of a normal function body.
|
||||
- **Parallel blocks**: `with parallel` and `with sequential` are context managers that manipulate the RTIO timeline. `ArtiqCodeGenerator` overrides `gen_with()` to track timeline positions and reset/advance the cursor appropriately.
|
||||
- **Timeline**: The `TimeFns` trait abstracts over different hardware targets. `NowPinningTimeFns64` directly reads/writes split 32-bit CSR registers on VexRiscv; `ExternTimeFns` calls out to external C functions for host-mode execution.
|
||||
- **Target ISAs**: nac3artiq can target `riscv32-unknown-linux` (Kasli/core device), `armv7-unknown-linux-eabihf` (Zynq), or the host triple.
|
||||
- **Attribute writeback**: After compilation, mutable object attributes may need to be written back to the Python runtime. This is handled by `attributes_writeback()`.
|
||||
|
||||
## IRRT (Inline Runtime)
|
||||
|
||||
The IRRT is a small runtime library written in C++ under `nac3core/irrt/`. It provides helper functions for operations that are too complex to emit inline (integer exponentiation, range slicing, string operations, list helpers, etc.).
|
||||
|
||||
The build process (in `nac3core/build.rs`):
|
||||
|
||||
1. Compile `irrt.cpp` to LLVM IR using `clang-irrt` targeting `wasm32` (to get target-independent IR).
|
||||
2. Filter the IR with regexes to keep only function definitions, declarations, type definitions, and globals.
|
||||
3. Strip debug metadata.
|
||||
4. Assemble to bitcode with `llvm-as-irrt`.
|
||||
5. Embed the bitcode via `include_bytes!()`.
|
||||
|
||||
At compile time, `load_irrt()` parses this embedded bitcode into an LLVM module and initializes exception ID globals. The module is then linked into the final output.
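
The filtering step of the build process can be approximated as follows. This is a simplified sketch that assumes line-prefix checks are good enough; the build script is described as using regexes, and `filter_ir` is a hypothetical name, not a function in `build.rs`.

```rust
/// Keeps function definitions, declarations, named type definitions, and globals
/// from textual LLVM IR, dropping everything else (target lines, metadata, ...).
/// Prefix matching stands in for the regexes used by the real build script.
fn filter_ir(ir: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let mut in_function = false;
    for line in ir.lines() {
        if in_function {
            // Inside a `define` body: keep every line up to the closing brace.
            kept.push(line);
            if line == "}" {
                in_function = false;
            }
        } else if line.starts_with("define ") {
            kept.push(line);
            in_function = !line.ends_with('}');
        } else if line.starts_with("declare ")
            || line.starts_with('@')
            || (line.starts_with('%') && line.contains("= type"))
        {
            kept.push(line);
        }
    }
    kept.join("\n")
}
```

Comparing `irrt.ll` with `irrt-filtered.ll` from a `DEBUG_DUMP_IRRT=1` build shows what the real filter keeps.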
|
||||
|
||||
To debug IRRT issues, set `DEBUG_DUMP_IRRT=1` when building nac3core. This writes `irrt.ll` (raw) and `irrt-filtered.ll` (after regex filtering) to the build output directory.
|
||||
@@ -0,0 +1,134 @@
|
||||
# Code Generation
|
||||
|
||||
This document covers the internals of `nac3core`'s type system and code generation pipeline. It is meant to orient developers on the critical types and the flow from typed AST to LLVM IR; the fine details live in Rustdoc comments on the relevant structs and functions.
|
||||
|
||||
## Type System
|
||||
|
||||
### Types and the Unifier
|
||||
|
||||
`Type` is a `UnificationKey`: a handle into the unification table. It is not a type description by itself. To inspect what a `Type` actually represents, look it up through the `Unifier`:
|
||||
|
||||
```rust
|
||||
let ty_enum: &TypeEnum = &*unifier.get_ty(some_type);
|
||||
match ty_enum {
|
||||
TypeEnum::TObj { obj_id, fields, params, .. } => { /* ... */ }
|
||||
TypeEnum::TFunc(sig) => { /* ... */ }
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
The `Unifier` owns a `UnificationTable` implementing union-find. Unification merges two types by constraining them to be equal; if the constraint is contradictory, a `TypeError` is returned. During type inference, `TVar` nodes start unconstrained (or range-constrained) and get progressively pinned down as the inferencer processes the AST.
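
To make the union-find idea concrete, here is a minimal, self-contained sketch. It does not use the real `Unifier` or `TypeEnum`: `Ty`, `Key`, and `Table` are hypothetical, and the real table handles far richer types (objects, functions, ranges of candidates).

```rust
/// Hypothetical, heavily simplified type description.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Var, // unconstrained type variable
    Int32,
    Bool,
}

/// Handle into the table, playing the role of a unification key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Key(usize);

#[derive(Default)]
struct Table {
    parent: Vec<usize>,
    value: Vec<Ty>,
}

impl Table {
    /// Allocates a new entry that is its own representative.
    fn fresh(&mut self, ty: Ty) -> Key {
        let id = self.parent.len();
        self.parent.push(id);
        self.value.push(ty);
        Key(id)
    }

    /// Finds the representative of `k`, compressing the path on the way.
    fn find(&mut self, k: Key) -> usize {
        let mut root = k.0;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut cur = k.0;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Returns what the handle currently resolves to.
    fn get(&mut self, k: Key) -> Ty {
        let root = self.find(k);
        self.value[root].clone()
    }

    /// Constrains two handles to be equal, or reports a contradiction.
    fn unify(&mut self, a: Key, b: Key) -> Result<(), String> {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return Ok(());
        }
        let merged = match (&self.value[ra], &self.value[rb]) {
            (Ty::Var, other) | (other, Ty::Var) => other.clone(),
            (x, y) if x == y => x.clone(),
            (x, y) => return Err(format!("cannot unify {:?} with {:?}", x, y)),
        };
        self.parent[rb] = ra;
        self.value[ra] = merged;
        Ok(())
    }
}
```

Two variables unified with each other stay `Var` until one of them meets a concrete type, at which point every handle in the set resolves to that type.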
|
||||
|
||||
`SharedUnifier` (`Arc<Mutex<(UnificationTable, u32, Vec<Call>)>>`) is used when unifiers need to be shared across threads; each module gets its own unifier during analysis, and the shared form is stored in `TopLevelContext::unifiers`.
|
||||
|
||||
### PrimitiveStore
|
||||
|
||||
`PrimitiveStore` holds `Type` handles for all builtin primitive types (`int32`, `int64`, `uint32`, `uint64`, `float`, `bool`, `str`, `none`, `exception`, `option`, `ndarray`, etc.). It is created once during `TopLevelComposer` initialization and threaded through the entire pipeline.
|
||||
|
||||
### TopLevelDef and DefinitionId
|
||||
|
||||
Every class, function, and module registered with the compiler gets a `TopLevelDef` entry and a `DefinitionId` (index into the definition list).
|
||||
|
||||
`TopLevelDef::Function` has two important maps:
|
||||
|
||||
- `instance_to_symbol`: maps a string key (derived from concrete type variable bindings) to the LLVM symbol name for that instantiation.
|
||||
- `instance_to_stmt`: maps the same key to a `FunInstance` containing the typed AST body, call site information, and type substitutions.
|
||||
|
||||
When a generic function is called with specific type arguments, the codegen looks up (or creates) an entry in these maps. If a new entry is created, a new `CodeGenTask` is queued.
|
||||
|
||||
`TopLevelDef::Function` can also carry a `codegen_callback` (`GenCall`), which entirely replaces normal code generation for that function. nac3artiq uses this for RPC functions, where instead of compiling a function body, the generated code serializes arguments and calls into the ARTIQ RPC runtime.
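
The callback mechanism can be sketched as an optional closure that takes precedence over the normal call path. The types here are hypothetical: the real `GenCall` signature is richer and works on LLVM values, whereas this sketch uses strings to stand in for emitted IR.

```rust
use std::sync::Arc;

/// Hypothetical callback type standing in for `GenCall`.
type Callback = Arc<dyn Fn(&[String]) -> String + Send + Sync>;

/// Hypothetical function record with an optional codegen override.
struct FunctionDef {
    symbol: String,
    codegen_callback: Option<Callback>,
}

/// Emits a call: the callback wins if present, otherwise a plain call is produced.
fn gen_call(def: &FunctionDef, args: &[String]) -> String {
    match &def.codegen_callback {
        Some(cb) => cb(args),
        None => format!("call @{}({})", def.symbol, args.join(", ")),
    }
}
```

A frontend would attach the callback when it registers the function, so call sites need no special casing.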
|
||||
|
||||
## Monomorphization
|
||||
|
||||
NAC3 compiles generic functions by monomorphization: each distinct combination of concrete type arguments produces a separate LLVM function. The `ConcreteTypeStore` manages this mapping.
|
||||
|
||||
The flow:
|
||||
|
||||
1. During codegen, a call to a generic function triggers `gen_func_instance()`.
|
||||
2. The type variable bindings are collected into a substitution key (a sorted string of variable ID/type pairs).
|
||||
3. If `instance_to_symbol` already has this key, the existing symbol is reused.
|
||||
4. Otherwise a new `CodeGenTask` is created with the concrete substitutions and placed on the `WorkerRegistry` queue.
|
||||
|
||||
Because workers run in parallel, `gen_func_instance()` must handle the race where two workers try to instantiate the same function simultaneously. The default implementation uses the lock on the `TopLevelDef` to serialize this check.
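
A rough sketch of the lookup-or-create step under a lock is shown below. It assumes a simplified record with only `instance_to_symbol`; `FunctionDef`, `subst_key`, `get_or_create_instance`, and the symbol naming scheme are hypothetical and not taken from `nac3core`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical per-function record; the real definition holds much more.
#[derive(Default)]
struct FunctionDef {
    name: String,
    instance_to_symbol: HashMap<String, String>,
}

/// Builds a deterministic key from (type variable ID, concrete type name) pairs.
/// Sorting makes the key independent of the order in which bindings were collected.
fn subst_key(bindings: &[(u32, &str)]) -> String {
    let mut sorted = bindings.to_vec();
    sorted.sort();
    sorted
        .iter()
        .map(|(id, ty)| format!("{}={}", id, ty))
        .collect::<Vec<_>>()
        .join(",")
}

/// Returns the symbol for this instantiation and whether the caller must queue
/// a new codegen task. The check and the insert happen under one write lock,
/// so two workers cannot both create the same instance.
fn get_or_create_instance(def: &RwLock<FunctionDef>, bindings: &[(u32, &str)]) -> (String, bool) {
    let key = subst_key(bindings);
    let mut guard = def.write().unwrap();
    if let Some(symbol) = guard.instance_to_symbol.get(&key) {
        return (symbol.clone(), false);
    }
    let symbol = format!("{}.{}", guard.name, guard.instance_to_symbol.len());
    guard.instance_to_symbol.insert(key, symbol.clone());
    (symbol, true)
}
```

Holding the lock across both the lookup and the insert is what closes the race; releasing it between the two would let a second worker slip in.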
|
||||
|
||||
## CodeGenContext
|
||||
|
||||
`CodeGenContext` is the per-function state during IR generation. It holds:
|
||||
|
||||
- **`builder`**: the LLVM `Builder` for emitting instructions.
|
||||
- **`var_assignment`**: maps variable names to `VarValue` (an LLVM pointer plus an optional `StaticValue` for compile-time-known values).
|
||||
- **`type_cache` / `alloca_type_cache`**: caches `Type` to LLVM `BasicTypeEnum` conversions. `alloca_type_cache` is specifically for in-memory representations (e.g., `bool` is `i8` in memory but `i1` in the ABI).
|
||||
- **`loop_target`**: `(header, exit)` basic blocks for the current loop, used by `break`/`continue`.
|
||||
- **`unwind_target`**: the landing pad for exception handling.
|
||||
- **`return_buffer`** / **`return_target`**: for functions that need a single return point (e.g., when exception cleanup is involved).
|
||||
|
||||
`CodeGenContext` derefs to `ModuleContext`, which provides access to the LLVM `Context`, `Module`, target-specific integer types (`i32`, `i64`, `size_t`), and the type context for converting nac3 types to LLVM types.
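
The deref relationship can be illustrated with a minimal sketch. `ModuleCtx` and `FunctionCtx` are hypothetical stand-ins with made-up fields; only the use of `std::ops::Deref` to expose module-wide state from per-function state mirrors the description above.

```rust
use std::ops::Deref;

/// Hypothetical module-wide state (the real one wraps LLVM objects).
struct ModuleCtx {
    module_name: String,
    pointer_bits: u32,
}

/// Hypothetical per-function state that borrows the module-wide state.
struct FunctionCtx<'m> {
    module: &'m ModuleCtx,
    loop_depth: usize,
}

impl<'m> Deref for FunctionCtx<'m> {
    type Target = ModuleCtx;

    fn deref(&self) -> &ModuleCtx {
        self.module
    }
}

/// `module_name` and `pointer_bits` resolve through `Deref`; `loop_depth` is a direct field.
fn describe(ctx: &FunctionCtx<'_>) -> String {
    format!(
        "{} ({}-bit), loop depth {}",
        ctx.module_name, ctx.pointer_bits, ctx.loop_depth
    )
}
```

Code that only has the per-function context can therefore read module-level fields without an explicit accessor.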
|
||||
|
||||
## Expression and Statement Generation
|
||||
|
||||
Expression codegen (`codegen/expr.rs`) and statement codegen (`codegen/stmt.rs`) are the two largest files in the codebase. They follow the AST structure closely:
|
||||
|
||||
- `gen_expr()` dispatches on `ExprKind` and returns an `RtValue` (a pair of `Type` and an optional LLVM value).
|
||||
- `gen_stmt()` dispatches on `StmtKind` and returns `()` (control flow is handled through the builder's current basic block).
|
||||
|
||||
Both are implemented as free functions that take a `&mut dyn CodeGenerator` and `&mut CodeGenContext`. The `CodeGenerator` trait methods delegate to these free functions by default, letting frontends override specific behaviors without duplicating the rest.
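
The delegation pattern is sketched below with a toy AST. All names (`Expr`, `Generator`, `DefaultGen`, `HexGen`) are hypothetical, the free functions are generic rather than taking `&mut dyn`, and strings stand in for LLVM values. The point is only that trait defaults forward to free functions, and the free functions recurse through the trait so that overrides apply at every depth.

```rust
/// Hypothetical miniature AST.
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Shared implementation for expressions; recursion goes back through the trait.
fn gen_expr<G: Generator + ?Sized>(g: &mut G, e: &Expr) -> String {
    match e {
        Expr::Int(n) => g.gen_int(*n),
        Expr::Add(lhs, rhs) => {
            let (l, r) = (g.gen_expr(lhs), g.gen_expr(rhs));
            format!("add({}, {})", l, r)
        }
    }
}

/// Shared implementation for integer literals.
fn gen_int<G: Generator + ?Sized>(_g: &mut G, n: i64) -> String {
    format!("i64 {}", n)
}

/// Every method defaults to the corresponding free function.
trait Generator {
    fn gen_expr(&mut self, e: &Expr) -> String {
        gen_expr(self, e)
    }
    fn gen_int(&mut self, n: i64) -> String {
        gen_int(self, n)
    }
}

/// Uses the defaults unchanged.
struct DefaultGen;
impl Generator for DefaultGen {}

/// Overrides only integer literals; addition still uses the shared code.
struct HexGen;
impl Generator for HexGen {
    fn gen_int(&mut self, n: i64) -> String {
        format!("i64 {:#x}", n)
    }
}
```

With this shape, a frontend that needs one special case overrides one method and inherits the rest.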
|
||||
|
||||
## Parallel Compilation
|
||||
|
||||
`WorkerRegistry` manages a pool of codegen worker threads. Each worker:
|
||||
|
||||
1. Receives `CodeGenTask` items from a shared channel.
|
||||
2. Creates (or reuses) a `ModuleContext` with its own LLVM `Context`.
|
||||
3. Calls `gen_func_impl()` to generate the function body.
|
||||
4. When the function calls another function that needs a new instantiation, the worker calls `registry.add_task()` to queue it.
|
||||
5. After finishing a task, writes the module bitcode to a buffer and signals completion.
|
||||
|
||||
The registry tracks outstanding tasks with a counter and a condvar. When all tasks are done, the main thread collects the per-worker LLVM bitcode buffers, links them into one module, and proceeds with optimization.
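
The counter-plus-condvar scheme can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not the real `WorkerRegistry`: tasks are bare integers, "codegen" just discovers follow-up tasks, a `VecDeque` behind a mutex replaces the channel, and the main thread simply joins the workers.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical task: "compile" function `id`.
struct Task {
    id: u32,
}

struct Shared {
    queue: VecDeque<Task>,
    outstanding: usize, // queued plus in-progress tasks
    done: Vec<u32>,
}

struct Registry {
    state: Mutex<Shared>,
    cond: Condvar,
}

impl Registry {
    /// Queues a task and wakes one waiting worker.
    fn add_task(&self, task: Task) {
        let mut s = self.state.lock().unwrap();
        s.queue.push_back(task);
        s.outstanding += 1;
        self.cond.notify_one();
    }
}

fn worker(reg: Arc<Registry>) {
    loop {
        let task = {
            let mut s = reg.state.lock().unwrap();
            loop {
                if let Some(t) = s.queue.pop_front() {
                    break t;
                }
                if s.outstanding == 0 {
                    return; // nothing queued and nothing in progress
                }
                s = reg.cond.wait(s).unwrap();
            }
        };
        // Stand-in for codegen: small IDs "call" two further functions.
        if task.id < 3 {
            reg.add_task(Task { id: task.id * 2 + 1 });
            reg.add_task(Task { id: task.id * 2 + 2 });
        }
        let mut s = reg.state.lock().unwrap();
        s.done.push(task.id);
        s.outstanding -= 1;
        if s.outstanding == 0 {
            reg.cond.notify_all(); // release workers waiting for more work
        }
    }
}

/// Runs the pool to completion and returns the processed IDs in sorted order.
fn run(n_workers: usize, entry: Task) -> Vec<u32> {
    let reg = Arc::new(Registry {
        state: Mutex::new(Shared { queue: VecDeque::new(), outstanding: 0, done: Vec::new() }),
        cond: Condvar::new(),
    });
    reg.add_task(entry); // must happen before workers start, or they exit at once
    let handles: Vec<_> = (0..n_workers)
        .map(|_| {
            let r = Arc::clone(&reg);
            thread::spawn(move || worker(r))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let mut done = reg.state.lock().unwrap().done.clone();
    done.sort();
    done
}
```

A task is counted as outstanding until after it has queued its own follow-ups, so the counter cannot reach zero while new work may still appear.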
|
||||
|
||||
Workers are created with `WorkerRegistry::create_workers()`, which takes a `Vec<Box<G>>` of `CodeGenerator` instances (one per thread). This is where the frontend passes in its custom generator type.
|
||||
|
||||
## Type Layouts
|
||||
|
||||
The `codegen/types/` directory contains proxy types that map nac3 types to LLVM struct layouts. Each proxy type implements `ProxyType` and provides methods for accessing fields, creating instances, and generating related operations.
|
||||
|
||||
The important proxy types:
|
||||
|
||||
- `ListType`: a `{ptr, len}` struct. The pointer references a heap-allocated array of elements.
|
||||
- `NDArrayType`: the representation of `numpy.ndarray`. Contains data pointer, number of dimensions, shape array, and strides. Broadcasting and indexing operations are in `codegen/types/ndarray/`.
|
||||
- `StringType`: a `{ptr, len}` pair for UTF-8 data.
|
||||
- `RangeType`: `{start, stop, step}` integers.
|
||||
- `TupleType`: an LLVM struct with one field per element.
|
||||
- `ExceptionType`: carries exception class ID, message, parameters, and source location fields.
|
||||
- `OptionType`: a tagged union with a flag byte and optional value.
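
As a host-side illustration of two of these layouts, the sketch below mirrors a `{ptr, len}` list and a `{start, stop, step}` range as `#[repr(C)]` structs. Field order, field widths, and the names `ListRepr`, `RangeRepr`, and `range_len` are assumptions made for the example; the authoritative layouts are the ones defined in `codegen/types/`.

```rust
/// Assumed shape of a list: pointer to elements plus a length.
#[repr(C)]
struct ListRepr<T> {
    ptr: *const T,
    len: usize,
}

/// Assumed shape of a range with 32-bit fields.
#[repr(C)]
#[derive(Clone, Copy)]
struct RangeRepr {
    start: i32,
    stop: i32,
    step: i32,
}

/// Number of elements in the range, following Python's `len(range(...))`.
/// Panics on a zero step, which Python also rejects.
fn range_len(r: RangeRepr) -> i32 {
    assert!(r.step != 0, "range step must not be zero");
    // Work in i64 so that the differences cannot overflow.
    let (diff, step) = if r.step > 0 {
        (r.stop as i64 - r.start as i64, r.step as i64)
    } else {
        (r.start as i64 - r.stop as i64, -(r.step as i64))
    };
    if diff <= 0 {
        0
    } else {
        ((diff + step - 1) / step) as i32
    }
}
```

Helpers of this kind are the sort of logic the documentation places in the IRRT rather than emitting inline.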
|
||||
|
||||
## Exception Handling
|
||||
|
||||
NAC3 uses LLVM's `landingpad`-based exception handling with a personality function. The personality symbol is set via `TopLevelContext::personality_symbol` (nac3artiq sets this to `__nac3_personality`).
|
||||
|
||||
The flow for a `try`/`except` block:
|
||||
|
||||
1. `gen_stmt` for `Try` sets `ctx.unwind_target` to a landing pad block.
|
||||
2. Calls within the `try` body are emitted as `invoke` instructions targeting both a normal continuation and the landing pad.
|
||||
3. The landing pad dispatches on exception class ID to the matching `except` clause.
|
||||
4. `raise` compiles to a call to `__nac3_raise` followed by `unreachable`.
|
||||
|
||||
Each exception class is assigned a numeric ID via `SymbolResolver::get_exception_id()`, and the IRRT uses `SymbolResolver::get_string_id()` for exception name strings.
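
The dispatch performed by the landing pad can be modelled as a first-match search over the `except` clauses. This sketch assumes exact ID comparison and treats a bare `except:` as a catch-all; `Handler` and `dispatch` are hypothetical, and the generated IR is of course branches rather than a Rust iterator.

```rust
/// One `except` clause: `None` models a bare `except:`, `Some(id)` a specific class ID.
struct Handler {
    class_id: Option<u32>,
    label: &'static str,
}

/// Returns the label of the first matching clause, or `None` if no clause
/// matches and the exception should continue unwinding.
fn dispatch(handlers: &[Handler], raised_id: u32) -> Option<&'static str> {
    handlers
        .iter()
        .find(|h| h.class_id.map_or(true, |id| id == raised_id))
        .map(|h| h.label)
}
```

Clause order matters, just as in Python: a catch-all placed first would shadow every later clause.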
|
||||
|
||||
## IRRT Functions
|
||||
|
||||
When you need a runtime helper that is too complex for inline LLVM IR, add it to the IRRT (`nac3core/irrt/`). The C++ source is compiled to target-independent LLVM bitcode and linked into every compilation. See `irrt/irrt.cpp` and the submodule headers.
|
||||
|
||||
To call an IRRT function from Rust codegen, declare it in the appropriate `codegen/irrt/*.rs` module and call it through the LLVM builder. Functions that need to differ between 32-bit and 64-bit `size_t` use the `get_usize_dependent_function_name()` helper to select the right variant.
|
||||
|
||||
## Builtin Functions
|
||||
|
||||
Builtin functions (e.g., `int32()`, `len()`, `range()`, `np_zeros()`) are registered during `TopLevelComposer` initialization. The `PrimDef` enum in `toplevel/helper.rs` lists every builtin type and function.
|
||||
|
||||
Most builtins have their code generation in `codegen/builtin_fns.rs`. NumPy operations are in `codegen/numpy.rs`. The implementations receive `CodeGenContext` and the call arguments, and return the result as LLVM values.
|
||||
|
||||
When adding a new builtin:
|
||||
|
||||
1. Add a variant to `PrimDef`.
|
||||
2. Register the type and signature in `make_primitives()`.
|
||||
3. Write the codegen implementation.
|
||||
4. If the builtin needs a `GenCall` callback (because it requires custom calling conventions), set `codegen_callback` on the `TopLevelDef::Function`.
|
||||
Binary file not shown.
|
After Width: | Height: | Size: 157 KiB |
@@ -0,0 +1,148 @@
|
||||
# Developer Guide
|
||||
|
||||
Practical information for building, testing, debugging, and extending NAC3.
|
||||
|
||||
## Building
|
||||
|
||||
### With Nix
|
||||
|
||||
```
|
||||
$ nix develop # enter the dev shell (bash)
|
||||
$ nix develop --command zsh # or use your preferred shell
|
||||
$ cargo build --release
|
||||
```
|
||||
|
||||
The Nix flake provides LLVM 16, `clang-irrt`, `llvm-as-irrt`, and all other dependencies.
|
||||
|
||||
### PGO Build
|
||||
|
||||
The flake includes a profile-guided optimization (PGO) build for nac3artiq. PGO recompiles LLVM itself using profiling data collected from a real ARTIQ compilation, which improves codegen throughput.
|
||||
|
||||
```
|
||||
$ nix build .#nac3artiq-pgo -L
|
||||
```
|
||||
|
||||
The PGO pipeline has three stages, all handled automatically by Nix:
|
||||
|
||||
1. **Instrumented build** (`nac3artiq-instrumented`): builds nac3artiq against an instrumented LLVM that records branch frequency data during execution.
|
||||
2. **Profile collection** (`nac3artiq-profile`): runs the instrumented compiler on the `nac3devices` ARTIQ example to produce `llvm.profdata`.
|
||||
3. **PGO build** (`nac3artiq-pgo`): rebuilds LLVM with the collected profile applied, then builds nac3artiq against this optimized LLVM.
|
||||
|
||||
The intermediate packages can also be built individually if needed (e.g., `nix build .#nac3artiq-profile` to just collect profile data).
|
||||
|
||||
### IRRT Build
|
||||
|
||||
The `nac3core` build script (`build.rs`) compiles the C++ runtime under `nac3core/irrt/` to LLVM bitcode. If you modify IRRT sources, `cargo` will automatically rebuild. To inspect the generated IR:
|
||||
|
||||
```
|
||||
$ DEBUG_DUMP_IRRT=1 cargo build -p nac3core
|
||||
```
|
||||
|
||||
This writes `irrt.ll` and `irrt-filtered.ll` to the cargo output directory (printed by cargo as `OUT_DIR`).
|
||||
|
||||
## Running nac3standalone
|
||||
|
||||
The standalone compiler expects a Python file with a `run()` entry point:
|
||||
|
||||
```
|
||||
$ cargo run --release -p nac3standalone -- my_program.py
|
||||
```
|
||||
|
||||
This produces `module.o`. Link it against your runtime stubs (e.g., the demo `output_*` functions) to get an executable.
|
||||
|
||||
Useful flags:
|
||||
|
||||
- `-O0` / `-O2` / `-O3`: optimization level
|
||||
- `--emit-llvm-ir`: write `main.ll` for each compilation stage
|
||||
- `--emit-llvm-bc`: write `main.bc` (bitcode)
|
||||
- `-T 0`: use all available threads for compilation
|
||||
|
||||
### Running demos
|
||||
|
||||
The `nac3standalone/demo/` directory contains example programs and a helper script that compiles, links, and runs them in one step. From the demo directory:
|
||||
|
||||
```
|
||||
$ cd nac3standalone/demo
|
||||
$ ./run_demo.sh -- src/demo_test.py
|
||||
```
|
||||
|
||||
`run_demo.sh` does three things:
|
||||
|
||||
1. Compiles the Python source with `nac3standalone`, producing `module.o`.
|
||||
2. Compiles `demo.c` (the C runtime stubs for `output_int32`, `output_bool`, etc.) with clang.
|
||||
3. Links both object files (plus `liblinalg.a` for linear algebra demos) into an executable and runs it.
|
||||
|
||||
Options:
|
||||
|
||||
- `--debug`: use the debug build of nac3standalone instead of release.
|
||||
- `-i686`: cross-compile to 32-bit x86 (uses `--triple i686-unknown-linux-gnu` and links against the 32-bit linalg stub).
|
||||
- `--out OUTFILE`: redirect the program output to a file instead of stdout.
|
||||
- Extra nac3standalone flags can be passed after `--`: e.g., `./run_demo.sh -- --emit-llvm-ir src/demo_test.py`.
|
||||
|
||||
### Checking demos
|
||||
|
||||
`check_demos.sh` runs every `src/*.py` demo through both the Python interpreter and the NAC3 compiler, then diffs the output:
|
||||
|
||||
```
|
||||
$ cd nac3standalone/demo
|
||||
$ ./check_demos.sh
|
||||
```
|
||||
|
||||
This is the same check that the Nix build runs. Pass `-i686` to also verify 32-bit output. Individual demos can be checked with `check_demo.sh`:
|
||||
|
||||
```
|
||||
$ ./check_demo.sh src/demo_test.py
|
||||
```
|
||||
|
||||
## Running nac3artiq + runkernel locally
|
||||
|
||||
For testing ARTIQ kernels without hardware, use `runkernel`. It provides stub implementations of `now_mu`, `at_mu`, `delay_mu`, `rtio_output`, and a few other ARTIQ syscalls.
|
||||
|
||||
The workflow:
|
||||
|
||||
1. Compile your kernel. nac3artiq produces `module.elf` (and optionally `debug.elf`) when invoked through the ARTIQ `Core.run()` method. The demo under `nac3artiq/demo/` shows the minimal setup, including `min_artiq.py` (a self-contained ARTIQ-like environment) and `device_db.py`.
|
||||
2. Run through runkernel:
|
||||
```
|
||||
$ cargo run --release -p runkernel -- module.elf
|
||||
```
|
||||
`runkernel` loads the ELF, looks up `__modinit__`, and executes it. RTIO calls print their arguments so you can trace the output timeline.
|
||||
|
||||
### Running the demo
|
||||
|
||||
```
|
||||
$ cd nac3artiq/demo
|
||||
$ python demo.py
|
||||
```
|
||||
|
||||
This uses `min_artiq.py` to set up the compiler, compiles the demo kernels, and produces `module.elf`. You can then run it with `runkernel` as above.
|
||||
|
||||
## Testing
|
||||
|
||||
```
|
||||
$ cargo test # all tests
|
||||
$ cargo test -p nac3core # core tests only
|
||||
$ cargo test -p nac3parser # parser tests only
|
||||
```
|
||||
|
||||
## Extending the Compiler
|
||||
|
||||
### Adding a new type to codegen
|
||||
|
||||
The canonical pattern for adding type support in `codegen/types/`:
|
||||
|
||||
1. Create a new file (e.g., `codegen/types/mytype.rs`).
|
||||
2. Define a struct that wraps the LLVM struct layout.
|
||||
3. Implement `ProxyType` for it. This provides the interface for creating instances, accessing fields, and converting to/from LLVM values.
|
||||
4. Register the type in `codegen/types/mod.rs`.
|
||||
5. Add handling in `gen_expr` and `gen_stmt` where the type appears (attribute access, method calls, etc.).
|
||||
|
||||
### Adding a new builtin function
|
||||
|
||||
1. Add a variant to the `PrimDef` enum in `toplevel/helper.rs`.
|
||||
2. In `make_primitives()` (same file), register the function's type signature with the `TopLevelComposer`.
|
||||
3. If the function needs special type-checking logic (e.g., it accepts heterogeneous argument types, returns a type derived from its arguments, or cannot be expressed as a simple signature), add a branch to `try_fold_special_call()` in `typecheck/type_inferencer/mod.rs`. This is where builtins like `len()`, `virtual()`, and NumPy array constructors perform their custom type inference.
|
||||
4. Implement code generation. For simple functions, add a branch in `codegen/builtin_fns.rs`. For NumPy functions, use `codegen/numpy.rs`.
|
||||
5. If the function needs custom calling conventions (like RPC), create a `GenCall` callback and assign it to the `TopLevelDef::Function`'s `codegen_callback` field.
|
||||
6. Register the function in the frontend's builtin registry (`DefaultBuiltinRegistry` or `ArtiqBuiltinRegistry`).
|
||||
@@ -0,0 +1,14 @@
|
||||
# NAC3 Developer Documentation
|
||||
|
||||
NAC3 is a Python-to-machine-code compiler. It compiles a statically-typed subset of Python to LLVM IR, for use in [ARTIQ](https://m-labs.hk/artiq). The compiler is written in Rust and uses [inkwell](https://github.com/TheDan64/inkwell) as its LLVM binding.
|
||||
|
||||
This documentation is intended for developers working on NAC3 itself. For user-facing language documentation, see the [ARTIQ manual](https://m-labs.hk/artiq/manual/).
|
||||
|
||||
## Contents
|
||||
|
||||
- [Architecture](architecture.md) - Crate layout, compilation pipeline, and how the pieces fit together.
|
||||
- [Code Generation](codegen.md) - LLVM IR generation, the `CodeGenerator` trait, parallel compilation, IRRT, and type layouts.
|
||||
- [Developer Guide](guide.md) - Building, debugging, extending codegen/types, running nac3artiq locally, and common pitfalls.
|
||||
@@ -64,6 +64,8 @@ enum ParallelMode {
|
||||
Deep,
|
||||
}
|
||||
|
||||
/// ARTIQ-specific code generator that extends the default with timeline manipulation,
|
||||
/// `with parallel`/`with sequential` block handling, and RPC support.
|
||||
pub struct ArtiqCodeGenerator<'a> {
|
||||
name: String,
|
||||
|
||||
|
||||
@@ -3,7 +3,11 @@ use nac3core::{
|
||||
inkwell::{AtomicOrdering, values::BasicValueEnum},
|
||||
};
|
||||
|
||||
/// Functions for manipulating the timeline.
|
||||
/// Trait for emitting LLVM IR for ARTIQ timeline operations.
|
||||
///
|
||||
/// Different implementations target different hardware backends: `NowPinningTimeFns64`
|
||||
/// directly reads/writes split 32-bit CSR registers on VexRiscv, while `ExternTimeFns`
|
||||
/// calls external C functions (used for host-mode execution and `runkernel`).
|
||||
pub trait TimeFns {
|
||||
/// Emits LLVM IR for `now_mu`.
|
||||
fn emit_now_mu<'ctx>(
|
||||
|
||||
@@ -15,6 +15,11 @@ use crate::{
|
||||
typecheck::typedef::{FunSignature, Type},
|
||||
};
|
||||
|
||||
/// Trait for customizing LLVM IR generation.
|
||||
///
|
||||
/// The default implementations delegate to the free functions in `codegen::expr` and
|
||||
/// `codegen::stmt`. Frontends override specific methods to change behavior -- for example,
|
||||
/// `ArtiqCodeGenerator` overrides `gen_with()` to handle `with parallel` blocks.
|
||||
pub trait CodeGenerator {
|
||||
/// Return the module name for the code generator.
|
||||
fn get_name(&self) -> &str;
|
||||
@@ -221,6 +226,7 @@ pub trait CodeGenerator {
|
||||
}
|
||||
}
|
||||
|
||||
/// Default code generator with no frontend-specific behavior. Used by nac3standalone.
|
||||
pub struct DefaultCodeGenerator {
|
||||
name: String,
|
||||
}
|
||||
|
||||
@@ -326,6 +326,12 @@ impl WithCall {
|
||||
}
|
||||
}
|
||||
|
||||
/// Thread pool for parallel code generation.
|
||||
///
|
||||
/// Workers consume `CodeGenTask` items from a shared channel. Each worker has its own LLVM
|
||||
/// `Context` and `CodeGenerator`. When a function call requires a new monomorphized instance,
|
||||
/// the worker queues a new task. The main thread waits for all tasks to complete, then
|
||||
/// collects the per-worker LLVM bitcode buffers for linking.
|
||||
pub struct WorkerRegistry {
|
||||
sender: Arc<Sender<Option<CodeGenTask>>>,
|
||||
receiver: Arc<Receiver<Option<CodeGenTask>>>,
|
||||
@@ -485,6 +491,7 @@ impl WorkerRegistry {
|
||||
}
|
||||
}
|
||||
|
||||
/// A unit of work for the codegen thread pool, representing one monomorphized function.
|
||||
pub struct CodeGenTask {
|
||||
pub subst: Vec<(Type, ConcreteType)>,
|
||||
pub store: ConcreteTypeStore,
|
||||
|
||||
@@ -349,6 +349,11 @@ impl<'ctx> ValueEnum<'ctx> {
    }
}

/// Trait for resolving identifiers to types and values.
///
/// Frontends implement this trait to bridge their name resolution (e.g., Python runtime objects
/// in nac3artiq, or a simple hash map in nac3standalone) into the nac3core type system. The
/// resolver is consulted during type inference, type analysis, and code generation.
pub trait SymbolResolver {
    /// Get type of type variable identifier or top-level function type,
    fn get_symbol_type(
@@ -34,6 +34,7 @@ use crate::{
/// for standalone mode. Use `DefaultBuiltinRegistry` when you need a simple
/// builtin registry without custom matching logic.
#[derive(Debug, Clone, Copy, Default)]
/// Name-based builtin registry used by nac3standalone. Matches builtins by string comparison.
pub struct DefaultBuiltinRegistry;

impl BuiltinRegistry for DefaultBuiltinRegistry {}
@@ -374,6 +375,14 @@ pub fn promote_expr_type(
}

pub type DefAst = (Arc<RwLock<TopLevelDef>>, Option<Stmt<()>>);

/// Orchestrates the registration and type analysis of all top-level definitions.
///
/// The typical usage is:
/// 1. Create with `TopLevelComposer::new()`.
/// 2. Call `register_top_level()` for each class and function definition.
/// 3. Call `start_analysis()` to run type inference and unification on all definitions.
/// 4. Call `make_top_level_context()` to produce a `TopLevelContext` for code generation.
pub struct TopLevelComposer {
    // list of top level definitions, same as top level context
    pub definition_ast_list: Vec<DefAst>,
@@ -28,6 +28,7 @@ pub mod numpy;
mod test;
pub mod type_annotation;

/// Index of a top-level definition (class, function, or module) in the global definition list.
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Hash, Debug)]
pub struct DefinitionId(pub usize);

@@ -40,6 +41,10 @@ type GenCallCallback = dyn for<'ctx, 'a> Fn(
    + Send
    + Sync;

/// A callback that overrides code generation for a specific function.
///
/// Used by frontends to implement custom calling conventions (e.g., RPC serialization in
/// nac3artiq) instead of generating a normal function call.
pub struct GenCall {
    fp: Box<GenCallCallback>,
}
@@ -74,6 +79,8 @@ impl Debug for GenCall {
    }
}

/// A monomorphized instance of a generic function, containing the typed AST body and the
/// type variable substitutions for this particular instantiation.
#[derive(Clone, Debug)]
pub struct FunInstance {
    pub body: Arc<Vec<Stmt<Option<Type>>>>,
@@ -87,6 +94,11 @@ pub enum FunAttribute {
    StaticMethod,
}

/// A top-level definition: module, class, or function.
///
/// Definitions are stored in a global list and referenced by [`DefinitionId`]. During type
/// analysis, fields and method signatures are populated. During code generation, function
/// instances are created on demand as generic functions are called with concrete type arguments.
#[derive(Debug, Clone)]
pub enum TopLevelDef {
    Module {
@@ -167,6 +179,10 @@ pub enum TopLevelDef {
    },
}

/// Global compilation context shared across all codegen workers.
///
/// Contains the full list of top-level definitions, per-module unifiers, and the builtin
/// registry. Created by `TopLevelComposer::make_top_level_context()` after type analysis.
pub struct TopLevelContext {
    pub definitions: Arc<RwLock<Vec<Arc<RwLock<TopLevelDef>>>>>,
    pub unifiers: Arc<RwLock<Vec<(SharedUnifier, PrimitiveStore)>>>,
|
||||
|
||||
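The doc comments above describe definitions as entries in a shared global list, addressed by index and updated on demand. A reduced sketch of that access pattern follows. `DefId`, `Def`, `register`, `def_name`, and `add_instance` are hypothetical names invented for illustration; only the nested `Arc<RwLock<Vec<Arc<RwLock<...>>>>>` shape and the index newtype are taken from the diff.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical analogue of `DefinitionId`: an index into the shared list.
#[derive(PartialEq, Eq, Clone, Copy, Hash, Debug)]
pub struct DefId(pub usize);

/// Hypothetical, much-reduced analogue of `TopLevelDef`.
#[derive(Debug, Clone)]
pub enum Def {
    Class { name: String },
    Function { name: String, instance_count: usize },
}

/// Same nesting as `TopLevelContext::definitions`: the outer lock guards the
/// list, the inner locks guard individual definitions.
pub type Definitions = Arc<RwLock<Vec<Arc<RwLock<Def>>>>>;

/// Append a definition and return the index that identifies it from now on.
pub fn register(defs: &Definitions, def: Def) -> DefId {
    let mut list = defs.write().unwrap();
    list.push(Arc::new(RwLock::new(def)));
    DefId(list.len() - 1)
}

/// Look up a definition by id and read its name.
pub fn def_name(defs: &Definitions, id: DefId) -> String {
    // Clone the inner Arc so the outer read lock is released immediately.
    let entry = Arc::clone(&defs.read().unwrap()[id.0]);
    let guard = entry.read().unwrap();
    let name = match &*guard {
        Def::Class { name } | Def::Function { name, .. } => name.clone(),
    };
    name
}

/// Record one more instance of a function; returns the new count, or `None`
/// if the id refers to a class.
pub fn add_instance(defs: &Definitions, id: DefId) -> Option<usize> {
    let entry = Arc::clone(&defs.read().unwrap()[id.0]);
    let mut guard = entry.write().unwrap();
    let count = match &mut *guard {
        Def::Function { instance_count, .. } => {
            *instance_count += 1;
            Some(*instance_count)
        }
        Def::Class { .. } => None,
    };
    count
}
```

Holding the outer lock only long enough to clone the inner `Arc` keeps one thread's update to a single definition from blocking lookups of the others.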
@@ -229,7 +229,10 @@ impl AttrKind {
    }
}

/// Category of variable and value types.
/// The concrete representation of a type, stored in the unification table.
///
/// `Type` handles are lightweight keys; to inspect what a type actually is, look it up
/// through `Unifier::get_ty()` to obtain a `TypeEnum`.
#[derive(Debug, Clone)]
pub enum TypeEnum {
    TRigidVar {
@@ -314,6 +317,11 @@ impl TypeEnum {

pub type SharedUnifier = Arc<Mutex<(UnificationTable<TypeEnum>, u32, Vec<Call>)>>;

/// Type unification engine based on union-find.
///
/// Manages type constraints during inference and resolves type variables to concrete types.
/// Each module gets its own `Unifier` during analysis; during code generation, workers receive
/// a snapshot.
#[derive(Clone)]
pub struct Unifier {
    pub(crate) top_level: Option<Arc<TopLevelContext>>,
|
||||
|
||||
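The two ideas in the `TypeEnum` and `Unifier` doc comments, lightweight handles resolved through a table and unification built on union-find, can be illustrated with a toy table. This is a sketch under heavy simplification and not the NAC3 `Unifier`: `Ty`, `TyKind`, and `Table` are hypothetical names, there are only three kinds of type, and the method here named `get_ty` merely echoes the real one without matching its signature.

```rust
/// Lightweight handle into the table (hypothetical analogue of `Type`).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Ty(usize);

/// Hypothetical, much-reduced analogue of `TypeEnum`.
#[derive(Clone, PartialEq, Eq, Debug)]
pub enum TyKind {
    Var,
    Int32,
    Bool,
}

/// Union-find table: `parent` links handles into sets, `kind` holds the
/// representation stored at each set's root.
#[derive(Default)]
pub struct Table {
    parent: Vec<usize>,
    kind: Vec<TyKind>,
}

impl Table {
    /// Allocate a new handle with the given representation.
    pub fn fresh(&mut self, kind: TyKind) -> Ty {
        let id = self.parent.len();
        self.parent.push(id);
        self.kind.push(kind);
        Ty(id)
    }

    /// Find the root of `t`'s set, compressing the path along the way.
    fn find(&mut self, t: Ty) -> usize {
        let mut root = t.0;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut cur = t.0;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Resolve a handle to what the type currently is.
    pub fn get_ty(&mut self, t: Ty) -> TyKind {
        let root = self.find(t);
        self.kind[root].clone()
    }

    /// Merge the sets of `a` and `b`, or report a mismatch.
    pub fn unify(&mut self, a: Ty, b: Ty) -> Result<(), String> {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return Ok(());
        }
        match (self.kind[ra].clone(), self.kind[rb].clone()) {
            // A variable adopts whatever the other side is.
            (TyKind::Var, _) => self.parent[ra] = rb,
            (_, TyKind::Var) => self.parent[rb] = ra,
            // Two identical concrete types merge trivially.
            (x, y) if x == y => self.parent[ra] = rb,
            (x, y) => return Err(format!("cannot unify {:?} with {:?}", x, y)),
        }
        Ok(())
    }
}
```

After unifying two variables with each other and then one of them with an `Int32` handle, looking up either variable yields `Int32`, and a later attempt to unify it with `Bool` fails.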