Fix bare except:s not catching anything #354

Merged
sb10q merged 1 commits from jcoates/artiq-zynq:dwarf-target2-ttype into master 2025-02-05 15:42:00 +08:00
Contributor

Bare exceptions are currently broken on Kasli-SoC (https://nixbld.m-labs.hk/build/173443/nixlog/12), and do not catch any exceptions.

I believe this is because under the ARM EHABI, the type info pointer is encoded as a R_ARM_TARGET2 value (see section 10.2), rather than a DWARF encoded pointer. This format is largely equivalent to DW_EH_PE_pcrel. However, null pointers should not apply an offset, and should instead return 0 directly.1

I have run all tests in test_exceptions.py with this patch, and then now all pass. except: clauses which specify an exception continue to work with this change.


  1. It's not clear to me what the correct semantics of a 0-offset with DW_EH_PE_pcrel (or other relative offsets) would be. GCC skips applying offsets when the value is 0, but other implementations (e.g. LLVM or Rust's) do not. ↩︎

Bare exceptions are currently broken on Kasli-SoC (https://nixbld.m-labs.hk/build/173443/nixlog/12), and do not catch any exceptions. I *believe* this is because under the ARM EHABI, the type info pointer is encoded as a `R_ARM_TARGET2` value (see [section 10.2](https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#personality-routine-exception-handling-table-entries)), rather than a DWARF encoded pointer. This format is largely equivalent to `DW_EH_PE_pcrel`. However, null pointers should *not* apply an offset, and should instead return 0 directly.[^1] I have run all tests in [`test_exceptions.py`](https://github.com/m-labs/artiq/blob/master/artiq/test/coredevice/test_exceptions.py) with this patch, and then now all pass. `except:` clauses which specify an exception continue to work with this change. [^1]: It's not clear to me what the correct semantics of a 0-offset with `DW_EH_PE_pcrel` (or other relative offsets) would be. [GCC skips applying offsets](https://github.com/gcc-mirror/gcc/blob/2a77afa0ee41cb8a3664679dcd4545ccd1aa3b35/libgcc/unwind-pe.h#L265) when the value is 0, but other implementations (e.g. LLVM or Rust's) do not.
jcoates added 1 commit 2025-01-30 21:39:05 +08:00
Under the ARM EHABI, type information is encoded as a TARGET2 value,
rather than a DWARF pointer.

For the kernel ELF object, we have a ttype_encoding of DW_EH_PE_absptr,
which we combine with DW_EH_PE_pcrel when reading the type info.

For non-null pointers, this is equivalent to TARGET2 - we read a 32bit
value, and then use it as an offset from the current DWARF info. However,
null pointers should *not* apply an offset, and should instead return 0
directly.

An alternative fix here would be to change read_encoded_pointer_with_base
to skip adding base/original_ptr when reading 0. This is what GCC
does[^1], but feels a more dangerous change.

[^1]: 2a77afa0ee/libgcc/unwind-pe.h (L265)
Owner

Have you ran the remaining tests to check if anything else was broken?

Have you ran the remaining tests to check if anything else was broken?
srenblad was assigned by sb10q 2025-02-01 09:49:39 +08:00
srenblad reviewed 2025-02-03 10:27:07 +08:00
@ -75,17 +75,23 @@ unsafe fn get_ttype_entry(
ttype_base: usize,
ttype: *const u8,
) -> Result<Option<*const u8>, ()> {
if encoding != DW_EH_PE_absptr {
Collaborator

Is this check necessary?

Is this check necessary?
srenblad marked this conversation as resolved
srenblad reviewed 2025-02-03 11:26:02 +08:00
@ -89,0 +84,4 @@
// Read in a target2 value. See "Personality routine exception-handling table
// entries" in the ehabi32 spec, and the ARM_EHABI version of `get_shim_type_info`
// in LLVM's libcxxabi/src/cxa_personality.cpp.
let mut reader = DwarfReader::new(ttype.offset(-i));
Collaborator

I am trying to understand what this is doing differently from the read_encoded_pointer_with_base version, assuming the encoding is DW_EH_PE_absptr | DW_EH_PE_pcrel? My assumption is that the pcrel hack is just to add the offset and the original offset, which you also do explicitly here (only difference being the use of add vs +=).

I am trying to understand what this is doing differently from the `read_encoded_pointer_with_base` version, assuming the encoding is `DW_EH_PE_absptr | DW_EH_PE_pcrel`? My assumption is that the pcrel hack is just to add the offset and the original offset, which you also do explicitly here (only difference being the use of `add` vs `+=`).
srenblad marked this conversation as resolved
jcoates force-pushed dwarf-target2-ttype from 1be126606d to 8b9bb38331 2025-02-04 00:02:46 +08:00 Compare
Author
Contributor

I am trying to understand what this is doing differently from the read_encoded_pointer_with_base version, assuming the encoding is DW_EH_PE_absptr | DW_EH_PE_pcrel?

The difference is for the null pointer — in those cases, we shouldn't apply the offset, and keep it as null. After discussions with @dpn on Mattermost, it's actually fine to check that in read_encoded_pointer_with_base, so have reverted the previous commit and applied that change instead.

> I am trying to understand what this is doing differently from the `read_encoded_pointer_with_base` version, assuming the encoding is `DW_EH_PE_absptr | DW_EH_PE_pcrel`? The difference is for the null pointer — in those cases, we shouldn't apply the offset, and keep it as null. After discussions with @dpn [on Mattermost](https://chat.m-labs.hk/m-labs/pl/4qtc6yj85bge8jruko9imhw8nc), it's actually fine to check that in `read_encoded_pointer_with_base`, so have reverted the previous commit and applied that change instead.
Collaborator

@jcoates Noted, I see the distinction more clearly now. Assuming this passes all HITL tests, we can merge.

EDIT:
I ran some tests in office, seems one of the format tests was not yet adopted for zynq, but thats unrelated to this PR. The rest of the tests pass.

@jcoates Noted, I see the distinction more clearly now. Assuming this passes all HITL tests, we can merge. EDIT: I ran some tests in office, seems one of the format tests was not yet adopted for zynq, but thats unrelated to this PR. The rest of the tests pass.
Author
Contributor

I ran some tests in office, seems one of the format tests was not yet adopted for zynq, but thats unrelated to this PR. The rest of the tests pass.

Was that artiq.test.coredevice.test_exceptions.ExceptionFormatTest.test_nested_formatted_kernel_exception? I'd definitely expect that one to pass — it does in my tests (ARTIQ d4bd4e587958a3ea249a7d4759d2ddf144d51a85).

> I ran some tests in office, seems one of the format tests was not yet adopted for zynq, but thats unrelated to this PR. The rest of the tests pass. Was that `artiq.test.coredevice.test_exceptions.ExceptionFormatTest.test_nested_formatted_kernel_exception`? I'd definitely expect that one to pass — it does in my tests (ARTIQ d4bd4e587958a3ea249a7d4759d2ddf144d51a85).
Collaborator

Was that artiq.test.coredevice.test_exceptions.ExceptionFormatTest.test_nested_formatted_kernel_exception?

Apparently the test doesn't account for a device without a device_map, so unrelated to this PR. But to clarify, the test that fails is test_rtio_underflow.

> Was that artiq.test.coredevice.test_exceptions.ExceptionFormatTest.test_nested_formatted_kernel_exception? Apparently the test doesn't account for a device without a `device_map`, so unrelated to this PR. But to clarify, the test that fails is `test_rtio_underflow`.
sb10q merged commit 8b9bb38331 into master 2025-02-05 15:42:00 +08:00
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/artiq-zynq#354
No description provided.