b0300b16ed
optimize memset and memclr for ARM

This commit optimizes those routines by rewriting them in assembly and writing memory in 32-bit chunks, rather than in 8-bit chunks as was done before this commit. This assembly implementation is compatible with the ARMv6 and ARMv7 architectures.

This change results in a reduction of runtime of about 40-70% in all cases that matter (the compiler will never use these intrinsics for sizes smaller than 4 bytes). See data below:

| Bytes | HEAD | this PR | diff       |
| ----- | ---- | ------- | ---------- |
| 0     | 6    | 14      | +133.3333% |
| 1     | 10   | 13      | +30%       |
| 2     | 14   | 13      | -7.1429%   |
| 3     | 18   | 13      | -27.7778%  |
| 4     | 24   | 21      | -12.5%     |
| 16    | 70   | 36      | -48.5714%  |
| 64    | 263  | 97      | -63.1179%  |
| 256   | 1031 | 337     | -67.3133%  |
| 1024  | 4103 | 1297    | -68.389%   |

All times are in clock cycles. The measurements were done on a Cortex-M3 processor running at 8 MHz using the technique described [here].

[here]: http://blog.japaric.io/rtfm-overhead

---

For relevance: all pure Rust programs for Cortex-M microcontrollers use memclr to zero the .bss section during startup, so this change results in a quicker boot time.

Some questions / comments:

- ~~the original code (it had a bug) comes from this [repo] and it's licensed under the ISC license. I have preserved the copyright and license text in the source code. IANAL, is that OK?~~ no longer applies. The intrinsics are written in Rust now.
- ~~I don't know whether this ARM implementation works for ARMv4 or ARMv5. @FenrirWolf and @Uvekilledkenny may want to take a look at it first.~~ no longer applies. The intrinsics are written in Rust now.
- ~~No idea whether this implementation works on processors that have no thumb instruction set. The current implementation uses 16-bit thumb instructions.~~ no longer applies. The intrinsics are written in Rust now.
- ~~The loop code can be rewritten in fewer instructions but using 32-bit thumb instructions. That 32-bit version would only work on ARMv7 though. I have yet to check whether that makes any difference in the runtime of the intrinsic.~~ no longer applies. The intrinsics are written in Rust now.
- ~~I'll look into memcpy4 next.~~ done

[repo]: https://github.com/bobbl/libaeabi-cortexm0
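The commit message above describes filling memory in 32-bit chunks instead of byte by byte. A minimal Rust sketch of that idea follows. `fill_words` is a hypothetical name used only for illustration; it is not the crate's actual memset/memclr code, and it relies on the standard `slice::align_to_mut` method rather than hand-written pointer arithmetic.

```rust
/// Hypothetical helper (not part of compiler-builtins): set every byte of
/// `buf` to `c`, using 32-bit stores for the aligned middle of the buffer.
pub fn fill_words(buf: &mut [u8], c: u8) {
    // Split the buffer into an unaligned head, a u32-aligned body and a tail.
    // SAFETY: every bit pattern is a valid `u32`, and we only write through
    // the returned slices, so reinterpreting the bytes is sound.
    let (head, body, tail) = unsafe { buf.align_to_mut::<u32>() };

    // Leading bytes that come before the first 4-byte boundary.
    for b in head {
        *b = c;
    }

    // Replicate the byte into all four lanes of a word and store whole words.
    let word = u32::from_ne_bytes([c; 4]);
    for w in body {
        *w = word;
    }

    // Trailing bytes that do not fill a whole word.
    for b in tail {
        *b = c;
    }
}
```

Calling `fill_words(&mut buf[1..36], 0xAB)` on a 37-byte buffer would set bytes 1 through 35 and leave the first and last byte untouched. How much of the work lands in the word loop depends on the alignment of the slice.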
compiler-builtins
[WIP] Porting compiler-rt intrinsics to Rust
See rust-lang/rust#35437.
When and how to use this crate?
If you are working with a target that doesn't have binary releases of std
available via rustup (this probably means you are building the core crate
yourself) and need compiler-rt intrinsics (i.e. you are probably getting linker
errors when building an executable: "undefined reference to __aeabi_memcpy"),
you can use this crate to get those intrinsics and solve the linker errors. To
do that, add this crate somewhere in the dependency graph of the crate you are
building:
    # Cargo.toml
    [dependencies]
    compiler_builtins = { git = "https://github.com/rust-lang-nursery/compiler-builtins" }

Then declare the crate in your code:

    extern crate compiler_builtins;

    // ...

If you still get an "undefined reference to $INTRINSIC" error after that change,
that means that we haven't ported $INTRINSIC to Rust yet! Please open an
issue with the name of the intrinsic and the LLVM triple (e.g.
thumbv7m-none-eabi) of the target you are using. That way we can prioritize
porting that particular intrinsic.
If you have a C compiler available for your target, you can temporarily enable a fallback to the actual compiler-rt implementation for intrinsics that have not been ported yet:

    [dependencies.compiler_builtins]
    git = "https://github.com/rust-lang-nursery/compiler-builtins"
    features = ["c"]

Contributing
- Pick one or more intrinsics from the [pending list](#progress).
- Fork this repository
- Port the intrinsic(s) and their corresponding unit tests from their C implementation to Rust.
- Send a Pull Request (PR)
- Once the PR passes our extensive testing infrastructure, we'll merge it!
- Celebrate 🎉
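To give a feel for the porting step, here is a rough sketch of how a routine in the style of compiler-rt's ashldi3.c (a 64-bit left shift built from 32-bit halves) might look in Rust. The function name, signature and layout are illustrative assumptions, not the code that ships in this crate; a real port would also have to be exported under the `__ashldi3` symbol name so the linker can find it.

```rust
/// Illustrative port (assumed shape, not the crate's actual code):
/// shift the 64-bit value `a` left by `b` bits, where `0 <= b < 64`,
/// using only 32-bit shifts on the two halves.
pub fn ashldi3(a: u64, b: u32) -> u64 {
    const BITS_IN_WORD: u32 = 32;

    // Split the input into its low and high 32-bit words.
    let low = a as u32;
    let high = (a >> 32) as u32;

    let (new_low, new_high) = if (b & BITS_IN_WORD) != 0 {
        // Shifting by 32..=63: the low word moves entirely into the high word.
        (0, low << (b - BITS_IN_WORD))
    } else if b == 0 {
        // Nothing to do; this also avoids a 32-bit shift by 32 below.
        return a;
    } else {
        // Shifting by 1..=31: bits leaving the low word enter the high word.
        (low << b, (high << b) | (low >> (BITS_IN_WORD - b)))
    };

    // Reassemble the two words into a 64-bit result.
    (u64::from(new_high) << 32) | u64::from(new_low)
}
```

A unit test for such a port can compare the result against the native `<<` operator for every shift amount from 0 to 63.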
Porting Reminders
- Rust and C have slightly different operator precedence. C evaluates comparisons (== !=) before bitwise operations (& | ^), while Rust evaluates the other way.
- C assumes wrapping operations everywhere. Rust panics on overflow when in debug mode. Consider using the Wrapping type or the explicit wrapping_* functions where applicable.
- Note C implicit casts, especially integer promotion. Rust is much more explicit about casting, so be sure that any cast which affects the output is ported to the Rust implementation.
- Rust has many functions for integer or floating point manipulation in the standard library. Consider using one of these functions rather than porting a new one.
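The reminders above can be illustrated with a few small functions. This is only a sketch: the function names are made up for the example, and it uses `std::num::Wrapping` so it runs on a hosted target (a `no_std` crate would import it from `core::num` instead).

```rust
use std::num::Wrapping;

/// Precedence: in C, `a & mask == 0` parses as `a & (mask == 0)`.
/// In Rust the same tokens parse as `(a & mask) == 0`. Writing the
/// parentheses explicitly keeps the intent obvious in both languages.
pub fn bits_clear(a: u32, mask: u32) -> bool {
    (a & mask) == 0
}

/// Overflow: a plain `a + b` panics in debug builds when it overflows,
/// so spell out the wrapping behavior the C code relies on.
pub fn add_wrapping(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// The same operation expressed with the `Wrapping` newtype.
pub fn add_wrapping_type(a: u32, b: u32) -> u32 {
    (Wrapping(a) + Wrapping(b)).0
}

/// Integer promotion: in C, adding two `uint8_t` values is done in `int`,
/// so the sum does not wrap at 255. Rust needs the widening to be explicit.
pub fn add_promoted(a: u8, b: u8) -> i32 {
    i32::from(a) + i32::from(b)
}

/// Standard library: count leading zeros with a built-in method instead of
/// porting a hand-written loop.
pub fn clz(a: u32) -> u32 {
    a.leading_zeros()
}
```

For instance, `add_promoted(200, 100)` yields 300, whereas `200u8.wrapping_add(100)` yields 44; which of the two matches the C source depends on where the original code casts back to a narrow type.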
Progress
- adddf3.c
- addsf3.c
- arm/adddf3vfp.S
- arm/addsf3vfp.S
- arm/aeabi_dcmp.S
- arm/aeabi_fcmp.S
- arm/aeabi_idivmod.S
- arm/aeabi_ldivmod.S
- arm/aeabi_memcpy.S
- arm/aeabi_memmove.S
- arm/aeabi_memset.S
- arm/aeabi_uidivmod.S
- arm/aeabi_uldivmod.S
- arm/divdf3vfp.S
- arm/divmodsi4.S (generic version is done)
- arm/divsf3vfp.S
- arm/divsi3.S (generic version is done)
- arm/eqdf2vfp.S
- arm/eqsf2vfp.S
- arm/extendsfdf2vfp.S
- arm/fixdfsivfp.S
- arm/fixsfsivfp.S
- arm/fixunsdfsivfp.S
- arm/fixunssfsivfp.S
- arm/floatsidfvfp.S
- arm/floatsisfvfp.S
- arm/floatunssidfvfp.S
- arm/floatunssisfvfp.S
- arm/gedf2vfp.S
- arm/gesf2vfp.S
- arm/gtdf2vfp.S
- arm/gtsf2vfp.S
- arm/ledf2vfp.S
- arm/lesf2vfp.S
- arm/ltdf2vfp.S
- arm/ltsf2vfp.S
- arm/modsi3.S (generic version is done)
- arm/muldf3vfp.S
- arm/mulsf3vfp.S
- arm/nedf2vfp.S
- arm/negdf2vfp.S
- arm/negsf2vfp.S
- arm/nesf2vfp.S
- arm/softfloat-alias.list
- arm/subdf3vfp.S
- arm/subsf3vfp.S
- arm/truncdfsf2vfp.S
- arm/udivmodsi4.S (generic version is done)
- arm/udivsi3.S (generic version is done)
- arm/umodsi3.S (generic version is done)
- arm/unorddf2vfp.S
- arm/unordsf2vfp.S
- ashldi3.c
- ashrdi3.c
- divdf3.c
- divdi3.c
- divmoddi4.c
- divmodsi4.c
- divsf3.c
- divsi3.c
- extendhfsf2.c
- extendsfdf2.c
- fixdfdi.c
- fixdfsi.c
- fixsfdi.c
- fixsfsi.c
- fixunsdfdi.c
- fixunsdfsi.c
- fixunssfdi.c
- fixunssfsi.c
- floatdidf.c
- floatdisf.c
- floatsidf.c
- floatsisf.c
- floatundidf.c
- floatundisf.c
- floatunsidf.c
- floatunsisf.c
- i386/ashldi3.S
- i386/ashrdi3.S
- i386/chkstk.S
- i386/chkstk2.S
- i386/divdi3.S
- i386/lshrdi3.S
- i386/moddi3.S
- i386/muldi3.S
- i386/udivdi3.S
- i386/umoddi3.S
- lshrdi3.c
- moddi3.c
- modsi3.c
- muldf3.c
- muldi3.c
- mulodi4.c
- mulosi4.c
- mulsf3.c
- powidf2.c
- powisf2.c
- subdf3.c
- subsf3.c
- truncdfhf2.c
- truncdfsf2.c
- truncsfhf2.c
- udivdi3.c
- udivmoddi4.c
- udivmodsi4.c
- udivsi3.c
- umoddi3.c
- umodsi3.c
- x86_64/chkstk.S
- x86_64/chkstk2.S
These builtins are needed to support 128-bit integers, which are in the process of being added to Rust.
- ashlti3.c
- ashrti3.c
- divti3.c
- fixdfti.c
- fixsfti.c
- fixunsdfti.c
- fixunssfti.c
- floattidf.c
- floattisf.c
- floatuntidf.c
- floatuntisf.c
- lshrti3.c
- modti3.c
- muloti4.c
- multi3.c
- udivmodti4.c
- udivti3.c
- umodti3.c
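As a rough idea of what such a 128-bit builtin has to do, the sketch below multiplies two 128-bit values using only 64-bit arithmetic, in the spirit of multi3.c. The names `mul_wide` and `multi3` and the exact decomposition are assumptions made for this example, not the crate's implementation; the `u128` type appears only at the function boundary, and the example assumes a toolchain where `u128` is available.

```rust
/// Full 64 x 64 -> 128-bit product, returned as (high, low) 64-bit words.
/// Hypothetical helper built from 32-bit partial products.
fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four partial products; each fits in 64 bits.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Sum everything that lands in bits 32..64; this cannot overflow a u64.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);

    let low = (mid << 32) | (ll & MASK);
    let high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    (high, low)
}

/// Low 128 bits of `a * b`, computed from 64-bit halves (wrapping on overflow).
pub fn multi3(a: u128, b: u128) -> u128 {
    let (a_lo, a_hi) = (a as u64, (a >> 64) as u64);
    let (b_lo, b_hi) = (b as u64, (b >> 64) as u64);

    // a_lo * b_lo contributes to both halves of the result.
    let (carry, low) = mul_wide(a_lo, b_lo);

    // The cross terms only affect the high half; a_hi * b_hi overflows entirely.
    let high = carry
        .wrapping_add(a_lo.wrapping_mul(b_hi))
        .wrapping_add(a_hi.wrapping_mul(b_lo));

    (u128::from(high) << 64) | u128::from(low)
}
```

The result can be checked against `a.wrapping_mul(b)` on a toolchain that already supports `u128` natively.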
Unimplemented functions
These builtins involve floating-point types ("f128", "f80" and complex numbers) that are not supported by Rust.
- addtf3.c
- comparetf2.c
- divdc3.c
- divsc3.c
- divtc3.c
- divtf3.c
- divxc3.c
- extenddftf2.c
- extendsftf2.c
- fixtfdi.c
- fixtfsi.c
- fixtfti.c
- fixunstfdi.c
- fixunstfsi.c
- fixunstfti.c
- fixunsxfdi.c
- fixunsxfsi.c
- fixunsxfti.c
- fixxfdi.c
- fixxfti.c
- floatditf.c
- floatdixf.c
- floatsitf.c
- floattixf.c
- floatunditf.c
- floatundixf.c
- floatunsitf.c
- floatuntixf.c
- i386/floatdixf.S
- i386/floatundixf.S
- muldc3.c
- mulsc3.c
- multc3.c
- multf3.c
- mulxc3.c
- powitf2.c
- powixf2.c
- ppc/divtc3.c
- ppc/fixtfdi.c
- ppc/fixunstfdi.c
- ppc/floatditf.c
- ppc/floatunditf.c
- ppc/gcc_qadd.c
- ppc/gcc_qdiv.c
- ppc/gcc_qmul.c
- ppc/gcc_qsub.c
- ppc/multc3.c
- subtf3.c
- trunctfdf2.c
- trunctfsf2.c
- x86_64/floatdixf.c
- x86_64/floatundixf.S
These builtins are never called by LLVM.
- absvdi2.c
- absvsi2.c
- absvti2.c
- addvdi3.c
- addvsi3.c
- addvti3.c
- arm/aeabi_cdcmp.S
- arm/aeabi_cdcmpeq_check_nan.c
- arm/aeabi_cfcmp.S
- arm/aeabi_cfcmpeq_check_nan.c
- arm/aeabi_div0.c
- arm/aeabi_drsub.c
- arm/aeabi_frsub.c
- arm/aeabi_memcmp.S
- arm/bswapdi2.S
- arm/bswapsi2.S
- arm/clzdi2.S
- arm/clzsi2.S
- arm/comparesf2.S
- arm/restore_vfp_d8_d15_regs.S
- arm/save_vfp_d8_d15_regs.S
- arm/switch16.S
- arm/switch32.S
- arm/switch8.S
- arm/switchu8.S
- clzdi2.c
- clzsi2.c
- clzti2.c
- cmpdi2.c
- cmpti2.c
- comparedf2.c
- comparesf2.c
- ctzdi2.c
- ctzsi2.c
- ctzti2.c
- ffsdi2.c
- ffsti2.c
- mulvdi3.c
- mulvsi3.c
- mulvti3.c
- negdf2.c
- negdi2.c
- negsf2.c
- negti2.c
- negvdi2.c
- negvsi2.c
- negvti2.c
- paritydi2.c
- paritysi2.c
- parityti2.c
- popcountdi2.c
- popcountsi2.c
- popcountti2.c
- ppc/restFP.S
- ppc/saveFP.S
- subvdi3.c
- subvsi3.c
- subvti3.c
- ucmpdi2.c
- ucmpti2.c
- udivmodti4.c
Rust only exposes atomic types on platforms that support them, and therefore does not need to fall back to software implementations.
- arm/sync_fetch_and_add_4.S
- arm/sync_fetch_and_add_8.S
- arm/sync_fetch_and_and_4.S
- arm/sync_fetch_and_and_8.S
- arm/sync_fetch_and_max_4.S
- arm/sync_fetch_and_max_8.S
- arm/sync_fetch_and_min_4.S
- arm/sync_fetch_and_min_8.S
- arm/sync_fetch_and_nand_4.S
- arm/sync_fetch_and_nand_8.S
- arm/sync_fetch_and_or_4.S
- arm/sync_fetch_and_or_8.S
- arm/sync_fetch_and_sub_4.S
- arm/sync_fetch_and_sub_8.S
- arm/sync_fetch_and_umax_4.S
- arm/sync_fetch_and_umax_8.S
- arm/sync_fetch_and_umin_4.S
- arm/sync_fetch_and_umin_8.S
- arm/sync_fetch_and_xor_4.S
- arm/sync_fetch_and_xor_8.S
- arm/sync_synchronize.S
- atomic.c
- atomic_flag_clear.c
- atomic_flag_clear_explicit.c
- atomic_flag_test_and_set.c
- atomic_flag_test_and_set_explicit.c
- atomic_signal_fence.c
- atomic_thread_fence.c
Miscellaneous functionality that is not used by Rust.
- apple_versioning.c
- clear_cache.c
- emutls.c
- enable_execute_stack.c
- eprintf.c
- gcc_personality_v0.c
- trampoline_setup.c
Floating-point implementations of builtins that are only called from soft-float code. It would be better to simply use the generic soft-float versions in this case.
- i386/floatdidf.S
- i386/floatdisf.S
- i386/floatundidf.S
- i386/floatundisf.S
- x86_64/floatundidf.S
- x86_64/floatundisf.S
- x86_64/floatdidf.c
- x86_64/floatdisf.c
License
The compiler-builtins crate is dual licensed under both the University of Illinois "BSD-Like" license and the MIT license. As a user of this code you may choose to use it under either license. As a contributor, you agree to allow your code to be used under both.
Full text of the relevant licenses is in LICENSE.TXT.