A modified version of compiler-builtins for Zynq, with a fast memcpy implementation adapted from newlib.
bors b0300b16ed Auto merge of #164 - rust-lang-nursery:memclr, r=alexcrichton
optimize memset and memclr for ARM

This commit optimizes those routines by rewriting them in assembly and
performing the memory writes in 32-bit chunks, rather than in 8-bit chunks
as was done before this commit. This assembly implementation is
compatible with the ARMv6 and ARMv7 architectures.
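
To make the chunking idea concrete, here is a minimal Rust sketch of a word-wise fill. It is not the code from this PR: the name `memset_words` is hypothetical, and the sketch assumes the destination is 4-byte aligned, as is the case for the `__aeabi_memset4`-style entry points.

```rust
/// Hypothetical helper (not the crate's actual code): fill `n` bytes at
/// `dest` with the byte `c`, storing whole 32-bit words where possible.
///
/// Safety: `dest` must be valid for writes of `n` bytes and must be
/// 4-byte aligned (an assumption of this sketch).
pub unsafe fn memset_words(dest: *mut u8, n: usize, c: u8) {
    // Replicate the byte into all four lanes of a word, e.g. 0xAB -> 0xABABABAB.
    let word = u32::from_ne_bytes([c; 4]);
    let mut p = dest as *mut u32;
    let mut remaining = n;

    // Main loop: one 32-bit store per iteration instead of four 8-bit stores.
    while remaining >= 4 {
        p.write(word);
        p = p.add(1);
        remaining -= 4;
    }

    // Tail: at most three trailing bytes are written one at a time.
    let mut tail = p as *mut u8;
    while remaining > 0 {
        tail.write(c);
        tail = tail.add(1);
        remaining -= 1;
    }
}
```

The main loop issues roughly a quarter of the stores of a byte-wise loop, which is consistent with the larger sizes in the table below approaching a 70% reduction.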

This change results in a reduction of runtime of about 40-70% in all cases
that matter (the compiler will never use these intrinsics for sizes smaller
than 4 bytes). See data below:

| Bytes | HEAD | this PR | diff       |
| ----- | ---- | ------- | ---------- |
| 0     | 6    | 14      | +133.3333% |
| 1     | 10   | 13      | +30%       |
| 2     | 14   | 13      | -7.1429%   |
| 3     | 18   | 13      | -27.7778%  |
| 4     | 24   | 21      | -12.5%     |
| 16    | 70   | 36      | -48.5714%  |
| 64    | 263  | 97      | -63.1179%  |
| 256   | 1031 | 337     | -67.3133%  |
| 1024  | 4103 | 1297    | -68.389%   |

All times are in clock cycles. The measurements were done on a Cortex-M3
processor running at 8 MHz using the technique described [here].
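
The diff column appears to be the relative change in cycle count with respect to HEAD. A small sketch of that arithmetic, with `percent_change` as a hypothetical helper name:

```rust
/// Relative change of `new` with respect to `old`, in percent.
/// Negative values mean the new code needs fewer cycles.
pub fn percent_change(old: u32, new: u32) -> f64 {
    (f64::from(new) - f64::from(old)) / f64::from(old) * 100.0
}
```

For the 16-byte row, `percent_change(70, 36)` is (36 - 70) / 70 * 100, which is about -48.57.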

[here]: http://blog.japaric.io/rtfm-overhead

---

For relevance: all pure Rust programs for Cortex-M microcontrollers use memclr to
zero the .bss section during startup, so this change results in a quicker boot time.
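
As a rough illustration of that startup step, the sketch below zeroes a word-aligned region given its start and end pointers. The function name `zero_region` is hypothetical; on a real Cortex-M target the two pointers would typically come from linker-script symbols, and the compiler may lower the `write_bytes` call to a routine such as `__aeabi_memclr4`. Both points are assumptions, not something this commit states.

```rust
/// Hypothetical startup helper: zero every 32-bit word in `[start, end)`.
///
/// Safety: the range must be valid for writes, `start <= end`, and both
/// pointers must be 4-byte aligned.
pub unsafe fn zero_region(start: *mut u32, end: *mut u32) {
    // Number of whole words between the two addresses.
    let words = (end as usize - start as usize) / core::mem::size_of::<u32>();
    // `write_bytes` counts in units of `u32` here, not in bytes.
    core::ptr::write_bytes(start, 0, words);
}
```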

Some questions / comments:

- ~~the original code (it had a bug) comes from this [repo] and it's licensed
  under the ISC license. I have preserved the copyright and license text in the
  source code. IANAL, is that OK?~~ no longer applies. The intrinsics are written in Rust now.

- ~~I don't know whether this ARM implementation works for ARMv4 or ARMv5.
  @FenrirWolf and @Uvekilledkenny may want to take a look at it first.~~ no longer applies. The intrinsics are written in Rust now.

- ~~No idea whether this implementation works on processors that have no thumb
  instruction set. The current implementation uses 16-bit thumb instructions.~~ no longer applies. The intrinsics are written in Rust now.

- ~~The loop code can be rewritten in fewer instructions but using 32-bit thumb
  instructions. That 32-bit version would only work on ARMv7 though. I have yet
  to check whether that makes any difference in the runtime of the intrinsic.~~ no longer applies. The intrinsics are written in Rust now.

- ~~I'll look into memcpy4 next.~~ done

[repo]: https://github.com/bobbl/libaeabi-cortexm0
2017-07-01 07:27:55 +00:00
ci enable tests now that #150 has been fixed 2017-06-27 22:48:57 -05:00
compiler-rt@3bc0272cab move the compiler-rt submodule to the root 2017-04-10 11:23:03 -05:00
examples Enable the intrinsics program on thumb 2017-06-25 10:09:50 -07:00
src no aeabi_mem* symbols on iOS, weak symbols on thumb, normal symbols elsewhere 2017-06-30 18:06:25 -05:00
tests optimize 32-bit aligned mem{cpy,clr,set} intrinsics for ARM 2017-06-29 22:40:58 -05:00
.gitignore initial commit 2016-08-07 15:58:21 -05:00
.gitmodules move the compiler-rt submodule to the root 2017-04-10 11:23:03 -05:00
.travis.yml Remove the travis cache 2017-06-23 20:20:42 -07:00
appveyor.yml Tweak testing and such: 2017-06-24 10:10:04 -07:00
build.rs Don't build gcc_personality_v0 2017-06-24 11:36:05 -07:00
Cargo.toml Don't test mangled names on thumb 2017-06-24 12:54:35 -07:00
LICENSE.TXT Correct the license to that of upstream compiler-rt 2016-10-12 17:50:39 +00:00
README.md Mark the functions just implemented in README.md 2017-05-06 15:47:38 +02:00
thumbv6m-linux-eabi.json adapt the thumb target specs to upstream linker-flavor changes 2017-04-11 11:32:44 -05:00
thumbv7em-linux-eabi.json adapt the thumb target specs to upstream linker-flavor changes 2017-04-11 11:32:44 -05:00
thumbv7em-linux-eabihf.json adapt the thumb target specs to upstream linker-flavor changes 2017-04-11 11:32:44 -05:00
thumbv7m-linux-eabi.json adapt the thumb target specs to upstream linker-flavor changes 2017-04-11 11:32:44 -05:00

compiler-builtins

[WIP] Porting compiler-rt intrinsics to Rust

See rust-lang/rust#35437.

When and how to use this crate?

If you are working with a target that doesn't have binary releases of std available via rustup (this probably means you are building the core crate yourself) and need compiler-rt intrinsics (i.e. you are probably getting linker errors when building an executable: undefined reference to __aeabi_memcpy), you can use this crate to get those intrinsics and solve the linker errors. To do that, add this crate somewhere in the dependency graph of the crate you are building:

# Cargo.toml
[dependencies]
compiler_builtins = { git = "https://github.com/rust-lang-nursery/compiler-builtins" }

extern crate compiler_builtins;

// ...

If you still get an "undefined reference to $INTRINSIC" error after that change, that means that we haven't ported $INTRINSIC to Rust yet! Please open an issue with the name of the intrinsic and the LLVM triple (e.g. thumbv7m-none-eabi) of the target you are using. That way we can prioritize porting that particular intrinsic.

If you have a C compiler available for your target, you can temporarily enable a fallback to the actual compiler-rt implementation for intrinsics that have not been ported yet:

[dependencies.compiler_builtins]
git = "https://github.com/rust-lang-nursery/compiler-builtins"
features = ["c"]

Contributing

  1. Pick one or more intrinsics from the pending list (see Progress below).
  2. Fork this repository.
  3. Port the intrinsic(s) and their corresponding unit tests from their C implementation to Rust.
  4. Send a Pull Request (PR).
  5. Once the PR passes our extensive testing infrastructure, we'll merge it!
  6. Celebrate 🎉

Porting Reminders

  1. Rust and C have slightly different operator precedence. C evaluates comparisons (== !=) before bitwise operations (& | ^), while Rust evaluates the other way.
  2. C assumes wrapping operations everywhere. Rust panics on overflow when in debug mode. Consider using the Wrapping type or the explicit wrapping_* functions where applicable.
  3. Note C implicit casts, especially integer promotion. Rust is much more explicit about casting, so be sure that any cast which affects the output is ported to the Rust implementation.
  4. Rust has many functions for integer or floating point manipulation in the standard library. Consider using one of these functions rather than porting a new one.
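
The reminders above can be sketched in a few lines of Rust. All function names here are hypothetical and chosen only for illustration; none of them are part of this crate.

```rust
use core::num::Wrapping;

/// Reminder 1: C parses `x & mask == 0` as `x & (mask == 0)`.
/// This is the explicit Rust spelling of that C grouping.
pub fn c_and_eq(x: u32, mask: u32) -> u32 {
    x & ((mask == 0) as u32)
}

/// Rust parses the very same text as `(x & mask) == 0`, so a literal
/// copy of the C expression changes meaning (and type).
pub fn rust_and_eq(x: u32, mask: u32) -> bool {
    x & mask == 0
}

/// Reminder 2: `a + b` would panic on overflow in a debug build.
/// Use an explicit wrapping method...
pub fn wrapping_sum(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// ...or the `Wrapping` type, which wraps on every operator.
pub fn wrapping_sum_typed(a: u32, b: u32) -> u32 {
    (Wrapping(a) + Wrapping(b)).0
}

/// Reminder 3: in C, `(a + b) >> 1` on two `uint8_t` values is computed
/// in `int` because of integer promotion. The widening must be written
/// out in Rust, otherwise the `u8` addition overflows.
pub fn promoted_average(a: u8, b: u8) -> u8 {
    ((u32::from(a) + u32::from(b)) >> 1) as u8
}

/// Reminder 4: prefer an existing integer method over porting a helper.
pub fn count_leading_zeros(x: u32) -> u32 {
    x.leading_zeros()
}
```

For example, with `x = 0b101` and `mask = 0`, the C grouping yields `1` while the Rust grouping yields `true`, and `promoted_average(200, 100)` is `150` even though `200 + 100` does not fit in a `u8`.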

Progress

  • adddf3.c
  • addsf3.c
  • arm/adddf3vfp.S
  • arm/addsf3vfp.S
  • arm/aeabi_dcmp.S
  • arm/aeabi_fcmp.S
  • arm/aeabi_idivmod.S
  • arm/aeabi_ldivmod.S
  • arm/aeabi_memcpy.S
  • arm/aeabi_memmove.S
  • arm/aeabi_memset.S
  • arm/aeabi_uidivmod.S
  • arm/aeabi_uldivmod.S
  • arm/divdf3vfp.S
  • arm/divmodsi4.S (generic version is done)
  • arm/divsf3vfp.S
  • arm/divsi3.S (generic version is done)
  • arm/eqdf2vfp.S
  • arm/eqsf2vfp.S
  • arm/extendsfdf2vfp.S
  • arm/fixdfsivfp.S
  • arm/fixsfsivfp.S
  • arm/fixunsdfsivfp.S
  • arm/fixunssfsivfp.S
  • arm/floatsidfvfp.S
  • arm/floatsisfvfp.S
  • arm/floatunssidfvfp.S
  • arm/floatunssisfvfp.S
  • arm/gedf2vfp.S
  • arm/gesf2vfp.S
  • arm/gtdf2vfp.S
  • arm/gtsf2vfp.S
  • arm/ledf2vfp.S
  • arm/lesf2vfp.S
  • arm/ltdf2vfp.S
  • arm/ltsf2vfp.S
  • arm/modsi3.S (generic version is done)
  • arm/muldf3vfp.S
  • arm/mulsf3vfp.S
  • arm/nedf2vfp.S
  • arm/negdf2vfp.S
  • arm/negsf2vfp.S
  • arm/nesf2vfp.S
  • arm/softfloat-alias.list
  • arm/subdf3vfp.S
  • arm/subsf3vfp.S
  • arm/truncdfsf2vfp.S
  • arm/udivmodsi4.S (generic version is done)
  • arm/udivsi3.S (generic version is done)
  • arm/umodsi3.S (generic version is done)
  • arm/unorddf2vfp.S
  • arm/unordsf2vfp.S
  • ashldi3.c
  • ashrdi3.c
  • divdf3.c
  • divdi3.c
  • divmoddi4.c
  • divmodsi4.c
  • divsf3.c
  • divsi3.c
  • extendhfsf2.c
  • extendsfdf2.c
  • fixdfdi.c
  • fixdfsi.c
  • fixsfdi.c
  • fixsfsi.c
  • fixunsdfdi.c
  • fixunsdfsi.c
  • fixunssfdi.c
  • fixunssfsi.c
  • floatdidf.c
  • floatdisf.c
  • floatsidf.c
  • floatsisf.c
  • floatundidf.c
  • floatundisf.c
  • floatunsidf.c
  • floatunsisf.c
  • i386/ashldi3.S
  • i386/ashrdi3.S
  • i386/chkstk.S
  • i386/chkstk2.S
  • i386/divdi3.S
  • i386/lshrdi3.S
  • i386/moddi3.S
  • i386/muldi3.S
  • i386/udivdi3.S
  • i386/umoddi3.S
  • lshrdi3.c
  • moddi3.c
  • modsi3.c
  • muldf3.c
  • muldi3.c
  • mulodi4.c
  • mulosi4.c
  • mulsf3.c
  • powidf2.c
  • powisf2.c
  • subdf3.c
  • subsf3.c
  • truncdfhf2.c
  • truncdfsf2.c
  • truncsfhf2.c
  • udivdi3.c
  • udivmoddi4.c
  • udivmodsi4.c
  • udivsi3.c
  • umoddi3.c
  • umodsi3.c
  • x86_64/chkstk.S
  • x86_64/chkstk2.S

These builtins are needed to support 128-bit integers, which are in the process of being added to Rust.

  • ashlti3.c
  • ashrti3.c
  • divti3.c
  • fixdfti.c
  • fixsfti.c
  • fixunsdfti.c
  • fixunssfti.c
  • floattidf.c
  • floattisf.c
  • floatuntidf.c
  • floatuntisf.c
  • lshrti3.c
  • modti3.c
  • muloti4.c
  • multi3.c
  • udivmodti4.c
  • udivti3.c
  • umodti3.c
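
As a hedged illustration of where these builtins come from: on targets without native 128-bit arithmetic, LLVM typically lowers operations like the ones below to calls into routines from this list (for instance `__multi3` for multiplication, `__udivti3` for unsigned division and `__ashlti3` for left shifts). Which symbol is actually emitted depends on the target and optimization level, so treat the mapping in the comments as an assumption. The function names are hypothetical.

```rust
/// 128-bit multiplication; may lower to a call to `__multi3`.
pub fn mul_u128(a: u128, b: u128) -> u128 {
    a.wrapping_mul(b)
}

/// 128-bit unsigned division; may lower to a call to `__udivti3`.
/// Panics if `b` is zero, like any integer division in Rust.
pub fn div_u128(a: u128, b: u128) -> u128 {
    a / b
}

/// 128-bit left shift; may lower to a call to `__ashlti3`.
/// `n` must be below 128, otherwise a debug build panics.
pub fn shl_u128(a: u128, n: u32) -> u128 {
    a << n
}
```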

Unimplemented functions

These builtins involve floating-point types ("f128", "f80" and complex numbers) that are not supported by Rust.

  • addtf3.c
  • comparetf2.c
  • divdc3.c
  • divsc3.c
  • divtc3.c
  • divtf3.c
  • divxc3.c
  • extenddftf2.c
  • extendsftf2.c
  • fixtfdi.c
  • fixtfsi.c
  • fixtfti.c
  • fixunstfdi.c
  • fixunstfsi.c
  • fixunstfti.c
  • fixunsxfdi.c
  • fixunsxfsi.c
  • fixunsxfti.c
  • fixxfdi.c
  • fixxfti.c
  • floatditf.c
  • floatdixf.c
  • floatsitf.c
  • floattixf.c
  • floatunditf.c
  • floatundixf.c
  • floatunsitf.c
  • floatuntixf.c
  • i386/floatdixf.S
  • i386/floatundixf.S
  • muldc3.c
  • mulsc3.c
  • multc3.c
  • multf3.c
  • mulxc3.c
  • powitf2.c
  • powixf2.c
  • ppc/divtc3.c
  • ppc/fixtfdi.c
  • ppc/fixunstfdi.c
  • ppc/floatditf.c
  • ppc/floatunditf.c
  • ppc/gcc_qadd.c
  • ppc/gcc_qdiv.c
  • ppc/gcc_qmul.c
  • ppc/gcc_qsub.c
  • ppc/multc3.c
  • subtf3.c
  • trunctfdf2.c
  • trunctfsf2.c
  • x86_64/floatdixf.c
  • x86_64/floatundixf.S

These builtins are never called by LLVM.

  • absvdi2.c
  • absvsi2.c
  • absvti2.c
  • addvdi3.c
  • addvsi3.c
  • addvti3.c
  • arm/aeabi_cdcmp.S
  • arm/aeabi_cdcmpeq_check_nan.c
  • arm/aeabi_cfcmp.S
  • arm/aeabi_cfcmpeq_check_nan.c
  • arm/aeabi_div0.c
  • arm/aeabi_drsub.c
  • arm/aeabi_frsub.c
  • arm/aeabi_memcmp.S
  • arm/bswapdi2.S
  • arm/bswapsi2.S
  • arm/clzdi2.S
  • arm/clzsi2.S
  • arm/comparesf2.S
  • arm/restore_vfp_d8_d15_regs.S
  • arm/save_vfp_d8_d15_regs.S
  • arm/switch16.S
  • arm/switch32.S
  • arm/switch8.S
  • arm/switchu8.S
  • clzdi2.c
  • clzsi2.c
  • clzti2.c
  • cmpdi2.c
  • cmpti2.c
  • comparedf2.c
  • comparesf2.c
  • ctzdi2.c
  • ctzsi2.c
  • ctzti2.c
  • ffsdi2.c
  • ffsti2.c
  • mulvdi3.c
  • mulvsi3.c
  • mulvti3.c
  • negdf2.c
  • negdi2.c
  • negsf2.c
  • negti2.c
  • negvdi2.c
  • negvsi2.c
  • negvti2.c
  • paritydi2.c
  • paritysi2.c
  • parityti2.c
  • popcountdi2.c
  • popcountsi2.c
  • popcountti2.c
  • ppc/restFP.S
  • ppc/saveFP.S
  • subvdi3.c
  • subvsi3.c
  • subvti3.c
  • ucmpdi2.c
  • ucmpti2.c
  • udivmodti4.c

Rust only exposes atomic types on platforms that support them, and therefore does not need to fall back to software implementations.

  • arm/sync_fetch_and_add_4.S
  • arm/sync_fetch_and_add_8.S
  • arm/sync_fetch_and_and_4.S
  • arm/sync_fetch_and_and_8.S
  • arm/sync_fetch_and_max_4.S
  • arm/sync_fetch_and_max_8.S
  • arm/sync_fetch_and_min_4.S
  • arm/sync_fetch_and_min_8.S
  • arm/sync_fetch_and_nand_4.S
  • arm/sync_fetch_and_nand_8.S
  • arm/sync_fetch_and_or_4.S
  • arm/sync_fetch_and_or_8.S
  • arm/sync_fetch_and_sub_4.S
  • arm/sync_fetch_and_sub_8.S
  • arm/sync_fetch_and_umax_4.S
  • arm/sync_fetch_and_umax_8.S
  • arm/sync_fetch_and_umin_4.S
  • arm/sync_fetch_and_umin_8.S
  • arm/sync_fetch_and_xor_4.S
  • arm/sync_fetch_and_xor_8.S
  • arm/sync_synchronize.S
  • atomic.c
  • atomic_flag_clear.c
  • atomic_flag_clear_explicit.c
  • atomic_flag_test_and_set.c
  • atomic_flag_test_and_set_explicit.c
  • atomic_signal_fence.c
  • atomic_thread_fence.c

Miscellaneous functionality that is not used by Rust.

  • apple_versioning.c
  • clear_cache.c
  • emutls.c
  • enable_execute_stack.c
  • eprintf.c
  • gcc_personality_v0.c
  • trampoline_setup.c

Floating-point implementations of builtins that are only called from soft-float code. It would be better to simply use the generic soft-float versions in this case.

  • i386/floatdidf.S
  • i386/floatdisf.S
  • i386/floatundidf.S
  • i386/floatundisf.S
  • x86_64/floatundidf.S
  • x86_64/floatundisf.S
  • x86_64/floatdidf.c
  • x86_64/floatdisf.c

License

The compiler-builtins crate is dual licensed under both the University of Illinois "BSD-Like" license and the MIT license. As a user of this code you may choose to use it under either license. As a contributor, you agree to allow your code to be used under both.

Full text of the relevant licenses is in LICENSE.TXT.