page_size is used in nearly every pointer calculation in os.rs, and the
Stack methods are called fairly often. It's definitely not worth
spilling registers just to call out to a libc function.
With this change, page_size becomes effectively free. It is cached in an
atomic usize and accessed with relaxed ordering, so on common
architectures the read compiles to a plain load, with no fence or
locked instruction involved.
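A minimal sketch of the caching pattern described above, not the actual
code from os.rs. The names PAGE_SIZE, SLOW_CALLS and query_page_size are
hypothetical, and query_page_size returns a fixed 4096 in place of the
real libc call so that the example has no dependencies:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 means "not queried yet"; a real page size is never 0.
static PAGE_SIZE: AtomicUsize = AtomicUsize::new(0);

// Counts slow-path calls. Only here so the caching can be observed in
// this sketch; a real implementation would not need it.
static SLOW_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the OS query (assumption: the real code
/// asks libc for the page size). Returns a fixed value here.
#[cold]
fn query_page_size() -> usize {
    SLOW_CALLS.fetch_add(1, Ordering::Relaxed);
    4096
}

/// Returns the cached page size, querying it on first use.
#[inline]
pub fn page_size() -> usize {
    // Fast path: a relaxed load of the cached value.
    let cached = PAGE_SIZE.load(Ordering::Relaxed);
    if cached != 0 {
        return cached;
    }
    // Slow path: query once and publish the result. Two threads racing
    // here would both store the same value, so relaxed ordering suffices.
    let size = query_page_size();
    PAGE_SIZE.store(size, Ordering::Relaxed);
    size
}
```

Relaxed ordering is enough in this sketch because the cached value does
not guard any other memory: a thread that still sees 0 simply repeats
the query and stores the same number.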
Benchmark:
```
test bench_page_size ... bench: 5 ns/iter (+/- 1)
test bench_page_size_cached ... bench: 0 ns/iter (+/- 0)
```