Linear algebra library for Rust.
184f38b227
The existing algorithm for `column_variance` uses the textbook formula (`E[X^2] - E[X]^2`), which is well known to lose precision through cancellation. While the intention of the extant algorithm (traversing the elements in column-major order) is apparent, we should not sacrifice precision when we do not need to. The two-pass algorithm for variance, using the formula `E[(x - E[x])(x - E[x])]`, can be substituted without issue (N.B. the existing algorithm is already a two-pass algorithm anyway). Notably, the other variance implementations in the `statistics` module use `E[(x - E[x])(x - E[x])]`.

Loss of precision aside, keeping the existing implementation of `column_variance` causes the obvious absurdity:

```rust
use nalgebra::Matrix2x3;

let m = Matrix2x3::new(1.0, 2.0, 3.0,
                       4.0, 5.0, 6.0);
assert_ne!(m.column_variance().transpose(), m.transpose().row_variance());
```

We can eliminate both the loss of precision and the glaring inconsistency by switching to the implementation provided by this PR.

For a comprehensive analysis of variance algorithms, see this [reference](https://ds.ifi.uni-heidelberg.de/files/Team/eschubert/publications/SSDBM18-covariance-authorcopy.pdf), in particular Table 2. The "two-pass" algorithm described in the paper is the implementation given in this PR. In terms of simplicity (and hence ease of maintenance), "two-pass" is a suitable choice; in terms of runtime performance and precision, it is a good balance (cf. Youngs & Cramer and "textbook"). Furthermore, it is consistent with the variance algorithm used in the other "*variance" functions in the `statistics` module.
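
To make the precision argument concrete, here is a minimal sketch of the two formulas in plain Rust over an `f64` slice. It is not nalgebra's code: the function names `textbook_variance` and `two_pass_variance` are hypothetical, and the sketch assumes the population variance (dividing by `n`).

```rust
/// Population variance via the "textbook" formula E[X^2] - E[X]^2.
/// Hypothetical helper for illustration only; not nalgebra's implementation.
/// Returns NaN for an empty slice.
fn textbook_variance(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let mean_of_squares = xs.iter().map(|x| x * x).sum::<f64>() / n;
    // Two nearly equal large numbers are subtracted here, so most of the
    // significant digits can cancel.
    mean_of_squares - mean * mean
}

/// Population variance via the two-pass formula E[(x - E[x])(x - E[x])].
/// Hypothetical helper for illustration only; not nalgebra's implementation.
/// Returns NaN for an empty slice.
fn two_pass_variance(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    // First pass: the mean.
    let mean = xs.iter().sum::<f64>() / n;
    // Second pass: the mean of the squared deviations from that mean.
    xs.iter().map(|x| (x - mean) * (x - mean)).sum::<f64>() / n
}
```

For the input `[1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]`, whose exact population variance is 22.5, the two-pass version returns 22.5, while the textbook version cannot: the squares are on the order of `1e18`, where adjacent `f64` values are 128 apart, so the small true variance is lost in rounding.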
||
---|---|---|
.github | ||
benches | ||
examples | ||
nalgebra-glm | ||
nalgebra-lapack | ||
nalgebra-macros | ||
nalgebra-sparse | ||
src | ||
tests | ||
.gitignore | ||
Cargo.toml | ||
CHANGELOG.md | ||
clippy.toml | ||
LICENSE | ||
README.md | ||
rustfmt.toml |