archive builds on zfs /tank #4

Closed
opened 2020-07-29 19:09:19 +08:00 by sb10q · 18 comments

This currently doesn't work very well... https://github.com/NixOS/hydra/issues/796

Poster
Owner

Hydra is serving off the ZFS tank now, and things are mostly working, including the download links. So we now have 13TB of space to archive builds :)

Two issues remain:

  1. Would it be possible to check the system nix store first when Hydra serves a store entry? Right now I had to `nix copy` the whole `/nix/store` into `/tank/hydra` so that old builds including dependencies are still accessible, which is quite unwieldy (especially considering the slowness and excessive memory consumption of `nix copy`) and also wastes a few dozen gigabytes. This would also allow serving builds that did not go through Hydra. And, less importantly, the system nix store is on NVMe SSD which is faster than the cheap spinning rust of the ZFS tank.
  2. This broke the functionality from the RESTRICTDIST patch.
Poster
Owner

There is still some unexplained and potentially major breakage.

```text
> nix-store -r /nix/store/0n0w7h88j13wdw2mimvghy0062r60ill-sinara-systems-99d3594
these paths will be fetched (0.00 MiB download, 0.12 MiB unpacked):
  /nix/store/0n0w7h88j13wdw2mimvghy0062r60ill-sinara-systems-99d3594
copying path '/nix/store/0n0w7h88j13wdw2mimvghy0062r60ill-sinara-systems-99d3594' from 'https://nixbld.m-labs.hk'...
error 10 while decompressing xz file
error: build of '/nix/store/0n0w7h88j13wdw2mimvghy0062r60ill-sinara-systems-99d3594' failed
```
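
To narrow down a failure like this, it can help to check whether a downloaded `.nar.xz` payload decompresses on its own, independent of Nix. The following is only a minimal Python sketch under that assumption; the helper name `check_xz` is made up for illustration and is not part of Hydra or Nix.

```python
import lzma


def check_xz(data: bytes) -> str:
    """Classify an xz payload as 'empty', 'corrupt', or 'ok'.

    Hypothetical helper: it only tests whether the bytes form a
    complete, valid xz stream. It does not validate the NAR inside.
    """
    if not data:
        # A zero-length body is reported separately from a damaged one.
        return "empty"
    try:
        lzma.decompress(data, format=lzma.FORMAT_XZ)
    except lzma.LZMAError:
        # Raised for truncated streams, bad checksums and non-xz input.
        return "corrupt"
    return "ok"
```

It could be applied to a saved response body, for example `check_xz(open("dump.nar.xz", "rb").read())`, where `dump.nar.xz` is a placeholder file name.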
Poster
Owner

Seems to affect those store entries that have been copied with `nix copy` and not with hydra-queue-runner...
Poster
Owner

Disabled file:// store for now.

Poster
Owner

Involved files in the Hydra store:


This corruption still eludes me.

Could you please attach a dump of the HTTP response which contains the actual corruption for the .nar.xz in hydra-bug.tar? Maybe a diff of these is going to hint at what is going on.
Poster
Owner

> This corruption still eludes me.

What did you try?
Does spawning a Hydra instance with the files I posted in a `file://` store_uri fail to reproduce the problem?

> Does spawning a Hydra instance with the files I posted in a file:// store_uri fail to reproduce the problem?

That is exactly what I did. Then I put them through Hydra/http and they decompress just fine.

I don't have a clue where this is being corrupted. There is no mangling of the .nar.xz payload. We have already tried opening the file in raw mode.

Please share a `curl -D- ...nar.xz > dump` so that I can investigate the type of corruption.
Poster
Owner

I couldn't reproduce the issue with the original path - it now works normally. But now `/nix/store/b433sdzddsafbr7mzpjycys0bxhxzby4-openocd-mlabs-0.10.0` is affected. Hydra is serving an empty `nar.xz` file.
Poster
Owner

Nothing reported in `journalctl -xe -u hydra-server.service`
Poster
Owner

> Hydra is serving an empty nar.xz file.

Mh, it actually sends empty files on any `/nar/xxx` URL... what kind of error behavior is that.
Poster
Owner

@astro you should be able to reproduce the "return empty file on error" problem and fix it. Just try to access `http://hydra/nar/bogusfile` with your patch applied.
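
The check described above could be scripted roughly as follows. This is a hedged sketch, not part of Hydra: the names `diagnose` and `probe` are hypothetical, and the expectation that a missing NAR should yield a 404 comes from the behavior reported in this thread.

```python
import urllib.error
import urllib.request


def diagnose(status: int, body: bytes) -> str:
    """Interpret the response to a request for a path that should not exist."""
    if status == 404:
        return "ok: missing path reported as 404"
    if status == 200 and not body:
        return "bug: 200 with empty body"
    return f"unexpected: status {status}, {len(body)} bytes"


def probe(url: str) -> str:
    """Fetch url and classify the result; urllib raises HTTPError on 4xx/5xx."""
    try:
        with urllib.request.urlopen(url) as resp:
            return diagnose(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        return diagnose(err.code, err.read())
```

Calling `probe("http://hydra/nar/bogusfile")` against the server would then print which of the two behaviors is in effect.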
Poster
Owner

Without your patch I correctly get a 404, so it's related.


Ok, I added some error handling for that case to PR #11.

Also, I ran fetches for all files in my binary cache store. They all decompressed and went through `nix-store --restore` without error.
Poster
Owner

Now there are errors such as

```
file 'nar/3mk294vsb4b62rg9g0f3ffna616i611a-python3.8-llvmlite-artiq-0.23.0.dev' does not exist in binary cache 'https://nixbld.m-labs.hk'
```

for things that actually exist.

What are you doing to cause such a path to be requested? Both filetypes (`/*.narinfo` and `/nar/*.nar.xz`) have only a hash in their filename. Is this a client error?

Access patterns look like this for me (running `nix-env -i /nix/store/b433sdzddsafbr7mzpjycys0bxhxzby4-openocd-mlabs-0.10.0`, running nginx in the middle):

```
10.23.23.6 - - [23/Feb/2021:00:01:18 +0000] "GET /nix-cache-info HTTP/1.1" 200 52 "-" "curl/7.70.0 Nix/2.3.6"
10.23.23.6 - - [23/Feb/2021:00:01:18 +0000] "GET /b433sdzddsafbr7mzpjycys0bxhxzby4.narinfo HTTP/1.1" 200 816 "-" "curl/7.70.0 Nix/2.3.6"
10.23.23.6 - - [23/Feb/2021:00:01:23 +0000] "GET /nar/1j8bmi2m18rpfkpgsyf5mzg58gdi7xvbxvcn3lkkihh7lz72cvy3.nar.xz HTTP/1.1" 200 1281528 "-" "curl/7.70.0 Nix/2.3.6"
```
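
To make the "only a hash in their filename" point concrete, here is a small Python sketch that sorts request paths into the expected shapes. The patterns are a loose assumption (lowercase alphanumerics for the hash, no length check), and `classify` is a hypothetical helper, not something taken from Nix or Hydra.

```python
import re

# Loose patterns: a hash-only filename, with no "-name" suffix after the hash.
NARINFO = re.compile(r"^/[0-9a-z]+\.narinfo$")
NAR = re.compile(r"^/nar/[0-9a-z]+\.nar\.xz$")


def classify(path: str) -> str:
    """Return which kind of binary cache request a URL path looks like."""
    if path == "/nix-cache-info":
        return "cache-info"
    if NARINFO.match(path):
        return "narinfo"
    if NAR.match(path):
        return "nar"
    return "unexpected"
```

With these patterns, the three paths in the log above classify as `cache-info`, `narinfo` and `nar`, while `/nar/3mk294vsb4b62rg9g0f3ffna616i611a-python3.8-llvmlite-artiq-0.23.0.dev` from the error message comes out as `unexpected`.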
Poster
Owner

> What are you doing to cause such a path to be requested?

`nix-shell` on a regular `shell.nix` file to install ARTIQ. The error message is as printed by Nix; I do not know if it corresponds to the actual URL.
Poster
Owner

New server arrived with root-on-ZFS.

sb10q closed this issue 2021-06-15 21:18:39 +08:00
Reference: M-Labs/it-infra#4