homegrown NAS rebuild

brendan_kearney · May 3, 2024

my NAS OS drive spit the bit and i rebuilt the OS on a new disk, and now i need to get all the data disks online. the data disks are a software RAID 5, using mdadm, with LVM and separate mountpoints for each of the LVs. i have the array created, /dev/md0, as well as the PV and VG. i now need to find the LVs and get them recognized and mounted. it has been a while since i did all of this and i am a bit rusty. also, my notes on how to do all of this are on one of the LVs. does anyone have some quick tips on how to discover the LVs within the VG, and get them mounted? i am pretty close, but dont have these last few pieces put together.

hal_incandenza · May 3, 2024

lvdisplay and lvmdiskscan should show you what you have

brendan_kearney · May 3, 2024

using the short versions, for brevity. the PV and VG are there, but the LVs are not showing up. scanning or disabling, exporting, importing and enabling all did nothing.

pvs:

Code:

  PV         VG            Fmt  Attr PSize    PFree   
  /dev/md0   vg_nas_export lvm2 a--    <8.19t   <8.19t
  /dev/sde4  vg_nas        lvm2 a--  <463.76g <436.76g

vgs:

Code:

  VG            #PV #LV #SN Attr   VSize    VFree   
  vg_nas          1   6   0 wz--n- <463.76g <436.76g
  vg_nas_export   1   0   0 wz--n-   <8.19t   <8.19t

lvs:

Code:

  LV               VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_root          vg_nas -wi-ao---- 15.00g                                                    
  lv_swap          vg_nas -wi-ao----  8.00g                                                    
  lv_var_lib_iscsi vg_nas -wi-ao----  1.00g                                                    
  lv_var_lib_nfs   vg_nas -wi-ao----  1.00g                                                    
  lv_var_lib_samba vg_nas -wi-ao----  1.00g                                                    
  lv_var_log       vg_nas -wi-ao----  1.00g

i tried mounting /dev/md0 to /mnt, and it tells me that its an unknown filesystem, which stands to reason, but makes me think that i am missing something about the RAID array being an LVM or something else, and i am missing a step to get things recognized...

Code:

[root@nas ~]# mount /dev/md0 /mnt
mount: /mnt: unknown filesystem type 'LVM2_member'.
       dmesg(1) may have more information after failed mount system call.

of course, there is nothing in dmesg.

brendan_kearney · May 3, 2024

for giggles, i tried to create one of the "missing" LVs, testing things first...

Code:

[root@nas ~]# lvcreate -t -L 1T vg_nas_export -n lv_movies
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Logical volume "lv_movies" created.

then, i tried for real...

Code:

[root@nas ~]# lvcreate -L 1T vg_nas_export -n lv_movies
WARNING: xfs signature detected on /dev/vg_nas_export/lv_movies at offset 0. Wipe it? [y/n]: n
  Aborted wiping of xfs.
  1 existing signature left on the device.
  Failed to wipe signatures on logical volume vg_nas_export/lv_movies.
  Aborting. Failed to wipe start of new LV.

the filesystem(s) are there, as is the data it seems, but why doesn't the system identify them with lvscan/lvs/etc?

wobblytickle · May 4, 2024

and cat /proc/mdstat says /dev/md0 is healthy right? all members U

wobblytickle · May 4, 2024

very curious. I would try deactivating and reactivating vg_nas_export but you said you did that already. I'd not bother with any lv* commands at this point but it'd be good to have the mdstat output and pvdisplay, vgdisplay for the missing pv/vg/lv

brendan_kearney · May 4, 2024

mdstat:

Code:

[root@nas ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sda1[0] sdc1[2] sdd1[4] sdb1[1]
      8790398976 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>

pvdisplay:

Code:

[root@nas ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sde4
  VG Name               vg_nas
  PV Size               <463.76 GiB / not usable 2.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              118722
  Free PE               111810
  Allocated PE          6912
  PV UUID               F47qcD-I3Hs-RRXh-3bER-r9e7-Xo96-342d9d
   
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               vg_nas_export
  PV Size               <8.19 TiB / not usable 2.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              2146093
  Free PE               2146093
  Allocated PE          0
  PV UUID               stCm4i-7Xe6-xKfA-udTp-avxw-KCj9-IAAPCd

vgdisplay:

Code:

[root@nas ~]# vgdisplay
  --- Volume group ---
  VG Name               vg_nas
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  7
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                6
  Open LV               6
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <463.76 GiB
  PE Size               4.00 MiB
  Total PE              118722
  Alloc PE / Size       6912 / 27.00 GiB
  Free  PE / Size       111810 / <436.76 GiB
  VG UUID               UqILMh-8YQ3-DYyE-RKRs-5Jsy-4i5a-yFIcJR
   
  --- Volume group ---
  VG Name               vg_nas_export
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  11
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <8.19 TiB
  PE Size               4.00 MiB
  Total PE              2146093
  Alloc PE / Size       0 / 0   
  Free  PE / Size       2146093 / <8.19 TiB
  VG UUID               qJLa1u-JmnG-t1yk-qLif-Fmy0-2NJV-n4kn2T

brendan_kearney · May 4, 2024

wierdly, in /etc/lvm/archive there are files with contents like this:

Code:

[root@nas archive]# cat vg_nas_export_00010-1535023889.vg 
# Generated by LVM2 version 2.03.23(2) (2023-11-21): Fri May  3 16:13:24 2024

contents = "Text Format Volume Group"
version = 1

description = "Created *before* executing 'lvcreate -L 1T vg_nas_export -n lv_movies'"

creation_host = "nas"    # Linux nas 6.8.7-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Apr 17 19:21:08 UTC 2024 x86_64
creation_time = 1714767204    # Fri May  3 16:13:24 2024

vg_nas_export {
    id = "qJLa1u-JmnG-t1yk-qLif-Fmy0-2NJV-n4kn2T"
    seqno = 10
    format = "lvm2"            # informational
    status = ["RESIZEABLE", "READ", "WRITE"]
    flags = []
    extent_size = 8192        # 4 Megabytes
    max_lv = 0
    max_pv = 0
    metadata_copies = 0

    physical_volumes {

        pv0 {
            id = "stCm4i-7Xe6-xKfA-udTp-avxw-KCj9-IAAPCd"
            device = "/dev/md0"    # Hint only

            device_id_type = "md_uuid"
            device_id = "a009f443-eec0-57b7-1cfc-96d560d83f40"
            status = ["ALLOCATABLE"]
            flags = []
            dev_size = 17580797952    # 8.1867 Terabytes
            pe_start = 3072
            pe_count = 2146093    # 8.1867 Terabytes
        }
    }

    logical_volumes {

        lv_movies {
            id = "8NrM2U-pOiD-hMsM-HHfl-Rred-YZT4-RNCXgU"
            status = ["READ", "WRITE", "VISIBLE"]
            flags = []
            creation_time = 1714767196    # 2024-05-03 16:13:16 -0400
            creation_host = "nas"
            segment_count = 1

            segment1 {
                start_extent = 0
                extent_count = 262144    # 1024 Gigabytes

                type = "striped"
                stripe_count = 1    # linear

                stripes = [
                    "pv0", 0
                ]
            }
        }
    }

}

i did not run the command that is listed... this seems like some kind of enumeration attempt that did not succeed in creating the LVs

wobblytickle · May 4, 2024

brendan_kearney said:
description = "Created before executing 'lvcreate -L 1T vg_nas_export -n lv_movies'"

i did not run the command that is listed... this seems like some kind of enumeration attempt that did not succeed in creating the LVs

But you did on friday no in this post? this is super, super weird. Must admit I'm at a loss but I will have a poke around the LVM stuff I have here and have a think

brendan_kearney · May 12, 2024

i've been away for the week, in training, and have not been near my gear. picking back up on this...

you are correct, i did run that and i must have misconstrued the dates or times. i jumped 4 releases from fedora 36 to fedora 40 when i did the rebuild with the new SSD, so i am going to try rebuilding on f36 and see if that helps. maybe there are some backwards compatibility issues between the versions, and the original OS version may get me to my data. it will be a bit before i have everything ready for the f36 rebuild but will report back with any progress.

zelmak · May 18, 2024

I'm not a Fedora aficionado, but, with the command you ran, it mentioned XFS signature found and it aborted. Are the XFS file system tools installed on your new build?

brendan_kearney · May 18, 2024

ugg... the rig is built with fedora 36, but i cant get console because of some video issue. a previous thread lead me to think nomodeset could help, but alas, it seems there is some other issue with 36. 40 actually works out of the box. i need to rebuild on 40 for console, and configure networking etc. then i can get on it and futz around. testdisk seems like it might help recover the partitions. the sad thing is that testdisk does not work with XFS, as it says Support for this filesystem hasn't been implemented.. i am hoping that i can recover the partitions and the filesystems will be intact. otherwise i need to find other recovery options.

wobblytickle · May 19, 2024

ooi what's the content of /dev/mapper and the listing of /dev/dm*?

brendan_kearney · Jun 27, 2024

its been a while since i really did anything with the NAS, as i have grown disenfranchised with the whole lot of it. its the second time my OS disk when belly up and took my data with it. the data being on separate spinning rust does not seem sufficient to isolate things when an SSD goes FUBAR. that has me thinking about why.

there is another thread about inodes and i am wondering if there is a correlation between my OS SSD going bad and my data being clobbered. i installed the testdisk package and it can see the filesystems. photorec can see the data and i need to get a separate disk to put the recovered files onto, as a recovery effort. but how do i avoid having the OS disk destroy the records of where the data is located on other filesystems?

it seems there should be a way to properly isolate things so that one bad disk does not affect the ability to keep track of data on another disk/array/filesystem.

of course, if i had proper backups this would be moot, but here i am...

brendan_kearney · Jun 27, 2024

wobblytickle said:
ooi what's the content of /dev/mapper and the listing of /dev/dm*?

Code:

[root@nas ~]# ll /dev/mapper/
total 0
crw------- 1 root root 10, 236 May 31 14:10 control
lrwxrwxrwx 1 root root       7 May 31 13:34 vg_nas-lv_root -> ../dm-0
lrwxrwxrwx 1 root root       7 May 31 13:34 vg_nas-lv_swap -> ../dm-1
lrwxrwxrwx 1 root root       7 May 31 13:34 vg_nas-lv_var_lib_iscsi -> ../dm-5
lrwxrwxrwx 1 root root       7 May 31 13:34 vg_nas-lv_var_lib_nfs -> ../dm-4
lrwxrwxrwx 1 root root       7 May 31 13:34 vg_nas-lv_var_lib_samba -> ../dm-3
lrwxrwxrwx 1 root root       7 May 31 13:34 vg_nas-lv_var_log -> ../dm-2

and

Code:

[root@nas ~]# ll /dev/dm*
brw-rw---- 1 root disk 253, 0 May 31 13:34 /dev/dm-0
brw-rw---- 1 root disk 253, 1 May 31 13:34 /dev/dm-1
brw-rw---- 1 root disk 253, 2 May 31 13:34 /dev/dm-2
brw-rw---- 1 root disk 253, 3 May 31 13:34 /dev/dm-3
brw-rw---- 1 root disk 253, 4 May 31 13:34 /dev/dm-4
brw-rw---- 1 root disk 253, 5 May 31 13:34 /dev/dm-5

wobblytickle · Jun 27, 2024

having a re-read of the thread to refresh myself. I think the /etc/lvm/archive stuff might be as a result of you doing the dry-run of the create/and or you said no. The thing that I find odd now on the second reading is

Code:

  --- Volume group ---
  VG Name               vg_nas_export
...
  Free  PE / Size       2146093 / <8.19 TiB
...

which to be fair to you is also in the vgs output.

What does pvs -v --segments say? also lvs -o vg_all

brendan_kearney · Jun 27, 2024

Code:

[root@nas ~]# pvs -v --segments
  PV         VG            Fmt  Attr PSize    PFree    Start SSize   LV               Start Type   PE Ranges         
  /dev/md0   vg_nas_export lvm2 a--    <8.19t   <8.19t     0 2146093                      0 free                     
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g     0    2048 lv_swap              0 linear /dev/sde5:0-2047   
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g  2048     256 lv_var_log           0 linear /dev/sde5:2048-2303
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g  2304     256 lv_var_lib_samba     0 linear /dev/sde5:2304-2559
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g  2560     256 lv_var_lib_nfs       0 linear /dev/sde5:2560-2815
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g  2816     256 lv_var_lib_iscsi     0 linear /dev/sde5:2816-3071
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g  3072    3840 lv_root              0 linear /dev/sde5:3072-6911
  /dev/sde5  vg_nas        lvm2 a--  <463.76g <436.76g  6912  111810                      0 free

and

Code:

[root@nas ~]# lvs -o vg_all
  Fmt  VG UUID                                VG     Attr   VPerms     Extendable Exported   Partial    AllocPol   Clustered  Shared  VSize    VFree    SYS ID System ID LockType VLockArgs Ext   #Ext   Free   MaxLV MaxPV #PV #PV Missing #LV #SN Seq VG Tags VProfile #VMda #VMdaUse VMdaFree  VMdaSize  #VMdaCps
  lvm2 rnzkM9-vYI1-Nim1-C914-WWM8-oe2n-g3iduD vg_nas wz--n- writeable  extendable                       normal                        <463.76g <436.76g                                     4.00m 118722 111810     0     0   1           0   6   0   7                      1        1   506.50k  1020.00k unmanaged
  lvm2 rnzkM9-vYI1-Nim1-C914-WWM8-oe2n-g3iduD vg_nas wz--n- writeable  extendable                       normal                        <463.76g <436.76g                                     4.00m 118722 111810     0     0   1           0   6   0   7                      1        1   506.50k  1020.00k unmanaged
  lvm2 rnzkM9-vYI1-Nim1-C914-WWM8-oe2n-g3iduD vg_nas wz--n- writeable  extendable                       normal                        <463.76g <436.76g                                     4.00m 118722 111810     0     0   1           0   6   0   7                      1        1   506.50k  1020.00k unmanaged
  lvm2 rnzkM9-vYI1-Nim1-C914-WWM8-oe2n-g3iduD vg_nas wz--n- writeable  extendable                       normal                        <463.76g <436.76g                                     4.00m 118722 111810     0     0   1           0   6   0   7                      1        1   506.50k  1020.00k unmanaged
  lvm2 rnzkM9-vYI1-Nim1-C914-WWM8-oe2n-g3iduD vg_nas wz--n- writeable  extendable                       normal                        <463.76g <436.76g                                     4.00m 118722 111810     0     0   1           0   6   0   7                      1        1   506.50k  1020.00k unmanaged
  lvm2 rnzkM9-vYI1-Nim1-C914-WWM8-oe2n-g3iduD vg_nas wz--n- writeable  extendable                       normal                        <463.76g <436.76g                                     4.00m 118722 111810     0     0   1           0   6   0   7                      1        1   506.50k  1020.00k unmanaged

wobblytickle · Jun 27, 2024

pvscan --cache? clutching at straws now. If testdisk can see the the LVMs they're clearly still there. Anything in /etc/lvm/backup?

teubbist · Jun 27, 2024

testdisk might just see the XFS filesystems, and assuming the original LV's weren't messed with much(i.e. alloc'd but never resized) they might all just be linear allocations. In which case you might be able to mount them with some block offset shenanigans and loopback devices.

But yes, the LV's being gone is a bit weird. If an extra PV was involved in the VG and LV metadata landed on that somehow I'd have thought LVM would complain bitterly about it. The only other theory I can come up with is that grub(or something) wiped/overwrote the LV metadata blocks when the OS was installed but left enough of the PV/VG blocks in place for that to be detectable.

brendan_kearney said:
it seems there should be a way to properly isolate things so that one bad disk does not affect the ability to keep track of data on another disk/array/filesystem.

Properly configured LVM should behave this way. Otherwise, getting rid of abstractions is the only way to isolate yourself from them breaking. In this case, just using the raw MD device without LVM.

wobblytickle · Jun 27, 2024

teubbist said:
In which case you might be able to mount them with some block offset shenanigans and loopback devices

yeah this is where my thoughts were leading. Looking at a random box here with some lvm stuff on it it looks like the contents of /etc/lvm/backup were created at the last boot, I wonder if you have them and if they look sane(ish) then maybe vgcfgrestore is your friend... but once you're getting down this road you want to image that metadevice first...

brendan_kearney · Jun 27, 2024

testdisk can see the LVM partitions but cant do anything with them. some limitation in dealing with XFS, i think. photorec can read the files. my thinking is to recover everything i can with photorec onto an external drive and pave over things. imaging the array first may also be something i do before wiping.

when i built the NAS, i physically removed the disks because the fedora installer would not allow me to install to /dev/sde, since the SSD is connected to the "cdrom" SATA header. its enumerated last in the order and the 4 HDDs were the only option to install to. there should be no anaconda/installer impact to the disks.

i guess i dont know what "properly configured LVM" is, since a buggered SSD has twice fowled up my data disks. it seems that the inode tables are corrupted because of the failing SSD.

wobblytickle · Jun 27, 2024

if testdisk can see the lv's can you get it to image them to a second drive you would then at lease be able to mount them (or as @teubbist if it can give you the start and the size you can mount them on a loop device)

teubbist · Jun 27, 2024

~~Unless I've missed a memo, testdisk doesn't understand LVM~~. Looks like I did miss a memo and it has some LVM support.

Are the partitions it finds marked as LVM or XFS? If XFS then you might be able to use the start/end positions with losetup(read only mode, or work on a clone) to try mount the partitions as loopback devices.

brendan_kearney · Jun 28, 2024

wobblytickle said:
pvscan --cache? clutching at straws now. If testdisk can see the the LVMs they're clearly still there. Anything in /etc/lvm/backup?

Code:

[root@nas ~]# pvscan --cache
  pvscan[74030] PV /dev/md0 online.
  pvscan[74030] PV /dev/sde5 online.

brendan_kearney · Jul 17, 2024

i am looking into what testdisk and photorec can do, and dont know if a use case i have is covered. on the NAS drives are photos taken from my camera, which have a naming convention like IMG_0001.JPG, where the camera increments the number each time you take a pic. i have copied pics off the camera, onto the NAS, and removed the photos from the SD card in the camera. this resets the number in the file naming convention. i have different folders on the NAS with files that would have naming convention conflicts, if they were in the same folder.

when i try to use photorec, will it understand and create the directories found and the restore files along the directory structure, or will all files wind up in one flat directory? if the latter, will photorec add something like _1 to a duplicate filename? how can manage the fact that i have files with conflicting names, but were in different directories in the original state?

wobblytickle · Jul 17, 2024

hi @brendan_kearney I really should have seen this yesterday but me and the forum notifications... don't seem to get along

photorec is just looking at bytes on the target storage, if it finds something it recognises it copies that to a recovery directory tree: recup_dir.1, recup_dir.2... recup.dirN or something along those lines. Files within a get an arbitrary name I think based on the inode+magic but would need to rtfm. It's a forensic recovery tool which means if the storage you're recovering from was general purpose you end up with all sorts of weird and wonderful browser cache gifs and pngs, fragments of text files and the rest.

brendan_kearney · Jul 17, 2024

ha, i posted earlier today. no worries on timing. im not paying you a contract

the storage was some block and mostly file services, so not a lot of wierd wonderment that i should have to worry about. if there is a directory structure to the recovered files, then i should be able to manage. the manual effort to name things will be the large effort.

i will probably buy a couple 12 TB Iron Wolf disks, dd the raid array to one of the Iron Wolfs and restore the files to the other, and begin naming and identifying all the files. embrace the suck... and maybe back my $#!t up.

brendan_kearney · Jul 31, 2024

i got the drives that i am going to use for the recovery and eventual backups of the data to be stored on the rebuilt nas, but i am wondering about filesystems and which to use. zfs and btrfs are not what i am looking for, as the os is fedora and neither are "fully baked" options on fedora. i used xfs and dont seem to have issues with that, except that testdisk and photorec are not as capable with xfs. is ext4 or some other filesystem more appropriate for a nas? what are others doing for nas filesystems, without using zfs or btrfs?

brendan_kearney · Jul 31, 2024

connected my toaster, loaded the HDD and started the dd of the array. we'll see what can be recovered onto the second HDD, once the dd completes.

wobblytickle · Jul 31, 2024

good luck, I fear you're going to have a lot of recovered intact files but with garbage filenames

I personally wouldn't go near btrfs and given all the other constraints I suppose you're left with ext4...

rodalpho · Jul 31, 2024

Btrfs is fine so long as you don’t use it for RAID. Synology doesn’t.

EXT4 should be fine in a mdraid too.

Why are you running your NAS on fedora?

steelghost · Aug 1, 2024

I have nothing technical to add regarding LVM or XFS, but I can't help feeling that if this NAS system is one you want to store data safely, you'd be better off putting your energies into setting up something like TrueNAS, plus some sort of backups. That way you benefit from all the focus and attention that goes into that codebase.

brendan_kearney · Aug 1, 2024

i am running fedora on all my boxes, and want to stay on fedora for the sake of ease and consistency. the amount of services and capability i have running on my network kinda requires simple and repeatable setup, install, etc. the ability to migrate from version to version with my configs has become pretty important because i rely on those services being available. in short, its just easier to have one consistent OS that i am familiar with, that i can roll over easily, and have running quickly after rebuild. also, RHEL is so prevalent in enterprise data centers, that knowing fedora really helped me in my days as an engineer.

moving to something like TrueNAS or other "feature specific" distros is not what i want to do. i'm not interested in learning new distros because what i want to learn is the service and protocol, not the nuances of how 47 different OS's implement them. professionally, i've moved out of the engineering role and i no longer need to have my thumb on the pulse of technology. with my personal gear, i can focus on the areas of interest i have and continue to be aware of what is going on in the world of IT. to be honest, fedora has served me well and i am happy with what i have going on. sometimes you find these corner cases and they need to be worked through, but all in all fedora has been really solid.

i moved out of a place i had with the ex-girlfriend and my backup HDDs spit the bit. since then, i have not replaced them and my monthly backup routine fell off. no excuses, its pure laziness and all my fault that i dont have backups. moving to a new distro wont be the kick in the ass i need to keep good backups, so i dont see a compelling reason to move to a different distro for backups. to me, the distro is not the problem, i am, so deal with the problem and not symptoms. get disks and do backups.

Code:

[root@nas ~]# dd if=/dev/md0 of=/dev/sdf bs=4096
2197599744+0 records in
2197599744+0 records out
9001368551424 bytes (9.0 TB, 8.2 TiB) copied, 36706.1 s, 245 MB/s

the dd completed overnight and now i am going to put the second 12 TB disk in the toaster and start photorec to recover everything i can. three are some scripts that can be used to bring filenames back to a lot of the data, and i intend on using them to quickly get the recovered files named properly. hopefully, the bulk of the renaming work can be automated.

as for the filesystem of choice for the rebuilt NAS, ext4 or xfs almost does not matter to me, but what i want to understand better is the relationship between the OS, the disk the OS is on, and the data on the data disks. how have i bungled my data by having an OS disk go bad? how do i avoid this scenario in the future? i have a single SSD that the OS is installed on. i have 4 HDDs in a mdadm RAID5 config. when the OS disk went bad, i lost the ability to access the data on the 4 HDDs. this tells me that something on the OS disk "maps" the location of the files on the RAID array. when the disk went bad, that "mapping" was lost or corrupted and thats why i cannot access the data on the array. the data is clearly stll there, but no OS can get to the data. to me, this seems to be a failure, and i want to address that and not have to deal with it in the future. what insights do folks have around how an OS keeps track of data on different disks?

brendan_kearney · Aug 1, 2024

great fucking yay... the second brand new 12 TB disk is dead out of the box. now i have to make a 3 hour round trip drive to microcenter to get another one. oh the joy.

koala · Aug 1, 2024

I have a quite similar viewpoint (although I use EL clones instead of Fedora- Fedora moves too fast for me).

You can disregard the following if you feel it's arguing and you don't want to argue, but:

I make an exception on my all-EL policy for my NASes and hypervisors because... Proxmox is really great, because it's one of the few Linux distros which supports everything-on-ZFS, which is superconvenient. Also, it's LXC support is great, so I tend to run LXC instead of VMs whenever I can, which saves on a ton of resources.

(Actually, I discovered Incus which is even nicer- it's much easier to automate than Proxmox. OTOH, although it likes ZFS, it's not a full distro with a ZFS installer like Proxmox).
Fedora does not "support" everything-on-ZFS, but there are ZFS packages in a zfsonlinux.org repo.

I really really like supporting a single distro in my infrastructure as code, but ZFS is so nice that it shifts a bit the tradeoff for me.

(I also do infra as code as close as I can to "I can reprovision anything with very little effort", and I know it multiplies the up-front work, and I know first-hand that supporting multiple distros adds a lot of cost.)

(Additionally, I understand your viewpoint. I'm doing my personal infra for a lot of time and... there's a lot of stuff I'd do differently now, but I don't feel it's worthwhile to switch at this point- too much effort for so little spare time.)

I'm just saying: I'm a similar place, but ZFS is really worthwhile. It's not only that I feel that checksumming is vital for a NAS, but also ZFS has so many just nice stuff (send/recv, transparent compression, etc.).

brendan_kearney · Aug 1, 2024

no arguments. i get why zfs is attractive and i know the zol project is trying to move zfs into a "gpl-like" license structure for full distribution in the open source ecosystem. regardless of distro, zfs is another large investment in learning, too, in order to get it right.

i dont begrudge you the learning you have gained along your path. the exceptions to your own policy that you make are your decisions to make. that whole "you do you" thing. the exceptions you made do make sense, and i might have considered them in the past, but i'm a bit invested in my solutions, policies and exceptions that i may have or might yet decide upon.

fedora does move fast, and thats why i have done all i can to be able to take a box down for rebuild and get it back up and running as fast as possible. i also skip the odd numbered releases so that i can get about a year of service out of a release and put off upgrades. sadly, with the nas, my data suffered collateral damage when the os disk went belly up.

i am also now seeing that one of my rank-and-file servers is having issues with the os ssd, and i am realizing the age of my gear is starting to show. i have an original hp microserver n36l, the first model in the series. it may be that i have start investing in new hardware. oh the joy of having IT as a hobby, passion, and career.

koala · Aug 1, 2024

Heh, I have a Microserver Gen8 too (and I manage my brother's N40L).

I'm actually starting thinking about what to do next, too.

I've moved everything I've been able to move to a Hetzner box, so really I just have very simple needs at home; NAS, VPN, DNS/DHCP. I've bought 32gb of RAM for an old desktop, and I intend to have an Incus playground there, so I'm tempted to just go FreeNAS (although it really bothers me that they only support separating OS and data) (and it also bothers me that the infra as code history of that is not clear to me yet) (and of course, having to redo stuff which I have working already).

(As a side note, for my needs, I haven't had to do any tuning of ZFS. It did take me a while to get send/recv + keytabs working, but it's really not that complex. But at some point I should look at Stratis. Unfortunately, few checksumming stuff in Linux

Burn24 · Aug 1, 2024

brendan_kearney said:
i have 4 HDDs in a mdadm RAID5 config. when the OS disk went bad, i lost the ability to access the data on the 4 HDDs. this tells me that something on the OS disk "maps" the location of the files on the RAID array. when the disk went bad, that "mapping" was lost or corrupted and thats why i cannot access the data on the array. the data is clearly stll there, but no OS can get to the data. to me, this seems to be a failure, and i want to address that and not have to deal with it in the future. what insights do folks have around how an OS keeps track of data on different disks?

I might have missed something elsewhere in the thread, but I'm pretty sure in a standard setup, you are incorrect. The system should scan block devices on startup, and.. whenever it wants to I guess, and notes LVM headers and by default will assemble and online LVM volumes if everything looks copacetic. It doesn't strictly require custom config in /etc . If your deployment does require custom block device configuration, well, normally a very early step in your disaster recovery playbook is making sure you have your block device/controller drivers somewhere safe offline accessible, if they are required, along with vital storage configuration data to bootstrap it. We can assume you just unfortunately misplaced this, heh.

Regarding your ask on insights folks have around how an OS keeps track of data on different disks, For some actual meat on this for LVM check out man 7 lvmautoactivation

brendan_kearney · Aug 1, 2024

i am happy to accept that i am ignorant on how the os and data interact and how the data is mapped by the os, when the data is stored on different disks, but this is the second time an os disk loss has caused data loss on different disks. i'll certainly look at that man page and see if that enlightens me. the setup i have is pretty basic and there are no real customizations i made. just create a lvm and filesystems for what i want to export via iscsi, nfs, cifs, etc. the raid is mdadm and not hardware so maybe there is a wrinkle in that detail. dunno, but again backups are the mitigating measure for all of this.

Burn24 · Aug 1, 2024

I also try to shove everything into Fedora in some fashion, and things don't always fit in cleanly, and usually there are no guides for doing weird stuff on Fedora, but for me it's been an easier and more interesting maintenance burden. Also sometimes I want to do weird stuff that bespoke distros don't seem easily able to accommodate. I did shoehorn Fedora into the role of VPN gateway, such that it only had routes out to my chosen VPN endpoints, and would NAT traffic behind it out on the VPN. It works reliably for me, and I enjoyed learning the details of putting it together, but it's certainly not for most people. I had to sit and spend some time understanding nftables.

I used to consider Fedora DOA for hobbyist hosting, but with the changing times and loss of CentOS it has been the lowest friction move for me so far. I'm trying to deploy most stuff in clouds anyways, and with the more nimble/disposable deployments in the cloud with opentofu it feels much less of a downside to me that a Fedora release support term is 18mo, I have probably already re-deployed that service as part of regular maintenance over a year, or I should have. It's also been an easy platform for simple container hosting using podman and quadlets, and so there is much less exposure to changes in Fedora OS, I can just re-roll the container host with updated packages and not really care about it.

It has made me think I should look more into CoreOS for personal use, but since things Just Work now, and I don't expect knowing CoreOS will specifically land me a sweet job, I haven't quite gotten there yet.

Not to harp on the cloud shit too hard, but something I have also really liked is per container service I roll out, each gets its own storage volumes as needed, separate from the disposable OS drive, and each can have its own backup strategy in AWS. Notifications are configurable. If you watch your data budget and backup policies, backup and storage for regular services can be really cheap. BUT, I understand you are probably looking at media storage, and at those volumes AWS will absolutely ream us heh.

Lastly, I think these topics are a good portrait of how storage can be very complicated, and why it's so expensive in enterprise. Once you need more reliability than going down to buy a consumer retail disk, the cost skyrockets. I remember trying to figure out a few years ago what the sweet spot for drive size was, to minimize failure chance, and exposure and potential raid rebuild times (eg, if you have a raid1 of two 12TB disks, and one fails, what is the chance the lone good source doesn't choke in the 13 or so hours @ 250MB/sec it takes to bring the raid back optimal?)... and how many disks I should be using in a raid array to minimize those outages. I think I read backblaze has something like 50% of the disks in their raid arrays are redundant.. in other words, in a 10 disk array, 5 hotspares. We're not even talking about backups yet.

homegrown NAS rebuild

Ars Tribunus Militum

Ars Centurion

Ars Tribunus Militum

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Praefectus

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Tribunus Militum

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Praefectus

Ars Praefectus

Ars Tribunus Militum

Ars Tribunus Militum

Ars Tribunus Angusticlavius

Ars Tribunus Militum

Ars Tribunus Angusticlavius

Smack-Fu Master, in training

Ars Tribunus Militum

Smack-Fu Master, in training