
Thread: Oh No not another RAID Post

  1. #1
    Join Date
    Mar 2024
    Location
    Central Region U.S.A
    Beans
    32
    Distro
    Ubuntu

    Oh No not another RAID Post

---------BLUF-------------------
I attempted to follow this guide to reduce the number of drives in the current RAID 5 to 6 active drives plus a spare: https://www.stevewilson.co.uk/sysadmin/reducing-raid-5-disks-with-mdadm/
The article is very well written, but somewhere I'm failing to actually make it work. The first question: is there something omitted that I'm unaware of?
mdadm.conf?

My error was that mdadm kept looking for the 2 other drives after I followed the article. I did catch the part about shrinking the filesystem before giving mdadm the new number of drives, but that never seemed to take. After seeing the results, I wound up adding the drive (sdb1, the only Seagate drive; the remainder are all Toshibas of the exact same model, the last drive being /dev/sdi1) back to the RAID 5, and right now it is running fine with no issues on 8 drives, except that the Seagate still has the SMART errors.

Currently I have only used 918.93 GiB of the 3.13 TiB available, so in my mind it should have worked easily.

    Code:
-----------Background Information-----------------------------------
The system is Ubuntu 22.04 Server running headless (Dell OptiPlex 790 SFF, 32 GB RAM, with an extra 8-port non-RAID SATA card) to run as a media server.

The operating system is on /dev/sda and is not in the mdadm RAID 5 configuration.

Before anyone posts it, I know RAID is NOT a backup, but rather a collection of drives into one big drive (layman's terms).
My data is backed up in a different location.

The background on the RAID 5: I set it up with 8 drives (500 GB, 465 GiB accessible). Had I really thought about it,
I would have set it up as 6 active with a spare.

In the current situation I have 1 drive (sdb) which is starting to get errors in SMART, so I have a desire to remove it.
But I do have some 1 TB drives en route, and I was going to grow the RAID (/dev/md0) one drive at a time to a 6-drive RAID 5 with a spare.
My thought was that it would be easier to reduce the number of 500 GB drives first, then slowly grow the array with the
inbound 1 TB drives.
Here is my current mdadm.conf:
    Code:
    # mdadm.conf
    #
    # !NB! Run update-initramfs -u after updating this file.
    # !NB! This will ensure that initramfs has an uptodate copy.
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #
    
    # by default (built-in), scan all partitions (/proc/partitions) and all
    # containers for MD superblocks. alternatively, specify devices to scan, using
    # wildcards if desired.
    #DEVICE partitions containers
    
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR root
    
    # definitions of existing MD arrays
    
    # This configuration was auto-generated on Fri, 16 Feb 2024 18:45:30 +0000 by mkconf
    ARRAY /dev/md0 uuid=58952835:75d234f4:d4201fb9:d535d0c4
The actual easiest solution may very well be to disassemble the existing RAID, install the new drives, and simply redo it the way I now think I should have set it up originally. But I was after the learning experience of manipulating the array.
Thoughts and any advice would be great. Thank you for looking this over.
    Last edited by sgt-mike; March 16th, 2024 at 02:35 AM.

  2. #2
    Join Date
    Nov 2009
    Location
    Catalunya, Spain
    Beans
    14,565
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: Oh No not another RAID Post

First of all, mdadm.conf is not very relevant for shrinking. The ARRAY definition in that file basically just tells mdadm to assemble an array from all members (superblocks) carrying the specified UUID.

    You need to start with filesystem size and array size. So please post in code tags the output of:
    Code:
    df -h
    sudo mdadm -D /dev/md0
    sudo blkid
    After that I believe we can work on it step by step.
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 18.04 LTS 64bit

  3. #3
    Join Date
    Mar 2024
    Location
    Central Region U.S.A
    Beans
    32
    Distro
    Ubuntu

    Re: Oh No not another RAID Post

df -h

    Code:
mike@bastion:~$ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           3.2G  1.3M  3.2G   1% /run
    /dev/sda2       228G   14G  202G   7% /
    tmpfs            16G     0   16G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    /dev/md0        3.2T  919G  2.1T  31% /mnt/Movies
    /dev/sda1       1.1G  6.1M  1.1G   1% /boot/efi
    tmpfs           3.2G  4.0K  3.2G   1% /run/user/1000
    sudo mdadm -D /dev/md0
    Code:
              Version : 1.2
         Creation Time : Thu Mar  7 22:57:49 2024
            Raid Level : raid5
            Array Size : 3417774080 (3.18 TiB 3.50 TB)
         Used Dev Size : 488253440 (465.63 GiB 499.97 GB)
          Raid Devices : 8
         Total Devices : 8
           Persistence : Superblock is persistent
    
           Update Time : Fri Mar 15 13:27:24 2024
                 State : clean 
        Active Devices : 8
       Working Devices : 8
        Failed Devices : 0
         Spare Devices : 0
    
                Layout : left-symmetric
            Chunk Size : 512K
    
    Consistency Policy : resync
    
                  Name : bastion:0  (local to host bastion)
                  UUID : 58952835:75d234f4:d4201fb9:d535d0c4
                Events : 5325
    
        Number   Major   Minor   RaidDevice State
           8       8       17        0      active sync   /dev/sdb1
           1       8       33        1      active sync   /dev/sdc1
           2       8       49        2      active sync   /dev/sdd1
           3       8       65        3      active sync   /dev/sde1
           4       8       81        4      active sync   /dev/sdf1
           5       8       97        5      active sync   /dev/sdg1
           6       8      113        6      active sync   /dev/sdh1
           7       8      129        7      active sync   /dev/sdi1
    sudo blkid

    Code:
    /dev/sdf1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="11b04334-ee6f-7979-758a-ca40418867fc" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="25f66cae-01"
    /dev/sdd1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="2862b7ca-4892-eb20-e890-15322117dcbe" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="2e85756f-01"
    /dev/sdb1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="1642bc34-f63e-14e0-945e-c5c5fef3beab" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="1a9b49b7-01"
    /dev/sdi1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="4ca84bfd-7b6b-0bb7-678d-a306a9af1b12" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="0f61245b-01"
    /dev/md0: UUID="9aa3afa3-1658-48d6-a4c2-3cb0fb6ef7eb" BLOCK_SIZE="4096" TYPE="ext4"
    /dev/sdg1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="7f764be4-1680-e152-3fb8-0a3e6464642f" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="a9a61446-01"
    /dev/sde1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="322ca567-3577-138c-a074-cc3ebcf76426" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="47ad3a9c-01"
    /dev/sdc1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="6de3e8c1-03c4-7a23-0195-650b2583050d" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="e4d7e781-01"
    /dev/sda2: UUID="394dfb40-ea1d-4f5f-90b8-b5be8be855b9" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="4ea80dba-eb83-4dd9-96cf-fc96fad05f24"
    /dev/sda1: UUID="1074-E197" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="cbcc1d37-4cd8-4c52-8a7f-b057a92db558"
    /dev/sdh1: UUID="58952835-75d2-34f4-d420-1fb9d535d0c4" UUID_SUB="5a789c85-8839-c9fb-ff62-bd96fece5cd1" LABEL="bastion:0" TYPE="linux_raid_member" PARTUUID="a269eb9b-01"
    /dev/loop1: TYPE="squashfs"
    /dev/loop6: TYPE="squashfs"
    /dev/loop4: TYPE="squashfs"
    /dev/loop2: TYPE="squashfs"
    /dev/loop0: TYPE="squashfs"
    /dev/loop7: TYPE="squashfs"
    /dev/loop5: TYPE="squashfs"
    /dev/loop3: TYPE="squashfs"
Glad to hear that mdadm.conf is not really involved in reducing the number of drives in the array. It was just a thought.

I'll attach the SMART report just as an FYI; the raw read error rate is what is getting me.
    Code:
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    See vendor-specific Attribute list for marginal Attributes.
    
    General SMART Values:
    Offline data collection status:  (0x00)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(    0) seconds.
    Offline data collection
    capabilities: 			 (0x73) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					No Offline surface scan supported.
    					Self-test supported.
    					Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   1) minutes.
    Extended self-test routine
    recommended polling time: 	 ( 101) minutes.
    Conveyance self-test routine
    recommended polling time: 	 (   2) minutes.
    SCT capabilities: 	       (0x1035)	SCT Status supported.
    					SCT Feature Control supported.
    					SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       136966680
      3 Spin_Up_Time            0x0003   099   098   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1121
      5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       68971556
      9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7768 (193 66 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1079
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   075   045   045    Old_age   Always   In_the_past 25 (Min/Max 25/27)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       159
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       103
    193 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       12054
    194 Temperature_Celsius     0x0022   025   055   000    Old_age   Always       -       25 (0 14 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   092   092   000    Old_age   Offline      -       7674 (28 61 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       8565115506
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       6168688394
    254 Free_Fall_Sensor        0x0032   001   001   000    Old_age   Always       -       2
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%     32341         -
    # 2  Short offline       Completed without error       00%     32336         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    Last edited by sgt-mike; March 16th, 2024 at 02:45 AM.

  4. #4
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Oh No not another RAID Post

    Don't let your RAID HDDs spin down. That's bad for data integrity.

    Also, if you use mdadm, I'd use LVM on the RAID device to gain some flexibility in storage allocations, then put the file system(s), sized as needed, inside each LV you create. Always start small and only size an LV for your use in the next 3 months. Growing an LV while the devices are up and being used is trivial, but only if there is unallocated space in the VG that an LV can pull from. Takes about 5 seconds, including extending the file system (assuming ext4). Reducing an LV later is a huge hassle.
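As an illustration of that grow step, a minimal sketch (the VG and LV names here, vg_media and lv_media, are made up; -r extends the ext4 filesystem together with the LV):
Code:
# grow an LV by 100 GiB and resize its ext4 filesystem in the same step, while mounted
sudo lvextend -r -L +100G /dev/vg_media/lv_media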

    Or use ZFS in RAIDz configuration, but ZFS is more specific about how disks are used (or I just don't understand it anymore).
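For reference, a RAIDZ pool is created in one command; a rough, untested sketch with a placeholder pool name and disk names:
Code:
# raidz1 is roughly the ZFS analogue of RAID 5; 'tank' and the disks are placeholders
sudo zpool create tank raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg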

    I'd be worried a bit about that Raw_Read_Error_Rate. At least your cables don't seem bad. On my disks, these are the numbers from the most recent tests.
    Code:
    smart.2024-03-05.sda:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       7
    smart.2024-03-05.sdb:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
    smart.2024-03-05.sdc:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-05.sdd:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-05.sde:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
    smart.2024-03-12.sda:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       10
    smart.2024-03-12.sdb:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
    smart.2024-03-12.sdc:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-12.sdd:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-12.sde:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
    These disks are mostly 7+ yrs old.

    In another system,
    Code:
    smart.2024-03-04.sda:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-04.sdb:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-04.sdc:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-04.sdd:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-04.sde:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-04.sdf:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       8
    smart.2024-03-04.sdg:  1 Raw_Read_Error_Rate     0x000f   113   089   006    Pre-fail  Always       -       113825801
    smart.2024-03-04.sdh:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
    smart.2024-03-11.sda:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-11.sdb:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-11.sdc:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-11.sdd:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-11.sde:  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
    smart.2024-03-11.sdf:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       8
    smart.2024-03-11.sdg:  1 Raw_Read_Error_Rate     0x000f   115   089   006    Pre-fail  Always       -       95679048
    smart.2024-03-11.sdh:  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
    sdg was bought in 2007-ish. It was used in a RAID array for about 5 yrs, then became a scratch disk for temporary processing. I stopped using it last year, but haven't pulled it from the system yet. I really need to do that.
    Code:
    Model Family:     Seagate Barracuda 7200.10
    Device Model:     ST3320620AS
    smart.2024-03-11.sdg:  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       107002
    That's 12.2 yrs spinning. ZERO Reallocated_Sector_Ct for all the drives above in both systems.

    I run short SMART tests weekly and long tests once a month.
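One way to schedule that, as a sketch (the timings and device are assumptions; smartmontools can also do this via /etc/smartd.conf):
Code:
# example /etc/cron.d/smart-tests entries: short test Sundays 03:00, long test on the 1st at 04:00
0 3 * * 0 root /usr/sbin/smartctl -t short /dev/sdb
0 4 1 * * root /usr/sbin/smartctl -t long /dev/sdb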

  5. #5
    Join Date
    Nov 2009
    Location
    Catalunya, Spain
    Beans
    14,565
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: Oh No not another RAID Post

Good, I see you use partitions as raid members (like /dev/sdb1) and not the whole unpartitioned disk (like /dev/sdb). That is a good best practice with mdadm, but some people don't follow it. When using disks, always try to partition them and use the partitions, even if that means creating one single big partition that spans the whole disk.
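For example, preparing a new disk with one partition spanning the whole drive might look like this (a sketch; /dev/sdX is a placeholder for a new disk, not one of your current members):
Code:
sudo parted --script /dev/sdX mklabel gpt
sudo parted --script /dev/sdX mkpart primary 0% 100%
sudo parted --script /dev/sdX set 1 raid on
# then the new partition (e.g. /dev/sdX1) can be added to the array with mdadm --add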

As you can see from df, your filesystem size is 3.2TB. And from mdadm -D your array size is 3.18TiB and each member size is 465GiB. For raid5 that makes sense: the array size is 7x the member size, because one member's worth of space goes to parity. And your filesystem currently has the maximum size, the total array size, which is normal.
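As a quick sanity check against the mdadm -D numbers (which are in KiB):
Code:
# Used Dev Size is 488253440 KiB per member; 7 data members' worth should equal the Array Size
echo $(( 7 * 488253440 ))    # prints 3417774080, matching "Array Size" above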

    If I understood correctly, you want to remove two raid members. One to completely remove it, the other to serve as spare.

Shrinking the array to 6 members means only 5 members' worth will be used for data (one is for parity), which means your new array size, and the maximum filesystem size, will be approx 5x 465GiB = 2325GiB.

As explained in the article you linked, the important first step is to unmount and shrink the filesystem. Your current used size is only 919GiB, so that is good. Go ahead and shrink the filesystem to the minimum value, or to any value between 919GiB and 2325GiB.

    For example as per the article:
    Code:
    sudo umount /dev/md0
    sudo resize2fs -pM /dev/md0
If you don't want to shrink it to the minimum and then grow it again, you can select another safe value in between, like 1800GiB. In that case it would be something like:
    Code:
    sudo resize2fs -p /dev/md0 1800G
    That operation might take some hours so go ahead while preparing the mdadm shrink commands.
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 18.04 LTS 64bit

  6. #6
    Join Date
    Nov 2009
    Location
    Catalunya, Spain
    Beans
    14,565
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: Oh No not another RAID Post

Related to the mdadm shrink, the new size will be 5x members for data, and in mdadm -D you have the size of each member = 488253440. So according to that you first need to temporarily shrink the array size to 5x 488253440 = 2441267200.

But you don't have to calculate this number; just in case I'm wrong, you can do what they did in the tutorial and run the grow command first, which will tell you what value to resize the array to. The order of the commands should be something like:

    Code:
sudo mdadm --grow /dev/md0 --raid-devices=6    # you get the minimum array size from here
    sudo mdadm --grow /dev/md0 --array-size 2441267200
    sudo mdadm --grow /dev/md0 --raid-devices=6
After that process completes you will still have an array with 8 members, but only 6 will be active and 2 will be spares. I couldn't find any command to control which members become the spares during the grow (and you want to remove a specific disk, sdb), so I think the spares will be chosen arbitrarily. Or the last two members of the array will be marked as spares, which would be sdh1 and sdi1.
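The reshape itself can take quite a while; a couple of ways to watch its progress (just a suggestion, not from the article):
Code:
watch cat /proc/mdstat                        # shows reshape percentage and estimated finish time
sudo mdadm -D /dev/md0 | grep -i reshape      # "Reshape Status" line while the reshape is running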

    To remove sdb you will have to fail it and remove it:
    Code:
    sudo mdadm /dev/md0 --fail /dev/sdb1
    sudo mdadm /dev/md0 --remove /dev/sdb1
After this process is finished you should have the array with 6 members and 1 spare. Remove the sdb1 superblock to avoid any confusion in the future and you are good to go.

    Code:
    sudo mdadm --zero-superblock /dev/sdb1
    The last remaining thing is to grow the filesystem to the new maximum size and mount it.
    Code:
    sudo resize2fs -p /dev/md0
    sudo mount /dev/md0
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 18.04 LTS 64bit

  7. #7
    Join Date
    Mar 2024
    Location
    Central Region U.S.A
    Beans
    32
    Distro
    Ubuntu

    Re: Oh No not another RAID Post

@Darko
I think where I was not getting the results I wanted was that I failed sdb1 first after unmounting, and then (or rather, before that) attempted to reduce the array.
I am of the thought/assumption that it would be a bad idea to fail sdb (the only Seagate drive in the array and the one with the errors; the others all read zero on that line) and sdc (which will become the spare) at the same time.
But it would be OK / best practice to fail sdb, rebuild with the altered device count of 7, then once clean fail sdc, issue the device count of 6, let it rebuild (or, probably the correct term, resync), and then add sdc back as the spare.
Or am I assuming wrong, and sdb and sdc can be failed together since the array has more than 4 members?


    " Related to the mdadm shrink, the new size will be 5x members for data, and in mdadm -D you have the size of each member = 488253440. So according to that you first need to temporary shrink the array size to 5x 488253440 = 2441267200."

    I was actually thinking right around that size maybe a bit smaller. But the 465 gibs (488253440) x 5 make sense or like I said even a bit smaller then grow to full size of the 5 disks with the 6th member for parity. . (2400000000 was what I thought without math)

    " If I understood correctly, you want to remove two raid members. One to completely remove it, the other to serve as spare."

    Yes Sir that would be correct.


@TheFu
LOL, yes, that is why I was attempting to pull sdb (the SMART report is of that drive) from the array.

Hopefully my post was clear; sometimes I get wordy and give too much information, which causes confusion.
    Last edited by sgt-mike; March 16th, 2024 at 11:36 AM.

  8. #8
    Join Date
    Nov 2009
    Location
    Catalunya, Spain
    Beans
    14,565
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: Oh No not another RAID Post

As far as I can see in mdadm -D, sdb1 is not marked as failed yet, despite the SMART errors.

    So, you have two ways to proceed.

1) Ignore the SMART info for now, and shrink the array from 8 to 6 members (I would do it in one go, instead of 8 -> 7 -> 6, because one go means a single reshape operation). This process would be like I explained above.

2) Fail and remove sdb1 first, and then work on the shrink/reshape. But in this case take into account that after failing sdb1 your raid5 is working with the minimum of 7 members present. Any further disk failure before the reshape is completed will probably make the whole array fail. This is a risk.

It is also a small risk doing the reshape when the sdb1 member is already showing SMART errors, but at least it is still reported as active in mdadm, not failed.

    So, it would be up to you which way you want to go.

I didn't quite understand what you tried to do. Are you saying the reshape failed because you failed sdb1 first? I haven't tried doing reshapes with a failed member; maybe it really doesn't work and it won't accept the operation when one member is failed. If that is true, it would leave you only option 1), I guess: doing the reshape with sdb1 present and hoping that it will stick it out.
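Whichever option you pick, it is worth confirming the current state right before you start (just a suggestion):
Code:
cat /proc/mdstat                                    # all 8 members should show as up, e.g. [UUUUUUUU]
sudo mdadm -D /dev/md0 | grep -E 'State|Devices'    # expect clean, 8 active/working, 0 failed
sudo smartctl -H /dev/sdb                           # overall health of the suspect drive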
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 18.04 LTS 64bit

  9. #9
    Join Date
    Mar 2024
    Location
    Central Region U.S.A
    Beans
    32
    Distro
    Ubuntu

    Re: Oh No not another RAID Post

When I attempted to reduce the number of drives in the array, I failed sdb and then attempted to downsize the array. For some reason I never noticed that I should have resized first and then failed/removed the drive. When I failed the drive (sdb) I also removed its superblock, and then attempted to change the number of members in the array. Even after attempts to reshape, mdadm kept looking for that 8th member, and the array never would go clean; it stayed in clean/degraded status. At that point I figured it best to put sdb back in, ask for help, and re-attempt.

I'll re-attempt this sometime today or tomorrow.

Yes, I know that no drive has failed yet; I'm just trying to get ahead of a possible failure.

  10. #10
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Oh No not another RAID Post

    Quote Originally Posted by sgt-mike View Post
Yes, I know that no drive has failed yet; I'm just trying to get ahead of a possible failure.
    +1 ! If it were me and there wasn't a trivial way to do it, I'd have ensured I had excellent backups for the data, then wiped the array and started over with a fresh mdadm create to the size I wanted. I'd also load LVM, for the reason already spelled out, then I'd restore the data. LVM + mdadm were made for each other. While LVM **can** provide multiple different RAID levels using forked code from mdadm, I find the resulting LVM objects funky. I'd rather keep them in layers.

    HDD ---> Partition ---> mdadm Array ---> LVM ---> PVs ---> VGs ---> LVs ---> File system

    While all those layers may appear to add overhead, somehow they don't impact performance more than even 1% while providing 500% more flexibility.
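A rough sketch of what building that whole stack from scratch could look like (device names, VG/LV names, and sizes are placeholders, not a tested recipe for this exact system):
Code:
# RAID 5: 6 active members plus 1 hot spare (the 7 partitions listed are placeholders)
sudo mdadm --create /dev/md0 --level=5 --raid-devices=6 --spare-devices=1 /dev/sd[c-i]1
# LVM layered on top of the array
sudo pvcreate /dev/md0
sudo vgcreate vg_media /dev/md0
sudo lvcreate -L 1T -n movies vg_media        # start small; grow later with lvextend -r
sudo mkfs.ext4 /dev/vg_media/movies
sudo mount /dev/vg_media/movies /mnt/Movies   # assumes the mount point already exists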
