Sep
20
2019
Partition MisAlignment

this article also has an alternative title:

How I Learned to Stop Worrying and Loved my Team

This is a story of troubleshooting cloud disk volumes (long post).

Cloud Disk Volume

Working with data disk volumes in the cloud have a few benefits. One of them is when the volume runs out of space, you can just increase it! No need of replacing the disk, no need of buying a new one, no need of transferring 1TB of data from one disk to another. It is a very simple matter.

Partitions Vs Disks

My personal opinion is not to use partitions. Cloud data disk on EVS (elastic volume service) or cloud volumes for short, they do not need a partition table. You can use the entire disk for data.

Use: /dev/vdb instead of /dev/vdb1

Filesystem

You have to choose your filesystem carefully. You can use XFS that supports Online resizing via xfs_growfs, but you can not shrunk them. But I understand that most of us are used to work with extended filesystem ext4 and to be honest I also feel more comfortable with ext4.

You can read the below extensive article in wikipedia Comparison of file systems for more info, and you can search online regarding performance between xfs and ext4. There are really close to each other nowadays.

Increase Disk

Today, working on a simple operational task (increase a cloud disk volume), I followed the official documentation. This is something that I have done in the past like a million times. To provide a proper documentation I will use redhat’s examples:

In a nutshell

  • Umount data disk
  • Increase disk volume within the cloud dashboard
  • Extend (change) the geometry
  • Check filesystem
  • Resize ext4 filesystem
  • Mount data disk

Commands

Let’s present the commands for reference:

# umount /dev/vdb1

[increase cloud disk volume]

# partprobe

# fdisk /dev/vdb
[delete partition]
[create partition]

# partprobe

# e2fsck /dev/vdb1
# e2fsck -f /dev/vdb1
# resize2fs /dev/vdb1
# mount /dev/vdb1

And here is fdisk in more detail:

Fdisk

# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/vdb: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0004e2c8

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux

Delete


Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Create

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-2936012799, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-2936012799, default 2936012799):

Created a new partition 1 of type 'Linux' and of size 1.4 TiB.

Print

Command (m for help): p
Disk /dev/vdb: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0004e2c8

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1        2048 2936012799 2936010752  1.4T 83 Linux

Write

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

File system consistency check

An interesting error occurred, something that I had never seen before when using e2fsck

# e2fsck /dev/vdb1
e2fsck 1.42.13 (17-May-2015)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/vdb1

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Superblock invalid, trying backup blocks

Panic

I think I lost 1 TB of files!

At that point, I informed my team to raise awareness.

partition_panic.png

Yes I know, I was a bit sad at the moment. I’ve done this work a million times before, also the Impostor Syndrome kicked in!

Snapshot

I was lucky enough because I could create a snapshot, de-attach the disk from the VM, create a new disk from the snapshot and work on the new (test) disk to try recovering 1TB of lost files!

Make File System

mke2fs has a dry-run option that will show us the superblocks:

mke2fs 1.42.13 (17-May-2015)
Creating filesystem with 367001344 4k blocks and 91750400 inodes
Filesystem UUID: f130f422-2ad7-4f36-a6cb-6984da34ead1
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Testing super blocks

so I created a small script to test every super block against /dev/vdb1

e2fsck  -b  32768      /dev/vdb1
e2fsck  -b  98304      /dev/vdb1
e2fsck  -b  163840     /dev/vdb1
e2fsck  -b  229376     /dev/vdb1
e2fsck  -b  294912     /dev/vdb1
e2fsck  -b  819200     /dev/vdb1
e2fsck  -b  884736     /dev/vdb1
e2fsck  -b  1605632    /dev/vdb1
e2fsck  -b  2654208    /dev/vdb1
e2fsck  -b  4096000    /dev/vdb1
e2fsck  -b  7962624    /dev/vdb1
e2fsck  -b  11239424   /dev/vdb1
e2fsck  -b  20480000   /dev/vdb1
e2fsck  -b  23887872   /dev/vdb1
e2fsck  -b  71663616   /dev/vdb1
e2fsck  -b  78675968   /dev/vdb1
e2fsck  -b  102400000  /dev/vdb1
e2fsck  -b  214990848  /dev/vdb1

Unfortunalyt none of the above commands worked!

last-ditch recovery method

There is a nuclear option DO NOT DO IT

mke2fs -S /dev/vdb1

Write superblock and group descriptors only. This is useful if all of the superblock and backup superblocks are corrupted, and a last-ditch recovery method is desired. It causes mke2fs to reinitialize the superblock and group descriptors, while not touching the inode table and the block and inode bitmaps.

Then e2fsck -y -f /dev/vdb1 moved 1TB of files under lost+found with their inode as the name of every file.

I cannot stress this enough: DO NOT DO IT !

Misalignment

So what is the issue?

See the difference of fdisk on 1TB and 1.4TB

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1        2048 2936012799 2936010752  1.4T 83 Linux

The First sector is now at 2048 instead of 1.

Okay delete disk, create a new one from the snapshot and try again.

Fdisk Part Two

Now it is time to manually put the first sector on 1.

# fdisk /dev/vdb

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p

Disk /dev/vdb: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0004e2c8

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1        2048 2936012799 2936010752  1.4T 83 Linux

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-2936012799, default 2048): 1
Value out of range.

Value out of range.

damn it!

sfdisk

In our SRE team, we use something like a Bat-Signal to ask for All hands on a problem and that was what we were doing. A colleague made a point that fdisk is not the best tool for the job, but we should use sfdisk instead. I actually use sfdisk to create backups and restore partition tables but I was trying not to deviate from the documentation and I was not sure that everybody knew how to use sfdisk.

So another colleague suggested to use a similar 1TB disk from another VM.
I could hear the gears in my mind working…

sfdisk export partition table

sfdisk -d /dev/vdb > vdb.out

# fdisk -l /dev/vdb
Disk /dev/vdb: 1000 GiB, 1073741824000 bytes, 2097152000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0009e732

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux

# sfdisk -d /dev/vdb > vdb.out

# cat vdb.out
label: dos
label-id: 0x0009e732
device: /dev/vdb
unit: sectors

/dev/vdb1 : start=           1, size=  2097151999, type=83

okay we have something here to work with, start sector is 1 and the geometry is 1TB for an ext file system. Identically to the initial partition table (before using fdisk).

sfdisk restore partition table

sfdisk /dev/vdb < vdb.out

# sfdisk /dev/vdb < vdb.out

Checking that no-one is using this disk right now ... OK

Disk /dev/vdb: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0004e2c8

Old situation:

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1        2048 2936012799 2936010752  1.4T 83 Linux

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x0009e732.
Created a new partition 1 of type 'Linux' and of size 1000 GiB.
/dev/vdb2:
New situation:

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

# fdisk -l /dev/vdb
Disk /dev/vdb: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0009e732

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux

Filesystem Check ?

# e2fsck -f /dev/vdb1
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
SATADISK: 766227/65536000 files (1.9% non-contiguous), 200102796/262143999 blocks

f#ck YES

Mount ?

# mount /dev/vdb1 /mnt

# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1       985G  748G  187G  81% /mnt

f3ck Yeah !!

Extend geometry

It is time to extend the partition geometry to 1.4TB with sfdisk.
If you remember from the fdisk output

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux
/dev/vdb1        2048 2936012799 2936010752  1.4T 83 Linux

We have 2936010752 sectors in total.
The End sector of 1.4T is 2936012799
Simple math problem: End Sector - Sectors = 2936012799 - 2936010752 = 2047

The previous fdisk command, had the Start Sector at 2048,
So 2048 - 2047 = 1 the preferable Start Sector!

New sfdisk

By editing the text vdb.out file to re-present our new situation:

# diff vdb.out vdb.out.14
6c6
< /dev/vdb1 : start=           1, size=  2097151999, type=83
---
> /dev/vdb1 : start=           1, size=  2936010752, type=83

1.4TB

Let’s put everything together

# sfdisk /dev/vdb < vdb.out.14
Checking that no-one is using this disk right now ... OK

Disk /dev/vdb: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0009e732

Old situation:

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2097151999 2097151999 1000G 83 Linux

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x0009e732.
Created a new partition 1 of type 'Linux' and of size 1.4 TiB.
/dev/vdb2:
New situation:

Device     Boot Start        End    Sectors  Size Id Type
/dev/vdb1           1 2936010752 2936010752  1.4T 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

# e2fsck /dev/vdb1
e2fsck 1.42.13 (17-May-2015)
SATADISK: clean, 766227/65536000 files, 200102796/262143999 blocks

# e2fsck -f /dev/vdb1
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
SATADISK: 766227/65536000 files (1.9% non-contiguous), 200102796/262143999 blocks

# resize2fs /dev/vdb1
resize2fs 1.42.13 (17-May-2015)
Resizing the filesystem on /dev/vdb1 to 367001344 (4k) blocks.
The filesystem on /dev/vdb1 is now 367001344 (4k) blocks long.

# mount /dev/vdb1 /mnt

# df -h  /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1       1.4T  748G  561G  58%  /mnt

Finally!!

Partition Alignment

By the way, you can read this amazing article to fully understand why this happened:

Partition Alignment