Jul
08
2019
Repair a Faulty Disk in Raid-5

Quick notes

Identify slow disk

# hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   2502 MB in  2.00 seconds = 1251.34 MB/sec
 Timing buffered disk reads: 538 MB in  3.01 seconds = 178.94 MB/sec

# hdparm -Tt /dev/sdb

/dev/sdb:
 Timing cached reads:   2490 MB in  2.00 seconds = 1244.86 MB/sec
 Timing buffered disk reads: 536 MB in  3.01 seconds = 178.31 MB/sec

# hdparm -Tt /dev/sdc

/dev/sdc:
 Timing cached reads:   2524 MB in  2.00 seconds = 1262.21 MB/sec
 Timing buffered disk reads: 538 MB in  3.00 seconds = 179.15 MB/sec

# hdparm -Tt /dev/sdd

/dev/sdd:
 Timing cached reads:   2234 MB in  2.00 seconds = 1117.20 MB/sec
 Timing buffered disk reads: read(2097152) returned 929792 bytes

 

Set disk to Faulty State and Remove it

#  mdadm --manage /dev/md0 --fail /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md0

#  mdadm --manage /dev/md0 --remove  /dev/sdd
mdadm: hot removed /dev/sdd from /dev/md0

Verify Status

# mdadm --verbose --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Feb  6 15:06:34 2014
     Raid Level : raid5
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Jul  8 00:51:14 2019
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ServerOne:0  (local to host ServerOne)
           UUID : d635095e:50457059:7e6ccdaf:7da91c9b
         Events : 18122

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       6       8       32        1      active sync   /dev/sdc
       4       0        0        4      removed
       4       8        0        3      active sync   /dev/sda

Format Disk

  • quick format to identify bad blocks,
  • better solution zeroing the disk
# mkfs.ext4 -cc -v  /dev/sdd 
  • middle ground to use -c

-c Check the device for bad blocks before creating the file system. If this option is specified twice, then a slower read-write test is used instead of a fast read-only test.

# mkfs.ext4 -c -v  /dev/sdd 

output:

Running command: badblocks -b 4096 -X -s /dev/sdd 244190645
Checking for bad blocks (read-only test):   9.76% done, 7:37 elapsed

Remove ext headers

# dd if=/dev/zero of=/dev/sdd bs=4096 count=4096

Using dd to remove any ext headers

Test disk


# hdparm -Tt /dev/sdd

/dev/sdd:
 Timing cached reads:   2174 MB in  2.00 seconds = 1087.20 MB/sec
 Timing buffered disk reads: 516 MB in  3.00 seconds = 171.94 MB/sec

Add Disk to Raid


# mdadm --add /dev/md0 /dev/sdd
mdadm: added /dev/sdd

Speed

# hdparm -Tt /dev/md0

/dev/md0:
 Timing cached reads:   2480 MB in  2.00 seconds = 1239.70 MB/sec
 Timing buffered disk reads: 1412 MB in  3.00 seconds = 470.62 MB/sec

Status


# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[5] sda[4] sdc[6] sdb[0]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  0.0% (44032/976631296) finish=369.5min speed=44032K/sec

unused devices: <none>

Verify Raid


# mdadm --verbose --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Feb  6 15:06:34 2014
     Raid Level : raid5
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon Jul  8 00:58:38 2019
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 0% complete

           Name : ServerOne:0  (local to host ServerOne)
           UUID : d635095e:50457059:7e6ccdaf:7da91c9b
         Events : 18244

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       6       8       32        1      active sync   /dev/sdc
       5       8       48        2      spare rebuilding   /dev/sdd
       4       8        0        3      active sync   /dev/sda
Tag(s): mdadm, raid5
Jul
03
2019
Down the troubleshooting rabbit-hole

Hardware Details

HP ProLiant MicroServer
AMD Turion(tm) II Neo N54L Dual-Core Processor
Memory Size: 2 GB - DIMM Speed: 1333 MT/s
Maximum Capacity: 8 GB

Running 24×7 from 23/08/2010, so nine years!

N54L

 

Prologue

The above server started it’s life on CentOS 5 and ext3. Re-formatting to run CentOS 6.x with ext4 on 4 x 1TB OEM Hard Disks with mdadm raid-5. That provided 3 TB storage with Fault tolerance 1-drive failure. And believe me, I used that setup to zeroing broken disks or replacing faulty disks.

 

As we are reaching the end of CentOS 6.x and there is no official dist-upgrade path for CentOS, and still waiting for CentOS 8.x, I made decision to switch to Ubuntu 18.04 LTS. At that point this would be the 3rd official OS re-installation of this server. I chose ubuntu so that I can dist-upgrade from LTS to LTS.

 

This is a backup server, no need for huge RAM, but for a reliable system. On that storage I have 2m files that in retrospect are not very big. So with the re-installation I chose to use xfs instead of ext4 filesystem.

 

I am also running an internal snapshot mechanism to have delta for every day and that pushed the storage usage to 87% of the 3Tb. If you do the math, 2m is about 1.2Tb usage, we need a full initial backup, so 2.4Tb (80%) and then the daily (rotate) incremental backups are ~210Mb per day. That gave me space for five (5) daily snapshots aka a work-week.

To remove this impediment, I also replaced the disks with WD Red Pro 6TB 7200rpm disks, and use raid-1 instead of raid-5. Usage is now ~45%

 

Problem

Frozen System

From time to time, this very new, very clean, very reliable system froze to death!

When attached monitor & keyboard no output. Strange enough I can ping the network interfaces but I can not ssh to the server or even telnet (nc) to ssh port. Awkward! Okay - hardware cold reboot then.

As this system is remote … in random times, I need to ask from someone to cold-reboot this machine. Awkward again.

Kernel Panic

If that was not enough, this machine also has random kernel panics.

damn_disk.jpeg

 

Errors

Let’s start troubleshooting this system

# journalctl -p 3 -x

 

Important Errors

ERST: Failed to get Error Log Address Range.
APEI: Can not request [mem 0x7dfab650-0x7dfab6a3] for APEI BERT registers
ipmi_si dmi-ipmi-si.0: Could not set up I/O space

and more important Errors:

INFO: task kswapd0:40 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task xfsaild/dm-0:761 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:2:3612 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/1:0:5327 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task rm:5901 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:1:5902 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/0:0:5906 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kswapd0:40 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task xfsaild/dm-0:761 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:2:3612 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

 

First impressions ?

damn.jpeg

 

BootOptions

After a few (hours) of internet research the suggestion is to disable

  • ACPI stands for Advanced Configuration and Power Interface.
  • APIC stands for Advanced Programmable Interrupt Controller.

This site is very helpful for ubuntu, although Red Hat still has a huge advanced on describing kernel options better than canonical.

Grub

# vim /etc/default/grub
GRUB_CMDLINE_LINUX="noapic acpi=off"

then

# update-grub
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/50-curtin-settings.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-54-generic
Found initrd image: /boot/initrd.img-4.15.0-54-generic
Found linux image: /boot/vmlinuz-4.15.0-52-generic
Found initrd image: /boot/initrd.img-4.15.0-52-generic
done

Verify

# grep noapic /boot/grub/grub.cfg | head -1

        linux   /boot/vmlinuz-4.15.0-54-generic root=UUID=0c686739-e859-4da5-87a2-dfd5fcccde3d ro noapic acpi=off maybe-ubiquity

reboot and check again:

#  journalctl -p 3 -xb
-- Logs begin at Thu 2019-03-14 19:26:12 EET, end at Wed 2019-07-03 21:31:08 EEST. --
Jul 03 21:30:49 servertwo kernel: ipmi_si dmi-ipmi-si.0: Could not set up I/O space

okay !!!

 

ipmi_si

Unfortunately I could not find anything useful regarding

# dmesg | grep -i ipm
[   10.977914] ipmi message handler version 39.2
[   11.188484] ipmi device interface
[   11.203630] IPMI System Interface driver.
[   11.203662] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[   11.203665] ipmi_si: SMBIOS: mem 0x0 regsize 1 spacing 1 irq 0
[   11.203667] ipmi_si: Adding SMBIOS-specified kcs state machine
[   11.203729] ipmi_si: Trying SMBIOS-specified kcs state machine at mem address 0x0, slave address 0x20, irq 0
[   11.203732] ipmi_si dmi-ipmi-si.0: Could not set up I/O space

# ipmitool list
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

# lsmod | grep -i ipmi
ipmi_si                61440  0
ipmi_devintf           20480  0
ipmi_msghandler        53248  2 ipmi_devintf,ipmi_si

 

blocked for more than 120 seconds.

But let’s try to fix the timeout warnings:

INFO: task kswapd0:40 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

if you search online the above message, most of the sites will suggest to tweak dirty pages for your system.

This is the most common response across different sites:

This is a know bug. By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing 120 seconds. This especially happens on systems with a lot of memory.

Okay this may be the problem but we do not have a lot of memory, only 2GB RAM and 2GB Swap. But even then, our vm.dirty_ratio = 20 setting is 20% instead of 40%.

 

But I have the ability to cross-check ubuntu 18.04 with CentOS 6.10 to compare notes:

 

ubuntu 18.04

# uname -r
4.15.0-54-generic

# sysctl -a | egrep -i  'swap|dirty|raid'|sort
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirtytime_expire_seconds = 43200
vm.dirty_writeback_centisecs = 500
vm.swappiness = 60

 

CentOS 6.11

#  uname -r
2.6.32-754.15.3.el6.centos.plus.x86_64

# sysctl -a | egrep -i  'swap|dirty|raid'|sort
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.swappiness = 60

 

Scheduler for Raid

This is the best online documentation on the
optimize raid

Comparing notes we see that both systems have the same settings, even when the kernel version is a lot different, 2.6.32 Vs 4.15.0 !!!

Researching on raid optimization there is a note of kernel scheduler.

 

Ubuntu 18.04

# for drive in {a..c}; do cat /sys/block/sd${drive}/queue/scheduler; done

noop deadline [cfq]
noop deadline [cfq]
noop deadline [cfq] 

 

CentOS 6.11

# for drive in {a..d}; do cat /sys/block/sd${drive}/queue/scheduler; done

noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq] 

 

Anticipatory scheduling

CentOS supports Anticipatory scheduling on the hard disks but nowadays anticipatory scheduler is not supported in modern kernel versions.

That said, from the above output we can verify that both systems are running the default scheduler cfq.

Disks

Ubuntu 18.04

  • Western Digital Red Pro WDC WD6003FFBX-6
# for i in sd{b..c} ; do hdparm -Tt  /dev/$i; done

/dev/sdb:
 Timing cached reads:   2344 MB in  2.00 seconds = 1171.76 MB/sec
 Timing buffered disk reads: 738 MB in  3.00 seconds = 245.81 MB/sec

/dev/sdc:
 Timing cached reads:   2264 MB in  2.00 seconds = 1131.40 MB/sec
 Timing buffered disk reads: 774 MB in  3.00 seconds = 257.70 MB/sec

CentOS 6.11

  • Seagate ST1000DX001
/dev/sdb:
 Timing cached reads:   2490 MB in  2.00 seconds = 1244.86 MB/sec
 Timing buffered disk reads: 536 MB in  3.01 seconds = 178.31 MB/sec

/dev/sdc:
 Timing cached reads:   2524 MB in  2.00 seconds = 1262.21 MB/sec
 Timing buffered disk reads: 538 MB in  3.00 seconds = 179.15 MB/sec

/dev/sdd:
 Timing cached reads:   2452 MB in  2.00 seconds = 1226.15 MB/sec
 Timing buffered disk reads: 546 MB in  3.01 seconds = 181.64 MB/sec

 

So what I am missing ?

My initial personal feeling was the low memory. But after running a manual rsync I’ve realized that:

cpu

was load average: 0.87, 0.46, 0.19

mem

was (on high load), when hit 40% of RAM, started to use swap.

KiB Mem :  2008464 total,    77528 free,   635900 used,  1295036 buff/cache
KiB Swap:  2097148 total,  2096624 free,      524 used.  1184220 avail Mem 

So I tweaked a bit the swapiness and reduce it from 60% to 40%

and run a local snapshot (that is a bit heavy on the disks) and doing an upgrade and trying to increase CPU load. Still everything is fine !

I will keep an eye on this story.

fantastic

 

Jun
20
2019
Tools I use daily the Win10 edition

a couple years ago I wrote an article about the Tools I use daily. The last nine (9) months I am partial using win10 due to new job challenges and thus I would like to write an updated version on my previous article.

 

I will try to use the same structure for comparison as the previous article although this a nine to five setup (work related). So here it goes.

 

Operating System

I use Win10 as my primary operating system in my work laptop. I have two impediments that can not work on a laptop distribution:

  • WebEx
  • MS Office

We are using webex as our primary communication tool. We are sharing our screen and have our video camera on, so that we can see each other. And we have a lot of meetings that integrate with our company’s outlook. I can use OWA as an alternative but in fact it is difficult to use both of them in a linux desktop.

I have considered to use a VM but a win10 vm needs more than 4G of RAM and a couple of CPUs just to boot up. In that case, means that I have to reduce my work laptop resources for half the day, every day. So for the time being I am staying with Win10 as the primary operating system.

 

Desktop

Default Win10 desktop

 

Disk / Filesystem

Default Win10 filesystem with bitlocker. Every HW change will lock the entire system and in the past just was the case!

Dropbox as a cloud sync software, an encfs partition and syncthing for secure personal syncing files.

 

Mail

Mostly OWA for calendar purposes and … still thunderbird for primary reading mails.

 

Shell

WSL … waiting for the official WSLv2 ! This is a huge HUGE upgrade for windows. I have setup an archlinux WSL environment to continue work on a linux environment, I mean bash. I use my WSL archlinux as a jumphost to my VMs.

 

Editor

Using Visual Studio Code for any python (or any other) scripting code file. Vim within WSL and notepad for temp text notes. The last year I have switched to Boostnote and markdown notes.

 

Browser

Multiple Instances of firefox, chromium, firefox Nightly, Tor Browser and Brave

 

Communication

I use mostly slack and signal-desktop. We are using webex but I prefer zoom. Less and less riot-matrix.

 

Media

VLC for windows, what else ? Also gimp for image editing. I have switched to spotify for music and draw for diagrams. Last I use CPod for podcasts.

 

In conclusion

I have switched to a majority of electron applications. I use the same applications on my Linux boxes. Encrypted notes on boostnote, synced over syncthing. Same browsers, same bash/shell, the only thing I dont have on my linux boxes are webex and outlook. Consider everything else, I think it is a decent setup across every distro.

 

Tag(s): win10
Jun
10
2019
MariaDB Galera Cluster on Ubuntu 18.04.2 LTS

MariaDB Galera Cluster on Ubuntu 18.04.2 LTS

Last Edit: 2019 06 11
Thanks to Manolis Kartsonakis for the extra info.

 

Official Notes here:
MariaDB Galera Cluster

a Galera Cluster is a synchronous multi-master cluster setup. Each node can act as master. The XtraDB/InnoDB storage engine can sync its data using rsync. Each data transaction gets a Global unique Id and then using Write Set REPLication the nodes can sync data across each other. When a new node joins the cluster the State Snapshot Transfers (SSTs) synchronize full data but in Incremental State Transfers (ISTs) only the missing data are synced.

With this setup we can have:

  • Data Redundancy
  • Scalability
  • Availability

galeracluster.png

 

Installation

In Ubuntu 18.04.2 LTS three packages should exist in every node.
So run the below commands in all of the nodes - change your internal IPs accordingly

as root

# apt -y install mariadb-server
# apt -y install galera-3
# apt -y install rsync

host file

as root

# echo 10.10.68.91 gal1 >> /etc/hosts
# echo 10.10.68.92 gal2 >> /etc/hosts
# echo 10.10.68.93 gal3 >> /etc/hosts

 

Storage Engine

Start the MariaDB/MySQL in one node and check the default storage engine. It should be

MariaDB [(none)]> show variables like 'default_storage_engine';

or

echo "SHOW Variables like 'default_storage_engine';" | mysql
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| default_storage_engine | InnoDB |
+------------------------+--------+

 

Architecture

A Galera Cluster should be behind a Load Balancer (proxy) and you should never talk with a node directly.

galeracluster_elb.png

Galera Configuration

Now copy the below configuration file in all 3 nodes:

/etc/mysql/conf.d/galera.cnf
[mysqld]
binlog_format=ROW
default-storage-engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://10.10.68.91,10.10.68.92,10.10.68.93"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="10.10.68.91"
wsrep_node_name="gal1"

Per Node

Be careful the last 2 lines should change to each node:

Node 01

# Galera Node Configuration
wsrep_node_address="10.10.68.91"
wsrep_node_name="gal1"

Node 02

# Galera Node Configuration
wsrep_node_address="10.10.68.92"
wsrep_node_name="gal2"

Node 03

# Galera Node Configuration
wsrep_node_address="10.10.68.93"
wsrep_node_name="gal3"

 

Galera New Cluster

We are ready to create our galera cluster:

galera_new_cluster

or

mysqld --wsrep-new-cluster

JournalCTL

Jun 10 15:01:20 gal1 systemd[1]: Starting MariaDB 10.1.40 database server...
Jun 10 15:01:24 gal1 sh[2724]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Jun 10 15:01:24 gal1 mysqld[2865]: 2019-06-10 15:01:24 139897056971904 [Note] /usr/sbin/mysqld (mysqld 10.1.40-MariaDB-0ubuntu0.18.04.1) starting as process 2865 ...
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2906]: Upgrading MySQL tables if necessary.
Jun 10 15:01:24 gal1 systemd[1]: Started MariaDB 10.1.40 database server.
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: Looking for 'mysql' as: /usr/bin/mysql
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: This installation of MySQL is already upgraded to 10.1.40-MariaDB, use --force if you still need to run mysql_upgrade
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2918]: Checking for insecure root accounts.
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2922]: WARNING: mysql.user contains 4 root accounts without password or plugin!
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2923]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
# ss -at '( sport = :mysql )'

State                Recv-Q                Send-Q                                Local Address:Port                                  Peer Address:Port
LISTEN               0                     80                                        127.0.0.1:mysql                                      0.0.0.0:*         
# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id     1
wsrep_cluster_size        1
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          d67e5b7c-8b90-11e9-ba3d-23ea221848fd
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

 

Second Node

systemctl restart mariadb.service
root@gal2:~# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     2
wsrep_cluster_size        2
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          a5eaae3e-8b91-11e9-9662-0bbe68c7d690
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

 

Third Node

systemctl restart mariadb.service
root@gal3:~# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     3
wsrep_cluster_size        3
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          013e1847-8b92-11e9-9055-7ac5e2e6b947
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

 

Primary Component (PC)

The last node in the cluster -in theory- has all the transactions. That means it should be the first to start next time from a power-off.

State

cat /var/lib/mysql/grastate.dat

eg.

# GALERA saved state
version: 2.1
uuid:    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
seqno:   -1
safe_to_bootstrap: 0

if safe_to_bootstrap: 1 then you can bootstrap this node as Primary.

 

Common Mistakes

Sometimes DBAs want to setup a new cluster (lets say upgrade into a new scheme - non compatible with the previous) so they want a clean state/directory. The most common way is to move the current mysql directory

mv /var/lib/mysql /var/lib/mysql_BAK

If you try to start your galera node, it will fail:

# systemctl restart mariadb
WSREP: Failed to start mysqld for wsrep recovery:
[Warning] Can't create test file /var/lib/mysql/gal1.lower-test
Failed to start MariaDB 10.1.40 database server

You need to create and initialize the mysql directory first:

mkdir -pv /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
chmod 0755 /var/lib/mysql
mysql_install_db -u mysql

On another node, cluster_size = 2

# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     4
wsrep_cluster_size        2
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          a5eaae3e-8b91-11e9-9662-0bbe68c7d690
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

then:

# systemctl restart mariadb

rsync from the Primary:


Jun 10 15:19:00 gal1 rsyncd[3857]: rsyncd version 3.1.2 starting, listening on port 4444
Jun 10 15:19:01 gal1 rsyncd[3884]: connect from gal3 (192.168.122.93)
Jun 10 15:19:01 gal1 rsyncd[3884]: rsync to rsync_sst/ from gal3 (192.168.122.93)
Jun 10 15:19:01 gal1 rsyncd[3884]: receiving file list
#  echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     5
wsrep_cluster_size        3
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          12afa7bc-8b93-11e9-88fc-6f41be61a512
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

Be Aware: Try to keep your DATA directory to a seperated storage disk

 

Adding new Nodes

A healthy Quorum has an odd number of nodes. So when you scale your galera gluster consider adding two (2) at every step!

# echo 10.10.68.94 gal4 >> /etc/hosts
# echo 10.10.68.95 gal5 >> /etc/hosts

Data Replication will lock your donor-node so it is best to put-off your donor-node from your Load Balancer:

galeracluster_elb_donor.png

Then explicit point your donor-node to your new nodes by adding the below line in your configuration file:

wsrep_sst_donor= gal3

After the synchronization:

  • comment-out the above line
  • restart mysql service and
  • put all three nodes behind the Local Balancer

 

Split Brain

Find the node with the max

SHOW STATUS LIKE 'wsrep_last_committed';

and set it as master by

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

 

Weighted Quorum for Three Nodes

When configuring quorum weights for three nodes, use the following pattern:

node1: pc.weight = 4
node2: pc.weight = 3
node3: pc.weight = 2
node4: pc.weight = 1
node5: pc.weight = 0

eg.

SET GLOBAL wsrep_provider_options="pc.weight=3";

In the same VPC setting up pc.weight will avoid a split brain situation. In different regions, you can setup something like this:

node1: pc.weight = 2
node2: pc.weight = 2
node3: pc.weight = 2
  <->
node4: pc.weight = 1
node5: pc.weight = 1
node6: pc.weight = 1

 

Jun
08
2019
arch-audit

TIL: arch-audit

In archlinux there is a package named: arch-audit that is
an utility like pkg-audit based on Arch CVE Monitoring Team data.

 

Install

# pacman -Ss arch-audit
community/arch-audit 0.1.10-1

# sudo pacman -S arch-audit
resolving dependencies...
looking for conflicting packages...

Package (1)           New Version  Net Change  Download Size

community/arch-audit  0.1.10-1       1.96 MiB       0.57 MiB

Total Download Size:   0.57 MiB
Total Installed Size:  1.96 MiB

 

Run

  # arch-audit
Package docker is affected by CVE-2018-15664. High risk!
Package gettext is affected by CVE-2018-18751. High risk!
Package glibc is affected by CVE-2019-9169, CVE-2019-5155, CVE-2018-20796, CVE-2016-10739. High risk!
Package libarchive is affected by CVE-2019-1000020, CVE-2019-1000019, CVE-2018-1000880, CVE-2018-1000879, CVE-2018-1000878, CVE-2018-1000877. High risk!
Package libtiff is affected by CVE-2019-7663, CVE-2019-6128. Medium risk!
Package linux-lts is affected by CVE-2018-5391, CVE-2018-3646, CVE-2018-3620, CVE-2018-3615, CVE-2018-8897, CVE-2017-8824, CVE-2017-17741, CVE-2017-17450, CVE-2017-17448, CVE-2017-16644, CVE-2017-5753, CVE-2017-5715, CVE-2018-1121, CVE-2018-1120, CVE-2017-1000379, CVE-2017-1000371, CVE-2017-1000370, CVE-2017-1000365. High risk!
Package openjpeg2 is affected by CVE-2019-6988. Low risk!
Package python-yaml is affected by CVE-2017-18342. High risk!. Update to 5.1-1 from testing repos!
Package sdl is affected by CVE-2019-7638, CVE-2019-7637, CVE-2019-7636, CVE-2019-7635, CVE-2019-7578, CVE-2019-7577, CVE-2019-7576, CVE-2019-7575, CVE-2019-7574, CVE-2019-7573, CVE-2019-7572. High risk!
Package sdl2 is affected by CVE-2019-7638, CVE-2019-7637, CVE-2019-7636, CVE-2019-7635, CVE-2019-7578, CVE-2019-7577, CVE-2019-7576, CVE-2019-7575, CVE-2019-7574, CVE-2019-7573, CVE-2019-7572. High risk!
Package unzip is affected by CVE-2018-1000035. Low risk!
Tag(s): archlinux
Jun
06
2019
Using OpenTelekomCloud with the native python-openstackclient

 

Open Telekom Cloud (OTC) is Deutsche Telekom’s Cloud for Business Customers. I would suggest to visit the Cloud Infrastructure & Cloud Platform Solutions for more information but I will try to keep this a technical post.

 

cloud.png

 

In this post you will find my personal notes on how to use the native python openstack CLI client to access OTC from your console.

Notes are based on Ubuntu 18.04.2 LTS

otcextensions

 

Virtual Environment

Create an isolated python virtual environment (directory) to setup everything under there:

~>  mkdir -pv otc/ && cd otc/
mkdir: created directory 'otc/'

~>  virtualenv -p `which python3`  .

Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/ebal/otc/bin/python3
Also creating executable in /home/ebal/otc/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
~>   source bin/activate

(otc) ebal@ubuntu:otc~>
(otc) ebal@ubuntu:otc~> python -V
Python 3.6.7

 

OpenStack Dependencies

Install python openstack dependencies

  • openstack sdk

(otc) ebal@ubuntu:otc~> pip install python-openstacksdk

Collecting python-openstacksdk
Collecting openstacksdk (from python-openstacksdk)
  Downloading https://files.pythonhosted.org/packages/
  ...
  • openstack client

(otc) ebal@ubuntu:otc~> pip install python-openstackclient

Collecting python-openstackclient
  Using cached https://files.pythonhosted.org/packages
  ...

 

Install OTC Extensions

It’s time to install the otcextensions

(otc) ebal@ubuntu:otc~> pip install otcextensions

Collecting otcextensions
Requirement already satisfied: openstacksdk>=0.19.0 in ./lib/python3.6/site-packages (from otcextensions) (0.30.0)
...
Installing collected packages: otcextensions
Successfully installed otcextensions-0.6.3

 

List

(otc) ebal@ubuntu:otc~> pip list | egrep '^python|otc'

otcextensions          0.6.3
python-cinderclient    4.2.0
python-glanceclient    2.16.0
python-keystoneclient  3.19.0
python-novaclient      14.1.0
python-openstackclient 3.18.0
python-openstacksdk    0.5.2

 

Authentication

Create a new clouds.yaml with your OTC credentials

template:

clouds:
 otc:
   auth:
     username: 'USER_NAME'
     password: 'PASS'
     project_name: 'eu-de'
     auth_url: 'https://iam.eu-de.otc.t-systems.com:443/v3'
     user_domain_name: 'OTC00000000001000000xxx'
   interface: 'public'
   identity_api_version: 3 # !Important
   ak: 'AK_VALUE' # AK/SK pair for access to OBS
   sk: 'SK_VAL

 

OTC Connect

Let’s tested it !

(otc) ebal@ubuntu:otc~> openstack --os-cloud otc server list

+--------------------------------------+---------------+---------+-------------------------------------------------------------------+-----------------------------------+---------------+
| ID                                   | Name          | Status  | Networks                                                          | Image                             | Flavor        |
+--------------------------------------+---------------+---------+-------------------------------------------------------------------+-----------------------------------+---------------+
| XXXXXXXX-1234-4d7f-8097-YYYYYYYYYYYY | app00-prod    | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.72.110               | Standard_Ubuntu_16.04_latest      | s2.large.2    |
| XXXXXXXX-1234-4f7d-beaa-YYYYYYYYYYYY | app01-prod    | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.95                | Standard_Ubuntu_16.04_latest      | s2.medium.2   |
| XXXXXXXX-1234-4088-baa8-YYYYYYYYYYYY | app02-prod    | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.160               | Standard_Ubuntu_16.04_latest      | s2.large.4    |
| XXXXXXXX-1234-43f5-8a10-YYYYYYYYYYYY | web00-prod    | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.121               | Standard_Ubuntu_16.04_latest      | s2.xlarge.2   |
| XXXXXXXX-1234-41eb-aa0b-YYYYYYYYYYYY | web01-prod    | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.151               | Standard_Ubuntu_16.04_latest      | s2.large.4    |
| XXXXXXXX-1234-41f7-98ff-YYYYYYYYYYYY | web00-stage   | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.150               | Standard_Ubuntu_16.04_latest      | s2.large.4    |
| XXXXXXXX-1234-41b2-973f-YYYYYYYYYYYY | web01-stage   | ACTIVE  | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.120               | Standard_Ubuntu_16.04_latest      | s2.xlarge.2   |
| XXXXXXXX-1234-468f-a41c-YYYYYYYYYYYY | app00-stage   | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.70.111               | Community_Ubuntu_14.04_TSI_latest | s2.xlarge.2   |
| XXXXXXXX-1234-4fdf-8b4c-YYYYYYYYYYYY | app01-stage   | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.64.92                | Community_Ubuntu_14.04_TSI_latest | s1.large      |
| XXXXXXXX-1234-4e68-a86d-YYYYYYYYYYYY | app02-stage   | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.66.96                | Community_Ubuntu_14.04_TSI_latest | s2.xlarge.4   |
| XXXXXXXX-1234-475d-9a66-YYYYYYYYYYYY | web00-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.76.11, 10.44.23.18   | Community_Ubuntu_14.04_TSI_latest | c1.medium     |
| XXXXXXXX-1234-4dac-a6b1-YYYYYYYYYYYY | web01-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.14                | Community_Ubuntu_14.04_TSI_latest | s1.xlarge     |
| XXXXXXXX-1234-458e-8e21-YYYYYYYYYYYY | web02-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.13                | Community_Ubuntu_14.04_TSI_latest | s1.xlarge     |
| XXXXXXXX-1234-42c4-b953-YYYYYYYYYYYY | k8s02-prod    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.12                | Community_Ubuntu_14.04_TSI_latest | s1.xlarge     |
| XXXXXXXX-1234-4225-b6af-YYYYYYYYYYYY | k8s02-stage   | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.11                | Community_Ubuntu_14.04_TSI_latest | s1.xlarge     |
| XXXXXXXX-1234-4eb1-a596-YYYYYYYYYYYY | k8s02-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.64.14                | Community_Ubuntu_14.04_TSI_latest | s1.xlarge     |
| XXXXXXXX-1234-4222-b866-YYYYYYYYYYYY | k8s03-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.64.13                | Community_Ubuntu_14.04_TSI_latest | s1.xlarge     |
| XXXXXXXX-1234-453d-a9c5-YYYYYYYYYYYY | k8s04-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.10                | Community_Ubuntu_14.04_TSI_latest | s1.2xlarge    |
| XXXXXXXX-1234-4968-a2be-YYYYYYYYYYYY | k8s05-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.76.14, 10.44.22.138  | Community_Ubuntu_14.04_TSI_latest | c2.2xlarge    |
| XXXXXXXX-1234-4c71-a00f-YYYYYYYYYYYY | k8s07-test    | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.76.170               | Community_Ubuntu_14.04_TSI_latest | c1.medium     |
+--------------------------------------+---------------+---------+-------------------------------------------------------------------+-----------------------------------+---------------+

 

Load Balancers

(otc) ebal@ubuntu:~/otc~> openstack --os-cloud otc loadbalancer list

+--------------------------------------+----------------------------------+----------------------------------+---------------+---------------------+----------+
| id                                   | name                             | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+----------------------------------+----------------------------------+---------------+---------------------+----------+
| XXXXXXXX-ad99-4de0-d885-YYYYYYYYYYYY | aaccaacbddd1111eee5555aaaaa22222 | 44444bbbbbbb4444444cccccc3333333 | 100.100.10.143 | ACTIVE              | vlb      |
+--------------------------------------+----------------------------------+----------------------------------+---------------+---------------------+----------+

 

Tag(s): otc, python, openstack
May
04
2019
Hardening OpenSSH Server

start by reading:

man 5 sshd_config

 

CentOS 6.x

Ciphers aes128-ctr,aes192-ctr,aes256-ctr
KexAlgorithms diffie-hellman-group-exchange-sha256
MACs hmac-sha2-256,hmac-sha2-512

and change the below setting in /etc/sysconfig/sshd:

AUTOCREATE_SERVER_KEYS=RSAONLY

 

CentOS 7.x

Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
MACs umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512

 

Ubuntu 18.04.2 LTS

Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
MACs umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512
HostKeyAlgorithms ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256

 

Archlinux

Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
MACs umac-128-etm@openssh.com,hmac-sha2-512-etm@openssh.com
HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa

 

Renew SSH Host Keys

rm -f /etc/ssh/ssh_host_* && service sshd restart

 

Generating ssh moduli file

for advance users only

ssh-keygen -G /tmp/moduli -b 4096

ssh-keygen -T /etc/ssh/moduli -f /tmp/moduli 
Tag(s): openssh
Apr
01
2019
ansible assertations tests with Loop

This blog post is written mostly as a reminder to myself!

 

Loop Assert Time Tests with Regex

 

test.yml

 

---
- hosts: localhost
  connection: local
  become: no
  gather_facts: no

  tasks:
  - name: Assert Time
    assert:
      that:
        - item | regex_findall('^(?:\d|[01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$')
      msg: "starTime should be HH:MM:SS or HH:MM"
    with_items:
      - "21:00"
      - "22:22"
      - "3:3"
      - "03:3"
      - "03:03"
      - "03:03:03"
      - "24:00"
      - "22:22:22"
      - "23:00:60"
      - "24:24:24"
      - "00:00:00"
      - "21:21"
      - "foo"
      - "foo:bar"
      - "True"
      - "False"
      - "1:01"
      - "01:01"
    ignore_errors: True

# vim: sts=2 sw=2 ts=2 et

 

output

 

PLAY [localhost] **************************************************************

TASK [Assert Time] ************************************************************
ok: [localhost] => (item=21:00) => {
    "changed": false,
    "item": "21:00",
    "msg": "All assertions passed"
}
ok: [localhost] => (item=22:22) => {
    "changed": false,
    "item": "22:22",
    "msg": "All assertions passed"
}
failed: [localhost] (item=3:3) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "3:3",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=03:3) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "03:3",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=03:03) => {
    "changed": false,
    "item": "03:03",
    "msg": "All assertions passed"
}
ok: [localhost] => (item=03:03:03) => {
    "changed": false,
    "item": "03:03:03",
    "msg": "All assertions passed"
}
failed: [localhost] (item=24:00) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "24:00",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=22:22:22) => {
    "changed": false,
    "item": "22:22:22",
    "msg": "All assertions passed"
}
failed: [localhost] (item=23:00:60) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "23:00:60",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=24:24:24) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "24:24:24",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=00:00:00) => {
    "changed": false,
    "item": "00:00:00",
    "msg": "All assertions passed"
}
ok: [localhost] => (item=21:21) => {
    "changed": false,
    "item": "21:21",
    "msg": "All assertions passed"
}
failed: [localhost] (item=foo) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "foo",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=foo:bar) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "foo:bar",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=True) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "True",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=False) => {
    "assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
    "changed": false,
    "evaluated_to": false,
    "item": "False",
    "msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=1:01) => {
    "changed": false,
    "item": "1:01",
    "msg": "All assertions passed"
}
ok: [localhost] => (item=01:01) => {
    "changed": false,
    "item": "01:01",
    "msg": "All assertions passed"
}
...ignoring

PLAY RECAP ********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0   

 

Tag(s): ansible, loop, assert
Mar
10
2019
Generate a random root password aka Ansible Password Plugin

I was suspicious with a cron entry on a new ubuntu server cloud vm, so I ended up to be looking on the logs.

Authentication token is no longer valid; new one required

After a quick internet search,

# chage -l root

Last password change                                    : password must be changed
Password expires                                        : password must be changed
Password inactive                                       : password must be changed
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 7

due to the password must be changed on the root account, the cron entry does not run as it should.

This ephemeral image does not need to have a persistent known password, as the notes suggest, and it doesn’t! Even so, we should change to root password when creating the VM.

Ansible

Ansible have a password plugin that we can use with lookup.

TLDR; here is the task:

- name: Generate Random Password
    user:
      name: root
      password: "{{ lookup('password','/dev/null encrypt=sha256_crypt length=32') }}"

after ansible-playbook runs

# chage -l root

Last password change                                    : Mar 10, 2019
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 7

and cron entry now runs as it should.

Password Plugin

Let explain how password plugin works.

Lookup needs at-least two (2) variables, the plugin name and a file to store the output. Instead, we will use /dev/null not to persist the password to a file.

To begin with it, a test ansible playbook:

- hosts: localhost
  gather_facts: False
  connection: local

  tasks:
    - debug:
        msg: "{{ lookup('password', '/dev/null') }}"
      with_sequence: count=5

Output:

ok: [localhost] => (item=1) => {
    "msg": "dQaVE0XwWti,7HMUgq::"
}
ok: [localhost] => (item=2) => {
    "msg": "aT3zqg.KjLwW89MrAApx"
}
ok: [localhost] => (item=3) => {
    "msg": "4LBNn:fVw5GhXDWh6TnJ"
}
ok: [localhost] => (item=4) => {
    "msg": "v273Hbox1rkQ3gx3Xi2G"
}
ok: [localhost] => (item=5) => {
    "msg": "NlwzHoLj8S.Y8oUhcMv,"
}

Length

In password plugin we can also use length variable:

msg: "{{ lookup('password', '/dev/null length=32') }}"

output:

ok: [localhost] => (item=1) => {
    "msg": "4.PEb6ycosnyL.SN7jinPM:AC9w2iN_q"
}
ok: [localhost] => (item=2) => {
    "msg": "s8L6ZU_Yzuu5yOk,ISM28npot4.KwQrE"
}
ok: [localhost] => (item=3) => {
    "msg": "L9QvLyNTvpB6oQmcF8WVFy.7jE4Q1K-W"
}
ok: [localhost] => (item=4) => {
    "msg": "6DMH8KqIL:kx0ngFe8:ri0lTK4hf,SWS"
}
ok: [localhost] => (item=5) => {
    "msg": "ByW11i_66K_0mFJVB37Mq2,.fBflepP9"
}

Characters

We can define a specific type of python string constants

  • ascii_letters (ascii_lowercase and ascii_uppercase
  • ascii_lowercase
  • ascii_uppercase
  • digits
  • hexdigits
  • letters (lowercase and uppercase)
  • lowercase
  • octdigits
  • punctuation
  • printable (digits, letters, punctuation and whitespace
  • uppercase
  • whitespace

eg.

msg: "{{ lookup('password', '/dev/null length=32 chars=ascii_lowercase') }}"

ok: [localhost] => (item=1) => {
    "msg": "vwogvnpemtdobjetgbintcizjjgdyinm"
}
ok: [localhost] => (item=2) => {
    "msg": "pjniysksnqlriqekqbstjihzgetyshmp"
}
ok: [localhost] => (item=3) => {
    "msg": "gmeoeqncdhllsguorownqbynbvdusvtw"
}
ok: [localhost] => (item=4) => {
    "msg": "xjluqbewjempjykoswypqlnvtywckrfx"
}
ok: [localhost] => (item=5) => {
    "msg": "pijnjfcpjoldfuxhmyopbmgdmgdulkai"
}

Encrypt

We can also define the encryption hash. Ansible uses passlib so the unix active encrypt hash algorithms are:

  • passlib.hash.bcrypt - BCrypt
  • passlib.hash.sha256_crypt - SHA-256 Crypt
  • passlib.hash.sha512_crypt - SHA-512 Crypt

eg.

msg: "{{ lookup('password', '/dev/null length=32 chars=ascii_lowercase encrypt=sha512_crypt') }}"

ok: [localhost] => (item=1) => {
    "msg": "$6$BR96lZqN$jy.CRVTJaubOo6QISUJ9tQdYa6P6tdmgRi1/NQKPxwX9/Plp.7qETuHEhIBTZDTxuFqcNfZKtotW5q4H0BPeN."
}
ok: [localhost] => (item=2) => {
    "msg": "$6$ESf5xnWJ$cRyOuenCDovIp02W0eaBmmFpqLGGfz/K2jd1FOSVkY.Lsuo8Gz8oVGcEcDlUGWm5W/CIKzhS43xdm5pfWyCA4."
}
ok: [localhost] => (item=3) => {
    "msg": "$6$pS08v7j3$M4mMPkTjSwElhpY1bkVL727BuMdhyl4IdkGM7Mq10jRxtCSrNlT4cAU3wHRVxmS7ZwZI14UwhEB6LzfOL6pM4/"
}
ok: [localhost] => (item=4) => {
    "msg": "$6$m17q/zmR$JdogpVxY6MEV7nMKRot069YyYZN6g8GLqIbAE1cRLLkdDT3Qf.PImkgaZXDqm.2odmLN8R2ZMYEf0vzgt9PMP1"
}
ok: [localhost] => (item=5) => {
    "msg": "$6$tafW6KuH$XOBJ6b8ORGRmRXHB.pfMkue56S/9NWvrY26s..fTaMzcxTbR6uQW1yuv2Uy1DhlOzrEyfFwvCQEhaK6MrFFye."
}
Tag(s): ansible, password
Mar
03
2019
Scaling automation with ansible-pull

Ansible is a wonderful software to automatically configure your systems. The default mode of using ansible is Push Model.

 

Ansible Push

That means from your box, and only using ssh + python, you can configure your flee of machines.

 

Ansible is imperative. You define tasks in your playbooks, roles and they will run in a serial manner on the remote machines. The task will first check if needs to run and otherwise it will skip the action. And although we can use conditional to skip actions, tasks will perform all checks. For that reason ansible seems slow instead of other configuration tools. Ansible runs in serial mode the tasks but in psedo-parallel mode against the remote servers, to increase the speed. But sometimes you need to gather_facts and that would cost in execution time. There are solutions to cache the ansible facts in a redis (in memory key:value db) but even then, you need to find a work-around to speed your deployments.

But there is an another way, the Pull Mode!

 

Useful Reading Materials

to learn more on the subject, you can start reading these two articles on ansible-pull.

 

Pull Mode

So here how it looks:

Ansible Pull

 

You will first notice, that your ansible repository is moved from you local machine to an online git repository. For me, this is GitLab. As my git repo is private, I have created a Read-Only, time-limit, Deploy Token.

With that scenario, our (ephemeral - or not) VMs will pull their ansible configuration from the git repo and run the tasks locally. I usually build my infrastructure with Terraform by HashiCorp and make advance of cloud-init to initiate their initial configuration.

Cloud-init

The tail of my user-data.yml looks pretty much like this:

...
# Install packages
packages:
  - ansible

# Run ansible-pull
runcmd:
  - ansible-pull -U https://gitlab+deploy-token-XXXXX:YYYYYYYY@gitlab.com/username/myrepo.git 

Playbook

You can either create a playbook named with the hostname of the remote server, eg. node1.yml or use the local.yml as the default playbook name.

Here is an example that will also put ansible-pull into a cron entry. This is very useful because it will check for any changes in the git repo every 15 minutes and run ansible again.

- hosts: localhost

  tasks:
    - name: Ensure ansible-pull is running every 15 minutes
      cron:
        name: "ansible-pull"
        minute: "15"
        job: "ansible-pull -U https://gitlab+deploy-token-XXXXX:YYYYYYYY@gitlab.com/username/myrepo.git &> /dev/null"

    - name: Create a custom local vimrc file
      lineinfile:
        path: /etc/vim/vimrc.local
        line: 'set modeline'
        create: yes

    - name: Remove "cloud-init" package
      apt:
        name: "cloud-init"
        purge: yes
        state: absent

    - name: Remove useless packages from the cache
      apt:
        autoclean: yes

    - name: Remove dependencies that are no longer required
      apt:
        autoremove: yes

# vim: sts=2 sw=2 ts=2 et
Feb
27
2019
ansible tips

In this short blog post, I will share a few ansible tips:

Run a specific role

ansible -i hosts -m include_role -a name=roles/common all -b

Print Ansible Variables

ansible -m debug -a 'var=vars' 127.0.0.1 | sed "1 s/^.*$/{/" | jq .

Ansible Conditionals

Variable contains string

matching string in a variable

  vars:
      var1: "hello world"

  - debug:
        msg: " {{ var1 }} "
    when:  ........

old way

  when: var1.find("hello") != -1

deprecated way

  when: var1 | search("hello")

Correct way

  when: var1 is search("hello")

Multiple Conditionals

Logical And

      when:
        - ansible_distribution == "Archlinux"
        - ansible_hostname is search("myhome")

Numeric Conditionals

getting variable from command line (or from somewhere else)

   -  set_fact:
       my_variable_keepday: "{{ keepday | default(7)  | int }}"

    - name: something
      debug:
        msg: "keepday : {{  my_variable_keepday }}"
      when:
        - my_variable_keepday|int >= 0
        - my_variable_keepday|int <= 10

Validate Time variable with ansible

I need to validate a variable. It should be ‘%H:%M:%S’
My work-around is to convert it to datetime so that I can validate it:

  tasks:
    - debug:
        msg: "{{ startTime | to_datetime('%H:%M:%S') }}"

First example: 21:30:15

True: "msg": "1900-01-01 21:30:15"

Second example: ‘25:00:00′

could not be converted to an dict.
The error was: time data '25:00:00' does not match format '%H:%M:%S'
Tag(s): ansible
Feb
21
2019
ArchLinux WSL

 

This article will show how to install Arch Linux in Windows 10 under Windows Subsystem for Linux.

WSL

Prerequisite is to have enabled WSL on your Win10 and already reboot your machine.

You can enable WSL :

  • Windows Settings
  • Apps
  • Apps & features
  • Related settings -> Programs and Features (bottom)
  • Turn Windows features on or off (left)

 

wsl.png

 

Store

After rebooting your Win10, you can use Microsoft Store to install a Linux distribution like Ubuntu. Archlinux is not an official supported linux distribution thus this guide !

 

Launcher

The easiest way to install Archlinux (or any Linux distro) is to download the wsldl from github. This project provides a generic Launcher.exe and any rootfs as source base. First thing is to rename Launcher.exe to Archlinux.exe.

ebal@myworklaptop:~$ mkdir -pv Archlinux
mkdir: created directory 'Archlinux'

ebal@myworklaptop:~$ cd Archlinux/

ebal@myworklaptop:~/Archlinux$ curl -sL -o Archlinux.exe https://github.com/yuk7/wsldl/releases/download/18122700/Launcher.exe
ebal@myworklaptop:~/Archlinux$ ls -l
total 320
-rw-rw-rw- 1 ebal ebal 143147 Feb 21 20:40 Archlinux.exe

 

RootFS

Next step is to download the latest archlinux root filesystem and create a new rootfs.tar.gz archive file, as wsldl uses this type.

ebal@myworklaptop:~/Archlinux$ curl -sLO http://ftp.otenet.gr/linux/archlinux/iso/latest/archlinux-bootstrap-2019.02.01-x86_64.tar.gz

ebal@myworklaptop:~/Archlinux$ ls -l
total 147392
-rw-rw-rw- 1 ebal ebal    143147 Feb 21 20:40 Archlinux.exe
-rw-rw-rw- 1 ebal ebal 149030552 Feb 21 20:42 archlinux-bootstrap-2019.02.01-x86_64.tar.gz

ebal@myworklaptop:~/Archlinux$ sudo tar xf archlinux-bootstrap-2019.02.01-x86_64.tar.gz

ebal@myworklaptop:~/Archlinux$  cd root.x86_64/

ebal@myworklaptop:~/Archlinux/root.x86_64$ ls
README  bin  boot  dev  etc  home  lib  lib64  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

ebal@myworklaptop:~/Archlinux/root.x86_64$  sudo tar czf rootfs.tar.gz .
tar: .: file changed as we read it

ebal@myworklaptop:~/Archlinux/root.x86_64$ ls
README  bin  boot  dev  etc  home  lib  lib64  mnt  opt  proc  root  rootfs.tar.gz  run  sbin  srv  sys  tmp  usr  var

ebal@myworklaptop:~/Archlinux/root.x86_64$ du -sh rootfs.tar.gz
144M    rootfs.tar.gz

ebal@myworklaptop:~/Archlinux/root.x86_64$ sudo mv rootfs.tar.gz ../

ebal@myworklaptop:~/Archlinux/root.x86_64$ cd ..
ebal@myworklaptop:~/Archlinux$ ls
Archlinux.exe  archlinux-bootstrap-2019.02.01-x86_64.tar.gz  root.x86_64  rootfs.tar.gz

ebal@myworklaptop:~/Archlinux$
ebal@myworklaptop:~/Archlinux$ ls
Archlinux.exe  rootfs.tar.gz

ebal@myworklaptop:~$ mv Archlinux/ /mnt/c/Users/EvaggelosBalaskas/Downloads/ArchlinuxWSL
ebal@myworklaptop:~$

As you can see, I do a little clean up and I move the directory under windows filesystem.

 

Install & Verify

archwsl.png

 

Microsoft Windows [Version 10.0.17134.619]
(c) 2018 Microsoft Corporation. All rights reserved.

C:UsersEvaggelosBalaskas>cd Downloads/ArchlinuxWSL

C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL>dir
 Volume in drive C is Windows
 Volume Serial Number is 6C02-EE43

 Directory of C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL

21-Feb-19  21:04    <DIR>          .
21-Feb-19  21:04    <DIR>          ..
21-Feb-19  20:40           143,147 Archlinux.exe
21-Feb-19  20:52       150,178,551 rootfs.tar.gz
               2 File(s)    150,321,698 bytes
               2 Dir(s)  374,579,486,720 bytes free

C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL>Archlinux.exe
Installing...
Installation Complete!
Press any key to continue...

C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL>Archlinux.exe run uname -a
Linux myworklaptop 4.4.0-17134-Microsoft #523-Microsoft Mon Dec 31 17:49:00 PST 2018 x86_64 GNU/Linux

C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL>Archlinux.exe run cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"

C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL>Archlinux.exe run bash
[root@myworklaptop ArchlinuxWSL]#
[root@myworklaptop ArchlinuxWSL]# exit

 

Archlinux

C:UsersEvaggelosBalaskasDownloadsArchlinuxWSL>Archlinux.exe run bash
[root@myworklaptop ArchlinuxWSL]#

[root@myworklaptop ArchlinuxWSL]# date
Thu Feb 21 21:41:41 STD 2019

Remember, archlinux by default does not have any configuration. So you need to configure this instance !

Here are some basic configuration:

[root@myworklaptop ArchlinuxWSL]# echo nameserver 8.8.8.8 > /etc/resolv.conf

[root@myworklaptop ArchlinuxWSL]# cat > /etc/pacman.d/mirrorlist <<EOF
Server = http://ftp.otenet.gr/linux/archlinux/$repo/os/$arch
EOF

[root@myworklaptop ArchlinuxWSL]#  pacman-key --init

[root@myworklaptop ArchlinuxWSL]#  pacman-key --populate

[root@myworklaptop ArchlinuxWSL]# pacman -Syy

you are pretty much ready to use archlinux inside your windows 10 !!

 

Remove

You can remove Archlinux by simple:

 Archlinux.exe clean 

 

Default User

There is a simple way to use Archlinux within Windows Subsystem for Linux , by connecting with a default user.

But before configure ArchWSL, we need to create this user inside the archlinux instance:

[root@myworklaptop ArchWSL]# useradd -g 374 -u 374 ebal

[root@myworklaptop ArchWSL]# id ebal
uid=374(ebal) gid=374(ebal) groups=374(ebal)

[root@myworklaptop ArchWSL]# cp -rav /etc/skel/ /home/ebal
'/etc/skel/' -> '/home/ebal'
'/etc/skel/.bashrc' -> '/home/ebal/.bashrc'
'/etc/skel/.bash_profile' -> '/home/ebal/.bash_profile'
'/etc/skel/.bash_logout' -> '/home/ebal/.bash_logout'

chown -R ebal:ebal /home/ebal/

then exit the linux app and run:

> Archlinux.exe config --default-user ebal

and try to login again:

> Archlinux.exe run bash
[ebal@myworklaptop ArchWSL]$ 

[ebal@myworklaptop ArchWSL]$ cd ~

ebal@myworklaptop ~$ pwd -P
/home/ebal

 

Tag(s): archlinux, win10, WSL
Feb
11
2019
Flatpak

Flatpak is a software utility for software deployment, package management, and application virtualization for Linux desktop computers. It provides a sandbox environment in which users can run applications in isolation from the rest of the system.

… in a nutshell, it is an isolate software bundle package which you can run with restricted permissions!

 

User Vs System

We can install flatpak applications for system-wide or for single-user. The last part does not need administrative access or any special permissions.

To use flatpak as a user, we have to add --user next to every flatpak command.

 

Applications

A flatpak application has a manifest that describes dependancies & permissions.

 

Repositories

A repository contains a list of application manifests & the flatpak package (code).

 

Branches

One of the best features of flatpak, is that we can have multile versions of a specific application. This is being done by using a different branch or version (like git). Most common branch is default.

 

Add flathub

To use/install a flatpak application, we have first to add a remote flatpack repository localy.
The most well-known flatpak repository is called: flathub

flatpak remote-add --user flathub https://dl.flathub.org/repo/flathub.flatpakrepo

 

Search Applications

$ flatpak search --user Signal

Description                                                                                     Application                                 Version              Branch              Remotes
Signal - Private messenger for the desktop                                                      org.signal.Signal                           1.20.0               stable              flathub

 

Install package as user

The default syntax of install packages is:

repository - application

$ flatpak install --user -y flathub com.dropbox.Client
$ flatpak install --user -y flathub org.signal.Signal

 

List

See how many packages to you have installed

$ flatpak list

Application
___________

com.dropbox.Client
com.jetbrains.PyCharm-Community
com.slack.Slack
com.visualstudio.code.oss
io.atom.Atom
org.signal.Signal

 

Run

To execute a flatpak application :

flatpak run com.slack.Slack
flatpak run org.signal.Signal

 

Update

and finally how to update your flatpack packages:

$ flatpak --user -y update

 

Remove Unused Packages

$ flatpak uninstall --user --unused

 

Uninstall flatpak package

$ flatpak uninstall --user com.visualstudio.code.oss

 

Tag(s): flatpak
Feb
11
2019
exit codes & I/O Redirection

TLDR; Exit status value does not change when using redirection.

~> false
~> echo $?
1

~> true
~> echo $?
0

~> false > /dev/null
~> echo $?
1

~> true > /dev/null
~> echo $?
0

~> false 1> /dev/null
~> echo $?
1

~> true 1> /dev/null
~> echo $?
0

~> false 2> /dev/null
~> echo $?
1

~> true 2> /dev/null
~> echo $?
0

~> false &> /dev/null
~> echo $?
1

~> true &> /dev/null
~> echo $?
0

~> false 2>&1 >/dev/null
~> echo $?
1

~> true 2>&1 /dev/null
~> echo $?
0

~> false < /dev/null > /dev/null
~> echo $?
1

~> true < /dev/null > /dev/null
~> echo $?
0
Tag(s): bash
Jan
21
2019
Using Terraform and cloud-init on Hetzner

Using Terraform by HashiCorp and cloud-init on Hetzner cloud provider.

Nowadays with the help of modern tools, we use our infrastructure as code. This approach is very useful because we can have Immutable design with our infra by declaring the state would like our infra to be. This also provide us with flexibility and a more generic way on how to handle our infra as lego bricks, especially on scaling.

UPDATE: 2019.01.22

 

Hetzner

We need to create an Access API Token within a new project under the console of hetzner cloud.

hetzner_token.png

Copy this token and with that in place we can continue with terraform.
For the purposes of this article, I am going to use as the API token: 01234567890

 

Install Terraform

the latest terraform version at the time of writing this blog post is: v.11.11

$ curl -sL https://releases.hashicorp.com/terraform/0.11.11/terraform_0.11.11_linux_amd64.zip |
   bsdtar -xf- && chmod +x terraform
$ sudo mv terraform /usr/local/bin/

and verify it

$ terraform version
Terraform v0.11.11

 

Terraform Provider for Hetzner Cloud

To use the hetzner cloud via terraform, we need the terraform-provider-hcloud plugin.

hcloud, is part of terraform providers repository. So the first time of initialize our project, terraform will download this plugin locally.

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "hcloud" (1.7.0)...
...
* provider.hcloud: version = "~> 1.7"

 

Compile hcloud

If you like, you can always build hcloud from the source code.
There are notes on how to build the plugin here Terraform Hetzner Cloud provider.

GitLab CI

or you can even download the artifact from my gitlab-ci repo.

Plugin directory

You will find the terraform hcloud plugin under your current directory:

./.terraform/plugins/linux_amd64/terraform-provider-hcloud_v1.7.0_x4

I prefer to copy the tf plugins centralized under my home directory:

$ mkdir -pv ~/.terraform/plugins/linux_amd64/
$ mv ./.terraform/plugins/linux_amd64/terraform-provider-hcloud_v1.7.0_x4 ~/.terraform.d/plugins/linux_amd64/terraform-provider-hcloud

or if you choose the artifact from gitlab:

$ curl -sL -o ~/.terraform/plugins/linux_amd64/terraform-provider-hcloud https://gitlab.com/ebal/terraform-provider-hcloud-ci/-/jobs/artifacts/master/raw/bin/terraform-provider-hcloud?job=run-build

That said, when working with multiple terraform projects you may be in a position that you need different versions of the same tf-plugin. In that case it is better to have them under your current working directory/project instead of your home directory. Perhaps one project needs v1.2.3 and another v4.5.6 of the same tf-plugin.

 

Hetzner Cloud API

Here is a few examples on how to use the Hetzner Cloud API:

$ export -p API_TOKEN="01234567890"

$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/datacenters | jq -r .datacenters[].name
fsn1-dc8
nbg1-dc3
hel1-dc2
fsn1-dc14
$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/locations | jq -r .locations[].name
fsn1
nbg1
hel1
$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/images | jq -r .images[].name
ubuntu-16.04
debian-9
centos-7
fedora-27
ubuntu-18.04
fedora-28

 

hetzner.tf

At this point, we are ready to write our terraform file.
It can be as simple as this (CentOS 7):

# Set the variable value in *.tfvars file
# or using -var="hcloud_token=..." CLI option
variable "hcloud_token" {}

# Configure the Hetzner Cloud Provider
provider "hcloud" {
  token = "${var.hcloud_token}"
}

# Create a new server running centos
resource "hcloud_server" "node1" {
  name = "node1"
  image = "centos-7"
  server_type = "cx11"
}

 

Project_Ebal

or a more complex config: Ubuntu 18.04 LTS

# Project_Ebal
variable "hcloud_token" {}

# Configure the Hetzner Cloud Provider
provider "hcloud" {
  token = "${var.hcloud_token}"
}

# Create a new server running centos
resource "hcloud_server" "Project_Ebal" {
  name = "ebal_project"
  image = "ubuntu-18.04"
  server_type = "cx11"
  location = "nbg1"
}

 

Repository Structure

Although in this blog post we have a small and simple example of using hetzner cloud with terraform, on larger projects is usually best to have separated terraform files for variables, code and output. For more info, you can take a look here: VCS Repository Structure - Workspaces

  ├── variables.tf
  ├── main.tf
  ├── outputs.tf

 

Cloud-init

To use cloud-init with hetzner is very simple.
We just need to add this declaration user_data = "${file("user-data.yml")}" to terraform file.
So our previous tf is now this:

# Project_Ebal
variable "hcloud_token" {}

# Configure the Hetzner Cloud Provider
provider "hcloud" {
  token = "${var.hcloud_token}"
}

# Create a new server running centos
resource "hcloud_server" "Project_Ebal" {
  name = "ebal_project"
  image = "ubuntu-18.04"
  server_type = "cx11"
  location = "nbg1"
  user_data = "${file("user-data.yml")}"
}

to get the IP_Address of the virtual machine, I would also like to have an output declaration:

output "ipv4_address" {
  value = "${hcloud_server.ebal_project.ipv4_address}"
}

 

Clout-init

You will find more notes on cloud-init on a previous blog post: Cloud-init with CentOS 7.

below is an example of user-data.yml

#cloud-config

disable_root: true
ssh_pwauth: no

users:
  - name: ubuntu
    ssh_import_id:
     - gh:ebal
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL

# Set TimeZone
timezone: Europe/Athens

# Install packages
packages:
  - mlocate
  - vim
  - figlet

# Update/Upgrade & Reboot if necessary
package_update: true
package_upgrade: true
package_reboot_if_required: true

# Remove cloud-init
runcmd:
  - figlet Project_Ebal > /etc/motd
  - updatedb

 

Terraform

First thing with terraform is to initialize our environment.

Init

$ terraform init

Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

 

Plan

Of course it is not necessary to plan and then plan with out.
You can skip this step, here exist only for documentation purposes.

$ terraform plan


Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + hcloud_server.ebal_project
      id:            <computed>
      backup_window: <computed>
      backups:       "false"
      datacenter:    <computed>
      image:         "ubuntu-18.04"
      ipv4_address:  <computed>
      ipv6_address:  <computed>
      ipv6_network:  <computed>
      keep_disk:     "false"
      location:      "nbg1"
      name:          "ebal_project"
      server_type:   "cx11"
      status:        <computed>
      user_data:     "sk6134s+ys+wVdGITc+zWhbONYw="

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

 

Out

$ terraform plan -out terraform.tfplan


Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + hcloud_server.ebal_project
      id:            <computed>
      backup_window: <computed>
      backups:       "false"
      datacenter:    <computed>
      image:         "ubuntu-18.04"
      ipv4_address:  <computed>
      ipv6_address:  <computed>
      ipv6_network:  <computed>
      keep_disk:     "false"
      location:      "nbg1"
      name:          "ebal_project"
      server_type:   "cx11"
      status:        <computed>
      user_data:     "sk6134s+ys+wVdGITc+zWhbONYw="

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: terraform.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "terraform.tfplan"

 

Apply

$ terraform apply "terraform.tfplan"

hcloud_server.ebal_project: Creating...
  backup_window: "" => "<computed>"
  backups:       "" => "false"
  datacenter:    "" => "<computed>"
  image:         "" => "ubuntu-18.04"
  ipv4_address:  "" => "<computed>"
  ipv6_address:  "" => "<computed>"
  ipv6_network:  "" => "<computed>"
  keep_disk:     "" => "false"
  location:      "" => "nbg1"
  name:          "" => "ebal_project"
  server_type:   "" => "cx11"
  status:        "" => "<computed>"
  user_data:     "" => "sk6134s+ys+wVdGITc+zWhbONYw="
hcloud_server.ebal_project: Still creating... (10s elapsed)
hcloud_server.ebal_project: Still creating... (20s elapsed)
hcloud_server.ebal_project: Creation complete after 23s (ID: 1676988)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

ipv4_address = 1.2.3.4

 

SSH and verify cloud-init

$ ssh 1.2.3.4 -l ubuntu

Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-43-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Fri Jan 18 12:17:14 EET 2019

  System load:  0.41              Processes:           89
  Usage of /:   9.7% of 18.72GB   Users logged in:     0
  Memory usage: 8%                IP address for eth0: 1.2.3.4
  Swap usage:   0%

0 packages can be updated.
0 updates are security updates.

project_ebal

 

Destroy

Be Careful without providing a specific terraform out plan, terraform will destroy every tfplan within your working directory/project. So it is always a good practice to explicit destroy a specify resource/tfplan.

$ terraform destroy should better be:

$ terraform destroy -out terraform.tfplan

hcloud_server.ebal_project: Refreshing state... (ID: 1676988)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  - hcloud_server.ebal_project

Plan: 0 to add, 0 to change, 1 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

hcloud_server.ebal_project: Destroying... (ID: 1676988)
hcloud_server.ebal_project: Destruction complete after 1s

Destroy complete! Resources: 1 destroyed.

 

That’s it !

 

Dec
18
2018
Dropbox To End Sync Support For All Filesystems Except Ext4 on Linux

I am only using btrfs for the last few years, without any problem. Drobox’s decision is based on supporting Extended file attributes and even so btrfs supports extended attributes, seems you will get this error:

dropbox_error

I have the benefit of using encrypted disks via LUKS so in this blog post, I will only present a way to have an virtual disk with ext4, to your dropbox folder on-top of your btrfs!

 

Allocating disk space

Let’s say that your have 2G of dropbox space, allocate 2G of file size:

fallocate -l 2G Dropbox.img

you can verify the disk image by:

qemu-img info Dropbox.img

image: Dropbox.img
file format: raw
virtual size: 2.0G (2147483648 bytes)
disk size: 2.0G

 

Map Virtual Disk to Loop Device

Then map your virtual disk to a loop device:

sudo losetup `sudo losetup -f` Dropbox.img

verify it:

losetup -a

/dev/loop0: []: (/home/ebal/Dropbox.img)

then loop0 it is.

 

Create an ext4 filesystem

sudo mkfs.ext4 -L Dropbox /dev/loop0

Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 09c7eb41-b5f3-4c58-8965-5e366ddf4d97
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

 

Move Dropbox folder

You have to move your previous dropbox folder to another location. Dont forget to cloase dropbox-client and any other application that is using files from this folder.

mv Dropbox Dropbox.bak

renamed 'Dropbox' -> 'Dropbox.bak'

and then create a new Dropbox folder:

mkdir -pv Dropbox/

mkdir: created directory 'Dropbox/'

 

Mount the ext4 filesystem

sudo mount /dev/loop0 ~/Dropbox/

verify it:

df -h ~/Dropbox/

Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      2.0G  6.0M  1.8G   1% /home/ebal/Dropbox

and give the right perms:

sudo chown -R ebal:ebal ~/Dropbox/

 

Copy files

Now, copy the files from the old Dropbox folder to the new:

rsync -ravx Dropbox.bak/ Dropbox/

sent 464,823,778 bytes  received 13,563 bytes  309,891,560.67 bytes/sec
total size is 464,651,441  speedup is 1.00

and start dropbox-client

dropbox_running

Tag(s): dropbox
Dec
15
2018
Setup VLC Remote

Four Step Process

vlc_01.png
vlc_02.png
vlc_03.png

$ sudo iptables -nvL | grep 8765

0    0    ACCEPT    tcp  --  *    *    192.168.0.0/24    0.0.0.0/0    tcp dpt:8765
Tag(s): vlc
Nov
18
2018
Apple iOS Vs your Linux Mail, Contact and Calendar Server

The purpose of this blog post is to act as a visual guide/tutorial on how to setup an iOS device (iPad or iPhone) using the native apps against a custom Linux Mail, Calendar & Contact server.

Disclaimer: I wrote this blog post after 36hours with an apple device. I have never had any previous encagement with an apple product. Huge culture change & learning curve. Be aware, that the below notes may not apply to your setup.

Original creation date: Friday 12 Oct 2018
Last Update: Sunday 18 Nov 2018

 

Linux Mail Server

Notes are based on the below setup:

  • CentOS 6.10
  • Dovecot IMAP server with STARTTLS (TCP Port: 143) with Encrypted Password Authentication.
  • Postfix SMTP with STARTTLS (TCP Port: 587) with Encrypted Password Authentication.
  • Baïkal as Calendar & Contact server.

 

Thunderbird

Thunderbird settings for imap / smtp over STARTTLS and encrypted authentication

mail settings

 

Baikal

Dashboard

baikal dashboard

 

CardDAV

contact URI for user Username

https://baikal.baikal.example.org/html/card.php/addressbooks/Username/default

CalDAV

calendar URI for user Username

https://baikal.example.org/html/cal.php/calendars/Username/default

 

iOS

There is a lot of online documentation but none in one place. Random Stack Overflow articles & posts in the internet. It took me almost an entire day (and night) to figure things out. In the end, I enabled debug mode on my dovecot/postifx & apache web server. After that, throught trail and error, I managed to setup both iPhone & iPad using only native apps.

 

Mail

Open Password & Accounts & click on New Account

iPad_iOS_mail_01

Choose Other

iPad_iOS_mail_02

iPad_iOS_mail_03

iPad_iOS_mail_04

 

Now the tricky part, you have to click Next and fill the imap & smtp settings.

 

iPad_iOS_mail_05

iPad_iOS_mail_06

iPad_iOS_mail_07

 

Now we have to go back and change the settings, to enable STARTTLS and encrypted password authentication.

 

iPad_iOS_mail_08

iPad_iOS_mail_09

 

STARTTLS with Encrypted Passwords for Authentication

 

iPad_iOS_mail_10

iPad_iOS_mail_11

iPad_iOS_mail_12

iPad_iOS_mail_13

iPad_iOS_mail_14

iPad_iOS_mail_15

iPad_iOS_mail_16

 

In the home-page of the iPad/iPhone we will see the Mail-Notifications have already fetch some headers.

 

iPad_iOS_mail_17

and finally, open the native mail app:

iPad_iOS_mail_18

 

Contact Server

Now ready for setting up the contact account

https://baikal.baikal.example.org/html/card.php/addressbooks/Username/default

iPad_iOS_mail_19

iPad_iOS_mail_20

iPad_iOS_mail_21

iPad_iOS_mail_22

iPad_iOS_mail_23

 

Opening Contact App:

 

iPad_iOS_mail_24

 

Calendar Server

https://baikal.example.org/html/cal.php/calendars/Username/default

iPad_iOS_mail_25

iPad_iOS_mail_26

iPad_iOS_mail_27

iPad_iOS_mail_28

iPad_iOS_mail_29

iPad_iOS_mail_30

iPad_iOS_mail_31

 

Nov
18
2018
Cloud-init with CentOS 7

Cloud-init is the defacto multi-distribution package that handles early initialization of a cloud instance

This article is a mini-HowTo use cloud-init with centos7 in your own libvirt qemu/kvm lab, instead of using a public cloud provider.

 

How Cloud-init works

cloud-init.png

Josh Powers @ DebConf17

How really works?

Cloud-init has Boot Stages

  • Generator
  • Local
  • Network
  • Config
  • Final

and supports modules to extend configuration and support.

Here is a brief list of modules (sorted by name):

  • bootcmd
  • final-message
  • growpart
  • keys-to-console
  • locale
  • migrator
  • mounts
  • package-update-upgrade-install
  • phone-home
  • power-state-change
  • puppet
  • resizefs
  • rsyslog
  • runcmd
  • scripts-per-boot
  • scripts-per-instance
  • scripts-per-once
  • scripts-user
  • set_hostname
  • set-passwords
  • ssh
  • ssh-authkey-fingerprints
  • timezone
  • update_etc_hosts
  • update_hostname
  • users-groups
  • write-files
  • yum-add-repo

 

Gist

Cloud-init example using a Generic Cloud CentOS-7 on a libvirtd qmu/kvm lab · GitHub

 

Generic Cloud CentOS 7

You can find a plethora of centos7 cloud images here:

Download the latest version

$ curl -LO http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2.xz

Uncompress file

$ xz -v --keep -d CentOS-7-x86_64-GenericCloud.qcow2.xz

Check cloud image

$ qemu-img info CentOS-7-x86_64-GenericCloud.qcow2

image: CentOS-7-x86_64-GenericCloud.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 863M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

The default image is 8G.
If you need to resize it, check below in this article.

 

Create metadata file

meta-data are data that comes from the cloud provider itself. In this example, I will use static network configuration.

cat > meta-data <<EOF
instance-id: testingcentos7
local-hostname: testingcentos7

network-interfaces: |
  iface eth0 inet static
  address 192.168.122.228
  network 192.168.122.0
  netmask 255.255.255.0
  broadcast 192.168.122.255
  gateway 192.168.122.1

# vim:syntax=yaml
EOF

 

Crete cloud-init (userdata) file

user-data are data that comes from you aka the user.

cat > user-data <<EOF
#cloud-config

# Set default user and their public ssh key
# eg. https://github.com/ebal.keys
users:
  - name: ebal
    ssh-authorized-keys:
      - `curl -s -L https://github.com/ebal.keys`
    sudo: ALL=(ALL) NOPASSWD:ALL

# Enable cloud-init modules
cloud_config_modules:
  - resolv_conf
  - runcmd
  - timezone
  - package-update-upgrade-install

# Set TimeZone
timezone: Europe/Athens

# Set DNS
manage_resolv_conf: true
resolv_conf:
  nameservers: ['9.9.9.9']

# Install packages
packages:
  - mlocate
  - vim
  - epel-release

# Update/Upgrade & Reboot if necessary
package_update: true
package_upgrade: true
package_reboot_if_required: true

# Remove cloud-init
runcmd:
  - yum -y remove cloud-init
  - updatedb

# Configure where output will go
output:
  all: ">> /var/log/cloud-init.log"

# vim:syntax=yaml
EOF

 

Create the cloud-init ISO

When using libvirt with qemu/kvm the most common way to pass the meta-data/user-data to cloud-init, is through an iso (cdrom).

$ genisoimage -output cloud-init.iso -volid cidata -joliet -rock user-data meta-data

or

$ mkisofs -o cloud-init.iso -V cidata -J -r user-data meta-data

 

Provision new virtual machine

Finally run this as root:

# virt-install
    --name centos7_test
    --memory 2048
    --vcpus 1
    --metadata description="My centos7 cloud-init test"
    --import
    --disk CentOS-7-x86_64-GenericCloud.qcow2,format=qcow2,bus=virtio
    --disk cloud-init.iso,device=cdrom
    --network bridge=virbr0,model=virtio
    --os-type=linux
    --os-variant=centos7.0
    --noautoconsole

 

The List of Os Variants

There is an interesting command to find out all the os variants that are being supported by libvirt in your lab:

eg. CentOS

$ osinfo-query os | grep CentOS

centos6.0  |  CentOS  6.0  |  6.0  |  http://centos.org/centos/6.0
centos6.1  |  CentOS  6.1  |  6.1  |  http://centos.org/centos/6.1
centos6.2  |  CentOS  6.2  |  6.2  |  http://centos.org/centos/6.2
centos6.3  |  CentOS  6.3  |  6.3  |  http://centos.org/centos/6.3
centos6.4  |  CentOS  6.4  |  6.4  |  http://centos.org/centos/6.4
centos6.5  |  CentOS  6.5  |  6.5  |  http://centos.org/centos/6.5
centos6.6  |  CentOS  6.6  |  6.6  |  http://centos.org/centos/6.6
centos6.7  |  CentOS  6.7  |  6.7  |  http://centos.org/centos/6.7
centos6.8  |  CentOS  6.8  |  6.8  |  http://centos.org/centos/6.8
centos6.9  |  CentOS  6.9  |  6.9  |  http://centos.org/centos/6.9
centos7.0  |  CentOS  7.0  |  7.0  |  http://centos.org/centos/7.0

 

DHCP

If you are not using a static network configuration scheme, then to identify the IP of your cloud instance, type:

$ virsh net-dhcp-leases default

 Expiry Time           MAC address         Protocol   IP address           Hostname   Client ID or DUID
---------------------------------------------------------------------------------------------------------
 2018-11-17 15:40:31   52:54:00:57:79:3e   ipv4       192.168.122.144/24   -          -                  

 

Resize

The easiest way to grow/resize your virtual machine is via qemu-img command:

$ qemu-img resize CentOS-7-x86_64-GenericCloud.qcow2 20G

Image resized.

$ qemu-img info CentOS-7-x86_64-GenericCloud.qcow2

image: CentOS-7-x86_64-GenericCloud.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 870M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

You can add the below lines into your user-data file

growpart:
  mode: auto
  devices: ['/']
  ignore_growroot_disabled: false

The result:

[root@testingcentos7 ebal]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        20G  870M   20G   5% /

 

Default cloud-init.cfg

For reference, this is the default centos7 cloud-init configuration file.

# /etc/cloud/cloud.cfg 
users:
 - default

disable_root: 1
ssh_pwauth:   0

mount_default_fields: [~, ~, 'auto', 'defaults,nofail', '0', '2']
resize_rootfs_tmp: /dev
ssh_deletekeys:   0
ssh_genkeytypes:  ~
syslog_fix_perms: ~

cloud_init_modules:
 - migrator
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - rsyslog
 - users-groups
 - ssh

cloud_config_modules:
 - mounts
 - locale
 - set-passwords
 - rh_subscription
 - yum-add-repo
 - package-update-upgrade-install
 - timezone
 - puppet
 - chef
 - salt-minion
 - mcollective
 - disable-ec2-metadata
 - runcmd

cloud_final_modules:
 - rightscale_userdata
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

system_info:
  default_user:
    name: centos
    lock_passwd: true
    gecos: Cloud User
    groups: [wheel, adm, systemd-journal]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  distro: rhel
  paths:
    cloud_dir: /var/lib/cloud
    templates_dir: /etc/cloud/templates
  ssh_svcname: sshd

# vim:syntax=yaml
Oct
28
2018
Linux Software RAID mismatch Warning

I use Linux Software RAID for years now. It is reliable and stable (as long as your hard disks are reliable) with very few problems. One recent issue -that the daily cron raid-check was reporting- was this:

 

WARNING: mismatch_cnt is not 0 on /dev/md0

 

Raid Environment

A few details on this specific raid setup:

RAID 5 with 4 Drives

with 4 x 1TB hard disks and according the online raid calculator:

RAID Calculator

raid5-4disks

that means this setup is fault tolerant and cheap but not fast.

 

Raid Details

# /sbin/mdadm --detail /dev/md0

raid configuration is valid

/dev/md0:
        Version : 1.2
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sat Oct 27 04:38:04 2018
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ServerTwo:0  (local to host ServerTwo)
           UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
         Events : 60352

    Number   Major   Minor   RaidDevice State
       4       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       6       8       48        2      active sync   /dev/sdd
       5       8        0        3      active sync   /dev/sda

 

Examine Verbose Scan

with a more detailed output:

# mdadm -Evvvvs

there are a few Bad Blocks, although it is perfectly normal for a two (2) year disks to have some. smartctl is a tool you need to use from time to time.

/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953266096 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258984 sectors, after=3504 sectors
          State : clean
    Device UUID : bdd41067:b5b243c6:a9b523c4:bc4d4a80

    Update Time : Sun Oct 28 09:04:01 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 6baa02c9 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

/dev/sde:
   MBR Magic : aa55
Partition[0] :      8388608 sectors at         2048 (type 82)
Partition[1] :    226050048 sectors at      8390656 (type 83)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258992 sectors, after=3504 sectors
          State : clean
    Device UUID : a90e317e:43848f30:0de1ee77:f8912610

    Update Time : Sun Oct 28 09:04:01 2018
       Checksum : 30b57195 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258984 sectors, after=3504 sectors
          State : clean
    Device UUID : ad7315e5:56cebd8c:75c50a72:893a63db

    Update Time : Sun Oct 28 09:04:01 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b928adf1 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258984 sectors, after=3504 sectors
          State : clean
    Device UUID : f4e1da17:e4ff74f0:b1cf6ec8:6eca3df1

    Update Time : Sun Oct 28 09:04:01 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : bbe3e7e8 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

MisMatch Warning

WARNING: mismatch_cnt is not 0 on /dev/md0

So this is not a critical error, rather tells us that there are a few blocks that are “Not Synced Yet” across all disks.

 

Status

Checking the Multiple Device (md) driver status:

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

We verify that none job is running on the raid.

 

Repair

We can run a manual repair job:

# echo repair >/sys/block/md0/md/sync_action

now status looks like:

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=========>...........]  resync = 45.6% (445779112/976631296) finish=54.0min speed=163543K/sec

unused devices: <none>

Progress

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [============>........]  resync = 63.4% (619673060/976631296) finish=38.2min speed=155300K/sec

unused devices: <none>
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [================>....]  resync = 81.9% (800492148/976631296) finish=21.6min speed=135627K/sec

unused devices: <none>

Finally

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

 

Check

After repair is it useful to check again the status of our software raid:

# echo check >/sys/block/md0/md/sync_action

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  check =  9.5% (92965776/976631296) finish=91.0min speed=161680K/sec

unused devices: <none>

and finally

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
Tag(s): md0, mdadm, linux, raid