Quick notes
Identify slow disk
# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 2502 MB in 2.00 seconds = 1251.34 MB/sec
Timing buffered disk reads: 538 MB in 3.01 seconds = 178.94 MB/sec
# hdparm -Tt /dev/sdb
/dev/sdb:
Timing cached reads: 2490 MB in 2.00 seconds = 1244.86 MB/sec
Timing buffered disk reads: 536 MB in 3.01 seconds = 178.31 MB/sec
# hdparm -Tt /dev/sdc
/dev/sdc:
Timing cached reads: 2524 MB in 2.00 seconds = 1262.21 MB/sec
Timing buffered disk reads: 538 MB in 3.00 seconds = 179.15 MB/sec
# hdparm -Tt /dev/sdd
/dev/sdd:
Timing cached reads: 2234 MB in 2.00 seconds = 1117.20 MB/sec
Timing buffered disk reads: read(2097152) returned 929792 bytes
Set disk to Faulty State and Remove it
# mdadm --manage /dev/md0 --fail /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md0
# mdadm --manage /dev/md0 --remove /dev/sdd
mdadm: hot removed /dev/sdd from /dev/md0
Verify Status
# mdadm --verbose --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu Feb 6 15:06:34 2014
Raid Level : raid5
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Jul 8 00:51:14 2019
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : ServerOne:0 (local to host ServerOne)
UUID : d635095e:50457059:7e6ccdaf:7da91c9b
Events : 18122
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
6 8 32 1 active sync /dev/sdc
4 0 0 4 removed
4 8 0 3 active sync /dev/sda
Format Disk
- a quick format to identify bad blocks, or
- a better solution: zeroing the disk
# mkfs.ext4 -cc -v /dev/sdd
- the middle ground is to use a single -c
-c Check the device for bad blocks before creating the file system. If this option is specified twice, then a slower read-write test is used instead of a fast read-only test.
# mkfs.ext4 -c -v /dev/sdd
output:
Running command: badblocks -b 4096 -X -s /dev/sdd 244190645
Checking for bad blocks (read-only test): 9.76% done, 7:37 elapsed
Remove ext headers
# dd if=/dev/zero of=/dev/sdd bs=4096 count=4096
Using dd to remove any ext headers
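A quick way to double-check the wipe (not part of the original notes) is wipefs, which lists any filesystem or RAID signatures still present on the disk; empty output means the dd pass removed them:
wipefs /dev/sdd          # read-only, prints any remaining signatures
wipefs --all /dev/sdd    # if something is still reported, erase all signatures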
Test disk
# hdparm -Tt /dev/sdd
/dev/sdd:
Timing cached reads: 2174 MB in 2.00 seconds = 1087.20 MB/sec
Timing buffered disk reads: 516 MB in 3.00 seconds = 171.94 MB/sec
Add Disk to Raid
# mdadm --add /dev/md0 /dev/sdd
mdadm: added /dev/sdd
Speed
# hdparm -Tt /dev/md0
/dev/md0:
Timing cached reads: 2480 MB in 2.00 seconds = 1239.70 MB/sec
Timing buffered disk reads: 1412 MB in 3.00 seconds = 470.62 MB/sec
Status
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[5] sda[4] sdc[6] sdb[0]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
[>....................] recovery = 0.0% (44032/976631296) finish=369.5min speed=44032K/sec
unused devices: <none>
Verify Raid
# mdadm --verbose --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu Feb 6 15:06:34 2014
Raid Level : raid5
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Mon Jul 8 00:58:38 2019
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Rebuild Status : 0% complete
Name : ServerOne:0 (local to host ServerOne)
UUID : d635095e:50457059:7e6ccdaf:7da91c9b
Events : 18244
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
6 8 32 1 active sync /dev/sdc
5 8 48 2 spare rebuilding /dev/sdd
4 8 0 3 active sync /dev/sda
Hardware Details
HP ProLiant MicroServer
AMD Turion(tm) II Neo N54L Dual-Core Processor
Memory Size: 2 GB - DIMM Speed: 1333 MT/s
Maximum Capacity: 8 GB
Running 24×7 from 23/08/2010, so nine years!
Prologue
The above server started its life on CentOS 5 and ext3. It was later re-formatted to run CentOS 6.x with ext4 on 4 x 1TB OEM hard disks with mdadm raid-5. That provided 3 TB of storage with fault tolerance for a single drive failure. And believe me, I used that setup plenty of times to zero broken disks or replace faulty ones.
As we are reaching the end of CentOS 6.x, there is no official dist-upgrade path for CentOS, and CentOS 8.x is still not out, I made the decision to switch to Ubuntu 18.04 LTS. This would be the 3rd official OS re-installation of this server. I chose Ubuntu so that I can dist-upgrade from LTS to LTS.
This is a backup server; it does not need huge RAM, but it does need to be reliable. On that storage I have 2m files that, in retrospect, are not very big. So with the re-installation I chose to use the xfs filesystem instead of ext4.
I am also running an internal snapshot mechanism to keep a delta for every day, and that pushed the storage usage to 87% of the 3Tb. If you do the math: the 2m files are about 1.2Tb, we need a full initial backup, so 2.4Tb (80%), and then the daily (rotating) incremental backups are ~210Mb per day. That gave me space for five (5) daily snapshots, aka a work-week.
To remove this impediment, I also replaced the disks with WD Red Pro 6TB 7200rpm disks and switched to raid-1 instead of raid-5. Usage is now ~45%.
Problem
Frozen System
From time to time, this very new, very clean, very reliable system froze to death!
With a monitor & keyboard attached there was no output. Strangely enough, I could ping the network interfaces but I could not ssh to the server or even telnet (nc) to the ssh port. Awkward! Okay - hardware cold reboot then.
As this system is remote, at random times I need to ask someone to cold-reboot this machine. Awkward again.
Kernel Panic
If that was not enough, this machine also has random kernel panics.
Errors
Let’s start troubleshooting this system
# journalctl -p 3 -x
Important Errors
ERST: Failed to get Error Log Address Range.
APEI: Can not request [mem 0x7dfab650-0x7dfab6a3] for APEI BERT registers
ipmi_si dmi-ipmi-si.0: Could not set up I/O space
and, more importantly, these errors:
INFO: task kswapd0:40 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task xfsaild/dm-0:761 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:2:3612 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/1:0:5327 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task rm:5901 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:1:5902 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/0:0:5906 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kswapd0:40 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task xfsaild/dm-0:761 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:2:3612 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
First impressions ?
BootOptions
After a few (hours of) internet research, the suggestion is to disable:
- ACPI stands for Advanced Configuration and Power Interface.
- APIC stands for Advanced Programmable Interrupt Controller.
This site is very helpful for ubuntu, although Red Hat still has a huge advantage over Canonical in describing kernel options.
Grub
# vim /etc/default/grub
GRUB_CMDLINE_LINUX="noapic acpi=off"
then
# update-grub
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/50-curtin-settings.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-54-generic
Found initrd image: /boot/initrd.img-4.15.0-54-generic
Found linux image: /boot/vmlinuz-4.15.0-52-generic
Found initrd image: /boot/initrd.img-4.15.0-52-generic
done
Verify
# grep noapic /boot/grub/grub.cfg | head -1
linux /boot/vmlinuz-4.15.0-54-generic root=UUID=0c686739-e859-4da5-87a2-dfd5fcccde3d ro noapic acpi=off maybe-ubiquity
reboot and check again:
# journalctl -p 3 -xb
-- Logs begin at Thu 2019-03-14 19:26:12 EET, end at Wed 2019-07-03 21:31:08 EEST. --
Jul 03 21:30:49 servertwo kernel: ipmi_si dmi-ipmi-si.0: Could not set up I/O space
okay !!!
ipmi_si
Unfortunately I could not find anything useful regarding this ipmi_si error.
# dmesg | grep -i ipm
[ 10.977914] ipmi message handler version 39.2
[ 11.188484] ipmi device interface
[ 11.203630] IPMI System Interface driver.
[ 11.203662] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[ 11.203665] ipmi_si: SMBIOS: mem 0x0 regsize 1 spacing 1 irq 0
[ 11.203667] ipmi_si: Adding SMBIOS-specified kcs state machine
[ 11.203729] ipmi_si: Trying SMBIOS-specified kcs state machine at mem address 0x0, slave address 0x20, irq 0
[ 11.203732] ipmi_si dmi-ipmi-si.0: Could not set up I/O space
# ipmitool list
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
# lsmod | grep -i ipmi
ipmi_si 61440 0
ipmi_devintf 20480 0
ipmi_msghandler 53248 2 ipmi_devintf,ipmi_si
blocked for more than 120 seconds.
But let’s try to fix the timeout warnings:
INFO: task kswapd0:40 blocked for more than 120 seconds.
Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
If you search online for the above message, most of the sites will suggest tweaking the dirty pages settings of your system.
This is the most common response across different sites:
This is a known bug. By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk, causing all following IOs to go synchronous. For flushing this data out to disk there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data within 120 seconds. This especially happens on systems with a lot of memory.
Okay, this may be the problem, but we do not have a lot of memory: only 2GB RAM and 2GB swap. And even then, our vm.dirty_ratio = 20 setting is already 20% instead of 40%.
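If the dirty-page flushing really were the bottleneck, the usual workaround is to make the kernel flush earlier and in smaller batches by lowering the dirty thresholds. A hedged sketch (the values and the sysctl.d file name are illustrative, not what this server ended up needing):
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10

# persist across reboots
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.d/90-dirty.conf
echo 'vm.dirty_ratio = 10' >> /etc/sysctl.d/90-dirty.conf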
But I have the ability to cross-check ubuntu 18.04 with CentOS 6.10 to compare notes:
ubuntu 18.04
# uname -r
4.15.0-54-generic
# sysctl -a | egrep -i 'swap|dirty|raid'|sort
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirtytime_expire_seconds = 43200
vm.dirty_writeback_centisecs = 500
vm.swappiness = 60
CentOS 6.10
# uname -r
2.6.32-754.15.3.el6.centos.plus.x86_64
# sysctl -a | egrep -i 'swap|dirty|raid'|sort
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.swappiness = 60
Scheduler for Raid
This is the best online documentation on how to optimize raid.
Comparing notes, we see that both systems have the same settings, even though the kernel versions are very different: 2.6.32 Vs 4.15.0 !!!
Researching raid optimization, there is a note about the kernel I/O scheduler.
Ubuntu 18.04
# for drive in {a..c}; do cat /sys/block/sd${drive}/queue/scheduler; done
noop deadline [cfq]
noop deadline [cfq]
noop deadline [cfq]
CentOS 6.10
# for drive in {a..d}; do cat /sys/block/sd${drive}/queue/scheduler; done
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
Anticipatory scheduling
CentOS supports anticipatory scheduling on the hard disks, but the anticipatory scheduler is no longer supported in modern kernel versions.
That said, from the above output we can verify that both systems are running the default scheduler cfq.
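For reference, the active scheduler can be switched at runtime by writing to the same sysfs file. This is only a sketch of how one could test deadline on the RAID member disks; it does not persist across reboots unless added to the kernel command line or a udev rule:
for drive in {a..c}; do
  echo deadline > /sys/block/sd${drive}/queue/scheduler
  cat /sys/block/sd${drive}/queue/scheduler
done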
Disks
Ubuntu 18.04
- Western Digital Red Pro WDC WD6003FFBX-6
# for i in sd{b..c} ; do hdparm -Tt /dev/$i; done
/dev/sdb:
Timing cached reads: 2344 MB in 2.00 seconds = 1171.76 MB/sec
Timing buffered disk reads: 738 MB in 3.00 seconds = 245.81 MB/sec
/dev/sdc:
Timing cached reads: 2264 MB in 2.00 seconds = 1131.40 MB/sec
Timing buffered disk reads: 774 MB in 3.00 seconds = 257.70 MB/sec
CentOS 6.10
- Seagate ST1000DX001
/dev/sdb:
Timing cached reads: 2490 MB in 2.00 seconds = 1244.86 MB/sec
Timing buffered disk reads: 536 MB in 3.01 seconds = 178.31 MB/sec
/dev/sdc:
Timing cached reads: 2524 MB in 2.00 seconds = 1262.21 MB/sec
Timing buffered disk reads: 538 MB in 3.00 seconds = 179.15 MB/sec
/dev/sdd:
Timing cached reads: 2452 MB in 2.00 seconds = 1226.15 MB/sec
Timing buffered disk reads: 546 MB in 3.01 seconds = 181.64 MB/sec
So what am I missing?
My initial personal feeling was the low memory. But after running a manual rsync I realized that:
cpu
the load average was: 0.87, 0.46, 0.19
mem
on high load, when it hit 40% of RAM, it started to use swap.
KiB Mem : 2008464 total, 77528 free, 635900 used, 1295036 buff/cache
KiB Swap: 2097148 total, 2096624 free, 524 used. 1184220 avail Mem
So I tweaked the swappiness a bit and reduced it from 60% to 40%,
then ran a local snapshot (which is a bit heavy on the disks), did an upgrade, and tried to increase the CPU load. Still, everything is fine!
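The change itself is a one-liner; something along these lines (the sysctl.d file name is just a convention, not from the original notes):
sysctl -w vm.swappiness=40

# make it permanent across reboots
echo 'vm.swappiness = 40' > /etc/sysctl.d/99-swappiness.conf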
I will keep an eye on this story.
A couple of years ago I wrote an article about the tools I use daily. For the last nine (9) months I have been partially using Win10 due to new job challenges, and thus I would like to write an updated version of my previous article.
I will try to use the same structure for comparison as the previous article, although this is a nine-to-five (work related) setup. So here it goes.
Operating System
I use Win10 as the primary operating system on my work laptop. I have two impediments that can not work on a linux distribution:
- WebEx
- MS Office
We are using webex as our primary communication tool. We are sharing our screen and have our video camera on, so that we can see each other. And we have a lot of meetings that integrate with our company’s outlook. I can use OWA as an alternative, but in fact it is difficult to use both of them on a linux desktop.
I have considered using a VM, but a win10 vm needs more than 4G of RAM and a couple of CPUs just to boot up. That means I would have to reduce my work laptop's resources for half the day, every day. So for the time being I am staying with Win10 as the primary operating system.
Desktop
Default Win10 desktop
Disk / Filesystem
Default Win10 filesystem with bitlocker. Every HW change will lock the entire system, and in the past that was exactly the case!
Dropbox as a cloud sync software, an encfs partition and syncthing for secure personal syncing files.
Email
Mostly OWA for calendar purposes and … still thunderbird as the primary way of reading mails.
Shell
WSL … waiting for the official WSLv2! This is a huge, HUGE upgrade for windows. I have set up an archlinux WSL environment so I can continue to work in a linux environment, I mean bash. I use my WSL archlinux as a jumphost to my VMs.
Editor
Using Visual Studio Code for any python (or other) scripting code files. Vim within WSL and notepad for temporary text notes. In the last year I have switched to Boostnote and markdown notes.
Browser
Multiple Instances of firefox, chromium, firefox Nightly, Tor Browser and Brave
Communication
I use mostly slack and signal-desktop. We are using webex but I prefer zoom. Less and less riot-matrix.
Media
VLC for windows, what else? Also gimp for image editing. I have switched to spotify for music and draw for diagrams. Lastly, I use CPod for podcasts.
In conclusion
I have switched to a majority of electron applications. I use the same applications on my Linux boxes: encrypted notes on boostnote, synced over syncthing. Same browsers, same bash/shell; the only things I don't have on my linux boxes are webex and outlook. Considering everything else, I think it is a decent setup across every distro.
MariaDB Galera Cluster on Ubuntu 18.04.2 LTS
Last Edit: 2019 06 11
Thanks to Manolis Kartsonakis for the extra info.
Official Notes here:
MariaDB Galera Cluster
A Galera Cluster is a synchronous multi-master cluster setup. Each node can act as master. The XtraDB/InnoDB storage engine can sync its data using rsync. Each data transaction gets a globally unique id and then, using Write Set REPLication, the nodes sync data across each other. When a new node joins the cluster, State Snapshot Transfers (SSTs) synchronize the full data set, while Incremental State Transfers (ISTs) sync only the missing data.
With this setup we can have:
- Data Redundancy
- Scalability
- Availability
Installation
In Ubuntu 18.04.2 LTS
three packages should exist on every node.
So run the below commands on all of the nodes - and change your internal IPs accordingly.
as root
# apt -y install mariadb-server
# apt -y install galera-3
# apt -y install rsync
host file
as root
# echo 10.10.68.91 gal1 >> /etc/hosts
# echo 10.10.68.92 gal2 >> /etc/hosts
# echo 10.10.68.93 gal3 >> /etc/hosts
Storage Engine
Start MariaDB/MySQL on one node and check the default storage engine. It should be InnoDB:
MariaDB [(none)]> show variables like 'default_storage_engine';
or
echo "SHOW Variables like 'default_storage_engine';" | mysql
+------------------------+--------+
| Variable_name | Value |
+------------------------+--------+
| default_storage_engine | InnoDB |
+------------------------+--------+
Architecture
A Galera Cluster should be behind a Load Balancer (proxy) and you should never talk with a node directly.
Galera Configuration
Now copy the below configuration file in all 3 nodes:
/etc/mysql/conf.d/galera.cnf
[mysqld]
binlog_format=ROW
default-storage-engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Galera Cluster Configuration
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://10.10.68.91,10.10.68.92,10.10.68.93"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
# Galera Node Configuration
wsrep_node_address="10.10.68.91"
wsrep_node_name="gal1"
Per Node
Be careful the last 2 lines should change to each node:
Node 01
# Galera Node Configuration
wsrep_node_address="10.10.68.91"
wsrep_node_name="gal1"
Node 02
# Galera Node Configuration
wsrep_node_address="10.10.68.92"
wsrep_node_name="gal2"
Node 03
# Galera Node Configuration
wsrep_node_address="10.10.68.93"
wsrep_node_name="gal3"
Galera New Cluster
We are ready to create our galera cluster:
galera_new_cluster
or
mysqld --wsrep-new-cluster
JournalCTL
Jun 10 15:01:20 gal1 systemd[1]: Starting MariaDB 10.1.40 database server...
Jun 10 15:01:24 gal1 sh[2724]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Jun 10 15:01:24 gal1 mysqld[2865]: 2019-06-10 15:01:24 139897056971904 [Note] /usr/sbin/mysqld (mysqld 10.1.40-MariaDB-0ubuntu0.18.04.1) starting as process 2865 ...
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2906]: Upgrading MySQL tables if necessary.
Jun 10 15:01:24 gal1 systemd[1]: Started MariaDB 10.1.40 database server.
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: Looking for 'mysql' as: /usr/bin/mysql
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: This installation of MySQL is already upgraded to 10.1.40-MariaDB, use --force if you still need to run mysql_upgrade
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2918]: Checking for insecure root accounts.
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2922]: WARNING: mysql.user contains 4 root accounts without password or plugin!
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2923]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
# ss -at '( sport = :mysql )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 80 127.0.0.1:mysql 0.0.0.0:*
# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id 1
wsrep_cluster_size 1
wsrep_cluster_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status Primary
wsrep_gcomm_uuid d67e5b7c-8b90-11e9-ba3d-23ea221848fd
wsrep_local_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready ON
Second Node
systemctl restart mariadb.service
root@gal2:~# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id 2
wsrep_cluster_size 2
wsrep_cluster_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status Primary
wsrep_gcomm_uuid a5eaae3e-8b91-11e9-9662-0bbe68c7d690
wsrep_local_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready ON
Third Node
systemctl restart mariadb.service
root@gal3:~# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id 3
wsrep_cluster_size 3
wsrep_cluster_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status Primary
wsrep_gcomm_uuid 013e1847-8b92-11e9-9055-7ac5e2e6b947
wsrep_local_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready ON
Primary Component (PC)
The last node in the cluster -in theory- has all the transactions. That means it should be the first to start next time from a power-off.
State
cat /var/lib/mysql/grastate.dat
eg.
# GALERA saved state
version: 2.1
uuid: 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
seqno: -1
safe_to_bootstrap: 0
if safe_to_bootstrap: 1
then you can bootstrap this node as Primary.
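To decide which node that is after a full power-off, a small helper loop (assuming root ssh access to gal1..gal3 as set up above) can print the saved state from each node; the one with safe_to_bootstrap: 1, or otherwise the highest seqno, is the one to bootstrap:
for node in gal1 gal2 gal3; do
  echo "== ${node} =="
  ssh ${node} "grep -E 'seqno|safe_to_bootstrap' /var/lib/mysql/grastate.dat"
done

# then, only on the chosen node
galera_new_cluster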
Common Mistakes
Sometimes DBAs want to set up a new cluster (let's say an upgrade to a new schema, not compatible with the previous one), so they want a clean state/directory. The most common way is to move the current mysql directory:
mv /var/lib/mysql /var/lib/mysql_BAK
If you try to start your galera node, it will fail:
# systemctl restart mariadb
WSREP: Failed to start mysqld for wsrep recovery:
[Warning] Can't create test file /var/lib/mysql/gal1.lower-test
Failed to start MariaDB 10.1.40 database server
You need to create and initialize the mysql directory first:
mkdir -pv /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
chmod 0755 /var/lib/mysql
mysql_install_db -u mysql
On another node, cluster_size = 2
# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id 4
wsrep_cluster_size 2
wsrep_cluster_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status Primary
wsrep_gcomm_uuid a5eaae3e-8b91-11e9-9662-0bbe68c7d690
wsrep_local_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready ON
then:
# systemctl restart mariadb
rsync from the Primary:
Jun 10 15:19:00 gal1 rsyncd[3857]: rsyncd version 3.1.2 starting, listening on port 4444
Jun 10 15:19:01 gal1 rsyncd[3884]: connect from gal3 (192.168.122.93)
Jun 10 15:19:01 gal1 rsyncd[3884]: rsync to rsync_sst/ from gal3 (192.168.122.93)
Jun 10 15:19:01 gal1 rsyncd[3884]: receiving file list
# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id 5
wsrep_cluster_size 3
wsrep_cluster_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status Primary
wsrep_gcomm_uuid 12afa7bc-8b93-11e9-88fc-6f41be61a512
wsrep_local_state_uuid 8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready ON
Be Aware: Try to keep your DATA directory on a separate storage disk.
Adding new Nodes
A healthy quorum has an odd number of nodes. So when you scale your galera cluster, consider adding two (2) nodes at every step!
# echo 10.10.68.94 gal4 >> /etc/hosts
# echo 10.10.68.95 gal5 >> /etc/hosts
Data replication will lock your donor-node, so it is best to take your donor-node out of your Load Balancer.
Then explicitly set the donor-node for your new nodes by adding the below line to their configuration file:
wsrep_sst_donor= gal3
After the synchronization:
- comment-out the above line
- restart mysql service and
- put the nodes back behind the Load Balancer
Split Brain
Find the node with the max
SHOW STATUS LIKE 'wsrep_last_committed';
and set it as master by
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
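A quick way to compare that counter across the nodes before bootstrapping, assuming root ssh access and the passwordless local mysql client used throughout this post:
for node in gal1 gal2 gal3; do
  echo -n "${node}: "
  ssh ${node} "echo \"SHOW STATUS LIKE 'wsrep_last_committed';\" | mysql -N"
done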
Weighted Quorum for Three Nodes
When configuring quorum weights for three nodes, use the following pattern:
node1: pc.weight = 4
node2: pc.weight = 3
node3: pc.weight = 2
node4: pc.weight = 1
node5: pc.weight = 0
eg.
SET GLOBAL wsrep_provider_options="pc.weight=3";
Within the same VPC, setting up pc.weight will avoid a split brain situation. Across different regions, you can set up something like this:
node1: pc.weight = 2
node2: pc.weight = 2
node3: pc.weight = 2
<->
node4: pc.weight = 1
node5: pc.weight = 1
node6: pc.weight = 1
WSREP_SST: [ERROR] Error while getting data from donor node
In cases where a specific node can not sync/join the cluster and shows the above error, we can use this workaround:
Change wsrep_sst_method from xtrabackup to rsync, do a restart and check the logs.
Then revert the change back to xtrabackup:
/etc/mysql/conf.d/galera.cnf:;wsrep_sst_method = rsync
/etc/mysql/conf.d/galera.cnf:wsrep_sst_method = xtrabackup
STATUS
echo "SHOW STATUS LIKE 'wsrep_%';" | mysql | grep -Ei 'cluster|uuid|ready|commit' | column -t
TIL: arch-audit
In archlinux there is a package named arch-audit, which is
a utility like pkg-audit, based on Arch CVE Monitoring Team data.
Install
# pacman -Ss arch-audit
community/arch-audit 0.1.10-1
# sudo pacman -S arch-audit
resolving dependencies...
looking for conflicting packages...
Package (1) New Version Net Change Download Size
community/arch-audit 0.1.10-1 1.96 MiB 0.57 MiB
Total Download Size: 0.57 MiB
Total Installed Size: 1.96 MiB
Run
# arch-audit
Package docker is affected by CVE-2018-15664. High risk!
Package gettext is affected by CVE-2018-18751. High risk!
Package glibc is affected by CVE-2019-9169, CVE-2019-5155, CVE-2018-20796, CVE-2016-10739. High risk!
Package libarchive is affected by CVE-2019-1000020, CVE-2019-1000019, CVE-2018-1000880, CVE-2018-1000879, CVE-2018-1000878, CVE-2018-1000877. High risk!
Package libtiff is affected by CVE-2019-7663, CVE-2019-6128. Medium risk!
Package linux-lts is affected by CVE-2018-5391, CVE-2018-3646, CVE-2018-3620, CVE-2018-3615, CVE-2018-8897, CVE-2017-8824, CVE-2017-17741, CVE-2017-17450, CVE-2017-17448, CVE-2017-16644, CVE-2017-5753, CVE-2017-5715, CVE-2018-1121, CVE-2018-1120, CVE-2017-1000379, CVE-2017-1000371, CVE-2017-1000370, CVE-2017-1000365. High risk!
Package openjpeg2 is affected by CVE-2019-6988. Low risk!
Package python-yaml is affected by CVE-2017-18342. High risk!. Update to 5.1-1 from testing repos!
Package sdl is affected by CVE-2019-7638, CVE-2019-7637, CVE-2019-7636, CVE-2019-7635, CVE-2019-7578, CVE-2019-7577, CVE-2019-7576, CVE-2019-7575, CVE-2019-7574, CVE-2019-7573, CVE-2019-7572. High risk!
Package sdl2 is affected by CVE-2019-7638, CVE-2019-7637, CVE-2019-7636, CVE-2019-7635, CVE-2019-7578, CVE-2019-7577, CVE-2019-7576, CVE-2019-7575, CVE-2019-7574, CVE-2019-7573, CVE-2019-7572. High risk!
Package unzip is affected by CVE-2018-1000035. Low risk!
Open Telekom Cloud (OTC) is Deutsche Telekom’s Cloud for Business Customers. I would suggest visiting the Cloud Infrastructure & Cloud Platform Solutions page for more information, but I will try to keep this a technical post.
In this post you will find my personal notes on how to use the native python openstack CLI client to access OTC from your console.
Notes are based on Ubuntu 18.04.2 LTS
Virtual Environment
Create an isolated python virtual environment (directory) to set everything up under:
~> mkdir -pv otc/ && cd otc/
mkdir: created directory 'otc/'
~> virtualenv -p `which python3` .
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/ebal/otc/bin/python3
Also creating executable in /home/ebal/otc/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
~> source bin/activate
(otc) ebal@ubuntu:otc~>
(otc) ebal@ubuntu:otc~> python -V
Python 3.6.7
OpenStack Dependencies
Install python openstack dependencies
- openstack sdk
(otc) ebal@ubuntu:otc~> pip install python-openstacksdk
Collecting python-openstacksdk
Collecting openstacksdk (from python-openstacksdk)
Downloading https://files.pythonhosted.org/packages/
...
- openstack client
(otc) ebal@ubuntu:otc~> pip install python-openstackclient
Collecting python-openstackclient
Using cached https://files.pythonhosted.org/packages
...
Install OTC Extensions
It’s time to install the otcextensions
(otc) ebal@ubuntu:otc~> pip install otcextensions
Collecting otcextensions
Requirement already satisfied: openstacksdk>=0.19.0 in ./lib/python3.6/site-packages (from otcextensions) (0.30.0)
...
Installing collected packages: otcextensions
Successfully installed otcextensions-0.6.3
List
(otc) ebal@ubuntu:otc~> pip list | egrep '^python|otc'
otcextensions 0.6.3
python-cinderclient 4.2.0
python-glanceclient 2.16.0
python-keystoneclient 3.19.0
python-novaclient 14.1.0
python-openstackclient 3.18.0
python-openstacksdk 0.5.2
Authentication
Create a new clouds.yaml with your OTC credentials. Template:
clouds:
  otc:
    auth:
      username: 'USER_NAME'
      password: 'PASS'
      project_name: 'eu-de'
      auth_url: 'https://iam.eu-de.otc.t-systems.com:443/v3'
      user_domain_name: 'OTC00000000001000000xxx'
    interface: 'public'
    identity_api_version: 3 # !Important
    ak: 'AK_VALUE' # AK/SK pair for access to OBS
    sk: 'SK_VALUE'
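Instead of passing --os-cloud to every command, the cloud name from clouds.yaml can also be exported once per shell; this is standard behaviour of the openstack client (the commands below are generic examples, not from the original notes):
export OS_CLOUD=otc
openstack flavor list
openstack server list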
OTC Connect
Let's test it!
(otc) ebal@ubuntu:otc~> openstack --os-cloud otc server list
+--------------------------------------+---------------+---------+-------------------------------------------------------------------+-----------------------------------+---------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+---------------+---------+-------------------------------------------------------------------+-----------------------------------+---------------+
| XXXXXXXX-1234-4d7f-8097-YYYYYYYYYYYY | app00-prod | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.72.110 | Standard_Ubuntu_16.04_latest | s2.large.2 |
| XXXXXXXX-1234-4f7d-beaa-YYYYYYYYYYYY | app01-prod | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.95 | Standard_Ubuntu_16.04_latest | s2.medium.2 |
| XXXXXXXX-1234-4088-baa8-YYYYYYYYYYYY | app02-prod | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.160 | Standard_Ubuntu_16.04_latest | s2.large.4 |
| XXXXXXXX-1234-43f5-8a10-YYYYYYYYYYYY | web00-prod | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.121 | Standard_Ubuntu_16.04_latest | s2.xlarge.2 |
| XXXXXXXX-1234-41eb-aa0b-YYYYYYYYYYYY | web01-prod | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.151 | Standard_Ubuntu_16.04_latest | s2.large.4 |
| XXXXXXXX-1234-41f7-98ff-YYYYYYYYYYYY | web00-stage | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.150 | Standard_Ubuntu_16.04_latest | s2.large.4 |
| XXXXXXXX-1234-41b2-973f-YYYYYYYYYYYY | web01-stage | ACTIVE | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.76.120 | Standard_Ubuntu_16.04_latest | s2.xlarge.2 |
| XXXXXXXX-1234-468f-a41c-YYYYYYYYYYYY | app00-stage | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.70.111 | Community_Ubuntu_14.04_TSI_latest | s2.xlarge.2 |
| XXXXXXXX-1234-4fdf-8b4c-YYYYYYYYYYYY | app01-stage | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.64.92 | Community_Ubuntu_14.04_TSI_latest | s1.large |
| XXXXXXXX-1234-4e68-a86d-YYYYYYYYYYYY | app02-stage | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.100.66.96 | Community_Ubuntu_14.04_TSI_latest | s2.xlarge.4 |
| XXXXXXXX-1234-475d-9a66-YYYYYYYYYYYY | web00-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.76.11, 10.44.23.18 | Community_Ubuntu_14.04_TSI_latest | c1.medium |
| XXXXXXXX-1234-4dac-a6b1-YYYYYYYYYYYY | web01-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.14 | Community_Ubuntu_14.04_TSI_latest | s1.xlarge |
| XXXXXXXX-1234-458e-8e21-YYYYYYYYYYYY | web02-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.13 | Community_Ubuntu_14.04_TSI_latest | s1.xlarge |
| XXXXXXXX-1234-42c4-b953-YYYYYYYYYYYY | k8s02-prod | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.12 | Community_Ubuntu_14.04_TSI_latest | s1.xlarge |
| XXXXXXXX-1234-4225-b6af-YYYYYYYYYYYY | k8s02-stage | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.11 | Community_Ubuntu_14.04_TSI_latest | s1.xlarge |
| XXXXXXXX-1234-4eb1-a596-YYYYYYYYYYYY | k8s02-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.64.14 | Community_Ubuntu_14.04_TSI_latest | s1.xlarge |
| XXXXXXXX-1234-4222-b866-YYYYYYYYYYYY | k8s03-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.64.13 | Community_Ubuntu_14.04_TSI_latest | s1.xlarge |
| XXXXXXXX-1234-453d-a9c5-YYYYYYYYYYYY | k8s04-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.101.64.10 | Community_Ubuntu_14.04_TSI_latest | s1.2xlarge |
| XXXXXXXX-1234-4968-a2be-YYYYYYYYYYYY | k8s05-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.76.14, 10.44.22.138 | Community_Ubuntu_14.04_TSI_latest | c2.2xlarge |
| XXXXXXXX-1234-4c71-a00f-YYYYYYYYYYYY | k8s07-test | SHUTOFF | XXXXXXXX-YYYY-ZZZZ-AAAA-BBBBBBCCCCCC=100.102.76.170 | Community_Ubuntu_14.04_TSI_latest | c1.medium |
+--------------------------------------+---------------+---------+-------------------------------------------------------------------+-----------------------------------+---------------+
Load Balancers
(otc) ebal@ubuntu:~/otc~> openstack --os-cloud otc loadbalancer list
+--------------------------------------+----------------------------------+----------------------------------+---------------+---------------------+----------+
| id | name | project_id | vip_address | provisioning_status | provider |
+--------------------------------------+----------------------------------+----------------------------------+---------------+---------------------+----------+
| XXXXXXXX-ad99-4de0-d885-YYYYYYYYYYYY | aaccaacbddd1111eee5555aaaaa22222 | 44444bbbbbbb4444444cccccc3333333 | 100.100.10.143 | ACTIVE | vlb |
+--------------------------------------+----------------------------------+----------------------------------+---------------+---------------------+----------+
start by reading:
man 5 sshd_config
CentOS 6.x
Ciphers aes128-ctr,aes192-ctr,aes256-ctr
KexAlgorithms diffie-hellman-group-exchange-sha256
MACs hmac-sha2-256,hmac-sha2-512
and change the below setting in /etc/sysconfig/sshd:
AUTOCREATE_SERVER_KEYS=RSAONLY
CentOS 7.x
Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
MACs umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512
Ubuntu 18.04.2 LTS
Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
MACs umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512
HostKeyAlgorithms ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256
Archlinux
Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
MACs umac-128-etm@openssh.com,hmac-sha2-512-etm@openssh.com
HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
Renew SSH Host Keys
rm -f /etc/ssh/ssh_host_* && service sshd restart
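On distributions where the init/systemd scripts do not recreate missing host keys automatically, they can be regenerated explicitly before restarting the service; a small sketch:
rm -f /etc/ssh/ssh_host_*
ssh-keygen -A               # regenerate every missing default host key type
service sshd restart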
Generating ssh moduli file
for advanced users only
ssh-keygen -G /tmp/moduli -b 4096
ssh-keygen -T /etc/ssh/moduli -f /tmp/moduli
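A related (optional) hardening step, following the usual OpenSSH guidance rather than anything in the notes above, is to drop the small Diffie-Hellman groups from the existing moduli file and keep only groups of 3072 bits and larger:
awk '$5 >= 3071' /etc/ssh/moduli > /etc/ssh/moduli.tmp
mv /etc/ssh/moduli.tmp /etc/ssh/moduli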
This blog post is written mostly as a reminder to myself!
Loop Assert Time Tests with Regex
test.yml
---
- hosts: localhost
  connection: local
  become: no
  gather_facts: no

  tasks:
    - name: Assert Time
      assert:
        that:
          - item | regex_findall('^(?:\d|[01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$')
        msg: "starTime should be HH:MM:SS or HH:MM"
      with_items:
        - "21:00"
        - "22:22"
        - "3:3"
        - "03:3"
        - "03:03"
        - "03:03:03"
        - "24:00"
        - "22:22:22"
        - "23:00:60"
        - "24:24:24"
        - "00:00:00"
        - "21:21"
        - "foo"
        - "foo:bar"
        - "True"
        - "False"
        - "1:01"
        - "01:01"
      ignore_errors: True

# vim: sts=2 sw=2 ts=2 et
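For reference, the run below was produced simply by executing the playbook against the implicit localhost; nothing else is assumed:
ansible-playbook test.yml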
output
PLAY [localhost] **************************************************************
TASK [Assert Time] ************************************************************
ok: [localhost] => (item=21:00) => {
"changed": false,
"item": "21:00",
"msg": "All assertions passed"
}
ok: [localhost] => (item=22:22) => {
"changed": false,
"item": "22:22",
"msg": "All assertions passed"
}
failed: [localhost] (item=3:3) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "3:3",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=03:3) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "03:3",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=03:03) => {
"changed": false,
"item": "03:03",
"msg": "All assertions passed"
}
ok: [localhost] => (item=03:03:03) => {
"changed": false,
"item": "03:03:03",
"msg": "All assertions passed"
}
failed: [localhost] (item=24:00) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "24:00",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=22:22:22) => {
"changed": false,
"item": "22:22:22",
"msg": "All assertions passed"
}
failed: [localhost] (item=23:00:60) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "23:00:60",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=24:24:24) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "24:24:24",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=00:00:00) => {
"changed": false,
"item": "00:00:00",
"msg": "All assertions passed"
}
ok: [localhost] => (item=21:21) => {
"changed": false,
"item": "21:21",
"msg": "All assertions passed"
}
failed: [localhost] (item=foo) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "foo",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=foo:bar) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "foo:bar",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=True) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "True",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
failed: [localhost] (item=False) => {
"assertion": "item | regex_findall('^(?:\\d|[01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?$')",
"changed": false,
"evaluated_to": false,
"item": "False",
"msg": "starTime should be HH:MM:SS or HH:MM"
}
ok: [localhost] => (item=1:01) => {
"changed": false,
"item": "1:01",
"msg": "All assertions passed"
}
ok: [localhost] => (item=01:01) => {
"changed": false,
"item": "01:01",
"msg": "All assertions passed"
}
...ignoring
PLAY RECAP ********************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0
I was suspicious of a cron entry on a new ubuntu server cloud vm, so I ended up looking at the logs.
Authentication token is no longer valid; new one required
After a quick internet search,
# chage -l root
Last password change : password must be changed
Password expires : password must be changed
Password inactive : password must be changed
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
Because the password must be changed on the root account, the cron entry does not run as it should.
This ephemeral image does not need to have a persistent known password, as the notes suggest, and it doesn't! Even so, we should change the root password when creating the VM.
Ansible
Ansible has a password plugin that we can use with lookup.
TLDR; here is the task:
- name: Generate Random Password
  user:
    name: root
    password: "{{ lookup('password','/dev/null encrypt=sha256_crypt length=32') }}"
after ansible-playbook runs
# chage -l root
Last password change : Mar 10, 2019
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
and cron entry now runs as it should.
Password Plugin
Let me explain how the password plugin works.
Lookup needs at least two (2) arguments: the plugin name and a file to store the output. Instead, we will use /dev/null
so that we do not persist the password to a file.
To begin with, a test ansible playbook:
- hosts: localhost
  gather_facts: False
  connection: local

  tasks:
    - debug:
        msg: "{{ lookup('password', '/dev/null') }}"
      with_sequence: count=5
Output:
ok: [localhost] => (item=1) => {
"msg": "dQaVE0XwWti,7HMUgq::"
}
ok: [localhost] => (item=2) => {
"msg": "aT3zqg.KjLwW89MrAApx"
}
ok: [localhost] => (item=3) => {
"msg": "4LBNn:fVw5GhXDWh6TnJ"
}
ok: [localhost] => (item=4) => {
"msg": "v273Hbox1rkQ3gx3Xi2G"
}
ok: [localhost] => (item=5) => {
"msg": "NlwzHoLj8S.Y8oUhcMv,"
}
Length
In the password plugin we can also use the length variable:
msg: "{{ lookup('password', '/dev/null length=32') }}"
output:
ok: [localhost] => (item=1) => {
"msg": "4.PEb6ycosnyL.SN7jinPM:AC9w2iN_q"
}
ok: [localhost] => (item=2) => {
"msg": "s8L6ZU_Yzuu5yOk,ISM28npot4.KwQrE"
}
ok: [localhost] => (item=3) => {
"msg": "L9QvLyNTvpB6oQmcF8WVFy.7jE4Q1K-W"
}
ok: [localhost] => (item=4) => {
"msg": "6DMH8KqIL:kx0ngFe8:ri0lTK4hf,SWS"
}
ok: [localhost] => (item=5) => {
"msg": "ByW11i_66K_0mFJVB37Mq2,.fBflepP9"
}
Characters
We can define the characters to use from specific python string constants:
- ascii_letters (ascii_lowercase and ascii_uppercase)
- ascii_lowercase
- ascii_uppercase
- digits
- hexdigits
- letters (lowercase and uppercase)
- lowercase
- octdigits
- punctuation
- printable (digits, letters, punctuation and whitespace)
- uppercase
- whitespace
eg.
msg: "{{ lookup('password', '/dev/null length=32 chars=ascii_lowercase') }}"
ok: [localhost] => (item=1) => {
"msg": "vwogvnpemtdobjetgbintcizjjgdyinm"
}
ok: [localhost] => (item=2) => {
"msg": "pjniysksnqlriqekqbstjihzgetyshmp"
}
ok: [localhost] => (item=3) => {
"msg": "gmeoeqncdhllsguorownqbynbvdusvtw"
}
ok: [localhost] => (item=4) => {
"msg": "xjluqbewjempjykoswypqlnvtywckrfx"
}
ok: [localhost] => (item=5) => {
"msg": "pijnjfcpjoldfuxhmyopbmgdmgdulkai"
}
Encrypt
We can also define the encryption hash. Ansible uses passlib, so the active unix encryption hash algorithms are:
- passlib.hash.bcrypt - BCrypt
- passlib.hash.sha256_crypt - SHA-256 Crypt
- passlib.hash.sha512_crypt - SHA-512 Crypt
eg.
msg: "{{ lookup('password', '/dev/null length=32 chars=ascii_lowercase encrypt=sha512_crypt') }}"
ok: [localhost] => (item=1) => {
"msg": "$6$BR96lZqN$jy.CRVTJaubOo6QISUJ9tQdYa6P6tdmgRi1/NQKPxwX9/Plp.7qETuHEhIBTZDTxuFqcNfZKtotW5q4H0BPeN."
}
ok: [localhost] => (item=2) => {
"msg": "$6$ESf5xnWJ$cRyOuenCDovIp02W0eaBmmFpqLGGfz/K2jd1FOSVkY.Lsuo8Gz8oVGcEcDlUGWm5W/CIKzhS43xdm5pfWyCA4."
}
ok: [localhost] => (item=3) => {
"msg": "$6$pS08v7j3$M4mMPkTjSwElhpY1bkVL727BuMdhyl4IdkGM7Mq10jRxtCSrNlT4cAU3wHRVxmS7ZwZI14UwhEB6LzfOL6pM4/"
}
ok: [localhost] => (item=4) => {
"msg": "$6$m17q/zmR$JdogpVxY6MEV7nMKRot069YyYZN6g8GLqIbAE1cRLLkdDT3Qf.PImkgaZXDqm.2odmLN8R2ZMYEf0vzgt9PMP1"
}
ok: [localhost] => (item=5) => {
"msg": "$6$tafW6KuH$XOBJ6b8ORGRmRXHB.pfMkue56S/9NWvrY26s..fTaMzcxTbR6uQW1yuv2Uy1DhlOzrEyfFwvCQEhaK6MrFFye."
}
Ansible is wonderful software for automatically configuring your systems. The default mode of using ansible is the Push Model.
That means that from your box, using only ssh + python, you can configure your fleet of machines.
Ansible is imperative. You define tasks in your playbooks and roles, and they run in a serial manner on the remote machines. Each task first checks if it needs to run, otherwise it skips the action. And although we can use conditionals to skip actions, tasks still perform all their checks. For that reason ansible seems slow compared to other configuration tools. Ansible runs the tasks in serial mode but in pseudo-parallel mode against the remote servers, to increase speed. But sometimes you need to gather_facts and that costs execution time. There are solutions to cache the ansible facts in redis (an in-memory key:value db), but even then you need to find a work-around to speed up your deployments.
But there is another way: the Pull Mode!
Useful Reading Materials
To learn more on the subject, you can start by reading these two articles on ansible-pull.
Pull Mode
So here is how it looks:
You will first notice that your ansible repository moves from your local machine to an online git repository. For me, this is GitLab. As my git repo is private, I have created a read-only, time-limited Deploy Token.
With that scenario, our (ephemeral - or not) VMs will pull their ansible configuration from the git repo and run the tasks locally. I usually build my infrastructure with Terraform by HashiCorp and take advantage of cloud-init to initiate their initial configuration.
Cloud-init
The tail of my user-data.yml looks pretty much like this:
...

# Install packages
packages:
  - ansible

# Run ansible-pull
runcmd:
  - ansible-pull -U https://gitlab+deploy-token-XXXXX:YYYYYYYY@gitlab.com/username/myrepo.git
Playbook
You can either create a playbook named after the hostname of the remote server, eg. node1.yml, or use local.yml as the default playbook name.
Here is an example that will also put ansible-pull into a cron entry. This is very useful because it will check for any changes in the git repo every 15 minutes and run ansible again.
- hosts: localhost
  tasks:
    - name: Ensure ansible-pull is running every 15 minutes
      cron:
        name: "ansible-pull"
        minute: "*/15"
        job: "ansible-pull -U https://gitlab+deploy-token-XXXXX:YYYYYYYY@gitlab.com/username/myrepo.git &> /dev/null"

    - name: Create a custom local vimrc file
      lineinfile:
        path: /etc/vim/vimrc.local
        line: 'set modeline'
        create: yes

    - name: Remove "cloud-init" package
      apt:
        name: "cloud-init"
        purge: yes
        state: absent

    - name: Remove useless packages from the cache
      apt:
        autoclean: yes

    - name: Remove dependencies that are no longer required
      apt:
        autoremove: yes

# vim: sts=2 sw=2 ts=2 et
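For completeness, ansible-pull can also be run by hand with a few useful flags; --only-if-changed skips the run when the repository has not changed and -d sets the checkout directory. The URL is the same placeholder token as above and the directory is only an example:
ansible-pull \
  -U https://gitlab+deploy-token-XXXXX:YYYYYYYY@gitlab.com/username/myrepo.git \
  -d /var/lib/ansible/local \
  --only-if-changed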
In this short blog post, I will share a few ansible tips:
Run a specific role
ansible -i hosts -m include_role -a name=roles/common all -b
Print Ansible Variables
ansible -m debug -a 'var=vars' 127.0.0.1 | sed "1 s/^.*$/{/" | jq .
Ansible Conditionals
Variable contains string
matching string in a variable
vars:
var1: "hello world"
- debug:
msg: " {{ var1 }} "
when: ........
old way
when: var1.find("hello") != -1
deprecated way
when: var1 | search("hello")
Correct way
when: var1 is search("hello")
Multiple Conditionals
Logical And
when:
- ansible_distribution == "Archlinux"
- ansible_hostname is search("myhome")
Numeric Conditionals
getting variable from command line (or from somewhere else)
- set_fact:
    my_variable_keepday: "{{ keepday | default(7) | int }}"

- name: something
  debug:
    msg: "keepday : {{ my_variable_keepday }}"
  when:
    - my_variable_keepday|int >= 0
    - my_variable_keepday|int <= 10
Validate Time variable with ansible
I need to validate a variable. It should be ‘%H:%M:%S’
My work-around is to convert it to datetime so that I can validate it:
tasks:
- debug:
msg: "{{ startTime | to_datetime('%H:%M:%S') }}"
First example: 21:30:15
True: "msg": "1900-01-01 21:30:15"
Second example: '25:00:00'
could not be converted to an dict.
The error was: time data '25:00:00' does not match format '%H:%M:%S'
This article will show how to install Arch Linux in Windows 10 under Windows Subsystem for Linux.
WSL
The prerequisite is to have WSL enabled on your Win10 and to have already rebooted your machine.
You can enable WSL:
- Windows Settings
- Apps
- Apps & features
- Related settings -> Programs and Features (bottom)
- Turn Windows features on or off (left)
Store
After rebooting your Win10, you can use the Microsoft Store to install a Linux distribution like Ubuntu. Archlinux is not an officially supported linux distribution, thus this guide!
Launcher
The easiest way to install Archlinux (or any Linux distro) is to download wsldl from github. This project provides a generic Launcher.exe that can use any rootfs as its source base. The first thing to do is rename Launcher.exe to Archlinux.exe.
ebal@myworklaptop:~$ mkdir -pv Archlinux
mkdir: created directory 'Archlinux'
ebal@myworklaptop:~$ cd Archlinux/
ebal@myworklaptop:~/Archlinux$ curl -sL -o Archlinux.exe https://github.com/yuk7/wsldl/releases/download/18122700/Launcher.exe
ebal@myworklaptop:~/Archlinux$ ls -l
total 320
-rw-rw-rw- 1 ebal ebal 143147 Feb 21 20:40 Archlinux.exe
RootFS
The next step is to download the latest archlinux root filesystem and create a new rootfs.tar.gz archive file, as this is the format wsldl uses.
ebal@myworklaptop:~/Archlinux$ curl -sLO http://ftp.otenet.gr/linux/archlinux/iso/latest/archlinux-bootstrap-2019.02.01-x86_64.tar.gz
ebal@myworklaptop:~/Archlinux$ ls -l
total 147392
-rw-rw-rw- 1 ebal ebal 143147 Feb 21 20:40 Archlinux.exe
-rw-rw-rw- 1 ebal ebal 149030552 Feb 21 20:42 archlinux-bootstrap-2019.02.01-x86_64.tar.gz
ebal@myworklaptop:~/Archlinux$ sudo tar xf archlinux-bootstrap-2019.02.01-x86_64.tar.gz
ebal@myworklaptop:~/Archlinux$ cd root.x86_64/
ebal@myworklaptop:~/Archlinux/root.x86_64$ ls
README bin boot dev etc home lib lib64 mnt opt proc root run sbin srv sys tmp usr var
ebal@myworklaptop:~/Archlinux/root.x86_64$ sudo tar czf rootfs.tar.gz .
tar: .: file changed as we read it
ebal@myworklaptop:~/Archlinux/root.x86_64$ ls
README bin boot dev etc home lib lib64 mnt opt proc root rootfs.tar.gz run sbin srv sys tmp usr var
ebal@myworklaptop:~/Archlinux/root.x86_64$ du -sh rootfs.tar.gz
144M rootfs.tar.gz
ebal@myworklaptop:~/Archlinux/root.x86_64$ sudo mv rootfs.tar.gz ../
ebal@myworklaptop:~/Archlinux/root.x86_64$ cd ..
ebal@myworklaptop:~/Archlinux$ ls
Archlinux.exe archlinux-bootstrap-2019.02.01-x86_64.tar.gz root.x86_64 rootfs.tar.gz
ebal@myworklaptop:~/Archlinux$
ebal@myworklaptop:~/Archlinux$ ls
Archlinux.exe rootfs.tar.gz
ebal@myworklaptop:~$ mv Archlinux/ /mnt/c/Users/EvaggelosBalaskas/Downloads/ArchlinuxWSL
ebal@myworklaptop:~$
As you can see, I do a little clean-up and move the directory under the windows filesystem.
Install & Verify
Microsoft Windows [Version 10.0.17134.619]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\Users\EvaggelosBalaskas>cd Downloads/ArchlinuxWSL
C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL>dir
Volume in drive C is Windows
Volume Serial Number is 6C02-EE43
Directory of C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL
21-Feb-19 21:04 <DIR> .
21-Feb-19 21:04 <DIR> ..
21-Feb-19 20:40 143,147 Archlinux.exe
21-Feb-19 20:52 150,178,551 rootfs.tar.gz
2 File(s) 150,321,698 bytes
2 Dir(s) 374,579,486,720 bytes free
C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL>Archlinux.exe
Installing...
Installation Complete!
Press any key to continue...
C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL>Archlinux.exe run uname -a
Linux myworklaptop 4.4.0-17134-Microsoft #523-Microsoft Mon Dec 31 17:49:00 PST 2018 x86_64 GNU/Linux
C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL>Archlinux.exe run cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL>Archlinux.exe run bash
[root@myworklaptop ArchlinuxWSL]#
[root@myworklaptop ArchlinuxWSL]# exit
Archlinux
C:\Users\EvaggelosBalaskas\Downloads\ArchlinuxWSL>Archlinux.exe run bash
[root@myworklaptop ArchlinuxWSL]#
[root@myworklaptop ArchlinuxWSL]# date
Thu Feb 21 21:41:41 STD 2019
Remember, archlinux by default does not have any configuration, so you need to configure this instance!
Here is some basic configuration:
[root@myworklaptop ArchlinuxWSL]# echo nameserver 8.8.8.8 > /etc/resolv.conf
[root@myworklaptop ArchlinuxWSL]# cat > /etc/pacman.d/mirrorlist <<'EOF'
Server = http://ftp.otenet.gr/linux/archlinux/$repo/os/$arch
EOF
[root@myworklaptop ArchlinuxWSL]# pacman-key --init
[root@myworklaptop ArchlinuxWSL]# pacman-key --populate
[root@myworklaptop ArchlinuxWSL]# pacman -Syy
you are pretty much ready to use archlinux inside your windows 10 !!
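Two optional extra steps, not part of the original notes, that a fresh archlinux rootfs usually needs: generating a locale and installing a few basic packages (the package selection is just an example):
[root@myworklaptop ArchlinuxWSL]# sed -i 's/#en_US.UTF-8/en_US.UTF-8/' /etc/locale.gen
[root@myworklaptop ArchlinuxWSL]# locale-gen
[root@myworklaptop ArchlinuxWSL]# pacman -S --noconfirm vim openssh git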
Remove
You can remove Archlinux simply by running:
Archlinux.exe clean
Default User
There is a simple way to use Archlinux within the Windows Subsystem for Linux: by connecting with a default user.
But before configuring ArchWSL, we need to create this user inside the archlinux instance:
[root@myworklaptop ArchWSL]# useradd -g 374 -u 374 ebal
[root@myworklaptop ArchWSL]# id ebal
uid=374(ebal) gid=374(ebal) groups=374(ebal)
[root@myworklaptop ArchWSL]# cp -rav /etc/skel/ /home/ebal
'/etc/skel/' -> '/home/ebal'
'/etc/skel/.bashrc' -> '/home/ebal/.bashrc'
'/etc/skel/.bash_profile' -> '/home/ebal/.bash_profile'
'/etc/skel/.bash_logout' -> '/home/ebal/.bash_logout'
chown -R ebal:ebal /home/ebal/
then exit the linux app and run:
> Archlinux.exe config --default-user ebal
and try to login again:
> Archlinux.exe run bash
[ebal@myworklaptop ArchWSL]$
[ebal@myworklaptop ArchWSL]$ cd ~
ebal@myworklaptop ~$ pwd -P
/home/ebal
Flatpak is a software utility for software deployment, package management, and application virtualization for Linux desktop computers. It provides a sandbox environment in which users can run applications in isolation from the rest of the system.
… in a nutshell, it is an isolate software bundle package which you can run with restricted permissions!
User Vs System
We can install flatpak applications system-wide or for a single user. The latter does not need administrative access or any special permissions.
To use flatpak as a user, we have to add --user
to every flatpak command.
Applications
A flatpak application has a manifest that describes dependencies & permissions.
Repositories
A repository contains a list of application manifests & the flatpak package (code).
Branches
One of the best features of flatpak is that we can have multiple versions of a specific application. This is done by using a different branch or version (like git). The most common branch is the default one.
Add flathub
To use/install a flatpak application, we first have to add a remote flatpak repository locally.
The most well-known flatpak repository is called flathub.
flatpak remote-add --user flathub https://dl.flathub.org/repo/flathub.flatpakrepo
Search Applications
$ flatpak search --user Signal
Description Application Version Branch Remotes
Signal - Private messenger for the desktop org.signal.Signal 1.20.0 stable flathub
Install package as user
The default syntax for installing packages is:
repository application
$ flatpak install --user -y flathub com.dropbox.Client
$ flatpak install --user -y flathub org.signal.Signal
List
See which packages you have installed:
$ flatpak list
Application
___________
com.dropbox.Client
com.jetbrains.PyCharm-Community
com.slack.Slack
com.visualstudio.code.oss
io.atom.Atom
org.signal.Signal
Run
To execute a flatpak application:
flatpak run com.slack.Slack
flatpak run org.signal.Signal
Update
and finally, how to update your flatpak packages:
$ flatpak --user -y update
Remove Unused Packages
$ flatpak uninstall --user --unused
Uninstall flatpak package
$ flatpak uninstall --user com.visualstudio.code.oss
TLDR; Exit status value does not change when using redirection.
~> false
~> echo $?
1
~> true
~> echo $?
0
~> false > /dev/null
~> echo $?
1
~> true > /dev/null
~> echo $?
0
~> false 1> /dev/null
~> echo $?
1
~> true 1> /dev/null
~> echo $?
0
~> false 2> /dev/null
~> echo $?
1
~> true 2> /dev/null
~> echo $?
0
~> false &> /dev/null
~> echo $?
1
~> true &> /dev/null
~> echo $?
0
~> false 2>&1 >/dev/null
~> echo $?
1
~> true 2>&1 >/dev/null
~> echo $?
0
~> false < /dev/null > /dev/null
~> echo $?
1
~> true < /dev/null > /dev/null
~> echo $?
0
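For contrast, the exit status does change when a pipe is involved: by default the shell reports the status of the last command in the pipeline, unless pipefail is set. A quick bash sketch:
~> false | true
~> echo $?
0
~> set -o pipefail
~> false | true
~> echo $?
1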
Using Terraform by HashiCorp and cloud-init on Hetzner cloud provider.
Nowadays, with the help of modern tools, we manage our infrastructure as code. This approach is very useful because we can have an immutable design for our infra by declaring the state we would like it to be in. It also gives us flexibility and a more generic way of handling our infra as lego bricks, especially when scaling.
UPDATE: 2019.01.22
Hetzner
We need to create an Access API Token
within a new project under the console of hetzner cloud.
Copy this token and with that in place we can continue with terraform.
For the purposes of this article, I am going to use as the API token: 01234567890
Install Terraform
The latest terraform version at the time of writing this blog post is v0.11.11.
$ curl -sL https://releases.hashicorp.com/terraform/0.11.11/terraform_0.11.11_linux_amd64.zip |
bsdtar -xf- && chmod +x terraform
$ sudo mv terraform /usr/local/bin/
and verify it
$ terraform version
Terraform v0.11.11
Terraform Provider for Hetzner Cloud
To use the hetzner cloud via terraform, we need the terraform-provider-hcloud plugin.
hcloud is part of the terraform providers repository, so the first time we initialize our project, terraform will download this plugin locally.
Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "hcloud" (1.7.0)...
...
* provider.hcloud: version = "~> 1.7"
Compile hcloud
If you like, you can always build hcloud from the source code.
There are notes on how to build the plugin here Terraform Hetzner Cloud provider.
GitLab CI
or you can even download the artifact from my gitlab-ci repo.
Plugin directory
You will find the terraform hcloud plugin under your current directory:
./.terraform/plugins/linux_amd64/terraform-provider-hcloud_v1.7.0_x4
I prefer to copy the tf plugins centrally under my home directory:
$ mkdir -pv ~/.terraform.d/plugins/linux_amd64/
$ mv ./.terraform/plugins/linux_amd64/terraform-provider-hcloud_v1.7.0_x4 ~/.terraform.d/plugins/linux_amd64/terraform-provider-hcloud
or if you choose the artifact from gitlab:
$ curl -sL -o ~/.terraform.d/plugins/linux_amd64/terraform-provider-hcloud https://gitlab.com/ebal/terraform-provider-hcloud-ci/-/jobs/artifacts/master/raw/bin/terraform-provider-hcloud?job=run-build
That said, when working with multiple terraform projects you may need different versions of the same tf-plugin. In that case it is better to keep them under the current working directory/project instead of your home directory. Perhaps one project needs v1.2.3 and another v4.5.6 of the same tf-plugin.
Hetzner Cloud API
Here are a few examples of how to use the Hetzner Cloud API:
$ export API_TOKEN="01234567890"
$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/datacenters | jq -r .datacenters[].name
fsn1-dc8
nbg1-dc3
hel1-dc2
fsn1-dc14
$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/locations | jq -r .locations[].name
fsn1
nbg1
hel1
$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/images | jq -r .images[].name
ubuntu-16.04
debian-9
centos-7
fedora-27
ubuntu-18.04
fedora-28
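The same pattern works for the other endpoints; for example, to list the available server types (like the cx11 used below):
$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/server_types | jq -r .server_types[].name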
hetzner.tf
At this point, we are ready to write our terraform file.
It can be as simple as this (CentOS 7):
# Set the variable value in *.tfvars file
# or using -var="hcloud_token=..." CLI option
variable "hcloud_token" {}
# Configure the Hetzner Cloud Provider
provider "hcloud" {
token = "${var.hcloud_token}"
}
# Create a new server running centos
resource "hcloud_server" "node1" {
name = "node1"
image = "centos-7"
server_type = "cx11"
}
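The hcloud_token variable still needs a value at run time. Two common ways (using the same dummy token as above) are a terraform.tfvars file or a TF_VAR_ environment variable:
$ cat > terraform.tfvars <<EOF
hcloud_token = "01234567890"
EOF
or
$ export TF_VAR_hcloud_token="01234567890"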
Project_Ebal
or a more complex config: Ubuntu 18.04 LTS
# Project_Ebal
variable "hcloud_token" {}
# Configure the Hetzner Cloud Provider
provider "hcloud" {
token = "${var.hcloud_token}"
}
# Create a new server running Ubuntu 18.04
resource "hcloud_server" "ebal_project" {
name = "ebal_project"
image = "ubuntu-18.04"
server_type = "cx11"
location = "nbg1"
}
Repository Structure
Although in this blog post we have a small and simple example of using hetzner cloud with terraform, on larger projects it is usually best to have separate terraform files for variables, code and output. For more info, you can take a look here: VCS Repository Structure - Workspaces
├── variables.tf
├── main.tf
├── outputs.tf
Cloud-init
Using cloud-init with hetzner is very simple.
We just need to add the declaration user_data = "${file("user-data.yml")}" to the terraform file.
So our previous tf is now this:
# Project_Ebal
variable "hcloud_token" {}
# Configure the Hetzner Cloud Provider
provider "hcloud" {
token = "${var.hcloud_token}"
}
# Create a new server running Ubuntu 18.04
resource "hcloud_server" "ebal_project" {
name = "ebal_project"
image = "ubuntu-18.04"
server_type = "cx11"
location = "nbg1"
user_data = "${file("user-data.yml")}"
}
To get the IP address of the virtual machine, I would also like to have an output declaration:
output "ipv4_address" {
value = "${hcloud_server.ebal_project.ipv4_address}"
}
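Later on, after terraform apply, you can read this declared output back at any time with:
$ terraform output ipv4_address
1.2.3.4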
Cloud-init
You will find more notes on cloud-init on a previous blog post: Cloud-init with CentOS 7.
Below is an example of user-data.yml:
#cloud-config
disable_root: true
ssh_pwauth: no
users:
- name: ubuntu
ssh_import_id:
- gh:ebal
shell: /bin/bash
sudo: ALL=(ALL) NOPASSWD:ALL
# Set TimeZone
timezone: Europe/Athens
# Install packages
packages:
- mlocate
- vim
- figlet
# Update/Upgrade & Reboot if necessary
package_update: true
package_upgrade: true
package_reboot_if_required: true
# Remove cloud-init
runcmd:
- figlet Project_Ebal > /etc/motd
- updatedb
Terraform
The first thing to do with terraform is to initialize our environment.
Init
$ terraform init
Initializing provider plugins...
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Plan
Of course it is not necessary to run plan and then run plan again with -out.
You can skip this step; it exists here only for documentation purposes.
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ hcloud_server.ebal_project
id: <computed>
backup_window: <computed>
backups: "false"
datacenter: <computed>
image: "ubuntu-18.04"
ipv4_address: <computed>
ipv6_address: <computed>
ipv6_network: <computed>
keep_disk: "false"
location: "nbg1"
name: "ebal_project"
server_type: "cx11"
status: <computed>
user_data: "sk6134s+ys+wVdGITc+zWhbONYw="
Plan: 1 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.
Out
$ terraform plan -out terraform.tfplan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ hcloud_server.ebal_project
id: <computed>
backup_window: <computed>
backups: "false"
datacenter: <computed>
image: "ubuntu-18.04"
ipv4_address: <computed>
ipv6_address: <computed>
ipv6_network: <computed>
keep_disk: "false"
location: "nbg1"
name: "ebal_project"
server_type: "cx11"
status: <computed>
user_data: "sk6134s+ys+wVdGITc+zWhbONYw="
Plan: 1 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
This plan was saved to: terraform.tfplan
To perform exactly these actions, run the following command to apply:
terraform apply "terraform.tfplan"
Apply
$ terraform apply "terraform.tfplan"
hcloud_server.ebal_project: Creating...
backup_window: "" => "<computed>"
backups: "" => "false"
datacenter: "" => "<computed>"
image: "" => "ubuntu-18.04"
ipv4_address: "" => "<computed>"
ipv6_address: "" => "<computed>"
ipv6_network: "" => "<computed>"
keep_disk: "" => "false"
location: "" => "nbg1"
name: "" => "ebal_project"
server_type: "" => "cx11"
status: "" => "<computed>"
user_data: "" => "sk6134s+ys+wVdGITc+zWhbONYw="
hcloud_server.ebal_project: Still creating... (10s elapsed)
hcloud_server.ebal_project: Still creating... (20s elapsed)
hcloud_server.ebal_project: Creation complete after 23s (ID: 1676988)
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Outputs:
ipv4_address = 1.2.3.4
SSH and verify cloud-init
$ ssh 1.2.3.4 -l ubuntu
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-43-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Fri Jan 18 12:17:14 EET 2019
System load: 0.41 Processes: 89
Usage of /: 9.7% of 18.72GB Users logged in: 0
Memory usage: 8% IP address for eth0: 1.2.3.4
Swap usage: 0%
0 packages can be updated.
0 updates are security updates.
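Since the user-data above writes a figlet banner to /etc/motd, a quick way to confirm that cloud-init finished its run (cloud-init 18.x, as shipped with Ubuntu 18.04, has a status subcommand) is:
$ cloud-init status
status: done
$ cat /etc/motd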
Destroy
Be careful: without providing a specific plan file, terraform will destroy every resource it manages within your working directory/project. So it is always good practice to explicitly destroy a specific resource/tfplan.
$ terraform destroy
is better written as:
$ terraform destroy -out terraform.tfplan
hcloud_server.ebal_project: Refreshing state... (ID: 1676988)
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
- hcloud_server.ebal_project
Plan: 0 to add, 0 to change, 1 to destroy.
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
hcloud_server.ebal_project: Destroying... (ID: 1676988)
hcloud_server.ebal_project: Destruction complete after 1s
Destroy complete! Resources: 1 destroyed.
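Another way to limit the blast radius is to target a specific resource explicitly, so only that resource (and anything depending on it) gets destroyed:
$ terraform destroy -target=hcloud_server.ebal_project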
That’s it !
I have been using only btrfs for the last few years, without any problem. Dropbox's decision is based on supporting extended file attributes, and even though btrfs supports extended attributes, it seems you will get this error:
I have the benefit of using encrypted disks via LUKS, so in this blog post I will only present a way to have a virtual disk with ext4 for your dropbox folder, on top of your btrfs!
Allocating disk space
Let's say that you have 2G of dropbox space; allocate a 2G file:
fallocate -l 2G Dropbox.img
you can verify the disk image by:
qemu-img info Dropbox.img
image: Dropbox.img
file format: raw
virtual size: 2.0G (2147483648 bytes)
disk size: 2.0G
Map Virtual Disk to Loop Device
Then map your virtual disk to a loop device:
sudo losetup `sudo losetup -f` Dropbox.img
verify it:
losetup -a
/dev/loop0: []: (/home/ebal/Dropbox.img)
then loop0 it is.
Create an ext4 filesystem
sudo mkfs.ext4 -L Dropbox /dev/loop0
Creating filesystem with 524288 4k blocks and 131072 inodes
Filesystem UUID: 09c7eb41-b5f3-4c58-8965-5e366ddf4d97
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
Move Dropbox folder
You have to move your previous dropbox folder to another location. Don't forget to close the dropbox-client and any other application that is using files from this folder.
mv Dropbox Dropbox.bak
renamed 'Dropbox' -> 'Dropbox.bak'
and then create a new Dropbox folder:
mkdir -pv Dropbox/
mkdir: created directory 'Dropbox/'
Mount the ext4 filesystem
sudo mount /dev/loop0 ~/Dropbox/
verify it:
df -h ~/Dropbox/
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 2.0G 6.0M 1.8G 1% /home/ebal/Dropbox
and give the right perms:
sudo chown -R ebal:ebal ~/Dropbox/
Copy files
Now, copy the files from the old Dropbox folder to the new:
rsync -ravx Dropbox.bak/ Dropbox/
sent 464,823,778 bytes received 13,563 bytes 309,891,560.67 bytes/sec
total size is 464,651,441 speedup is 1.00
and start dropbox-client
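If you want the image mounted automatically after a reboot, one option (assuming the paths above) is an fstab entry with the loop option:
$ echo '/home/ebal/Dropbox.img /home/ebal/Dropbox ext4 loop,defaults 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a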
Four Step Process
$ sudo iptables -nvL | grep 8765
0 0 ACCEPT tcp -- * * 192.168.0.0/24 0.0.0.0/0 tcp dpt:8765
The purpose of this blog post is to act as a visual guide/tutorial on how to setup an iOS device (iPad or iPhone) using the native apps against a custom Linux Mail, Calendar & Contact server.
Disclaimer: I wrote this blog post after 36 hours with an apple device. I have never had any previous engagement with an apple product. Huge culture change & learning curve. Be aware that the below notes may not apply to your setup.
Original creation date: Friday 12 Oct 2018
Last Update: Sunday 18 Nov 2018
Linux Mail Server
Notes are based on the below setup:
- CentOS 6.10
- Dovecot IMAP server with STARTTLS (TCP Port: 143) with Encrypted Password Authentication.
- Postfix SMTP with STARTTLS (TCP Port: 587) with Encrypted Password Authentication.
- Baïkal as Calendar & Contact server.
Thunderbird
Thunderbird settings for imap / smtp over STARTTLS and encrypted authentication
Baikal
Dashboard
CardDAV
contact URI
for user Username
https://baikal.baikal.example.org/html/card.php/addressbooks/Username/default
CalDAV
calendar URI
for user Username
https://baikal.example.org/html/cal.php/calendars/Username/default
iOS
There is a lot of online documentation, but none of it in one place: just random Stack Overflow articles & posts on the internet. It took me almost an entire day (and night) to figure things out. In the end, I enabled debug mode on my dovecot/postfix & apache web server. After that, through trial and error, I managed to set up both iPhone & iPad using only native apps.
Open Passwords & Accounts
& click on New Account
Choose Other
Now the tricky part: you have to click Next and fill in the imap & smtp settings.
Then we have to go back and change the settings, to enable STARTTLS and encrypted password authentication.
STARTTLS with Encrypted Passwords for Authentication
On the home screen of the iPad/iPhone we will see that the Mail notifications have already fetched some headers.
and finally, open the native mail app:
Contact Server
Now we are ready to set up the contact account:
https://baikal.baikal.example.org/html/card.php/addressbooks/Username/default
Opening Contact App:
Calendar Server
https://baikal.example.org/html/cal.php/calendars/Username/default
Cloud-init is the de facto multi-distribution package that handles early initialization of a cloud instance.
This article is a mini-HowTo on using cloud-init with CentOS 7 in your own libvirt qemu/kvm lab, instead of using a public cloud provider.
How Cloud-init works
Josh Powers @ DebConf17
How does it really work?
Cloud-init has Boot Stages
- Generator
- Local
- Network
- Config
- Final
and supports modules to extend configuration and support.
Here is a brief list of modules (sorted by name):
- bootcmd
- final-message
- growpart
- keys-to-console
- locale
- migrator
- mounts
- package-update-upgrade-install
- phone-home
- power-state-change
- puppet
- resizefs
- rsyslog
- runcmd
- scripts-per-boot
- scripts-per-instance
- scripts-per-once
- scripts-user
- set_hostname
- set-passwords
- ssh
- ssh-authkey-fingerprints
- timezone
- update_etc_hosts
- update_hostname
- users-groups
- write-files
- yum-add-repo
Gist
Cloud-init example using a Generic Cloud CentOS-7 on a libvirtd qmu/kvm lab · GitHub
Generic Cloud CentOS 7
You can find a plethora of centos7 cloud images here:
Download the latest version
$ curl -LO http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2.xz
Uncompress file
$ xz -v --keep -d CentOS-7-x86_64-GenericCloud.qcow2.xz
Check cloud image
$ qemu-img info CentOS-7-x86_64-GenericCloud.qcow2
image: CentOS-7-x86_64-GenericCloud.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 863M
cluster_size: 65536
Format specific information:
compat: 0.10
refcount bits: 16
The default image is 8G.
If you need to resize it, check below in this article.
Create metadata file
Meta-data is data that comes from the cloud provider itself. In this example, I will use a static network configuration.
cat > meta-data <<EOF
instance-id: testingcentos7
local-hostname: testingcentos7
network-interfaces: |
iface eth0 inet static
address 192.168.122.228
network 192.168.122.0
netmask 255.255.255.0
broadcast 192.168.122.255
gateway 192.168.122.1
# vim:syntax=yaml
EOF
Create cloud-init (user-data) file
User-data is data that comes from you, aka the user.
cat > user-data <<EOF
#cloud-config
# Set default user and their public ssh key
# eg. https://github.com/ebal.keys
users:
- name: ebal
ssh-authorized-keys:
- `curl -s -L https://github.com/ebal.keys`
sudo: ALL=(ALL) NOPASSWD:ALL
# Enable cloud-init modules
cloud_config_modules:
- resolv_conf
- runcmd
- timezone
- package-update-upgrade-install
# Set TimeZone
timezone: Europe/Athens
# Set DNS
manage_resolv_conf: true
resolv_conf:
nameservers: ['9.9.9.9']
# Install packages
packages:
- mlocate
- vim
- epel-release
# Update/Upgrade & Reboot if necessary
package_update: true
package_upgrade: true
package_reboot_if_required: true
# Remove cloud-init
runcmd:
- yum -y remove cloud-init
- updatedb
# Configure where output will go
output:
all: ">> /var/log/cloud-init.log"
# vim:syntax=yaml
EOF
Create the cloud-init ISO
When using libvirt with qemu/kvm, the most common way to pass the meta-data/user-data to cloud-init is through an ISO (cdrom).
$ genisoimage -output cloud-init.iso -volid cidata -joliet -rock user-data meta-data
or
$ mkisofs -o cloud-init.iso -V cidata -J -r user-data meta-data
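You can optionally verify the contents of the ISO (isoinfo comes with cdrtools/cdrkit):
$ isoinfo -i cloud-init.iso -l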
Provision new virtual machine
Finally run this as root:
# virt-install \
    --name centos7_test \
    --memory 2048 \
    --vcpus 1 \
    --metadata description="My centos7 cloud-init test" \
    --import \
    --disk CentOS-7-x86_64-GenericCloud.qcow2,format=qcow2,bus=virtio \
    --disk cloud-init.iso,device=cdrom \
    --network bridge=virbr0,model=virtio \
    --os-type=linux \
    --os-variant=centos7.0 \
    --noautoconsole
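When the VM is up, you can follow the boot on the serial console, or simply ssh to the static address defined in meta-data with the user defined in user-data:
# virsh console centos7_test
$ ssh ebal@192.168.122.228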
The List of OS Variants
There is an interesting command to find out all the OS variants that are supported by libvirt in your lab:
eg. CentOS
$ osinfo-query os | grep CentOS
centos6.0 | CentOS 6.0 | 6.0 | http://centos.org/centos/6.0
centos6.1 | CentOS 6.1 | 6.1 | http://centos.org/centos/6.1
centos6.2 | CentOS 6.2 | 6.2 | http://centos.org/centos/6.2
centos6.3 | CentOS 6.3 | 6.3 | http://centos.org/centos/6.3
centos6.4 | CentOS 6.4 | 6.4 | http://centos.org/centos/6.4
centos6.5 | CentOS 6.5 | 6.5 | http://centos.org/centos/6.5
centos6.6 | CentOS 6.6 | 6.6 | http://centos.org/centos/6.6
centos6.7 | CentOS 6.7 | 6.7 | http://centos.org/centos/6.7
centos6.8 | CentOS 6.8 | 6.8 | http://centos.org/centos/6.8
centos6.9 | CentOS 6.9 | 6.9 | http://centos.org/centos/6.9
centos7.0 | CentOS 7.0 | 7.0 | http://centos.org/centos/7.0
DHCP
If you are not using a static network configuration scheme, then to identify the IP of your cloud instance, type:
$ virsh net-dhcp-leases default
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
---------------------------------------------------------------------------------------------------------
2018-11-17 15:40:31 52:54:00:57:79:3e ipv4 192.168.122.144/24 - -
Resize
The easiest way to grow/resize your virtual machine is via the qemu-img command:
$ qemu-img resize CentOS-7-x86_64-GenericCloud.qcow2 20G
Image resized.
$ qemu-img info CentOS-7-x86_64-GenericCloud.qcow2
image: CentOS-7-x86_64-GenericCloud.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 870M
cluster_size: 65536
Format specific information:
compat: 0.10
refcount bits: 16
You can add the lines below to your user-data file, so that cloud-init grows the root filesystem automatically:
growpart:
mode: auto
devices: ['/']
ignore_growroot_disabled: false
The result:
[root@testingcentos7 ebal]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 20G 870M 20G 5% /
Default cloud-init.cfg
For reference, this is the default centos7 cloud-init configuration file.
# /etc/cloud/cloud.cfg
users:
- default
disable_root: 1
ssh_pwauth: 0
mount_default_fields: [~, ~, 'auto', 'defaults,nofail', '0', '2']
resize_rootfs_tmp: /dev
ssh_deletekeys: 0
ssh_genkeytypes: ~
syslog_fix_perms: ~
cloud_init_modules:
- migrator
- bootcmd
- write-files
- growpart
- resizefs
- set_hostname
- update_hostname
- update_etc_hosts
- rsyslog
- users-groups
- ssh
cloud_config_modules:
- mounts
- locale
- set-passwords
- rh_subscription
- yum-add-repo
- package-update-upgrade-install
- timezone
- puppet
- chef
- salt-minion
- mcollective
- disable-ec2-metadata
- runcmd
cloud_final_modules:
- rightscale_userdata
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- scripts-user
- ssh-authkey-fingerprints
- keys-to-console
- phone-home
- final-message
- power-state-change
system_info:
default_user:
name: centos
lock_passwd: true
gecos: Cloud User
groups: [wheel, adm, systemd-journal]
sudo: ["ALL=(ALL) NOPASSWD:ALL"]
shell: /bin/bash
distro: rhel
paths:
cloud_dir: /var/lib/cloud
templates_dir: /etc/cloud/templates
ssh_svcname: sshd
# vim:syntax=yaml
I have been using Linux Software RAID for years now. It is reliable and stable (as long as your hard disks are reliable) with very few problems. One recent issue, which the daily cron raid-check was reporting, was this:
WARNING: mismatch_cnt is not 0 on /dev/md0
Raid Environment
A few details on this specific raid setup:
RAID 5 with 4 Drives
with 4 x 1TB hard disks; according to the online raid calculator:
that means this setup is fault tolerant and cheap, but not fast.
Raid Details
# /sbin/mdadm --detail /dev/md0
raid configuration is valid
/dev/md0:
Version : 1.2
Creation Time : Wed Feb 26 21:00:17 2014
Raid Level : raid5
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Sat Oct 27 04:38:04 2018
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : ServerTwo:0 (local to host ServerTwo)
UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
Events : 60352
Number Major Minor RaidDevice State
4 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
6 8 48 2 active sync /dev/sdd
5 8 0 3 active sync /dev/sda
Examine Verbose Scan
with a more detailed output:
# mdadm -Evvvvs
there are a few Bad Blocks, although it is perfectly normal for disks that are a couple of years old to have some. smartctl is a tool you need to use from time to time.
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
Name : ServerTwo:0 (local to host ServerTwo)
Creation Time : Wed Feb 26 21:00:17 2014
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 1953266096 (931.39 GiB 1000.07 GB)
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 259072 sectors
Super Offset : 8 sectors
Unused Space : before=258984 sectors, after=3504 sectors
State : clean
Device UUID : bdd41067:b5b243c6:a9b523c4:bc4d4a80
Update Time : Sun Oct 28 09:04:01 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 6baa02c9 - correct
Events : 60355
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
MBR Magic : aa55
Partition[0] : 8388608 sectors at 2048 (type 82)
Partition[1] : 226050048 sectors at 8390656 (type 83)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
Name : ServerTwo:0 (local to host ServerTwo)
Creation Time : Wed Feb 26 21:00:17 2014
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 259072 sectors
Super Offset : 8 sectors
Unused Space : before=258992 sectors, after=3504 sectors
State : clean
Device UUID : a90e317e:43848f30:0de1ee77:f8912610
Update Time : Sun Oct 28 09:04:01 2018
Checksum : 30b57195 - correct
Events : 60355
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
Name : ServerTwo:0 (local to host ServerTwo)
Creation Time : Wed Feb 26 21:00:17 2014
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 259072 sectors
Super Offset : 8 sectors
Unused Space : before=258984 sectors, after=3504 sectors
State : clean
Device UUID : ad7315e5:56cebd8c:75c50a72:893a63db
Update Time : Sun Oct 28 09:04:01 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : b928adf1 - correct
Events : 60355
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sda:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
Name : ServerTwo:0 (local to host ServerTwo)
Creation Time : Wed Feb 26 21:00:17 2014
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
Data Offset : 259072 sectors
Super Offset : 8 sectors
Unused Space : before=258984 sectors, after=3504 sectors
State : clean
Device UUID : f4e1da17:e4ff74f0:b1cf6ec8:6eca3df1
Update Time : Sun Oct 28 09:04:01 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : bbe3e7e8 - correct
Events : 60355
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
MisMatch Warning
WARNING: mismatch_cnt is not 0 on /dev/md0
So this is not a critical error; it rather tells us that there are a few blocks that are "Not Synced Yet" across all disks.
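The counter itself is exposed in sysfs, so you can read its current value directly; after the repair and check below complete, it should be back to 0:
# cat /sys/block/md0/md/mismatch_cnt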
Status
Checking the Multiple Device (md) driver status:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
We verify that no sync job is currently running on the raid.
Repair
We can run a manual repair job:
# echo repair >/sys/block/md0/md/sync_action
now status looks like:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[=========>...........] resync = 45.6% (445779112/976631296) finish=54.0min speed=163543K/sec
unused devices: <none>
Progress
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[============>........] resync = 63.4% (619673060/976631296) finish=38.2min speed=155300K/sec
unused devices: <none>
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[================>....] resync = 81.9% (800492148/976631296) finish=21.6min speed=135627K/sec
unused devices: <none>
Finally
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
Check
After the repair, it is useful to check the status of our software raid again:
# echo check >/sys/block/md0/md/sync_action
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[=>...................] check = 9.5% (92965776/976631296) finish=91.0min speed=161680K/sec
unused devices: <none>
and finally
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>