rss.png profile for ebal on Stack Exchange, a network of free, community-driven Q&A sites
Jul
03
2019
Down the troubleshooting rabbit-hole

Hardware Details

HP ProLiant MicroServer
AMD Turion(tm) II Neo N54L Dual-Core Processor
Memory Size: 2 GB - DIMM Speed: 1333 MT/s
Maximum Capacity: 8 GB

Running 24×7 from 23/08/2010, so nine years!

N54L

 

Prologue

The above server started it’s life on CentOS 5 and ext3. Re-formatting to run CentOS 6.x with ext4 on 4 x 1TB OEM Hard Disks with mdadm raid-5. That provided 3 TB storage with Fault tolerance 1-drive failure. And believe me, I used that setup to zeroing broken disks or replacing faulty disks.

 

As we are reaching the end of CentOS 6.x and there is no official dist-upgrade path for CentOS, and still waiting for CentOS 8.x, I made decision to switch to Ubuntu 18.04 LTS. At that point this would be the 3rd official OS re-installation of this server. I chose ubuntu so that I can dist-upgrade from LTS to LTS.

 

This is a backup server, no need for huge RAM, but for a reliable system. On that storage I have 2m files that in retrospect are not very big. So with the re-installation I chose to use xfs instead of ext4 filesystem.

 

I am also running an internal snapshot mechanism to have delta for every day and that pushed the storage usage to 87% of the 3Tb. If you do the math, 2m is about 1.2Tb usage, we need a full initial backup, so 2.4Tb (80%) and then the daily (rotate) incremental backups are ~210Mb per day. That gave me space for five (5) daily snapshots aka a work-week.

To remove this impediment, I also replaced the disks with WD Red Pro 6TB 7200rpm disks, and use raid-1 instead of raid-5. Usage is now ~45%

 

Problem

Frozen System

From time to time, this very new, very clean, very reliable system froze to death!

When attached monitor & keyboard no output. Strange enough I can ping the network interfaces but I can not ssh to the server or even telnet (nc) to ssh port. Awkward! Okay - hardware cold reboot then.

As this system is remote … in random times, I need to ask from someone to cold-reboot this machine. Awkward again.

Kernel Panic

If that was not enough, this machine also has random kernel panics.

damn_disk.jpeg

 

Errors

Let’s start troubleshooting this system

# journalctl -p 3 -x

 

Important Errors

ERST: Failed to get Error Log Address Range.
APEI: Can not request [mem 0x7dfab650-0x7dfab6a3] for APEI BERT registers
ipmi_si dmi-ipmi-si.0: Could not set up I/O space

and more important Errors:

INFO: task kswapd0:40 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task xfsaild/dm-0:761 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:2:3612 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/1:0:5327 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task rm:5901 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:1:5902 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/0:0:5906 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kswapd0:40 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task xfsaild/dm-0:761 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kworker/u9:2:3612 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

 

First impressions ?

damn.jpeg

 

BootOptions

After a few (hours) of internet research the suggestion is to disable

  • ACPI stands for Advanced Configuration and Power Interface.
  • APIC stands for Advanced Programmable Interrupt Controller.

This site is very helpful for ubuntu, although Red Hat still has a huge advanced on describing kernel options better than canonical.

Grub

# vim /etc/default/grub
GRUB_CMDLINE_LINUX="noapic acpi=off"

then

# update-grub
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/50-curtin-settings.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-54-generic
Found initrd image: /boot/initrd.img-4.15.0-54-generic
Found linux image: /boot/vmlinuz-4.15.0-52-generic
Found initrd image: /boot/initrd.img-4.15.0-52-generic
done

Verify

# grep noapic /boot/grub/grub.cfg | head -1

        linux   /boot/vmlinuz-4.15.0-54-generic root=UUID=0c686739-e859-4da5-87a2-dfd5fcccde3d ro noapic acpi=off maybe-ubiquity

reboot and check again:

#  journalctl -p 3 -xb
-- Logs begin at Thu 2019-03-14 19:26:12 EET, end at Wed 2019-07-03 21:31:08 EEST. --
Jul 03 21:30:49 servertwo kernel: ipmi_si dmi-ipmi-si.0: Could not set up I/O space

okay !!!

 

ipmi_si

Unfortunately I could not find anything useful regarding

# dmesg | grep -i ipm
[   10.977914] ipmi message handler version 39.2
[   11.188484] ipmi device interface
[   11.203630] IPMI System Interface driver.
[   11.203662] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[   11.203665] ipmi_si: SMBIOS: mem 0x0 regsize 1 spacing 1 irq 0
[   11.203667] ipmi_si: Adding SMBIOS-specified kcs state machine
[   11.203729] ipmi_si: Trying SMBIOS-specified kcs state machine at mem address 0x0, slave address 0x20, irq 0
[   11.203732] ipmi_si dmi-ipmi-si.0: Could not set up I/O space

# ipmitool list
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

# lsmod | grep -i ipmi
ipmi_si                61440  0
ipmi_devintf           20480  0
ipmi_msghandler        53248  2 ipmi_devintf,ipmi_si

 

blocked for more than 120 seconds.

But let’s try to fix the timeout warnings:

INFO: task kswapd0:40 blocked for more than 120 seconds.
      Not tainted 4.15.0-54-generic #58-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

if you search online the above message, most of the sites will suggest to tweak dirty pages for your system.

This is the most common response across different sites:

This is a know bug. By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing 120 seconds. This especially happens on systems with a lot of memory.

Okay this may be the problem but we do not have a lot of memory, only 2GB RAM and 2GB Swap. But even then, our vm.dirty_ratio = 20 setting is 20% instead of 40%.

 

But I have the ability to cross-check ubuntu 18.04 with CentOS 6.10 to compare notes:

 

ubuntu 18.04

# uname -r
4.15.0-54-generic

# sysctl -a | egrep -i  'swap|dirty|raid'|sort
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirtytime_expire_seconds = 43200
vm.dirty_writeback_centisecs = 500
vm.swappiness = 60

 

CentOS 6.11

#  uname -r
2.6.32-754.15.3.el6.centos.plus.x86_64

# sysctl -a | egrep -i  'swap|dirty|raid'|sort
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.swappiness = 60

 

Scheduler for Raid

This is the best online documentation on the
optimize raid

Comparing notes we see that both systems have the same settings, even when the kernel version is a lot different, 2.6.32 Vs 4.15.0 !!!

Researching on raid optimization there is a note of kernel scheduler.

 

Ubuntu 18.04

# for drive in {a..c}; do cat /sys/block/sd${drive}/queue/scheduler; done

noop deadline [cfq]
noop deadline [cfq]
noop deadline [cfq] 

 

CentOS 6.11

# for drive in {a..d}; do cat /sys/block/sd${drive}/queue/scheduler; done

noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq] 

 

Anticipatory scheduling

CentOS supports Anticipatory scheduling on the hard disks but nowadays anticipatory scheduler is not supported in modern kernel versions.

That said, from the above output we can verify that both systems are running the default scheduler cfq.

Disks

Ubuntu 18.04

  • Western Digital Red Pro WDC WD6003FFBX-6
# for i in sd{b..c} ; do hdparm -Tt  /dev/$i; done

/dev/sdb:
 Timing cached reads:   2344 MB in  2.00 seconds = 1171.76 MB/sec
 Timing buffered disk reads: 738 MB in  3.00 seconds = 245.81 MB/sec

/dev/sdc:
 Timing cached reads:   2264 MB in  2.00 seconds = 1131.40 MB/sec
 Timing buffered disk reads: 774 MB in  3.00 seconds = 257.70 MB/sec

CentOS 6.11

  • Seagate ST1000DX001
/dev/sdb:
 Timing cached reads:   2490 MB in  2.00 seconds = 1244.86 MB/sec
 Timing buffered disk reads: 536 MB in  3.01 seconds = 178.31 MB/sec

/dev/sdc:
 Timing cached reads:   2524 MB in  2.00 seconds = 1262.21 MB/sec
 Timing buffered disk reads: 538 MB in  3.00 seconds = 179.15 MB/sec

/dev/sdd:
 Timing cached reads:   2452 MB in  2.00 seconds = 1226.15 MB/sec
 Timing buffered disk reads: 546 MB in  3.01 seconds = 181.64 MB/sec

 

So what I am missing ?

My initial personal feeling was the low memory. But after running a manual rsync I’ve realized that:

cpu

was load average: 0.87, 0.46, 0.19

mem

was (on high load), when hit 40% of RAM, started to use swap.

KiB Mem :  2008464 total,    77528 free,   635900 used,  1295036 buff/cache
KiB Swap:  2097148 total,  2096624 free,      524 used.  1184220 avail Mem 

So I tweaked a bit the swapiness and reduce it from 60% to 40%

and run a local snapshot (that is a bit heavy on the disks) and doing an upgrade and trying to increase CPU load. Still everything is fine !

I will keep an eye on this story.

fantastic

 

Jun
10
2019
MariaDB Galera Cluster on Ubuntu 18.04.2 LTS

MariaDB Galera Cluster on Ubuntu 18.04.2 LTS

Last Edit: 2019 06 11
Thanks to Manolis Kartsonakis for the extra info.

 

Official Notes here:
MariaDB Galera Cluster

a Galera Cluster is a synchronous multi-master cluster setup. Each node can act as master. The XtraDB/InnoDB storage engine can sync its data using rsync. Each data transaction gets a Global unique Id and then using Write Set REPLication the nodes can sync data across each other. When a new node joins the cluster the State Snapshot Transfers (SSTs) synchronize full data but in Incremental State Transfers (ISTs) only the missing data are synced.

With this setup we can have:

  • Data Redundancy
  • Scalability
  • Availability

galeracluster.png

 

Installation

In Ubuntu 18.04.2 LTS three packages should exist in every node.
So run the below commands in all of the nodes - change your internal IPs accordingly

as root

# apt -y install mariadb-server
# apt -y install galera-3
# apt -y install rsync

host file

as root

# echo 10.10.68.91 gal1 >> /etc/hosts
# echo 10.10.68.92 gal2 >> /etc/hosts
# echo 10.10.68.93 gal3 >> /etc/hosts

 

Storage Engine

Start the MariaDB/MySQL in one node and check the default storage engine. It should be

MariaDB [(none)]> show variables like 'default_storage_engine';

or

echo "SHOW Variables like 'default_storage_engine';" | mysql
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| default_storage_engine | InnoDB |
+------------------------+--------+

 

Architecture

A Galera Cluster should be behind a Load Balancer (proxy) and you should never talk with a node directly.

galeracluster_elb.png

Galera Configuration

Now copy the below configuration file in all 3 nodes:

/etc/mysql/conf.d/galera.cnf
[mysqld]
binlog_format=ROW
default-storage-engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="galera_cluster"
wsrep_cluster_address="gcomm://10.10.68.91,10.10.68.92,10.10.68.93"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="10.10.68.91"
wsrep_node_name="gal1"

Per Node

Be careful the last 2 lines should change to each node:

Node 01

# Galera Node Configuration
wsrep_node_address="10.10.68.91"
wsrep_node_name="gal1"

Node 02

# Galera Node Configuration
wsrep_node_address="10.10.68.92"
wsrep_node_name="gal2"

Node 03

# Galera Node Configuration
wsrep_node_address="10.10.68.93"
wsrep_node_name="gal3"

 

Galera New Cluster

We are ready to create our galera cluster:

galera_new_cluster

or

mysqld --wsrep-new-cluster

JournalCTL

Jun 10 15:01:20 gal1 systemd[1]: Starting MariaDB 10.1.40 database server...
Jun 10 15:01:24 gal1 sh[2724]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Jun 10 15:01:24 gal1 mysqld[2865]: 2019-06-10 15:01:24 139897056971904 [Note] /usr/sbin/mysqld (mysqld 10.1.40-MariaDB-0ubuntu0.18.04.1) starting as process 2865 ...
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2906]: Upgrading MySQL tables if necessary.
Jun 10 15:01:24 gal1 systemd[1]: Started MariaDB 10.1.40 database server.
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: Looking for 'mysql' as: /usr/bin/mysql
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2909]: This installation of MySQL is already upgraded to 10.1.40-MariaDB, use --force if you still need to run mysql_upgrade
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2918]: Checking for insecure root accounts.
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2922]: WARNING: mysql.user contains 4 root accounts without password or plugin!
Jun 10 15:01:24 gal1 /etc/mysql/debian-start[2923]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
# ss -at '( sport = :mysql )'

State                Recv-Q                Send-Q                                Local Address:Port                                  Peer Address:Port
LISTEN               0                     80                                        127.0.0.1:mysql                                      0.0.0.0:*         
# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t
wsrep_cluster_conf_id     1
wsrep_cluster_size        1
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          d67e5b7c-8b90-11e9-ba3d-23ea221848fd
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

 

Second Node

systemctl restart mariadb.service
root@gal2:~# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     2
wsrep_cluster_size        2
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          a5eaae3e-8b91-11e9-9662-0bbe68c7d690
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

 

Third Node

systemctl restart mariadb.service
root@gal3:~# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     3
wsrep_cluster_size        3
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          013e1847-8b92-11e9-9055-7ac5e2e6b947
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

 

Primary Component (PC)

The last node in the cluster -in theory- has all the transactions. That means it should be the first to start next time from a power-off.

State

cat /var/lib/mysql/grastate.dat

eg.

# GALERA saved state
version: 2.1
uuid:    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
seqno:   -1
safe_to_bootstrap: 0

if safe_to_bootstrap: 1 then you can bootstrap this node as Primary.

 

Common Mistakes

Sometimes DBAs want to setup a new cluster (lets say upgrade into a new scheme - non compatible with the previous) so they want a clean state/directory. The most common way is to move the current mysql directory

mv /var/lib/mysql /var/lib/mysql_BAK

If you try to start your galera node, it will fail:

# systemctl restart mariadb
WSREP: Failed to start mysqld for wsrep recovery:
[Warning] Can't create test file /var/lib/mysql/gal1.lower-test
Failed to start MariaDB 10.1.40 database server

You need to create and initialize the mysql directory first:

mkdir -pv /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
chmod 0755 /var/lib/mysql
mysql_install_db -u mysql

On another node, cluster_size = 2

# echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     4
wsrep_cluster_size        2
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          a5eaae3e-8b91-11e9-9662-0bbe68c7d690
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

then:

# systemctl restart mariadb

rsync from the Primary:


Jun 10 15:19:00 gal1 rsyncd[3857]: rsyncd version 3.1.2 starting, listening on port 4444
Jun 10 15:19:01 gal1 rsyncd[3884]: connect from gal3 (192.168.122.93)
Jun 10 15:19:01 gal1 rsyncd[3884]: rsync to rsync_sst/ from gal3 (192.168.122.93)
Jun 10 15:19:01 gal1 rsyncd[3884]: receiving file list
#  echo "SHOW STATUS LIKE 'wsrep_%';" | mysql  | egrep -i 'cluster|uuid|ready' | column -t

wsrep_cluster_conf_id     5
wsrep_cluster_size        3
wsrep_cluster_state_uuid  8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_cluster_status      Primary
wsrep_gcomm_uuid          12afa7bc-8b93-11e9-88fc-6f41be61a512
wsrep_local_state_uuid    8abc6a1b-8adc-11e9-a42b-c6022ea4412c
wsrep_ready               ON

Be Aware: Try to keep your DATA directory to a seperated storage disk

 

Adding new Nodes

A healthy Quorum has an odd number of nodes. So when you scale your galera gluster consider adding two (2) at every step!

# echo 10.10.68.94 gal4 >> /etc/hosts
# echo 10.10.68.95 gal5 >> /etc/hosts

Data Replication will lock your donor-node so it is best to put-off your donor-node from your Load Balancer:

galeracluster_elb_donor.png

Then explicit point your donor-node to your new nodes by adding the below line in your configuration file:

wsrep_sst_donor= gal3

After the synchronization:

  • comment-out the above line
  • restart mysql service and
  • put all three nodes behind the Local Balancer

 

Split Brain

Find the node with the max

SHOW STATUS LIKE 'wsrep_last_committed';

and set it as master by

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

 

Weighted Quorum for Three Nodes

When configuring quorum weights for three nodes, use the following pattern:

node1: pc.weight = 4
node2: pc.weight = 3
node3: pc.weight = 2
node4: pc.weight = 1
node5: pc.weight = 0

eg.

SET GLOBAL wsrep_provider_options="pc.weight=3";

In the same VPC setting up pc.weight will avoid a split brain situation. In different regions, you can setup something like this:

node1: pc.weight = 2
node2: pc.weight = 2
node3: pc.weight = 2
  <->
node4: pc.weight = 1
node5: pc.weight = 1
node6: pc.weight = 1

 

Oct
15
2018
WslRegisterDistribution failed with error: 0×800703fa

Getting this error on Windows 10 [Ubuntu running on Windows Subsystem for Linux]

0x800703fa.png

Go to

this PC –> Manage –> Services & Applications –> Services –> LXSSMANAGER

lxssmanager.png