Aug
03
2019
Object storage with Backblaze, Rclone, MinIO and s3cmd

In this blog post, I will try to write a comprehensive guide on how to use cloud object storage for backup purposes.

Goal

obs_s3.png

What is Object Storage

In a nutshell object storage software uses commodity hard disks in a distributed way across a cluster of systems.

Why using Object Storage

The main characteristics of object storage are:

  • Scalability
  • Reliability
  • Efficiency
  • Performance
  • Accessibility

Scalability

We can immediately increase our storage by simple adding new commodity systems in our infrastructure to scale up our storage needs, as we go.

Reliability

As we connect more and more systems, we can replicate our data across all of them. We can choose how many copies we would like to have or in which systems we would like to have our replicated data. Also (in most cases) a self-healing mechanism is running in the background to preserve our data from corruption.

Efficiency

By not having a single point of failure in a distributed system, we can reach high throughput across our infrastructure.

Performance

As data are being dispersed across disks and systems, improves read and write performance. Reduces any bottleneck as we can get objects from different places in a psedoparalleler way to construct our data.

Accessibility

Accessing data through a REST API (aka endpoint) using tokens. We can define specific permissions to users or applications and/or we can separate access by creating different keys. We can limit read, write, list, delete or even share specific objects with limited keys!

Backblaze - Cloud Storage Backup

There are a lot of cloud data storage provider. A lot!
When choosing your storage provider, you need to think a couple of things:

  • Initial data size
  • Upload/Sync files (delta size)
  • Delete files
  • Download files

Every storage provider have different prices for every read/write/delete/share operation. Your needs will define who is the best provider for you. My plan, is to use cloud storage as archive-backups. That means I need to make an initial upload and after that, frequently sync my files there. But I do NOT need them immediately. This is the backup of my backup in case my primary site is down (or corrupted, or broken, or stolen, or seized, or whatever). I have heard really good words about backblaze and their pricing model suits me.

Create an Account

Create an account and enable Backblaze B2 Cloud Storage. This option will also enable Multi Factor Authentication (MFA) by adding a TOTP in your mobile app or use SMS (mobile phone is required) as a fallback. This is why it is called Multi-Factor, because you can need more than one way to login. Also this is the way that Backblaze can protect themselves of people creating multiple accounts and get 10G free storage for every account.

B2 Cloud Storage

You will see a Master application key. Create a New Application Key.

b2_master_application_keys.png

I already have created a New Bucket and I want to give explicit access to this new Application Key.

b2_add_application_key.png

Now, the important step (the one that I initial did wrong!).
The below screen will be visible ONLY ONE time!
Copy the application key (marked in the blue rectangle).
If you lose this key, you need to delete it and create a new one from the start.

b2_new_application_key.png

That’s it, pretty much we are done with backblaze!

Rclone

rsync for cloud storage

Next it is time to install and configure rclone. Click here to read the online documentation of rclone on backblaze. Rclone is a go static binary build application, that means you do not have to install or use it as root!

Install Rclone

I will use the latest version of rclone:

curl -sLO https://downloads.rclone.org/rclone-current-linux-amd64.zip
unzip rclone-current-linux-amd64.zip
cd rclone-*linux-amd64/
$ ./rclone version
rclone v1.48.0
- os/arch: linux/amd64
- go version: go1.12.6

Configure Rclone

You can configure rclone with this command:

./rclone config

but for this article I will follow a more shorter procedure:

Create an empty file under

mkdir -pv ~/.config/rclone/

$ cat > ~/.config/rclone/rclone.conf  <<EOF
[remote]
type = b2
account = 0026f98XXXXXXXXXXXXXXXXXX
key = K0021XXXXXXXXXXXXXXXXXXXXXXXXXX
hard_delete = true
EOF

Replace acount and key with your own backblaze application secrets

  • KeyID
  • ApplicationKey

In our configuration, the name of backblaze b2 cloud storage is remote.

Test Rclone

We can test rclone with this:

./rclone lsd remote:

$ ./rclone lsd remote:
          -1 2019-08-03 22:01:05        -1 vog-m23XXXXX

if we see our bucket name, then everything is fine.

A possible error

401 bad_auth_token

In my first attempt, I did not save the applicationKey when I created the new pair of access keys. So I put the wrong key in the rclone configuration! So be careful, if you see this error, just delete your application key and create a new one.

Rclone Usage

Let’s copy/sync a directory to see if everything is working as advertised:

rclone sync dnl/ remote:vog-m23XXXXX/dnl/

from our browser:

rclone_sync.png

Delete Files

rclone delete remote:vog-m23XXXXX/dnl
rclone purge remote:vog-m23XXXXX/dnl

List Files

rclone ls remote:vog-m23XXXXX
(empty)

rclone tree remote:vog-m23XXXXX
/

0 directories, 0 files

Rclone Crypt

Of course we want to have encrypted backups on the cloud. Read this documentation for more info: Crypt.
We need to re-configure rclone so that can encrypt our files before passing them to our data storage provider.

rclone config

Our remote b2 is already there:

$ rclone config
Current remotes:

Name                 Type
====                 ====
remote               b2

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>

New Remote

n to create a new remote, and I will give encrypt as it’s name.

e/n/d/r/c/s/q> n
name> encrypt
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value

Crypt Module

We choose: crypt module:

...
 9 / Encrypt/Decrypt a remote
    "crypt"
...
Storage> 9
** See help for crypt backend at: https://rclone.org/crypt/ **

Remote to encrypt/decrypt.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").

Remote Bucket Name

We also need to give a name, so rclone can combine crypt with b2 module.
I will use my b2-bucket name for this:

remote:vog-m23XXXXX

Remote to encrypt/decrypt.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").
remote> remote:vog-m23XXXXX

Encrypt the filenames

Yes, we want rclone to encrypt our filenames

How to encrypt the filenames.
Enter a string value. Press Enter for the default ("standard").
Choose a number from below, or type in your own value
 1 / Don't encrypt the file names.  Adds a ".bin" extension only.
    "off"
 2 / Encrypt the filenames see the docs for the details.
    "standard"
 3 / Very simple filename obfuscation.
    "obfuscate"
filename_encryption> 2

Encrypt directory names

Yes, those too

Option to either encrypt directory names or leave them intact.
Enter a boolean value (true or false). Press Enter for the default ("true").
Choose a number from below, or type in your own value
 1 / Encrypt directory names.
    "true"
 2 / Don't encrypt directory names, leave them intact.
    "false"
directory_name_encryption> 1

Password or pass phrase for encryption

This will be an automated backup script in the end, so I will use random password for this step, with 256 bits and no salt.

Password or pass phrase for encryption.
y) Yes type in my own password
g) Generate random password
n) No leave this optional password blank
y/g/n> g

Password strength in bits.
64 is just about memorable
128 is secure
1024 is the maximum
Bits> 256

Your password is: VE64tx4zlXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Use this password? Please note that an obscured version of this
password (and not the password itself) will be stored under your
configuration file, so keep this generated password in a safe place.
y) Yes
n) No
y/n> y

Password or pass phrase for salt. Optional but recommended.
Should be different to the previous password.
y) Yes type in my own password
g) Generate random password
n) No leave this optional password blank
y/g/n> n

Keep in your password manager this password:
VE64tx4zlXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

In your setup will be something completly different.

Saving configuration

No need of advanced configuration, review your rclone config and save it.

Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[encrypt]
type = crypt
remote = remote:vog-m23XXXXX
filename_encryption = standard
directory_name_encryption = true
password = *** ENCRYPTED ***
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
encrypt              crypt
remote               b2

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

Rclone Encrypt Sync

Now let’s see if this crypt modules is working:

rclone sync dnl/ encrypt:

List of encrypted files

rclone ls remote:vog-m23XXXXX

       78 germrc3i2lisdd9iutvmbmtt8g
   241188 p8jmes5qcpj3lka398vb8qril4/1pg9mb8gca05scmkg8nn86tgse3905trubkeah8t75dd97a7e2caqgo275uphgkj95p78e4i3rejm
  6348676 p8jmes5qcpj3lka398vb8qril4/ehhjp4k6bdueqj9arveg4liaameh0qu55oq6hsmgne4nklg83o0d149b9cdc5oq3c0otlivjufk0o
    27040 p8jmes5qcpj3lka398vb8qril4/tsiuegm9j7nghheualtbutg4m3r65blqbdn03cdaipnjsnoq0fh26eno22f79fhl1re3m5kiqjfnu

rclone tree  remote:vog-m23XXXXX

/
├── germrc3i2lisdd9iutvmbmtt8g
└── p8jmes5qcpj3lka398vb8qril4
    ├── 1pg9mb8gca05scmkg8nn86tgse3905trubkeah8t75dd97a7e2caqgo275uphgkj95p78e4i3rejm
    ├── ehhjp4k6bdueqj9arveg4liaameh0qu55oq6hsmgne4nklg83o0d149b9cdc5oq3c0otlivjufk0o
    └── tsiuegm9j7nghheualtbutg4m3r65blqbdn03cdaipnjsnoq0fh26eno22f79fhl1re3m5kiqjfnu

1 directories, 4 files

Backblaze dashboard

rclone_encrypt_browse.png

Encrypted file

But is it indeed encrypted or just is it only the file name ?
In our system the content of file1 are:

# cat dnl/file1
Sun Aug  4 00:01:54 EEST 2019

If we download this file:

$ cat /tmp/germrc3i2lisdd9iutvmbmtt8g
RCLONENc�s��w�YF��r,O�S�"���U?���>ȘDh�3-�'/5��k��g�x'5yz�i� �H��

rclone_encrypted_file.png

Rclone Sync Script

Here is my personal rclone sync script: rclone.sh

#!/bin/sh
# ebal, Sun, 04 Aug 2019 16:33:14 +0300

# Create Rclone Log Directory
mkdir -p  /var/log/rclone/`date +%Y`/`date +%m`/`date +%d`/

# Compress previous log file
gzip /var/log/rclone/`date +%Y`/`date +%m`/`date +%d`/*

# Define current log file
log_file="/var/log/rclone/`date +%Y`/`date +%m`/`date +%d`/`hostname -f`-`date +%Y%m%d_%H%M`.log"

# Filter out - exclude dirs & files that we do not need
filter_f="/root/.config/rclone/filter-file.txt"

# Sync !
/usr/local/bin/rclone
    --quiet
    --delete-before
    --ignore-existing
    --links
    --filter-from $filter_f
    --log-file $log_file
    sync / encrypt:/`hostname -f`/

and this is what I am filtering out (exclude):

- /dev/**
- /lost+found/**
- /media/**
- /mnt/**
- /proc/**
- /run/**
- /swap.img
- /swapfile
- /sys/**
- /tmp/**
- /var/tmp/**

MinIO

MinIO is a high performance object storage server compatible with Amazon S3 APIs

Most of the online cloud object storage data providers (and applications) are S3 compatible. Amazon S3 or Amazon Simple Storage Service is the de-facto on object storage and their S3 API (or driver) is being used by many applications.

B2 Cloud Storage API Compatible with Amazon S3

API Compatible

Backblaze is using a REST-API but it is not S3 compatible. So in case your application can only talk via S3, we need a translator from S3 <--> B2 thus we need Minio, as an S3 Compatible Object Storage driver Gateway!

Install Minio

Minio is also a go software!

curl -sLO https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio

./minio version

$ ./minio version

Version: 2019-08-01T22:18:54Z
Release-Tag: RELEASE.2019-08-01T22-18-54Z
Commit-ID: c5ac901e8dac48d45079095a6bab04674872b28b

Configure Minio

actually no configuration needed, just export Access/Sercet keys to local environment:

export -p MINIO_ACCESS_KEY=0026f98XXXXXXXXXXXXXXXXXX
export -p MINIO_SECRET_KEY=K0021XXXXXXXXXXXXXXXXXXXXXXXXXX

Run Minio S3 gateway

./minio gateway b2

$ ./minio gateway b2

Endpoint:  http://93.184.216.34:9000  http://127.0.0.1:9000
AccessKey: 0026f98XXXXXXXXXXXXXXXXXX
SecretKey: K0021XXXXXXXXXXXXXXXXXXXXXXXXXX

Browser Access:
   http://93.184.216.34:9000  http://127.0.0.1:9000

Command-line Access: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc config host add myb2 http://93.184.216.34:9000 0026f98XXXXXXXXXXXXXXXXXX K0021XXXXXXXXXXXXXXXXXXXXXXXXXX

Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide

Web Dashboard

Minio comes with it’s own web-ui dashboard!
How awesome is this ?

minio_dashboard.png

minio_browse.png

S3cmd

The most common S3 command line tool is a python program named: s3cmd
It (probable) already exists in your package manager and you can install it.

On a rpm-based system:

yum -y install s3cmd

On a deb-based system:

apt -y install s3cmd

you can also install it via pip or even inside a virtualenv

pip install s3cmd

Configure s3cmd

We need to configre s3cmd, by running:

s3cmd --configure

$ s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: 0026f98XXXXXXXXXXXXXXXXXX
Secret Key: K0021XXXXXXXXXXXXXXXXXXXXXXXXXX
Default Region [US]:
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: http://127.0.0.1:9000

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]:

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: n

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:
New settings:
  Access Key: 0026f98XXXXXXXXXXXXXXXXXX
  Secret Key: K0021XXXXXXXXXXXXXXXXXXXXXXXXXX
  Default Region: US
  S3 Endpoint: http://127.0.0.1:9000
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.amazonaws.com
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] n

Save settings? [y/N] y
Configuration saved to '/home/ebal/.s3cfg'

Summarize Config

To summarize, these are the settings we need to type, everything else can be default:

Access Key: 0026f98XXXXXXXXXXXXXXXXXX
Secret Key: K0021XXXXXXXXXXXXXXXXXXXXXXXXXX
S3 Endpoint [s3.amazonaws.com]: http://127.0.0.1:9000
Use HTTPS protocol [Yes]: n

Test s3cmd

$ s3cmd ls
1970-01-01 00:00  s3://vog-m23XXX

s4cmd

Super S3 command line tool

Notable mention: s4cmd

s4cmd is using Boto 3, an S3 SDK for python. You can build your own application, using S3 as backend storage with boto.

$ pip search s4cmd
s4cmd (2.1.0)  - Super S3 command line tool

$ pip install s4cmd

Configure s4cmd

If you have already configure s3cmd, then s4cmd will read the same config file. But you can also just export these enviroment variables and s4cmd will use them:

export -p S3_ACCESS_KEY=0026f98XXXXXXXXXXXXXXXXXX
export -p S3_SECRET_KEY=K0021XXXXXXXXXXXXXXXXXXXXXXXXXX

Run s4cmd

s4cmd --endpoint-url=http://127.0.0.1:9000 ls

$ s4cmd --endpoint-url=http://127.0.0.1:9000 ls
1970-01-01 00:00 DIR s3://vog-m23XXXXX/

SSH Local Port Forwarding

You can also use s3cmd/s4cmd or any other S3 compatible software from another machine if you can bring minio gateway local.

You can do this by running a ssh command:

ssh -L 9000:127.0.0.1:9000 <remote_machine_that_runs_minio_gateway>