Backup and recovery with custom storage

This section describes how to configure backup and recovery of your Cassandra database using SSH and the file system of a server you manage, instead of Google Cloud Storage.

What is Cassandra backup and recovery with custom storage?

Backup with custom storage writes backups of your Cassandra database as compressed files to the file system of a server you specify. Backups run on a schedule you specify in your overrides file, and the connection to the server uses SSH.

Setting up backups without Cloud Services

The following steps include common examples for completing specific tasks, like creating an SSH key pair. Use the methods that are appropriate to your installation.

The procedure has two parts: setting up the server and SSH, and setting the schedule and destination for backup.

Set up the server and SSH

  1. Designate a Linux or Unix server for your backups. This server must be reachable using SSH from your Apigee hybrid runtime plane. It must have enough storage for your backups.
  2. Set up an SSH server on the server, or ensure that it has a secure SSH server configured.
  3. Create an SSH key pair and store the private key file in a path that is accessible from your hybrid runtime plane. The key pair must have an empty passphrase, or the backup will fail. For example:
    ssh-keygen -t rsa -b 4096 -C exampleuser@example.com
      Enter file in which to save the key (/Users/exampleuser/.ssh/id_rsa): $APIGEE_HOME/hybrid-files/certs/ssh_key
      Enter passphrase (empty for no passphrase):
      Enter same passphrase again:
      Your identification has been saved in ssh_key
      Your public key has been saved in ssh_key.pub
      The key fingerprint is:
      SHA256:DWKo334XMZcZYLOLrd/8HNpjTERPJJ0mc11UYmrPvSA exampleuser@example.com
      The key's randomart image is:
      +---[RSA 4096]----+
      |          +.  ++X|
      |     .   . o.=.*+|
      |    . o . . o==o |
      |   . . . =oo+o...|
      |  .     S +E oo .|
      |   . .   .. . o .|
      |    . . .  . o.. |
      |     .  ...o ++. |
      |      .. .. +o+. |
      +----[SHA256]-----+
  4. Create a user account on the backup server with the name apigee. Make sure the new apigee user has a home directory under /home.
  5. On the backup server, create a .ssh directory in the new /home/apigee directory.
  6. Copy the public key (ssh_key.pub in the previous example) into a file named authorized_keys in the new /home/apigee/.ssh directory. For example:
    cd /home/apigee
    mkdir .ssh
    cd .ssh
    vi authorized_keys
  7. On your backup server, create a backup directory within the /home/apigee/ directory. The backup directory can be any directory as long as the apigee user has access to it. For example:
    cd /home/apigee
    mkdir cassandra-backup
  8. Test the connection. You need to make sure that your Cassandra pods can connect to your backup server using SSH:
    1. Log into the shell of your Cassandra pod. For example:
      kubectl exec -it -n apigee APIGEE_CASSANDRA_DEFAULT_0 -- /bin/bash

      Where APIGEE_CASSANDRA_DEFAULT_0 is the name of a Cassandra pod. Change this to the name of the pod you want to connect from.

    2. Connect by SSH to your backup server, using the server IP address:
      ssh apigee@BACKUP_SERVER_IP
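Steps 5 and 6 above can also be scripted instead of editing authorized_keys by hand. The following is a minimal sketch, not part of Apigee hybrid: install_pubkey is a hypothetical helper name, and the 700/600 permissions are the conventional values that OpenSSH accepts when its StrictModes check is enabled (the default). It assumes the public key file contains a single line.

```shell
#!/bin/sh
# Sketch only: install_pubkey is a hypothetical helper, not an Apigee tool.
# It creates <home>/.ssh, appends the public key to authorized_keys once,
# and sets the restrictive permissions that sshd conventionally expects.
install_pubkey() {
  home_dir=$1      # for example /home/apigee
  pubkey_file=$2   # for example ssh_key.pub copied to the backup server

  mkdir -p "$home_dir/.ssh" || return 1
  chmod 700 "$home_dir/.ssh" || return 1

  touch "$home_dir/.ssh/authorized_keys" || return 1
  # Append the key only if the exact line is not already present,
  # so running the helper twice does not duplicate the entry.
  if ! grep -qxF -e "$(cat "$pubkey_file")" "$home_dir/.ssh/authorized_keys"; then
    cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys" || return 1
  fi
  chmod 600 "$home_dir/.ssh/authorized_keys"
}
```

Run as the apigee user on the backup server, a call might look like: install_pubkey /home/apigee ssh_key.pub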

Set the schedule and destination for backup

You set the schedule and destination for backups in your overrides.yaml file.

  1. Add the following parameters to your overrides.yaml file:

    Parameters

    cassandra:
      backup:
        enabled: true
        keyFile: "PATH_TO_PRIVATE_KEY_FILE"
        server: "BACKUP_SERVER_IP"
        storageDirectory: "/home/apigee/BACKUP_DIRECTORY"
        cloudProvider: "HYBRID" # required verbatim "HYBRID" (all caps)
        schedule: "SCHEDULE"
    

    Example

    cassandra:
      backup:
        enabled: true
        keyFile: "/Users/exampleuser/apigee-hybrid/hybrid-files/service-accounts/private.key"
        server: "34.56.78.90"
        storageDirectory: "/home/apigee/cassbackup"
        cloudProvider: "HYBRID"
        schedule: "0 2 * * *"
    

    Where:

    Property                  Description
    backup:enabled            Backup is disabled by default. You must set this property to true.
    backup:keyFile            PATH_TO_PRIVATE_KEY_FILE: The path on your local file system to the SSH private key file (named ssh_key in the step where you created the SSH key pair).
    backup:server             BACKUP_SERVER_IP: The IP address of your backup server.
    backup:storageDirectory   BACKUP_DIRECTORY: The name of the backup directory on your backup server. This must be a directory within /home/apigee (the backup directory is named cassandra-backup in the step where you created the backup directory).
    backup:cloudProvider      HYBRID: Required. The value must be exactly "HYBRID" (all caps).
    backup:schedule           SCHEDULE: The time when the backup starts, specified in standard crontab syntax. Default: 0 2 * * *

  2. Use apigeectl to apply the backup configuration to the storage scope of your cluster:
    $APIGEECTL_HOME/apigeectl apply --datastore -f YOUR_OVERRIDES_FILE

    Where YOUR_OVERRIDES_FILE is the path to the overrides file you just edited.
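The schedule property uses the five-field crontab layout: minute, hour, day of month, month, and day of week. As a quick sanity check before applying the overrides file, the sketch below splits a schedule string into those fields. describe_schedule is a hypothetical helper written for illustration; it only counts and labels fields and does not validate the range of each value.

```shell
#!/bin/sh
# Sketch only: describe_schedule is a hypothetical helper.
# It labels the five crontab fields of a schedule string.
describe_schedule() {
  # Disable globbing so "*" fields are not expanded into file names.
  set -f
  # Intentionally unquoted: split the schedule on whitespace.
  set -- $1
  set +f
  if [ "$#" -ne 5 ]; then
    echo "invalid: expected 5 fields, got $#"
    return 1
  fi
  echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
}

describe_schedule "0 2 * * *"
# prints: minute=0 hour=2 day-of-month=* month=* day-of-week=*
```

In other words, the default schedule 0 2 * * * starts a backup at 02:00 every day.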

Configure restore

Restoration takes your data from the backup location and restores the data into a new Cassandra cluster with the same number of nodes. No data is taken from the old Cassandra cluster.

The restoration instructions below are for single region deployments that do not use Google Cloud Storage for backups.

To restore Cassandra backups:

  1. Create a new namespace within the existing Kubernetes cluster. This namespace will be used to restore the hybrid runtime deployment. Do not reuse the original namespace or its name for the restoration.
  2. In the root hybrid installation directory, create a new overrides-restore.yaml file.
  3. Copy the complete Cassandra configuration from your original overrides.yaml file into the new overrides-restore.yaml file. For example:
    cp ./overrides.yaml ./overrides-restore.yaml
    
  4. Add a namespace element and the restore configuration to the new overrides-restore.yaml file.

    Parameters

    namespace: YOUR_RESTORE_NAMESPACE
    cassandra:
      ...
      restore:
        enabled: true
        keyFile: "PATH_TO_PRIVATE_KEY_FILE"
        server: "BACKUP_SERVER_IP"
        storageDirectory: "/home/apigee/BACKUP_DIRECTORY"
        cloudProvider: "HYBRID"  # required verbatim "HYBRID" (all caps)
        snapshotTimestamp: "BACKUP_TO_RESTORE"
      ...
    

    Example

    namespace: cassandra-restore
    cassandra:
      restore:
        enabled: true
        keyFile: "/Users/exampleuser/apigee-hybrid/hybrid-files/service-accounts/private.key"
        server: "34.56.78.90"
        storageDirectory: "/home/apigee/cassbackup"
        cloudProvider: "HYBRID"
        snapshotTimestamp: "20201001183903"
    
  5. Set each property as follows:

    Property                   Description
    namespace                  YOUR_RESTORE_NAMESPACE: The name of the new namespace you created in step 1 for the new Cassandra cluster. Do not use the same namespace you used for your original cluster.
    restore:enabled            Restore is disabled by default. You must set this property to true.
    restore:keyFile            PATH_TO_PRIVATE_KEY_FILE: The path on your local file system to the SSH private key file (named ssh_key in the step where you created the SSH key pair).
    restore:server             BACKUP_SERVER_IP: The IP address of your backup server.
    restore:storageDirectory   BACKUP_DIRECTORY: The name of the backup directory on your backup server. This must be a directory within /home/apigee (the backup directory is named cassandra-backup in the step where you created the backup directory).
    restore:cloudProvider      HYBRID: Required. The value must be exactly "HYBRID" (all caps).
    restore:snapshotTimestamp  BACKUP_TO_RESTORE: The timestamp of the backup snapshot you want to restore, for example 20201001183903.

  6. Change the app label on any Cassandra nodes in the old namespace by executing the following command:
    kubectl label pods --overwrite --namespace=OLD_NAMESPACE -l app=apigee-cassandra app=apigee-cassandra-old
    
  7. Create a new hybrid runtime deployment. This will create a new Cassandra cluster and begin restoring the backup data into the cluster:
    ./apigeectl init -f ../overrides-restore.yaml

    ./apigeectl apply -f ../overrides-restore.yaml
    
  8. Once the restoration is complete, the traffic must be switched to use the Cassandra cluster in the new namespace. Run the following commands to switch the traffic:

    kubectl get rs -n OLD_NAMESPACE # look for the 'apigee-connect' replicaset
    
    kubectl patch rs -n OLD_NAMESPACE APIGEE_CONNECT_RS_NAME -p '{"spec":{"replicas" : 0}}'
    
  9. Once the traffic switch is complete, you can reconfigure backups on the restored cluster by removing the restore configuration from the overrides-restore.yaml file and adding the backup configuration. Replace YOUR_RESTORE_NAMESPACE with the name of the new namespace you created in step 1.
    namespace: YOUR_RESTORE_NAMESPACE
    cassandra:
      ...
      backup:
        enabled: true
        keyFile: "PATH_TO_PRIVATE_KEY_FILE"
        server: "BACKUP_SERVER_IP"
        storageDirectory: "/home/apigee/BACKUP_DIRECTORY"
        cloudProvider: "HYBRID" # required verbatim "HYBRID" (all caps)
        schedule: "SCHEDULE"
      ...
    

    Then apply the backup configuration with the following command:

    ./apigeectl apply  -f ../overrides-restore.yaml
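
Two of the restore requirements above are easy to get wrong: the restore namespace must differ from the original one, and snapshotTimestamp must name a specific backup. The sketch below checks both before you run apigeectl. check_restore_inputs is a hypothetical helper, and the 14-digit timestamp format is an assumption inferred from the example value 20201001183903, not a documented rule.

```shell
#!/bin/sh
# Sketch only: check_restore_inputs is a hypothetical helper.
# Assumption: snapshot timestamps are 14 digits, as in 20201001183903.
check_restore_inputs() {
  old_ns=$1     # namespace of the original deployment
  new_ns=$2     # namespace created for the restore
  snapshot=$3   # value intended for restore:snapshotTimestamp

  if [ -z "$new_ns" ] || [ "$new_ns" = "$old_ns" ]; then
    echo "restore namespace must be set and differ from the original" >&2
    return 1
  fi

  # Reject empty values and anything containing a non-digit character.
  case $snapshot in
    ''|*[!0-9]*)
      echo "snapshotTimestamp must contain only digits" >&2
      return 1
      ;;
  esac
  if [ "${#snapshot}" -ne 14 ]; then
    echo "snapshotTimestamp should have 14 digits" >&2
    return 1
  fi
}
```

For example, check_restore_inputs apigee cassandra-restore 20201001183903 succeeds, while passing the same namespace twice or a crontab-style value such as "0 2 * * *" fails with a message on standard error.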