=============================================== Secrets management with CCE and Hashicorp Vault =============================================== Most modern IT setups are composed of several subsystems like databases, object stores, master controller, node access, and more. To access one component from another, some form of credentials are required. Configuring and storing these secrets directly in the components is considered as an antipattern, since a vulnerability of one component may iteratively affect the security of the whole setup. With centralized secret management it becomes unnecessary to keep secrets used by various applications spreaded across DevOps environments. This helps to close some security attack vectors (like `secret sprawl `_, `security islands `_), but usually introduces a problem of the so-called `Secret Zero `_ as a key to the key storage. Vault is an open-source software, provided and maintained by Hashicorp, that addresses this very problem. It is considered one of the reference solutions for it. This article demonstrates how to utilize infrastructure authorization with Hashicorp Vault in an CCE-powered setup. As an example workload, we deploy a Zookeeper cluster with enabled TLS protection. Certificates for Zookeeper are stored in Vault, and they oblige required practices like rotations or audits. Zookeper can easily be replaced by any other component that requires access to internal credentials. Overview ======== .. graphviz:: dot/cce_vault_overview.dot :layout: dot TLS secrets are kept in the Vault. They are being read by Vault Agent component running as a sidecar in Zookeeper service pod and writes certificates onto the file system. Zookeeper services reads certificates populated by Agent. Vault Agent is configured to use password-less access to Vault. Further in the document it is explained how exactly this is implemented. Establishing trust between CCE and Vault ======================================== Before any application managed by the CCE is able to login to Vault relying on infrastructure based authentication it is required to do some steps on the Vault side. Kubernetes auth plugin is enabled and configured to only access requests from specific Kubernetes cluster by providing its Certificate Authority. To allow several multiple different CCE clusters to use Vault, a dedicated auth path is going to be used. .. code-block:: shell $ vault auth enable -path kubernetes_cce1 kubernetes $ vault write auth/kubernetes_cce1/config \ kubernetes_host="$K8S_HOST" \ kubernetes_ca_cert="$SA_CA_CRT" Since in our example a dedicated service account with token is being periodically rotated using `client JWT as reviewer JWT `_ can be used. Access rules for Vault ====================== Having Auth plugin enabled, as described above, CCE workloads are able to authenticate to Vault, but they can do nothing. It is now necessary to establish further level of authorization and let particular service accounts of CCE to get access to secrets in Vault. For the scope of the use case, we grant the Zookeeper service account from its namespace access to the TLS secrets stored in Vault's key-value store. For that a policy providing a read-only access to the /tls/zk* and /tls/ca paths is created. .. code-block:: shell $ vault policy write tls-zk-ro - <`_ * Use `PKI secrets engine `_ to issue certificates Vault enables users not only to store TLS certificates data in the key-value store, but also to create and revoke them. To keep this tutorial simple enough we are not going to do this and just upload generated certificates into the KV store. For production setups this example can be easily extended with extra actions. .. code-block:: shell $ vault kv put secret/tls/ca certificate=@ca.crt $ vault kv put secret/tls/zk_server certificate=@zk_server.crt private_key=@zk_server.key $ vault kv put secret/tls/zk_client certificate=@zk_client.crt private_key=@zk_client.key Certificate paths and property names used here are referenced by the Zookeeper installation. Deploying Zookeeper =================== Now that the secrets are stored safely in Vault and only allowed applications can fetch them it is time to look how exactly the application accesses the secrets. Generally, utilizing Vault requires modification of the application. `Vault agent `_ is a tool that was created to simplify secrets delivery for applications when it is hard or difficult to change the application itself. The Agent is taking care of reading secrets from Vault and can deliver them to the file system. There are many way how to properly implement Zookeeper service on the Kubernetes. The scope of the blueprint is not Zookeeper itself, but demostrating how an application can be supplied by required certificates. The reference architecture described here bases on the best practices gathered from various sources and extended by HashiCorp Vault. It overrides default Zookeeper start scripts in order to allow better control of the runtime settings and properly fill all required configuration options for TLS to work. Other methods of deploying Zookeeper can be easily used here instead. 1. Create a Kubernetes namespace named `zookeeper`. .. code-block:: shell $ kubectl create namespace zookeeper 2. Create a Kubernetes service account named `zookeeper`. .. code-block:: shell $ kubectl create serviceaccount zookeeper 3. In Kubernetes a *service account* provides an identity for the services running in the pod so that the process can access Kubernetes API. The same identity can be used to access Vault, but require one special permission - access to the tokenreview API of the Kubernetes. When instead a dedicated reviewer JWT is used, this step is not necessary, but it also means long-living sensitive data is used and frequently transferred over the network. More details on various ways to use Kubernetes tokens to authorize to Vault `can be found here `_. .. code-block:: shell $ kubectl create clusterrolebinding vault-client-auth-delegator \ --clusterrole=system:auth-delegator \ --serviceaccount=zookeeper:zookeeper 4. Create a Kubernetes ConfigMap with all required configurations. One possible approach is to define dedicated health and readiness check scripts and to override automatically created Zookeeper start script. This is especially useful when TLS protection is enabled, but default container scripts do not support this. .. code-block:: yaml :caption: zookeeper-cm.yaml --- apiVersion: v1 kind: ConfigMap metadata: name: zookeeper-config namespace: "zookeeper" data: ok: | #!/bin/sh # This sript is used by live-check of Kubernetes pod if [ -f /tls/ca.pem ]; then echo "srvr" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \ -key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null | grep Mode else zkServer.sh status fi ready: | #!/bin/sh # This sript is used by readiness-check of Kubernetes pod if [ -f /tls/ca.pem ]; then echo "ruok" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \ -key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null else echo ruok | nc 127.0.0.1 ${1:-2181} fi run: | #!/bin/bash # This is the main starting script set -a ROOT=$(echo /apache-zookeeper-*) ZK_USER=${ZK_USER:-"zookeeper"} ZK_LOG_LEVEL=${ZK_LOG_LEVEL:-"INFO"} ZK_DATA_DIR=${ZK_DATA_DIR:-"/data"} ZK_DATA_LOG_DIR=${ZK_DATA_LOG_DIR:-"/data/log"} ZK_CONF_DIR=${ZK_CONF_DIR:-"/conf"} ZK_CLIENT_PORT=${ZK_CLIENT_PORT:-2181} ZK_SSL_CLIENT_PORT=${ZK_SSL_CLIENT_PORT:-2281} ZK_SERVER_PORT=${ZK_SERVER_PORT:-2888} ZK_ELECTION_PORT=${ZK_ELECTION_PORT:-3888} ID_FILE="$ZK_DATA_DIR/myid" ZK_CONFIG_FILE="$ZK_CONF_DIR/zoo.cfg" LOG4J_PROPERTIES="$ZK_CONF_DIR/log4j.properties" HOST=$(hostname) DOMAIN=`hostname -d` APPJAR=$(echo $ROOT/*jar) CLASSPATH="${ROOT}/lib/*:${APPJAR}:${ZK_CONF_DIR}:" if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then NAME=${BASH_REMATCH[1]} ORD=${BASH_REMATCH[2]} MY_ID=$((ORD+1)) else echo "Failed to extract ordinal from hostname $HOST" exit 1 fi mkdir -p $ZK_DATA_DIR mkdir -p $ZK_DATA_LOG_DIR echo $MY_ID >> $ID_FILE echo "dataDir=$ZK_DATA_DIR" >> $ZK_CONFIG_FILE echo "dataLogDir=$ZK_DATA_LOG_DIR" >> $ZK_CONFIG_FILE echo "4lw.commands.whitelist=*" >> $ZK_CONFIG_FILE # Client TLS configuration if [[ -f /tls/ca.pem ]]; then echo "secureClientPort=$ZK_SSL_CLIENT_PORT" >> $ZK_CONFIG_FILE echo "ssl.keyStore.location=/tls/client/client.pem" >> $ZK_CONFIG_FILE echo "ssl.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE else echo "clientPort=$ZK_CLIENT_PORT" >> $ZK_CONFIG_FILE fi # Server TLS configuration if [[ -f /tls/ca.pem ]]; then echo "serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory" >> $ZK_CONFIG_FILE echo "sslQuorum=true" >> $ZK_CONFIG_FILE echo "ssl.quorum.keyStore.location=/tls/server/server.pem" >> $ZK_CONFIG_FILE echo "ssl.quorum.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE fi for (( i=1; i<=$ZK_REPLICAS; i++ )) do echo "server.$i=$NAME-$((i-1)).$DOMAIN:$ZK_SERVER_PORT:$ZK_ELECTION_PORT" >> $ZK_CONFIG_FILE done rm -f $LOG4J_PROPERTIES echo "zookeeper.root.logger=$ZK_LOG_LEVEL, CONSOLE" >> $LOG4J_PROPERTIES echo "zookeeper.console.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES echo "zookeeper.log.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES echo "zookeeper.log.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES echo "zookeeper.log.file=zookeeper.log" >> $LOG4J_PROPERTIES echo "zookeeper.log.maxfilesize=256MB" >> $LOG4J_PROPERTIES echo "zookeeper.log.maxbackupindex=10" >> $LOG4J_PROPERTIES echo "zookeeper.tracelog.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES echo "zookeeper.tracelog.file=zookeeper_trace.log" >> $LOG4J_PROPERTIES echo "log4j.rootLogger=\${zookeeper.root.logger}" >> $LOG4J_PROPERTIES echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender" >> $LOG4J_PROPERTIES echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}" >> $LOG4J_PROPERTIES echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout" >> $LOG4J_PROPERTIES echo "log4j.appender.CONSOLE.layout.ConversionPattern=\ %d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n" >> $LOG4J_PROPERTIES if [ -n "$JMXDISABLE" ] then MAIN=org.apache.zookeeper.server.quorum.QuorumPeerMain else MAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=$JMXPORT \ -Dcom.sun.management.jmxremote.authenticate=$JMXAUTH \ -Dcom.sun.management.jmxremote.ssl=$JMXSSL \ -Dzookeeper.jmx.log4j.disable=$JMXLOG4J \ org.apache.zookeeper.server.quorum.QuorumPeerMain" fi set -x exec java -cp "$CLASSPATH" $JVMFLAGS $MAIN $ZK_CONFIG_FILE vault-agent-config.hcl: | exit_after_auth = true pid_file = "/home/vault/pidfile" auto_auth { method "kubernetes" { mount_path = "auth/kubernetes_cce1" config = { role = "zookeeper" token_path = "/run/secrets/tokens/vault-token" } } sink "file" { config = { path = "/home/vault/.vault-token" } } } cache { use_auto_auth_token = true } # ZK is neat-picky on cert file extensions template { destination = "/tls/ca.pem" contents = <` with the address of the Vault server. This includes a pod with Vault Agent side container as an init container, Vault Agent side container used continuously in the run cycle of the pod and Zookeeper main container. .. code-block:: yaml :caption: zookeeper-ss.yaml apiVersion: apps/v1 kind: StatefulSet spec: podManagementPolicy: Parallel replicas: 3 selector: matchLabels: app: zookeeper component: server serviceName: zookeeper-headless template: metadata: labels: app: zookeeper component: server spec: containers: - args: - agent - -config=/etc/vault/vault-agent-config.hcl - -log-level=debug - -exit-after-auth=false env: - name: VAULT_ADDR value: image: vault:1.9.0 name: vault-agent-sidecar volumeMounts: - mountPath: /etc/vault name: vault-agent-config - mountPath: /tls name: cert-data - mountPath: /var/run/secrets/tokens name: k8-tokens - command: - /bin/bash - -xec - /config-scripts/run env: - name: ZK_REPLICAS value: "3" - name: ZOO_PORT value: "2181" - name: ZOO_STANDALONE_ENABLED value: "false" - name: ZOO_TICK_TIME value: "2000" image: zookeeper:3.7.0 livenessProbe: exec: command: - sh - /config-scripts/ok failureThreshold: 2 initialDelaySeconds: 20 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 name: zookeeper ports: - containerPort: 2281 name: client protocol: TCP - containerPort: 2888 name: server protocol: TCP - containerPort: 3888 name: election protocol: TCP readinessProbe: exec: command: - sh - /config-scripts/ready failureThreshold: 2 initialDelaySeconds: 20 periodSeconds: 30 successThreshold: 1 timeoutSeconds: 5 securityContext: runAsUser: 1000 volumeMounts: - mountPath: /data name: datadir - mountPath: /tls name: cert-data - mountPath: /config-scripts name: zookeeper-config dnsPolicy: ClusterFirst initContainers: - args: - agent - -config=/etc/vault/vault-agent-config.hcl - -log-level=debug - -exit-after-auth=true env: - name: VAULT_ADDR value: image: vault:1.9.0 name: vault-agent volumeMounts: - mountPath: /etc/vault name: vault-agent-config - mountPath: /tls name: cert-data - mountPath: /var/run/secrets/tokens name: k8-tokens restartPolicy: Always serviceAccount: zookeeper serviceAccountName: zookeeper terminationGracePeriodSeconds: 1800 volumes: - configMap: defaultMode: 420 items: - key: vault-agent-config.hcl path: vault-agent-config.hcl name: zookeeper-config name: vault-agent-config - configMap: defaultMode: 365 name: zookeeper-config name: zookeeper-config - emptyDir: {} name: cert-data - name: k8-tokens projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 7200 path: vault-token updateStrategy: type: RollingUpdate volumeClaimTemplates: - apiVersion: v1 kind: PersistentVolumeClaim metadata: name: datadir spec: accessModes: - ReadWriteOnce resources: requests: storage: 5Gi storageClassName: csi-disk volumeMode: Filesystem .. code-block:: bash $ kubectl apply -f zookeeper-ss.yaml With this a production-ready Zookeeper service with enabled TLS has been deployed sucessfully to the CCE. The Vault Agent takes care of authorizing to HashiCorp Vault using a Kubernetes service account with a short time to live token and fetches required secrets to the file system. In the entire Kubernetes deployment there are no secrets for the application, neither the key to the Vault, nor TLS certificates themselves. Not even using Kubernetes secrets is necessary. References ========== * https://learn.hashicorp.com/tutorials/vault/agent-kubernetes?in=vault/app-integration * https://learn.hashicorp.com/tutorials/vault/agent-kubernetes?in=vault/auth-methods * https://www.vaultproject.io/docs/auth/kubernetes