
===============================================
Secrets management with CCE and Hashicorp Vault
===============================================
Most modern IT setups are composed of several subsystems such as databases,
object stores, master controllers, node access, and more. To access one
component from another, some form of credentials is required. Configuring and
storing these secrets directly in the components is considered an antipattern,
since a vulnerability in one component may cascade and affect the security of
the whole setup.
With centralized secret management it becomes unnecessary to keep secrets used
by various applications spread across DevOps environments. This closes some
security attack vectors (like `secret sprawl
<https://www.hashicorp.com/resources/what-is-secret-sprawl-why-is-it-harmful>`_ or
`security islands <https://www.conjur.org/blog/security-islands/>`_), but it
usually introduces the problem of the so-called `Secret Zero
<https://www.hashicorp.com/resources/secret-zero-mitigating-the-risk-of-secret-introduction-with-vault>`_:
the key to the key storage.
Vault is open-source software, provided and maintained by HashiCorp, that
addresses this very problem and is considered one of the reference solutions
for it. This article demonstrates how to utilize infrastructure authorization
with HashiCorp Vault in a CCE-powered setup. As an example workload, we deploy
a Zookeeper cluster with TLS protection enabled. The certificates for Zookeeper
are stored in Vault and are subject to practices like rotation and auditing.
Zookeeper can easily be replaced by any other component that requires access to
internal credentials.
Overview
========
.. graphviz:: dot/cce_vault_overview.dot
:layout: dot
TLS secrets are kept in Vault. A Vault Agent component, running as a sidecar in
the Zookeeper service pod, reads them and writes the certificates onto the file
system. The Zookeeper service reads the certificates populated by the Agent.
The Vault Agent is configured to use password-less access to Vault. How exactly
this is implemented is explained further in the document.
Establishing trust between CCE and Vault
========================================
Before any application managed by CCE can log in to Vault relying on
infrastructure-based authentication, a few steps are required on the Vault
side. The Kubernetes auth plugin is enabled and configured to only accept
requests from a specific Kubernetes cluster by providing its Certificate
Authority. To allow several different CCE clusters to use Vault, a dedicated
auth path is used.
.. code-block:: shell
$ vault auth enable -path kubernetes_cce1 kubernetes
$ vault write auth/kubernetes_cce1/config \
kubernetes_host="$K8S_HOST" \
kubernetes_ca_cert="$SA_CA_CRT"
Since in our example the service account token is short-lived and periodically
rotated, the `client JWT as the reviewer JWT
<https://www.vaultproject.io/docs/auth/kubernetes#use-the-vault-client-s-jwt-as-the-reviewer-jwt>`_
approach can be used, so no dedicated long-lived reviewer token has to be
configured on the Vault side.
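The commands above expect the Kubernetes API address and its Certificate
Authority in ``$K8S_HOST`` and ``$SA_CA_CRT``. One possible way to derive them,
assuming ``kubectl`` currently points at the CCE cluster, is sketched below:

.. code-block:: shell

   # Derive the API server address and its CA from the active kubeconfig context
   $ K8S_HOST=$(kubectl config view --raw --minify --flatten \
       -o jsonpath='{.clusters[].cluster.server}')
   $ SA_CA_CRT=$(kubectl config view --raw --minify --flatten \
       -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d)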
Access rules for Vault
======================
Having the auth plugin enabled as described above, CCE workloads are able to
authenticate to Vault, but they are not yet authorized to do anything. It is
now necessary to establish a further level of authorization and grant
particular CCE service accounts access to secrets in Vault.
For the scope of the use case, we grant the Zookeeper service account from its
namespace access to the TLS secrets stored in Vault's key-value store. For
that, a policy providing read-only access to the /tls/zk* and /tls/ca paths is
created.
.. code-block:: shell
$ vault policy write tls-zk-ro - <<EOF
path "secret/data/tls/zk_*" {capabilities = ["read"] }
path "secret/data/tls/ca" {capabilities = ["read"] }
path "secret/metadata/tls/zk_*" {capabilities = ["read"] }
path "secret/metadata/tls/ca" {capabilities = ["read"] }
EOF
Next, the policy must be granted to the particular requestor (the zookeeper
service account in the zookeeper namespace).
.. code-block:: shell
$ vault write auth/kubernetes_cce1/role/zookeeper \
bound_service_account_names=zookeeper \
bound_service_account_namespaces=zookeeper \
policies=tls-zk-ro \
ttl=2h
With this done, the token of the zookeeper service account in the zookeeper
namespace can access Vault and read the secrets located under the `/secret/tls`
path. Since it is highly recommended to follow the principle of least
privilege, only read-only access to the TLS data is granted. A time to live of
two hours is used here, meaning that once the application authenticates to
Vault, the token it receives can be used for the next two hours. After that the
Vault token becomes invalid and the Vault Agent obtains a new one, again valid
for two hours. This needs to be carefully aligned with the time to live of the
service account token to minimize their overlap; it is advised to keep both
relatively short.
This is one of the most sensitive steps in the whole configuration, since
applications deployed in Kubernetes may escape their scope or get compromised
by attackers. Reducing the number of secrets the accessor can read mitigates
this risk.
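Once the zookeeper namespace, service account and tokenreview permission
described in the deployment section below exist, the role can be smoke-tested
manually. A minimal sketch, assuming a recent cluster that supports
``kubectl create token``:

.. code-block:: shell

   # Exchange a zookeeper service account JWT for a Vault token and
   # inspect the attached policies and TTL
   $ SA_JWT=$(kubectl -n zookeeper create token zookeeper)
   $ vault write auth/kubernetes_cce1/login role=zookeeper jwt="$SA_JWT"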
Populating secrets in Vault
===========================
Within Vault there are two possibilities for providing TLS certificates:
* Store certificate data in the `KeyValue store
<https://www.vaultproject.io/docs/secrets/kv/kv-v2>`_
* Use `PKI secrets engine <https://www.vaultproject.io/docs/secrets/pki>`_ to
issue certificates
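The examples in this article assume a KV version 2 secrets engine mounted at
the default ``secret/`` path (the policies above reference ``secret/data/...``).
If such a mount does not exist yet, it can be enabled with:

.. code-block:: shell

   $ vault secrets enable -path=secret kv-v2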
Vault enables users not only to store TLS certificate data in the key-value
store, but also to issue and revoke certificates. To keep this tutorial simple
we are not going to do that and instead just upload pre-generated certificates
into the KV store. For production setups this example can easily be extended
with those extra capabilities.
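The files referenced below (``ca.crt``, ``zk_server.crt``/``.key``,
``zk_client.crt``/``.key``) are assumed to exist already. Purely as an
illustration, self-signed test certificates could be generated with ``openssl``
along these lines:

.. code-block:: shell

   # Illustrative only: a self-signed CA and a server certificate signed by it
   $ openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
       -keyout ca.key -out ca.crt -subj "/CN=zk-test-ca"
   $ openssl req -newkey rsa:2048 -nodes \
       -keyout zk_server.key -out zk_server.csr -subj "/CN=zookeeper-server"
   $ openssl x509 -req -in zk_server.csr -CA ca.crt -CAkey ca.key \
       -CAcreateserial -days 365 -sha256 -out zk_server.crt
   # The client certificate (zk_client.crt / zk_client.key) is created analogously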
.. code-block:: shell
$ vault kv put secret/tls/ca certificate=@ca.crt
$ vault kv put secret/tls/zk_server certificate=@zk_server.crt private_key=@zk_server.key
$ vault kv put secret/tls/zk_client certificate=@zk_client.crt private_key=@zk_client.key
The secret paths and property names used here are referenced later by the Zookeeper installation.
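Whether the data ended up in the expected place can be verified by reading it
back, for example:

.. code-block:: shell

   $ vault kv get -field=certificate secret/tls/ca
   $ vault kv get secret/tls/zk_server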
Deploying Zookeeper
===================
Now that the secrets are stored safely in Vault and only allowed applications
can fetch them, it is time to look at how exactly the application accesses
them. Generally, utilizing Vault requires modifying the application. `Vault
Agent <https://www.vaultproject.io/docs/agent>`_ is a tool that was created to
simplify secrets delivery for applications when it is hard or undesirable to
change the application itself. The Agent takes care of reading secrets from
Vault and can deliver them to the file system.
There are many ways to properly implement a Zookeeper service on Kubernetes.
The scope of this blueprint is not Zookeeper itself, but demonstrating how an
application can be supplied with the required certificates. The reference
architecture described here is based on best practices gathered from various
sources and extended with HashiCorp Vault. It overrides the default Zookeeper
start scripts in order to allow better control of the runtime settings and to
properly fill all configuration options required for TLS to work. Other methods
of deploying Zookeeper can easily be used here instead.
1. Create a Kubernetes namespace named `zookeeper`.
.. code-block:: shell
$ kubectl create namespace zookeeper
2. Create a Kubernetes service account named `zookeeper` in the `zookeeper` namespace.
.. code-block:: shell
$ kubectl create serviceaccount zookeeper --namespace zookeeper
3. In Kubernetes a *service account* provides an identity for the processes
running in a pod so that they can access the Kubernetes API. The same
identity can be used to access Vault, but it requires one special permission:
access to the Kubernetes tokenreview API. When a dedicated reviewer JWT is
used instead, this step is not necessary, but it also means long-lived
sensitive data is used and frequently transferred over the network. More
details on the various ways to use Kubernetes tokens to authorize to Vault
`can be found here
<https://www.vaultproject.io/docs/auth/kubernetes#how-to-work-with-short-lived-kubernetes-tokens>`_.
.. code-block:: shell
$ kubectl create clusterrolebinding vault-client-auth-delegator \
--clusterrole=system:auth-delegator \
--serviceaccount=zookeeper:zookeeper
4. Create a Kubernetes ConfigMap with all required configurations. One possible
approach is to define dedicated health and readiness check scripts and to
override the automatically created Zookeeper start script. This is especially
useful when TLS protection is enabled but the default container scripts do not
support it.
.. code-block:: yaml
:caption: zookeeper-cm.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
name: zookeeper-config
namespace: "zookeeper"
data:
ok: |
#!/bin/sh
# This script is used by the liveness check of the Kubernetes pod
if [ -f /tls/ca.pem ]; then
echo "srvr" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \
-key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null | grep Mode
else
zkServer.sh status
fi
ready: |
#!/bin/sh
# This script is used by the readiness check of the Kubernetes pod
if [ -f /tls/ca.pem ]; then
echo "ruok" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \
-key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null
else
echo ruok | nc 127.0.0.1 ${1:-2181}
fi
run: |
#!/bin/bash
# This is the main starting script
set -a
ROOT=$(echo /apache-zookeeper-*)
ZK_USER=${ZK_USER:-"zookeeper"}
ZK_LOG_LEVEL=${ZK_LOG_LEVEL:-"INFO"}
ZK_DATA_DIR=${ZK_DATA_DIR:-"/data"}
ZK_DATA_LOG_DIR=${ZK_DATA_LOG_DIR:-"/data/log"}
ZK_CONF_DIR=${ZK_CONF_DIR:-"/conf"}
ZK_CLIENT_PORT=${ZK_CLIENT_PORT:-2181}
ZK_SSL_CLIENT_PORT=${ZK_SSL_CLIENT_PORT:-2281}
ZK_SERVER_PORT=${ZK_SERVER_PORT:-2888}
ZK_ELECTION_PORT=${ZK_ELECTION_PORT:-3888}
ID_FILE="$ZK_DATA_DIR/myid"
ZK_CONFIG_FILE="$ZK_CONF_DIR/zoo.cfg"
LOG4J_PROPERTIES="$ZK_CONF_DIR/log4j.properties"
HOST=$(hostname)
DOMAIN=`hostname -d`
APPJAR=$(echo $ROOT/*jar)
CLASSPATH="${ROOT}/lib/*:${APPJAR}:${ZK_CONF_DIR}:"
if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
NAME=${BASH_REMATCH[1]}
ORD=${BASH_REMATCH[2]}
MY_ID=$((ORD+1))
else
echo "Failed to extract ordinal from hostname $HOST"
exit 1
fi
mkdir -p $ZK_DATA_DIR
mkdir -p $ZK_DATA_LOG_DIR
echo $MY_ID >> $ID_FILE
echo "dataDir=$ZK_DATA_DIR" >> $ZK_CONFIG_FILE
echo "dataLogDir=$ZK_DATA_LOG_DIR" >> $ZK_CONFIG_FILE
echo "4lw.commands.whitelist=*" >> $ZK_CONFIG_FILE
# Client TLS configuration
if [[ -f /tls/ca.pem ]]; then
echo "secureClientPort=$ZK_SSL_CLIENT_PORT" >> $ZK_CONFIG_FILE
echo "ssl.keyStore.location=/tls/client/client.pem" >> $ZK_CONFIG_FILE
echo "ssl.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE
else
echo "clientPort=$ZK_CLIENT_PORT" >> $ZK_CONFIG_FILE
fi
# Server TLS configuration
if [[ -f /tls/ca.pem ]]; then
echo "serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory" >> $ZK_CONFIG_FILE
echo "sslQuorum=true" >> $ZK_CONFIG_FILE
echo "ssl.quorum.keyStore.location=/tls/server/server.pem" >> $ZK_CONFIG_FILE
echo "ssl.quorum.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE
fi
for (( i=1; i<=$ZK_REPLICAS; i++ ))
do
echo "server.$i=$NAME-$((i-1)).$DOMAIN:$ZK_SERVER_PORT:$ZK_ELECTION_PORT" >> $ZK_CONFIG_FILE
done
rm -f $LOG4J_PROPERTIES
echo "zookeeper.root.logger=$ZK_LOG_LEVEL, CONSOLE" >> $LOG4J_PROPERTIES
echo "zookeeper.console.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES
echo "zookeeper.log.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES
echo "zookeeper.log.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES
echo "zookeeper.log.file=zookeeper.log" >> $LOG4J_PROPERTIES
echo "zookeeper.log.maxfilesize=256MB" >> $LOG4J_PROPERTIES
echo "zookeeper.log.maxbackupindex=10" >> $LOG4J_PROPERTIES
echo "zookeeper.tracelog.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES
echo "zookeeper.tracelog.file=zookeeper_trace.log" >> $LOG4J_PROPERTIES
echo "log4j.rootLogger=\${zookeeper.root.logger}" >> $LOG4J_PROPERTIES
echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender" >> $LOG4J_PROPERTIES
echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}" >> $LOG4J_PROPERTIES
echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout" >> $LOG4J_PROPERTIES
echo "log4j.appender.CONSOLE.layout.ConversionPattern=\
%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n" >> $LOG4J_PROPERTIES
if [ -n "$JMXDISABLE" ]
then
MAIN=org.apache.zookeeper.server.quorum.QuorumPeerMain
else
MAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=$JMXPORT \
-Dcom.sun.management.jmxremote.authenticate=$JMXAUTH \
-Dcom.sun.management.jmxremote.ssl=$JMXSSL \
-Dzookeeper.jmx.log4j.disable=$JMXLOG4J \
org.apache.zookeeper.server.quorum.QuorumPeerMain"
fi
set -x
exec java -cp "$CLASSPATH" $JVMFLAGS $MAIN $ZK_CONFIG_FILE
vault-agent-config.hcl: |
exit_after_auth = true
pid_file = "/home/vault/pidfile"
auto_auth {
method "kubernetes" {
mount_path = "auth/kubernetes_cce1"
config = {
role = "zookeeper"
token_path = "/run/secrets/tokens/vault-token"
}
}
sink "file" {
config = {
path = "/home/vault/.vault-token"
}
}
}
cache {
use_auto_auth_token = true
}
# ZK is nit-picky about certificate file extensions
template {
destination = "/tls/ca.pem"
contents = <<EOT
{{- with secret "secret/data/tls/ca" }}{{ .Data.data.certificate }}{{ end }}
EOT
}
template {
destination = "/tls/server/server.pem"
contents = <<EOT
{{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.certificate }}
{{ .Data.data.private_key }}{{ end }}
EOT
}
template {
destination = "/tls/server/tls.crt"
contents = <<EOT
{{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.certificate }}{{ end }}
EOT
}
template {
destination = "/tls/server/tls.key"
contents = <<EOT
{{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.private_key }}{{ end }}
EOT
}
template {
destination = "/tls/client/client.pem"
contents = <<EOT
{{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.certificate }}
{{ .Data.data.private_key }}{{ end }}
EOT
}
template {
destination = "/tls/client/tls.crt"
contents = <<EOT
{{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.certificate }}{{ end }}
EOT
}
template {
destination = "/tls/client/tls.key"
contents = <<EOT
{{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.private_key }}{{ end }}
EOT
}
.. code-block:: bash
$ kubectl apply -f zookeeper-cm.yaml
5. Create the Zookeeper headless service. It is used by the pods to build the
quorum and implement cluster-internal communication.
.. code-block:: yaml
:caption: zookeeper-svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  # Must match the serviceName referenced by the StatefulSet below
  name: "zookeeper-headless"
  namespace: "zookeeper"
spec:
  # Headless service: no cluster IP is allocated
  clusterIP: None
  # Required so that peers can find each other before they report ready
  publishNotReadyAddresses: true
  selector:
    app: zookeeper
  ports:
    - port: 2281
      name: client
      targetPort: client
      protocol: TCP
    - port: 2888
      name: server
      targetPort: server
      protocol: TCP
    - port: 3888
      name: election
      targetPort: election
      protocol: TCP
.. code-block:: bash
$ kubectl apply -f zookeeper-svc.yaml
6. Create the frontend service. It is used by the clients and therefore only exposes the client port of Zookeeper.
.. code-block:: yaml
:caption: zookeeper-svc-public.yaml
apiVersion: v1
kind: Service
metadata:
  # Example name for the client-facing service; adjust to your environment
  name: "zookeeper"
  namespace: "zookeeper"
spec:
  clusterIP: None
  ports:
    - name: client
      port: 2281
      protocol: TCP
      targetPort: client
  selector:
    app: zookeeper
  sessionAffinity: None
  type: ClusterIP
.. code-block:: bash
$ kubectl apply -f zookeeper-svc-public.yaml
7. Create the StatefulSet, replacing `<VAULT_PUBLIC_ADDR>` with the address of
the Vault server. The pod template includes a Vault Agent init container that
fetches the initial secrets, a Vault Agent sidecar container that keeps running
during the lifetime of the pod, and the main Zookeeper container.
.. code-block:: yaml
:caption: zookeeper-ss.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  # Example name; the pods will be named zookeeper-0, zookeeper-1, ...
  name: zookeeper
  namespace: "zookeeper"
spec:
podManagementPolicy: Parallel
replicas: 3
selector:
matchLabels:
app: zookeeper
component: server
serviceName: zookeeper-headless
template:
metadata:
labels:
app: zookeeper
component: server
spec:
containers:
- args:
- agent
- -config=/etc/vault/vault-agent-config.hcl
- -log-level=debug
- -exit-after-auth=false
env:
- name: VAULT_ADDR
value: <VAULT_PUBLIC_ADDR>
image: vault:1.9.0
name: vault-agent-sidecar
volumeMounts:
- mountPath: /etc/vault
name: vault-agent-config
- mountPath: /tls
name: cert-data
- mountPath: /var/run/secrets/tokens
name: k8-tokens
- command:
- /bin/bash
- -xec
- /config-scripts/run
env:
- name: ZK_REPLICAS
value: "3"
- name: ZOO_PORT
value: "2181"
- name: ZOO_STANDALONE_ENABLED
value: "false"
- name: ZOO_TICK_TIME
value: "2000"
image: zookeeper:3.7.0
livenessProbe:
exec:
command:
- sh
- /config-scripts/ok
failureThreshold: 2
initialDelaySeconds: 20
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
name: zookeeper
ports:
- containerPort: 2281
name: client
protocol: TCP
- containerPort: 2888
name: server
protocol: TCP
- containerPort: 3888
name: election
protocol: TCP
readinessProbe:
exec:
command:
- sh
- /config-scripts/ready
failureThreshold: 2
initialDelaySeconds: 20
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
securityContext:
runAsUser: 1000
volumeMounts:
- mountPath: /data
name: datadir
- mountPath: /tls
name: cert-data
- mountPath: /config-scripts
name: zookeeper-config
dnsPolicy: ClusterFirst
initContainers:
- args:
- agent
- -config=/etc/vault/vault-agent-config.hcl
- -log-level=debug
- -exit-after-auth=true
env:
- name: VAULT_ADDR
value: <VAULT_PUBLIC_ADDR>
image: vault:1.9.0
name: vault-agent
volumeMounts:
- mountPath: /etc/vault
name: vault-agent-config
- mountPath: /tls
name: cert-data
- mountPath: /var/run/secrets/tokens
name: k8-tokens
restartPolicy: Always
serviceAccount: zookeeper
serviceAccountName: zookeeper
terminationGracePeriodSeconds: 1800
volumes:
- configMap:
defaultMode: 420
items:
- key: vault-agent-config.hcl
path: vault-agent-config.hcl
name: zookeeper-config
name: vault-agent-config
- configMap:
defaultMode: 365
name: zookeeper-config
name: zookeeper-config
- emptyDir: {}
name: cert-data
- name: k8-tokens
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 7200
path: vault-token
updateStrategy:
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: datadir
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: csi-disk
volumeMode: Filesystem
.. code-block:: bash
$ kubectl apply -f zookeeper-ss.yaml
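After applying the manifest, the rollout can be watched until all three
replicas report ready (assuming the StatefulSet is named ``zookeeper`` as above):

.. code-block:: bash

   $ kubectl -n zookeeper rollout status statefulset/zookeeper
   $ kubectl -n zookeeper get pods -l app=zookeeper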
With this, a production-ready Zookeeper service with TLS enabled has been
deployed successfully to CCE. The Vault Agent takes care of authorizing to
HashiCorp Vault using a short-lived Kubernetes service account token and
fetches the required secrets to the file system. The entire Kubernetes
deployment contains no secrets for the application, neither the key to the
Vault nor the TLS certificates themselves; not even Kubernetes secrets are
necessary.
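That the Vault Agent really delivered the certificates without any Kubernetes
secret involved can be checked directly on a running pod, for example:

.. code-block:: bash

   # Certificates rendered by the Vault Agent live only on the pod's emptyDir volume
   $ kubectl -n zookeeper exec zookeeper-0 -c zookeeper -- ls /tls /tls/server /tls/client
   # The readiness script talks to Zookeeper over the TLS client port
   $ kubectl -n zookeeper exec zookeeper-0 -c zookeeper -- sh /config-scripts/ready
   # No Kubernetes secrets were created for the application itself
   $ kubectl -n zookeeper get secrets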
References
==========
* https://learn.hashicorp.com/tutorials/vault/agent-kubernetes?in=vault/app-integration
* https://learn.hashicorp.com/tutorials/vault/agent-kubernetes?in=vault/auth-methods
* https://www.vaultproject.io/docs/auth/kubernetes