Key Vault adoption for Data Factory and Databricks

In this article we’ll review how to create a Key Vault and set up access policies so it can be used from Data Factory and Databricks. We’ll then set up a Data Factory linked service and a Databricks secret scope that point to the Key Vault.

Set up Key Vault access policies

Before any service can read Key Vault secrets, it needs an access policy that grants it the relevant secret permissions. Go to Key Vault > Access policies > + Add Access Policy.

Then add secret permissions (select both “Get” and “List”), set the principal to your Data Factory, and click “Add”.

For Databricks, do the same, but under “Select principal” choose “AzureDatabricks”.

When ready, click “Save”.
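The same policies can also be granted from the command line. Below is a sketch using the Azure CLI, assuming you fill in `<vault-name>` and your Data Factory’s managed identity object ID; the GUID shown for Databricks is, to my knowledge, the well-known first-party AzureDatabricks application ID, but verify it in your tenant:

```shell
# Grant the Data Factory managed identity Get/List on secrets
az keyvault set-policy --name <vault-name> \
  --object-id <data-factory-managed-identity-object-id> \
  --secret-permissions get list

# Grant the AzureDatabricks first-party application the same permissions
az keyvault set-policy --name <vault-name> \
  --spn 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --secret-permissions get list
```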

The secret permissions are as follows:

  • Permissions for secret management operations
    • get: Read a secret
    • list: List the secrets or versions of a secret stored in a Key Vault
    • set: Create a secret
    • delete: Delete a secret
    • recover: Recover a deleted secret
    • backup: Back up a secret in a key vault
    • restore: Restore a backed up secret to a key vault
  • Permissions for privileged operations
    • purge: Purge (permanently delete) a deleted secret

We can also grant permissions to users and Azure Active Directory groups. For example, if we want developers from a specific AAD group to be able to view the list of all secrets (but not their values), we can grant the “List” secret permission to that group.
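A hedged Azure CLI sketch of this list-only grant (the group name is a made-up example; note that older Azure CLI versions expose the group’s object ID as `objectId` rather than `id`):

```shell
# Look up the AAD group's object ID (group name is hypothetical)
group_id=$(az ad group show --group "data-developers" --query id -o tsv)

# Grant list-only access: members can enumerate secret names but not read values
az keyvault set-policy --name <vault-name> \
  --object-id "$group_id" \
  --secret-permissions list
```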

Set up a Data Factory Key Vault linked service

Go to your Data Factory Studio (the one we added the access policy for) > Manage > Linked services > New > search for “Key Vault” > Continue.

For the authentication method select managed identity and select the key vault. Click “Test connection” to make sure the access policies are set up properly; if everything is okay, click “Create”.

Now we can use the Key Vault linked service to retrieve secrets. For example, let’s create a new linked service that connects to Data Lake Storage Gen2 with a service principal. When creating it, we can either enter the service principal key directly or use Key Vault: selecting “Azure Key Vault” lets us pick the Key Vault linked service and then the secret name (instead of typing the key in).

The service principal key is then fetched from the Key Vault at runtime rather than being stored in Data Factory.
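Under the hood, the Key Vault reference is stored in the linked service definition as an `AzureKeyVaultSecret` property instead of a plain value. A sketch of the relevant JSON, with example names (the Key Vault linked service is assumed to be called `AzureKeyVaultLS`):

```json
{
    "name": "AzureDataLakeStorageGen2LS",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<storage-account>.dfs.core.windows.net",
            "servicePrincipalId": "<application-id>",
            "servicePrincipalKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLS",
                    "type": "LinkedServiceReference"
                },
                "secretName": "ivo-test-secret"
            },
            "tenant": "<tenant-id>"
        }
    }
}
```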

Set up a Databricks secret scope

Log in to the Databricks workspace and append “#secrets/createScope” to the workspace URL (i.e. https://<databricks-instance>#secrets/createScope); this takes us to the hidden secret scope creation page.
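The scope-creation URL is just the workspace URL plus the page fragment; for example (the workspace URL below is a made-up placeholder):

```shell
# Example workspace URL -- replace with your own Databricks instance
workspace_url="https://adb-1234567890123456.7.azuredatabricks.net"

# Append the hidden create-scope page fragment (strip any trailing slash first)
scope_url="${workspace_url%/}#secrets/createScope"
echo "$scope_url"   # prints https://adb-1234567890123456.7.azuredatabricks.net#secrets/createScope
```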

To get the Key Vault DNS Name and Resource ID, go to the Key Vault > Properties, copy them, and paste them into the Databricks secret scope page. Then click “Create” to create the secret scope.
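The same Key Vault-backed scope can also be created with the Databricks CLI (configured in the next section) instead of the UI; note that, as far as I know, this path requires the CLI to be authenticated with an Azure AD token rather than a personal access token. A sketch with placeholder values:

```shell
# Create a Key Vault-backed secret scope from the CLI (requires AAD token auth)
databricks secrets create-scope --scope ivo-akv-scope \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id <key-vault-resource-id> \
  --dns-name https://<vault-name>.vault.azure.net/
```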

Now, if we want to read a secret through the secret scope, we can do so with the following code:

dbutils.secrets.get(scope="ivo-akv-scope", key="ivo-test-secret")

We can also add access control lists (ACLs) to secret scopes, which is useful if we want only specific AD groups to have access to a scope. To do so, we first need to set up a personal access token, then log in with PowerShell or Bash and configure the ACLs.

  1. Go to the Databricks workspace > Settings > User Settings > Generate New Token and copy the token.
  2. Log in to Azure CLI (PowerShell)
  3. Configure the Databricks CLI
  4. Set up ACLs (access control lists) on the secret scope
#Install the Databricks CLI
pip3 install databricks-cli

#Configure the Databricks CLI (the config file must be in your home directory)
echo "[DEFAULT]
host = https://northeurope.azuredatabricks.net
token = {Your token here}" > ~/.databrickscfg

#Set up Databricks secret scope ACLs
databricks secrets list-scopes #list the secret scopes
databricks secrets list-acls --scope ivo-akv-scope #list the ACLs of a scope
databricks secrets put-acl --scope ivo-akv-scope --principal users --permission READ #grant READ to the built-in "users" group; an AD group name can be used instead
databricks secrets delete-acl --scope ivo-akv-scope --principal users #delete an access control entry

If we want to use AD groups, we need to provision them in Databricks first. You can check out an article on how this can be done automatically here: https://ivotalkstech.com/azure/auto-provision-databricks-ad-groups-that-match-specific-regex/

Enjoy the Key Vault adoption!

Stay Awesome,
Ivelin
