
Auto-provision Databricks active table access and configure consumption for PowerBI

I’m assuming you already have:
1. Databricks with admin privileges.
2. Data Factory with contributor access.
3. PowerBI Desktop installed.

The idea is to automatically grant specific AD groups access to Databricks tables. If you’re interested in auto-provisioning the Databricks AD groups themselves, you can follow this article: https://ivotalkstech.com/azure/auto-provision-databricks-ad-groups-that-match-specific-regex

1. Go to your Databricks admin console and make sure you’ve enabled Table Access Control from the Workspace Settings:

2. Then create a new cluster where:
2.1 Select “High Concurrency” cluster mode, so your cluster runs as a serverless pool.
2.2 In the Autopilot Options make sure you’ve enabled autoscaling and auto-termination; that way you’ll avoid unwanted costs.
2.3 In the Advanced Options you must enable table access control and allow only Python and SQL commands (a scripted version of such a cluster is sketched below).
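
If you’d rather script this than click through the UI, here’s a minimal sketch of creating such a cluster through the Databricks Clusters API. The workspace URL, token, runtime version and node type are placeholders you’d replace with your own; the spark_conf keys are roughly what the “High Concurrency” and table access control toggles set behind the scenes.

import requests

# Placeholders - replace with your workspace URL and a personal access token.
DATABRICKS_URL = "https://adb-<some-numbers>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "table-acl-cluster",
    "spark_version": "9.1.x-scala2.12",                 # any supported LTS runtime
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 1, "max_workers": 4},  # 2.2: autoscaling
    "autotermination_minutes": 30,                      # 2.2: auto-termination
    "custom_tags": {"ResourceClass": "Serverless"},     # 2.1: High Concurrency / serverless pool
    "spark_conf": {
        "spark.databricks.cluster.profile": "serverless",        # 2.1: High Concurrency mode
        "spark.databricks.acl.dacEnabled": "true",               # 2.3: table access control
        "spark.databricks.repl.allowedLanguages": "python,sql",  # 2.3: Python and SQL only
    },
}

resp = requests.post(
    "{}/api/2.0/clusters/create".format(DATABRICKS_URL),
    headers={"Authorization": "Bearer {}".format(TOKEN)},
    json=cluster_spec,
)
print(resp.json())  # returns the new cluster_id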

3. Once the cluster is created, go to the cluster’s “Advanced Options”, click on “JDBC/ODBC” and copy the “Server Hostname” and “HTTP Path” – you’ll need these for setting up the connection from PowerBI.

4. Create a new Python notebook that will be used for granting the database permissions. You can use this as an example and build on it:

# Grant read-only access on every database whose name starts with "bg_"
# to the AD group Azure_Translators_BG.
for x in spark.sql('SHOW DATABASES LIKE "bg_*"').collect():
  spark.sql('GRANT USAGE ON DATABASE {} TO `Azure_Translators_BG`'.format(x.databaseName))          # lets the group see the database
  spark.sql('GRANT SELECT ON DATABASE {} TO `Azure_Translators_BG`'.format(x.databaseName))         # lets the group read data from its tables
  spark.sql('GRANT READ_METADATA ON DATABASE {} TO `Azure_Translators_BG`'.format(x.databaseName))  # lets the group view table metadata

That way the AD group “Azure_Translators_BG” will be able to read all Databricks databases that start with “bg_”.
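
To double-check that the grants took effect, you can query them from the same notebook. A minimal sketch, assuming the same group and database pattern as above:

# Show what the group is allowed to do on each matching database.
for x in spark.sql('SHOW DATABASES LIKE "bg_*"').collect():
  grants = spark.sql('SHOW GRANT `Azure_Translators_BG` ON DATABASE {}'.format(x.databaseName))
  grants.show(truncate=False)  # expect USAGE, SELECT and READ_METADATA rows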


5. Now you’d probably want this notebook to run on a regular basis, so that any new databases matching this pattern get the appropriate access. You can orchestrate this with Data Factory. Let’s create a new linked service, trigger and pipeline.

5.1 Create a Data Factory schedule trigger that will be associated with your pipeline (a sketch of its definition is shown below).
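
Under the hood a schedule trigger is just a small JSON document; here’s a minimal sketch of a daily trigger expressed as a Python dict (the trigger name, start time and pipeline name are placeholders I made up for this example):

# Hypothetical schedule trigger - runs the pipeline once a day.
schedule_trigger = {
    "name": "GrantPermissionsDailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-01-01T06:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "GrantDatabricksPermissions",  # the pipeline from 5.3
                    "type": "PipelineReference",
                }
            }
        ],
    },
}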

5.2 Create a Data Factory linked service to connect to the Azure Databricks interactive cluster we just created.

5.2.1 Databricks Workspace URL – this is the link to the Databricks workspace you’re using; it will look something like “https://adb-<some-numbers>.azuredatabricks.net”.
5.2.2 Authentication type – if you’re using an access token to authenticate (it can be generated from Databricks “User Settings > Access Tokens > Generate New Token”), ideally the token should be stored in Key Vault as a secret.
5.2.3 Select cluster – pick “Existing interactive cluster” and enter the “Existing interactive cluster ID”. If you’re unsure, go back to point 3 where you copied the HTTP Path; it looks something like “sql/protocolv1/o/<databricks-id>/<cluster-id>”, and the last segment after the “/” is the cluster ID. (The full definition is sketched below.)
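
Put together, the linked service from 5.2.1–5.2.3 ends up looking roughly like the sketch below (the names, Key Vault reference, secret name and cluster ID are placeholders):

# Hypothetical linked service definition pointing at the existing interactive cluster.
databricks_linked_service = {
    "name": "AzureDatabricksGrants",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            # 5.2.1: workspace URL
            "domain": "https://adb-<some-numbers>.azuredatabricks.net",
            # 5.2.2: access token stored in Key Vault instead of in plain text
            "accessToken": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "MyKeyVault", "type": "LinkedServiceReference"},
                "secretName": "databricks-access-token",
            },
            # 5.2.3: the last segment of the HTTP Path from point 3
            "existingClusterId": "<cluster-id>",
        },
    },
}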

5.3 Create a Data Factory pipeline with an Azure Databricks Notebook activity.

In the “Azure Databricks” tab, select the linked service we just created.

And in the Settings tab, select the path to the notebook from your Databricks Workspace (the full pipeline definition is sketched below).
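
The resulting pipeline definition is roughly the sketch below (the pipeline, activity and notebook path names are placeholders that match the examples above):

# Hypothetical pipeline with a single Databricks Notebook activity.
grant_pipeline = {
    "name": "GrantDatabricksPermissions",
    "properties": {
        "activities": [
            {
                "name": "RunGrantNotebook",
                "type": "DatabricksNotebook",
                "linkedServiceName": {
                    "referenceName": "AzureDatabricksGrants",  # linked service from 5.2
                    "type": "LinkedServiceReference",
                },
                "typeProperties": {
                    # path to the notebook created in step 4
                    "notebookPath": "/Shared/grant_bg_permissions"
                },
            }
        ]
    },
}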

6. Now that you’ve updated the access and created a way to automatically refresh the permissions on a regular basis, it’s time to connect to those Databricks databases with PowerBI. To set up the connection, go to “Get Data” and search for “Databricks”.

For “Server Hostname” and “HTTP Path”, enter the info you copied in point 3 and select the data connectivity mode that works for you.

With this example, if you’re a member of the AD group “Azure_Translators_BG” and the group is provisioned into the Databricks workspace, you’ll be able to read all databases that start with “bg_”. Be mindful that if the cluster is not currently running, it will take a couple of minutes to start.

Stay Awesome,
Ivelin
