Databricks-Connect in a Container

Today I published a set of containers to Docker Hub to enable developers to create Databricks-Connect dev environments quickly and consistently. Docker Hub: Source Code: These are targeted at development using PySpark in VSCode, though I suspect they will work for Scala and Java development as well. Why? Because setting up Databricks-Connect (particularly on Windows) is a PIA. This allows:

* A common setup between team members
* Multiple side-by-side versions
* The ability to reset your environment
* Even running the whole thing from a browser!

[Read More]

Series - Developing a PySpark Application

This is a series of blog posts demonstrating how PySpark applications can be developed, specifically with Databricks in mind, though the general principles applied here can be used with any Apache Spark setup (not just Databricks). Python has many complexities with regard to paths, packaging and deploying versions. These blog posts capture what we have learnt working with clients to build robust, reliable development processes. The goal is a local development environment that uses Databricks-Connect to execute against, backed by a solid CI and testing framework in Azure DevOps pipelines to support development going forward. [Read More]
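One concrete instance of those packaging complexities: the databricks-connect and pyspark distributions both install the `pyspark` module, so only one of them can live in a given Python environment at a time. A small stdlib-only sketch (this helper is my illustration, not from the series) to see which one an environment actually has:

```python
from importlib import metadata, util


def detect_spark_provider() -> str:
    """Report which distribution provides the pyspark module, if any.

    databricks-connect and the pyspark PyPI package both install a
    'pyspark' module, which is why they must not be co-installed.
    """
    if util.find_spec("pyspark") is None:
        return "none"
    for dist in ("databricks-connect", "pyspark"):
        try:
            metadata.version(dist)  # raises if dist is not installed
            return dist
        except metadata.PackageNotFoundError:
            continue
    return "unknown"


print(detect_spark_provider())
```

Running this in each virtual environment you maintain is a quick way to confirm a side-by-side setup has not been cross-contaminated.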

Databricks Key Vault backed Secret Scopes

A few weeks ago Databricks added the ability to have Azure Key Vault backed Secret Scopes. These are still in preview. Today the PowerShell tools for Databricks that we maintain have been updated to support them! Technically the API is not documented, and as the service is in preview it may stop working in the future, but we will update the module if Databricks change the API. Example:

Import-Module azure.databricks.cicd.Tools
$BearerToken = "dapi1234567890"
$Region = "westeurope"
$ResID = "/subscriptions/{mysubscriptionid}/resourceGroups/{myResourceGroup}/providers/Microsoft. [Read More]
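For readers outside PowerShell, the call the module wraps can be sketched in Python. This only builds the request rather than sending it; the endpoint and field names reflect the preview secret-scopes API and may change, and the function name and placeholder values are mine:

```python
import json
from urllib import request


def keyvault_scope_request(region: str, token: str, scope: str,
                           resource_id: str, dns_name: str) -> request.Request:
    """Build (but do not send) the create-scope call for a Key Vault
    backed secret scope. Field names follow the preview API."""
    body = {
        "scope": scope,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": resource_id,  # full ARM resource ID of the Key Vault
            "dns_name": dns_name,        # e.g. the vault's https://...vault.azure.net/ URI
        },
    }
    return request.Request(
        f"https://{region}.azuredatabricks.net/api/2.0/secrets/scopes/create",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a single `urllib.request.urlopen(req)` call, which is essentially what the PowerShell module does for you.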

PowerShell for Azure Databricks

Last year we released a PowerShell module on GitHub and the PowerShell Gallery. What we never did is publish anything about what it can do. The original purpose was to help with CI/CD scenarios, so that you could create idempotent releases in Azure DevOps, Jenkins etc. But it now has almost full parity with the options available in the REST API. Databricks do offer a supported CLI (which requires Python to be installed) and a REST API - which is quite complex to use - but the REST API is what this PowerShell module uses. [Read More]

Controlling the Databricks Resource Group Name

When you create a Databricks workspace using the Azure portal you obviously specify the Resource Group to create it in. But in the background a second resource group is created - known as the managed resource group - and it is given an almost random name. This is a pain if you have naming conventions or standards to adhere to. The managed resource group is used for the networking of your clusters and for providing the DBFS storage account. [Read More]
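One way to control that name is to deploy the workspace with an ARM template rather than the portal: the `Microsoft.Databricks/workspaces` resource exposes a `managedResourceGroupId` property. A sketch of the resource definition, expressed as a Python helper that builds the ARM resource object (the helper itself and all placeholder names are mine):

```python
def databricks_workspace_resource(name: str, location: str,
                                  managed_rg_id: str) -> dict:
    """Build an ARM resource definition for a Databricks workspace
    with an explicitly named managed resource group.

    managed_rg_id is the full resource ID of the managed resource
    group you want, e.g.
    /subscriptions/<sub>/resourceGroups/<your-chosen-name>
    """
    return {
        "type": "Microsoft.Databricks/workspaces",
        "apiVersion": "2018-04-01",
        "name": name,
        "location": location,
        "sku": {"name": "premium"},
        "properties": {"managedResourceGroupId": managed_rg_id},
    }
```

Dropped into the `resources` array of a template, this lets the managed resource group follow your naming convention instead of the generated default.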

ADFv2 - Testing Linked Services

Azure Data Factory v2 Linked Services Testing

If you are deploying lots of Linked Services to an environment it would be nice to run a test that proves they connect successfully. This can validate many things, including:

* Key Vault secrets have also been deployed
* Permissions applied to your secrets
* The Integration Runtime is up and working
* Firewall ports opened
* User permissions deployed

PowerShell and the Azure REST API to the rescue. [Read More]
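A first step in any such test is enumerating the Linked Services in a factory, which the documented Data Factory management API supports with a plain GET. A Python sketch of just the URL construction (the connectivity-test call itself is not shown here, and the function name is mine):

```python
def linked_services_url(subscription_id: str, resource_group: str,
                        factory_name: str) -> str:
    """Documented management endpoint that lists a Data Factory's
    linked services; each result could then be fed to a per-service
    connectivity test."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        "/linkedServices?api-version=2018-06-01"
    )
```

Issuing the GET requires a bearer token in the `Authorization` header, obtained the same way as in the authentication post below.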

PowerShell and Azure REST API Authentication

Sometimes you find that the Azure PowerShell cmdlets do not offer all of the functionality of the REST API/Portal. In these cases you can fall back to the REST API, which can of course be called from PowerShell. The first thing you always need to do is authenticate. These scripts authenticate using a service principal so that they can be used in non-interactive mode. Details of how to create a service principal can be found here: https://docs. [Read More]
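The flow behind that authentication is the standard OAuth2 client-credentials grant against the Azure AD token endpoint. A Python sketch of the same token request the PowerShell scripts make, built but not sent (the function name is mine):

```python
from urllib import parse, request


def token_request(tenant_id: str, client_id: str,
                  client_secret: str) -> request.Request:
    """Build the OAuth2 client-credentials request a service principal
    uses to obtain a bearer token for the Azure management API."""
    body = parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # audience for which the token is issued
        "resource": "https://management.azure.com/",
    }).encode()
    return request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body,
        method="POST",
    )
```

The JSON response to this request contains an `access_token`, which is then passed as `Authorization: Bearer <token>` on subsequent REST calls.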