Databricks PowerShell Tools Update 1.1.21

A new release of azure.databricks.cicd.tools has gone out today. Changes include:

* Support for Cluster Log Path on New-DatabricksCluster, Add-DatabricksJarJob, Add-DatabricksNotebookJob and Add-DatabricksPythonJob
* Support for Instance Pool ID on the above as well
* Support for creating new instance pools will come soon
* New command: Restart-DatabricksCluster
* New command: Remove-DatabricksLibrary
* Added -RunImmediate to Add-DatabricksNotebookJob
* Fixed a case-sensitivity issue preventing all commands importing on Linux

All the docs have been updated in the wiki as well. [Read More]
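To give a feel for the new options, here is a minimal sketch. The command names come from the change list above, but the parameter names (-BearerToken, -Region, -ClusterName) are assumptions based on the module's usual pattern, so check the wiki for the exact signatures.

# Connect details for the workspace (placeholder values)
Import-Module azure.databricks.cicd.tools
$BearerToken = "dapi1234567890"
$Region = "westeurope"

# Restart an existing cluster (new command in 1.1.21; -ClusterName is an assumed parameter)
Restart-DatabricksCluster -BearerToken $BearerToken -Region $Region -ClusterName "MyCluster"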

Databricks Key Vault backed Secret Scopes

A few weeks ago Databricks added the ability to have Azure Key Vault backed Secret Scopes. These are still in preview. Today the PowerShell tools for Databricks we maintain have been updated to support these! Technically the API is not documented, and as the service is in preview it may stop working in the future. But we will update the module if Databricks change the API. Example:

Import-Module azure.databricks.cicd.Tools
$BearerToken = "dapi1234567890"
$Region = "westeurope"
$ResID = "/subscriptions/{mysubscriptionid}/resourceGroups/{myResourceGroup}/providers/Microsoft. [Read More]
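The excerpt is cut off, so purely as an illustration of where that resource ID is heading, a Key Vault backed scope might be created along these lines. The completed resource ID, the command name Add-DatabricksSecretScope and its parameters are assumptions on my part, not confirmed signatures, so check the module docs.

# Resource ID of the Key Vault backing the scope (shape reconstructed, values are placeholders)
$ResID = "/subscriptions/{mysubscriptionid}/resourceGroups/{myResourceGroup}/providers/Microsoft.KeyVault/vaults/{myKeyVault}"

# Create the Key Vault backed secret scope (assumed command and parameter names)
Add-DatabricksSecretScope -BearerToken $BearerToken -Region $Region -ScopeName "MyScope" -KeyVaultResourceId $ResID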

PowerShell for Azure Databricks

Last year we released a PowerShell module called azure.databricks.cicd.tools on GitHub and PowerShell Gallery. What we never did is publish anything about what it can do. The original purpose was to help with CI/CD scenarios, so that you could create idempotent releases in Azure DevOps, Jenkins etc. But now it has almost full parity with the options available in the REST API. Databricks do offer a supported CLI (which requires Python to be installed) and a REST API. The REST API is quite complex to use directly, but it is what this PowerShell module uses under the hood. [Read More]
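For anyone who wants to try it, getting the module from the PowerShell Gallery is a one-liner. This is just a generic install-and-explore snippet, not taken from the post itself.

# Install from the PowerShell Gallery, then see which commands the module exposes
Install-Module -Name azure.databricks.cicd.tools -Scope CurrentUser
Import-Module azure.databricks.cicd.tools
Get-Command -Module azure.databricks.cicd.tools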

Databricks Cluster Management via PowerShell

We have released a big update to the CI/CD Tools on GitHub today: https://github.com/DataThirstLtd/azure.databricks.cicd.tools These updates are for cluster management within Databricks. They allow you to create or update clusters, and to stop, start, delete and resize them. There are also some new helper functions to get a list of available Spark versions and the types of VMs available to you. The full set of new commands is:

* Get-DatabricksClusters - Returns a list of all clusters in your workspace
* New-DatabricksCluster - Creates/Updates a cluster
* Start-DatabricksCluster
* Stop-DatabricksCluster
* Update-DatabricksClusterResize - Modifies the number of workers
* Remove-DatabricksCluster - Deletes your cluster
* Get-DatabricksNodeTypes - Returns a list of valid node types (such as DS3v2 etc.)
* Get-DatabricksSparkVersions - Returns a list of valid versions

These will hopefully be added to the VSTS/Azure DevOps tasks in the near future. [Read More]
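To show how these fit together, here is a minimal sketch. -BearerToken and -Region follow the module's usual pattern, but the cluster-specific parameter names (-ClusterId, -NumberOfWorkers) are assumptions, so check the wiki for the exact signatures.

Import-Module azure.databricks.cicd.tools
$BearerToken = "dapi1234567890"
$Region = "westeurope"

# Discover what is available in your workspace and region
Get-DatabricksSparkVersions -BearerToken $BearerToken -Region $Region
Get-DatabricksNodeTypes -BearerToken $BearerToken -Region $Region
Get-DatabricksClusters -BearerToken $BearerToken -Region $Region

# Stop, resize and finally delete an existing cluster (assumed parameter names)
Stop-DatabricksCluster -BearerToken $BearerToken -Region $Region -ClusterId "1234-567890-abc123"
Update-DatabricksClusterResize -BearerToken $BearerToken -Region $Region -ClusterId "1234-567890-abc123" -NumberOfWorkers 4
Remove-DatabricksCluster -BearerToken $BearerToken -Region $Region -ClusterId "1234-567890-abc123"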

Controlling the Databricks Resource Group Name

When you create a Databricks workspace using the Azure portal you obviously specify the Resource Group to create it in. But in the background a second resource group is created; this is known as the managed resource group, and it is given an almost random name. This is a pain if you have naming conventions or standards to adhere to. The managed resource group is used for the networking of your clusters and for providing the DBFS storage account. [Read More]
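The excerpt stops before the workaround, but the mechanism involved is the managedResourceGroupId property on the Microsoft.Databricks/workspaces ARM resource. Below is a rough PowerShell sketch of setting it; the names are purely illustrative and this is not necessarily the exact approach from the post.

# Deploy the workspace with an explicit managed resource group name (illustrative values)
$Sub = (Get-AzContext).Subscription.Id
New-AzResource -ResourceType "Microsoft.Databricks/workspaces" `
    -ResourceGroupName "rg-databricks" -ResourceName "my-workspace" `
    -Location "westeurope" -ApiVersion "2018-04-01" `
    -Sku @{ name = "premium" } `
    -Properties @{ managedResourceGroupId = "/subscriptions/$Sub/resourceGroups/rg-databricks-managed" } `
    -Force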

ADFv2 - Testing Linked Services

Azure Data Factory v2 Linked Services Testing

If you are deploying lots of Linked Services to an environment it would be nice to run a test that proves they connect successfully. This can validate many things, including:

* Key Vault Secrets have also been deployed
* Permissions applied to your secrets
* Integration Runtime is up and working
* Firewall ports opened
* User Permissions deployed

PowerShell

PowerShell and the Azure REST API to the rescue. [Read More]
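As one concrete piece of the puzzle, the documented REST endpoint for listing a factory's Linked Services can be called from PowerShell like this. The connectivity-test call itself sits behind the Read More link and is not reproduced here; all of the names below are placeholders.

# List the Linked Services in a factory via the Azure REST API (placeholder values)
$Token = "..."   # bearer token from the service principal flow covered in the next post below
$Sub = "{mysubscriptionid}"; $RG = "{myResourceGroup}"; $Factory = "{myDataFactory}"
$Uri = "https://management.azure.com/subscriptions/$Sub/resourceGroups/$RG/providers/Microsoft.DataFactory/factories/$Factory/linkedservices?api-version=2018-06-01"
$LinkedServices = Invoke-RestMethod -Method Get -Uri $Uri -Headers @{ Authorization = "Bearer $Token" }
$LinkedServices.value | ForEach-Object { $_.name }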

PowerShell and Azure REST API Authentication

Sometimes you find that the Azure PowerShell cmdlets do not offer all of the functionality of the REST API/Portal. In these cases you can fall back to the REST API, which can of course be called from PowerShell. The first thing you always need to do is authenticate. These scripts will authenticate using a service principal so that they can be used in non-interactive mode. Details of how to create a service principal can be found here: https://docs. [Read More]
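The flow the post describes boils down to a client-credentials token request against Azure AD. The snippet below is a minimal sketch of that flow with placeholder values, not necessarily the exact script from the post.

# Acquire a bearer token for the Azure REST API using a service principal (placeholder values)
$TenantId = "{myTenantId}"
$ClientId = "{myAppId}"
$ClientSecret = "{myAppSecret}"

$Body = @{
    grant_type    = "client_credentials"
    client_id     = $ClientId
    client_secret = $ClientSecret
    resource      = "https://management.azure.com/"
}
$TokenResponse = Invoke-RestMethod -Method Post -Uri "https://login.microsoftonline.com/$TenantId/oauth2/token" -Body $Body

# Use the token in the Authorization header of subsequent REST calls
$Headers = @{ Authorization = "Bearer $($TokenResponse.access_token)" }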

Databricks CI/CD Tools

A while back now I started to create some PowerShell modules for assisting with DevOps CI and CD scenarios. These can be found on GitHub here. Why? Firstly, the one thing I don’t like about Databricks is the CI/CD support. I think it is very lacking in support for data engineers and too focused on data science. Don’t get me wrong, Databricks is great - but it is also relatively young as a product and still ironing out the user experience part. [Read More]