ADFv2 - Testing Linked Services

If you are deploying lots of Linked Services to an environment, it would be nice to run a test that proves they connect successfully. This can validate many things, including:

* Key Vault Secrets have also been deployed
* Permissions applied to your secrets
* Integration Runtime is up and working
* Firewall ports opened
* User Permissions deployed

PowerShell and the Azure REST API to the rescue. [Read More]
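The full post has the details of the test itself; purely as a rough sketch of the shape of the REST calls involved, here is how you might enumerate the Linked Services in a factory from PowerShell using the documented "list linked services" endpoint (the resource names are placeholders, and a bearer token is assumed to be in hand already - see the authentication post below):

```powershell
# Minimal sketch: list the Linked Services in a factory via the Azure REST API,
# as a starting point for looping a connection test over each one.
# Assumes $token already holds a bearer token for https://management.azure.com
# and that these resource names are placeholders.
$subscriptionId = "<subscription-id>"
$resourceGroup  = "<resource-group>"
$factoryName    = "<data-factory-name>"

$uri = "https://management.azure.com/subscriptions/$subscriptionId/resourceGroups/$resourceGroup" +
       "/providers/Microsoft.DataFactory/factories/$factoryName/linkedservices?api-version=2018-06-01"

$response = Invoke-RestMethod -Method Get -Uri $uri -Headers @{ Authorization = "Bearer $token" }
$response.value | ForEach-Object { $_.name }   # each of these is a candidate for a connection test
```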

PowerShell and Azure REST API Authentication

Sometimes you find that the Azure PowerShell cmdlets do not offer all of the functionality of the REST API/Portal. In these cases you can fall back to the REST API, which can of course be called from PowerShell. The first thing you always need to do is authenticate. These scripts will authenticate using a service principal so that they can be used in non-interactive mode. Details of how to create a service principal can be found here: https://docs. [Read More]
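As a minimal sketch of the client credentials flow the post describes (the tenant ID, client ID and secret below are placeholders, and this uses the Azure AD v1 token endpoint with a `resource` parameter):

```powershell
# Minimal sketch: acquire a bearer token for the Azure management API using a
# service principal (client credentials grant), suitable for non-interactive scripts.
$tenantId     = "<tenant-id>"
$clientId     = "<service-principal-application-id>"
$clientSecret = "<service-principal-secret>"

$body = @{
    grant_type    = "client_credentials"
    client_id     = $clientId
    client_secret = $clientSecret
    resource      = "https://management.azure.com/"
}

$tokenResponse = Invoke-RestMethod -Method Post `
    -Uri "https://login.microsoftonline.com/$tenantId/oauth2/token" `
    -Body $body

$token = $tokenResponse.access_token
# $token can now be sent as "Authorization: Bearer $token" on REST API calls.
```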

Getting All Table Information

One thing that bugs me in SQL Server is how hard it is to get information about your tables to analyse usage, indexes and size. This is a query I wrote several years ago and still use today. Information exported includes:

* All index information, including columns
* Compression
* File groups
* Space used
* Row count
* Index usage

SELECT O.object_id, O.name TableName, ISNULL(I.[name],'HEAP') IndexName, i.type_desc [IndexType], ISNULL(SDS.name,NPSDS.[name]) FileGroup, PS.row_count [RowCount], CAST(PS.used_page_count * 8 AS money)/1024 SpaceUsed_MB, CAST(PS. [Read More]

Unpivot Data in PySpark

I recently encountered a file similar to this: The data required “unpivoting” so that the measures became just three columns for Volume, Retail & Actual - and then we add 3 rows for each row as Years 16, 17 & 18. There are various ways of doing this in Spark; using Stack is an interesting one, but I find it complex and hard to read. First, let's set up our environment and create a function to extract our sample data: [Read More]

Databricks CI/CD Tools

A while back I started creating some PowerShell modules to assist with DevOps CI and CD scenarios. These can be found on GitHub here. Why? Firstly, the one thing I don’t like about Databricks is the CI/CD support. I think it is lacking for data engineers and too focused on data science. Don’t get me wrong, Databricks is great - but it is also a relatively young product and still ironing out the user experience. [Read More]
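The excerpt stops short of showing the modules in action, so the following is only an assumption-laden illustration of the kind of Databricks REST API call that PowerShell CI/CD tooling like this typically drives (the workspace URL and token are placeholders):

```powershell
# Hypothetical illustration: list the contents of a Databricks workspace folder
# via the Workspace API - the sort of call deployment scripts build on.
# $workspaceUrl and $pat are placeholders: use your region URL and a personal access token.
$workspaceUrl = "https://<region>.azuredatabricks.net"
$pat          = "<personal-access-token>"

$response = Invoke-RestMethod -Method Get `
    -Uri "$workspaceUrl/api/2.0/workspace/list?path=/Shared" `
    -Headers @{ Authorization = "Bearer $pat" }

$response.objects | ForEach-Object { "$($_.object_type): $($_.path)" }
```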