Part 5 - Developing a PySpark Application

This is the 5th and final part of a series of posts showing how you can develop PySpark applications for Databricks with Databricks-Connect and Azure DevOps. All source code can be found here. Configuration & Releasing: We are now ready to deploy. I’m working on the assumption that we have two further environments to deploy into - UAT and Production. Deploy.ps1: This script in the root folder will do all the work we need to release our Wheel and set up some Databricks Jobs for us. [Read More]
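As a rough illustration of the kind of call the deployment step makes (this is not the script from the repo - it is a Python sketch against the Databricks Jobs 2.1 REST API, and the workspace URL, token, job name, Wheel path and cluster spec are all placeholders):

```python
# Hedged sketch: create a Databricks Job that runs an entry point from our Wheel.
import requests

host = "https://<your-workspace>.azuredatabricks.net"   # placeholder
token = "<personal-access-token>"                        # placeholder

job_spec = {
    "name": "demo-pyspark-app",
    "tasks": [
        {
            "task_key": "run_app",
            "python_wheel_task": {
                "package_name": "demo_pyspark_app",
                "entry_point": "main",
            },
            # Wheel uploaded by the release step; path is a placeholder.
            "libraries": [{"whl": "dbfs:/wheels/demo_pyspark_app-0.1.0-py3-none-any.whl"}],
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 1,
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```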

Part 4 - Developing a PySpark Application

This is the 4th part of a series of posts showing how you can develop PySpark applications for Databricks with Databricks-Connect and Azure DevOps. All source code can be found here. Create a CI Build: Now that we have everything running locally, we want to create a CI process to build our Wheel, publish it as an artefact and, of course, test our code. In the root of the project is a file called azure-pipelines. [Read More]
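The pipeline itself lives in that file; as a hedged sketch of the test step it might drive (the file name and paths below are assumptions, not taken from the repo), a small wrapper that emits JUnit XML lets Azure DevOps publish the results as a test report:

```python
# run_tests.py - hedged sketch of a test entry point a CI build could call.
import sys

import pytest

if __name__ == "__main__":
    # JUnit XML output is what the pipeline's publish-test-results step consumes.
    sys.exit(pytest.main(["tests", "--junitxml=test-results/results.xml"]))
```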

Part 3 - Developing a PySpark Application

This is the 3rd part of a series of posts showing how you can develop PySpark applications for Databricks with Databricks-Connect and Azure DevOps. All source code can be found here. Packaging into a Wheel: Before we can create a CI process we should ensure that we can build and package the application locally. Reusing the same scripts locally and on the CI server minimises the chances of something breaking. [Read More]
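A minimal sketch of what the packaging setup might look like (the package name and version are placeholders, not the ones used in the repo):

```python
# setup.py - minimal sketch; name and version are placeholders.
from setuptools import setup, find_packages

setup(
    name="demo_pyspark_app",
    version="0.1.0",
    packages=find_packages(exclude=["tests", "tests.*"]),
    # databricks-connect supplies pyspark locally, so pyspark itself is
    # deliberately not listed in install_requires.
)
```

The Wheel can then be built with `python -m pip wheel . -w dist` (or `python setup.py bdist_wheel`), and the same command can be reused on the CI server.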

Part 2 - Developing a PySpark Application

This is the 2nd part of a series of posts showing how you can develop PySpark applications for Databricks with Databricks-Connect and Azure DevOps. All source code can be found here. Adding Tests: Now that we have a working local application we should add some tests. As with any data-based project, testing can be very difficult. Strictly speaking, all tests should set up the data they need and be fully independent. [Read More]
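As a hedged illustration of the shape such tests can take (the fixture, file and test names here are made up for the example), a session-scoped SparkSession fixture keeps the Spark setup in one place while each test builds its own data:

```python
# conftest.py - hedged sketch of a shared SparkSession fixture.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # With databricks-connect installed and configured, getOrCreate()
    # attaches to the remote cluster; with plain pyspark it runs locally.
    return SparkSession.builder.getOrCreate()


# test_transforms.py - each test creates the data it needs, so it stays independent.
def test_row_count(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    assert df.count() == 2
```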

Part 1 - Developing a PySpark Application

This is the 1st part of a series of posts showing how you can develop PySpark applications for Databricks with Databricks-Connect and Azure DevOps. All source code can be found here. Overview: The goal of this post is to create a PySpark application in Visual Studio Code using Databricks-Connect. This post focuses on creating an application in your local development environment; other posts in the series will look at CI & testing. [Read More]
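A minimal sketch of the kind of entry point this produces (the module and function names are illustrative only, not the ones from the repo):

```python
# main.py - minimal sketch of a PySpark entry point.
from pyspark.sql import SparkSession


def run(spark: SparkSession) -> None:
    # Trivial job: count 100 generated rows.
    print(spark.range(100).count())


if __name__ == "__main__":
    # With databricks-connect configured, this session executes against the
    # Databricks cluster; with a plain pyspark install it runs locally.
    run(SparkSession.builder.getOrCreate())
```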

Series - Developing a PySpark Application

This is a series of blog posts demonstrating how PySpark applications can be developed specifically with Databricks in mind, though the general principles applied here can be used with any Apache Spark setup (not just Databricks). Python has many complexities around paths, packaging and deploying versions. These blog posts capture what we have learnt working with clients to build robust, reliable development processes. The goal is a local development environment that uses Databricks-Connect to execute against the cluster, backed by a solid CI and testing framework built on Azure DevOps pipelines to support development going forward. [Read More]