Now that we have everything running locally, we want to create a CI process to build our Wheel, publish it as an artefact and, of course, test our code.
In the root of the project is a file called azure-pipelines.yml. This YAML file describes the build process that takes place.
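As a rough illustration of what such a file might contain, here is a minimal sketch of an azure-pipelines.yml that installs dependencies, runs the tests, builds a wheel and publishes it as an artifact. The step names, paths and Python version are assumptions for illustration, not the actual pipeline from the post.

```yaml
# Hypothetical minimal Azure Pipelines definition (illustrative only)
trigger:
  - master

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.7'

  - script: pip install -r requirements.txt
    displayName: 'Install dependencies'

  - script: pytest tests/
    displayName: 'Run tests'

  - script: python setup.py bdist_wheel
    displayName: 'Build wheel'

  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: 'dist'
      ArtifactName: 'wheel'
```

Keeping each step as a plain `script:` call means the same commands can also be run by hand on a developer machine, which is the reuse the next section argues for.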
Before we create a CI process we should ensure that we can build and package the application locally. Reusing the same scripts locally and on the CI server minimises the chance of something breaking.
Now that we have a working local application we should add some tests. As with any data-based project, testing can be very difficult. Strictly speaking, every test should set up the data it needs and be fully independent. This is actually a good case for using plain PySpark locally rather than Databricks-Connect, as that forces you to use small local files which you can add to a repo.
The goal of this post is to create a PySpark application in Visual Studio Code using Databricks-Connect. This post focuses on creating an application in your local development environment. Other posts in the series look at CI and testing.