PySpark Applications for Databricks

UPDATE April 2019: If you are interested in creating a PySpark application for Databricks, you should consider using Databricks-Connect. More details here. Whilst notebooks are great, there comes a time and place when you just want to use Python and PySpark in their pure form. Databricks can execute Python jobs for when notebooks don’t feel very enterprise-data-pipeline ready - %run and widgets just look like schoolboy hacks. [Read More]

Databricks CI/CD Tools

A while back I started to create some PowerShell modules to assist with DevOps CI and CD scenarios. These can be found on GitHub here. Why? Firstly, the one thing I don’t like about Databricks is the CI/CD support. I think it is very lacking in support for data engineers and too focused on data science. Don’t get me wrong, Databricks is great - but it is also relatively young as a product and is still ironing out the user experience. [Read More]