Having recently tried to get DBConnect working on a Windows 10 machine, I’ve realised things are not as easy as you might think.
Whilst notebooks are great, there comes a time and place when you just want to use Python and PySpark in its pure form. Databricks can execute Python jobs for when notebooks don’t feel very enterprise-data-pipeline ready: %run and widgets just look like schoolboy hacks. The lack of debugging in Databricks is also painful at times. With a PySpark application we can debug locally in our IDE of choice (I’m using VSCode).
Databricks provides some nice connectors for reading and writing data to SQL Server. These are generally what you need, as they act in a distributed fashion and support push-down predicates and so on. But sometimes you want to execute a stored procedure or a simple statement.
The data required “unpivoting” so that the measures became just three columns (Volume, Retail and Actual), with each original row becoming three rows, one each for years 16, 17 and 18.
There are various ways of doing this in Spark; using stack is an interesting one, but I find it complex and hard to read.
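For reference, here is a minimal sketch of what the stack approach mentioned above might look like in PySpark. The wide column names (Volume16, Retail16, Actual16 and so on) and the helper names build_stack_expr and unpivot are assumptions made for illustration; they are not taken from the original data or post. The helper only builds the SQL expression, so it can be inspected without a Spark session.

```python
# Hypothetical wide column names such as Volume16, Retail16, Actual16, ...
# These are illustrative assumptions, not the original dataset's columns.
YEARS = ["16", "17", "18"]
MEASURES = ["Volume", "Retail", "Actual"]


def build_stack_expr(years=YEARS, measures=MEASURES):
    """Build a Spark SQL stack() expression that emits one row per year."""
    rows = []
    for year in years:
        # Each output row: the year label, then that year's measure columns.
        cols = ", ".join(f"{m}{year}" for m in measures)
        rows.append(f"'{year}', {cols}")
    out_cols = ", ".join(["Year"] + list(measures))
    return f"stack({len(years)}, {', '.join(rows)}) as ({out_cols})"


def unpivot(df, id_cols):
    """Unpivot a PySpark DataFrame: keep id_cols, add Year plus the measures."""
    return df.selectExpr(*id_cols, build_stack_expr())


# Show the generated expression so it can be checked by eye.
print(build_stack_expr())
```

With a real DataFrame this might be called as unpivot(df, ["Product"]), where Product is again a made-up identifier column. Even in this small form the generated expression is long, which gives some sense of why the stack approach can be hard to read.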