Executing SQL Server Stored Procedures from Databricks (PySpark)

Databricks provides some nice connectors for reading and writing data to SQL Server. These are generally what you need, as they act in a distributed fashion and support predicate pushdown and so on. But sometimes you want to execute a stored procedure or a simple statement. I must stress this is not recommended - more on that at the end of this blog. I’m going to assume that, as you made it here, you really want to do this. [Read More]
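As a taste of what the full post covers, here is a minimal sketch of calling a stored procedure over a plain ODBC connection from a notebook. It assumes `pyodbc` and a SQL Server ODBC driver are installed on the cluster; the connection string, procedure name, and parameters below are placeholders, not a real setup.

```python
def build_exec_statement(proc_name, params):
    """Build a parameterised T-SQL EXEC statement,
    e.g. 'EXEC dbo.usp_refresh ?, ?' for two parameters."""
    placeholders = ", ".join("?" for _ in params)
    return f"EXEC {proc_name} {placeholders}".strip()

def run_proc(conn_str, proc_name, params=()):
    """Execute a stored procedure on the driver node (not distributed)."""
    import pyodbc  # assumed installed on the cluster

    # autocommit=True so any DML inside the procedure is committed
    with pyodbc.connect(conn_str, autocommit=True) as conn:
        conn.cursor().execute(build_exec_statement(proc_name, params), params)

# Usage with placeholder credentials:
# conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
#             "SERVER=myserver.database.windows.net;DATABASE=mydb;"
#             "UID=user;PWD=secret")
# run_proc(conn_str, "dbo.usp_refresh_sales", ("2024-01-01",))
```

Note this runs on the driver only, which is part of why it is not recommended as a general pattern.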

Unpivot Data in PySpark

Problem I recently encountered a file similar to this: The data required “unpivoting” so that the measures become just three columns for Volume, Retail & Actual, and each original row becomes three rows, one for each of the years 16, 17 & 18. There are various ways of doing this in Spark; using stack is an interesting one, but I find it complex and hard to read. First, let’s set up our environment and create a function to extract our sample data: [Read More]
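To illustrate the stack approach mentioned above, here is a small sketch that generates the `stack()` expression rather than writing it by hand. The wide column names (`Volume16`, `Retail16`, … `Actual18`) are assumptions about the sample file, not taken from the post itself.

```python
def build_stack_expr(years, measures):
    """Build a Spark SQL stack() expression that turns wide
    measure-per-year columns into one output row per year."""
    rows = []
    for y in years:
        cols = ", ".join(f"{m}{y}" for m in measures)
        rows.append(f"'{y}', {cols}")  # year literal plus its measure columns
    header = ", ".join(["Year"] + measures)
    return f"stack({len(years)}, {', '.join(rows)}) as ({header})"

# Usage in a notebook, assuming a wide schema Product, Volume16..Actual18:
# df.selectExpr("Product",
#               build_stack_expr([16, 17, 18],
#                                ["Volume", "Retail", "Actual"])).show()
```

Generating the expression keeps the year/measure lists in one place, though it does not make the underlying `stack()` call any easier to debug.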