Executing SQL Server Stored Procedures from Databricks (PySpark)

Databricks provides some nice connectors for reading and writing data to SQL Server. These are generally what you need, as they act in a distributed fashion and support predicate pushdown and so on. But sometimes you want to execute a stored procedure or a simple statement. I must stress this is not recommended - more on that at the end of this blog. I’m going to assume that, as you made it here, you really want to do this. [Read More]
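As a taste of what the full post covers, here is a minimal sketch of calling a stored procedure over a plain ODBC connection from a notebook. It assumes `pyodbc` and a SQL Server ODBC driver are installed on the cluster; the connection string, procedure name, and parameters below are placeholders, not a real setup.

```python
def build_exec_statement(proc_name, params):
    """Build a parameterised T-SQL EXEC statement,
    e.g. 'EXEC dbo.usp_refresh ?, ?' for two parameters."""
    placeholders = ", ".join("?" for _ in params)
    return f"EXEC {proc_name} {placeholders}".strip()

def run_proc(conn_str, proc_name, params=()):
    """Execute a stored procedure on the driver node (not distributed)."""
    import pyodbc  # assumed installed on the cluster

    # autocommit=True so any DML inside the procedure is committed
    with pyodbc.connect(conn_str, autocommit=True) as conn:
        conn.cursor().execute(build_exec_statement(proc_name, params), params)

# Usage with placeholder credentials:
# conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
#             "SERVER=myserver.database.windows.net;DATABASE=mydb;"
#             "UID=user;PWD=secret")
# run_proc(conn_str, "dbo.usp_refresh_sales", ("2024-01-01",))
```

Note this runs on the driver only, which is part of why it is not recommended as a general pattern.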

Unpivot Data in PySpark

Problem I recently encountered a file similar to this: The data required “unpivoting” so that the measures become just three columns for Volume, Retail & Actual, and each original row becomes three rows, one for each of the years 16, 17 & 18. There are various ways of doing this in Spark; using stack is an interesting one, but I find it complex and hard to read. First, let’s set up our environment and create a function to extract our sample data: [Read More]
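To illustrate the stack approach mentioned above, here is a small sketch that generates the `stack()` expression rather than writing it by hand. The wide column names (`Volume16`, `Retail16`, … `Actual18`) are assumptions about the sample file, not taken from the post itself.

```python
def build_stack_expr(years, measures):
    """Build a Spark SQL stack() expression that turns wide
    measure-per-year columns into one output row per year."""
    rows = []
    for y in years:
        cols = ", ".join(f"{m}{y}" for m in measures)
        rows.append(f"'{y}', {cols}")  # year literal plus its measure columns
    header = ", ".join(["Year"] + measures)
    return f"stack({len(years)}, {', '.join(rows)}) as ({header})"

# Usage in a notebook, assuming a wide schema Product, Volume16..Actual18:
# df.selectExpr("Product",
#               build_stack_expr([16, 17, 18],
#                                ["Volume", "Retail", "Actual"])).show()
```

Generating the expression keeps the year/measure lists in one place, though it does not make the underlying `stack()` call any easier to debug.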