Setup Databricks-Connect on Windows 10

UPDATE June 2020 - How about using a container instead? It’s much easier than installing all this stuff: Prebuilt container. Having recently tried to get DBConnect working on a Windows 10 machine, I’ve realised things are not as easy as you might think. These are the steps I have found to set up a new machine and get Databricks-Connect working. Install Java: download and install Java SE Runtime Version 8. It’s very important that you only get version 8, nothing later. [Read More]
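Since the Java version matters so much here, it can help to check what is actually installed before going further. The following is only a minimal sketch: the helper name `java_major_version` is hypothetical, and it assumes the usual `java -version` banner format, where Java 8 reports itself as `1.8.0_...` and later releases as `11.0.2`, `17`, and so on.

```python
import re
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Hypothetical helper for illustration. Returns None when no
    version string is found in the text.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_251".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major
```

Note that `java -version` writes its banner to stderr, so when capturing it with `subprocess.run(["java", "-version"], capture_output=True, text=True)` the text to parse is in `.stderr`. A result other than 8 suggests the wrong runtime is first on the PATH.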

Databricks-Connect - FINALLY!

I’ve been waiting for this feature for what feels like forever. Databricks-Connect is here! You can download it here. It allows you to develop using an IDE like VSCode, PyCharm, IntelliJ, etc. and connect to a remote Databricks cluster to execute the task. This means we can bring much better development experiences and best practices to data engineering workloads. Notebooks are great for exploring data, but they are not enterprise code for ETL jobs. [Read More]

Databricks & Snowflake Python Errors

I’ve recently been playing with writing data to Snowflake from Databricks. Reading and writing are pretty simple as per the instructions from Databricks. But if you want to execute SnowSQL commands using the snowflake-connector-python package and Python 3, you will be greeted with this error when you try to import the module (despite attaching the library without error): cffi library '_openssl' has no function, constant or global variable named 'Cryptography_HAS_SET_ECDH_AUTO' [Read More]

Unpivot Data in PySpark

Problem: I recently encountered a file similar to this: The data required “unpivoting” so that the measures became just three columns for Volume, Retail & Actual, with each original row becoming three rows, one each for Years 16, 17 & 18. There are various ways of doing this in Spark; using Stack is an interesting one, but I find it complex and hard to read. First, let’s set up our environment and create a function to extract our sample data: [Read More]
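To make the target shape concrete, here is a rough sketch of the unpivot in plain Python, alongside a helper that builds the equivalent Spark SQL `stack()` expression. This is not the approach from the full post. The sample file is not shown, so the wide column names (`Volume16`, `Retail16`, and so on) and both function names are assumptions made purely for illustration.

```python
from typing import Dict, List

# Assumed layout: one wide column per measure and year, e.g. "Volume16".
YEARS = ["16", "17", "18"]
MEASURES = ["Volume", "Retail", "Actual"]


def unpivot_row(row: Dict[str, object], id_cols: List[str]) -> List[Dict[str, object]]:
    """Turn one wide row into one long row per year."""
    long_rows = []
    for year in YEARS:
        # Carry the identifying columns over unchanged.
        out = {col: row[col] for col in id_cols}
        out["Year"] = year
        for measure in MEASURES:
            out[measure] = row[f"{measure}{year}"]
        long_rows.append(out)
    return long_rows


def build_stack_expr() -> str:
    """Build a Spark SQL stack() expression for the same reshaping."""
    groups = []
    for year in YEARS:
        cols = ", ".join(f"{m}{year}" for m in MEASURES)
        groups.append(f"'{year}', {cols}")
    aliases = ", ".join(["Year"] + MEASURES)
    return f"stack({len(YEARS)}, {', '.join(groups)}) as ({aliases})"
```

With a real DataFrame, the generated string could be passed to `df.selectExpr(...)` next to the identifying columns. Seeing the full `stack()` expression written out also illustrates why it can be hard to read once there are more than a few measures.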