You will get Unit Testing of PySpark Code

Project details
1-hour consultation regarding unit testing of PySpark code
Unit testing PySpark code can be really tricky and cumbersome, especially when you have to test code that interacts with data or file storage, DeltaTable, SQLContext, Databricks Utilities (dbutils), etc. In such scenarios you typically have to prepare data before executing a unit test, clean it up afterwards, and repeat this process for every single unit test that has external dependencies or performs I/O operations.
Wouldn't it be easier, faster and cheaper to run all the tests entirely in memory? We can discuss how to approach unit testing of such code, and I can provide you with examples and links to the documentation of the tools that we will use.
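As an illustration of the in-memory idea, here is a minimal sketch, assuming the code under test receives `dbutils` and the Delta table as parameters (dependency injection) instead of reaching for the notebook globals. The functions `archive_landing_files` and `upsert_customers`, the `dbfs:/` paths, and the `customer_id` column are hypothetical. The fakes come from Python's standard `unittest.mock`, so no cluster, storage account, or Delta table is needed to run the tests.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, call


def archive_landing_files(dbutils, landing_dir: str, archive_dir: str) -> list:
    """Hypothetical pipeline step: move every file in landing_dir to archive_dir.

    dbutils is a parameter rather than the notebook global, so a test can
    pass an in-memory fake instead of touching real storage.
    """
    moved = []
    for info in dbutils.fs.ls(landing_dir):  # dbutils returns FileInfo objects with .path and .name
        target = f"{archive_dir.rstrip('/')}/{info.name}"
        dbutils.fs.mv(info.path, target)
        moved.append(target)
    return moved


def upsert_customers(load_table, updates_df) -> None:
    """Hypothetical MERGE step; load_table returns a DeltaTable-like object.

    In production, load_table could be something like
    lambda: DeltaTable.forPath(spark, "<table path>"), while a test injects
    a mock so that no Delta table has to exist.
    """
    (
        load_table()
        .alias("t")
        .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


def test_archive_landing_files():
    # Fake dbutils: ls() returns two in-memory "files", mv() just records calls.
    fake_dbutils = MagicMock()
    fake_dbutils.fs.ls.return_value = [
        SimpleNamespace(path="dbfs:/landing/a.csv", name="a.csv"),
        SimpleNamespace(path="dbfs:/landing/b.csv", name="b.csv"),
    ]
    moved = archive_landing_files(fake_dbutils, "dbfs:/landing", "dbfs:/archive/")
    assert moved == ["dbfs:/archive/a.csv", "dbfs:/archive/b.csv"]
    fake_dbutils.fs.mv.assert_has_calls([
        call("dbfs:/landing/a.csv", "dbfs:/archive/a.csv"),
        call("dbfs:/landing/b.csv", "dbfs:/archive/b.csv"),
    ])


def test_upsert_customers():
    # Fake DeltaTable and source DataFrame; we only verify the MERGE call chain.
    fake_table = MagicMock()
    updates = MagicMock()
    upsert_customers(lambda: fake_table, updates)
    fake_table.alias.assert_called_once_with("t")
    aliased = fake_table.alias.return_value
    aliased.merge.assert_called_once_with(
        updates.alias.return_value, "t.customer_id = s.customer_id"
    )
    builder = aliased.merge.return_value
    (builder.whenMatchedUpdateAll.return_value
        .whenNotMatchedInsertAll.return_value
        .execute.assert_called_once_with())
```

Saved as, say, `test_pipeline.py`, both `test_` functions are picked up by pytest. Note the trade-off: mocks confirm that the right calls are made, not that the MERGE produces the right data; for transformation logic, a local SparkSession fed with small DataFrames from `spark.createDataFrame` is a common complement.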
You can pick my brain to achieve your goals, and I can help you upskill so that you become more proficient and productive.
Project Type
Development, IT
$70
- Delivery Time 5 days
- Number of Revisions 0
1 review
This project doesn't have any reviews.
Venus H.
Nov 4, 2024
DQ Project: GenAI - Python/Databricks debugging
About Michal
Databricks Solution Architect, Data Engineer & GenAI Engineer
Pezinok, Slovakia
I'm a certified Databricks Solutions Architect Champion, an exclusive "by invitation only" program. Databricks Champions are evangelists and leaders of success for their Unified Analytics and Machine Learning practices. As a Databricks partner, I have access to private previews of new Databricks features and products, exclusive webinars, and Partner Product Council meetings. This is a great opportunity to always stay a few steps ahead.
I specialize in the architecture and implementation of data lakehouses, ETL / ELT pipelines, Generative AI and Machine Learning in Microsoft Azure and Databricks. I have experience with Databricks Data Intelligence Platform, Unity Catalog, MLflow, Azure Synapse, Azure Data Factory, Azure Data Lake Storage and other complementary components.
I also focus on a topic that is very dear to me: Artificial Intelligence. I hold a master's degree in AI, and since I graduated, the landscape in this field has changed significantly. Databricks already makes it quite easy to work with ML and GenAI on its platform, and they are constantly releasing more features and products in this area. They are lowering the entry barrier and optimizing training processes, pushing down the price of training your own model to make it available to everyone. I have experience with building RAG applications, compound AI systems, vector search chains, agentic AI, prompt engineering, and LLM / chain evaluation on the Databricks platform.
I'm working on a couple of POCs related to ML and GenAI. One of them is building a RAG application and improving it through model fine-tuning and prompt engineering. I'm experimenting with various models, such as DBRX, Gemma, GPT-2 and GPT-3, and evaluating their performance on different tasks.
I'm a certified Accenture Technology Architect Associate, I have a great passion for the Scrum methodology, and I have served as a Scrum Master on several projects.
My strengths are extreme attention to detail, enforcing consistency, promoting good team spirit, and building long-term relationships with clients.
I'm the founder and CEO of ru3ch interactive, an innovative and agile provider of digital, cloud and advisory services.
Certifications:
• Databricks Solutions Architect Champion
• Databricks Solutions Architect Essentials badge
• Databricks Generative AI Fundamentals badge
• Databricks Data & AI Governance with Unity Catalog badge
• Databricks Advantages of Data & AI Governance badge
• Databricks Data Intelligence Platform Whiteboard badge
• Databricks Advantages of the Lakehouse badge
• Databricks Advantages of Azure Databricks & Microsoft Fabric badge
• Databricks Cloud Native SPARK Migration badge
• Databricks Sales Ready badge
• Databricks Developer Essentials badge
• Databricks Developer Foundations badge
• Databricks SQL Analyst Associate badge
• Databricks Delta Lake Essentials badge
• Databricks Lakehouse Essentials badge
• Databricks Unified Data Analytics Essentials badge
• Microsoft Azure Data Engineer Associate (DP-200 + DP-201)
• MCSD: Web Applications
• MCPD: Web Developer 4
• Professional Scrum Master I
• Avanade Certified Scrum Developer
• Avanade Certified Scrum Project Manager
• Accenture Technology Architect Associate
Steps for completing your project
After purchasing the project, send requirements so Michal can start the project.
Delivery time starts when Michal receives requirements from you.
Michal works on your project following the steps below.
Revisions may occur after the delivery date.
Scheduling the meeting
Based on our availability, we'll agree on the date and time of the meeting and the platform we'll use for the call.
Online consultation
We'll meet online and discuss the topic.