Hire the best Apache Spark Engineers in Poland
Check out Apache Spark Engineers in Poland with the skills you need for your next job.
- $55 hourly
- 5.0/5
- (16 jobs)
Over 22 years of experience working on enterprise-class systems, across a wide variety of projects and customers. Always focused on performance and reliability, always on the cutting edge of technology, and always keeping business objectives and customers' interests in mind.
Competencies:
• Solution Selection, Architecture Design, Implementation, Customization, Migration, Internal Adoption
• Freelance Consulting & Audit
Big Data & Machine Learning Expertise:
• Machine Learning model adoption, from business use case to production maintenance
• Data Collection, Data Pipelines, Data Workflows
• Big Data ETL & Mining
• System Integration
I have a solid portfolio of successfully implemented enterprise-level solutions, as well as ad-hoc optimizations of existing systems in large, global organizations.
Apache Spark, Kubernetes, BigQuery, SQL Programming, Big Data, Docker, Apache Kafka, NoSQL Database, Machine Learning, Scala, Python
- $100 hourly
- 5.0/5
- (45 jobs)
I have over 4 years of experience in Data Engineering, especially using Spark and PySpark to extract value from massive amounts of data. I have worked with analysts and data scientists, running workshops on Hadoop/Spark and resolving their issues with the big data ecosystem. I also have experience in Hadoop maintenance and in building ETL pipelines, especially between Hadoop and Kafka. You can find my profile on Stack Overflow (link in the Portfolio section), where I mostly help with questions tagged spark and pyspark.
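Purely as an illustration of the Hadoop-and-Kafka ETL this profile describes (not this freelancer's code), here is a minimal PySpark sketch; the broker address, topic name, and HDFS path are hypothetical placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Minimal batch ETL sketch: read a Kafka topic and land it on Hadoop as Parquet.
# Requires the spark-sql-kafka package on the classpath; all names are placeholders.
spark = (
    SparkSession.builder
    .appName("kafka-to-hdfs-etl")
    .getOrCreate()
)

events = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # hypothetical broker
    .option("subscribe", "events")                       # hypothetical topic
    .option("startingOffsets", "earliest")
    .load()
    # Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
    .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
)

# Write to HDFS as Parquet, partitioned by ingestion date.
(
    events
    .withColumn("ingest_date", col("timestamp").cast("date"))
    .write.mode("append")
    .partitionBy("ingest_date")
    .parquet("hdfs:///data/raw/events")                  # hypothetical path
)
```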
Apache Spark, MongoDB, Data Warehousing, Data Scraping, ETL, Data Visualization, PySpark, Python, Data Migration, Apache Airflow, Apache Kafka, Apache Hadoop
- $60 hourly
- 5.0/5
- (1 job)
🚀 Senior Data Engineer | Databricks Expert | Data Solutions Architect
With over 7 years of experience as a Data Engineer, I help businesses transform raw data into real, measurable value. I've managed infrastructures scaling over 2 petabytes and, in one project alone, handled 500+ TB of data, building and optimizing large-scale pipelines that are fast, reliable, and cost-efficient.
🧠 Areas of Expertise:
✅ Data pipelines (ETL/ELT)
✅ Data Governance & Security
✅ Data Silos & Integration
✅ Data CI/CD
✅ Anything related to Databricks and Data Engineering
🛠️ Services I Offer:
✅ Databricks implementation or migration
✅ Organizing and structuring your data for insight and scalability
✅ Choosing the right data architecture for your goals
✅ Building both batch and real-time streaming pipelines (a minimal sketch follows this profile)
✅ Creating your data platform from the ground up
✅ Training and deploying ML/AI models using Databricks
✅ Implementing Unity Catalog and the Data Lakehouse architecture
✅ Data strategy and consulting
✅ Training your team to be self-sufficient with Databricks
💡 My work consistently delivers impact: I've helped clients reduce storage and processing costs by an average of 75%, translating into substantial long-term savings.
While I'm proficient in Snowflake, AWS Glue, SageMaker, Azure Synapse, GCP BigQuery, and other major cloud platforms, Databricks is my specialty and my preferred environment. I use it to build everything from high-performance ETL/ELT pipelines to ML model training workflows, data governance solutions, and real-time analytics. It's simply the best platform for modern data engineering, hands down.
🔍 I solve data challenges like:
✅ Inefficient or outdated pipelines
✅ Lack of data governance and lineage
✅ Disconnected and siloed data sources
✅ Difficult-to-maintain or poorly designed data architectures
📈 My goal is simple: help clients understand, use, and profit from their data. Whether you're scaling up, cleaning up, or starting from scratch, I can help.
Industries I know really well:
✅ Aviation
✅ Logistics / ocean freight
✅ SaaS / PaaS products
Let's unlock the value of your data and turn it into real business results.
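To make this kind of Databricks-flavoured ingestion concrete, here is a minimal PySpark/Delta sketch under assumed placeholders (the landing path and the bronze.orders table name are illustrative, not details from this profile):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

# Minimal Databricks-style batch ELT sketch: land raw CSV files in a Delta table.
# Paths and table names are hypothetical placeholders.
spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .csv("/mnt/landing/orders/")          # hypothetical landing path
    .withColumn("ingested_at", current_timestamp())
)

# Delta is the default table format on Databricks; saveAsTable registers the
# table in the metastore (or Unity Catalog, if enabled on the workspace).
(
    raw.write
    .format("delta")
    .mode("append")
    .saveAsTable("bronze.orders")         # hypothetical schema.table
)
```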
Apache Spark, Data Profiling, Microsoft Azure, PySpark, SQL, Python, Data Cleaning, Data Analytics & Visualization Software, Data Analysis, Machine Learning, Data Mining, ETL Pipeline, ETL, Data Engineering, Databricks Platform
- $80 hourly
- 5.0/5
- (20 jobs)
With over 8 years of professional experience in Data Engineering, I specialize in 𝒃𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒂𝒏𝒅 𝒐𝒑𝒕𝒊𝒎𝒊𝒛𝒊𝒏𝒈 𝒉𝒊𝒈𝒉-𝒑𝒆𝒓𝒇𝒐𝒓𝒎𝒂𝒏𝒄𝒆 𝒅𝒂𝒕𝒂 𝒊𝒏𝒇𝒓𝒂𝒔𝒕𝒓𝒖𝒄𝒕𝒖𝒓𝒆𝒔 using Apache Spark and related big data technologies. My expertise lies in 𝒂𝒓𝒄𝒉𝒊𝒕𝒆𝒄𝒕𝒊𝒏𝒈 𝒔𝒄𝒂𝒍𝒂𝒃𝒍𝒆 𝒔𝒐𝒍𝒖𝒕𝒊𝒐𝒏𝒔 𝒇𝒐𝒓 𝒑𝒓𝒐𝒄𝒆𝒔𝒔𝒊𝒏𝒈 𝒂𝒏𝒅 𝒂𝒏𝒂𝒍𝒚𝒛𝒊𝒏𝒈 𝒎𝒂𝒔𝒔𝒊𝒗𝒆 𝒅𝒂𝒕𝒂𝒔𝒆𝒕𝒔, enabling businesses to extract actionable insights, drive innovation, and 𝒅𝒆𝒍𝒊𝒗𝒆𝒓 𝒄𝒖𝒕𝒕𝒊𝒏𝒈-𝒆𝒅𝒈𝒆, 𝒕𝒐𝒑-𝒕𝒊𝒆𝒓 𝑨𝑰 𝒔𝒐𝒍𝒖𝒕𝒊𝒐𝒏𝒔.
I possess an in-depth understanding of Apache Spark and have served as the author, designer, and lead developer of several libraries for Apache Spark, including:
● 𝑺𝒑𝒂𝒓𝒌 𝑶𝑪𝑹 (𝑽𝒊𝒔𝒖𝒂𝒍 𝑵𝑳𝑷): Developed advanced solutions for Optical Character Recognition (OCR) and Visual Natural Language Processing (NLP) within the Spark ecosystem.
● 𝑷𝑫𝑭 𝑫𝒂𝒕𝒂𝑺𝒐𝒖𝒓𝒄𝒆 𝒇𝒐𝒓 𝑨𝒑𝒂𝒄𝒉𝒆 𝑺𝒑𝒂𝒓𝒌: Creator of and contributor to the 𝒐𝒑𝒆𝒏-𝒔𝒐𝒖𝒓𝒄𝒆 spark-pdf datasource project, written in Scala, enhancing Spark’s data processing capabilities.
🔑 𝗞𝗲𝘆 𝗦𝗸𝗶𝗹𝗹𝘀 & 𝗘𝘅𝗽𝗲𝗿𝘁𝗶𝘀𝗲:
🔷 𝑩𝒊𝒈 𝑫𝒂𝒕𝒂 𝑷𝒓𝒐𝒄𝒆𝒔𝒔𝒊𝒏𝒈 & 𝑨𝒏𝒂𝒍𝒚𝒕𝒊𝒄𝒔: Extensive experience with Apache Spark (PySpark, Spark ML, Spark Structured Streaming) to build distributed systems and handle complex ETL workflows and real-time data pipelines.
🔷 𝑹𝒆𝒂𝒍-𝑻𝒊𝒎𝒆 𝑫𝒂𝒕𝒂 𝑺𝒕𝒓𝒆𝒂𝒎𝒊𝒏𝒈: Proficient in designing and deploying real-time aggregation systems with tools like Kafka, Kinesis, and Spark Streaming (see the sketch right after this list).
🔷 𝑫𝒂𝒕𝒂 𝑬𝒏𝒈𝒊𝒏𝒆𝒆𝒓𝒊𝒏𝒈 𝑾𝒐𝒓𝒌𝒇𝒍𝒐𝒘𝒔: Skilled in end-to-end development of robust ETL processes, batch pipelines, and automated workflows to transform and enrich large-scale data.
🔷 𝑪𝒍𝒐𝒖𝒅 & 𝑫𝒊𝒔𝒕𝒓𝒊𝒃𝒖𝒕𝒆𝒅 𝑺𝒚𝒔𝒕𝒆𝒎𝒔: Hands-on expertise with AWS, Databricks, and containerized environments (Docker), ensuring efficient and scalable infrastructure for big data solutions.
🔷 𝑶𝒑𝒕𝒊𝒎𝒊𝒛𝒂𝒕𝒊𝒐𝒏 & 𝑷𝒆𝒓𝒇𝒐𝒓𝒎𝒂𝒏𝒄𝒆 𝑻𝒖𝒏𝒊𝒏𝒈: Proven ability to optimize Spark jobs and workflows, reducing execution time and improving throughput to handle 50GB+ daily datasets efficiently.
🔷 𝑷𝒓𝒐𝒈𝒓𝒂𝒎𝒎𝒊𝒏𝒈 𝑬𝒙𝒑𝒆𝒓𝒕𝒊𝒔𝒆: Advanced skills in Scala and Python, leveraging best practices for big data applications and distributed systems.
🔷 𝑫𝒂𝒕𝒂 𝑫𝒆-𝒊𝒅𝒆𝒏𝒕𝒊𝒇𝒊𝒄𝒂𝒕𝒊𝒐𝒏 & 𝑨𝒏𝒐𝒏𝒚𝒎𝒊𝒛𝒂𝒕𝒊𝒐𝒏: Expert in anonymizing sensitive data from text, images, PDFs, and DICOM files. I ensure privacy, security, and compliance with GDPR and HIPAA standards using NLP, OCR, and computer vision to remove or mask personal information, safeguarding data confidentiality.
🔷 𝑯𝒆𝒂𝒍𝒕𝒉𝒄𝒂𝒓𝒆, 𝑷𝒉𝒂𝒓𝒎𝒂, 𝑴𝒆𝒅𝑻𝒆𝒄𝒉, 𝑩𝒊𝒐𝑻𝒆𝒄𝒉 𝑬𝒙𝒑𝒆𝒓𝒕𝒊𝒔𝒆: Over 5 years of experience in the healthcare and life sciences sectors, with a strong understanding of formats like DICOM, and expertise in delivering solutions specifically tailored to meet the unique needs of these industries.
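As a hedged illustration of the real-time aggregation pattern named above (not code from this freelancer's libraries), here is a minimal Spark Structured Streaming sketch; the broker, topic, schema, and checkpoint path are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Minimal Structured Streaming sketch: per-minute event counts from Kafka.
# Broker, topic, schema, and checkpoint path are hypothetical placeholders.
spark = SparkSession.builder.appName("realtime-aggregation").getOrCreate()

schema = StructType([
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")   # hypothetical broker
    .option("subscribe", "clickstream")                    # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Tumbling one-minute windows with a watermark to bound late data.
counts = (
    events
    .withWatermark("event_time", "5 minutes")
    .groupBy(window(col("event_time"), "1 minute"), col("event_type"))
    .count()
)

query = (
    counts.writeStream
    .outputMode("update")
    .format("console")                                     # sink kept simple for the sketch
    .option("checkpointLocation", "/tmp/checkpoints/agg")  # hypothetical path
    .start()
)
query.awaitTermination()
```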
𝗧𝗢𝗣 𝟱 𝗥𝗲𝗮𝘀𝗼𝗻𝘀 𝘁𝗼 𝗪𝗼𝗿𝗸 𝗪𝗶𝘁𝗵 𝗠𝗲
✅ 𝑬𝒏𝒅-𝒕𝒐-𝑬𝒏𝒅 𝑬𝒙𝒑𝒆𝒓𝒕𝒊𝒔𝒆
✅ 𝑪𝒐𝒎𝒑𝒍𝒆𝒙 𝑷𝒓𝒐𝒃𝒍𝒆𝒎-𝑺𝒐𝒍𝒗𝒊𝒏𝒈 𝑨𝒃𝒊𝒍𝒊𝒕𝒚
✅ 𝑻𝒊𝒎𝒆𝒍𝒚 𝑫𝒆𝒍𝒊𝒗𝒆𝒓𝒚
✅ 𝑻𝒓𝒂𝒏𝒔𝒑𝒂𝒓𝒆𝒏𝒕 𝑪𝒐𝒎𝒎𝒖𝒏𝒊𝒄𝒂𝒕𝒊𝒐𝒏
✅ 𝑺𝒄𝒂𝒍𝒂𝒃𝒍𝒆 𝑺𝒐𝒍𝒖𝒕𝒊𝒐𝒏𝒔
🏆 100% Job Success Score
📈 15+ years of experience
🕛 15,000+ Upwork hours
🎓 Master’s in Applied Mathematics
𝗣𝗿𝗼𝗳𝗲𝘀𝘀𝗶𝗼𝗻𝗮𝗹 𝗦𝗸𝗶𝗹𝗹𝘀
🛠️ 𝑷𝒓𝒐𝒈𝒓𝒂𝒎𝒎𝒊𝒏𝒈 𝑳𝒂𝒏𝒈𝒖𝒂𝒈𝒆𝒔: Python, Scala
⚡ 𝑩𝒊𝒈 𝑫𝒂𝒕𝒂 & 𝑫𝒊𝒔𝒕𝒓𝒊𝒃𝒖𝒕𝒆𝒅 𝑺𝒚𝒔𝒕𝒆𝒎𝒔: Big Data Processing, ETL, Stream Processing, Real-Time Aggregation, Apache Spark (PySpark, Spark ML, Spark Structured Streaming), Kinesis, Kafka, Databricks
🚀 𝑪𝒍𝒐𝒖𝒅 𝑪𝒐𝒎𝒑𝒖𝒕𝒊𝒏𝒈 & 𝑰𝒏𝒇𝒓𝒂𝒔𝒕𝒓𝒖𝒄𝒕𝒖𝒓𝒆: Amazon Web Services (AWS), Distributed Systems, CI/CD Pipelines, Docker, Jenkins, Graphite, Grafana, Elasticsearch, Kibana
⚙️ 𝑫𝒂𝒕𝒂𝒃𝒂𝒔𝒆𝒔: PostgreSQL, MongoDB, Redis, DynamoDB
📊 𝑫𝒂𝒕𝒂 𝑺𝒄𝒊𝒆𝒏𝒄𝒆 & 𝑴𝒂𝒄𝒉𝒊𝒏𝒆 𝑳𝒆𝒂𝒓𝒏𝒊𝒏𝒈: NLP, Computer Vision, Large Language Models (LLMs), Optical Character Recognition (OCR), Model Productionalization, Deep Learning (PyTorch, TensorFlow, Hugging Face Transformers, ONNX, Pandas, CLIP)
📅 𝗔𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: Committed to long-term collaborations. Available full-time for your next project.
🔍 𝗞𝗲𝘆𝘄𝗼𝗿𝗱𝘀: Technical Architect, Big Data, Document Processing, ML Infrastructure, MLOps Engineer, ETL & ML Pipeline, Cloud Architecture (SaaS), Machine Learning
Apache Spark, AI Development, LangChain, PySpark, Databricks Platform, Software Architecture & Design, Hugging Face, Large Language Model, Machine Learning, Tesseract OCR, Python, Scala, PyTorch, Computer Vision, Natural Language Processing
- $120 hourly
- 5.0/5
- (40 jobs)
✅ AWS Certified Solutions Architect
✅ Google Cloud Certified Professional Data Engineer
✅ SnowPro Core Certified Individual
✅ Upwork Certified Top Rated Professional Plus
✅ Author of a Python package for the Currency.com cryptocurrency market (python-currencycom)
Specializing in Business Intelligence development, ETL development, and API development with Python, Apache Spark, SQL, Airflow, Snowflake, Amazon Redshift, GCP, and AWS. I have delivered many projects, both complicated and not so complicated, including:
✪ Highly scalable distributed applications for real-time analytics
✪ Designing data warehouses and developing ETL pipelines for multiple mobile apps
✪ Cost optimization for existing cloud infrastructure
But the main point: I take responsibility for the final result.
Apache Spark, Data Scraping, Snowflake, ETL, BigQuery, Amazon Redshift, Big Data, Data Engineering, Cloud Architecture, Google Cloud Platform, ETL Pipeline, Python, Amazon Web Services, Apache Airflow, SQL
- $48 hourly
- 0.0/5
- (0 jobs)
Hi! I am a Data Engineer with more than 4 years of experience building robust data pipelines and warehouses, and I can help with all types of data challenges. I am a great enthusiast of both classic and new data technologies, and I love squeezing out their potential to help companies make new sense of their data. I work with many technologies, including Python, Scala, SQL, Git, Airflow, Hadoop, and Spark, both on-premise and in the cloud. Apart from that, I am an open-minded and extremely easygoing person, and I like working in a good team environment. If you think I could help with your data challenge, send me a message!
Apache Spark, BigQuery, Data Analytics & Visualization Software, Data Engineering, Git, Jira, Microsoft Excel, Scala, Tableau, Apache Hadoop, Apache Airflow, SQL, Amazon Web Services, Google Cloud Platform, Python
- $100 hourly
- 0.0/5
- (0 jobs)
I am an experienced software developer who combines practical skills with a deep respect for the theoretical foundations of software engineering and related disciplines. In my work, I believe that different classes of problems require tailored solutions, so I continuously expand my toolkit and programming practices by attending industry conferences, reading technical literature, and staying up to date through podcasts and articles.
I adhere to the principles of Software Craftsmanship, which emphasize quality, attention to detail, and a responsible approach to software development. I am particularly interested in both low-level and high-level architectural patterns, as well as methodologies for analyzing and translating complex business requirements into scalable, working software.
Over the course of my career, I have also developed expertise in Data Science, including data exploration and analysis, designing machine learning models, and deploying them in production environments. I work with tools such as Python, scikit-learn, TensorFlow, and PyTorch, as well as techniques for optimizing analytical processes.
In the field of AWS, I focus on designing and implementing cloud solutions, including building scalable microservices, implementing CI/CD strategies, optimizing costs, and ensuring system reliability. I have experience with key services such as AWS Lambda, S3, EC2, RDS, EKS, and Machine Learning Services.
I value responsibility and autonomy, both in software engineering and in everyday life. This approach allows me to deliver tangible value to all project stakeholders while fostering collaboration and understanding within teams.
Apache Spark, Web Application, Web API, Machine Learning Model, Spring Boot, SQL, Teaching Programming, AWS CloudFormation, AWS Development, Python, Data Science, Data Analysis
- $10 hourly
- 0.0/5
- (0 jobs)
I am a Master's degree student in Data Science, specializing in data-driven solutions. I develop scalable models using machine learning frameworks such as scikit-learn and PyTorch, combined with SQL-based data analysis. I have achieved over 90% accuracy in customer behavior prediction and A/B testing projects. I'm here to help you align your business goals with data-driven insights.
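For illustration only (not this freelancer's project code), here is a minimal scikit-learn sketch of a supervised customer-behavior classifier, using synthetic data as a stand-in for real customer features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a customer-behavior dataset (features -> churn/convert label).
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A simple baseline model; real projects would add feature engineering and tuning.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Hold-out accuracy is one straightforward sanity check for such a model.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```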
Apache Spark, Web Scraping, Big Data, AWS CloudFormation, Tableau, PySpark, pandas, TensorFlow, SQL, Python, Adobe InDesign, Adobe Photoshop, Adobe Illustrator
- $20 hourly
- 0.0/5
- (1 job)
- 10+ years of professional experience in JVM-based software development.
- Expertise in Scala’s modern FP stack (Typelevel, ZIO, Akka), data engineering (Hadoop, AWS, Spark), and Java (Spring) for high-performance solutions.
- A product-oriented mindset, focused on understanding the business domain deeply to deliver solutions that align with and exceed business expectations.
Apache Spark, Distributed Database, Distributed Computing, Microsoft SQL Server, PostgreSQL, ClickHouse, PySpark, Google Cloud Platform, Amazon Web Services, Microsoft Azure, Java, Apache Flink, Apache Kafka, Python, Scala
- $12 hourly
- 0.0/5
- (0 jobs)
I'm a Data Engineer with a passion for building end-to-end data pipelines and solving real-world problems. I specialize in working with Python, SQL, Apache Spark, and Kafka, delivering reliable, scalable systems for both batch and real-time processing.
🔹 Experienced with cloud platforms like AWS, GCP, Snowflake, and Databricks
🔹 Skilled in orchestrating workflows using Airflow and CI/CD tools like Docker & GitHub Actions (a minimal sketch follows below)
🔹 Comfortable designing ETL/ELT pipelines, validating data quality, and creating insightful dashboards with Streamlit and Grafana
🔹 Built and deployed 7+ production-grade projects, including real-time fraud detection, outage monitoring, and IoT data analytics
🔹 Work independently as a B2B contractor with flexible hours; no need for sponsorship or relocation
I'm self-taught, fast to adapt, and genuinely enjoy working on data problems. Whether it's automation, analytics, or infrastructure, I'll bring clarity to your data and value to your team.
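To illustrate the Airflow-style orchestration mentioned above (an assumed example, not taken from one of the listed projects), here is a minimal daily ETL DAG; the DAG id and task bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Minimal Airflow sketch of a daily extract -> transform -> load workflow.
# The DAG id and task bodies are illustrative placeholders only.

def extract():
    print("pull raw data from the source system")

def transform():
    print("clean and enrich the extracted data")

def load():
    print("write the result to the warehouse")

with DAG(
    dag_id="daily_etl_example",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Run strictly in sequence: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```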
Apache Spark, Data Cleaning, Real Time Stream Processing, Dashboard, Analytics Dashboard, PostgreSQL, Streamlit, Apache Kafka, Apache Airflow, SQL, Python, Data Visualization Framework, Data Analysis, ETL Pipeline, ETL
- $11 hourly
- 0.0/5
- (0 jobs)
I am a Data Engineer and Analyst with a strong background in data processing, automation, and big data technologies. Currently, I am pursuing a Master’s degree in Data Science at Cracow University of Technology, where I focus on building efficient data pipelines and optimizing data workflows.
I have hands-on experience with Python, SQL, Apache Spark, Kafka, Airflow, and Power BI, allowing me to work with large datasets, automate data transformations, and develop scalable solutions. Additionally, I am proficient in Microsoft Excel and have a keen eye for detail, ensuring accuracy in data entry and organization.
I am highly motivated, analytical, and always eager to take on new challenges. If you need a reliable and detail-oriented professional for your data-related tasks, feel free to reach out; I’d love to collaborate! If you're interested in checking out some of the projects I've been working on, feel free to take a look at my LinkedIn profile and GitHub repository.
Apache Spark, Git, GitHub, Microsoft Azure, Data Visualization, Microsoft Power BI, Java, Apache Superset, Apache Kafka, ETL, PySpark, Python, Docker, NoSQL Database, SQL
How hiring on Upwork works
1. Post a job
Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.
2. Talent comes to you
Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.
3. Collaborate easily
Use Upwork to chat or video call, share files, and track project progress right from the app.
4. Payment simplified
Receive invoices and make payments through Upwork. Only pay for work you authorize.