Hire the best Apache Spark Engineers in California

Check out Apache Spark Engineers in California with the skills you need for your next job.
  • $150 hourly
    I am a professional cloud architect, data engineer, and software developer with 18 years of solid work experience. I deliver solutions using a variety of technologies, selected based on the best fit for the task. I have experience aiding startups, offering consulting services to small and medium-sized businesses, as well as working on large enterprise initiatives. I am an Amazon Web Services (AWS) Certified Solutions Architect. I have expertise in data engineering and data warehouse architecture as well. I am well versed in cloud-native ETL scenarios with various source systems (SQL, NoSQL, files, streams, and web scraping). I use Infrastructure as Code (IaC) tools and am well versed in writing continuous integration/delivery (CI/CD) processes. Equally important are my communication skills and ability to interface with business executives, end users, and technical personnel. I strive to deliver elegant, performant solutions that provide value to my stakeholders in a "sane," supportable way. I have bachelor's degrees in Information Systems and Economics as well as a Master of Science degree in Information Management. I recently helped a client architect, develop, and grow a cloud-based advertising attribution system into a multi-million-dollar profit center for their company. The engagement lasted two years, during which I designed the platform from inception, conceived and deployed new capabilities, led client onboardings, and built a team to run the product. The project started from loosely defined requirements, and I transformed it into a critical component of my client's business.
    Apache Spark
    Data Management
    Business Intelligence
    API Development
    Amazon Redshift
    Amazon Web Services
    MongoDB
    Data Warehousing
    ETL
    Node.js
    Docker
    AWS Glue
    Apache Airflow
    SQL
    Python
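The cloud-native ETL work this profile describes follows the classic extract-transform-load pattern. As a rough illustration only (not this engineer's actual code; the data and all function names are hypothetical), the shape of such a pipeline can be sketched in plain Python:

```python
import csv
import io

RAW_ORDERS = """order_id,region,amount
1,west, 19.99
2,east,5.00
3,west,42.50
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> dict[str, float]:
    """Transform: clean values and aggregate revenue per region."""
    totals: dict[str, float] = {}
    for row in rows:
        region = row["region"].strip()
        amount = float(row["amount"])
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def load(totals: dict[str, float]) -> list[str]:
    """Load: here we just render output lines; a real pipeline
    would write to a warehouse table or an S3 object instead."""
    return [f"{region},{total:.2f}" for region, total in sorted(totals.items())]

print(load(transform(extract(RAW_ORDERS))))
```

In practice the extract step would read from a source system (SQL, NoSQL, files, or streams) and the load step would write to a warehouse such as Redshift, but the three-stage shape stays the same.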
  • $85 hourly
    I'm a developer with a diverse range of experiences, having been a builder, a consultant, and involved in product sales. I understand that the success of a project lies not in the technology itself but in identifying and solving the right problem for yourself or your business. I bring a wealth of experience in various programming languages, frameworks, data modeling in SQL and NoSQL databases, and CI/CD tooling. Additionally, I have expertise in Retrieval-Augmented Generation (RAG) and Artificial Intelligence/Machine Learning (AI/ML) technologies, having applied RAG models, which combine traditional language models with information retrieval systems, to enhance the quality and accuracy of generated outputs by incorporating external knowledge sources. Projects: - Cohesive AI: Utilized Whisper, GPT-3.5 and Scenario.com to generate automated summaries of sales calls, update data in Salesforce, and attach customers to open feature requests to reduce data duplication and improve data quality. - Barcade.ai: A web-based arcade for LLM-powered agents to compete on games. The goal is to introduce audience interaction into the games to build more engaging experiences where the agents have to adapt to audience-controlled environments.
    Apache Spark
    Automation
    JavaScript
    Terraform
    Vault by HashiCorp
    CI/CD
    React
    Python
    Golang
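Retrieval-Augmented Generation, as described in the profile above, retrieves relevant documents first and then conditions the language model's output on them. Here is a toy sketch of that idea; the keyword-overlap retriever and all data are hypothetical stand-ins (production systems use vector embeddings, a vector database, and a real LLM API):

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query.
    Real systems rank by embedding similarity instead."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the prompt with retrieved context so the language
    model grounds its answer in external knowledge."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Feature request 42 covers Salesforce sync.",
    "The arcade supports audience-controlled environments.",
    "Sales call summaries are generated automatically.",
]
prompt = build_prompt(
    "How are sales call summaries made?",
    retrieve("sales call summaries", docs),
)
print(prompt)
```

The assembled prompt would then be sent to a model such as GPT-3.5; the retrieval step is what lets the model answer from knowledge it was never trained on.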
  • $150 hourly
    Do you need help getting your machine learning initiative moving? I can help move you from concept or POC to a weaponized production implementation. I'm an independent data consultant helping small and medium enterprise companies develop and execute on data strategy. I specialize in helping where off-the-shelf solutions can't. I've worked in a variety of industries including ecommerce, aerospace, dating apps, chatbots, and scraping. If your data pipeline or architecture needs work, I can help get you on track. I believe in best practices for CI/CD and reproducibility, and I can walk you through how I've helped other companies achieve these goals. If you're starting from a sound foundation, I can help you identify and execute on the best use cases for machine learning, computer vision, natural language processing, and stochastic optimization that will impact your business.
    Apache Spark
    PostgreSQL
    Amazon DynamoDB
    Docker
    Elasticsearch
    AWS Lambda
    Machine Learning
    TensorFlow
  • $130 hourly
    AWS cloud architect and data engineer with 6+ years of expertise in application development, data engineering, data modeling, data & machine learning pipelines, serverless applications, cloud engineering, ETL/ELT and applied Artificial Intelligence solutions. I have worked in a wide variety of domains including LLM application development, web analytics, genomics, cybersecurity, advertising, social media, and natural language processing (NLP). Services: * Cloud-hosted application development & implementation * AWS Solutions Architecture * AI & LLM applications * Software engineering * Python package development * Consultation & Auditing * Unit & Integration testing * Documentation Programming languages: * Python * TypeScript * SQL * Bash * Makefile * Cypher Databases, Data Warehouses & Storage: * PostgreSQL & MySQL (RDS, Aurora & Docker hosted) * Hadoop/Hive/PrestoDB on AWS S3 (using AWS Athena) * Snowflake * Neo4j * Kafka * DynamoDB * Weaviate & other vector databases * Redshift * Databricks AWS Services & APIs: * Athena * Batch * CDK * CloudFormation * CloudFront * CloudWatch * CodeBuild * CodeDeploy * CodePipeline * DataSync * DataPipeline * DynamoDB * EC2 * ECS & Fargate * ECR * EFS * EMR, EMR Studio & EMR Serverless * OpenSearch (formerly AWS Elasticsearch) * Glue * IAM * Kinesis * Lambda * Neptune * RDS * Redshift * S3 * SQS * SSM * SNS * StepFunctions * Systems Manager * VPC & Networking Libraries & Frameworks: * CDK for TypeScript & Python * CloudFormation * AWS Serverless Application Model (SAM) * Pytest & Unittest * Apache Spark (pyspark) * pandas, polars * DSPy * Docker * Sphinx Documentation * PyTorch * Hugging Face transformers * scikit-learn * dbt Models: * Ollama models & Modelfiles for custom models * Chatbot & Large Language Model (LLM) powered application development using OpenAI and Anthropic APIs * OpenAI Whisper for audio transcription * Other models offered by Hugging Face's transformers library My GitHub profile includes the 
following work: * gfe-db: Genomics data pipeline using AWS StepFunctions and Batch to build and load alleles into a Neo4j graph database running on EC2 and served through a public API for clinicians and researchers. * serverless-streaming-reddit-pipeline: Infinitely scalable serverless data mining app built on Lambda, SNS/SQS, Kinesis Firehose, S3, Glue, and Athena, capable of rapidly ingesting gigabytes of parquet data. * aws-open-data-registry-neural-search: Semantic search application of AWS Open Data Registry datasets using the Weaviate vector database. Vector databases store embeddings of records in addition to the records themselves for rapid topic modeling, Q&A search, NER and similarity searches for images, text or both using CLIP. (Work in progress as of December 2022). * aws-cdk-ec2-weaviate: CDK application to deploy a Weaviate instance on an EC2 instance. Configures Weaviate to use text2vec-transformers and sentence-transformers-multi-qa-MiniLM-L6-cos-v1 for text2vec. * emr-managed-scaling-cluster: Automated CloudFormation deployment of an EMR cluster with managed scaling policy for large workloads. Basic configuration deploys Hadoop, Hive, and Spark applications and can be configured for Flink, MXNet, Pig, Tensorflow, Delta Lake, Hudi, Iceberg and Presto. Can be combined with EMR Studio for a highly capable analytics & ETL development environment. * emr-studio: Deploy an EMR Studio environment using CDK TypeScript. Useful for organizations needing a development environment backed by an EMR cluster for analyzing large volumes of data with Jupyter Notebook (also see emr-managed-scaling-cluster). * aws-getting-started-opensearch: CloudFormation deployment following the AWS documentation tutorial for OpenSearch. Can be used as a starting point to get an OpenSearch domain up and running. * neo4j-titanic: Demonstration data pipeline to load the Titanic dataset into the Neo4j graph database. 
Certifications held: AWS DevOps Professional - Validation number 4115382bda3f421cafede0d8cc11b02a (1/2025 - 1/2028) AWS Developer Associate - Credential ID 00WR156CTN1EQG54 (1/2023 - 1/2026) AWS SysOps Administrator Associate - Credential ID JNX5EPN1ZM14QDCE (1/2023 - 1/2026) AWS Solutions Architect Associate - Credential ID ELY44WSCFEQQ1G5M (11/2019 - 11/2022)
    Apache Spark
    Data Scraping
    AWS Lambda
    Database Design
    Neo4j
    Bash Programming
    Docker
    Database Architecture
    AWS CloudFormation
    PySpark
    AWS Glue
    Serverless Computing
    Amazon S3
    ETL Pipeline
    SQL
    Python
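The vector-database work described above (Weaviate storing embeddings alongside records for similarity search) rests on one core operation: ranking stored vectors by similarity to a query vector. A minimal sketch with toy 3-dimensional embeddings (all names and vectors are hypothetical; real systems store model-generated embeddings of hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the core ranking operation a vector
    database performs over stored embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings"; a real system would store vectors produced by
# an embedding model (e.g. a sentence-transformers model).
index = {
    "genomics dataset": [0.9, 0.1, 0.0],
    "satellite imagery": [0.0, 0.8, 0.2],
    "clinical records": [0.7, 0.2, 0.1],
}

def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k records whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]),
                    reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.0]))
```

Storing the records next to their embeddings is what lets a system like Weaviate return the matching documents directly, rather than just their IDs.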
  • $100 hourly
    At UC Berkeley, I helped submit several Computer Vision and Reinforcement Learning papers. Here is a short list for context: - GANs for Model-Based Reinforcement Learning - Frame Rate Upscaling with Convolutional Networks - Neural Multi-Style Transfer At Amazon, I built a pipeline framework to store and serve sales data for the millions of third-party merchants on Amazon.com. More recently, I have taken on part-time consulting. These are some of the clients and projects I have worked on in the past: - GitHub on Improving Code Classification with SVMs - SAP on Applying HANA Vora to Load Forecasting - Intuit on Quantifying Brand Exposure From Unstructured Text In contrast to these previous engagements, I am now looking to take on more projects, each with a smaller time commitment.
    Apache Spark
    ETL
    Apache Hadoop
    Machine Learning
    Deep Learning
    TensorFlow
    Keras
    Python
    Java
    Computer Vision
  • $105 hourly
    Tableau developer and dashboard designer with five years of development experience. I specialize in KPI selection, Tableau dashboard development, data visualization, and data transformation with SQL. Tenured analyst with over ten years of experience in the marketing analytics landscape. Extensive experience across B2B, B2C, SAAS and eCommerce business models, focused on improving revenue metrics without relying on increased marketing spend. I thrive in fast-paced, quick-turnaround engagements. If you have raw data that needs to be transformed or engineered into a dataset for visualizing, dashboarding or general reporting purposes - I can't wait to help you solve your challenges. I have extensive experience working with leading technology providers like Atlassian, providing data engineering support to transform your data from jumbled data sets into a cohesive series of tables and business logic, ready to be visualized. Core skills include: Marketing Analytics | Visualizations | Executive Dashboards Segmentation | KPI Selection | Product Analytics Technical skills include: SQL | Tableau | ETL | Google Analytics | Data Studio | Looker And accolades I have received from prior stakeholders: - I've worked with Tyler for nearly four years and there is so much I could say about him. Even in the very beginning, Tyler was handed nearly impossible problems to solve. It seems like the harder the problem the more invested he is in solving it though. Never satisfied with being comfortable and always onto the next adventure, Tyler is an analytics powerhouse that I'm lucky to have on the team. In addition to his work ethic, his personality and ability to play as a team make him such an important factor in team morale. He's always helping others, making people laugh, and down to hang out after work for a beer or two. I've enjoyed every minute of working with him! (Jessica Vetorino, Atlassian) - Tyler and I worked together as part of the Stride/Hipchat team. 
Tyler was part of the extended marketing team and was responsible for the marketing analytics function. He did a kick-ass job of reporting daily/weekly/monthly on the health of the business and actively looking for insights into the data. The last part is most important - you elevate yourself as an analyst when you don't wait for questions to be asked but formulate them yourself. He not only excelled analytically but was also very pleasant to work with. You want to be surrounded by team players like Tyler. I will hire him again in a heartbeat. (Raj Sarkar, Atlassian)
    Apache Spark
    Data Visualization Framework
    ETL
    Business Logic Layer
    Databricks Platform
    Interactive Data Visualization
    Visualization
    Data Analytics & Visualization Software
    Looker Studio
    Looker
    Data Visualization
    Data Analysis
    SQL
    Tableau
    Google Analytics
  • $80 hourly
    I’m a Full-Stack Sr. Software Engineer with more than 10 years of experience developing big-data and machine learning systems. I have extensive experience in creating scalable machine learning and data science applications & high-load web applications. My main areas of expertise are: - Python 2/3, R, Scala, Java, PHP - Machine Learning: Regression, Decision Trees, PCA, SVD, Clustering, Image Processing - NLP: Bag of words, LDA, LSI - Node.JS and Python web development - Deep Learning: Word2Vec, Neural Networks, CNN - Frameworks: Play, Django/Flask, Spring, Symfony - Databases: SQL (MySQL, PostgreSQL), NoSQL (MongoDB, HBase, Cassandra), Druid - Distributed Tools: Storm, Kafka, RabbitMQ I am very proficient with data structures and algorithms. I have designed very sophisticated and scalable architectures on different cloud providers including AWS and Rackspace. I have experience working with Fortune 100 clients and large universities; references can be provided. OTHER - Git version control system knowledge - Project management skills
    Apache Spark
    Amazon Web Services
    Django
    Distributed Computing
    Machine Learning
    Java
    Python
    Deep Learning
    Scala
  • $85 hourly
    🏅 TOP-RATED Plus 🏆 100% Job Success Score 🔰 7+ years of experience 🕛 3.9k Upwork Hours Hello! My name is Kerim. Thank you for taking the time to learn about my expertise in data engineering, machine learning, and generative AI. I look forward to the opportunity to work together! WHY CHOOSE ME: - **Data Engineering and Machine Learning Expert** Skilled in designing end-to-end data engineering solutions and building machine learning models to generate predictive insights and enhance business intelligence. - **Specialist in Multi-Agent AI Workflows** Experienced in developing complex multi-agent AI workflows, similar to those in CrewAI and AWS Bedrock, allowing for highly interactive, scalable solutions in areas like customer service, recommendation engines, and real-time analytics. - **Python and Cloud Proficiency** Proficient in building and optimizing ETL pipelines, working with large datasets, and deploying machine learning models in cloud environments, particularly AWS. - **Strategic and Insightful Problem-Solver** Adept at understanding project requirements and translating them into structured, actionable AI-driven solutions tailored to deliver impactful results. - **Quick Learner and Adaptable** Continuously up-to-date with emerging technologies, incorporating new ideas and best practices to achieve efficient and innovative solutions. I am open to collaborating with other developers. Whether you’re expanding your team or need an experienced data engineering and machine learning specialist, I’m here to help. 
I have built data engineering and AI-driven platforms across these industries: E-Commerce: ETL pipelines, personalized recommendation systems Cyber Security: Anomaly detection, AI-driven threat intelligence Web Services and API Development: Generative APIs, real-time data processing Financial Technology: Predictive analytics, fraud detection models Healthcare Services: Data-driven patient management, diagnostic AI Gaming Analytics: Player behavior modeling, multi-agent adaptive content Multi-Agent AI Workflows: Experienced in creating advanced, multi-agent systems that perform complex tasks, manage workflows, and interact intelligently across distributed platforms, similar to CrewAI and AWS Bedrock frameworks. Working hours: 40 hrs/week Feel free to reach out to discuss your project and see how I can contribute to your success. Best Regards, Kerim Tricic
    Apache Spark
    ETL
    Data Migration
    Apache Airflow
    Computer Science
    Database Programming
    Python Script
    SQL Programming
    ETL Pipeline
    Data Visualization
    Microsoft Power BI
    SQL
    Python
  • $90 hourly
    SUMMARY An accomplished computer scientist and engineer with 13+ years of experience developing high-quality software and machine learning applications. Equipped with diversified computer science skills including machine learning, deep learning, network communications, the Internet of Things, and software engineering, I am interested in building something superb by applying my skills in data science, deep learning, and the Internet of Things (IoT). What I can bring to you: * Develop Deep Learning models and applications, Reinforcement Learning models, and supervised Machine Learning algorithms. * Internet of Things technologies and IoT Smart Services: consultation and implementation. * Design, develop and deploy applications on cloud platforms (Google Cloud Platform, Amazon Web Services) and big data frameworks (Hadoop, Spark). * Design, develop, review and consult on research proposals. Some of my open source projects: github.com/mehdimo
    Apache Spark
    Software Development
    Database Management
    Big Data
    Software Design
    Biology Consultation
    Database Design
    Amazon Web Services
    Machine Learning
    Python
    Reinforcement Learning
    Supervised Learning
  • $50 hourly
    I'm a Software Developer 👩‍💻 currently pursuing my master's in Computer Science and looking for opportunities.
    Apache Spark
    Database
    Artificial Intelligence
    GraphQL
    API
    Data Structures
    Database Management System
    Data Visualization
    API Development
    Python
  • $140 hourly
    I am a Sr. Architect and proven leader in the Reliability Engineering, DevOps, and Cloud Computing world, with over 15 years of experience managing highly critical infrastructure for businesses. I have a proven track record of helping organizations get started in the cloud, set up for scale, and ensure the reliability of their systems. Well versed in implementing the latest CI/CD, WAF, security, and compliance/certification frameworks such as CCF and PCI for organizational needs. I am experienced with cloud infrastructure, especially AWS, data systems, and stream/event and batch processing pipelines. For any data-related question or work, I am here to help.
    Apache Spark
    Amazon S3
    Amazon EC2
    AWS Systems Manager
    AWS CloudFormation
    Database Design
    Oracle Database Administration
    DevOps Engineering
    Apache Hadoop
    Database Architecture
    Apache HBase
    Kubernetes
    MySQL
    Cloud Computing
    Amazon Web Services
  • $75 hourly
    Professional with deep and broad experience applying quantitative analysis, statistical learning, Machine Learning and Artificial Intelligence. Experienced in implementing CI/CD throughout the MLOps lifecycle.
    Apache Spark
    Apache Kafka
    JavaScript
    PySpark
    Artificial Intelligence
    Flask
    Tableau
    PyTorch
    Keras
    TensorFlow
    SAS
    R
    Python
    Machine Learning
    Machine Learning Model
  • $50 hourly
    I’m a Data Engineer with a strong background in building scalable data pipelines and transforming complex datasets into actionable insights. I’ve worked extensively with Iceberg and Trino to manage modern data lakehouse environments, and I’ve got hands-on experience with ML ops and building machine learning pipelines in Databricks. Whether it’s setting up cloud data warehousing, optimizing workflows, or bridging the gap between data engineering and AI solutions, I’m ready to help. I’m also skilled in Snowflake, dbt, and Airflow, and I’m available for freelance projects where you need expert data engineering to drive results.
    Apache Spark
    Automation
    ETL Pipeline
    Machine Learning
    SQL
    Apache Airflow
    dbt
    Python
    Snowflake
  • $45 hourly
    I am a seasoned Software Engineer with expertise in big data and distributed frameworks, underpinned by a strong grasp of block and object storage systems. I have demonstrated notable proficiency in optimizing performance for AI/deep learning and data-intensive applications over parallel and distributed computing frameworks. My academic journey culminated in a Ph.D. in Computer Engineering from Northeastern University, where my research focused on GPU computing, big data frameworks, and modern storage systems. I addressed data transfer bottlenecks between NVMe SSD storage and GPUs, as well as distributed platforms such as Apache Spark and Hadoop. After completing my academic pursuits, I started working at Samsung's Memory Solutions Lab, where I honed my expertise in optimizing data pipeline bandwidth for distributed object storage over fabric, specifically for AI/deep learning applications on the PyTorch platform.
    Apache Spark
    Unix Shell
    Bash Programming
    Linux
    SSD
    AWS Development
    Apache Hadoop
    Multithreaded, Parallel, & Distributed Programming Language
    Distributed Computing
    OpenCL
    PyTorch
    Python
    C++
    CUDA
  • $60 hourly
    Generative AI – LLM – STT – TTS – Talking-Face Generation – Chatbot – Spark – SQL – Databricks – ETL – Airflow – Hadoop – Python – DWH – AWS – GCP – MapReduce – BI – Analytics – NoSQL. 𝗢𝗽𝗲𝗻 𝗽𝗿𝗼𝗳𝗶𝗹𝗲 𝘁𝗼 𝘀𝗲𝗲 𝗱𝗲𝘁𝗮𝗶𝗹𝘀. 🤝 𝙒𝙃𝘼𝙏 𝙔𝙊𝙐 𝙂𝙀𝙏 𝙃𝙄𝙍𝙄𝙉𝙂 𝙈𝙀: — Data Engineering and AI Solutions: From crafting sophisticated data platforms tailored to your use case to integrating advanced chatbot solutions, I deliver end-to-end expertise. — Data Scraping and Mining: Extract as much data as you want from any source. — LLM and Chatbot Innovation: Leveraging the latest in AI, I provide guidance in implementing Large Language Models for various applications including conversational AI, enhancing user interaction through intelligent chatbot systems. — BI & Data Visualization: Proficient in tools like Tableau, Power BI, and Looker, I turn complex data into actionable insights. — Automation: Proficient in workflow platforms like Zapier, GHL, and Make.com, I turn complex processes into completed automation workflows. 😉 𝙄 𝙝𝙖𝙫𝙚 𝙚𝙭𝙥𝙚𝙧𝙞𝙚𝙣𝙘𝙚 𝙞𝙣 𝙩𝙝𝙚 𝙛𝙤𝙡𝙡𝙤𝙬𝙞𝙣𝙜 𝙖𝙧𝙚𝙖𝙨, 𝙩𝙤𝙤𝙡𝙨 𝙖𝙣𝙙 𝙩𝙚𝙘𝙝𝙣𝙤𝙡𝙤𝙜𝙞𝙚𝙨: ► AI ENGINEERING LLM models (GPT-x, Llama, LangChain), STT (Whisper, Deepgram), TTS (Bark, ElevenLabs), DeepFake (Wav2lip, Sadtalker), Model Optimization ► BIG DATA & DATA ENGINEERING Apache Spark, Apache Airflow, Hadoop, ClickHouse, Amplitude, MapReduce, YARN, Pig, Hive, HBase, Kafka, Druid, Flink, Presto (incl. AWS Athena) ► ANALYTICS, BI & DATA VISUALIZATION SQL: experienced with complex queries and analytical tasks. 
BI: Tableau, Redash, Superset, Grafana, DataStudio, Power BI, Looker ► WORKFLOW AUTOMATION Zapier, GoHighLevel, Make.com, ZoHo, N8N ► OTHER SKILLS & TOOLS Docker, Terraform, Kubernetes, Pentaho, NoSQL databases 𝙈𝙮 𝙧𝙚𝙘𝙚𝙣𝙩 𝙥𝙧𝙤𝙟𝙚𝙘𝙩𝙨: — Real-time crypto status tracking and technical analysis, AI auditor for blockchain code and crypto contracts — GCP-based ML-oriented ETL infrastructure (using Airflow, Dataflow) — Real-time events tracking system (utilizing Amplitude, DataLens, serverless) — Data Analytics platform for CRM analysis of online-game websites — Data visualization project for an E-commerce company 𝙎𝙠𝙞𝙡𝙡𝙨: — Spark expert and experienced Data Engineer 😉 — Extensive experience with MapReduce and BI tools. — Effective communicator, responsible, team-oriented. — Major remote experience: I build effective work processes for distributed teams. — Interested in high-load back-end development, ML, and analytical research. — Data Visualization expert — Data Scraping expert
    Apache Spark
    Data Visualization
    Data Scraping
    Data Warehousing
    Python
    Apache Hadoop
    Apache Airflow
    ETL
    Databricks Platform
    SQL
    Chatbot
    AI Text-to-Speech
    AI Speech-to-Text
    Natural Language Processing
    Generative AI
  • $60 hourly
    I’ve been a Software Engineer and Data Scientist. I’ve also been a Business Analyst, IT Project Manager, and Scrum Master, so I can handle both technical and managerial tasks. This has allowed me to communicate in a way that both technical and non-technical people can understand.
    Apache Spark
    Agile Software Development
    Scrum
    ChatGPT
    TypeScript
    Next.js
    API
    React
    PySpark
    JavaScript
    Data Science
    NumPy
    Databricks Platform
    Computer Vision
    Python
  • $60 hourly
    Hi, I’m Najeeb Al-Amin. I’m a Multimedia / IT / Business Intelligence professional with 4+ years of experience architecting, testing, and deploying highly effective and scalable Big Data solutions. With 15 years of IT and multimedia experience, plus early project experience concentrated on building physical machine networks, digital automated dialogue replacement (voiceover), and virtual server networks under my belt, I’m a one-stop shop. I am also experienced in complex ETL development. I focus on installing, designing, configuring, and administering Hadoop architecture as well as ecosystem components, and building data models to support business reporting and analysis requirements. I am highly experienced in implementing Business Intelligence methodologies in a flexible, situation-based, custom manner, which has become indispensable in architecting top-tier information systems. Digital audiobook production, remote and on-site digital audio workstation engineering, and video tutorial creation are only some of the many ways a strong multimedia background helps me re-shape how I can provide quality services to clients. Effective metadata strategies are key when it comes to being able to deliver solutions, and I am very knowledgeable about data transfer capabilities. 4+ years of experience working on mid- to small-scale data warehouse projects, executing roles such as Big Data Developer, Big Data Consultant, Hadoop Administrator, Hive Developer, Lab Technician and Logistics Consultant.
    Apache Spark
    Apache Hive
    Big Data
    Data Analysis
    Apache Hadoop
    Sqoop
    Apache Flume
    System Administration
    Data Modeling
    Apache Kafka
  • $120 hourly
    I was a senior machine learning engineer for Fanatics, a leading e-commerce platform. I built machine learning pipelines and models at Fanatics to match customers with products. I am experienced in AWS SageMaker, Apache Spark, and PyTorch. I would love to design and build machine learning solutions for your business.
    Apache Spark
    Big Data
    Amazon Web Services
    NodeJS Framework
    Machine Learning
    PyTorch
  • $150 hourly
    I am an experienced and ambitious engineering leader with over a decade of professional experience building software solutions.
    Apache Spark
    Natural Language Processing
    Machine Learning
    AWS Glue
    PostgreSQL
    Apache Kafka
    Database Design
    Data Science
    PyTorch
    Data Engineering
    AWS Lambda
    Snowflake
    Rust
    API Development
    Python
  • $25 hourly
    PROFESSIONAL SUMMARY * Accomplished Machine Learning (ML) professional with in-depth knowledge, skilled in leading strategic early-stage and large-scale machine intelligence initiatives and applying machine learning techniques in the field of Additive Manufacturing (AM). * Possess considerable experience across different data types, working on AI solutions incorporating Recommender Systems, Computer Vision (CV), and Reinforcement Learning (RL). * Demonstrated strong problem-solving and analytical skills in process troubleshooting, root cause analysis, continuous improvement, and high safety standards.
    Apache Spark
    Amazon Web Services
    Hive
    Apache Hadoop
    Computer Vision
    Data Analysis
    Big Data
    Apache Hive
    Data Visualization
  • $75 hourly
    My expertise lies in data cleaning and organization, a crucial skill in the world of data-driven decision-making. With my background, I can be a valuable asset to businesses seeking a virtual assistant to handle data-related tasks efficiently and effectively. As an aerospace engineer with two years of industry experience, I understand the importance of quality work delivered on time.
    Apache Spark
    LangChain
    OpenAI API
    Machine Learning Algorithm
    Data Scraping
    Workato
  • $30 hourly
    I am a versatile AI/ML Developer proficient in diverse software paradigms, driving profitability, optimising business processes, enhancing energy efficiency, and ensuring customer satisfaction. With expertise in scalable AI/ML product development, I have catered to diverse industries including Renewable Energy, Education, Finance, Sales, and Healthcare. My strong leadership background, both academically and professionally, enables me to excel in creating proprietary AI/ML solutions with impactful results.
    Apache Spark
    Web Development
    Data Analytics & Visualization Software
    Deep Learning
    Snowflake
    Apache Airflow
    Generative AI
    Energy Modeling Software
    SQL
    Python
    Data Engineering
    Data Science
    AI Development
    Machine Learning
    Business Development
  • $30 hourly
    Data Engineer with 1.5 years of experience at a Fortune 5 company, with a background in developing and optimizing data pipelines and migrating codebases. I have also interned at Microsoft and a startup, where I developed ETL pipelines and APIs for applications, with expertise in Python, Scala, Apache Spark, and cloud technologies.
    Apache Spark
    Scala
    Natural Language Processing
    Machine Learning
    ETL Pipeline
    ETL
    Data Extraction
  • $5 hourly
    I am a seasoned data analyst with 4 years of experience delivering actionable insights to drive business decisions. My expertise lies in transforming complex datasets into clear, impactful strategies that optimize performance and fuel growth. - Proficient in SQL, Python, Tableau, and Power BI, with a strong focus on data visualization and predictive modeling. - Skilled in identifying trends, reducing inefficiencies, and delivering analytics-driven solutions tailored to business needs. - Experienced in collaborating with cross-functional teams to design and implement data-driven strategies. With a commitment to precision and innovation, I excel at turning raw data into meaningful narratives that empower stakeholders. Let’s uncover opportunities and achieve measurable success together!
    Apache Spark
    Retrieval Augmented Generation
    Artificial Neural Network
    YARN
    Apache Beam
    MongoDB
    Data Visualization
    Time Series Forecasting
    Machine Learning Model
    Data Mining
    ETL Pipeline
    Docker
    Python
    R
    SQL
  • $125 hourly
    I am an expert cloud data architect and backend engineer with 10+ years' experience in developing microservice-based complex software stacks, real-time fast-data platforms, production container deployments using Docker and/or Kubernetes, modern DevOps and CI/CD practices. I completed my Ph.D. in 2016 from the University of California at Davis. My research focused on next-generation broadband access network architectures and services, software-defined networks, and traffic modeling for modern Internet-based video applications such as IPTV. One of my papers received the Best Paper Award at ANTS 2013, and another paper was a semi-finalist in Corning Outstanding Student Paper Award at OFC 2014. I have served as reviewer for prestigious journals such as IEEE Transactions on Communications (TCOM), IEEE Journal of Optical Communication and Networking (JOCN), Photonic Network Communications (PNET). After completing my Ph.D., I joined Ennetix, Inc., a start-up incubated from UC Davis, where I was one of the very first full-time employees. During my stint at Ennetix, first as a Senior R&D Engineer and then as Director of Engineering, I designed the entire microservice-based software architecture of Ennetix's flagship AIOps product, xVisor. I was in charge of technical leadership, where I interfaced with stakeholders and customers to solidify requirements of Ennetix's software products. I made decisions regarding technology choices along the entire software stack, with a focus on quality, stability, and performance. I was also in charge of engineering management at Ennetix - I created project management and work tracking framework following agile methodologies in Azure Boards. I managed a team of developers, using Kanban boards, sprints, backlogs, and milestones. I mentored engineers at various stages of their careers, and educated on best practices on code design, style, and reviews. After more than six busy and fascinating years at Ennetix, I decided to take a break. 
I was battling some health issues and needed a more flexible schedule where I could work as an individual contributor on my own terms. Freelancing was a natural fit for me. Since October 2022, I have been working with a stealth start-up seeking to bring exciting new tech to the Ethereum blockchain space, especially the NFT market. I am the lead backend engineer, developing real-time analytics and machine learning on Ethereum blockchain data using Apache Spark, as well as fast, optimized APIs to power our first-of-its-kind dashboard. We have built several firsts in this space, for example intelligent asset valuation and advanced portfolio tracking. If you decide to hire me, you will get deep expertise in modern cloud-native, data-intensive application design at pennies on the dollar. This can be a great opportunity for both of us, and we can form a lasting relationship if we end up being a great fit. Besides software and architecture development, I can guide and mentor junior engineers, improve build and testing pipelines, prepare design documents, manage infrastructure, and optimize existing cloud deployments for cost and ease of maintenance. Below is a list of areas I have expertise in:
- Real-time streaming applications using Apache Kafka/Apache Pulsar/Google Pub/Sub/Azure Event Hubs/AWS Kinesis, and Apache Spark/Apache Flink/ksqlDB/Kafka Streams/Google Cloud Dataflow/Azure HDInsight/AWS EMR
- RESTful API development in Golang/Scala
- Databases such as PostgreSQL, TimescaleDB, MySQL, MariaDB
- Analytics datastores such as Elasticsearch, ClickHouse, InfluxDB, Prometheus, Druid, Pinot
- Caching using Redis, Dragonfly, Memcached
- Infrastructure as code using Terraform, AWS CloudFormation, Azure Resource Manager
- Cloud orchestration on AWS/Azure/Google Cloud Platform
- Container orchestration with Kubernetes on GKE/AKS/EKS
- DevOps and CI/CD using CircleCI/Travis CI/Azure DevOps/GitHub Actions/GitLab Pipelines
    Apache Spark
    Microsoft Azure
    Docker
    API Development
    Scala
    PostgreSQL
    Docker Compose
    Elasticsearch
    Terraform
    Data Engineering
    Google Cloud Platform
    Apache Kafka
    RESTful Architecture
    Golang
    Kubernetes
  • $30 hourly
    I am a highly skilled Data Scientist with a passion for leveraging data-driven insights to solve complex business problems. With a background in computer science, business analytics, and information systems and technology, I have developed a strong foundation in machine learning, data visualization, natural language processing, and statistical analysis. My professional experience includes working with technologies such as Python, Tableau, Power BI, Spark/PySpark, and SQL to perform root cause analysis, design interactive dashboards, and communicate KPI insights to technical and non-technical stakeholders. I'm a motivated and detail-oriented individual who is dedicated to providing valuable insights to organizations and helping them make data-driven decisions.
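As a small illustration of the kind of KPI analysis described above, here is a hedged pandas sketch computing a conversion-rate KPI per region; the dataset and column names are invented for the example.

```python
import pandas as pd

# Hypothetical order data: region, channel, and conversion outcome.
orders = pd.DataFrame({
    "region":    ["West", "West", "East", "East", "East"],
    "channel":   ["web", "mobile", "web", "web", "mobile"],
    "converted": [1, 0, 1, 1, 0],
})

# KPI: conversion rate per region -- the kind of figure a
# dashboard tile or root-cause drill-down would surface.
kpi = orders.groupby("region")["converted"].mean()
# West: 0.50, East: ~0.67
```

In practice the same groupby-and-aggregate pattern scales up via PySpark when the data no longer fits on one machine.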
    Apache Spark
    Database
    pandas
    Qualtrics
    Analytics
    Apache Spark MLlib
    Machine Learning
    Analytical Presentation
    Data Analysis
    Tableau
    Google Analytics
    Data Visualization
    SQL
    Microsoft Power BI
    Python
  • $70 hourly
    Service- and product-minded data engineer with experience building terabyte-scale data platforms from scratch, at organizations ranging from stealth mode to unicorn. Experienced working in regulated healthcare environments.
    Apache Spark
    DevOps
    Apache Airflow
    dbt
    Snowflake
    Amazon Redshift
    Tableau
    SQL
    Python

How hiring on Upwork works

1. Post a job

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.

Trusted by 5M+ businesses