Hire the best Apache Spark Engineers in the United States
Check out Apache Spark Engineers in the United States with the skills you need for your next job.
- $50 hourly
- 4.9/5
- (28 jobs)
I am a data engineer with strong experience in data crawling and data processing. I can extract data from almost any website in minutes. Please contact me if you need:
- Data crawled from any website
- Data processed the way you usually would in Excel
- Data delivered in any format you need
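Purely as an illustration of the crawl-process-deliver service described above (not this freelancer's actual code), here is a minimal Python sketch; the URL, CSS selectors, and field names are hypothetical:

```python
# A minimal crawl-and-deliver sketch: fetch a page, parse records, export CSV.
# The URL and CSS selectors below are hypothetical placeholders.
import csv

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/listings", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
rows = []
for item in soup.select("div.listing"):        # hypothetical selector
    title = item.select_one("h2")
    price = item.select_one("span.price")      # hypothetical selector
    rows.append({
        "title": title.get_text(strip=True) if title else "",
        "price": price.get_text(strip=True) if price else "",
    })

# "Process the data as you would in Excel": deliver a plain CSV.
with open("listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)
```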
Apache Spark · Lead Generation · Data Scraping · Data Mining · Spring Framework · Amazon · Big Data · Java · Apache Hadoop · Apache Kafka
- $50 hourly
- 4.8/5
- (2 jobs)
With over 5 years of experience in data science, machine learning engineering, and comprehensive data solutions, I've cultivated a career focused on delivering tangible value through strategic data-driven initiatives. From optimizing energy consumption in industrial settings to revolutionizing predictive analytics for global logistics leaders, my journey is marked by a relentless pursuit of excellence in leveraging data for business growth.

★ Core Competencies ★
Comprehensive Data Solutions: Proficient in Python, R, SQL, and a suite of data analysis tools, with a proven track record of extracting actionable insights from complex datasets and implementing solutions across various domains.
Machine Learning & AI: Expertise in developing and deploying predictive models and algorithms, specializing in areas such as LLMs, time series forecasting, and optimization.
Data Engineering: Architect of robust data pipelines and microservices, adept at implementing modern CI/CD practices and scalable architectures to drive operational efficiency.
Data Visualization: Skilled in transforming data into compelling visual narratives using tools like Amazon QuickSight and Power BI.
Consulting & Client Engagement: Extensive experience in consulting, developing decks, and presenting data-driven strategies to Fortune 100 clients, driving stakeholder engagement and aligning data solutions with business objectives.

★ Professional Milestones ★
Cost Optimization at Hormel Foods: Led initiatives resulting in a 10% cost reduction ($500,000 in savings) by developing machine learning models to optimize energy consumption in industrial refrigeration plants.
Delivery Date Prediction for a Global Logistics Leader: Elevated delivery date prediction accuracy from 40% to over 75% through production-ready machine learning models, deployed on Azure using Python and automated MLOps pipelines.
Time Series Forecasting at Mercedes-Benz Financial Services: Improved forecast accuracy by 20% through a self-learning time series forecasting model for the collections team.
Strategic Client Consulting: Created and presented strategic data solutions to Fortune 100 clients, developing comprehensive decks and leading presentations that have driven key decision-making and resulted in successful project implementations.

★ Connect & Collaborate ★
I'm passionate about solving complex challenges through innovative data solutions. Whether you're looking to streamline operations, improve customer experience, unlock insights from your data, or drive strategic initiatives, I'm here to collaborate and drive measurable results for your business. Let's harness the power of data to propel your organization forward.
Apache Spark · Deep Learning Modeling · Artificial Intelligence · Data Visualization · CI/CD · Big Data · C++ · Databricks Platform
- $100 hourly
- 5.0/5
- (141 jobs)
— TOP RATED PLUS Freelancer on UPWORK
— EXPERT VETTED Freelancer (Among the Top 1% of Upwork Freelancers)
— Full Stack Engineer — Data Engineer

✅ AWS Infrastructure, DevOps, AWS Architect, AWS Services (EC2, ECS, Fargate, S3, Lambda, DynamoDB, RDS, Elastic Beanstalk, AWS CDK, AWS CloudFormation, etc.), serverless application development, AWS Glue, AWS EMR
Frontend Development: ✅ HTML, CSS, Bootstrap, JavaScript, React, Angular
Backend Development: ✅ Java, Spring Boot, Hibernate, JPA, Microservices, Express.js, Node.js
Content Management: ✅ WordPress, Wix, Squarespace
Big Data: ✅ Apache Spark, ETL, MapReduce, Scala, HDFS, Hive, Apache NiFi
Database: ✅ MySQL, Oracle, SQL Server, DynamoDB
Build/Deploy: ✅ Maven, Gradle, Git, SVN, Jenkins, QuickBuild, Ansible, AWS CodePipeline, CircleCI

As a highly skilled and experienced Lead Software Engineer, I bring a wealth of knowledge and expertise in Java, Spring, Spring Boot, Big Data, MapReduce, Spark, React, graphics design, logo design, email signatures, flyers, web development (HTML, CSS, Bootstrap, JavaScript and frameworks, PHP, Laravel), responsive web page development, WordPress design, and testing. With over 11 years of experience in the field, I have a deep understanding of Java, Spring Boot, and microservices, as well as Java EE technologies such as JSP, JSF, Servlet, EJB, JMS, JDBC, and JPA. I am also well-versed in Spring technologies including MVC, IoC, Security, Boot, Data, and transactions. I possess expertise in web services (REST and SOAP) and am proficient in web development frameworks such as WordPress, PHP, Laravel, and CodeIgniter. Additionally, I am highly skilled in JavaScript, jQuery, React.js, AngularJS, Vue.js, and Node.js, as well as C# and ASP.NET MVC. In big data, I have experience working with MapReduce, Spark, Scala, HDFS, Hive, and Apache NiFi, and I am well-versed in cloud technologies such as PCF, Azure, and Docker. Furthermore, I am proficient in databases including MySQL, SQL Server, and Oracle, and familiar with build tools such as Maven, Gradle, Git, SVN, Jenkins, QuickBuild, and Ansible.
Apache Spark · Database · WordPress · Cloud Computing · Spring Framework · Data Engineering · NoSQL Database · React · Serverless Stack · Solution Architecture Consultation · Spring Boot · DevOps · Microservice · AWS Fargate · AWS CloudFormation · Java · CI/CD · Amazon ECS · Containerization
- $110 hourly
- 5.0/5
- (4 jobs)
Accomplished Data Engineer and Scientist with over ten years of experience in designing, implementing, and optimizing complex data infrastructures and analytical solutions across prominent cloud platforms such as Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure. My expertise includes the adept use of a wide array of technologies, from Python and SQL to front-end and back-end development, ensuring comprehensive data management and application development. Skilled in infrastructure automation using Terraform and analytics engineering through dbt, I am proficient in enhancing data processing, management, and insightful analytics.

My technical proficiency extends to managing cloud-native services, optimizing big data storage and queries using tools like BigQuery, and deploying machine learning models leveraging frameworks such as TensorFlow and PyTorch. This capability is complemented by extensive experience with AWS components like Lambda, S3, and Redshift, and Azure services including Azure Functions and Azure Data Lake, which facilitate robust data pipeline construction and scalable solutions. I have a strong background in Python for both data processing and analytics scripts, which has been integral in my work with PySpark and Apache Beam for stream and batch data processing. My experience also includes real-time data streaming and integration using Kafka within cloud environments, improving system responsiveness and data flow efficiency.

In each role, I have successfully led cross-functional teams in the delivery of projects that exceed business objectives by focusing on automation, scalability, and reliability. I have applied my skills across various sectors, including marketing, manufacturing, and energy analytics, demonstrating the ability to adapt and drive forward data-centric strategies and solutions that meet diverse organizational needs. Overall, my deep understanding of data engineering and science, combined with a rich toolkit of technical skills, enables me to contribute to and lead projects that enhance decision-making, optimize operational efficiency, and drive the data-driven transformation of businesses.

Dynamic Pricing Model for E-Commerce
Tools: Python, TensorFlow, AWS Lambda, Amazon Redshift
Led the development of a machine learning model to optimize pricing strategies dynamically based on real-time market trends, inventory levels, and consumer behavior. The model was deployed on AWS, utilizing Lambda for event-driven price adjustment and Redshift for data warehousing, resulting in a 15% increase in profit margins for a leading e-commerce retailer.

Fraud Detection System for Banking
Tools: Python, PySpark, Azure ML, Kafka
Designed and implemented a real-time fraud detection system for a major bank. This system uses machine learning algorithms to analyze transaction patterns and detect suspicious activities instantly. Leveraging Azure ML for model deployment and Kafka for streaming transaction data, the project significantly reduced fraudulent transactions by 20%.

Computer Vision for Quality Control in Manufacturing
Tools: OpenCV, Python, TensorFlow, GCP Pub/Sub
Engineered a computer vision system that automates the detection of defects on manufacturing lines using TensorFlow and OpenCV. Integrated with GCP's Pub/Sub for handling stream data, this system improved defect identification accuracy by 25% and reduced manual inspection costs.

Personalized Recommendation Engine for Streaming Services
Tools: Python, Apache Spark, AWS S3, dbt
Developed a recommendation system for a video streaming platform to personalize content suggestions. Utilizing collaborative filtering and content-based filtering techniques with Spark and hosting data on AWS S3, the system enhanced viewer engagement by recommending highly relevant content (a minimal Spark sketch of this technique appears below).

Customer Churn Prediction Model for Telecom
Tools: R, SQL, Azure Data Lake, Power BI
Built a predictive model to identify the risk of customer churn at a telecom company. Using R for statistical analysis and SQL for data querying, this model accessed data from Azure Data Lake, providing actionable insights that reduced churn by 10%.

Supply Chain Optimization Using AI
Tools: Python, GCP BigQuery, SimPy
Implemented a simulation model to optimize supply chain logistics for a global retailer. The model predicts and manages inventory levels across various locations, using Python's SimPy for simulation and GCP BigQuery for handling large datasets, leading to a 30% reduction in logistics costs.

Real-Time Public Sentiment Analysis for Political Campaigns
Tools: Python, Natural Language Processing, Twitter API, Google Cloud Natural Language
Developed a real-time sentiment analysis tool that monitors public opinion on social media for political campaigns. Employing NLP techniques and Google Cloud Natural Language for sentiment analysis, the tool provided insights that shaped campaign strategies effectively.
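As a rough illustration of the collaborative-filtering approach named in the recommendation-engine project above, here is a minimal PySpark ALS sketch; the S3 path, schema, and column names are assumptions for illustration, not the original implementation:

```python
# A minimal collaborative-filtering recommender sketch using Spark's ALS.
# The input path and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recommender-sketch").getOrCreate()

# Assumed schema: user_id, item_id, rating (explicit or implicit feedback).
ratings = spark.read.parquet("s3://my-bucket/ratings/")  # hypothetical path

als = ALS(
    userCol="user_id",
    itemCol="item_id",
    ratingCol="rating",
    coldStartStrategy="drop",  # skip users/items unseen at training time
)
model = als.fit(ratings)

# Top-10 content suggestions per user.
recommendations = model.recommendForAllUsers(10)
recommendations.show(truncate=False)
```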
Apache Spark · Microsoft Azure · Amazon Web Services · Databricks Platform · BigQuery · API · AWS Glue · Apache Kafka · Python · Apache Airflow · Azure Machine Learning · Google Cloud Platform · Microsoft Power BI · SQL · R
- $150 hourly
- 5.0/5
- (17 jobs)
As a Data and Business Intelligence Engineer, I deliver consulting and freelance data engineering services, with a focus on overseeing and executing projects in alignment with customer needs. With services encompassing the full data journey, I create and implement robust data foundations that streamline process development and enable leaders to make rapid business decisions.

Three categories of service include:
• Consulting: Data Strategy Development, Data & Reporting Solution Architecture, and Process Development.
• Data Products/Engineering: Dashboard Development & Reporting, Data Pipelines (ETL), Process Automation, and Data Collection.
• Analytics: Key Performance Indicators (KPIs), Metrics, Data Analysis, and Business Process Analysis.

Leveraging over eight years of experience in business intelligence, data visualization, business analysis, and requirements analysis, I build data pipelines and translate data into actionable insights that provide a competitive edge. Tools of choice include Amazon Web Services (AWS), Databricks, Snowflake, Kafka, Snowpipe Streams, Airflow, Tableau/Power BI, SQL, NoSQL, APIs, Python, and Spark/PySpark.

Let me know what I can do for YOU!
Apache Spark · API · Data Analysis · Database · Amazon Web Services · Business Analysis · Snowflake · Databricks Platform · ETL Pipeline · Python · Apache Airflow · Dashboard · Tableau · SQL
- $60 hourly
- 5.0/5
- (3 jobs)
ABOUT ME:
I am a Lead Data Engineer with a strong software development background. I have over 10 years of professional experience in IT, 7 of which are in data engineering. I hold an MS in Software Engineering from DePaul University (Chicago, IL, USA).

WHAT I CAN DO FOR YOU:
Having worked as a Lead Data Engineer at Fortune 500 enterprises, I can help startups with:
* Developing comprehensive data governance and security strategies
* Designing and implementing cloud data platforms (Azure, AWS, Databricks)
* Data warehouse modeling
* Data lake/data lakehouse modeling
* Cost optimization of data and ML pipelines
* Performance optimization of data and ML pipelines

TECHNICAL SKILLS:
Python | Java | Scala | PySpark | Apache Spark | Apache Airflow | Databricks | AWS | Azure | AWS EMR | AWS Glue | Azure Data Factory | Azure Synapse
Apache Spark · Jakarta EE · Android SDK · Android App Development · Data Lake · Data Modeling · Amazon Web Services · Microsoft Azure · AWS Lambda · AWS Glue · PySpark · ETL · Data Engineering · Machine Learning · Databricks Platform · SQL · Java · Python
- $175 hourly
- 5.0/5
- (4 jobs)
Mr. Joshua B. Seagroves is a seasoned professional having served as an Enterprise Architect/Senior Data Engineer for multiple Fortune 100 companies. With a successful track record as a startup founder and CTO, Mr. Seagroves brings a wealth of experience to his role, specializing in the strategic design, development, and implementation of advanced technology systems.

Throughout his career, Mr. Seagroves has demonstrated expertise in architecting and delivering cutting-edge solutions, particularly in the realm of data engineering and sciences. He has successfully spearheaded the implementation of multiple such systems and applications for a diverse range of clients. As part of his current responsibilities, Mr. Seagroves actively contributes to prototyping and research efforts in data engineering/data science, specifically in the development of operational systems for critical mission systems. Leveraging his extensive background in architecture and software modeling methodologies, he has consistently led and collaborated with multidisciplinary teams, successfully integrating various distributed computing technologies, including Hadoop, NiFi, HBase, Accumulo, and MongoDB.

Mr. Seagroves' exceptional professional achievements and extensive experience make him a highly sought-after expert in his field. His comprehensive knowledge and hands-on expertise in advanced technology systems and big data make him a valuable asset to any organization.
Apache Spark · YARN · Apache Hadoop · Big Data · Apache Zookeeper · TensorFlow · Apache NiFi · Apache Kafka · Artificial Neural Network · Artificial Intelligence
- $350 hourly
- 5.0/5
- (35 jobs)
"Michael is just FANTASTIC. He is by far the best freelancer I have worked with over the past four years. He makes the process so seamless." Ranked in the top 1% of freelancers, member of the Upwork vetted expert program, and over 12 years experience. Please reach out to me for any of your AI/ML & Data Science Needs. Please see modelforge.ai for more information.Apache Spark
Apache Spark · Large Language Model · Visual Basic for Applications · Modeling · Forecasting · ChatGPT · Natural Language Processing · Machine Learning · Python Scikit-Learn · Microsoft Excel · SQL · TensorFlow · Python
- $75 hourly
- 5.0/5
- (5 jobs)
Tool-oriented data science professional with extensive experience supporting multiple clients in Hadoop and Kubernetes environments, deployed with Cloudera Hadoop on-premise and Databricks in AWS. My passion is client adoption and success, with a focus on usability. With my computer science and applied math background, I have been able to fill the gap between platform engineers and users, continuously pushing for product enhancements. As a result, I have continued to create innovative solutions for clients in an environment where use cases continue to evolve every day.

I find fulfillment in being able to drive the direction of a solution in a way that allows both client and support teams to have open lanes of communication, creating success and growth. I enjoy working in a diverse environment that pushes me to learn new things. I'm interested in working on emerging solutions as data science continues to evolve.
Apache Spark · R · Serverless Stack · React · Apache Hadoop · Java · Cloudera · AWS Lambda · Apache Impala · R Hadoop · Bash Programming · PostgreSQL · Python · AWS Development · Apache Hive
- $125 hourly
- 4.8/5
- (14 jobs)
🏆 Achieved Top-Rated Freelancer status (Top 10%) with a proven track record of success. Past experience: Twitter, Spotify, & PwC.

I am a certified data engineer & software developer with 5+ years of experience. I am familiar with almost all major tech stacks in data science/engineering and app development. If you require support with your projects, please do get in touch.

Programming Languages: Python | Java | Scala | C++ | Rust | SQL | Bash
Big Data: Airflow | Hadoop | MapReduce | Hive | Spark | Iceberg | Presto | Trino | Scio | Databricks
Cloud: GCP | AWS | Azure | Cloudera
Backend: Spring Boot | FastAPI | Flask
AI/ML: PyTorch | ChatGPT | Kubeflow | ONNX | spaCy | Vertex AI
Streaming: Apache Beam | Apache Flink | Apache Kafka | Spark Streaming
SQL Databases: MSSQL | Postgres | MySQL | BigQuery | Snowflake | Redshift | Teradata
NoSQL Databases: Bigtable | Cassandra | HBase | MongoDB | Elasticsearch
DevOps: Terraform | Docker | Git | Kubernetes | Linux | GitHub Actions | Jenkins
Apache Spark · Java · Apache Hadoop · Amazon Web Services · Snowflake · Microsoft Azure · Google Cloud Platform · Database Management · Linux · ETL · API Integration · Scala · SQL · Python
- $50 hourly
- 4.9/5
- (8 jobs)
I have successfully harnessed a wide range of data sources, skillfully extracting and transforming them into valuable assets by leveraging cost-effective open-source architectures. In the process, I have adeptly addressed architectural and modeling challenges for businesses. I am eager to contribute my expertise to projects, enhancing their effectiveness while cutting costs through the use of open-source solutions and my proven problem-solving abilities.
Apache Spark · Business Intelligence · Big Data · SQL Programming · Data Modeling · SAS · Data Mining · Data Warehousing · Microsoft SQL Server · ETL · BigQuery · Snowflake · SQL · Data Engineering
- $70 hourly
- 4.7/5
- (12 jobs)
Hello, I am Matthew (you can call me Matt). I truly love data and revealing to people what it can show. I'm a data scientist by trade (MS in Data Science from Columbia University's Fu Foundation School of Engineering and Applied Science) with specialties in data visualization, machine learning, natural language processing, and data mining. I previously worked in financial compliance and healthcare technology, but I am here to work with anything data-related, particularly its presentation.

I am always focused first on providing the most comprehensive, polished products possible in a timely and transparent manner, because clients always deserve true honesty and quality from whomever they hire. I continue to ask "How can data help solve this problem?", whatever the problem might be. I look for clear trends and patterns to try to find the most accurate resolution possible. For me, it always comes down to numbers, and how they tell the larger story.

Past Data Specialist Experience:
- Transaction modeling for anti-crime work
- Health registries and claims data analysis
- Political partisanship and voter demographic dashboards
- Machine learning on stock market price data
- Financial similarity matrices
- LSTM NLP summarization model

Programming Experience:
- Python (NumPy, Pandas, Matplotlib, Plotly, seaborn, Dash, scikit-learn (personally taught by the creator of said package), SciPy, NLTK, TensorFlow)
- R (dplyr, ggplot2, R Markdown, Shiny, lubridate, zoo, knitr)
- SQL (Oracle, MS SQL, Hive)
- NoSQL (MongoDB)
- LaTeX
Apache Spark · ggplot2 · Data Visualization · PySpark · Microsoft Power BI · Apache Hive · R Shiny · Apache Hadoop · SQL · Tableau · Machine Learning · Python · Deep Learning · R
- $175 hourly
- 4.8/5
- (26 jobs)
I am an expert in solving complex engineering problems using open-source technologies on the cloud. I am an advocate for infrastructure as code, containerization, API microservices, continuous integration and deployment, and proper use of version control systems. I can quickly analyze high-level system architectures, as well as deep dive into the actual code.

My preferred languages are Python, Java, SQL, JavaScript, and Bash, but I also have industry experience working with C, C++, C#, Objective-C, Ruby, Scala, Kotlin, R, MATLAB, and more. Given my past employment history, I have developed particular expertise in patent analytics, time series analytics, the AWS and GCP ecosystems, Kubernetes, and big data processing with Apache Spark. I am extremely excited and passionate about working with startups and founders to solve difficult problems that deliver value to the market.
Apache Spark · Data Scraping · Microsoft Azure · DevOps · Jenkins · Classification · PyTorch · GitHub · TensorFlow · Kubernetes · Amazon Web Services · Google Cloud Platform · Machine Learning · CI/CD · Python
- $50 hourly
- 5.0/5
- (0 jobs)
I'm Chris, a data science professional:
- 3 years in tech
- 2 years in data (CPG and insurance)
- 2 years as a barista
- 1+ year in retail

With 3 years of professional experience in tech, including 2 years in data-centric roles in CPG and insurance (SQL, the Python DS stack of scikit-learn, NumPy, pandas, SciPy, statsmodels, Matplotlib, and seaborn, along with Jupyter, GCP's BigQuery, and backend data virtualization for Power BI), I'm committed to engaging directly with clients, exploring big data, creating and validating machine learning models, and translating mathematics into real-world business impact.

What sets me apart, however, is my background in hospitality and retail. After 2 years as a barista and over a year in cosmetics, I love to consult with clients to uncover goals, share subject-matter expertise, and collaborate directly on crafting the perfect solution while building strong, long-term client relationships.

You can find my work via my GitHub: github.com/cwfrock

Python:
- PyTorch
- PyTorch Lightning
- scikit-learn
- NumPy
- pandas
- SciPy
- statsmodels
- Streamlit
- Plotly
- networkx
- matplotlib
- seaborn
- altair

Data Science Concepts:
- Neural networks (GANs, CNNs, ANNs, LSTMs)
- Supervised and unsupervised machine learning
- Machine learning models
- Hyperparameter tuning
- Optimization
- Regression, classification, clustering

Data Science Platforms:
- Jupyter (+ JupyterLab)
- Anaconda (+ Anaconda Cloud)
- Visual Studio Code
- Google Cloud Platform (GCP)
- Google BigQuery
- Microsoft Azure

SQL:
- Microsoft T-SQL
- SQL Server 2012
- SQL Server Management Studio (SSMS)
- CTEs, subqueries, views, stored procedures, triggers
- SQL Import/Export Wizard
- Denodo Virtual DataPort Administrator
Apache Spark · Databricks Platform · Azure Machine Learning · Microsoft Excel · Jupyter Notebook · Calculus · Physics · BigQuery · Artificial Intelligence · SQL · Machine Learning · PyTorch · Neural Network · Python Scikit-Learn · Python
- $110 hourly
- 5.0/5
- (6 jobs)
Data Engineer and Cloud Solutions Architect able to build end-to-end web applications, data pipelines, and robust cloud architecture. A quick learner with a diverse background spanning large corporations, small companies, and running a business.
Apache Spark · ETL · Flask · Microsoft Power BI · Fivetran · Jenkins · Bash · Selenium · API Integration · PySpark · AWS Fargate · FastAPI · AWS Lambda · CI/CD · Snowflake · SQL · Python · dbt · Apache Airflow · Docker
- $50 hourly
- 5.0/5
- (2 jobs)
• Web scraping with BeautifulSoup and Selenium
• Scalable data processing using Spark, Pandas, NumPy, EMR, and AWS Batch (Scala or Python)
• Deduplication of complex records, such as detecting duplicate job postings across multiple feeds
• Probabilistic data matching across imperfect datasets (e.g., names, metadata, partial identifiers), sketched below
• Building database-driven web applications with Django
• Interactive data visualization with Streamlit, Matplotlib, and Plotly
• Real-time visualization of physical systems using PyQt
• Enriching datasets using large language models (LLMs)
• Applying physics and advanced mathematics to data-driven problems

I specialize in solving complex data challenges involving extraction, scraping, standardization, matching, and deduplication. My solutions are resilient to failure, scale efficiently with growing data volumes, and adhere to cost constraints. I am proficient in tools such as Spark, Pandas, and NumPy, and have extensive experience building scalable data pipelines in AWS using EMR, S3, and Batch. I also build front-end interfaces with Streamlit, PyQt, and Django.
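A minimal sketch of the probabilistic matching and deduplication idea described above, using only the Python standard library; the fields, weights, and threshold are illustrative assumptions (a production version would add blocking keys and scale out on Spark):

```python
# Minimal probabilistic matching of near-duplicate records, e.g. the same
# job posting arriving via multiple feeds. Weights/threshold are made up.
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    return " ".join(s.lower().split())

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

postings = [
    {"id": 1, "title": "Senior Data Engineer", "company": "Acme Corp"},
    {"id": 2, "title": "Sr. Data Engineer",    "company": "ACME Corp."},
    {"id": 3, "title": "Barista",              "company": "Beanhouse"},
]

# Pairwise score over title + company; flag pairs above a tuned threshold.
THRESHOLD = 0.8
for i, a in enumerate(postings):
    for b in postings[i + 1:]:
        score = (0.6 * similarity(a["title"], b["title"])
                 + 0.4 * similarity(a["company"], b["company"]))
        if score >= THRESHOLD:
            print(f"probable duplicate: {a['id']} ~ {b['id']} (score={score:.2f})")
```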
Apache Spark · PyQt · Streamlit · Web Scraping · NumPy · pandas · Dask · API Development · Scala · Django · Physics · Scripting · Python
- $80 hourly
- 5.0/5
- (6 jobs)
I am a versatile professional with extensive expertise in MLOps, Machine Learning Engineering, Data Engineering, and Data Science. I specialize in building and deploying scalable AI solutions, automating ML pipelines, and transforming data into actionable insights to drive business value. With hands-on experience across various industries, including banking, freelancing platforms, and government services, I have a proven track record of delivering robust, production-grade machine learning systems and end-to-end AI solutions.

1. What I Offer
a. MLOps & DevOps Excellence: Expertise in deploying scalable ML models using Docker, Kubernetes, and Terraform. Built CI/CD pipelines with Jenkins, Azure DevOps, and GitHub for seamless ML lifecycle management. Managed cloud-native infrastructure on AWS and Azure, integrating services like SageMaker, Azure ML, AKS, and EKS.
b. Machine Learning Expertise: Delivered predictive models using LightGBM, XGBoost, and CatBoost for tasks such as job matching, customer analytics, and time-series forecasting. Developed LLM-based solutions with OpenAI GPT-3/4 for natural language processing tasks like resume matching, skill extraction, and market insights generation. Built high-performing recommendation systems, graph-based inference models, and advanced computer vision pipelines.
c. Data Engineering Proficiency: Designed and implemented data pipelines for real-time and batch processing using Databricks, Azure Data Factory, and AWS services. Proficient in integrating and transforming data from diverse sources such as APIs, databases, and unstructured data streams.
d. AI-Powered Insights & Analytics: Built a knowledge graph for entity linking and profile enrichment, leveraging NLP and embedding-based similarity models. Created digital twins for city-scale social issue monitoring and policy simulations using time-series forecasting and knowledge graphs. Extracted actionable insights through sentiment analysis, entity recognition, and topic modeling.

2. Key Achievements
- Successfully deployed scalable machine learning models for Citibank, enhancing predictive capabilities for financial operations.
- Developed a job connection prediction model for a leading freelancing platform, improving application and hiring rates with real-time optimization.
- Built a state-of-the-art face similarity search pipeline, capable of handling cross-domain challenges with high accuracy and scalability.
- Delivered a knowledge graph-based linking engine for entity matching, revolutionizing data enrichment processes for large-scale internet traffic.

3. Technical Skills
Programming & Frameworks: Python, PyTorch, TensorFlow, Spark, Flask, FastAPI.
Cloud & DevOps: AWS (SageMaker, EKS, Rekognition), Azure (AKS, Functions, ML), Terraform, Docker, Kubernetes.
Machine Learning Tools: LightGBM, XGBoost, CatBoost, OpenAI GPT-3/4, MLflow.
Data Tools: SQL, Elasticsearch, Databricks, GroundTruth.

4. Why Choose Me?
I bring a results-driven approach to every project, ensuring that solutions are not only technically sound but also aligned with business goals. Whether you need end-to-end ML deployment, advanced analytics, or AI-driven insights, I am here to help you unlock the full potential of your data.
Apache Spark · Amazon Web Services · Microsoft Azure · MLOps · Docker · LLM Prompt · Microsoft Power BI · Elasticsearch · MongoDB · SQL · Machine Learning · Python · Natural Language Processing · Deep Learning · Python Scikit-Learn
- $100 hourly
- 4.8/5
- (8 jobs)
I'm a data scientist and statistician with 3+ years of experience in tech. After working at Lucid Software for several years, I decided to go back to school to up-level my skills with a PhD in statistics at the University of Michigan.

Some projects I've tackled in the past include:
- Building customer lifetime value (CLV/LTV) models to save $1M+/yr
- Forecasting account growth to optimize sales strategy for a team of over 200 sales reps
- Leveraging time series models to set data-driven goals for customer success teams
- Designing a (Bayesian) framework for analyzing hundreds of A/B tests (sketched below)
- Building out critical pieces of data infrastructure, including a deployment of dbt for building database tables and version control for data science workloads

I'm particularly strong in the following areas of data science:
- Statistical modeling, especially Bayesian models
- Causal inference: A/B testing, matching, propensity score weighting, randomization inference, experimental design
- Time series forecasting
- Custom algorithm design: I've implemented several projects from scratch when the best method was not supported by standard libraries
- Theory/mathematics of probability and statistics
- Data science soft skills: problem framing, project planning, communication, and data visualization

I'm also proficient in these areas:
- Artificial intelligence (AI) / machine learning (ML) / deep learning (DL)
- Reinforcement learning (RL), especially contextual bandit algorithms

I'm familiar with the following tools (but I'm open to learning others):
- Deep learning frameworks: PyTorch, Keras, TensorFlow
- SQL, especially Postgres and Snowflake
- dbt
- Tableau
- Programming languages: Python, R, C++
- Probabilistic programming languages: Stan, PyMC3
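For readers unfamiliar with the Bayesian A/B approach mentioned above, here is a minimal Beta-Binomial sketch with made-up counts; it is a generic illustration, not this freelancer's framework:

```python
# Minimal Bayesian A/B comparison via Beta-Binomial conjugacy.
# The conversion counts are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(42)

# Observed conversions / trials for variants A and B (hypothetical data).
conv_a, n_a = 120, 2400
conv_b, n_b = 150, 2380

# Beta(1, 1) prior -> Beta(1 + conversions, 1 + failures) posterior.
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

prob_b_beats_a = (post_b > post_a).mean()
lift = np.median(post_b / post_a - 1)
print(f"P(B > A) = {prob_b_beats_a:.3f}, median lift = {lift:.1%}")
```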
Apache Spark · Scala · Artificial Intelligence · Machine Learning · Statistics · Bash · Bayesian Statistics · dbt · SQL · Python · Tableau · R
- $80 hourly
- 5.0/5
- (7 jobs)
I have been coding for over 20 years, and have a computer science degree, a PhD, and 7 years of IT consulting experience for major Fortune 500 clients. I am AWS certified, have intermediate experience with GCP, and am familiar with Azure. I am proficient in Python, SQL, TypeScript, Bash, Terraform, Java, React Native, and other languages. I have worked with hundreds of APIs, including Google Sheets, OpenAI, GitHub, and many more.
Apache Spark · ChatGPT · GPT-4 · Amazon S3 · Data Science · Scripting · Google Cloud Platform · Amazon Web Services · JavaScript · Bash · Terraform · API Development · AWS CloudFormation · Google Sheets · Python
- $110 hourly
- 5.0/5
- (4 jobs)
Professional summary:
Big data and analytics enthusiast and permanent learner, with about 18 years of experience in data analysis and research in experimental particle physics and 10 years of data science experience in industrial settings (advertising, automotive, supply chain, energy & utility, and consulting). Co-author of many software packages in experimental particle physics and industry. Leader of a few algorithmic and physics research groups and data science groups in industry. Supervised many undergraduate/PhD students, data scientists, and interns in various projects. Delivery of end-to-end ML services in business companies using on-premise and cloud technologies. Primary author of more than 30 papers published in major peer-reviewed physics journals with application of machine learning algorithms in physics experiments and industrial environments: inspirehep.net/author/profile/D.V.Bandurin.1

Business website: solveum.ai
A few projects have been either delivered or are in progress on Upwork.

Skills:
– Programming in Python, R, C++, Scala, Fortran, MATLAB
– SQL (incl. Postgres, Redshift, Snowflake), NoSQL (Mongo, Redis, BigQuery, Cassandra, Neo4j, Elasticsearch)
– Big data processing using Hadoop, Databricks, Spark, Hive, Impala
– Machine learning using scikit-learn, MLlib, MLflow, TensorFlow, Keras, PyTorch
– Distributed deep learning using Dask, Ray, Horovod
– Reinforcement learning using RLlib, Ray, COACH, OpenAI Gym
– Natural language processing (incl. Gensim/NLTK/spaCy; GloVe/Word2Vec/fastText/BERT, etc.)
– Computer vision (incl. OpenCV, OCR)
– Azure Cloud (Databricks, Delta Lake, Azure ML, Synapse Analytics, Azure IoT Hub, IoT Edge, Functions)
– AWS Cloud (RDS, Amazon S3, EC2 & ECR, Elastic Beanstalk, Lambda, SageMaker, etc.)
– Google Cloud (Vertex AI, BigQuery, Data Studio, Kubeflow, AutoML)
– IBM Watson (audio and text modeling, transcription services)
– Data visualization (Tableau, Power BI, QuickSight, Python & R libraries, e.g. Plotly, Dash, Shiny)

Recommendations: see dmitrybandurin/details/recommendations/ on LinkedIn.
Apache Spark · Particle Physics · Microsoft Azure · Apache Hadoop · Cloud Computing · Analytics · Apache Hive · Amazon Web Services · Big Data · Artificial Intelligence · Cloudera · Machine Learning Model · C++ · Apache Spark MLlib · Computer Vision
- $100 hourly
- 5.0/5
- (5 jobs)
One of my notable achievements was adding $1M of revenue for a national energy company: I developed a machine learning model to determine churn for their customers, allowing for more "what-if" pricing analysis. Additionally, I have experience researching and fine-tuning large language models for implementation into applications, as well as using machine learning and NLP methods to predict candidates' likelihood of passing resume screening.

As a Data Science Consultant with experience fine-tuning large language models including GPT-3, I have a strong track record of delivering data science solutions that add value for my clients. I have a deep understanding of data science libraries such as Pandas, NumPy, Scikit-learn, and XGBoost, and am proficient in Python, SQL, R, and PySpark. I hold a Master of Science in Computer Science with a concentration in Data Analytics from Boston University, where I completed relevant coursework in data mining, data visualization, database management, web analytics, and software engineering. I also have a background in economics from Colgate University, where I earned a degree in Computer Science.

Overall, my skills and accomplishments demonstrate that I am a skilled and experienced data science professional with a passion for delivering results for my clients. I am confident that I can apply my skills and experience to any data science project and help my clients achieve their goals.
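As a generic illustration of the churn-modeling work described above (not the actual model), here is a minimal scikit-learn sketch on synthetic data:

```python
# Minimal churn-classification sketch; features and labels are synthetic
# placeholders standing in for real customer data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 4))      # e.g. usage, tenure, price, support calls
churn_logit = 1.2 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=5_000)
y = (churn_logit > 0.5).astype(int)  # synthetic churn labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

# Churn probabilities would feed "what-if" pricing analysis downstream.
probs = model.predict_proba(X_te)[:, 1]
print(f"holdout AUC: {roc_auc_score(y_te, probs):.3f}")
```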
Apache Spark · Large Language Model · Data Analytics · Microsoft Excel · Data Engineering · Data Mining · Data Visualization · GPT-3 · SQL · Natural Language Processing · Machine Learning Model · Machine Learning · Python · Databricks Platform · Data Science
- $250 hourly
- 5.0/5
- (1 job)
Daniel has deep experience combining data science and machine learning with business strategy, operations, and marketing. He has both leadership and hands-on coding experience developing and applying machine learning/predictive analytics, optimization, data visualization, ETL automation, and other data tasks and techniques across a wide array of business functions and industries. He has successfully built advanced analytics/data science teams from the ground up in various industries.

Currently he is the founder and principal data scientist at Tensor Data Scientists (tensords.com), a boutique data science services firm. Previously, he built and led data science and analytics teams at Red Oak Sourcing (a joint venture of CVS and Cardinal Health), WP&C, J.Jill, and L.E.K. Consulting. He also led data science projects for IBM's Advanced Analytics & Optimization consulting group, Hertz, and Toys"R"Us. He holds a Master of Science in Data Science from UT Austin and an MBA from Carnegie Mellon's Tepper School of Business.

Expertise: Data Science, Machine Learning, Visualization, Databases, Automation, Software & Programming Languages, Data Strategy, Training, General Business

Specific Skills:
* Machine Learning Algorithms
* Deep Learning & AI
* Data Visualization
* Statistical Analysis
* Database Design
* ETL, RPA, and Data Process Automation
* Python (numpy, scipy, scikit-learn, matplotlib, pytorch, etc.)
* R (rtidy, ggplot2, etc.)
* SQL (MS, Oracle, Spark, Snowflake, etc.)
* Tableau
* Excel with VBA
* Team Building & Management
* Executive Presentations, Client Management & Selling Ideas
* Assessing Talent & Hiring
* Project Leadership & Management
* Training & Mentoring
Apache Spark · Artificial Intelligence · A/B Testing · Analytics · Database Design · Data Visualization · ETL · SQL · Data Science · R · Deep Learning · PyTorch · Python · Machine Learning · Tableau
- $70 hourly
- 5.0/5
- (3 jobs)
Are you in search of a proficient Data Engineer or Analyst who can navigate the complexities of data pipelines, from initial debugging to creating insightful visualizations? I'm Owais, here to turn your data challenges into actionable insights.

Why Partner with Me?
Bespoke Data Solutions: Tailored data engineering and analysis services that meet your unique business objectives.
End-to-End Pipeline Expertise: From data acquisition and cleaning to sophisticated analysis and visualization, leveraging tools like Python, SQL, dbt, and more.
E-Commerce Data Mastery: Extensive experience handling complex e-commerce datasets, ensuring your data not only informs but drives growth.
Collaborative Success: Proven track record of working seamlessly with both upstream and downstream teams, ensuring smooth project execution.

Professional Snapshot:
Since joining HP in January 2022 as a Data Engineer, I've spearheaded projects that process millions of records daily, integrating robust data management practices with SQL, Python, and cloud technologies (AWS & Azure). This role has sharpened my skills in:
- Developing large-scale ETL processes and data pipelines (see the PySpark sketch below)
- Performing deep dives into data analysis and visualization, primarily using Python, Pandas, and PySpark
- Ensuring data integrity through comprehensive error analysis, debugging, and monitoring
- Empowering teams with data-driven insights, thanks to advanced analytics and machine learning techniques

What I Offer:
Free Consultation: Let's discuss how I can support your project or long-term data strategy.
Adaptable Expertise: Whether it's enhancing data pipeline efficiency, conducting error analysis, or visualizing complex datasets, I offer the flexibility and expertise to support diverse data needs.

Ready to Transform Your Data into Insights?
I'm committed to delivering exceptional value and building enduring partnerships. For a detailed discussion of how I can assist your project or team, please reach out via Upwork or email for a free consultation. Thank you for considering my expertise for your data engineering and analysis needs.
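A minimal PySpark sketch of the clean-transform-load pattern referenced in the profile above; the paths, schema, and business rules are hypothetical:

```python
# Minimal ETL sketch: read raw data, clean it, aggregate, write curated output.
# Source/target paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

orders = spark.read.json("s3://raw-zone/orders/")  # hypothetical source

cleaned = (
    orders
    .dropDuplicates(["order_id"])                        # dedupe on key
    .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize types
    .filter(F.col("amount") > 0)                         # drop bad records
)

daily = (
    cleaned
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
)

# Load: partitioned Parquet for downstream analytics/BI.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://curated-zone/daily_orders/"
)
```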
Apache Spark · Tableau · ETL · Integration Testing · Machine Learning · Amazon Web Services · Microsoft Azure · Data Analysis · Data Visualization · dbt · Databricks Platform · PySpark · Python · Data Engineering · SQL
- $90 hourly
- 5.0/5
- (5 jobs)
I am a Data Scientist passionate about the potential of data science and IoT to solve problems. My skills lie at the intersection of IoT, data science, and data engineering. I'm experienced in the end-to-end pipeline of telematics data: I build dashboards, script automations, and train predictive machine learning algorithms. I have worked primarily with Python, SQL, Dataiku, Hadoop, and Apache data science tools (NiFi, Kafka, Hive, Impala, HBase, etc.). As an expert in IoT data, I can quickly learn new tools to solve IoT-related problems. I follow CRISP-DM for data science projects and can confidently move a project from start to finish with the business goal in mind. I have worked fully remotely for the last 5 years; regular communication is incredibly important to me!

A little more about my background: I have been working in the information technology field for over a decade, with much of that time spent managing data. I began at an insurance company and helped migrate a FoxPro database to SQL. From there, I took a role as a lead system administrator managing HP-UX, Red Hat Linux, and Windows systems in a National Guard data center. Meanwhile, I also served as a Signal Officer for the National Guard, responsible for planning and establishing wireless communications in austere environments.
Apache Spark · Content Writing · Data Science · Apache Spark MLlib · Machine Learning · Data Analytics · Technical Project Management · Data Visualization · Communications · API Development · Database Management System · Internet of Things · Data Scraping · Python · Data Engineering
- $120 hourly
- 5.0/5
- (2 jobs)
I'm an enthusiastic Data Engineer who is deeply interested in architecting, building, scaling, and optimizing data models, data pipelines, data lakes, and data warehouses. I'm an expert in Apache Spark batch processing for handling terabytes of data. I'm always looking toward automation, self-service, and improving productivity for both developers and products. I believe in transparency and over-communication rather than staying silent. Hire me today, or simply send me an invite, and we can discuss your projects.
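As a generic illustration of Spark batch processing at scale, here is a minimal sketch of two common optimizations: broadcasting a small dimension table and partitioning output by date. The table paths and column names are assumptions, not this freelancer's code:

```python
# Minimal Spark batch sketch: broadcast join + partitioned output.
# Paths and columns are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-sketch").getOrCreate()

events = spark.read.parquet("s3://lake/events/")        # large fact table
countries = spark.read.parquet("s3://lake/countries/")  # small dimension

# Broadcast the small side so the join avoids shuffling the large table.
enriched = events.join(F.broadcast(countries), on="country_code")

# Repartition by date before writing so each output partition holds one
# day's files, keeping file counts and sizes predictable.
(enriched
    .repartition("event_date")
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://lake/enriched_events/"))
```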
Apache Spark · Web Scraping · Automation · Unix Shell · ETL · Data Processing · Microsoft Azure · Big Data · Snowflake · Databricks Platform · PySpark · Apache Airflow · Data Engineering · Python
- $55 hourly
- 5.0/5
- (1 job)
With over a decade of experience as a data engineer / software engineer, I've been heavily engaged in ETL pipelines, automation scripts, microservices, APIs, web portals, and more.

🚀 𝐌𝐲 𝐬𝐞𝐫𝐯𝐢𝐜𝐞𝐬 🚀
𝑷𝒓𝒐𝒈𝒓𝒂𝒎𝒎𝒊𝒏𝒈 𝑳𝒂𝒏𝒈𝒖𝒂𝒈𝒆: Python | JavaScript | TypeScript | SQL
𝑫𝒂𝒕𝒂 𝑬𝒏𝒈𝒊𝒏𝒆𝒆𝒓𝒊𝒏𝒈: Apache Spark | Apache Hadoop | Apache Kafka | Databricks | Snowflake | AWS Glue | Dagster | Apache Airflow | DBT
𝑫𝒂𝒕𝒂𝒃𝒂𝒔𝒆: PostgreSQL | MsSQL | MySQL | MongoDB | DynamoDB
𝑫𝒂𝒕𝒂 𝑾𝒂𝒓𝒆𝒉𝒐𝒖𝒔𝒆: Google BigQuery | AWS Redshift | Snowflake
𝑫𝒂𝒕𝒂 Visualization: Looker Studio | Power BI | Grafana
𝑫𝒆𝒗𝑶𝒑𝒔: Docker | Kubernetes | CI/CD
𝑪𝒍𝒐𝒖𝒅 𝑺𝒆𝒓𝒗𝒊𝒄𝒆𝒔: Amazon Web Services | Google Cloud Platform | Azure

💘 𝐖𝐡𝐲 𝐡𝐢𝐫𝐞 𝐦𝐞? 💘
- High quality
- On-time delivery
- Active communication
- Optimal products

Feel free to reach out if you need my help.
Apache Spark · Data Migration · Data Warehousing · Data Visualization · Data Analysis · BigQuery · Apache Kafka · Big Data · ETL Pipeline · Data Engineering · JavaScript · API Integration · SQL · Python · Web Application
- $60 hourly
- 5.0/5
- (5 jobs)
Machine Learning and Artificial Intelligence engineer with a proven track record of innovation, including five filed patents and multiple published works. Specializing in developing advanced reinforcement learning, deep learning, computer vision, and natural language processing (NLP) techniques to optimize model performance and robustness. Expertise in leveraging cutting-edge technologies such as Large Language Models (LLMs), Generative AI (GenAI), Conversational AI, and cloud-based tools like AWS SageMaker for scalable solutions.

Proficient in Python, TensorFlow, PyTorch, Scikit-Learn, and Apache Spark, with hands-on experience in data manipulation using Pandas and NumPy and containerization technologies like Docker and Kubernetes. Skilled in distributed computing with Hadoop, and adept at using Agile and Scrum methodologies for project management and delivery. Strong version-control proficiency with Git, ensuring smooth collaboration across teams.

Experienced in working within fast-paced, cross-functional teams, driving business growth by creating innovative, high-quality solutions. Expert in project and engagement management, working closely with stakeholders to understand business needs and translate them into actionable technical solutions. Known for the ability to simplify complex technical concepts into clear and accessible explanations, ensuring quality, maintainability, and timely delivery of results.

💡 Technical expertise
Machine Learning · Artificial Intelligence · Deep Learning · Computer Vision · LLM · NLP · Generative AI · Conversational AI · Python · Pandas · NumPy · TensorFlow · PyTorch · Scikit-Learn · Apache Spark · Hadoop · AWS SageMaker · Docker · Kubernetes · Agile · Scrum · Git

👨💻 Achievements
Patent Holder: Filed 5 patents for innovative AI and machine learning technologies, contributing to cutting-edge advancements in reinforcement learning, computer vision, and NLP.
AI Model Optimization: Successfully developed and deployed state-of-the-art AI models that improved model performance and robustness by over 30%, leading to enhanced predictive accuracy and business impact.
Large-Scale AI Solutions: Designed and implemented scalable machine learning solutions using AWS SageMaker, TensorFlow, and PyTorch, handling datasets with millions of records and reducing processing time by 40%.
Cross-Functional Leadership: Led interdisciplinary teams in the design and deployment of computer vision applications that increased operational efficiency by 25% for a major client.
Generative AI Applications: Spearheaded the development of Conversational AI solutions that enhanced user engagement, resulting in a 15% increase in customer satisfaction and retention.
Cloud-Based Architecture: Architected and deployed machine learning models in production environments using Docker, Kubernetes, and AWS, ensuring robust, scalable, and cost-efficient cloud solutions.
Agile Project Management: Managed end-to-end AI/ML project life cycles using Agile and Scrum methodologies, delivering projects 20% faster while maintaining high standards of quality and client satisfaction.
Team Mentorship: Mentored junior engineers and data scientists, fostering a culture of collaboration and knowledge-sharing, leading to improved team performance and technical innovation.
Data Pipeline Optimization: Built and optimized data pipelines using Apache Spark and Hadoop, improving data processing speed by 50% and reducing operational overhead.
Apache Spark · Conversational AI · Docker · Amazon SageMaker · PyTorch · TensorFlow · pandas · Python · Generative AI · Natural Language Processing · Large Language Model · Computer Vision · Deep Learning · Artificial Intelligence · Machine Learning
How hiring on Upwork works
1. Post a job
Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.
2. Talent comes to you
Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.
3. Collaborate easily
Use Upwork to chat or video call, share files, and track project progress right from the app.
4. Payment simplified
Receive invoices and make payments through Upwork. Only pay for work you authorize.