Hire the best Apache Spark Engineers in Texas

Check out Apache Spark Engineers in Texas with the skills you need for your next job.
  • $175 hourly
    Mr. Joshua B. Seagroves is a seasoned professional who has served as an Enterprise Architect and Senior Data Engineer for multiple Fortune 100 companies. With a successful track record as a startup founder and CTO, he specializes in the strategic design, development, and implementation of advanced technology systems. Throughout his career he has architected and delivered cutting-edge solutions, particularly in data engineering and data science, and has spearheaded the implementation of such systems and applications for a diverse range of clients. In his current role he contributes to prototyping and research in data engineering and data science, developing operational software for critical mission systems. Leveraging his extensive background in architecture and software modeling methodologies, he has led and collaborated with multidisciplinary teams, integrating distributed computing technologies including Hadoop, NiFi, HBase, Accumulo, and MongoDB. This depth of hands-on experience with advanced technology systems and big data makes him a valuable asset to any organization.
    Apache Spark
    YARN
    Apache Hadoop
    Big Data
    Apache Zookeeper
    TensorFlow
    Apache NiFi
    Apache Kafka
    Artificial Neural Network
    Artificial Intelligence
  • $125 hourly
    🏆 Achieved Top-Rated Freelancer status (Top 10%) with a proven track record of success. Past experience: Twitter, Spotify, and PwC. I am a certified data engineer and software developer with 5+ years of experience, familiar with almost all major tech stacks in data science/engineering and app development. If you need support on your projects, please do get in touch.
    Programming Languages: Python | Java | Scala | C++ | Rust | SQL | Bash
    Big Data: Airflow | Hadoop | MapReduce | Hive | Spark | Iceberg | Presto | Trino | Scio | Databricks
    Cloud: GCP | AWS | Azure | Cloudera
    Backend: Spring Boot | FastAPI | Flask
    AI/ML: PyTorch | ChatGPT | Kubeflow | ONNX | spaCy | Vertex AI
    Streaming: Apache Beam | Apache Flink | Apache Kafka | Spark Streaming
    SQL Databases: MSSQL | PostgreSQL | MySQL | BigQuery | Snowflake | Redshift | Teradata
    NoSQL Databases: Bigtable | Cassandra | HBase | MongoDB | Elasticsearch
    DevOps: Terraform | Docker | Git | Kubernetes | Linux | GitHub Actions | Jenkins | GitLab
    Apache Spark
    Java
    Apache Hadoop
    Amazon Web Services
    Snowflake
    Microsoft Azure
    Google Cloud Platform
    Database Management
    Linux
    ETL
    API Integration
    Scala
    SQL
    Python
  • $80 hourly
    I am a versatile professional with extensive expertise in MLOps, machine learning engineering, data engineering, and data science. I specialize in building and deploying scalable AI solutions, automating ML pipelines, and turning data into actionable insights that drive business value. With hands-on experience across industries including banking, freelancing platforms, and government services, I have a proven track record of delivering robust, production-grade machine learning systems and end-to-end AI solutions.
    1. What I Offer
    a. MLOps & DevOps: Deploying scalable ML models with Docker, Kubernetes, and Terraform; building CI/CD pipelines with Jenkins, Azure DevOps, and GitHub for seamless ML lifecycle management; managing cloud-native infrastructure on AWS and Azure, integrating services such as SageMaker, Azure ML, AKS, and EKS.
    b. Machine Learning: Predictive models with LightGBM, XGBoost, and CatBoost for tasks such as job matching, customer analytics, and time-series forecasting; LLM-based solutions with OpenAI GPT-3/4 for NLP tasks such as resume matching, skill extraction, and market-insight generation; high-performing recommendation systems, graph-based inference models, and advanced computer vision pipelines.
    c. Data Engineering: Real-time and batch data pipelines with Databricks, Azure Data Factory, and AWS services; integrating and transforming data from diverse sources such as APIs, databases, and unstructured streams.
    d. AI-Powered Insights & Analytics: Knowledge graphs for entity linking and profile enrichment using NLP and embedding-based similarity models; digital twins for city-scale social-issue monitoring and policy simulation using time-series forecasting and knowledge graphs; actionable insights via sentiment analysis, entity recognition, and topic modeling.
    2. Key Achievements
    Deployed scalable machine learning models for Citibank, enhancing predictive capabilities for financial operations. Developed a job connection prediction model for a leading freelancing platform, improving application and hiring rates with real-time optimization. Built a state-of-the-art face similarity search pipeline that handles cross-domain challenges with high accuracy and scalability. Delivered a knowledge-graph-based linking engine for entity matching, streamlining data enrichment for large-scale internet traffic.
    3. Technical Skills
    Programming & Frameworks: Python, PyTorch, TensorFlow, Spark, Flask, FastAPI. Cloud & DevOps: AWS (SageMaker, EKS, Rekognition), Azure (AKS, Functions, ML), Terraform, Docker, Kubernetes. Machine Learning Tools: LightGBM, XGBoost, CatBoost, OpenAI GPT-3/4, MLflow. Data Tools: SQL, Elasticsearch, Databricks, Ground Truth.
    4. Why Choose Me?
    I bring a results-driven approach to every project, ensuring solutions are not only technically sound but also aligned with business goals. Whether you need end-to-end ML deployment, advanced analytics, or AI-driven insights, I can help you unlock the full potential of your data.
    Apache Spark
    Amazon Web Services
    Microsoft Azure
    MLOps
    Docker
    LLM Prompt
    Microsoft Power BI
    Elasticsearch
    MongoDB
    SQL
    Machine Learning
    Python
    Natural Language Processing
    Deep Learning
    Python Scikit-Learn
  • $100 hourly
    — TOP RATED PLUS freelancer on Upwork — EXPERT-VETTED freelancer (among the top 1% of Upwork freelancers) — Full-Stack Engineer — Data Engineer
    ✅ AWS: infrastructure, DevOps, architecture, and services (EC2, ECS, Fargate, S3, Lambda, DynamoDB, RDS, Elastic Beanstalk, AWS CDK, AWS CloudFormation, etc.), serverless application development, AWS Glue, AWS EMR
    ✅ Frontend: HTML, CSS, Bootstrap, JavaScript, React, Angular
    ✅ Backend: Java, Spring Boot, Hibernate, JPA, microservices, Express.js, Node.js
    ✅ Content Management: WordPress, Wix, Squarespace
    ✅ Big Data: Apache Spark, ETL, MapReduce, Scala, HDFS, Hive, Apache NiFi
    ✅ Databases: MySQL, Oracle, SQL Server, DynamoDB
    ✅ Build/Deploy: Maven, Gradle, Git, SVN, Jenkins, QuickBuild, Ansible, AWS CodePipeline, CircleCI
    As a highly skilled and experienced lead software engineer, I bring a wealth of knowledge in Java, Spring, Spring Boot, big data (MapReduce, Spark), React, graphics and logo design, email signatures, flyers, web development (HTML, CSS, Bootstrap, JavaScript and its frameworks, PHP, Laravel), responsive web page development, WordPress design, and testing. With over 11 years in the field, I have a deep understanding of Java, Spring Boot, and microservices, as well as Java EE technologies such as JSP, JSF, Servlet, EJB, JMS, JDBC, and JPA. I am also well versed in Spring technologies including MVC, IoC, Security, Boot, Data, and transactions. I have expertise in REST and SOAP web services and am proficient in web development frameworks such as WordPress, PHP, Laravel, and CodeIgniter, as well as JavaScript, jQuery, React, AngularJS, Vue.js, Node.js, C#, and ASP.NET MVC. In big data, I have worked with MapReduce, Spark, Scala, HDFS, Hive, and Apache NiFi, and I am familiar with cloud technologies such as PCF, Azure, and Docker. I am also proficient in databases including MySQL, SQL Server, and Oracle, and with build tools such as Maven, Gradle, Git, SVN, Jenkins, QuickBuild, and Ansible.
    Apache Spark
    Database
    WordPress
    Cloud Computing
    Spring Framework
    Data Engineering
    NoSQL Database
    React
    Serverless Stack
    Solution Architecture Consultation
    Spring Boot
    DevOps
    Microservice
    AWS Fargate
    AWS CloudFormation
    Java
    CI/CD
    Amazon ECS
    Containerization
  • $50 hourly
    🏅 Expert-Vetted | 🏆 100% Job Success Rate | ⭐ 5-Star Ratings | 💎 $1 Million+ Earnings | 🕛 Full-Time Availability | ✅ Verifiable Projects | ❇️ 16,000+ Hours
    🏆 Winner of two presidential awards in the country of operations 🏆 Member of the Chamber of Commerce in the country of operations 🏆 Work featured on TV shows and blogs 🏆 Regularly conducts IT bootcamps in the country of operations
    🚀 Streamlining business operations through data empowerment! 🚀 9+ years of experience | Automation Specialist | Data-Driven Operations | Web Development | Data Science | Data Management | AI/ML Implementations | Deep Learning Solutions | Process Optimization | Business Performance Enhancement
    👉 Big Data: I help business owners and their teams unlock the true value of their historical data, aligning sales, marketing, and product teams with the demand and supply dynamics of both the market and the company's product/service offerings.
    ✅ Big data tools (e.g., Apache Hadoop, Apache Spark) ✅ ETL (extract, transform, load) implementation ✅ Data retention policies ✅ Distributed computing ✅ Hadoop cluster management ✅ Stream processing systems (e.g., Apache Kafka) ✅ NoSQL databases (e.g., MongoDB, DynamoDB) ✅ ML toolkits (e.g., TensorFlow) ✅ Lambda architecture
    👉 Data Science: I help business owners and their teams build robust predictive analytics systems that better align budgeted with actual year-end figures, reducing the delta, increasing shareholder alignment, strengthening the company's going-concern position for stakeholders, and improving share value in the long run.
    ⚡ Selecting features and building and optimizing classifiers with ML techniques ⚡ Data mining ⚡ Data collection (Apache Kafka, Logstash) ⚡ Third-party data integration (e.g., APIs, Selenium, Postman) ⚡ Data integrity verification ⚡ Anomaly detection systems ⚡ Query languages and databases (e.g., SQL, MySQL, PostgreSQL, MongoDB, BigQuery) ⚡ Scripting (e.g., Python) ⚡ Statistical analysis ⚡ Effective communication
    👉 Core Technology:
    ✅ Frontend: HTML, CSS, JavaScript (ES6, ES7, TypeScript), JavaScript frameworks (e.g., React, Next.js, Angular), GraphQL clients (e.g., Apollo), CSS preprocessors (e.g., SASS, LESS), charting libraries (e.g., D3, Highcharts, Recharts), state management (e.g., Redux, Redux Saga, Redux Toolkit), RESTful APIs
    ✅ Backend: Node.js, Nest.js, Express, Python
    ✅ Cloud platforms: AWS, Google Cloud, IBM Cloud
    ✅ Testing frameworks: Jest, Mocha ✅ CI/CD tools: GitHub Actions, Jenkins, AWS CodePipeline ✅ Version control: Git ✅ MERN stack: React.js, Node.js, MongoDB, Express.js ✅ CMS: WordPress, WooCommerce ✅ Deployment: Docker, Heroku, AWS, Azure ✅ UI/UX: Material Design, Figma, Miro, HTML5, CSS, JavaScript, XML ✅ AI and machine learning: AI assistants and chatbots, ML models, predictive analytics, sentiment analysis, NLP, audio/video/speech-to-text ✅ Data visualization: Power BI, Tableau
    Driven by a passion for innovation and backed by a strong foundation in cutting-edge technologies, I'm committed to propelling your business toward success. Let's harness the power of tech to revolutionize your operations! 💡🌟
    Apache Spark
    Elasticsearch
    Apache Kafka
    Data Modeling
    Data Integration
    API Integration
    ETL Pipeline
    BigQuery
    Artificial Intelligence
    Django
    Data Analysis
    Data Mining
    Data Science
    Machine Learning
    Python
  • $50 hourly
    Hands-on design and development experience with the Hadoop ecosystem (Hadoop, HBase, Pig, Hive, and MapReduce) and related big data technologies including Scala, Spark, Sqoop, Flume, Kafka, and Python, plus strong ETL and PostgreSQL experience. Strong background in all aspects of software engineering, with skills in parallel data processing, data flows, REST APIs, JSON, XML, and microservice architecture.
    * Experience with the Cloudera stack, Hortonworks, and Amazon EMR
    * Strong experience using Excel, SQL, SAS, Python, and R to extract and analyze data based on business needs
    * Strong experience in data analysis, data migration, data cleansing, transformation, integration, import, and export using ETL tools such as Ab Initio and Informatica PowerCenter
    * Strong hands-on programming/scripting experience with UNIX shell
    * An excellent team player who is technically strong
    Apache Spark
    Amazon S3
    Data Warehousing & ETL Software
    Big Data
    Amazon Web Services
    Hive
    Data Science
    ETL
    Data Lake
    Data Cleaning
    Apache Hive
    Apache Hadoop
    Apache Kafka
    Data Migration
    ETL Pipeline
  • $60 hourly
    * I'm a data scientist with 9 years of experience analyzing and modeling data, with expertise in quantitative research and statistical programming. My track record includes statistical and predictive modeling of large-scale cross-sectional and time-series data. If you need help with data cleaning, data wrangling, data analysis, visualization, result interpretation, or presentation, I can help!
    * I'm proficient in Python, R, MATLAB, and SQL
    * I'm comfortable working at any stage of a project, including managing a project from start to finish
    * Regular communication is really important to me, so let’s keep in touch!
    Apache Spark
    Statistical Analysis
    Data Wrangling
    Data Analysis
    Analytical Presentation
    MATLAB
    Data Cleaning
    Feature Extraction
    Data Modeling
    Python
    Random Forest
    Machine Learning
    Linear Regression
    Data Visualization
    R
  • $30 hourly
    I am a professional software developer and analyst. I got my MS from Georgia Tech in computer science and a BS from UT Austin in math. I mainly use Python, SQL, R, and Java.
    Apache Spark
    Regression Analysis
    Statistical Analysis
    R
    HTML
    TypeScript
    Snowflake
    Tableau
    Machine Learning
    Data Analysis
    SQL
    JavaScript
    Git
    Java
    Python
  • $60 hourly
    Self-motivated software engineer with 13 years of experience designing, developing, testing, and implementing applications for banking, health insurance, PBM, and IT organizations, mainly using Java, Python, and React.
    TECHNICAL COMPETENCIES:
    -- Languages: Java, JavaScript, Python
    -- J2EE Technologies: Spring Boot, Spring Framework, Spring Data, Spring JPA, Spring Scheduler, Hibernate, JSP, JDBC, RESTful, SOAP, Spring Cloud
    -- JavaScript: ReactJS, Redux, React Router, Axios, jQuery, webpack, Babel
    -- Cloud Technologies: AWS (Lambda, S3, Route 53, API Gateway, SNS, SQS, EMR, EC2), Pivotal Cloud Foundry (PCF)
    -- Web: HTML, CSS, Bootstrap, XML, XPath, JSON, YAML, shell and batch scripting, XQuery
    -- Databases: Oracle, MySQL, SQL Server, PostgreSQL, MongoDB, Teradata, MariaDB, DynamoDB, DB2
    -- Application Servers: WebSphere, Apache Tomcat, JBoss, GlassFish, WebLogic
    -- IDEs: MyEclipse, RAD, Eclipse, NetBeans, Spring Tool Suite, IntelliJ, WebStorm, VS Code, PyCharm
    -- SDLC: Agile-Scrum, Waterfall, Kanban
    -- Tools: Apache Spark, Docker, JIRA, SVN, GitHub, Jenkins, XL Release, JFrog, Maven, Ant, Ansible, CyberArk, Checkmarx, Splunk, New Relic, LDAP, Terraform
    -- Platforms: Windows, Linux, Unix, AIX
    -- Testing Tools: Selenium, JUnit, Jest, Mockito, Jasmine
    -- Networks: TCP/IP, DNS, proxy, DHCP server, LAN/WAN, Cisco routers and switches
    Apache Spark
    GitHub
    Jenkins
    Jira
    Teradata
    IBM WebSphere
    Ab Initio
    Terraform
    Docker
    Oracle
    Amazon DynamoDB
    AWS Lambda
    React
    Java
    Python
  • $49 hourly
    I’m a developer experienced in building websites for small and medium-sized businesses. Whether you’re trying to win work, list your services, or create a new online store, I can help. Backend, frontend, DevOps, data engineering, event sourcing, TDD - we can work it out by pairing.
    Apache Spark
    CI/CD
    Azure App Service
    Apache Avro
    Apache Kafka
    .NET Core
    pandas
    React
    Azure Cosmos DB
    MongoDB
    GraphQL
    Azure DevOps
    SQL
    Node.js
  • $150 hourly
    8 Years of Expertise in Big Data, Cloud Technologies, and Advanced Analytics
    With eight years of experience across data engineering, machine learning, and cloud technologies, I deliver innovative, scalable solutions tailored to diverse business needs. My expertise spans a wide range of tools, frameworks, and platforms, enabling me to manage complex data ecosystems and drive impactful insights.
    Core Technical Proficiencies:
    Programming and Scripting:
    > Java, Scala, C, C#, Python, and JavaScript for building robust, scalable applications.
    > Advanced scripting with Bash, shell, and tools like awk for process automation.
    Data Storage and Processing:
    > Distributed storage: Apache Hadoop (HDFS), Amazon S3, Azure Blob, and Google Cloud Storage.
    > Processing frameworks: Apache Spark, Flink, MapReduce, and Google Dataflow for batch and stream processing.
    > Data lakehouses: Snowflake, Databricks, Hive, and Delta Lake for modern analytics architectures.
    Databases and Data Models:
    > SQL databases: PostgreSQL, MySQL, Oracle, and MSSQL.
    > NoSQL databases: MongoDB, Cassandra, DynamoDB, and Aerospike.
    > Specialized technologies: Amazon QLDB (ledger database), Milvus and Chroma (vector databases), and graph solutions such as Neo4j and JanusGraph.
    Real-Time Data and Messaging Systems:
    > Apache Kafka, RabbitMQ, and Amazon Kinesis for real-time data pipelines.
    > Distributed query engines such as Presto, Trino, and Athena for high-performance analytics.
    Data Integration and Transformation:
    > Logstash, Fluentd, and Talend for data ingestion and transformation.
    > File formats: Parquet, Avro, Delta, JSON, and more.
    Machine Learning and Data Science:
    > Feature engineering, dimensionality reduction, and model optimization.
    > Algorithms: Random Forest, logistic regression, SVM, and more.
    > Tools and platforms: MLflow, AWS SageMaker, and Jupyter for model development, training, and deployment.
    Visualization and Reporting:
    > Dashboarding with Tableau, Kibana, AWS QuickSight, and Looker for actionable insights and analytics reporting.
    Cloud and DevOps:
    > Cloud platforms: AWS (Lambda, EMR, ECS), Google Cloud Platform, and Azure.
    > DevOps tools: Docker, Kubernetes, Terraform, and ArgoCD for infrastructure management and deployment.
    Workflow Orchestration and Automation:
    > Airflow, Prefect, and Oozie for efficient pipeline automation.
    Advanced AI and NLP Solutions:
    > OpenAI (GPT-4, LLMs), AWS Transcribe, and AWS Translate for AI-driven applications.
    > Retrieval-augmented generation (RAG) and AI-enabled data insights.
    Why Choose Me?
    > Broad expertise: comprehensive experience across modern data ecosystems ensures tailored, end-to-end solutions.
    > Scalable solutions: a proven record of delivering efficient systems for real-world business challenges.
    > Cutting-edge tools: up-to-date, hands-on experience with the latest in big data, AI, and cloud technologies.
    > Collaborative approach: a strong communicator and team player, committed to achieving your project’s goals efficiently.
    Let’s collaborate to turn your data challenges into opportunities. Get in touch to discuss your project today!
    Apache Spark
    Architecture
    Apache Kafka
    Generative AI
    Node.js
    Java
    Python
    SQL
    NoSQL Database
    dbt
    ETL Pipeline
    Machine Learning
    Artificial Intelligence
    Apache Flink
    Databricks Platform
  • $75 hourly
    Senior Data Engineer with 14 years of extensive experience designing and implementing large-scale data pipelines and cloud-based solutions. Expertise spans AWS, Azure, and Snowflake, with a proven track record of building and optimizing data platforms. Proficient in big data technologies, including Apache Spark and Hadoop, with deep knowledge of ETL processes, data modeling, and performance tuning.
    Apache Spark
    CSV
    JSON
    ORC
    Apache Avro
    Parquet
    Apache Hadoop
    Big Data File Format
    Big Data
    PySpark
    Oracle
    Microsoft SQL Server
    Informatica Cloud
    Databricks Platform
    Snowflake

How hiring on Upwork works

1. Post a job

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.

Trusted by 5M+ businesses