Hire the best Apache Hive Developers in India

Check out Apache Hive Developers in India with the skills you need for your next job.
  • $35 hourly
    I have 18+ years of experience in software development across the Telecom, Banking, and Healthcare domains. My primary skill set covers Big Data ecosystems (Apache Spark, Hive, MapReduce, Cassandra), Scala, Core Java, Python, and C++. I am well versed in designing and implementing Big Data solutions, ETL and data pipelines, and serverless, event-driven architectures on Google Cloud Platform (GCP) and Cloudera Hadoop 5.5. I like to work with organizations to develop sustainable, scalable, and modern data-oriented software systems.
    - Keen eye for the scalability and sustainability of a solution
    - Can come up with maintainable, well-structured object-oriented designs quickly
    - Highly experienced in working seamlessly and effectively with remote teams
    - Aptitude for recognizing business requirements and solving the root cause of a problem
    - Quick to learn new technologies
    Sound experience in the following technology stacks:
    Big Data: Apache Spark, Spark Streaming, HDFS, Hadoop MapReduce, Hive, Apache Kafka, Cassandra, Google Cloud Platform (Dataproc, Cloud Storage, Cloud Functions, Datastore, Pub/Sub), Cloudera Hadoop 5.x
    Languages: Scala, Python, Java, C++, C
    Build tools: sbt, Maven
    Databases: PostgreSQL, Oracle
    Input & storage formats: CSV, XML, JSON, MongoDB, Parquet, ORC
    A brief Kafka-to-Spark streaming ingestion sketch follows the skill tags below.
    Apache Hive
    C++
    Java
    Apache Spark
    Scala
    Apache Hadoop
    Python
    Apache Cassandra
    Oracle PLSQL
    Cloudera
    Google Cloud Platform
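The bio above mentions Spark Streaming and Kafka on the ingestion side. As a minimal, illustrative sketch only (the broker address, topic name, and output paths are hypothetical placeholders, not details from this profile), a Kafka-to-storage ingestion job with PySpark Structured Streaming can look like this:
```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and
# land the raw events as Parquet. Broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "events")                       # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/tmp/raw_events")                   # placeholder sink path
    .option("checkpointLocation", "/tmp/checkpoints/raw_events")
    .start()
)
query.awaitTermination()
```
In practice the sink would usually be cloud storage (for example a GCS bucket on Dataproc) rather than a local path, and the job needs the spark-sql-kafka connector package on its classpath.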
  • $35 hourly
    Highly skilled Data Engineer with diverse experience in the following areas:
    ✅ Data analysis and ETL solution expertise
    ✅ Snowflake DB expertise (developer)
    ✅ SharePoint and OneDrive integration using the Microsoft Graph API
    ✅ Airflow workflow / DAG development
    ✅ Matillion ETL
    ✅ Talend ETL expert: integration, Java routines, data quality
    ✅ Salesforce integration
    ✅ Google Cloud Platform: Cloud Functions, Cloud Run, Dataproc, Pub/Sub, BigQuery
    ✅ AWS: S3, Lambda, EC2, Redshift
    ✅ Cloud migration: working with bulk data and generic code
    ✅ Python automation and API integration
    ✅ SQL reporting
    ✅ Data quality analysis and data governance solution architecture design
    ✅ Data validation using Great Expectations (Python tool); a short validation sketch follows the skill tags below
    P.S. Available to work US EST hours on demand.
    I have good exposure to data integration, migration, transformation, cleansing, warehouse design, SQL, functions, and procedures.
    - Databases: Snowflake, Oracle, PostgreSQL, BigQuery
    - ETL tools: Matillion, Talend Open Studio, Talend Data Fabric with Java
    - DB languages and tools: SQL, SnowSQL, dbt (Data Build Tool)
    - Workflow management tool: Airflow
    - Scripting language: Python
    - Python frameworks: Pandas, Spark, Great Expectations
    - Cloud ecosystems: AWS, GCP
    Apache Hive
    dbt
    Apache Hadoop
    Talend Open Studio
    Google Cloud Platform
    ETL
    Talend Data Integration
    Snowflake
    AWS Lambda
    API Integration
    JavaScript
    Apache Spark
    Amazon Web Services
    Python
    Apache Airflow
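The profile above lists data validation with Great Expectations. As a rough sketch only (the CSV file, column names, and expectations are hypothetical, and the call style assumes the classic pre-1.0 Great Expectations API rather than the newer Fluent API), a handful of expectations on a pandas-backed dataset can look like this:
```python
# Sketch of a data-quality check using the classic Great Expectations API.
# File name, columns, and thresholds are illustrative placeholders.
import great_expectations as ge

# read_csv returns a pandas DataFrame subclass that carries expectation methods.
orders = ge.read_csv("orders.csv")

orders.expect_column_values_to_not_be_null("order_id")
orders.expect_column_values_to_be_unique("order_id")
orders.expect_column_values_to_be_between("amount", min_value=0)

# Run all declared expectations and check the overall outcome.
results = orders.validate()
print(results.success)
```
In a production pipeline the same expectations would typically live in a suite that Airflow runs as a validation step before data is published downstream.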
  • $35 hourly
    ════ Who Am I? ════
    Hi, nice to meet you! I'm Ajay, a Tableau and SQL specialist, Business Intelligence developer, and data analyst with half a decade of experience working with data. For the last few years I've been helping companies all over the globe achieve their data goals and making friends along the way. If you're looking for someone who can understand your needs, collaboratively develop the best solution, and execute a vision, you have found the right person! Looking forward to hearing from you!
    ═════ What do I do? (Services) ═════
    ✔️ Tableau report development & maintenance
    - Pull data from SQL Server, Excel files, Hive, etc.
    - Clean and transform data
    - Model relationships
    - Calculate and test measures
    - Create and test charts and filters
    - Build user interfaces
    - Publish reports
    ✔️ SQL
    - Build out data and reporting infrastructure from the ground up using Tableau and SQL to provide real-time insight into product and business KPIs
    - Identified procedural areas of improvement through customer data, using SQL to help improve the probability of a program by 7%
    - Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a brief illustrative sketch follows the skill tags below)
    ═════ How do I work? (Method) ═════
    1️⃣ First, we need a plan; I will listen, take notes, analyze and discuss your goals and how to achieve them, and determine the costs, development phases, and time involved to deliver the solution.
    2️⃣ Clear and frequent communication; I provide frequent project updates and will be available to discuss important questions that come up along the way.
    3️⃣ Stick to the plan; I will deliver, on time, what we agreed upon. If any unforeseen delay happens, I will promptly let you know and provide a new delivery date.
    4️⃣ Deliver a high-quality product. My approach aims to deliver the most durable, secure, scalable, and extensible product possible. All development includes testing, documentation, and demo meetings.
    Apache Hive
    Python Script
    Scala
    Machine Learning
    Apache Spark
    Hive
    SQL Programming
    Business Intelligence
    Microsoft Excel
    Microsoft Power BI
    Tableau
    SQL
    Python
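The SQL section above mentions converting Hive/SQL queries into Spark transformations. The profile does this with Scala and RDDs; the sketch below shows the same idea with PySpark DataFrames purely for illustration, and the table and column names are hypothetical:
```python
# Illustrative sketch: a Hive aggregation rewritten as a Spark transformation.
# Table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-to-spark-sketch")
    .enableHiveSupport()   # read Hive tables through the shared metastore
    .getOrCreate()
)

# Original HiveQL, for reference:
#   SELECT region, SUM(amount) AS total_sales
#   FROM sales
#   WHERE year = 2023
#   GROUP BY region;

total_sales = (
    spark.table("sales")
    .filter(F.col("year") == 2023)
    .groupBy("region")
    .agg(F.sum("amount").alias("total_sales"))
)
total_sales.show()
```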
  • $30 hourly
    6+ years of experience in architecting, designing, and developing software for large, scalable distributed systems and web applications. In past roles I was responsible for end-to-end feature development for Paytm Mall (e-commerce), Paytm Smart Retail (B2B), and Paytm for Business (merchant platform). I am currently working on an in-house analytics platform for Flipkart, as Adobe Analytics no longer scales to Flipkart's volume.
    Languages: Java, Scala, Python, JS
    Technologies: Spring, Spring Boot, Apache Flink, Spark, Django, Node.js, Express, Flask
    Data: Hibernate, Hadoop, Hive, HBase, Druid, MySQL, SQLite, PostgreSQL, Elasticsearch, Redis, SQLAlchemy
    Others: Kafka, RabbitMQ, Jenkins, Kibana, Nginx, Gunicorn, Celery, Supervisor, Datadog, JIRA, Git, CI/CD, TDD
    Apache Hive
    Amazon Web Services
    Google Cloud Platform
    Java
    Big Data
    Apache Hadoop
    Apache Spark
    Apache HBase
    Apache Flink
    Apache Kafka
    Django
    Elasticsearch
    JavaScript
    Python
    SQL
  • $15 hourly
    I hold a Bachelor’s degree in Computer Science and have hands-on experience using Java and C++ to create and implement software applications. I work as a software engineer (SDE) at a well-known fintech startup, where I use Java and C++ extensively in my day-to-day work. I have experience with advanced Big Data frameworks such as Apache Hadoop, Apache Spark, and Apache Hive. I also work as an SME at Chegg, where I help students with their doubts and assignments in Computer Science, and I have over a year of teaching experience.
    Apache Hive
    PyTorch
    AWS Development
    Rust
    Golang
    Python
    LLM Prompt Engineering
    Data Engineering
    C++
    Spring Boot
    Core Java
    Apache Hadoop
    Data Structures
    Apache Spark
    MySQL
  • $40 hourly
    I am a data professional with 8 years of experience and expertise in building data platforms and pipelines. I have helped clients build applications covering every stage of data movement, from fetching data out of a multitude of sources to creating the final tables in the data warehouse. I have primarily worked with SQL and Python; BigQuery, Airflow, and Data Studio are my main tools for data warehousing, orchestration, and visualization respectively. My work spans data movement, analytical dashboard building, data cleaning, and building aggregated tables, across sources such as APIs, databases, data warehouses, CSV files, and Excel files. I have processed and analyzed large volumes of data, fed the results into dashboards, and tuned the queries and pipelines along the way. I also have experience with BI tools such as Tableau. The following are some of the tools and technologies I work with day to day (a small Airflow-to-BigQuery sketch follows the skill tags below):
    1. Airflow (Composer)
    2. BigQuery
    3. Python
    4. Data Visualization
    5. JIRA
    Apache Hive
    Apache Beam
    Google Cloud Platform
    Looker Studio
    Statistics
    SQLite Programming
    Data Analysis
    Google Dataflow
    SQL Programming
    BigQuery
    Data Cleaning
    Apache Airflow
    Python
    ETL Pipeline
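Since the profile above centres on Airflow (Composer) and BigQuery, here is a minimal sketch of that combination, written in Airflow 2.x style; the DAG id, schedule, datasets, and SQL are hypothetical placeholders, not taken from this profile:
```python
# Minimal sketch of an Airflow DAG that rebuilds a BigQuery summary table daily.
# Dataset, table, and SQL are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery


def build_daily_summary():
    client = bigquery.Client()  # uses the environment's default GCP credentials
    client.query(
        """
        CREATE OR REPLACE TABLE analytics.daily_summary AS
        SELECT event_date, COUNT(*) AS events
        FROM raw.events
        GROUP BY event_date
        """
    ).result()  # block until the BigQuery job finishes


with DAG(
    dag_id="daily_summary_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="build_daily_summary",
        python_callable=build_daily_summary,
    )
```
On Cloud Composer the same pattern is often expressed with the Google provider's BigQuery operators instead of a plain PythonOperator, but the shape of the DAG is the same.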
  • $40 hourly
    🚀 Greetings! 🚀
    I'm a seasoned Senior Data Engineer with a robust background in architecting and implementing sophisticated data solutions that drive decision-making and business intelligence. With a knack for data wrangling, transformation, normalization, and crafting end-to-end data pipelines, I bring a wealth of expertise aimed at optimizing your data infrastructure for peak performance and insight generation.
    🔍 What Sets Me Apart? 🔍
    Proven track record: successfully deployed multiple complex data pipelines using industry-standard tools like Apache Airflow and Apache Oozie, demonstrating my capability to handle projects of any scale.
    Fortune 500 experience: contributed significantly to data platform teams at renowned companies, tackling intricate data challenges, managing voluminous datasets, and enhancing data flow efficiency.
    Holistic skill set: my proficiency isn't limited to engineering; I excel in Business Intelligence, ETL processes, and crafting complex SQL queries, ensuring a comprehensive approach to data management.
    Efficiency & simplicity: I prioritize creating solutions that are not only effective but also straightforward and maintainable, ensuring long-term success and ease of use.
    🛠 Tech Arsenal 🛠
    Cloud platforms: GCP (Google Cloud Platform) and AWS (Amazon Web Services), enabling seamless data operations in the cloud.
    Programming languages: Java, Scala, and Python, offering versatility in tackling various data engineering challenges.
    Data engineering tools: Spark, PySpark, Kafka, and more, equipped to build robust data processing applications.
    Data warehousing: AWS Athena, Google BigQuery, and Snowflake, ensuring scalable and efficient data storage solutions.
    Orchestration & scheduling: complex workflows managed with tools like Airflow and Oozie, coupled with container orchestration using Docker.
    🌟 Why Collaborate With Me? 🌟
    Beyond my technical skills, I am detail-oriented, organized, and highly responsive, prioritizing clear communication and project efficiency. I am passionate about unlocking the potential of data to fuel business growth and innovation. Let's embark on this data-driven journey together! Connect with me to discuss how we can elevate your data infrastructure to new heights.
    Apache Hive
    Apache Airflow
    Apache Kafka
    Data Warehousing
    Data Lake
    ETL Pipeline
    ETL
    AWS Lambda
    AWS Glue
    Microsoft Azure
    Data Integration
    Data Transformation
    PySpark
    SQL
    Python
  • $35 hourly
    Seasoned, solution-oriented engineer with 10 years of experience in designing and implementing robust systems. Highly experienced in near-real-time streaming analytics, distributed microservices architectures, and reactive systems. I have worked across many areas of development, from design and coding to performance tuning, customer issues, and cost-saving automation.
    Apache Hive
    Apache Spark
    Cloudera
    MySQL
    RESTful Architecture
    Java
    Kubernetes
    Python
    Terraform
    MongoDB
    Cloud Architecture
    Analytics
    NGINX
    Google Cloud Platform
    Apache Kafka
    Apache Airflow
    Spring Boot
  • $60 hourly
    Nikhil is a Microsoft-certified Azure Data Engineer with 5+ years of experience in data engineering and big data. He has worked for a couple of Fortune 500 companies, developing and deploying their data solutions in Azure and helping them find business insights in their data.
    Coding: SQL, Python, PySpark
    Azure: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Functions, and other Azure services
    Reporting: Power BI, Microsoft Office
    Apache Hive
    ETL
    Microsoft Azure
    Data Lake
    Data Warehousing
    Microsoft SQL Server
    Big Data
    PySpark
    Databricks Platform
    SQL
    Apache Spark
    Python
    Microsoft Excel
    Data Engineering
    Data Integration
  • $60 hourly
    I am a DevOps Engineer with 8 years of experience.
    * Experienced with the Hadoop ecosystem and components such as Sqoop, Flume, Kafka, Spark, Hive, and Impala, building data marts and data lakes.
    * Worked on various AWS big data tools such as EMR, AWS Data Pipeline, AWS Glue, and Lambda.
    * Implemented various Azure big data solutions on services such as Azure Data Factory, Azure Databricks, HDInsight, and Data Lake Storage Gen2.
    * Experienced in functional programming using Scala.
    * Automated various tasks using Python.
    * Experienced with NoSQL databases: ELK, MongoDB.
    * Experienced in writing simple to complex SQL queries.
    * Experienced in data scraping, cleaning, and analysis in Python and R.
    * Automated CI/CD using GitLab CI, Bitbucket Pipelines, Azure DevOps, AWS pipelines, and GitHub Actions.
    * Worked with Ansible for automated configuration management.
    * Created end-to-end infrastructure using Terraform on AWS, Azure, GCP, and OCI.
    * Expertise with Kubernetes, Helm, Docker, helmfile, etc.
    Apache Hive
    DevOps
    CI/CD
    Apache Spark
    Apache Kafka
    Amazon Web Services
    Terraform
    Microsoft Azure
    Kubernetes
    Deployment Automation
    Docker
    Packer
    Git
    Python
  • $70 hourly
    • A creative, hands-on engineer with around 12 years of experience, exceptional technical skills, and a business-focused outlook. Adept at analyzing information-system needs, evaluating end-user requirements, and custom-designing solutions for complex information systems.
    • Vast experience in data-driven applications: creating data pipelines, building interfaces between upstream and downstream applications, and tuning the pipelines.
    • Interact with business teams to discuss and understand the data flow and design data pipelines to match the requirements.
    • Experience driving a team to meet target deliverables. Strong experience in creating scalable and efficient big data pipelines using Spark, Hadoop, Hive, PySpark, Python, Snowflake, dbt, and Airflow.
    • Commendable experience in cloud data warehousing with Snowflake: development, data sharing, and advanced Snowflake features. Strong experience integrating Snowflake with dbt and creating data layers on the Snowflake warehouse using dbt.
    • Expert-level SQL skills.
    • Strong exposure to Python.
    • Strong experience with Hadoop.
    • Strong experience implementing ETL pipelines using Spark.
    • Strong experience tuning Spark applications.
    • Extensively used Spark SQL to clean data and perform calculations on datasets.
    • Strong experience with Hive, including Hive query tuning.
    • Worked with different big data file formats such as Parquet, ORC, etc.
    • Familiar with Azure Databricks. Decent exposure to Airbyte, BigQuery, and Terraform.
    • Expertise in analytical functions.
    • Strong exposure to converting data into business insights.
    • Decent knowledge of data lake and data mart concepts.
    • Experience in creating tables, views, materialized views, and indexes using SQL and PL/SQL.
    • In-depth knowledge of PL/SQL, with experience constructing tables, joins, subqueries, and correlated subqueries in SQL*Plus.
    • Proficient in developing PL/SQL programs using advanced performance-enhancing concepts such as bulk processing, collections, and dynamic SQL.
    • Sound knowledge of Oracle materialized views.
    • Effective use of indexes, collections, and analytical functions.
    • Sound knowledge of Oracle SQL*Loader and external tables.
    • Good knowledge of and exposure to designing and developing user-defined stored procedures and functions.
    • Experience using the UTL_FILE, DBMS_JOB, and DBMS_SCHEDULER packages.
    • Skilled in handling critical application and business-validation-oriented trigger logic.
    • Good knowledge of trapping runtime errors by providing suitable exception handlers.
    Apache Hive
    Apache Airflow
    Databricks Platform
    Apache Spark
    Python
    Apache Hadoop
    PySpark
    Snowflake
    Amazon S3
    dbt
    Database
    Oracle PLSQL
    Unix Shell
  • $50 hourly
    Hello, I'm Raj, a data/ML professional with over 7 years of experience building large-scale recommender systems and data science solutions in the ad-tech space.
    - Data-driven statistician with a passion for leveraging insights to drive well-informed business decisions.
    - 2+ years of experience driving impactful data science solutions in the microblogging domain, contributing to the advancement of ML capabilities at Koo.
    - 5+ years of experience delivering innovative data science solutions in ad-tech and e-commerce, spanning the programmatic stack (SSP, DSP, DMP, RTB).
    - Used the AWS tech stack to quickly analyze billions of log events and extract actionable insights from 2 TB/day of RTB logs.
    - Enthusiastic learner, actively participating in MOOCs and translating knowledge into projects like Gitdiscoverer.com.
    In my industry experience, I've worked with:
    - Languages: Python, R, PySpark
    - Dashboards & visualizations: R Shiny, Apache Superset, Kibana, Redash, Metabase
    - Cloud: AWS, GCP
    - Databases/query engines: MySQL, Citus (PostgreSQL), Postgres, Hive, Elasticsearch, Amazon Athena, MongoDB, Snowflake
    - Transformations: AWS Glue, dbt
    - Cloud data warehouses: Amazon Redshift, Snowflake
    - Big data frameworks: Apache Spark, Databricks
    Notable achievements at Koo:
    - Immense pride in leading the development of the 'Recommended For You' feature (known as 'People You May Know' on LinkedIn) for Koo (India's Twitter), built from scratch with a remarkable team effort.
    - Implemented a weighted PageRank algorithm at Koo to enhance content personalization and user recommendation systems.
    - Designed and developed insightful dashboards for ML initiatives, providing key stakeholders with valuable visualizations and actionable insights.
    - Designed and developed a suite of ETL processes that powered data transformation for downstream machine learning products.
    - Led the development of the 'For You' tab, applying techniques such as Locality-Sensitive Hashing (LSH) for vector search within PySpark to deliver scalable and personalized content recommendations (a small LSH sketch follows the skill tags below).
    - Led the migration of the entire recommender system from AWS to GCP and transitioned Koo's search functionality from managed to self-hosted OpenSearch, enhancing performance, scalability, and control while significantly reducing operational costs.
    At Class One Exchange (C1X):
    - Set up self-serve analytics: enabled data access for the entire organization by deploying Apache Superset and Metabase, letting people pull rich reports in a self-service manner using industry-leading tools.
    - Developed a URL classification model to identify and enrich ad impressions with more context for campaign selection.
    - AWS cost optimization by developing a routing model to match ad impressions with bidders.
    - Predicted anomalous behavior in revenue and sent alerts with the affected metrics, so stakeholders were notified when anything went wrong in the system, such as a revenue drop or overspend.
    - Identified top monetization-friendly users based on various buying intents, which showed which publishers had a premium audience and focused our efforts on growing those accounts.
    Apache Hive
    LLM Prompt Engineering
    Data Management
    ETL
    PostgreSQL
    Analytics
    R Shiny
    Apache Spark
    Machine Learning
    Data Science
    Data Analysis
    Statistics
    R
    Python
    SQL
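The 'For You' work above mentions Locality-Sensitive Hashing for vector search in PySpark. Below is a toy sketch of that technique using Spark ML's built-in BucketedRandomProjectionLSH; the vectors, ids, and threshold are made up, and a real recommender would hash learned embeddings instead:
```python
# Sketch of approximate nearest-neighbour search with PySpark's built-in LSH.
# The toy vectors stand in for user/content embeddings.
from pyspark.sql import SparkSession
from pyspark.ml.feature import BucketedRandomProjectionLSH
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("lsh-sketch").getOrCreate()

items = spark.createDataFrame(
    [(0, Vectors.dense([1.0, 0.2])),
     (1, Vectors.dense([0.9, 0.3])),
     (2, Vectors.dense([-0.5, 1.0]))],
    ["id", "features"],
)

lsh = BucketedRandomProjectionLSH(
    inputCol="features", outputCol="hashes", bucketLength=1.0, numHashTables=3
)
model = lsh.fit(items)

# All pairs whose Euclidean distance is below the threshold.
pairs = model.approxSimilarityJoin(items, items, threshold=0.5, distCol="dist")
pairs.filter("datasetA.id < datasetB.id").show()
```
MinHashLSH is the analogous estimator when the features are sparse set-membership vectors (for example, followed accounts) rather than dense embeddings.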
  • $20 hourly
    I am an experienced Data Engineer with 10 years of experience and hands-on expertise in ETL, data engineering, data modelling, data integration, and data warehousing. If you are looking for someone with a broad skill set who can work as a team member and take ownership of tasks, I can help. I have experience and knowledge in the following areas, tools, and technologies:
    Data storage: S3, Azure Storage
    Data warehouses: Google BigQuery, Snowflake, Azure Synapse Analytics
    Databases: SQL Server, MySQL, PostgreSQL, MongoDB, Oracle, Google Cloud Bigtable
    Data lakes: Azure Data Lake Storage, AWS Lake Formation
    Data transformation, integration, governance, and quality: Azure Data Factory, Azure Synapse, AWS Glue, Fivetran, dbt, PySpark
    Monitoring tools: Amazon CloudWatch, New Relic, Dynatrace, Datadog
    BI, visualization, and data analysis: Looker, Power BI, Google Data Studio, Sigma Computing
    Other skills & tools: SQL, Python, REST APIs
    I can also work with other technologies and tools if required.
    Apache Hive
    Microsoft Azure
    Microsoft Azure SQL Database
    Amazon S3
    Data Modeling
    Microsoft SQL Server Programming
    Microsoft SQL Server
    Data Warehousing
    PySpark
    Snowflake
    SQL
    Apache Spark
    Apache Airflow
    ETL Pipeline
    Python
  • $30 hourly
    Seasoned data engineer with over 11 years of experience building sophisticated, reliable ETL applications using Big Data and cloud stacks (Azure and AWS). TOP RATED PLUS. Collaborated with over 20 clients, accumulating more than 2,000 hours on Upwork.
    🏆 Expert in creating robust, scalable, and cost-effective solutions using Big Data technologies for the past 9 years.
    🏆 The main areas of expertise are:
    📍 Big Data: Apache Spark, Spark Streaming, Hadoop, Kafka, Kafka Streams, HDFS, Hive, Solr, Airflow, Sqoop, NiFi, Flink
    📍 AWS cloud services: AWS S3, AWS EC2, AWS Glue, AWS Redshift, AWS SQS, AWS RDS, AWS EMR
    📍 Azure cloud services: Azure Data Factory, Azure Databricks, Azure HDInsight, Azure SQL
    📍 Google cloud services: GCP Dataproc
    📍 Search engine: Apache Solr
    📍 NoSQL: HBase, Cassandra, MongoDB
    📍 Platform: data warehousing, data lake
    📍 Visualization: Power BI
    📍 Distributions: Cloudera
    📍 DevOps: Jenkins
    📍 Accelerators: data quality, data curation, data catalog
    Apache Hive
    SQL
    AWS Glue
    PySpark
    Apache Cassandra
    ETL Pipeline
    Apache NiFi
    Apache Kafka
    Big Data
    Apache Hadoop
    Scala
    Apache Spark
  • $15 hourly
    I am a Big Data Engineer with expertise in Hadoop (Cloudera and Hortonworks distributions) and proficiency in Azure Data Services. I have good experience with the popular, trending tools and technologies:
    Azure: Azure Data Factory, Azure Logic Apps, Azure Function Apps, Azure Event Hubs, Azure Service Bus, Azure SQL DB.
    Apache: Apache Spark, Apache NiFi, Apache Kafka, Apache Hive.
    I have strong knowledge of programming languages such as Java, Scala, and Python, and good knowledge of SAP processes.
    Apache Hive
    Microsoft Azure
    ETL Pipeline
    Apache Cassandra
    Apache Hadoop
    Database Design
    Apache Spark
    Apache Kafka
    Apache NiFi
    Elasticsearch
  • $5 hourly
    I work in the information technology and services industry. Skilled in Python, C++, Java, C, and web development. Strong education background, with a Bachelor of Technology (BTech) focused on Computer Science from Galgotias University.
    Apache Hive
    Data Structures
    PySpark
    NumPy
    C
    Machine Learning
    Apache Hadoop
    Apache Spark
    Scala
    MySQL
    Java
    Python
    C++
    HTML
    CSS
  • $29 hourly
    *Experience*
    • Hands-on experience upgrading HDP or CDH clusters to the Cloudera Data Platform Private Cloud [CDP Private Cloud].
    • Extensive experience in installing, deploying, configuring, supporting, and managing Hadoop clusters using Cloudera (CDH) and HDP distributions hosted on Amazon Web Services (AWS) and Microsoft Azure.
    • Experience in upgrading Kafka, Airflow, and CDSW.
    • Configured various components such as HDFS, YARN, Sqoop, Flume, Kafka, HBase, Hive, Hue, Oozie, and Sentry.
    • Implemented Hadoop security.
    • Deployed production-grade Hadoop clusters and their components through Cloudera Manager/Ambari in virtualized environments (AWS/Azure cloud) as well as on-premises.
    • Configured HA for Hadoop services with backup & disaster recovery.
    • Set up Hadoop prerequisites on Linux servers.
    • Secured clusters using Kerberos and Sentry, as well as Ranger and TLS.
    • Experience in designing and building scalable infrastructure and platforms to collect and process very large amounts of structured and unstructured data.
    • Experience in adding and removing nodes, monitoring critical alerts, configuring high availability, configuring data backups, and data purging.
    • Cluster management and troubleshooting across the Hadoop ecosystem.
    • Performance tuning and resolution of Hadoop issues using the CLI, the Cloudera Manager UI, and the Apache web UIs.
    • Report generation for running nodes using various benchmark operations.
    • Worked with AWS services such as EC2 instances, S3, VPCs, and security groups, and Microsoft Azure services such as resource groups, resources (VMs, disks, etc.), Azure Blob Storage, and Azure storage replication.
    • Configured private and public IP addresses, network routes, network interfaces, subnets, and virtual networks on AWS/Microsoft Azure.
    • Troubleshooting, diagnosing, performance tuning, and solving Hadoop issues.
    • Administration of Linux installations.
    • Fault finding, analysis, and logging information for reports.
    • Expert in Kafka administration and in deploying UI tools to manage Kafka.
    • Implemented HA for MySQL.
    • Installed/configured Airflow for job orchestration.
    Apache Hive
    Apache Kafka
    Apache Airflow
    Apache Spark
    YARN
    Hortonworks
    Apache Hadoop
    Apache Zookeeper
    Cloudera
    Apache Impala
  • $30 hourly
    Optimistic, forward-looking software developer with a 1.5+ year background in creating and executing innovative software solutions. I have worked on Big Data projects and have experience handling huge amounts of data.
    - Experienced in Spark, Scala, Hive, and HDFS.
    - Experienced in Docker and Kubernetes.
    - Good data-handling skills.
    - Design before code, always!
    Apache Hive
    Docker Compose
    PySpark
    Kubernetes
    Apache Flume
    Docker
    Apache Hadoop
    MySQL
    Apache Kafka
    Apache Spark
    Spring Boot
    RESTful API
    Java
    Scala
    Python
  • $5 hourly
    Data Engineer with 3 years of relevant work experience.
    Skills: PySpark, Python, SQL, HDFS, Hive, Bash scripting
    Projects:
    1. POC in which we developed a PySpark application to replace an ETL tool, cutting costs and increasing process efficiency with the power of Spark.
    2. Migrated Ab Initio jobs to PySpark, gaining a significant performance improvement.
    3. Automated the existing Spark pipeline to run multiple jobs in one go using Python multithreading and the Oozie job scheduler, reducing runtime to around 40% of the original (a small concurrency sketch follows the skill tags below).
    4. Using PySpark, automated target-data analysis to gain insight into the target data, which helped the business take decisions in a much easier and faster way.
    5. Using Python and shell scripts, automated the existing pre-sales analysis task, which helped the organization take quick decisions for cost management.
    Apache Hive
    SonarQube
    Apache Hadoop
    Big Data
    Git
    Bash Programming
    PySpark
    Anaconda
    Python
    SQL
    ETL
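Project 3 above runs several Spark jobs in one go with Python multithreading. Here is a minimal sketch of that idea; the script names and spark-submit options are hypothetical, and the Oozie scheduling side is not shown:
```python
# Sketch: launch several independent spark-submit jobs concurrently with a
# Python thread pool. Script names and options are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

JOBS = ["ingest_sales.py", "ingest_customers.py", "ingest_inventory.py"]


def run_job(script):
    # Each thread blocks on its own spark-submit process until it exits.
    result = subprocess.run(
        ["spark-submit", "--master", "yarn", script],
        capture_output=True,
        text=True,
    )
    return script, result.returncode


with ThreadPoolExecutor(max_workers=3) as pool:
    for script, code in pool.map(run_job, JOBS):
        print(f"{script} finished with exit code {code}")
```
A small pool is enough here because each thread only waits on a child process; the actual parallel work happens inside the Spark jobs on the cluster.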
  • $40 hourly
    I am a Senior Data Engineer with extensive expertise in data wrangling, transformation, normalization, and setting up comprehensive end-to-end data pipelines. My skills also include proficiency in Business Intelligence, ETL processes, and writing complex SQL queries. I have successfully implemented multiple intricate data pipelines using tools like Apache Airflow and Apache Oozie in my previous projects. I have had the opportunity to contribute to the data platform teams at Fortune 500 companies, where my role involved solving complex data issues, managing large datasets, and optimizing data streams for better performance and reliability. I prioritize reliability, efficiency, and simplicity in my work, ensuring that the data solutions I provide are not just effective but also straightforward and easy to maintain. Over the years, I have worked with a variety of major databases, programming languages, and cloud platforms, accumulating a wealth of experience and knowledge in the field.
    Skills:
    𝗖𝗹𝗼𝘂𝗱: GCP (Google Cloud Platform), AWS (Amazon Web Services)
    𝗣𝗿𝗼𝗴𝗿𝗮𝗺𝗺𝗶𝗻𝗴 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲: Java, Scala, Python
    𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴: Spark, PySpark, Kafka, Crunch, MapReduce, Hive, HBase, AWS Glue
    𝗗𝗮𝘁𝗮-𝘄𝗮𝗿𝗲𝗵𝗼𝘂𝘀𝗶𝗻𝗴: AWS Athena, Google BigQuery, Snowflake, Hive
    𝗦𝗰𝗵𝗲𝗱𝘂𝗹𝗲𝗿: Airflow, Oozie, etc.
    𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻: Docker
    I am highly attentive to detail, organized, efficient, and responsive. Let's connect.
    Apache Hive
    Data Warehousing & ETL Software
    API Integration
    Apache Airflow
    Apache Spark
    Apache Hadoop
    Apache Kafka
    PySpark
    ETL Pipeline
    Data Engineering
    Data Preprocessing
    Data Integration
    Python
    SQL
    Data Transformation
  • $15 hourly
    Specialties: Big Data technology, Spark, Databricks, Azure Synapse Analytics, AWS, Hive, ETL, data lake, and Delta Lake expert.
    Languages: Scala, Java, Python (intermediate), SQL & NoSQL databases.
    Academic project expert for all universities.
    Apache Hive
    Oracle
    ETL
    Oracle PLSQL
    Big Data
    SQL
    Java
    Apache Kafka
    Apache Hadoop
    Apache Spark
  • $10 hourly
    Technical experience:
    * Hands-on experience with the Hadoop ecosystem, including Hive, Sqoop, MapReduce, and the basics of Kafka
    * Excellent knowledge of Hadoop ecosystem components such as HDFS, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm
    * Expertise in managing big data processing using Apache Spark and its various components
    * Load and transform large sets of structured, semi-structured, and unstructured data from relational database systems to HDFS and vice versa using the Sqoop tool
    * Data ingestion and refresh from RDBMS to HDFS using Apache Sqoop, and data processing through Spark Core and Spark SQL
    * Proficiency in Scala and PySpark for high-level data processing, with end-to-end knowledge of project implementation
    * Designing and creating Hive external tables, using a shared metastore instead of Derby, and creating partitions and bucketing (a small DDL sketch follows the skill tags below)
    Apache Hive
    Amazon Web Services
    Visualization
    Apache Spark
    Apache Kafka
    SQL
    Apache Hadoop
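The last bullet above covers Hive external tables with partitions and bucketing on a shared metastore. Below is a small sketch of that kind of DDL, issued here through a Hive-enabled SparkSession; the table name, columns, bucket count, and location are hypothetical:
```python
# Sketch: create an external, partitioned, bucketed Hive table via Spark SQL.
# Paths, columns, and bucket counts are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-ddl-sketch")
    .enableHiveSupport()   # use the shared Hive metastore, not embedded Derby
    .getOrCreate()
)

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
    LOCATION '/data/warehouse/sales_ext'
""")
```
The same statement can also be run directly in Hive or Beeline; partitions are then added per order_date, and the bucketing clause governs how rows are clustered within each partition.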
  • $22 hourly
    Motivated, self-taught, teamwork-oriented software engineer with a can-do attitude and significant experience in developing high-volume applications.
    Apache Hive
    Amazon SageMaker
    AWS Glue
    PySpark
    AWS Lambda
    Amazon S3
    Amazon Web Services
    Data Ingestion
    Database
    ETL
    Amazon DynamoDB
    Python
    Apache Hadoop
    R
    Apache Spark
  • $50 hourly
    I am a passionate coder and data enthusiast who loves solving complex problems using data and models. I currently work with the tools and frameworks required to build efficient, scalable data pipelines on AWS- and GCP-based cloud platforms.
    My skills: computer vision, Google Cloud, infrastructure setup, Big Data, machine learning, MapReduce, SQL, search technologies.
    Tools and languages: PyTorch, TensorFlow, OpenCV, Apache Hadoop, Apache Kafka, Apache Spark, Apache Hive, Apache Impala, Apache Jena, AWS Cognito, AWS IoT Core, AWS Lambda, the Django framework, Flask, Graphene, GraphQL, AWS DynamoDB, AWS S3, RDF triple stores, time-series databases like Axibase, Apache Solr, Apache Lucene, MarkLogic, Metafacts, Jenkins, Telegraf, Grafana, Kubernetes, Docker, AWS ECS, AWS EKS, GCP Kubernetes, Databricks solutions, Python, SQL.
    Apache Hive
    Kubernetes
    Apache Spark
    Apache Kafka
    Architectural Design
    TensorFlow
    AWS Lambda
    PyTorch
    Internet of Things Solutions Design
    Apache Hadoop
    Internet of Things
    Google Cloud Platform
    Cloud Computing
    Amazon Web Services
  • $30 hourly
    I am a dedicated and results-driven Data Engineer with a passion for transforming complex data into valuable insights and actionable results. With 4 years of experience in the industry, I have honed my skills in designing, developing, and implementing effective data systems and pipelines using a range of tools including Apache Spark, Apache Hadoop, and Snowflake. My deep understanding of data warehousing, ETL processes, and data analysis has enabled me to deliver innovative solutions that drive business growth and competitive advantage. I am committed to staying up to date with the latest technologies and industry trends, always seeking new and better ways to turn data into meaningful insights.
    Apache Hive
    Data Analytics
    Big Data
    Data Warehousing
    Google Analytics
    Apache Spark MLlib
    Apache Airflow
    Apache Kafka
    Data Mining
    Data Structures
    Apache Spark
    Data Analysis
    Python
    SQL
    ETL Pipeline
  • $20 hourly
    With 6.5 years of experience working with huge datasets to solve complex business problems, I can write technical code and articulate it in simple business terms, and I have excellent communication skills. I am a full-stack data engineer.
    Tech stack:
    Programming languages: Python, Scala, shell scripting
    Databases: MySQL, Teradata, and other RDBMSs
    Distributed systems: Hadoop ecosystem - HDFS, Hive, Spark, PySpark, Oozie
    Apache Hive
    Engineering & Architecture
    Big Data
    Linux
    RESTful API
    PySpark
    Scala
    Apache Hadoop
  • $35 hourly
    * 9.6 years of experience in Apache Spark, Python, the MS Azure cloud platform, Azure Data Factory, Azure Data Lake Storage, Azure Databricks, the Hadoop ecosystem (MapReduce, Hive, Sqoop, HDFS), Oracle, Git, JIRA, and Agile methodology.
    * Involved in understanding business requirements and providing solutions/designs; understanding the data, its parameters, the associated schema, and the behaviour of the data to better perform operations and transformations on it.
    * Contributed and proposed the best possible technical specifications to solve limitations and defects in existing applications.
    * Involved in code development, testing and also created the
    Apache Hive
    Microsoft Azure
    AWS Glue
    Data Lake
    ETL Pipeline
    PaaS
    Agile Project Management
    Agile Software Development
    Git
    Cloud Computing
    Apache Spark
    Databricks Platform
    Apache Hadoop

How hiring on Upwork works

1. Post a job (it’s free)

Tell us what you need. Provide as many details as possible, but don’t worry about getting it perfect.

2. Talent comes to you

Get qualified proposals within 24 hours, and meet the candidates you’re excited about. Hire as soon as you’re ready.

3. Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

4. Payment simplified

Receive invoices and make payments through Upwork. Only pay for work you authorize.