    1,055 pyspark jobs found

    ...and transform it in real time, feed it to a set of AI services, then serve the insights back to users through an intuitive dashboard. Your day-to-day work will touch three key areas: • Data collection – build reliable connectors, handle auth flows, schedule recurring pulls, and maintain error logging. • Data processing – design ETL pipelines, implement transformation logic in Python (Pandas, PySpark or similar), and ensure everything is containerised for smooth deployment. • Data visualization – wire processed datasets into the React front-end, craft reusable chart components (D3 or your preferred library), and optimise for performance. Acceptance criteria 1. End-to-end pipeline runs with a one-command deploy (Docker / docker-compose or...
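
    A minimal sketch of the "data collection" slice this post describes, with an auth header, retries, and error logging; the endpoint, token, and function names are hypothetical placeholders, not taken from the post:

    import logging
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("collector")

    def fetch_page(url: str, token: str) -> dict:
        # Hypothetical connector: retry transient failures (429/5xx) with backoff.
        session = requests.Session()
        retries = Retry(total=5, backoff_factor=1.0,
                        status_forcelist=[429, 500, 502, 503])
        session.mount("https://", HTTPAdapter(max_retries=retries))
        try:
            resp = session.get(url, headers={"Authorization": f"Bearer {token}"},
                               timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            log.exception("pull failed for %s", url)  # the error-logging requirement
            raise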

    ₹9314 Avg Bid
    37 bids

    Data cleaning using SQL/Python (need to figure out) and export to Excel. The client has used this in a PySpark environment. We can have a discussion later on the details.
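
    A hedged sketch of what the cleanup-and-export could look like in the PySpark environment mentioned; the file, column, and key names are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("clean_export").getOrCreate()
    df = spark.read.csv("raw_data.csv", header=True, inferSchema=True)

    clean = (df.dropDuplicates()
               .na.drop(subset=["customer_id"])          # drop rows missing the key
               .withColumn("amount", F.col("amount").cast("double")))

    # Spark has no native Excel writer; collect the (small) result to pandas
    # and let openpyxl handle the .xlsx export.
    clean.toPandas().to_excel("clean_data.xlsx", index=False)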

    ₹838 Avg Bid
    1 bid

    We are looking for an experienced Palantir Foundry Developer to support data and AI use cases. Scope of Work: * Build and maintain Foundry data pipelines (Pipeline B...support data and AI use cases. Scope of Work: * Build and maintain Foundry data pipelines (Pipeline Builder, Transforms) * Work with Ontology (object types, link types, data modeling) * Develop Workshop applications for business users * Implement AIP Logic workflows and basic agent integrations * Write production-quality Python, SQL, and PySpark code Requirements: * Hands-on experience with Palantir Foundry (mandatory) * Strong skills in Python, SQL, and PySpark * Experience with Ontology, Pipelines, and Workshop * Basic understanding of AIP (preferred) Project Details: * Budget: ₹45,000+(Negotiable) * ...
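
    For orientation, a hedged sketch of a Foundry Transforms job in PySpark of the kind this post names; transforms.api is available only inside Foundry (not on PyPI), and the dataset paths here are hypothetical:

    from transforms.api import transform_df, Input, Output
    from pyspark.sql import functions as F

    @transform_df(
        Output("/Company/pipelines/clean_orders"),
        raw=Input("/Company/pipelines/raw_orders"),
    )
    def clean_orders(raw):
        # Typical transform step: type the columns, drop obvious bad rows.
        return (raw.withColumn("order_date", F.to_date("order_date"))
                   .filter(F.col("order_id").isNotNull()))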

    ₹51354 Avg Bid
    21 bids

    Results-driven Senior Data Analyst with 8+ years delivering enterprise data solutions across Banking, Financial Services, and Healthcare. Specialized in end-to-end Data Warehouse design, ETL pipeline development, and BI reporting. Core Expertise: Snowflake, Azure Data Factory, Azure Synapse, Delta Lake, PySpark, SSIS, T-SQL, PL/SQL, Oracle, MySQL, Power BI, DAX, Tableau, SSRS, Star/Snowflake Schema, EDW Design, Data Mart Development, Data Lineage, Gap Analysis. Compliance & Governance: HIPAA, GDPR, SOX Audit Controls, Data Quality Frameworks, Data Governance Policies. Business Analysis: BRD/FRD Writing, JAD Facilitation, UML Diagrams, Stakeholder Management, Agile, Waterfall. Certifications: Microsoft Azure Fundamentals, Salesforce Administrator, Salesforce Platform Developer I....

    ₹13970 Avg Bid
    24 bids

    Responsible for designing and implementing large-scale data migration and ingestion pipelines to move high-volume data from diverse sources into cloud platforms. Sources include HDFS, relational databases such as MySQL and PostgreSQL, and real-time streaming systems like Kafka. Develop and maintain robust data pipelines using PySpark, ensuring efficient processing of batch and streaming data. Implement automated scheduling mechanisms to orchestrate data workflows on daily and monthly intervals, ensuring reliability and timely data availability. Optimize data ingestion and storage through advanced performance tuning, partitioning, and compaction strategies to handle large-scale datasets efficiently. Ensure data quality, consistency, and fault tolerance across all pipelines. Deploy...
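
    A minimal sketch of the streaming leg, assuming a Kafka source and date-partitioned parquet output; the topic, schema, and paths are hypothetical, and batch sources (JDBC, HDFS) would share the same write pattern:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.appName("ingest").getOrCreate()
    schema = StructType().add("id", StringType()).add("amount", DoubleType())

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*")
              .withColumn("dt", F.current_date()))

    # Partition by ingest date so later compaction jobs can target single days.
    (events.writeStream
           .format("parquet")
           .option("path", "s3://lake/raw/events")
           .option("checkpointLocation", "s3://lake/_chk/events")
           .partitionBy("dt")
           .start())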

    ₹1024 Avg Bid
    1 bid

    ...Experience Required: 5+ Years (Data Engineering), 3+ Years (Databricks) Note: Budget is fixed. Please do not apply if you are looking to negotiate. Key Responsibilities Develop and optimize data pipelines using Databricks, PySpark, and Spark SQL Design and implement Delta Lake architecture (Bronze / Silver / Gold layers) Work on Lakehouse architecture and manage Unity Catalog Apply DataOps practices for scalable and reliable data workflows Optimize Spark jobs for performance and cost efficiency Required Skills Strong hands-on experience with Databricks Proficiency in PySpark and Spark SQL Experience with Delta Lake and Lakehouse architecture Knowledge of data pipeline design and optimization Understanding of DataOps and data governance Nice to Have Experience with Azur...
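
    A minimal sketch of one Bronze-to-Silver hop in the Delta Lake layering this post names; the table names are hypothetical, and on Databricks the SparkSession `spark` is already provided:

    from pyspark.sql import functions as F

    bronze = spark.read.table("lakehouse.bronze_sales")

    silver = (bronze.dropDuplicates(["sale_id"])
                    .filter(F.col("sale_id").isNotNull())
                    .withColumn("sale_ts", F.to_timestamp("sale_ts")))

    # Delta is the default table format on Databricks; a Gold layer would
    # aggregate this table for consumption.
    silver.write.mode("overwrite").saveAsTable("lakehouse.silver_sales")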

    ₹60067 - ₹70078
    Sealed NDA
    12 bids

    ...database containing nested JSON / key-value blobs. • Goal: parse, normalize, and flatten these blobs into well-defined columns while preserving relationships and lineage. • Scale: millions of rows, so solutions that leverage Spark, Hadoop, BigQuery, Snowflake, or well-tuned SQL/Python pipelines are welcome—as long as they remain maintainable. Deliverables 1. Transformation code (Python, PySpark, SQL, or Scala) with clear comments. 2. A runnable job definition or workflow file (Airflow DAG, Spark submit script, dbt model, etc.) that shows how to execute the pipeline end-to-end. 3. Simple README explaining prerequisites, run steps, and how new fields should be added in future. Acceptance criteria • Pipeline processes at least 10 GB of source data ...
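
    A minimal sketch of the parse-and-flatten step in PySpark: declare a schema for the blob, promote nested fields to columns, and keep the source key for lineage. The column and field names are hypothetical:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType

    spark = SparkSession.builder.appName("flatten").getOrCreate()

    schema = (StructType()
              .add("user", StructType().add("id", StringType())
                                       .add("name", StringType()))
              .add("event_type", StringType()))

    raw = spark.read.parquet("s3://lake/raw_blobs")   # column `payload` holds JSON text

    flat = (raw.withColumn("j", F.from_json(F.col("payload"), schema))
               .select(F.col("row_id"),               # preserve lineage to the source row
                       F.col("j.user.id").alias("user_id"),
                       F.col("j.user.name").alias("user_name"),
                       F.col("j.event_type").alias("event_type")))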

    ₹14250 Avg Bid
    6 bids

    ...validation rules, automated tests, and observable metrics baked in from day one—Great Expectations, Delta Live Tables expectations, or comparable frameworks are welcome, as long as quality gates are visible in the monitoring layer. Scope to cover: • Architecture design diagram with clear component rationale (Azure Data Lake, Databricks, Delta, Unity Catalog, etc.). • Reproducible code (Python / PySpark, notebooks or repos) with CI/CD instructions. • Ingestion pipelines (batch or streaming), curated layers, and serving tier (SQL endpoints, Power BI, or dashboards of your choice). • Integrated monitoring, alerting, and cost-aware observability using native Azure tools or open-source add-ons. • End-to-end test suite: unit, integration, and data qualit...
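
    A hedged sketch of a quality gate in plain PySpark; Great Expectations or Delta Live Tables expectations would replace this in practice, and the rule and threshold are hypothetical:

    from pyspark.sql import functions as F

    def quality_gate(df):
        total = df.count()
        bad = df.filter(F.col("amount").isNull() | (F.col("amount") < 0)).count()
        # Emit a metric the monitoring layer can scrape, then fail loudly if the
        # bad-row ratio breaches the threshold, so the pipeline stops early.
        print(f"quality.bad_row_ratio={bad / max(total, 1):.4f}")
        if total and bad / total > 0.01:
            raise ValueError(f"quality gate failed: {bad}/{total} bad rows")
        return df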

    ₹1024 / hr Avg Bid
    19 bids

    ...validation rules, automated tests, and observable metrics baked in from day one—Great Expectations, Delta Live Tables expectations, or comparable frameworks are welcome, as long as quality gates are visible in the monitoring layer. Scope to cover: • Architecture design diagram with clear component rationale (Azure Data Lake, Databricks, Delta, Unity Catalog, etc.). • Reproducible code (Python / PySpark, notebooks or repos) with CI/CD instructions. • Ingestion pipelines (batch or streaming), curated layers, and serving tier (SQL endpoints, Power BI, or dashboards of your choice). • Integrated monitoring, alerting, and cost-aware observability using native Azure tools or open-source add-ons. • End-to-end test suite: unit, integration, and data qualit...

    ₹50573 Avg Bid
    59 bids

    ...transformation, and optimisation. • Hands-on experience working within Databricks, including notebooks, workflows, and job execution. • Proven experience using Power BI for report and dashboard development, including data modelling, DAX, Power Query, and visualisation design. • Experience building and maintaining data pipelines, ideally within Azure environments. • Experience using Python (e.g. PySpark) within Databricks environments is advantageous. • Understanding of data modelling concepts, including fact and dimension structures. • Familiarity with Azure Data Factory or similar orchestration tools, with Insight Factory advantageous. • Working knowledge of DevOps practices, including version control, repository management, and...

    ₹129206 Avg Bid
    31 bids

    Job Title: Data Engineer (Databricks and AWS) Duration: 2 Hours Budget: ₹22,000 – ₹26,000 (based on screening) Tech Stack: Databricks, Python, PySpark, AWS, SQL, Git Job Description: We are looking for an experienced Data Engineer to provide short-term support. The role involves working on data pipelines, transformations, and analytics using Databricks and AWS. Responsibilities: Develop and optimize data pipelines using Databricks, PySpark, and Python Work with AWS services and SQL-based data processing Manage code and versioning using Git Troubleshoot and optimize data workflows Requirements: Strong hands-on experience with Databricks, PySpark, and Python Good knowledge of AWS data services and SQL Experience with Git and collaborative development Ability to del...

    ₹27382 Avg Bid
    17 bids

    ...engineering experiences with various aws services Experience building end-to-end data pipelines (schema discovery, ingestion, transformation, orchestration, monitoring) Experience working with relational databases like Oracle, MySQL, and SQL Server etc Experience with data ingestion from on-prem systems to cloud Experience with streaming platforms like Kafka or AWS Kinesis Strong skills in Python, PySpark, SQL, and Terraform...

    ₹104964 Avg Bid
    160 bids

    ...the next round of hiring I want an accomplished Senior Data Engineer to sit in on our technical interviews for roughly two hours each day. The role is purely evaluative: you will craft probing questions, join live video calls, and quickly score each candidate’s depth of knowledge across Python, Scala and SQL. Our stack centres on Azure and Databricks, so practical insight into large-scale Spark/PySpark jobs, data-model design, ETL orchestration and cloud performance tuning is essential. Candidates frequently discuss streaming, optimisation strategies and modern AI/ML add-ons, so any hands-on exposure to libraries such as PyTorch, NumPy, SciPy or TensorFlow will help you challenge them at the right level, though it is not mandatory. Availability is limited to two focus...

    ₹24402 Avg Bid
    16 bids

    ...narrative continuity before passing curated context into a citation aware LLM routing layer that prioritizes Gemini, OpenAI, then Anthropic, then Ollama local models, enforcing context bound generation and preventing hallucination outside retrieved evidence. Indexing is parallelized using ProcessPoolExecutor for efficient multi core utilization and automatically scales to distributed ingestion via PySpark when corpus size exceeds a configured threshold, enabling safe handling of 20k plus documents or 50GB class corpora, while the system is wrapped in a full MLOps backbone that integrates MLflow for experiment tracking of retrieval metrics, PPO reinforcement learning rewards, and parameter tuning, exposes Prometheus metrics for latency and retrieval monitoring compatible with Graf...
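
    A minimal sketch of the scaling rule this post describes: index in-process with ProcessPoolExecutor and switch to PySpark once the corpus crosses a threshold. The index_doc body and threshold value are hypothetical:

    from concurrent.futures import ProcessPoolExecutor

    SPARK_THRESHOLD = 20_000   # documents; the "20k plus" trigger mentioned above

    def index_doc(path: str) -> dict:
        # Hypothetical per-document indexing work.
        with open(path, encoding="utf-8") as f:
            return {"path": path, "tokens": len(f.read().split())}

    def build_index(paths: list[str]):
        if len(paths) < SPARK_THRESHOLD:
            with ProcessPoolExecutor() as pool:     # multi-core, single machine
                return list(pool.map(index_doc, paths, chunksize=64))
        # Above the threshold, distribute the same function with PySpark.
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.appName("indexer").getOrCreate()
        return spark.sparkContext.parallelize(paths).map(index_doc).collect()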

    ₹22725 Avg Bid
    14 bids

    Description: We’re looking for an experienced Data Engineer, preferably based in Dubai, to help build and manage data pipelines for a global platform. Most work is in Azure, using Azure Data Factory, ADLS, and Databricks. What you’ll do: Build and manage PySpark/Spark pipelines in Databricks Schedule and monitor pipelines in Azure Data Factory Optimize Databricks for better performance Keep code and documentation organized and clear Requirements: Experience with Azure cloud and Databricks Strong PySpark / Spark skills Experience building scalable, reliable data pipelines Details: Project-based, with potential to move to full-time Ideal for engineers who like building cloud-native pipelines

    ₹1118 / hr Avg Bid
    20 bids

    ... • Read multiple flat-file formats (mainly CSV, with the occasional JSON). • Apply thorough data-cleansing rules—removing duplicates, enforcing data types, flagging out-of-range values, and normalising text fields. • Run validation checks so that only clean, schema-compliant rows proceed to the load step. I’m happy for you to choose the stack you are most efficient with—Python (pandas, PySpark), Talend, or another ETL tool—as long as the final solution is reproducible and can be triggered automatically (CLI, scheduled job, or cloud function). If you think aggregation or more advanced joins would improve the dataset, flag that as a future enhancement; for now, cleansing and validation are the must-haves. Deliverables 1. Well-docum...
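
    A minimal sketch of the cleanse-and-validate pass in pandas, one of the stacks this post allows; the column rules are hypothetical examples:

    import pandas as pd

    df = pd.read_csv("input.csv")

    df = df.drop_duplicates()
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")  # enforce type
    df["name"] = df["name"].str.strip().str.title()                  # normalise text

    # Flag out-of-range values rather than silently dropping them.
    df["out_of_range"] = ~df["quantity"].between(0, 10_000)

    # Only clean, schema-compliant rows proceed to the load step.
    valid = df[df["quantity"].notna() & ~df["out_of_range"]]
    valid.to_csv("clean.csv", index=False)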

    ₹2386 Avg Bid
    24 bids

    ...Azure Data Engineer to support and enhance our existing data platform on an ongoing basis. You should be strong in: Azure Data Factory (ADF) for building and maintaining ETL/ELT pipelines Azure Databricks and PySpark for large‑scale data processing Python for data engineering utilities, automation, and integration Delta Lakes/Lakehouse concepts, performance optimization, and troubleshooting Working with SQL‑based data sources, data warehousing, and BI integrations Responsibilities Design, build, and optimize data pipelines in Azure ADF and Databricks Develop and maintain PySpark and Python jobs for batch and near real‑time workloads Implement best practices for data quality, observability, and monitoring Collaborate with our internal team, follow existing standa...

    ₹931 / hr Avg Bid
    34 bids

    I am looking for an experienced data engineer with 4-5 years of hands-on PySpark and Python experience, including experience handling complex data pipelines.

    ₹466 / hr Avg Bid
    6 bids

    ...Databricks Data Analyst and Data Engineer certifications and want a structured, hands-on tutoring program that also deepens my Snowflake skills. The goal is to become confident building end-to-end data pipelines, running analytics, and understanding platform architecture well enough to pass the exams and perform the work in practice. Focus areas Databricks • Data processing & analytics with PySpark/SQL and Delta Lake • Machine learning workflows inside the Databricks environment • Workspace, cluster, job, and Lakehouse architecture Snowflake • Core data-warehousing concepts and best practices • Query tuning and overall performance optimisation • Security features: RBAC, masking, encryption, and access policies How we can wor...

    ₹17323 Avg Bid
    48 bids

    ...Object Storage, Data Flow (Spark), and Data Catalog. * Solid understanding of Finance / Order-to-Cash (O2C) data entities and processes. * Knowledge of data modeling, lineage, and governance principles. * Familiarity with CI/CD and DevOps for automated deployments. Preferred Skills * OCI Data Integration certification. * Experience integrating Oracle Cloud ERP with OCI DI. * Knowledge of Python or PySpark for custom transformations. * Exposure to Data Science and ML pipelines leveraging OCI services. * Experience with monitoring tools like Grafana...

    ₹192698 Avg Bid
    5 bids

    ...guidance with embedding Genie via API into apps, Teams, or dashboards. • Train internal teams on Genie capabilities, administration, and operational readiness. Required Skills & Experience • Strong practical experience with Azure Databricks, Lakehouse architecture, Unity Catalog, SQL Warehouse. • Knowledge of Genie AI, foundational models, or Databricks conversational analytics. • Competency in PySpark, SQL, data modeling, and enterprise data engineering practices. • Familiarity with Azure ecosystem (Data Lake, Data Factory, DevOps). • Ability to translate business questions into NLQ-friendly dataset design. • Excellent communication and ability to work with cross functional data, BI, and business teams. Nice to Have • Experience with A...

    ₹186 / hr Avg Bid
    3 bids

    I have a Hadoop cluster holding several large data sets, and I need a seasoned PySpark developer who also writes rock-solid SQL. The immediate aim is to connect to the cluster (YARN/HDFS with Hive metastore), develop or refine PySpark jobs, optimise the accompanying SQL, and make sure everything runs smoothly end-to-end. You’ll receive access to a staging namespace plus a sample of the data. Once the logic checks out we’ll promote the code to the full environment. Deliverables • A clean, well-commented PySpark notebook or .py job that executes successfully on the cluster • The corresponding SQL script or view definitions ready for Hive or spark-sql • A concise README detailing execution steps, parameters, and expected outputs Accep...
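
    A minimal sketch of wiring a PySpark job to the Hive metastore as described; the database and table names are hypothetical, and the job would typically be launched with spark-submit --master yarn:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_job")
             .enableHiveSupport()          # use the cluster's Hive metastore
             .getOrCreate())

    # Hive / spark-sql compatible SQL against metastore tables.
    df = spark.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM staging.transactions
        GROUP BY customer_id
    """)
    df.write.mode("overwrite").saveAsTable("staging.customer_totals")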

    ₹6985 Avg Bid
    11 bids

    I need a reusable ETL framework built inside Databricks notebooks, version-controlled in Bitbucket and promoted automatically through a Bitbucket Pipeli...attached to any cluster. Acceptance criteria • Parameter-driven notebooks organised by layer. • Reusable GraphQL connector packaged as a .whl. • Bitbucket Pipelines yaml that runs unit tests, uses the Databricks CLI to deploy notebooks, and executes an integration test on commit. • Clear README detailing how to add a new API endpoint and where to place cleaning logic. Leverage native tools—PySpark, SQL, Delta Lake, dbutils—while keeping external libraries to a minimum and fully documented. Please share a brief outline of your approach and any relevant Databricks + Bitbucket CI experience s...
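
    A hedged sketch of the parameter-driven notebook pattern this post asks for; dbutils and spark exist only inside Databricks, the widget names are hypothetical, and CI (here, Bitbucket Pipelines) would pass the values per environment:

    # Widgets make the notebook parameterisable per layer/environment.
    dbutils.widgets.text("source_path", "/mnt/raw/default")
    dbutils.widgets.text("target_table", "bronze.default")

    source_path = dbutils.widgets.get("source_path")
    target_table = dbutils.widgets.get("target_table")

    df = spark.read.json(source_path)              # layer-specific cleaning goes here
    df.write.mode("append").saveAsTable(target_table)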

    ₹31759 Avg Bid
    114 bids

    I’m a beginner looking for a 1-on-1 Databricks instructor for a very hands-on, fast-paced 2-week program. Requirements: - Strong real-world Databricks experience - Hands-on Apache Spark (PySpark), SQL, Delta Lake - Real use case / mini project (end-to-end pipeline) - Live screen sharing, coding together - Beginner-friendly but practical (no theory-only) Goal: By the end of 2 weeks, I want to confidently build and understand a real Databricks data pipeline. Availability: 5–6 sessions per week, 1–1.5 hours per session Please share: - Your Databricks experience - How you would structure these 2 weeks - Your hourly rate Thanks!

    ₹1863 / hr Avg Bid
    59 bids

    ...across multiple source systems. Build and optimize Foundry pipelines using Code Workbooks (PySpark, SQL, Scala) and Quiver. Support data integration, feature engineering, and pipeline debugging for production AI workloads. Implement security and permissions architecture aligned with enterprise governance. Help develop Foundry applications using Workshop, Contour, and Slate for analytics and decision-making. Guide on best practices for CI/CD, testing, and deployment within Foundry. Provide mentorship and troubleshooting support during live client engagements. Required Skills: Strong hands-on experience with Palantir Foundry (Ontology, Code Workbooks, Quiver, Workshop). Proficiency in Python, PySpark, and SQL. Experience with data modeling, transformation logic, and pipelin...

    ₹1304 / hr Avg Bid
    20 bids

    Need someone with strong streaming experience to design, develop, and deploy a PySpark publishing and upserting job on EMR with Spark, the MongoDB (DocumentDB) connector, AWS Step Functions for EMR, CloudWatch, Docker, Kafka cluster architecture, Airflow DAGs, GitLab, PyCharm, Cursor AI IDE, etc.; hands-on experience with this environment is needed.
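
    A hedged sketch of the publish/upsert loop: stream the Kafka topic and upsert each micro-batch into DocumentDB via pymongo. The broker, topic, URI, and key fields are hypothetical, and collecting a micro-batch to the driver is a simplification that suits modest volumes:

    from pyspark.sql import SparkSession
    from pymongo import MongoClient, UpdateOne

    spark = SparkSession.builder.appName("upsert").getOrCreate()

    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load()
              .selectExpr("CAST(key AS STRING) AS k", "CAST(value AS STRING) AS v"))

    def upsert_batch(batch_df, batch_id):
        rows = batch_df.collect()
        if not rows:
            return
        coll = MongoClient("mongodb://docdb:27017")["shop"]["orders"]
        # Upsert keyed on the Kafka message key.
        coll.bulk_write([UpdateOne({"_id": r.k}, {"$set": {"payload": r.v}},
                                   upsert=True) for r in rows])

    (stream.writeStream
           .foreachBatch(upsert_batch)
           .option("checkpointLocation", "s3://bucket/_chk/orders")
           .start())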

    ₹1024 / hr Avg Bid
    39 bids

    ...patterns that Databricks loves to test. • Fresh practice questions (or a curated question bank) with detailed explanations so I understand not just the right answer but the thinking process. • At least one full-length mock exam under timed conditions followed by a debrief on weak areas and strategies to avoid common pitfalls. I work mainly in the Databricks notebook environment with Python, PySpark, and SQL, so please weave real-world examples into the prep. I’m flexible on session times and frequency; we can agree milestones and refine the plan as we go. If you’ve already helped others pass this exam—or you hold the certification yourself—tell me how you’d tackle my study roadmap and what materials you’d bring to the table. I...

    ₹4284 Avg Bid
    2 bids

    ...actual medicines and would map once the inconsistencies are ironed out, so I want the process to be fully automated, driven by a robust auto-correct algorithm rather than manual review. The remaining 0.1% could be non-medical entries and needs to be deleted. I am open to proven techniques—fuzzy matching, phonetic hashing, Levenshtein, word embeddings, or a hybrid—as long as they scale. Python, pandas, PySpark, or any other big-data friendly stack is fine, provided the final solution is reproducible and well documented. Deliverables • Clean, executable scripts (Jupyter notebook or .py) that ingest both files, normalise product names, detect duplicates, and output a one-to-one mapping table. • A brief README explaining dependencies, algorithm logic, and how ...
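
    A hedged sketch of the auto-correct core using rapidfuzz, one of several viable fuzzy matchers; the reference list, cutoff, and normalisation rules are hypothetical and would need tuning on the real files:

    import re
    from rapidfuzz import process, fuzz

    reference = ["paracetamol 500mg", "amoxicillin 250mg", "ibuprofen 400mg"]

    def normalise(name: str) -> str:
        # Collapse punctuation and case noise before matching.
        return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

    def auto_correct(raw: str, cutoff: int = 88):
        hit = process.extractOne(normalise(raw), reference,
                                 scorer=fuzz.token_sort_ratio,
                                 score_cutoff=cutoff)
        # Returns (match, score, index) above the cutoff, else None — the
        # non-matching ~0.1% the post wants deleted.
        return hit[0] if hit else None

    print(auto_correct("Paracetamol-500 MG"))   # -> paracetamol 500mg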

    ₹56533 Avg Bid
    39 bids

    ...Infrastructure Microsoft Azure (Functions, Logic Apps, Service Bus, Blob Storage, Data Factory, Azure DevOps) AWS Cloud Docker, Kubernetes RabbitMQ CRM, ERP & Enterprise Platforms Microsoft Dynamics CRM 365 Dynamics Business Central Sage CRM NopCommerce Sitefinity v12.2 Umbraco v8.0 DotNetNuke v4.0 Python, AI & Advanced Solutions Python, Django, Flask, Pyramid REST APIs, WebSockets PySpark AI Email & Chatbot Solutions Data Science & Analytics CMS, E-Commerce & Web Platforms WordPress, Joomla, Drupal Prestashop PHP-based systems BI, Finance & Business Support Power BI Advanced Excel Accounting, Finance & Bookkeeping Data Entry & Business Reporting MS Office Suite Tools & Delivery Methodology Git (Version Control) N...

    ₹466 / hr Avg Bid
    20 bids

    ...short interpretive notes that fold easily into manuscripts. What matters most is hands-on mastery of data extraction, table linking, and general database management within MIMIC. Solid grounding in observational study design, epidemiology, and EHR quirks is essential; a background in medicine or public health will make communication smoother. Working code in SQL plus either tidyverse/R or pandas/pySpark is expected. The immediate deliverable is a fully cleaned analytic dataset with the accompanying scripts and an outline of the statistical approach. After that, I plan to keep the collaboration open for additional projects and sensitivity analyses as new questions arise....

    ₹2422 / hr Avg Bid
    67 bids

    I’m looking for a Data Engineer with strong AWS native services experience to help build and support an event-driven data platform. This project focuses on automated batch data pipelines, data governance, and making data available in a secure ...Data Engineer with strong AWS native services experience to help build and support an event-driven data platform. This project focuses on automated batch data pipelines, data governance, and making data available in a secure and scalable way. This is not ad-hoc ETL — it’s a platform-style setup. Tech stack involved: • AWS: S3, SQS, Lambda, MWAA (Airflow), EMR Serverless • Data Processing: PySpark, Apache Spark • Data Lake: Apache Iceberg, AWS Glue Catalog • Governance & Security: Lake Formatio...

    ₹1583 / hr Avg Bid
    40 bids

    ...Remote Working Time: evening Budget: 22-24k monthly Duration:-2 hours per day Demo Required: Today Job Description We are seeking an experienced Senior Data Engineer with strong expertise in the Healthcare Payer domain to design, build, and maintain scalable data pipelines and reporting solutions. The ideal candidate will have hands-on experience across AWS and Microsoft Azure, strong Python/PySpark skills, and the ability to support integrated reporting and analytics using Power BI. Key Responsibilities Design, develop, and maintain end-to-end data pipelines for healthcare payer data Build and optimize ETL/ELT workflows using AWS Glue, Step Functions, and Python Work with Azure and AWS cloud services for data ingestion, processing, and storage Implement and manage Data ...

    ₹27103 Avg Bid
    9 bids

    ...Remote Working Time: evening Budget: 22-24k monthly Duration:-2 hours per day Demo Required: Today Job Description We are seeking an experienced Senior Data Engineer with strong expertise in the Healthcare Payer domain to design, build, and maintain scalable data pipelines and reporting solutions. The ideal candidate will have hands-on experience across AWS and Microsoft Azure, strong Python/PySpark skills, and the ability to support integrated reporting and analytics using Power BI. Key Responsibilities Design, develop, and maintain end-to-end data pipelines for healthcare payer data Build and optimize ETL/ELT workflows using AWS Glue, Step Functions, and Python Work with Azure and AWS cloud services for data ingestion, processing, and storage Implement and manage Data ...

    ₹23005 Avg Bid
    4 bids

    My current résumé sells me as a data engineer, yet my next move is a Data Analyst role. I need the Work Experience and Skills sections re-worked so recruiters immediately see me as a strong analytical hire. Here’s what you’ll be working with • Hands-on background in Hadoop administration, PySpark development, Databricks workflows and day-to-day data analysis. • A solid foundation in SQL and reporting tools, though these strengths are not highlighted well in the document. What I’m after • Rewrite both sections to spotlight analytical impact, business-friendly storytelling and in-demand keywords (think SQL, dashboards, data visualization, statistical insight, KPI tracking, etc.). • Re-order bullet points around results, not...

    ₹1988 Avg Bid
    6 bids

    The core of my remote-sensing crop-yield project is in place, but the code will not run from start to finish. I need a fresh set of eyes to hunt down and eliminate the blockers so that the pipeline executes smoothly on Databricks and locally. Current state • Repository already contains: – Spark-based preprocessing notebooks (PySpark) – Trained ML model scripts and saved artefacts – A handful of Databricks experiment notebooks for exploration What I need most Debugging is the priority. I am not after a full rewrite—I want the existing pieces to work together. You are free to suggest refactors where they remove obvious bottlenecks, but the first milestone is simply getting the code to run cleanly. Focus areas • Spark preprocessi...

    ₹838 Avg Bid
    9 bids

    We are seeking a freelancer proxy for a Data Engineer role to support a remote healthcare data platform. The work will be 5 to 6 hours per day. You will be required to sit alongside the engineer during work hours, explain work...operational runbooks for knowledge sharing • Support and guide production-grade pipelines built on Dagster, DBT, Airflow, AWS Glue, and SSIS Required Skills & Tech Stack: • Python (Strong) • SQL (Advanced) • Dagster, DBT, Airflow, AWS Glue • AWS: Athena, Glue, SQS, SNS, IAM, CloudWatch • Databases: PostgreSQL, AWS RDS, Oracle, Microsoft SQL Server • Data Modeling & Query Optimization • Pandas, PySpark, PyCharm • Terraform, Docker, DataGrip, VS Code • Git/GitHub and CI/CD pipelines • Experience wi...

    ₹61010 Avg Bid
    58 bids

    We are seeking a freelancer proxy for a Data Engineer role to support a remote healthcare data platform. The work will be 5 to 6 hours per day. You will be required to sit alongside the engineer during work hours, explain work...operational runbooks for knowledge sharing • Support and guide production-grade pipelines built on Dagster, DBT, Airflow, AWS Glue, and SSIS Required Skills & Tech Stack: • Python (Strong) • SQL (Advanced) • Dagster, DBT, Airflow, AWS Glue • AWS: Athena, Glue, SQS, SNS, IAM, CloudWatch • Databases: PostgreSQL, AWS RDS, Oracle, Microsoft SQL Server • Data Modeling & Query Optimization • Pandas, PySpark, PyCharm • Terraform, Docker, DataGrip, VS Code • Git/GitHub and CI/CD pipelines • Experience wi...

    ₹64808 Avg Bid
    29 bids

    I have an existing SAS program that handles end-to-end data processing for a single SQL Database source. The code cleans raw tables, applies a series of transformations, then produces several aggregated outputs that feed downstream reports. I now need the entire workflow re-implemented in PySpark running on Azure Databricks so I can retire the SAS environment and take advantage of Databricks’ scalability. You will receive: • The original .sas files with inline comments that explain each step • A data-dictionary of the SQL tables involved • Sample input/output datasets to verify parity What I’m expecting from you: 1. A well-structured Databricks notebook (or .py files) that reproduces the SAS logic for data cleaning, transformation, and aggregat...
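
    A minimal sketch of the migration shape: read the SQL source over JDBC and re-express a SAS clean/summarise step in PySpark. The connection details and column names are hypothetical placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sas_port").getOrCreate()

    # Requires the matching JDBC driver on the cluster classpath.
    raw = (spark.read.format("jdbc")
           .option("url", "jdbc:sqlserver://host:1433;databaseName=src")
           .option("dbtable", "dbo.transactions")
           .option("user", "etl").option("password", "***")
           .load())

    # e.g. a SAS DATA step that dropped null keys and fixed types...
    clean = (raw.filter(F.col("account_id").isNotNull())
                .withColumn("txn_date", F.to_date("txn_date")))

    # ...followed by a PROC SUMMARY becomes a groupBy aggregation.
    agg = clean.groupBy("account_id").agg(F.sum("amount").alias("total_amount"))
    agg.write.mode("overwrite").saveAsTable("reports.account_totals")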

    ₹11083 Avg Bid
    28 bids

    ...AWS and Databricks. This role is focused on hands-on execution, optimization, and support within a clearly defined scope. Key Responsibilities Enhance and maintain existing Databricks (PySpark) data pipelines Work with AWS services such as S3, Glue, Lambda, Redshift/Athena Optimize data workflows for performance and reliability Implement data transformations, validations, and incremental loads Troubleshoot and resolve pipeline and data issues Maintain documentation for assigned components Required Experience & Skills 6–8 years of experience in Data Engineering Strong hands-on expertise in Python & PySpark Proven experience with Databricks Good knowledge of AWS data services Strong SQL and data modeling skills Ability to work independently in a remote setu...

    ₹67896 Avg Bid
    13 bids

    ...running the usual checks for duplicates, missing values, and outliers. Once it is clean, I expect you to apply the appropriate statistical and machine-learning techniques—time-series decomposition, clustering, cohort or basket analysis, whichever combination best surfaces trend signals. Python or R is fine (Pandas, NumPy, scikit-learn, tidyverse, etc.), and if you prefer a big-data stack such as PySpark, that works too; the volume will justify it. Please package the outcome as: • A concise written report (PDF or Markdown) that explains the key trends and how you arrived at them. • Visualisations (static or interactive) that make the findings easy to consume for non-technical stakeholders—Matplotlib, Seaborn, Plotly, or Tableau Public dashboards are all a...

    ₹1863 / hr Avg Bid
    49 bids

    I need an experienced engineer who can sit with me in Pune MH India and provide hands-on, offline technical support for daily data engineering tasks. The focus is strictly on Python and PySpark: reviewing code, untangling bugs, optimising Spark jobs, and guiding me through best practices as we build and maintain data-processing pipelines. This is not a remote, on-call role; I’m looking for someone who can be physically present in Kharadi/Viman Nagar/Magarpatta Area—pair programming, white-boarding solutions, and helping me push features all the way to a clean commit. If you have solid production experience with Python, strong command of PySpark’s RDD/DataFrame APIs, and the confidence to troubleshoot performance issues on the spot, let’s talk about a regular...

    ₹27475 Avg Bid
    10 bids

    Bank Loan ETL & Visualization Project Report 1. Abstract This project builds a complete ETL (Extract, Transform, Load) pipeline for bank loan analytics using PySpark and Python. It cleans, validates, and integrates branch, customer, and loan datasets into a unified master table. The pipeline standardizes financial data, generates analytical insights, and prepares the output for reporting and automated financial analysis. 2. Technologies Used Python PySpark Pandas Matplotlib CSV Files Java JDK (required for Spark) 3. Dataset Description This project uses three CSV datasets: – Branch details (branch_id, branch_name, branch_state) – Customer demographic information – Loan records linked to customers and branches 4. ETL Workflow The
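
    A minimal sketch of the integration step the report describes, joining the three CSV datasets into the unified master table; the file paths and join keys follow the dataset description above but are otherwise hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("bank_loan_etl").getOrCreate()

    branch   = spark.read.csv("branch.csv",   header=True, inferSchema=True)
    customer = spark.read.csv("customer.csv", header=True, inferSchema=True)
    loan     = spark.read.csv("loan.csv",     header=True, inferSchema=True)

    # Loans link to customers, customers/loans link to branches.
    master = (loan.join(customer, "customer_id", "left")
                  .join(branch, "branch_id", "left")
                  .withColumn("loan_amount", F.col("loan_amount").cast("double")))

    master.write.mode("overwrite").csv("master_table", header=True)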

    ₹23284 Avg Bid
    19 bids

    I have an existing SAS program that handles end-to-end data processing for a single SQL Database source. The code cleans raw tables, applies a series of transformations, then produces several aggregated outputs that feed downstream reports. I now need the entire workflow re-implemented in PySpark running on Azure Databricks so I can retire the SAS environment and take advantage of Databricks’ scalability. You will receive: • The original .sas files with inline comments that explain each step • A data-dictionary of the SQL tables involved • Sample input/output datasets to verify parity What I’m expecting from you: 1. A well-structured Databricks notebook (or .py files) that reproduces the SAS logic for data cleaning, transformation, and aggregat...

    ₹44053 Avg Bid
    41 bids

    I need an experienced engineer who can sit with me in Pune offline and provide hands-on, offline technical support for daily software-development tasks. (entire month but as per both convenience) The focus is strictly on Python and PySpark: reviewing code, optimizing Spark jobs, and guiding me through best practices as we build and maintain data-processing pipelines. This is not a remote, on-call role; I’m looking for someone who can be physically present—pair programming, white-boarding solutions, and helping me push features all the way to a clean commit. If you have solid production experience with Python, strong command of PySpark’s RDD/DataFrame APIs, and the confidence to troubleshoot performance issues on the spot, let’s talk about a regular schedule ...

    ₹25691 Avg Bid
    5 bids

    Hi, thanks for the opportunity. I can support your Databricks and AI Agents project with strong skills in PySpark, SQL, Delta Lake, and data automation. I will handle ETL pipelines, data processing, AI agent integration, and workflow optimization. My rate is 8 USD per hour, and I can work 40 hours per week (320 USD weekly). I can start immediately and will work closely with your team for smooth delivery.

    ₹331 / hr Avg Bid
    1 bid

    Need someone with strong streaming experience to help me write, design, develop, and deploy a PySpark broker publishing job on EMR with PySpark, the MongoDB connector, DocumentDB streaming (strong Kafka/Mongo), AWS Step Functions, EMR, Docker, Kafka architecture, CloudWatch, and Airflow DAGs.

    ₹931 / hr Avg Bid
    33 bids

    I need an experienced engineer who can sit with me in Pune and provide hands-on, offline technical support for daily software-development tasks. The focus is strictly on Python and PySpark: reviewing code, untangling bugs, optimising Spark jobs, and guiding me through best practices as we build and maintain data-processing pipelines. This is not a remote, on-call role; I’m looking for someone who can be physically present—pair programming, white-boarding solutions, and helping me push features all the way to a clean commit. If you have solid production experience with Python, strong command of PySpark’s RDD/DataFrame APIs, and the confidence to troubleshoot performance issues on the spot, let’s talk about a regular schedule that works for both of us.

    ₹21049 Avg Bid
    4 bids

    ...audit submissions. 13. Able to communicate, plan and execute BI platform Audit with internal audit team. Competencies for the job 1) Proven experience with big data solution design and development in Databricks, notebooks & schema design, development, best practice and notebooks Azure Dev Ops / CI-CD Pipelines 2) Hands On in Python PySpark, Spark SQL, Delta Live + Kafka; Azure SQL DB, Azure Data Factory, Azure DataBricks, Azure Synapse, Azure Data Lake, Delta, Pyspark, Python, Logic Apps, Azure DevOps, CI/CD implementation, Power BI / QlikSense, Blob Storage, ADLS, Azure Key Vault, ETL, SSIS 3) Experience in Query Development, Performance Tuning and loading data to Databricks SQL DW 4) Experience in data ingestion into ADLS, Azure Blob Storage, Azure Logic Apps 5) Prac...

    ₹24867 Avg Bid
    14 bids

    ...delivery 8. Ensure developments follow standard coding patterns, are fully documented for audit submissions. Competencies for the job 1) Proven experience with big data solution design and development in Databricks, notebooks & schema design, development, best practice and notebooks Azure Dev Ops / CI-CD Pipelines 2) Hands On in Python PySpark, Spark SQL, Delta Live + Kafka; Azure SQL DB, Azure Data Factory, Azure DataBricks, Azure Synapse, Azure Data Lake, Delta, Pyspark, Python, Logic Apps, Azure DevOps, CI/CD implementation, Power BI / QlikSense, Blob Storage, ADLS, Azure Key Vault, ETL, SSIS 3) Experience in Query Development, Performance Tuning and loading data to Databricks SQL DW 4) Experience in data ingestion into ADLS, Azure Blob Storage, Azure Logic Apps 5) ...

    ₹25426 Avg Bid
    6 bids

    Description: Need an experienced Databricks engineer to guide me through adding logging tasks to 2 workflows in Azure Databricks. What needs to be done: Add log_success and log_failure notebook tasks to 2 existing Databricks workflows Config...CRITICAL REQUIREMENT: All work must be done via Zoom screen sharing on MY machine You will guide/instruct me while I make the changes or you can do it I need to learn the process, not just get it done Must Have: Strong Azure Databricks workflows/jobs experience Experience with pipeline logging/monitoring patterns Patient teaching approach Tech Stack: Azure Databricks Unity Catalog Python/PySpark Azure DevOps (YAML configs) Timeline: Start ASAP To Apply: Share your Databricks experience and availability for Zoom sessions (mention your ...
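
    A hedged sketch of what a shared log_success / log_failure notebook might contain: read run context from widgets and append one audit row to a table. The widget names, table, and status values are hypothetical, and dbutils/spark are Databricks-provided:

    from pyspark.sql import functions as F

    dbutils.widgets.text("job_name", "")
    dbutils.widgets.text("status", "SUCCESS")   # the log_failure task passes FAILURE

    row = spark.createDataFrame(
        [(dbutils.widgets.get("job_name"), dbutils.widgets.get("status"))],
        "job_name string, status string",
    ).withColumn("logged_at", F.current_timestamp())

    # One append per workflow run; the workflow wires this task after the main tasks.
    row.write.mode("append").saveAsTable("ops.pipeline_run_log")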

    ₹2345 Avg Bid
    4 bids