We are looking for a PySpark expert to help optimise our streaming pipeline. We use Kafka for ingestion, PySpark for stream processing and Elasticsearch for storage. The pipeline has a couple of bottlenecks that we want to remove.
We want someone who understands Spark's optimisation techniques, knows how to configure hardware resources and understands YARN.
If you are an expert in this space, please reach out to us.