Spark memory management involves two different types of memory: driver memory and executor memory.
In this article, I will cover everything about driver memory in Spark applications.
Let’s get started!
The amount of memory the driver requires depends on the job you are going to execute. In cluster mode, an overhead is also added on top of the driver memory to prevent YARN from killing the driver container prematurely for using too many resources.
Overhead in cluster mode
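When you submit in cluster mode, YARN allocates a container for the driver whose size is the driver memory plus this overhead. The overhead is controlled by spark.driver.memoryOverhead and defaults to roughly 10% of the driver memory, with a minimum of 384 MB. If your driver does a lot of off-heap or native work, you can raise it explicitly at submit time; the 1g value below is just an illustration:

spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --conf spark.driver.memoryOverhead=1g ...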
If you call a collect or take action on a large RDD or DataFrame, Spark tries to bring all of that data into the driver's memory. If the data does not fit, you will get a heap space (OutOfMemoryError) error.
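As an illustration, here is a minimal PySpark sketch (the input path, output path, and row limit are made up for the example) showing the risky pattern and a safer alternative that keeps only a small, bounded result on the driver:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("driver-memory-demo").getOrCreate()

# Hypothetical large dataset; replace the path with your own source.
df = spark.read.parquet("/data/big_table")

# Risky: collect() pulls every row into the driver JVM and can blow the heap.
# all_rows = df.collect()

# Safer: bring back only what the driver actually needs ...
sample = df.limit(20).collect()

# ... and let the executors handle the full data in parallel.
df.write.mode("overwrite").parquet("/data/output")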
Default driver memory: 1 GB
Use the command below to set driver memory while running spark-submit:
spark-submit --master yarn --driver-memory 4g ...
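Note that spark.driver.memory cannot be changed from inside your application in client mode, because the driver JVM is already running by the time your code executes. Set it through spark-submit as above, or in conf/spark-defaults.conf, for example:

spark.driver.memory 4g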