What happens when a Spark Job is submitted?
The driver program first connects to the cluster manager and asks for resources. On behalf of the driver, the cluster manager launches executors on the worker nodes. The driver then sends tasks to the executors based on data placement. Before the executors start executing, they first register themselves with the driver program so that the driver has an overview of all the executors. The executors then start executing the various tasks assigned to them by the driver program. The driver program monitors the set of executors while they run, and it can also schedule future tasks based on data placement by checking the location of cached data. When the driver program's main() method exits, or when it calls the stop() method of the SparkContext, the executors are terminated and the resources are released from the cluster.
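The lifecycle above can be seen in a minimal Scala sketch. This is an illustration, not a definitive implementation: the app name is hypothetical, and local mode stands in for a real cluster manager.

```scala
import org.apache.spark.sql.SparkSession

object LifecycleExample {
  def main(args: Array[String]): Unit = {
    // Creating the SparkSession (and its underlying SparkContext) is what
    // makes the driver contact the cluster manager and request executors.
    val spark = SparkSession.builder()
      .appName("lifecycle-example") // hypothetical app name
      .master("local[*]")           // assumption: local mode for this sketch
      .getOrCreate()

    // An action such as count() makes the driver ship tasks to the
    // executors that have registered with it.
    val n = spark.sparkContext.parallelize(1 to 1000).count()
    println(s"count = $n")

    // stop() terminates the executors and releases the cluster resources,
    // as described above.
    spark.stop()
  }
}
```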
When the application code is submitted, the driver implicitly converts the code containing the transformations and actions into a logical directed acyclic graph (DAG). During this stage, the driver program performs optimizations such as pipelining transformations, and the logical DAG is then converted into a physical execution plan consisting of a set of stages. Once the plan is created, small physical execution units called tasks are created under each stage. The tasks are then bundled together and sent to the Spark cluster.
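A short sketch, again assuming local mode, can make the DAG construction visible. Narrow transformations such as map() are pipelined into a single stage, while a wide transformation such as reduceByKey() introduces a shuffle and hence a stage boundary; rdd.toDebugString prints this lineage.

```scala
import org.apache.spark.sql.SparkSession

object DagExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dag-example")   // hypothetical app name
      .master("local[*]")       // assumption: local mode for this sketch
      .getOrCreate()
    val sc = spark.sparkContext

    val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))

    val counts = words
      .map(w => (w, 1))   // narrow: pipelined into the same stage
      .reduceByKey(_ + _) // wide: shuffle creates a new stage

    // Nothing has executed yet; the transformations are lazy. The lineage
    // below shows the stage boundary introduced by the shuffle.
    println(counts.toDebugString)

    // collect() is the action that makes the driver build the DAG, split
    // it into stages, and schedule tasks on the executors.
    counts.collect().foreach(println)

    spark.stop()
  }
}
```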