Shared Variables in Spark

Broadcast Variable : 


A broadcast variable is a read-only value that Spark sends once to every machine in the cluster and caches there. "Broadcast" simply means sending the same data to all recipients, here the worker machines. Unlike an ordinary variable, a broadcast variable is not meant to change after it is created: every task reads the same copy. Because each machine keeps its own local copy instead of receiving the data again with every task, the network and I/O cost of retrieving the data is reduced.


A broadcast variable is very useful when the same data is needed by multiple tasks across different stages.


Example :

val broadcastVar = sc.broadcast(Array("Indians are", "cultural", "People"))


The values "Indians are", "cultural" and "People" are made available on all machines, where tasks can read them with broadcastVar.value.
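
As a minimal sketch of how a broadcast value is typically read inside tasks, the example below replaces country codes with names using a small lookup table. The object name BroadcastLookupSketch, the helper resolveNames, the lookup table and the local[*] master are all assumptions made for illustration; in spark-shell the existing sc can be used instead of creating a session.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object BroadcastLookupSketch {
  // Hypothetical lookup table, small enough to copy to every machine.
  val countryNames: Map[String, String] =
    Map("IN" -> "India", "US" -> "United States")

  // Replaces each code with its name, reading the table from the broadcast copy.
  def resolveNames(sc: SparkContext, codes: Seq[String]): Array[String] = {
    // Sent to each machine once, instead of once per task.
    val lookup = sc.broadcast(countryNames)
    val names = sc.parallelize(codes)
      .map(code => lookup.value.getOrElse(code, "Unknown"))
      .collect()
    // Release the cached copies once the job no longer needs them.
    lookup.destroy()
    names
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .appName("BroadcastLookupSketch")
      .master("local[*]")
      .getOrCreate()
    resolveNames(spark.sparkContext, Seq("IN", "US", "IN", "FR")).foreach(println)
    spark.stop()
  }
}
```

Tasks only read lookup.value and never modify it, which matches the read-only nature of broadcast variables.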



Accumulator Variable : 


An accumulator is a shared variable that tasks can only add to, so its value accumulates (increases) as the job runs. Tasks running on the worker machines add to it, and only the driver program can read its value. Accumulators are generally used to implement counters and sums in Spark.



Example : 

>>val accum = sc.longAccumulator("Sum of all elements")

>>sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))

>>accum.value

res2: Long = 10
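
A common use of accumulators is counting problem records while a job does its main work. The following is a minimal sketch under that assumption: the object name AccumulatorSketch, the helper sumValid, the accumulator names and the sample input are hypothetical. It reads the totals through sum, which LongAccumulator provides alongside value.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import scala.util.Try

object AccumulatorSketch {
  // Returns (sum of the records that parse as numbers, count of records that do not).
  def sumValid(sc: SparkContext, records: Seq[String]): (Long, Long) = {
    val total = sc.longAccumulator("Sum of valid records")
    val badRecords = sc.longAccumulator("Bad records")
    // foreach is an action, so the updates run inside tasks on the workers.
    sc.parallelize(records).foreach { record =>
      Try(record.trim.toLong).toOption match {
        case Some(n) => total.add(n)
        case None    => badRecords.add(1)
      }
    }
    // Only the driver reads the accumulated values.
    (total.sum, badRecords.sum)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .appName("AccumulatorSketch")
      .master("local[*]")
      .getOrCreate()
    val (sum, bad) = sumValid(spark.sparkContext, Seq("10", "x", "5", ""))
    println(s"sum = $sum, bad records = $bad")
    spark.stop()
  }
}
```

The updates are made inside an action (foreach) on purpose: Spark applies accumulator updates made in actions only once per task, whereas updates made inside transformations such as map may be applied more than once if a task is re-executed.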