Final answer:
Broadcast variables in Apache Spark improve performance by keeping a read-only variable cached on each executor instead of shipping a copy of it with every task, which reduces network data transfer (including during shuffles) and helps with memory management.
Step-by-step explanation:
Broadcast variables exist to make data distribution across the nodes of a Spark cluster more efficient. They are a Spark feature that lets the developer keep a read-only variable cached on each machine rather than shipping a copy of it with every task. They are especially useful when the same data is needed by every node of the application, such as a lookup table or a large feature vector. The correct option from the given choices is:
b. Broadcast variables improve data shuffling performance.
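To illustrate the lookup-table case mentioned above, here is a minimal PySpark sketch; the table contents and variable names are hypothetical, and the point is only that the dictionary is shipped to each executor once via sc.broadcast rather than serialized into every task.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-variable-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical lookup table needed by every task (country code -> name).
country_lookup = {"US": "United States", "DE": "Germany", "IN": "India"}

# Ship the table to each executor once, instead of with every task closure.
bc_lookup = sc.broadcast(country_lookup)

orders = sc.parallelize([("US", 10.0), ("DE", 7.5), ("IN", 3.2)])

# Each task reads the locally cached copy through .value.
named = orders.map(lambda kv: (bc_lookup.value.get(kv[0], "unknown"), kv[1]))
print(named.collect())
```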
Broadcast variables can reduce the amount of data transferred over the network during shuffles, which improves the overall performance of the Spark application. They also help with memory management: tasks no longer need to receive a fresh copy of the variable with every task or iteration, which would otherwise consume extra memory and slow the job down.
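The same idea underlies broadcast joins in the DataFrame API, where replicating a small table to every executor avoids shuffling the large table. The sketch below uses the standard pyspark.sql.functions.broadcast hint; the DataFrames themselves are made-up examples.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

# Hypothetical large fact table and small dimension table.
orders = spark.createDataFrame(
    [("US", 10.0), ("DE", 7.5), ("IN", 3.2)], ["country_code", "amount"]
)
countries = spark.createDataFrame(
    [("US", "United States"), ("DE", "Germany"), ("IN", "India")],
    ["country_code", "country_name"],
)

# The broadcast() hint asks Spark to replicate the small table to every
# executor, so the large table does not need to be shuffled for the join.
joined = orders.join(broadcast(countries), on="country_code")
joined.show()
```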