Which statement is true about Spark's Resilient Distributed Dataset (RDD)?

A) RDDs are immutable and can't be modified once created.
B) RDDs are stored in a traditional relational database.
C) RDDs can be modified and updated dynamically.
D) RDDs are only available in Python, not in other programming languages.

asked by User Jeniffer (7.6k points)

1 Answer


Final answer:

The true statement is that RDDs are immutable and cannot be modified once created. They are distributed across a cluster, and the RDD API is available in several programming languages, not only Python.

Step-by-step explanation:

The correct statement about Spark's Resilient Distributed Dataset (RDD) is A) RDDs are immutable and can't be modified once created. RDDs are a fundamental data structure of Apache Spark, a cluster-computing framework. Once an RDD has been created, it cannot be changed in place. Instead, it is transformed into a new RDD through transformations such as map, filter, and flatMap (reduce, by contrast, is an action that returns a value rather than a new RDD). These transformations produce new datasets by applying functions to the existing data. For example, you might have an RDD containing numbers and use a map transformation to create a new RDD with each number squared.
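To make that squaring example concrete, here is a minimal PySpark sketch (the app name, local master, and variable names are just placeholders): map yields a new RDD while the original RDD is left untouched.

```python
from pyspark import SparkContext

# Illustrative local context; in a real job the master would point at a cluster.
sc = SparkContext("local[*]", "rdd-immutability-demo")

numbers = sc.parallelize([1, 2, 3, 4, 5])   # original RDD
squares = numbers.map(lambda x: x * x)      # transformation -> brand-new RDD

print(squares.collect())   # [1, 4, 9, 16, 25]
print(numbers.collect())   # [1, 2, 3, 4, 5] -- the original RDD is unchanged

sc.stop()
```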

RDDs are not stored in a traditional relational database, as option B suggests. They are held in memory or on disk across a cluster in a fault-tolerant manner. Option C is incorrect because RDDs are immutable, meaning they cannot be modified dynamically once defined. Option D is also incorrect: the RDD API is available in several languages supported by Spark, including Scala, Java, and Python, not just Python.
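The storage point can be illustrated with another short PySpark sketch (again with illustrative names): an RDD can be persisted with a storage level that keeps partitions in memory and spills them to disk when needed; no relational database is involved.

```python
from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[*]", "rdd-persist-demo")

events = sc.parallelize(range(1000000), numSlices=8)
evens = events.filter(lambda x: x % 2 == 0)

# Keep partitions in memory, spilling to disk if memory runs short.
evens.persist(StorageLevel.MEMORY_AND_DISK)

print(evens.count())   # first action computes and caches the partitions
print(evens.count())   # later actions reuse the cached partitions

sc.stop()
```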

answered by User Shinya Koizumi (8.4k points)