Which two of the following are row-based data encoding formats? (Select two)

A) ORC
B) Parquet
C) Avro
D) CSV

by Mhinz (8.2k points)

2 Answers


Final answer:

CSV and Avro are the two row-based data encoding formats among the options. CSV is a text-based format in which each line holds one record, and Avro is a compact binary serialization format that writes each record's fields together. ORC and Parquet, in contrast, are column-based formats optimized for analytical workloads.

Step-by-step explanation:

Among the options provided, CSV (D) and Avro (C) are the two row-oriented data encoding formats. CSV is a simple, text-based format in which each line represents a single record, with values separated by a delimiter (typically a comma), so all of a record's values are stored together. Avro is a binary row-based format that originated in the Apache Hadoop project; it serializes each record's fields one after another according to a schema, which keeps the output compact.
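
The row-oriented layout of Avro can be sketched with the primitive encoding rules from the Avro specification: a `long` is zig-zag encoded and written as a variable-length integer, a `string` is its byte length followed by its UTF-8 bytes, and a record is simply its fields concatenated in schema order. The sketch below assumes a hypothetical two-field schema (`id`, `name`) and uses hand-written helpers rather than the official Avro library; it omits the object container file header, embedded schema, and sync markers that real Avro files carry.

```python
# Minimal sketch of Avro's binary encoding for a record with a long and a
# string field. Hand-written helpers, NOT the official avro library; no
# container-file header, schema, or sync markers are produced.

def encode_long(n: int) -> bytes:
    """Zig-zag encode a signed 64-bit integer, then write it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

# Hypothetical schema for illustration. Field order comes from the schema,
# and field names are not written into the encoded bytes.
SCHEMA = [("id", "long"), ("name", "string")]

def encode_record(record: dict) -> bytes:
    parts = []
    for name, kind in SCHEMA:
        value = record[name]
        parts.append(encode_long(value) if kind == "long" else encode_string(value))
    return b"".join(parts)

def encode_rows(records) -> bytes:
    # Row-oriented: all fields of one record are written together,
    # then the next record follows.
    return b"".join(encode_record(r) for r in records)

rows = [{"id": 1, "name": "a"}, {"id": -2, "name": "bc"}]
print(encode_rows(rows).hex())  # -> 02026103046263
```

Each record's bytes sit in one contiguous span, so fetching a whole record is cheap, but reading only `name` for every record still means stepping over every `id`.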

While ORC (A) and Parquet (B) are also data encoding formats, they follow a columnar storage approach: they store a table column by column rather than row by row. Because the values in one column share a type and often repeat, this layout makes compression and encoding schemes more effective, and both formats also support nested data types. The column-oriented design suits analytic workloads, which typically read only a subset of the columns in a dataset.
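
A minimal sketch of the columnar idea, assuming a small in-memory table with hypothetical column names: pivot the rows into one list per column, answer an aggregate by touching only the column it needs, and apply run-length encoding to a column whose adjacent values repeat. This only illustrates the principle; real ORC and Parquet files add row groups or stripes, per-column statistics, and several encodings and compression codecs.

```python
from itertools import groupby

# Hypothetical sample table, used only for illustration.
rows = [
    {"country": "DE", "year": 2023, "sales": 10},
    {"country": "DE", "year": 2023, "sales": 12},
    {"country": "DE", "year": 2024, "sales": 9},
    {"country": "FR", "year": 2024, "sales": 15},
]

def to_columns(records):
    """Pivot row-oriented records into one list per column (columnar layout)."""
    names = list(records[0])
    return {name: [r[name] for r in records] for name in names}

def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

columns = to_columns(rows)

# Column pruning: an aggregate like SUM(sales) only touches the "sales" list.
total_sales = sum(columns["sales"])

# Values of one column sit next to each other, so repeats form runs
# that a simple encoding such as RLE can shrink.
country_rle = run_length_encode(columns["country"])
print(total_sales, country_rle)
```

In a row layout the same sum would have to walk past every `country` and `year` value; here those lists are never read.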

by Wyx (7.8k points)

Final answer:

CSV and Avro are the row-based data encoding formats. CSV stores tabular data as plain text with commas separating values, while Avro is a binary format that serializes data record by record. ORC, despite the word "Row" in its name (Optimized Row Columnar), is a columnar format with advanced compression and efficient column pruning.

The correct answers are C and D.

Step-by-step explanation:

The two row-based data encoding formats are CSV and Avro.

CSV stands for Comma-Separated Values; it stores tabular data in plain text, using commas to separate the values in each row. It is a simple and widely supported format, but it carries no type information and every value has to be parsed from text, so it is usually slower to process than binary formats.

Avro is a binary serialization format that writes all of a record's fields together, one record after another, so it is also row-based.

ORC stands for Optimized Row Columnar. Despite its name, it is a columnar storage format, not a row-based one: it provides advanced compression and efficient column pruning, is optimized for analytical workloads, and can significantly improve performance for queries that read only some of the columns.
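
To illustrate why CSV is row-based and untyped, the sketch below uses Python's standard `csv` module with hypothetical records: each record is written as one line of text, and on the way back every field comes out as a string that the caller must convert.

```python
import csv
import io

# Hypothetical records, used only for illustration.
records = [
    {"id": 1, "name": "Ada", "score": 9.5},
    {"id": 2, "name": "Lin, Mei", "score": 8.0},
]

# Write: each record becomes one line of delimited text (row-based layout).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "score"])
writer.writeheader()
writer.writerows(records)
text = buf.getvalue()
print(text)

# Read back: the file stores no types, so every field is returned as a str.
parsed = list(csv.DictReader(io.StringIO(text)))

# The reader has to convert each value itself.
typed = [
    {"id": int(r["id"]), "name": r["name"], "score": float(r["score"])}
    for r in parsed
]
```

The second name is written as `"Lin, Mei"` because it contains the delimiter; the default quoting mode only quotes fields that need it.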

by JohanB (8.0k points)