1. What is a Parquet file in Spark?

Answer»

Parquet is a columnar file format designed to speed up queries. For analytical workloads it is generally more efficient than row-oriented text formats such as CSV or JSON. Spark SQL supports both reading and writing Parquet files, and because the files store the schema of the original data, Spark preserves that schema automatically.
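To make the "column-based" idea concrete, here is a minimal sketch in plain Python. It is only a toy model of row-oriented versus column-oriented layout and does not reproduce Parquet's actual on-disk encoding. The sample records and the helper names `to_columns` and `infer_schema` are assumptions made up for illustration.

```python
# Toy illustration of row-oriented vs column-oriented layout.
# This is NOT Parquet's real encoding; the data and helper names are hypothetical.
rows = [
    {"id": 1, "name": "alice", "amount": 10.0},
    {"id": 2, "name": "bob", "amount": 20.5},
    {"id": 3, "name": "carol", "amount": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def infer_schema(columns):
    """Record each column's type name, mimicking a self-describing file."""
    return {name: type(values[0]).__name__ for name, values in columns.items()}


columns = to_columns(rows)
schema = infer_schema(columns)

# Row layout: in this toy model, every field of every record is passed over
# to sum a single column.
row_fields_touched = sum(len(record) for record in rows)
row_total = sum(record["amount"] for record in rows)

# Column layout: only the "amount" column is read.
col_values_touched = len(columns["amount"])
col_total = sum(columns["amount"])
```

Both layouts give the same total, but the columnar version only reads the values of the one column the query needs. The stored `schema` mapping plays the role of the schema that a Parquet file carries alongside its data.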

5. Why is Spark faster than Hive?

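Spark is usually faster than Hive because it processes data in the main memory of the worker nodes wherever possible, which avoids unnecessary disk I/O. Hive running on MapReduce, by contrast, writes intermediate results to disk between stages and reads them back.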
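The effect of keeping intermediate results in memory can be sketched with a small plain-Python model. It does not use Spark or Hive and says nothing about their internals. The stage functions and the `run_with_disk` and `run_in_memory` helpers are hypothetical, and the model simply counts how many file reads and writes each approach performs.

```python
import json
import os
import tempfile


def stage_double(values):
    """First stage: double every value."""
    return [v * 2 for v in values]


def stage_keep_large(values):
    """Second stage: keep only values greater than 4."""
    return [v for v in values if v > 4]


STAGES = [stage_double, stage_keep_large]


def run_with_disk(values, workdir):
    """Write each stage's output to a file and read it back before the next stage."""
    disk_ops = 0
    data = values
    for i, stage in enumerate(STAGES):
        data = stage(data)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(data, f)
        disk_ops += 1  # one write per stage
        with open(path) as f:
            data = json.load(f)
        disk_ops += 1  # one read per stage
    return data, disk_ops


def run_in_memory(values):
    """Pass each stage's output directly to the next stage, with no file I/O."""
    data = values
    for stage in STAGES:
        data = stage(data)
    return data, 0


with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_ops = run_with_disk([1, 2, 3, 4], workdir)
memory_result, memory_ops = run_in_memory([1, 2, 3, 4])
```

Both runs produce the same result, but the disk-backed run pays for a write and a read at every stage, while the in-memory run pays for none. On real clusters the gap depends on data size, available memory, and the execution engine.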


