Answer» - In-memory: the ability to perform operations in primary memory (RAM) rather than on disk.
- Immutable or read-only: an RDD cannot be changed once it is created; a transformation produces a new RDD instead of modifying the existing one.
- Lazily evaluated: Spark computes the records only when an action is performed, not when a transformation is declared.
- Cacheable: records can be cached so that repeated processing is faster.
- Parallel: Spark can parallelize operations on the data stored in an RDD.
- Partitioned: Spark splits the records into partitions. When reading a file from HDFS, it creates one partition per block by default, and an HDFS block is 128 MB by default.
- Parallelizing: an RDD can be created from an existing collection in your driver program.
- Referencing: an RDD can also be created from a dataset in an external storage system, such as a shared file system, HDFS, or HBase.
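
To make the points above concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark: `ToyRDD` and all of its methods are hypothetical names invented for illustration, and a thread pool stands in for the cluster. It only shows how partitioning, immutability, lazy evaluation and caching fit together. In real PySpark the equivalent entry points are `sc.parallelize(...)` for a driver-side collection and `sc.textFile(...)` for external storage.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical stand-in for an RDD; this is NOT Spark's API."""

    def __init__(self, partitions, steps=()):
        # Tuples keep the partitioned data read-only after construction.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._steps = tuple(steps)  # recorded transformations, not yet run
        self._use_cache = False
        self._cached = None

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split an in-memory collection into contiguous partitions."""
        items = list(data)
        n, k = len(items), num_partitions
        return cls(items[i * n // k:(i + 1) * n // k] for i in range(k))

    @property
    def num_partitions(self):
        return len(self._partitions)

    def map(self, fn):
        # Transformation: returns a new ToyRDD and computes nothing.
        def step(part):
            return [fn(x) for x in part]
        return ToyRDD(self._partitions, self._steps + (step,))

    def filter(self, pred):
        # Transformation: also lazy, also returns a new ToyRDD.
        def step(part):
            return [x for x in part if pred(x)]
        return ToyRDD(self._partitions, self._steps + (step,))

    def cache(self):
        # Ask for the computed result to be kept after the first action.
        self._use_cache = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached

        def run(part):
            for step in self._steps:
                part = step(part)
            return list(part)

        # Each partition is processed independently, here on a thread pool.
        with ThreadPoolExecutor() as pool:
            result = list(pool.map(run, self._partitions))
        if self._use_cache:
            self._cached = result
        return result

    def collect(self):
        # Action: this is the point where the recorded steps actually run.
        return [x for part in self._compute() for x in part]


calls = []

def double(x):
    calls.append(x)  # record every invocation to make laziness observable
    return x * 2

rdd = ToyRDD.parallelize(range(10), num_partitions=3)
doubled = rdd.map(double).filter(lambda x: x % 4 == 0).cache()
calls_before_action = len(calls)  # still 0: only transformations so far

first = doubled.collect()   # the action triggers the computation
second = doubled.collect()  # served from the cache, double() is not re-run
```

Note that `map` and `filter` never touch the original `rdd`; each returns a new object, so `rdd.collect()` still yields the ten input values. The `double` function runs only when `collect()` is called, and only once per record because of `cache()`.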