RDD in PySpark
In this article I explain Resilient Distributed Datasets (RDDs) in Spark, using Python and PySpark. I describe how pipelines are built in PySpark from transformations and actions. Finally, I show code examples that work with a simple numerical dataset.