    RDD in PySpark

    In this article I explain Resilient Distributed Datasets (RDDs) in Spark, using Python and PySpark. I describe how pipelines are built in PySpark from transformations and actions. Finally, I show code examples that work with a simple numerical dataset.