PySpark exam review

My first organized project in this vein is to pass the Hortonworks Spark Developer exam. The study guide lists specific objectives, and those objectives form the structure of this chapter.

Spark Core

Write a Spark Core application in Python or Scala

Initialize a Spark application

Run a Spark job on YARN

Create an RDD

Create an RDD from a file or directory on HDFS

Persist an RDD in memory or on disk

Perform Spark transformations on an RDD

Perform Spark actions on an RDD

Create and use broadcast variables

Create and use accumulators

Configure Spark properties

SparkSQL

Create Spark DataFrames from an existing RDD

Perform operations on a DataFrame

Write a SparkSQL application

Use Hive with ORC from SparkSQL

Write a SparkSQL application that reads data from and writes data to Hive tables
