Jupyter Notebook for PySpark
Goal: Create a Spark project for Python in Jupyter Notebook.
Download the Apache Spark binary, untar it in a location of your choice, and set the SPARK_HOME environment variable to that location. For example:
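A minimal sketch of the exports, assuming a hypothetical Spark 3.5.1 download untarred under the home directory (adjust the path to wherever you actually untarred Spark):

```shell
# Hypothetical install path; point SPARK_HOME at your actual Spark directory.
export SPARK_HOME=$HOME/spark-3.5.1-bin-hadoop3
# Put the pyspark/spark-submit scripts on the PATH.
export PATH=$SPARK_HOME/bin:$PATH
```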
You can add these settings to $SPARK_HOME/conf/spark-env.sh to make them persistent across reboots.
Now, launch Jupyter with Spark session support:
$ cd ~/workspace/python
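The launch command itself is not shown above; a common sketch uses PySpark's driver environment variables, which tell bin/pyspark to start Jupyter as its Python driver instead of the plain REPL:

```shell
# PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS are read by bin/pyspark;
# set this way, it launches a Jupyter Notebook server with Spark attached.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
$SPARK_HOME/bin/pyspark
```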
Open http://localhost:8888 in a web browser, create a new Python 3 notebook, and check whether the SparkContext object is available to get started with PySpark code.
Alternatively, suppose you want to start Jupyter in the normal way and then start the Spark session yourself.
Install py4j using pip and run the following before launching Jupyter.
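The exact commands are not listed above; a plausible sketch is to install py4j and put Spark's Python sources on PYTHONPATH (the py4j zip file name varies by Spark release, so check $SPARK_HOME/python/lib for the real name):

```shell
pip install py4j
# Make the pyspark package importable from a normally launched notebook.
# The py4j zip name below is illustrative; match it to your release.
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH
```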
Launch Jupyter, open http://localhost:8888 in the browser, and create a new notebook.
Submit a .py file using spark-submit
Launch jupyter with the Spark backend: $ $SPARK_HOME/bin/pyspark
Submit the job: $ $SPARK_HOME/bin/spark-submit <.py file>