Set Up Spark Cluster
Single Node Spark Cluster (local mode)
# Install OS patches
sudo yum update -y
# Install Open JDK
sudo yum install java-1.8.0-openjdk -y
# Download and install Anaconda for Python 3.7
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-*.sh -b -u
# Add Anaconda to the PATH variable
echo 'export PATH=/home/ec2-user/anaconda3/bin:$PATH' >> ~/.bash_profile
source ~/.bash_profile
# Generate the Jupyter configuration file and set the password to pass123
jupyter notebook --generate-config
echo 'c.NotebookApp.ip = "0.0.0.0"' >> ~/.jupyter/jupyter_notebook_config.py
echo 'c.NotebookApp.open_browser = False' >> ~/.jupyter/jupyter_notebook_config.py
echo 'c.NotebookApp.password = "sha1:298cdb938f91:e6462632083df74cb2f68d4b55b0e7a7d7e0e45b"' >> ~/.jupyter/jupyter_notebook_config.py
# Download Spark
wget http://apachemirror.wuchna.com/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
tar xf spark-2.*-bin-hadoop2.7.tgz
sudo mv spark-2.*-bin-hadoop2.7 /usr/lib/spark
# Add Spark to PATH
echo 'export SPARK_HOME=/usr/lib/spark' >> ~/.bash_profile
echo 'export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bash_profile
echo 'export PATH=/home/ec2-user/anaconda3/bin:$PATH' >> ~/.bash_profile
source ~/.bash_profile
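After the steps above, it can help to confirm that the installed tools are reachable from the current shell. The following is a minimal sketch and not part of the original setup: the helper name `check_tool` is hypothetical, and the list of tools assumes the PATH changes above took effect.

```shell
# Hypothetical helper: report whether a command is reachable on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK: $1 -> $(command -v "$1")"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Check each tool the setup above is expected to provide.
# A MISSING line often means ~/.bash_profile has not been re-sourced.
for tool in java python jupyter spark-submit pyspark; do
    check_tool "$tool" || true
done
```

If every line reports OK, running `spark-submit --version` is a quick further check that Spark itself starts.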
Extract the Zeppelin and Spark binary archives and move them to /usr/lib:
tar xf spark-2.0.2-bin-hadoop2.7.tgz
tar xf zeppelin-0.6.2-bin-all.tgz
sudo mv spark-2.0.2-bin-hadoop2.7 /usr/lib
sudo mv zeppelin-0.6.2-bin-all /usr/lib
Now you can launch the Zeppelin notebook and open it in a browser at:
http://ec2-35-165-67-12.us-west-2.compute.amazonaws.com:8080
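The launch command itself is not shown above. The following is a minimal sketch, assuming Zeppelin was moved to /usr/lib/zeppelin-0.6.2-bin-all as in the preceding step and that the `bin/zeppelin-daemon.sh` control script from the binary distribution is used. The `ZEPPELIN_HOME` default and the `zeppelin_ctl` wrapper are illustrative names, not part of the original instructions.

```shell
# Assumed install location from the mv step above; override if yours differs.
ZEPPELIN_HOME="${ZEPPELIN_HOME:-/usr/lib/zeppelin-0.6.2-bin-all}"

# Hypothetical wrapper around Zeppelin's daemon script.
# Usage: zeppelin_ctl start|stop|restart|status
zeppelin_ctl() {
    case "$1" in
        start|stop|restart|status) ;;
        *) echo "usage: zeppelin_ctl start|stop|restart|status" >&2; return 2 ;;
    esac
    if [ ! -x "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" ]; then
        echo "zeppelin-daemon.sh not found under $ZEPPELIN_HOME" >&2
        return 1
    fi
    # Delegate to the script shipped with Zeppelin.
    "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" "$1"
}
```

With this in place, `zeppelin_ctl start` followed by `zeppelin_ctl status` should bring the notebook up. Zeppelin listens on port 8080 by default, which matches the URL above; the EC2 security group must allow inbound traffic on that port.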
Next Steps