MapR Sandbox

Sanity test the Sandbox

List the services running on the MapR Sandbox with the maprcli utility.

[mapr@maprdemo ~]$ maprcli service list

logpath                                 displayname         name                 memallocated  state
/opt/mapr/hbase/hbase-1.1.8/logs        HBaseThriftServer   hbasethrift          Auto          2
/opt/mapr/httpfs/httpfs-1.0/logs        Httpfs              httpfs               Auto          2
/opt/mapr/spark/spark-2.2.1/logs/       SparkHistoryServer  spark-historyserver  Auto          2
/opt/mapr/oozie/oozie-4.3.0/logs        Oozie               oozie                Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveServer2         hs2                  Auto          2
/opt/mapr/logs/cldb.log                 CLDB                cldb                 256.0         2
/opt/mapr/logs/hoststats.log            HostStats           hoststats            Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveMetastore       hivemeta             Auto          2
/opt/mapr/logs/mfs.log                  FileServer          fileserver           1609.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      ResourceManager     resourcemanager      1073.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      JobHistoryServer    historyserver        107.0         2
/opt/mapr/hue/hue-3.12.0/logs/          HueWebServer        hue                  Auto          2
/opt/mapr/logs/nfsserver.log            NFS Gateway         nfs                                4
/opt/mapr/hadoop/hadoop-2.7.0/logs      NodeManager         nodemanager          215.0         2
/opt/mapr/apiserver/logs/apiserver.log  APIServer           apiserver            1000.0        2

If you get the following error while listing the services, the mapr-zookeeper service is probably not functioning properly.

$ maprcli service list

ERROR (10009) - Could not connect to CLDB and no Zookeeper connect string provided

Restart the services, then wait about 3 minutes before checking the service status again.

$ sudo service mapr-zookeeper restart

$ sudo service mapr-warden restart
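The restart-then-wait cycle above can be scripted. A minimal polling sketch in Python; the fake_check stand-in below is mine for demonstration, and in practice the check would shell out to maprcli service list and test its exit status:

```python
import time

def wait_for(check, timeout=180, interval=10, _sleep=time.sleep):
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    waited = 0
    while waited < timeout:
        if check():
            return True
        _sleep(interval)
        waited += interval
    return False

# Demo with a fake check that succeeds on the third poll; in a real script
# `check` would run `maprcli service list` and test its exit status.
calls = {"n": 0}
def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(fake_check, timeout=60, interval=1, _sleep=lambda s: None))  # True
```

The injected `_sleep` keeps the demo instant; drop it to get real 10-second polling intervals.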

As you can see, the nfs service has state = 4, which indicates that the NFS gateway of MapR-FS is down. You can restart the service as shown below.

$ maprcli node services -nodes maprdemo -name nfs -action restart
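State checks like the one above can also be done programmatically. A minimal sketch that parses maprcli service list output and reports services not in state 2 (running); the sample output is embedded here for illustration, and a real script would capture the command's stdout (or use maprcli's -json output, since display names such as "NFS Gateway" contain spaces):

```python
# Sample `maprcli service list` output (columns: logpath, displayname,
# name, memallocated, state); state 2 = running, 4 = stopped.
SAMPLE = """\
/opt/mapr/logs/cldb.log CLDB cldb 256.0 2
/opt/mapr/logs/nfsserver.log NFS_Gateway nfs - 4
/opt/mapr/logs/mfs.log FileServer fileserver 1609.0 2
"""

def down_services(output):
    """Return the `name` field of every service whose state is not 2."""
    down = []
    for line in output.strip().splitlines():
        fields = line.split()
        name, state = fields[2], fields[-1]
        if state != "2":
            down.append(name)
    return down

print(down_services(SAMPLE))  # ['nfs']
```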

MCS (the MapR Control System) is served by the apiserver and is available at https://localhost:8443 (note that it is https, not http).

Run Spark in Jupyter notebook

Download and install Anaconda for Python 3.6+ (64-bit). Run the installer as the mapr user.


bash Anaconda3-<version>-Linux-x86_64.sh -f -b

echo "export PATH=$HOME/anaconda3/bin:\$PATH" >> ~/.bashrc

source ~/.bashrc

Open a terminal and run the following commands.

export PATH=/home/mapr/anaconda3/bin:$PATH

rm -rf ~/.jupyter/

jupyter notebook --generate-config

echo "c.NotebookApp.ip = '*'" >> ~/.jupyter/jupyter_notebook_config.py

echo "c.NotebookApp.open_browser = False" >> ~/.jupyter/jupyter_notebook_config.py

echo "c.NotebookApp.password = u'sha1:298cdb938f91:e6462632083df74cb2f68d4b55b0e7a7d7e0e45b'" >> ~/.jupyter/jupyter_notebook_config.py

mkdir -p ~/notebooks

cd ~/notebooks

jupyter notebook

Keep the terminal running. It runs the notebook server.
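The password entry written to the config above uses the notebook server's salted-SHA1 scheme, sha1:<salt>:<sha1(password bytes + salt bytes)>. Jupyter provides notebook.auth.passwd to generate these; a minimal self-contained sketch of generating and verifying such an entry (the helper names here are mine, not Jupyter's API):

```python
import hashlib, random

def make_passwd_hash(password, salt=None):
    """Build a notebook-style 'sha1:<salt>:<digest>' entry; the digest is
    SHA-1 over the password bytes followed by the salt bytes."""
    if salt is None:
        salt = "%012x" % random.getrandbits(48)  # 12 hex chars, as in the entry above
    digest = hashlib.sha1(password.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return "sha1:%s:%s" % (salt, digest)

def check_passwd_hash(entry, password):
    """Re-derive the hash with the stored salt and compare."""
    _algo, salt, _digest = entry.split(":")
    return entry == make_passwd_hash(password, salt)

entry = make_passwd_hash("pass123")
print(check_passwd_hash(entry, "pass123"))  # True
print(check_passwd_hash(entry, "wrong"))    # False
```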

Set up port forwarding for port 8889: open VirtualBox Settings > Network > NAT adapter > Advanced > Port Forwarding.

Open a browser on the host machine and go to http://localhost:8889

When prompted for a password, use pass123 to log in.

To get started with Spark in a Jupyter notebook, run the following in a new Python notebook.

import glob, os, sys

SPARK_HOME = "/opt/mapr/spark/spark-2.2.1"

# Add the bundled py4j zip and the pyspark sources to the Python path
sys.path.insert(0, glob.glob(os.path.join(SPARK_HOME, "python", "lib", "py4j-*-src.zip"))[0])

sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

sc = spark.sparkContext

spark.read.text("file:///etc/passwd").show(10, False)

It should show an output something like the following.


+----------------------------------------------+
|value                                         |
+----------------------------------------------+
|root:x:0:0:root:/root:/bin/bash               |
|bin:x:1:1:bin:/bin:/sbin/nologin              |
|daemon:x:2:2:daemon:/sbin:/sbin/nologin       |
|adm:x:3:4:adm:/var/adm:/sbin/nologin          |
|lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin      |
|sync:x:5:0:sync:/sbin:/bin/sync               |
|shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown  |
|halt:x:7:0:halt:/sbin:/sbin/halt              |
|mail:x:8:12:mail:/var/spool/mail:/sbin/nologin|
|operator:x:11:0:operator:/root:/sbin/nologin  |
+----------------------------------------------+
only showing top 10 rows
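The py4j entry in the sys.path bootstrap at the top of the notebook depends on which py4j version ships under SPARK_HOME/python/lib, which varies by Spark release. A small sketch of discovering it by globbing, demonstrated against a fake Spark layout in a temporary directory (the version number in the demo is arbitrary, since no real SPARK_HOME is assumed here):

```python
import glob, os, sys, tempfile

def pyspark_paths(spark_home):
    """Return the two entries to prepend to sys.path: the pyspark source
    directory and the bundled py4j zip (whatever version Spark ships)."""
    zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
    if not zips:
        raise FileNotFoundError("no py4j zip under %s/python/lib" % spark_home)
    return [os.path.join(spark_home, "python"), zips[0]]

# Demo against a fake Spark layout created in a temp dir.
fake_home = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_home, "python", "lib"))
open(os.path.join(fake_home, "python", "lib", "py4j-0.10.4-src.zip"), "w").close()

for p in pyspark_paths(fake_home):
    sys.path.insert(0, p)
print(os.path.basename(pyspark_paths(fake_home)[1]))  # py4j-0.10.4-src.zip
```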