MapR Sandbox

Sanity test the Sandbox

List the services running on the MapR Sandbox with the maprcli command.

[mapr@maprdemo ~]$ maprcli service list

logpath                                 displayname         name                 memallocated  state
/opt/mapr/hbase/hbase-1.1.8/logs        HBaseThriftServer   hbasethrift          Auto          2
/opt/mapr/httpfs/httpfs-1.0/logs        Httpfs              httpfs               Auto          2
/opt/mapr/spark/spark-2.2.1/logs/       SparkHistoryServer  spark-historyserver  Auto          2
/opt/mapr/oozie/oozie-4.3.0/logs        Oozie               oozie                Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveServer2         hs2                  Auto          2
/opt/mapr/logs/cldb.log                 CLDB                cldb                 256.0         2
/opt/mapr/logs/hoststats.log            HostStats           hoststats            Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveMetastore       hivemeta             Auto          2
/opt/mapr/logs/mfs.log                  FileServer          fileserver           1609.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      ResourceManager     resourcemanager      1073.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      JobHistoryServer    historyserver        107.0         2
/opt/mapr/hue/hue-3.12.0/logs/          HueWebServer        hue                  Auto          2
/opt/mapr/logs/nfsserver.log            NFS Gateway         nfs                                4
/opt/mapr/hadoop/hadoop-2.7.0/logs      NodeManager         nodemanager          215.0         2
/opt/mapr/apiserver/logs/apiserver.log  APIServer           apiserver            1000.0        2
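If you check the Sandbox often, the listing can also be scanned from Python. The sketch below is a minimal parser under stated assumptions: parse_service_list and not_running are hypothetical helper names (not part of MapR), the columns are assumed to be whitespace-separated in the order shown above, the display name may contain spaces (NFS Gateway), and the memallocated value may be missing, as on the nfs row. It treats state 2 as running because that is what every healthy service shows in this listing.

```python
from typing import Dict, List


def _is_mem_value(token: str) -> bool:
    """True for memallocated values seen above: 'Auto' or a number such as '256.0'."""
    if token == "Auto":
        return True
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_service_list(text: str) -> List[Dict[str, str]]:
    """Parse plain-text `maprcli service list` output into one dict per service.

    Assumed column order: logpath, displayname, name, memallocated, state.
    """
    services = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or tokens[0] == "logpath":
            continue  # skip blank lines and the header row
        state = tokens[-1]
        if _is_mem_value(tokens[-2]):
            mem, name, display = tokens[-2], tokens[-3], tokens[1:-3]
        else:  # memallocated column is empty (e.g. the nfs row)
            mem, name, display = "", tokens[-2], tokens[1:-2]
        services.append({
            "logpath": tokens[0],
            "displayname": " ".join(display),
            "name": name,
            "memallocated": mem,
            "state": state,
        })
    return services


def not_running(services: List[Dict[str, str]]) -> List[str]:
    """Names of services whose state is not 2 (the value healthy services show above)."""
    return [s["name"] for s in services if s["state"] != "2"]
```

On the Sandbox you would feed it the real output, for example subprocess.check_output(["maprcli", "service", "list"], universal_newlines=True); for the listing above, not_running returns ["nfs"].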


If listing the services fails with the following error, the mapr-zookeeper service is probably not functioning properly.

$ maprcli service list

ERROR (10009) - Could not connect to CLDB and no Zookeeper connect string provided


Restart ZooKeeper first and then Warden, and wait about 3 minutes before checking the service status again.

$ sudo service mapr-zookeeper restart

$ sudo service mapr-warden restart
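Instead of waiting a fixed 3 minutes, you can poll until maprcli answers again. This is a minimal sketch, assuming that a failing maprcli call either exits with a non-zero status or prints an ERROR line like the one above; maprcli_ok and wait_until_ready are hypothetical helper names. The sleep function is a parameter so the polling loop can be tested without actually waiting.

```python
import subprocess
import time
from typing import Callable, List


def maprcli_ok(args: List[str]) -> bool:
    """Run `maprcli <args>`; True if it exits with status 0 and prints no ERROR."""
    try:
        result = subprocess.run(
            ["maprcli"] + args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text; works on Python 3.6
        )
    except FileNotFoundError:  # maprcli is not installed or not on PATH
        return False
    return result.returncode == 0 and "ERROR" not in result.stdout + result.stderr


def wait_until_ready(check: Callable[[], bool], timeout: float = 180.0,
                     interval: float = 15.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` seconds pass."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

On the Sandbox, wait_until_ready(lambda: maprcli_ok(["service", "list"])) returns True as soon as the CLDB responds, or False after roughly 3 minutes.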


In the listing above, the nfs service has state 4, which indicates that the NFS gateway for MapR-FS is down (every other service shows state 2). Restart it as follows.

$ maprcli node services -nodes maprdemo -name nfs -action restart
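The restart can also be scripted, for example to restart several stopped services in one go. A minimal sketch, assuming only the maprcli flags used in the command above (-nodes, -name, -action); restart_command and restart_services are hypothetical helpers, and dry_run defaults to True so nothing is executed until you opt in.

```python
import subprocess
from typing import List


def restart_command(node: str, service: str) -> List[str]:
    """Build the argv for `maprcli node services ... -action restart`."""
    return ["maprcli", "node", "services",
            "-nodes", node, "-name", service, "-action", "restart"]


def restart_services(node: str, services: List[str],
                     dry_run: bool = True) -> List[List[str]]:
    """Restart each service on `node`; with dry_run=True only return the commands."""
    commands = [restart_command(node, s) for s in services]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return commands
```

For example, restart_services("maprdemo", ["nfs"], dry_run=False) runs the same command as above.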

The MapR Control System (MCS) web UI is served by the apiserver and is available at https://localhost:8443 (note that it uses https, not http).



Run Spark in Jupyter notebook

Download and install Anaconda for Python 3.6+ (64-bit). Run the following commands as the mapr user.

https://www.anaconda.com/download/#linux

wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh

bash Anaconda3-5.2.0-Linux-x86_64.sh -f -b

echo "export PATH=$HOME/anaconda3/bin:\$PATH" >> ~/.bashrc

source ~/.bashrc
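To confirm that the python on your PATH is now the Anaconda one and meets the 3.6+, 64-bit requirement, you can run a small check with it. A minimal sketch, assuming the default install location ~/anaconda3 used above; interpreter_problems is a hypothetical helper, and its arguments default to the running interpreter's values so it can also be tested with made-up ones.

```python
import sys


def interpreter_problems(version_info=sys.version_info, maxsize=sys.maxsize,
                         executable=sys.executable):
    """Return a list of reasons this interpreter does not fit the setup above."""
    problems = []
    if tuple(version_info[:2]) < (3, 6):
        problems.append("Python %d.%d is older than 3.6" % (version_info[0], version_info[1]))
    if maxsize <= 2**32:  # a 32-bit build has sys.maxsize == 2**31 - 1
        problems.append("interpreter is not 64-bit")
    if "anaconda3" not in executable:
        problems.append("%s is not the Anaconda interpreter" % executable)
    return problems
```

Running print(interpreter_problems() or "OK") with the new python should print OK.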


Open a terminal and run the following commands to create the Jupyter configuration and start the notebook server:

export PATH=/home/mapr/anaconda3/bin:$PATH

rm -rf ~/.jupyter/jupyter_notebook_config.py

jupyter notebook --generate-config

echo "c.NotebookApp.ip = '*'" >> ~/.jupyter/jupyter_notebook_config.py

echo "c.NotebookApp.open_browser = False" >> ~/.jupyter/jupyter_notebook_config.py

echo "c.NotebookApp.password = u'sha1:298cdb938f91:e6462632083df74cb2f68d4b55b0e7a7d7e0e45b'" >> ~/.jupyter/jupyter_notebook_config.py


mkdir -p ~/notebooks

cd ~/notebooks

jupyter notebook


Keep this terminal open; it runs the notebook server.

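Forward port 8889 from the host to the Sandbox VM: in VirtualBox, open the VM's Settings > Network, select the NAT adapter, expand Advanced, click Port Forwarding, and add a TCP rule with host port 8889 and guest port 8889. Jupyter listens on port 8888 by default; since Hue typically uses 8888 on the Sandbox, the notebook server falls back to the next free port, 8889. The URL printed in the terminal shows the port actually in use.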
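Once the rule is in place and the notebook server is running, you can confirm from the host that the forwarded port accepts connections before opening the browser. A minimal standard-library sketch; port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False
```

On the host, port_is_open("localhost", 8889) should return True; if it returns False, check the forwarding rule and that the notebook server is still running.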

On the host machine, open http://localhost:8889 in a browser.

Log in with the password pass123.
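The c.NotebookApp.password value has the form algorithm:salt:hash. In the classic Notebook (5.x/6.x), notebook.auth.passwd() builds it by hashing the passphrase followed by a random 12-hex-digit salt, so to use a different password you can run python -c "from notebook.auth import passwd; print(passwd())" and replace that config line. The sketch below only illustrates the sha1 format with hashlib, assuming that scheme; notebook_sha1_hash and notebook_sha1_check are hypothetical names, not the Notebook's own functions.

```python
import hashlib
import secrets
from typing import Optional


def notebook_sha1_hash(passphrase: str, salt: Optional[str] = None) -> str:
    """Return 'sha1:<salt>:<hexdigest>' where hexdigest = sha1(passphrase + salt)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters, like the salt in the config above
    digest = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return "sha1:%s:%s" % (salt, digest)


def notebook_sha1_check(hashed: str, passphrase: str) -> bool:
    """Recompute the digest with the stored salt and compare it to the stored one."""
    algorithm, salt, digest = hashed.split(":", 2)
    if algorithm != "sha1":
        raise ValueError("this sketch only handles sha1 hashes")
    candidate = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return candidate == digest
```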


To get started with Spark in Jupyter, follow this guide:

https://blog.einext.com/apache-spark/pyspark-on-jupyter-notebook

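# Run this in a notebook cell to make the Sandbox's Spark 2.2.1 PySpark libraries importable.
import os
import sys

SPARK_HOME = "/opt/mapr/spark/spark-2.2.1"
# Tell PySpark where Spark is installed so it can launch the JVM gateway.
os.environ["SPARK_HOME"] = SPARK_HOME

# Put Spark's bundled py4j and the pyspark package on the Python path.
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "py4j-0.10.4-src.zip"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

from pyspark.sql import SparkSession

# Start (or reuse) a SparkSession with Hive support so Hive tables are visible.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
sc = spark.sparkContext

# Read a local file (file:// avoids the default MapR-FS) and print 10 lines untruncated.
spark.read.text("file:///etc/passwd").show(10, False)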

It should print output similar to the following:

+----------------------------------------------+
|value                                         |
+----------------------------------------------+
|root:x:0:0:root:/root:/bin/bash               |
|bin:x:1:1:bin:/bin:/sbin/nologin              |
|daemon:x:2:2:daemon:/sbin:/sbin/nologin       |
|adm:x:3:4:adm:/var/adm:/sbin/nologin          |
|lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin      |
|sync:x:5:0:sync:/sbin:/bin/sync               |
|shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown  |
|halt:x:7:0:halt:/sbin:/sbin/halt              |
|mail:x:8:12:mail:/var/spool/mail:/sbin/nologin|
|operator:x:11:0:operator:/root:/sbin/nologin  |
+----------------------------------------------+
only showing top 10 rows