MapR Sandbox

Sanity test the Sandbox

List the services running on the MapR Sandbox with the maprcli utility.

[mapr@maprdemo ~]$ maprcli service list

logpath                                 displayname         name                 memallocated  state
/opt/mapr/hbase/hbase-1.1.8/logs        HBaseThriftServer   hbasethrift          Auto          2
/opt/mapr/httpfs/httpfs-1.0/logs        Httpfs              httpfs               Auto          2
/opt/mapr/spark/spark-2.2.1/logs/       SparkHistoryServer  spark-historyserver  Auto          2
/opt/mapr/oozie/oozie-4.3.0/logs        Oozie               oozie                Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveServer2         hs2                  Auto          2
/opt/mapr/logs/cldb.log                 CLDB                cldb                 256.0         2
/opt/mapr/logs/hoststats.log            HostStats           hoststats            Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveMetastore       hivemeta             Auto          2
/opt/mapr/logs/mfs.log                  FileServer          fileserver           1609.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      ResourceManager     resourcemanager      1073.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      JobHistoryServer    historyserver        107.0         2
/opt/mapr/hue/hue-3.12.0/logs/          HueWebServer        hue                  Auto          2
/opt/mapr/logs/nfsserver.log            NFS Gateway         nfs                                4
/opt/mapr/hadoop/hadoop-2.7.0/logs      NodeManager         nodemanager          215.0         2
/opt/mapr/apiserver/logs/apiserver.log  APIServer           apiserver            1000.0        2

If you get the following error while listing the services, the mapr-zookeeper service is probably not functioning properly.

$ maprcli service list

ERROR (10009) - Could not connect to CLDB and no Zookeeper connect string provided

Restart the services, then wait about 3 minutes before checking the service status again.

$ sudo service mapr-zookeeper restart

$ sudo service mapr-warden restart
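The restart-then-wait cycle above can be scripted. A minimal polling sketch in Python; the fake_check stand-in below is mine for demonstration, and in practice the check would shell out to maprcli service list and test its exit status:

```python
import time

def wait_for(check, timeout=180, interval=10, _sleep=time.sleep):
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    waited = 0
    while waited < timeout:
        if check():
            return True
        _sleep(interval)
        waited += interval
    return False

# Demo with a fake check that succeeds on the third poll; in a real script
# `check` would run `maprcli service list` and test its exit status.
calls = {"n": 0}
def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(fake_check, timeout=60, interval=1, _sleep=lambda s: None))  # True
```

The injected `_sleep` keeps the demo instant; drop it to get real 10-second polling intervals.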

As you can see, the nfs service has state = 4, which indicates that the NFS gateway of MapR-FS is down. You can restart the service as shown below.

$ maprcli node services -nodes maprdemo -name nfs -action restart
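State checks like the one above can also be done programmatically. A minimal sketch that parses maprcli service list output and reports services not in state 2 (running); the sample output is embedded here for illustration, and a real script would capture the command's stdout (or use maprcli's -json output, since display names such as "NFS Gateway" contain spaces):

```python
# Sample `maprcli service list` output (columns: logpath, displayname,
# name, memallocated, state); state 2 = running, 4 = stopped.
SAMPLE = """\
/opt/mapr/logs/cldb.log CLDB cldb 256.0 2
/opt/mapr/logs/nfsserver.log NFS_Gateway nfs - 4
/opt/mapr/logs/mfs.log FileServer fileserver 1609.0 2
"""

def down_services(output):
    """Return the `name` field of every service whose state is not 2."""
    down = []
    for line in output.strip().splitlines():
        fields = line.split()
        name, state = fields[2], fields[-1]
        if state != "2":
            down.append(name)
    return down

print(down_services(SAMPLE))  # ['nfs']
```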

MCS (the MapR Control System) is served by the apiserver and is available at https://localhost:8443 (note that it is https, not http).

Run Spark in Jupyter notebook

Download and install Anaconda for Python 3.6+ (64-bit). Run the installer as the mapr user.


bash Anaconda3-<version>-Linux-x86_64.sh -f -b

echo "export PATH=$HOME/anaconda3/bin:\$PATH" >> ~/.bashrc

source ~/.bashrc

Open a terminal and run the following commands.

export PATH=/home/mapr/anaconda3/bin:$PATH

rm -rf ~/.jupyter/

jupyter notebook --generate-config

echo "c.NotebookApp.ip = '*'" >> ~/.jupyter/jupyter_notebook_config.py

echo "c.NotebookApp.open_browser = False" >> ~/.jupyter/jupyter_notebook_config.py

echo "c.NotebookApp.password = u'sha1:298cdb938f91:e6462632083df74cb2f68d4b55b0e7a7d7e0e45b'" >> ~/.jupyter/jupyter_notebook_config.py

mkdir -p ~/notebooks

cd ~/notebooks

jupyter notebook

Keep the terminal running. It runs the notebook server.
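The password entry written to the config above uses the notebook server's salted-SHA1 scheme, sha1:<salt>:<sha1(password bytes + salt bytes)>. Jupyter provides notebook.auth.passwd to generate these; a minimal self-contained sketch of generating and verifying such an entry (the helper names here are mine, not Jupyter's API):

```python
import hashlib, random

def make_passwd_hash(password, salt=None):
    """Build a notebook-style 'sha1:<salt>:<digest>' entry; the digest is
    SHA-1 over the password bytes followed by the salt bytes."""
    if salt is None:
        salt = "%012x" % random.getrandbits(48)  # 12 hex chars, as in the entry above
    digest = hashlib.sha1(password.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return "sha1:%s:%s" % (salt, digest)

def check_passwd_hash(entry, password):
    """Re-derive the hash with the stored salt and compare."""
    _algo, salt, _digest = entry.split(":")
    return entry == make_passwd_hash(password, salt)

entry = make_passwd_hash("pass123")
print(check_passwd_hash(entry, "pass123"))  # True
print(check_passwd_hash(entry, "wrong"))    # False
```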

Set up port forwarding for port 8889: open VirtualBox Settings > Network > NAT adapter > Advanced > Port Forwarding.

Open a browser on the host machine and go to http://localhost:8889

When prompted for a password, use pass123 to log in.

To get started with Spark in a Jupyter notebook, run the following in a new Python notebook.

import glob, os, sys

SPARK_HOME = "/opt/mapr/spark/spark-2.2.1"

# Add the bundled py4j zip and the pyspark sources to the Python path
sys.path.insert(0, glob.glob(os.path.join(SPARK_HOME, "python", "lib", "py4j-*-src.zip"))[0])

sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

sc = spark.sparkContext

spark.read.text("file:///etc/passwd").show(10, False)

It should show an output something like the following.


+----------------------------------------------+
|value                                         |
+----------------------------------------------+
|root:x:0:0:root:/root:/bin/bash               |
|bin:x:1:1:bin:/bin:/sbin/nologin              |
|daemon:x:2:2:daemon:/sbin:/sbin/nologin       |
|adm:x:3:4:adm:/var/adm:/sbin/nologin          |
|lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin      |
|sync:x:5:0:sync:/sbin:/bin/sync               |
|shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown  |
|halt:x:7:0:halt:/sbin:/sbin/halt              |
|mail:x:8:12:mail:/var/spool/mail:/sbin/nologin|
|operator:x:11:0:operator:/root:/sbin/nologin  |
+----------------------------------------------+
only showing top 10 rows
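The py4j entry in the sys.path bootstrap at the top of the notebook depends on which py4j version ships under SPARK_HOME/python/lib, which varies by Spark release. A small sketch of discovering it by globbing, demonstrated against a fake Spark layout in a temporary directory (the version number in the demo is arbitrary, since no real SPARK_HOME is assumed here):

```python
import glob, os, sys, tempfile

def pyspark_paths(spark_home):
    """Return the two entries to prepend to sys.path: the pyspark source
    directory and the bundled py4j zip (whatever version Spark ships)."""
    zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
    if not zips:
        raise FileNotFoundError("no py4j zip under %s/python/lib" % spark_home)
    return [os.path.join(spark_home, "python"), zips[0]]

# Demo against a fake Spark layout created in a temp dir.
fake_home = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_home, "python", "lib"))
open(os.path.join(fake_home, "python", "lib", "py4j-0.10.4-src.zip"), "w").close()

for p in pyspark_paths(fake_home):
    sys.path.insert(0, p)
print(os.path.basename(pyspark_paths(fake_home)[1]))  # py4j-0.10.4-src.zip
```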