MapR Sandbox

Sanity test the Sandbox

List the services running on the MapR Sandbox with maprcli.

[mapr@maprdemo ~]$ maprcli service list
logpath                                 displayname         name                 memallocated  state
/opt/mapr/hbase/hbase-1.1.8/logs        HBaseThriftServer   hbasethrift          Auto          2
/opt/mapr/httpfs/httpfs-1.0/logs        Httpfs              httpfs               Auto          2
/opt/mapr/spark/spark-2.2.1/logs/       SparkHistoryServer  spark-historyserver  Auto          2
/opt/mapr/oozie/oozie-4.3.0/logs        Oozie               oozie                Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveServer2         hs2                  Auto          2
/opt/mapr/logs/cldb.log                 CLDB                cldb                 256.0         2
/opt/mapr/logs/hoststats.log            HostStats           hoststats            Auto          2
/opt/mapr/hive/hive-2.1/logs/mapr       HiveMetastore       hivemeta             Auto          2
/opt/mapr/logs/mfs.log                  FileServer          fileserver           1609.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      ResourceManager     resourcemanager      1073.0        2
/opt/mapr/hadoop/hadoop-2.7.0/logs      JobHistoryServer    historyserver        107.0         2
/opt/mapr/hue/hue-3.12.0/logs/          HueWebServer        hue                  Auto          2
/opt/mapr/logs/nfsserver.log            NFS Gateway         nfs                                4
/opt/mapr/hadoop/hadoop-2.7.0/logs      NodeManager         nodemanager          215.0         2
/opt/mapr/apiserver/logs/apiserver.log  APIServer           apiserver            1000.0        2

If you get the following error while listing the services, the mapr-zookeeper service is probably not functioning properly.

$ maprcli service list
ERROR (10009) -  Could not connect to CLDB and no Zookeeper connect string provided

Restart the services with the commands below, then wait about 3 minutes before checking the service status again.

$ sudo service mapr-zookeeper restart
$ sudo service mapr-warden restart
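
Instead of waiting a fixed 3 minutes, the status check can be polled until it succeeds. The following is a minimal sketch, not part of the Sandbox tooling: wait_until and cldb_reachable are hypothetical helper names, and it assumes that a failing maprcli call either exits non-zero or prints a line starting with ERROR, as in the message shown above.

```python
import subprocess
import time


def wait_until(check, timeout_s=180, interval_s=10,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def cldb_reachable():
    """Return True when `maprcli service list` appears to succeed.

    Assumption: failure shows up as a non-zero exit code or as output
    beginning with "ERROR", like the ERROR (10009) message above.
    """
    result = subprocess.run(
        ["maprcli", "service", "list"],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and not result.stdout.lstrip().startswith("ERROR")
```

On the Sandbox, wait_until(cldb_reachable) would return True as soon as the listing works, or False after roughly 3 minutes.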

As the listing above shows, the nfs service has state = 4, which indicates that the NFS service of MapR-FS is down. You can restart the service as below.

$ maprcli node services -nodes maprdemo  -name nfs -action restart
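
To spot such services without reading the table by eye, the listing can be parsed by column position. This is a rough sketch under two assumptions: the output is fixed-width with columns aligned to the header, as in the listing above, and state 2 means running (inferred from that listing, not from MapR documentation). The function names are hypothetical.

```python
import re

RUNNING_STATE = "2"  # assumption: 2 means running, as the listing suggests


def parse_service_list(listing):
    """Split fixed-width `maprcli service list` output into one dict per row."""
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Each column starts where its header word starts.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    services = []
    for row in rows:
        record = {}
        for i, (col_name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else len(row)
            record[col_name] = row[start:end].strip()
        services.append(record)
    return services


def services_not_running(listing):
    """Return (name, state) pairs for services whose state is not RUNNING_STATE."""
    return [
        (svc["name"], svc["state"])
        for svc in parse_service_list(listing)
        if svc["state"] != RUNNING_STATE
    ]
```

Slicing by header offsets rather than splitting on whitespace keeps rows such as NFS Gateway intact, where the display name contains a space and memallocated is empty.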

MCS (MapR Control System) is served by the apiserver and is available at https://localhost:8443 (note that it is https).

Run Spark in Jupyter notebook

Download and install Anaconda for Python 3.6+ (64-bit). Run the installer as the mapr user.

bash -f -b
echo "export PATH=$HOME/anaconda3/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc

Open a terminal and run the following commands

export PATH=/home/mapr/anaconda3/bin:$PATH
rm -rf ~/.jupyter/
jupyter notebook --generate-config
echo "c.NotebookApp.ip = '*'" >> ~/.jupyter/
echo "c.NotebookApp.open_browser = False" >> ~/.jupyter/
echo "c.NotebookApp.password = u'sha1:298cdb938f91:e6462632083df74cb2f68d4b55b0e7a7d7e0e45b'" >> ~/.jupyter/

mkdir -p ~/notebooks
cd ~/notebooks
jupyter notebook

Keep the terminal running. It runs the notebook server.

Set up port forwarding for port 8889. Open VirtualBox settings > Network > NAT adapter advanced settings.

Open a browser on the host machine and open http://localhost:8889
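
If the page does not load, it can help to confirm that the forwarded port accepts connections at all. The snippet below is a small sketch for the host machine; port_open is a hypothetical helper, and it only tests whether a TCP connection succeeds, not whether Jupyter is the process answering.

```python
import socket


def port_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_open("localhost", 8889) checks the notebook port, and port_open("localhost", 8443) checks the MCS port mentioned earlier.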

Use the password pass123 to log in.
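
To use a different password, the c.NotebookApp.password value has to be regenerated. As far as I understand the classic Notebook format, the value is algorithm:salt:digest, where the digest is the hash of the passphrase followed by the salt. The sketch below reproduces that layout with the standard library only; the function names are hypothetical, and newer Jupyter releases may default to a different algorithm, so treat this as an illustration rather than the official generator.

```python
import hashlib
import secrets


def jupyter_sha1_hash(passphrase, salt=None):
    """Build a 'sha1:<salt>:<digest>' string (assumed classic Notebook layout)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters
    digest = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return f"sha1:{salt}:{digest}"


def matches(hashed, passphrase):
    """Check a passphrase against an 'algorithm:salt:digest' string."""
    algorithm, salt, digest = hashed.split(":", 2)
    expected = hashlib.new(
        algorithm, passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return secrets.compare_digest(expected, digest)
```

Calling jupyter_sha1_hash("my-new-password") yields a string that can replace the sha1:... value in the configuration commands above.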

To get started with Spark in Jupyter, follow this doc.

import os, sys
SPARK_HOME = "/opt/mapr/spark/spark-2.2.1"
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", ""))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
sc = spark.sparkContext
spark.read.text("file:///etc/passwd").show(10, False)

It should show output like the following.

|value                                         |
|root:x:0:0:root:/root:/bin/bash               |
|bin:x:1:1:bin:/bin:/sbin/nologin              |
|daemon:x:2:2:daemon:/sbin:/sbin/nologin       |
|adm:x:3:4:adm:/var/adm:/sbin/nologin          |
|lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin      |
|sync:x:5:0:sync:/sbin:/bin/sync               |
|shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown  |
|halt:x:7:0:halt:/sbin:/sbin/halt              |
|operator:x:11:0:operator:/root:/sbin/nologin  |
only showing top 10 rows
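
The sys.path setup in the notebook snippet depends on the Spark version and on the py4j archive bundled with it. A possible way to avoid hard-coding the archive name is to look it up at runtime. This is only a sketch: spark_python_paths and add_spark_to_sys_path are hypothetical helpers, and the py4j-*-src.zip pattern is an assumption about how the archive is named under python/lib.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the py4j archive(s) and the Spark python directory."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: the bundled py4j archive matches py4j-*-src.zip.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return py4j_zips + [python_dir]


def add_spark_to_sys_path(spark_home):
    """Put the Spark Python paths at the front of sys.path, skipping duplicates."""
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
```

In the notebook, add_spark_to_sys_path("/opt/mapr/spark/spark-2.2.1") would be called before importing pyspark.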