Hive Metastore in Spark

Install MySQL 5.7

$ sudo yum localinstall https://dev.mysql.com/get/mysql57-community-release-el6-8.noarch.rpm

$ sudo yum install mysql-community-server -y

Start the mysqld service and enable it at boot

$ sudo service mysqld start

$ sudo chkconfig mysqld on

Find the temporary password generated for the root user

$ sudo grep 'temporary password' /var/log/mysqld.log

Launch the MySQL client

$ mysql -u root -p

<enter the temporary password>

Set a new password for the root user

mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'Pass@123!';

If you would like to disable password validation in MySQL 5.7

mysql> uninstall plugin validate_password;

Create a user 'spark'@'%' and grant it all privileges

mysql> CREATE USER 'spark'@'%' IDENTIFIED BY 'spark';

mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'spark'@'%';

mysql> GRANT ALL PRIVILEGES ON *.* TO 'spark'@'%';

mysql> FLUSH PRIVILEGES;

Download the Hive binaries. To persist tables from Spark, you do not need a Hive installation or HDFS; however, you do need to create the Hive metastore schema in MySQL. To create the metastore schema, use the MySQL script shipped inside the Hive binaries. Follow the steps below.

$ wget https://www-eu.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz

$ tar xf apache-hive-1.2.1-bin.tar.gz

$ cd apache-hive-1.2.1-bin/scripts/metastore/upgrade/mysql/

$ mysql -u root -p

<enter mysql root password>

mysql> create database metastore;

mysql> use metastore;

mysql> source hive-schema-1.2.0.mysql.sql;

mysql> exit;

Now create a Hive configuration file, $SPARK_HOME/conf/hive-site.xml, with the following content

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://172.31.12.108:3306/metastore?useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>spark</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>spark</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
</configuration>
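The javax.jdo.* properties above are what Spark's metastore client reads to reach MySQL. As a quick sanity check of the file you just wrote, here is a minimal Python sketch (the inline XML snippet mirrors the two connection properties; the helper name is illustrative) that extracts the name/value pairs the way a Hadoop-style configuration file is laid out:

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the hive-site.xml above, inlined for illustration.
HIVE_SITE = """\
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://172.31.12.108:3306/metastore?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>"""

def metastore_settings(xml_text):
    # Each <property> holds one <name>/<value> pair.
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

settings = metastore_settings(HIVE_SITE)
print(settings["javax.jdo.option.ConnectionDriverName"])  # com.mysql.jdbc.Driver
```

To check the real file, replace the inline string with ET.parse("$SPARK_HOME/conf/hive-site.xml") and confirm the ConnectionURL points at your MySQL host and the metastore database created earlier.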

Make sure the MySQL JDBC driver (mysql-connector-java) is on Spark's classpath. Now you are good to go with Spark.
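With the configuration in place, tables created from Spark are registered in the MySQL-backed metastore and survive across sessions. A minimal PySpark sketch (the application and table names are illustrative):

```python
from pyspark.sql import SparkSession

# enableHiveSupport() makes Spark read $SPARK_HOME/conf/hive-site.xml
# and use the external metastore instead of a local Derby database.
spark = (
    SparkSession.builder
    .appName("metastore-demo")  # illustrative name
    .enableHiveSupport()
    .getOrCreate()
)

# The table definition is stored in the MySQL metastore, so a new
# Spark session on any node pointed at the same metastore can read it.
spark.range(5).write.mode("overwrite").saveAsTable("demo_table")
spark.sql("SHOW TABLES").show()
```

Launch it with the MySQL driver jar on the classpath, e.g. pyspark --jars mysql-connector-java-5.1.47.jar (driver version illustrative).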