HiveContext vs Spark SQLContext

HiveContext: org.apache.spark.sql.hive.HiveContext

SQLContext: org.apache.spark.sql.SQLContext

You need HiveContext rather than SQLContext if you want to:

  • Use Hive metastore tables and views
  • Launch the Thrift server
  • Run window functions (rank, dense_rank, lag, lead)
  • Use Hive UDFs
  • Persist table schemas for Spark SQL
  • Enable the Thrift JDBC service for Spark SQL
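To make the window-function bullet concrete, here is a plain-Python sketch of what RANK, DENSE_RANK, LAG, and LEAD compute over a single ordered partition. These are illustrative helper functions written for this note, not Spark or Hive APIs; in Spark you would express the same logic in SQL or with the DataFrame window API.

```python
def rank(values):
    # SQL RANK(): tied values share a rank; the next rank skips ahead.
    return [1 + sum(1 for w in values if w < v) for v in values]

def dense_rank(values):
    # SQL DENSE_RANK(): tied values share a rank; no gaps afterward.
    return [1 + len({w for w in values if w < v}) for v in values]

def lag(values, offset=1):
    # SQL LAG(): value from `offset` rows earlier, None at the start.
    return [values[i - offset] if i >= offset else None
            for i in range(len(values))]

def lead(values, offset=1):
    # SQL LEAD(): value from `offset` rows later, None at the end.
    return [values[i + offset] if i + offset < len(values) else None
            for i in range(len(values))]

scores = [10, 20, 20, 30]          # one partition, already ordered
print(rank(scores))                # [1, 2, 2, 4]
print(dense_rank(scores))          # [1, 2, 2, 3]
print(lag(scores))                 # [None, 10, 20, 20]
print(lead(scores))                # [20, 20, 30, None]
```

Note how a tie (the two 20s) makes RANK jump from 2 to 4 while DENSE_RANK continues at 3; that distinction is the usual reason to pick one over the other.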

HiveContext is also the more battle-tested of the two.

Spark 2 introduces window functions natively and aims for compliance with the ANSI SQL:2003 standard; it also merges both contexts into the unified SparkSession entry point.