HBase Fundamentals

What is HBase?

HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. It began as part of the Apache Hadoop project, is now a top-level Apache Software Foundation project, and runs on top of HDFS (Hadoop Distributed File System), providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).

In the parlance of Eric Brewer’s CAP Theorem, HBase is a CP type system.


Common Use Cases of HBase

HBase is used as an operational data store for low-latency (<10 ms), high-throughput read and/or write workloads. It supports fast CRUD operations. It is also useful for storing high-velocity, highly available time series data (streaming data).

  • NoSQL
  • Wide-column
  • Schemaless
  • Distributed
  • Strongly consistent
  • Highly scalable (~ petabyte scale)


Limitations

  • Does not support join, order by or group by queries
  • No built-in secondary indexes
  • No support for foreign key constraints

When not to use HBase

  • Do not use HBase if you need transaction-level atomicity. For example, in eCommerce, placing an order could perform CRUD operations on the orders table, the order line items table and the inventory table in a single transaction. HBase guarantees atomicity only at the row level. For transaction-level atomicity, use an RDBMS like Oracle or SQL Server (on premise), or Amazon Aurora or Google Cloud SQL/Spanner (on cloud).
  • Do not use for data less than 1 TB. Use relational databases like MySQL, PostgreSQL, Oracle etc.
  • Do not use if the primary workload is analytics oriented. Instead, use a combination such as Hadoop + Hive.
  • Do not use for documents or highly structured hierarchies. Use MongoDB, CouchDB etc.
  • Do not use for blob storage where the typical size is > 10 MB. Use object stores or distributed file systems like Amazon S3 or Google Cloud Storage (cloud), or MapR-FS (on premise).


Check hbase version

$ hbase version


Sanity check HBase services

Find out the hbase services

$ sudo ls -l /etc/init.d/hbase*

Check the status of the hbase-master and hbase-regionserver services

$ sudo service hbase-master status
$ sudo service hbase-regionserver status

If either service is not running, restart it. For example,

$ sudo service hbase-master restart
$ sudo service hbase-regionserver restart

HBase installation directory: /usr/lib/hbase

HBase configuration directory: /etc/hbase/conf

$ ls -l /etc/hbase/conf/
-rw-r--r-- 1 root root 1811 Mar 23 11:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root root 4537 Mar 23 11:29 hbase-env.cmd
-rw-r--r-- 1 root root 7468 Mar 23 11:29 hbase-env.sh
-rw-r--r-- 1 root root 2257 Mar 23 11:29 hbase-policy.xml
-rw-rw-r-- 1 root root 1648 Apr  5 16:02 hbase-site.xml
-rw-r--r-- 1 root root 4339 Mar 23 11:29 log4j.properties
-rw-r--r-- 1 root root   10 Mar 23 11:29 regionservers

hbase-env.sh: environment variables, JVM options etc.

hbase-policy.xml: access policy for hbase services

hbase-site.xml: configuration of hbase cluster

log4j.properties: logging output controls

regionservers: list of hosts on which the cluster start/stop scripts run a region server

View hbase-site.xml to find

  • root HDFS directory (property hbase.rootdir)
  • distribution mode (property hbase.cluster.distributed)
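
The two settings can also be pulled out of the file programmatically. The following is a minimal sketch in Python, assuming the usual Hadoop-style layout of <property> elements with <name> and <value> children. The sample XML and its values (including the host name) are made up for illustration, and read_properties is a hypothetical helper, not part of HBase.

```python
import xml.etree.ElementTree as ET

# Hypothetical hbase-site.xml content; the values are placeholders,
# not taken from a real cluster.
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props


props = read_properties(SAMPLE_HBASE_SITE)
print("root directory:", props.get("hbase.rootdir"))
print("distributed:", props.get("hbase.cluster.distributed"))
```

To inspect a real installation, pass the text of /etc/hbase/conf/hbase-site.xml to read_properties instead of the sample string. A property that is absent from the file comes back as None, which means HBase is using its built-in default.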


HBase Operations

Launch hbase shell

$ hbase shell

Command groups

hbase(main):002:0> help                                                                                                                                                                       
HBase Shell, version 2.0.0.3.0.1.0-187, re9fcf450949102de5069b257a6dee469b8f5aab3, Wed Sep 19 10:16:35 UTC 2018                                                                               
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.                                                                                        
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.                                                                                       
                                                                                                                                                                                              
COMMAND GROUPS:                                                                                                                                                                               
  Group name: general                                                                                                                                                                         
  Commands: processlist, status, table_help, version, whoami                                                                                                                                  
                                                                                                                                                                                              
  Group name: ddl                                                                                                                                                                             
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters
                                                                                                                                                                                              
  Group name: namespace                                                                                                                                                                       
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables                                                                      
                                                                                                                                                                                              
  Group name: dml                                                                                                                                                                             
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve                                                                      
                                                                                                                                                                                              
  Group name: tools                                                                                                                                                                           
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, trace, unassign, wal_roll, zk_dump
                                                                                                                                                                                              
  Group name: replication                                                                                                                                                                     
  Commands: add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
                                                                                                                                                                                              
  Group name: snapshots                                                                                                                                                                       
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot                                    
                                                                                                                                                                                              
  Group name: configuration                                                                                                                                                                   
  Commands: update_all_config, update_config                                                                                                                                                  
                                                                                                                                                                                              
  Group name: quotas                                                                                                                                                                          
  Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota                                                                                         
                                                                                                                                                                                              
  Group name: security                                                                                                                                                                        
  Commands: grant, list_security_capabilities, revoke, user_permission                                                                                                                        
                                                                                                                                                                                              
  Group name: procedures                                                                                                                                                                      
  Commands: abort_procedure, list_locks, list_procedures                                                                                                                                      
                                                                                                                                                                                              
  Group name: visibility labels                                                                                                                                                               
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility                                                                                                        
                                                                                                                                                                                              
  Group name: rsgroup                                                                                                                                                                         
  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup
                                                                                          


Get help for a command

hbase(main):007:0> help 'list_namespace'                                                                                                                                                      
List all namespaces in hbase. Optional regular expression parameter could                                                                                                                     
be used to filter the output. Examples:                                                                                                                                                       
                                                                                                                                                                                              
  hbase> list_namespace                                                                                                                                                                       
  hbase> list_namespace 'abc.*'  


Create a sample table

hbase> create 'sample', 'cf1', 'cf2' 

Check the HDFS location of the table. The last directory in the listing (37120ff814f8c05fe0e0015a6dadfec2 here) is the encoded region name, an MD5 hash. It is likely to be different on your machine.

$ hadoop fs -ls /hbase/data/default/sample
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:01 /hbase/data/default/sample/.tabledesc
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:01 /hbase/data/default/sample/.tmp
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:01 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2

From the above directory structure, you can see that there is one region for this table. Each region is identified by its encoded name, a 32-character hexadecimal string (a 16-byte MD5 hash).
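
The 32 characters in the directory name are simply the hexadecimal form of a 16-byte MD5 digest. A small Python sketch of that relationship follows; the input string is a hypothetical stand-in, so the digest it prints is not the encoded name HBase would compute for this table.

```python
import hashlib

# Hypothetical region name used only to show the digest length.
# The exact bytes HBase hashes are not reproduced here.
region_name = b"sample,,1472212860000"

md5 = hashlib.md5(region_name)
raw_length = len(md5.digest())      # length of the raw hash in bytes
hex_length = len(md5.hexdigest())   # length of the hexadecimal string

print(raw_length, hex_length)
```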

You can also find the table in the HBase master web UI at http://<hbase-master>:16010 (the default port since HBase 1.0; earlier releases used 60010).

Use describe command to view the column families and other information.

hbase> describe 'sample'
Table sample is ENABLED
sample         
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                          
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'} 

Check the folder structure inside region directory.

$ hadoop fs -ls /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2
-rw-r--r--   1 hbase supergroup         41 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.regioninfo
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits

Inside the region, for each column family, there is a separate folder.

If you look inside one of the directories dedicated to a column family, you will see that it is empty. This is because we have not loaded any data yet.

$ hadoop fs -ls -R /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/
-rw-r--r--   1 hbase supergroup         41 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.regioninfo
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits
-rw-r--r--   1 hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits/2.seqid

Let's put some values into the sample table.

hbase> put 'sample', 'k1', 'cf1:c1', 'v1'
hbase> put 'sample', 'k1', 'cf2:c1', 'v2'
hbase> put 'sample', 'k2', 'cf1:c2', 'v3'

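Now check the directories again. Nothing has changed, because the new values are still held in memory (the memstore) and have not been written to store files yet.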

$ hadoop fs -ls -R /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/
-rw-r--r--   1 hbase supergroup         41 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.regioninfo
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits
-rw-r--r--   1 hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits/2.seqid

Force a flush of all the puts to disk (generally not required under normal operating conditions). Without an explicit flush, HBase keeps the data in the memstore. It typically waits until the memstore reaches a size threshold (hbase.hregion.memstore.flush.size, 128 MB by default) before it flushes the data out to disk.

hbase> flush 'sample'

Now, notice that two new files have been created, one under cf1 and one under cf2. These are called store files (HFiles).

$ hadoop fs -ls -R /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/
-rw-r--r--   1 hbase supergroup         41 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.regioninfo
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.tmp
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1
-rw-r--r--   1 hbase supergroup       1043 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1/587c011cf1454b5da510999e3f7d8b6a
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2
-rw-r--r--   1 hbase supergroup       1011 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2/16716b2167be401c99a18851d367472d
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits
-rw-r--r--   1 hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits/2.seqid

For each column family, the new data is written to a separate file in HDFS.

Put two more values, one with an existing row key and one with a new row key.

hbase> put 'sample', 'k2', 'cf1:c3', 'v4'
hbase> put 'sample', 'k3', 'cf1:c1', 'v5'
hbase> flush 'sample'

Now, check the hbase directory again,

$ hadoop fs -ls -R /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/
-rw-r--r--   1 hbase supergroup         41 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.regioninfo
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:21 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/.tmp
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:21 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1
-rw-r--r--   1 hbase supergroup       1043 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1/587c011cf1454b5da510999e3f7d8b6a
-rw-r--r--   1 hbase supergroup       1043 2016-08-26 12:21 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf1/9cff20f096c1487f8a10a70fe64dfce2
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2
-rw-r--r--   1 hbase supergroup       1011 2016-08-26 12:16 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/cf2/16716b2167be401c99a18851d367472d
drwxr-xr-x   - hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits
-rw-r--r--   1 hbase supergroup          0 2016-08-26 12:04 /hbase/data/default/sample/37120ff814f8c05fe0e0015a6dadfec2/recovered.edits/2.seqid

A new store file has been created under column family cf1. Nothing was added under cf2, because neither of the new puts touched that column family.
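
The behaviour seen in the listings above can be mimicked with a toy model. The Python sketch below is a deliberate simplification, not HBase code: ToyRegion and its methods are invented for this illustration, and it ignores the write-ahead log, timestamps, automatic flushes and compaction. It only shows that puts stay in memory until a flush, and that a flush writes one new store file per column family that has unflushed data.

```python
class ToyRegion:
    """Toy model of one region: a memstore and a list of store files per family."""

    def __init__(self, families):
        self.memstore = {cf: {} for cf in families}      # unflushed cells
        self.store_files = {cf: [] for cf in families}   # one dict per flushed file

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        self.memstore[family][(row, qualifier)] = value

    def flush(self):
        # Only families that hold unflushed cells get a new store file.
        for family, cells in list(self.memstore.items()):
            if cells:
                self.store_files[family].append(dict(sorted(cells.items())))
                self.memstore[family] = {}

    def file_counts(self):
        return {cf: len(files) for cf, files in self.store_files.items()}


region = ToyRegion(["cf1", "cf2"])
region.put("k1", "cf1:c1", "v1")
region.put("k1", "cf2:c1", "v2")
region.put("k2", "cf1:c2", "v3")
counts_before_flush = region.file_counts()

region.flush()
counts_after_first_flush = region.file_counts()

region.put("k2", "cf1:c3", "v4")
region.put("k3", "cf1:c1", "v5")
region.flush()
counts_after_second_flush = region.file_counts()

print(counts_before_flush)
print(counts_after_first_flush)
print(counts_after_second_flush)
```

The three printed dictionaries correspond to the three directory listings: no store files before the first flush, one file each under cf1 and cf2 after it, and a second file under cf1 only after the second flush.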

Add a new column family, cf3 to the table

hbase> alter 'sample', 'cf3'

Scan the table to view the rows

hbase> scan 'sample'

Count number of rows in 'sample'

hbase> count 'sample'


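A more performant version sets the scanner cache size. The count below fetches 1000 rows at a time. Set CACHE lower if your rows are big. The default is much smaller (one row at a time in older releases; run help 'count' to see the default for your version).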

hbase> count 'sample', CACHE => 1000
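
As a rough illustration of why a larger CACHE helps: scanner caching sets how many rows each fetch from the region server returns, so the number of fetches is roughly the row count divided by the cache size. A minimal sketch of that arithmetic in Python follows. The function name is made up for this example, and the estimate ignores retries and any per-fetch size limits.

```python
import math


def scanner_round_trips(total_rows, cache):
    """Approximate number of scanner fetches needed to read total_rows rows."""
    return math.ceil(total_rows / cache)


# Hypothetical table of one million rows, counted with three cache sizes.
round_trips = {cache: scanner_round_trips(1_000_000, cache) for cache in (1, 10, 1000)}
for cache, trips in round_trips.items():
    print(f"CACHE => {cache}: about {trips} fetches")
```

The trade-off is memory: each fetch has to hold CACHE rows at once, which is why a lower value is safer when rows are big.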


Another way to count the number of rows in a table, especially useful for large tables.

$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table name'
2020-05-11 17:31:07,922 INFO  [main] mapreduce.Job: Job job_1589211235667_0003 completed successfully                                                                                         
2020-05-11 17:31:08,171 INFO  [main] mapreduce.Job: Counters: 46                                                                                                                              
        File System Counters                                                                                                                                                                  
                FILE: Number of bytes read=0                                                                                                                                                  
                FILE: Number of bytes written=274556                                                                                                                                          
                FILE: Number of read operations=0                                                                                                                                             
                FILE: Number of large read operations=0                                                                                                                                       
                FILE: Number of write operations=0                                                                                                                                            
                HDFS: Number of bytes read=215                                                                                                                                                
                HDFS: Number of bytes written=0                                                                                                                                               
                HDFS: Number of read operations=1                                                                                                                                             
                HDFS: Number of large read operations=0                                                                                                                                       
                HDFS: Number of write operations=0                                                                                                                                            
        Job Counters                                                                                                                                                                          
                Launched map tasks=1                                                                                                                                                          
                Rack-local map tasks=1                                                                                                                                                        
                Total time spent by all maps in occupied slots (ms)=178924                                                                                                                    
                Total time spent by all reduces in occupied slots (ms)=0                                                                                                                      
                Total time spent by all map tasks (ms)=44731                                                                                                                                  
                Total vcore-milliseconds taken by all map tasks=44731                                                                                                                         
                Total megabyte-milliseconds taken by all map tasks=45804544                                                                                                                   
        Map-Reduce Framework                                                                                                                                                                  
                Map input records=2                                                                                                                                                           
                Map output records=0                                                                                                                                                          
                Input split bytes=215                                                                                                                                                         
                Spilled Records=0                                                                                                                                                             
                Failed Shuffles=0                                                                                                                                                             
                Merged Map outputs=0                                                                                                                                                          
                GC time elapsed (ms)=393                                                                                                                                                      
                CPU time spent (ms)=5660                                                                                                                                                      
                Physical memory (bytes) snapshot=240848896                                                                                                                                    
                Virtual memory (bytes) snapshot=2881601536                                                                                                                                    
                Total committed heap usage (bytes)=138936320                                                                                                                                  
                Peak Map Physical memory (bytes)=240848896                                                                                                                                    
                Peak Map Virtual memory (bytes)=2881601536                                                                                                                                    
        HBase Counters                                                                                                                                                                        
                BYTES_IN_REMOTE_RESULTS=0                                                                                                                                                     
                BYTES_IN_RESULTS=66                                                                                                                                                           
                MILLIS_BETWEEN_NEXTS=4770                                                                                                                                                     
                NOT_SERVING_REGION_EXCEPTION=0                                                                                                                                                
                NUM_SCANNER_RESTARTS=0                                                                                                                                                        
                NUM_SCAN_RESULTS_STALE=0                                                                                                                                                      
                REGIONS_SCANNED=1                                                                                                                                                             
                REMOTE_RPC_CALLS=0                                                                                                                                                            
                REMOTE_RPC_RETRIES=0                                                                                                                                                          
                ROWS_FILTERED=0                                                                                                                                                               
                ROWS_SCANNED=2                                                                                                                                                                
                RPC_CALLS=1                                                                                                                                                                   
                RPC_RETRIES=0                                                                                                                                                                 
        org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters                                                                                                                
                ROWS=2                                                                                                                                                                        
        File Input Format Counters                                                                                                                                                            
                Bytes Read=0                                                                                                                                                                  
        File Output Format Counters                                                                                                                                                           
                Bytes Written=0                                                                          



By default, scan returns rows in ascending order of row key. To return rows in descending order, set REVERSED to true. [This feature is not supported in MapR-DB]

hbase> scan 'sample', {REVERSED => true}
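Row keys are compared as raw bytes, so the scan order is lexicographic rather than numeric. The sketch below is a toy model in plain Python (not the HBase client), and the row keys are made up for illustration.

```python
# Toy model: HBase orders row keys byte by byte, not by numeric value.
# These keys are hypothetical; they are not taken from the 'sample' table.
row_keys = [b"k1", b"k10", b"k2", b"k9"]

# Order in which a default scan would visit the keys.
ascending = sorted(row_keys)

# Order in which a scan with REVERSED => true would visit the keys.
descending = sorted(row_keys, reverse=True)

print(ascending)
print(descending)
```

Note that k10 sorts before k2. If numeric order matters, a common workaround is to zero-pad the numeric part of the key.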

Get the row for key 'k1'.

hbase> get 'sample', 'k1'

This returns all column families. To return only the columns in the cf1 column family, use the following statement.

hbase> get 'sample', 'k1', {COLUMNS => ['cf1']}

You can create a variable t that references table 'sample'.

hbase> t = get_table 'sample'

hbase> t.<press tab to view available functions>

hbase(main):037:0> t.Display all 239 possibilities? (y or n)                                                                             
t.__id__                            t.__send__                          t._append_internal                                               
t._count_internal                   t._createdelete_internal            t._delete_internal                                               
t._deleteall_internal               t._deleterows_internal              t._get_counter_internal                                          
t._get_internal                     t._get_scanner                      t._get_splits_internal                                           
t._hash_to_scan                     t._incr_internal                    t._put_internal                                                  
t._scan_internal                    t.abort_procedure                   t.add_labels                                                     
t.add_peer                          t.add_rsgroup                       t.alter                                                          
t.alter_async                       t.alter_namespace                   t.alter_status                                                   
t.append                            t.append_peer_namespaces            t.append_peer_tableCFs                                           
t.assign                            t.balance_rsgroup                   t.balance_switch                                                 
t.balancer                          t.balancer_enabled                  t.catalogjanitor_enabled                                         
t.catalogjanitor_run                t.catalogjanitor_switch             t.class                                                          
t.cleaner_chore_enabled             t.cleaner_chore_run                 t.cleaner_chore_switch                                           
t.clear_auths                       t.clear_block_cache                 t.clear_compaction_queues                                        
t.clear_deadservers                 t.clone                             t.clone_snapshot                                                 
t.close                             t.close_region                      t.com                                                            
t.compact                           t.compact_rs                        t.compaction_state                                               
t.convert                           t.convert_bytes                     t.convert_bytes_with_position                                    
t.count                             t.create                            t.create_namespace                                               
t.debug                             t.debug?                            t.define_singleton_method                                        
t.delete                            t.delete_all_snapshot               t.delete_snapshot                                                
t.delete_table_snapshots            t.deleteall                         t.desc                                                           
t.describe                          t.describe_namespace                t.disable                                                        
t.disable_all                       t.disable_peer                      t.disable_table_replication                                      
t.display                           t.drop                              t.drop_all                                                       
t.drop_namespace                    t.dup                               t.enable                                                         
t.enable_all                        t.enable_peer                       t.enable_table_replication                                       
t.enum_for                          t.eql?                              t.equal?                                                         
t.exists                            t.extend                            t.flush                                                          
t.freeze                            t.frozen?                           t.get                                                            
t.get_all_columns                   t.get_auths                         t.get_counter                                                    
t.get_peer_config                   t.get_rsgroup                       t.get_server_rsgroup                                             
t.get_splits                        t.get_table                         t.get_table_rsgroup                                              
t.grant                             t.handle_different_imports          t.hash                                                           
t.help                              t.hlog_roll                         t.include_class                                                  
t.incr                              t.inspect                           t.instance_eval                                                  
t.instance_exec                     t.instance_of?                      t.instance_variable_defined?                                     
t.instance_variable_get             t.instance_variable_set             t.instance_variables                                             
t.is_a?                             t.is_disabled                       t.is_enabled                                                     
t.is_in_maintenance_mode            t.is_meta_table?                    t.itself                                                         
t.java                              t.java_annotation                   t.java_field                                                     
t.java_implements                   t.java_kind_of?                     t.java_name                                                      
t.java_package                      t.java_require                      t.java_signature                                                 
t.javafx                            t.javax                             t.kind_of?                                                       
t.list                              t.list_deadservers                  t.list_labels                                                    
t.list_locks                        t.list_namespace                    t.list_namespace_tables                                          
t.list_peer_configs                 t.list_peers                        t.list_procedures                                                
t.list_quota_snapshots              t.list_quota_table_sizes            t.list_quotas                                                    
t.list_regions                      t.list_replicated_tables            t.list_rsgroups                                                  
t.list_security_capabilities        t.list_snapshot_sizes               t.list_snapshots                                                 
t.list_table_snapshots              t.locate_region                     t.major_compact                                                  
t.merge_region                      t.method                            t.methods                                                        
t.move                              t.move_namespaces_rsgroup           t.move_servers_namespaces_rsgroup                                
t.move_servers_rsgroup              t.move_servers_tables_rsgroup       t.move_tables_rsgroup                                            
t.name                              t.nil?                              t.normalize                                                      
t.normalizer_enabled                t.normalizer_switch                 t.object_id                                                      
t.org                               t.parse_column_name                 t.private_methods                                                
t.processlist                       t.protected_methods                 t.public_method                             
t.public_methods                    t.public_send                       t.put                                                            
t.remove_instance_variable          t.remove_peer                       t.remove_peer_namespaces                                         
t.remove_peer_tableCFs              t.remove_rsgroup                    t.remove_servers_rsgroup                                         
t.respond_to?                       t.restore_snapshot                  t.revoke                                                         
t.scan                              t.send                              t.set_attributes                                                 
t.set_authorizations                t.set_auths                         t.set_cell_permissions                                           
t.set_cell_visibility               t.set_converter                     t.set_op_ttl                                                     
t.set_peer_bandwidth                t.set_peer_exclude_namespaces       t.set_peer_exclude_tableCFs                                      
t.set_peer_namespaces               t.set_peer_replicate_all            t.set_peer_tableCFs                                              
t.set_quota                         t.set_visibility                    t.show_filters                                                   
t.show_peer_tableCFs                t.singleton_class                   t.singleton_methods                                              
t.snapshot                          t.split                             t.splitormerge_enabled                                           
t.splitormerge_switch               t.status                            t.table                                                          
t.table_help                        t.taint                             t.tainted?                                                       
t.tap                               t.to_enum                           t.to_java                                                        
t.to_json                           t.to_s                              t.to_string                                                      
t.tools                             t.trace                             t.truncate                                                       
t.truncate_preserve                 t.trust                             t.unassign                                                       
t.untaint                           t.untrust                           t.untrusted?                                                     
t.update_all_config                 t.update_config                     t.update_peer_config                                             
t.user_permission                   t.version                           t.wal_roll                                                       
t.whoami                            t.zk_dump                                                       



Scan table 'sample' for cells whose value equals v3, in any column.

hbase> scan 'sample', FILTER => "ValueFilter(=, 'binary:v3')"

Scan table 'sample' to find rows where cf1:c2 equals v3.

hbase> scan 'sample', COLUMNS => 'cf1:c2', FILTER => "SingleColumnValueFilter('cf1','c2',=, 'binary:v3')"
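The two filters above work at different levels: ValueFilter keeps or drops individual cells, while SingleColumnValueFilter tests one column and then keeps or drops the whole row. The following is a minimal in-memory sketch of that difference in plain Python. The table contents and function names are assumptions for illustration, and the model ignores the COLUMNS restriction used in the shell command.

```python
# Toy table: row key -> {column: value}. Contents are made up for illustration.
table = {
    "k1": {"cf1:c1": "v1", "cf1:c2": "v3"},
    "k2": {"cf1:c1": "v3", "cf1:c2": "v2"},
    "k3": {"cf1:c1": "v5"},
}


def value_filter(table, wanted):
    """Cell level: keep only the cells whose value equals `wanted`."""
    out = {}
    for key in sorted(table):
        kept = {col: val for col, val in table[key].items() if val == wanted}
        if kept:
            out[key] = kept
    return out


def single_column_value_filter(table, column, wanted, filter_if_missing=False):
    """Row level: test one column, then keep or drop the whole row.

    Rows that lack the column pass through unless filter_if_missing is True,
    which mirrors the filter's default behaviour in HBase.
    """
    out = {}
    for key in sorted(table):
        cells = table[key]
        if column not in cells:
            if not filter_if_missing:
                out[key] = dict(cells)
        elif cells[column] == wanted:
            out[key] = dict(cells)
    return out


cell_matches = value_filter(table, "v3")
row_matches = single_column_value_filter(table, "cf1:c2", "v3")
strict_matches = single_column_value_filter(table, "cf1:c2", "v3", filter_if_missing=True)
print(cell_matches)
print(row_matches)
print(strict_matches)
```

In this model, row k3 passes the row-level filter even though it has no cf1:c2 cell. In the shell command above, limiting the scan to COLUMNS => 'cf1:c2' hides such rows because they have no cell to return.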


At the end, disable and then drop the table from HBase.

hbase> disable 'sample'
hbase> drop 'sample'


Counters

hbase(main):001:0> create 'counters', 'daily', 'weekly', 'monthly'
0 row(s) in 1.1930 seconds

hbase(main):002:0> incr 'counters', '20110101', 'daily:hits', 1
COUNTER VALUE = 1

hbase(main):003:0> incr 'counters', '20110101', 'daily:hits', 1
COUNTER VALUE = 2

hbase(main):004:0> get_counter 'counters', '20110101', 'daily:hits'
COUNTER VALUE = 2
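A counter cell holds an 8-byte big-endian long, and incr performs a read-modify-write on that cell. The sketch below models this in plain Python. ToyCounterTable is a hypothetical class, not part of any HBase client, and it is single-threaded, whereas the real increment is applied atomically on the server.

```python
import struct


class ToyCounterTable:
    """In-memory stand-in for a table that holds counter cells."""

    def __init__(self):
        self.cells = {}  # (row, column) -> 8 raw bytes

    def incr(self, row, column, amount=1):
        # A missing cell is treated as zero, then the new value is written back.
        raw = self.cells.get((row, column), struct.pack(">q", 0))
        value = struct.unpack(">q", raw)[0] + amount
        self.cells[(row, column)] = struct.pack(">q", value)
        return value

    def get_counter(self, row, column):
        raw = self.cells.get((row, column))
        return None if raw is None else struct.unpack(">q", raw)[0]


counters = ToyCounterTable()
first = counters.incr("20110101", "daily:hits")
second = counters.incr("20110101", "daily:hits")
print(first, second, counters.get_counter("20110101", "daily:hits"))
```

Because the value is stored as raw bytes, a plain get on a counter cell shows an escaped byte string rather than a number, which is why the shell offers get_counter.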



One more example

hbase> create_namespace "inventory"
hbase> create "inventory:product", "info"
hbase> alter "inventory:product", {NAME => "info", VERSIONS => 3}
hbase> alter "inventory:product", {NAME => "reviews", VERSIONS => 3}
hbase> put "inventory:product", "r1", "info:name", "Mac pro"
hbase> put "inventory:product", "r1", "info:cpu", "12"
hbase> put "inventory:product", "r1", "info:price", "1000"
hbase> put "inventory:product", "r1", "info:price", "1100"
hbase> put "inventory:product", "r1", "info:price", "1200"
hbase> get "inventory:product", "r1", {COLUMN => "info", VERSIONS => 3}
hbase> put "inventory:product", "r2", "info:name", "Samsung S9"
hbase> put "inventory:product", "r3", "info:name", "Microsoft Surface Pro"
hbase> scan "inventory:product",{COLUMNS => ["info:name"], STARTROW => "r2", ENDROW => "r4"}
hbase> put "inventory:product", "r3", "info:price", "800"
hbase> put "inventory:product", "r3", "reviews:c2", "4.5"
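With VERSIONS => 3 on the info family, the three puts to info:price in the commands above are all retained, and a get returns the newest first. The sketch below is a toy model of that behaviour in plain Python. ToyVersionedRow and its integer clock are assumptions for illustration; real HBase uses timestamps and discards surplus versions during compaction rather than immediately.

```python
from collections import defaultdict


class ToyVersionedRow:
    """One row whose cells keep at most `max_versions` values per column."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # column -> [(timestamp, value)], newest first
        self.clock = 0  # stand-in for the server-assigned timestamp

    def put(self, column, value):
        self.clock += 1
        self.cells[column].insert(0, (self.clock, value))
        # Drop anything older than the newest max_versions entries.
        del self.cells[column][self.max_versions:]

    def get(self, column, versions=1):
        return [value for _, value in self.cells[column][:versions]]


r1 = ToyVersionedRow(max_versions=3)
for price in ("1000", "1100", "1200"):
    r1.put("info:price", price)

latest = r1.get("info:price")
history = r1.get("info:price", versions=3)
print(latest, history)

# A fourth put pushes the oldest value out of the retained window.
r1.put("info:price", "1300")
after_fourth = r1.get("info:price", versions=3)
print(after_fourth)
```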



Query HBase table using Hive

Create a Hive table against the HBase table. Use the code snippet below and modify it to match your HBase table.

hive> CREATE EXTERNAL TABLE product(
    key string,
    name string,
    price int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES('hbase.columns.mapping' = ':key,info:name,info:price')
TBLPROPERTIES ('hbase.table.name' = 'inventory:product');



0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> select * from product; 
+--------------+------------------------+----------------+                                                                                                                                    
| product.key  |      product.name      | product.price  |                                                                                                                                    
+--------------+------------------------+----------------+                                                                                                                                    
| r1           | Mac pro                | 1200           |                                                                                                                                    
| r2           | Samsung S9             | NULL           |                                                                                                                                    
| r3           | Microsoft Surface Pro  | 800            |                                                                                                                                    
+--------------+------------------------+----------------+                                                                                                                                    
3 rows selected (4.521 seconds)                              
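The hbase.columns.mapping property is positional: its entries pair up, in order, with the Hive columns key, name, and price. Row r2 has no info:price cell, so its price comes back as NULL. The sketch below models that pairing in plain Python. map_row is a hypothetical helper, and it skips the string-to-int conversion that Hive applies to price.

```python
def map_row(row_key, cells, hive_columns, mapping):
    """Pair each Hive column with its mapping entry and look up the value."""
    specs = [spec.strip() for spec in mapping.split(",")]
    if len(specs) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")
    record = {}
    for name, spec in zip(hive_columns, specs):
        # ':key' means the HBase row key; anything else is family:qualifier.
        record[name] = row_key if spec == ":key" else cells.get(spec)
    return record


hive_columns = ["key", "name", "price"]
mapping = ":key,info:name,info:price"

r2 = map_row("r2", {"info:name": "Samsung S9"}, hive_columns, mapping)
r3 = map_row(
    "r3",
    {"info:name": "Microsoft Surface Pro", "info:price": "800"},
    hive_columns,
    mapping,
)
print(r2)
print(r3)
```

In this model a missing cell becomes None, which corresponds to the NULL that Hive reports.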



Let's dump the HBase table into HDFS as a Parquet file for further analysis.

hive> create table product_parquet(
    key string,
    name string,
    price int
) stored as parquet;
insert into product_parquet select * from product;


At the end, disable and then drop the table from HBase.

hbase> delete "inventory:product", "r3", "reviews:c2" # delete a cell value
hbase> deleteall "inventory:product", "r3" # delete entire row
hbase> disable "inventory:product"
hbase> drop "inventory:product"

Note: deleting rows by row key range is not available in the HBase shell. Use the HBase API to delete in bulk.
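One common pattern for a bulk delete through the API is to scan the key range, collect the matching row keys, and send the deletes in batches. The sketch below shows only that pattern on an in-memory dict in plain Python. delete_row_range and batch_size are hypothetical names, and no real HBase client calls are shown.

```python
import bisect


def delete_row_range(rows, start_row, stop_row, batch_size=2):
    """Delete every row with start_row <= key < stop_row, in batches.

    Returns (rows_deleted, batches_sent).
    """
    keys = sorted(rows)
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)  # the stop row is exclusive, as in a scan
    doomed = keys[lo:hi]

    batches = 0
    for i in range(0, len(doomed), batch_size):
        # Each slice stands in for one batched delete request.
        for key in doomed[i:i + batch_size]:
            del rows[key]
        batches += 1
    return len(doomed), batches


rows = {"r1": "a", "r2": "b", "r3": "c", "r4": "d", "r5": "e"}
deleted, batches = delete_row_range(rows, "r2", "r4")
print(deleted, batches, sorted(rows))
```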


