HBASE

            HBase is an open-source, column-oriented NoSQL distributed database written in Java. It is modeled after Google’s Bigtable and runs on top of the Hadoop Distributed File System (HDFS), where it provides random, real-time read/write access to big data.

Why is HBase needed?

            In Hadoop, data can only be accessed sequentially: a read or write starts at the beginning of a file and proceeds step by step to the end. Even a query for a small piece of data has to search the entire dataset. Hadoop also cannot change part of a file without rewriting the whole file. This created the need for a solution that provides random read/write access to huge volumes of data.

Features of HBase:

·         Column-oriented NoSQL database
·         Provides fault tolerance
·         Supports semi-structured as well as structured data
·         Gives random access by keeping rows sorted by row key and storing them in indexed files in HDFS for fast lookups.
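
To make the data model behind these features concrete, here is a minimal sketch in Python. It is a toy, not the HBase API: the `TinyTable` class and its `put`, `get`, and `scan` methods are hypothetical names chosen for illustration. It shows rows addressed by a row key, values grouped under a column family and qualifier, and a sorted key index that supports both point lookups and range scans.

```python
import bisect


class TinyTable:
    """Toy table: rows kept sorted by row key, each row is {family: {qualifier: value}}."""

    def __init__(self):
        self._keys = []  # row keys, always sorted
        self._rows = []  # row contents, parallel to self._keys

    def _find(self, row_key):
        # Binary search over the sorted keys instead of a full scan.
        i = bisect.bisect_left(self._keys, row_key)
        found = i < len(self._keys) and self._keys[i] == row_key
        return i, found

    def put(self, row_key, family, qualifier, value):
        i, found = self._find(row_key)
        if not found:
            self._keys.insert(i, row_key)
            self._rows.insert(i, {})
        self._rows[i].setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        i, found = self._find(row_key)
        if not found:
            return None
        return self._rows[i].get(family, {}).get(qualifier)

    def scan(self, start_key, stop_key):
        # Rows with start_key <= key < stop_key, in key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return list(zip(self._keys[lo:hi], self._rows[lo:hi]))


table = TinyTable()
table.put("user#002", "info", "name", "Bob")
table.put("user#001", "info", "name", "Alice")
table.put("user#001", "stats", "logins", 7)  # rows need not share the same columns
```

Because the keys stay sorted, `table.scan("user#001", "user#003")` returns both rows in key order without touching any other part of the table.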

Architecture of HBase:

HBase has 3 main components:
·         HMaster
·         Region Servers
·         ZooKeeper




1) HMaster:



·         It is the master server in HBase.
·         Assigns regions to the Region Servers and monitors all Region Servers.
·         Performs load balancing by distributing regions evenly across the Region Servers.
·         Handles all operations that change metadata, such as DDL statements (creating, altering, and deleting tables).
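
The assignment and balancing duties above can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming that load is measured only by the number of regions per server; the function names `assign_regions` and `rebalance_after_failure` are hypothetical, and the real balancer takes more factors into account.

```python
def assign_regions(regions, servers):
    """Round-robin assignment: server loads differ by at most one region."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def rebalance_after_failure(assignment, dead_server):
    """Hand each region of a failed server to the least-loaded surviving server."""
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


plan = assign_regions(["r1", "r2", "r3", "r4", "r5", "r6"], ["rs1", "rs2", "rs3"])
plan = rebalance_after_failure(plan, "rs2")  # rs2's regions move to rs1 and rs3
```

After the simulated failure, every region is still assigned to exactly one live server and the two survivors carry three regions each.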



2) Region Servers:



·         These are the worker nodes in HBase.
·         Each one hosts regions, which are horizontal partitions of a table split by row key. Regions are the basic building blocks of an HBase cluster.
·         Communicates with clients and handles read/write/update/delete operations for all the regions it hosts.
·         A Region Server process typically runs on every data node of the Hadoop cluster.
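
Since regions are partitions by row key, finding the region for a given row comes down to a search over the region start keys. The sketch below assumes a table with four regions and made-up split points; `REGION_START_KEYS`, `REGION_SERVERS`, and `locate_region` are hypothetical names, not part of HBase.

```python
import bisect

# Each region covers [its start key, the next region's start key).
# The empty string means "from the very first possible key".
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]  # server hosting each region


def locate_region(row_key):
    """Return (region index, hosting server) for a row key."""
    # The right region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]
```

For example, `locate_region("monkey")` falls in the region starting at "g", while `locate_region("zebra")` falls in the last region. A client that knows this mapping can talk to the right Region Server directly.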



A Region Server has the following components:



1.     Write Ahead Log (WAL): A log file that records new data that has not yet been written to permanent storage. It is used to recover that data after a node failure.

2.     Block Cache: A read cache that keeps frequently read data in memory.

3.     MemStore: A write cache that holds data that has not yet been written to disk. Each column family in a region has its own MemStore.

4.     HFile: Stores the actual rows on disk as sorted KeyValues.
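
How these four components work together is easier to see in code. The following Python sketch is a toy model for a single column family, not the real implementation: `TinyStore` and all of its methods and parameters are hypothetical, the "HFiles" are just sorted in-memory lists, and the read path is simplified to check the MemStore, then the cache, then the files from newest to oldest.

```python
import bisect
from collections import OrderedDict


class TinyStore:
    """Toy store for one column family: WAL + MemStore + sorted 'HFiles' + read cache."""

    def __init__(self, flush_threshold=3, cache_size=2):
        self.wal = []                     # append-only log of edits not yet flushed
        self.memstore = {}                # write cache
        self.hfiles = []                  # immutable sorted [(key, value), ...], oldest first
        self.block_cache = OrderedDict()  # small LRU read cache
        self.flush_threshold = flush_threshold
        self.cache_size = cache_size

    def put(self, key, value):
        self.wal.append((key, value))     # 1. log the edit first, for recovery
        self.memstore[key] = value        # 2. then update the write cache
        self.block_cache.pop(key, None)   # drop any stale cached value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as one sorted, immutable file.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()                  # flushed edits no longer need the log

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        if key in self.block_cache:
            self.block_cache.move_to_end(key)
            return self.block_cache[key]
        for hfile in reversed(self.hfiles):       # newest file wins
            i = bisect.bisect_left(hfile, (key,))  # binary search in the sorted file
            if i < len(hfile) and hfile[i][0] == key:
                self._cache(key, hfile[i][1])
                return hfile[i][1]
        return None

    def _cache(self, key, value):
        self.block_cache[key] = value
        self.block_cache.move_to_end(key)
        if len(self.block_cache) > self.cache_size:
            self.block_cache.popitem(last=False)  # evict the least recently used entry

    def recover_memstore(self):
        """Simulate losing memory in a crash, then rebuild the MemStore from the WAL."""
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value


store = TinyStore(flush_threshold=3)
for k, v in [("row3", "c"), ("row1", "a"), ("row2", "b")]:
    store.put(k, v)       # the third put triggers a flush to a sorted "HFile"
store.put("row4", "d")    # this edit lives only in the WAL and the MemStore
```

In this sketch the only copy of "row4" outside memory is its WAL entry, which is why `recover_memstore()` can restore it after a simulated crash.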



3) ZooKeeper:

·         Maintains server configuration information.
·         Keeps track of server failures.
·         Monitors all master servers and keeps only one HMaster active at any time.
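
The "only one active master" rule can be illustrated with a small in-memory stand-in for the coordination service. This is a sketch under loud assumptions: `TinyCoordinator` and its methods are hypothetical, and nothing here talks to a real ZooKeeper ensemble, which relies on sessions and ephemeral znodes rather than a plain Python attribute.

```python
class TinyCoordinator:
    """Toy coordination service: tracks live servers and a single 'active master' slot."""

    def __init__(self):
        self.active_master = None
        self.live_servers = set()

    def register(self, server):
        self.live_servers.add(server)

    def try_become_active(self, master):
        # Only succeeds if nobody currently holds the slot.
        if self.active_master is None:
            self.active_master = master
            return True
        return False

    def report_failure(self, server):
        self.live_servers.discard(server)
        if self.active_master == server:
            self.active_master = None  # the slot frees up for a standby master


zk = TinyCoordinator()
for m in ("master-a", "master-b"):
    zk.register(m)
first = zk.try_become_active("master-a")     # master-a takes the slot
second = zk.try_become_active("master-b")    # master-b stays on standby
zk.report_failure("master-a")
takeover = zk.try_become_active("master-b")  # the standby becomes active
```

At every point in this run there is at most one active master, and the standby takes over only after the failure has been recorded.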