Data Engines: HBase Introduction

HBase

HBase is a distributed NoSQL database built on top of the Hadoop Distributed File System (HDFS) and designed to provide random, real-time read/write access to big data. It is open source, modeled after Google's Bigtable, and written in Java. It is a column-oriented database.
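As a rough mental model only (not HBase's storage code), a table can be pictured as a nested map: row key, then column family, then column qualifier, then value. The sample rows and the helper names `put` and `get` below are made up for illustration, and cell timestamps/versions are left out.

```python
from collections import defaultdict

# table[row_key][column_family][qualifier] = value
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, qualifier, value):
    """Write a single cell, addressed by row key, column family, and qualifier."""
    table[row_key][family][qualifier] = value

def get(row_key, family, qualifier):
    """Random read of one cell; returns None if the cell does not exist."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)

put("user1", "info", "name", "Asha")
put("user1", "info", "city", "Pune")
put("user2", "info", "name", "Ravi")   # rows need not share the same columns

print(get("user1", "info", "city"))    # Pune
print(get("user2", "info", "city"))    # None
```

Because a missing cell is simply absent from the map, rows with different sets of columns cost nothing extra, which is why this kind of store suits semi-structured data.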

Why is HBase Needed?

In Hadoop, data is accessed sequentially: a read or write starts at the beginning of a file and proceeds step by step to the end. Even a query for a small piece of data has to search the entire dataset. Hadoop also cannot change part of a file without rewriting the whole file. This created the need for a solution that provides random read/write access to huge volumes of data.

Features of HBase:

· Column-oriented NoSQL database
· Provides fault tolerance
· Supports semi-structured as well as structured data
· Keeps data sorted by row key in indexed files (HFiles) on HDFS, which allows random access and fast lookups.
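To see why indexed storage makes lookups fast, here is a minimal sketch that assumes rows are kept sorted by row key and split into fixed-size blocks, with a small index holding the first key of each block. The block size, key format, and the function name `lookup` are illustrative assumptions, not HBase's actual file format.

```python
import bisect

# Rows kept sorted by row key, as they would be inside one data file.
rows = [("row%03d" % i, "value-%d" % i) for i in range(1000)]

BLOCK_SIZE = 100  # rows per block (illustrative)
# Sparse index: the first row key of every block.
index = [rows[i][0] for i in range(0, len(rows), BLOCK_SIZE)]

def lookup(row_key):
    """Find a row by examining only the one block that can contain it."""
    block_no = bisect.bisect_right(index, row_key) - 1
    if block_no < 0:
        return None  # key sorts before the first block
    start = block_no * BLOCK_SIZE
    block = rows[start:start + BLOCK_SIZE]
    keys = [k for k, _ in block]
    pos = bisect.bisect_left(keys, row_key)
    if pos < len(keys) and keys[pos] == row_key:
        return block[pos][1]
    return None

print(lookup("row742"))   # value-742
print(lookup("row999x"))  # None
```

A lookup touches the small index and one block of 100 rows instead of scanning all 1000 rows, which is the difference between random access and a sequential pass over the data.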

Architecture of HBase:

HBase has three main components:
· HMaster
· Region Servers
· ZooKeeper

1) HMaster:

§ It is the master server in HBase.
§ It assigns regions to the region servers and monitors all region servers.
§ It performs load balancing by distributing regions evenly across the region servers.
§ It handles all operations that change metadata, such as DDL operations (creating, altering, and deleting tables).
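The assignment and balancing duties above can be sketched with a simple "give the next region to the least-loaded server" rule. This is a toy policy chosen for illustration, not HBase's actual balancer, and the function names and server names are hypothetical.

```python
def assign_regions(regions, servers):
    """Assign each region to the server that currently holds the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment

def reassign_on_failure(assignment, failed_server):
    """Move the regions of a failed server onto the remaining servers."""
    orphaned = assignment.pop(failed_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment

regions = ["region-%d" % i for i in range(7)]
assignment = assign_regions(regions, ["rs1", "rs2", "rs3"])
print({s: len(r) for s, r in assignment.items()})  # {'rs1': 3, 'rs2': 2, 'rs3': 2}

assignment = reassign_on_failure(assignment, "rs2")
print({s: len(r) for s, r in assignment.items()})  # {'rs1': 4, 'rs3': 3}
```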

2) Region Servers:

§ These are the worker nodes in HBase.
§ They contain regions, which are horizontal partitions of a table split by row key range. Regions are the basic building blocks of an HBase cluster.
§ They communicate with clients and handle read, write, update, and delete operations for all the regions they host.
§ A region server process typically runs on every DataNode of the Hadoop cluster.
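Partitioning by row key means any row key maps to exactly one region, and so to one region server. A minimal sketch of that lookup follows; the split points, the server names, and the function name `locate` are made-up values for illustration.

```python
import bisect

# Each region covers row keys in [start_key, next region's start_key).
# The first region starts at the empty key, so every key falls in some region.
region_start_keys = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs1", "rs3"]  # hypothetical hosting server per region

def locate(row_key):
    """Return (region index, server) responsible for a row key."""
    i = bisect.bisect_right(region_start_keys, row_key) - 1
    return i, region_servers[i]

print(locate("apple"))  # (0, 'rs1')
print(locate("grape"))  # (1, 'rs2')
print(locate("zebra"))  # (3, 'rs3')
```

Since neighbouring row keys land in the same region, a scan over a key range only has to contact the servers whose regions overlap that range.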

A region server has the following components:

1. Write Ahead Log (WAL): A log file that records new data that has not yet been flushed to HFiles. It is used to recover that data after a node failure.

2. Block Cache: An in-memory read cache that holds frequently used data.

3. MemStore: A write cache that holds data not yet written to disk. Each column family in a region has its own dedicated MemStore.

4. HFile: Stores the actual rows on disk as sorted KeyValues.
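How these four components cooperate can be sketched with a toy class. Everything here is a simplification under stated assumptions: the class name `Region`, the flush threshold, and the per-row cache are illustrative, and in this toy the cache is keyed by row and invalidated on write, which is not how the real block cache is organised.

```python
class Region:
    """Toy model of one column family's store in a region (not HBase's real code)."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log of writes not yet flushed
        self.memstore = {}     # in-memory write cache
        self.hfiles = []       # immutable sorted files, newest last
        self.block_cache = {}  # read cache for values fetched from HFiles
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))    # 1. log first, for recovery
        self.memstore[row_key] = value       # 2. then update the write cache
        self.block_cache.pop(row_key, None)  # drop any stale cached value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore as a new file sorted by key, then clear the
        # MemStore and the log entries that are now safely on disk.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()

    def get(self, row_key):
        if row_key in self.memstore:         # newest data first
            return self.memstore[row_key]
        if row_key in self.block_cache:
            return self.block_cache[row_key]
        for hfile in reversed(self.hfiles):  # newest file wins
            for key, value in hfile:
                if key == row_key:
                    self.block_cache[row_key] = value
                    return value
        return None

    def recover(self):
        """Rebuild the MemStore from the WAL after a simulated crash."""
        self.memstore = dict(self.wal)

region = Region(flush_threshold=3)
region.put("r1", "a")
region.put("r2", "b")
region.memstore.clear()   # simulate a crash that loses in-memory state
region.recover()          # replay the WAL
print(region.get("r1"))   # a
region.put("r3", "c")     # third entry triggers a flush to an HFile
print(len(region.hfiles), region.memstore, region.wal)  # 1 {} []
print(region.get("r2"))   # b
```

The ordering inside `put` is the important part: because the log entry is written before the in-memory update, a crash can lose the MemStore but never an acknowledged write.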

3) ZooKeeper:

§ Maintains server configuration information.
§ Keeps track of server failures.
§ Monitors all master servers and keeps only one HMaster active at any time.
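The "only one active master" rule can be sketched as follows. This toy `Coordinator` class stands in for the coordination role described above and does not use the real ZooKeeper API; the promotion rule (the longest-registered live master takes over) and the master names are assumptions made for the example.

```python
class Coordinator:
    """Toy stand-in for ZooKeeper: tracks live masters and the single active one."""

    def __init__(self):
        self.live_masters = []  # in registration order
        self.active = None

    def register(self, master):
        self.live_masters.append(master)
        self._elect()

    def report_failure(self, master):
        self.live_masters.remove(master)
        if self.active == master:
            self.active = None
        self._elect()

    def _elect(self):
        # Keep the current active master; otherwise promote the
        # longest-waiting live master, if there is one.
        if self.active is None and self.live_masters:
            self.active = self.live_masters[0]

zk = Coordinator()
zk.register("master-a")
zk.register("master-b")
print(zk.active)               # master-a
zk.report_failure("master-a")
print(zk.active)               # master-b
```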
