Developing Distributed Applications Using ZooKeeper

Nbiswas
Posted by in NoSql category on for Beginner level | Points: 250 | Views : 2246 red flag

Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.

Introduction

A distributed computer system consists of multiple software components that are on multiple

computers, but run as a single system. The computers that are in a distributed system can be physically

close together and connected by a local network, or they can be geographically distant and connected

by a wide area network. The goal of distributed computing is to make such a network work as a single

computer

Distributed systems offer many benefits over centralized systems. One benefit is that of Scalability.

Systems can easily be expanded by adding more machines as needed. Another major benefit of

distributed systems is Redundancy. Several machines can provide the same services, so if one is

unavailable, work does not stop. Additionally, because many smaller machines can be used, this

redundancy does not need to be prohibitively expensive

Distributed applications require coordination. You could develop your own coordination service,

however that can take a lot of work and is not a trivial task. The alterative is using a robust coordination

service that is already available for use. ZooKeeper is just that – a distributed centralized coordination

service that is open-source and ready for use

What is Zoo keeper

ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs

 

Usage of ZooKeeper

The ZooKeeper service can be used to help you tackle many of the common challenges distributed

applications face

 

ZooKeeper can be used to maintain configuration information. For example you can store configuration

data in ZooKeeper and share that data across all nodes in your distributed setup

 

ZooKeeper can be used for naming. An example is using it as a naming service, allowing one node to find

a specific machine in a cluster of thousands of servers

 

ZooKeeper can be used to solve the problem of distributed synchronization – providing the building

blocks for Locks, Barriers, and Queues

 

ZooKeeper can also be used for group services such as leader election and more

 

ZooKeeper provides the building blocks for all of these scenarios and is distributed, reliable and fast,

while still being relatively simple to work with

In a distributed ZooKeeper implementation, there are multiple servers. This is known as ZooKeeper’s

Replicated Mode. One server is elected as the leader and all additional servers are followers. If the

ZooKeeper leader fails, then a new leader is elected

 

All ZooKeeper servers must know about each other. Each server maintains an in-memory image of the

overall state as well as transaction logs and snapshots in persistent storage. Clients connect to just a

single server, however, when a client is started, it can provide a list of servers. In that way, if the

connection to server for that client fails, the client connects to the next server in its list. Since each

server maintains the same information, the client is able to continue to function without interruption

A ZooKeeper client can perform a read operation from any server in the ensemble, however a write

operation must go through the ZooKeeper leader and requires a majority consensus to succeed

 

More about Zookeeper

ZooKeeper may also be run in Standalone mode. In this mode only a single ZooKeeper server exists. All

clients connect to the ZooKeeper service via this one server. You lose the benefits of replicated mode

when using standalone mode. Trading high-availability and resilience for a simpler environment can be

useful for testing and learning purposes

 

Zookeeper consistency guarantees

1) ZooKeeper provides us with six consistency guarantees

2)  The first ZooKeeper consistency guarantee is that of Sequential Consistency. This means that updates

from a client to the ZooKeeper service are applied in the order they are sent

 

3) Next, the Atomicity guarantee means that updates in ZooKeeper either succeed or fail. Partial updates

are not allowed

 

4) The Singe System Image guarantee states that a client will see the same view of the ZooKeeper service

regardless of the server in the ensemble that it is connected to

 

5) The Reliability guarantee means that if an update succeeds in ZooKeeper then it will persist and not be

rolled back. The update will only be overwritten when another client performs a new update

 

6) Finally, the Timeliness guarantee means that a clients view of the system is guaranteed to be up-to-date Within a certain time bound, generally within tens of seconds. If a client does not see system changes

within that time bound, then the client assumes a service outage and will connect to a different server in

the ensemble

N.B.~ It is important to understand that ZooKeeper does NOT make a Simultaneously Consistent Cross-Client View guarantee. This means that ZooKeeper does not guarantee that different clients will have identical views of ZooKeeper data at every instance in time

 

Network delays and other factors may make it possible for one client to perform an update before

another client is notified of the change. The way we can handle this is using the sync() method that

ZooKeeper provides. The sync method forces a ZooKeeper ensemble server to catch up with the leader

 

The distributed processes using ZooKeeper coordinate with each other through shared hierarchical

namespaces. These namespaces are organized much like the file systems in UNIX or Linux. Each

namespace has a root node and can have one or more child nodes. Since the term node has so many

different connotations, ZooKeeper refers to each of these nodes as znodes. Data can be stored in a

znode. When data is written to or read from a znode, all of the data is either written or read. Also, there

is an access controlled list (also known as an “ACL”) that is associated with each znode. This allows

control over who can create, read, update, and delete a znode

 

Conclusion

As new versions of Hadoop are released, ZooKeeper is being used more and more in the Hadoop

infrastructure. HBase uses ZooKeeper for master election, server lease management, bootstrapping, and

coordination between servers. Later versions of Hadoop are using ZooKeeper to provide high availability for the ResourceManager. In IBM’s Big Insights, ZooKeeper can be used with Adaptive MapReduce.

Apache Flume also has been using ZooKeeper for configuration purposes in recent releases

A variety of companies are using ZooKeeper with their distributed applications. Twitter, Yahoo and many

others are using ZooKeeper for different purposes such as configuration management, sharding, locking

and more.

 

References

https://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#ch_zkDataModel

http://www.ibm.com/developerworks/library/bd-zookeeper/

https://www.igvita.com/2010/04/30/distributed-coordination-with-zookeeper/

http://zookeeper.apache.org/

http://en.wikipedia.org/wiki/Zookeeper_(film)

http://en.wikipedia.org/wiki/Apache_ZooKeeperv

Page copy protected against web site content infringement by Copyscape

About the Author

Nbiswas
Full Name: Niladri Biswas
Member Level: Starter
Member Status: Member
Member Since: 9/6/2014 9:36:16 AM
Country: India
Thanks

I am a Technical Lead at HCL Technologies

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)