Time: 2:00 pm - 2:50 pm
ZooKeeper is the unsung hero: a lot of the time, people don't know it's there until it's down. Because so much depends on ZooKeeper, making it durable matters.
ZooKeeper itself is fairly stable, so what brings ZK down is more often misconfiguration than bugs.
ZooKeeper is a coordinator for distributed applications. It is designed to remove the need for custom coordination code/solutions. ZK is used by HBase, HDFS, Solr, Kafka, etc.
A misconfiguration is any diagnosed ticket that requires a ZK config-file change. These comprise 44% of ZK tickets at Cloudera [eep!]. Typically ZK is straightforward to set up and operate, and issues tend to be on the client side rather than in ZK itself.
A 3-server ZK ensemble consists of three ZK machines: one leader and two followers. All three store a copy of the same data; this full replication is what provides durability.
The leader is elected at startup, changes are coordinated through the leader, and clients talk to followers. A change is accepted once a majority of the ensemble agrees.
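For reference, a minimal zoo.cfg sketch for a 3-server ensemble like the one described above (the hostnames zk1 through zk3 and the data path are assumptions; every server gets the same file, plus a myid file containing its own server number):

```
# zoo.cfg - identical on all three servers
tickTime=2000                  # base time unit, in ms
initLimit=10                   # ticks a follower may take to connect and sync with the leader
syncLimit=5                    # ticks a follower may lag before being dropped
dataDir=/var/lib/zookeeper     # must contain a myid file with this server's number
clientPort=2181
# server.N=host:peerPort:leaderElectionPort
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```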
Too Many Connections
Connection Closes Prematurely
Need to increase the wait time for recovery. [Didn't understand this completely.]
Pig Hangs Connecting to HBase
Client Session Time Out
Clients Lose Connections
Unable to Load Database: Unable to Load Quorum Server
Unable to Load Database: Unreasonable Length Exception
Failure to Follow Leader
Because ZK operates by majority vote, an odd number of servers in an ensemble is recommended: if you have 2 servers and one goes down, you're down, since 1 is not a majority of 2.
But more isn't always better: more servers means more votes to wait for in elections. You can use Observers to add capacity with servers that do not participate in votes.
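The majority arithmetic behind that recommendation can be sketched in a few lines (a toy illustration of the quorum rule, not ZooKeeper code):

```python
def quorum(ensemble_size: int) -> int:
    """Votes needed for a change or election to succeed: a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while the ensemble can still form a quorum."""
    return ensemble_size - quorum(ensemble_size)

# An even-sized ensemble buys no extra fault tolerance:
print(tolerated_failures(3))  # 1
print(tolerated_failures(4))  # still 1: 4 needs 3 votes
print(tolerated_failures(5))  # 2
```

This is why 3 and 5 are the common sizes: going from 3 to 4 adds a voter (and election/commit overhead) without tolerating any additional failure. Observers sidestep this: they serve reads but are excluded from the vote count.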
You can verify the configuration using zk-smoketest.
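Besides zk-smoketest, a quick liveness check is ZooKeeper's built-in "four-letter word" admin commands: a healthy server answers ruok with imok. A minimal sketch, assuming a server reachable at the given host and port (on ZK 3.5+ the command must be allowed via 4lw.commands.whitelist):

```python
import socket

def four_letter_word(host: str, port: int, cmd: bytes = b"ruok") -> bytes:
    """Send a ZooKeeper four-letter admin command and return the raw reply.

    A healthy server replies to b"ruok" with b"imok"; the server closes
    the connection after answering.
    """
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(cmd)
        return s.recv(1024)

# Example (assumes a ZK server on localhost:2181):
# four_letter_word("localhost", 2181)
```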