The Python API for Cloudera Manager is very powerful: anything you can do via the UI you can also do via the API. I use it a lot for automating cluster initialisation.

Many people use Ansible, Puppet and similar tools, but these only get you so far. They can provision the machines and install Cloudera Manager or Ambari, their agents and the associated Hadoop stack, but they don’t handle actually creating a cluster and installing the services you want. Normally you go to the UI and start pressing buttons. Using the APIs you can automate all of it, put it into a Jenkins pipeline and build yourself a new cluster with your own configurations, for example a basic cluster with minimal services installed for developers. This is even easier using Cloudera Manager’s templates or blueprints in Ambari.

Cloudera Director will eventually take over this, but for now I believe it only covers deployment to Amazon (?).

Kerberos configuration can also be automated via the Cloudera Manager API, which is the subject of this post. I’m not covering installing the Kerberos clients; I assume they are already installed and talking to Active Directory or FreeIPA. Maybe I’ll cover those in another post.

The important method in this code is the enable_kerberos method. It accepts the name of the cluster on which to enable Kerberos and a dict containing the cluster configuration. The cluster configuration comes from a JSON file describing my cluster, similar to Ambari’s blueprints. A simplified version would be
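To give a feel for the shape of such a file, here is a minimal, hypothetical sketch in Python. Every key name (`kerberos`, `realm`, `kdc_host`, `kdc_type`, `services`), the example values and the helper `kerberos_settings` are my own assumptions for illustration, not part of Cloudera Manager or of any real cluster file. It only shows how a per-cluster dict could be loaded and checked before being handed to something like enable_kerberos.

```python
import json

# Hypothetical cluster description; all keys and values are illustrative.
CLUSTER_JSON = """
{
  "name": "dev",
  "services": ["ZOOKEEPER", "HDFS", "YARN"],
  "kerberos": {
    "realm": "EXAMPLE.COM",
    "kdc_host": "kdc.example.com",
    "kdc_type": "Active Directory"
  }
}
"""

# Settings this sketch insists on before Kerberos is enabled (an assumption).
REQUIRED_KERBEROS_KEYS = ("realm", "kdc_host", "kdc_type")


def kerberos_settings(cluster_config):
    """Return the Kerberos section of a cluster config dict.

    Raises ValueError if the section or any required key is missing, so a
    bad file fails early rather than halfway through cluster setup.
    """
    kerberos = cluster_config.get("kerberos")
    if kerberos is None:
        raise ValueError("cluster config has no 'kerberos' section")
    missing = [key for key in REQUIRED_KERBEROS_KEYS if key not in kerberos]
    if missing:
        raise ValueError("missing Kerberos settings: " + ", ".join(missing))
    return kerberos


cluster_config = json.loads(CLUSTER_JSON)
settings = kerberos_settings(cluster_config)
```

Validating the dict up front is a design choice of this sketch; the point is simply that the cluster name and its settings travel together in one structure.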

 

One point to note is that enabling Kerberos on the individual services triggers a gen_credentials command in Cloudera Manager; this is where Cloudera Manager generates the Kerberos principals. Restarting the cluster too soon will fail, since the principals for all services may not have been generated yet. That is why I use a lambda filter expression to wait for all active gen_credentials commands to finish before starting the cluster.
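The waiting step can be sketched roughly as follows. This is not the exact code from my scripts, just a minimal illustration under some assumptions: command objects expose `name` and `active` attributes, the command name contains "credentials", and `get_commands` is any zero-argument callable that returns the current list of commands. The function name `wait_for_gen_credentials` and its parameters are made up for this example.

```python
import time


def wait_for_gen_credentials(get_commands, poll_seconds=5, timeout_seconds=600,
                             sleep=time.sleep):
    """Poll until no credential-generation command is still active.

    get_commands: zero-argument callable returning command-like objects
                  with `name` and `active` attributes (an assumption).
    Returns the number of polls made; raises RuntimeError on timeout.
    """
    waited = 0
    polls = 0
    while True:
        polls += 1
        # Lambda filter: keep only active commands that look like
        # credential generation. The substring match is a guess.
        pending = list(filter(
            lambda cmd: cmd.active and "credentials" in cmd.name.lower(),
            get_commands()))
        if not pending:
            return polls
        if waited >= timeout_seconds:
            raise RuntimeError(
                "%d credential command(s) still active after %d seconds"
                % (len(pending), waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

With the Python client you would pass in whatever call lists the running Cloudera Manager commands, for example something like `wait_for_gen_credentials(cm.get_commands)` if your client version offers such a method, and only start the cluster once it returns. The timeout is there so that a stuck command fails the pipeline instead of hanging it.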
