Puppet Managing vCenter and vShield

Today, we released a set of open source Puppet modules for managing vCenter Server Appliance 5.1 and vCloud Networking and Security 5.1 (vCNS, previously known as vShield). They provide a framework for managing resources within vCenter and vCNS via Puppet[1].

The modules can be obtained from forge.puppetlabs.com:

$ puppet module install vmware/vcsa
$ puppet module install vmware/vcenter
$ puppet module install vmware/vshield

For development, use the GitHub repos, which can be installed via the following librarian-puppet Puppetfile:

mod "puppetlabs/stdlib"
mod "nanliu/staging"
mod "vmware_lib", :git => "git://github.com/vmware/vmware-vmware_lib.git"
mod "vcsa",       :git => "git://github.com/vmware/vmware-vcsa.git"
mod "vcenter",    :git => "git://github.com/vmware/vmware-vcenter.git"
mod "vshield",    :git => "git://github.com/vmware/vmware-vshield.git"

The Puppet management host needs connectivity to the vCenter and vCNS appliances. We are currently using a custom version of RbVmomi, which is included in the module. The management host should deploy all dependent software packages before managing any vCenter/vCNS resources:

node 'management_server' {
  include 'vcenter::package'
}

One of the gems in the package requires Nokogiri. If you use Puppet Enterprise, install the pe-rubygem-nokogiri package on the management host (it’s not typically installed for agents). For open source Puppet agents, see the Nokogiri documentation for installation details.
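On a Puppet Enterprise management host, a minimal sketch of that prerequisite might look like the following (the ordering against vcenter::package is an assumption about your node definition):

package { 'pe-rubygem-nokogiri':
  ensure => installed,
  # Assumed ordering: put the gem dependency in place before
  # vcenter::package deploys the module's bundled gems.
  before => Class['vcenter::package'],
}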

In last week’s sneak preview, I showed the debugging output for the ssh transport. Observant readers may have noticed that those commands were the steps to initialize a vCenter Server Appliance[2].

Here is the corresponding Puppet manifest[3]:

vcsa { 'demo':
  username => 'root',
  password => 'vmware',
  server   => '192.168.1.10',
  db_type  => 'embedded',
  capacity => 'm',
}

If we dig into the defined resource type, it simply passes the user account to the ssh transport and initializes the device in the appropriate sequence:

define vcsa (
...
) {
  transport { $name:
    username => $username,
    password => $password,
    server   => $server,
  }

  vcsa_eula { $name:
    ensure    => accept,
    transport => Transport[$name],
  } ->

  vcsa_db { $name:
    ensure    => present,
    type      => $db_type
  ...
}

Once the vCenter Server Appliance is initialized, we can manage vCenter resources using the vSphere API. The example below specifies a vSphere API transport, along with a datacenter, a cluster, and an ESX host[4]:

transport { 'vcenter':
  username => 'root',
  password => 'vmware',
  server   => '192.168.1.10',
  # see rbvmomi documentation for available options:
  options  => { 'insecure' => true },
}

vc_datacenter { 'dc1':
  ensure    => present,
  path      => '/dc1',
  transport => Transport['vcenter'],
}

vc_cluster { '/dc1/clu1':
  ensure    => present,
  transport => Transport['vcenter'],
}

vc_cluster_drs { '/dc1/clu1':
  require   => Vc_cluster['/dc1/clu1'],
  before    => Anchor['/dc1/clu1'],
  transport => Transport['vcenter'],
}

vc_cluster_evc { '/dc1/clu1':
  require   => [
    Vc_cluster['/dc1/clu1'],
    Vc_cluster_drs['/dc1/clu1'],
  ],
  before    => Anchor['/dc1/clu1'],
  transport => Transport['vcenter'],
}

anchor { '/dc1/clu1': }

vcenter::host { 'esx1':
  path      => '/dc1/clu1',
  username  => 'root',
  password  => 'esx_password',
  dateTimeConfig => {
    'ntpConfig' => {
      'server' => 'us.ntp.pool.org',
    },
    'timeZone' => {
      'key' => 'UTC',
    },
  },
  transport => Transport['vcenter'],
}

The next task is to connect the vCloud Networking and Security appliance to a vCenter appliance to form a cell:

transport { 'vshield':
  username => 'admin',
  password => 'default',
  server   => '192.168.1.11',
}

vshield_global_config { '192.168.1.11':
  # This is the vcenter connectivity info. See vShield API doc:
  vc_info   => {
    ip_address => '192.168.1.10',
    user_name  => 'root',
    password   => 'vmware',
  },
  time_info => { 'ntp_server' => 'us.pool.ntp.org' },
  dns_info  => { 'primary_dns' => '8.8.8.8' },
  transport => Transport['vshield'],
}

In the vShield API, all vCenter resources are referred to by their vSphere Managed Object Reference (MoRef). ‘esx-13’ might be understandable to a computer, but for configuration purposes the name of the ESX host makes much more sense to an admin. For this reason, we developed the transport resource to support multiple connections during a single Puppet run:

transport { 'vcenter':
  username => 'root',
  password => 'vmware',
  server   => '192.168.1.10',
  options  => { 'insecure' => true },
}

transport { 'vshield':
  username => 'admin',
  password => 'default',
  server   => '192.168.1.11',
}

vshield_edge { '192.168.1.11:dmz':
  ensure             => present,
  datacenter_name    => 'dc1',
  resource_pool_name => 'clu1',
  enable_aesni       => false,
  enable_fips        => false,
  enable_tcp_loose   => false,
  vse_log_level      => 'info',
  fqdn               => 'dmz.vm',
  vnics              => [
    { name          => 'uplink-test',
      portgroupName => 'VM Network',
      type          => "Uplink",
      isConnected   => "true",
      addressGroups => {
        "addressGroup" => {
          "primaryAddress" => "192.168.2.1",
          "subnetMask"     => "255.255.255.128",
        },
      },
    },
  ],
  transport  => Transport['vshield'],
}

This should provide a general overview of the modules’ capabilities. Additional resources are available beyond what’s covered in this post; however, some of them, such as vc_vm, are not operational yet, and the modules do not currently offer comprehensive coverage of the vSphere and vShield APIs. I hope you will find these modules useful in your environment.

Thanks again for the support from the R&D team at VMware, and especially to Randy Brown and Shawn Holland for contributing the vCenter and vShield modules. Also thanks to Rich Lane for releasing RbVmomi, and to Christian Dickmann for resolving an issue in that library.


  1. See Nick’s blog post for more info.

  2. Thanks to Will Lam’s post on vCenter appliance.

  3. In the module test manifests, “import ‘data.pp’” is a pattern to simplify testing for developers in different environments; please do not use the import function in your production Puppet manifests.

  4. The resources should also work against a vCenter installation on Windows; however, this hasn’t been tested.

Device Management With Puppet

In Puppet 2.7, one of the new features added was device management. In that initial release, only a small number of Cisco switches were supported. Overall, the capabilities weren’t especially significant, but the concept shifted people’s perception of the boundaries of configuration management. All of a sudden, Puppet didn’t end at the operating system; it extended to black boxes that were previously thought to be a bridge too far.

This slowly spawned a flurry of activity exploring network devices, load balancers, and storage[1].

The benefits of having the entire infrastructure automated with a single toolchain under version control are indisputable. A Software-Defined Data Center is not complete until you close the management gaps, whether in your network or your storage. With that said, there are still some limitations with puppet device. Currently, the device command only supports communication with a single device at a time. This is fine when the device is self-contained, but in some instances it’s necessary to interact with a series of devices to perform a meaningful task. For this reason, we developed the transport resource to support multiple connections within a single Puppet run. This is not a substitute for orchestrating a chain of events, but rather a way to group resources that interact across different devices:

transport { 'ssh':
  username => 'root',
  password => 'p@ss',
  server   => '192.168.1.10',
  # supports connection options from Net::SSH:
  options  => { 'port' => 10022 },
}

transport { 'rest':
  username => 'admin',
  password => 'secret!',
  server   => '192.168.1.11',
}

A transport is shared and reused across several resources, and a custom type/provider can leverage any transport connection:

# this is a mockup
remote_service { 'ntp':
  ensure    => running,
  transport => Transport['ssh'],
}

Here’s an example of the debug output, showing an ssh connection to 192.168.1.10 being established and reused to perform a series of operations:

debug: PuppetX::Puppetlabs::Transport::Ssh initializing connection to: 192.168.1.10
debug: Executing on 192.168.1.10:
vpxd_servicecfg eula read
debug: Execution result:
VC_EULA_STATUS=1
VC_CFG_RESULT=0

debug: Executing on 192.168.1.10:
vpxd_servicecfg db read
debug: Execution result:
VC_DB_TYPE=embedded
VC_DB_SERVER=
VC_DB_SERVER_PORT=
VC_DB_INSTANCE=
VC_DB_USER=
VC_DB_SCHEMA_VERSION=
VC_CFG_RESULT=0

debug: Executing on 192.168.1.10:
vpxd_servicecfg sso read
debug: Execution result:
SSO_TYPE=embedded
SSO_LS_LOCATION=https://192.168.1.10:7444/lookupservice/sdk
SSO_DB_TYPE=embedded
SSO_DB_SERVER=localhost
SSO_DB_SERVER_PORT=5432
SSO_DB_INSTANCE=ssodb
SSO_DB_USER=ssod
SSO_DB_PASSWORD=
VC_CFG_RESULT=0

debug: Executing on 192.168.1.10:
vpxd_servicecfg service status
debug: Execution result:
VC_SERVICE_STATUS=1
VC_CFG_RESULT=0

debug: Finishing transaction 2205823080
debug: Closing PuppetX::Puppetlabs::Transport::Ssh connection to: 192.168.1.10
debug: Storing state
debug: Stored state in 0.07 seconds
notice: Finished catalog run in 2.90 seconds

In the example above, a single ssh connection was established at the beginning and closed at the end of the session. This was one of the roadblocks Jeremy Schulman brought up when developing NETCONF sessions for network devices. Originally, I suggested a resource responsible for closing the connections, with automatic dependencies on the correct network resources, since there was no API to do this in the provider[2]. However, this doesn’t take resource failures into consideration, which would result in dangling open connections. Ultimately, I went back to Puppet::Transaction and added the connection cleanup at the end of the evaluate method:

module Puppet
  class Transaction
    alias_method :evaluate_original, :evaluate

    def evaluate
      evaluate_original
      PuppetX::Puppetlabs::Transport.cleanup
    end
  end
end

This isn’t the cleanest solution, since it monkey patches Puppet’s internals and does not trap user termination of Puppet, but it is more reliable than depending on every resource succeeding. Thanks again to Jeremy for his original work with transaction.rb, which led me down this path. For now this is a workaround while we wait for Puppet Labs to come up with a formal solution to #3946.

Next week at VMware Partners Exchange (PEX 2013), Nick Weaver and Carl Caum will present more about Puppet and how it works with VMware products in session VPN1298. This post is just a sneak preview, so please attend to get the complete picture. Stay tuned; I will provide more technical details about what we are using Puppet for next week, after the PEX announcement.

  1. Juniper has gone as far as embedding a Puppet agent in the device.

  2. See #3946. Work on Puppet long enough and you’ll find tickets indicating that Dan has explored these dark corners before anyone else.

Migrating to Puppet 3.x

I am a bit late to the Puppet 3.0 party, since 3.1 has already been released[1]. After reading the release notes and “The Angry Guide to Puppet 3”, I thought most of the upgrade issues had been covered, but there was still a small surprise when it came to types and providers.

The Exec resource had a small behavior change noted as: “Due to misleading values, the HOME and USER environment variables are now unset when running commands.”
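For Exec resources that rely on those variables, a minimal sketch of the fix is to pass them back in explicitly via the resource’s environment parameter (the command and home directory below are purely illustrative):

exec { 'brew-update':
  command     => '/usr/local/bin/brew update',
  # HOME is no longer inherited under Puppet 3, so set it explicitly.
  environment => ['HOME=/var/root'],
}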

The impact is a bit more significant than updating Exec resources, since the change also affects commands declared in providers. If the invoked command depends on the HOME environment variable, as brew does, it will fail under Puppet 3:

Error: /Stage[main]//Package[rbenv]:
Could not evaluate: Could not list packages:
Execution of '/usr/local/bin/brew list --versions rbenv' returned 1:
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/pathname.rb:853:in `expand_path':
couldn't find HOME environment -- expanding `~/Library/Caches/Homebrew' (ArgumentError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/pathname.rb:853:in `expand_path'
from /usr/local/Library/Homebrew/global.rb:22:in `cache'
from /usr/local/Library/Homebrew/global.rb:41
from /usr/local/bin/brew:17:in `require'
from /usr/local/bin/brew:17

The solution is described in #16779:

has_command(:brew, 'brew') do
  environment({ 'HOME' => ENV['HOME'] })
end

The has_command method is not backwards compatible with Puppet 2.x, so we need to wrap it in a version check:

if Puppet::Util::Package.versioncmp(Puppet.version, '3.0') >= 0
  has_command(:brew, "/usr/local/bin/brew") do
    environment({ 'HOME' => ENV['HOME'] })
  end
else
  commands :brew => "/usr/local/bin/brew"
end

brew('list', '--versions')

This also means execute will strip these two environment variables as well, so commands should pass :custom_environment to preserve them. execute is less common; don’t use it unless you need special behavior such as :failonfail => false:

execute([command(:brew), 'list', '--version'], :custom_environment => {'HOME'=>ENV['HOME']})

Hope this is helpful for anyone else who stumbles into this issue.

  1. The proliferation of versions might be alarming, but it’s due to stricter adherence to semantic versioning. The change from 3.0 to 3.1 is closer to 2.7.0 to 2.7.1 than to 2.6 to 2.7.

New Year: Reboot

For the last two and a half years, I lived, breathed, and spent my waking hours thinking about Puppet. In the blink of an eye, Puppet Labs is no longer the 12-person startup above the Old Town Pizza shop[1] or, more infamously, above the start of the Shanghai Tunnels[2] in Chinatown.

The company can probably start shedding the term startup, since it’s 10x the size, has closed a third round of investment, and has finished its third office move, to the Pearl District.

Looking back, I still pinch myself for even getting the opportunity to be in Portland, since I started the job knowing little about the software. I’m still amused by the interview with the other candidate in the same room, which felt like a reality TV show[3]. Fast forward 200,000 miles around the world, and I’m finally feeling comfortable using and talking about Puppet. It’s been an incredible ride, and more importantly, I’ve been lucky to work with some of the brightest people I know. If you enjoy solving infrastructure problems, apply and have a blast.

It’s a new year, and while I’m sad to say it, it’s time for a reboot: I’ve left Puppet Labs to join VMware as a Sr. Systems Engineer in R&D. As usual, a shiny new problem has presented itself, and I’m diving head first into the unknown.

However, I’m not saying goodbye to the Puppet community. A glimpse of what’s to come should be announced at VMware Partners Exchange 2013. Here’s to 2013.

  1. www.oldtownpizza.com + Stumptown Coffee = Startup Fuel.

  2. I thought it was a joke at first too: http://www.shanghaitunnels.info

  3. Story for another time, best told by Dan. Thankfully they didn’t maintain this medieval practice of jousting between candidates.

From Typhoon to Computers

Translations tend to have this weird effect of imbuing additional meanings the original language never intended to contain[1]. One of the possible origins of the word typhoon is the Chinese pronunciation of “east wind” (东风)[2]. Interestingly enough, the word has been imported back into Chinese phonetically as “typhoon” (台风), with a completely different meaning than the original east wind.

There are two translations of the word computer in Chinese: “计算机” and “电脑”. If we go through the typhoon exercise and translate them back to English, the first, “computing machine”, is not far off from the original meaning, but the second, “electronic brain”, seems far more imaginative, especially compared to what computer systems were able to achieve when these words were first translated.

When I started working on computers, machines simply waited for the next human instruction, and the tools for managing them didn’t really mask how tedious that task was.

However, with the latest generation of configuration management software, I finally feel like we have started building solutions that tackle the automation problem in a fundamental way. Similar to the insight in Stephen Wolfram’s book “A New Kind of Science” that complexity can be derived from simple rules, the inverse also appears to be true: complex systems can be managed with simple tools following basic rules chained together.

We are certainly nowhere near having all the right building blocks for deploying and managing self-healing infrastructure, but it’s a great aspiration to build better automation and move one small step closer to the “Electronic Brain”.