Thoughts and Ideas: October 2014

Last week, I took on a high-priority task for VMware, so my work on pyVmomi and rbVmomi had to drop in priority. I'll continue to post regular progress updates, but velocity on both projects will be slower until I have cleared my higher-priority tasks.

rbVmomi

I have pushed the next release out to November 15th
I have begun learning how to upstream the GitHub changes into VMware's git repos for rbVmomi

We will need to determine how the upstreaming process affects the public repo
I need to get any merge requests peer-reviewed by VMware's internal folks before proceeding

pyVmomi

I have triaged 2 new bugs

Issue 176 is an interesting one; we've seen it directly impact integrations like OpenStack, SaltStack, and Ansible. I would like to find a clean solution.
Issue 180 is a problem I've heard reported before; it involves a newer vCenter Server talking to an older ESXi host. The current workaround is to have the programmer select the lowest acceptable version number (e.g. 5.1) and use that namespace exclusively. The bug reporter did a good job of breaking down the problem. We need a solution that works robustly and has minimal impact on client code. EDIT: At this time, this issue is only known to affect ESXCLI use.

Today I wrote a simple sample for an IRC user showing how to navigate a virtual machine's hardware devices. The sample is fairly straightforward, and I was glad to write one after neglecting samples for so long.
I revisited pyVmomi-tools and started refactoring its tests to use VCRpy
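For anyone who wants the gist of that device-navigation sample, here's a minimal sketch of the idea (not the IRC sample itself). It assumes you already hold a pyVmomi `vim.VirtualMachine` object: the VM's virtual hardware is listed under `vm.config.hardware.device`, and each device describes itself through `deviceInfo.label` and `deviceInfo.summary`. The function name `list_hardware_devices` is hypothetical, and the demo uses stand-in objects so it runs without a vCenter connection.

```python
from types import SimpleNamespace


def list_hardware_devices(vm):
    """Return (label, summary) pairs for each virtual device attached to a VM.

    `vm` is expected to look like a pyVmomi vim.VirtualMachine: the device
    list lives at vm.config.hardware.device, and each device describes
    itself through device.deviceInfo (a label and a summary string).
    """
    devices = []
    for device in vm.config.hardware.device:
        info = device.deviceInfo
        if info is None:
            # deviceInfo is optional on a device, so fall back to a placeholder.
            devices.append(("(unlabeled)", ""))
        else:
            devices.append((info.label, info.summary))
    return devices


if __name__ == "__main__":
    # Stand-in objects shaped like a VM's config tree (hypothetical values),
    # so the sketch runs without connecting to vCenter.
    fake_vm = SimpleNamespace(
        config=SimpleNamespace(
            hardware=SimpleNamespace(
                device=[
                    SimpleNamespace(deviceInfo=SimpleNamespace(
                        label="Hard disk 1", summary="16,777,216 KB")),
                    SimpleNamespace(deviceInfo=SimpleNamespace(
                        label="Network adapter 1", summary="VM Network")),
                ]
            )
        )
    )
    for label, summary in list_hardware_devices(fake_vm):
        print(f"{label}: {summary}")
```

Against a live connection, you'd pass in a real VM object instead of `fake_vm`. The sketch doesn't guard against `vm.config` itself being unset, which can happen for some inaccessible VMs.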

Thursdays will be my rbVmomi/pyVmomi day, and I'll alternate between the projects. This week I spent most of my time on pyVmomi, so next week I'll try to focus on rbVmomi.

More next week...

This week, I spent time at VMware's HQ in Silicon Valley, where I gave a number of talks on lessons learned from pyVmomi and rbVmomi and issued several calls to action. Here, I'd like to share one of the more profound lessons from the projects. It has to do with how tools shape social interactions.

The first social networks were the ones that built software. The Software Development Life Cycle (SDLC) is made up of the social actions of developers, directed, enabled, and enforced through the software tools they use. If you take this view of creating software, then it's natural to view any conflict between departments or teams of developers as an opportunity to invent and apply new technology.

Let's examine something interesting and profound about Travis CI that may not even have been intentional: the way Travis CI shifts responsibilities through a tiny change in how builds are specified. The magic is in the .travis.yml file, and any build system that emulated the .travis.yml strategy would exhibit the same effects (even without using YAML). To understand the significance of the change, let's revisit the bad old days of Developer Operations for a moment.

On the left, our developer writes some code and designs a build system to build their module. On the right, an operations person builds a CI server to match. This process already has some issues. First off, the operations person must be told what the developer needs in a CI server. This requires a conversation, and frequently it's one the operations person isn't directly incentivized to pay much attention to, since this isn't likely to be their primary job.

The Operations person may even feel that their cooperation with the developer is a gift. That's because the developer likely isn't in the Operations person's direct reporting line, and their collaboration happens across functional teams. What is the Operations person really on the hook for?

Let's assume the Operations person's gift led to success. There's a working build. Over time, the build works and breaks as you would expect, and the developer is quite happy to fix their code to repair the build problems. Until one day... the build breaks permanently.

First, the developer will try their usual workaround tricks. These won't work. The build environment is now operationally broken because the developer changed a library dependency or some other build criterion. Or perhaps our developer is spinning their wheels and the effort is pointless, because something in the data center is broken.

What's the problem?

If the fault is in the build service, we waste time debugging code that isn't broken
The developer can't do their job without direct intervention from operations
The operations people have no visibility into when a build is broken because of a service problem
Operations may exercise their right to change how the server runs and break the build without notifying the developers

In the most extreme cases, the Operations person gets bothered during a crisis. At best, this costs the developer time while they wait on Operations; otherwise, the developer ends up demanding more of what the Operations person thought was a gift from an already busy operations crew. And the timing is often terrible. Both parties are blocked from doing their jobs effectively.

I've known at least one developer who would handle this by screaming "This is your job! Do your damn job!" Not only did this fail to get people to do their "damn jobs," it also made them defensive, viewing development as an enemy to defend against. This developer was not pleasant to be around.

Demanding that people do their jobs and threatening their livelihoods doesn't exactly set up a safe place for innovation. Not all of us are high-priced prostitutes; many of us are driven by other motivations. (By the way, if you think I'm talking about you ... I'm not. I'm really, really not.) If you find yourself here, start asking how to keep people from feeling like they need pitchforks and torches.

Let's tear down this conflict. What are the facts in this scenario?

The developer specifies the build's operational requirements

building software is their job
running build servers is not their job

The operations person runs the build server

running workloads, servers, and infrastructure is their job
deciding how software is built is not their job

Does this about cover it?

With these facts, consider the following:

The developer

can't build software without asking for permission from operations

Operations can't create the build server on their own

ops has to stop a dev and ask for direction

This is an issue of Separation of Concerns, applied to social systems rather than software systems. In most CI systems, the build specification lives in a separate configuration. In Jenkins, this configuration is commonly managed through a control panel. That panel is not version-controlled and it's not tested code, yet it is effectively a software artifact.

Enter .travis.yml

The Travis CI build setup is minimal: point a Travis job at a repository and branch. The configuration work typically carried out by an Operations person (those long configuration forms) is carried by the .travis.yml file instead. This file is versioned with the codebase itself. That means when the developer exercises their right to change their own build, the YAML file goes through version control, and we can see whether things broke because of the change in build specification.

We now have two options as the developer:

modify our build to fit current expectations
request a new build environment that fits our new needs

The .travis.yml file doesn't eliminate the conflict or the problem; it frames the issue differently.

Specifying the build in a configuration file means the build configuration is now part of the project it describes. We have placed control and authority over that process in the correct place: with the person whose responsibility it is to maintain it. So one possible conflict is removed: the developer no longer has to beg and wait for Operations to donate their time so the developers can do their jobs.

The other part of this new social reality is that if we write a .travis.yml file our CI can't support, we know it will fail. What conversation are we going to have now? The developer has now boldly highlighted that Operations is prepared to provide certain services but not others, and that asking for new services means asking someone to go beyond their normal duties.

Operations people, now equipped with the right back-end tools, can see what kinds of build requests are coming in and what their resource consumption and performance profiles look like, and they can tune their environment to cope without interfering with the developers' jobs. Build configuration by document becomes a workload specification, one that an IaaS or PaaS operator can evaluate to decide for themselves whether or not they can service it.
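To make the "workload specification" idea a bit more concrete, here's a toy sketch, not anything Travis CI itself exposes: operations publishes what it offers, and a build spec (assumed to be already parsed from .travis.yml into a Python dict) is checked against that offer. The function name `unsupported_requests` and the shape of the `offered` mapping are hypothetical. Real Travis configs are far richer than this (Ruby versions, for example, go under `rvm:` rather than a key named after the language), and since YAML parses an unquoted 2.7 as a number, the sketch assumes versions arrive as strings.

```python
def unsupported_requests(spec, offered):
    """List the parts of a build spec that the build service doesn't offer.

    spec:    a dict shaped like a parsed .travis.yml,
             e.g. {"language": "python", "python": ["2.7", "3.4"]}
    offered: what operations has chosen to support (hypothetical format),
             e.g. {"python": {"2.7", "3.3", "3.4"}}
    """
    language = spec.get("language")
    if language not in offered:
        # The whole language is outside what operations currently provides.
        return [f"language {language!r}"]
    # Versions are listed under the language's own key (as Travis does for
    # `python:`); a single version may be written as a plain string.
    requested = spec.get(language, [])
    if isinstance(requested, str):
        requested = [requested]
    return [f"{language} {version}" for version in requested
            if version not in offered[language]]


if __name__ == "__main__":
    offered = {"python": {"2.7", "3.3", "3.4"}}
    spec = {"language": "python", "python": ["2.6", "2.7", "3.4"]}
    gaps = unsupported_requests(spec, offered)
    if gaps:
        print("Needs a conversation with operations:", ", ".join(gaps))
    else:
        print("Build can run on the current service.")
```

An empty result means the build fits the current service; anything else is exactly the "new services" conversation, now surfaced as data that operations can look at on its own schedule.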

The conflict and struggle are still there, but now they're framed properly, and we accomplish this through software. This ability to shape social interaction through software tools is why Software Engineering involving more than a handful of people who know each other personally is really Social Engineering.

Thoughts and Ideas

2014-10-16

This week on pyVmomi/rbVmomi

rbVmomi

pyVmomi

2014-10-10

Software Engineering at scale is Social Engineering

2014-10-03

This week in pyVmomi & rbVmomi

rbVmomi

pyVmomi