2014-10-16

This week on pyVmomi/rbVmomi

Last week, I took on a high-priority task for VMware, so my work on pyVmomi and rbVmomi had to move down in priority. I'll continue to post regular updates on progress, but the velocity will be slower on both projects until I have cleared my higher-priority tasks.

rbVmomi


  • I have slid out the next release until November 15th
  • I have begun learning how to upstream the github changes into the VMware git repos for rbVmomi
    • we will need to determine the upstreaming process' impact on the public repo
    • I need to get any merge requests peer reviewed by VMware internal folks before proceeding

pyVmomi

  • I have triaged 2 new bugs
    • issue 176 is an interesting issue we've seen directly impact integrations like OpenStack, SaltStack, and Ansible. I would like to find a clean answer.
    • issue 180 is a problem I've heard reported before. It has to do with a newer vCenter server speaking to an older ESXi host. The current work-around is to ask the programmer to select a lowest acceptable version number (e.g. 5.1) and use that namespace exclusively. The bug reporter has done a good job of breaking down the problem. We need to have a solution that will work robustly and impact client code minimally. EDIT: This issue is only known to impact ESXCLI use at this time.
  • I wrote a simple sample for an IRC user today describing how to navigate a virtual machine's hardware devices. The sample is rather straightforward, and I was glad to get to write a sample after having neglected samples for so long.
  • I revisited pyVmomi-tools and started a testing refactor using VCRpy.
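
For a feel of what navigating a virtual machine's hardware devices looks like, here is a minimal sketch. It is not the sample written for the IRC user. It assumes only that the device list hangs off `vm.config.hardware.device` and that each device has a `key` and a `deviceInfo` with a `label` and `summary`, as in the vSphere object model. The `describe_devices` helper and the stand-in objects are hypothetical, there so the sketch runs without a vCenter connection.

```python
from types import SimpleNamespace


def describe_devices(vm):
    """Return (key, type name, label, summary) for each virtual device of a VM.

    `vm` is expected to look like a pyVmomi vim.VirtualMachine: the device
    list lives at vm.config.hardware.device, and each device carries a
    deviceInfo with a label and a summary.
    """
    rows = []
    for device in vm.config.hardware.device:
        info = device.deviceInfo
        rows.append((device.key, type(device).__name__, info.label, info.summary))
    return rows


# Stand-in classes and objects (hypothetical) so the sketch runs offline.
# With pyVmomi, `vm` would instead come from a live inventory lookup.
class VirtualDisk(SimpleNamespace):
    pass


class VirtualE1000(SimpleNamespace):
    pass


fake_vm = SimpleNamespace(config=SimpleNamespace(hardware=SimpleNamespace(device=[
    VirtualDisk(key=2000, deviceInfo=SimpleNamespace(
        label="Hard disk 1", summary="16,777,216 KB")),
    VirtualE1000(key=4000, deviceInfo=SimpleNamespace(
        label="Network adapter 1", summary="VM Network")),
])))

for row in describe_devices(fake_vm):
    print(row)
```

Against a real server the type names would be the pyVmomi ones, and `vm.config` can be unset for a VM in an unusual state, so production code would want a guard there.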

Thursdays will be my rbVmomi/pyVmomi day and I'll alternate between the projects. This week I spent most of my time on pyVmomi so next week I'll try to focus on rbVmomi.

More next week...

2014-10-10

Software Engineering at scale is Social Engineering

This week, I spent time at VMware's HQ in Silicon Valley. I gave a number of talks on my lessons learned with pyVmomi and rbVmomi and gave several calls to action. This week, I'd like to share one of the more profound lessons from the project. It has to do with how tools impact social interactions.

The first social networks were the ones that built software. The Software Development Life Cycle (SDLC) was a set of social actions by developers, directed, enabled, and enforced through the software tools that they used. If you take this view of creating software, then it's natural to view any conflicts between departments or teams of developers as an opportunity to invent and apply new technology.

Let's examine something really interesting and profound about Travis CI that may not even have been intentional: the way that Travis CI shifts responsibilities through a tiny change in how builds are specified. The magic is in that .travis.yml file, and any build system that emulated the .travis.yml file strategy would exhibit the same magical effects (even without using YAML). To understand the profundity of the change, let's revisit the bad old days of Developer Operations for a moment.


On the left, our developer builds some code and designs a build system to build their module. On the right, an operations person builds a CI server to match. This process has some issues with it already. First off, the operations person must be told what the developer needs in a CI server. This requires a conversation, and frequently it's one that the operations person isn't directly incentivized to pay much attention to, since this isn't likely to be their primary job.

The Operations person may even feel that their cooperation with the developer is a gift. That comes from the fact that the developer likely isn't in the Operations person's direct line of report and their collaboration is a cross-functional team. What is the Operations person really on the hook for?

Let's assume the previous gift from the Operations person led to success. There's a working build. Over time the build works and breaks as you would expect. The developer is quite happy to fix their code to repair the build problems. Until one day... the build breaks permanently.


First the developer will try their normal work-around tricks. These won't work. The build environment is operationally broken now because the developer changed a library dependency or some other build criteria. Or, perhaps, our developer is spinning their wheels and this effort is pointless because something in the data center is broken.

What's the problem?

  • if the fault is in the build service we waste time debugging code that isn't broken
  • the developer is helpless to do their job without direct intervention from operations
  • the operations people have no view into when a build is broken because of a service problem
  • Operations may exercise their right to change how the server is running and break the build without notice to the developers

In the most extreme cases, the Operations person is going to be bothered during a crisis. This will result in loss of developer time as they wait on Operations (at best), or it will result in the Developer demanding more services from an already busy operations crew, on top of what the Operations person thought was a gift. And often the timing is terrible. Both parties are blocked from doing their jobs effectively.


I've known at least one developer who would handle this by screaming "This is your job! Do your damn job!" Not only did this not get people to do their "damn jobs," but it also made them defensive, viewing development as an enemy to defend against. This developer was not pleasant to be around.

Demanding people do their jobs and threatening their livelihoods doesn't exactly set up a safe place for innovation. Not all of us are high-priced-prostitutes and instead aspire to other motivations. (By the way, if you think I'm talking about you ... I'm not. I'm really really not.) If you find yourself here, start asking how to keep people from feeling like they need pitchforks and torches.

Let's tear down this conflict. What are the facts in this scenario?

  • The developer specifies the build's operational requirements
    • building software is their job
    • running build servers is not their job
  • The operations person runs the build server
    • running workloads, servers, and Infrastructure is their job
    • deciding how software is built is not their job
Does this about cover it?

With these facts consider the following:
  • The developer 
    • can't build software without asking for permission from operations
  • Operations can't create the build server on their own
    • ops has to stop a dev and ask for direction
This is an issue of Separation of Concerns ... only on social systems, not software systems. The concern of specifying a build in most CI systems rests with a separate configuration. In Jenkins this configuration is commonly run through a control panel. That panel is not version controlled, it's not tested code, yet it is effectively a software artifact.

Enter .travis.yml


The Travis CI build setup is minimal. Point a Travis job at a repository and branch. The configuration work typically carried by an Operations person (those long configuration forms) is carried by the .travis.yml file. This file is versioned with the code-base itself. That means, when the developer exercises their right to change their own build, the YAML file goes through version control and we can see if things break because of the change in build specification.

We now have two options as the developer:
  1. modify our build to fit current expectations
  2. request a new build environment that fits our new needs
The .travis.yml file does not eliminate the conflict or the problem; it frames the issue in a different manner.

Using a build-specification-by-configuration-file model means the build configuration is now part of the project it describes. We have placed the control and authority over that process in the correct place ... with the person whose responsibility it is to maintain it. So one possible conflict is removed: the developer no longer has to beg and wait on Operations to donate their time to allow the developers to do their jobs.

The other part of this new social reality is that if we build a .travis.yml file that our CI can't support, we know it will fail. What's the conversation we're going to have now? The developer now has it boldly highlighted that Operations is prepared to give certain services but not others, and that to ask for new services is to ask someone to go beyond their normal duties.

Operations people, now equipped with the right back-end tools, can see what kinds of build requests are coming in and what their resource consumption and performance profiles look like... and without having to interfere with the developers' jobs they can tune their environment to cope. Build configuration by document now becomes a workload specification that an IaaS or a PaaS operator can judge for themselves whether or not they can service.
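
To make the "workload specification" idea concrete, here is a minimal sketch of an operator checking a build document against what they currently offer. Everything in it is hypothetical: the field names only loosely resemble a Travis-style file, and `build_spec`, `supported`, and `unsupported_requirements` are made up for illustration rather than taken from any real CI system.

```python
# Hypothetical, simplified build specification. In a real project this would
# be parsed from the versioned configuration file that lives in the repo.
build_spec = {
    "language": "python",
    "versions": ["2.7", "3.4"],
    "services": ["postgresql"],
}

# Hypothetical description of what the operations team currently offers.
supported = {
    "language": {"python", "ruby"},
    "versions": {"2.6", "2.7", "3.3"},
    "services": {"postgresql", "redis"},
}


def unsupported_requirements(spec, offered):
    """Return {field: [values]} for everything the spec requests that is not offered."""
    gaps = {}
    for field, wanted in spec.items():
        # Normalise single values to a list so every field is handled alike.
        values = wanted if isinstance(wanted, list) else [wanted]
        missing = [v for v in values if v not in offered.get(field, set())]
        if missing:
            gaps[field] = missing
    return gaps


print(unsupported_requirements(build_spec, supported))
```

An empty result means the build can be serviced as-is. Anything else is the exact list of new services the developer is asking Operations for, which is the conversation the paragraph above describes.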

The conflict and struggle are still there, but now they're framed properly. We accomplish this through software. And this ability to shape social interaction through software tools is why Software Engineering with more than a handful of people who know each other personally is really Social Engineering.

2014-10-03

This week in pyVmomi & rbVmomi

rbVmomi


  • version 1.8.2 is live on rubygems
  • release notes are here

pyVmomi


I'm in Silicon Valley next week for an unrelated project so it will likely be a slow week on the *Vmomi projects. If folks would like to meet up I'm trying to organize a hack-night for projects in pyVmomi or rbVmomi while I'm in town. Contact me for details.

2014-09-26

This week in pyVmomi & rbVmomi

pyVmomi

Not much motion this week. I've spent the majority of the week focused on coming up to speed on rbVmomi. Of note we did have a bug report of sockets getting left open and I'll be spending quality time debugging that next week.


  • The next major revision of pyVmomi is slipped to December now in light of my new work commitments. I will keep github up to date on how these projects are going.
  • Disconnect doesn't close the socket: is this a pervasive bug or a quirk? TBD. I'll carve time from my schedule next week to attempt to reproduce the bug and either resolve it as a quirk or create a fix. The fix will go in 2014.2 at minimum, but the bug might be bad enough to cut 2014.1.2 if needed.
  • We need some more education and evolution on tasks. I've shelved my work on pyvmomi-tools temporarily while I come up to speed on rbVmomi. But, tasks are one of those areas I feel the raw API is too rich for new developers to grasp right away. Some syntax sugar should help.
  • Currently stalled on my desktop is an effort to introduce an SMS "plugin" (plugin isn't the right word but it's close) which I wanted to introduce through pyvmomi-tools. I'll ask for review inside VMware and from key thought leaders in the python and pyVmomi communities before shipping it to pypi ... more on this as we have time.
  • We changed version conventions yet again. On pyVmomi we wanted the version number to communicate freshness and compatibility, so the year-dot-quarter method was used. Then releases moved to every six months, so it's 1st half and 2nd half now. Then we found out `-` means something in pypi, so now it's just a `.` character. Finally we have 5.5.0.2014.1 and 5.5.0.2014.1.1, where the latter indicates that bug fix 1 of the 1st half of 2014 is compatible with vSphere 5.5.0. Part of this is we still don't know precisely how pyVmomi github and pyVmomi vSphere will interact on a code-base level going forward. That's TBD by the vSphere teams in charge of that. The admittedly complex interaction between these two things
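
The version convention described in the last bullet can be read mechanically. Here is a minimal sketch, assuming the layout is exactly the three-field vSphere version, then year, then half, then an optional bug-fix number. `PyVmomiVersion` and `parse_version` are hypothetical names for illustration and are not part of pyVmomi.

```python
from collections import namedtuple

# Hypothetical container for the pieces of a pyVmomi version string.
PyVmomiVersion = namedtuple("PyVmomiVersion", "vsphere year half bugfix")


def parse_version(text):
    """Split a version such as '5.5.0.2014.1.1' into its meaningful parts."""
    parts = text.split(".")
    if len(parts) not in (5, 6):
        raise ValueError("expected 5 or 6 dot-separated fields: %r" % text)
    numbers = [int(p) for p in parts]  # raises ValueError on non-numeric fields
    vsphere = ".".join(parts[:3])      # compatible vSphere release, e.g. '5.5.0'
    year, half = numbers[3], numbers[4]
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2: %r" % text)
    # A missing sixth field is treated as bug fix 0 (an assumption of this sketch).
    bugfix = numbers[5] if len(numbers) == 6 else 0
    return PyVmomiVersion(vsphere, year, half, bugfix)


print(parse_version("5.5.0.2014.1"))
print(parse_version("5.5.0.2014.1.1"))
```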

rbVmomi

Just coming up to speed on the modern state of Ruby gem management since it's been a number of years since I've had to work with them. I'm also moving cautiously since there's a standing development practice and tradition in place that I need to be mindful of.

  • A prerelease for rbVmomi 1.8.2 is available on rubygems now. Please test!
  • A regular release for 1.8.2 will happen October 1st barring any negative testing results
  • A 1.8.3 release is tentatively set to go to pre-release October 17th (or so), depending on how well we can manage the process of upstreaming pull requests, and this is very tentative. Note: this is a change. I had intended to hold the release 2 weeks for myself to gain familiarity first. Instead, I like this strategy of pre-release, delay for 3rd party testing/review, then release, and I'll likely adopt it as my normal operating style from now on.
  • Long term planning will have to happen for a 1.9.1 release but I'll not comment much on that until I know what my long term relationship to the project looks like.
  • I opened a low priority research ticket for [higher performance XML](https://github.com/vmware/rbvmomi/issues/57) which seems to be a minor scalability problem rbVmomi faces. My initial research is in the ticket. It's a help wanted item so feedback is much appreciated.
  • NOTE: any item I tag 'help wanted' is an area I feel a 3rd party can either easily help with (usually also tagged 'low hanging fruit') or an area we can't really move on without outside help.
  • NOTE: any item I tag 'VMware staff assistance needed' is something that I feel a 3rd party should probably avoid working on, since the issue is fuzzy and bleeds into VMware internal technical details or decisions, or other issues muddy the ticket such that it's not going to move without someone (usually me) bridging the Open Source and VMware internal engineering communities.

General Status

My Friday posts will probably take this two part pyvmomi then rbvmomi flavor as long as I'm bridging both communities. Sorry if that confuses things but that's a function of Conway's law and at the moment the Open Source to VMware and back communications conduit is basically me for both projects so you have to expect a bit of that.

Thanks for your support! The following freenode IRC channels will at least have my IRC 'bot in them: #pyvmomi, #pyvmomi-dev, #rbvmomi, #rbvmomi-dev, #govmomi, #govmomi-dev. I'll eventually see IRC messages in those channels through my 'bots. The -dev channels are for advanced programming topics, while the other channels will be open for newblings and will need community support, since I'm pretty bad at dealing with neophyte level problems these days.

More on each of these threads as progress is made...

2014-09-24

On the topic of rbVmomi

Earlier this week I started helping out on the rbVmomi project. My involvement is just to help take some pressure off of Christian Dickmann around reviewing and releasing the next version of rbVmomi.

Compared with pyVmomi, the rbVmomi project has a longer history in Open Source. While pyVmomi has existed longer, rbVmomi has the benefit of being designed from the ground up for the type of collaborative effort an API binding like this can represent. I'll want to be cautious and observant of that established project workflow and primarily act as an accelerant for the processes already there.

To address the initial problem of there being no release this year, I've begun drafting v1.8.2 which should be ready for release in a few days. Having learned from pyVmomi, I've set up a follow up release v1.8.3 to try and catch anything we miss in the earlier release. The follow up release should hit in mid-October if we need it. If it turns out we don't or can't hit mid-October I'll drop the milestone and we'll move to long-term planning. Future release schedules are TBD.

For pyVmomi, this means my attentions are divided and I've shifted out milestones for that project to match my expectations. The second release of 2014 for pyVmomi is now tentatively set for mid-December. (I hope that pyVmomi reaches feature parity with rbVmomi by that release.) Our big additions for the next release of pyVmomi should be SMS and SPBM API support provided we don't hit any hidden issues. We'll hopefully have time for EAM as well.

My role on rbVmomi at this time is just to assist in the current release plan. Christian is very much still the lead developer. I'll be using tags on the open issues and pull requests to help decide what to call out for Christian's direct attention. There's also a bit of process for this project that I'll need to work out on the fly.

More on these topics as things happen...


2014-09-19

How to use Conway's Law for good and not evil

If you've been paying attention to our github repository you'll have noticed some noise around the following API additions to pyVmomi...
These are all presently scheduled for the next major release of pyVmomi which is currently set for December of 2014 and will be called 5.5.0-2014.2 barring any major issues.

I'll be working with the various feature teams inside VMware to figure out how best to Open Source their existing API bindings. More importantly, I'll be working with these teams to find a way that allows the effect described in Conway's Law to function beneficially on this set of projects.

In other words, pyVmomi isn't one large project with one BDFL directing the entire API. The reality is that the APIs potentially exposed by pyVmomi are in fact the result of multiple collaborating teams. Each of these teams will tend to be their own unique snowflake.

Couple this with the fact that we will also have the opportunity to leverage contributions from interested third parties, vendors, and partners and you can see that the previous monolithic structure might not survive the strain. I'll be taking some vital time to implement one or more extension strategies for the library and have them evaluated. This will probably cost 3 or more weeks' worth of time, but it will be worth the pain because these are long-lived projects with minimum life-spans on the order of 3 or more years.

As part of this research I've come back time and again to a talk by Doug Hellmann titled Dynamic Code Patterns.


Not to diminish Doug's work at all, but his Stevedore project is an implementation of the Extensibility Pattern which is itself part of the previous generation's Software Design Patterns movement. The design patterns technique came under understandable criticism for not really contributing much to programming and more than occasionally over complicating things. It's these concerns about over complication that are going to slow me down a bit. After all, making things simple is not easy and my goal is to make things as simple as possible and no simpler.

The Stevedore project does offer something to a library like pyVmomi. It is a tool to create extensions. The term 'plugin' is not really accurate when we're talking about modules like SMS. The SMS module is neither a driver, nor is it a hook, but it is an extension. These are new classes and methods that provide new capabilities from the primary facade of the pyVmomi library itself. You can call these plugins but not in the normal sense.

There is also danger in approaching the problem of extending into these new API as a plugin. It quite possibly introduces the problem of turning pyVmomi into a framework and such a transformation would be wholly inappropriate. In particular, I am considering the use case for pyVmomi as a helper library to deliver a driver for SaltStack. What happens to the outer project if pyVmomi itself becomes a framework and does this make those consumer projects unnecessarily more complex?

We want extensibility but not at the cost of adding the need to be aware of more details. We want the library to appear as a unit even though it is actually composed of multiple sub-units. That's something that requires some subtlety. It's not something to bother with if we don't need it so... why do I think we need it?

Over the next 2 weeks I'll be experimenting with SMS support provided in a couple of different ways in the hopes of finding a sustainable, flexible, and robust mechanism to use Conway's law as a means of enforcing a structure of collaboration between teams. In particular, there's a set of existing organizational and structural relationships that I feel the current design violates.

At VMware, I do not personally manage, dictate, or control what the structure or design of a given ... feature interface (or sub-API) looks like. It's impractical to think I could. My role is to act as a bridge between VMware and the developer communities interested in building software on top of the VMware platforms.

Some details about the practical nature of how modules like SMS, SPBM, and EAM are produced for use in pyVmomi directly inform our library design considerations. For example...

  • Feature teams (and their management)
    • design, maintain, and release their own API
    • dictate what API are official, unofficial, and deprecated
    • have their own schedules, priorities, and deadlines
    • are fundamentally disinterested in low-level concerns
Knowing this, any design I create for pyVmomi that dictates that a feature team must talk to me first and then convince me to do things a certain way is very likely to fail. Why? Well, let's take the opposite approach as a thought experiment and see what happens.

If I were to decide that I was going to tightly control pyVmomi and force feature teams to first get approval for API from me before I clicked my big "approved" button, thereby uploading their API to github, that would mean I could conceivably derail their priorities or deadlines.

What do you think would happen? I could cause a vital product or feature to slip its release; I could force an unnecessary conflict between managers; I could find my project usurped by something the feature team felt could disentangle them from my meddling in their affairs. Any or all of these and other scenarios could happen. In short, I would create unnecessary friction and an unintentional power-struggle as people tried to focus on issues wholly unrelated to something as trivial as providing API to people like yourself.

So, if I want to instead create a successful project in this environment I have to engineer my software so that it can function with the natural (and proper) social boundaries already present. The structure of the code itself will also influence how social interactions occur around the code base. And finally, the modularity of the code will allow me to potentially delegate power, authority, and autonomy to other people.

So what are our design concerns?
  • For feature creators adding to pyVmomi we should,
    • leverage existing library features
    • hide low-level concerns
    • allow independent ownership
    • simplify the process of creating a binding as much as possible
  • For integrating developers working with pyVmomi in their separate projects we should,
    • present a single unified "surface" for them to use
    • hide accidental complexity but expose essential complexity
    • follow a rule of least surprise
I admittedly may think of more as I work through the problems, but this captures what we're after in a nutshell. This library in the world of Conway's Law represents a bridge between multiple parties, and its structure will end up reflecting how these groups relate. The pyVmomi software sitting between these groups will be least painful to work with for everyone if it can be molded to its present reality.
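
One way to picture those design concerns together is a plain-Python registry: each feature team owns and registers its own extension, while integrating developers only ever touch one facade. This is only a sketch under stated assumptions. It is not Stevedore, it is not how pyVmomi is actually structured, and every name in it (`Library`, `register`, `StorageMonitoring`, the `"sms"` key) is hypothetical.

```python
class Library:
    """Single facade that integrating developers import and use."""

    _extensions = {}

    @classmethod
    def register(cls, name):
        """Class decorator a feature team uses to attach its own module."""
        def decorator(extension_cls):
            if name in cls._extensions:
                raise ValueError("extension already registered: %s" % name)
            cls._extensions[name] = extension_cls
            return extension_cls
        return decorator

    def __init__(self, connection):
        self.connection = connection

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so core attributes
        # are untouched and extensions appear as ordinary attributes.
        try:
            extension_cls = type(self)._extensions[name]
        except KeyError:
            raise AttributeError(name)
        return extension_cls(self.connection)


# Owned and released by a (hypothetical) feature team, not by the core library.
@Library.register("sms")
class StorageMonitoring:
    def __init__(self, connection):
        self.connection = connection

    def describe(self):
        return "sms over %s" % self.connection


lib = Library("fake-connection")
print(lib.sms.describe())
```

The feature team never edits `Library`, and the consumer never learns that `sms` lives in a separate unit, which is the "single unified surface" with independent ownership described above.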

After we tackle Open Sourcing these three new API I'll have a better picture of what that reality is. And, that's what large-scale software development is really about, not as much computers and code so much as people, relationships, and communication. Our tools affect our lives and better tools make better lives.

More on this another time...

2014-09-12

Developer Community Engagement

If you've not been following along at home, pyVmomi was run in a different manner from most VMware Open Source projects. It's been a bit of a social experiment. The last two weeks since our release, I've been working on distilling lessons learned from the past five months of the project.

I did not plan on also looking into rbVmomi ... but ... at the same time just after VMworld a certain blog post started making the rounds on social media. It's clearly an opportunity to examine what we're doing at VMware around OpenSource development projects.

rbVmomi is a more typical VMware fling project. These start as a developer driven POC project and these are developed on a best-effort basis. The rbVmomi project has closed 6 issues and closed 10 pull requests during its entire lifetime as a VMware project.

pyVmomi has benefited from having my attention full time since April/May of 2014. The total number of issues closed to date is 59 with a total number of 70 merges. These differences in numbers shouldn't be surprising, that's to be expected when you go from free time development to full time development. My personal stats have become quite impressive due to the full-time activity on GitHub.

That's all nice for me, but, what does that really mean for the library? What does that mean for developer use, experience, and over-all for VMware? It's a matter of audience and SDK adoption.

Over on stackoverflow, you can see that in its entire life-span rbVmomi has had 9 questions asked as of this writing. That indicates that 9 times people who are likely to seek help on stackoverflow have sought help, and only 4 of those 9 questions got answers.

Taking a look at the same search for pyVmomi yields 24 questions over a much shorter life-span. 



13 of these questions have been answered, and people voted on the questions 19 times, whereas with rbVmomi no one voted. If you take a closer look, 17 of those questions occur after my full time commitment to the project. In the shorter time-frame, my public commitment and effort have helped developer engagement with the library improve by an order of magnitude.

The next question is... is this effort worth it? And how do we determine that?

I'm open to suggestions. What else should I be looking at?