QuantOps

Recently, I was interviewed for the ActiveState blog on DevOps & Platform as a Service (PaaS); that interview made it to Wired.com (here). A discussion on the topic was timely, as I’ve been thinking about DevOps and other agile delivery chain mechanisms quite a bit lately, mainly because I’m applying them in my current gig, which my colleagues and I describe as “Business Ops”. Next month at Nordic Security 2013 I’ll be presenting “Operating * By the Numbers”. (If you’re wondering why there’s no abstract, it’s because I’m still perfecting “Just In Time” deck development…just kidding. Sort of.*)

Anyway, I thought it might be a good idea to explain What I’m Talking About When I Talk About DevOps (apologies to the incomparable Haruki Murakami). This will be my first time trying to explain where I’m going with this whole DevOps thing, so it might get fuzzy. Bear with me. I reserve the right to change my mind later, of course (I’m cognitively agile that way, haha), so if you have comments or criticisms I’m very open to hearing your thoughts.

Connection between DevOps & Risk

DevOps, if you’ve not heard of it before, is a concept/approach to managing large-scale software deployments. It seems to be most popular/effective at software-based or online services, and it is “big” at highly scaled-out companies like Google, Etsy, and Netflix. Whether consumer-facing or B2B, these services need to be fast and highly reliable/available. The DevOps movement is one where deployments and maintenance are simplified (simplicity is easier to maintain than complexity) through standardization and automation, lots of instrumentation & monitoring, and an integration of process across teams (most specifically Dev, QA & Ops). More on “QA” later.

But…the thing about DevOps is that, while it is a new concept in the world of online services, it draws heavily from Operations Management, which is not new. The field of Operations Research was forged in manufacturing, but the core concepts are easily applied across other product development cycles. In fact this extension is largely overdue, since a scan through semi-recent texts on operations management shows IT largely described as an enabling function (e.g. ERP) but not a product class in and of itself. (BTW, in some curricula, Operations Management is cross-listed or referred to as Decision Science, which is a core component of risk/security analytics.)

So – DevOps for me is just Ops, which I personally have some love for, not just because I studied it in school (part of my major), but also because of my dabbling in systems theory. “Operations” describes a transformative system that converts inputs into outputs: the “what” of the product as well as the “how”. And a great deal of operations theory is devoted to performance optimization: usually around quality (reducing variances in quality across outputs) and also around cost/speed of delivery (improving throughput, reducing resources required for completing conversion, etc.). Generally, then, operations defines performance criteria and provides a set of tools for optimizing performance.

A little control chart action for ya, courtesy of Wikipedia.
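If you want to see what a control chart is actually doing under the hood, here’s a minimal sketch in Python – an individuals (I-MR) chart with completely made-up measurements, estimating sigma from the average moving range and flagging anything outside the 3-sigma limits:

```python
# Bare-bones individuals (I-MR) control chart check. The measurements are
# invented; any per-unit process metric (defect rate, latency) works the same.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 11.9, 10.1, 9.9]

mean = sum(measurements) / len(measurements)
# Estimate sigma from the average moving range (divide by d2 = 1.128 for n=2),
# which is less inflated by outliers than the raw standard deviation.
moving_ranges = [abs(b - a) for a, b in zip(measurements, measurements[1:])]
sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
ucl, lcl = mean + 3 * sigma_hat, mean - 3 * sigma_hat  # control limits

for i, x in enumerate(measurements):
    status = "out of control" if not (lcl <= x <= ucl) else "ok"
    print(f"sample {i}: {x:5.1f}  {status}")
```

With these numbers, sample 7 (the 11.9) lands above the upper control limit – a variance in quality that the optimization machinery exists to catch and squeeze out.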

Which brings us to Risk, and how risk management relates to operations management. My working definition of risk management is the set of practices/tools/processes concerned with improving the predictability – or reducing the variance – of returns. More pedantically (if this is possible), I would say risk management is optimization of performance on prioritized dimensions related to reducing variance of returns, given limited (or at least, not complete) control over externalities, and random events endogenous to the system.
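To make “reducing the variance of returns” concrete, here’s a toy comparison – the numbers are invented purely for illustration – of two strategies with the same mean return:

```python
# Two hypothetical strategies with identical mean return but very different
# variance. All numbers are made up for illustration.
from statistics import mean, pstdev

steady  = [0.04, 0.05, 0.05, 0.06, 0.05]    # returns per period
erratic = [0.20, -0.10, 0.15, -0.05, 0.05]

for name, returns in [("steady", steady), ("erratic", erratic)]:
    print(f"{name}: mean={mean(returns):.3f}, stdev={pstdev(returns):.3f}")
# Both average 5%, but risk management prefers "steady": same expected
# return, far more predictable.
```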

So, if you will follow my (twisted) logic, the path is this. DevOps is just Ops. Ops is performance optimization of production/delivery chains. Risk management is a type of performance optimization, dependent on factors including production/delivery chains.

Since too much theory is boring, and gives me a headache, let’s walk through a practical example that may help illustrate how I’m connecting the business logic layer to operating a function/service in an agile – or DevOppy – way.

A Little e-xample
Example: e-commerce

OMG it’s a rainbow Pareto chart! V useful for clarifying tradeoffs like the ones in this nifty example. Thanks DanielPenfield for sharing via Wikipedia.

• Basic integration = connecting a checkout flow to a processor, passing the required info to process a payment transaction. If the use case is met, an authorization request will be initiated – sent to the payment processor/issuer with no error.
• Business logic layer challenge:
  • The more info passed to the bank, the more likely the authorization request will be approved (resulting in a completed transaction, $$)
  • The more info collected from the customer, the more likely the checkout flow will be abandoned (the customer drops off, resulting in no transaction)
  • Note: This is not an ephemeral problem like “privacy vs security”. This is a literal tradeoff that can be measured, and is a common target of A/B testing (or similar approaches)
• Optimization problem: What are the “correct” data fields to collect and pass through in the authorization request to maximize conversion, i.e. revenue? (See the sketch after this list.)
• Point: If the business logic is suboptimal, that doesn’t mean there is a “bug” per se, but it is a business operations issue that affects the expected performance of the checkout flow
• What does this have to do with DevOps: DevOps aims to reduce artificial process boundaries in the delivery chain. If our use case ends at the application layer and the integration code, what gets delivered is “working code”: code that creates an authorization request and does not generate an error. But what we actually want is a “working service”, i.e. a successful checkout – and that requires pushing beyond code-centricity, or even service availability, into the actual business logic, so that we can improve the probability of that transaction completing.
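Here’s a back-of-the-envelope sketch of that optimization problem in Python. Every number below is hypothetical – real approval lifts and abandonment penalties would come from A/B tests or historical authorization data – but it shows why “collect everything” and “collect nothing” are usually both wrong:

```python
# Toy version of the field-selection tradeoff: each optional checkout field
# lifts the bank's approval probability but adds customer drop-off. All
# probabilities here are invented for illustration.
from itertools import combinations

FIELDS = {
    # field:        (approval lift, abandonment added)
    "billing_zip":  (0.04, 0.01),
    "full_address": (0.06, 0.05),
    "phone":        (0.01, 0.03),
    "cvv":          (0.08, 0.01),
}
BASE_APPROVAL, BASE_ABANDON = 0.80, 0.10

def expected_conversion(fields):
    approval = min(1.0, BASE_APPROVAL + sum(FIELDS[f][0] for f in fields))
    abandon = min(1.0, BASE_ABANDON + sum(FIELDS[f][1] for f in fields))
    return (1 - abandon) * approval  # customer finishes AND bank approves

# Brute-force every subset of fields and pick the revenue-maximizing one.
best = max(
    (subset for r in range(len(FIELDS) + 1)
     for subset in combinations(FIELDS, r)),
    key=expected_conversion,
)
print("best field set:", best, f"-> {expected_conversion(best):.3f}")
```

With these made-up numbers the winner is a middle ground (zip + address + CVV), not all or none of the fields – exactly the kind of answer a pure “does the code error?” test will never surface.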

Note: I’m using the term “business logic” pretty generically. I don’t mean to imply that all of these optimization problems are solved in a discrete business logic layer; plenty of these types of decisions get driven into the product design/architecture at lower layers. For most infrastructure components or back-end production systems, these capabilities are managed in the application layer, and that works fine in most situations. For some use cases that require ongoing optimization, though, it’s useful to have a business logic layer, or some “soft” interface, so the system can be adjusted without fundamental changes to the underlying (hard-coded) logic. Risk decisioning is the one I have the most familiarity with, but there are others, such as catalog, pricing/promotions, and website/blog CMS.

My point is less about where the business logic lives, and more about the idea that one can have correctly functioning code (that will pass QA and not generate errors) which is still suboptimal from a pure operational perspective, if one considers operations as all the functions composing a system’s transformation of raw inputs into desired outcomes. (Perhaps this will be covered in the sequel to The Phoenix Project, aka “Return of the Marketing & CS Jedi”. <- Yes. Phoenix Project reference achievement unlocked here.)
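For the curious, here’s roughly what I mean by a “soft” interface, sketched in Python for a risk decisioning case. The rule names and thresholds are entirely hypothetical; the point is just that they live in config rather than in the code path:

```python
# Sketch of a "soft" business-logic interface: decision thresholds live in
# config, not code, so they can be tuned without a redeploy. Rules and
# thresholds below are hypothetical.
RISK_RULES = {  # in real life: loaded from a config store or rules DB
    "max_order_amount": 500.00,
    "require_cvv_over": 100.00,
    "block_countries": {"XX"},
}

def decide(order):
    """Return (decision, extra_fields_to_collect) for a checkout attempt."""
    if order["country"] in RISK_RULES["block_countries"]:
        return "decline", []
    if order["amount"] > RISK_RULES["max_order_amount"]:
        return "manual_review", []
    extra = ["cvv"] if order["amount"] > RISK_RULES["require_cvv_over"] else []
    return "approve", extra

print(decide({"amount": 120.00, "country": "US"}))  # ('approve', ['cvv'])
```

Ops or risk folks can tune `max_order_amount` without touching the decision code – the business logic is data, not hard-coded branches.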

There are other reasons I like DevOps. It’s a natural extension of agile product development, which I like as a way of working. As a product manager I far prefer working on small, modular releases over large, monolithic codebases. Business requirements are full of assumptions, and it is nice to fail early and fix fast in small increments, rather than fail late and spectacularly. When I managed software developers, our output was complicated enough that we needed to support our own releases (i.e. augment Ops on our code), which meant we were certainly motivated to find issues pre-release, and not to release anything that couldn’t be supported. Higher agility = lower fragility? Sounds good to me. Also, from a risk/security perspective: the impact of small changes is easier to estimate and measure than the effects of large sets of changes. I still haven’t made up my mind about QA in the context of continuous delivery/deployment.

Generally though, despite all my love for automation, for reducing any “us vs them” boundaries between technical teams, and for improvements in stability, reliability, and other key performance metrics – the deal is, I’m going to keep on this kick because I have an analytics / decision science hammer, and so I encourage everyone to keep showing me nails. Full disclosure. Also, wicked tired now.

 

* I’m such a wildcard operator, lolza.

