Archive for October, 2010

How to properly do CPR (when software problems require CPR)

October 24, 2010

It’s Saturday morning and it’s raining, both literally and figuratively.

Your cellular phone rings.  It’s M. from the technical support team.  M. just received a call from one of your most important customers.  The new software release was installed and had not exhibited any problems for 2 weeks – until now.  The rain just turned into a monsoon.

The problem is urgent, time sensitive, and ambiguous.  A certain process has to run before Monday morning.  Yet the problem symptoms reported by the customer are completely unfamiliar.  All critical scenarios were extensively tested prior to delivering the software to the customer.

It’s clear that something is wrong, but in due time the problem will be found.

Before we discuss CPR in detail, do you have an emergency notification plan in your Engineering organization?  If yes – ensure it’s up to date.  If no – it will be difficult to reach the right engineers and solve the problem before Monday morning.  The customer’s business will be severely impacted.

CPR – or Customer Problem Report – is a must-do step after the solution has been found.

What is a CPR and why is it so valuable?

A CPR is a detailed (by definition – very binary and to the point) and actionable document created whenever a software problem has a significant impact on the customer.

The primary purpose of the CPR is to:

–          Document all root causes and contributing factors:  describe every action item required to mitigate or eliminate the root cause

–          Identify any necessary technology changes

–          Identify any process changes within the organization.  For example, if the root cause of the problem turns out to be an improper installation procedure, all recommendations required to prevent this problem will be clearly identified.

–          Identify organizational changes.  For example, if the root cause turns out to be an inexperienced QA manager who did not accommodate a customer-specific scenario, even this information has to be captured.

Finally – a CPR will not be effective without these 3 items:

–          A clear owner must be assigned to each action item identified above

–          Then – “must complete by” dates must be established for each action item

–          A single accountable individual must be assigned to monitor progress, which may involve (and usually does) members of multiple organizations:  Product Management, Engineering, QA, Technical Support, and Professional Services.

Until all action items identified by the CPR have been completed, the CPR is treated like any other open software defect and receives the same attention during defect / issue review meetings.
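The structure described above can be sketched as a small record.  This is only an illustrative sketch, not a prescribed format:  the class names, field names, customer, and dates are all hypothetical and chosen for this example.  It shows an owner and a “must complete by” date on every action item, a single accountable individual on the CPR, and a CPR that stays open until every action item is completed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str          # a clear owner for this action item
    complete_by: date   # the "must complete by" date
    done: bool = False


@dataclass
class CustomerProblemReport:
    customer: str
    summary: str
    accountable: str    # single individual who monitors overall progress
    root_causes: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)

    def is_open(self) -> bool:
        # The CPR stays open while any action item is not completed.
        return any(not item.done for item in self.action_items)

    def overdue(self, today: date) -> List[ActionItem]:
        # Unfinished items whose "must complete by" date has passed.
        return [i for i in self.action_items
                if not i.done and i.complete_by < today]


# Hypothetical example data, for illustration only.
cpr = CustomerProblemReport(
    customer="Example Customer",
    summary="Scheduled process failed after the new release",
    accountable="Engineering lead (hypothetical)",
    root_causes=["Improper installation procedure"],
    action_items=[
        ActionItem("Revise the installation procedure",
                   "Professional Services", date(2010, 11, 5)),
        ActionItem("Add the customer-specific scenario to the test plan",
                   "QA", date(2010, 11, 19)),
    ],
)
```

During a defect / issue review meeting, a call such as `cpr.overdue(date.today())` would list the action items that need attention, and `cpr.is_open()` tells whether the CPR can be closed.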

CPRs are an invaluable tool to enable and sustain a completely transparent organizational culture.

No one wants to see the customer’s business impacted as a result of the software not working as expected.  By definition, a CPR is a cause for change, for the right reasons, as well as a vehicle to ensure that change happens on time.  Inability to change or unwillingness to improve will be quickly noted.  Those who cannot change will simply be replaced.

If you do not have a CPR process currently in place, try it.  When your Engineering team has to do CPR on an ailing software release, the other kind of CPR – the subject of this discussion – will be very helpful.

Categories: Software Engineering

Building a world class global team 101: start building a culture based on mutual dependence

October 5, 2010

This particular scenario happens very frequently.

The software company and its software product portfolio enter a new period of significant growth.  The software products are now sold and used by customers in multiple countries.

One software engineering team based in one country – usually where the product was originally developed and launched – is increasingly unable to maintain product development velocity, especially when product requirements are influenced by unique requirements in other countries.  It’s simply not possible to have the same team operate 24 hours a day.

It’s time to implement a global software engineering strategy:  additional teams based in other countries, working together, using time zones as a strategic advantage.

Companies which have multiple software engineering organizations in several countries are nothing new.  However, they all share one common attribute.  Before the global software engineering strategy led to a measurable improvement in product development velocity, things actually got worse – not better – during the early stage of implementing it.

This blog entry is about a practical and proven approach for skipping “it’s worse than previously thought” and going directly to “one product, one team, working together – and faster”.

Please consider these steps.

First – perform a clinical review of how the software product is designed and built.

–          Look for opportunities to componentize the software engineering process

Then – create a culture of mutual dependence.

–          Intentionally assign ownership of components which depend on each other to different teams

–          This creates a culture where one team cannot succeed working largely alone.  If one team creates an API and another team consumes it, neither team can declare victory until all major requirements are met, including ensuring that the API can meet certain scalability and performance targets.  Scalability testing could be done by yet another team in another country.

–          Collaboration by design compels all teams to work together while being driven by the same shared agenda

–          Do not co-locate QA engineering resources with the engineering team whose work they test (as tempting as it might be due to convenience).  When QA engineers and software design engineers in test are based in a different location from the software engineers, it takes extra effort for functional knowledge to flow to the QA engineers and for test feedback to flow back to the software engineers.  Yet in this instance distance plays a very beneficial role.  Both software engineers and QA engineers must find and implement the right process to ensure that distance does not become a barrier to success.  Again – collaboration by design …

Finally – implement a build process which supports the culture of mutual dependence.

–          To illustrate:  if one team creates an API consumed by others, create a build process which first generates the API component and then triggers integration of this component by separate build processes owned by the teams which rely on it.

–          Build failures will immediately trigger collaborative discussions between all teams.  The root cause is not as important as the collaborative process to find the root cause, capture lessons learned, and quickly implement changes and, if needed, sustainable improvements.
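The build process described above can be sketched in a few lines.  This is a minimal sketch under stated assumptions, not a real build system:  the component names, team names, and the fake_build function are hypothetical, and an actual setup would call whatever build tooling each team owns.  It shows the API component being built first, its success triggering the consumers’ builds, and any failure bringing every team into the discussion.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical dependency map: component -> components that consume it.
CONSUMERS: Dict[str, List[str]] = {
    "api": ["billing", "reporting"],
    "billing": [],
    "reporting": [],
}

# Hypothetical ownership: each component belongs to a different team.
OWNERS: Dict[str, str] = {
    "api": "team-a",
    "billing": "team-b",
    "reporting": "team-c",
}


def run_pipeline(root: str, build: Callable[[str], bool]) -> Dict[str, bool]:
    """Build `root` first, then trigger the builds of its consumers."""
    results: Dict[str, bool] = {}
    queue = deque([root])
    while queue:
        component = queue.popleft()
        if component in results:
            continue  # already built in this run
        results[component] = build(component)
        if results[component]:
            # A successful build triggers integration by downstream teams.
            queue.extend(CONSUMERS.get(component, []))
    return results


def teams_to_notify(results: Dict[str, bool]) -> List[str]:
    """On any failure, bring every team into the discussion."""
    if all(results.values()):
        return []
    return sorted(set(OWNERS.values()))


def fake_build(component: str) -> bool:
    # Stand-in for a real build step; pretend the reporting integration breaks.
    return component != "reporting"


results = run_pipeline("api", fake_build)
notified = teams_to_notify(results)
```

Notifying every team, rather than only the owner of the failing component, is a deliberate choice here:  it mirrors the idea that the collaborative process matters more than where the root cause happens to sit.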

One of the biggest benefits of implementing a culture of mutual dependence in a global software engineering organization is how quickly weak links can be identified and corrected.

The culture of mutual dependence compels every manager – in addition to performing their duties – to become an interested party in the success of every other manager that owns a component or components that everyone relies on.

Those who cannot collaborate and put the needs of the customer and the product above all other considerations will be easily recognized and replaced.