Archive for November, 2009

The “right side up” organization (or how not to demonstrate software)

November 4, 2009

Many years ago, I worked for PepsiCo in Purchase, NY. I learned a lot at PepsiCo. In addition to building a large-scale data warehouse which gave SQL optimizers more than one headache, I also learned about what PepsiCo then called “the right-side-up organization”.

In the right-side-up organization, employees are on top and managers are on the bottom.  It’s the manager’s job to enable their employees, set clear goals, remove barriers, and get out of the way.

Similarly, in the right-side-up software company, customers are always on top. Always. And when the sales account manager and sales engineer travel to a customer location, the entire company should work for these two employees while they are on site with the customer. Always – even the CEO.

Being invited to demonstrate mission critical software at a customer location is a privilege. It takes a lot of preparation to ensure that demonstrations are executed flawlessly. And should trouble unexpectedly surface, the right-side-up organization will make the difference. Here is what can happen if a right-side-up organization is not in place:

Before the demonstration:

– Another account team was using the only demo environment and promised to reset all customer databases (“trust us, you will never know we changed all the data”)

– The grapevine has its benefits. When our sales account manager learned that the demo environment was being used by another account team, she called to confirm whether it had been reset and was ready for her customer demonstration.

– The demo environment was still down for maintenance, so no one could check. Our sales account manager had to board her flight without knowing the true state of the demo environment.

During the demo:

– The demo environment was still down and unavailable. The Operations team did not know that a critical, on-site customer presentation was about to be conducted.

– When our sales account manager called everyone she knew, every call went straight to voicemail. The demo was eventually conducted, but not until later that afternoon.

In the right-side-up organization, this would have gone very differently.

– Even if the demo environment had to be down for maintenance, someone would be 100% accountable for logging in later in the evening and resetting or restoring all databases to ensure that the demo environment was available for the customer demonstration.

– The schedule of all customer demonstrations would be confirmed and shared in a cross-functional meeting on Monday morning, or even on Sunday evening via a conference call. In addition, sales account managers and sales engineers would be able to reach the right person on the first ring because at least 2 people would be on standby to act as problem resolution coordinators.

In the right-side-up organization, everyone always asks, “Who do I enable and how can I help?” Try it.

Conversation with Seat 11B continued while getting luggage (Part 2 – what’s going on with US economy)

November 4, 2009

I had no idea that my conversation with “Seat 11B” would receive such attention. Many readers of my blog wanted to know how the conversation concluded.

My conversation with “Seat 11B” continued while we were standing next to the conveyor belt, waiting for the luggage to arrive. Ironically, the luggage took about 20 minutes to appear. This turned out to be just enough time for me to ask “Seat 11B” for his opinion on the state of the US economy.

He replied, “The US economy has the equivalent of H1N1 flu. This particular strain is however very different, because there is no vaccine or quick fix of the kind that American society has gotten so used to. This virus is called cumulative failure over time to do the right thing.”

The truth is “Seat 11B” and I talked longer than 20 minutes as the passengers around us left for their next destinations. The items worth noting follow:

– One can easily see how the financial markets forgot to do the right thing. Mortgage-backed securities were designed to increase liquidity and promote home ownership by allowing mortgage originators to replenish their funds. What started as financial innovation turned instead into a desire to collect transaction fees. What happened to the fundamentals of creating quality credit assets regardless of whether they might be packaged and sold as securities?

– The Securities and Exchange Commission (SEC) also forgot to do the right thing while credit rating agencies were placing investment-grade ratings (stamps of approval) on questionable mortgage-backed securities. In late 2008, the SEC did approve new measures intended to strengthen oversight of credit rating agencies.

– The top four US depository banks also (conveniently) forgot to do the right thing during the several years prior to the financial crisis. These banks moved approximately $5.2 trillion in assets and liabilities off their balance sheets into special purpose vehicles. In essence, this approach enabled them to bypass existing regulations regarding minimum capital ratios. Leverage and profits increased during the boom. But the losses were dramatic during the crisis, leading to the failure or near failure of companies that were deemed too big to fail.

We talked about many other topics where the root cause was failure to do the right thing, ultimately by people either in the government or private industry.

I exited the airport while remembering a lesson from my Management 101 class at Pace University.

One of the most important jobs of a leader is to balance the needs of customers, shareholders, and employees.

Categories: Side conversations

Conversation with Seat 11B or what’s going on with the US economy

November 3, 2009 6 comments

I usually have a good book to read on a long flight home, but this time I found myself without any reading materials. My trusted ThinkPad X31 is a perfect laptop for working on the plane. Its 12″ screen is small enough to survive an attempt by the passenger in front of me to abruptly recline the seat. But this time, with no spare battery and no book to read, it was going to be a long flight.

The passenger next to me in Seat 11B turned out to be an economist. After short introductions, he asked me, “What do you think is going on with the US economy?” Although I mentioned that I was not an economist, he was still interested in my perspective.

I paused and quickly realized that even a long flight was not enough to give this conversation its due.  Starting at the very beginning seemed to be a good approach.

– It is widely reported that US consumer spending is about 70% of GDP. However, this number includes about half of the $2.5 trillion in healthcare spending, or how much the US government spends on Medicare. Still, consumer spending is a huge economic driver.

– Bubble #1: The seeds of the financial crisis were planted many years ago. The US economy was being artificially stimulated to encourage more home ownership than the market could create on its own. When the housing bubble burst, billions of dollars had been spent on, to put it simply, unproductive assets. In 2003, a former CFO of Washington Mutual predicted this crisis but very few listened.

– Bubble #2: Four jobs paying $25K per year are not equal to one job that paid $100K per year but was eliminated in the US and recreated in another country. Personal incomes are still dropping. Globalization can and does lead to a transfer of wealth across borders, and this process always produces those who benefit and those who don’t. The middle class in the US is not a beneficiary of this process. It is shrinking.

– Bubble #3: Healthcare. Anywhere between 25% and 50% (there is an ongoing debate about this statistic) of all personal bankruptcies in the US are attributed to personal health emergencies. Approximately 40 million Americans do not have health insurance (again, this number is also subject to an ongoing debate). But one fact is true. There is an urgent need to separate employment from the ability to obtain basic and emergency healthcare services. Moreover, there is also a need to guarantee basic healthcare services. Imagine a recently divorced mother who chose not to work while she was raising the kids trying to get healthcare on the individual market …

Seat 11B asked me why I started my conversation by mentioning the US consumer.

The US consumer is not spending because of unprecedented, structural changes in the US economy.

Yet, I believe the US economy will recover a lot faster when all of us exercise our right to vote.

The conversation with Seat 11B turned out to be very interesting.

Categories: Side conversations

Neglecting batch jobs can be very expensive

November 2, 2009 1 comment

Batch jobs – or processes which run typically at night without anyone watching – are not glamorous. Yet they perform some of the most critical tasks in the business environment.

Every data center with batch intensive processing at night deals with batch window constraints. If there are 7 hours during the night to post all customer transactions, that’s the batch window. If the sales team exceeds everyone’s expectations and the number of customer transactions quickly increases by a factor of 4, well … the batch window is still 7 hours.
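The arithmetic behind this constraint is easy to sketch. The Python snippet below is only an illustration: the function names, the steady posting rate, and the transaction counts are assumptions made up for the example, not figures from any real system.

```python
def projected_runtime_hours(transactions: int, transactions_per_hour: float) -> float:
    """Estimate how long posting takes, assuming a steady posting rate."""
    if transactions_per_hour <= 0:
        raise ValueError("transactions_per_hour must be positive")
    return transactions / transactions_per_hour


def fits_batch_window(transactions: int, transactions_per_hour: float,
                      window_hours: float = 7.0) -> bool:
    """Return True if the projected runtime fits inside the batch window."""
    return projected_runtime_hours(transactions, transactions_per_hour) <= window_hours


# Hypothetical numbers: 500,000 transactions per night, posted at 100,000 per hour.
baseline = 500_000
rate = 100_000.0

print(fits_batch_window(baseline, rate))      # 5 hours of work against a 7-hour window
print(fits_batch_window(baseline * 4, rate))  # 20 hours of work against the same window
```

Real jobs rarely scale linearly, so a projection like this is a first warning rather than a capacity plan.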

Neglecting batch jobs can be a very costly proposition. But first – what are the signs that batch jobs are neglected?

– Batch jobs seem to never end and no one knows why
– Batch jobs produce 2 messages: started, ended
– The amount of time spent by the engineering team to determine root causes of batch job failures continues to increase
– The Operations team warns that the end of the batch window is only 20 minutes away
– Engineering team warns that redesigning stored procedures developed by engineers who are no longer with the team is too risky (there are no comments anywhere in the stored procedure code)

How much of the above sounds all too familiar?

The scenario I described above, where the number of customer transactions suddenly quadrupled, is real. Here is how my team learned about it:

– One of the critical batch jobs started and continued to execute well past 7:00am without any meaningful messages
– It took an enormous amount of time to understand the code written by others (and it was not very well written)
– It took another heroic effort to examine database tables and realize that the data everyone thought was improperly submitted by the customer was in fact real and urgent

Root cause: the customer had acquired a company a few months earlier and added new transactions to the file being transmitted every night. And the batch jobs could neither detect that the execution profile was very different from the previous night’s nor inform the Operations team about it.

If you design, engineer, or operate mission critical batch jobs, I suggest sitting down and setting very clear objectives not only for how batch jobs will be designed but also for how they will be monitored during execution.

To get you started – a few design objectives:

– Identify technical and business drivers which directly influence the batch job execution profile. Business drivers can be “number of transactions from customer A”, “lowest number of transactions during the last month”, “highest number of transactions during the last month”, “number of unique customers found”, etc. Technical drivers can be “amount of time required to update 1,000 transactions”, etc.

– Design an approach to store this information after execution and create automated reports which show how nightly batch job runs compare. Any significant deviation will be immediately seen and become the topic of discussion, hopefully well in advance of a crisis.

– Instrument, instrument, and instrument again. Batch jobs should produce meaningful & actionable messages about their execution. There is nothing more unfortunate than an error message at 2:30am which states “unknown data encountered – execution terminated”.
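One possible way to approach the first two objectives is sketched below in Python. Everything here is an assumption made for illustration: the driver names, the JSON-lines history file, and the 50% tolerance are hypothetical choices, and a real job would pick drivers and thresholds that match its own workload.

```python
import json
import statistics
from pathlib import Path


def append_profile(path: Path, profile: dict) -> None:
    """Store tonight's execution profile as one JSON object per line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(profile) + "\n")


def load_history(path: Path) -> list[dict]:
    """Read all previously stored execution profiles."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_deviations(history: list[dict], current: dict,
                    tolerance: float = 0.5) -> list[str]:
    """Flag drivers that differ from their historical median by more than
    `tolerance`, expressed as a fraction of the median."""
    warnings = []
    for driver, value in current.items():
        past = [run[driver] for run in history if driver in run]
        if not past:
            continue  # nothing to compare against yet
        median = statistics.median(past)
        if median == 0:
            if value != 0:
                warnings.append(f"{driver}: {value} vs. median 0")
            continue
        change = (value - median) / median
        if abs(change) > tolerance:
            warnings.append(f"{driver}: {value} vs. median {median} ({change:+.0%})")
    return warnings


# Hypothetical drivers and values.
history = [
    {"transactions_customer_a": 1000, "unique_customers": 40},
    {"transactions_customer_a": 1100, "unique_customers": 42},
    {"transactions_customer_a": 900, "unique_customers": 41},
]
tonight = {"transactions_customer_a": 4000, "unique_customers": 41}
for message in find_deviations(history, tonight):
    print(message)
```

The median is used instead of the mean so that one unusual night in the history does not move the baseline too far.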

The best batch job is one that never requires a phone call at 2:30am, even when it encounters errors, because it handles them and reports crystal clear messages such as:

“Unexpected data encountered in Record 543, Customer=A,Source File=A.File”
“Record will be ignored and saved in File B for manual review”
“Processing will continue. No operator action required”
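Messages like these could come from a loop along the following lines. This is a minimal Python sketch under stated assumptions: the record layout (a dictionary with an "amount" field), the reject file format, and the `post` callback are all hypothetical and stand in for whatever the real job does.

```python
import logging

log = logging.getLogger("nightly_batch")


def process_records(records, customer, source_file, reject_path, post):
    """Post valid records and set invalid ones aside for manual review.

    `post` is a caller-supplied function (hypothetical) that posts one amount.
    Returns a (posted, rejected) tuple of counts.
    """
    posted = rejected = 0
    with open(reject_path, "a", encoding="utf-8") as rejects:
        for number, record in enumerate(records, start=1):
            try:
                amount = float(record["amount"])
            except (KeyError, TypeError, ValueError):
                # Keep the bad record, say exactly where it came from,
                # and tell the operator that nothing needs to be done.
                rejected += 1
                rejects.write(f"{source_file}\t{number}\t{record!r}\n")
                log.warning(
                    "Unexpected data encountered in Record %d, Customer=%s, "
                    "Source File=%s. Record will be ignored and saved in %s "
                    "for manual review. Processing will continue. "
                    "No operator action required.",
                    number, customer, source_file, reject_path,
                )
                continue
            post(amount)
            posted += 1
    log.info("Ended %s: %d posted, %d rejected", source_file, posted, rejected)
    return posted, rejected
```

The design choice is that a bad record never stops the run: it is counted, preserved, and explained, and the summary line at the end makes the rejected count visible the next morning.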

Categories: Software Engineering

Dinner in Vienna, Austria and then coffee at Cafe Demel

November 2, 2009

I am a big fan of Vienna, Austria. If one has to fly to Eastern Europe on business, Austrian Airlines has very convenient flights from Vienna Airport to cities in Russia and Ukraine. I preferred to fly via Austrian Airlines to Krasnodar instead of flying through Moscow.

Transiting and sleeping over in Vienna is a breeze. NH Hotels has a property within walking distance of the arrivals terminal. Within 5 minutes of collecting my luggage from the conveyor belt, I was already checking in and thinking about dinner plans.

How about jumping into a cab and getting some dinner in Vienna?

It would be difficult not to suggest Steirereck in Stadtpark. I have dined at Steirereck many times and the food is simply exceptional.

Next – let’s get some coffee at Cafe Demel which is perhaps the best cafe in Europe.

If you have some extra time, take the train to Prague and enjoy dinner at La Degustation or Coda.

Why runtime awareness in any code is important

November 1, 2009 2 comments

This is a classic scenario that repeats itself over and over again.

The team spent a lot of time testing a new and very important software release. Everyone is tired but happy. It should be a good release. But shortly after the long-awaited release has been installed by THE customer (THE customer waited 2 months for this release), one of the processes performing a critical function does not work as expected. Worse, this process performs critical transaction integrity functions, and the transaction verification logs do not contain valid data.

What happened?  Exhaustive regression tests executed perfectly.  Yet, something in the customer environment caused the software to work incorrectly.

The problem was eventually traced to a missing configuration file. When the process started in the customer’s own environment, the code looked for a specific configuration file and could not find it (even more troublesome: the installation procedure had failed and no one noticed). The code then proceeded to read default configuration parameters from a file so old that some of the parameters caused the process to produce multiple errors much later during execution.

First – the solution (and lessons learned at the end):

– This process was immediately changed to report critical information when it started, while it was running, and when it ended.

– When the process started, it reported – among many critical items – location and contents of all configuration parameters read, accepted, rejected, defaults used instead, or ignored (and action taken afterwards)

– In addition, the process initialization was gracefully terminated, with detailed error messages, if a critical parameter was incorrect. This is a good practice to ensure the customer maintains awareness of all elements essential to the operation of any mission critical software product.

– While the process was running, any dynamic parameter changes were logged and reported
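The startup behavior described in these points can be sketched roughly as follows. This Python example is not the code from the story; the parameter names, the JSON file format, and the `ConfigError` exception are assumptions invented for illustration.

```python
import json
import logging

log = logging.getLogger("startup")

# Hypothetical parameters: optional ones have defaults, critical ones do not.
DEFAULTS = {"retry_limit": 3, "batch_size": 500}
CRITICAL = {"verification_log_path"}


class ConfigError(Exception):
    """Raised to stop initialization with a clear message."""


def load_config(path: str) -> dict:
    """Load parameters from `path` and report what happened to each one."""
    log.info("Reading configuration from %s", path)
    try:
        with open(path, encoding="utf-8") as f:
            supplied = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        # Never fall back silently to some other, possibly stale, file.
        raise ConfigError(f"Cannot read configuration file {path}: {exc}") from exc
    if not isinstance(supplied, dict):
        raise ConfigError(f"{path} must contain a JSON object of parameters")

    config = {}
    known = set(DEFAULTS) | CRITICAL
    for name in sorted(known):
        if name in supplied:
            config[name] = supplied[name]
            log.info("Accepted %s=%r (from %s)", name, supplied[name], path)
        elif name in CRITICAL:
            raise ConfigError(
                f"Critical parameter {name!r} is missing from {path}; "
                "initialization terminated"
            )
        else:
            config[name] = DEFAULTS[name]
            log.warning("Default used for %s=%r (not found in %s)",
                        name, DEFAULTS[name], path)
    for name in sorted(set(supplied) - known):
        log.warning("Ignored unknown parameter %s=%r", name, supplied[name])
    return config
```

With startup logging like this in place, a missing file or a stale default shows up in the first lines of the log instead of as unexplained errors hours later.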

Second – what was the cost of not doing this right the first time?

– 3 days of exhaustive research by an experienced software engineer who had to jump on a plane and visit the customer.  No one else could help other than the code owner.

– 6 person-days of 2 other software engineers working around the clock to support their colleague at the customer location

– Delayed customer reference to a prospective client

Third – lessons learned:

– Make runtime awareness one of the design objectives

– Challenge the team during design reviews (“how would you know if these conditions may be present …”)

– Reject the release if insufficient runtime awareness has been engineered in the code

Runtime awareness in software products is not a new concept. Yet its importance has never been higher.