Saturday, October 28, 2006

Case Q41: Case of False Sense of Confidence (Final)

I don't know if this is bad luck or a misalignment of some planetary system. Just when I thought everything was fine and dandy, and had given myself some breathing space, a storm was brewing on the horizon. Got a call from the service manager on the night of the first day of Raya (Tues).

Out of the blue the system developed a problem again, after running fine for one week since those eventful days. Or could this be nature's way of acting out and reaching for a state of equilibrium, sorting out its own kinks? At first I didn't pay much attention, as I was told it was just a hardware problem. My thought was, just replace the hardware and the system will be up and running. Anyway there are four machines, so there should be three more still running and taking the load. Maybe they have to work harder than usual, but it should still be okie.

On the morning of the second day of Raya, I prepared myself some fried rice and stuffed myself silly with plenty of it. Knocked myself out after the meal. It wasn't until mid-afternoon that I woke up and found a couple of missed calls on my mobile. Returned the call to the service manager. There was another crisis. The remaining three services on the three machines had decided to act up and refused to work properly. Darn!

He had escalated the issue to the platform provider, and the case was logged as Q41. After many calls placed to the top personnel of the platform provider, we finally managed to get them to set the severity of this case to the highest level. Due to the severity of the issue, resources from all over the world were roped in and the case got 24-hour attention.

The personnel involved in this case were from Australia, China, Korea, Indonesia, Singapore, Malaysia, Pakistan and France. It went according to time zone. By the end of the working day in one time zone, reports and case documents were prepared and handed over to the personnel in the next time zone to continue. This way, the file of this case had virtually circled planet earth almost three complete times before we finally nailed it down last night.

Like one week earlier, I wasn't roped in by virtue of being knowledgeable in the domain of this issue. Being fully aware of that, I didn't really talk much in most of the conference calls, but rather just tried to ask certain questions which in my opinion might lead to the root cause of the problem and eventually a solution.

I had chosen to let the experts lead the investigation, and trusted them completely (that was the best option for me, considering that the other option was to bury myself in all the documentation of the platform, which would take at the very best 5 days. We couldn't wait that long). Last night I discovered that this may not have been a good decision. All the earlier deductions and conclusions that I came to throughout the investigation and problem-solving process were based on my assumption that they know their system best. But it may also have been due to some difficulties in communication, as the personnel come from all sorts of mother tongues and cultural backgrounds.

My initial and primary suspect was the application code. I had a bad feeling that the code was not correctly implemented. So my main push when I first joined the conference calls with the platform support team was to get them to review the code. That was done and dealt with, or so I thought. With a false sense of confidence that the code had been properly audited and all recommendations implemented, yet the problem hadn't gone away, I trained my mind on other parts of the system which might be the root cause. Because of that false information, I made a few wrong deductions which delayed me from finding the correct solution.

One whole day was wasted looking for the problem in the wrong places. That was until, by a fortunate turn of events, I observed a certain pattern in the behaviour of the system yesterday, and decided to throw away all the earlier deductions and restart the investigation from zilch. Back to my primary suspect, the source code of the application. But this time I didn't try to find out what was wrong with the existing code. I couldn't find anything there anyway, because I hardly knew where to look, didn't have full knowledge of the API being used, and didn't have time to study it all.

I started all over again and, with some luck and maybe help from heaven, got hold of a set of example code that comes with the platform. Instead of trying to understand the whole intricate API, I just used the code as it is and, with my newly gained limited knowledge and insight, carried out tests on the live system (which is a taboo in this industry, but fuck care... the system was not running anyway... either way we would be fried). Nine hours and 9 iterations of the code later, we finally got the system up and running.

The implication of this action of mine is that the work carried out by the consultant in coming up with the problematic code goes completely down the drain, except maybe for a few portions which I ported to my new code. I guess I just gotta put feelings aside in a crisis like this. I gave him many chances and opportunities to redeem himself by fixing his own code, yet when that was not forthcoming, I had no other choice.

Everyone let out a sigh of relief in the conference call last midnight, glad that we do not have to attend any more conference calls at ungodly hours. Yet we can't be very certain that the new code is completely correct until the system has been running for at least a few months without any problem. Still keeping my fingers crossed. Godspeed!

Case Q41: Case of False Sense of Confidence (Part 1)

That is the case number logged for the problem, which I think we finally subdued last night (crossing my fingers hard). For almost two months since the problem surfaced, I stayed on the sidelines, offering only some limited advice from time to time.

It is the system that I went to Pakistan to deploy almost two months back. The problem surfaced in a part of the system which is not within my direct scope and is handled by another consultant. I don't have much domain knowledge of that part of the system.

The service team was struggling with one crisis after another, and it almost got out of hand two weeks ago. The night before that eventful day (technically speaking it was almost 2 days), I stayed back in the office until almost 4 am, coaching and helping another of my team members get a prototype for a demo system up and ready for an event in red dot under.

On my way back home, got an SOS SMS from the team in Pakistan. Upon reaching home, hooked up to the Net to find out the situation on the ground. The progress was slow, and at that unheavenly hour all my mental and bodily constitutions were already well on their way to dreamland. I was only aware of the minor issue for which the SOS was sent to me. I knew there was another issue being handled by the consultant at the site, but was not aware of how grave it was. Thus the slow response to my recommendation for the minor issue.

After giving some recommendations for the minor problem, and waiting to no avail for the result, I decided to hit the sack at almost 6 am. I had to attend another meeting at 10 am, so I needed to grab a little sleep or I would be in a zombie state in the meeting. Just when my back touched the comfort of my bed, my mobile started to buzz.

The service manager was on the other end of the line. That was when I discovered that the consultant was struggling with another major problem. The service manager was asking if there was anything I could do to help avert the disaster. We both knew that the only thing I could contribute toward finding a solution was my knowledge of Java, which happened to be used to implement that part of the system. Programming platform aside, solving the problem required certain domain knowledge which I hardly had. After some discussion, we decided that I would give it a try anyway. Except that I hadn't brought my notebook with me.

So I had to head back to the office at that ungodly hour. I didn't think I could attend the meeting anymore, so I dressed casually and headed back to the office. Just in time to finally catch the bugger who always parks his car in the disabled parking lot next to the elevator. Now I understand how he never fails to get that lot. Coming to the office at that unheavenly hour, don't tell me it's just because he wants the disabled parking lot.

Back to the subject matter. So I got up to the office, but only after waiting almost 30 mins for the service manager to come with the office key. Told him I didn't think I would join him in the meeting at 10 am. Came up with a workaround based on some example code and some guesswork within 40 mins, and sent it over for testing. We were well aware that it didn't completely solve the problem, but if it worked, we could at least put it in temporarily and buy ourselves more time to find a better solution.

Now the problem was that the consultant at the site was still lost in his troubleshooting and did not give much attention or priority to testing my workaround. I can understand how hard it must have been for him. Only after almost one hour of no response from the ground did the service manager and I decide that we had no choice but to force our decision down the throat of the consultant at the site. The service manager had to leave for the meeting, so the charge was handed over to me. Made a few decisions, and 30 mins later, we had the good news that my workaround indeed worked. Good enough to buy us more time and bring down the customer's tension level. The case was then handed back to the consultant.

After having the breakfast a colleague bought for me, at lunch time I headed to a discussion room and tried to catch a little nap lying on the couch. It was Friday, and I was supposed to monitor my team member's work to ensure the demo system was ready in time for the event the next Monday. Hoped to do that after my little nap. After one hour of trying to nap, I probably only slept for 5 mins. I gave up. Decided to head back home to sleep, planning to get back to the office at around 5 pm.

At home, by 5 pm, I still could hardly sleep. Made a few calls back to the office to check on the status of two projects. Then communicated through MSN with the engineer in Pakistan, and gave a few recommendations and pointers on their next action plan. By 7 pm, got calls from the service manager and another colleague, asking me out for a drink. Carlsberg was having some sort of event at the pub we always frequent. There were some drinking competitions. Got involved in one, and gulped down 2 mugs of beer on top of the few mugs already in my system - perfect recipe for drunkenness. From then on, only took sips, and by the time we left for supper, I had got back my senses. Not too bad considering less than 2 hours of sleep in 48 hours.

Thursday, October 26, 2006

Breakfast

Been pretty much pretending to be busy with stuff at work lately. All the projects suddenly came in at almost the same time. How bad is it? Well consider this, it's so bad that prioritizing tasks for myself and the rest of the team members is in itself a major problem. So how do I handle it? The only rule I follow right now in prioritizing the tasks (borrowing some words from another colleague) is - I will do whichever task I am going to get fucked for the most and the soonest first. Other tasks, I just close one eye and put aside until I am about to get fucked.

Spent the second half of the second day of Raya in a conference call which dragged on until 5 am the next morning. Two weeks ago, spent almost 48 hours continuously in the office.

* * *
This was the breakfast I had on Deepavali day. Normally I seldom have breakfast, but that day I just felt like having one. So here are the pics.

The cornflakes even match my kitchen table top :)

Add in some flavour... yeah... it's ribena...

Next... pour the milk

Voila... Ribena Cornflake Milk :-) Yummy.....

Okie... this is lame... I am not really in the right frame of mind to write more.

* * *
Talking about frame of mind, I was struggling to brush up my programming. For two whole weeks, including weekends, I was trying to come up with a proper development framework for the team to follow.

Trying to put together a framework which consists of, among others, these good shits - Hibernate, Spring Framework, EJB3 and JBoss.

Took me two days just to read up and almost two weeks to put together the framework. I think ever since I was assigned a new role almost two years back, these two weeks are when I have been at my most aloof. Even in the office, I hide myself in my small cubicle and mostly cut short any crap discussion or small talk, even with my boss :P.

After such a long time without doing serious programming, I find it hard to get back into the flow. So once I am in the flow, I tend to ignore other things.

Sunday, October 08, 2006

Gone with the Wind

The past two weeks have been sort of a roller-coaster. So much uncertainty and so many crises. By now some have settled, some are still in a wait-and-see situation. Just hope good things will come out of this. Please be warned that this is not a post, but a rant.
The team member who has been giving me problems has finally made a move. HR informed me that he submitted his resignation letter while I was away in Pakistan last month. And I was informed by my boss that he has accepted it. So next week will be his last week with us.
My boss wasn't too happy with the pressure that I put on both him and the guy. After a few failed attempts to communicate with the guy, I had no choice but to hand him over to my boss. Initially my boss still tried to convince me that he could talk some sense into the guy.
He is my boss's legacy. Hired and looked upon highly by my boss. I don't have the patience anymore. So many chances given and so many screw-ups over and over again.
As much as possible, I try to assign team members to tasks that they like and are good at. Yet there are exceptions, and this is not a given right but a privilege. Furthermore it has to be a two-way give-and-take, not a one-way street. Everyone in the team has to be treated as equally as possible, of course taking into consideration their role and performance. If one cannot agree to this, then I don't think he/she fits into the team.
Yet my boss can sometimes be too romantic. I won't blame him; I am well aware that he has a certain history with the guy. But I can only tolerate this up to a certain extent. If the guy's antics affected me alone, I think I wouldn't be pressuring and pestering my boss so much. Every time he created a mess, I had no choice but to ask a favour from other team members to clean up after him.
Still my boss was telling me we can't treat a friend like this. Well, no doubt he is also my friend. But for me, a friend does not purposely create trouble for another friend, or at least tries not to. Well, that may explain why I don't have many friends :P. Friendship is a two-way street. The keyword here is be fair, and there is nothing I can do about it; that is my inherent weakness. Or if you want, you can blame it on my star too - Libra.
Anyway after a certain incident, my boss finally had to agree with me. Despite that, he is still not as cold-blooded as me :P.
* * *
There is also another legacy staff member in another team, who I just heard will soon be gone too. I have had a fair share of experience with her too. Supposedly a senior, who once even held a management post.
Just recently, her department team head was seeking advice from me for a certain project.
I gave my advice and suggestion. Well aware that she is supposed to be the so-called IP Networking expert, and to save her some face after she couldn't give any suggestion, I made a recommendation which I said upfront I wasn't even sure would work. IP Networking is not my current core field, but I think I understand it well enough, and even better than the so-called networking expert.
Anyway, I was suggesting that the team try out my suggestion because I wasn't sure if it would really work. Her head assigned her to test out my recommendation. So I explained to her what needed to be done. Only the next day, she spent one whole day calling up her network expert friends to discuss my recommendation. By the end of the day, she told her teammates that based on her discussion with her friends (the so-called experts) it was very unlikely that my suggestion would work.
To say that I was pissed would be an understatement! I have no problem with not being right all the time. But please give me the benefit of the doubt and test it out like I suggested. Furthermore it would take less than two hours to test out my suggestion, as compared to her calling up her expert friends the whole day to discuss it and, aside from saying it will not work, not even being able to suggest a better alternative.
I guess it's about time that I stop being nice. I googled a few keywords and read up on a few articles, which took me about 5 minutes in total (compare that against the more than 300 mins she spent discussing with the experts). What I read confirmed my gut feeling that my suggestion would work. I got back to her and, in front of the teammates, told her right to her face that I didn't think her expert friends understood what they were talking about. Instead of giving another suggestion, this time around I firmly asked her to stop discussing with her experts, and just spend a couple of hours testing what I had told her to do.
The next morning, she came to my desk to tell me my suggestion does work. And you dare call yourself a Network Expert?
* * *
Now that these legacies are gone, I think we will have more time to concentrate on real work and real problems.