It was during the Stanley Cup in the late ’90s. Final game of 5, and the ticket company was the one that sold the tickets for the event. As walk-up and will-call ticket holders started picking up their tickets, the load on the storage array got pretty heavy. A KNOWN hardware issue in the array’s single point of failure, its cache card, was tripped, and the array crashed. It was actually designed to do that to protect the data.
While no data was lost, it took almost two hours for the array, and the HPUX computers attached to it, to come back up. Our DBA, a tech from EMC, and I sat in the conference room outside the computer room until 2 or 3 in the morning, waiting for either the game to end or the array to crash again so we could replace the cache card. Well, the game ended very late, in overtime, and the home team, the one we were selling the tickets for, lost. We did get the array fixed and everything back up, but it was a late night. And our general manager, who was watching the game with the team owner, had a real tough time of it. As anyone in IT can attest, this is the stuff of nightmares.
Fast forward 5 or 6 years: I was working at the bank, and we were evaluating disk arrays for a hardware refresh. During a technology presentation, I expressed my concern about the cache card being the single point of failure. And the sales engineer, the same guy who had been the ticket company’s sales engineer and whom I had worked with for years, flat-out denied that the event had ever occurred! Needless to say, I was livid. My team lead had to pull me aside and tell me to calm down. I got hold of the DBA from my former company, and he backed up my story. So the bank wasn’t going to go with that storage company. But then, unbeknownst to me and my team lead, the CIO at the time, who was on his way out, made a closed-door deal. A week later he was bragging about his new Caddy; a month later he was out the door. And we were left with a bad solution that was not appropriate for our needs.