This story was printed from silicon.com, located at http://www.silicon.com/
Story URL: http://hardware.silicon.com/pdas/0,39024643,39166802,00.htm
BlackBerry blackout: Is RIM overreaching itself?
Was it really "an accident waiting to happen"?
By Marguerite Reardon
Published: Thursday 19 April 2007
RIM's massive BlackBerry email outage this week highlights how vulnerable the company's network has become as it tries to keep up with demand for its popular service.
RIM did not provide details of what caused the outage, which left millions of BlackBerry subscribers without access to email on Tuesday evening and into Wednesday morning. The company said in a statement released early on Wednesday that it was still reviewing the situation.
But analysts say that, judging from the nature of the outage and who was affected, the problem falls squarely on RIM's shoulders. For one, the outage affected only data services, including email and mobile web-browsing. Subscribers were still able to make phone calls and send and receive SMS text messages.
All of this points to some kind of technical issue within one of RIM's Network Operations Centres (NOCs), which act as an intermediary between corporate mail servers and recipients.
The email outage, first reported by WNBC, began around 17:00 (PDT) on Tuesday and lasted until the small hours of Wednesday morning, when email began trickling into the inboxes of users across North America and parts of Europe and Asia. The widespread disruption highlights just how vulnerable RIM's network has become, especially as the company's subscriber base grows.
Over the years, RIM has built a good reputation as a reliable service provider attracting bankers, lawyers and even congressional lawmakers as subscribers. Lately the company has also been trying to attract more mainstream customers with new handsets such as the BlackBerry Pearl and the BlackBerry 8800, both of which include media players and mobile browsers for web surfing.
The result has been a spike in subscriber growth. In the company's latest quarter, it reported it had added 1.02 million new subscribers, taking its total to eight million. This is a huge increase from the two million subscribers the company reported a year ago when it settled its patent infringement case with NTP. The company expects to add between 1.125 million and 1.15 million subscribers during the current quarter.
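As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The function names are illustrative only, and the inputs are the rounded subscriber numbers quoted above, so the results are approximate.

```python
# Back-of-the-envelope check of the subscriber figures quoted in the article.
# All values are in millions and are the rounded numbers reported above.

def growth_multiple(total_now, total_year_ago):
    """How many times larger the subscriber base is than a year ago."""
    return total_now / total_year_ago

def quarterly_growth_pct(total_now, added_last_quarter):
    """Percentage growth over the quarter, relative to the base before it."""
    base_before_quarter = total_now - added_last_quarter
    return added_last_quarter / base_before_quarter * 100

if __name__ == "__main__":
    print(growth_multiple(8.0, 2.0))                  # year-on-year multiple
    print(round(quarterly_growth_pct(8.0, 1.02), 1))  # latest-quarter growth, %
```

On these rounded numbers the base is about four times its size of a year ago, and the latest quarter alone added roughly 15 per cent.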
Dan Taylor, managing director of the Mobile Enterprise Alliance, a not-for-profit trade organisation that promotes enterprise mobility, said: "With all the recent subscriber growth the company has seen, it's not surprising that they would have network problems.
"They've just about quadrupled their subscriber base in the last 12 to 16 months. In some ways it was an accident waiting to happen. I'm sure the people running the NOC were aware that something could happen, and I'm sure they are working to get it fixed."
While it's not known for sure what caused RIM's outage, it's not difficult to see how the very nature of RIM's network could potentially lead to a major service outage. RIM's service is centralised and it works by routing all BlackBerry emails through one of two main NOCs, which are essentially large data centres. One NOC is located in Canada and it primarily services the Western Hemisphere as well as parts of Asia, said analysts familiar with the company. The other data centre, located in the UK, handles email traffic in Europe, Africa and the Middle East.
The BlackBerry Enterprise Server, which sits on the corporate network, receives emails from the company's Exchange or Lotus email server and forwards those emails in an encrypted tunnel to one of the NOCs. The NOC then acts as an efficient delivery system that authenticates users and forwards the messages to the appropriate handheld device.
Because user authentication is handled by RIM away from the corporate network, it protects companies from hackers who may try to obtain information through email servers, which sit inside the company's firewall. RIM's approach also means corporate IT departments don't have to juggle relationships with multiple mobile operators because RIM handles all of that for them in the NOC.
The flipside of RIM's approach is that with only two NOCs handling emails from eight million subscribers, there are two major points of potential failure. And when something goes wrong in one or both of these data centres, it can result in an outage like the one that occurred on Tuesday night and Wednesday morning, which technologically paralysed users.
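The dependency described above can be illustrated with a minimal Python sketch. It is a toy model built only from the analysts' description of the two NOCs. The region labels, dictionary and function names are hypothetical and do not reflect RIM's actual software or routing rules.

```python
# Toy model of the two-NOC architecture as described by analysts.
# All names here are hypothetical; this is not RIM's implementation.

NOC_REGIONS = {
    "canada": {"western_hemisphere", "parts_of_asia"},
    "uk": {"europe", "africa", "middle_east"},
}

def noc_for_region(region):
    """Return the NOC assumed to serve a region in this simplified model."""
    for noc, regions in NOC_REGIONS.items():
        if region in regions:
            return noc
    raise ValueError(f"no NOC configured for region {region!r}")

def can_deliver(region, noc_status):
    """Email reaches a handheld only if the region's NOC is up."""
    return noc_status.get(noc_for_region(region), False)

def affected_regions(down_nocs):
    """List every region that loses email when the given NOCs are down."""
    return sorted(r for noc in down_nocs for r in NOC_REGIONS[noc])

if __name__ == "__main__":
    status = {"canada": False, "uk": True}  # assumed failure scenario
    print(can_deliver("western_hemisphere", status))
    print(can_deliver("europe", status))
    print(affected_regions({"canada"}))
```

In this model every message for a region passes through exactly one data centre, so a single NOC failure cuts off all the regions it serves, while voice and SMS, which do not pass through the NOC, are unaffected.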
Gene Signorini, vice president of enterprise research at the Yankee Group, said: "Anytime you have a situation where traffic is flowing through a single data centre, there is potential for a catastrophic outage. But, that said, the RIM architecture also provides a lot of benefits to its corporate customers. It's just the nature of the beast."
Some of the most common issues that can result in an outage are power failures, failure of a critical component that takes down a larger system, software bugs, viruses and other attacks from the outside, or patches that fail. RIM hasn't identified which issue caused this particular outage, but Todd Kort, principal analyst at Gartner, said it may have been caused by a software bug.
He said: "If the RIM outage is affecting other parts of the globe, this fact most likely points to some type of software bug."
Marguerite Reardon writes for CNET News.com
Copyright ©1995-2008 CNET Networks, Inc. All rights reserved.