Amazon.com suffers far-reaching outage

Amazon.com struggled Friday to restore computers used by other major websites such as Reddit as an outage stretched beyond 24 hours.
In this screenshot of the foursquare.com website, an apology for technical difficulties is displayed. Dozens of major websites, including Foursquare and Reddit, crashed or suffered severe slowdowns after technical problems hit their hosting company, Amazon.com. (Associated Press)

Though better known for selling books, DVDs and other consumer goods, Amazon also rents out space on huge computer servers that run many websites and other online services.

The problems began at an Amazon data center near Dulles Airport outside Washington early Thursday. On Friday morning, Amazon's status page said the recovery effort was making progress, but it couldn't say when all affected computers would be restored.

Most of the sites that were brought down by the outage on Thursday were back up on Friday, but news-sharing site Reddit was still in "emergency read-only mode," and smaller sites were still reporting trouble.

 

Location-sharing social network Foursquare and HootSuite, which lets users monitor Twitter and other social networks more easily, appeared to have recovered.

Many other companies that use Amazon Web Services, like Netflix Inc. and Zynga Inc., which runs Facebook games, were unscathed by the outage. Amazon has at least one other major U.S. data center that stayed up, in California.

It's not uncommon for internet services to become inaccessible due to technical problems, sometimes for hours or even days. But the outage is notable because Amazon's servers are so commonly used, meaning many sites went down at once.

Amazon, which did not respond to requests for comment, has not revealed how many companies use its internet services or how many were affected by the outage.

No one knew for sure how many people were inconvenienced, but the services affected are used by millions.

Amazon Web Services provide "cloud" or utility-style computing in which customers pay only for the computing power and storage they need, on remote computers.

Seattle-based Amazon has big plans for AWS. Although it now makes up just a few percent of the company's revenue, CEO Jeff Bezos said last year that it could eventually be as large as Amazon's retail business. Competitors include Rackspace Hosting Inc. and Microsoft Corp.'s Azure platform.

Some people consider cloud computing more reliable than conventional hosting services in which a small company might rent a handful of computers in a data center.

If one of them malfunctions, the failure can take down a website. But "clouds" like AWS use vast banks of computers. If one fails, the tasks that it performs, such as running a website or a game, can immediately be taken over by others.
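The takeover idea described above can be illustrated with a minimal Python sketch. It is not how AWS works internally; the machine names and the `assign_tasks` and `fail_over` helpers are hypothetical, invented here only to show tasks moving off a failed computer onto the ones still running.

```python
def assign_tasks(tasks, healthy_machines):
    """Map each task to a healthy machine, round-robin."""
    if not healthy_machines:
        raise RuntimeError("no healthy machines left to run tasks")
    return {
        task: healthy_machines[i % len(healthy_machines)]
        for i, task in enumerate(tasks)
    }


def fail_over(assignment, failed_machine, machines):
    """Move tasks off a failed machine onto the remaining ones."""
    remaining = [m for m in machines if m != failed_machine]
    orphaned = [t for t, m in assignment.items() if m == failed_machine]
    # Tasks on machines that are still up stay where they are.
    updated = {t: m for t, m in assignment.items() if m != failed_machine}
    if orphaned:
        updated.update(assign_tasks(orphaned, remaining))
    return updated


machines = ["m1", "m2", "m3"]  # hypothetical machine names
assignment = assign_tasks(["website", "game"], machines)
after = fail_over(assignment, "m1", machines)
print(after)
```

In this sketch the website that was running on the failed machine is picked up by another one. If no machine is left, `assign_tasks` raises an error, which is the situation a single rented server is always one failure away from.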

When a company needs more capacity, maybe because of a surge in visitors to its website, it only takes minutes to rent more computers from Amazon.

But cloud computing isn't immune to failure, either.

Backup system appears to have failed

Lydia Leong, an analyst for the tech research firm Gartner, said that judging by details posted on Amazon's AWS status page, a network connection failed Thursday morning, triggering an automatic recovery mechanism that then also failed.

Amazon's computers are divided into groups that are supposed to be independent of each other. If one group fails, others should stay up. And customers are encouraged to spread the computers they rent over several groups to ensure reliable service. But Thursday's problem took out many groups simultaneously.
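A short Python sketch shows why spreading rented computers over several groups helps, and why it stops helping when many groups fail together. The group names and the `spread_instances` and `surviving` helpers are assumptions made for illustration; they are not part of any Amazon interface.

```python
from collections import defaultdict
from itertools import cycle


def spread_instances(instance_ids, groups):
    """Assign instances to groups round-robin so no one group holds them all."""
    placement = defaultdict(list)
    for instance_id, group in zip(instance_ids, cycle(groups)):
        placement[group].append(instance_id)
    return dict(placement)


def surviving(placement, failed_groups):
    """Return the instances still running after the given groups fail."""
    failed = set(failed_groups)
    return [
        instance_id
        for group, members in placement.items()
        if group not in failed
        for instance_id in members
    ]


groups = ["group-a", "group-b", "group-c"]  # hypothetical group names
placement = spread_instances(["web-1", "web-2", "web-3", "web-4"], groups)

# Losing one group still leaves capacity in the others.
print(surviving(placement, ["group-a"]))
# If every group holding the instances fails together, nothing is left.
print(surviving(placement, groups))
```

The protection depends entirely on the groups failing independently, which is the assumption that broke down on Thursday.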

Outages with Amazon's services are rare but not unprecedented. In 2008, several companies lost access to their own files for about two hours when one of Amazon's data centers failed. The companies included DigitalChalk Inc., which delivers multimedia training over the Web.

In general, Amazon Web Services have been more reliable and, above all, cheaper than many other hosting systems, said Josh Cochrane, vice president of product development at Palo Alto Software in Eugene, Ore.

But the firm's websites and web-based applications that create business plans were all brought down by Thursday's crash.

"It's a pretty vulnerable feeling," he said. "This is a really big message to us that we need to revisit our strategy."

That might include spreading the applications more widely over Amazon's network, so that problems at one data center won't bring down everything, he said.

Amazon engineers struggled throughout the day to rectify the problem. Leong said the problems are of a type that's not covered by Amazon's money-back guarantees.

Amazon shares rose $2.02 US, or 1.1 percent, to close Thursday at $185.89.