Atlassian finally explains the cause of the ongoing cloud outage


Atlassian has finally revealed the exact cause of an ongoing cloud service outage. The company estimates that the outage could continue to affect some of its customers for another two weeks.

When we first reported on this outage, Atlassian told us that a routine maintenance script had “accidentally” disabled the sites of about 400 of its more than 200,000 customers, leaving those customers unable to access their data.

Maintenance script erases hundreds of customer sites

Sri Viswanath, Atlassian’s chief technology officer, has explained how hundreds of customer sites were accidentally deleted on April 5, leading to a weeks-long incident that the company is still working to resolve.

As he explained, the outage was the result of communication issues between two Atlassian teams working to deactivate the standalone legacy “Insight – Asset Management” app used by Jira Service Management and Jira Software on all customer sites.

Instead of receiving the IDs of the app instances that were to be deactivated, the team running the deactivation received the IDs of the entire cloud sites on which the app was installed.

In addition, the maintenance script used to disable the app was started in the wrong execution mode: it permanently deleted data instead of marking it for deletion, a failsafe that would have kept the data recoverable.

“The script was executed with the wrong execution mode and list of IDs. As a result, sites for about 400 clients were erroneously removed,” explains Viswanath.
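The two failures described above (the wrong kind of ID and the wrong execution mode) are the sort of mistake that a pre-flight check in a maintenance script can catch. The following Python sketch is purely illustrative and is not Atlassian's script: the names `Mode`, `TargetId` and `plan_deletion` are hypothetical, and it assumes that every ID carries a label saying what kind of object it refers to.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SOFT_DELETE = "soft_delete"  # mark for deletion; data stays recoverable
    PERMANENT = "permanent"      # irreversible removal


@dataclass(frozen=True)
class TargetId:
    kind: str   # hypothetical label: "app" or "site"
    value: str


def plan_deletion(ids, expected_kind, mode=Mode.SOFT_DELETE, confirm_permanent=False):
    """Validate a deletion request and return the ID values to act on.

    Raises ValueError instead of proceeding when the request looks unsafe.
    """
    # Reject the whole request if any ID is of the wrong kind,
    # for example site IDs passed to an app-deactivation job.
    wrong = [t for t in ids if t.kind != expected_kind]
    if wrong:
        raise ValueError(
            f"expected only {expected_kind!r} IDs, got {len(wrong)} of another kind"
        )
    # Permanent deletion is never the default and needs an explicit opt-in.
    if mode is Mode.PERMANENT and not confirm_permanent:
        raise ValueError("permanent deletion requires explicit confirmation")
    return [t.value for t in ids]
```

In this sketch the recoverable mode is the default, so a caller has to ask for permanent deletion twice (once through `mode`, once through `confirm_permanent`) before anything irreversible is planned.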


Nearly 50% of deleted sites recovered

The Atlassian status page was updated Thursday to say that engineers, working in batches, have restored functionality for 49% of the users affected by the outage.

Atlassian initially estimated the recovery work would take no more than a few days, and confirmed to BleepingComputer that the outage was not the result of a cyberattack.

Earlier this week, however, the company revealed in emails sent to affected customers that restoring all affected users’ sites will likely take another two weeks.

“We recover affected customers identified by a mix of multiple variables including site size, complexity, edition, duration and several other factors in groups of up to 60 at a time,” the company said.

“The entire recovery process involves our technical teams, our customer service teams and our customers.”
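As a rough illustration of what restoring sites “in groups of up to 60 at a time” can look like, the Python sketch below orders sites by a priority key and splits them into batches. Atlassian has not published how it weighs the factors it lists, so the `priority` function here (smallest site first) and the `restore_batches` name are assumptions made only for the example.

```python
def restore_batches(sites, batch_size=60, priority=lambda site: site["size"]):
    """Order sites by a priority key and split them into batches.

    `sites` is a list of dicts; the default key (smallest site first)
    is a placeholder, not Atlassian's actual ranking.
    """
    ordered = sorted(sites, key=priority)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Example: 400 affected sites with made-up sizes.
sites = [{"id": n, "size": (n * 37) % 101} for n in range(400)]
plan = restore_batches(sites)
print(len(plan), "batches; last batch holds", len(plan[-1]), "sites")
```

With 400 sites and a batch size of 60, this yields six full batches and a final batch of 40.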

This outage comes after Atlassian announced in October 2020 that it would stop selling new licenses for its on-premises products from February 2021, with support for already active licenses ending three years later, on February 2, 2024.



This post, “Atlassian finally explains the cause of the ongoing cloud outage”, was originally published at https://www.bleepingcomputer.com/news/technology/atlassian-finally-explains-the-cause-of-ongoing-cloud-outage/