Runtime JS services Affected
Incident Report for Reflektion, Inc
Postmortem

RCA - Runtime JS Issue of August 3rd, 2023 

 

Executive Summary  

On August 3rd, 2023, Discover runtime JS services were unreachable for around five hours. During this time, runtime JS-based implementations of Preview Search, Search Results Page, Recommendations, and Hosted Product Listing Pages could not be displayed on customer sites.

After investigating and identifying the issue, the Sitecore Discover team worked out a recovery strategy and applied the fix for each customer.

 

Root Cause 

One of Sitecore’s Content Delivery Network (CDN) resources residing within our AWS infrastructure was inadvertently removed due to a misinterpretation of the resource’s labelling. The removed CDN distribution hosted reflektion.js and customer CSS files, which are critical to the runtime JS components implemented on customer sites, so its removal directly caused customers’ site experiences to fail to load.

 

Timeline (all times in PDT)

6:00AM - Internal alert fired for CDN monitoring 

6:30AM - Customers started to file support cases regarding Discover services not working

7:00AM - Discover Engineering team received escalation and moved to further investigation 

8:30AM - Discover Engineering team worked out a recovery plan and updated one customer  

11:00AM - Discover Engineering team applied fixes for all affected customers

2:15PM - All affected accounts were updated and verified to be back to normal

 

Impacts 

Only customers implementing Sitecore Discover’s services via runtime JS were affected, and all of them experienced an outage in which the runtime JS components did not load on their sites.

 

Planned changes / follow-ups 

  1. The internal alert escalation policy has been revisited and enforced so that we react faster to any similar issue in the future.
  2. Labels on all AWS resources will be reviewed to make them clearer and avoid future confusion.
  3. Enhancements will be made for faster recovery in cases where runtime JS-related updates are required across our customer base.
Posted Aug 24, 2023 - 08:57 PDT

Resolved
This incident is now resolved.

Please let us know if you continue to see issues around runtime JS based services (beacon, widgets, scripts).
A postmortem / RCA will be shared in the next few days.
Posted Aug 03, 2023 - 13:45 PDT
Monitoring
We've applied fixes to our customer base. Runtime JS-based services should be back to normal once browser caches are invalidated.

The team will continue monitoring the services. Please do not hesitate to reach out to Sitecore Support if you continue to see issues with Beacon, Sitecore Discover JS scripts, or any runtime JS-based services.
Posted Aug 03, 2023 - 11:06 PDT
Update
We are continuing to work on a fix for this issue.
Posted Aug 03, 2023 - 11:03 PDT
Update
We are continuing the work to apply the fix across all our customer base.
Posted Aug 03, 2023 - 10:26 PDT
Update
We've worked out a solution for the current outage and have started applying the updates for every customer. Customers should expect their runtime JS-based services to return to normal gradually over the next 30 minutes to 1 hour.
Posted Aug 03, 2023 - 08:52 PDT
Identified
We've identified the cause of this malfunction. We are actively updating our system to recover.
Posted Aug 03, 2023 - 07:23 PDT
Investigating
We are seeing an issue with services integrated with runtime JS.

We are actively investigating this issue.
Posted Aug 03, 2023 - 07:19 PDT
This incident affected: Production Beacon.