Platform UI unavailable

Incident Report for Lucidworks Platform

Postmortem

Summary

On April 10, 2025, starting at approximately 15:43 UTC, users of the Lucidworks Platform UI at platform.lucidworks.com may have experienced issues accessing or altering their Lucidworks Platform product configurations. This was due to an error introduced during the deployment of a new service as part of a forthcoming product rollout. The issue was identified and resolved by 16:19 UTC. 

Root Cause

The incident occurred when a new service was deployed with an incorrect network configuration. The Lucidworks Platform uses a global load-balancing and routing system to direct traffic to the various services it provides. When a new route was added to accommodate a newly developed service, a misconfiguration of that routing layer inadvertently directed UI traffic to the new service, which made platform.lucidworks.com unreachable for all users. Our engineering team identified the problem and resolved it by reverting the routing configuration to the previous stable state.
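
The failure mode described above can be illustrated with a minimal sketch. This report does not describe the routing system's actual rule format, so everything in the example is an assumption made for illustration: a rule list evaluated with first-match semantics, a "*" wildcard host, and the hypothetical service names platform-ui and new-service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    host: str          # exact host, or "*" for any host (hypothetical syntax)
    path_prefix: str   # request path must start with this prefix
    service: str       # hypothetical backend service name


def resolve(routes: List[Route], host: str, path: str) -> Optional[str]:
    """Return the service of the first matching rule (first-match semantics assumed)."""
    for route in routes:
        if route.host in ("*", host) and path.startswith(route.path_prefix):
            return route.service
    return None


# Rule set before the change: UI traffic reaches the UI service.
stable = [Route("platform.lucidworks.com", "/", "platform-ui")]

# A new rule that is too broad and is evaluated ahead of the existing one.
misconfigured = [Route("*", "/", "new-service")] + stable

# Reverting to the previous stable rule set restores the original behavior.
reverted = list(stable)
```

In this sketch, resolving ("platform.lucidworks.com", "/login") against the misconfigured list returns the new service instead of the UI service, and reverting the list restores the original destination.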

Our internal investigation revealed that the configuration error stemmed from incomplete internal documentation as well as a need for more rigorous processes for deploying new services. We are taking steps to address these internal factors to prevent similar issues in the future, as described below.

Detection of this incident could have been more timely. Our postmortem analysis revealed a misconfigured alert that was set to a lower severity than is appropriate for this level of impact. This misconfiguration has been corrected.
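
A check of the following kind could catch an under-rated alert before it matters. It is only a sketch: the alert fields (name, impact, severity), the severity scale, and the alert names are hypothetical and are not taken from our monitoring configuration.

```python
from typing import Dict, List

# Hypothetical severity scale: a lower rank means a more urgent alert.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def underrated_alerts(alerts: List[Dict[str, str]], required: str = "critical") -> List[str]:
    """Return names of user-facing outage alerts configured below the required severity."""
    return [
        alert["name"]
        for alert in alerts
        if alert["impact"] == "user_facing_outage"
        and SEVERITY_RANK[alert["severity"]] > SEVERITY_RANK[required]
    ]


# Illustrative alert definitions (hypothetical names and values).
alerts = [
    {"name": "ui-unreachable", "impact": "user_facing_outage", "severity": "medium"},
    {"name": "batch-job-slow", "impact": "internal", "severity": "low"},
]
```

Here underrated_alerts(alerts) flags only the hypothetical ui-unreachable alert, because it covers a user-facing outage but is not set to the highest severity.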

Lucidworks Actions

Lucidworks is committed to providing a stable and reliable platform experience for all users. As a result of this incident, we are taking the following actions:

  • We have updated our internal documentation and tooling to make it clearer to our engineers how to introduce a new service onto the Platform without affecting any others.
  • We are implementing stricter controls and validation processes for new service deployments, so that each new service is first rolled out in our dev environment for functional validation, and is confirmed not to affect any existing services, before it is deployed into production.
  • We are also tightening the way we manage routing configurations on the Platform, making all routing rules stricter so that traffic intended for one service cannot be inadvertently routed to a different service.
  • As mentioned above, we have fixed a misconfigured alert so that it notifies our teams at the highest severity should a similar incident occur.
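
As an illustration of the kind of validation described in the actions above, a pre-deployment check could compare where a set of known requests resolves before and after a proposed routing change. This is a sketch under stated assumptions, not a description of our tooling: the rule shape (host, path prefix, service), the first-match semantics, the "*" wildcard, and the host new-service.example.com are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str, str]   # (host, path_prefix, service); hypothetical rule shape
Probe = Tuple[str, str]       # (host, path) of a request that must keep working


def resolve(rules: List[Rule], host: str, path: str) -> Optional[str]:
    """Return the service of the first matching rule (first-match semantics assumed)."""
    for rule_host, prefix, service in rules:
        if rule_host in ("*", host) and path.startswith(prefix):
            return service
    return None


def changed_destinations(
    current: List[Rule], proposed: List[Rule], probes: List[Probe]
) -> Dict[Probe, Tuple[Optional[str], Optional[str]]]:
    """Map each probe whose destination would change to its (before, after) services."""
    changes = {}
    for host, path in probes:
        before = resolve(current, host, path)
        after = resolve(proposed, host, path)
        if before != after:
            changes[(host, path)] = (before, after)
    return changes


current = [("platform.lucidworks.com", "/", "platform-ui")]
probes = [("platform.lucidworks.com", "/")]

# A narrowly scoped new rule leaves existing traffic untouched.
safe = [("new-service.example.com", "/", "new-service")] + current

# An overly broad new rule would capture UI traffic.
unsafe = [("*", "/", "new-service")] + current
```

A deployment pipeline could refuse any change for which changed_destinations returns a non-empty result: the safe rule set produces no changes, while the unsafe one reports that UI traffic would move to the new service.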

Recommended Client Actions

There are no recommended client actions as a result of this incident. If you have any follow-up questions or concerns, please submit a request with Lucidworks Support.

Posted Apr 17, 2025 - 15:25 PDT

Resolved

A change related to enabling a new product feature had the unintended effect of making our Platform UI unavailable, and this was not caught in our development and testing environment. That change has been rolled back, and the UI is once again available.

No other product functionality was impacted by this incident. We will share more details in a postmortem update as soon as our analysis is complete.
Posted Apr 10, 2025 - 09:19 PDT

Investigating

The Lucidworks Platform UI at platform.lucidworks.com is currently unavailable. This has no effect on product functionality other than users' ability to change configurations. Our teams are actively working to resolve the issue.
Posted Apr 10, 2025 - 08:50 PDT
This incident affected: Lucidworks Platform (User Logins & Configuration UI).