On November 13, 2025, at 16:12 UTC, the Lucidworks SaaS Platform became unavailable. The issue affected Lucidworks AI functionality (including Neural Hybrid Search), Commerce Studio and Analytics Studio accessibility, and Connected Search. API requests received HTTP 403 errors, and the platform.lucidworks.com UI returned the error message RBAC: access denied. This message appeared when attempting to access Commerce Studio instances, Analytics Studio dashboards, Lucidworks Search promotion requests, or any general Lucidworks Platform configuration controls.
Lucidworks Engineering resolved the issue by 17:08 UTC, restoring service for all affected products.
Lucidworks maintains two comprehensive SaaS Platform environments to support development and testing: the Production environment and the Development environment. The Production environment provides services to all Lucidworks customers. The Development environment is used by Lucidworks Engineering to test changes before deploying them to Production.
Lucidworks uses a concept of capabilities in its regional deployments to control which products and features are available on the Lucidworks Platform. For example, Lucidworks can make all of the Lucidworks SaaS Platform products available in the us-iowa region while deploying only Lucidworks AI to the us-texas region. Capabilities are configured by applying labels to various resources in Google Cloud Platform (GCP), which is Lucidworks’ public cloud provider. Our deployment tooling evaluates these labels to determine where to deploy the applicable microservices.
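As a rough illustration of how label-driven capability selection works, the sketch below reads per-region labels and deploys only the microservices whose capabilities are enabled there. The label keys, region names, and service names are invented for this sketch and do not reflect Lucidworks’ actual GCP label schema or deployment tooling.

```python
# Hypothetical illustration of capability-driven deployment selection.
# Label keys, regions, and service names are invented for this sketch.

# Labels as they might appear on per-region GCP resources.
REGION_LABELS = {
    "us-iowa": {"capability-ai": "enabled", "capability-commerce-studio": "enabled"},
    "us-texas": {"capability-ai": "enabled"},  # only Lucidworks AI here
}

# Which microservices each capability requires.
CAPABILITY_SERVICES = {
    "capability-ai": ["ai-gateway", "neural-hybrid-ranker"],
    "capability-commerce-studio": ["commerce-studio-api", "commerce-studio-ui"],
}

def services_to_deploy(region: str) -> list[str]:
    """Return the microservices that should run in a region, based on its labels."""
    labels = REGION_LABELS.get(region, {})
    services = []
    for capability, svc_list in CAPABILITY_SERVICES.items():
        if labels.get(capability) == "enabled":
            services.extend(svc_list)
    return services

print(services_to_deploy("us-texas"))  # only the AI services
print(services_to_deploy("us-iowa"))   # AI plus Commerce Studio services
```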
On November 12, 2025, a change related to an upcoming product release was deployed in the Development environment, where it was tested and confirmed to be functional. However, after deploying this same change to Production, the product did not display in our testing workspaces. Lucidworks investigated and determined the requisite GCP resource labels had not been applied to the necessary Production regions. A Lucidworks engineer ran an internal tool to update the deployment labels in Production.
Unfortunately, due to human error, the tool was executed with an incorrect parameter, which had the unintended consequence of applying the Development labels to Production. Because the routing layer derives part of its configuration from those same labels, this change reconfigured the Production routing layer to send traffic to the Development backend services. However, because the Production and Development environments are not connected to each other in any way, all incoming traffic was dropped.
The routing layer is deployed across multiple instances to provide high availability. Once the erroneous configuration propagated to all of those instances, a widespread outage ensued.
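The failure mechanism can be sketched as follows. The label keys, hostnames, and service names below are hypothetical; the essential point is that the routing layer derives its backend targets from the same labels, so Development labels applied in Production point traffic at backends that Production cannot reach.

```python
# Hypothetical sketch of a routing layer that derives its backend targets
# from environment labels. Labels, hostnames, and services are invented
# for illustration and do not reflect Lucidworks' actual configuration.

PROD_LABELS = {"env": "production", "backend-suffix": "prod.internal"}
DEV_LABELS = {"env": "development", "backend-suffix": "dev.internal"}

# Backends the Production routing layer actually has a network path to.
REACHABLE_FROM_PROD = {"ai-gateway.prod.internal", "commerce-studio-api.prod.internal"}

def resolve_backend(service: str, labels: dict) -> str:
    """Derive the backend address from the labels the router was given."""
    return f"{service}.{labels['backend-suffix']}"

def route_request(service: str, labels: dict) -> str:
    backend = resolve_backend(service, labels)
    if backend in REACHABLE_FROM_PROD:
        return f"forwarded to {backend}"
    # Production and Development share no network path, so traffic aimed at
    # a Development backend cannot be delivered and is dropped.
    return f"dropped: {backend} is unreachable from Production"

print(route_request("ai-gateway", PROD_LABELS))  # normal routing
print(route_request("ai-gateway", DEV_LABELS))   # the incident scenario: dropped
```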
API calls to the following systems began to fail:

- Lucidworks AI (including Neural Hybrid Search)
- Connected Search

Additionally, because the Platform could not route traffic to the user interfaces in Production, HTTP 403 responses were returned with an error message of RBAC: access denied. These errors affected the following user experiences:

- Commerce Studio instances
- Analytics Studio dashboards
- Lucidworks Search promotion requests
- General Lucidworks Platform configuration controls
Lucidworks utilizes multiple monitoring and alerting tools to ensure timely notification of production issues 24x7. Nevertheless, we identified monitoring gaps that increased the time to detect this issue, which in turn increased the time to resolution.
Lucidworks also identified gaps in the response and mitigation of this issue.
External synthetic checks retrieved SSL certificates to verify their integrity, and Lucidworks incorrectly assumed these checks would also alert if the service itself were down. In this incident, however, the secure HTTP connection succeeded and the failure occurred beyond that point, so the checks continued to pass.
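This gap can be illustrated with a minimal sketch; the probe logic below is illustrative only and is not the actual synthetic check tooling. A probe that only completes the TLS handshake and inspects the certificate keeps passing even when every request behind it returns a 403, whereas a probe that also asserts on the HTTP status would have failed during the incident.

```python
# Minimal sketch contrasting a certificate-only probe with a full HTTP probe.
# This is illustrative only; the real synthetic checks are external tooling.
import socket
import ssl
import urllib.request
from urllib.error import HTTPError

HOST = "platform.lucidworks.com"

def cert_only_check(host: str) -> bool:
    """Passes as long as the TLS handshake succeeds and a certificate is presented.
    This is roughly what the pre-incident synthetic check verified."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert() is not None

def http_status_check(host: str) -> bool:
    """Also asserts on the HTTP status; a platform-wide 403 fails this check."""
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=10) as resp:
            return 200 <= resp.status < 400
    except HTTPError:  # e.g., the RBAC: access denied 403 during this incident
        return False

# During the incident: cert_only_check() would still return True,
# while http_status_check() would return False.
```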
Lucidworks has active alerting in place for HTTP 500 errors but did not alert sufficiently on 4xx errors, under the incorrect assumption that 4xx responses indicate invalid requests rather than a failure to respond correctly. In this incident, the system incorrectly returned a 403 error code instead of the more accurate 500 error.
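In alerting terms, the gap looked roughly like the following sketch, in which the thresholds, window size, and traffic shape are invented for illustration: a rule keyed only on 5xx rates stays silent, while a rule that also tracks 4xx spikes would have paged.

```python
# Hypothetical error-rate alerting sketch. Thresholds and the shape of the
# observed traffic are invented for illustration only.
from collections import Counter

def should_page(status_codes: list[int],
                server_error_threshold: float = 0.05,
                client_error_threshold: float = 0.20) -> bool:
    """Page if 5xx exceeds its threshold, or if 4xx spikes far above normal.

    The pre-incident rule effectively implemented only the first condition,
    so a platform-wide flood of 403s never triggered an alert.
    """
    if not status_codes:
        return False
    counts = Counter(code // 100 for code in status_codes)
    total = len(status_codes)
    server_error_rate = counts.get(5, 0) / total
    client_error_rate = counts.get(4, 0) / total
    return (server_error_rate > server_error_threshold
            or client_error_rate > client_error_threshold)

# Incident-shaped traffic: almost everything is a 403, nothing is a 5xx.
window = [403] * 950 + [200] * 50
print(should_page(window))  # True with the 4xx condition, False without it
```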
After the source of this issue was detected, at 17:07 UTC Lucidworks personnel changed the necessary values in Production to trigger a repopulation of all relevant labels throughout the Lucidworks Platform. These labels propagated quickly. Within one minute, the routing layer had been updated with the correct information, and full service was restored for all affected products.
Lucidworks will take the following actions as a result of this incident:
- Replace the tooling responsible for setting the labels with a tool that safeguards against applying one environment’s labels to another.
- Enhance external monitoring tools to alert on 4xx responses, even in cases where these may be indicative of invalid calls rather than an inability to properly respond to those calls.
- Update Lucidworks’ error handling processes in our product UIs so that infrastructure failures do not return a 403 error but instead respond with a 500 error that clearly specifies a Lucidworks infrastructure issue as the cause.
- Enhance Neural Hybrid Search to more gracefully fall back to a lexical-only query in the event that Lucidworks AI is unreachable (a minimal sketch of this fallback follows this list). The recently released Fusion 5.9.15 included an additional failsafe fallback to the Neural Hybrid Query Stage, and we will implement similar fallbacks in a future release of Fusion to increase the ability of the overall system to withstand service outages such as this one.
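The sketch below shows the intended fallback behavior in simplified form; the function names and exception type are placeholders rather than the actual Fusion query pipeline or Lucidworks AI APIs. When the semantic leg of the query cannot reach Lucidworks AI, the query degrades to a lexical-only search instead of failing outright.

```python
# Hypothetical sketch of a lexical fallback for a hybrid query. The functions
# below stand in for the semantic and lexical legs of a Neural Hybrid query;
# they are not the actual Fusion or Lucidworks AI APIs.

class LucidworksAIUnavailable(Exception):
    """Raised when the embedding/semantic service cannot be reached."""

def semantic_results(query: str) -> list[str]:
    # Placeholder: in the real system this calls Lucidworks AI for vector search.
    # Here the service is simulated as unreachable, as during the incident.
    raise LucidworksAIUnavailable("ai-gateway unreachable")

def lexical_results(query: str) -> list[str]:
    # Placeholder: keyword-style search that does not depend on Lucidworks AI.
    return [f"lexical hit for '{query}'"]

def hybrid_search(query: str) -> list[str]:
    """Blend semantic and lexical results, but degrade to lexical-only if the
    semantic leg is unavailable rather than failing the whole query."""
    lexical = lexical_results(query)
    try:
        semantic = semantic_results(query)
    except LucidworksAIUnavailable:
        return lexical                 # graceful degradation during an AI outage
    return semantic + lexical          # naive blend, for illustration only

print(hybrid_search("wireless headphones"))  # lexical-only while AI is unreachable
```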
Lucidworks recommends that clients using Neural Hybrid Search upgrade to Fusion 5.9.15 as soon as possible, in order to take advantage of the latest enhancements, including the automated lexical query fallback mentioned previously.
We also recommend that clients subscribe to Lucidworks status updates to receive notifications about Lucidworks SaaS Platform incidents. To enable this feature, click “Subscribe to Updates” on status.lucidworks.com.