On October 29, 2025, beginning at 15:40 UTC, Lucidworks Connected Search queries began returning an elevated rate of 504 Gateway Timeout errors, which degraded the ability to serve search traffic. Lucidworks mitigated the issue by rolling back the most recent code change, resolving the incident at 17:01 UTC. Full functionality was verified to be restored by 17:12 UTC.
The total impact duration was approximately 81 minutes, during which users of Connected Search environments experienced intermittently elevated search latency and query failures.
Lucidworks proactively opened outbound Support cases for affected customers during this incident.
The incident was initially believed to be caused by a recent change to an unrelated service that introduced unstable connection handling. Following that update, we detected frequent connection resets to an upstream third-party service, which propagated to other services, resulting in timeouts and degraded search performance. A Severity 1 (S1) event was declared, and we began posting updates to our status page.
Despite our belief that the most recently deployed change was isolated from Connected Search, we reverted it in an attempt to restore service. Around the time the rollback completed, we began to see recovery and declared the incident resolved.
However, while debugging the rolled-back change as part of our standard postmortem process, our testing suggested that the code in question may not have been the underlying source of this incident. We are coordinating with one of our third-party providers to obtain more information, and this incident report will be updated when that information is received.
After extensive investigation, consultation with third-party providers, and targeted testing of the suspected faulty code, Lucidworks identified the root cause as a compound issue. The recent change previously mentioned, designed to improve vector-lookup performance, achieved that goal but inadvertently exposed a latent defect in HTTP connection reuse. In addition, the vector service was under-resourced, causing rapid scaling of replicas under query load and increasing the number of concurrent connections. Without connection reuse, each of those connections consumed a separate ephemeral port until the available port range was exhausted. As a result, outgoing packets were dropped, and new connections to the index could not be established to serve search traffic.
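To illustrate this failure mode, the following is a minimal, hypothetical Java sketch (not the actual Connected Search code; the class and method names are assumptions). The first method builds a new HTTP client for every vector lookup, so each request opens a fresh connection and consumes an ephemeral port; the second reuses a single shared client whose keep-alive pool bounds the number of open sockets under concurrent query load.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    public class VectorLookupClient {

        // Anti-pattern behind this class of failure: a new client (and a new
        // connection) per lookup. Under heavy load each request consumes a
        // fresh ephemeral port, and closed sockets linger in TIME_WAIT until
        // the local port range is exhausted.
        static HttpResponse<String> lookupWithoutReuse(String url) throws Exception {
            HttpClient oneShotClient = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(5))
                    .build();
            return oneShotClient.send(request, HttpResponse.BodyHandlers.ofString());
        }

        // Remediation pattern: one shared client for the lifetime of the
        // service. java.net.http.HttpClient keeps idle connections in a pool
        // and reuses them, so concurrent queries share a bounded set of
        // sockets instead of opening a new one per request.
        private static final HttpClient SHARED_CLIENT = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();

        static HttpResponse<String> lookupWithReuse(String url) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(5))
                    .build();
            return SHARED_CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        }
    }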
To resolve the issue, Lucidworks increased service resource allocations, explicitly implemented HTTP connection reuse, and added enhanced logging and validation safeguards to detect similar issues in the future.
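A logging safeguard of this kind can be sketched as a small watchdog that periodically counts sockets stuck in TIME_WAIT and warns before the ephemeral port range is exhausted. This is an illustrative example only, not Lucidworks' actual implementation; the threshold, check interval, and Linux /proc-based counting are assumptions.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.logging.Logger;
    import java.util.stream.Stream;

    public class SocketExhaustionWatchdog {
        private static final Logger LOG = Logger.getLogger("connection-watchdog");
        private static final int TIME_WAIT_WARN_THRESHOLD = 20_000; // assumed value

        public static void start() {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(SocketExhaustionWatchdog::checkOnce, 0, 30, TimeUnit.SECONDS);
        }

        // Counts IPv4 sockets in TIME_WAIT (state code "06") by reading
        // /proc/net/tcp on Linux. A rising count alongside low connection
        // reuse is an early signal that ephemeral ports are being consumed.
        static void checkOnce() {
            try (Stream<String> lines = Files.lines(Path.of("/proc/net/tcp"))) {
                long timeWait = lines.skip(1) // skip header row
                        .filter(line -> line.trim().split("\\s+")[3].equals("06"))
                        .count();
                if (timeWait > TIME_WAIT_WARN_THRESHOLD) {
                    LOG.warning("TIME_WAIT sockets at " + timeWait
                            + "; possible loss of HTTP connection reuse");
                }
            } catch (Exception e) {
                LOG.fine("watchdog check skipped: " + e.getMessage());
            }
        }
    }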
Lucidworks has taken the following actions as a result of this incident:
There are no recommended client actions as a result of this incident.