US Platform Outage
Incident Report for Cisco
Resolved
The manual rerun of cache generation jobs is now complete, and normal platform performance has been restored.

Our Engineering team has pushed out two updates for this issue:

- our queue configuration has been updated to ensure that cache generation remains unaffected by events such as this
- a bugfix has been deployed to address the unexpected volume of vulnerability definition updates
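
As a rough illustration of why a queue configuration change can protect cache generation, the toy model below compares a shared queue with a dedicated one. It is only a sketch under stated assumptions: the report does not describe the actual queueing system or its settings, so the queue names, worker counts, job names, and the `run` helper are all hypothetical.

```python
from collections import deque


def run(queues, workers):
    """Toy scheduler: each worker takes one job per tick from its assigned queue.

    queues  -- dict mapping a (hypothetical) queue name to a deque of job names
    workers -- list of queue names, one entry per worker
    Returns a dict mapping each job name to the tick at which it was processed.
    """
    finished = {}
    tick = 0
    while any(queues.values()):  # an empty deque is falsy
        tick += 1
        for queue_name in workers:
            if queues[queue_name]:
                finished[queues[queue_name].popleft()] = tick
    return finished


rescoring_jobs = [f"rescore-{i}" for i in range(6)]

# Shared queue: the cache job waits behind the whole rescoring backlog.
shared = run(
    {"default": deque(rescoring_jobs + ["cache-generation"])},
    workers=["default", "default"],
)

# Dedicated queue: one worker is reserved for cache generation.
isolated = run(
    {"rescoring": deque(rescoring_jobs), "cache": deque(["cache-generation"])},
    workers=["rescoring", "cache"],
)

print(shared["cache-generation"], isolated["cache-generation"])
```

In this model the cache job's start time on the shared queue grows with the size of the rescoring backlog, while on the dedicated queue it does not. The trade-off is that the rescoring backlog drains more slowly, since one worker no longer helps with it.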
Posted Oct 31, 2018 - 15:50 CDT
Update
A bug was discovered that assigned an incorrect "trending" categorization to vulnerabilities, which led to the larger-than-expected volume of update processing and queuing. A fix is currently being tested and will be deployed later today. Vulnerability and asset scoring were unaffected by this bug.

The cache generation jobs started earlier today are still running. We'll provide a status update once those jobs complete and we've verified that normal platform performance has been restored.
Posted Oct 31, 2018 - 13:08 CDT
Update
Our cache generation jobs are still running and are expected to finish within the next 4 hours. While generation is in progress, users may experience slowness within the platform. Our Site Reliability Engineering team is continuing to monitor the cache generation and platform performance.
Posted Oct 31, 2018 - 10:38 CDT
Monitoring
The Kenna US Platform is now reachable and functioning as expected.

Our Site Reliability Engineering team determined that an unexpected number of our vulnerability definitions were updated overnight, resulting in higher-than-usual rescoring work and queuing.

This queuing delayed our nightly user data cache generation, so requests bypassed the cache and fetched data directly from our database platform. That path is significantly slower than reading from cached data, which resulted in timeouts for users attempting to access the platform.
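
The read path described above can be sketched as a simple cache lookup with a database fallback. This is a minimal illustration, not the platform's actual code: the `SlowDatabase` class, the `read_user_data` function, and the dictionary used as a cache are all hypothetical stand-ins.

```python
class SlowDatabase:
    """Hypothetical stand-in for the database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch(self, key):
        # In a real system this is the slow path that can exceed request timeouts.
        self.queries += 1
        return self.rows[key]


def read_user_data(key, cache, db):
    """Return (value, source): serve from the cache if present, else from the database."""
    if key in cache:
        return cache[key], "cache"
    # Cache miss: the nightly generation has not produced this entry yet.
    return db.fetch(key), "database"


db = SlowDatabase({"customer-a": {"open_vulnerabilities": 42}})
cache = {}  # empty because cache generation is delayed

before = read_user_data("customer-a", cache, db)

# Once the cache generation job has run, the same request avoids the database.
cache["customer-a"] = db.rows["customer-a"]
after = read_user_data("customer-a", cache, db)

print(before[1], after[1], db.queries)
```

When every request misses the cache at once, each one lands on the database, which is how a delayed cache job can turn into platform-wide timeouts.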

Our Site Reliability Engineering team prioritized the cache generation, which allowed the platform to become reachable for all users. They are continuing to monitor the status of the platform and are investigating the higher-than-expected volume of vulnerability definition updates.
Posted Oct 31, 2018 - 10:03 CDT
Investigating
The Kenna US Platform is currently experiencing an outage for all users. Our Site Reliability Engineering team is investigating.
Posted Oct 31, 2018 - 09:37 CDT