Description
The reliability of commercial off-the-shelf (COTS) devices such as solid-state drives (SSDs) under radiation has become a critical concern for space applications and high-altitude environments. Radiation testing has proved to be an effective method for investigating failure mechanisms and evaluating reliability. This study provides an overview of common SSD reliability issues, focusing primarily on failure mechanisms and design mitigation techniques related to factors such as NAND program/erase cycles and temperature dependence.
Commercial SSDs were subjected to broad-spectrum neutron exposure at the China Spallation Neutron Source (CSNS) to analyze radiation-induced component errors and functional interruptions in nonvolatile memory express (NVMe) and serial advanced technology attachment (SATA) SSDs. The experiments revealed clear differences in sensitivity: at the module level, NVMe SSDs were more resistant than SATA SSDs, owing to more advanced controller technology and enhanced error correction capabilities. For NVMe SSDs, functional interruptions were attributed primarily to NAND Flash faults (e.g., timeouts) and dynamic random access memory (DRAM) errors (e.g., stuck bits), while controller vulnerabilities contributed minimally. Moreover, this study examines why read errors dominate as the primary failure mode in NAND Flash and explores how the cumulative nature of these errors correlates with functional interruptions.
Acknowledgment
The present study was supported by Grant U2241280 and Grant Y2022057.