There is no way that external software can always find an intermittent memory fault.
You can do some things that are acceptable on a workstation that does graphics, but if you do science or manufacturing work where every result is important, you basically have to calculate everything twice on different nodes. This cuts peak performance by 50%, but it is the only way to know that the result is the right one.
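
To make the "calculate twice and compare" idea concrete, here is a rough Python sketch. The names verified_run, run_on_node_a and run_on_node_b are made up for illustration; in a real cluster the two callables would submit the job to two different physical nodes through whatever scheduler you use.

```python
def verified_run(task, run_on_node_a, run_on_node_b):
    """Run the same task on two nodes and accept the result only if both agree.

    run_on_node_a / run_on_node_b are hypothetical callables that execute
    `task` on two different machines and return its result.
    """
    first = run_on_node_a(task)
    second = run_on_node_b(task)
    if first != second:
        # One of the nodes produced a corrupted result; we cannot tell which,
        # so the caller has to rerun the task (ideally on a third node).
        raise RuntimeError("results differ between nodes; rerun the task")
    return first


if __name__ == "__main__":
    # Stand-ins for real nodes: both simply run the task locally.
    def local_node(task):
        return task()

    def sum_of_squares():
        return sum(i * i for i in range(1000))

    print(verified_run(sum_of_squares, local_node, local_node))
```

Every task runs twice, which is where the 50% loss comes from, and a mismatch only tells you that something went wrong, not which node was at fault.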

This is a fantastic system, but organizations where the result has to be correct had better look at a system with ECC.

