Analysis of SSD Reliability during power-outages

Revision History
Published: 27 Dec 2013 - first published
Updated  : 28 Dec 2013 - add TODO list of Samsung 840 and Crucial M500
Updated  : 29 Dec 2013 - add Editor's note after slashdot article
Updated  :  1 Jan 2014 - Add stec-inc S230 SATA Slim

Editor's note 29Dec2013

Thank you to everyone for the input from the Slashdot story.
The additional drives suggested for consideration are extremely useful, but
they will have to go through the same process of cost-benefit analysis -
followed only then by reliability analysis - that the other drives went
through, with the additional handicap that the Intel S3500 has already "won"
and been selected for live deployment.

Which brings me to a key point that is difficult to express when there
are 275 Slashdot comments to contend with.  The belief that Intel paid
for this report comes through loud and clear.  Those who believe that
are severely mistaken.  Let's look at it again.

Statement of fact: The S3500 SSD happens to be the sole drive which
a) is cost-effective
b) passed all the extreme tests
c) is within budget
d) was clearly marked in the online marketing as "having power loss protection"
e) is not end-of-life

So let us be absolutely clear:

     Fact: the Intel S3500 was the only drive which matched the requirements

That it did so, and so comprehensively, despite the extreme nature of
the testing, which lasted several days whilst all other drives failed within
minutes, is the real key point of this report.

However that point - that success - is itself also completely irrelevant
beside the fact that the testing itself provided the company that commissioned
the work with an amazingly high level of confidence in "an SSD" despite their
complete paranoia which had driven them to commission the testing in the first
place.  To make that clear:

    The company doesn't care about Intel: they care about a reliable drive

If there were other drives that had passed or were known about or could have
been found, they would have been added to the list already.

Analysis of SSD Reliability during power-outages

This report was originally commissioned due to the remote deployment of
over 200 32gb OCZ SSDs resulting in severe data corruption in over 50%
of the units.  The recovery costs were far in excess of the costs saved
by purchasing the cheaper OCZ units.  They were replaced rapidly over a
period of years by Intel SSD 320s, where, despite remote deployment of
over 500 units there have only ever been three unrecoverable failures.

However, the Intel 320 SSD has reached end-of-life, so a replacement was
sought.  Due to paranoia over the OCZs an in-depth analysis was requested.
Around the time that the paranoia was hitting, a report had come out
on slashdot, covering power-related corruption.
It made sense therefore to attempt to replicate that report, as it was
believed that the data corruption of the OCZs was related to power loss.

This report therefore covers the drives selected and the testing that was
carried out.  We follow up with a conclusion (summary: if you care about
power loss don't buy anything other than Intel SSDs - end of story) and
some interesting twists.

Picking drives for testing

The scenario for deployment is one where huge amounts of data simply are
not required.  An 8gb drive would be able to store 1 month's worth of sensor
data, as well as have room for a 1.5gb OS deployment.  A 16gb drive stores
over two months.  Bizarrely, except in the Industrial arena the focus
is on constant increases in data storage capacity rather than data
reliability.  The fact that shrinking geometries automatically results
in higher susceptibility to data corruption is left for another time,
however.
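As a rough check on those capacity figures, the arithmetic can be sketched as
below.  It assumes (the report does not say so) that one month of sensor data
exactly fills whatever is left of an 8gb drive after the 1.5gb OS image, which
makes 6.5gb an upper bound on the monthly volume.

```python
# Back-of-envelope capacity check.
# Assumption (not from the report): one month of sensor data exactly fills
# the space left on an 8gb drive once the 1.5gb OS image is installed.
OS_GB = 1.5


def months_of_data(drive_gb, monthly_gb):
    """Months of sensor data that fit on the drive beside the OS image."""
    return (drive_gb - OS_GB) / monthly_gb


monthly_gb = 8 - OS_GB  # 6.5gb, an upper bound on one month of data

print(months_of_data(8, monthly_gb))             # 1.0
print(round(months_of_data(16, monthly_gb), 2))  # 2.23
```

Under that assumption a 16gb drive holds a little over two months, which is
consistent with the figure quoted above.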

Additionally, due to the aforementioned paranoia and assumptions that the
data loss was occurring due to loss of power, the requirements to have
"Power Loss Protection" were made mandatory.  Power Loss Protection is
usually found in Industrial and Server Grade SSDs, which are typically
more expensive.

So, finding a low-cost, low-capacity, reliable SSD reported to have
"Power Loss Protection" proved... challenging.  After an exhaustive search,
the following candidates were found:

 Crucial M4 128gb
 The unpronounceable Toshiba THNSNH060GCS 60gb
 The new Intel S3500
 The Innodisk 3MP Sata Slim (8gb and 16gb)


The Innodisk units came in at around £30, whilst all the other drives came
in at between £60 and £90.  Also added to the testing were the original
OCZ Vertex 32gb and the Intel 320.

Test procedure

The original report at the FAST conference was quite hard to replicate:
the report is a summary rather than containing detailed procedures or
source code.  A best effort was made and then extended.


a) OS-based test. The first test devised was to boot up a full OS and
to power-cycle it using a mains timer.  This test turned out to be completely
lame, except for its negative results proving that simply switching power on
and off was not the root cause of problems.

b) OS-based huge parallel writes. The second test was to write huge
numbers of files and subdirectories in parallel.  Thousands of directories
and millions of small files as well as one large one were copied, sync'd then
deleted using 64 parallel processes.  Power was not pulled during
this test.

c) Direct disk writing. This test was closer to the original FAST
report, except simplified in some ways and extended in others.
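The "OS-based huge parallel writes" test can be sketched roughly as follows.
This is not the script that was actually used: the function names, file sizes
and tree shape are invented for illustration, the counts are scaled far down,
and threads stand in for the 64 parallel processes to keep the sketch portable.
Note also that the read-back here will usually be served from the OS page
cache, so a real test would need to drop caches or remount before verifying.

```python
import hashlib
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def worker(root, wid, n_dirs, n_files):
    """Create a small tree, fsync every file, verify it, then delete it.

    Returns the number of files whose contents did not read back intact.
    """
    base = os.path.join(root, f"worker{wid}")
    expected = {}
    for d in range(n_dirs):
        dpath = os.path.join(base, f"dir{d}")
        os.makedirs(dpath)
        for f in range(n_files):
            data = os.urandom(512)  # one small file (size is arbitrary)
            path = os.path.join(dpath, f"file{f}")
            with open(path, "wb") as fh:
                fh.write(data)
                fh.flush()
                os.fsync(fh.fileno())  # ask the OS to push it to the drive
            expected[path] = hashlib.sha256(data).digest()
    bad = 0
    for path, digest in expected.items():
        with open(path, "rb") as fh:
            if hashlib.sha256(fh.read()).digest() != digest:
                bad += 1
    shutil.rmtree(base)
    return bad


def parallel_write_test(root, workers=64, n_dirs=4, n_files=16):
    """Run `workers` copies of worker() at once; return total bad files."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, root, w, n_dirs, n_files)
                   for w in range(workers)]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("bad files:", parallel_write_test(root, workers=8))
```

Pointing `root` at a directory on the drive under test, and raising the
counts, gives something closer in spirit to the test described.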


Crucial M4

The Crucial M4 was tested with an early prototype version of the SSD
torture program.  It was power-cycled approximately 1,900 times over a
48 hour period.  Data was randomly written, sync'd and then read back,
whilst power-cycling was done on a random basis between 8 and 25 seconds
through the write-sync-read cycle.  Every 30 seconds the geometry was
checked and a smartctl report obtained.
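The write, sync and read-back loop might look something like the sketch below.
It is an illustration only, not the SSD torture program itself: the names
`write_sync_verify` and `next_power_cut_delay`, the 4096-byte block size and
the use of an ordinary file rather than the raw device are all assumptions.

```python
import os
import random
import tempfile
import zlib

BLOCK = 4096  # bytes per write; an arbitrary choice for this sketch


def next_power_cut_delay(rng):
    """Seconds to wait before the next power cut: random, between 8 and 25."""
    return rng.uniform(8, 25)


def write_sync_verify(path, n_blocks, rounds, seed=0):
    """Write random blocks at random offsets, fsync each, then verify all.

    Returns the list of block indexes whose contents failed verification.
    """
    rng = random.Random(seed)
    checksums = {}  # block index -> CRC32 of the last data synced there
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, n_blocks * BLOCK)
        for _ in range(rounds):
            idx = rng.randrange(n_blocks)
            data = os.urandom(BLOCK)
            os.pwrite(fd, data, idx * BLOCK)
            os.fsync(fd)  # only count the write once the sync has returned
            checksums[idx] = zlib.crc32(data)
        return [i for i, crc in sorted(checksums.items())
                if zlib.crc32(os.pread(fd, BLOCK, i * BLOCK)) != crc]
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("bad blocks:", write_sync_verify(os.path.join(d, "t.bin"), 64, 200))
    print("next cut in %.1f s" % next_power_cut_delay(random.Random()))
```

In a real power-pull run the checksums would have to be kept somewhere other
than the drive under test, so that they survive the power cut and can be
compared after the drive comes back.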

After approximately 1,600 power-cycles, the Crucial M4's SMART report showed
over 20,000 CRC errors.  Within 1,900 power-cycles, that number had jumped
to 40,000 CRC errors and had been joined by serious LBA errors.
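Tracking an error counter across power-cycles means pulling one number out of
each smartctl report.  A minimal sketch, assuming the attribute table printed
by `smartctl -A`, is shown below.  The report does not say which attribute
carried the CRC count, so `UDMA_CRC_Error_Count` is an assumption, and the
sample text is made up for illustration rather than taken from the M4.

```python
import subprocess

# Made-up sample in the layout of a `smartctl -A` attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       42
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       12345
"""


def parse_smart_attributes(text):
    """Map attribute name -> raw value (as a string) from the table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def read_smart(device):
    """Run smartctl on a device (needs smartmontools and usually root)."""
    # smartctl uses its exit status as a bitmask, so non-zero is not fatal.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return parse_smart_attributes(result.stdout)


if __name__ == "__main__":
    print(parse_smart_attributes(SAMPLE)["UDMA_CRC_Error_Count"])  # 12345
```

Logging that value every 30 seconds alongside the power-cycle count would
show when the errors start to climb.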

Conclusion: epic fail. Not fit for purpose: returned under warranty.

Toshiba THNSNH060GCS 60gb

This drive turned out to be a little more interesting.  It passed the OS-based
parallel writes test with flying colours.  Running for over 20 minutes, several
million files and directories were created and deleted.  In between each run
no filesystem corruption was observed.

Then came the direct-disk writing.  It turns out that if the write speed is
kept below around 20mbytes/sec, the Toshiba THNSNH060GCS is perfectly capable
of retaining data integrity even when power is being pulled, even when there
are 64 parallel threads all writing at the same time.

However when the write speed exceeds a certain threshold, all bets are off.
At higher write speeds, data loss when power is pulled is only a matter
of time (minutes).
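Holding a run on one side or the other of that threshold needs a cap on the
write rate.  One simple way to do that is sketched below; the `Throttle` class
is hypothetical and is not taken from the torture program.  The `clock` and
`sleep` parameters exist only so the logic can be exercised without waiting.

```python
import time


class Throttle:
    """Cap average throughput by sleeping after each chunk is written."""

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(bytes_per_sec)
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.sent = 0

    def wait(self, nbytes):
        """Account for nbytes just written; sleep if ahead of schedule.

        Returns the delay in seconds that was requested (0.0 if none).
        """
        self.sent += nbytes
        due = self.start + self.sent / self.rate  # earliest allowed time
        delay = max(due - self.clock(), 0.0)
        if delay > 0:
            self.sleep(delay)
        return delay


if __name__ == "__main__":
    cap = Throttle(20 * 1000 * 1000)  # "around 20mbytes/sec"
    for _ in range(5):
        # A real run would write a 1 mbyte chunk to the drive here.
        cap.wait(1000 * 1000)
```

Each writer thread would call `wait()` after every chunk; sharing one
`Throttle` between all 64 threads would additionally need a lock, which is
left out of this sketch.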

We conclude from this that the Toshiba THNSNH060GCS does have power-loss
protection circuitry and firmware, but that the internal power reservoir
(presumably supercapacitors) simply isn't large enough to cover saving the
entire outstanding cache of writes.

Conclusion: close, but no banana.

Innodisk 3MP Sata Slim

There were high hopes for these drives, based on the form-factor and low cost.
Unfortunately, they turned out to have rather interesting firmware
issues.

The observed write-then-read speeds (a write followed by a verify step)
turned out to be adversely affected by the number of parallel writes.  If
there were no parallel writes (only one thread) then it was possible to
write and then read at least 18 mbytes per second (i.e. the data was written
at probably 30mbytes/sec then read at probably 45mbytes/sec, except that
the timer was started at the beginning of the write and stopped at the end
of the read).  This speed was sustained.
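The way the two estimated speeds combine into the single measured figure can
be checked with a couple of lines of arithmetic.  The function name below is
invented for illustration; the 30 and 45 mbytes/sec inputs are the "probably"
estimates given above, not measurements.

```python
def combined_rate(write_mb_s, read_mb_s):
    """Effective mbytes/sec when every mbyte is written and then read back
    under a single timer: total time per mbyte is 1/write + 1/read."""
    return write_mb_s * read_mb_s / (write_mb_s + read_mb_s)


print(combined_rate(30, 45))  # 18.0
```

So a write at 30 mbytes/sec followed by a read at 45 mbytes/sec does come out
at the observed 18 mbytes/sec write-then-read speed.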

However, if there were even just two parallel write-read threads, the speed
was sustained for approximately 15 seconds and then dropped down to 1 (one!)
mbyte/sec.  The more threads were introduced, the less time it took for
the write-then-read speed to drop to a crawl.

Paradoxically, if the torture program was suspended temporarily even for
a duration of a few seconds, then when it was resumed the speed would shoot
back up to 18 mbytes / sec and then once again plummet.

We conclude from this that either the CPU on the Innodisk SATA Slim or the
algorithms being used are just too slow to deal with parallel writes.  There
is clearly a RAM cache which is being filled up: the speed of writing to the
NAND itself is not an issue (because if it was, then single-threaded writes
would be slow as well).  So it really is a firmware / CPU issue: when the
cache is full of random parallel data, the firmware / CPU goes into meltdown,
cannot cope, and the write speed suffers as a result.

To Innodisk's credit, they actually responded and were given a copy of
the SSD torture program and instructions on how to replicate the issue.
It will be interesting to see how they solve this one: updates will be
provided.

Conclusion: wait and see.

OCZ Vertex 32gb

This was also interesting.  The OS-based test (which was ordered to be run,
despite reservations that it would be ineffective) showed absolutely ZERO
data corruption.  Let's repeat that.  One of the worst drives, with the
worst smartctl report ever seen on a still-functional unit, was picked from a
batch with over 50% failure rates.  An OS was installed on it and it was left
to power-cycle over 100 times: there was ZERO data corruption.

What we can conclude from this is that power-loss had absolutely nothing to
do with the data-loss.  What it was then necessary to do was to devise a
test which would show where the problem actually was.  This test was the
"OS-based huge parallel writes" test.  Running this test for a mere 5 minutes
(bear in mind that there was no power-cycling) resulted in immediate data
corruption.

Further investigation was therefore warranted.  OCZ (before they went into
liquidation) had been advising - without explanation - to upgrade the firmware.
After working out how this can be done on GNU/Linux systems, and after
observing in passing that the firmware upgrade system was using syslinux
and FreeDOS, the firmware was "downgraded" to Revision 1.6.

The exact same OCZ - with an incredible array of failures, CRC errors,
lost sectors as reported by smartctl - when downgraded to firmware Revision
1.6 - then showed ZERO data corruption when the exact same OS-based
parallel write testing was carried out.

That is fascinating in itself.

Further investigation then dug up an interesting nugget: it turns out that
OCZ apparently had been warned by Sandforce not to enable a switch in
the firmware which would result in "increased speed".  OCZ, in their desperate
attempt to remain "king of the speed wars" ignored the advice that doing so
would result in data corruption.  The results correlate with this advice:
at higher speeds, data corruption is guaranteed to occur.

The hypothesis here is that at higher speeds there is a bug in the firmware
which results in the data being written incorrectly.  What was not determined
was whether that data was simply... not written at all or whether it
was written in the wrong place.  Given that out of the 50% failed drives a
number of them actually could not be seen on the SATA bus at all, it seems
likely that at high speeds, OCZs with the faulty firmware are actually capable
of overwriting their own firmware!  However, actually demonstrating this
is beyond the scope of the tests carried out, not least because it would
require wiping an entire drive, carrying out some parallel writes, then
checking the entire drive to see where the writes actually ended up.
This test may be added to the suite at a later date.
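One possible shape for such a test is sketched below.  It is a guess at an
approach, not the planned implementation: every block written carries its own
index in a small header, so a full scan after the run can tell a lost write
(block still blank) from a misplaced one (block stamped with another index).
The header layout, the names and the file-backed "drive" are all assumptions.

```python
import os
import struct
import tempfile
import zlib

BLOCK = 512
MAGIC = b"SSDT"
HEADER = struct.Struct("<4sQI")  # magic, block index, CRC32 of payload


def make_block(index):
    """Build one block stamped with the index it is meant to be written at."""
    payload = os.urandom(BLOCK - HEADER.size)
    return HEADER.pack(MAGIC, index, zlib.crc32(payload)) + payload


def classify(raw, index, expected):
    """Return 'ok', 'blank', 'lost', 'misplaced' or 'corrupt' for one block."""
    if raw == bytes(BLOCK):
        return "lost" if index in expected else "blank"
    magic, stamped, crc = HEADER.unpack_from(raw)
    if magic != MAGIC or zlib.crc32(raw[HEADER.size:]) != crc:
        return "corrupt"
    return "ok" if stamped == index else "misplaced"


def scan(path, n_blocks, expected):
    """Check every block; return {index: problem} for anything not ok/blank."""
    report = {}
    with open(path, "rb") as fh:
        for i in range(n_blocks):
            state = classify(fh.read(BLOCK), i, expected)
            if state not in ("ok", "blank"):
                report[i] = state
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        img = os.path.join(d, "drive.img")
        with open(img, "wb") as fh:
            fh.write(bytes(16 * BLOCK))      # "wipe" the whole drive
        with open(img, "r+b") as fh:
            fh.seek(3 * BLOCK)
            fh.write(make_block(3))          # a correct write
            fh.seek(9 * BLOCK)
            fh.write(make_block(5))          # simulate a write landing wrong
        print(scan(img, 16, {3, 5}))         # {5: 'lost', 9: 'misplaced'}
```

Against a real drive the scan would have to cover every sector, which is what
makes the test slow; it would not, of course, be able to see writes that
landed in areas the host cannot address, such as the firmware itself.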

Once the firmware was downgraded to Revision 1.6, the drive-level testing
was carried out (there was no point doing so while the drive's firmware could
not maintain data integrity even with power provided).  Surprisingly,
the drive fared pretty well.  Sustained random speed levels were good, but
data was lost intermittently when power was pulled, especially
(like the Toshiba) at higher speeds.

Conclusion: buy cheap, flash firmware to 1.6 if power-loss not important

Intel 320 and S3500

As already hinted at, these drives simply could not be made to fail, no matter
what was thrown at them.  The S3500 was power-cycled some 6,500 times over
several days: several terabytes of random data were written to and read from
that drive.  Not a single byte of data was lost.  Despite even the reads being
interrupted, there was not a single time - not even once - when the S3500
failed to verify the data that had been written.

The only strange behaviour observed was that the write-then-read cycle
speeds tended to fluctuate, sustaining around 25 to 30 mbytes / sec
continuously for several minutes, then dropping after 10 or so minutes
to 20 or even 12 mbytes / sec for one (and only one) write-read cycle.
The most likely explanation is some housekeeping going on in the firmware,
which would take up CPU cycles for short durations.

Conclusion: don't buy anything other than Intel SSDs

Conclusion

Right now, there is only one reliable SSD manufacturer: Intel.
That really is the end of the discussion.  It would appear that Intel is
the only manufacturer of SSDs that provides sufficiently large on-board
temporary power (probably in the form of supercapacitors) to cover writing
back the entire cache when power is pulled, even when the on-board cache
is completely full.

The Toshiba drives have some power-loss protection, but it's not
enough to cover an entire cache.  The Innodisk team have tried hard: their
datasheet shows that they are also providing power-loss protection as well
as detecting when power and current drop below unsustainable levels.
Given how difficult it is to even find out whether manufacturers provide this
kind of capability at all, it is worth giving Innodisk credit for
at least making that information publicly accessible.

The OCZ Management deserve everything that's happened to OCZ.  They should
have listened to Sandforce: the history of SSDs would have been a radically
different story.  The sad thing is that when the firmware is downgraded,
the drives are no worse than any other consumer-grade SSD.

The Crucial M4 is probably okay for general use, as are all the other drives
(except the Innodisk until they fix the firmware issues to get the sustained
write speeds back).  And so, if it's possible to buy them cheap, and
power-loss is not an issue, getting hold of second-hand OCZ Vertex drives
and downgrading the firmware would not be that bad an option.

However, if data integrity is really important, even when power could be
pulled at any time, then there really is absolutely no question: get an
Intel SSD.  It's as simple as that.

Future

On the TODO list is to write that test which wipes the drive, carries out
random writes, then checks the entire drive to see if the writes went in
the correct places.  On the face of it, putting data where it was told to is
such an obvious thing for a drive to do, but the OCZ Vertex drives show that
it's an assumption that cannot be made.

The Innodisk drives are ones to watch: the price and tiny size make it well
worth continuing to work with Innodisk to see if they can solve the problem of
parallel-write-cache overload.

Other drives may prove to be as good as the Intel S3500, however they were
not tested during this research because other drives were either way outside
of the budget, or it was impossible to find out from even exhaustive Internet
searches as well as speaking to suppliers whether the other potential
candidates had any form of power-loss protection.

If anyone would like to find out if a particular make or model of drive is
reliable under extreme torturing and power-interruption, contact
lkcl@lkcl.net: a contract can be arranged
and this report updated.

Lastly, it is worth noting that this testing was only carried out for
a maximum of a few days of sustained writing.  The long-term viability
obviously has not been tested.  However, given that deployment of over
500 Intel 320 SSDs has been carried out and only 3 failures observed
over several years, it would be reasonable to conclude that Intel S3500s
could be trusted long-term as well, bearing in mind - as a cautionary
tale - that smaller geometries mean more unreliability for the firmware
to contend with.

TODO Updated: 28th Dec 2013

Thank you to everyone who's recommended drives since this report was published.
The initial investigation is basically over: the Intel S3500 was top of the
list as it was the only one that passed.  However, based on unit cost it could
well be the case that the investigation is reopened.

Recommended drives for consideration at a later date:
* Samsung 840
* Crucial M500 (first Crucial drive with power-loss capacitors)
* Intel 540 series (which are apparently made differently from S3500 and 320s)
* stec-inc S230 SATA Slim

Recommended tests:
* Use new linux kernel 3.8 "cmd flush disable" option to check data integrity
* "Power brown-outs" (reducing current intermittently) as an advanced test