Maker Pro

OT: Power faults and SSDs

Martin Riddle

Jan 1, 1970
A while back there was a discussion of power failures and their effects on
hard disks.
Well, not really a discussion, but a 'Yes, it does happen' vs. a 'No, it
doesn't happen' post.

Anyway, I found this article that points out SSDs are more susceptible
to power faults.
<http://www.zdnet.com/how-ssd-power-faults-scramble-your-data-7000011979/>

And here is the link to the paper.
<https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf>

You need a few extra fingers and toes to count the faults on SSDs vs.
the few faults on a rotational disk.

Cheers
 
DecadentLinuxUserNumeroUno

Jan 1, 1970
Thanks for that. Another good reason to have a UPS.
 
miso

Jan 1, 1970
> Thanks for that. Another good reason to have a UPS.

You obviously didn't read the paper. They created a fake power failure
by gating power through a kludged protoboard circuit. They did not turn
off the AC to the power supply. Who knows how much ringing there was on
the power supply pin due to wiring inductance and di/dt. A computer
power supply has some hold-up time. Their test is quite bogus.
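For a rough sense of scale on the ringing point: the voltage across a stray inductance is V = L * di/dt. The sketch below plugs in hypothetical values (about 100 nH of protoboard wiring and 1 A interrupted in 10 ns by the switching FET); none of these numbers come from the paper, and a real waveform would ring rather than show a single clean spike.

```python
def inductive_spike(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Magnitude of V = L * di/dt in volts, assuming a linear current ramp."""
    if delta_t_s <= 0:
        raise ValueError("delta_t_s must be positive")
    return inductance_h * abs(delta_i_a) / delta_t_s


if __name__ == "__main__":
    # Hypothetical: ~100 nH of jumper wiring, 1 A switched off in 10 ns by the FET.
    v = inductive_spike(100e-9, 1.0, 10e-9)
    print(f"Estimated spike: {v:.1f} V")
```

With those assumed values the estimate is about 10 V, which is large compared with a 5 V or 3.3 V supply rail; that is the kind of overshoot an abrupt FET cut-off can produce.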

To elaborate, these wankers left the data bus connected to the SSD while
they gated the supply voltage. This is a prescription for inducing latch-up.

Direct gating of the power to a device via a FET is just bad. Yes, it is
done, and yes, most manufacturers will at least lab-test their parts to
ensure reasonable behavior under such circumstances, but such
shenanigans are not in the test flow. Chips provide power-down
pins/modes. Use them.

While I'm at it, most home UPSs are cheesy square-wave (OK, "modified
sine wave") inverters that hopefully switch on in time. Data centers use
double-conversion UPSs.
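Whether a standby UPS switches on in time comes down to comparing its transfer time with the PSU's hold-up time, which is roughly how long the bulk capacitor can carry the load after input power disappears: t = C(V1^2 - V2^2) / (2P) for a constant-power load. The capacitor size, voltages, load, and 8 ms transfer time below are hypothetical illustrations, not figures for any real supply or UPS.

```python
def hold_up_time(c_farads: float, v_start: float, v_min: float, load_watts: float) -> float:
    """Seconds a bulk capacitor can feed a constant-power load while its voltage
    falls from v_start to v_min: t = C * (v_start^2 - v_min^2) / (2 * P)."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    if v_min >= v_start:
        return 0.0
    energy_j = 0.5 * c_farads * (v_start ** 2 - v_min ** 2)
    return energy_j / load_watts


def ups_covers_dropout(hold_up_s: float, transfer_s: float) -> bool:
    """True if the PSU can ride through the UPS switchover gap."""
    return transfer_s < hold_up_s


if __name__ == "__main__":
    # Hypothetical PSU: 470 uF bulk cap at 380 V, regulation lost below 300 V, 300 W load.
    t = hold_up_time(470e-6, 380.0, 300.0, 300.0)
    print(f"Hold-up time: {t * 1000:.1f} ms")
    # Hypothetical standby UPS with an 8 ms transfer time.
    print("Rides through switchover:", ups_covers_dropout(t, 8e-3))
```

With these made-up values the hold-up time is about 43 ms, so an 8 ms switchover would be covered. Halving the capacitance or doubling the load halves the hold-up time, which is where the margin can disappear.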
 
Jasen Betts

Jan 1, 1970
> The tests were also done on unformatted drives, which eliminates the
> benefits offered by the BIOS and operating system in detecting errors
> and assigning alternate blocks.

Are you suggesting that software could have corrected some of the
hardware failures? The problem is that it could also compound them, and
it makes comparisons harder.

> I couldn't tell from the long description of the methodology at what
> point in the write cycle they disconnected the power. If they pulled
> the plug in the middle of a block of data, they should not expect it
> to be written to the drive.

True. Are you suggesting they did?
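If power is cut partway through a block, the block can't be expected to be complete, but a test can at least tell an interrupted write from an old one. A common approach (not necessarily the one used in the paper) is to stamp each block with a sequence number and a checksum and verify both on read-back. The 4096-byte block size and the header layout here are assumptions for illustration.

```python
import struct
import zlib

BLOCK_SIZE = 4096               # assumed block size
HEADER = struct.Struct("<QI")   # hypothetical layout: 8-byte sequence number, 4-byte CRC32


def make_block(seq: int, payload_byte: int = 0xAB) -> bytes:
    """Header (seq, crc) followed by a fill pattern; the CRC covers seq + body."""
    body = bytes([payload_byte]) * (BLOCK_SIZE - HEADER.size)
    crc = zlib.crc32(struct.pack("<Q", seq) + body)
    return HEADER.pack(seq, crc) + body


def check_block(block: bytes, expected_seq: int) -> str:
    """Classify a read-back block as 'ok', 'torn' (short or checksum mismatch),
    or 'stale' (intact, but an older sequence number than expected)."""
    if len(block) != BLOCK_SIZE:
        return "torn"
    seq, crc = HEADER.unpack_from(block)
    body = block[HEADER.size:]
    if zlib.crc32(struct.pack("<Q", seq) + body) != crc:
        return "torn"
    if seq != expected_seq:
        return "stale"
    return "ok"
```

A block whose checksum fails is classed as torn; a block that checks out but carries an older sequence number means the new write never landed at all.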

> The failure scenario is somewhat contrived. It creates a situation where
> the drive is writing all the time, and the power drops during a write.
> In reality, the drive spends most of its time idle. It does mostly
> reads, and does comparatively fewer writes.

That depends on what it's being used for. OLTP, DVR, and video editing
all write with high frequency, particularly if the machine has plenty of
RAM (reads are then mostly served from the cache).

> You can see for yourself by using some performance monitoring tool,
> and comparing the bytes read to the bytes written.

These are real numbers from an OLTP database server:

34 734 222 256 sectors written vs. 8 381 621 185 read

Roughly 4:1 in favour of writes.
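On Linux, one place to get such figures is /proc/diskstats, which holds cumulative per-device counters since boot. The tool behind the numbers above isn't stated; this sketch assumes the standard field layout (sectors read in the 6th column, sectors written in the 10th, counted in 512-byte units) and simply prints the written:read ratio per device.

```python
from pathlib import Path


def sectors_by_device(diskstats_text: str) -> dict[str, tuple[int, int]]:
    """Map device name -> (sectors_read, sectors_written).

    Assumed column order: major, minor, name, reads completed, reads merged,
    sectors read, ms reading, writes completed, writes merged, sectors written, ...
    """
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        result[fields[2]] = (int(fields[5]), int(fields[9]))
    return result


def write_read_ratio(sectors_read: int, sectors_written: int) -> float:
    """Sectors written per sector read (inf if nothing has been read)."""
    if sectors_read == 0:
        return float("inf")
    return sectors_written / sectors_read


if __name__ == "__main__":
    stats = Path("/proc/diskstats")
    if stats.exists():
        for dev, (rd, wr) in sectors_by_device(stats.read_text()).items():
            print(f"{dev}: {wr} written / {rd} read = {write_read_ratio(rd, wr):.2f}")
```

Plugging in the posted counters gives 34 734 222 256 / 8 381 621 185, or about 4.14, consistent with the "roughly 4:1" above.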

> The authors didn't bother supplying the names of the drive vendors, or
> the models tested. I suppose this could be part of some kind of
> double-blind test method, but the final results should have included
> the maker and model.

They probably wanted to avoid the fervid attention of lawyers.
 
Jasen Betts

Jan 1, 1970
> It's hard for an OS to determine "soft" errors by looking at read
> timing, due to the presence of the read cache in the drive (which you
> can disable), and due to the fact that occasional, nonrepeating read
> errors are actually fairly common due to vibration and electronic
> noise.

Yeah, I saw a demo on YouTube (possibly "Shouting in the Datacenter")
and was able to replicate those results on a smaller scale in the office.

On Linux:

    ddrescue /dev/sda /dev/null --force

Then shout at the drive and watch the data rate dip.
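ddrescue already reports its read rate as it runs; for anyone who wants the same kind of live figure from a script, here is a minimal Python sketch that reads a device or file sequentially and prints MB/s once per interval. The chunk size and interval are arbitrary choices, reading a raw device usually needs root, and unlike ddrescue it does no error recovery. Repeat runs over a small file may be served from the page cache rather than the disk.

```python
import sys
import time
from typing import Iterator, Optional


def iter_throughput(path: str, chunk_size: int = 1 << 20,
                    interval_s: float = 1.0,
                    max_bytes: Optional[int] = None) -> Iterator[float]:
    """Read `path` sequentially, yielding the read rate in MB/s once per interval.
    If max_bytes is set, stops after the first chunk that reaches it."""
    total = 0
    window_bytes = 0
    window_start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of device/file
            total += len(chunk)
            window_bytes += len(chunk)
            elapsed = time.monotonic() - window_start
            if elapsed >= interval_s and elapsed > 0:
                yield window_bytes / elapsed / 1e6
                window_bytes = 0
                window_start = time.monotonic()
    elapsed = time.monotonic() - window_start
    if window_bytes and elapsed > 0:
        yield window_bytes / elapsed / 1e6  # final partial interval


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 readrate.py /dev/sda   (then shout at the drive)
    for rate in iter_throughput(sys.argv[1]):
        print(f"{rate:8.1f} MB/s", flush=True)
```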
 