This goes out to all owners of a QNAP SS-839 Pro. I've owned the device for about 1.5 months now, and in general I'm pretty happy with it. However, the load cycle behaviour of the hard disks that I am using with the QNAP (8x WD5000BEVT 2.5-inch drives) made me doubt that my happiness would last long.
The problem became clear when I checked the SMART details of each drive in the admin interface. Only a few days had passed since I first powered the machine up, and the load cycle count was already beyond 3000 for most of the drives. This seemed a little much and meant that the drives would shut down and spin up about 400 times a day! It's just a belief, but I think that hard disk drives last longer if they run 24/7 instead of changing their power state that often. The QNAP options didn't seem to contain any switches to influence this from the host side.
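As a rough sanity check on that daily figure: the rate is simply the counter reading divided by the days the drive has been powered. Here is a minimal Python sketch. The function name is my own invention, and the numbers are the ones I quote in my e-mail to WD further down (about 8800 cycles after 3 weeks), assuming the counter started near zero.

```python
def cycles_per_day(load_cycles, days_powered):
    """Average load cycles per day, assuming the counter started near zero."""
    if days_powered <= 0:
        raise ValueError("days_powered must be positive")
    return load_cycles / days_powered

# About 8800 load cycles after roughly 3 weeks (21 days) of operation
rate = cycles_per_day(8800, 21)
print(round(rate))  # 419
```

That is in line with the "about 400 times a day" estimate above.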
SMART Statistics Illustrating the Problem
This is the chart that shows the load_cycle_count (193) SMART indicator for all of the drives. Note that I disconnected drive 8 on August 22nd. It was configured as the hot spare drive, and I didn't want to continue wasting its lifetime.
I googled for these symptoms and found a considerable number of reports of this problem. Repeatedly, Linux as well as 2.5-inch drives were affected, regardless of whether a NAS configuration or a classic PC was used. There was some advice, too, but it required manufacturer-specific tools that I couldn't use, and WD didn't offer any tools for the 5000BEVT drives. The WD data sheets didn't show any usable info on the power saving behaviour of the drives, so I decided to contact them.
WD Support
This is the text of the e-mail I wrote on August 19th:
Dear Sirs,
I have a question regarding the general power saving behavior of Scorpio Blue hard disk drives of which I own eight units that run in a QNAP SS-839 Pro NAS enclosure as a RAID-6. Watching the SMART data of the drives from time to time, I found out that the load cycle count of each disk goes up by 300 to 400 each day! That seems too much for my taste, and I'd like to find out why this happens.
The NAS is up and running all the time, so there is no impact on the general operation, but I'm worried about the reduced lifetime of the disks caused by these frequent shutdowns.
I noticed that the one drive that I have assigned as the spare drive does not shut down as often as the other drives. While most drives have about 8800 load cycles by now (after 3 weeks), the spare drive has ~5400 cycles. It is still questionable why the spare is this active as it should not be doing much except being checked every once in a while, but instead it is nearly as much used as all the other disks which makes me worry if it is capable of saving the RAID for long once another disk fails.
I haven't found any options in the QNAP device software to control any of this. Currently it is unknown who is responsible for the behavior at all, either the NAS device or the disks. People in the QNAP forums couldn't help me so far, I hope you can help me find out.
The question is, do Scorpio hard disks shut down on their own, e.g. after being idle for some minutes to save power? Is there any way I can control this? I'd prefer to have each disk run 4 hours or so before it shuts down. In my opinion it is better for a disk to keep running than to shut down and spin up again constantly. Am I right with this assumption? I cannot find a tool on the WDC web site that would let me view or change the hard disk parameters. Does such a tool exist? I know from other HDD manufacturers that they deliver tools to adjust parameters such as AAM, and read detailed SMART data. Maybe WD has some tool of this kind, too?
What do you recommend I should do?
Thanks a lot for your support!
Best regards,
Johannes Franke
And wow, the reply indeed came in the evening of the same day, but it was somewhat unsatisfying:
Dear JOHANNES,
Thank you for contacting Western Digital Customer Service and Support. My name is Tahimi.
I truly apologize for the inconvenience you are currently experiencing. Unfortunately Mr. Franke the first problem you are facing are the drives you are using on a RAID. Unfortunately we don't support Scorpio drives on a RAID array because they don't have the TLER feature enabled.
However, another bad news is that we don't have any feature or tool to manage the power or sleep timer of the drive.
To build a RAID array I recommend you to use our RAID edition drives like the RE3 or RE4. Please use the following link to get more info about the RAID edition drives.
http://www.wdc.com/en/products/index.asp?cat=2
If you have any further questions, please reply to this email and we will be happy to assist you further.
Sincerely,
Tahimi
Western Digital Service and Support
http://support.wdc.com
TLER (Time-Limited Error Recovery) is a nice feature which enables hard drives to report problems to the host controller quickly and lets both devices negotiate, so to speak, how to deal with the problem. Without this feature, a hard disk has about 8 seconds to come back with the requested data after the controller has sent the request. If it takes longer, the controller automatically sets it to FAILED state, and the user can't do anything but replace the drive. So TLER may help you use drives longer, but it is certainly not required for normal RAID operation. Furthermore, the advice to use the RE3 or RE4 series of WD drives does not apply to me, as they are only available in the 3.5-inch form factor. Tahimi probably didn't look up what an SS-839 Pro is and which drives it supports.
I was also mad at QNAP because they have the 5000BEVT in their compatibility list for the SS-839 Pro [1]. WD's reply didn't exactly confirm that compatibility…
Finding the Needed Hint
After this, I tried tweaking the QNAP settings, removed all power saving settings as far as they were offered in the admin interface, and also removed the disk test jobs that I had created to run a daily quick test and a weekly full test for each of the drives. This dropped the load cycle growth to about 10 a day per disk (August 25 in the chart), but for some reason, there must have been a change between August 30th and September 8th that caused the load cycles to go up again. I cannot remember what I did, and there was no way to return to the previous state that I considered good.
Up to this point I didn't even know whether the frequent load cycling came from the QNAP and some of its built-in power saving mechanisms, or whether the hard disks shut themselves down after some idle time, so I investigated a little more yesterday. Victor Meldrew’s blog [7] was very interesting to read and pointed exactly in the direction I wanted. Thanks, Victor!
Enter: wdidle3!
Even though WD support denied it, there is a tool with only one purpose: tweaking the built-in "idle3" timer that triggers a shutdown of the disk after eight seconds (!) of idle time by default. WD calls this "IntelliPark", I'd rather call it "StupiSuicide"... oh well, get the tool here [2] or here [3].
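To get a feeling for how such a short timer turns into hundreds of load cycles a day, here is a small Python sketch of a simplified model. It assumes that the timer fires once whenever the gap between two consecutive disk accesses exceeds the timeout, and that each firing counts as one load cycle. The function name and the access pattern (one access every 216 seconds) are hypothetical, chosen only for illustration; I don't know how the QNAP really accesses the disks.

```python
def count_load_cycles(access_times, idle_timeout=8.0):
    """Count timer firings for a sorted list of access timestamps (seconds).

    Simplified model: one load cycle whenever the gap between two
    consecutive accesses is longer than idle_timeout.
    """
    cycles = 0
    for prev, cur in zip(access_times, access_times[1:]):
        if cur - prev > idle_timeout:
            cycles += 1
    return cycles

# Hypothetical workload: one short access every 216 seconds for a full day
day = 24 * 60 * 60
accesses = list(range(0, day, 216))

print(count_load_cycles(accesses))                             # 399
print(count_load_cycles(accesses, idle_timeout=float("inf")))  # 0
```

In this model, any background activity that touches the disks at intervals longer than eight seconds costs one load cycle per access, while a disabled timer (modelled here as an infinite timeout) costs none.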
The problem is that wdidle3 is a pure DOS tool; it cannot be run in any Windows environment. If you try, you will just be informed that the application is not allowed to run the way you intended. That is, you need a DOS environment. Yes, such things still happen!
Creating a DOS Environment
Nowadays, FreeDOS and The Ultimate Boot CD (TUBCD) are pretty popular and royalty-free. I chose TUBCD [5] but you can just as well use FreeDOS [4]. You will also need a PC that supports booting from USB memory sticks and features a built-in SATA controller. Third-party controllers will most probably not be recognized by the tool.
To customize TUBCD and run it from a USB stick instead of a CD, follow the instructions at [6]. I skipped the chapters from "Adding floppy images" through "Generating customized ISO image", and instead placed wdidle3.exe in the ubcd\tools\win32 folder inside the path to which I had unpacked the ISO. It can be run from there after booting. The creation of a bootable USB stick is described in the chapter "Making UBCD memory stick" in the customization instructions.
Lights On: Tweaking the Drives
This is what I did (main steps):
• Created a bootable USB stick with The Ultimate Boot CD plus wdidle3 as described
• Shut down my PC, disconnected all hard drives from the mainboard, then connected the first of the eight drives I wanted to tweak
• Placed the USB stick in one of the USB ports
• Turned the PC back on and went to setup to modify the boot order: USB-Floppy (not USB-CD or USB-Harddisk) should be the first entry to ensure the system boots from the USB stick
• Watched the system boot from USB. There may be some dialogs during the boot process that you need to confirm.
• When the main menu appeared, chose UBCD FreeDOS
• After the command prompt appeared, entered c: to get to the root of the USB stick
• Entered cd \ubcd\tools\win32
Now you’re ready to use wdidle3 with the hard drive currently connected.
PLEASE NOTE: the steps described here worked for me, but may fail with your hardware. The wdidle3 tool is not officially designed to tweak the Scorpio Blue series of WD drives, and probably you are going to void your warranty once you use it. If you are extremely unlucky, the tool may corrupt your drive’s firmware and render it useless. Please keep in mind that you do this at your own risk. I am not responsible for any loss of data or damage to your hardware.
Please consider performing a full backup of your QNAP (or of each disk you are about to tweak) to make sure that a damaged drive is the worst thing that happens.
You can use wdidle3 with these parameters:
• /? – displays a command line help
• /R – reports the current timer status of the disk connected, along with the model and serial number
• /D – disables the timer completely, i.e. the drive will never shut down on its own even when idle
• /S{n} – sets the timer to the number of seconds specified in place of {n} (values from 1 to 255)
To disable the timer on the current hard disk, just enter
wdidle3 /d
Again, the hard disk model and serial number are shown, and the message should now also say that the timer is disabled. If so: congratulations! You are done!
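If you keep notes or a checklist for several drives, the switches listed above can be captured in a tiny helper that only builds the command line you would then type at the DOS prompt. This is a sketch in Python and not part of wdidle3. The function and action names are made up, and it does not run the tool; it merely enforces the 1 to 255 range for /S.

```python
def wdidle3_command(action, seconds=None):
    """Build a wdidle3 command line from the switches described above."""
    action = action.lower()
    if action == "report":
        return "wdidle3 /R"
    if action == "disable":
        return "wdidle3 /D"
    if action == "set":
        # /S{n} accepts whole seconds from 1 to 255
        if not isinstance(seconds, int) or not 1 <= seconds <= 255:
            raise ValueError("timer must be a whole number from 1 to 255")
        return f"wdidle3 /S{seconds}"
    raise ValueError(f"unknown action: {action}")

print(wdidle3_command("report"))    # wdidle3 /R
print(wdidle3_command("set", 255))  # wdidle3 /S255
```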
I repeated this for all of the eight drives, and didn’t even need to power the system down and back up to disconnect the current and connect the next drive. That was a great timesaver, but let me repeat, this is something that no reasonable support personnel would ever recommend. Disconnecting and plugging in hardware while powered on is a very dangerous game, particularly for internal SATA ports which are not hotplug-enabled. It worked anyway with my Gigabyte GA-880GA-UD3 mainboard, using the following pattern. Feel free to try, but be extremely careful, and remember that it’s at your own risk:
- After the current drive is done, disconnect the SATA data cable from it (the smaller of the two plugs)
- Then disconnect the SATA power cable (larger plug)
- Get the next drive and connect SATA power first
- Take a listen - you should hear the drive spin up
- Then connect the SATA data cable
- Wait some seconds for the drive to be ready
- Repeat the command line “wdidle3 /d”, and verify that it shows the serial number of the drive you have connected in step 3
- Wait some seconds to ensure that no more writing to the drive takes place
- Continue with the next disk at step 1
That way, I disabled the timers of all hard drives within a few minutes. Eventually the big moment came: all disks were reinserted into the QNAP (ensure the same order as before!), then powered up, and whoa, all data still there, wonderful! Since then, the load cycle count has not increased on any of the disks. Case closed.
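If you want to verify the result yourself, note the load cycle count (SMART attribute 193) of each disk right after the change and compare it with a later reading. A minimal Python sketch of that comparison follows. The disk names and numbers are made up purely for illustration and are not my actual SMART values.

```python
def load_cycle_deltas(before, after):
    """Return the increase in load cycle count per disk between two readings."""
    return {disk: after[disk] - before[disk] for disk in before if disk in after}

# Made-up readings for illustration only
before = {"disk1": 8800, "disk2": 8795, "disk3": 8810}
after = {"disk1": 8800, "disk2": 8795, "disk3": 8812}

deltas = load_cycle_deltas(before, after)
still_cycling = [disk for disk, delta in deltas.items() if delta > 0]
print(still_cycling)  # ['disk3']
```

An empty list means that no counter moved between the two readings, which is what I saw on all eight drives.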
Conclusion
WD's strategies to make their drives less power-consuming are a double-edged sword: while the drives do save power by quickly shutting down whenever possible, they tend to destroy themselves more rapidly than drives that keep running. In a mobile PC, the hard disk is one of the minor power consumers, and it's usually the display that draws most of the energy, so this power saving approach goes a little too far in my opinion. Another downside is that the OS is not aware of the drive's behaviour. On many laptops, the result is that the system seems to "hang" from time to time because the OS doesn't even know the drive has shut down. When it is accessed the next time (and in Windows, that's rather frequent), the HDD spinning up again causes a delay of a few seconds, making the system completely inaccessible until the drive is back up.
For sure, WD is not the only manufacturer who implements power saving like this. No manufacturer is likely to let you tweak predefined settings, and most of them will see this as a violation of warranty conditions. It's a matter of trust. I think the WD drives are well-built and will not fail spontaneously. At least that's less likely if they keep running instead of interrupting their operation over and over again. They will probably handle running 24/7 with ease, but only time can tell. If they do fail, I will probably have a hard time getting a replacement even if this happens within the warranty period. It's sad that there is so little official information about this. I hope this article may help other hardware desperados find what they need more quickly.
[1] QNAP compatibility list for 2.5-inch drives
[2] wdidle3 download at WD
[3] wdidle3 download at private mirror
[4] Use wdidle3 with FreeDOS (German)
[5] The Ultimate Boot CD download
[6] The Ultimate Boot CD customizing
[7] I Don’t Believe It! Blog about WD self-torture
Addendum
[2010-01-12 19:28] Today, I inspected the SMART values again, and still the load cycle count has not changed for any of the disks! Wonderful!
[Chart: statistics from mid-August 2010 until today]