Hard Drive Stats for Q1 2017

 

In this update, we’ll review the Q1 2017 and lifetime hard drive failure rates for all our current drive models, and we’ll look at a class of drives that is relatively new for us: “enterprise” drives. We’ll share our observations and insights, and as always, you can download the hard drive statistics data we use to create these reports.

Our Hard Drive Data Set

Backblaze has now recorded and saved daily hard drive statistics from the drives in our data centers for over 4 years. This data includes the SMART attributes reported by each drive, along with related information such as the drive serial number and failure status. As of March 31, 2017, we had 84,469 operational hard drives, of which 1,800 were boot drives and 82,669 were data drives. For our review, we remove drive models of which we have fewer than 45 drives, leaving 82,516 hard drives to analyze for this report. There are currently 17 different hard drive models, ranging in size from 3 to 8 TB. All of these models are 3½” drives.
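The model filter described above can be sketched in a few lines of Python. This is only an illustration: the function name `models_to_analyze` and the inventory contents are hypothetical, and only the 45-drive threshold comes from the report. By the report's own figures, the filter sets aside 82,669 − 82,516 = 153 data drives.

```python
from collections import Counter

MIN_DRIVES_PER_MODEL = 45  # threshold stated in the report


def models_to_analyze(drive_models, min_drives=MIN_DRIVES_PER_MODEL):
    """Return {model: count} for models with at least `min_drives` drives."""
    counts = Counter(drive_models)
    return {model: n for model, n in counts.items() if n >= min_drives}


# Hypothetical inventory: one entry per data drive, holding its model name.
inventory = ["MODEL-A"] * 120 + ["MODEL-B"] * 45 + ["MODEL-C"] * 12

kept = models_to_analyze(inventory)
print(kept)                # MODEL-C is dropped: it has fewer than 45 drives
print(sum(kept.values()))  # number of drives left to analyze
```

A model with exactly 45 drives is kept, which matches the report including a model of which only 45 drives are deployed.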

Hard Drive Reliability Statistics for Q1 2017

Since our last report in Q4 2016, we have added 10,577 hard drives, bringing us to the 82,516 drives we’ll focus on. We’ll start by looking at the statistics for Q1 2017, the period of January 1, 2017 through March 31, 2017. These cover the drives that were operational during that period, ranging in size from 3 to 8 TB, as listed below.

Observations and Notes on the Q1 Review

You’ll notice that some of the drive models have a failure rate of “0” (zero). Here a failure rate of zero means there were no drive failures for that model during Q1 2017. Later, we will cover how these same drive models fared over their lifetime. Why is the quarterly data important? We use it to look for anything unusual. For example, in Q1 the 4 TB Seagate drive model ST4000DX000 has a high failure rate of 35.88%, while the lifetime annualized failure rate for this model is much lower, 7.50%. In this case, we only have 170 drives of this particular model, so the failure rate is not statistically significant, but such information could be useful if we were using several thousand drives of this model.
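To see why a quarterly rate on 170 drives is so volatile, it helps to write the calculation out. The sketch below assumes the drive-days definition of annualized failure rate (failures divided by drive-years of operation); the report itself does not restate the formula, and the fleet sizes and the 90-day quarter are illustrative assumptions, not figures from the data set.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of operation."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100 * failures / drive_years


# Assumption: 170 drives running for all 90 days of the quarter.
small_fleet_days = 170 * 90
# Assumption: a hypothetical fleet of 5,000 drives over the same 90 days.
large_fleet_days = 5000 * 90

for failures in (1, 2):
    small = annualized_failure_rate(failures, small_fleet_days)
    large = annualized_failure_rate(failures, large_fleet_days)
    print(f"{failures} failure(s): small fleet {small:.2f}%, large fleet {large:.2f}%")
```

Under these assumptions each additional failure moves the small fleet's quarterly rate by roughly 2.4 percentage points, while the same failure barely registers in the larger fleet.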

There were a total of 375 drive failures in Q1. A drive is considered failed if one or more of the following conditions are met:

  • The drive will not spin up or connect to the OS.
  • The drive will not sync, or stay synced, in a RAID array (see note below).
  • The SMART stats we use show values above our thresholds.

Note: Our stand-alone Storage Pods use RAID-6, while our Backblaze Vaults use our own open-sourced implementation of Reed-Solomon erasure coding. Both techniques have a concept of a drive not syncing or staying synced with the other member drives in its group.
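The three conditions above amount to a simple "any of" rule. A minimal sketch follows, assuming a hypothetical `DriveStatus` record; the attribute names and threshold values are placeholders, since the report does not list which SMART attributes or limits are used.

```python
from dataclasses import dataclass, field


@dataclass
class DriveStatus:
    spins_up: bool
    connects_to_os: bool
    stays_synced: bool
    smart_values: dict = field(default_factory=dict)


def is_failed(status, smart_thresholds):
    """True if any one of the three failure conditions holds."""
    if not (status.spins_up and status.connects_to_os):
        return True
    if not status.stays_synced:
        return True
    # A missing attribute is treated as 0, i.e. not over the limit.
    return any(
        status.smart_values.get(attr, 0) > limit
        for attr, limit in smart_thresholds.items()
    )


# Hypothetical thresholds, for illustration only.
thresholds = {"reallocated_sectors": 0, "pending_sectors": 0}

healthy = DriveStatus(True, True, True, {"reallocated_sectors": 0})
flagged = DriveStatus(True, True, True, {"reallocated_sectors": 8})
print(is_failed(healthy, thresholds), is_failed(flagged, thresholds))
```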

The annualized hard drive failure rate for Q1 in our current population of drives is 2.11%. That’s a bit higher than in previous quarters, but it might be a function of us adding 10,577 new drives to our count in Q1. We’ve found that there is a slightly higher rate of drive failures early on, before the drives “get comfortable” in their new surroundings. This is seen in the drive failure rate “bathtub curve” we covered in a previous post.

10,577 More Drives

The additional 10,577 drives are really the net of 11,002 added drives less 425 drives that were removed. The removed drives are separate from the 375 drives marked as failed, since those were replaced one for one. The 425 drives were primarily removed from service due to migrations to higher-density drives.
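The bookkeeping above can be checked with a few lines of arithmetic. The variable names are illustrative, and the end-of-Q4 count is derived from the figures in this report rather than quoted from it.

```python
added = 11_002
removed_for_migration = 425
failed_and_replaced = 375  # swapped one for one, so no net change in the count

net_change = added - removed_for_migration
drives_now = 82_516
implied_count_at_end_of_q4 = drives_now - net_change

print(net_change)                  # net drives added during Q1
print(implied_count_at_end_of_q4)  # drives under analysis before Q1's additions
```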

The table below shows the breakdown of the drives added in Q1 2017 by drive size.

Table: Drive changes in Q1 2017

Lifetime Hard Drive Failure Rates for Current Drives

 

The table below shows the failure rates for the hard drive models we had in service as of March 31, 2017. It covers the period beginning in April 2013 and ending March 31, 2017.

Table: Hard drive failure rates, April 2013 to March 2017

The annualized failure rate for the drive models listed above is 2.07%. This compares to 2.05% for the same collection of drive models as of the end of Q4 2016. The increase makes sense given the rise in the Q1 2017 failure rate over previous quarters noted earlier. No new models were added during the current quarter and no old models exited the collection.

Backblaze is Using Enterprise Drives, Oh My!

Some of you may have noticed we now have a significant number of enterprise drives in our data center, namely 2,459 Seagate 8 TB drives, model ST8000NM055. The HGST 8 TB drives were the first true enterprise drives we used as data drives in our data centers, but we only have 45 of them. So, why did we suddenly decide to purchase 2,400+ of the Seagate 8 TB enterprise drives? There was a very short period of time, as Seagate was introducing new drive models and phasing out old ones, when the cost per terabyte of the 8 TB enterprise drives fell within our budget. Previously we had purchased 60 of these drives to test in one Storage Pod and were satisfied they could work in our environment. When the opportunity arose to acquire the enterprise drives at a price we liked, we couldn’t resist.

Here’s a comparison of the 8 TB consumer drives versus the 8 TB enterprise drives to date:

What have we learned so far…

  1. It is too early to compare failure rates: The oldest enterprise drives have only been in service for about 2 months, with most being placed into service just prior to the end of Q1. The Backblaze Vaults the enterprise drives reside in have yet to fill up with data. We’ll need at least 6 months before we can start comparing failure rates, as the data is still too volatile. For example, if the current enterprise drives were to experience just 2 failures in Q2, their lifetime annualized failure rate would be about 0.57%.

  2. The enterprise drives load data faster: The Backblaze Vaults containing the enterprise drives loaded data faster than the Backblaze Vaults containing consumer drives. The vaults with the enterprise drives loaded on average 140 TB per day, while the vaults with the consumer drives loaded on average 100 TB per day.

  3. The enterprise drives use more power: No surprise here, as according to the Seagate specifications the enterprise drives use 9W on average in idle and 10W on average in operation, while the consumer drives use 7.2W on average in idle and 9W on average in operation. For a single drive this may seem insignificant, but when you put 60 drives in a 4U Storage Pod chassis and then 10 chassis in a rack, the difference adds up quickly.

  4. Enterprise drives have some nice features: The Seagate enterprise 8 TB drives we used have PowerChoice™ technology that gives us the option to use less power. The data loading times noted above were recorded after we changed to a lower power mode. In short, the enterprise drives in a low power mode still stored 40% more data per day on average than the consumer drives.
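The power and load-rate figures in the list above can be scaled up with a short calculation. The sketch below assumes a rack fully populated with one class of drive and counts drive power only (no chassis, cooling, or power-supply overhead); the names are illustrative.

```python
DRIVES_PER_POD = 60
PODS_PER_RACK = 10

# Average watts per drive, as quoted from the Seagate specifications.
ENTERPRISE = {"idle": 9.0, "operating": 10.0}
CONSUMER = {"idle": 7.2, "operating": 9.0}


def extra_watts_per_rack(mode):
    """Additional rack-level draw of enterprise drives over consumer drives."""
    drives = DRIVES_PER_POD * PODS_PER_RACK
    return (ENTERPRISE[mode] - CONSUMER[mode]) * drives


for mode in ("idle", "operating"):
    print(f"{mode}: {extra_watts_per_rack(mode):.0f} W extra per rack")

# Average load rates per vault, in TB per day.
enterprise_tb_per_day, consumer_tb_per_day = 140, 100
gain = (enterprise_tb_per_day - consumer_tb_per_day) / consumer_tb_per_day
print(f"load rate gain: {gain:.0%}")
```

With 600 drives in a rack, a per-drive difference of 1 to 1.8 W becomes roughly 0.6 to 1.1 kW per rack, and the 140 versus 100 TB per day figures correspond to the 40% gain quoted in item 4.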

While it is great that the enterprise drives can load data faster, drive speed has never been a bottleneck in our system. A system that can load data faster will just “get in line” more often and fill up faster. There is always extra capacity when it comes to accepting data from customers.

Wrapping Up

We’ll continue to monitor the 8 TB enterprise drives and keep reporting our findings.