Running Reweight By Utilization In Octopus
Need to run a reweight by utilization but can't with Octopus? Try this...
ceph osd reweight-by-utilization 0.5 -- 150
Errors this works around:
- Invalid command: unused arguments
- xxxxx not in --no-increasing
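For a dry run, there's also `ceph osd test-reweight-by-utilization`, and `ceph osd df` shows the resulting REWEIGHT column afterwards. A quick sketch, assuming the same `--` quirk applies to the test variant:

# Dry run: reports what would change without touching anything
ceph osd test-reweight-by-utilization 0.5 -- 150
# Afterwards, check the REWEIGHT column
ceph osd df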
A few of my Ceph nodes with 60-ish disks each were experiencing frequent reboots. It turned out kernel.nmi_watchdog was rebooting them because the disks were holding things up under very high load. Turning it off via `echo "kernel.nmi_watchdog=0" > /etc/sysctl.d/99-watchdog.conf` solved the problem, although I suspect there are better ways to tune the NMI watchdog to fix this. I'm being lazy.
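The drop-in file only takes effect at the next boot (or when sysctl re-reads its config), so if you want it applied immediately, something along these lines should do it as root:

# Flip it live without rebooting
sysctl -w kernel.nmi_watchdog=0
# Or re-read everything under /etc/sysctl.d/
sysctl --system
# Verify
cat /proc/sys/kernel/nmi_watchdog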
Hosts in my cluster all have two 10GbE links to a switch, bonded with LACP, which normally gives them an effective 20Gbps link. But really, even 5Gbps would be enough for most hosts: the kernel in most boxes, combined with drivers and such, isn't going to push 20Gbps, much less 40Gbps, so faster isn't worth it.
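For reference, a bare-bones LACP (802.3ad) bond can be thrown together with iproute2 like this. The interface names and address are made up for the example, the switch ports need to be configured for LACP too, and in practice you'd persist this in your distro's network config:

# Create an 802.3ad bond and enslave the two 10GbE ports (names are examples)
ip link add bond0 type bond mode 802.3ad miimon 100
ip link set eno1 down && ip link set eno1 master bond0
ip link set eno2 down && ip link set eno2 master bond0
ip link set bond0 up
ip addr add 192.168.1.10/24 dev bond0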
Note: Don't bother with IPoIB. You're better off with 10GbE x2 via LACP. Although if the choice is between 40Gbps InfiniBand running IPoIB and a couple of 1GbE lines, go IPoIB.
Why these drives? They're the main data drives I've had and used for years. The only ones I'm going to remove soon are a subset of the 2TB drives with high spin times. Some of them have more than 8.5 years of spin time. I'll probably remove any disk with more than 6 years of spin time as a preventive measure.
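Spin time here is just SMART power-on hours; 6 years works out to roughly 52,500 hours. Checking it is a one-liner (device name is an example):

# Attribute 9 is Power_On_Hours
smartctl -A /dev/sda | grep -i power_on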
One GREAT option for improving OSD performance with spinning disks, especially slow ones, is to use a redundant array of SSDs for the BlueStore WAL (its journal). If you've got SSD space to spare you could even put the RocksDB on SSD too, but that needs a LOT more space. The WAL only needs a couple of GiB per OSD; RocksDB needs a LOT more.
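As a sketch of what that looks like at OSD-creation time (the device and LV names here are invented for the example), ceph-volume lets you point the WAL and DB at separate devices:

# Spinner holds the data; WAL and (optionally) DB live on SSD-backed LVs
ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.wal ssd-vg/wal-sdb \
    --block.db ssd-vg/db-sdb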
A couple of months back I changed the value of "osd_memory_target" for all of my OSDs from 4GiB to 1.5GiB. That change has stopped all RAM-related issues on my cluster. I suspect, but can't prove, a small performance drop; it's well worth it in my case.
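One way to make that change cluster-wide is through the config database, with 1.5GiB expressed in bytes:

# 1.5GiB = 1610612736 bytes
ceph config set osd osd_memory_target 1610612736
# Check what a particular OSD ends up with (osd.0 is an example)
ceph config get osd.0 osd_memory_target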
One issue I've recently run into with a failed SATA drive in one of my NDS-4600 units is that Linux frequently tries to recover the drive by resetting the bus. This takes out a few other disks in the group with it. The resulting IO timeouts cause problems for my Ceph OSDs using those disks.
It should be noted that only some types of disk failures cause this. The Linux kernel only issues the host bus resets in certain cases (I think), and I suspect the errors on the other disks are caused by the failed disk itself.
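Not a fix, but related: the per-device SCSI command timeout is visible (and tunable) in sysfs, which at least shows how long the kernel will wait on a disk before it starts error recovery. The device name is an example, and the change doesn't persist across reboots:

# Current command timeout in seconds for one disk
cat /sys/block/sdc/device/timeout
# Raise it temporarily
echo 60 > /sys/block/sdc/device/timeout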
Ceph has two forms of scrubbing that it runs periodically: Scrub and Deep Scrub.
A Scrub is basically an fsck for replicated objects. It ensures that each object's replicas all exist and are the latest version.
A Deep Scrub is a full checksum validation of all data.
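For poking at this by hand, scrubs can be triggered per placement group and scheduled scrubbing can be toggled cluster-wide (the PG id below is an example):

# Trigger a scrub or a deep scrub on one PG
ceph pg scrub 1.2f
ceph pg deep-scrub 1.2f
# Pause scheduled scrubbing cluster-wide, then re-enable it
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd unset noscrub
ceph osd unset nodeep-scrub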
My Ceph cluster at home isn't designed for performance. It's not designed for maximum availability. It's designed for a low cost per TiB while still maintaining usability and decent disk-level redundancy. Here is some recent tuning to help with performance and corruption prevention....
Previously I posted about the used 60-bay DAS units I recently acquired and racked. Since then I've figured out the basics of using them and have them up and working.