3 Jun 2020
My Ceph cluster at home isn't designed for performance. It's not designed for maximum availability. It's designed for a low cost per TiB while still maintaining usability and decent disk-level redundancy. Here is some recent tuning to help with performance and corruption prevention.
- mds
    - mds_cache_memory_limit: 4GiB => 8GiB
        - Target maximum memory usage of MDS cache
        - The default wasn't enough for the CephFS instance I have. Probably related to the number of objects or the frequency of FS scans for metadata.
- osd
    - osd_deep_scrub_interval: 1w => 8w
        - Deep scrub each PG (i.e., verify data checksums) at least this often
        - Deep scrubs on 200TiB of data (about to grow by another 150TiB) aren't feasible once a week
    - osd_op_queue: wpq => mclock_client
        - Which operation priority queue algorithm to use
        - Changed from wpq to mclock_client so that per-OSD disk queuing takes the client and workload type into account
    - osd_max_backfills: 1 => 4
        - Maximum number of concurrent local and remote backfills or recoveries per OSD
        - While most spinning disks should keep this at one, this should help overcome network/CPU latency. I think.
    - osd_recovery_max_active: 0 => 4
        - Number of simultaneous active recovery operations per OSD (overrides _ssd and _hdd if non-zero)
        - Helps ensure the recovery operations go as fast as possible. This may impact client read performance during recovery, but I'd prefer that over a higher chance of corruption, data loss, or other problems.
    - osd_recovery_max_single_start: 1 => 4
        - The maximum number of recovery operations per OSD that will be newly started when an OSD is recovering
        - Same reasoning as osd_max_backfills above
    - osd_scrub_auto_repair: false => true
        - Automatically repair damaged objects detected during scrub
        - If something corrupt or out of sync is found, let's get that fixed asap.
    - osd_scrub_during_recovery: false => true
        - Allow scrubbing when PGs on the OSD are undergoing recovery
        - Same reasoning as osd_max_backfills above
    - osd_scrub_load_threshold: 0.5 => 4
        - Allow scrubbing when system load divided by number of CPUs is below this value
        - The load on my OSD servers is usually above 1 during normal operations. It only seems to go above 4 during heavy recovery and other heavy ops, so this seemed like a good middle ground.
    - osd_scrub_max_interval: 1w => 4w
        - Scrub each PG no less often than this interval
        - With 15 million objects / 220 million replicas as of 2020-06-03, scrubbing every day seems like overkill.
    - osd_scrub_min_interval: 1d => 1w
        - Scrub each PG no more often than this interval
        - Same reasoning as osd_scrub_max_interval above
- global
    - target_max_misplaced_ratio: 5% => 1%
        - Max ratio of misplaced objects to target when throttling data rebalancing activity
        - Given the large number of objects in my cluster, I figured this should be low so that rebalancing is a higher priority.
These can be fetched with `ceph config get <level 0> <level 1>` and set with `ceph config set <level 0> <level 1> <value>`, where "level 0" is the top-level item in the list above (mds, osd, or global) and "level 1" is the option name at the second indentation level.
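As a concrete sketch of what that looks like for the changes above (note that interval options such as osd_deep_scrub_interval are stored in seconds and size options such as mds_cache_memory_limit in bytes, so the human-friendly values need converting):

```sh
# Check the current value first (reported in seconds for interval options).
ceph config get osd osd_deep_scrub_interval

# 8 weeks = 8 * 7 * 24 * 3600 = 4838400 seconds.
ceph config set osd osd_deep_scrub_interval 4838400

# 8 GiB = 8 * 1024^3 = 8589934592 bytes.
ceph config set mds mds_cache_memory_limit 8589934592

# Plain integers, strings, and ratios can be set directly.
ceph config set osd osd_max_backfills 4
ceph config set osd osd_op_queue mclock_client
ceph config set global target_max_misplaced_ratio 0.01
```

Note that some of these (osd_op_queue in particular) only take effect after the OSD daemons are restarted.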
Details for the OSD configuration can be found at https://docs.ceph.com/docs/octopus/rados/configuration/osd-config-ref/
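As a quick sanity check that deep scrubs still complete within the longer 8-week window, something like the following can list the PGs with the oldest deep-scrub timestamps. This is just a sketch: it assumes jq is installed and that the pg dump JSON exposes pgid and last_deep_scrub_stamp per PG, which may vary by release.

```sh
# Oldest deep-scrub timestamps first. The JSON layout and field names are
# assumptions based on recent releases and may differ on other versions.
ceph pg dump pgs --format json 2>/dev/null | \
  jq -r '(.pg_stats // .pg_map.pg_stats // .)[] | [.pgid, .last_deep_scrub_stamp] | @tsv' | \
  sort -k2 | head -20
```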