[00:43:03] *** Quits: bwalker_ (~bwalker@ip70-190-226-244.ph.ph.cox.net) (Quit: Leaving)
[01:18:11] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 256 seconds)
[03:04:16] *** Joins: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl)
[03:10:21] *** Joins: tomzawadzki (uid327004@gateway/web/irccloud.com/x-xkjurrmfknaromlc)
[03:21:38] As bdevs get more complex (i.e. the raid bdev) -- i was wondering if there are any plans to start using things like, for example, libnvl (from FreeBSD)
[05:20:43] *** Quits: darsto (~darsto@89-78-174-111.dynamic.chello.pl) (Ping timeout: 268 seconds)
[05:54:28] maybe you can expand on your idea a bit for those not familiar with libnvl?
[06:57:01] Well, it's a small library that allows you to store keys and values of any type. So for example if you have a config with certain properties, you can do things like nvlist_lookup(nvl, key, &value) -- instead of all those custom linked lists that are used right now
[06:57:31] you can store arrays, other nvls, arrays of nvls -- and get a binary format of it and (for example) store it on disk
[07:02:04] *** Quits: bwalker (bwalker@nat/intel/x-hafswerldklqlmzb) (ZNC - http://znc.in)
[07:04:47] *** Joins: bwalker (~bwalker@134.134.139.72)
[07:04:47] *** Server sets mode: +cnrt
[07:04:47] *** Server sets mode: +cnrt
[07:04:47] *** ChanServ sets mode: +o bwalker
[07:38:23] so right now the only config direction that I've heard talk of is all the work to move to JSON-formatted configuration where the actual storage of the data is left to the SPDK application (we're moving away from the .conf file format stuff that's still floating around in a lot of places). There are only a few modules that store config on disk and I've not heard of anyone looking to introduce a new lib for managing that, anyone else?
[07:47:58] no, there aren't any plans to use anything like libnvl - we are trying to get away from the existing "ini" config files like peluse mentioned, but want to center configuration moving forward all around JSON and JSON-RPC
[08:13:20] That's fine (for config) -- it's more about getting/setting values in memory and storing them on disk as well (like raid config). (However, nvlist can be serialised to JSON very easily as well, which is useful for dumping configs)
[08:18:44] (btw it's libnv not nvl, my bad)
[08:26:49] I think it might be interesting to see a partial proof-of-concept patch against the RAID module, for example, if you were willing to work on it. Actually seeing it in action might help clarify the benefits, provided you are OK putting in the time knowing it might not go anywhere :)
[08:29:20] if this is specific to RAID config, then using DDF metadata I think is a stronger option - are there other cases besides RAID config that something like libnvl would help?
[08:39:25] DDF (iirc) only defines the fields, their meaning and sizes. It does not provide an interface to actually work with them and store them.
[08:39:31] But despite that I think there is more value, yes.
[08:40:55] For example when creating the raid_bdev config, this could be a single nvlist, which in the case of two base_bdevs as children could simply be another nvlist that is inserted in the main raid_bdev nvlist
[08:42:38] I guess the best example I can think of is how ZFS uses it today (although the lib used is different -- namely libnvpair)
[08:43:36] https://github.com/zfsonlinux/zfs/blob/master/module/zfs/vdev.c#L541
[09:07:50] *** Joins: LiuXiaodong (86868b4b@gateway/web/freenode/ip.134.134.139.75)
[10:07:26] *** Joins: JoeGruher (86868b53@gateway/web/freenode/ip.134.134.139.83)
[10:08:52] Does SPDK include any ability to span or RAID multiple NVMe devices into one big area of capacity that i can then make lvols from? Like can I make an lvol store across multiple NVMe devices?
[10:09:13] there is a RAID 0 bdev
[10:09:19] it's configuration only, no on-disk metadata
[10:09:29] but you can use that to pool
[10:09:49] That sounds fine... where is that documented? I don't see it under bdev in the docs.
[10:11:57] Basically I'm going to do some NVMeoF performance testing, I have 16 disks in my target but only 12 clients, so I am thinking I will aggregate the 16 disks into one blob and then divide into 12 lvols, no concerns about persistence or redundancy
[10:12:17] hmm, I don't see any documentation. That should be fixed up
[10:12:34] must be on the secret menu
[10:12:49] if you look in etc/spdk/nvmf.conf.in
[10:12:56] there is a [RAID] section example
[10:13:12] K
[10:13:18] Is there a rpc.py command for RAID
[10:13:23] yes I'm pulling that up now
[10:14:10] there are the following raid-related RPC methods (found by searching for SPDK_RPC_REGISTER inside bdev_raid_rpc.c)
[10:14:15] get_raid_bdevs
[10:14:22] construct_raid_bdev
[10:14:28] destroy_raid_bdev
[10:15:01] do ./scripts/rpc.py construct_raid_bdev -h
[10:16:00] for performance testing this will work great - in production there are some disadvantages to pooling using simple RAID 0
[10:16:01] k
[10:16:08] around failures
[10:16:11] yeah like losing all your data
[10:16:31] users get cranky about that sort of thing
[10:16:40] yep - if you pool 16 SSDs, you now lose your data if only one of them fails
[10:16:49] instead of losing 1/16th of the data if one of them fails
[10:17:14] how do the base bdevs work, is the capacity automatically divided among them? or how is capacity allocated to them?
[10:17:15] so we need to do like a RAID 5-ish thing for pooling
[10:17:27] oh wait i see
[10:17:34] you make the NVMe disks into bdevs and then raid those bdevs
[10:17:40] yep
[10:17:55] you can raid any bdev, but in your case real NVMe drives
[10:18:03] raid5 is so old fashioned
[10:18:11] yeah and it has problems too
[10:18:30] there are better strategies, but they're all RAID-5-like
[10:18:45] replication across separate systems is where it is at
[10:20:26] *** Quits: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
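For reference, a minimal sketch of the nested-nvlist layout described above (around 08:40) for a hypothetical raid_bdev config, assuming FreeBSD's libnv (<sys/nv.h>); the key names and values are invented for illustration and this is not an SPDK or raid bdev API:

/*
 * Sketch only: a raid config as one nvlist, with each base bdev as a
 * nested nvlist, packed into a flat buffer that could be written to disk.
 * Key names ("name", "raid_level", ...) are made up for this example.
 */
#include <sys/nv.h>
#include <stdlib.h>

static void *
pack_raid_config(size_t *sizep)
{
	nvlist_t *raid = nvlist_create(0);
	nvlist_t *base0 = nvlist_create(0);
	nvlist_t *base1 = nvlist_create(0);

	/* properties of the raid bdev itself */
	nvlist_add_string(raid, "name", "Raid0");
	nvlist_add_number(raid, "raid_level", 0);
	nvlist_add_number(raid, "strip_size_kb", 64);

	/* each base bdev is its own nvlist, inserted into the parent */
	nvlist_add_string(base0, "name", "Nvme0n1");
	nvlist_add_string(base1, "name", "Nvme1n1");
	nvlist_add_nvlist(raid, "base_bdev0", base0);	/* copies base0 */
	nvlist_add_nvlist(raid, "base_bdev1", base1);
	nvlist_destroy(base0);
	nvlist_destroy(base1);

	/* serialize to a binary blob; reading back is nvlist_unpack() plus
	 * nvlist_get_number()/nvlist_get_string()/nvlist_get_nvlist() */
	void *buf = nvlist_pack(raid, sizep);

	nvlist_destroy(raid);
	return buf;	/* caller frees with free() */
}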
[10:21:17] well that has a few additional issues
[10:21:23] it seems as simple as a RAID 1 at first
[10:21:46] maybe with an optional way to specify which base bdev to prefer for doing reads, so you can read from the local/fastest one
[10:22:11] but the hard part is where you store the configuration data that indicates which devices are in the same "RAID 1" across the cluster
[10:22:17] and I think solving that is outside of the scope of SPDK
[10:22:47] yeah you go down that path and next thing you know you're trying to code your own version of Ceph from scratch
[10:23:55] people are already trying to rewrite Ceph on top of DPDK/SPDK, so I'll leave them to their work
[10:32:07] it would be nice if get_raid_bdevs printed capacity
[10:33:10] is there a way to find the capacity of my raid bdev? preferably in MiB since that's what the lvol create command takes as input?
[10:37:08] ./scripts/rpc.py get_bdevs -b
[10:37:36] it gives blocks and block size, so you'll have to multiply them
[10:37:50] that doesn't seem to work for the raid bdev
[10:37:59] and if i run it without -b the raid bdev is not in the list
[10:38:22] did it fail to create? let me look at what's going on with that
[10:39:02] hmmm it says it is in the "configuring" state in get_raid_bdevs
[10:39:09] what does that mean
[10:41:25] I'm looking - presumably it didn't find all of the nvme base bdevs it needs to come online
[10:42:44] hmm, I don't think that RPC is quite right
[10:43:30] nm it should have been fine
[10:43:47] for each of the bdev names you passed it as base bdevs, do all of those show up when you call "get_bdevs"?
[10:43:57] i think i see the problem, i didn't include the n1 in the bdev names i passed it
[10:45:33] now i can't destroy it
[10:46:08] the configuration isn't saved, so just restart the target. Or leave it there - it won't do anything.
[10:46:20] but you should be able to destroy raid bdevs in that state
[10:46:20] I'll add it to the list
[10:49:56] are you going to run any comparison benchmarks in that set up? vs. like the kernel target using lvm?
[10:52:34] i wasn't planning on it, we have some new hardware, so i'm just trying to baseline performance
[10:53:12] can do about 6.7M IOPS on the local NVMe test
[10:54:38] with 16 SSDs?
[10:55:02] are you limited by the drives or by PCI bandwidth?
[10:55:04] i.e. are you using PCI switches
[10:56:54] by the drives
[10:57:06] we do have PCIe switches by they're not limiting performance
[10:57:08] but*
[10:58:27] we get pretty much perfect scaling up to 11 drives, then it falls off just a little for some reason as we go up to 12-16
[10:58:52] is everything on the same NUMA node?
[10:59:16] no, the drives are spread over the NUMA nodes, unfortunately
[10:59:40] at 12-16 do you think you might be hitting QPI/UPI limits then?
[11:00:01] *** Joins: darsto (~darsto@89-78-174-111.dynamic.chello.pl)
[11:01:00] i don't think so, it doesn't plateau like we have hit a bottleneck, just doesn't scale quite as well, but keeps going up
[11:04:06] i suppose i could pin the FIO jobs to NUMA local cores/disks if i wanted to make the effort
[11:09:17] is there not a get_lvols command? i see get_lvol_stores...
[11:10:12] oh they're in the bdev list, makes sense
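For reference, the "multiply blocks by block size" math mentioned above (around 10:37), as a minimal sketch using SPDK's public bdev getters from inside an application rather than via rpc.py; the bdev name "Raid0" is just a placeholder and this assumes it runs on an SPDK app thread after the bdevs have been registered:

/*
 * Sketch only: capacity of a named bdev in MiB, e.g. to size an lvol.
 * These are the same two values that get_bdevs reports.
 */
#include <inttypes.h>
#include <stdio.h>

#include "spdk/bdev.h"

static void
print_capacity_mib(const char *bdev_name)
{
	struct spdk_bdev *bdev = spdk_bdev_get_by_name(bdev_name);

	if (bdev == NULL) {
		fprintf(stderr, "bdev %s not found\n", bdev_name);
		return;
	}

	uint64_t bytes = spdk_bdev_get_num_blocks(bdev) *
			 (uint64_t)spdk_bdev_get_block_size(bdev);

	printf("%s: %" PRIu64 " MiB\n", bdev_name, bytes / (1024 * 1024));
}

/* Example usage: print_capacity_mib("Raid0"); */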
[11:30:05] is there any particular advantage for 1 subsystem with 24 namespaces versus 24 subsystems each with 1 namespace?
[11:30:17] i think this came up on the list recently and the answer was not really
[11:30:43] A subsystem in NVMf is an access control list
[11:31:08] if the 24 namespaces are only going to be accessed by a single host, putting them into one subsystem makes sense
[11:31:23] if you are going to use 24 different hosts, I'd put each one into a separate subsystem
[11:31:30] performance-wise it doesn't matter
[11:33:23] *** Quits: tomzawadzki (uid327004@gateway/web/irccloud.com/x-xkjurrmfknaromlc) (Quit: Connection closed for inactivity)
[11:34:03] pwodkowx: https://ci.spdk.io/spdk/builds/review/b42b3e1e44593e8c3b382f985f8d72603abd8c70.1541696540/fedora-03/build.log
[11:34:25] if I'm understanding this right, and I may not be, it seems to me like loading the configuration file resulted in having examine() called twice for the same bdev
[11:34:48] it could be a threading issue maybe?
[12:15:34] bwalker, you looking at this? "vbdev_passthru.c: 541:vbdev_passthru_register: *NOTICE*: Match on Malloc4
[12:15:34] vbdev_passthru.c: 576:vbdev_passthru_register: *NOTICE*: io_device created at: 0x0x22f9760
[12:15:34] vbdev_passthru.c: 587:vbdev_passthru_register: *NOTICE*: bdev opened
[12:15:34] vbdev_passthru.c: 598:vbdev_passthru_register: *NOTICE*: bdev claimed
[12:15:34] vbdev_passthru.c: 609:vbdev_passthru_register: *NOTICE*: pt_bdev registered
[12:15:35] vbdev_passthru.c: 610:vbdev_passthru_register: *NOTICE*: created pt_bdev for: PTMalloc4
[12:15:37] vbdev_passthru.c: 541:vbdev_passthru_register: *NOTICE*: Match on Malloc4
[12:15:39] vbdev_passthru.c: 576:vbdev_passthru_register: *NOTICE*: io_device created at: 0x0x22fb570
[12:15:41] bdev.c:3495:spdk_bdev_open: *ERROR*: Could not open Malloc4 - passthru module already claimed it
[12:15:43] vbdev_passthru.c: 581:vbdev_passthru_register: *ERROR*: could not open bdev Malloc4"
[13:36:35] *** Joins: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl)
[13:52:07] there's some trick to NQNs in an FIO config file, right? like you can't use '.' or ':'? what was the workaround?
[13:52:10] filename=trtype=RDMA adrfam=IPv4 traddr=10.5.0.201 trsvcid=4420 subnqn=nqn.2018-11.io.spdk:nqn1 ns=1
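On the colon question above: fio's generic filename parsing treats ':' as a separator between multiple filenames, and fio's documentation describes escaping a literal colon with a backslash. Whether that escape interacts cleanly with the SPDK fio plugin's own parsing of this string is worth verifying, so treat the following job entry only as a hypothetical illustration of the escape, not a confirmed workaround:

filename=trtype=RDMA adrfam=IPv4 traddr=10.5.0.201 trsvcid=4420 subnqn=nqn.2018-11.io.spdk\:nqn1 ns=1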
[14:23:40] *** Joins: travis-ci (~travis-ci@ec2-174-129-149-194.compute-1.amazonaws.com)
[14:23:41] (spdk/master) nvmf/rdma: Fix QP shutdown procedure implementation (Evgeniy Kochetov)
[14:23:41] Diff URL: https://github.com/spdk/spdk/compare/c6323a8d526c...90b4bd6cf9bb
[14:23:41] *** Parts: travis-ci (~travis-ci@ec2-174-129-149-194.compute-1.amazonaws.com) ()
[14:27:00] *** Quits: JoeGruher (86868b53@gateway/web/freenode/ip.134.134.139.83) (Quit: Page closed)
[14:59:25] *** Quits: bwalker (~bwalker@134.134.139.72) (ZNC - http://znc.in)
[15:00:01] *** Joins: bwalker_ (bwalker@nat/intel/x-awxqqnxccyedqcko)
[15:00:01] *** ChanServ sets mode: +o bwalker_
[15:00:02] *** Server sets mode: +cnrt
[15:00:03] *** Server sets mode: +cnrt
[15:00:27] *** Quits: bwalker (~bwalker@134.134.139.72) (Ping timeout: 240 seconds)
[15:00:28] *** Quits: sethhowe (~sethhowe@134.134.139.72) (Ping timeout: 245 seconds)
[15:00:53] *** Quits: peluse (peluse@nat/intel/x-wiibhapkstphwgzp) (Ping timeout: 245 seconds)
[15:00:58] *** Quits: ppelplin (~ppelplin@134.134.139.72) (Ping timeout: 272 seconds)
[15:02:52] *** Joins: fionatrahe (fionatrahe@nat/intel/x-xvvcjvbqxghvacyv)
[15:10:35] *** Joins: ppelplin (ppelplin@nat/intel/x-kcideuxygunxrdne)
[15:31:17] *** bwalker_ is now known as bwalker
[15:34:39] *** Joins: sethhowe (~sethhowe@134.134.139.72)
[16:09:05] *** Joins: travis-ci (~travis-ci@ec2-54-81-79-7.compute-1.amazonaws.com)
[16:09:06] (spdk/master) rpc: add function to get the current RPC state (Seth Howell)
[16:09:06] Diff URL: https://github.com/spdk/spdk/compare/90b4bd6cf9bb...9bec45256177
[16:09:06] *** Parts: travis-ci (~travis-ci@ec2-54-81-79-7.compute-1.amazonaws.com) ()
[16:10:34] *** Joins: travis-ci (~travis-ci@ec2-54-224-74-94.compute-1.amazonaws.com)
[16:10:35] (spdk/master) app: fixup default values in the usage text (Darek Stojaczyk)
[16:10:36] Diff URL: https://github.com/spdk/spdk/compare/9bec45256177...70ef3d917f3f
[16:10:36] *** Parts: travis-ci (~travis-ci@ec2-54-224-74-94.compute-1.amazonaws.com) ()
[16:12:27] *** Joins: travis-ci (~travis-ci@ec2-54-198-87-145.compute-1.amazonaws.com)
[16:12:28] (spdk/master) doc: update the NVMe-oF user guide (Seth Howell)
[16:12:28] Diff URL: https://github.com/spdk/spdk/compare/70ef3d917f3f...5240cbbb9aa2
[16:12:28] *** Parts: travis-ci (~travis-ci@ec2-54-198-87-145.compute-1.amazonaws.com) ()
[16:13:47] *** Joins: travis-ci (~travis-ci@ec2-54-92-227-247.compute-1.amazonaws.com)
[16:13:48] (spdk/master) env: add --huge-dir option (Darek Stojaczyk)
[16:13:48] Diff URL: https://github.com/spdk/spdk/compare/5240cbbb9aa2...3e75e90a8ee7
[16:13:48] *** Parts: travis-ci (~travis-ci@ec2-54-92-227-247.compute-1.amazonaws.com) ()
[16:15:04] *** Joins: travis-ci (~travis-ci@ec2-54-80-111-73.compute-1.amazonaws.com)
[16:15:05] (spdk/master) bdev: add unit tests for double buffering in bdev modules (Piotr Pelplinski)
[16:15:06] Diff URL: https://github.com/spdk/spdk/compare/3e75e90a8ee7...dadd2a6dc0cb
[16:15:06] *** Parts: travis-ci (~travis-ci@ec2-54-80-111-73.compute-1.amazonaws.com) ()
[16:19:39] *** Joins: travis-ci (~travis-ci@ec2-54-163-51-240.compute-1.amazonaws.com)
[16:19:40] (spdk/master) pci: fix config access return codes on BSD (Darek Stojaczyk)
[16:19:41] Diff URL: https://github.com/spdk/spdk/compare/dadd2a6dc0cb...f4ba781552a3
[16:19:41] *** Parts: travis-ci (~travis-ci@ec2-54-163-51-240.compute-1.amazonaws.com) ()
[16:37:48] *** Joins: travis-ci (~travis-ci@ec2-54-198-87-145.compute-1.amazonaws.com)
[16:37:49] (spdk/master) setup.sh: Enable users select kernel driver for identified PCI deivces (tone.zhang)
[16:37:49] Diff URL: https://github.com/spdk/spdk/compare/f4ba781552a3...e93d56b1edd2
[16:37:49] *** Parts: travis-ci (~travis-ci@ec2-54-198-87-145.compute-1.amazonaws.com) ()
[16:38:52] *** Joins: travis-ci (~travis-ci@ec2-54-198-66-181.compute-1.amazonaws.com)
[16:38:53] (spdk/master) lib/nvme: tolerate abnormal char device (Liu Xiaodong)
[16:38:53] Diff URL: https://github.com/spdk/spdk/compare/e93d56b1edd2...5aace13984b8
[16:38:53] *** Parts: travis-ci (~travis-ci@ec2-54-198-66-181.compute-1.amazonaws.com) ()
[16:45:25] *** Quits: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[17:28:23] Repos on gerrithub and github out of sync?
[17:29:14] A "git pull" on gerrithub followed by a listing of the tags fails to show the v18.10 which I *can* see after cloning from github.
[18:05:09] *** Quits: LiuXiaodong (86868b4b@gateway/web/freenode/ip.134.134.139.75) (Ping timeout: 256 seconds)
[18:20:26] *** Joins: peluse (~peluse@134.134.139.72)
[18:20:26] *** ChanServ sets mode: +o peluse
[18:21:45] bwalker, on the thing you mentioned earlier where I asked if you were talking about the passthru claim failing because it was already claimed: if that was what you were looking at, it happened on my compress patch on Jenkins with no test changes yet for compression. FYI... https://ci.spdk.io/spdk-jenkins/results/autotest-per-patch/builds/14661/archive/blockdev_autotest/build.log
[20:45:19] Poker,12890