[00:04:10] *** Quits: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Ping timeout: 240 seconds)
[00:06:15] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[03:55:39] *** Quits: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103) (Quit: Page closed)
[04:23:01] *** Joins: johnmeneghini (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net)
[10:32:52] drv, jkkariu: generic histogram patch is out for review/testing: https://review.gerrithub.io/#/c/365263/
[10:40:12] looks pretty good
[10:41:13] made a couple of minor comments
[10:42:22] jimharris: can you look at these two rocksdb patches? https://review.gerrithub.io/#/c/364532/
[10:42:37] to merge the subsystem changes I have to change the makefiles a bit
[10:42:41] and it breaks RocksDB without them
[11:00:13] jimharris, bwalker: are we OK with merging the current vhost-user-blk patch to our qemu fork?
[11:00:16] https://review.gerrithub.io/#/c/363361/
[11:00:42] I think it should be fine to put it on our spdk branch - (hopefully) shouldn't break any existing vhost users
[11:02:42] i'll take a look
[11:03:02] drv: updated the histogram patch based on your comments - also added histogram to the overhead tool
[11:03:09] cool
[11:03:37] do you plan to rebase the histogram RPC stuff?
[11:03:53] i'll ask girish how he wants to proceed
[11:03:59] ok
[11:05:01] any reason why the cutoff pointer is a double pointer in perf?
[11:05:32] oh, never mind, it's walking through the array
[11:07:33] bwalker: lgtm
[11:07:37] do we want to enable histogram on the overhead test in test/lib/nvme/nvme.sh?
[11:09:43] yeah - just updated the patch
[11:11:33] did you see if it made any noticeable difference to the overhead numbers when histogram is enabled?
[11:11:44] I didn't check
[11:12:44] ugh - i forgot about the "spdk_histogram" name conflict
[11:13:06] yeah, I don't know what to call these different things
[11:13:15] i'll try to come up with something
[11:13:18] we could call the storage for the actual stats "spdk_histogram_data" or something like that
[11:16:13] yeah - we had talked about that idea before i left for vacation
[11:29:20] has anyone seen this rbd failure before? http://spdk.intel.com/public/spdk/builds/review/42b997e079bd9e1665c6daf2542c86e08fa66e24.1497377673/
[11:30:32] hm, that's new
[11:32:44] and this one: http://spdk.intel.com/public/spdk/builds/review/57dcc3e1a7b04438e3d243dbe8de0af1ade724be.1497378073/
[11:34:52] that second one is the intermittent fio_plugin crash we've had for a long time (maybe since we added fio_plugin)
[11:34:56] I don't know what's going on there
[11:35:23] I suspect we're hooking the wrong io_ops and cleaning up in a non-thread-safe way
[11:35:58] i see one of the other threads is doing dl unmap/close related stuff
[11:38:03] looks like reap_threads is running after our plugin has been dlclose'd?
[11:44:14] i see how we can fix this
[11:49:31] hmmm - nope, we can't without fio changes
[12:04:06] i think we need to use LD_PRELOAD instead of the current dlopen model
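For reference, a sketch of the two loading models being weighed here - the current dlopen approach and the proposed LD_PRELOAD approach. The plugin path is illustrative; the only things assumed are fio's `ioengine=external:` job-file syntax, the standard LD_PRELOAD mechanism, and the `ioengine=spdk` name mentioned later in the discussion:

```
# Current model: fio dlopen()s the engine named in the job file and
# dlclose()s it during shutdown - which is the window where reap_threads()
# can still chase td->io_ops into the unloaded plugin.
#
#   (job file)
#   ioengine=external:/path/to/spdk/examples/nvme/fio_plugin/fio_plugin

# Proposed model: preload the engine so it stays mapped for the life of the
# fio process, and select it by its registered name instead.
#
#   (shell)
#   LD_PRELOAD=/path/to/spdk/examples/nvme/fio_plugin/fio_plugin fio example.fio
#
#   (job file)
#   ioengine=spdk
```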
[12:16:19] is this a general fio plugin model thing or something specific to what we're doing with ours?
[12:16:30] no - it's a general fio plugin model issue
[12:17:28] i'm still working out the exact threading sequence - but the log clearly shows that we've already called dlclose when the reap_threads() function tries to access the ioengine structure allocated in our plugin (via td->io_ops)
[12:17:39] hmm, that does seem bad
[12:17:55] i was thinking we could just set td->io_ops = NULL in our cleanup routine but that won't work either
[12:18:21] i don't really think it's that bad - it actually means you set ioengine=spdk in your job file, which looks nicer IMHO
[12:20:08] it seems odd that no one else would be hitting this
[12:20:16] maybe nobody else uses an external fio plugin
[12:20:18] it's clearly intermittent
[12:20:34] or they use LD_PRELOAD
[12:38:05] jimharris: do we want to go ahead with https://review.gerrithub.io/#/c/365026/ ?
[12:38:17] I don't think it will actually fix anything, since the io_channel assert is hitting now
[12:40:53] i think it's fine to commit
[12:43:17] pushed a patch to the fio mailing list - i see drv has been pushing some fio patches too :)
[12:46:39] yeah, just the one thing we patch locally to avoid scan-build warnings, since that header is included via ioengine.h in our code
[12:46:56] I don't think anybody has responded to it yet, though
[12:54:52] i think this vhost issue would go away if we tried to cleanly shut down the VMs instead of just killing them
[12:55:01] Question: Is there any plan to implement support for SPDK_NVME_OPC_ABORT?
[12:56:04] we have spdk_nvme_ctrlr_cmd_abort() - or are you referring to something else?
[12:56:37] oh - do you mean supporting this in the virtual nvmf layer?
[12:56:57] Yes, in the virtual layer.
[12:57:13] it's not something we are actively working on currently
[12:58:40] we don't have a mapping for an individual abort at the bdev layer right now
[12:59:28] Actually, I think we need support for SPDK_NVME_OPC_ABORT in both the direct.c and virtual.c paths.
[12:59:44] the direct.c path should already just work (it gets passed through to the NVMe device)
[13:00:45] Ok. Let me look at that. spdk_nvme_ctrlr_cmd_abort() is only used for TIMEOUT_ACTION_ABORT.
[13:00:57] That's a deadline, not an abort.
[13:01:21] the path it will take is nvmf_direct_ctrlr_process_admin_cmd() -> spdk_nvme_ctrlr_cmd_admin_raw()
[13:01:48] (for direct - for virtual it doesn't do anything right now)
[13:03:05] Yes, I see that. There's a passthrough: in nvmf_direct_ctrlr_process_admin_cmd.
[13:03:18] in your particular use case, I didn't think you would be using direct mode anyway, only virtual, since your backend is implemented as a bdev
[13:04:55] I also see that we've recently added nvmf_virtual_ctrlr_nvme_passthru_io.
[13:05:48] yes, there is also now a blockdev passthru for NVMe admin commands, but it's not wired up in nvmf/virtual.c yet
[13:06:24] there are some problems to work out since we can have multiple unrelated bdevs in a single virtual subsystem, so we need to figure out where to direct the admin commands
[13:07:03] we could implement the simple case of a subsystem with a single namespace, but anything more complicated than that needs some thought
[13:10:44] Well, I need to work on this because we need to support aborts in the virtual controller path. To support abort in nvmf_virtual_ctrlr_process_admin_cmd I can either create a specific command handler for SPDK_NVME_OPC_ABORT or I can create a passthru for the virtual controller admin path. Either way, I think this will also require a new io_abort() bdev entry point.
[13:10:44] We need some way to allow SPDK_NVME_OPC_ABORT to reach [...] and retrieve the IOs.
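A minimal sketch of what a dedicated SPDK_NVME_OPC_ABORT handler for the virtual admin path could look like. The function name and the way it would be wired into nvmf_virtual_ctrlr_process_admin_cmd() are hypothetical; only the spdk_nvme_cmd/spdk_nvme_cpl types, the SPDK_NVME_* constants, and the Abort command's CDW10 layout (SQID in bits 15:0, CID in bits 31:16, per the NVMe spec) are taken as given:

```c
#include <stdio.h>
#include <stdint.h>

#include "spdk/nvme_spec.h"

/* Hypothetical handler - not existing SPDK code. */
static void
virtual_ctrlr_abort_sketch(const struct spdk_nvme_cmd *cmd, struct spdk_nvme_cpl *rsp)
{
	/* Abort (NVMe spec): CDW10 bits 15:0 = SQID, bits 31:16 = CID of the
	 * command to abort. These are the only handles the host gives us, so
	 * the virtual controller would have to map SQID/CID back to an
	 * outstanding bdev I/O (e.g. via the proposed io_abort() bdev call). */
	uint16_t sqid = cmd->cdw10 & 0xffffu;
	uint16_t cid = (cmd->cdw10 >> 16) & 0xffffu;

	printf("abort requested for sqid %u, cid %u\n", sqid, cid);

	/* Until that lookup exists, complete the Abort successfully but report
	 * that the target command was not aborted (CQE Dword 0, bit 0 = 1). */
	rsp->cdw0 = 1;
	rsp->status.sct = SPDK_NVME_SCT_GENERIC;
	rsp->status.sc = SPDK_NVME_SC_SUCCESS;
}
```

Completing the Abort while reporting "not aborted" matches the "mandatory, but best effort" reading of the spec that comes up a bit later in the log.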
[13:11:03] drv: is seth in the lab?
[13:11:28] Also, support for the Admin Abort command is mandatory in the NVMe spec.
[13:11:33] what size ssd is in fedora-08?
[13:12:13] Where is the blockdev passthru for NVMe admin commands?
[13:12:40] spdk_nvme_ctrlr_cmd_admin_raw?
[13:16:34] jimharris: seth is here - I'll poke him to get on IRC
[13:17:34] johnmeneghini: the new admin passthru bdev io is SPDK_BDEV_IO_TYPE_NVME_ADMIN, which goes through bdev_nvme_admin_passthru() in the nvme bdev
[13:17:45] and that eventually calls spdk_nvme_ctrlr_cmd_admin_raw(), yes
[13:18:05] *** Joins: sethhowe (~sethhowe@134.134.139.76)
[13:18:42] <@jimharris> what size ssd is in fedora-08?
[13:23:29] johnmeneghini: support for abort is mandatory, but abort actually doing anything is not. It's "best effort". The Linux kernel, for instance, just immediately returns failure in software.
[13:23:49] but we can support abort in the bdev layer
[13:23:54] I'm not at all opposed to that
[13:24:09] certainly, we already do if you count NVMe I/O passthru
[13:29:24] the hardest part is designing the API so that the user can indicate which I/O to abort
[13:29:37] but with NVMe-oF using NVMe passthru, it's easy
[13:29:44] because the user is sending the abort command
[13:45:56] The NVMe Abort Command still specifies a CID and an SQID. These indexes are the only way to address the command. So these indexes will either have to be passed with every IO into the BDEV, or we will have to add some infrastructure to the virtual controller to tag commands with this information. In fact, for Admin commands, we have to add the infrastructure anyway because not all Admin commands go into the BDEV, and [...] Aborts for both Admin Commands and NVM Commands.
[13:50:40] This becomes even more interesting when we start creating multiple IO queue structures to support the multi-thread design. Different threads in the Subsystem will have affinity to different SQs and CQs. So this means the Admin Queue thread may have to use the SQID to look up the poller thread that owns the SQ where the command to be aborted is queued, and post an event on that thread.
[13:54:02] I think we need something like an spdk_io_channel[] structure per SQ. Right now this structure is tied to a Namespace.
[14:01:46] The other thing I should say is: I think you should go ahead with your multi-thread design.
[14:01:48] https://lists.01.org/pipermail/spdk/2017-April/000450.html
[14:06:30] yeah, it really needs to happen - but I've been fixing up the bdev layer first
[14:07:31] the new threading design will be a lot easier to implement on top of a much more fully featured bdev library
[14:07:34] that's the idea at least
[14:11:23] now that the bdev layer is in a lot better shape I'm about ready to start
[14:33:39] drv: do we need this patch? https://review.gerrithub.io/#/c/365069/ - see my comment
[14:34:06] it fixes a Klocwork warning
[14:34:21] I also sent it upstream to dpdk
[14:34:34] we could also just close the issue as don't-care
[14:37:22] I'm fine with just closing it - on linux/glibc, pthread_mutex_destroy() doesn't actually free anything anyway
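As an aside, a generic illustration (not the actual DPDK code in that patch) of the init/destroy pairing that Klocwork-style checkers look for; as noted above, on Linux/glibc the destroy releases nothing, so the fix is about analyzer cleanliness rather than a real leak:

```c
#include <pthread.h>

static pthread_mutex_t g_lock;

static int
resource_init(void)
{
	/* Static analyzers track pthread_mutex_init()/pthread_mutex_destroy()
	 * as a resource pair and flag a missing destroy. */
	return pthread_mutex_init(&g_lock, NULL);
}

static void
resource_fini(void)
{
	/* On Linux/glibc this is effectively a no-op, but it closes the
	 * init/destroy pair that the warning is about. */
	pthread_mutex_destroy(&g_lock);
}
```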
[14:48:44] *** Quits: konv81 (cf8c2b51@gateway/web/freenode/ip.207.140.43.81) (Quit: Page closed)
[15:59:45] drv: do you know where this fio_ubuntu comes from?
[16:00:23] it's v2.2.10
[16:01:50] not sure - maybe sethhowe is in the loop
[16:01:58] I don't know why they have their own custom fio
[16:02:40] I don't know why exactly. It is also included in the vm image.
[16:02:58] in the readme they state both machines need to have the same version of fio.
[16:03:28] Having this binary on both machines may be their way of ensuring that.
[16:04:05] sorry, "they" being the VHOST team. I believe someone from that team placed it in that directory for the VHOST tests.
[16:06:19] ok
[17:04:23] jimharris, what compiler is freebsd using when we call gmake?
[17:04:32] clang
[17:04:57] thx
[17:30:07] FYI, 2 failures on the doc patch: one about rocksdb that I don't think I've seen before, and this one, which I assume is what we were talking about earlier... mk/spdk.mock.unittest.mk
[17:30:16] yeah, that's not what I meant to paste :)
[17:30:27] vhost: bdev.c:645: spdk_bdev_channel_destroy: Assertion `ch->io_outstanding == 0' failed.
[17:30:29] that is
[17:32:41] the vhost one we're actively debugging
[17:33:51] for the rocksdb one - can you rebase your patch on latest master?
[17:34:38] bwalker checked in some changes today that make it easier for us to modify spdk and the rocksdb shim in the same commit - previously we had to do this dance between updating the spdk and rocksdb repositories in parallel
[17:45:27] yup, will do in just a min
[17:47:41] OK, done. thx
[17:59:30] *** Joins: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103)
[18:16:30] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.82)
[20:36:57] looks like that took care of the rocksdb issue but I consistently hit the OIO issue
[21:30:43] *** Quits: johnmeneghini (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net) (Quit: Leaving.)
[23:44:26] *** Joins: tomzawadzki (tzawadzk@nat/intel/x-ctzjocrsxofcpjus)