[00:17:34] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[00:56:22] *** Quits: ziyeyang__ (~ziyeyang@192.55.54.41) (Remote host closed the connection)
[03:05:39] *** Joins: felipefr (~felipef@219.93.16.226)
[03:08:49] Has anyone tried using the virtio-scsi pmd to drive multiple targets (on the same controller) with multiple threads (from the fio_plugin) ?
[03:09:44] With 3 threads over 4 targets I'm getting:
[03:09:53] rte_virtio/virtio_dev.c: 572:virtio_dev_find_and_acquire_queue: *ERROR*: no more unused virtio queues with idx >= 2.
[03:09:53] bdev_virtio.c: 543:bdev_virtio_create_cb: *ERROR*: Couldn't get an unused queue for the io_channel.
[03:09:53] fio_plugin.c: 374:spdk_fio_init: *ERROR*: Unable to get I/O channel for bdev.
[03:09:55] fio: io engine spdk_bdev init failed.
[03:09:57] fio: pid=2503, err=-1/
[03:10:10] (my virtio-scsi controller is MQ)
[03:25:59] *** Quits: felipefr (~felipef@219.93.16.226) (Remote host closed the connection)
[04:41:06] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.75)
[04:44:44] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.75) (Client Quit)
[07:18:53] *** Joins: felipefr (~felipef@219.93.16.226)
[07:39:01] *** Joins: bwalker (~bwalker@192.55.54.40)
[07:39:01] *** Server sets mode: +cnt
[07:39:02] *** Server sets mode: +cnt
[07:41:01] *** Joins: ppelplin (~ppelplin@192.55.54.40)
[07:43:30] *** Joins: pbshah1 (~pbshah1@192.55.54.40)
[07:44:57] *** Joins: changpe1 (~changpe1@192.55.54.40)
[07:47:28] *** Joins: cunyinch (~cunyinch@192.55.54.40)
[07:47:59] *** Joins: tsg (~tsg@192.55.54.40)
[07:48:30] *** Joins: jimharris (~jimharris@192.55.54.40)
[07:48:59] *** Joins: pzedlews (~pzedlews@192.55.54.40)
[07:51:31] *** Joins: qdai2 (~qdai2@192.55.54.40)
[07:52:05] *** Joins: mszwed (~mszwed@192.55.54.40)
[07:53:36] *** Joins: lgalkax (~lgalkax@192.55.54.40)
[07:54:32] *** Joins: darsto (~dstojacx@192.55.54.40)
[07:54:56] *** darsto is now known as Guest22972
[07:55:02] *** Joins: kjakimia (~kjakimia@192.55.54.40)
[07:57:29] *** ChanServ sets mode: +o jimharris
[08:00:10] *** Joins: jkkariu (~jkkariu@192.55.54.40)
[08:02:07] *** Joins: vermavis (~vermavis@192.55.54.40)
[08:04:51] *** Joins: peluse (~peluse@192.55.54.40)
[08:05:39] *** Joins: ziyeyang (~ziyeyang@192.55.54.40)
[08:08:42] *** Joins: klateck (~klateck@192.55.54.40)
[08:11:27] hi felipefr
[08:11:32] hey jim
[08:14:34] could you put your questions into an e-mail for the mailing list? normally darek and pawel from our team in gdansk are on this channel but i don't see them on right now
[08:14:52] i'll take a look too though
[08:15:38] Sure thing. Will include the fio config file!
[08:15:54] just to confirm, you are using the fio plugin from within a vm (using virtio-pci) and not another host process?
[08:16:00] Correct.
[08:16:37] I have a VM and I'm using v17.10.x with fio 2.21.
[08:16:46] My virtio-scsi controller has 4 targets with one lun each.
[08:17:01] does 2 threads over 4 targets work?
[08:17:05] Yes.
[08:17:09] interesting
[08:17:29] And the performance ramps up accordingly. "top" also shows the 2 vcpus busy.
[08:17:45] *** Joins: pwodkowx (~pwodkowx@192.55.54.40)
[08:18:01] But 3 or 4 threads produce the error above (twice for 4 threads).
[08:18:23] Fio still runs, though. Producing numbers similar to 2 threads (I didn't check "top" to see how many threads were actually working).
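(For readers following along without felipefr's job file, which went to the mailing list: a minimal sketch of what an fio job file for the SPDK bdev fio_plugin looks like in this kind of virtio-scsi setup. The spdk_conf path, the VirtioScsi0tN bdev names, and the I/O options are illustrative assumptions, not his actual config; one [job] section per target gives one fio thread per target.)

    [global]
    ioengine=spdk_bdev
    ; path to an SPDK config that enumerates the virtio-scsi bdevs (assumed)
    spdk_conf=/path/to/bdev.conf
    thread=1
    direct=1
    rw=randread
    bs=4k
    iodepth=32
    time_based=1
    runtime=60
    group_reporting=1

    [target0]
    filename=VirtioScsi0t0

    [target1]
    filename=VirtioScsi0t1

    [target2]
    filename=VirtioScsi0t2

    [target3]
    filename=VirtioScsi0t3

(It would typically be run with the plugin preloaded, e.g. LD_PRELOAD=<spdk>/examples/bdev/fio_plugin/fio_plugin fio virtio.fio, with the exact path depending on the build.)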
[08:18:50] *** Joins: jstern (~jstern@192.55.54.40)
[08:19:52] *** Joins: pniedzwx (~pniedzwx@192.55.54.40)
[08:20:46] how many queues did you request for your virtio-scsi controller? 4 I'm guessing?
[08:25:04] Good question. Where would that be configured? On the associated conf file?
[08:30:36] i think you specify that on qemu command line
[08:30:54] vhost-user-scsi-pci.num_queues=uint32
[08:32:26] Oh that number of queues! Yes, that's the number of vCPUs which, for this VM, is 14.
[08:33:07] (This is an E5-2680 v4 with 14 cores/socket.)
[08:34:12] ok - well if you could shoot a quick e-mail to the mailing list, i'll make sure someone (if not me) tries to reproduce it
[08:34:32] oh - i see pwodkowx (pawel) on now
[08:34:42] pwodkowx - you there?
[10:08:46] *** Joins: lhodev (~Adium@inet-hqmc04-o.oracle.com)
[10:08:56] *** Parts: lhodev (~Adium@inet-hqmc04-o.oracle.com) ()
[11:02:42] jimharris: I responded on https://review.gerrithub.io/#/c/390306/
[11:21:18] *** Joins: boutcher (~boutcher@66.113.132.66)
[11:21:46] looks good
[11:22:08] how would our nvme-of target handle a new thread getting added?
[11:23:20] i'm looking at all of our current for_each_thread and for_each_channel calls to come up with a solution for the race between for_each calls and new channels getting created during the for_each iterations
[11:24:17] the thread is only used once you create a poll group on it
[11:24:19] which is a channel
[11:24:54] and really even then only when a new connection is assigned to it
[11:25:09] which is done by a callback to the user when they poll the accept() call
[11:25:35] oh - i'm looking at current master, i'll need to pull your latest patch set that's under review
[11:29:36] looking at the top of your patch set - I still see state NVMF_TGT_INIT_CREATE_POLL_GROUPS do an spdk_for_each_thread() on function nvmf_tgt_create_poll_group
[11:31:28] trying to understand the impact if a thread is added after nvmf_tgt has moved past this state
[12:37:59] the nvmf_tgt app won't dynamically add and remove threads
[12:38:09] that's where that NVMF_TGT_INIT_CREATE_POLL_GROUPS state is
[12:38:21] the nvmf library should support dynamically adding and removing threads though
[12:38:44] i.e. you can call nvmf_tgt_create_poll_group (which is basically an spdk_get_io_channel) on any thread at any time
[12:40:25] jimharris, boutcher: so a while back we broke running as an unprivileged user accidentally
[12:40:44] the problem is the function spdk_pci_device_claim
[12:40:55] which creates a shared memory file
[12:41:03] to claim a PCI device
[12:41:12] and that requires root permissions
[12:41:31] the issue specifically is you have to have permission to write to /dev/shm
[12:41:54] so should I update setup.sh to automatically grant your user permission to write to /dev/shm just like I do for /dev/hugepage?
[12:42:01] or is that overstepping
[12:43:36] on the nvmf_tgt question - ok, got it - it's still not clear though, what is that pg->group used for?
[12:44:05] which data structures?
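(An aside on the vhost-user-scsi-pci.num_queues knob from the 08:30 exchange above: it is set as a device property on the QEMU command line, roughly as below. The chardev id, socket path, and memory-backend settings are assumptions about a typical SPDK vhost-user setup; num_queues=14 mirrors the 14 vCPUs felipefr described.)

    -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem0 \
    -chardev socket,id=vhost_scsi0,path=/var/tmp/vhost.0 \
    -device vhost-user-scsi-pci,id=scsi0,chardev=vhost_scsi0,num_queues=14

(The shared memory-backend with share=on is what lets the vhost-user target see guest memory; the chardev must point at the vhost socket exported by the target.)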
[12:44:23] the poll_group has two things in it - transport groups and subsystem groups
[12:44:43] struct nvmf_tgt_poll_group
[12:44:52] the spdk_nvmf_poll_group pointer
[12:45:18] nvmf_tgt_poll_group is just an nvmf_tgt app thing - the actual structure in the library is spdk_nvmf_poll_group
[12:45:37] I think nvmf_tgt_poll_group is on the chopping block - it used to hold the pointer to the real poll group and the poller
[12:45:43] but now the poller is baked in to the library
[12:45:45] ok
[12:45:51] that answers my question :)
[12:46:29] so if that's on the chopping block, then we will also get rid of the only calls we currently have for spdk_for_each_thread
[12:46:55] I think the target app will still call spdk_for_each_thread
[12:47:03] because it has to create an spdk_nvmf_poll_group on each thread
[12:47:25] but that's all in the app - you can guarantee that a new thread won't be added or removed during that call
[12:48:01] inside the create call for the spdk_nvmf_poll_group it's going to do an spdk_for_each_channel too
[12:48:18] actually no, sorry
[12:48:25] scratch the for_each_channel
[12:48:27] why do you need to do that on each thread?
[12:48:45] that's just the threading model that the current nvmf_tgt app is electing to use
[12:49:01] it's using our event framework, so one thread per core
[12:49:10] and on each thread it has a poll group that it is polling
[12:51:02] so each thread starts a poller at app start time - for acceptor purposes?
[12:51:42] it creates a poll group at app start time on each thread, but not for the acceptor
[12:51:46] just so they're pre-allocated
[12:51:59] so when a new connection comes in, in response to a call to accept(), it can round-robin hand them out
[12:52:11] ok
[12:52:23] so the user calls spdk_nvmf_accept(cb_fn, ...)
[12:52:29] i got it now
[12:52:44] and inside cb_fn, the user implements that by passing a message to a thread and calling spdk_nvmf_poll_group_add_qpair or whatever I named it
[12:53:17] the nvmf library allows the user to do things in any order they want though - this is just how the example app is electing to use it
[13:00:24] on the unprivileged user question - what exactly is failing? you shouldn't need to be root to do shm_open
[13:00:39] you do need extra permissions to do shm_open with O_CREAT
[13:00:52] specifically, you need permission to write to /dev/shm
[13:01:06] is this some security thing that's only enabled on some systems?
[13:01:10] if I grant my user permission to write to that directory, it works
[13:01:39] it's a security thing, but it's enabled on most systems
[13:01:45] on linux
[13:01:54] not mine I guess
[13:02:13] I think mostly the redhat variants
[13:02:23] redhat 7, centos 7, fedora 20-something
[13:02:40] it's not enabled on centos 6
[13:05:46] I just did a bunch of testing and besides that permission check, running as an unprivileged user is still working just fine
[13:11:08] i don't see any solution besides changing the /dev/shm permissions
[13:11:40] drv thinks we should use some other strategy to coordinate when we claim a pci device
[13:11:42] if only we could do shm paths
[13:11:52] does drv have any ideas? :)
[13:12:31] well, we could use a file in /tmp or something like that
[13:12:43] yeah - there's no reason it needs to use shm
[13:14:40] are we ok just dumping these in /tmp?
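(For illustration of the /tmp idea being floated here: a stand-alone sketch, not SPDK code, of how a per-device claim file outside /dev/shm could work, using the /tmp/spdk_pci_lock_<BDF> naming that comes up just below. The flock()-based locking is an assumption about one possible mechanism; the point is only that an unprivileged user can create and lock a file under /tmp without any special permissions, unlike shm_open() with O_CREAT, which needs write access to /dev/shm.)

    /* Sketch only, assumed approach: claim a PCI device by taking an
     * exclusive, non-blocking lock on a per-BDF file under /tmp. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/file.h>
    #include <unistd.h>

    static int
    claim_pci_device(const char *bdf)
    {
        char path[64];
        char pid[16];
        int fd;

        snprintf(path, sizeof(path), "/tmp/spdk_pci_lock_%s", bdf);

        fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) {
            return -1;
        }

        /* fails if another process already holds the claim on this device */
        if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
            close(fd);
            return -1;
        }

        /* record the owner pid for debugging; the lock itself is the claim */
        snprintf(pid, sizeof(pid), "%d\n", getpid());
        (void)write(fd, pid, strlen(pid));

        /* keep fd open for the life of the process to hold the claim */
        return fd;
    }

    int
    main(void)
    {
        return claim_pci_device("0000:04:00.0") < 0 ? 1 : 0;
    }

(Whatever the final mechanism, /tmp is world-writable with the sticky bit on essentially every distro, which sidesteps the /dev/shm permission problem raised at 12:40.)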
[13:14:51] I am fine with that
[13:16:34] ok - while we're there, let's put some kind of spdk_pci_lock prefix on the file name
[13:16:49] so /tmp/spdk_pci_lock_0000:04:00.0
[13:19:23] *** Guest22972 is now known as darsto
[13:19:31] looks good to me
[13:26:22] felipefr: for 17.10.x I believe you need ((jobs_num + 1) * lun_num) queues
[13:27:23] darsto: could you try this same test when you're back in the office tomorrow?
[13:27:24] there was a patch already that will let you use just (jobs_num) queues - it's already on master and will land in 18.01
[13:28:28] oh right
[13:28:53] i forgot that set wasn't in 17.10.x
[13:38:43] this patch lowered the number of required queues to (jobs_num * lun_num): https://review.gerrithub.io/c/385802/, and this one lowered it to just (jobs_num): https://review.gerrithub.io/c/388505/
[14:47:04] this is the final conversion of nvmf to the new threading model: https://review.gerrithub.io/#/c/385954/
[14:47:11] after that, it's poll group driven
[15:14:21] jimharris: can you take a look at https://github.com/spdk/spdk/issues/221 ? It looks like it was introduced by the change to skip mounted NVMe devices
[15:30:01] i guess maybe these links changed at some point? but from the output it's not clear what /sys/block/nvme0n1/device/device actually is if it's not a symlink
[15:33:58] yeah, mine is a symlink - I would guess it's more likely that it is not a symlink on some older kernel version
[15:34:59] but even on our CentOS 6 machine with (RHEL-patched) 2.6.32, it's a symlink, so I'm not sure
[15:52:54] Thanks darsto and jimharris. I'll try that later and let you know!
[16:19:40] drv, sethhowe: could we have the testpool skip running a patch if it has [RFC] in the commit title?
[16:20:16] that should be doable
[16:21:20] i'm going to do a patch with some proposed blobstore API changes which I'd like to get review on but it won't come close to compiling so running it through the test pool is a waste
[16:37:36] *** Quits: felipefr (~felipef@219.93.16.226) (Remote host closed the connection)
[16:51:55] this patch makes the i/o channels not defer on put anymore: https://review.gerrithub.io/#/c/390524/
[16:52:12] it's failing - there are a few more places with implicit assumptions that the io channel put is deferred
[16:52:17] in the bdev layer on shutdown, basically
[16:52:35] so I'll clean those up, but that series of patches has all the correct io_channel logic
[16:52:58] it's pretty tricky, but I think I handled every case
[16:55:05] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[17:14:28] *** Joins: lhodev (~Adium@inet-hqmc04-o.oracle.com)
[17:14:50] *** Parts: lhodev (~Adium@inet-hqmc04-o.oracle.com) ()
[17:33:28] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.76)
[17:55:14] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.76) (Remote host closed the connection)
[23:35:26] *** Joins: felipefr (~felipef@219.93.16.226)
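(To tie darsto's 13:26/13:38 queue formulas back to felipefr's report from the morning — 4 targets with one LUN each, and num_queues=14 on the controller — the arithmetic works out as follows, assuming those formulas are right:)

    17.10.x: queues needed = (jobs_num + 1) * lun_num
        2 fio jobs: (2 + 1) * 4 = 12 <= 14 available -> works
        3 fio jobs: (3 + 1) * 4 = 16 >  14 available -> queue acquisition fails
        4 fio jobs: (4 + 1) * 4 = 20 >  14 available -> queue acquisition fails
    master / 18.01: queues needed = jobs_num
        4 fio jobs: 4 <= 14 available -> works

(That lines up with the "no more unused virtio queues" errors seen with 3 and 4 threads but not with 2.)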