[00:32:55] Hi Jim/Ben/Daniel, can we add a default review group in GerritHub, so that we can add reviewers quickly?
[00:43:35] For the multi-process issue, I have raised a patch https://review.gerrithub.io/#/c/365646/ to fix the bug. If possible, please take some time to review this first, as it may happen frequently.
[00:43:40] Thanks for your time and help.
[01:01:41] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.82) (Remote host closed the connection)
[03:13:16] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[03:14:27] *** Joins: gila_ (~gila@ec2-54-88-161-49.compute-1.amazonaws.com)
[03:17:27] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds)
[05:28:57] *** Quits: gila_ (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Ping timeout: 240 seconds)
[05:30:56] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[07:37:42] ziyeyang, they're reviewing all incoming patches without the need to manually assign names now. You can still assign individuals if you think there is a special reason that person has to be involved with the patch.
[07:37:56] gangcao, are you talking about the multi-process thing I mentioned or a different one?
[09:32:41] bwalker, drv: what does the "&" do in this patch? https://review.gerrithub.io/#/c/366087/
[09:33:05] more specifically: https://review.gerrithub.io/#/c/366087/2/test/vhost/fiotest/common.sh
[09:33:20] it runs the shutdown in the background, so nothing
[09:33:43] I was going to comment on that line because it really isn't right, but it will work
[09:33:54] well then, what does the "exit 0" do?
[09:34:27] I guess it just runs that command after the nohup?
[09:34:33] yeah - which does nothing
[09:34:41] because the shell will block it due to background tasks
[09:34:54] the "right" way to write that line is:
[09:35:00] I think the idea is that they want a zero return value from the SSH call
[09:35:13] yeah - that's what I was guessing
[09:35:18] and shutdown is going to cause the shell to exit, so they exit 0 first, then in the background, sleep 1 and then shutdown
[09:35:24] and shutdown doesn't return 0?
[09:35:36] well, I gave the patch a -1 for other reasons, if you want to ask him to make changes
[09:35:39] well, I think it probably races with the shutdown process killing the SSH session
[09:35:56] I don't know of a better way to do that, but it does look pretty clunky
[09:36:03] they could just ignore the SSH return code
[09:36:12] that's what I was going to recommend
[09:36:23] you expect that ssh connection to die, so allow for that
[09:36:29] ssh_vm "shutdown ..." || true
[09:36:32] yep
[10:18:55] bwalker, FYI: addressed a bunch of your comments in https://review.gerrithub.io/#/c/362847/ when you get a chance...
[10:55:27] bwalker, jimharris: this should (partially) fix the nightly build: https://review.gerrithub.io/#/c/366149/
[11:47:36] peluse: https://review.gerrithub.io/#/admin/projects/spdk/spdk.github.io
[11:47:56] no CI system hooked up like we had internally
[11:48:00] but it's there at least
[11:50:25] cool! I'll check it out a bit later
[12:11:48] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds)
[12:12:23] *** Joins: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com)
[14:21:47] *** Quits: peluse (~peluse@192.55.54.44) (*.net *.split)
[14:21:55] *** Joins: peluse (~peluse@192.55.54.44)
[14:24:03] *** Quits: qdai2 (~qdai2@192.55.54.44) (Ping timeout: 240 seconds)
[14:25:18] *** Joins: qdai2 (~qdai2@192.55.54.44)
[14:28:15] drv: which system has SPDK_TEST_IOAT enabled?
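[Editor's note] The recommendation in the exchange above can be sketched as follows. `ssh_vm` here is a stub standing in for the real SSH helper from the patch under review; the real helper is expected to return non-zero when the guest's shutdown drops the connection, which is exactly the failure `|| true` tolerates.

```shell
# Stub that mimics the real ssh_vm helper: the SSH connection dies when
# the guest shuts down, so the call returns non-zero (ssh uses 255).
ssh_vm() {
    return 255
}

# Rather than backgrounding shutdown and juggling "exit 0" inside the
# guest, tolerate the expected connection failure on the caller's side
# and keep the script's exit status at zero:
ssh_vm "nohup sh -c 'sleep 1; shutdown -h now' >/dev/null 2>&1 &" || true
echo "rc=$?"   # prints rc=0
```

The `|| true` makes the intent explicit: the connection is *expected* to die, so its exit code carries no information and is deliberately discarded.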
[14:32:18] wkb-fedora-03 has it
[14:40:30] *** Joins: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45)
[14:45:06] *** Joins: dan_ (2658e3c4@gateway/web/freenode/ip.38.88.227.196)
[14:45:28] *** dan_ is now known as Guest55311
[14:45:33] *** Quits: Guest55311 (2658e3c4@gateway/web/freenode/ip.38.88.227.196) (Client Quit)
[14:50:24] jimharris: I am reviewing the qemu vhost-user-blk changes, and attempting to compile it fails, but it looks like it was broken even before the -blk changes
[14:50:58] with the latest spdk branch from https://github.com/spdk/qemu, I get "contrib/vhost-user-scsi/vhost-user-scsi.c:16:25: fatal error: iscsi/iscsi.h: No such file or directory"
[14:51:13] am I missing some build step?
[14:55:05] there is no iscsi.h anywhere in the source tree - is that from some external library that I need to install?
[14:55:55] libiscsi, I think
[14:57:30] hm, alright, looks like it is working after installing libiscsi-devel and re-running configure
[14:59:01] I would think the vhost_scsi support in configure should depend on libiscsi being available, then
[14:59:10] (although not sure why vhost needs iSCSI stuff)
[14:59:24] i think it's a sample or test
[14:59:38] the example app just proxies vhost scsi to iscsi
[15:00:12] anyway, the vhost-user-blk stuff builds, so I'm going to +1 verify it and push it, if you're OK with that
[15:00:21] sounds good to me
[15:00:36] cool, pushed
[15:00:37] I just finished a full run through of fio + bdev layer
[15:00:41] seems to be working
[15:00:47] what do we need to do to update the qemu copy on the vhost machine?
[15:00:49] long patch series to get there
[15:17:21] jimharris: can you take another look over this series? https://review.gerrithub.io/#/c/365724/
[15:17:27] you've +2'd it in the past, but I rebased
[15:17:38] every patch is basically the same thing but for a different module
[15:24:01] also, once we have fio working directly with the bdev layer - is there any reason to continue to maintain bdevperf and bdevio?
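[Editor's note] The configure-time guard suggested in the libiscsi discussion above could look something like this sketch. The variable name and search paths are assumptions for illustration, not qemu's actual configure code.

```shell
# Hedged sketch: enable the contrib/vhost-user-scsi example only when the
# libiscsi header it includes can actually be found, instead of failing
# late with "iscsi/iscsi.h: No such file or directory" during the build.
have_libiscsi=no
for dir in /usr/include /usr/local/include; do
    if [ -e "$dir/iscsi/iscsi.h" ]; then
        have_libiscsi=yes
        break
    fi
done
echo "vhost-user-scsi example: $have_libiscsi"
```

A real configure script would more likely probe with pkg-config or a compile test, but the principle is the same: detect the dependency up front and disable the optional example when it is missing.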
[15:27:36] bdevio does some bounds checking tests
[15:28:07] it would be nice to keep bdevperf, I think - for the same reason we have our own nvme/perf tool
[15:28:52] one item on my todo list is to measure the overhead of the bdev layer
[15:28:56] compared to going directly to NVMe
[15:29:08] that's part of why I wrote this fio plugin
[15:29:31] for our investigation into NVMf modes
[15:29:36] and whether it is necessary to keep direct mode
[15:31:12] if the result of that is that the bdev layer doesn't add measurable overhead, I think we can be more aggressive in removing some things
[15:31:54] but I still think we want to keep bdevperf because of fio overhead
[15:32:06] that's why we have our nvme perf tool
[15:32:11] *** Quits: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Ping timeout: 268 seconds)
[15:32:14] yeah, I'm not so sure that's real either
[15:32:25] my tentative measurements are showing fio can be made to be just as fast
[15:32:31] with the right config
[15:32:36] but I don't have enough data yet
[15:32:52] cool - you've tested it on an 8x SSD setup?
[15:32:53] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[15:33:55] 4x
[15:33:58] I don't have 8 right now
[16:23:52] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[16:30:31] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.77)
[16:43:55] peluse, it could be the same issue that I am trying to fix regarding multi-process.
[16:48:53] *** Joins: ziyeyang__ (~ziyeyang@192.55.55.41)
[16:48:54] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.77) (Remote host closed the connection)
[16:48:59] *** Quits: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45) (Ping timeout: 260 seconds)
[16:58:25] gangcao, OK, just tried it on my test setup (which is just running the CI tests on different HW) and it still hangs in the multi_process part of nvme.sh...
[17:13:19] *** Joins: sethhowe_ (~sethhowe@192.55.54.39)
[17:15:05] *** Quits: sethhowe (sethhowe@nat/intel/x-dsuplzvzzwndurrh) (Remote host closed the connection)
[17:15:06] *** Joins: drv_ (daniel@oak.drv.nu)
[17:15:15] *** Quits: drv (daniel@oak.drv.nu) (Disconnected by services)
[17:15:16] *** drv_ is now known as drv
[17:15:17] *** ChanServ sets mode: +o drv
[17:17:02] *** Quits: jimharris (~jimharris@192.55.54.44) (*.net *.split)
[17:17:02] *** Quits: cunyinch (~cunyinch@192.55.54.44) (*.net *.split)
[17:20:08] *** Quits: ziyeyang__ (~ziyeyang@192.55.55.41) (Ping timeout: 260 seconds)
[17:24:36] *** Joins: jimharris (~jimharris@192.55.54.44)
[17:24:36] *** Joins: cunyinch (~cunyinch@192.55.54.44)
[17:24:36] *** card.freenode.net sets mode: +o jimharris
[18:20:57] is there any detailed log for the failure?
[18:22:20] *** Joins: frank_ (78230bc3@gateway/web/freenode/ip.120.35.11.195)
[18:36:12] *** Quits: frank_ (78230bc3@gateway/web/freenode/ip.120.35.11.195) (Quit: Page closed)
[18:37:05] gangcao, it's a hang on 3 threads issued as part of the while loop at the end. I haven't really debugged it at all; I mostly spent time making sure I was as close, software-wise, as I could be to the CI systems. I do know that adding a delay between issuing the commands (identify, perf, etc.) makes it go away, so when I circle back around to looking at it I'll jot down which lock they're stuck on and who is waiting on what.
[18:37:26] might have been 4 threads... I think 1 arb, 2 perf, and 1-2 identify. I forget :)
[20:21:34] so the problem here is that one or more applications hang and cannot exit unless more delays are inserted?
[22:28:44] *** Joins: ziyeyang_ (ziyeyang@nat/intel/x-otasevihheufonkt)
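[Editor's note] The "jot down which lock they're stuck on" step above is typically done by attaching gdb to each hung process and dumping every thread's backtrace. A minimal sketch (the DRY_RUN knob, function name, and example PIDs are assumptions for illustration, not part of nvme.sh):

```shell
# Hedged sketch of the debugging step described above: dump all thread
# backtraces for each process in the multi-process test, so you can see
# which lock each thread is blocked on and who holds it. With DRY_RUN=1
# the gdb commands are only printed; unset it to actually attach
# (attaching requires ptrace permission, e.g. root).
DRY_RUN=1
dump_backtraces() {
    for pid in "$@"; do
        cmd="gdb -p $pid -batch -ex 'thread apply all bt'"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            eval "$cmd"
        fi
    done
}

# In a real session, collect the PIDs of the stuck test programs with
# something like: pgrep -f 'identify|perf|arbitration'
dump_backtraces 101 102 103
```

Comparing the backtraces across the arb/perf/identify processes usually shows directly which shared lock (e.g. in the multi-process shared memory region) each one is parked on.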