[00:19:08] *** Quits: ziyeyang_ (ziyeyang@nat/intel/x-pdphfilzwfiktvqc) (Ping timeout: 252 seconds)
[00:28:09] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.82)
[00:44:41] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.82) (Remote host closed the connection)
[00:47:55] *** Joins: ziyeyang_ (~ziyeyang@192.55.54.45)
[00:54:15] To further promote the development of SPDK technology and its community, and to provide a platform for exchange and sharing, the SPDK China Summit 2018 will be held at the Crowne Plaza Hotel Sun Palace Beijing on March 23rd, 2018. We sincerely invite you to attend this summit and discuss the status of SPDK and its future development. At this summit, Intel and SPDK users (e.g., Alibaba, Huawei, Hitachi, FusionStack, etc.) will share topics related
[00:54:15] to the SPDK program and community development. You are welcome to join this summit; the link below has the detailed conference registration info.
[00:54:16] https://www.bagevent.com/event/1177885
[01:31:01] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[01:36:10] *** Quits: ziyeyang_ (~ziyeyang@192.55.54.45) (Remote host closed the connection)
[05:05:49] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[05:10:45] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[06:01:54] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[06:14:28] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[06:26:24] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[07:23:07] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[07:24:24] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Client Quit)
[09:33:49] peluse: part -> partition
[10:24:40] hmm, it looks like SPDK_TEST_VHOST_INIT is not enabled on any of the machines currently; is that intentional?
[10:25:04] drv: it should be enabled on fedora-07
[10:25:10] oh, I see
[10:25:34] I thought only fedora-05 and fedora-08 ran vhost, but obviously fedora-07 also says vhost - thanks :)
[10:26:43] I am a bit concerned about the vhost pci tests recompiling all of SPDK inside the VM: https://review.gerrithub.io/#/c/394248/
[10:27:22] the setup_vm step (which includes the SPDK build) takes almost 2 minutes: https://ci.spdk.io/builds/review/8d049350e3b0654025e95ef72c7aae6ff9e62ebe.1516959870/fedora-07/timing.svg
[10:28:15] hmmm, you're right
[10:28:43] if we can ensure that the guest VM and the host are running the same distro/version, we could probably just copy in the necessary binary from the host build
[10:28:44] we pass through the cpu, so we could just copy the precompiled fio_plugin library
[10:30:13] yeah, as long as the FIO version is the same
[10:30:33] it looks like maybe the fio source is baked into the VM image, but we could copy over the fio binary from the host as well
[10:33:23] pniedzwx is on holiday next week, so I guess I'll take over this patch series
[10:34:50] we could probably check it in as is and optimize it later, just to get coverage of the PCI virtio code for now
[10:35:03] up to you
[10:35:27] (since it seems it doesn't make the overall test run any longer)
[10:47:30] drv: premature optimization is the root of all evil
[10:47:39] :)
[10:47:41] but it's also fun :)
[10:47:51] ok, let's merge it as is
[11:11:15] bwalker: the blob thin provisioning patch looks ready https://review.gerrithub.io/#/c/391422/ - mszwed squashed the patches like we requested
[11:12:45] *** Joins: lhodev (~Adium@209.58.131.5)
[11:38:35] jimharris - in the write path
[11:38:40] why does it start by syncing out the metadata?
[11:38:45] the metadata wasn't modified...
[11:39:29] oh - you're right - this was needed in the 'wrong' earlier patch where the cluster was immediately written to the cluster array
[11:40:16] now the cluster becomes allocated in response to the write, and it correctly tests for races with other threads allocating the cluster
[11:40:27] but how does it know that the metadata state is now dirty after the write?
[11:40:36] somehow it needs to know to sync
[11:40:40] or it needs to do it automatically
[11:41:05] can't it just always sync?
[11:41:23] not really, because we haven't worked out coordinating syncs on different threads
[11:41:38] right now we still just say the user shouldn't do syncs from different threads at the same time
[11:42:01] so if we just automatically did it here, you could have a scenario where one thread allocates a cluster and issues a sync write
[11:42:15] and while that is pending, another thread allocates a cluster, regenerates the metadata, and issues a sync write
[11:42:19] and those get reordered by the disk
[11:44:51] it will actually assert if that happens inside the sync
[11:45:10] whatever we do - it should be separate from this patch
[11:45:12] now we could change that behavior to return -EAGAIN or queue or something
[11:46:11] maybe the next patch in the series fixes it
[11:46:12] let me look
[11:46:12] what if when you sync - you do an atomic increment on a per-blob value
[11:46:38] if the value was 0, you do your sync
[11:47:23] if the value was > 0, you don't do the sync, and whichever sync was already in progress issues another sync after it is done
[11:47:40] but how do you deliver completion callbacks?
[11:47:44] the next patch doesn't fix this
[11:47:51] just complete the write before the sync completes?
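A minimal C sketch of the per-blob counter idea floated above at 11:46-11:47, purely to make the queue/squash mechanics concrete. Every name in it (blob_sync_state, blob_issue_md_write(), and so on) is hypothetical rather than the actual blobstore API, and it deliberately ignores the completion-callback and power-loss ordering questions raised around it:

#include <stdatomic.h>

/* Hypothetical per-blob bookkeeping; not the real struct spdk_blob. */
struct blob_sync_state {
        atomic_uint  pending_syncs;  /* sync requests not yet satisfied */
        unsigned int in_flight;      /* requests covered by the sync in progress */
};

/* Hypothetical hook that writes the metadata out and eventually calls
 * blob_sync_done() from its completion path. */
void blob_issue_md_write(struct blob_sync_state *b);

void
blob_start_sync(struct blob_sync_state *b)
{
        /* Snapshot how many outstanding requests this sync will cover. */
        b->in_flight = atomic_load(&b->pending_syncs);
        blob_issue_md_write(b);
}

/* Any thread that dirtied the metadata calls this instead of syncing directly. */
void
blob_request_sync(struct blob_sync_state *b)
{
        /* Only the requester that takes the count from 0 starts a sync;
         * everyone else piggybacks on the one already in flight. */
        if (atomic_fetch_add(&b->pending_syncs, 1) == 0) {
                blob_start_sync(b);
        }
}

/* Completion callback of the metadata write. */
void
blob_sync_done(struct blob_sync_state *b)
{
        /* Retire the requests the finished sync covered; if more arrived
         * while it was running, issue exactly one follow-up sync. */
        if (atomic_fetch_sub(&b->pending_syncs, b->in_flight) > b->in_flight) {
                blob_start_sync(b);
        }
}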
[11:48:04] because that means if you lose power, we completed a write that will be lost
[11:48:18] yeah, there would need to be more to it than that
[11:49:18] i think we need to send all syncs to the thread that did the bs_open
[11:49:33] the next patch has problems too - the flag for a cluster alloc in progress can't be on the channel
[11:49:49] it doesn't help anything to queue per channel - the real potential conflict is across channels
[11:49:52] why not?
[11:50:10] what bug does a per-channel flag solve?
[11:50:18] what's the scenario that this is preventing?
[11:51:11] multiple 4KB writes within the same unallocated cluster
[11:51:16] for example
[11:51:28] but what if you do that from two different channels?
[11:52:23] the one that loses will reissue the user_op on the cluster of the one that won
[11:53:00] but there's no need to allow these races to happen if it's on the same channel - that just wastes time and bandwidth
[11:53:16] I see - the other atomic blocks it
[11:53:19] this also allows us to reduce the number of cluster-sized buffers we allocate
[11:53:27] ok, I get this one then
[11:53:43] it's just an optimization
[11:54:35] without it - if you did 4kb random writes to a thin provisioned blob - on an lvol you'd allocate 4MB * queue depth of memory buffers
[11:54:45] yeah
[11:55:27] now it's just 4MB per channel, and we can reduce that further in the future - do a 256KB copy at a time
[11:56:02] so a preliminary patch to message-pass the syncs and queue/squash as necessary is what's needed
[12:02:28] *** Quits: lhodev (~Adium@209.58.131.5) (Quit: Leaving.)
[12:08:11] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[12:18:11] *** Joins: lhodev (~Adium@wsip-72-196-168-95.sd.sd.cox.net)
[12:51:30] *** Quits: lhodev (~Adium@wsip-72-196-168-95.sd.sd.cox.net) (Quit: Leaving.)
[13:13:09] *** Joins: lhodev (~Adium@ip68-107-79-69.sd.sd.cox.net)
[13:20:48] drv: can you take a look at setting up the live migration tests that tomasz forwarded?
[13:21:32] I'm looking at it now
[13:21:38] sweet
[13:23:01] can you take a look at the 3 patches before the final live migration one? I think they can go in as-is and then we can tweak the final patch
[13:23:01] https://review.gerrithub.io/#/c/395740/
[13:27:43] they look fine to me, but I'd like darsto_ to look before we commit
[13:27:48] i've assigned them to him
[13:28:42] ok, sounds good
[13:36:41] i'm afraid the very first patch in the series is a hack (395514: vhost: prevent IO freeze by kicking all queues after starting device)
[13:38:34] let me debug this case a bit further
[13:44:00] jimharris is pushing magical empty patches again :)
[13:44:12] that one was *not* on purpose
[13:44:53] my stgit patch stack in my head did not match the real one before i pushed :)
[13:45:11] hmm, cloning qemu is failing because it's trying to check out some submodules via the git:// protocol
[14:00:17] *** Joins: AlanA (94571712@gateway/web/freenode/ip.148.87.23.18)
[14:02:37] *** Quits: AlanP_ (94571712@gateway/web/freenode/ip.148.87.23.18) (Ping timeout: 260 seconds)
[14:03:28] ok, I think I have the live migration test up and running
[14:03:44] fio is failing with verification errors, not sure if that is the failure I'm supposed to see?
[14:03:55] yes - I think that's the failure
[14:40:03] hmm, I don't see any obvious pattern to the failing LBA - it isn't the last one that we read or anything like that
[14:40:39] is it doing sequential I/O? or how do you know it wasn't the last one read?
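Back on the blobstore thread: in the same illustrative spirit, a rough C sketch of the per-channel "cluster allocation in progress" guard from the 11:49-11:55 exchange, where writes landing in a not-yet-allocated cluster are parked behind the one allocation already running on that channel, so each channel holds at most one cluster-sized buffer. All names here (bs_channel, bs_user_op, the two hooks) are hypothetical and not the real blobstore code; the channel's queue is assumed to be TAILQ_INIT()ed at channel creation, and cross-channel races are assumed to be resolved by the separate atomic cluster claim mentioned at 11:53, with the loser re-issuing its op on the winner's cluster:

#include <stdbool.h>
#include <sys/queue.h>

/* Hypothetical parked I/O and per-channel state; not the real blobstore types. */
struct bs_user_op {
        TAILQ_ENTRY(bs_user_op) link;
        /* the original write description would live here */
};

struct bs_channel {
        bool cluster_alloc_in_progress;          /* one allocation at a time per channel */
        void *cluster_buf;                       /* single cluster-sized bounce buffer */
        TAILQ_HEAD(, bs_user_op) queued_ops;     /* writes parked behind that allocation */
};

/* Hypothetical hooks: start the copy/allocate for one cluster (its completion
 * ends up in bs_cluster_alloc_done()), and resubmit a parked write. */
void bs_start_cluster_alloc(struct bs_channel *ch, struct bs_user_op *op);
void bs_submit_user_op(struct bs_channel *ch, struct bs_user_op *op);

/* Write path taken when the target cluster is not allocated yet. */
void
bs_write_to_unallocated_cluster(struct bs_channel *ch, struct bs_user_op *op)
{
        if (ch->cluster_alloc_in_progress) {
                /* This channel is already allocating a cluster: park the op
                 * instead of grabbing another cluster-sized (e.g. 4MB) buffer. */
                TAILQ_INSERT_TAIL(&ch->queued_ops, op, link);
                return;
        }

        ch->cluster_alloc_in_progress = true;
        bs_start_cluster_alloc(ch, op);
}

/* Completion path of the cluster allocation/copy. */
void
bs_cluster_alloc_done(struct bs_channel *ch)
{
        TAILQ_HEAD(, bs_user_op) parked;
        struct bs_user_op *op;

        TAILQ_INIT(&parked);
        ch->cluster_alloc_in_progress = false;

        /* Drain into a local list first, so an op that kicks off the next
         * allocation can re-park behind it without looping forever. */
        while (!TAILQ_EMPTY(&ch->queued_ops)) {
                op = TAILQ_FIRST(&ch->queued_ops);
                TAILQ_REMOVE(&ch->queued_ops, op, link);
                TAILQ_INSERT_TAIL(&parked, op, link);
        }

        while (!TAILQ_EMPTY(&parked)) {
                op = TAILQ_FIRST(&parked);
                TAILQ_REMOVE(&parked, op, link);
                bs_submit_user_op(ch, op);
        }
}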
[14:41:01] I turned on the SCSI trace flag and I'm looking at spdk_bdev_scsi_read logs
[14:41:19] oh - nice that it still reproduces with all of that spew to the console :)
[14:41:31] yeah, seems to be pretty quick even with logging on
[14:42:08] the reads look sequential but of random size, not sure if that's expected
[14:42:54] I guess that's due to the blocksize_range=4k-512k
[14:42:58] oh - can you change their script to use a fixed IO size?
[14:42:59] yeah
[14:43:15] should be simpler to debug with sequential 4K, I'd think
[14:43:41] a lot more debug spam with 4k :)
[14:44:11] maybe sequential 64K then?
[14:44:26] well, it passed with 4k, but I don't know if that's due to timing or 4k blocks or what
[14:44:29] ok, I'll try 64k
[14:44:41] interesting
[14:45:13] I just tried it once; don't know if that's reliable
[14:45:38] 64k also seems to have worked
[14:46:07] 512k?
[14:46:25] ok, failed with bs=512k
[14:46:33] time to bisect
[14:47:41] are his patches rebased off of latest master? just thinking about the changes we made to the vhost MAX_IOVS earlier this week
[14:48:18] no, it doesn't have the 129 iov patch
[14:48:41] i don't see how that would matter
[14:48:57] is this the malloc backend?
[14:49:14] yes
[14:49:19] ioat disabled?
[14:49:22] right
[14:49:28] 32768 blocks of 4096 bytes each
[14:49:42] bs=256k has worked twice in a row
[14:49:50] ship it
[14:49:59] going to try 512k again to make sure I'm not crazy :)
[14:50:27] hm, 512k worked that time
[14:52:23] sounds like you fixed it
[14:52:59] well, I don't know what's going on now
[14:53:13] it's just not failing for me
[14:53:29] are you stopping/restarting the vhost target each time?
[14:53:56] I'm running test/vhost/spdk_vhost -m, which I think kills the target at the end
[14:56:03] going back to bs_range doesn't make it fail?
[14:56:35] I put it back to the original bs_range and now it did fail again
[14:56:45] alrighty then
[14:56:51] interestingly, this time the offset it's complaining about was read right before some VHOST_CONFIG messages
[15:00:06] bwalker: this cross-thread insert cluster mechanism is going to be pretty simple in the end
[15:00:16] yeah, I agree it's not bad
[15:00:26] we have all the necessary primitives now
[15:01:49] it will probably make the patch series on the whole fewer lines of code and less complex
[15:11:09] I rebased the live migration patches on top of latest master just to be sure; the bs_range=4k-512k test still fails
[15:11:30] that's actually what i would have expected - but just wanted to make sure
[15:12:08] uio or vfio?
[15:12:54] vfio on my system
[15:54:38] any luck?
[15:55:26] we're still iterating on it to try and simplify
[15:55:42] but it mostly works in all of the simple cases
[15:55:54] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[15:59:46] patches pushed for md sync
[16:21:34] *** Joins: ksingh (c6ba0002@gateway/web/freenode/ip.198.186.0.2)
[16:22:26] Hi Jim, do you have a min? I have a quick question about debugging spdk
[16:23:06] Hi @jimharris, I have a quick question about debugging spdk
[16:23:17] ksingh, ask away - there are a few people here
[16:24:57] I'm new to spdk and am trying to debug an issue. Is there a write-up somewhere that describes the debugging mechanisms in place for spdk? Logs, switches, etc?
[16:25:38] ksingh, i can provide a few quick tips, as many others can as well. let me check the docs and see if we have anything more complete...
one sec
[16:28:04] ksingh, hmm, I don't see anything that really hits the mark, but there is a lot of doc work ongoing right now and there's an item on the list somewhere - actually I think I said I'd start it, but anyone can - to help guide someone through understanding a failure in CI and some basic debug steps.
[16:28:09] but since that's not here yet...
[16:29:00] the most basic thing I can suggest is to make sure you build with CONFIG_DEBUG=y and use gdb to start one of the example apps. Set some breakpoints and poke around to learn, or if you have a problem you are trying to track down, repro it under gdb
[16:29:05] you've used gdb before?
[16:29:15] That's ok. I'm willing to learn on the go. If you can throw some quick steps at me I can run with it
[16:29:20] Yes for gdb
[16:29:26] cool
[16:29:35] any particular module you're focusing in on?
[16:30:20] Not exactly sure, but I'm running app/nvmf_tgt. The failure I'm seeing happens when a connect or discover command arrives
[16:30:40] (also note that we try to keep available work for learning here: https://trello.com/b/P5xBO7UR/things-to-do in the 'low hanging fruit' column)
[16:31:34] yeah OK, I personally haven't done much work there, but starting the target side with gdb and reproducing might be a good place to start - set a breakpoint at some interesting point you see based on the error message
[16:31:50] Ok. I'll check that as well. Regarding the failure, I'm seeing qelr_poll_cq_req and rdma.c in most of the error messages
[16:32:51] OK, another great first step would be to do a short write-up of your scenario and what you are seeing/thinking and send it to the dist list. That's usually better if the question requires more than a few lines of text
[16:33:16] but if one of the other guys that knew that code better than I do you could easily get real-time discussion here too :)
[16:33:33] was here I mean, somehow I left that out of that last comment
[16:35:38] ksingh, good luck, I've gotta take off for the evening...
[16:35:45] Got it. I've just joined the mailing list. Trying to get hooked into the various channels here. I've already created issue #226 and have been getting a lot of help from Jim.
[16:35:57] great!
[16:36:06] ksingh, regarding Paul's suggestion of building with CONFIG_DEBUG=y, I believe you can obtain this setting by running the configure script with --enable-debug, and then doing a build.
[16:36:09] Thanks!
[16:36:32] I've already been doing that pretty much from the beginning
[16:36:36] hi ksingh
[16:36:46] Hi Jim
[16:36:54] try passing "-t nvmf" as command line arguments to nvmf_tgt
[16:37:07] this will only work if you build with DEBUG=y
[16:37:10] i.e.
[16:37:14] ./configure --enable-debug
[16:37:29] *** Parts: lhodev (~Adium@ip68-107-79-69.sd.sd.cox.net) ()
[16:37:39] there are a bunch of SPDK_DEBUGLOGs in the nvmf library that will get printed in this mode
[16:39:36] I had used -t nvmf a few days ago. It kept printing messages continuously, so I stopped. I just did it again for this issue and it seems to be better. I have more info. Thanks for the suggestion
[16:41:32] Do you think a network trace might help here? The 1st error is: [qelr_poll_cq_req:1668]Error: POLL CQ with RDMA_CQE_REQ_STS_REMOTE_INVALID_REQUEST_ERR. QP icid=0x1
[16:42:44] i'd suggest an e-mail to the mailing list - bwalker (Ben Walker) will be able to help more on these specifics
[16:42:50] do you get this on initial connect?
[16:45:29] Yes, initial connect
[16:45:34] Ok, I'll email it
[16:53:12] qelr_poll_cq_req - what structure/function/file is this from? I don't recognize it
[16:53:47] Yeah, I can't find it either. Strange. It appears in the coredump though.
[16:57:33] Sure enough. It's in ./libibverbs/libqedr-rdmav2.so - it matches
[17:25:34] *** Quits: ksingh (c6ba0002@gateway/web/freenode/ip.198.186.0.2) (Ping timeout: 260 seconds)