[01:29:58] *** Joins: wzh (~wzh@114.255.44.139)
[01:31:08] Hi, a simple question, unrelated to spdk but gerrithub. Why gerrithub always call me "Anonymous Coward"?
[01:31:40] Although I have set my name and fullname to "wuzhouhui", set email to wuzhouhui@kingsoft.com.
[01:39:20] *** Joins: travis-ci (~travis-ci@ec2-54-235-60-214.compute-1.amazonaws.com)
[01:39:20] (spdk/master) nvmf/tcp: Add a poller to check the timeout of each qpair (Ziye Yang)
[01:39:20] Diff URL: https://github.com/spdk/spdk/compare/9d11abfd0ee2...94cd652b1884
[01:39:20] *** Parts: travis-ci (~travis-ci@ec2-54-235-60-214.compute-1.amazonaws.com) ()
[01:44:57] *** Joins: travis-ci (~travis-ci@ec2-54-235-60-214.compute-1.amazonaws.com)
[01:44:57] (spdk/master) setup.sh: Fix cleanup in matching files (Lance Hartmann)
[01:44:57] Diff URL: https://github.com/spdk/spdk/compare/94cd652b1884...fe349098719d
[01:44:57] *** Parts: travis-ci (~travis-ci@ec2-54-235-60-214.compute-1.amazonaws.com) ()
[06:46:15] *** Joins: vmysak (vmysak@nat/intel/x-hzcabxaqrtrtraby)
[07:23:57] *** Quits: wzh (~wzh@114.255.44.139) (Ping timeout: 244 seconds)
[07:29:45] wzh, seriously???
[08:52:48] *** Quits: vmysak (vmysak@nat/intel/x-hzcabxaqrtrtraby) (Ping timeout: 250 seconds)
[09:01:43] *** Quits: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl) (Ping timeout: 245 seconds)
[09:09:13] wzh: That is a gerrit default. Just so you know, nobody in the SPDK community set that. It's the name Gerrit gives you when your username has not been configured. Under GerritHub->settings->profile->username, is anything listed? Also, under gerrithub->settings->identities have you configured any e-mail addresses?
[10:30:09] *** Joins: travis-ci (~travis-ci@ec2-54-204-138-41.compute-1.amazonaws.com)
[10:30:09] (spdk/master) hotplug: annotated temporarily for no base_img (WangHaiLiang)
[10:30:09] Diff URL: https://github.com/spdk/spdk/compare/8c89954d4b0b...8114cd367a7c
[10:30:09] *** Parts: travis-ci (~travis-ci@ec2-54-204-138-41.compute-1.amazonaws.com) ()
[10:33:44] *** Joins: travis-ci (~travis-ci@ec2-54-224-252-166.compute-1.amazonaws.com)
[10:33:44] (spdk/master) util: added spdk_divide_round_up() (Konrad Sztyber)
[10:33:44] Diff URL: https://github.com/spdk/spdk/compare/8114cd367a7c...62db4ac2cfb3
[10:33:44] *** Parts: travis-ci (~travis-ci@ec2-54-224-252-166.compute-1.amazonaws.com) ()
[10:34:27] *** Joins: travis-ci (~travis-ci@ec2-54-204-138-41.compute-1.amazonaws.com)
[10:34:28] (spdk/master) util/crc16: Add init_crc parameter as seed value to spdk_crc16_t10dif (Shuhei Matsumoto)
[10:34:29] Diff URL: https://github.com/spdk/spdk/compare/d49402fe5f6f...e303567bc1d5
[10:34:29] *** Parts: travis-ci (~travis-ci@ec2-54-204-138-41.compute-1.amazonaws.com) ()
[10:43:29] jimharris, possibly quick reduce make question?
[10:43:39] sure
[10:45:55] *** Joins: bwalker_ (~bwalker@134.134.139.72)
[10:45:55] *** ChanServ sets mode: +o bwalker_
[10:45:56] OK, in the initial patch where I integrated reducelib into the vbdev build I included -lpmem in mk/spdk.modules.mk accordingly. Now, for whatever reason, I can't build unless I include it (conditionally of course) in mk/spdk.common.mk as it fails on the ld for one of the example apps, missing reference to a pmem function
[10:46:48] never totally under the hierarchy of our make file fragments....
[10:46:54] stood :)
[10:47:35] no time like the present to learn how they work :)
[10:47:54] which example app, and what is the final link line when it fails to link?
[10:48:26] pass Q= to make to get all of the output
[10:48:37] one sec, have to either repro or scroll waaaay back :)
[10:49:24] *** Joins: travis-ci (~travis-ci@ec2-54-91-19-110.compute-1.amazonaws.com)
[10:49:25] (spdk/master) nvme/perf:Fix two small defects. (WangHaiLiang)
[10:49:26] Diff URL: https://github.com/spdk/spdk/compare/e303567bc1d5...93c7efbbe858
[10:49:26] *** Parts: travis-ci (~travis-ci@ec2-54-91-19-110.compute-1.amazonaws.com) ()
[10:50:44] jimharris, https://gist.github.com/peluse/8f61142a98255f7f6105f9cb07a7f336
[10:50:56] bwalker, bwalker_: Did you see that we got a hit on the v18.10.1 latent failure with a dmesg log? https://ci.spdk.io/spdk/builds/review/8455986376c790f8b86248f211c95c79d025256c.1545091111/fedora-03/
[10:51:11] oh great I'll take al ook
[10:53:05] jimharris, hmmm, I think I see what it is
[10:53:15] ok
[10:53:32] will try something real quick and let ya know
[10:57:09] seems to work so no hurry it will get reviewed anyways but look at the change between set 7 and 8 on https://review.gerrithub.io/c/spdk/spdk/+/435764/8.
[11:12:34] peluse: yes - that change is correct, my guess is your first version of this patch was from before the massive makefile cleanups that went in a week or two ago
[11:12:51] jimharris, rock n roll. thanks!
[11:30:47] ok who has ideas on how to get around this scan build issue: https://ci.spdk.io/spdk/builds/review/e6a182d77eda4d179b536455116968c71cd5a34e.1545088473/fedora-01/scan-build/report-05e3c1.html#EndPath
[11:31:05] it's a filed bug against LLVM: https://bugs.llvm.org/show_bug.cgi?id=18222
[11:34:52] *** Joins: vmysak (vmysak@nat/intel/x-sqpbzadjfqlxbdtu)
[11:34:54] bwalker_, what if you break and return outside of the macro?
[11:36:27] what line?
[11:37:00] oh on line 795?
[11:37:02] are you looking at the comment on line 796?
[11:37:23] yeah it blows up because it can't execute line 796
[11:37:37] so a break statement on line 797 won't change that I don't think
[11:38:13] the problem, I think, is that it assumes the wrong path on the first delete call
[11:39:18] could you try explicitly zeroing the TAILQ_ENTRY?
[11:39:46] i don't like that as a solution but am curious if it changes the scan-build behavior
[11:39:51] yeah will try it
[11:40:11] I tried to use __builtin_assume to hint to clang, but not available on the clang version we're using
[11:40:13] very new I guess
[11:40:28] bwalker_: I figured out why we aren't hitting the v18.10.1 latent failures on master for the nvmf_lvol and shutdown tests (observed in the jenkins tp) on master https://review.gerrithub.io/#/c/spdk/spdk/+/432085/ https://review.gerrithub.io/#/c/spdk/spdk/+/432356/
[11:41:19] oh of course, we changed the test
[11:41:34] well now we're in a pickle
[11:42:04] this could be a kernel bug - in fact that's my best guess. But I'm not absolutely sure it is.
[11:43:01] I feel like that's why we changed the tests in the first place.
[11:43:17] it was also a speed thing
[11:43:49] and an error recovery thing - when we'd have bugs in our target, the kernel module would often wedge
[11:44:02] and we'd have to wait a full timeout
[11:44:03] Not because of this specific failure, but I thought we were seeing a race condition where the device would show up in sysfs before it was available for writing?
[11:44:21] yes that too
[11:46:08] But we explicitly test against Do we really explicitly test I/O against the kernel target anywhere else?
[11:46:26] I mean a little in filesystem and discovery, but not much.
[11:46:51] we want to test against the kernel initiator for sure
[11:47:07] just not necessarily in a stressful scenario like shutdown
[11:47:49] in dmesg it looks like it sees a bunch of I/O timeout
[11:48:09] but hard to line up - that could be after the point where our target also reported that the connection has dropped
[11:48:40] it could be something lower down, where the RDMA stack is getting tripped up by some specific usage and the connection drops
[11:53:47] so on that static analysis failure, the problem is step #3 where it assumes it finds a file corresponding to name "file1"
[11:53:51] that fails - the test verifies that it does
[12:13:26] *** Quits: bwalker_ (~bwalker@134.134.139.72) (Ping timeout: 250 seconds)
[12:42:47] *** Quits: vmysak (vmysak@nat/intel/x-sqpbzadjfqlxbdtu) (Remote host closed the connection)
[12:50:16] bwalker_: If you change the cu_assert to an SPDK_CU_ASSERT_FATAL on line 352, does your issue go away. I think that by switching to a fatal assert you can preclude the assumption from step 3.
[12:53:32] bwalker_: does your issue go away. -> does your issue go away? I'm not yoda haha
[13:31:16] bwalker_: nvm about my first suggestion. Clang can't associate the return code from the callback with the function behavior, but I found a new soln that fixes it by only modifying unittest code. https://review.gerrithub.io/#/c/spdk/spdk/+/437737/
[14:01:37] peluse: https://review.gerrithub.io/#/c/spdk/spdk/+/436781/
[14:02:18] jimharris, thanks
[14:02:29] i responded to your question
[14:02:40] yeah, thats why I said thanks :)
[14:03:00] just wanted to make sure - thought maybe you were just thanking me for the +2 :)
[14:03:59] sethhowe: that seems reasonable - maybe we could generalize this into a macro that we could put in spdk_cunit.h? in case we end up with a need for it again
[14:09:21] jimharris: OK. Yeah, we might run into that until the bug in clang is fixed.
[14:13:05] *** Joins: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl)
[14:13:17] jimharris, well that too but yeah I actually read all my patch comments ;)
[14:34:38] *** Quits: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[14:47:02] *** Joins: darsto (~darsto@89-78-174-111.dynamic.chello.pl)
[14:50:21] Looking over the messages earlier today (re: latent failures around nvmf_lvol), are we close (or closer) to cutting 18.10.1 ?
[14:56:32] lhodev: closer, I think. I want to defer to bwalker on the answer to that question though.
[15:18:25] sethhowe: To save me digging hours through code, could you give me a summary of the difference in test action between master's and v18.10.x version of nvmf_lvol.sh?
[15:19:02] I can see one is using nvmf_fio.py and the other is bdevperf.
[15:20:37] Do I understand correctly that the former relies on fio + nvmf kernel initiator code, where the latter is all userspace (SPDK) initiator?
[15:21:15] And, that, they both were doing some kind of write+read+verify ops ?
[15:37:48] Sorry for the late reply. Yeah, essentially we made the change right after the release to have nvmf_lvol test our target against our initiator which completely masks the issue we are seeing on v18.10.x without fixing it. It's possible that the issue is just the kernel initiator misbehaving, but we are not sure.
[15:45:34] Examining top of the dmesg, looks like that's kernel version 4.18.9 which is quite recent.
[15:46:52] Top of tree kernel.org is 4.20.0-rc7
[15:48:22] I'll do a quick sniff through the git log of changes to source in drivers/nvme/host. See if anything obvious or suspicious appears.
[16:39:02] *** Joins: travis-ci (~travis-ci@ec2-54-81-207-242.compute-1.amazonaws.com)
[16:39:03] (spdk/master) reduce: put pmem_unmap() in _init_load_cleanup() (wuzhouhui)
[16:39:03] Diff URL: https://github.com/spdk/spdk/compare/84b5a8c1d616...7fbc5106e43d
[16:39:03] *** Parts: travis-ci (~travis-ci@ec2-54-81-207-242.compute-1.amazonaws.com) ()
[16:39:36] *** Joins: travis-ci (~travis-ci@ec2-54-167-138-122.compute-1.amazonaws.com)
[16:39:37] (spdk/master) reduce: check strlen(SPDK_REDUCE_SIGNATURE) in buildtime (wuzhouhui)
[16:39:37] Diff URL: https://github.com/spdk/spdk/compare/7fbc5106e43d...3a4185be1924
[16:39:37] *** Parts: travis-ci (~travis-ci@ec2-54-167-138-122.compute-1.amazonaws.com) ()
[18:50:00] *** Joins: travis-ci (~travis-ci@ec2-54-204-138-41.compute-1.amazonaws.com)
[18:50:01] (spdk/master) example/sock: avoid register poller multiple times (yidong0635)
[18:50:01] Diff URL: https://github.com/spdk/spdk/compare/8db5ff2bddd8...c26bd15881ab
[18:50:01] *** Parts: travis-ci (~travis-ci@ec2-54-204-138-41.compute-1.amazonaws.com) ()