[00:19:40] *** Joins: tkulasek (~tkulasek@134.134.139.76)
[02:43:44] *** Joins: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net)
[03:38:25] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 268 seconds)
[04:07:21] *** Quits: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net) (Quit: Leaving.)
[04:19:02] *** Joins: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net)
[04:22:33] *** Quits: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net) (Client Quit)
[05:12:07] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[05:12:31] *** lyan is now known as Guest46110
[05:52:34] *** Joins: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net)
[05:52:36] *** Quits: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net) (Client Quit)
[07:09:32] *** Joins: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net)
[07:20:49] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[07:36:48] *** Joins: philipp_sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[07:36:48] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Read error: Connection reset by peer)
[07:41:13] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[07:41:45] *** Quits: philipp_sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Read error: Connection reset by peer)
[08:06:00] *** Joins: Jonathan_ (495dd02d@gateway/web/freenode/ip.73.93.208.45)
[08:08:15] I see a repeatable core dump after several hours when running fio: "reactor_7[60411]: segfault at 120 ip 0000000000442383 sp 00007f03e4dfb2f0 error 4 in nvmf_tgt[400000+b0000]". Is this a known issue? This is the latest code. Thanks
[09:11:11] Jonathan_, anything specific about the test?
[09:32:57] *** Quits: tkulasek (~tkulasek@134.134.139.76) (Ping timeout: 240 seconds)
[10:04:53] philipp-sk: Just standard fio with different bs and iodepth values, with fio verify enabled. Checked, and there is no correlation to these two (bs/iodepth) when the issue happens.
[10:13:51] *** Quits: Jonathan_ (495dd02d@gateway/web/freenode/ip.73.93.208.45) (Ping timeout: 252 seconds)
[10:29:46] *** Joins: Jonathan_ (495d992b@gateway/web/freenode/ip.73.93.153.43)
[10:38:36] *** Quits: Jonathan_ (495d992b@gateway/web/freenode/ip.73.93.153.43) (Ping timeout: 252 seconds)
[12:37:41] *** Quits: Guest46110 (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Remote host closed the connection)
[12:40:07] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[12:40:32] *** lyan is now known as Guest8334
[13:14:30] *** Joins: jkkariu (jkkariu@nat/intel/x-pmywolnsmonqbery)
[13:44:36] *** Joins: lhodev_ (~lhodev@66-90-218-190.dyn.grandenetworks.net)
[13:46:57] *** Quits: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net) (Ping timeout: 248 seconds)
[14:10:52] *** Joins: darsto_ (~darsto@89-68-136-23.dynamic.chello.pl)
[14:11:44] *** Quits: darsto (~darsto@89-68-136-23.dynamic.chello.pl) (Ping timeout: 244 seconds)
[14:11:44] *** darsto_ is now known as darsto
[15:02:01] sethhowe_, you there?
[15:02:32] Jonathan_, I do not see any failures with my fio setup. Was it a clean build on your side? I'll need more info on the failure, then.
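As an aside on the segfault report above, a verify-enabled job of the kind Jonathan_ describes might look like the sketch below. This is a guess, not his actual job file: the device path /dev/nvme1n1 (standing in for the NVMe-oF-attached namespace exported by nvmf_tgt), the runtime, and the specific bs/iodepth values are all assumptions; he reported the crash across several bs/iodepth combinations.

```ini
; Hypothetical reproducer sketch -- all values are assumptions.
[global]
ioengine=libaio
direct=1
; placeholder for the NVMe-oF-attached namespace
filename=/dev/nvme1n1
time_based=1
; the segfault reportedly appeared only after several hours
runtime=4h

[verify-job]
rw=randwrite
; bs and iodepth were varied across the failing runs
bs=4k
iodepth=32
; fio data verification, enabled as in the failing runs
verify=crc32c
; abort on the first verification mismatch
verify_fatal=1
```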
[15:03:43] bwalker, any chance you could look at the #416879 update?
[15:07:05] peluse: yeah. Just saw your e-mail.
[15:08:26] philipp-sk, I think it looks good. We'll see what the tests say
[15:08:49] in your testing, is this actually recovering from the firmware errors you were experiencing?
[15:11:42] yes, qp reset brings it back to life.
[15:12:46] the other point is how the initiator side feels. it does not get all the responses anyway and issues a disconnect
[15:13:20] it doesn't get all of the responses? I thought it left the outstanding commands in the completed state
[15:13:26] so when the qpair resumes, it sends all the completions
[15:19:07] only the incoming_queue is resumed. the requests in TRANSFERRING_HOST_TO_CONTROLLER, TRANSFERRING_CONTROLLER_TO_HOST, and COMPLETING are forced to COMPLETED
[15:19:27] so they send no completion at all
[15:19:34] no, they don't
[15:19:45] hmm, then the queue pair is just going to disconnect anyway
[15:20:09] I left them for later
[15:20:11] I wonder if you can just leave everything in the state it is in
[15:20:20] do the recovery
[15:20:25] right
[15:20:25] then just keep going as normal
[15:20:48] oh, let me think
[15:21:12] it will take until tomorrow
[15:21:27] otherwise, the recovery doesn't help much
[15:21:33] because the initiator just kills the connection
[15:22:02] well, the resources are properly freed
[15:22:16] yeah - that's an improvement
[15:22:36] it'd be awesome if we could do a full recovery without impacting the NVMe-oF queue state though
[15:22:48] all transparent, just recovers and the generic NVMe-oF code doesn't even know something happened
[15:23:25] because my experience has been that these connections die more often than you'd initially expect
[15:24:15] I was planning to do it in two steps - step 1 is under review, step 2 is to properly complete the requests that are force-completed now
[15:24:30] that's fine with me
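To pin down the two-step plan just agreed on, here is a toy C sketch of the difference between the two behaviors. Every name in it (recover_force_complete, recover_resume, struct rdma_request, the REQ_* states) is an illustrative stand-in, not SPDK's actual types or API: step 1 frees resources by force-completing in-flight requests, which is why the initiator still disconnects, while step 2 would keep each request's state across the qp reset and re-drive it.

```c
/*
 * Toy model of the two-step qpair recovery discussed above. All names
 * are illustrative stand-ins, not SPDK's actual code.
 */
#include <stdio.h>

enum req_state {
	REQ_TRANSFERRING_HOST_TO_CONTROLLER,
	REQ_TRANSFERRING_CONTROLLER_TO_HOST,
	REQ_COMPLETING,
	REQ_COMPLETED,
};

struct rdma_request {
	int id;
	enum req_state state;
};

/*
 * Step 1 (under review): after the qp reset, force every in-flight
 * request to COMPLETED. Resources are freed, but no completion is ever
 * sent, so the initiator gives up and drops the connection.
 */
static void recover_force_complete(struct rdma_request *reqs, int n)
{
	for (int i = 0; i < n; i++) {
		reqs[i].state = REQ_COMPLETED;
	}
}

/*
 * Step 2 (planned): leave each request in the state it was in, re-drive
 * it on the fresh qp, and send the completion as normal, so the generic
 * NVMe-oF layer never notices that the qp was reset.
 */
static void recover_resume(struct rdma_request *reqs, int n)
{
	for (int i = 0; i < n; i++) {
		switch (reqs[i].state) {
		case REQ_TRANSFERRING_HOST_TO_CONTROLLER:
		case REQ_TRANSFERRING_CONTROLLER_TO_HOST:
			printf("req %d: re-post data transfer\n", reqs[i].id);
			break;
		case REQ_COMPLETING:
			printf("req %d: re-post response\n", reqs[i].id);
			break;
		case REQ_COMPLETED:
			break;
		}
	}
}

int main(void)
{
	struct rdma_request reqs[] = {
		{1, REQ_TRANSFERRING_CONTROLLER_TO_HOST},
		{2, REQ_COMPLETING},
	};

	recover_resume(reqs, 2);         /* step 2: transparent recovery */
	recover_force_complete(reqs, 2); /* step 1: current behavior */
	return 0;
}
```

The design point from the discussion is that only step 2 makes the recovery transparent; with step 1 alone the initiator still kills the connection, it just no longer leaks resources.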
[15:28:48] hi seth, my patch https://review.gerrithub.io/#/c/spdk/spdk/+/417292/ failed with the error "Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information"
[15:28:48] Build step 'Execute shell' marked build as failure
[15:29:07] it's a doc change, so I did not touch any code
[15:29:48] is there a problem with nvmf_autotest?
[15:29:51] it's the only one that failed
[15:30:08] *** Quits: sethhowe_ (sethhowe@nat/intel/x-mikigcdopynqthkr) (Remote host closed the connection)
[15:30:40] you scared him off
[15:33:32] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 265 seconds)
[15:33:51] *** Joins: sethhowe (sethhowe@nat/intel/x-xcohfcheyuydckqg)
[15:40:44] *** Joins: travis-ci (~travis-ci@ec2-54-80-232-210.compute-1.amazonaws.com)
[15:40:45] (spdk/master) doc: Add an example of using two.js to draw a diagram (Ben Walker)
[15:40:46] Diff URL: https://github.com/spdk/spdk/compare/043e5edb1f1d...706c57bf2f00
[15:40:46] *** Parts: travis-ci (~travis-ci@ec2-54-80-232-210.compute-1.amazonaws.com) ()
[15:44:59] is this thing on?
[15:45:18] yep
[15:48:15] *** Quits: jimharris (jimharris@nat/intel/x-nxedwjwqmahpseoz) (Quit: ZNC - http://znc.in)
[15:48:26] *** Quits: cunyinch (cunyinch@nat/intel/x-iwryzhkgaijjzeco) (Quit: ZNC - http://znc.in)
[15:48:40] *** Quits: jstern (~jstern@134.134.139.72) (Quit: ZNC - http://znc.in)
[15:49:33] heh, thanks. my whole windows VM exploded and nothing looked like it was working
[15:49:49] *** Joins: klateck_ (~klateck@134.134.139.72)
[15:50:06] reminder: this week is the Asia time zone community meeting, see https://trello.com/b/DvM7XayJ/spdk-community-meeting-agenda
[15:50:37] *** Quits: klateck (klateck@nat/intel/x-cpkpntuykdrvkyei) (Quit: ZNC - http://znc.in)
[15:50:43] *** klateck_ is now known as jimharris
[15:51:05] *** Quits: qdai2 (qdai2@nat/intel/x-fvzhijdcknusziji) (Quit: ZNC - http://znc.in)
[15:52:14] *** Joins: pbshah1_ (pbshah1@nat/intel/x-huwvqqyozvlplrag)
[15:55:18] *** Joins: sethhowe_ (sethhowe@nat/intel/x-awcfezyyhlgvohvy)
[15:55:39] *** Quits: sethhowe (sethhowe@nat/intel/x-xcohfcheyuydckqg) (Quit: Leaving)
[15:55:39] *** pbshah1_ is now known as sethhowe
[15:58:50] *** Quits: sethhowe_ (sethhowe@nat/intel/x-awcfezyyhlgvohvy) (Client Quit)
[16:02:29] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:36:32] :)
[16:47:50] *** Quits: Guest8334 (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[16:49:27] *** Joins: sethhowe_ (~sethhowe@134.134.139.72)
[16:50:17] *** Quits: sethhowe_ (~sethhowe@134.134.139.72) (Client Quit)
[16:56:45] bwalker, hey, got the shutdown thing sorta working! thanks :)
[17:13:02] see whatcha think at https://review.gerrithub.io/c/spdk/spdk/+/419446
[17:13:21] it's really nothing more than what you said, but bdevperf needed one more tweak
[17:25:25] is anyone already looking into nvmf test issues in the pool? I'm on my way out the door, but I have a totally unrelated patch that failed with what looks, on the surface, like a simple memleak in nvmf somewhere. See https://ci.spdk.io/spdk-jenkins/results/autotest-per-patch/builds/7499/archive/nvmf_autotest/build.log and please yell out if you can look at this -- maybe someone in a timezone whose workday is about to start :)
[17:41:25] peluse: I can look, but I'm not sure how much I can help yet.
[18:16:25] *** Joins: dlw (~Thunderbi@114.255.44.143)
[18:20:51] Shuhei, cool, thanks :) I only glanced at the log, so I don't know how easy/obvious it will be or if it will take one of the nvmeof dudes....
[18:21:28] I need to learn that area a little more, so if you don't figure it out and nobody else speaks up, I'll look in the morning US time
[18:30:43] sethhowe, do you know if https://ci.spdk.io/spdk/nightly_status.html is updated for Jenkins yet?
[19:14:43] *** Quits: dlw (~Thunderbi@114.255.44.143) (Remote host closed the connection)
[19:15:07] *** Joins: dlw (~Thunderbi@114.255.44.143)
[20:03:25] Hi Changpeng, Gang, Wenzhong, and all, will you take a look at https://ci.spdk.io/spdk-jenkins/results/autotest-per-patch/builds/7500/archive/nvme_autotest/build.log and https://ci.spdk.io/spdk-jenkins/results/autotest-per-patch/builds/7499/archive/nvmf_autotest/build.log ?
[20:04:06] you may be able to help better than I can.
[20:28:32] *** Quits: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net) (Quit: Leaving.)
[20:36:03] *** Joins: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net)
[20:39:12] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 252 seconds)
[20:40:54] *** Quits: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net) (Client Quit)