[00:48:47] *** Joins: tomzawadzki (~tomzawadz@134.134.139.82)
[01:50:50] *** Quits: tomzawadzki (~tomzawadz@134.134.139.82) (Remote host closed the connection)
[01:51:04] *** Joins: tomzawadzki (~tomzawadz@134.134.139.82)
[02:01:20] *** Quits: tomzawadzki (~tomzawadz@134.134.139.82) (Ping timeout: 255 seconds)
[02:06:43] *** Joins: tomzawadzki (~tomzawadz@134.134.139.82)
[02:14:13] *** Joins: tzawadzki (tomzawadzk@nat/intel/x-olpuamhatcmhwzdy)
[02:16:11] *** Quits: tomzawadzki (~tomzawadz@134.134.139.82) (Ping timeout: 256 seconds)
[02:28:09] *** Quits: tzawadzki (tomzawadzk@nat/intel/x-olpuamhatcmhwzdy) (Remote host closed the connection)
[02:28:16] *** Joins: tzawadzki (tomzawadzk@nat/intel/x-grzgitlalhtteuud)
[02:31:53] *** Quits: tzawadzki (tomzawadzk@nat/intel/x-grzgitlalhtteuud) (Remote host closed the connection)
[02:32:16] *** Joins: tzawadzki (tomzawadzk@nat/intel/x-uqlnsnjnxorjqiau)
[02:36:34] *** Quits: tzawadzki (tomzawadzk@nat/intel/x-uqlnsnjnxorjqiau) (Remote host closed the connection)
[02:52:09] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[06:18:45] *** Joins: boutcher_ (~boutcher@66.113.132.66)
[06:20:05] *** Quits: boutcher (~boutcher@66.113.132.66) (Ping timeout: 240 seconds)
[06:35:18] *** Joins: tomzawadzki (~tomzawadz@134.134.139.82)
[06:38:05] *** Quits: boutcher_ (~boutcher@66.113.132.66) (Ping timeout: 260 seconds)
[06:43:12] *** Joins: boutcher (~boutcher@66.113.132.66)
[07:16:03] jimharris: I have never seen VHOST_IOVS_MAX being reached before
[07:16:46] I saw it just now and I have bad news: vhost.c: 338:spdk_vhost_vring_desc_to_iov: *ERROR*: SPDK_VHOST_IOVS_MAX(129) reached
[07:18:41] http://spdk.intel.com/public/spdk/builds/review/99841b8541b93e12c4a2cfe454155fb94f04c31c.1516708205/fedora-08/build.log
[07:44:39] darsto: bummer
[07:45:10] what's the fix here then?
[07:46:53] we could technically send a hint to the guest about max segments it should use
[07:46:58] fio-3.3 is the only change I can think of that could be causing these problems - the commit to master that started showing these failures has nothing to do with the I/O path (setup.sh looking for python3)
[07:47:26] then we need to support some more for I/O crossing 2MB boundary
[07:47:47] but that requires changpeng's latest SET_CONFIG msg
[07:48:10] and ofc we need to check if kernel actually uses that hint
[07:49:48] there's also this fairly frequent failure on fedora-05 - memory leak of host/port strdup() in spdk_vhost_dev_construct
[07:56:07] would be great if we had a list of the last 20 test runs that had their -1's cleared by a maintainer - along with a timestamp
[07:56:22] then everyone would have a ready list of intermittent test failures that need to be debugged
[08:09:51] jimharris: that seg_max must be strictly obeyed, it's not just a hint. linux virtio-scsi respects it. I can see that QEMU sets seg_max to (128 - 2) by default
[08:18:08] then how do we get 129 segments?
[08:19:19] (128 - 2) + 1
[08:19:21] :)
[08:20:03] 3+ iovectors split 2MB boundary?
[08:30:39] ha
[08:31:32] aren't the IOVs already split on 4KB page boundaries? I guess I'm thinking that if they are already split on 4KB boundaries, then there won't be any additional splitting on 2MB boundaries
[08:44:03] then why would we need that splitting at all?
[09:08:00] *** Quits: tomzawadzki (~tomzawadz@134.134.139.82) (Ping timeout: 260 seconds)
[09:12:10] jimharris: regarding https://review.gerrithub.io/#/c/395711/ - can we just put the vbdev_unregister loop into bdev_unregister and get rid of the special vbdev_unregister?
[09:31:06] darsto: we need the 2MB splitting - I'm just saying that if there's a case where a 2MB split is needed, you shouldn't be running into IOV limits
[09:31:26] the IOV limits should be hit when the payload is already split on page boundaries
[09:32:47] drv: looking...
[09:35:34] yeah - that should work
[09:35:47] i'm not following why the assert needs to change to a !TAILQ_EMPTY there though?
[09:35:58] probably not needed
[09:37:06] moving the loop looks good to me - will you add a note making that suggestion?
[09:37:48] scrubbing the Trello intermittent failures list - any issues with closing this one?
[09:37:49] https://trello.com/c/Z7LaR4VU/23-iscsi-vm-failure
[09:39:47] how far back are we keeping build results?
[09:39:58] some of these intermittent failures point to logs that I can't get to anymore
[09:41:04] I think sethhowe had to change it to purge after 30 days
[09:41:09] we were using multiple terabytes of storage
[09:41:13] bah
[09:41:32] we have results on an internal server that isn't web accessible
[09:41:44] archived, effectively
[09:41:58] so if you need them, we can get to them
[09:42:02] I tried modifying the URL to point to the internal one but that didn't work either
[09:42:05] maybe I mistyped the URL
[09:42:29] I was going to make a few suggestions to him to maybe zip up the release builds after 30 days instead of deleting
[09:42:49] and only purge reviews after 30 days
[09:43:14] the bulk of the storage issue was open reviews
[09:46:23] darsto: i see your "caveman debugging" patch - love the title
[09:47:35] it's full Linus style - add print statements, think very hard about problem, fix bug
[09:54:58] darsto: I see what you are saying now - I forgot that virtio enforces the 128 max segments - so this error should be directly related to an extra split we induced on 2MB boundaries
[09:55:40] did we recently upgrade guest kernels on fedora-05 and fedora-08, which might be related to differences in the I/O payload layouts?
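(Editor's note: the segment arithmetic above, where a guest limited to 128 - 2 = 126 descriptors still produces 129 iovecs on the target, can be sketched as follows. This is a minimal illustration and not SPDK code: `split_on_boundary`, `count_iovs`, and the example addresses are hypothetical, and it assumes each descriptor is a contiguous (address, length) range that gets cut wherever it crosses a 2MB-aligned address.)

```python
# Hypothetical illustration only; names and addresses are made up, not SPDK's.
TWO_MB = 2 * 1024 * 1024
KIB = 1024


def split_on_boundary(addr, length, boundary=TWO_MB):
    """Cut one (addr, length) buffer wherever it crosses a boundary-aligned address."""
    pieces = []
    while length > 0:
        # Bytes remaining before the next boundary-aligned address.
        room = boundary - (addr % boundary)
        chunk = min(length, room)
        pieces.append((addr, chunk))
        addr += chunk
        length -= chunk
    return pieces


def count_iovs(descriptors, boundary=TWO_MB):
    """Total number of iovecs after splitting every descriptor."""
    return sum(len(split_on_boundary(a, n, boundary)) for a, n in descriptors)


# 123 descriptors that sit well inside a 2MB region: one iovec each.
plain = [(i * 2 * TWO_MB, 32 * KIB) for i in range(123)]
# 3 descriptors that straddle a 2MB boundary: two iovecs each.
straddling = [(TWO_MB * (1000 + 2 * j) - 16 * KIB, 32 * KIB) for j in range(3)]

descriptors = plain + straddling
print(len(descriptors), count_iovs(descriptors))

# A 4KB-aligned buffer of at most 4KB never crosses a 2MB boundary,
# because 2MB is a multiple of 4KB.
print(len(split_on_boundary(TWO_MB - 4096, 4096)))
```

(With 126 descriptors of which three straddle a boundary, the sketch yields 129 iovecs, matching the "3+ iovectors split 2MB boundary?" guess. It only shows the counting; it does not establish what the guest actually sent.)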
[09:59:09] darsto: running your patch through the pool again
[09:59:41] jimharris: k, thanks
[09:59:51] that's very unusual payload
[10:03:55] this test has blocksize_range set to 4k-512k with direct=1 - shouldn't that mean we could only get max 512k IO size to our vhost target?
[10:04:44] if we're somehow getting IO sizes bigger than that, I could see where we could exceed this limit
[10:06:15] how about io merging being involved?
[10:07:12] possible
[10:07:42] could you add to your patch to do a debug print when IO size is > 512KB?
[10:09:26] kk
[10:10:13] the remaining nightly test failure seems to be due to a new iSCSI digest test that doesn't work
[10:10:16] iscsiadm -m node -p 127.0.0.1:3260 -o update -n 'node.conn[0].iscsi.DataDigest' -v None
[10:10:16] iscsiadm: Cannot modify node.conn[0].iscsi.DataDigest. Invalid param name.
[10:12:13] can you shoot an e-mail to xiaodong about that?
[10:19:40] sure
[10:40:01] jimharris: quite a lot of >512kB io
[10:40:08] vhost_scsi.c: 524:task_data_setup: *ERROR*: io size > 512KB, task=0x7fc9451707b0, size = 400000
[10:40:22] (size is in hex)
[11:56:01] 4MB - interesting
[12:30:21] *** Joins: tomzawadzki (~tomzawadz@134.134.139.82)
[13:03:21] *** Quits: tomzawadzki (~tomzawadz@134.134.139.82) (Ping timeout: 264 seconds)
[13:14:38] *** Joins: tomzawadzki (~tomzawadz@192.55.54.44)
[13:20:31] *** Joins: tzawadzki (~tomzawadz@192.55.54.45)
[13:20:32] *** Quits: tomzawadzki (~tomzawadz@192.55.54.44) (Remote host closed the connection)
[13:50:21] *** Quits: tzawadzki (~tomzawadz@192.55.54.45) (Ping timeout: 256 seconds)
[14:33:58] ping this one should be pretty easy: (famous last words) https://review.gerrithub.io/#/c/386186/
[14:37:28] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[14:57:19] bwalker: can you check my comments on https://review.gerrithub.io/#/c/395555/ - would like to get this in so we can get the later python patch committed too (before rpc.py changes again)
[14:59:05] yep on it
[14:59:44] jimharris, bwalker: you both previously +2'd this, and I asked for a few small mods that Ziye made: https://review.gerrithub.io/#/c/394674/
[15:01:28] drv: done
[15:04:10] vhost fix: https://review.gerrithub.io/#/c/396058/
[15:04:22] darsto, pwodkowx: please review - thanks!
[15:04:26] (you too drv)
[15:09:01] having unit tests for that function made that patch super easy
[15:28:14] drv and I talked about that patch that breaks up rpc.py
[15:28:26] it's just the very beginning of what really needs to happen
[15:28:54] those files I created need to become more like a library - i.e. each function shouldn't take the argparse 'args' parameter
[15:29:03] but one step at a time
[15:51:04] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:57:45] *** Quits: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net) (Quit: Leaving.)
[18:40:32] Hi Jim, I asked you about JSON file idea but I found that idea in Trello. I'll use Trello and Gerrit. Thanks.
[23:34:07] *** Joins: ziyeyang_ (~ziyeyang@192.55.54.44)
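(Editor's note: for the debug output logged at 10:40:08, where the size field is printed in hex, the conversion behind the "4MB" remark can be checked with a few lines. `parse_logged_size` is a hypothetical helper written for this note; the only inputs taken from the log are the value 400000 and the 512k upper bound of blocksize_range.)

```python
# Hypothetical helper for reading the hex size printed by the debug patch.
KIB = 1024
MIB = 1024 * KIB


def parse_logged_size(text):
    """Interpret the logged size field as a hexadecimal byte count."""
    return int(text, 16)


size = parse_logged_size("400000")   # value from the 10:40:08 log line
limit = 512 * KIB                    # upper bound of blocksize_range in the test

# Bytes, whole MiB, and how many times larger than the configured maximum.
print(size, size // MIB, size // limit)
```

(So the logged I/O is 4 MiB, eight times the 512 KiB maximum block size the fio job was configured with, which is consistent with the I/O merging question raised at 10:06:15.)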