[00:59:12] *** Joins: tkulasek (~tkulasek@192.55.54.44)
[01:30:09] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[02:39:49] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[03:37:07] *** Quits: tkulasek (~tkulasek@192.55.54.44) (Remote host closed the connection)
[04:18:37] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-ahjrwsnyozqfvwoo)
[04:19:05] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 240 seconds)
[05:14:41] *** Joins: dlw (~Thunderbi@114.246.95.117)
[06:09:24] *** Quits: dlw (~Thunderbi@114.246.95.117) (Ping timeout: 260 seconds)
[07:46:34] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-ahjrwsnyozqfvwoo) (Ping timeout: 264 seconds)
[08:27:12] *** Joins: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com)
[08:27:13] (spdk/master) scripts/rpc.py: Handle socket connection error (Karol Latecki)
[08:27:13] Diff URL: https://github.com/spdk/spdk/compare/b066126b0b9a...3142d978c35e
[08:27:13] *** Parts: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com) ()
[08:30:21] *** Joins: travis-ci (~travis-ci@ec2-54-166-6-76.compute-1.amazonaws.com)
[08:30:22] (spdk/master) iscsi: do static initialization of globals that allow it (Pawel Wodkowski)
[08:30:22] Diff URL: https://github.com/spdk/spdk/compare/3142d978c35e...e548df4ed1de
[08:30:22] *** Parts: travis-ci (~travis-ci@ec2-54-166-6-76.compute-1.amazonaws.com) ()
[08:45:40] *** Joins: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com)
[08:45:41] (spdk/master) blob: fix type in spdk_blob_resize() declaration (Daniel Verkamp)
[08:45:41] Diff URL: https://github.com/spdk/spdk/compare/e548df4ed1de...6a30dd6f8bf6
[08:45:41] *** Parts: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com) ()
[09:43:47] lhodev: The behavior on current hardware is that it machine checks. The DMA engines can't handle the latency introduced by a page fault recovery, so it doesn't even try.
[10:14:29] *** Quits: mphardy (~mphardy@pool-72-83-7-2.washdc.fios.verizon.net) (Ping timeout: 256 seconds)
[10:15:59] *** Joins: mphardy (~mphardy@pool-72-83-7-2.washdc.fios.verizon.net)
[10:22:48] *** Joins: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com)
[10:22:49] (spdk/master) test/nvmf: update test scripts from ifconfig to iproute2 (Tomasz Zawadzki)
[10:22:49] Diff URL: https://github.com/spdk/spdk/compare/6e9293eacae7...b7f049070d0b
[10:22:49] *** Parts: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com) ()
[10:24:01] *** Joins: travis-ci (~travis-ci@ec2-54-87-70-72.compute-1.amazonaws.com)
[10:24:02] (spdk/master) scripts/rpc.py: pass named args to lvol.py (Daniel Verkamp)
[10:24:02] Diff URL: https://github.com/spdk/spdk/compare/b7f049070d0b...7d45cfc3ccd9
[10:24:02] *** Parts: travis-ci (~travis-ci@ec2-54-87-70-72.compute-1.amazonaws.com) ()
[10:26:31] bwalker: That is consistent with my expectations. In conclusion, do I understand correctly then that if one chooses not to employ hugepages for DMA bufs, and we do have an IOMMU and it's enabled (presumably using vfio-pci), any memory bufs allocated for DMA xfers would require explicit mlock() calls? I would assume in that case the mlock() would be performed somewhere within the call chain of things like spdk_dma_malloc()
[10:26:31] such that the user application itself did not need to call mlock() directly, huh?
[10:32:16] yes - if you were not using hugepages you'd need to mlock all pages
[10:32:30] and the most convenient place would be in spdk_dma_malloc
[10:33:33] spdk_dma_malloc does not call mlock today, and I don't think DPDK is calling mlock on its internal memory either. That could be confirmed though.
[10:33:44] these are things we'll have to work through as DPDK's memory model becomes more dynamic
[10:34:48] *** Joins: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com)
[10:34:49] (spdk/master) rpc: reword doc comments (Daniel Verkamp)
[10:34:49] Diff URL: https://github.com/spdk/spdk/compare/7d45cfc3ccd9...90922c60df40
[10:34:49] *** Parts: travis-ci (~travis-ci@ec2-54-162-215-241.compute-1.amazonaws.com) ()
[10:48:00] *** Joins: travis-ci (~travis-ci@ec2-54-91-135-84.compute-1.amazonaws.com)
[10:48:01] (spdk/master) test/vhost: add live migration test case 2 (Pawel Wodkowski)
[10:48:01] Diff URL: https://github.com/spdk/spdk/compare/90922c60df40...6ebfbf735124
[10:48:01] *** Parts: travis-ci (~travis-ci@ec2-54-91-135-84.compute-1.amazonaws.com) ()
[10:48:50] *** Joins: travis-ci (~travis-ci@ec2-54-87-70-72.compute-1.amazonaws.com)
[10:48:51] (spdk/master) nvme: Remove all uses of strncpy (Ben Walker)
[10:48:51] Diff URL: https://github.com/spdk/spdk/compare/6ebfbf735124...aedbb3b81aab
[10:48:51] *** Parts: travis-ci (~travis-ci@ec2-54-87-70-72.compute-1.amazonaws.com) ()
[10:55:16] mlock?
[10:55:51] wouldn't we still need to use the vfio MAP_DMA ioctls?
[10:56:24] oh, you're right
[10:56:28] you have to program the IOMMU
[10:56:32] and that does the equivalent of mlock internally
[10:56:59] and mlock only guarantees it's resident, not that it's pinned
[10:57:41] drv: for these passthrough RPCs - what do you think about naming the parameters "base_bdev_name" and "passthrough_bdev_name"?
[10:58:27] sure, sounds better to me - or "passthru" if you want to be consistently wrong :)
[10:58:28] I know what "vbdev_name" is by looking at it, but i'm concerned it might be confusing or unclear
[10:58:34] ha
[10:58:52] also, I agree with John's comment on the mailing list - we should keep passthru as simple as possible so it serves as a good example
[10:59:04] (although the RPC probably does fit into what should be part of the example)
[11:00:46] maybe we should have a new module to fit gang's use case
[11:01:00] agreed
[11:01:09] even though a lot of it is duplicated - this new module could be a lot simpler - i.e. use the spdk_bdev_part API
[11:02:53] are we set up to support out-of-tree bdevs yet?
[11:04:19] define "support"
[11:04:28] :)
[11:04:55] like you specify a few config parameters to point to the library that contains your bdev module and it works
[11:06:12] no - we don't have any hooks like that yet
[11:06:21] I just added it to Trello
[11:06:43] there's an existing card about separating out the bdev module interface into its own header, I think
[11:06:51] yeah, saw that one
[11:06:55] that's also probably required
[11:12:15] peluse: What's the use case for that init_complete call in the bdev library?
[11:12:45] device aggregation (i.e. RAID)
[11:13:24] it's a hint to the bdev module that no more examine calls are coming, and it can make some kind of decision on any partially discovered volumes
[11:14:01] couldn't it just create a volume as a partial when it sees the first device, then add devices as it sees them until it is complete?
[11:14:13] because it has to handle that scenario anyway with hotplug/remove, device failures, etc.
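[Aside: a minimal sketch of the vfio MAP_DMA path discussed at 10:55-10:57, where programming the IOMMU pins the pages - something mlock() alone does not guarantee. It assumes container_fd is an already-configured /dev/vfio/vfio container (group attached, type1 IOMMU selected), the iova choice is up to the caller, and error handling is abbreviated; this is not SPDK's actual memory registration code.]

#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdio.h>

/* Map an anonymous, non-hugepage buffer for DMA through the VFIO type1
 * IOMMU driver. len should be a multiple of the page size. */
static void *map_for_dma(int container_fd, size_t len, uint64_t iova)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		return NULL;
	}

	struct vfio_iommu_type1_dma_map dma_map = {
		.argsz = sizeof(dma_map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)buf,
		.iova  = iova,
		.size  = len,
	};

	/* The kernel pins the backing pages for the lifetime of the mapping,
	 * which is the "equivalent of mlock" mentioned above. */
	if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map) != 0) {
		perror("VFIO_IOMMU_MAP_DMA");
		munmap(buf, len);
		return NULL;
	}
	return buf;
}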
[11:14:35] so relying on the hint doesn't seem like it will help anything
[11:14:56] like if you hot insert two disks that are a RAID 1 - it won't get the hint but needs to handle that
[11:17:42] if it's a 2-disk RAID-1 - when the first disk is inserted, we don't want to immediately register it as a bdev - ideally we'd like to wait until we see the second disk
[11:17:57] so that we don't have to start a rebuild when the second disk is inserted
[11:18:04] so we could set a timer to wait for the second disk
[11:18:25] but then we have to wait for the timer to expire before we can continue past bdev subsystem initialization
[11:18:25] but, how do you know a 2nd disk is going to be inserted?
[11:18:29] you don't
[11:18:59] but at least if this init_complete callback is invoked, you know you might as well proceed with treating it as a degraded RAID-1
[11:19:12] jimharris/bwalker: couple of lvol test patches that look good to me, just need another review: https://review.gerrithub.io/#/c/406697/
[11:19:44] what rebuild would you do if the second disk was inserted? If a RAID 1 is broken, do you even allow writes?
[11:20:17] could always track if there have been any writes at all since it was discovered, and if there weren't, don't rebuild
[11:20:22] just in memory is fine for that
[11:20:46] on hot-insert, i don't think RAID stacks necessarily register/surface a degraded volume automatically if they only see a subset of the member disks
[11:21:25] so maybe keep a raid volume data structure and claim the underlying bdev, but don't create a bdev until they're all seen
[11:21:31] that's even easier - then you for sure don't have to rebuild
[11:21:38] yes
[11:22:08] I think relying on the hint just doesn't work because you have to deal with hot insert anyway
[11:22:20] so you may as well write one code path that works for any examine
[11:22:22] i think raid implementations can vary widely on how they handle this scenario
[11:22:58] If we had a raid implementation, I'd vote don't automatically surface degraded volumes. But add an RPC to request them to surface.
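[Aside: a hypothetical sketch of the policy being debated here, not the actual SPDK bdev module API - the hook names (raid_examine, raid_init_complete) and the bookkeeping struct are invented for illustration. The idea: claim member disks as examine sees them, only surface the RAID bdev once the set is complete, and treat init_complete ("no more examine calls are coming") purely as a hint that an incomplete set may be surfaced as degraded, or held back until an explicit RPC.]

#include <stdbool.h>

struct raid_volume {
	int  num_members_expected;   /* from on-disk RAID metadata */
	int  num_members_found;
	bool surfaced;               /* has a bdev been registered for this volume? */
};

/* Called once per member disk as it is discovered (initial scan or hot insert). */
static void
raid_examine(struct raid_volume *vol)
{
	/* Claim the member so no other module consumes it, but do not
	 * register a RAID bdev yet. */
	vol->num_members_found++;

	if (vol->num_members_found == vol->num_members_expected) {
		/* Complete set: surface the volume; no rebuild is needed. */
		vol->surfaced = true;
		return;
	}
	/* Incomplete set: keep the bookkeeping and wait. This same path
	 * covers both the initial scan and later hot inserts. */
}

/* Hint that no more examine calls are coming for the initial device scan. */
static void
raid_init_complete(struct raid_volume *vol)
{
	if (!vol->surfaced && vol->num_members_found > 0) {
		/* Either surface the volume as degraded now, or (as argued
		 * at 11:22:58) hold it back until an RPC asks for it. */
	}
}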
[11:24:00] but i think at least some existing raid implementations today will surface the degraded volume automatically at boot/initialization - but won't if it's the result of a later hot-insert
[11:24:37] i'm not saying i disagree with you - i'm just saying that adding the init_complete callback at least enables this kind of implementation (and is pretty low touch for spdk)
[11:25:05] I just want to minimize the surface area of the bdev module API
[11:25:29] I don't want to make it too complicated to implement one by adding more stuff, unless it is an important feature
[12:52:43] jimharris: looks like a build failed with the same rocksdb manifest file corruption: https://ci.spdk.io/spdk/builds/review/cbc0bb515d2198271c36e326aacc6a0051149f73.1523561984/fedora-04/rocksdb/writesync_db_bench.txt
[12:53:09] this is bwalker's https://review.gerrithub.io/407355 - I think that is rebased on top of the blobfs fix
[12:57:20] interesting - it is a bit different actually - previously it would report files that were in the MANIFEST but not on disk
[12:57:36] but this one is showing a different size for one of the files
[12:57:42] i'll try to reproduce
[13:10:04] *** Joins: travis-ci (~travis-ci@ec2-54-91-135-84.compute-1.amazonaws.com)
[13:10:05] (spdk/master) vhost: move memory registration to DPDK thread (Dariusz Stojaczyk)
[13:10:06] Diff URL: https://github.com/spdk/spdk/compare/aedbb3b81aab...6820d312e189
[13:10:06] *** Parts: travis-ci (~travis-ci@ec2-54-91-135-84.compute-1.amazonaws.com) ()
[13:52:06] *** Joins: David_ (819d4527@gateway/web/freenode/ip.129.157.69.39)
[13:52:30] *** Quits: David_ (819d4527@gateway/web/freenode/ip.129.157.69.39) (Client Quit)
[13:53:43] *** Joins: mshirley (~mshirley@inet-hqmc05-o.oracle.com)
[14:12:37] *** Quits: sethhowe (~sethhowe@134.134.139.76) (Remote host closed the connection)
[14:14:09] *** Joins: sethhowe (~sethhowe@192.55.54.42)
[14:53:36] *** Joins: Tracy (0cda5282@gateway/web/cgi-irc/kiwiirc.com/ip.12.218.82.130)
[14:55:13] Is iodepth=xx in the fio configuration file per SPDK drive?
[15:08:12] iodepth with the SPDK fio plugin works the same way as in normal fio configuration: http://fio.readthedocs.io/en/latest/fio_doc.html#i-o-depth
[15:09:06] it is per "file", which is an NVMe namespace in the case of examples/nvme/fio_plugin
[15:14:50] and for examples/bdev/fio_plugin iodepth=x is per job
[15:19:45] Thanks @drv and darsto. So if iodepth=4, numjobs=5 and there are two SPDK drives, will each drive receive 20 IOs?
[15:23:16] does each job specify both devices?
[15:23:35] usually people write a job to only run on one device
[15:35:13] The iodepth and numjobs are in the global section
[15:39:42] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:30:08] hmm, the vhost migration-tc2 test added over a minute to the test time
[16:30:22] I wonder if that should be in the nightly test instead (or shortened somehow)
[16:31:30] *** Joins: travis-ci (~travis-ci@ec2-54-87-70-72.compute-1.amazonaws.com)
[16:31:31] (spdk/master) test/lvol: Snapshot and clone test cases for lvol feature. (Tomasz Kulasek)
[16:31:31] Diff URL: https://github.com/spdk/spdk/compare/be0eef0a0d4a...ebf079362b28
[16:31:31] *** Parts: travis-ci (~travis-ci@ec2-54-87-70-72.compute-1.amazonaws.com) ()
[16:31:39] *** Joins: ollarjona (~jacvvfsp@187.73.231.87)
[16:33:41] *** Quits: ollarjona (~jacvvfsp@187.73.231.87) (Remote host closed the connection)
[17:52:49] *** Quits: mshirley (~mshirley@inet-hqmc05-o.oracle.com) (Remote host closed the connection)
[17:58:49] *** Joins: dlw (~Thunderbi@114.255.44.143)
[18:27:19] *** Quits: Tracy (0cda5282@gateway/web/cgi-irc/kiwiirc.com/ip.12.218.82.130) (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[20:40:09] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[21:40:13] *** Joins: mshirley (~mshirley@c-24-22-29-66.hsd1.or.comcast.net)
[21:54:22] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[22:32:46] *** Quits: mshirley (~mshirley@c-24-22-29-66.hsd1.or.comcast.net) (Ping timeout: 264 seconds)
[22:40:53] Tracy: iodepth=4 numjobs=5 will give you 20 iodepth total (iodepth is per job). If you specified filename=BdevA:BdevB, this will give you roughly 10 iodepth per bdev
[22:41:13] and that's assuming these two bdevs have the same performance
[23:59:33] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-bdnblepmxzignktw)
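[Aside: the scenario Tracy asked about, written out as a fio job file for the bdev fio_plugin. The bdev names BdevA/BdevB, the spdk_conf path, and the workload settings are placeholders for whatever the local setup uses; per darsto's answer, iodepth counts per job with this plugin, so 5 jobs x 4 = 20 outstanding I/Os total, roughly 10 per bdev if both perform equally.]

[global]
ioengine=spdk_bdev
spdk_conf=./bdev.conf
thread=1
; per job with the bdev plugin: 5 jobs x iodepth 4 = 20 outstanding I/Os total
iodepth=4
numjobs=5
rw=randread
bs=4k

[job0]
; both bdevs in one job: roughly 10 outstanding I/Os per bdev, assuming equal performance
filename=BdevA:BdevB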