[01:39:45] *** Joins: dlw1 (~Thunderbi@114.255.44.143)
[01:39:58] *** Quits: dlw (~Thunderbi@114.255.44.143) (Read error: Connection reset by peer)
[01:39:58] *** dlw1 is now known as dlw
[01:40:27] *** Quits: stefanha` (~stefanha@yuzuki.vmsplice.net) (Ping timeout: 240 seconds)
[01:40:34] *** Joins: stefanha (~stefanha@yuzuki.vmsplice.net)
[03:45:18] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 256 seconds)
[04:14:59] jimharris: replied. I'll try to come up with some extra vhost checks tomorrow
[06:27:04] thanks darsto - i remembered the qemu/nvdimm memory registration issues you hit - i just didn't remember how exactly they got "resolved"
[06:55:36] FYI community meeting in just over an hour from now. Different WebEx info than usual, see https://trello.com/b/DvM7XayJ/spdk-community-meeting-agenda for details
[07:57:56] *** Joins: tomzawadzki (~tomzawadz@192.55.54.44)
[08:01:15] *** Joins: tkulasek (~tkulasek@134.134.139.75)
[08:38:38] *** Joins: tzawadzki (tomzawadzk@nat/intel/x-njyfzxkzgdrzticn)
[08:38:38] *** Quits: tomzawadzki (~tomzawadz@192.55.54.44) (Remote host closed the connection)
[08:48:47] drv, I'm having a bit of a Linux kernel driver issue with QAT, if you have some thoughts... I have 2 different systems that were both working with QAT until I crashed them for various reasons. On reboot, neither system seems to be able to reload the kernel modules.
[08:48:54] i get a bunch of errors like this in dmesg: qat_c62x: Unknown symbol adf_devmgr_add_dev (err 0)
[08:49:10] and I confirmed the module's kernel version matches the kernel I'm running
[08:50:37] and when I manually try to insmod the modules I get "invalid symbol in module" or some other nonsense. Have a bunch of meetings, but if you or anyone else has any ideas, that'd be great
[08:51:20] oh, and all of the adf* dmesg errors represent function names in the module I'm trying to load...
[09:28:03] *** Quits: tzawadzki (tomzawadzk@nat/intel/x-njyfzxkzgdrzticn) (Remote host closed the connection)
[09:28:12] *** Joins: tzawadzki (~tomzawadz@192.55.54.44)
[09:39:21] tkulasek, you there?
[09:42:42] yes
[09:47:57] wrt your session creation comment, are you suggesting that I create my own pool of "created sessions" up front and then grab one and init it w/each crypto operation? Because regardless I do need a unique one for every outstanding IO, correct? (I can just re-use without recreating is what I understood)
[09:52:02] peluse: the QAT driver thing sounds like maybe the QAT modules are compiled against a different version of the kernel or something like that?
[09:52:17] are the qat drivers upstream or are they a separate package that you have to build?
[09:55:46] You need a pair of sessions, one for encoding and one for decoding. Once you create a session and initialize it with a crypto device and xform, you may reuse it. As for the driver issue, I haven't worked with QAT in more than a year, but it was from 01.org, as I remember.
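For context, a minimal sketch of the session pattern tkulasek describes, against the DPDK 17.11-era cryptodev API (the generation in use at the time). The device id, session mempool, cipher algorithm, and key/IV parameters here are illustrative assumptions, not details taken from the discussion:

    #include <rte_crypto.h>
    #include <rte_cryptodev.h>
    #include <rte_mempool.h>

    /*
     * Create one reusable session for a given cipher direction. The idea is
     * to build two of these up front (one RTE_CRYPTO_CIPHER_OP_ENCRYPT, one
     * RTE_CRYPTO_CIPHER_OP_DECRYPT) and reuse them, rather than creating a
     * session per outstanding I/O.
     */
    static struct rte_cryptodev_sym_session *
    create_cipher_session(uint8_t dev_id, struct rte_mempool *session_pool,
                          enum rte_crypto_cipher_operation op,
                          uint8_t *key, uint16_t key_len)
    {
            struct rte_crypto_sym_xform cipher_xform = {
                    .type = RTE_CRYPTO_SYM_XFORM_CIPHER,
                    .next = NULL,
                    .cipher = {
                            .op = op,
                            .algo = RTE_CRYPTO_CIPHER_AES_CBC,
                            .key = { .data = key, .length = key_len },
                            /* IV lives in each rte_crypto_op's private area. */
                            .iv = {
                                    .offset = sizeof(struct rte_crypto_op) +
                                              sizeof(struct rte_crypto_sym_op),
                                    .length = 16,
                            },
                    },
            };
            struct rte_cryptodev_sym_session *session;

            /* Allocate an uninitialized session from the session mempool. */
            session = rte_cryptodev_sym_session_create(session_pool);
            if (session == NULL) {
                    return NULL;
            }

            /* Bind the session to one device using the cipher xform. */
            if (rte_cryptodev_sym_session_init(dev_id, session, &cipher_xform,
                                               session_pool) != 0) {
                    rte_cryptodev_sym_session_free(session);
                    return NULL;
            }

            return session;
    }

Per-operation state (buffer addresses, IV bytes) then lives in each rte_crypto_op, which is attached to the shared session via rte_crypto_op_attach_sym_session(); this is why a unique session per outstanding I/O shouldn't be needed.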
[09:58:02] it looks like at least some of the qat drivers are in the upstream kernel
[09:58:12] drivers/crypto/qat in the source tree
[09:58:20] so I'm not sure if peluse is using those or a different package
[10:00:53] *** Joins: travis-ci (~travis-ci@ec2-54-198-47-146.compute-1.amazonaws.com)
[10:00:54] (spdk/master) bdev/qos: Break out code to destroy the qos into a separate function (Ben Walker)
[10:00:54] Diff URL: https://github.com/spdk/spdk/compare/1d168901c6f2...6cd524d87c3a
[10:00:54] *** Parts: travis-ci (~travis-ci@ec2-54-198-47-146.compute-1.amazonaws.com) ()
[10:03:33] drv, yeah the strange thing is everything was working until I rebooted and I didn't change any of the drivers. I didn't build them originally either, they were just there w/the kernel
[10:03:58] ok, if they're part of the normal kernel build, that sounds like it should "just work" then
[10:04:17] and I didn't update anything before reboot either. I did go through all the steps of unbinding, enabling VFs and binding to the DPDK drivers though
[10:05:07] tkulasek, but I need a pair for each outstanding IO, right?
[10:07:14] looks like that symbol you mentioned above is part of intel_qat.ko - does it work if you manually load that first?
[10:08:08] let me check
[10:09:08] tkulasek, the reason I mention that is that one of the params to rte_cryptodev_sym_session_init() is the cipher_xform, which includes the address of the individual crypto operation
[10:11:46] I need to look in the code to make sure
[10:16:28] tkulasek, ok, thanks. No hurry as I've still got a list of TODO items :) Appreciate the inputs!
[10:24:25] which parameter?
[10:24:26] drv, yeah that's the first module that I was trying to load. Note that when I first plugged this card in, I didn't have to insmod anything. I did 'lsmod | grep qa' and it showed up
[10:25:40] * peluse is about ready to remove the card, reboot, add it back in and see if it comes up. Akin to pissing on a spark plug (drv must get that reference...)
[10:30:49] jimharris: darsto posted a follow-up comment on https://review.gerrithub.io/#/c/spdk/spdk/+/410071/ (the RTE_BAD_IOVA patch)
[10:31:56] yeah - i saw it - i'm not sure what darsto has planned to fix it though - looking forward to seeing it :)
[10:32:40] I'm not sure I really understand the issue
[10:33:24] is it that rte_mem_virt2phy() returns 0, and then we try to load from the virtual address and it crashes?
[10:33:25] darsto was playing with QEMU, vhost and Clear Containers a while back - it will pass an emulated NVDIMM to the VM
[10:33:43] and QEMU will send that memory region in the SET_MEM_TABLE vhost message
[10:34:01] seems like if we get passed an address in memory registration that can't be dereferenced, then there was a problem somewhere earlier in the chain
[10:34:28] agreed - right now (before my patch), it "works" because DPDK returns BAD_IOVA and we're checking for 0
[10:34:44] so we don't try to touch the address and just return failure
[10:35:05] i agree - we need to get to the bottom of why we can't register the NVDIMM region in the vhost process
[10:37:04] is there a version where DPDK switched from returning 0 to returning BAD_PHYS_ADDR/BAD_IOVA, or was 0 always the wrong thing to look for?
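For reference on the 0-vs-BAD_IOVA point: a hypothetical translation check (not SPDK's actual registration code) showing why testing the sentinel rather than 0 matters. Older DPDK returned 0 from rte_mem_virt2phy() on failure; newer DPDK returns RTE_BAD_IOVA (equal to RTE_BAD_PHYS_ADDR, i.e. all bits set):

    #include <errno.h>
    #include <rte_memory.h>

    /*
     * Hypothetical registration-time check. Comparing against 0 misses the
     * failure on newer DPDK (which returns RTE_BAD_IOVA) and would also
     * misclassify a legitimate physical address of 0 as an error.
     */
    static int
    check_translation(const void *vaddr)
    {
            phys_addr_t paddr = rte_mem_virt2phy(vaddr);

            if (paddr == RTE_BAD_IOVA) {    /* == RTE_BAD_PHYS_ADDR */
                    /* Translation failed; don't dereference or map this region. */
                    return -EFAULT;
            }

            /* ... proceed with registration using paddr ... */
            return 0;
    }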
[10:37:18] (0 as a physical address could theoretically be valid)
[10:42:57] *** Quits: tkulasek (~tkulasek@134.134.139.75) (Ping timeout: 240 seconds)
[10:55:50] yes - some guy named bwalker pushed patches to DPDK to change it from 0 to BAD_IOVA
[10:56:05] :-)
[10:59:11] sounds like something he would do
[11:00:33] you should ask him why - I'm curious to know
[12:49:39] bwalker: this intermittent failure looks related to the recent QoS changes: https://ci.spdk.io/spdk/builds/review/24b6526d7c9de70e8a880a71eab319cb6ef92761.1525807871/ubuntu16.04/build.log
[13:15:13] i need someone to hit me with a clue bat
[13:15:35] how does the nvmf default 128K max io size interoperate with the bdev...
[13:15:46] ...and as I type it, I know the answer
[13:15:56] nvmf has its own buffer pools
[13:20:21] yes
[13:20:34] currently, we don't use the bdev data buffer pools at all in nvmf
[13:21:17] yeah - looking at this github report on larger MDTS
[13:22:16] yeah, that one needs some better clarification - most of the replies are just confusing matters
[13:22:45] I think the patch that the submitter posted changes more stuff than necessary to enable MDTS of 512 KB
[13:23:46] (and the patch actually changes it to 2 MB, not 512 KB)
[13:25:53] it should be possible to just set MaxIOSize in the conf file with no code changes, then run this nvme-cli test
[13:27:36] (if we want to test with the SPDK NVMe-oF host code, that would need more code changes, but I don't think that's what the submitter is testing)
[13:40:28] oh - i see the problem - the 512KB gets through nvmf and bdev OK - but the backing NVMe SSD rejects it since its MDTS is 256K
[13:40:50] well, no - the nvme driver should split it in that case
[13:51:32] we should at least put together a simpler repro script - it should be possible to do it with SoftRoCE
[13:52:05] I'm set up to run in loopback with soft roce if needed
[13:52:19] trying to get through a few code reviews before I plow through the github issues
[13:52:19] oh - no, the nvme driver won't split it if it's IO passthru
[13:59:51] yeah, I'm leaning toward not supporting NVMe I/O passthru commands at all in nvmf
[14:00:00] it really seems like we can't do it correctly
[14:01:50] almost tempting to put back in Direct mode or something equivalent to it and remove NVMe passthru from the bdev controller
[14:02:18] and limit direct mode so that it directly exposes the mdts, etc. of the underlying controller and can only be attached from a single host at a time
[14:07:33] *** Quits: tzawadzki (~tomzawadz@192.55.54.44) (Remote host closed the connection)
[14:25:37] I think we need to document which commands will be automatically split and which won't
[14:25:39] at a minimum
[14:29:13] hmmm, so on the initiator it's doing nvme io-passthru, but once it gets to the target won't we just treat it as a normal write?
[14:30:09] he's not doing writes - he's doing vendor specific commands I think
[14:30:19] if you look at his original posting
[14:30:28] originally he was - but in the latest post he's doing OPC=0x1
[14:31:14] in the real write case, it's going to call spdk_bdev_write_blocks
[14:31:19] which will translate to a regular nvme write call
[14:31:22] no passthru involved
[14:31:27] and the driver should split it
[14:31:30] yep
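To make the split-vs-passthru distinction concrete, a hedged sketch of the same 512 KB payload issued both ways through the bdev API; the 512-byte block size, offsets, and LBA fields are illustrative assumptions:

    #include "spdk/stdinc.h"
    #include "spdk/bdev.h"
    #include "spdk/nvme_spec.h"

    static void
    io_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
    {
            spdk_bdev_free_io(bdev_io);
    }

    static void
    submit_both_ways(struct spdk_bdev_desc *desc, struct spdk_io_channel *ch,
                     void *buf)
    {
            /*
             * Regular write: translated to NVMe write commands by the nvme
             * bdev module, and split by the driver if 512 KB exceeds the
             * controller's MDTS.
             */
            spdk_bdev_write_blocks(desc, ch, buf, 0 /* offset */,
                                   1024 /* blocks: 512 KB at 512 B each */,
                                   io_done, NULL);

            /*
             * Passthru: the raw command is forwarded to the device unmodified,
             * so the SSD itself rejects it if the transfer exceeds its MDTS.
             */
            struct spdk_nvme_cmd cmd = {
                    .opc = SPDK_NVME_OPC_WRITE, /* OPC=0x1, as in the report */
                    /* cdw10/cdw11 = starting LBA, cdw12 = NLB - 1, etc. */
            };
            spdk_bdev_nvme_io_passthru(desc, ch, &cmd, buf, 512 * 1024,
                                       io_done, NULL);
    }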
[14:50:05] bwalker: for the nvme-of perf data you collected with vishal last week - how many queue pairs was that spread across?
[14:50:16] and it was raw nvme - no lvol?
[14:50:26] for the 4.2M number?
[14:50:33] yes
[14:51:12] 4 subsystems each with 1 namespace (malloc bdev). 4 initiators in a 1:1 mapping to the subsystems, each with 1 qpair
[14:51:36] ok
[14:52:39] the system only has 8 NVMe devices, so it can't get up to 4.2M using real SSDs
[14:53:23] doing the I/O to the SSDs is cheaper than malloc in a number of ways, so if it had enough PCIe bandwidth and NVMe SSDs attached I have no doubt it could get that done.
[14:53:30] I don't have IOAT enabled
[16:56:10] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[17:07:49] ugh
[18:19:00] *** Joins: dlw (~Thunderbi@114.255.44.143)
[19:11:34] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[22:58:01] *** Joins: dlw1 (~Thunderbi@114.255.44.143)
[22:58:01] *** Quits: dlw (~Thunderbi@114.255.44.143) (Read error: Connection reset by peer)
[22:58:02] *** dlw1 is now known as dlw
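For reference, the 4-subsystem malloc setup described above, plus the MaxIOSize knob from the earlier MDTS discussion, maps to a target config along these lines. This is a hypothetical sketch in the 2018-era nvmf_tgt conf-file format; section and option names are from memory and all values are illustrative:

    [Nvmf]
      # 512 KB max I/O per the MDTS discussion; the default was 128 KB
      MaxIOSize 524288

    [Malloc]
      NumberOfLuns 4
      LunSizeInMB 64

    [Subsystem1]
      NQN nqn.2016-06.io.spdk:cnode1
      Listen RDMA 192.168.0.1:4420
      SN SPDK00000000000001
      Namespace Malloc0

    # Subsystem2-4 repeat the same pattern with Malloc1-Malloc3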