[00:52:51] *** Joins: travis-ci (~travis-ci@ec2-52-90-34-207.compute-1.amazonaws.com)
[00:52:52] (spdk/master) bdev: remove the unnecessary spdk_bdev_finish call. (Ziye Yang)
[00:52:53] Diff URL: https://github.com/spdk/spdk/compare/267f09dea0ef...39c4f95b35d1
[00:52:53] *** Parts: travis-ci (~travis-ci@ec2-52-90-34-207.compute-1.amazonaws.com) ()
[01:43:23] *** Joins: travis-ci (~travis-ci@ec2-54-80-0-128.compute-1.amazonaws.com)
[01:43:23] (spdk/master) lib/bdev: Assert if there is no outstanding IO after completion with ENOMEM (Wojciech Malikowski)
[01:43:24] Diff URL: https://github.com/spdk/spdk/compare/39c4f95b35d1...f1da65ef283b
[01:43:24] *** Parts: travis-ci (~travis-ci@ec2-54-80-0-128.compute-1.amazonaws.com) ()
[03:36:40] Project autotest-nightly build #426: STILL FAILING in 29 min. See https://ci.spdk.io/spdk-jenkins for results.
[03:53:01] Project autotest-nightly-failing build #295: STILL FAILING in 39 min. See https://ci.spdk.io/spdk-jenkins for results.
[04:59:42] *** Joins: Mic92 (~Mic92@mail.thalheim.io)
[05:02:58] I am currently porting SPDK to run inside SGX enclaves and stumbled over `nvme_sigbus_fault_sighandler`. It accesses the NVMe controller via thread-local storage. However, I wonder how this is supposed to work, because the thread that sets g_thread_mmio_ctrlr might not be the same one that is scheduled to handle the signal.
[06:08:54] Mic92: oh, you're absolutely right
[06:10:15] signal(7): A process-directed signal may be delivered to any
[06:10:16] one of the threads that does not currently have the signal blocked.
[06:10:16] If more than one of the threads has the signal unblocked, then the
[06:10:16] kernel chooses an arbitrary thread to which to deliver the signal.
[06:20:40] DPDK 18.11+ also has a sigbus handler; it finds the respective PCI device by iterating through all available devices and comparing their BAR addresses against the faulting address
[06:21:59] SPDK still supports DPDK versions < 18.11, so it can't just use the DPDK handler
[06:22:36] I'm afraid we'll have to implement the exact same functionality in SPDK
[06:23:05] maybe jimharris has some more input
[06:52:22] darsto: What is actually the purpose of this handler? Why can't the controller be mapped up front?
[06:54:50] For my use case it would be great if I could do this up front, since I run code in an SGX enclave and am therefore limited w.r.t. syscalls.
[06:55:04] I guess I could issue a read before running the enclave.
[07:06:53] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[07:06:54] (spdk/master) test: add mem_callbacks unit test (Jim Harris)
[07:06:54] Diff URL: https://github.com/spdk/spdk/compare/f1da65ef283b...3fc824c834ba
[07:06:54] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[08:14:12] Is anyone online who might answer a question regarding spdk_nvme_ctrlr_get_num_ns()?
[08:25:07] *** Joins: vmysak (vmysak@nat/intel/x-ygabwpedkdpuyvwv)
[08:43:44] It's more a question of WHY ctrlr->num_ns has the value of MaxNamespaces for an object pointing to an NVM subsystem over a fabric, as opposed to just the actual number of discovered namespaces.
[08:44:35] It appears this happens if and only if MaxNamespaces was set for that subsystem. If it wasn't set, then ctrlr->num_ns does hold the actual number of namespaces.
[08:57:32] *** Quits: vmysak (vmysak@nat/intel/x-ygabwpedkdpuyvwv) (Ping timeout: 272 seconds)
[08:57:59] *** Joins: vmysak (~vmysak@192.55.54.40)
[09:02:34] *** Quits: vmysak (~vmysak@192.55.54.40) (Ping timeout: 250 seconds)
[09:08:06] Mic92: the purpose of that handler is to remap PCI BARs after a device gets hot-removed
[09:09:15] the hotremove is detected elsewhere; remapping the BARs just allows the application to proceed with its normal shutdown cycle
[09:09:33] i see the github issue, thanks
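For context, a minimal sketch of the BAR-scanning approach described above, in the spirit of the DPDK 18.11+ handler. All names here (struct pci_dev, g_devices, bar_vaddr, bar_size) are hypothetical illustrations, not SPDK's or DPDK's actual data structures. The point is that the handler locates the device from the faulting address itself instead of from thread-local state, so it works no matter which thread the kernel picks to deliver the signal to:

#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

struct pci_dev {
	void           *bar_vaddr;      /* start of the mapped BAR */
	size_t          bar_size;       /* length of the mapping */
	struct pci_dev *next;
};

static struct pci_dev *g_devices;       /* hypothetical global device list */

static void
sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	struct pci_dev *dev;

	(void)sig;
	(void)ctx;

	for (dev = g_devices; dev != NULL; dev = dev->next) {
		uintptr_t start = (uintptr_t)dev->bar_vaddr;

		if (fault >= start && fault < start + dev->bar_size) {
			/*
			 * Replace the dead BAR mapping with anonymous zeroed
			 * memory so this access (and any later ones) succeed,
			 * letting the hotremove path shut down normally.
			 */
			mmap(dev->bar_vaddr, dev->bar_size,
			     PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
			return;
		}
	}

	/* Not one of our BARs: restore the default action and re-raise. */
	signal(SIGBUS, SIG_DFL);
	raise(SIGBUS);
}

static void
install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);
}

One caveat with this pattern: mmap() is not on the POSIX list of async-signal-safe functions, although on Linux it works in practice for this kind of remap-on-fault recovery.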
[09:48:36] lhodev: the spec isn't really intuitive regarding num_ns
[09:48:45] one sec, need to look at the code and then I can give you an answer
[09:49:08] bwalker: Thank you!
[09:49:41] ok, confirmed what I was thinking
[09:49:55] so in the NVMe specification, there are subsystems
[09:50:05] which contain what is effectively an array of namespaces
[09:50:11] where the index into the array is the namespace id
[09:50:19] that's by spec - not by SPDK design
[09:50:32] the array does not need to be entirely populated
[09:50:53] that ctrlr->num_ns value is our internally cached value of the 'NN' field from the identify data
[09:51:01] which is the length of the array - not the number of namespaces
[09:51:46] *** Joins: vmysak (~vmysak@192.55.54.40)
[09:52:34] the part that trips everyone up, myself included, is that it isn't obvious that the array of namespaces can be sparsely populated
[09:52:52] the way the spec talks about it is that some namespaces can be "inactive", but that means they don't exist
[09:53:30] so it's totally possible to do a discover on a controller and find two namespaces on a device that can have up to 4
[09:53:53] if the device supports namespace management, for example
[10:01:06] So how does SPDK handle the case for namespace mgmt? That is, how does it size the array if an NVMe controller supports namespace mgmt?
[10:02:02] well, in a physical device, the array is whatever size is reported in the 'NN' value, which is ctrlr->num_ns for us
[10:02:16] but in nvme-of, we're emulating a controller
[10:02:25] and we may get RPCs that change the number of namespaces on us
[10:02:46] so in response to those RPCs, we dynamically reallocate the namespace array
[10:02:50] if necessary
[10:03:08] if the user adds a bdev to an nvmf subsystem (bdev == namespace in our design), they have two options
[10:03:13] they can either explicitly select the nsid
[10:03:18] or they can let us select one
[10:03:34] if they let us select one, we'll pick the lowest open slot. If there are no open slots, we'll make the array one slot bigger.
[10:04:01] if they explicitly select an nsid that isn't already occupied, we'll resize the array to include that nsid if it doesn't already
[10:04:44] so you could do something silly if you wanted to and explicitly select 100 as your nsid for the first namespace added to a subsystem
[10:05:00] and spdk_nvme_ctrlr_get_num_ns() should return 101 (the array indices run 0 to 100, so its length is 101)
[10:05:02] The bottom line: one should never use spdk_nvme_ctrlr_get_num_ns() as a means to determine the number of CURRENT namespaces.
[10:05:08] correct
[10:05:28] by CURRENT I assume you mean ACTIVE namespaces. The specification defines states for namespaces to be in
[10:05:35] I can look those up real quick - there's like 3 or 4
[10:06:26] Instead, the paradigm should be to use spdk_nvme_ctrlr_get_first_active_ns() and spdk_nvme_ctrlr_get_next_active_ns() to count/walk the, yes, ACTIVE, namespaces.
[10:06:33] exactly
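For reference, the active-namespace walk they are describing looks roughly like this. count_active_namespaces is an illustrative helper (not an SPDK API); the spdk_nvme_ctrlr_get_* calls are the public functions named above, and ctrlr is assumed to be an already-attached controller:

#include "spdk/nvme.h"

/*
 * Count the ACTIVE namespaces by walking the sparse NSID space, rather
 * than trusting spdk_nvme_ctrlr_get_num_ns(), which only reports the
 * length of the namespace array ('NN' from the identify data).
 */
static uint32_t
count_active_namespaces(struct spdk_nvme_ctrlr *ctrlr)
{
	uint32_t nsid;
	uint32_t count = 0;

	for (nsid = spdk_nvme_ctrlr_get_first_active_ns(ctrlr);
	     nsid != 0;
	     nsid = spdk_nvme_ctrlr_get_next_active_ns(ctrlr, nsid)) {
		struct spdk_nvme_ns *ns = spdk_nvme_ctrlr_get_ns(ctrlr, nsid);

		if (ns != NULL) {
			count++;
		}
	}

	return count;
}

In the "silly" example above, with a single namespace added at nsid 100, this walk would return 1 while spdk_nvme_ctrlr_get_num_ns() returns 101.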
[10:07:11] my terminology actually isn't exactly right, now that I look at the spec
[10:07:23] it's not a namespace that can be in various states, but a "namespace id"
[10:07:26] i.e. nsid
[10:07:40] section 6.1 of NVMe 1.3c is where it's defined
[10:07:49] Gonna pull that up right now.
[10:08:01] https://nvmexpress.org/wp-content/uploads/NVM-Express-1_3c-2018.05.24-Ratified.pdf
[10:08:41] so in reality that ctrlr->num_ns value is the number of valid namespace ids, not the number of namespaces
[10:08:41] I already have it open in Preview, lol. Just need to get to the appropriate section ;-)
[10:09:03] it's a little bit tough because a lot of this was retrofitted into the spec later on
[10:09:24] and some of the SPDK APIs were already defined and we don't want to change them
[10:09:38] Such that semantically they'd make more sense?
[10:09:44] (sigh)
[10:10:00] yeah - we could give spdk_nvme_ctrlr_get_num_ns() a better name
[10:10:19] in NVMe 1.0, and maybe 1.1, the number of NSIDs == number of namespaces, I think
[10:10:37] there wasn't this distinction
[10:10:53] but that function is one of the original ones - more than 5 years old
[10:12:56] Maybe we could at least alter the Doxygen comments to help make this more understandable?
[10:13:31] yes definitely
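One possible wording for such a Doxygen comment, drawn only from bwalker's explanation above (this is a sketch, not the comment that was actually merged; only the function signature is SPDK's):

/**
 * Get the number of valid namespace IDs on the given controller.
 *
 * This is the controller's cached 'NN' value from the identify data, i.e.
 * the length of the (possibly sparsely populated) namespace array, NOT the
 * number of active namespaces. To enumerate active namespaces, use
 * spdk_nvme_ctrlr_get_first_active_ns() and
 * spdk_nvme_ctrlr_get_next_active_ns() instead.
 */
uint32_t spdk_nvme_ctrlr_get_num_ns(struct spdk_nvme_ctrlr *ctrlr);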
[10:15:45] On an entirely unrelated note, were you able to get Karol (sp?) to build SPDK 18.10.1 + vanilla DPDK 18.11 + the single malloc patch and run it through the CI?
[10:20:50] not yet - we have been working the DPDK release schedule on our end, though
[10:20:59] I should have some updates soon
[10:23:32] *confused* Exactly what do you mean by "...working the DPDK release schedule"? Is that a reference to when we move our fork of DPDK to a newer version and determine which SPDK-originated patches were accepted and which ones must be cherry-picked forward?
[10:24:17] there's a chance we could get you the 18.08.1 you wanted. Or, at a minimum, the official 18.11.1, which contains that extra patch you need
[10:24:51] but I'm actually going to kick off the test as you described above now, just in case
[10:27:09] I really appreciate that. I need to update some internal status on projected dates, and this is a big one for me.
[10:59:36] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[10:59:37] (spdk/v18.10.1-oracle) SPDK 18.10.1 (Tomasz Zawadzki)
[10:59:38] Diff URL: https://github.com/spdk/spdk/compare/v18.10.1-oracle
[10:59:38] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[11:11:08] lhodev: I think this will get the test results you want: https://review.gerrithub.io/c/spdk/spdk/+/447841
[11:11:10] we'll see
[11:34:18] *** Quits: tomzawadzki (uid327004@gateway/web/irccloud.com/x-favanabcsxqrvzbb) (Quit: Connection closed for inactivity)
[12:32:34] jimharris, not sure you saw my note this morning about removing pmem_msync; after running the compress write test for 4 hrs, I just got it to fail :(
[12:32:43] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[12:32:44] (spdk/v19.01.x) fio_plugin: fix hang in FIO (Piotr Pelplinski)
[12:32:44] Diff URL: https://github.com/spdk/spdk/compare/c828d09d3a25...a6f10a33cb31
[12:32:44] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[12:34:11] i think i missed it - was it an e-mail or irc?
[12:35:38] you're saying that if you remove the pmem_msync, it runs for much longer, but still eventually fails after 4 hours?
[12:41:45] *** Joins: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com)
[12:41:46] (spdk/v19.01.x) spdkcli: Exit with 1 when rpc throws JSONRPCException (Pawel Kaminski)
[12:41:46] Diff URL: https://github.com/spdk/spdk/compare/a6f10a33cb31...85d6682dd491
[12:41:46] *** Parts: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com) ()
[12:43:40] *** Joins: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com)
[12:43:41] (spdk/master) fio_plugin: don't submit the IO if got DIF context error (Changpeng Liu)
[12:43:41] Diff URL: https://github.com/spdk/spdk/compare/3fc824c834ba...5a051a6c1b5a
[12:43:41] *** Parts: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com) ()
[12:44:28] *** Joins: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com)
[12:44:29] (spdk/v19.01.x) blob: pass NULL or SPDK_BLOBID_INVALID when bserrno != 0 (Jim Harris)
[12:44:29] Diff URL: https://github.com/spdk/spdk/compare/85d6682dd491...cf0d9530447f
[12:44:29] *** Parts: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com) ()
[12:49:08] *** Joins: travis-ci (~travis-ci@ec2-34-238-139-82.compute-1.amazonaws.com)
[12:49:09] (spdk/master) bdev: don't allow multiple unregister calls (Pawel Wodkowski)
[12:49:10] Diff URL: https://github.com/spdk/spdk/compare/5a051a6c1b5a...0fe8cd17111f
[12:49:10] *** Parts: travis-ci (~travis-ci@ec2-34-238-139-82.compute-1.amazonaws.com) ()
[13:44:42] lhodev: SPDK 18.10.1 fails to compile against DPDK 18.11. I'll look into it a bit here, but we may need to produce a minimal 18.10.2 in order to support this configuration.
[13:46:27] bwalker: interesting. Ok. Shall we stay the course (for now) of targeting vanilla DPDK 18.11 + the malloc patch? Or do you think it's better if we go for vanilla DPDK 18.11.1? I'm just a little concerned about *when* the latter will finally be officially tagged.
[13:47:05] I think either way we need to produce an 18.10.2
[13:47:14] and the patches we need to pull in for that are almost certainly the same ones
[14:11:23] bwalker: And then some. When I took a look at the commits in DPDK 18.11.1, it did, as you state, pull in that malloc patch, and it also has quite a few others. Mind you, many of them are related to options in DPDK we're not using, but I think I recall at least one or maybe even two additional eal-related ones that might make an impact.
[14:20:20] jimharris, I haven't done enough testing to say what makes it run longer/shorter. Adding more debug code; getting closer, I think, maybe.
[14:34:07] *** Joins: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com)
[14:34:08] (spdk/master) test/lvol: Run all test cases. (Pawel Kaminski)
[14:34:08] Diff URL: https://github.com/spdk/spdk/compare/0fe8cd17111f...bbf7627c31c2
[14:34:08] *** Parts: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com) ()
[14:38:39] lhodev: I had to pull just one commit over to 18.10.x to get it to compile against DPDK 18.11
[14:38:43] so running the tests again now
[14:40:21] bwalker: Cool. So, are you doing this under another gerrit change-id? Looks like you marked the other one abandoned (https://review.gerrithub.io/#/c/spdk/spdk/+/447841/).
[14:40:58] https://review.gerrithub.io/c/spdk/spdk/+/447851
[14:41:04] switched branches, so it made a new review
[14:42:40] *** Joins: travis-ci (~travis-ci@ec2-3-89-149-146.compute-1.amazonaws.com)
[14:42:41] (spdk/master) rdma: allocate protection domains for devices up front. (Seth Howell)
[14:42:41] Diff URL: https://github.com/spdk/spdk/compare/bbf7627c31c2...62266a72cf64
[14:42:41] *** Parts: travis-ci (~travis-ci@ec2-3-89-149-146.compute-1.amazonaws.com) ()
[14:43:51] Was this additional commit from SPDK itself, or from SPDK's fork of DPDK? Just curious.
[14:44:15] *** Joins: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com)
[14:44:16] (spdk/master) build: Don't pass -fuse-ld to compiler if LD_TYPE not set (Jonathan Richardson)
[14:44:16] Diff URL: https://github.com/spdk/spdk/compare/62266a72cf64...1c96c421ebec
[14:44:16] *** Parts: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com) ()
[14:44:47] it was one of the commits we made to SPDK 19.01 prior to moving our submodule to DPDK 18.11
[14:44:56] I just backported it to 18.10.x
[14:56:58] bwalker: Looks like the build failed, but I'm unable to follow the link.
[14:57:39] yeah, it hasn't transferred over yet
[14:57:43] but I'm looking at it now
[15:00:11] failing to build crypto
[15:00:23] need to think about how best to approach this
[15:00:59] For SPDK 18.10.x, didn't we state that crypto was only experimental?
[15:01:27] In the 18.10.x spec file we run configure without enabling crypto.
[15:21:15] we did - I'm trying to just turn it off in the tests
[15:27:25] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[15:27:26] (spdk/master) sock/vpp: do not continue if buf writed is less than provided (wuzhouhui)
[15:27:26] Diff URL: https://github.com/spdk/spdk/compare/1c96c421ebec...900f0c978bc5
[15:27:26] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[15:28:32] *** Joins: travis-ci (~travis-ci@ec2-54-210-171-126.compute-1.amazonaws.com)
[15:28:33] (spdk/master) ocf: switch to dynamic queues (Vitaliy Mysak)
[15:28:33] Diff URL: https://github.com/spdk/spdk/compare/900f0c978bc5...ca1b5c418db1
[15:28:33] *** Parts: travis-ci (~travis-ci@ec2-54-210-171-126.compute-1.amazonaws.com) ()
[16:26:39] *** Quits: vmysak (~vmysak@192.55.54.40) (Remote host closed the connection)
[18:44:53] *** Joins: travis-ci (~travis-ci@ec2-3-89-70-107.compute-1.amazonaws.com)
[18:44:54] (spdk/master) iscsi: Generate and verify DIF to metadata space in read or write I/O (Shuhei Matsumoto)
[18:44:55] Diff URL: https://github.com/spdk/spdk/compare/ca1b5c418db1...136c3fb46184
[18:44:55] *** Parts: travis-ci (~travis-ci@ec2-3-89-70-107.compute-1.amazonaws.com) ()
[20:29:32] *** Joins: felipef (~felipef@cpc92310-cmbg19-2-0-cust421.5-4.cable.virginm.net)
[20:33:55] *** Quits: felipef (~felipef@cpc92310-cmbg19-2-0-cust421.5-4.cable.virginm.net) (Ping timeout: 246 seconds)
[23:33:24] Project autotest-nightly build #427: STILL FAILING in 33 min. See https://ci.spdk.io/spdk-jenkins for results.
[23:34:10] Project autotest-nightly-failing build #296: STILL FAILING in 34 min. See https://ci.spdk.io/spdk-jenkins for results.