[00:52:51] *** Joins: travis-ci (~travis-ci@ec2-52-90-34-207.compute-1.amazonaws.com)
[00:52:52] (spdk/master) bdev: remove the unnecessary spdk_bdev_finish call. (Ziye Yang)
[00:52:53] Diff URL: https://github.com/spdk/spdk/compare/267f09dea0ef...39c4f95b35d1
[00:52:53] *** Parts: travis-ci (~travis-ci@ec2-52-90-34-207.compute-1.amazonaws.com) ()
[01:43:23] *** Joins: travis-ci (~travis-ci@ec2-54-80-0-128.compute-1.amazonaws.com)
[01:43:23] (spdk/master) lib/bdev: Assert if there is no outstanding IO after completion with ENOMEM (Wojciech Malikowski)
[01:43:24] Diff URL: https://github.com/spdk/spdk/compare/39c4f95b35d1...f1da65ef283b
[01:43:24] *** Parts: travis-ci (~travis-ci@ec2-54-80-0-128.compute-1.amazonaws.com) ()
[03:36:40] Project autotest-nightly build #426: STILL FAILING in 29 min. See https://ci.spdk.io/spdk-jenkins for results.
[03:53:01] Project autotest-nightly-failing build #295: STILL FAILING in 39 min. See https://ci.spdk.io/spdk-jenkins for results.
[04:59:42] *** Joins: Mic92 (~Mic92@mail.thalheim.io)
[05:02:58] I am currently porting SPDK to run inside SGX enclaves and stumbled over `nvme_sigbus_fault_sighandler`. It accesses the NVMe controller via thread-local storage. However, I wonder how this is supposed to work, because the thread that sets g_thread_mmio_ctrlr might not be the same one that is scheduled to handle the signal.
[06:08:54] Mic92: oh, you're absolutely right
[06:10:15] signal(7): A process-directed signal may be delivered to any
[06:10:16] one of the threads that does not currently have the signal blocked.
[06:10:16] If more than one of the threads has the signal unblocked, then the
[06:10:16] kernel chooses an arbitrary thread to which to deliver the signal.
[06:20:40] DPDK 18.11+ also has a sigbus handler; it finds the respective PCI device by iterating through all available devices and comparing their BAR addresses against the faulting address
[06:21:59] SPDK still supports DPDK versions < 18.11, so it can't just use the DPDK handler
[06:22:36] I'm afraid we'll have to implement the exact same functionality in SPDK
[06:23:05] maybe jimharris has some more input
[06:52:22] darsto: What is actually the purpose of this handler? Why can't the controller be mapped up front?
[06:54:50] For my use case it would be great if I could do this up front, since I run code in an SGX enclave and am therefore limited w.r.t. syscalls.
[06:55:04] I guess I could issue a read before running the enclave.
[07:06:53] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[07:06:54] (spdk/master) test: add mem_callbacks unit test (Jim Harris)
[07:06:54] Diff URL: https://github.com/spdk/spdk/compare/f1da65ef283b...3fc824c834ba
[07:06:54] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[08:14:12] Is anyone online who might answer a question regarding spdk_nvme_ctrlr_get_num_ns()?
[08:25:07] *** Joins: vmysak (vmysak@nat/intel/x-ygabwpedkdpuyvwv)
[08:43:44] It's more a question of WHY ctrlr->num_ns has the value of MaxNamespaces for an object pointing to an NVM subsystem over a fabric, as opposed to just the actual number of discovered namespaces.
[08:44:35] It appears this happens if and only if MaxNamespaces was set for that subsystem. If it wasn't set, then ctrlr->num_ns does hold the actual number of namespaces.
[08:57:32] *** Quits: vmysak (vmysak@nat/intel/x-ygabwpedkdpuyvwv) (Ping timeout: 272 seconds)
[08:57:59] *** Joins: vmysak (~vmysak@192.55.54.40)
[09:02:34] *** Quits: vmysak (~vmysak@192.55.54.40) (Ping timeout: 250 seconds)
[09:08:06] Mic92: the purpose of that handler is to remap PCI BARs after a device gets hot-removed
[09:09:15] the hotremove is detected elsewhere; remapping the BARs just allows the application to proceed with its normal shutdown cycle
[09:09:33] i see the github issue, thanks
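For context, a minimal sketch of the BAR-scanning approach described above, in the spirit of the DPDK 18.11+ handler. All names here (struct pci_dev, g_devices, bar_vaddr, bar_size) are hypothetical illustrations, not SPDK's or DPDK's actual data structures. The point is that the handler locates the device from the faulting address itself instead of from thread-local state, so it works no matter which thread the kernel picks to deliver the signal to:

#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

struct pci_dev {
	void           *bar_vaddr;      /* start of the mapped BAR */
	size_t          bar_size;       /* length of the mapping */
	struct pci_dev *next;
};

static struct pci_dev *g_devices;       /* hypothetical global device list */

static void
sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	struct pci_dev *dev;

	(void)sig;
	(void)ctx;

	for (dev = g_devices; dev != NULL; dev = dev->next) {
		uintptr_t start = (uintptr_t)dev->bar_vaddr;

		if (fault >= start && fault < start + dev->bar_size) {
			/*
			 * Replace the dead BAR mapping with anonymous zeroed
			 * memory so this access (and any later ones) succeed,
			 * letting the hotremove path shut down normally.
			 */
			mmap(dev->bar_vaddr, dev->bar_size,
			     PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
			return;
		}
	}

	/* Not one of our BARs: restore the default action and re-raise. */
	signal(SIGBUS, SIG_DFL);
	raise(SIGBUS);
}

static void
install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);
}

One caveat with this pattern: mmap() is not on the POSIX list of async-signal-safe functions, although on Linux it works in practice for this kind of remap-on-fault recovery.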
[09:48:36] lhodev: the spec isn't really intuitive regarding num_ns
[09:48:45] one sec, need to look at the code and then I can give you an answer
[09:49:08] bwalker: Thank you!
[09:49:41] ok, confirmed what I was thinking
[09:49:55] so in the NVMe specification, there are subsystems
[09:50:05] which contain what is effectively an array of namespaces
[09:50:11] where the index into the array is the namespace id
[09:50:19] that's by spec - not by SPDK design
[09:50:32] the array does not need to be entirely populated
[09:50:53] that ctrlr->num_ns value is our internally cached value of the 'NN' field from the identify data
[09:51:01] which is the length of the array - not the number of namespaces
[09:51:46] *** Joins: vmysak (~vmysak@192.55.54.40)
[09:52:34] the part that trips everyone up, myself included, is that it isn't obvious that the array of namespaces can be sparsely populated
[09:52:52] the way the spec talks about it is that some namespaces can be "inactive", but that means they don't exist
[09:53:30] so it's totally possible to do a discover on a controller and find two namespaces on a device that can have up to 4
[09:53:53] if the device supports namespace management, for example
[10:01:06] So how does SPDK handle the case for namespace mgmt? That is, how does it size the array if an NVMe controller supports namespace mgmt?
[10:02:02] well, in a physical device, the array is whatever size is reported in the 'NN' value, which is ctrlr->num_ns for us
[10:02:16] but in nvme-of, we're emulating a controller
[10:02:25] and we may get RPCs that change the number of namespaces on us
[10:02:46] so in response to those RPCs, we dynamically reallocate the namespace array
[10:02:50] if necessary
[10:03:08] if the user adds a bdev to an nvmf subsystem (bdev == namespace in our design), they have two options
[10:03:13] they can either explicitly select the nsid
[10:03:18] or they can let us select one
[10:03:34] if they let us select one, we'll pick the lowest open slot. If there are no open slots, we'll make the array one slot bigger.
[10:04:01] if they explicitly select an nsid that isn't already occupied, we'll resize the array to include that nsid if it doesn't already
[10:04:44] so you could do something silly if you wanted to and explicitly select 100 as your nsid for the first namespace added to a subsystem
[10:05:00] and spdk_nvme_ctrlr_get_num_ns() should return 101 (the array indices run 0 to 100, so its length is 101)
[10:05:02] The bottom line: one should never use spdk_nvme_ctrlr_get_num_ns() as a means to determine the number of CURRENT namespaces.
[10:05:08] correct
[10:05:28] by CURRENT I assume you mean ACTIVE namespaces. The specification defines states for namespaces to be in
[10:05:35] I can look those up real quick - there's like 3 or 4
[10:06:26] Instead, the paradigm should be to use spdk_nvme_ctrlr_get_first_active_ns() and spdk_nvme_ctrlr_get_next_active_ns() to count/walk the, yes, ACTIVE, namespaces.
[10:06:33] exactly
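For reference, the active-namespace walk they are describing looks roughly like this. count_active_namespaces is an illustrative helper (not an SPDK API); the spdk_nvme_ctrlr_get_* calls are the public functions named above, and ctrlr is assumed to be an already-attached controller:

#include "spdk/nvme.h"

/*
 * Count the ACTIVE namespaces by walking the sparse NSID space, rather
 * than trusting spdk_nvme_ctrlr_get_num_ns(), which only reports the
 * length of the namespace array ('NN' from the identify data).
 */
static uint32_t
count_active_namespaces(struct spdk_nvme_ctrlr *ctrlr)
{
	uint32_t nsid;
	uint32_t count = 0;

	for (nsid = spdk_nvme_ctrlr_get_first_active_ns(ctrlr);
	     nsid != 0;
	     nsid = spdk_nvme_ctrlr_get_next_active_ns(ctrlr, nsid)) {
		struct spdk_nvme_ns *ns = spdk_nvme_ctrlr_get_ns(ctrlr, nsid);

		if (ns != NULL) {
			count++;
		}
	}

	return count;
}

In the "silly" example above, with a single namespace added at nsid 100, this walk would return 1 while spdk_nvme_ctrlr_get_num_ns() returns 101.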
[10:07:11] my terminology actually isn't exactly right, now that I look at the spec
[10:07:23] it's not a namespace that can be in various states, but a "namespace id"
[10:07:26] i.e. nsid
[10:07:40] section 6.1 of NVMe 1.3c is where it's defined
[10:07:49] Gonna pull that up right now.
[10:08:01] https://nvmexpress.org/wp-content/uploads/NVM-Express-1_3c-2018.05.24-Ratified.pdf
[10:08:41] so in reality that ctrlr->num_ns value is the number of valid namespace ids, not the number of namespaces
[10:08:41] I already have it open in Preview, lol. Just need to get to the appropriate section ;-)
[10:09:03] it's a little bit tough because a lot of this was retrofitted into the spec later on
[10:09:24] and some of the SPDK APIs were already defined and we don't want to change them
[10:09:38] Such that semantically they'd make more sense?
[10:09:44] (sigh)
[10:10:00] yeah - we could give spdk_nvme_ctrlr_get_num_ns() a better name
[10:10:19] in NVMe 1.0, and maybe 1.1, the number of NSIDs == number of namespaces, I think
[10:10:37] there wasn't this distinction
[10:10:53] but that function is one of the original ones - more than 5 years old
[10:12:56] Maybe we could at least alter the Doxygen comments to help make this more understandable?
[10:13:31] yes definitely
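One possible wording for such a Doxygen comment, drawn only from bwalker's explanation above (this is a sketch, not the comment that was actually merged; only the function signature is SPDK's):

/**
 * Get the number of valid namespace IDs on the given controller.
 *
 * This is the controller's cached 'NN' value from the identify data, i.e.
 * the length of the (possibly sparsely populated) namespace array, NOT the
 * number of active namespaces. To enumerate active namespaces, use
 * spdk_nvme_ctrlr_get_first_active_ns() and
 * spdk_nvme_ctrlr_get_next_active_ns() instead.
 */
uint32_t spdk_nvme_ctrlr_get_num_ns(struct spdk_nvme_ctrlr *ctrlr);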
[10:15:45] On an entirely unrelated note, were you able to get Karol (sp?) to build SPDK 18.10.1 + vanilla DPDK 18.11 + the single malloc patch and run it through the CI?
[10:20:50] not yet - we have been working the DPDK release schedule on our end, though
[10:20:59] I should have some updates soon
[10:23:32] *confused* Exactly what do you mean by "...working the DPDK release schedule"? Is that a reference to when we move our fork of DPDK to a newer version and determine which SPDK-originated patches were accepted and which ones must be cherry-picked forward?
[10:24:17] there's a chance we could get you the 18.08.1 you wanted. Or, at a minimum, the official 18.11.1, which contains that extra patch you need
[10:24:51] but I'm actually going to kick off the test as you described above now, just in case
[10:27:09] I really appreciate that. I need to update some internal status on projected dates, and this is a big one for me.
[10:59:36] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[10:59:37] (spdk/v18.10.1-oracle) SPDK 18.10.1 (Tomasz Zawadzki)
[10:59:38] Diff URL: https://github.com/spdk/spdk/compare/v18.10.1-oracle
[10:59:38] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[11:11:08] lhodev: I think this will get the test results you want: https://review.gerrithub.io/c/spdk/spdk/+/447841
[11:11:10] we'll see
[11:34:18] *** Quits: tomzawadzki (uid327004@gateway/web/irccloud.com/x-favanabcsxqrvzbb) (Quit: Connection closed for inactivity)
[12:32:34] jimharris, not sure you saw my note this morning about removing pmem_msync; after running the compress write test for 4 hrs, I just got it to fail :(
[12:32:43] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[12:32:44] (spdk/v19.01.x) fio_plugin: fix hang in FIO (Piotr Pelplinski)
[12:32:44] Diff URL: https://github.com/spdk/spdk/compare/c828d09d3a25...a6f10a33cb31
[12:32:44] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[12:34:11] i think i missed it - was it an e-mail or irc?
[12:35:38] you're saying that if you remove the pmem_msync, it runs for much longer, but still eventually fails after 4 hours?
[12:41:45] *** Joins: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com)
[12:41:46] (spdk/v19.01.x) spdkcli: Exit with 1 when rpc throws JSONRPCException (Pawel Kaminski)
[12:41:46] Diff URL: https://github.com/spdk/spdk/compare/a6f10a33cb31...85d6682dd491
[12:41:46] *** Parts: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com) ()
[12:43:40] *** Joins: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com)
[12:43:41] (spdk/master) fio_plugin: don't submit the IO if got DIF context error (Changpeng Liu)
[12:43:41] Diff URL: https://github.com/spdk/spdk/compare/3fc824c834ba...5a051a6c1b5a
[12:43:41] *** Parts: travis-ci (~travis-ci@ec2-52-201-242-216.compute-1.amazonaws.com) ()
[12:44:28] *** Joins: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com)
[12:44:29] (spdk/v19.01.x) blob: pass NULL or SPDK_BLOBID_INVALID when bserrno != 0 (Jim Harris)
[12:44:29] Diff URL: https://github.com/spdk/spdk/compare/85d6682dd491...cf0d9530447f
[12:44:29] *** Parts: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com) ()
[12:49:08] *** Joins: travis-ci (~travis-ci@ec2-34-238-139-82.compute-1.amazonaws.com)
[12:49:09] (spdk/master) bdev: don't allow multiple unregister calls (Pawel Wodkowski)
[12:49:10] Diff URL: https://github.com/spdk/spdk/compare/5a051a6c1b5a...0fe8cd17111f
[12:49:10] *** Parts: travis-ci (~travis-ci@ec2-34-238-139-82.compute-1.amazonaws.com) ()
[13:44:42] lhodev: SPDK 18.10.1 fails to compile against DPDK 18.11. I'll look into it a bit here, but we may need to produce a minimal 18.10.2 in order to support this configuration.
[13:46:27] bwalker: interesting. Ok. Shall we stay the course (for now) of targeting vanilla DPDK 18.11 + the malloc patch? Or do you think it's better if we go for vanilla DPDK 18.11.1? I'm just a little concerned about *when* the latter will finally be officially tagged.
[13:47:05] I think either way we need to produce an 18.10.2
[13:47:14] and the patches we need to pull in for that are almost certainly the same ones
[14:11:23] bwalker: And then some. When I took a look at the commits in DPDK 18.11.1, it did, as you state, pull in that malloc patch, and it also has quite a few others. Mind you, many of them are related to options in DPDK we're not using, but I think I recall at least one or maybe even two additional eal-related ones that might make an impact.
[14:20:20] jimharris, I haven't done enough testing to say what makes it run longer/shorter. Adding more debug code; getting closer, I think, maybe.
[14:34:07] *** Joins: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com)
[14:34:08] (spdk/master) test/lvol: Run all test cases. (Pawel Kaminski)
[14:34:08] Diff URL: https://github.com/spdk/spdk/compare/0fe8cd17111f...bbf7627c31c2
[14:34:08] *** Parts: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com) ()
[14:38:39] lhodev: I had to pull just one commit over to 18.10.x to get it to compile against DPDK 18.11
[14:38:43] so running the tests again now
[14:40:21] bwalker: Cool. So, are you doing this under another gerrit change-id? Looks like you marked the other one abandoned (https://review.gerrithub.io/#/c/spdk/spdk/+/447841/).
[14:40:58] https://review.gerrithub.io/c/spdk/spdk/+/447851
[14:41:04] switched branches, so it made a new review
[14:42:40] *** Joins: travis-ci (~travis-ci@ec2-3-89-149-146.compute-1.amazonaws.com)
[14:42:41] (spdk/master) rdma: allocate protection domains for devices up front. (Seth Howell)
[14:42:41] Diff URL: https://github.com/spdk/spdk/compare/bbf7627c31c2...62266a72cf64
[14:42:41] *** Parts: travis-ci (~travis-ci@ec2-3-89-149-146.compute-1.amazonaws.com) ()
[14:43:51] Was this additional commit from SPDK itself, or from SPDK's fork of DPDK? Just curious.
[14:44:15] *** Joins: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com)
[14:44:16] (spdk/master) build: Don't pass -fuse-ld to compiler if LD_TYPE not set (Jonathan Richardson)
[14:44:16] Diff URL: https://github.com/spdk/spdk/compare/62266a72cf64...1c96c421ebec
[14:44:16] *** Parts: travis-ci (~travis-ci@ec2-54-167-172-211.compute-1.amazonaws.com) ()
[14:44:47] it was one of the commits we made to SPDK 19.01 prior to moving our submodule to DPDK 18.11
[14:44:56] I just backported it to 18.10.x
[14:56:58] bwalker: Looks like the build failed, but I'm unable to follow the link.
[14:57:39] yeah, it hasn't transferred over yet
[14:57:43] but I'm looking at it now
[15:00:11] failing to build crypto
[15:00:23] need to think about how best to approach this
[15:00:59] For SPDK 18.10.x, didn't we state that crypto was only experimental?
[15:01:27] In the 18.10.x spec file we run configure without enabling crypto.
[15:21:15] we did - I'm trying to just turn it off in the tests
[15:27:25] *** Joins: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com)
[15:27:26] (spdk/master) sock/vpp: do not continue if buf writed is less than provided (wuzhouhui)
[15:27:26] Diff URL: https://github.com/spdk/spdk/compare/1c96c421ebec...900f0c978bc5
[15:27:26] *** Parts: travis-ci (~travis-ci@ec2-54-89-106-93.compute-1.amazonaws.com) ()
[15:28:32] *** Joins: travis-ci (~travis-ci@ec2-54-210-171-126.compute-1.amazonaws.com)
[15:28:33] (spdk/master) ocf: switch to dynamic queues (Vitaliy Mysak)
[15:28:33] Diff URL: https://github.com/spdk/spdk/compare/900f0c978bc5...ca1b5c418db1
[15:28:33] *** Parts: travis-ci (~travis-ci@ec2-54-210-171-126.compute-1.amazonaws.com) ()
[16:26:39] *** Quits: vmysak (~vmysak@192.55.54.40) (Remote host closed the connection)
[18:44:53] *** Joins: travis-ci (~travis-ci@ec2-3-89-70-107.compute-1.amazonaws.com)
[18:44:54] (spdk/master) iscsi: Generate and verify DIF to metadata space in read or write I/O (Shuhei Matsumoto)
[18:44:55] Diff URL: https://github.com/spdk/spdk/compare/ca1b5c418db1...136c3fb46184
[18:44:55] *** Parts: travis-ci (~travis-ci@ec2-3-89-70-107.compute-1.amazonaws.com) ()
[20:29:32] *** Joins: felipef (~felipef@cpc92310-cmbg19-2-0-cust421.5-4.cable.virginm.net)
[20:33:55] *** Quits: felipef (~felipef@cpc92310-cmbg19-2-0-cust421.5-4.cable.virginm.net) (Ping timeout: 246 seconds)
[23:33:24] Project autotest-nightly build #427: STILL FAILING in 33 min. See https://ci.spdk.io/spdk-jenkins for results.
[23:34:10] Project autotest-nightly-failing build #296: STILL FAILING in 34 min. See https://ci.spdk.io/spdk-jenkins for results.