[00:49:43] *** Joins: baruch (~baruch@31.210.182.58)
[01:15:20] *** Joins: tomzawadzki (~tomzawadz@134.134.139.76)
[01:18:56] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Quit: Page closed)
[03:17:41] *** Quits: tomzawadzki (~tomzawadz@134.134.139.76) (Ping timeout: 265 seconds)
[03:31:08] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[03:32:34] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[03:47:27] *** Joins: tomzawadzki (~tomzawadz@192.55.54.44)
[04:13:35] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[04:29:22] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[07:34:40] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[07:34:47] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[09:15:16] *** Quits: tomzawadzki (~tomzawadz@192.55.54.44) (Ping timeout: 260 seconds)
[09:49:06] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[09:51:41] drv, bwalker: the md format described in doc/blob.md is getting pretty far out of sync
[09:52:34] do we want to keep maintaining that? if we do, that's fine, but from what I can tell so far, it's been out of sync since the very beginning
[09:53:00] it's probably something we should document once we reach stability
[09:53:05] and not in the interim
[09:53:17] i.e. if it doesn't change for a year, we can write it down
[09:53:33] i'll add something to it for this new blobid mask but won't fix anything else as part of that patch
[09:54:02] signature and clean are missing - docs say we persist page size to disk which we don't
[09:54:30] bs-type is listed in the wrong place (totally missed that in the review)
[09:55:53] note that all that will be replaced with the PG so maybe I should just yank the details, blab a little about it and refer to source for specifics
[09:56:10] (and that whole section in the PG is copy n paste from blob.md)
[10:00:20] I think we do want to document the disk format separately from the code, but I agree we can put it off until this set of changes to the format has been finished
[10:00:37] it's probably also not clear enough from the docs to actually do a second implementation without looking at the code, so we should improve that
[11:14:59] jimharris: were you seeing that there were still bdev mgmt channels around on bdev module shutdown?
[11:15:07] is that why you added the for_each_channel?
[11:15:20] yes
[11:15:43] I would have expected all of the bdev mgmt channels to get destroyed before that - doesn't destroying a regular bdev io channel release its reference to the bdev mgmt channel?
[11:17:08] it should - let me look into it
[11:25:01] the split bdevs in my test don't get unregistered
[11:25:20] there's more plumbing needed to make that happen - but I'd prefer not to tackle that right now
[11:25:28] yeah I already gave your patch a +2
[11:27:31] sethhowe is looking into something related, I think
[11:27:55] the spdk_bdev_part_free() code doesn't unregister the bdev
[11:53:04] jimharris: check out https://review.gerrithub.io/#/c/392547/ for a quick fix. I am going to pull down Daniel's old patch at https://review.gerrithub.io/#/c/389878/ and finish it up for a more comprehensive fix.
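For readers following the metadata discussion above, here is a rough, illustrative sketch of the kind of fields the blobstore's on-disk super block carries (signature, clean flag, bs-type, and the new used-blobid mask mentioned in the chat). The field names, types, and ordering below are assumptions for illustration only, not the real on-disk layout; the authoritative definition is the struct spdk_bs_super_block in the blobstore source, which is exactly what doc/blob.md has drifted away from.

/*
 * Illustrative sketch only; NOT the authoritative SPDK on-disk layout.
 * Field names and sizes are assumptions for discussion. Refer to the
 * struct spdk_bs_super_block definition in the blobstore sources for
 * the real format.
 */
#include <stdint.h>

struct example_bs_super_block {
	uint8_t		signature[8];		/* magic identifying a blobstore */
	uint32_t	version;
	uint32_t	clean;			/* set on clean shutdown */
	uint8_t		bstype[16];		/* bs-type: owner-defined tag */
	uint32_t	cluster_size;
	uint32_t	used_page_mask_start;	/* bitmask of allocated md pages */
	uint32_t	used_page_mask_len;
	uint32_t	used_cluster_mask_start;
	uint32_t	used_cluster_mask_len;
	uint32_t	used_blobid_mask_start;	/* the new blobid mask */
	uint32_t	used_blobid_mask_len;
	uint32_t	crc;
};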
[12:44:49] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[15:28:45] *** Joins: James (~James@208.185.211.6)
[15:29:09] *** James is now known as Guest69401
[15:37:06] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[16:07:43] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:09:08] @bwalker, would it be possible to give the pros and cons of using uio versus VFIO?
[16:10:33] if you are deploying on bare metal, use vfio
[16:10:44] if you are deploying in a VM that is already protected by an IOMMU on the host, use uio
[16:11:15] no pros and cons - it's not a trade off. VFIO is the only way to deploy in production on bare metal
[16:12:00] otherwise you don't have any protection for page movement by the kernel, and there are legitimate reasons why the kernel would move a page
[16:12:07] memory failure being the primary good reason
[16:14:00] if you give me a scenario that you're looking at, I can tell you whether you need to use vfio
[16:14:01] if that would help
[16:14:09] and what may go wrong if you don't
[16:26:22] Hi Ben, thanks a lot. Would you mind giving me a hint as to why a page would be moved by the kernel?
[16:27:04] there are lots of reasons, but one good one is if you are running with ECC memory
[16:27:24] and reading a memory page reaches a certain error correction count
[16:27:34] that indicates that the page is probably going bad
[16:27:45] so the kernel will remap the virtual address to a new physical address while it can still read the page
[16:28:33] yes, understood
[16:28:41] *** Guest69401 is now known as James
[16:29:11] *** James is now known as Guest9408
[16:29:11] you mean vfio is able to prevent this from happening?
[16:29:53] vfio doesn't prevent the movement, but when we're using vfio the IOMMU is now involved to translate PCIe bus addresses to physical addresses
[16:30:04] so if the kernel remaps a va to a pa
[16:30:16] the iova gets remapped automatically
[16:30:28] so we continue to program the device with the same iova but it now DMAs to the new pa
[16:30:33] all automatically in the hardware
[16:32:33] UIO does not have support for the IOMMU, correct?
[16:37:00] correct - that's the primary difference between uio and vfio
[16:38:28] wonderful, are there any other differences between UIO and VFIO? UIO also supports DMA now.
[16:39:39] lots of minor differences in how they're set up and configured
[16:39:49] and in how they mark the device in use to the kernel
[16:39:58] but the IOMMU is the big one
[16:40:57] so the recommendation from the SPDK community would be VFIO instead of UIO, correct?
[16:42:45] for deployment on production systems, on bare metal, absolutely
[16:43:12] uio is fine for running in VMs or on a dev system
[16:43:30] (only fine for running in a VM if the VM is protected by the IOMMU already)
[16:45:28] thanks a lot. by the way, Intel NVMe devices have IOMMU capabilities, right?
[16:45:56] the IOMMU is a capability of the platform, not the SSD. Most Intel Xeon skus have IOMMUs.
[16:47:06] thanks a lot. by default, the NVMe device will access memory through IOMMU translation, correct?
[16:48:20] you have to enable the IOMMU in the BIOS, in the Linux kernel boot parameters, and then configure it through sysfs. SPDK provides scripts/setup.sh which will automatically configure your system to use the IOMMU with your SSDs if it detects that the IOMMU is enabled on the platform.
[16:48:48] so basically, turn it on in the BIOS and in the Linux kernel boot parameters, then run ./scripts/setup.sh from SPDK
[16:48:51] and it will be enabled
[16:49:09] got it,
[16:50:48] assuming an ECC error happens in memory, the kernel will remap the virtual address to a new physical page and also notify the IOMMU to remap the device address to the new physical memory, correct?
[16:51:45] more or less, yes
[16:51:58] and without the IOMMU, the device would keep writing to the memory with the ECC error, which eventually crashes the whole system, correct?
[16:52:31] the kernel will still move the page, but without the IOMMU SPDK will continue programming the device to DMA to the old physical address
[16:52:41] which is not where the virtual address is mapped anymore
[16:52:54] so the user is told their I/O completed, but their data is not right
[16:54:15] got
[16:54:18] got it
[18:59:03] *** Quits: Guest9408 (~James@208.185.211.6) (Remote host closed the connection)
[19:02:49] *** Joins: guerby_ (~guerby@ip165.tetaneutral.net)
[19:07:14] *** Quits: guerby (~guerby@april/board/guerby) (*.net *.split)
[19:29:35] *** Joins: James (~James@208.185.211.6)
[19:30:02] *** James is now known as Guest56297
[19:34:14] *** Guest56297 is now known as JamesLiu
[19:34:28] *** JamesLiu is now known as JamesLiu123456
[19:34:49] *** Quits: JamesLiu123456 (~James@208.185.211.6) (Quit: Leaving...)
[19:36:11] *** Joins: James (~James@208.185.211.6)
[19:36:35] *** James is now known as Guest74205
[19:38:53] hi Ben
[19:39:57] @bwalker, quick question, is there any performance degradation when using VFIO with the IOMMU compared to UIO?
[19:40:10] *** Guest74205 is now known as JamesLiu
[20:10:34] anyone from Intel have answers, pls?
[20:30:02] *** Quits: JamesLiu (~James@208.185.211.6) (Remote host closed the connection)
[20:30:34] *** Joins: JamesLiu (~James@208.185.211.6)
[20:34:57] *** Quits: JamesLiu (~James@208.185.211.6) (Ping timeout: 240 seconds)
[21:11:15] *** Joins: JamesLiu (~James@2601:640:4:bbcc:b1c7:ad28:eea4:c78f)
[21:16:00] *** Quits: JamesLiu (~James@2601:640:4:bbcc:b1c7:ad28:eea4:c78f) (Ping timeout: 265 seconds)
[21:25:35] *** Joins: JamesLiu (~James@2601:640:4:bbcc:b1c7:ad28:eea4:c78f)
[21:31:29] JamesLiu: I am not aware of any performance penalty for using the IOMMU, but it's worth inquiring with the hardware people
[21:31:50] cool, thanks
[23:15:11] *** Quits: JamesLiu (~James@2601:640:4:bbcc:b1c7:ad28:eea4:c78f) (Ping timeout: 255 seconds)
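To make the failure mode discussed above (16:29 through 16:52) concrete, here is a minimal C sketch contrasting the two cases. It is not SPDK code: the helpers iova_for_buffer, phys_addr_of, and program_device_dma are hypothetical placeholders standing in for real address translation and device programming logic.

/*
 * Conceptual sketch of the uio vs. vfio difference discussed above.
 * All helper names are hypothetical placeholders, not SPDK or kernel APIs.
 */
#include <stdint.h>

/* Hypothetical stand-ins for real address translation / device programming. */
static uint64_t iova_for_buffer(void *buf)  { return (uint64_t)(uintptr_t)buf; }
static uint64_t phys_addr_of(void *buf)     { return (uint64_t)(uintptr_t)buf; }
static void     program_device_dma(uint64_t addr) { (void)addr; }

static void
submit_io(void *buf, int using_vfio)
{
	uint64_t dma_addr;

	if (using_vfio) {
		/*
		 * vfio: the device is programmed with an I/O virtual address
		 * (IOVA). If the kernel later remaps the buffer's virtual
		 * address to a new physical page (e.g. due to rising ECC
		 * error counts), the IOMMU mapping for that IOVA is updated
		 * as well, so DMA keeps landing on the right data.
		 */
		dma_addr = iova_for_buffer(buf);
	} else {
		/*
		 * uio: no IOMMU, so the device must be given a raw physical
		 * address. If the kernel moves the page after this lookup,
		 * the device keeps DMA-ing to the stale physical address;
		 * the I/O "completes" but the data is wrong, which is the
		 * failure mode described in the chat.
		 */
		dma_addr = phys_addr_of(buf);
	}

	program_device_dma(dma_addr);
}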