[00:57:09] *** Quits: ziyeyang_ (~ziyeyang@192.55.54.42) (Remote host closed the connection)
[01:31:20] *** Joins: jan__ (~textual@ip-94-113-131-57.net.upcbroadband.cz)
[04:27:23] *** Quits: jan__ (~textual@ip-94-113-131-57.net.upcbroadband.cz) (Remote host closed the connection)
[04:57:17] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[05:06:08] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[07:01:43] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-chqtmzdcannlibgc)
[07:04:40] <nKumar> Question about iterating through blobs on a drive. I believe there was once mention of a function within the spdk currently that I could look at where all the blobs are iterated through on a drive to gain basic information on each blob.
[07:04:49] <nKumar> ie creating a list of all blobids on the drive
[07:41:22] *** Quits: tomzawadzki (~tzawadzk@192.55.54.45) (Ping timeout: 255 seconds)
[09:23:08] <bwalker> nKumar: spdk_bs_md_iter_first, spdk_bs_md_iter_next
[09:23:16] <bwalker> declared at the bottom of include/spdk/blob.h
[09:32:16] <nKumar> got it. However, since these are functions with void return, how do you determine when you have finished iterating through all blobs?
[09:33:13] <nKumar> for example, if I have a drive with N blobs and my system shuts down. When the program restarts, it doesnt know the value of N, so I would need to iterate a virtually indeterminate amount of times till I know the name/id of every blob.
[09:33:41] <bwalker> it's asynchronous - so the "return" value is passed to the function callback. In the function callback, a return code of -ENOENT means it's done
[09:33:56] <bwalker> you call spdk_bs_md_iter_first - when the first one is loaded in it calls you back
[09:34:04] <nKumar> perfect, thanks!
[09:34:07] <bwalker> then you call spdk_bs_md_iter_next, giving it the one you were just returned
[09:34:21] <bwalker> etc., until the callback from spdk_bs_md_iter_next calls you with rc == -ENOENT
[09:34:43] <bwalker> as it iterates it has to read from disk - that's why it's asynchronous
[09:35:52] <bwalker> the other thing you may want to look into is using the super blob facilities
[09:36:00] <bwalker> you can set one blob that you create as the "super" blob
[09:36:13] <bwalker> when you load a blobstore fresh, you can ask which blobid was marked as the super blob
[09:36:28] <bwalker> then read that - that's a good place to put your root metadata
[09:36:30] <bwalker> for whatever you are building
[15:30:50] <jimharris> bwalker: i haven't thought this through to completion yet - but would it be possible to create two binaries from test/app/stub/Makefile - one that includes the bare minimum libraries (like stub does today) and another that includes everything we need to for a bdev service?
[15:31:24] *** Joins: Darren (c6ba0002@gateway/web/freenode/ip.198.186.0.2)
[15:31:28] <jimharris> never mind
[15:31:47] *** Darren is now known as Guest45527
[15:31:47] <jimharris> that won't work unless we abstract out the nvme probe stuff from stub.c
[15:36:03] <Guest45527> I am having a long latency problem on latest spdk/dpdk (tip).  custom application that works with 17.03/17.02.  I have a work Q that is shared between hw and a registered poller (latency 0).  The poller checks the shared memory to see if new work has arrived from the adapter.  On spdk (tip) I see ~4ms from time item is placed, to when the poller discovers the item (verified via pcie).  I verified the poller is getting called with high
[15:37:53] <bwalker> is there anything else potentially running on the same thread as the poller?
[15:37:59] <bwalker> from a different application, for example
[15:38:22] <bwalker> by thread I mean same core as the poller
[15:43:33] <Guest45527> Could be, but if I place a pcie register read at the top of the poller, I verified that the register is read every ~1.6us (via pcie)
[15:44:15] <bwalker> so you have some PCIe device
[15:44:23] <bwalker> that does writes into a known memory region
[15:44:32] <bwalker> and the SPDK poller is reading that location to check for entries
[15:44:44] <bwalker> correct?
[15:45:52] <Guest45527> correct.  Been using this since spdk 16.12
[15:46:13] <bwalker> and you wrote your own user space driver for this device?
[15:46:21] <Guest45527> yes.
[15:46:25] <bwalker> that fits into the DPDK framework, just like SPDK did with our NVMe driver
[15:46:38] <bwalker> are you using uio or vfio
[15:46:44] <Guest45527> uio
[15:47:42] <bwalker> are you running this on a platform with something equivalent to DDIO?
[15:47:44] <Guest45527> we use all the underlying rte functions to alloc/map etc
[15:48:37] <bwalker> (i.e. a Xeon server or something)
[15:48:48] <Guest45527> platform is standard intel xeon v3
[15:49:07] <Guest45527> e5-2640 v3
[15:49:15] <bwalker> ok, just wanted to get the easy stuff out of the way
[15:49:28] <bwalker> it used to be fast with 16.12, right?
[15:49:49] <Guest45527> 16.12 and 17.03 worked great
[15:50:27] <bwalker> are you using the dpdk submodule from spdk?
[15:51:04] <Guest45527> yes.  I updated from the git last week
[15:51:45] <bwalker> k one sec I probably know the problem
[15:51:51] <bwalker> just checking a few things
[15:52:00] <Guest45527> glad I got the expert :)
[15:52:19] <Guest45527> I was at the spdk conf a few months ago.  Didn't get a chance to say hi
[15:53:43] <bwalker> the DPDK submodule inside of SPDK has a few patches on top of it
[15:53:45] <bwalker> it's not just stock DPDK
[15:54:19] <bwalker> one of the changes we made was to how uio maps BARs
[15:54:30] <bwalker> it will now enable write combining of the BAR is marked as prefetchable
[15:54:46] <bwalker> (vfio always did this - that's what the Linux kernel does)
[15:54:49] <Guest45527> If I do a git from spdk, doesn't that now include the dpdk?  (I know we used to have to get dpdk separately)
[15:55:12] <bwalker> yes - if you get spdk from our github and then do git submodule update --init, it will automatically grab our forked DPDK
[15:55:25] <bwalker> we still do support you using your own DPDK - you can specify where DPDK is when you run ./configure
[15:55:40] <bwalker> but our DPDK has a few critical patches that aren't guaranteed to have pushed into DPDK yet
[15:56:19] <Guest45527> I had that pre-fetch problem a few weeks ago with writes to registers.  I now do a read after my write to flush.
[15:56:20] <bwalker> the big one is commit 47668495 that we put on top of DPDK
[15:56:37] <jimharris> the problem is that your hardware pushes a work item to shared memory, but your SPDK poller doesn't see it for 4ms?
[15:57:16] <Guest45527> yes
[15:57:31] <bwalker> is that shared memory a mapped PCI BAR?
[15:58:02] <Guest45527> no.  It is allocated memory using rte_alloc_socket, then get the phys.  The phys is passed to the hw.
[15:58:10] <bwalker> hmm
[15:59:11] <bwalker> do you have like doorbells and such like NVMe does?
[15:59:43] <jimharris> is it exactly 4ms?
[16:00:19] <Guest45527> I just did the update.  I also added the pcie register read back to my poller.  I'm going to verify on pcie that the period from the placement of the work item by the hw to when I see it processed included the register reads.  This will prove the poller is running, but it doesn't see the memory update
[16:00:44] <Guest45527> About 3.9ms to 4ms
[16:01:40] <Guest45527> The doorbells are hw registers.   I see when the item has been processed by the poller because the doorbell (hw register) gets updated across pcie
[16:02:30] <bwalker> so you see on the pcie analyzer that the physical address was written to by the device
[16:02:39] <bwalker> and you see that the poller ran and read that address
[16:02:45] <bwalker> but you don't see the update for a full 4ms
[16:02:49] <bwalker> worth of polling
[16:04:13] <jimharris> what is CONFIG_HZ on your running kernel?
[16:04:16] <Guest45527> I can't see the poller reading the host memory across pcie :)
[16:04:33] <bwalker> but you have like a printf with a timestamp, right?
[16:05:40] <jimharris> Ubuntu at least uses CONFIG_HZ=250 by default which equals 4ms - probably not related but that came to mind
[16:05:51] <Guest45527> I see the hw write the host memory.  I see a 4ms delay before a pcie write occurs that acks the work entry.  I know that the poller is being called at hight freq because I added a pcie register read at the top.  It was read every 1.6us on pcie
[16:06:37] <Guest45527> I can go to another directory on same system and run this on 17.03 just fine
[16:06:56] <bwalker> so you see those pcie reads come in on the analyzer within 1.6us of the hw write
[16:07:04] <bwalker> and in the poller is just does the pcie read and then reads the address in question
[16:07:13] <bwalker> so presumably, it did attempt to read the location
[16:07:51] <Guest45527> yea.  I was trying to see if something in the poller registration had changed and it wasn't being called.
[16:08:21] <bwalker> well, now I'm not convinced I know what the problem is. As a long shot, can you revert commit 4766849 from our dpdk submodule and see if it is fixed?
[16:08:24] <Guest45527> I have 2 main pollers.  1 at 50ms interval (and I verified on pcie that is called at 50ms).  The other at interval 0
[16:09:37] <jimharris> you are using the rte_ function directly, and not using the spdk_ wrapper equivalents?
[16:09:43] <jimharris> for allocating memory, etc.
[16:10:12] <jimharris> should not matter either way - would help rule out our spdk env wrappers though
[16:10:56] <Guest45527> I have a mix.  Not everything has been converted to the spdk wrappers.
[16:12:58] <Guest45527> let me verify the poller via the pcie register reads after the wq item is placed.  Also try reverting that commit.
[16:15:10] <Guest45527> CONFIG_HZ is set to 250.
[16:16:18] <jimharris> probably a red herring but thanks for checking
[16:17:55] <jimharris> how much memory on your host system, and how much memory is allocated to huge pages?
[16:24:18] <Guest45527> 64G and 512 pages/per numa node
[16:26:37] <jimharris> do you explicitly set the mem size in the env opts, or just leave the default (which consumes all available huge pages)?
[16:28:57] <Guest45527> I've done both.  I usually use dpdk-setup.sh to set a value.
[16:30:55] <Guest45527> give me a bit to chase this through and try backing out the commit above.
[16:42:30] *** Quits: Guest45527 (c6ba0002@gateway/web/freenode/ip.198.186.0.2) (Ping timeout: 260 seconds)
[18:22:38] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.78)
[19:41:11] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-chqtmzdcannlibgc) (Quit: Connection closed for inactivity)
[19:57:04] *** Joins: whitepa (~whitepa@2601:601:1200:f23b:b1bd:1de0:67e6:111d)
[23:24:10] *** Joins: tomzawadzki (~tzawadzk@192.55.54.39)