[01:19:33] *** Joins: sbasierx (~sbasierx@192.198.151.43)
[01:49:15] *** Joins: tomzawadzki (~tomzawadz@134.134.139.75)
[02:31:05] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[02:46:39] *** Joins: tkulasek (86bfdc47@gateway/web/freenode/ip.134.191.220.71)
[04:04:05] *** Quits: tomzawadzki (~tomzawadz@134.134.139.75) (Ping timeout: 240 seconds)
[05:43:17] *** Quits: johnmeneghini (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net) (Quit: Leaving.)
[05:52:19] *** Quits: tkulasek (86bfdc47@gateway/web/freenode/ip.134.191.220.71) (Ping timeout: 260 seconds)
[06:41:48] *** Joins: tkulasek (86bfdc48@gateway/web/freenode/ip.134.191.220.72)
[06:53:32] *** Quits: sbasierx (~sbasierx@192.198.151.43) (Quit: Going offline, see ya! (www.adiirc.com))
[10:21:14] *** Quits: tkulasek (86bfdc48@gateway/web/freenode/ip.134.191.220.72) (Ping timeout: 260 seconds)
[11:25:04] *** Joins: philipp-s (4633f117@gateway/web/freenode/ip.70.51.241.23)
[11:31:50] ping
[11:51:42] *** Parts: philipp-s (4633f117@gateway/web/freenode/ip.70.51.241.23) ()
[11:59:23] hi philipp-s
[12:54:42] *** Quits: sethhowe (~sethhowe@192.55.54.42) (Remote host closed the connection)
[12:56:12] *** Joins: sethhowe (~sethhowe@192.55.54.42)
[13:09:46] jimharris: I added a comment to https://review.gerrithub.io/#/c/399993/ - let me know what you think
[13:11:24] sounds good to me - I removed my +2
[13:15:36] part of that is my fault - I copied the existing layout of the RPC methods when I moved it over from the original Sun-RPC implementation, and I intended to clean it up eventually...
[13:23:24] then it's partly my fault too since I created that layout in the original Sun-RPC implementation ;-)
[14:15:54] drv: curious what your thoughts are on the CMB allocator - I just responded to Stephen's thread on the mailing list
[14:16:52] I think we can just do a simple first-fit kind of allocator
[14:17:15] with a separate set of structures in normal malloc()-based RAM tracking the free area
[14:18:03] that should be like 20 lines of code (famous last words)
[15:43:02] i've seen this behavior a few times now: https://review.gerrithub.io/#/c/400163/
[15:43:38] first patch fails on fedora-03 with nvme_rdma.c: 211:nvme_rdma_get_event: *ERROR*: Received event: 7(RDMA_CM_EVENT_UNREACHABLE) from CM event channel, but expected event: (RDMA_CM_EVENT_ESTABLISHED)
[15:43:55] then the next patch also fails because fedora-03 doesn't respond
[15:45:45] drv, so the ASAN thing might be bogus? Disregard the comment I made about a NULL being passed into spdk_bdev_part_base_hotremove() or wherever it was, I was looking at a parm that had already been used and freed. The ASAN thing is in that function though
[15:46:43] it's the TAILQ_FOREACH_SAFE() loop where the list items are removed indirectly via a call to spdk_bdev_unregister() and I walked through it and there's nothing that I can see wrong with it
[15:48:35] if I replace it with a while TAILQ_FIRST loop, ASAN has no issues with it. It doesn't look like we can count on a list removal all of the time though via the unregister...
[15:49:25] so the while FIRST thing probably isn't a real fix - have you ever seen ASAN be wrong though? I can walk through it with you tomorrow if you want
[15:51:25] peluse: I'm out tomorrow, but we can check it out on Tuesday if you're here then
[15:51:39] I haven't seen a case where it's ever actually wrong (yet), but it's possible, I guess
[15:51:47] can you post a link to the ASAN failure?
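For reference, here is a rough sketch of the two loop shapes being compared above - the TAILQ_FOREACH_SAFE loop that trips ASAN when the unregister path frees things indirectly, and the while/TAILQ_FIRST drain that quiets it. This is not the actual spdk_bdev_part code; unregister_item() is a hypothetical stand-in for the indirect removal done via spdk_bdev_unregister():

    /* Not the actual bdev_part code - just the shape of the two loops being
     * compared.  unregister_item() stands in for the removal that really
     * happens further down the spdk_bdev_unregister() path. */
    #include <stdlib.h>
    #include <sys/queue.h>

    #ifndef TAILQ_FOREACH_SAFE
    /* FreeBSD definition, written out the way drv suggests expanding by hand */
    #define TAILQ_FOREACH_SAFE(var, head, field, tvar)              \
        for ((var) = TAILQ_FIRST((head));                           \
            (var) && ((tvar) = TAILQ_NEXT((var), field), 1);        \
            (var) = (tvar))
    #endif

    struct item {
        TAILQ_ENTRY(item) link;
    };

    TAILQ_HEAD(item_list, item);

    static void
    unregister_item(struct item_list *list, struct item *item)
    {
        TAILQ_REMOVE(list, item, link);
        free(item);
    }

    /* The SAFE loop: TAILQ_NEXT(item) is cached in 'tmp' before the body runs,
     * so removing 'item' itself is fine.  But if the unregister path ever frees
     * more than the current element (for example whatever 'tmp' points at), the
     * next pass dereferences freed memory right on the TAILQ_FOREACH_SAFE line. */
    static void
    drain_with_foreach_safe(struct item_list *list)
    {
        struct item *item, *tmp;

        TAILQ_FOREACH_SAFE(item, list, link, tmp) {
            unregister_item(list, item);
        }
    }

    /* The while/TAILQ_FIRST drain: re-reads the head on every pass, so it never
     * holds a stale element pointer - but it only terminates if each unregister
     * call really does take the element off the list, which is the concern
     * raised about it looping forever otherwise. */
    static void
    drain_with_first(struct item_list *list)
    {
        struct item *item;

        while ((item = TAILQ_FIRST(list)) != NULL) {
            unregister_item(list, item);
        }
    }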
[15:51:55] is it in the test pool or on your local system?
[15:57:22] jimharris, it's disabled in the pool because of this but you should be able to repro it anywhere. Just build w/ASAN and run blockdev.sh
[15:57:36] drv, yeah Tue works. I forgot Mon is a holiday, yeeehaw
[15:58:11] jimharris, or you can run bdevio using the bdev.conf.in file, that's a little quicker
[15:58:25] in /test/lib/bdev/bdevio of course
[15:58:51] oh yeah, that definitely blows up for me with ASAN on
[16:00:04] yeah... i could have missed something small too but I walked through all construction and destruction of shit running bdev with gdb tracking every pointer. I don't see any real issues anywhere and did manage to learn quite a bit so that's good. Will probably forget by Tue though :)
[16:00:31] maybe it's because spdk_bdev_part_base_hotremove() does not remove the items from the tailq?
[16:00:50] I'm not sure if it's supposed to, or if that's meant to happen somewhere else as part of bdev unregistration
[16:01:04] i'll take a look
[16:01:41] i see a use-after-free in spdk_bdev_part_base_hotremove - is that the same that you're seeing?
[16:01:51] yes
[16:08:23] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:49:08] just fyi, we are doing the monthly scheduled build pool update, and it seems that the current kernel shipped by Fedora 26 panics during the vhost tests
[16:49:14] so sethhowe is rolling back the kernel to the previous version
[17:01:57] so the use-after-free is on the TAILQ_FOREACH_SAFE line itself, but the last execution of the loop does correctly pull the last item off the list
[17:03:01] the Q head is 0 when it returns to the top of the loop, where it should just bail
[17:04:01] drv, all of the items are being pulled off of that list in spdk_bdev_part_free()
[17:10:49] here's a ton of prints that are pretty self-explanatory. This is the working version where I replace the SAFE loop with a while FIRST https://gist.github.com/peluse/25fb38357302d101857bebdaa0e6380a
[17:12:10] the prints with the 4 pointer values after spdk_bdev_part_base_hotremove() are a quick dump of the queue before we start walking through it to do unregisters
[17:13:13] hmm, if the manual loop with TAILQ_FIRST works, I'd be tempted to just check that in
[17:13:39] you could try manually expanding the implementation of the TAILQ_FOREACH_SAFE macro on several lines so you could get a better idea of where it's blowing up, but I don't know if that would help much
[17:32:20] drv, yeah I was actually thinking of doing that after finding out that the FIRST thing worked...
[17:34:39] but again, on first glance I don't think we're assured that a call to unregister from that loop is guaranteed to remove it from the list, so we could end up in an endless loop without some other hack. Will keep looking at it tomorrow. thanks!
[18:02:38] Just a PSA for the build pool. drv and I updated the build pool, which introduced a latent failure that we have patched. I figure some of you are out of the office, so I took the liberty of rebasing the patches already in the queue so that they will not fail tonight.
[18:13:05] Thank you Seth!
[18:59:18] Hi All, it looks like the SPDK RPC handlers pass only simple structures, arrays at the most, to SPDK internals.
[19:00:12] And the SPDK RPC handlers do not pass structs to SPDK internals - I mean, SPDK internals do not know about the JSON config structure.
[19:00:36] I think this is the rule for us to follow.
[19:00:40] Is it correct?
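A hedged sketch of the pattern being described: a made-up RPC method (example_configure, example_subsystem_configure, and the parameters are hypothetical, and exact SPDK function signatures vary between versions) whose handler decodes the JSON params into a flat C struct and then calls the internal API with plain values, so SPDK internals never see the JSON config structure:

    /* Hypothetical RPC method for illustration only - not an existing SPDK RPC. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include "spdk/rpc.h"
    #include "spdk/util.h"

    /* Hypothetical internal API - takes only simple values, no JSON. */
    void example_subsystem_configure(const char *name, uint32_t block_size);

    struct rpc_example_configure {
        char *name;
        uint32_t block_size;
    };

    static const struct spdk_json_object_decoder rpc_example_configure_decoders[] = {
        {"name", offsetof(struct rpc_example_configure, name), spdk_json_decode_string},
        {"block_size", offsetof(struct rpc_example_configure, block_size), spdk_json_decode_uint32},
    };

    static void
    spdk_rpc_example_configure(struct spdk_jsonrpc_request *request,
                               const struct spdk_json_val *params)
    {
        struct rpc_example_configure req = {};
        struct spdk_json_write_ctx *w;

        if (spdk_json_decode_object(params, rpc_example_configure_decoders,
                                    SPDK_COUNTOF(rpc_example_configure_decoders), &req)) {
            spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS,
                                             "Invalid parameters");
            return;
        }

        /* Only simple values cross this boundary into SPDK internals. */
        example_subsystem_configure(req.name, req.block_size);
        free(req.name);

        w = spdk_jsonrpc_begin_result(request);
        if (w != NULL) {
            spdk_json_write_bool(w, true);
            spdk_jsonrpc_end_result(request, w);
        }
    }
    SPDK_RPC_REGISTER("example_configure", spdk_rpc_example_configure)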
[19:11:36] Sorry, I thought passing a complex structure might be necessary, but found it was not. Hence it's OK to follow the current implementation.
[23:04:33] *** Joins: sbasierx (sbasierx@nat/intel/x-iugasbnexfzvlnjg)
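Circling back to the CMB allocator thread from earlier in the day, a minimal sketch of the simple first-fit approach drv describes, with the free space tracked in ordinary malloc()'d structures in host RAM rather than in the CMB itself. All names here are illustrative, not existing SPDK API:

    /* Illustrative only - not SPDK code.  CMB space is handed out by offset;
     * the bookkeeping lives entirely in host RAM. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/queue.h>

    struct cmb_free_region {
        uint64_t offset;                      /* offset into the CMB */
        uint64_t length;                      /* bytes available at offset */
        TAILQ_ENTRY(cmb_free_region) link;
    };

    static TAILQ_HEAD(, cmb_free_region) g_cmb_free_list =
        TAILQ_HEAD_INITIALIZER(g_cmb_free_list);

    /* Seed the allocator with the whole CMB as one free region. */
    static int
    cmb_init(uint64_t cmb_size)
    {
        struct cmb_free_region *r = malloc(sizeof(*r));

        if (r == NULL) {
            return -1;
        }
        r->offset = 0;
        r->length = cmb_size;
        TAILQ_INSERT_HEAD(&g_cmb_free_list, r, link);
        return 0;
    }

    /* First fit: take the first region that is large enough, then shrink it
     * (or drop it entirely if it is used up). */
    static int
    cmb_alloc(uint64_t length, uint64_t *offset)
    {
        struct cmb_free_region *r;

        TAILQ_FOREACH(r, &g_cmb_free_list, link) {
            if (r->length >= length) {
                *offset = r->offset;
                r->offset += length;
                r->length -= length;
                if (r->length == 0) {
                    TAILQ_REMOVE(&g_cmb_free_list, r, link);
                    free(r);
                }
                return 0;
            }
        }
        return -1; /* no region big enough */
    }

A real version would also need a release path that puts regions back on the list and merges adjacent ones, which is where a first-fit allocator usually picks up most of its extra lines.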