[00:58:54] *** Quits: ziyeyang_ (ziyeyang@nat/intel/x-otasevihheufonkt) (Quit: Leaving)
[01:04:05] *** Quits: pzedlews (~pzedlews@192.55.54.44) (Ping timeout: 240 seconds)
[01:04:50] *** Joins: pzedlews (~pzedlews@192.55.54.44)
[01:48:15] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[02:20:18] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 268 seconds)
[02:20:59] *** Joins: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com)
[04:57:34] *** Quits: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[05:58:03] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[05:59:08] *** Joins: gila_ (~gila@ec2-54-88-161-49.compute-1.amazonaws.com)
[06:02:32] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds)
[07:20:38] gangcao, no, it's that they hang unless I insert delays. If I insert delays, the nvme.sh test completes without error
[07:30:29] *** Quits: gila_ (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Ping timeout: 268 seconds)
[07:31:11] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[08:41:37] FYI I put some rough ideas for a super basic mock lib in https://trello.com/c/GrppU5JB/15-add-utmock-library-to-help-with-authoring-ut
[09:19:57] *** Joins: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45)
[09:29:36] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[09:36:24] *** Quits: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45) (Ping timeout: 260 seconds)
[09:55:22] I think the ideas around the mock stuff look good
[09:55:31] doing it using macros seems like the right way to go to me
[09:55:49] instead of defining each function in a mock library
[09:56:03] for the syscall ones we can do that, but for SPDK mocked functions the macro is better
[09:57:36] I don't think you need a "table" of the mock functions either - if the macro to declare the stub creates the C function plus a global struct that's named like func_name_mock_opts
[09:57:50] then you can write a macro named "mock_put" that takes the function name and return value and updates that global struct
[10:01:25] I'm not sure how workable it is with functions, but there may be something clever we can do using typeof too
[10:01:32] to make the declare stub macro simpler
[10:13:54] ok you can't use a typedef/typeof to define a function, so that won't help
[10:14:03] but it still isn't bad
[10:38:26] bwalker: all of your event/subsystem patches look good - there are still three in the series that drv needs to review though
[10:38:36] added my +2 to two of your log patches too
[10:39:33] could you rebase https://review.gerrithub.io/#/c/364831/ off of the end of your event/subsystem series?
[10:43:29] bwalker: I made a comment on the last patch in the event series: https://review.gerrithub.io/#/c/365730/
[10:45:27] ok - I'm going to merge up to the one before the one where drv made the comment
[10:45:33] then fix that, rebase
[10:45:37] and rebase the cb_event one on top of that
[10:46:50] dang, merge conflict about half way through
[10:46:55] I merged what I could, rebasing the rest
[10:47:20] I think I screwed you - I accidentally merged some of drv's unit test changes
[10:48:06] but I guess one of you was going to have to rebase either way
[10:48:13] I was able to merge 4 patches, so not bad
[10:49:13] drv: can you rebase the last 3 patches in your unit test series?
[10:49:18] sure
[11:29:13] bwalker: the scsi patch still has the duplicate #include for event.h
[11:30:01] crap thanks for letting me know
[11:30:55] bwalker: can you respond to this comment from Ziye? https://review.gerrithub.io/#/c/366149/
[11:42:59] *** Joins: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45)
[11:57:00] *** Quits: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45) (Ping timeout: 260 seconds)
[12:02:00] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[12:07:02] drv, bwalker: can you check the first two patches here? https://review.gerrithub.io/#/c/364947/
[12:07:14] i'm ok with these going in, as a step towards eventually doing the periodic pollers
[12:13:59] bwalker, thanks for the feedback - consistent with a chat Daniel/Jim and I had earlier so that's what I'm attempting now (no table, just macro magic)
[12:17:19] jimharris: they're fine with me, but those two patches just make the behavior the same as master
[12:17:27] the first one reverts a change and the second one adds it back in
[12:17:30] so the result is the same behavior
[12:17:46] the patches are just more minimal than the one I merged already
[12:18:00] I modified a couple of other things to get some additional error messages
[12:18:25] so he reverts that, then proposes a patch that does mostly the same thing but with fewer modified lines
[12:19:43] not arguing
[12:19:45] peluse: I would have been willing to bet money that we all had the same thought on how to implement that macro
[12:20:35] heh, likely. macros are never my first choice for anything, ugh
[12:20:49] i think all of that related code will change once the periodic poller + reset changes go in - the differences between current and the revert are fairly moot now since the resets are disabled
[12:21:17] but i'll defer to you if you want to ask darek to pull those patches out
[12:22:10] he's got other patches based on top of them, so we can just merge
[12:22:15] path of least resistance
[12:22:29] yeah - that's what i was thinking
[12:22:35] sounds good to me
[12:24:04] ever since we opened SPDK up and my email address has been floating around publicly, the amount of spam mail I get has been increasing exponentially
[12:24:48] companies must literally troll GitHub commits to grab email addresses or something
[12:24:54] or mailing list posts
[12:28:31] or your home PC...
[12:37:52] bwalker/drv: can one of you take a look at gang's patch? https://review.gerrithub.io/#/c/365646/
[12:37:59] this should resolve the test failures in the pool
[12:38:32] there's still some work needed here though - we must have some apps that are not properly releasing queues before shutdown
[12:40:47] seems OK, although I'm still not clear on the reason for the core mask change
[12:41:11] i think he changed it from 0xf to 0xf0 in an earlier patch, thinking that would resolve this issue
[12:41:18] so he's just putting it back to where it was before
[12:42:15] git show 3b3ab8
[12:46:56] are we sure there isn't anything else in DPDK that requires multiprocess apps be on different cores?
[12:47:02] I know we took out mempool usage in NVMe
[12:47:24] well you can get crappy performance
[12:47:33] but the DPDK docs say "Attempting to do so can cause corruption of memory pool caches, among other issues."
[12:47:38] so I don't know what the "other issues" are
[12:47:39] i just ran into this with this virtio-scsi user initiator
[12:47:56] two threads polling on the same core means they get timesliced
[12:48:01] yeah
[12:48:06] so I was getting 30 IO/s
[12:48:14] different cores => 400K IO/s
[12:49:06] i want to rethink our multi-process strategy...
[12:49:16] me too :)
[12:49:22] ...for example, if a process dies unexpectedly, do we want to try to clean up?
[12:50:01] I think it's not really possible to clean up correctly in a lot of cases
[12:50:08] the robust mutex stuff we're doing now is essentially a hack
[12:50:19] if a process dies holding the mutex, we have no idea what state the mutex-protected data was in
[12:50:45] i'm thinking about something like this (imagine me doing lots of hand waving here)
[12:50:46] so I would be OK with just declaring that we don't handle that case
[12:51:09] QQ: with the recent merge of 'dpdk: switch submodule to spdk-17.05 branch' what's the easiest way to resolve a rebase merge conflict on older patches?
[12:51:28] git checkout
[12:51:32] git rebase -i master
[12:51:39] if it shows dpdk out of date
[12:51:43] git submodule update
[12:51:57] when it hits the conflict it may show dpdk out of date - that's when you do git submodule update
[12:52:08] you probably also need to do 'git add dpdk' to mark the conflict as resolved during the rebase
[12:52:11] if the stub process could detect secondary processes, have the stub process basically kill everything off if a process dies unexpectedly
[12:52:43] by stub do you mean primary?
[12:52:45] and take out the code from the nvme driver that tries to detect killed processes, cleaning up queues, etc.
[12:52:53] yes - the primary app
[12:52:57] so leave it to the app
[12:53:10] gracias
[12:53:24] or it's something running on top of that (python script?) - haven't totally thought this through yet
[12:53:46] or you can go for the nuclear approach and just blow the submodule away until you are done rebasing
[12:54:20] I like where your thinking is going, but it isn't all clear to me just yet how this ends up working
[12:56:43] i think it's fairly straightforward if (and that's a big "if") the stub app can get notified about any secondary processes
[12:57:12] what's the mechanism for notification? can it register for events on opens of the shared memory files?
[12:57:47] and only the NVMe driver knows about those files, so how do we expose that to the app
[12:57:48] that's the big todo
[12:57:51] you should be able to use waitpid(), assuming we can detect when it starts up
[12:59:13] hm, I guess the other processes are not actually children of the primary, though
[12:59:14] maybe that won't work
[13:00:02] let me dig into this a bit
[13:02:32] we can use inotify on the dpdk config file
[13:02:36] i think..
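The submodule-conflict recipe given above (12:51-12:52) amounts to the following; the branch name is illustrative:

```shell
# Rebase an older patch branch after the dpdk submodule pointer moved.
git checkout my-old-feature-branch
git rebase -i master

# When the rebase stops on a conflict, 'git status' may show the
# dpdk submodule as modified / out of date - resync it.
git submodule update

# Mark the submodule conflict as resolved, then keep going.
git add dpdk
git rebase --continue
```

The "nuclear approach" mentioned later would be deinitializing/removing the submodule directory for the duration of the rebase and restoring it afterwards.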
[13:02:44] lsof -p
[13:03:20] inotify on OPEN events for the /run/.spdk_xxx_config file
[13:05:14] hmm, testing stub on my local system, it looks like we don't clean up the sentinel file if I kill it with Ctrl-C
[13:05:28] yeah - I haven't gotten around to adding that
[13:05:46] so inotify won't work - we get notified that *someone* opened it
[13:10:50] i think fanotify might work
[13:57:37] peluse: if you get a chance, can you try re-running the NVMe multi-process tests on your test setup?
[13:57:41] after Gang's patch
[13:57:50] sure, it's on master now right?
[13:57:53] yep
[13:58:00] will do it now...
[13:59:10] bwalker: committed the next 3 patches in your series, posted comments on the scsi cb_event patch - I think there are some additional simplifications you can make
[13:59:23] https://review.gerrithub.io/#/c/364831/
[14:07:58] drv, failed before that point this time...
[14:08:00] 16 timing_enter bounds
[14:08:00] => 17 $testdir/bdevio/bdevio $testdir/bdev.conf
[14:08:01] 18 timing_exit bounds
[14:08:32] machine might be in a funky state, will reboot and try again
[14:08:32] hmm, that's before the NVMe tests
[14:08:35] yeah
[14:09:13] takes a little while to reboot, will let ya know shortly if it gets any farther after fresh boot
[14:23:00] hey, it got beyond nvme.sh for the first time ever on this machine (well, without adding delays)! fails further on now, I'll rerun a few times and keep ya posted
[14:25:04] cool
[14:25:29] drv, just curious, what drove you to make the change to autobuild that pukes when it sees untracked changes?
[14:25:42] changes=files I mean of course
[14:26:20] we occasionally forget to add .gitignore entries for binaries
[14:26:27] and it's nice to have that checking automated
[14:27:04] OK, as a side effect I need to do a git clean if a run of autobuild fails in the middle, before I can rerun that is
[14:27:35] ah, hm, the pool automatically cleans up between runs, but I guess if you are running manually it is a little bit of a hassle
[14:27:51] yeah, no big deal. just confirming that it's behaving as expected
[14:28:29] drv: do we need a bdev_unregistered() hook?
[14:29:01] i'm reviewing ziyeyang's gpt patch and trying to figure out how a vbdev would know when a base bdev was removed
[14:29:02] probably
[14:29:14] well, it should work like hot remove
[14:29:16] ok - just making sure i wasn't missing something
[14:29:33] hmmm - yeah
[14:29:36] i'll need to look
[14:30:08] I guess there is only one remove_cb, so a bdev module can't really hook that directly
[14:31:50] is there any case where two different vbdevs would build on top of the same base bdev?
[14:32:20] probably not, but I think we want to support a case where GPT has installed vbdevs on top of a bdev, but somebody wants to export the base bdev directly to e.g. iSCSI
[14:32:33] that gets back to what claiming a bdev should actually mean
[14:32:48] yeah - i'm going to take a crack at that (or have ziyeyang look at it)
[14:33:07] maybe we just need a list of remove_cb's
[14:33:49] i was thinking one remove_cb for whoever opened it (i.e. iSCSI, vhost), and another for an upper_vbdev_cb (name sucks, i'll think of something better)
[14:34:09] instead of a list
[14:34:30] that is probably fine too
[14:34:38] we can make it a list later if we have to
[14:34:42] although I guess we could think of situations where you could have multiple vbdevs attached to a base bdev
[14:34:55] oh - that will definitely happen
[14:35:03] multiple GPT partitions on one base bdev
[14:35:09] I just meant multiple vbdev drivers
[14:35:16] yeah, that's what I meant too, sorry
[14:35:19] i.e. GPT and split, or GPT and logical volume
[14:35:40] i'm just not sure in practice what that would look like?
[14:36:03] you'll have a logical volume store in a GPT partition in an NVMe namespace
[14:36:13] but that's all 1 => 1 => 1
[14:36:14] I don't have an example in mind, so we can probably just do the two callbacks like you said for now
[14:36:48] going to need to figure out that asynchronous RPC thing
[14:36:51] FYI passed nvme.sh 3x in a row so yeah that fix appears to be relevant for these systems. Failing somewhere in env.sh now, will look at that later this week...
[14:36:59] sweet
[14:37:01] nice
[14:37:12] gotta love freebies like that :)
[14:37:19] well, free to me I should say!
[14:37:23] env.sh failing sounds a little worrying
[14:37:26] thanks gangcao!
[14:37:35] that just runs two tests
[14:38:16] yeah, I need to go compare against good output to see what's different so can't tell ya much now...
[14:38:32] i think we need an spdk nbd driver
[14:38:43] so we can run things like parted on an spdk bdev
[14:39:09] yeah, I was just wondering if there was a better way to set up Ziye's GPT test than doing a reset/setup every time
[14:39:23] nbd would be slick
[14:40:01] to trello! ;)
[14:40:11] yeah
[14:41:00] what about using a ramdisk instead? run parted on the ramdisk, then specify the ramdisk as an AIO bdev
[14:41:36] sounds good to me
[14:41:42] doesn't really need to be NVMe
[14:41:43] it could even just be a disk file
[14:41:59] I think parted can operate on a regular file, or we can set up a loop device
[15:00:55] *** Joins: changpe1- (~changpe1@192.55.54.44)
[15:01:19] *** changpe1- is now known as vermavis
[15:18:05] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 241 seconds)
[15:21:41] wrt mock UT stuff, take a look at this and see if it seems sane/what you guys were thinking. Just did the 2 syscalls that are already wrapped, will hold off doing more til I get some feedback on this.. https://review.gerrithub.io/#/c/366348/
[15:21:41] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[15:33:13] peluse - posted some comments - overall this looks great though - exactly what I had in mind
[15:34:10] #define SUCCESS timeForBeer=1;
[15:34:18] peluse: btw, the failure on your mock change is not due to your patch - sethhowe_ is working on adding an Ubuntu machine
[15:34:30] drv, cool, thanks
[15:34:51] sethhowe_, that's awesome BTW....
[15:51:13] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[15:56:00] Thanks peluse, sorry it's messing up all of your tests. I'm going to clean that up right now and make sure they get run again.
[15:56:29] sethhowe_, no hurry. thanks
[15:56:38] well no hurry on account of me I mean :)
[18:11:52] *** Joins: cunyinch_ (~cunyinch@134.134.139.78)
[18:16:35] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.73)
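As an appendix to the GPT test-setup idea discussed above at 14:41 (partition a plain file with parted, then feed it to SPDK as an AIO bdev), a rough sketch; the file name, size, and config syntax are illustrative:

```shell
# Create a 64 MiB sparse disk image file.
truncate -s 64M /tmp/gpt_test_disk.img

# parted can operate directly on a regular file: write a GPT
# label and a single partition.
parted --script /tmp/gpt_test_disk.img mklabel gpt
parted --script /tmp/gpt_test_disk.img mkpart primary 1MiB 63MiB

# Verify the partition table.
parted --script /tmp/gpt_test_disk.img print

# The image could then be handed to SPDK as an AIO bdev, e.g. in a
# config section along the lines of (exact syntax version-dependent):
#   [AIO]
#     AIO /tmp/gpt_test_disk.img AIO0
```

No reset/setup cycle or real NVMe device is needed; a loop device (`losetup`) over the same file is an alternative if a block node is required.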