[00:58:54] *** Quits: ziyeyang_ (ziyeyang@nat/intel/x-otasevihheufonkt) (Quit: Leaving)
[01:04:05] *** Quits: pzedlews (~pzedlews@192.55.54.44) (Ping timeout: 240 seconds)
[01:04:50] *** Joins: pzedlews (~pzedlews@192.55.54.44)
[01:48:15] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[02:20:18] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 268 seconds)
[02:20:59] *** Joins: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com)
[04:57:34] *** Quits: gila (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[05:58:03] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[05:59:08] *** Joins: gila_ (~gila@ec2-54-88-161-49.compute-1.amazonaws.com)
[06:02:32] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds)
[07:20:38] gangcao, no, it's that they hang unless I insert delays. If I insert delays, the nvme.sh test completes without error
[07:30:29] *** Quits: gila_ (~gila@ec2-54-88-161-49.compute-1.amazonaws.com) (Ping timeout: 268 seconds)
[07:31:11] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[08:41:37] FYI I put some rough ideas for a super basic mock lib in https://trello.com/c/GrppU5JB/15-add-utmock-library-to-help-with-authoring-ut
[09:19:57] *** Joins: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45)
[09:29:36] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[09:36:24] *** Quits: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45) (Ping timeout: 260 seconds)
[09:55:22] I think the ideas around the mock stuff look good
[09:55:31] doing it using macros seems like the right way to go to me
[09:55:49] instead of defining each function in a mock library
[09:56:03] for the syscall ones we can do that, but for SPDK mocked functions the macro is better
[09:57:36] I don't think you need a "table" of the mock functions either - if the macro to declare the stub creates the C function plus a global struct that's named like func_name_mock_opts
[09:57:50] then you can write a macro named "mock_put" that takes the function name and return value and updates that global struct
[10:01:25] I'm not sure how workable it is with functions, but there may be something clever we can do using typeof too
[10:01:32] to make the declare stub macro simpler
[10:13:54] ok you can't use a typedef/typeof to define a function, so that won't help
[10:14:03] but it still isn't bad
[10:38:26] bwalker: all of your event/subsystem patches look good - there are still three in the series that drv needs to review though
[10:38:36] added my +2 to two of your log patches too
[10:39:33] could you rebase https://review.gerrithub.io/#/c/364831/ off of the end of your event/subsystem series?
[10:43:29] bwalker: I made a comment on the last patch in the event series: https://review.gerrithub.io/#/c/365730/
[10:45:27] ok - I'm going to merge up to the one before the one where drv made the comment
[10:45:33] then fix that, rebase
[10:45:37] and rebase the cb_event one on top of that
[10:46:50] dang, merge conflict about half way through
[10:46:55] I merged what I could, rebasing the rest
[10:47:20] I think I screwed you - I accidentally merged some of drv's unit test changes
[10:48:06] but I guess one of you was going to have to rebase either way
[10:48:13] I was able to merge 4 patches, so not bad
[10:49:13] drv: can you rebase the last 3 patches in your unit test series?
[10:49:18] sure
[11:29:13] bwalker: the scsi patch still has the duplicate #include for event.h
[11:30:01] crap thanks for letting me know
[11:30:55] bwalker: can you respond to this comment from Ziye? https://review.gerrithub.io/#/c/366149/
[11:42:59] *** Joins: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45)
[11:57:00] *** Quits: vermavis (c037362d@gateway/web/freenode/ip.192.55.54.45) (Ping timeout: 260 seconds)
[12:02:00] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[12:07:02] drv, bwalker: can you check the first two patches here? https://review.gerrithub.io/#/c/364947/
[12:07:14] i'm ok with these going in, as a step towards eventually doing the periodic pollers
[12:13:59] bwalker, thanks for the feedback - consistent with a chat Daniel/Jim and I had earlier so that's what I'm attempting now (no table, just macro magic)
[12:17:19] jimharris: they're fine with me, but those two patches just make the behavior the same as master
[12:17:27] the first one reverts a change and the second one adds it back in
[12:17:30] so the result is the same behavior
[12:17:46] the patches are just more minimal than the one I merged already
[12:18:00] I modified a couple of other things to get some additional error messages
[12:18:25] so he reverts that, then proposes a patch that does mostly the same thing but with fewer modified lines
[12:19:43] not arguing
[12:19:45] peluse: I would have been willing to bet money that we all had the same thought on how to implement that macro
[12:20:35] heh, likely. macros are never my first choice for anything, ugh
[12:20:49] i think all of that related code will change once the periodic poller + reset changes go in - the differences between current and the revert are fairly moot now since the resets are disabled
[12:21:17] but i'll defer to you if you want to ask darek to pull those patches out
[12:22:10] he's got other patches based on top of them, so we can just merge
[12:22:15] path of least resistance
[12:22:29] yeah - that's what i was thinking
[12:22:35] sounds good to me
[12:24:04] ever since we opened SPDK up and my email address has been floating around publicly, the amount of spam mail I get has been increasing exponentially
[12:24:48] companies must literally troll GitHub commits to grab email addresses or something
[12:24:54] or mailing list posts
[12:28:31] or your home PC...
[12:37:52] bwalker/drv: can one of you take a look at gang's patch? https://review.gerrithub.io/#/c/365646/
[12:37:59] this should resolve the test failures in the pool
[12:38:32] there's still some work needed here though - we must have some apps that are not properly releasing queues before shutdown
[12:40:47] seems OK, although I'm still not clear on the reason for the core mask change
[12:41:11] i think he changed it from 0xf to 0xf0 in an earlier patch, thinking that would resolve this issue
[12:41:18] so he's just putting it back to where it was before
[12:42:15] git show 3b3ab8
[12:46:56] are we sure there isn't anything else in DPDK that requires multiprocess apps be on different cores?
[12:47:02] I know we took out mempool usage in NVMe
[12:47:24] well you can get crappy performance
[12:47:33] but the DPDK docs say "Attempting to do so can cause corruption of memory pool caches, among other issues."
[12:47:38] so I don't know what the "other issues" are
[12:47:39] i just ran into this with this virtio-scsi user initiator
[12:47:56] two threads polling on the same core means they get timesliced
[12:48:01] yeah
[12:48:06] so I was getting 30 IO/s
[12:48:14] different cores => 400K IO/s
[12:49:06] i want to rethink our multi-process strategy...
[12:49:16] me too :)
[12:49:22] ...for example, if a process dies unexpectedly, do we want to try to clean up?
[12:50:01] I think it's not really possible to clean up correctly in a lot of cases
[12:50:08] the robust mutex stuff we're doing now is essentially a hack
[12:50:19] if a process dies holding the mutex, we have no idea what state the mutex-protected data was in
[12:50:45] i'm thinking about something like this (imagine me doing lots of hand waving here)
[12:50:46] so I would be OK with just declaring that we don't handle that case
[12:51:09] QQ: with the recent merge of 'dpdk: switch submodule to spdk-17.05 branch' what's the easiest way to resolve a rebase merge conflict on older patches?
[12:51:28] git checkout
[12:51:32] git rebase -i master
[12:51:39] if it shows dpdk out of date
[12:51:43] git submodule update
[12:51:57] when it hits the conflict it may show dpdk out of date - that's when you do git submodule update
[12:52:08] you probably also need to do 'git add dpdk' to mark the conflict as resolved during the rebase
[12:52:11] if the stub process could detect secondary processes, have the stub process basically kill everything off if a process dies unexpectedly
[12:52:43] by stub do you mean primary?
[12:52:45] and take out the code from the nvme driver that tries to detect killed processes, cleaning up queues, etc.
[12:52:53] yes - the primary app
[12:52:57] so leave it to the app
[12:53:10] gracias
[12:53:24] or it's something running on top of that (python script?) - haven't totally thought this through yet
[12:53:46] or you can go for the nuclear approach and just blow the submodule away until you are done rebasing
[12:54:20] I like where your thinking is going, but it isn't all clear to me just yet how this ends up working
[12:56:43] i think it's fairly straightforward if (and that's a big "if") the stub app can get notified about any secondary processes
[12:57:12] what's the mechanism for notification? can it register for events on opens of the shared memory files?
[12:57:47] and only the NVMe driver knows about those files, so how do we expose that to the app
[12:57:48] that's the big todo
[12:57:51] you should be able to use waitpid(), assuming we can detect when it starts up
[12:59:13] hm, I guess the other processes are not actually children of the primary, though
[12:59:14] maybe that won't work
[13:00:02] let me dig into this a bit
[13:02:32] we can use inotify on the dpdk config file
[13:02:36] i think..
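The submodule-conflict recipe given above (12:51-12:52) amounts to the following; the branch name is illustrative:

```shell
# Rebase an older patch branch after the dpdk submodule pointer moved.
git checkout my-old-feature-branch
git rebase -i master

# When the rebase stops on a conflict, 'git status' may show the
# dpdk submodule as modified / out of date - resync it.
git submodule update

# Mark the submodule conflict as resolved, then keep going.
git add dpdk
git rebase --continue
```

The "nuclear approach" mentioned later would be deinitializing/removing the submodule directory for the duration of the rebase and restoring it afterwards.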
[13:02:44] lsof -p
[13:03:20] inotify on OPEN events for the /run/.spdk_xxx_config file
[13:05:14] hmm, testing stub on my local system, it looks like we don't clean up the sentinel file if I kill it with Ctrl-C
[13:05:28] yeah - I haven't gotten around to adding that
[13:05:46] so inotify won't work - we get notified that *someone* opened it
[13:10:50] i think fanotify might work
[13:57:37] peluse: if you get a chance, can you try re-running the NVMe multi-process tests on your test setup?
[13:57:41] after Gang's patch
[13:57:50] sure, it's on master now right?
[13:57:53] yep
[13:58:00] will do it now...
[13:59:10] bwalker: committed the next 3 patches in your series, posted comments on the scsi cb_event patch - I think there are some additional simplifications you can make
[13:59:23] https://review.gerrithub.io/#/c/364831/
[14:07:58] drv, failed before that point this time...
[14:08:00] 16 timing_enter bounds
[14:08:00] => 17 $testdir/bdevio/bdevio $testdir/bdev.conf
[14:08:01] 18 timing_exit bounds
[14:08:32] machine might be in a funky state, will reboot and try again
[14:08:32] hmm, that's before the NVMe tests
[14:08:35] yeah
[14:09:13] takes a little while to reboot, will let ya know shortly if it gets any farther after fresh boot
[14:23:00] hey, it got beyond nvme.sh for the first time ever on this machine (well, without adding delays)! fails further on now, I'll rerun a few times and keep ya posted
[14:25:04] cool
[14:25:29] drv, just curious, what drove you to make the change to autobuild that pukes when it sees untracked changes?
[14:25:42] changes=files I mean of course
[14:26:20] we occasionally forget to add .gitignore entries for binaries
[14:26:27] and it's nice to have that checking automated
[14:27:04] OK, as a side effect I need to do a git clean if a run of autobuild fails in the middle, before I can rerun that is
[14:27:35] ah, hm, the pool automatically cleans up between runs, but I guess if you are running manually it is a little bit of a hassle
[14:27:51] yeah, no big deal. just confirming that it's behaving as expected
[14:28:29] drv: do we need a bdev_unregistered() hook?
[14:29:01] i'm reviewing ziyeyang's gpt patch and trying to figure out how a vbdev would know when a base bdev was removed
[14:29:02] probably
[14:29:14] well, it should work like hot remove
[14:29:16] ok - just making sure i wasn't missing something
[14:29:33] hmmm - yeah
[14:29:36] i'll need to look
[14:30:08] I guess there is only one remove_cb, so a bdev module can't really hook that directly
[14:31:50] is there any case where two different vbdevs would build on top of the same base bdev?
[14:32:20] probably not, but I think we want to support a case where GPT has installed vbdevs on top of a bdev, but somebody wants to export the base bdev directly to e.g. iSCSI
[14:32:33] that gets back to what claiming a bdev should actually mean
[14:32:48] yeah - i'm going to take a crack at that (or have ziyeyang look at it)
[14:33:07] maybe we just need a list of remove_cb's
[14:33:49] i was thinking one remove_cb for whoever opened it (i.e. iSCSI, vhost), and another for an upper_vbdev_cb (name sucks, i'll think of something better)
[14:34:09] instead of a list
[14:34:30] that is probably fine too
[14:34:38] we can make it a list later if we have to
[14:34:42] although I guess we could think of situations where you could have multiple vbdevs attached to a base bdev
[14:34:55] oh - that will definitely happen
[14:35:03] multiple GPT partitions on one base bdev
[14:35:09] I just meant multiple vbdev drivers
[14:35:16] yeah, that's what I meant too, sorry
[14:35:19] i.e. GPT and split, or GPT and logical volume
[14:35:40] i'm just not sure in practice what that would look like?
[14:36:03] you'll have a logical volume store in a GPT partition in an NVMe namespace
[14:36:13] but that's all 1 => 1 => 1
[14:36:14] I don't have an example in mind, so we can probably just do the two callbacks like you said for now
[14:36:48] going to need to figure out that asynchronous RPC thing
[14:36:51] FYI passed nvme.sh 3x in a row so yeah that fix appears to be relevant for these systems. Failing somewhere in env.sh now, will look at that later this week...
[14:36:59] sweet
[14:37:01] nice
[14:37:12] gotta love freebies like that :)
[14:37:19] well, free to me I should say!
[14:37:23] env.sh failing sounds a little worrying
[14:37:26] thanks gangcao!
[14:37:35] that just runs two tests
[14:38:16] yeah, I need to go compare against good output to see what's different so can't tell ya much now...
[14:38:32] i think we need an spdk nbd driver
[14:38:43] so we can run things like parted on an spdk bdev
[14:39:09] yeah, I was just wondering if there was a better way to set up Ziye's GPT test than doing a reset/setup every time
[14:39:23] nbd would be slick
[14:40:01] to trello! ;)
[14:40:11] yeah
[14:41:00] what about using a ramdisk instead? run parted on the ramdisk, then specify the ramdisk as an AIO bdev
[14:41:36] sounds good to me
[14:41:42] doesn't really need to be NVMe
[14:41:43] it could even just be a disk file
[14:41:59] I think parted can operate on a regular file, or we can set up a loop device
[15:00:55] *** Joins: changpe1- (~changpe1@192.55.54.44)
[15:01:19] *** changpe1- is now known as vermavis
[15:18:05] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 241 seconds)
[15:21:41] wrt mock UT stuff, take a look at this and see if it seems sane/what you guys were thinking. Just did the 2 syscalls that are already wrapped, will hold off doing more til I get some feedback on this.. https://review.gerrithub.io/#/c/366348/
[15:21:41] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[15:33:13] peluse - posted some comments - overall this looks great though - exactly what I had in mind
[15:34:10] #define SUCCESS timeForBeer=1;
[15:34:18] peluse: btw, the failure on your mock change is not due to your patch - sethhowe_ is working on adding an Ubuntu machine
[15:34:30] drv, cool, thanks
[15:34:51] sethhowe_, that's awesome BTW....
[15:51:13] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[15:56:00] Thanks peluse, sorry it's messing up all of your tests. I'm going to clean that up right now and make sure they get run again.
[15:56:29] sethhowe_, no hurry. thanks
[15:56:38] well no hurry on account of me I mean :)
[18:11:52] *** Joins: cunyinch_ (~cunyinch@134.134.139.78)
[18:16:35] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.73)
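As an appendix to the GPT test-setup idea discussed above at 14:41 (partition a plain file with parted, then feed it to SPDK as an AIO bdev), a rough sketch; the file name, size, and config syntax are illustrative:

```shell
# Create a 64 MiB sparse disk image file.
truncate -s 64M /tmp/gpt_test_disk.img

# parted can operate directly on a regular file: write a GPT
# label and a single partition.
parted --script /tmp/gpt_test_disk.img mklabel gpt
parted --script /tmp/gpt_test_disk.img mkpart primary 1MiB 63MiB

# Verify the partition table.
parted --script /tmp/gpt_test_disk.img print

# The image could then be handed to SPDK as an AIO bdev, e.g. in a
# config section along the lines of (exact syntax version-dependent):
#   [AIO]
#     AIO /tmp/gpt_test_disk.img AIO0
```

No reset/setup cycle or real NVMe device is needed; a loop device (`losetup`) over the same file is an alternative if a block node is required.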