[00:16:02] *** Joins: tomzawadzki (tzawadzk@nat/intel/x-nmjgyarsgqbilkpx)
[01:00:26] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.72) (Quit: Leaving)
[02:01:26] *** Quits: cunyinch_ (~cunyinch@134.134.139.82) (Remote host closed the connection)
[02:45:45] *** Quits: whitepa (~whitepa@2601:601:1200:f23b:510d:fd92:df29:a7ba) (Read error: Connection reset by peer)
[04:07:21] *** Quits: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103) (Quit: Page closed)
[08:31:01] *** Quits: tomzawadzki (tzawadzk@nat/intel/x-nmjgyarsgqbilkpx) (Ping timeout: 258 seconds)
[08:53:36] the i/o counting patch didn't fail because of calsoft - i misread the log and contents of the results directory
[08:53:47] it was a bug in the error vbdev module
[08:54:20] it is getting a new bdev_io via get_child_io() but then calling resubmit on that new I/O
[08:55:00] because we don't have an API available to the error injection module to submit this child_io as a "new" I/O
[08:55:21] for now i'm just fixing the error module to just fail the I/O immediately (as drv had suggested in passing last week)
[08:55:54] yeah, I think we can just chop out all of the child I/O stuff
[09:22:33] jimharris: were you going to add a SPDK_TEST_ flag for unit tests? if not, I can put that together
[09:23:32] i was, but if you want to throw it together i'm ok with that too :)
[09:24:26] ok, I will make a patch
[09:25:07] yeah - let's get that in asap and hold off on committing patches until we get that in - it will actually save us about 40s per patch
[09:26:02] it was 30 on my system but looks like 40-45s on fedora-07 (currently the long pole in the tent among the test systems)
[09:27:33] https://review.gerrithub.io/#/c/362390/ - I already set the flag to 0 on wkb-fedora-07
[09:28:56] need that reorderable build queue
[09:29:07] i'd go ahead and remove it from fedora-04 too
[09:29:55] ok
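A minimal sketch of how an SPDK_TEST_ flag like the one discussed above might gate the unit tests in the per-machine test script; the flag name SPDK_TEST_UNITTEST and the unittest.sh entry point are assumptions for illustration, not confirmed by this log:

```bash
#!/usr/bin/env bash
# Sketch only: the flag and script names are assumptions.
# A test machine that should skip unit tests (e.g. wkb-fedora-07 above) would
# export SPDK_TEST_UNITTEST=0 in its per-machine configuration.
: "${SPDK_TEST_UNITTEST:=1}"    # default: run the unit tests

if [ "$SPDK_TEST_UNITTEST" -eq 1 ]; then
        ./unittest.sh           # the ~30-45s per patch that gets saved when skipped
fi
```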
[09:34:43] have we started at all on the dynamic patch queue?
[09:35:14] no - he's working on putting the agents in VMs right now
[09:35:24] so we can quickly deploy lots of agents
[09:35:32] cool
[09:35:48] also he's writing unit tests
[09:35:52] for the real code
[09:35:53] i was just curious - this is a case where bumping drv's patch to the front would be really nice
[09:35:57] yeah - i saw that
[09:36:06] I agree - reorderable queue would be awesome
[09:37:20] we could actually do it by hand-editing the file right now
[09:37:27] do it :)
[09:37:31] I think...
[09:38:14] well actually that wouldn't help that much - the patches already in the queue wouldn't skip the unit tests
[09:38:34] but for the ones we own we can rebase them real quick if we merge
[09:38:37] sure
[09:38:52] if we rebase before they run, they'll run the rebased version instead
[09:38:55] the release builds in the queue would still run long
[09:38:56] it's smart enough to work all of that out
[09:38:59] yeah
[10:58:36] jimharris, so my vagrant VM doesn't have an NVMe device for sure, I'm assuming yours does? When I run on my Mac it is there. Wondering if there's a version and/or syntax thing with the addition of the NVMe device in Vagrantfile. Or maybe my vbox ext versions, will check...
[11:00:11] mine does but it's also Mac - I haven't tried Linux yet
[11:04:14] hmmm, my linux vbox is older than my mac vbox. I'll update and see if it works
[11:42:48] bwalker, drv: could you review my nvme perf latency patches? i'd like to get those committed before i write up info on my histogram approach for girish
[11:45:35] reviewed. there are comments on the second one
[11:47:58] drv: do you want me to respin to add 'const' to that cutoff array?
[12:29:50] Hey, I just realized that you guys have made dpdk a submodule of spdk in gerrithub
[12:31:30] This is going to make things a little more complicated.
[12:32:09] What are we supposed to do if we have patches in DPDK that we need in order to get our SPDK changes to work?
[12:32:38] This means I need to upstream my patches to DPDK as well as my changes to SPDK
[12:48:28] jimharris: if it's not too much trouble, I think it would be a good idea
[12:48:53] johnmeneghini1: we still support building with out-of-tree DPDK, so you should be able to point at your own patched version of DPDK
[12:49:04] the submodule is just to make it easier to get a known-good version of DPDK that works with the current version of SPDK
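For the out-of-tree case described above, the build is pointed at a locally patched DPDK instead of the submodule. A sketch, assuming SPDK's configure-time --with-dpdk option; the DPDK path is illustrative:

```bash
# Build SPDK against a locally patched DPDK instead of the dpdk submodule.
# The DPDK build directory below is illustrative; point it at your own patched tree.
cd spdk
./configure --with-dpdk=/path/to/patched/dpdk/x86_64-native-linuxapp-gcc
make
```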
[12:51:52] Agreed.
[12:53:45] Also, what does it mean that "gerrithub automatically updates your github project"?
[12:54:06] While setting this up I imported my github spdk fork.
[12:54:19] At this point I'm thinking that was a bad idea.
[12:54:24] if you use Gerrithub to merge patches (using the Submit button), they would get synced to your github project
[12:54:32] but if you don't do that with gerrit, nothing will happen
[12:55:11] if you just directly interact with the GitHub project and don't touch it on Gerrithub, it won't be a problem
[12:56:26] So my github fork will be synced with patches that I merge into gerrithub but not with patches that others merge?
[13:00:27] jimharris, FYI w/latest vbox on linux I get the NVMe device in the vagrant VM and hello world works as expected including setup.sh in Vagrantfile. Will update patch accordingly...
[13:01:13] johnmeneghini1: your github fork won't be modified unless you push things via gerrithub
[13:03:44] you can probably also delete your project on gerrithub if you didn't want to import it
[13:06:32] OK. I think I'm going to remove my github fork on gerrithub. I maintain this fork manually by pulling from github/spdk and pushing to github/johnmeneghini/spdk. And I do plan to push things into spdk via gerrithub. I'm planning to move completely away from github.
[13:09:08] just to be clear, the main spdk repo is called 'spdk/spdk' on gerrithub, and pushing things to that won't affect your johnmeneghini/spdk fork on gerrithub or github
[13:09:30] but yeah, if you aren't going to use the gerrithub integration for your fork, I would just remove it to avoid confusion
[13:10:19] Agreed
[13:14:28] jimharris: your new latency patch has a dpdk submodule change in it: https://review.gerrithub.io/#/c/362270/
[13:14:35] probably not intentional
[13:16:19] good grief
[13:17:47] fixed
[13:18:29] so another thing I noticed is the total build time gets reported as about 45s longer than the longest of the individual test systems
[13:18:36] http://spdk.intel.com/public/spdk/builds/release/master/2219/
[13:18:54] hm, that is quite a bit
[13:19:27] some of that is probably from copying the source tree over sshfs, but I can't imagine it takes 45 seconds
[13:24:35] Is there some way I can download a patch set that I am reviewing in gerrithub? It's hard to review some of these changes w/out being able to see the whole bundle. I'd prefer to download a patch and use cscope.
[13:25:13] in the upper right corner, you should see a menu called "Download"
[13:25:32] there are several options there for how to download it
[13:25:49] personally i usually use the Checkout option
[13:26:53] OK. Thanks
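The Checkout option copies a command of roughly this shape, which fetches the patch set into a local tree for review with cscope or any other tool; the change number and patch set below (362390, patch set 1) are just examples:

```bash
# Fetch a specific patch set from GerritHub and check it out locally.
# Gerrit refs have the form refs/changes/<last two digits>/<change>/<patch set>.
git fetch https://review.gerrithub.io/spdk/spdk refs/changes/90/362390/1
git checkout FETCH_HEAD
```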
[13:30:46] ooh, my daily reminder just went off to harass bwalker about setting up an office hours time... :)
[13:40:07] jimharris, thanks for pointing that out. I'm going to analyze this next build and see where the hangup is.
[13:42:36] I vote that we schedule our office hours in Swatch internet time
[13:42:52] what does that mean
[13:52:09] http://lmgtfy.com/?q=swatch+internet+time
[13:52:57] It looks like we lose about 10 seconds while the pool and agent communicate back and forth getting ready to start the build (there are a few intermediate states). Still looking for other causes.
[13:53:06] they taught me that in school :)
[13:54:02] it's a useful life skill
[13:54:25] bwalker, one more try on that timeslot please.... see email
[13:56:35] *** Quits: johnmeneghini1 (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net) (Quit: Leaving.)
[13:57:40] like even an hour earlier would be good, or that time on W or F
[14:06:34] 9am is pretty early - I'm not always caught up on email and done putting out fires by then
[14:06:44] and 11-12 bleeds into lunch
[14:06:54] so I want it at 10am
[14:07:17] but you have something every day at 10am except wed.
[14:07:27] while the rest of us have nothing at 10am every day except wed
[14:29:40] I can do Mon
[14:30:03] does that work?
[14:33:20] yeah monday works
[14:33:34] great, thanks
[14:37:55] *** Joins: johnmeneghini (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net)
[14:38:17] Ben and Daniel
[14:38:38] I'm looking at https://review.gerrithub.io/#/c/362255/2/lib/util/io_channel.c
[14:39:04] I have a comment, but it appears to be stuck in the draft state.
[14:40:02] to publish comments, you have to hit Reply... at the top level and Post
[14:40:17] you can also do +1/-1 depending on what you think of the patch
[14:40:20] yeah - if you go to the top level of the review (up button if you are looking at a file)
[14:40:28] at the top is a "Reply..." button
[14:40:37] you get to vote and it publishes all of your comments at once
[14:49:12] OK. Thanks. You've got my first comment, but I'm now done for a while. I'll be offline for the next few days. My daughter is getting married this weekend!
[14:50:13] oh congrats
[15:01:18] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[15:12:53] nice!
[15:20:48] jimharris: I'm investigating the unit test failure from this patch https://review.gerrithub.io/#/c/362254/
[15:20:55] the patch should be totally unrelated to the failure
[15:21:21] and I've made some progress - namely that it is timing related
[15:21:36] if the code runs slower, because for instance thread sanitizer is on, it hits
[15:21:39] but if it runs fast it doesn't
[15:21:59] so then I dug into how it is doing the actual message passing
[15:22:37] that's the first time I've seen that failure
[15:22:39] and am I correct in understanding that it's assuming the blobfs operations will each submit at most queue depth one worth of requests?
[15:23:15] yes
[15:23:18] because I made a quick modification where in ut_send_request and in spdk_thread I set g_req to NULL
[15:23:23] when I'm done with it
[15:23:29] i.e. whenever req->done is set to 1
[15:23:38] and then in send_request, I assert that g_req is NULL
[15:23:45] if I run normally, it passes
[15:23:55] if I turn on thread sanitizer or put in delays
[15:23:56] it fails
[15:24:08] so two reqs are being submitted
[15:24:22] specifically, from spdk_file_truncate
[15:24:24] hold on - i'll need to take a look
[15:26:25] where are two requests getting sent from truncate?
[15:26:36] that's just the stack when I hit the assert
[15:27:04] the rest of the stack goes up to cache_write, line 154
[15:27:17] so there could be an I/O outstanding from spdk_fs_open_file maybe?
[15:27:24] but spdk_file_truncate only calls ->send_request once
[15:27:26] I'm trying to make sure that my asserts are all correct
[15:29:51] yeah - when truncate calls send_request
[15:29:58] there is still an __fs_open_file req out
[15:30:16] triple checking that I'm doing the right thing with my asserts here
[15:30:44] to be honest I need to rethink how this blobfs_sync_ut code works - I wanted to run it all without relying on the spdk framework, but my "solution" isn't very ideal
[15:31:01] yeah - thread sanitizer thinks it is totally broken
[15:31:05] so does valgrind
[15:31:08] but it's not assuming x86 atomicity rules
[15:31:24] i can take a look at this tomorrow
[15:31:42] btw - i have a new idea to reduce the iscsi (and probably nvmf) test times dramatically
[15:31:54] i've tested it out locally and it works - needs a few patches to get it ready
[15:32:13] the basic idea is to create a dpdk "stub" app that basically just does the spdk_env_init() as a primary process
[15:32:18] also probes nvme devices
[15:32:32] then all of the iscsi tests just pass an instance id to bind to that stub process
[15:32:40] so we avoid dpdk init time + nvme probe time for all of those tests
[15:33:26] that also allows us to be much more granular on our system-level automated tests
[15:34:07] I like it
[15:38:01] well, my assert was very slightly wrong
[15:38:26] in the spdk_thread, I was setting g_req to NULL after I called req->fn
[15:38:41] but req->fn kicks the semaphore, so the main thread moves on and tries to call the next I/O
[16:01:04] jimharris: I did a little more looking into the timing issues. There is about a fifteen second latency before all of the agents get to the point of running the builds, and the build finishes very quickly after the last agent finishes.
[16:02:45] I think a lot of the disparity is coming from the fact that I am using the data generated by the timing function in autorun to generate the agent time while I am measuring the pool time directly from python.
[16:04:17] I will change the scripts to measure agent time directly so that we get a better idea of real issues slowing down the builds.
[16:09:21] sethhowe: thanks! some of these issues might not be fixable - hope you don't mind sharing what you find to see if there are easy things we can do to reduce the test times
[16:14:52] Not at all! Keeping test times down will go a long way towards improving development turnaround times.
[16:21:23] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[18:16:01] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.77)
[21:59:00] *** Joins: changpeng (changpeng@nat/intel/x-fgtafscqyasqtbjv)
[22:00:09] *** Quits: johnmeneghini (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net) (Quit: Leaving.)
[22:49:14] *** Joins: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103)
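A rough sketch of the dpdk "stub" idea described at 15:32 above: one long-lived primary process pays the DPDK init and NVMe probe cost once, and each iscsi/nvmf test then attaches to it as a secondary process via a shared instance id. The binary name, the -i option, and the config file here are assumptions for illustration, not an existing interface at this point in the log:

```bash
# Hypothetical workflow, following the idea sketched in the discussion above.
./stub -i 0 &                        # primary process: spdk_env_init() + NVMe probe, then idles
STUB_PID=$!
sleep 5                              # crude wait for the stub to finish initializing

# Each test starts its target with the same instance id, attaching to the stub
# as a secondary process and skipping DPDK init and NVMe probe entirely.
./app/iscsi_tgt/iscsi_tgt -i 0 -c iscsi.conf

kill "$STUB_PID"
```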