[00:01:57] *** Joins: lhodev_ (~lhodev@66-90-218-190.dyn.grandenetworks.net)
[00:03:53] *** Quits: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net) (Ping timeout: 268 seconds)
[02:16:15] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 256 seconds)
[05:09:08] *** Joins: travis-ci (~travis-ci@ec2-3-93-181-94.compute-1.amazonaws.com)
[05:09:09] (spdk/master) scripts/ceph: reduce ceph_raw.img from 10G to 4G (yidong0635)
[05:09:10] Diff URL: https://github.com/spdk/spdk/compare/2c651f101cb3...2cc2bbe604a7
[05:09:10] *** Parts: travis-ci (~travis-ci@ec2-3-93-181-94.compute-1.amazonaws.com) ()
[07:23:42] *** Quits: lhodev_ (~lhodev@66-90-218-190.dyn.grandenetworks.net) (Quit: Textual IRC Client: www.textualapp.com)
[07:24:10] *** Joins: lhodev (~lhodev@inet-hqmc06-o.oracle.com)
[07:25:13] *** Joins: travis-ci (~travis-ci@ec2-3-81-169-192.compute-1.amazonaws.com)
[07:25:14] (spdk/master) reduce: prepend rw to request buf and buf_iov (Jim Harris)
[07:25:15] Diff URL: https://github.com/spdk/spdk/compare/2cc2bbe604a7...20145fd7148a
[07:25:15] *** Parts: travis-ci (~travis-ci@ec2-3-81-169-192.compute-1.amazonaws.com) ()
[08:10:18] bwalker: Were you able to run those "scans"? I just did a fetch and did not yet note the presence of an 18.10.2 tag.
[08:28:04] *** Joins: travis-ci (~travis-ci@ec2-3-91-36-228.compute-1.amazonaws.com)
[08:28:05] (spdk/master) bdev/compress: add compression vbdev module (Paul Luse)
[08:28:05] Diff URL: https://github.com/spdk/spdk/compare/20145fd7148a...db541f8eb36b
[08:28:05] *** Parts: travis-ci (~travis-ci@ec2-3-91-36-228.compute-1.amazonaws.com) ()
[08:40:38] jimharris: Did I miss something, or is there "no way" to enable logging when invoking nvme/perf ?
[08:43:17] you're not missing anything - currently nvme/perf doesn't have an option to enable logging flags
[08:43:33] i don't see any reason i can't be added though
[08:43:42] i => it
[08:44:26] are you signing up for that?
;-)
[08:46:23] the one bummer is that nvme/perf uses -L for specifying the level of the latency charts
[08:46:26] It doesn't use the same kind of arg parsing function, though, that, say, nvmf_tgt, does. perf uses "struct spdk_env_opts" whereas nvmf_tgt uses spdk_app_opts. I'm sure there's "a way", but it doesn't appear as a simple drop-in.
[08:47:04] correct - it will have to be added explicitly to perf.c, since that application is not based on the spdk app framework and can't use spdk_app_parse_args
[08:47:39] but the code from spdk_app_parse_args serves as a good template for how to add it to the parse_args() function in perf.c
[08:47:59] What started (for me) as a "simple" exercise to update some doxygen comments related to spdk_nvme_ctrlr_get_num_ns() has turned into its own little "project".
[08:49:49] the proverbial rabbit hole
[08:50:22] I wanted to clarify the description/use of spdk_nvme_ctrlr_get_num_ns() after some other folks ran into a problem using it.
[08:51:16] It doesn't simply "Get the number of namespaces...", but instead it obtains the number of VALID namespaces which may or may not be active.
[08:52:41] Related, the doc for spdk_nvme_ctrlr_get_ns() can be misinterpreted in that it states "There will never be any gaps in the numbering..."
[08:53:46] Well, one *can* get an entry, albeit it can be "blank" because one can use non-contiguous namespace id's.
[08:55:12] Further, I found that if I use an RPC to delete a bdev, spdk_nvme_ctrlr_get_ns() will still return a ptr to the associated structure for that deleted bdev and there's no change to it; i.e. it still appears active.
[08:55:59] which rpc to delete a bdev?
[08:56:12] delete_malloc_bdev
[08:57:03] Meanwhile, perf will continue to chug along without complaint. I can see errors from nvmf_tgt, but nothing is advertised (nor logged) from perf.
[08:58:11] And, if perf in the midst of that happens to do a spdk_nvme_ctrlr_get_ns() against that deleted namespace (which I hacked in to see how it behaved), the resulting structure gives no indication that namespace went away (became inactive).
[08:59:14] I suspect there are some calls perf is making (unfortunately without checking return values) that might indicate something had gone wrong, which is why I wanted to enable logging :-/
[09:00:55] :q
[09:01:06] it shouldn't be too hard to add - or for right now you could just hack your version locally to add spdk_log_set_flag("nvme")
[09:01:57] Ooooh, I didn't know about that.
[09:05:09] i guess for nvme/perf, we don't really need to provide a capability to specify multiple log flags - "nvme" is the only possibility
[09:08:10] I'll continue digging. My current theory why I don't see a complaint from perf when the bdev is deleted from the target is because perf does not check the return value of spdk_nvme_qpair_process_completions() in check_io(). Maybe?
[09:08:48] spdk_nvme_qpair_process_completions() just returns how many completions occurred
[09:09:09] No negative values to indicate any type of failure?
[09:09:34] the completion handler is responsible for checking the values in the nvme completion entry
[09:10:05] more likely it's that nvme/perf isn't checking for namespace change events
[09:10:15] This "completion handler" of which you speak is in the lib code, though, right? Not in perf itself.
[09:11:30] it's in perf - io_complete()
[09:12:40] user of the library passes a completion handler pointer whenever it calls something like spdk_nvme_ns_cmd_read/write
[09:12:52] user of the library later calls spdk_nvme_qpair_process_completions()
[09:13:07] Ah, ok. I had spied various error checking stuff in lib code related to fielding completions, but noted the use of the logging hence my earlier comments.
[09:13:09] if any completions are found, the library then calls that completion handler for each completion
[09:13:14] and passes the completion entry
[09:13:23] which the completion handler can decode for errors
[09:16:36] Right now, perf's io_complete() -> task_complete(). I'm guessing the latter isn't doing much by way of checking for errors. I do see code with a comment in there for "add application level verification for end-to-end data protection", but nothing else really.
[09:17:11] Unless perf has timed out, task_complete() then just submits a new IO.
[09:20:27] io_complete() does an fprintf to stderr if it finds any kind of error - i assume you're not seeing any of those?
[09:21:52] Correct -- none.
[09:22:50] I hasten to add that I'm doing this on the SPDK 18.10.x branch -- not top of tree.
[09:24:30] I just looked in master/top-of-tree and note that perf's io_complete() has the printf (based on if spdk_nvme_cpl_is_error()).
[09:24:40] That code is NOT in the 18.10.x branch.
[09:26:53] Can you give me a starting point (e.g. function) that I can follow to learn how to field namespace change events?
[09:27:12] Ah, just noted the time. I have a meeting in 20 minutes for which I need to depart. Back in a little while.
[09:29:17] spdk_nvme_ctrlr_register_aer_callback
[09:36:09] there's even a test app that shows how to use it - test/nvme/aer/aer.c
[10:08:33] lhodev: scans ran, I just have to get through all of the results to confirm we're good
[11:14:28] jimharris, so seems like w/the compression callbacks, when I find a completed operation in the poller, I'm done with the host IO then right? (ie I don't need to tell reduce that the operation is done)
[11:19:32] jimharris, I keep forgetting to ask, I need to bump ipsec lib up a version to stay in sync with AESNI testing on the DPDK side. Do you want to create a branch in the fork for this?
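Note: the submit/poll/callback flow described in the 09:08 to 09:13 exchange can be sketched roughly as follows. This is a self-contained Python simulation rather than SPDK code: `FakeQpair`, `Completion`, `run_demo`, and the nonzero status value are hypothetical stand-ins, and only the names `io_complete` and `process_completions` echo the discussion. It illustrates why the count returned by the poll call says nothing about failures, so the handler itself has to inspect each completion entry.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Completion:
    """Hypothetical stand-in for an NVMe completion entry."""
    status_code: int  # 0 means success in this sketch

    def is_error(self) -> bool:
        return self.status_code != 0


# A handler receives the caller's context plus the completion entry.
CompletionHandler = Callable[[object, Completion], None]


@dataclass
class FakeQpair:
    """Hypothetical queue pair: holds submitted I/O until it is polled."""
    pending: List[Tuple[CompletionHandler, object, Completion]] = field(
        default_factory=list
    )

    def submit(self, handler: CompletionHandler, ctx: object,
               simulated_status: int = 0) -> None:
        # The caller hands over its handler at submit time.
        self.pending.append((handler, ctx, Completion(simulated_status)))

    def process_completions(self) -> int:
        # Invoke the handler once per finished I/O and return only a count.
        done, self.pending = self.pending, []
        for handler, ctx, cpl in done:
            handler(ctx, cpl)
        return len(done)


def io_complete(errors: list, cpl: Completion) -> None:
    # Errors are visible only here, by decoding the completion entry.
    if cpl.is_error():
        errors.append(cpl.status_code)


def run_demo() -> Tuple[int, list]:
    qpair = FakeQpair()
    errors: list = []
    qpair.submit(io_complete, errors)                         # succeeds
    qpair.submit(io_complete, errors, simulated_status=0x0B)  # arbitrary failure
    count = qpair.process_completions()
    return count, errors
```

In this sketch the poll reports two completions even though one of them failed; a handler that ignored `cpl` would never notice, which matches the behavior reported for the 18.10.x perf tool.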
[11:30:09] jimharris, ipsec lib ver we'll want to move to is .52
[11:33:23] you always have to tell libreduce when a compress or decompress operation is complete
[11:33:42] or maybe i don't totally understand the question
[11:41:15] peluse: i just updated our intel-ipsec-mb master branch to latest from upstream
[11:41:41] i created an spdk-0.49 branch that points to the v0.49 tag + your makefile fix
[11:42:04] then i updated the spdk branch to the v0.52 tag + cherry pick of your makefile fix
[11:43:56] ugh - that looks like the same process that was used when we moved from v0.48 to v0.49 - but I think my force push to spdk broke the gerrithub => github integration
[12:02:12] *** Joins: gila (~gila@5ED4D979.cm-7-5d.dynamic.ziggo.nl)
[12:07:50] *** Joins: travis-ci (~travis-ci@ec2-3-88-32-126.compute-1.amazonaws.com)
[12:07:51] (spdk/master) bdev: fix potential segmentation fault bug (JinYu)
[12:07:51] Diff URL: https://github.com/spdk/spdk/compare/db541f8eb36b...2f3147c0b2a9
[12:07:51] *** Parts: travis-ci (~travis-ci@ec2-3-88-32-126.compute-1.amazonaws.com) ()
[12:10:03] @jimharris I'm trying to reproduce the issue using the kernel initiator but I'm seeing some slightly different behavior thats making it challenging
[12:10:28] namely when I try to rescan the host to find the drive and bring it back in to repeat the kick
[12:12:10] its consistently coming back as /dev/sdf instead of /dev/sda
[12:13:58] it seems to continue to come back as /dev/sdf so long as fio is still running
[12:14:39] im wondering if the way that we're connecting to the iscsi target is possibly a key difference in why its failing for us but you guys are having trouble with the repro
[12:20:23] we're using libiscsi to connect not the kernel initiator
[12:21:01] one of my coworkers thinks I should try using the spdk perf tool instead of the kernel initiator and fio
[12:21:09] so im gonna try that to see if the readd behavior changes
[12:43:50] ok jrlusby - sounds good - let us know what you find
[12:44:30] 👍 right now im trying out that O_DSYNC change with our testcases that normally produce the hang
[12:55:20] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[13:22:34] *** Joins: travis-ci (~travis-ci@ec2-3-87-12-86.compute-1.amazonaws.com)
[13:22:35] (spdk/master) bdev/compress: add reduce integration (Paul Luse)
[13:22:35] Diff URL: https://github.com/spdk/spdk/compare/2f3147c0b2a9...1717b5d580c9
[13:22:35] *** Parts: travis-ci (~travis-ci@ec2-3-87-12-86.compute-1.amazonaws.com) ()
[13:25:58] @Shuhei @jimharris I reproduced the hang with our io error injecting system test and the O_DSYNC change @Shuhei recommended
[13:27:56] its looking like that perf tool is for nvme not iscsi
[13:28:09] at first glance at least
[13:28:19] im gonna try digging into the docs a bit
[13:30:59] jrlusby: Thank you for trying it, and sorry for the confusion.
[13:31:20] You don't have to read the links I added in detail.
[13:32:55] Previously our team observed a strange behavior when they didn't add O_DSYNC. This was observed when they didn't use SPDK.
[13:33:41] So I just proposed O_DSYNC.
[13:35:10] jimharris: I tried to reproduce but failed. I will not be able to join today's bug scrub meeting but talking in the meeting may be helpful.
[13:36:50] jrlusby: All other information except for O_DSYNC was noise for this issue. Sorry for the confusion.
[13:38:51] no worries @Shuhei , I'm just really happy to have you guys helping me try to get to the bottom of this
[13:48:42] *** Joins: travis-ci (~travis-ci@ec2-3-93-64-6.compute-1.amazonaws.com)
[13:48:43] (spdk/18.07-perf) perf: Add option to create unused io queue pairs (Ben Walker)
[13:48:43] Diff URL: https://github.com/spdk/spdk/compare/b63ad2eec0de^...24d508744103
[13:48:43] *** Parts: travis-ci (~travis-ci@ec2-3-93-64-6.compute-1.amazonaws.com) ()
[14:28:57] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 256 seconds)
[14:32:27] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[15:13:09] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 256 seconds)
[15:22:25] peluse: i pushed an spdk-0.52 branch to gerrithub and it's already been synced to github
[15:24:23] great, thanks
[15:35:15] jimharris, do you really need two separate calls to me, one for comp and one for decomp, or can I just give you one and we add a bool indicating compress/decompress? A little less code on my end is all
[15:38:00] i guess we don't need two separate callbacks - but i guess i see it similar to read/write and we have separate calls for those
[15:39:07] could you just make a 'main' compress/decompress function that takes the bool? and then two different functions you provide me as callbacks, each of which call that main function
[15:40:04] yeah, I have one main one and was planning on doing just that unless you wanted to collapse to just one on your end. Can do
[15:47:10] hey for errno on my call to you when an operation is done, do you want me to just use the status from the dequeue operation or just 0 for success and try and match some standard errors up to the status I get back and use a translated one?
[15:47:14] jimharris, ^
[15:53:53] for compress/decompress, i'll need the number of bytes as the 'errno' parameter instead of just 0
[15:54:01] for the success case
[15:55:09] we'll need to settle on some kind of 'standard' error values though - unless we can just map them to standard errno values
[16:01:55] cool
[16:09:42] *** Quits: ppelplin (ppelplin@nat/intel/x-aeznenhwwafwqcbq) (Quit: ZNC - http://znc.in)
[16:09:52] *** Joins: ppelplin (~ppelplin@134.134.139.75)
[16:13:05] *** Quits: gila (~gila@5ED4D979.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[18:10:54] *** Joins: travis-ci (~travis-ci@ec2-184-73-55-46.compute-1.amazonaws.com)
[18:10:55] (spdk/master) nvme:complete I/O and abort rest I/O before destroy io_qpair (JinYu)
[18:10:56] Diff URL: https://github.com/spdk/spdk/compare/f8815f02af98...5874e2ac6c20
[18:10:56] *** Parts: travis-ci (~travis-ci@ec2-184-73-55-46.compute-1.amazonaws.com) ()
[19:52:54] *** Joins: travis-ci (~travis-ci@ec2-18-212-179-22.compute-1.amazonaws.com)
[19:52:55] (spdk/master) ut/ftl: fixed unitialized value warnings (Konrad Sztyber)
[19:52:56] Diff URL: https://github.com/spdk/spdk/compare/5874e2ac6c20...ce95c099a87e
[19:52:56] *** Parts: travis-ci (~travis-ci@ec2-18-212-179-22.compute-1.amazonaws.com) ()
[21:14:27] *** Joins: travis-ci (~travis-ci@ec2-3-91-36-228.compute-1.amazonaws.com)
[21:14:28] (spdk/master) reduce: save num_io_units and chunk_is_compressed to req object (Jim Harris)
[21:14:29] Diff URL: https://github.com/spdk/spdk/compare/ce95c099a87e...2edc65291325
[21:14:29] *** Parts: travis-ci (~travis-ci@ec2-3-91-36-228.compute-1.amazonaws.com) ()
[22:27:26] *** Quits: lhodev (~lhodev@inet-hqmc06-o.oracle.com) (Remote host closed the connection)
[22:28:03] *** Joins: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net)
[23:51:41] Yippee, build fixed!
[23:51:42] Project autotest-nightly build #442: FIXED in 51 min. See https://ci.spdk.io/spdk-jenkins for results.
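Note: the callback convention discussed between 15:35 and 15:55 (one shared compress/decompress routine behind two thin entry points, reporting the byte count on success and a standard errno-style value on failure) might look roughly like the sketch below. It is a self-contained Python illustration, not the compress vbdev or libreduce code: every function name is hypothetical, `zlib` stands in for the real compression backend, and the choice of `EIO` and `ENOSPC` (and reporting failures as negative values) is an assumption, since the log leaves the error mapping unsettled.

```python
import errno
import zlib
from typing import Callable

# The callback receives a byte count (>= 0) on success, or a negative
# errno value on failure. The negative-errno choice is an assumption.
DoneCallback = Callable[[int], None]


def _compress_or_decompress(src: bytes, dst: bytearray, compress: bool,
                            done: DoneCallback) -> None:
    """Shared routine; the bool selects the direction."""
    try:
        result = zlib.compress(src) if compress else zlib.decompress(src)
    except zlib.error:
        # Hypothetical mapping of a backend failure to a standard errno.
        done(-errno.EIO)
        return
    if len(result) > len(dst):
        # Output does not fit in the caller's buffer.
        done(-errno.ENOSPC)
        return
    dst[:len(result)] = result
    done(len(result))  # success: report the number of bytes produced


def compress_entry(src: bytes, dst: bytearray, done: DoneCallback) -> None:
    """Thin entry point registered as the 'compress' callback."""
    _compress_or_decompress(src, dst, True, done)


def decompress_entry(src: bytes, dst: bytearray, done: DoneCallback) -> None:
    """Thin entry point registered as the 'decompress' callback."""
    _compress_or_decompress(src, dst, False, done)
```

A caller would pass something like `results.append` as `done` and treat any negative value as a failure; keeping two entry points mirrors the read/write split mentioned in the log while the logic lives in one place.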