[00:28:55] *** Joins: darsto_ (~darsto@89-68-114-161.dynamic.chello.pl)
[00:30:01] *** Quits: darsto (~darsto@89-68-114-161.dynamic.chello.pl) (Ping timeout: 244 seconds)
[00:30:01] *** darsto_ is now known as darsto
[01:37:44] *** Joins: tkulasek (tkulasek@nat/intel/x-bznpxmdrskywvqrt)
[04:46:26] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[04:46:49] *** lyan is now known as Guest23991
[06:02:18] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[06:18:16] *** Joins: darsto_ (~darsto@89-68-114-161.dynamic.chello.pl)
[06:19:17] *** Quits: darsto (~darsto@89-68-114-161.dynamic.chello.pl) (Ping timeout: 244 seconds)
[06:19:17] *** darsto_ is now known as darsto
[06:26:59] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Read error: Connection reset by peer)
[06:27:25] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[06:45:17] *** ChanServ sets mode: +o peluse
[06:45:48] FYI, Euro community meeting in a little over an hour from now... https://trello.com/b/DvM7XayJ/spdk-community-meeting-agenda
[07:29:22] bwalker, drv, jimharris: Just ran into a strange-looking bdev coming from bdevperf. The total blocks is listed as 8, as I assume it's a 4K IO, but the iovs show a vector of length 8640?? See https://gist.github.com/peluse/a557570ccf23cedc9fa869c913a1bdb9
[07:46:13] hold up... this might not be a bdev_io from the app, I think I have a bug
[07:50:41] well, maybe not again :) Will bring it up in the meeting....
[07:52:18] what's the callstack where you captured that gdb output?
[07:53:21] this sounds like something we should be able to debug outside of the community meeting
[07:57:31] sure, if nobody else is interested
[08:00:31] where have you looked so far?
[08:02:00] jimharris, are you on the call?
[08:02:10] joining - takes forever to load
[08:16:53] *** Joins: bwalker_ (~bwalker@ip70-190-226-244.ph.ph.cox.net)
[08:16:53] *** ChanServ sets mode: +o bwalker_
[08:19:36] _
[08:21:29] anyway, I think it's up to Ben - I think it's a legit request to have a code walkthrough on something that isn't super clear
[08:56:29] please press 1 from your dial pad
[08:57:17] jimharris, that bdev_io length thing... I get a read in my submit request and don't mess with any of the fields. I just pass it on with "rc = spdk_bdev_readv_blocks(crypto_node->base_desc, crypto_ch->base_ch, bdev_io->u.bdev.iovs, bdev_io->u.bdev.iovcnt, bdev_io->u.bdev.offset_blocks, bdev_io->u.bdev.num_blocks, _crypto_complete_io, bdev_io);"
[08:58:03] then in the completion of the read, _crypto_complete_io, I added code to compare the num_blocks to the iov length and caught one that was 4096 on the way down and 85xx in the completion. num_blocks didn't change
[08:58:05] ok - what is bdevperf passing to the bdev layer at the top end?
[08:58:22] size you mean?
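[Editor's note: the pass-through read quoted above can be sketched as below. This is an illustrative sketch, not the actual vbdev_crypto source; the struct names `crypto_bdev_node` and `crypto_io_channel` and the error handling are assumptions. Only the `spdk_bdev_readv_blocks()` call itself is taken from the chat.]

```c
#include "spdk/bdev.h"
#include "spdk/bdev_module.h"

/* Hypothetical module context; the real vbdev_crypto types may differ. */
struct crypto_bdev_node { struct spdk_bdev_desc *base_desc; };
struct crypto_io_channel { struct spdk_io_channel *base_ch; };

/* Completion callback for the child I/O allocated by
 * spdk_bdev_readv_blocks(). cb_arg is the original bdev_io
 * that bdevperf submitted. */
static void
_crypto_complete_io(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	struct spdk_bdev_io *orig_io = cb_arg;

	/* Propagate the child's status up, then free the child bdev_io. */
	spdk_bdev_io_complete(orig_io, success ? SPDK_BDEV_IO_STATUS_SUCCESS
					       : SPDK_BDEV_IO_STATUS_FAILED);
	spdk_bdev_free_io(bdev_io);
}

/* Forward a read untouched: same iovs, same offset, same block count. */
static void
crypto_submit_read(struct crypto_bdev_node *crypto_node,
		   struct crypto_io_channel *crypto_ch,
		   struct spdk_bdev_io *bdev_io)
{
	int rc = spdk_bdev_readv_blocks(crypto_node->base_desc, crypto_ch->base_ch,
					bdev_io->u.bdev.iovs, bdev_io->u.bdev.iovcnt,
					bdev_io->u.bdev.offset_blocks,
					bdev_io->u.bdev.num_blocks,
					_crypto_complete_io, bdev_io);
	if (rc != 0) {
		spdk_bdev_io_complete(bdev_io, SPDK_BDEV_IO_STATUS_FAILED);
	}
}
```

Note the key distinction the discussion below hinges on: the `bdev_io` passed into the submit path is bdevperf's original, while the one handed to `_crypto_complete_io` is a new bdev_io allocated by the bdev layer for the child request.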
[08:58:27] *** Joins: peter_turschm (~peter_tur@64.187.168.208)
[08:58:46] 8 blocks, 4K IO (all fields in the bdev_io are consistent)
[08:58:55] here's the callstack in the completion routine:
[08:59:08] "#0  _crypto_complete_io (bdev_io=0x7fffec801040, success=true, cb_arg=0x7fffec800e00) at vbdev_crypto.c:743
[08:59:08] #1  0x00000000004fdd8c in _spdk_bdev_io_complete (ctx=0x7fffec801040) at bdev.c:2470
[08:59:08] #2  0x00000000004fe56e in spdk_bdev_io_complete (bdev_io=0x7fffec801040, status=SPDK_BDEV_IO_STATUS_SUCCESS) at bdev.c:2552
[08:59:08] #3  0x00000000005043f5 in spdk_bdev_part_complete_io (bdev_io=0x7fffec801280, success=true, cb_arg=0x7fffec801040) at part.c:188
[08:59:08] #4  0x00000000004fdd8c in _spdk_bdev_io_complete (ctx=0x7fffec801280) at bdev.c:2470
[08:59:10] #5  0x00000000004fe56e in spdk_bdev_io_complete (bdev_io=0x7fffec801280, status=SPDK_BDEV_IO_STATUS_SUCCESS) at bdev.c:2552
[08:59:11] so this bdev_io you pasted was the one that completed from the underlying bdev?
[08:59:13] #6  0x00000000004fef38 in spdk_bdev_io_complete_nvme_status (bdev_io=0x7fffec801280, sct=0, sc=0) at bdev.c:2617
[08:59:16] #7  0x000000000045d1ee in bdev_nvme_queued_done (ref=0x7fffec801370, cpl=0x7fffd6972020) at bdev_nvme.c:1301
[08:59:19] #8  0x0000000000481917 in nvme_complete_request (req=0x7fffd48dcc00, cpl=0x7fffd6972020) at nvme_internal.h:784
[08:59:21] #9  0x000000000048926d in nvme_pcie_qpair_complete_tracker (qpair=0x7fffd62339f8, tr=0x7fffd48b1000, cpl=0x7fffd6972020, print_on_error=true) at nvme_pcie.c:1242
[08:59:26] #10 0x000000000048ea37 in nvme_pcie_qpair_process_completions (qpair=0x7fffd62339f8, max_completions=64) at nvme_pcie.c:2093
[08:59:29] #11 0x0000000000499a35 in nvme_transport_qpair_process_completions (qpair=0x7fffd62339f8, max_completions=0) at nvme_transport.c:218
[08:59:36] #12 0x0000000000490f00 in spdk_nvme_qpair_process_completions (qpair=0x7fffd62339f8, max_completions=0) at nvme_qpair.c:400
[08:59:39] #13 0x0000000000457122 in bdev_nvme_poll (arg=0x6080000011d0) at bdev_nvme.c:192
[08:59:40] #14 0x000000000050f2f6 in _spdk_reactor_run (arg=0x6120000004c0) at reactor.c:518
[08:59:43] #15 0x00000000005103e9 in spdk_reactors_start () at reactor.c:692
[08:59:44] #16 0x000000000050b3d4 in spdk_app_start (opts=0x7fffffffe2c0, start_fn=0x409904, arg1=0x0, arg2=0x0) at app.c:575
[08:59:47] #17 0x000000000040afc0 in main (argc=11, argv=0x7fffffffe478) at bdevperf.c:1027"
[09:00:09] *** Quits: bwalker_ (~bwalker@ip70-190-226-244.ph.ph.cox.net) (Quit: Leaving)
[09:00:18] so this bdev_io you pasted was the one that completed from the underlying bdev?
[09:01:11] your original comment said it was a 'bdev' coming from bdevperf, but now I'm confused
[09:01:17] The one I passed in submit (the params in the function call) is the bdev_io directly from bdevperf. The one I'm looking at in the completion is the one allocated from that call, and I'm seeing it in the completion routine that I specified
[09:01:57] bdevperf --> crypto --> submit --> spdk_bdev_readv_blocks(bdev params from bdevperf)
[09:02:22] completion routine is getting the bdev that was allocated from the spdk_bdev_readv_blocks() call. Does that make sense?
[09:02:54] completion routine is getting the bdev_io that was allocated?
[09:03:33] yes, I hope so :) And that is, of course, the one that was sent to my base_desc, which is the nvme bdev under me
[09:04:16] and this is intermittent and just started happening here recently, possibly introduced by a rebase or something. I haven't changed this code in forever, and before I went on vacation I ran bdevperf for 72 hrs straight w/o issues. Now it fails 1/3 of the time
[09:05:15] open to suggestions, otherwise I'm just going to start stepping and see if I can figure out where it (the iov length) is getting changed, maybe a data breakpoint on that address or something
[09:05:29] do the data breakpoint
[09:05:39] will do, gracias
[09:06:11] on a somewhat related topic - i posted some comments on your RFC crypto patch - i think there are some potential iov-related issues you're going to hit, especially if you ever try to run this with vhost
[09:06:33] it's the same problem that kunal's pvol patch is going to have when it splits IO across strip boundaries
[09:07:09] cool, thanks for looking. I was planning on going through those today, and in preparation was making sure my system here was running great when I ran into whatever this thing is :)
[09:22:52] one other thing I forgot to mention: I'm using split to break my underlying NVMe in two. Stepping through the IO now that I think is the one that gets jacked...
[09:51:16] sethhowe_: can you please review the first two patches starting at https://review.gerrithub.io/#/c/spdk/spdk/+/416052/
[09:51:22] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-xljwmlnxhxglprwu) (Ping timeout: 264 seconds)
[09:51:37] jimharris: sorry, just pushed a new rev of this after you +2'd it: https://review.gerrithub.io/#/c/spdk/spdk/+/418866/
[09:54:33] i'm going to mark it -1 for now - gerrit says this will conflict with johnm's vm_setup.sh localization patch, and we need to get that one in
[09:58:34] *** Joins: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net)
[10:02:08] I'm not sure why it would conflict - it does touch vm_setup.sh, but not any of the same lines
[10:02:18] should get resolved automatically by a rebase
[10:02:43] *** Quits: peter_turschm (~peter_tur@64.187.168.208) (Remote host closed the connection)
[10:02:57] I'd like to get sethhowe_ to test the vm_setup.sh patches before we merge them
[10:02:57] *** Joins: peter_turschm (~peter_tur@2604:4080:1133:0:bc6c:53bf:a013:c140)
[10:03:28] *** Quits: peter_turschm (~peter_tur@2604:4080:1133:0:bc6c:53bf:a013:c140) (Remote host closed the connection)
[10:39:32] *** Joins: peter_turschm (~peter_tur@66.193.132.66)
[11:10:03] hey bwalker, a gentle reminder about 416878 and 416879, thanks! :)
[11:11:12] been workin on them - reading the ibverbs spec is not light reading
[11:14:36] jimharris: quick and easy bug fix patch for your review: https://review.gerrithub.io/#/c/spdk/spdk/+/418868/
[11:17:54] bwalker: was looking at "InfiniBand Architecture Release 1.3, March 3, 2015", pp. 472-482
[11:18:06] yeah, me too
[11:19:09] there's also a nice howto on qp modification: http://www.rdmamojo.com/2013/01/12/ibv_modify_qp/
[11:19:45] have you ever seen this actually detect an error, kick in, and recover?
[11:19:51] under what conditions is that expected to occur?
[11:20:51] we do periodically see RDMA connections enter an error state, but I have no insight as to why this happens
[11:24:48] yes, seen it with a snic by a vendor, running some beta firmware.
[11:25:29] it was a firmware issue under high load from multiple initiators running an iozone test
[11:26:16] i suspect it was some fw issue, but still the cnx was dead and had to wait till the initiator hits the i/o timeout and issues a disconnect
[11:26:43] that's what I was concerned was happening on our end
[11:26:53] some piece of hardware malfunctions - we only see this happen on specific NICs
[11:26:55] then i changed softroce to inject these errors
[11:27:31] so it's fast to develop and test
[11:28:01] i could share this somehow
[12:17:20] *** Quits: tkulasek (tkulasek@nat/intel/x-bznpxmdrskywvqrt) (Ping timeout: 244 seconds)
[12:27:21] jimharris, drv: Just got back from the home inspection. I'm checking it out now.
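[Editor's note: the QP error recovery discussed above is a sequence of `ibv_modify_qp()` calls walking the QP back through RESET and INIT toward RTS. A minimal sketch under stated assumptions: the port number, pkey index, and access flags are placeholders, and a real recovery must also repost receive WRs and reuse (or re-exchange) the remote QPN/PSN before the RTR transition.]

```c
#include <infiniband/verbs.h>
#include <string.h>

/* Sketch: move a QP out of IBV_QPS_ERR. Attribute values here are
 * illustrative, not taken from any particular driver. */
static int
recover_qp(struct ibv_qp *qp)
{
	struct ibv_qp_attr attr;
	int rc;

	/* ERR -> RESET: only the state attribute is required. */
	memset(&attr, 0, sizeof(attr));
	attr.qp_state = IBV_QPS_RESET;
	rc = ibv_modify_qp(qp, &attr, IBV_QP_STATE);
	if (rc) return rc;

	/* RESET -> INIT */
	memset(&attr, 0, sizeof(attr));
	attr.qp_state = IBV_QPS_INIT;
	attr.pkey_index = 0;
	attr.port_num = 1;	/* assumption: single-port HCA */
	attr.qp_access_flags = IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE;
	rc = ibv_modify_qp(qp, &attr,
			   IBV_QP_STATE | IBV_QP_PKEY_INDEX |
			   IBV_QP_PORT | IBV_QP_ACCESS_FLAGS);
	if (rc) return rc;

	/* INIT -> RTR -> RTS would follow, filling in the address vector,
	 * path MTU, dest_qp_num, and PSNs from the cached connection
	 * parameters (see the rdmamojo ibv_modify_qp howto linked above). */
	return 0;
}
```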
[12:32:19] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 244 seconds)
[12:48:24] *** Joins: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[12:48:52] *** Quits: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Client Quit)
[12:51:48] *** Joins: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[13:21:30] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[14:22:01] *** Joins: travis-ci (~travis-ci@ec2-54-162-87-224.compute-1.amazonaws.com)
[14:22:02] (spdk/master) json/rpc: Tests for nvmf subsystem (Pawel Niedzwiecki)
[14:22:03] Diff URL: https://github.com/spdk/spdk/compare/f0ec7bc6e715...b2fd5b25b217
[14:22:03] *** Parts: travis-ci (~travis-ci@ec2-54-162-87-224.compute-1.amazonaws.com) ()
[14:24:53] *** Joins: travis-ci (~travis-ci@ec2-54-162-87-224.compute-1.amazonaws.com)
[14:24:54] (spdk/master) NVMF: Fibre Channel Transport API (John Barnard)
[14:24:54] Diff URL: https://github.com/spdk/spdk/compare/b2fd5b25b217...0e9f9bead972
[14:24:54] *** Parts: travis-ci (~travis-ci@ec2-54-162-87-224.compute-1.amazonaws.com) ()
[14:34:54] *** Joins: travis-ci (~travis-ci@ec2-54-145-176-190.compute-1.amazonaws.com)
[14:34:55] (spdk/master) test: add ability to mock spdk_dma_malloc() (Paul Luse)
[14:34:56] Diff URL: https://github.com/spdk/spdk/compare/0e9f9bead972...58d8a4564b8e
[14:34:56] *** Parts: travis-ci (~travis-ci@ec2-54-145-176-190.compute-1.amazonaws.com) ()
[14:35:15] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 260 seconds)
[14:40:48] *** Quits: Guest23991 (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[15:07:52] *** Quits: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Quit: Page closed)
[15:28:04] jimharris: can you take a look at John's first vm_setup.sh patch? https://review.gerrithub.io/#/c/spdk/spdk/+/416052/
[15:30:06] and then https://review.gerrithub.io/#/c/spdk/spdk/+/418866/
[15:30:28] *** Joins: JamesL1 (cf8c2b51@gateway/web/freenode/ip.207.140.43.81)
[15:31:29] Hi guys, I'm trying to port spdk 17.03 to kernel 4.15; do you have any experience whether it will work?
[15:33:22] in general SPDK has very few kernel dependencies - which components are you using in particular?
[15:34:27] drv: looks good to me
[15:34:51] I'm seeing a compile error with pci_enable_msix; seems the API name has been removed
[15:35:12] as for components, we're using nvme and nvmf
[15:35:42] johnmeneghini: we will commit the vm_setup.sh patch once you've confirmed seth's updates are ok
[15:35:44] just an example of a compile error: if (pci_enable_msix(dev, &msix_entry, 1) == 0) {
[15:36:03] kernel code shows it's been removed; looks like pci_enable_msi_exact will be the replacement
[15:36:48] SPDK never calls pci_enable_msi*
[15:36:51] which file contains the call to pci_enable_msix?
[15:36:52] DPDK does
[15:37:08] and only in parts that you can probably just not compile
[15:37:43] we will attempt to get through all compile issues; our concern is whether it may impact functionality
[15:37:54] i really wouldn't suggest using spdk 17.03 nvmf though - there have been so many bug fixes since then
[15:38:52] DPDK only calls that function in ethtool and igb_uio
[15:38:55] neither of which are used by SPDK
[15:39:45] bwalker: can you also check out https://review.gerrithub.io/#/c/spdk/spdk/+/418866/ when you get a chance?
[15:39:50] ok, that's good to know
[15:39,53] ok
[15:41:56] thanks for the help everyone, will be in touch
[15:45:35] If you haven't seen it already, I've updated my patch set, and it's ready to go. Thanks to Seth for his tsocks fix.
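[Editor's note: for the pci_enable_msix() porting question above, the replacement in newer kernels is usually the pci_alloc_irq_vectors() family (available since roughly v4.8) rather than pci_enable_msi_exact(). A hedged sketch of what a call site like the one quoted might look like when ported; `dev`, `my_handler`, and the driver name are placeholders:]

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

/* Sketch: porting a pci_enable_msix(dev, &msix_entry, 1) call site
 * to pci_alloc_irq_vectors(). Illustrative only. */
static int request_one_msix(struct pci_dev *dev, irq_handler_t my_handler,
			    void *ctx)
{
	int nvec;

	/* Ask for exactly one MSI-X vector (min == max == 1). */
	nvec = pci_alloc_irq_vectors(dev, 1, 1, PCI_IRQ_MSIX);
	if (nvec < 0)
		return nvec;

	/* pci_irq_vector() maps a vector index to a Linux IRQ number. */
	return request_irq(pci_irq_vector(dev, 0), my_handler, 0,
			   "my_driver", ctx);
}
```

As noted in the chat, this only matters for DPDK's ethtool and igb_uio components, which SPDK does not use, so simply not compiling those parts also sidesteps the issue.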
[15:45:38] remote: Updated Changes:
[15:45:38] remote: https://review.gerrithub.io/#/c/spdk/spdk/+/414861 test: add support for multiple oses with vagrant
[15:45:38] remote: https://review.gerrithub.io/#/c/spdk/spdk/+/416052 test: improvements to vm_setup.sh and pkgdep.sh
[15:45:38] remote: https://review.gerrithub.io/#/c/spdk/spdk/+/418236 test: vm_setup.sh localization
[15:47:31] first two patches are pushed - i haven't gotten to the vagrant patch yet today
[15:49:56] *** Joins: travis-ci (~travis-ci@ec2-54-162-87-224.compute-1.amazonaws.com)
[15:49:57] (spdk/master) bdev: fix alignment math in spdk_bdev_io_set_buf() (Daniel Verkamp)
[15:49:58] Diff URL: https://github.com/spdk/spdk/compare/58d8a4564b8e...d53545eff791
[15:49:58] *** Parts: travis-ci (~travis-ci@ec2-54-162-87-224.compute-1.amazonaws.com) ()
[15:51:00] *** Joins: travis-ci (~travis-ci@ec2-54-146-55-232.compute-1.amazonaws.com)
[15:51:01] (spdk/master) test: vm_setup.sh localization (Ed Rodriguez)
[15:51:01] Diff URL: https://github.com/spdk/spdk/compare/d53545eff791...2e0df7813ede
[15:51:01] *** Parts: travis-ci (~travis-ci@ec2-54-146-55-232.compute-1.amazonaws.com) ()
[15:52:39] *** Joins: travis-ci (~travis-ci@ec2-54-145-176-190.compute-1.amazonaws.com)
[15:52:40] (spdk/master) ut/lvol: remove bogus lvol_op_comp test (Daniel Verkamp)
[15:52:41] Diff URL: https://github.com/spdk/spdk/compare/2e0df7813ede...56ad1cbea458
[15:52:41] *** Parts: travis-ci (~travis-ci@ec2-54-145-176-190.compute-1.amazonaws.com) ()
[15:54:54] jimharris: thanks. If you want to merge https://review.gerrithub.io/418866 into master now, I'll rebase my vagrant change one more time and make a few tweaks to account for drv's change
[16:03:15] ok - i had to run it through the test pool again due to a latent failure - then will commit and you can rebase
[16:04:14] jimharris: this one could use your review as well (fixing up the error paths in that RDMA patch you pointed out): https://review.gerrithub.io/#/c/spdk/spdk/+/418869/
[16:04:15] Yes, I saw that Jenkins failed the CIT for 418866.
[16:04:35] So you have intermittent failures in your CIT tests too...
[16:21:50] *** Joins: travis-ci (~travis-ci@ec2-54-146-55-232.compute-1.amazonaws.com)
[16:21:51] (spdk/master) test: add SPDK_TEST_NVME_CLI autotest flag (Daniel Verkamp)
[16:21:51] Diff URL: https://github.com/spdk/spdk/compare/56ad1cbea458...fbb481c2c613
[16:21:51] *** Parts: travis-ci (~travis-ci@ec2-54-146-55-232.compute-1.amazonaws.com) ()
[16:57:55] *** Joins: travis-ci (~travis-ci@ec2-54-145-176-190.compute-1.amazonaws.com)
[16:57:56] (spdk/master) nvmf/rdma: check for rdma_get_devices() failure (Daniel Verkamp)
[16:57:56] Diff URL: https://github.com/spdk/spdk/compare/fbb481c2c613...043e5edb1f1d
[16:57:56] *** Parts: travis-ci (~travis-ci@ec2-54-145-176-190.compute-1.amazonaws.com) ()
[17:33:47] *** Quits: peter_turschm (~peter_tur@66.193.132.66) (Remote host closed the connection)
[18:53:13] *** Quits: JamesL1 (cf8c2b51@gateway/web/freenode/ip.207.140.43.81) (Quit: Page closed)
[21:10:52] *** Joins: darsto_ (~darsto@89-68-114-161.dynamic.chello.pl)
[21:11:14] *** Quits: johnmeneghini (~johnmeneg@pool-100-0-53-181.bstnma.fios.verizon.net) (Quit: Leaving.)
[21:12:11] *** Quits: darsto (~darsto@89-68-114-161.dynamic.chello.pl) (Ping timeout: 268 seconds)
[21:12:11] *** darsto_ is now known as darsto
[21:21:24] *** Joins: darsto_ (~darsto@89-68-114-161.dynamic.chello.pl)
[21:22:21] *** Quits: darsto (~darsto@89-68-114-161.dynamic.chello.pl) (Ping timeout: 240 seconds)
[21:22:22] *** darsto_ is now known as darsto
[23:42:02] *** Joins: tomzawadzki (~tomzawadz@134.134.139.72)