[00:28:20] *** Joins: tkulasek (~tkulasek@134.134.139.74)
[01:58:33] *** Quits: dlw (~Thunderbi@114.255.44.143) (Remote host closed the connection)
[01:58:47] *** Joins: dlw (~Thunderbi@114.255.44.143)
[03:27:03] *** Joins: tzawadzki (tomzawadzk@nat/intel/x-uryryolufxckbdic)
[03:27:03] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-yxympcymdwhkoqcq) (Remote host closed the connection)
[03:46:44] *** Quits: tzawadzki (tomzawadzk@nat/intel/x-uryryolufxckbdic) (Remote host closed the connection)
[04:11:32] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 276 seconds)
[06:42:56] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[06:43:20] *** lyan is now known as Guest53213
[06:47:13] *** ChanServ sets mode: +o peluse
[06:57:23] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-1-65-94-202-180.dsl.bell.ca)
[07:39:27] *** Joins: dlw (~Thunderbi@222.131.154.193)
[08:05:57] *** Quits: dlw (~Thunderbi@222.131.154.193) (Ping timeout: 240 seconds)
[10:58:00] *** Quits: tkulasek (~tkulasek@134.134.139.74) (Ping timeout: 260 seconds)
[10:58:18] https://review.gerrithub.io/#/c/spdk/spdk/+/416658/ needs another +2 and then we can commit this ioat patchset from darsto
[11:26:17] *** Joins: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com)
[11:26:18] (spdk/master) ioat/verify: print missing status output on fill-only I/O (Dariusz Stojaczyk)
[11:26:18] Diff URL: https://github.com/spdk/spdk/compare/69a762ca5290...73a7ecfe17b8
[11:26:18] *** Parts: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com) ()
[12:10:04] *** Joins: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com)
[12:10:05] (spdk/master) scsi/ut: remove unused spdk_bdev_read() (Tomasz Zawadzki)
[12:10:05] Diff URL: https://github.com/spdk/spdk/compare/73a7ecfe17b8...89f5d8b72ea1
[12:10:05] *** Parts: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com) ()
[12:15:09] bwalker: did you see this -1 from john barnard? https://review.gerrithub.io/#/c/spdk/spdk/+/416466/
[12:16:37] *** Joins: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com)
[12:16:38] (spdk/master) test: Move poller test to the thread unit test (Ben Walker)
[12:16:39] Diff URL: https://github.com/spdk/spdk/compare/89f5d8b72ea1...5dcd6f631863
[12:16:39] *** Parts: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com) ()
[12:17:19] *** Joins: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com)
[12:17:21] (spdk/master) nvme/pcie: merge physically contiguous SGEs (Dariusz Stojaczyk)
[12:17:21] Diff URL: https://github.com/spdk/spdk/compare/5dcd6f631863...5f146372467f
[12:17:21] *** Parts: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com) ()
[12:20:46] *** Joins: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com)
[12:20:47] (spdk/master) test/virtio: bring back QEMU's emulated virtio-scsi-pci tests (Pawel Niedzwiecki)
[12:20:47] Diff URL: https://github.com/spdk/spdk/compare/5f146372467f...f2da539dbcf6
[12:20:47] *** Parts: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com) ()
[12:24:31] *** Joins: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com)
[12:24:32] (spdk/master) virtio/user: remove leftover kernel vhost references (Dariusz Stojaczyk)
[12:24:32] Diff URL: https://github.com/spdk/spdk/compare/f2da539dbcf6...f0501959a029
[12:24:32] *** Parts: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com) ()
[12:32:11] *** Joins: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com)
[12:32:12] (spdk/master) test/json: Fix. Clear most dependent subsytems first. (Pawel Kaminski)
[12:32:12] Diff URL: https://github.com/spdk/spdk/compare/f0501959a029...c8b03c762f26
[12:32:12] *** Parts: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com) ()
[12:35:09] *** Joins: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com)
[12:35:10] (spdk/master) doc/lvol: add inflate doc (Tomasz Kulasek)
[12:35:10] Diff URL: https://github.com/spdk/spdk/compare/c8b03c762f26...8ee512219e2f
[12:35:10] *** Parts: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com) ()
[13:06:56] pwodkowx: with rte_vhost, you can never be 100% sure about something
[13:07:20] you can't ask for 1000% guarantee, come on
[13:07:58] drv, bwalker - could you poke CI for https://review.gerrithub.io/#/c/spdk/spdk/+/416879/ and https://review.gerrithub.io/#/c/spdk/spdk/+/416878/ please?
[13:56:25] *** Quits: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) (Quit: Leaving.)
[14:29:29] *** Joins: philipp_sk (~Philipp@ktnron0916w-lp130-1-65-94-202-180.dsl.bell.ca)
[14:31:45] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-1-65-94-202-180.dsl.bell.ca) (Ping timeout: 248 seconds)
[14:32:21] *** Quits: philipp_sk (~Philipp@ktnron0916w-lp130-1-65-94-202-180.dsl.bell.ca) (Read error: Connection reset by peer)
[14:32:35] *** Joins: philipp_sk (~Philipp@ktnron0916w-lp130-1-65-94-202-180.dsl.bell.ca)
[14:33:11] *** Joins: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com)
[14:33:12] (spdk/master) scsi: Make scsi_lun_free_io_channel public (Shuhei Matsumoto)
[14:33:13] Diff URL: https://github.com/spdk/spdk/compare/8ee512219e2f...1e5705ee6f18
[14:33:13] *** Parts: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com) ()
[14:37:27] *** Joins: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com)
[14:37:28] (spdk/master) test: remove more sys_sgsw dependencies from test scripts (John Meneghini)
[14:37:28] Diff URL: https://github.com/spdk/spdk/compare/1e5705ee6f18...f04277f0bffc
[14:37:28] *** Parts: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com) ()
[14:38:05] *** Quits: philipp_sk (~Philipp@ktnron0916w-lp130-1-65-94-202-180.dsl.bell.ca) (Ping timeout: 240 seconds)
[14:57:26] *** Joins: bwalker_ (~bwalker@ip70-190-226-244.ph.ph.cox.net)
[14:57:26] *** ChanServ sets mode: +o bwalker_
[15:37:44] *** Joins: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com)
[15:37:45] (spdk/master) test/iscsi: Change the order of delete_bdev to fix nightly test failure. (Shuhei Matsumoto)
[15:37:45] Diff URL: https://github.com/spdk/spdk/compare/f04277f0bffc...546a1148470b
[15:37:45] *** Parts: travis-ci (~travis-ci@ec2-54-197-190-52.compute-1.amazonaws.com) ()
[15:39:57] *** Joins: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com)
[15:39:58] (spdk/master) doc/jsonrpc: Add get_rpc_methods (Shuhei Matsumoto)
[15:39:59] Diff URL: https://github.com/spdk/spdk/compare/546a1148470b...de2492fa8d16
[15:39:59] *** Parts: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com) ()
[15:42:11] jimharris: Thx for the tip regarding using ftrace on the kernel cmdline. Haven't gotten to it yet -- trying to debug another issue.
[15:42:55] *** Joins: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com)
[15:42:56] (spdk/master) ocssd: lba_status bit array on vector completion queue entry (Young Tack Jin)
[15:42:57] Diff URL: https://github.com/spdk/spdk/compare/de2492fa8d16...459af2f0d4c2
[15:42:57] *** Parts: travis-ci (~travis-ci@ec2-54-205-60-99.compute-1.amazonaws.com) ()
[15:44:11] jimharris: Trying to assist a group who launched nvmf_tgt with '-s 4096' based on SPDK release 18.04. Oddly, it looks like the '-s ' option is ignored, and nvmf_tgt gobbles up all of the hugepages -- way more than 2048 hugepages -- and then later fails when trying to mmap() them.
[15:44:49] dpdk always first maps all hugepages, then sorts them and releases ones it doesn't need later
[15:45:02] so if you are checking sometime during initialization, that could be what you are seeing
[15:45:18] jimharris: Addendum: the '-s ' isn't completely ignored as I can see it does the xlate to the '-m N' value to stdout.
[15:46:02] bwalker: Really, it allocates them all then releases them?? Why not just allocate up to N ?
[15:46:25] * jimharris waits for a rant from bwalker
[15:46:26] the algorithm is trying to put together the fewest possible physically contiguous regions
[15:46:34] *** Joins: dlw (~Thunderbi@222.131.154.193)
[15:46:45] they rewrote this in DPDK 18.05 - at least as an option
[15:47:09] so in DPDK 18.05 it stops trying to get big contiguous regions and instead just allocates them 1 hugepage at a time
[15:49:50] This system has 32768 hugepages available for use. The -s option to nvmf_tgt was 4096, so only 2048 hugepages should get used. Per strace, I see an mmap() failure on page 7553, ENOMEM (Cannot allocate memory)
[15:51:21] *** Quits: dlw (~Thunderbi@222.131.154.193) (Ping timeout: 264 seconds)
[15:53:22] PSA: We will be rebooting the server at about 3:00 PM UTC tomorrow. There may be a short blip in connectivity to this channel.
[15:54:53] (for people using the Intel ZNC bouncer)
[15:55:04] ha - was just getting ready to say the same thing
[15:55:51] lhodev - what will be using the rest of the hugepages on this system?
[15:57:25] does it possibly run out of address space? I think there is some sysctl that controls that
[15:58:02] or actually a ulimit
[16:12:21] *** Joins: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com)
[16:12:22] (spdk/master) test/config: add command line args to vm_setup.sh (Seth Howell)
[16:12:22] Diff URL: https://github.com/spdk/spdk/compare/e48f0bf17dd6...b3d61e5d0171
[16:12:22] *** Parts: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com) ()
[16:25:16] *** Joins: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com)
[16:25:17] (spdk/master) blobstore: blob_operation_split_rw unit tests (Tomasz Kulasek)
[16:25:18] Diff URL: https://github.com/spdk/spdk/compare/b3d61e5d0171...71098fd4899d
[16:25:18] *** Parts: travis-ci (~travis-ci@ec2-54-91-62-70.compute-1.amazonaws.com) ()
[16:56:16] bwalker: do you want to reorder some of the patches in this series? https://review.gerrithub.io/#/c/spdk/spdk/+/416869/ - I think a few of these we can get in now pending discussion on the nvmf target buffer pool
[17:02:32] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-04-70-51-156-4.dsl.bell.ca)
[17:19:05] drv, or anyone I guess. When I left a few weeks ago the crypto stuff was working great and if I pull that patch (34) it still does. however after rebase I get a strange (?) linker error (a shitload of them) on the openssl lib that I built myself, the same exact .a file that links fine with patch set 34. Below is a snippet of the error. Any hint as to what changed on master in the last 2 weeks that would cause this? If not I'll start digging in the
[17:19:05] morning...
[17:19:06] LINK test/unit/lib/iscsi/conn.c/conn_ut
[17:19:06] CC test/unit/lib/scsi/scsi_bdev.c/scsi_bdev_ut.o
[17:19:10] CC test/unit/lib/nvmf/ctrlr_discovery.c/ctrlr_discovery_ut.o
[17:19:42] crap, the error won't paste for some reason...
[17:19:47] "/home/peluse/spdk/dpdk/build/lib/libcrypto.a(cryptlib.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC"
[17:19:55] oh, with double quotes it does...
[17:20:06] peluse: you should be able to drop the crypto lib from your Makefile
[17:20:15] we now link the system openssl -lcrypto in SYS_LIBS
[17:20:43] OK, will try that. thanks
[17:20:51] (the reason behind the error is probably that the libspdk.so shared lib won't link if your crypto lib isn't built with -fPIC)
[17:21:49] there's a test timeout on https://review.gerrithub.io/416878, restart CI build please?
[17:22:25] or, shall I look into something to fix it from my side?
[17:24:23] drv, worked like a champ... thanks!!!
[17:24:52] cool
[17:25:21] philipp-sk: it looks like the follow-up patch timed out in the same way, so it's likely something in the patch
[17:27:40] will look, thanks
[17:38:20] looking at logs for both reviews, it is test/nvmf/lvol/nvmf_lvol.sh that fails.
[17:38:22] On my local machine it is running fine: https://pastebin.com/8D8EiJBE
[17:38:57] what's the approach to resolve this?
[17:39:17] drv^
[17:44:19] philipp-sk: are you testing on soft roce?
[17:45:24] fedora-03 is testing with a mellanox rnic, so it tests with SUBSYS_NR=10 rather than SUBSYS_NR=1
[17:45:45] drv: 03:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
[17:46:17] it is going down the "Using software RDMA, lowering number of NVMeOF subsystems." branch, so maybe the detection is wrong
[17:46:44] you can try removing the check_ip_is_soft_roce block in the test script
[17:46:57] or just change it to set SUBSYS_NR higher
[17:50:50] ok, started getting same errors. thanks, drv
[18:51:22] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-04-70-51-156-4.dsl.bell.ca) (Ping timeout: 264 seconds)
[19:01:32] *** Quits: Guest53213 (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[19:54:00] *** Joins: dlw (~Thunderbi@114.255.44.143)
[19:58:51] FYI this week's Asia community meeting falls on a US holiday, I doubt anyone from the US will be calling in - I know I won't be :)
[21:02:46] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 264 seconds)
[21:03:49] *** Joins: dlw (~Thunderbi@114.255.44.143)
[21:19:53] jimharris: Sorry for the slow reply. Re: "what will be using the rest of the hugepages", ultimately there will be secondary processes that will make use of the hugepages, but it's curious (to me) why the failure occurred. Even considering my new found knowledge thanks to you guys, that DPDK would alloc all the pages first and then release them, how would we incur an ENOMEM condition if all the hugepages were free prior to
[21:19:54] nvmf_tgt launch? I'd ASSume "something" else consumed some hugepages first, but thus far I'm not aware of anything that has.
[21:59:59] drv: Thx Daniel for the heads up to check on some resource vals; i.e. ulimit/rlimit. Will check on that when the system is brought back online tomorrow.
[22:53:48] lhodev: My guess is that your ulimit is lower than 4GB. DPDK actually handles mmap ENOMEM failure, but if it didn't manage to allocate the requested amount of memory at the end, *then* it fails.
[22:55:48] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de)
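
Editor's note on the hugepage thread above: the '-s 4096' arithmetic, the "map everything, sort, release the rest" behaviour described for DPDK before 18.05, and the ulimit guess can be modelled roughly in Python. This is only an illustrative sketch, not DPDK or SPDK code. The function names are hypothetical, the 2 MiB hugepage size is inferred from the "4096 -> 2048 hugepages" figure in the log, and treating RLIMIT_AS (ulimit -v) as the limit in question is an assumption, since the log only says "ulimit".

```python
import resource

# Assumption: 2 MiB hugepages, consistent with "-s 4096" -> 2048 hugepages in the log.
HUGEPAGE_KB = 2048


def hugepages_needed(mem_mb, hugepage_kb=HUGEPAGE_KB):
    """Hugepages required to back mem_mb megabytes, rounded up."""
    return (mem_mb * 1024 + hugepage_kb - 1) // hugepage_kb


def pick_contiguous_runs(page_frames, needed):
    """Toy model of 'map all, sort, release what is not needed'.

    page_frames: physical frame numbers of every hugepage that was mapped.
    Keeps the largest physically contiguous runs first, so the kept pages
    form as few regions as possible. Returns (kept, released).
    """
    runs = []
    for pfn in sorted(page_frames):
        if runs and pfn == runs[-1][-1] + 1:
            runs[-1].append(pfn)   # extends the current contiguous run
        else:
            runs.append([pfn])     # starts a new run
    runs.sort(key=len, reverse=True)

    kept = []
    for run in runs:
        if len(kept) >= needed:
            break
        kept.extend(run[: needed - len(kept)])
    released = sorted(set(page_frames) - set(kept))
    return kept, released


def fits_limit(requested_bytes, soft_limit):
    """True if requested_bytes fits under an address-space soft limit."""
    return soft_limit == resource.RLIM_INFINITY or requested_bytes <= soft_limit


def current_limit_allows(mem_mb):
    """Check this process's RLIMIT_AS (assumed to be the relevant ulimit)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return fits_limit(mem_mb * 1024 * 1024, soft)
```

In this model the peak mapping is the full set of free hugepages, not just the final 2048, so a limit that comfortably covers the '-s' value could in principle still be hit while everything is mapped. Whether that is what happened on lhodev's system is not established in the log.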