[00:17:38] *** Joins: vmysak (~vmysak@192.55.54.38)
[00:19:57] *** Quits: vmysak (~vmysak@192.55.54.38) (Remote host closed the connection)
[00:35:13] @peluse do you still need some help with LD_PRELOAD and gdb? I have done this before
[01:12:46] *** Joins: sherlock1122_ (~sherlock1@119.167.96.147)
[02:45:18] *** Quits: sherlock1122_ (~sherlock1@119.167.96.147) (Remote host closed the connection)
[05:07:47] *** Joins: tomzawadzki (~tomzawadz@134.134.139.76)
[05:39:39] alekseymmm, yeah, I don't need to debug now but I'm sure I will. Please pass on any tips! thanks
[05:50:21] there are a few options to debug with gdb and LD_PRELOAD
[05:50:40] the first one
[05:50:54] gdb --args env LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin fio examples/bdev/fio_plugin/peluse_config.ini
[05:50:59] the second
[05:51:06] gdb fio
[05:51:37] set exec-wrapper env 'LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin'
[05:51:48] set args examples/bdev/fio_plugin/peluse_config.ini
[05:51:56] run
[05:52:39] the third option (I usually use it). Run gdb in eclipse with a .gdb_init file specified. In .gdb_init:
[05:53:00] set exec-wrapper env 'LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin'
[06:08:04] ahh cool. I'll try the first one real quick
[06:10:16] works great, thanks! I didn't know about the "env" in there, was trying without it before. appreciate the help!!
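[Editor's note: the three LD_PRELOAD debugging approaches above, collected into one copy-pasteable reference. This is a sketch assuming the in-tree fio plugin path used in the session; peluse_config.ini is the user's own fio job file, not part of the SPDK tree.]

    # Option 1: wrap fio in env directly on the gdb command line
    gdb --args env LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin \
        fio examples/bdev/fio_plugin/peluse_config.ini

    # Option 2: set the exec wrapper from inside gdb
    gdb fio
    (gdb) set exec-wrapper env 'LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin'
    (gdb) set args examples/bdev/fio_plugin/peluse_config.ini
    (gdb) run

    # Option 3: keep the wrapper in a gdb init file (also usable from Eclipse)
    echo "set exec-wrapper env 'LD_PRELOAD=examples/bdev/fio_plugin/fio_plugin'" > .gdb_init
    gdb -x .gdb_init --args fio examples/bdev/fio_plugin/peluse_config.ini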
[06:42:45] jimharris, FYI crypto is passing on 2 of the 3 crypto-enabled systems in CI now. The 3rd failure is a segfault in identify. 2 other systems failed, one I recognize as a latent failure, the other I don't, but it doesn't appear to be crypto. Will resubmit and see how the latent failures do. I'll send a list of them out after looking into them more closely
[07:32:50] *** Quits: tomzawadzki (~tomzawadz@134.134.139.76) (Ping timeout: 272 seconds)
[08:00:10] peluse: cool - the note you made earlier about I/O that was too big, did you get that issue resolved?
[08:00:31] yeah, I just had my assert in the wrong place :)
[08:00:38] btw, just merged the patch for getting the IO buf before splitting
[08:01:17] crypto passed on the enabled machines twice now (in a row, heh). I see a segfault in identify 2x in a row also that I'm starting to look at. a scan-build issue on fedora 9 was the other issue, I submitted a patch for that
[08:01:27] cool
[08:02:18] can you take a fast gander at this - just curious if you've seen it before (scroll to the very end) https://ci.spdk.io/spdk/builds/review/5f4f73add394d7bc1ea005bc486d99ab17a8cf6b.1536325951/ubuntu17.10/build.log
[08:03:17] *** Joins: travis-ci (~travis-ci@ec2-54-82-100-208.compute-1.amazonaws.com)
[08:03:18] (spdk/master) bdev: call spdk_bdev_io_get_buf before splitting (Jim Harris)
[08:03:18] Diff URL: https://github.com/spdk/spdk/compare/9ad05b31425e...01e7c02e152c
[08:03:18] *** Parts: travis-ci (~travis-ci@ec2-54-82-100-208.compute-1.amazonaws.com) ()
[08:04:22] I've been remiss in looking at nightly build status but it doesn't look good at first glance. Is anyone already looking into these - do we have some known issues that are WIP? https://ci.spdk.io/spdk/nightly_status.html
[08:05:56] jimharris, heh, looks like I submitted a patch for the same scan-build issue that you did, didn't notice yours :)
[08:09:35] peluse: you realize it's after 5pm on Friday for darsto :)
[08:10:21] to keep you unblocked, can you just take your snprintf approach, but if get_socket_id returns -1, print an error message and use 0 instead?
[08:10:34] then you can change it later once darsto's stuff goes in
[08:10:37] so what's your point? LOL, it's not urgent, I mentioned in the email I can hard-code 0 for now
[08:11:29] the approach I suggested would be fine to be checked in
[08:11:43] bueno
[08:13:45] that identify failure you saw doesn't look familiar to me
[08:18:06] ugh, thanks
[08:18:31] can you mark it as a latent failure and re-run it?
[08:22:36] yup, working on it now - rebasing and updating the socket_id deal. Where do I mark it as latent (I know where it's listed)?
[08:22:52] through the internal spdk status page
[08:25:12] OK, I'm not sure it's latent though. Failed 3x in a row but only on that machine. Just pushed w/rebase, so if it fails 4x I'd say it likely has something to do with crypto. Will have to try on my own ubuntu instance I guess
[08:26:51] that ubuntu 17 is a VM, I guess I should be able to use the vmsetup stuff and repro that locally, will give that a go
[08:29:04] btw you've changed lib/bdev/bdev.c to mode 100755 in your latest patch
[08:30:05] hmm, wonder how that happened. thanks for noticing, will fix on next push
[08:33:41] *** Quits: guerby (~guerby@april/board/guerby) (Remote host closed the connection)
[08:34:34] *** Joins: guerby (~guerby@april/board/guerby)
[09:05:17] *** Joins: travis-ci (~travis-ci@ec2-54-160-214-80.compute-1.amazonaws.com)
[09:05:18] (spdk/master) build: remove DPDK_DIR (Jim Harris)
[09:05:19] Diff URL: https://github.com/spdk/spdk/compare/01e7c02e152c...ff6299f831ab
[09:05:19] *** Parts: travis-ci (~travis-ci@ec2-54-160-214-80.compute-1.amazonaws.com) ()
[09:34:15] jimharris, yeah everything is now passing on crypto except the nvme identify failure, 4x in a row on ubuntu 17.10. As soon as I get my VM going I'll see how that can be related
[09:44:54] could you try - just as a test - modifying line 90 in test/nvme/nvme.sh
[09:45:10] change the 2048 to 4096
[09:45:31] this test failure is in the nvme/identify app, but this is running in multi-process mode
[09:45:53] meaning that the stub app has probably run some of your crypto stuff - it's possible that's allocating more memory than expected
[09:46:12] if this works, then we still need to fix the nvme/identify app
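[Editor's note: the suggested tweak as a one-liner. A sketch only - the log says line 90 of test/nvme/nvme.sh contains the value 2048 (presumably a memory size for the multi-process test), but the exact contents of that line are not shown here.]

    # bump the value on line 90 of the nvme test script from 2048 to 4096
    sed -i '90s/2048/4096/' test/nvme/nvme.sh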
[11:27:48] jimharris, yeah but probably not before I have to take off. I'll let you know how it goes though
[11:27:51] thanks!
[11:28:29] well, actually I'll temporarily add that change to my patch and push it (instead of repro'ing locally) - I have time to do that real quick
[13:29:25] *** Joins: travis-ci (~travis-ci@ec2-54-82-100-208.compute-1.amazonaws.com)
[13:29:26] (spdk/master) doc: clarify alternate dpdk options (Lance Hartmann)
[13:29:26] Diff URL: https://github.com/spdk/spdk/compare/1c79fadb23c4...b6938efd0b8a
[13:29:26] *** Parts: travis-ci (~travis-ci@ec2-54-82-100-208.compute-1.amazonaws.com) ()
[13:36:54] *** Joins: bluebird (~bluebird@p5DE955DE.dip0.t-ipconnect.de)
[13:38:55] Hi, I'm trying to get vhost-user-scsi-pci and/or vhost-user-blk-pci working in Windows 10. I can see the device in disk manager, but I can't even partition it - it gives random I/O errors. The spdk page states that it works with recent versions of Windows... anything special I need to do? It works perfectly with Linux.
[13:40:38] hi bluebird
[13:41:09] could you send a note to the spdk mailing list on this? i don't have an answer for you offhand
[13:41:47] sure, thanks anyway :-)
[13:42:01] a lot of the vhost experts are based in Poland and can take a look after the weekend - you may also consider filing an issue on GitHub
[13:42:36] i do know that this worked at one point - unfortunately we don't have automated vhost tests in place for Windows VMs, so you're probably hitting a regression
[13:44:09] could you try starting the vhost target, start your windows vm, then before trying to partition, enable debug flags on the vhost target via RPC?
[13:44:17] one sec and i'll get you a command line to try
[13:44:23] to enable the debug flags
[13:44:32] then we can get more info on exactly why the I/O is failing
[13:45:38] yeah, didn't even know there is an extra debug option, it's already quite verbose without one
[13:46:44] agreed - that's something we need to work on - especially vhost, since DPDK by default prints stuff out for every single vhost message
[13:47:42] what are you using for the backing storage?
[13:48:26] construct_aio_bdev on an lvm volume, I also tried a raw preallocated file on ext4
[13:50:02] and it fails for both user-scsi and user-blk?
[13:50:57] yes, scsi seems to be slightly better but still unusable
[13:51:16] ok
[13:51:32] so the rpc command you'll want (assuming your pwd is the root of the spdk repository)
[13:51:56] scripts/rpc.py set_trace_flag
[13:52:14] there's no way currently to just enable all of them, so try enabling these:
[13:52:41] aio, bdev, vhost_scsi, vhost_scsi_queue, vhost_scsi_data, scsi
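[Editor's note: the flag list above as runnable commands - a sketch assuming one set_trace_flag call per flag, run from the repository root. As the discussion below notes, these flags only produce output on a build configured with ./configure --enable-debug.]

    # enable each debug trace flag on the running vhost target via RPC
    for flag in aio bdev vhost_scsi vhost_scsi_queue vhost_scsi_data scsi; do
        scripts/rpc.py set_trace_flag "$flag"
    done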
[14:00:22] hm, I'm not getting any additional output
[14:00:41] oh - sorry, you'll need to ./configure --enable-debug
[14:00:50] and rebuild the vhost target
[14:01:22] that's an spdk bug - if you try to set these flags on the command line on a non-debug build, it will tell you outright - the RPCs should be doing the same
[14:01:28] i'll go file that now
[14:02:50] ah, there's the output
[14:03:59] one odd thing already: during boot, which takes way longer than it should, it keeps logging "bdev_aio.c: 137:bdev_aio_readv: *INFO*: read 1 iovs size 512 to off: 0", as if it was retrying a read of a 512-byte sector, but the block size of the volume is 4096
[14:05:44] well, unless you have an idea what to try off the top of your head, it's probably best to collect all the output and send it to the mailing list
[14:05:57] is it natively 4096 or did you explicitly set it to 4096 with the AIO config?
[14:06:15] natively
[14:06:43] did you enable the trace flags before or after booting the VM?
[14:06:49] before
[14:07:15] yeah - i'm not surprised that takes a long time
[14:07:45] could you try a test with a native 512 byte block size?
[14:08:40] yes, one moment
[14:13:04] no difference. debug log looks the same as with 4096
[14:13:40] are you using a config file or RPCs to set up the aio bdev?
[14:13:56] rpc, would a config file make a difference?
[14:14:12] it shouldn't - what's the exact rpc command line you're using?
[14:14:30] * jimharris should probably make sure it doesn't make a difference
[14:16:01] /home/bluebird/SCM/spdk/scripts/rpc.py construct_aio_bdev /dev/sdd aio.0 && /home/bluebird/SCM/spdk/scripts/rpc.py construct_vhost_scsi_controller vhost.1 && /home/bluebird/SCM/spdk/scripts/rpc.py add_vhost_scsi_lun vhost.1 0 aio.0
[14:16:55] sdd is a usb stick
[14:17:11] which has a 512 byte LBA?
[14:17:15] yes
[14:17:47] create_aio_disk() in lib/bdev/aio/bdev_aio.c
[14:18:09] see the call to spdk_fd_get_blocklen()
[14:18:27] this is where, if you don't specify a block size (which you aren't), it tries to figure it out for you
[14:19:13] could you instrument or use a debugger to see what that is returning?
[14:20:06] yes, one moment
[14:20:25] np
[14:22:03] it returns 512, just like blockdev --getbsz /dev/sdd
[14:22:33] ok
[14:22:44] so we know for sure it's not a 512 vs. 4096 problem then
[14:24:24] when you create the github issue, please attach the relevant part of the extra debug output from when you try to do the partitioning
[14:24:36] ok
[15:21:25] *** Quits: bluebird (~bluebird@p5DE955DE.dip0.t-ipconnect.de) (Quit: Leaving)
[16:32:12] jimharris, FYI I can repro the identify leak issue on one of my systems but not on the other - same code, same OS, but possibly some package version differences. Will keep ya posted as I look into it over the weekend
[16:35:02] ...and on the one I can reproduce it on, if I compile w/o crypto (even though it's not in the IO path on this test) it does indeed work
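[Editor's note: one way to act on the 14:19:13 suggestion to check what spdk_fd_get_blocklen() returns, for readers hitting the same question. A sketch only - attaching by pid and the binary name "vhost" are assumptions, not from the log; bluebird's actual method isn't shown.]

    # attach gdb to the already-running vhost target
    gdb -p "$(pidof vhost)"
    (gdb) break spdk_fd_get_blocklen
    (gdb) continue
    # re-run the construct_aio_bdev RPC from another shell; when the
    # breakpoint hits, "finish" runs to the end of the function and
    # prints its return value (the detected block size)
    (gdb) finish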