[00:03:14] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[00:26:12] *** Joins: tomzawadzki (~tomzawadz@192.55.54.40)
[03:09:54] *** Quits: igor__ (84ed9a7e@gateway/web/freenode/ip.132.237.154.126) (Ping timeout: 260 seconds)
[03:55:18] *** Quits: tkulasek (~tkulasek@192.55.54.44) (Quit: Leaving)
[03:55:44] *** Joins: tkulasek (~tkulasek@134.134.139.83)
[04:01:51] *** Quits: tomzawadzki (~tomzawadz@192.55.54.40) (Ping timeout: 255 seconds)
[05:23:41] *** Quits: darsto (~darsto@89-68-12-100.dynamic.chello.pl) (*.net *.split)
[05:23:43] *** Quits: mszwed (mszwed@nat/intel/x-ixwswxazgbkmxiwm) (*.net *.split)
[05:23:43] *** Quits: destrudo (~destrudo@tomba.sonic.net) (*.net *.split)
[05:24:57] *** Quits: tkulasek (~tkulasek@134.134.139.83) (Remote host closed the connection)
[05:26:09] *** Joins: darsto (~darsto@89-68-12-100.dynamic.chello.pl)
[05:26:09] *** Joins: mszwed (mszwed@nat/intel/x-ixwswxazgbkmxiwm)
[05:26:09] *** Joins: destrudo (~destrudo@tomba.sonic.net)
[05:26:58] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-xusctwhhqorumvob)
[06:36:33] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-xusctwhhqorumvob) (Ping timeout: 248 seconds)
[07:53:00] Reminder: community meeting is about 13 hours away from now. See http://www.spdk.io/community/ for details and see trello for agenda at https://trello.com/b/DvM7XayJ/spdk-community-meeting-agenda
[08:01:09] *** Quits: darsto (~darsto@89-68-12-100.dynamic.chello.pl) (*.net *.split)
[08:01:13] *** Quits: mszwed (mszwed@nat/intel/x-ixwswxazgbkmxiwm) (*.net *.split)
[08:01:13] *** Quits: destrudo (~destrudo@tomba.sonic.net) (*.net *.split)
[08:01:14] *** Quits: dlw (~Thunderbi@114.255.44.143) (*.net *.split)
[08:01:15] *** Quits: vermavis (vermavis@nat/intel/x-lydkabgktqkxurqo) (*.net *.split)
[08:01:15] *** Quits: sethhowe (~sethhowe@192.55.54.42) (*.net *.split)
[08:01:15] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (*.net *.split)
[08:01:17] *** Quits: stefanha (~stefanha@yuzuki.vmsplice.net) (*.net *.split)
[08:01:18] *** Quits: ChanServ (ChanServ@services.) (*.net *.split)
[08:01:46] *** Joins: stefanha (~stefanha@yuzuki.vmsplice.net)
[08:01:46] *** Joins: ChanServ (ChanServ@services.)
[08:01:46] *** orwell.freenode.net sets mode: +o ChanServ
[08:02:51] *** Joins: darsto (~darsto@89-68-12-100.dynamic.chello.pl)
[08:02:51] *** Joins: mszwed (mszwed@nat/intel/x-ixwswxazgbkmxiwm)
[08:02:51] *** Joins: destrudo (~destrudo@tomba.sonic.net)
[08:04:07] *** Joins: dlw (~Thunderbi@114.255.44.143)
[08:04:07] *** Joins: vermavis (vermavis@nat/intel/x-lydkabgktqkxurqo)
[08:04:07] *** Joins: sethhowe (~sethhowe@192.55.54.42)
[08:04:07] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[08:43:36] FreeBSD contigmem revert: https://review.gerrithub.io/#/c/spdk/dpdk/+/409047/
[08:47:03] and associated SPDK patch to update the DPDK submodule: https://review.gerrithub.io/#/c/spdk/spdk/+/409048/
[09:07:05] the submodule update will probably need to be respun once the DPDK change is merged - the commit hash will change due to Gerrit's rebase on merge
[09:28:18] will do - just wanted to run it through once as a sanity check
[09:34:03] bwalker: please take a look at Gang's QoS RPC fix: https://review.gerrithub.io/#/c/spdk/spdk/+/408937/
[09:34:14] also, we really need tests for this stuff...
[09:53:59] bwalker: https://review.gerrithub.io/#/c/spdk/dpdk/+/409047/
[10:28:33] *** Joins: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com)
[10:28:34] (spdk/master) doc: changelog update (Maciej Szwed)
[10:28:34] Diff URL: https://github.com/spdk/spdk/compare/130307862b12...ee3213da3e48
[10:28:34] *** Parts: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com) ()
[10:41:43] *** Joins: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com)
[10:41:44] (spdk/master) nbd: fix JSON config dump (Pawel Wodkowski)
[10:41:44] Diff URL: https://github.com/spdk/spdk/compare/d69ccc823a8d...629405ddfe2d
[10:41:44] *** Parts: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com) ()
[10:55:57] *** Joins: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com)
[10:55:58] (spdk/master) bdev/qos: set the enabled flag through the RPC method (GangCao)
[10:55:58] Diff URL: https://github.com/spdk/spdk/compare/629405ddfe2d...804ebf9985e9
[10:55:58] *** Parts: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com) ()
[11:05:22] *** Parts: darsto (~darsto@89-68-12-100.dynamic.chello.pl) ("Leaving")
[11:14:37] *** Joins: mhae (0fd3c95b@gateway/web/freenode/ip.15.211.201.91)
[11:21:57] I was wondering if anyone could give me some pointers in troubleshooting the vhost target example. I don't get the VM image to boot. I left out the Nvme and Malloc1 bdev.
[11:22:59] I'm using SPDK 18.01.1 and QEMU 2.12 on CentOS 7
[11:24:23] The qemu command line I'm using is: /usr/local/bin/qemu-system-x86_64 --enable-kvm -cpu host,-pmu -smp 2 -m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 -drive file=t2.qcow2,if=none,id=disk -device ide-hd,drive=disk,bootindex=0 -chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4 -chardev stdio,id=seab
[11:48:57] drv, FYI that bug I thought was in ioat I believe is in my write path in the crypto vbdev.. will keep ya posted I'm hot on its trail
[11:51:31] *** Joins: darsto (~darsto@89-68-12-100.dynamic.chello.pl)
[12:00:42] mhae: QEMU 2.12 has changes that will make SPDK vhost unable to boot. We fixed this in current master and those fixes will be part of the 18.04 release
[12:03:45] we may need to do a 18.01.2 release with that patch - pwodkowx, could you or darsto see if that patch applies cleanly to 18.01?
[12:05:43] do we want this ASAP or we want final solution?
[12:06:39] jimharris: sorry, what patch?
[12:06:45] i had to leave #spdk for a while
[12:07:06] for QEMU 2.12
[12:10:31] ls
[12:14:22] jimharris: for https://review.gerrithub.io/#/c/spdk/spdk/+/407236/, is there any ETA this will be merged? This is conflicting with reworking bdev_iscsi for JSON configuration
[12:15:01] QEMU 2.12 support for SPDK 18.01.x here - https://review.gerrithub.io/c/spdk/spdk/+/409073
[12:15:29] verified to boot
[12:21:24] pwodkowx,darsto: Thanks for the quick reply. I just compiled master and I'm still not able to boot. Note, my CentOS7 host is inside a VM if that matters. I see the "virtio is now ready for processing." message but that's the last thing I see in the vhost output.
[12:25:24] hmmm
[12:26:19] mhae: can you just check https://github.com/spdk/qemu/tree/spdk-2.12-pre to see if you have working env?
[12:26:40] it should be followed by "Started poller for vhost controller [...]" message
[12:26:56] @pwodkowx: Will do
[12:27:30] pwodkowx mhae: we also tested against upstream QEMU. klateck did a jenkins job just for that
[12:29:07] @darsto: Yep. When I leave out the vhost user device and add it later (via qemu monitor) then I see "Started poller ... " messages.
[12:39:02] I'm trying to test stable qemu 2.12 now
[12:45:13] it still works
[12:46:00] darsto: What qemu command line are you using? Or is there another example?
[12:46:27] i tried the exact same you pasted here
[12:49:00] mhae: if you shutdown the VM while it's stuck on booting, do you get any output in SPDK vhost?
[12:52:12] darsto: The last message qemu outputs is: Searching bootorder for: /pci@i0cf8/*@4/*@0/*@0,0. When I reset the VM (Shutdown doesn't work) then I get "VHOST_CONFIG: /var/tmp/vhost.0: read message VHOST_USER_GET_VRING_BASE" and vhost peer closed after two more messages
[13:01:37] darsto: a couple of quick fixes for the new rte_vhost API with older DPDK: https://review.gerrithub.io/#/c/spdk/spdk/+/409076/ and https://review.gerrithub.io/#/c/spdk/spdk/+/409077/
[13:01:39] mhae: and what is that?
[13:02:03] I'm not getting anything like "Searching bootorder for: [...]"
[13:03:30] darsto: Did you enable the seabios debug output (-chardev stdio,id=seabios -device isa-debugcon,iob)?
[13:06:59] Nope. I believe IRC truncated your QEMU config when you pasted it
[13:07:51] well, it's unrelated then
[13:10:11] I don't get spdk-2.12-pre to compile. Would it make sense to try it on real HW (Ubuntu)? Or is there any more debug log information?
[13:12:52] I have... no clue
[13:13:58] it seems like "virtio is now ready..." message is always followed by "Started poller..." or an error message
[13:14:34] can you post full vhost log somewhere?
[13:17:06] pwodkowx: we dropped the 18.04 tag for that patch since it wasn't critical for the 18.04 release - but didn't realize the bdev_iscsi JSON dependency - i'll take some time to review it this afternoon
[13:19:02] drv: i think https://review.gerrithub.io/#/c/spdk/spdk/+/407236/ looks good - i'm ok getting this in now just to unblock some of the other JSON work that pwodkowx mentioned
[13:19:33] OK - I wasn't sure if this correctly cleans up everything on shutdown, but I think it's not worse than what was there before, at least
[13:19:43] and pawelkax's patches did some more cleanup on that, so I'm OK with putting it in
[13:22:15] darsto: https://pastebin.com/QnbNG4bq
[13:23:58] *** Joins: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com)
[13:23:59] (spdk/master) iscsi_initiator: Make the disconnect in async mode. (Ziye Yang)
[13:23:59] Diff URL: https://github.com/spdk/spdk/compare/804ebf9985e9...9df618878152
[13:23:59] *** Parts: travis-ci (~travis-ci@ec2-54-92-168-171.compute-1.amazonaws.com) ()
[13:35:46] mhae: can you try adding format=qcow2 to your -drive option?
[13:37:07] *** Joins: travis-ci (~travis-ci@ec2-54-81-232-247.compute-1.amazonaws.com)
[13:37:08] (spdk/master) Update DPDK submodule to pull in contigmem revert. (Jim Harris)
[13:37:08] Diff URL: https://github.com/spdk/spdk/compare/9df618878152...396bedd09fb4
[13:37:08] *** Parts: travis-ci (~travis-ci@ec2-54-81-232-247.compute-1.amazonaws.com) ()
[13:37:15] or format=raw if it's not a real qcow2
[13:39:23] darsto: Doesn't make a difference. If I remove the vhost device config then everything boots nicely (/usr/local/bin/qemu-system-x86_64 --enable-kvm -cpu host,-pmu -smp 2 -m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 -drive file=t2.qcow2,format=qcow2,if=none,id=disk -device ide-hd,drive=disk,bootindex=0 -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios)
[13:39:42] darsto: I wonder if the boot order somehow doesn't work which could be BIOS related.
[13:41:04] yep, my bios starts iterating from QEMU drive
[13:41:12] I'm getting:
[13:41:14] ata1-0: QEMU HARDDISK ATA-7 Hard-Disk (20480 MiBytes)
[13:41:14] Searching bootorder for: /pci@i0cf8/*@1,1/drive@1/disk@0
[13:41:36] instead of "Searching bootorder for: /pci@i0cf8/*@4/*@0/*@0,0" as in your case
[13:43:51] darsto: What BIOS are you using? SeaBIOS (version rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org)?
[13:43:58] yep, exactly
[13:44:29] Do you know your KVM version? What's your host OS?
[13:50:13] i'm still afraid it's software issue
[13:50:31] I'm using Debian 4.9.30-2+deb9u5 (2017-09-19) with default kvm
[13:52:56] could you add some debug prints in SPDK in lib/vhost/vhost.c:1012 ?
[13:53:03] is this function even entered?
[13:53:31] Will do
[13:55:39] pwodkowx: https://review.gerrithub.io/#/c/spdk/spdk/+/409024/ looks good but i posted a comment
[14:07:42] darsto: yes, it makes it in start_device.
[14:11:16] jimharris: what do you want to do about the AER errors? I think Changpeng's patch is a good idea, but as you said, it doesn't fix the problem on P4500
[14:11:18] https://review.gerrithub.io/#/c/spdk/spdk/+/408772/
[14:11:27] (I think the P4500 firmware is just buggy in this case)
[14:11:56] i think we need to quirk it or something inside of SPDK
[14:12:16] mhae: It should really print some logs afterwards
[14:12:20] i'm worried about who knows how many people starting to see these error messages after 18.04 is released and thinking there's a problem
[14:12:21] this is weird
[14:12:50] yeah, I don't know how other drives behave with that bit set
[14:13:18] crazy thought - does it make sense to squelch any error messages from internally generated requests?
[14:13:29] possibly, although we probably do want to know this fails
[14:14:19] darsto: Sorry, I wasn't clear. It cleanly runs through this function. I added debug stmts at the beginning and end.
[14:14:55] darsto: Somehow it seems that the boot sequence gets interrupted and it doesn't continue on to the image drive
[14:15:02] we could also do something like "only set ns_attr_notice for fabrics controllers or if controller supports NS management", but that's not really great
[14:16:19] mhae: ahh, I see. we recently made these missing logs appear in debug builds only
[14:16:52] so now we could assume I/O pollers *are* starting
[14:17:36] mhae: it might be waiting for some inquiry I/O to complete
[14:18:12] darsto: can we trace these?
[14:18:34] yep, could you make SPDK with:
[14:18:39] CONFIG_DEBUG=y make
[14:19:24] and then add '-t vhost_scsi -t vhost_scsi_queue -t vhost_scsi_data' to vhost command line?
[14:21:29] drv: is this intended? should SPDK_INFOLOG be hidden for non-debug builds?
[14:24:12] darsto: I got halfway through changing that - the default print level is now INFO (so INFO logs should get printed by default), but the trace flags are all set to false by default right now
[14:24:22] and there's no way to enable trace flags in a non-debug build
[14:24:52] there's too much verbose stuff at INFO level currently to change that
[14:25:04] see https://review.gerrithub.io/#/c/spdk/spdk/+/405935/
[14:26:05] jimharris: I pushed an updated version of changpe1's patch that adds the ns_manage || fabrics check - I think this is the least bad option for now
[14:26:15] darsto: I see some request processing messages from vhost_scsi but as soon as all the start_device functions have been called, there is no additional message. When you look at my qemu cmd line, it does look correct, right?
[14:26:45] yes, qemu is fine
[14:27:04] could you share the log?
[14:27:58] darsto: https://pastebin.com/QyzavHs1
[14:28:06] jimharris: actually, scratch that - the P4500 firmware I have reports that it supports Namespace Management, so it is just totally busted
[14:28:21] yeah
[14:28:27] ugh
[14:28:46] it should definitely support the NS Attribute Changed event if it supports Namespace Management
[14:28:51] so maybe a quirk is the way to go
[14:31:53] I'm going to put changpe1's patch back the way it was
[14:38:34] btw. we must have some dependency issue in makefiles. I used to see SPDK_INFOLOGS even without setting traces explicitly. They are gone now after a `make clean`
[14:40:50] *** Joins: cebruns (~quassel@192.55.54.44)
[14:58:08] is there any advantage to having separate apt-get/dnf/zypper/pkg invocations instead of putting everything in one? (in pkgdep.sh)
[15:04:15] mhae: so it's just that seabios stops sending its I/O at some point
[15:05:06] one suspicious place could be eventfd_write() in start_device()
[15:06:35] we're interrupting queues that aren't really 'initialized'
[15:07:09] seabios should handle it, but maybe it doesn't and gets stuck
[15:07:52] darsto: I commented the code that does the eventfd_write but it didn't make a difference.
[15:07:53] you could try removing that entire loop calling eventfd_write() in start_device()
[15:08:09] darsto: Just did. No change )-:
[15:08:09] yeah, other than that I'm out of ideas
[15:08:53] I'm trying it on a laptop now that runs Ubuntu.
[15:10:17] Thanks for your help
[15:21:07] jimharris: I think we can merge the calls - the only benefit to the way it is now is that each group of packages gets a comment to explain why we need it
[15:21:22] Ugh, I got it to work on Ubuntu. The SeaBIOS version is slightly different. I'm using qemu 2.10.1 and SPDK TOT.
[15:22:25] Maybe running the host inside a VM is problematic. Very strange.
[15:27:42] *** Joins: travis-ci (~travis-ci@ec2-184-73-98-158.compute-1.amazonaws.com)
[15:27:43] (spdk/master) nvme: set AER configuration bits based on NVMe version (Changpeng Liu)
[15:27:43] Diff URL: https://github.com/spdk/spdk/compare/396bedd09fb4...f0f3a48f4053
[15:27:43] *** Parts: travis-ci (~travis-ci@ec2-184-73-98-158.compute-1.amazonaws.com) ()
[15:30:17] *** Joins: travis-ci (~travis-ci@ec2-184-73-98-158.compute-1.amazonaws.com)
[15:30:18] (spdk/master) vhost: fix build with DPDK 17.02 and older (Daniel Verkamp)
[15:30:18] Diff URL: https://github.com/spdk/spdk/compare/f0f3a48f4053...e6acab993064
[15:30:18] *** Parts: travis-ci (~travis-ci@ec2-184-73-98-158.compute-1.amazonaws.com) ()
[15:51:19] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[15:52:06] * peluse is losing what little hair he has left debugging a multi-core issue in the crypto vbdev....
[15:52:43] if anyone has a minute and wants to look and chat or suggest a good steep cliff w/o railing that I can go for a drive on, please take a gander.. https://gist.github.com/peluse/bad8ffacfb800098dd5c6d16cba8967a
[16:09:14] *** Quits: mhae (0fd3c95b@gateway/web/freenode/ip.15.211.201.91) (Ping timeout: 260 seconds)
[16:55:54] peluse: I highly recommend AZ 88 out past Apache Junction
[16:56:52] LOL, I should go buy a new really fast car first :)
[16:57:11] looking at the paste now
[17:18:07] it looks like there's an I/O completing after the corresponding io_channel is gone, which is not supposed to happen
[17:18:14] but figuring out why it's happening may be challenging
[17:21:33] yeah, on the + side the address is the same every time so I've been putting a watchpoint on it and basically watching what you just described :) There are a bunch of other data points in the gist I linked above but, yeah, figuring out why is not going to be a walk in the park
[17:22:36] especially without better knowledge of the bdev layer so I may take a break from debug and just start walking through init and teardown in a working case to try and build up a better idea of how shit is supposed to work and who is involved with what...
[17:23:00] or I might just start drinking :)
[17:57:09] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[18:17:06] drv, you there?
[18:17:15] yes
[18:17:39] QQ cuz my head hurts... easiest way to get to the spdk ch struct from my own ch struct?
[18:19:54] spdk_io_channel_from_ctx() might be what you want
[18:19:59] or do you mean in a debugger?
[18:20:11] yeah, the API
[18:20:13] thanks
[18:25:36] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[19:00:44] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[19:07:28] *** Joins: guerby_ (~guerby@ip165.tetaneutral.net)
[19:09:11] *** Joins: peluse- (peluse@nat/intel/x-tnlckmrhqoghwcnk)
[19:12:55] *** Quits: guerby (~guerby@april/board/guerby) (Ping timeout: 256 seconds)
[19:12:55] *** Quits: peluse (peluse@nat/intel/x-rxykdertbtjmilil) (Ping timeout: 256 seconds)
[19:12:57] *** peluse- is now known as peluse
[19:20:36] *** ChanServ sets mode: +o peluse
[19:43:46] *** Quits: guerby_ (~guerby@ip165.tetaneutral.net) (Remote host closed the connection)
[19:43:54] *** Joins: guerby (~guerby@ip165.tetaneutral.net)
[19:43:54] *** Quits: guerby (~guerby@ip165.tetaneutral.net) (Changing host)
[19:43:54] *** Joins: guerby (~guerby@april/board/guerby)
[20:55:10] FYI community meeting in 5 minutes
[21:11:53] peluse: maybe we could try blindly deferring bdev_io_complete() with raw malloc+split
[21:12:08] to make it more reproducible with some basic bdevio
[21:15:11] *** Joins: bwalker_ (~bwalker@ip70-190-226-244.ph.ph.cox.net)
[21:15:11] *** ChanServ sets mode: +o bwalker_
[21:17:02] I'm everywhere :)
[21:21:42] *basic bdevio test, I meant
[21:24:36] darsto, thanks yeah I can repro now pretty quick. I can see the address range in question being freed via my create_ch callback when I put the IO channel for the underlying device and then later I get the ASAN complaint. Haven't been able to actually catch it in the debugger though...
[21:24:58] I've tried setting the element in question to NULL after freeing and catching that before ASAN but maybe that's not how ASAN works
[21:25:44] anyways, still have more goofing around to do in order to get a better understanding for my own general knowledge of the internal bdev layer structures and how they're used
[21:28:47] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[21:29:55] all: Thank you so much for today's meeting, I got a nice idea. By the way next week is a holiday week in Japan and my activity will be suspended. Thank you.
[23:03:24] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
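
Note on the [17:18:07] observation (an I/O completing after the corresponding io_channel is gone): the sketch below is a toy Python model of that class of bug, not SPDK code. `ToyChannel`, `defer_destroy`, and `run` are hypothetical names made up for illustration, and deferring teardown until outstanding I/O drains is only one possible guard; it is an assumption, not necessarily how the crypto vbdev or the bdev layer handles it.

```python
class ToyChannel:
    """Toy model of an I/O channel; not SPDK's struct spdk_io_channel."""

    def __init__(self, defer_destroy):
        self.defer_destroy = defer_destroy  # hypothetical policy switch
        self.outstanding = 0                # I/Os submitted but not yet completed
        self.put_requested = False
        self.destroyed = False

    def submit(self):
        assert not self.destroyed, "submit on a destroyed channel"
        self.outstanding += 1

    def put(self):
        """Drop the last reference to the channel."""
        self.put_requested = True
        if self.outstanding == 0 or not self.defer_destroy:
            self.destroyed = True

    def complete(self):
        """Completion callback for one I/O."""
        if self.destroyed:
            # In C this access is roughly what ASAN reports as a use-after-free.
            raise RuntimeError("I/O completed after its channel was destroyed")
        self.outstanding -= 1
        if self.put_requested and self.outstanding == 0:
            self.destroyed = True


def run(defer_destroy):
    """Submit one I/O, tear the channel down early, then complete the I/O."""
    ch = ToyChannel(defer_destroy)
    ch.submit()
    ch.put()  # teardown races ahead of the in-flight I/O
    try:
        ch.complete()
    except RuntimeError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print("eager destroy:   ", run(defer_destroy=False))
    print("deferred destroy:", run(defer_destroy=True))
```

With eager destruction the late completion trips the check; with deferred destruction the channel is only marked destroyed once the last outstanding I/O has completed.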