[00:21:15] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-vhaqucgdtswmkilb)
[00:47:19] https://review.gerrithub.io/c/391553/2/test/nvmf/host/identify_kernel_nvmf.sh#49 that sleep is still not enough
[00:48:19] could we try to connect to the target via nvme connect, check a couple of times for a block device to appear and disconnect?
[02:39:29] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[02:43:42] *** Joins: tkulasek (86bfdc49@gateway/web/freenode/ip.134.191.220.73)
[04:05:54] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-vhaqucgdtswmkilb) (Quit: Leaving)
[04:06:04] *** Joins: tomzawadzki (~tomzawadz@134.134.139.75)
[05:52:34] *** Quits: tkulasek (86bfdc49@gateway/web/freenode/ip.134.191.220.73) (Ping timeout: 260 seconds)
[06:39:26] *** Quits: sbasierx (sbasierx@nat/intel/x-iugasbnexfzvlnjg) (Quit: Going offline, see ya! (www.adiirc.com))
[06:52:10] *** Joins: tkulasek (86bfdc49@gateway/web/freenode/ip.134.191.220.73)
[08:26:57] *** Quits: tomzawadzki (~tomzawadz@134.134.139.75) (Ping timeout: 248 seconds)
[08:38:13] There is something wrong with the NBD server. I think the tests do not properly disconnect the server from /dev/nbd0
[08:38:17] Pls see http://spdk.intel.com/public/spdk/builds/review/f540e45cabb04dcac170b8ccf397cf6b36d769f4.1518783678/fedora-02/build.log
[09:21:20] peluse: this ASAN issue is a recursion issue in the registration code
[09:21:25] or rather unregistration code
[09:22:11] yeah, I'm still seeing strange things after expanding the macro. What specifically do you think it is?
[09:24:05] i thought I saw a recursion issue early on but convinced myself it wasn't a problem, if you're talking about unregister in that FOREACH_SAFE loop ending up calling another unregister
[09:25:16] simple case is Malloc with Split acting as a passthrough module (i.e. Split it into 1 piece instead of multiple)
[09:25:42] unregister(Malloc0) calls remove_cb(Split)
[09:26:17] Split closes its descriptor for Malloc, which triggers another call to unregister(Malloc0)
[09:26:30] but this is all recursive
[09:26:51] so eventually we unwind at the end, back to the original unregister(Malloc0) context and we're touching memory that got freed
[09:27:22] if you put a static nest variable in spdk_bdev_unregister(), increment it when entering and decrement when returning - it's pretty clear
[09:27:25] I'm not seeing two unregisters for the same bdev though
[09:28:16] hmmm - you mean if you put a print statement in spdk_bdev_unregister(), you're not seeing it called twice for the same bdev?
[09:28:48] pretty sure I would have noticed that but who knows, let me go look
[09:28:52] here's what I am seeing:
[09:29:56] watch the part and tmp vars used in the FOREACH_SAFE loop; they all work as normal until the ref count goes to 0, then
[09:34:11] OK, not ready to explain the list thing right now. let me go double check and make sure I don't see unregister twice real quick
[09:36:10] Yeah, I did, but the first couple of cases are because of open descriptors, so we return from the function a little early and don't call destruct. That looked normal to me
[09:36:26] There's only one call that ever goes through the whole function and calls destruct
[09:36:55] per bdev that is
[09:39:01] can't fully explain what I'm in the middle of looking at yet, but it looks sorta like the part* in the FOREACH_SAFE loop is getting corrupted through the unregister callback, not because of a removal but because the first element in the part structure is the bdev, so when the bdev gets removed the list element gets jacked. I put a dummy var at the top of the part struct and that ASAN went away. I have a new one a bit later on though that looks very similar,
[09:39:01] but haven't dug in yet
[09:42:19] BTW the SAFE loop I'm talking about is the one in spdk_bdev_part_base_hotremove()
[09:50:38] that struct thing might be a read herring though... i have to back out some extra debug stuff I added in that might be clouding the picture for me
[09:50:45] red*
[09:54:08] can you try https://review.gerrithub.io/#/c/400305/
[09:56:28] sure, have a meeting in just a few
[09:57:50] I'm just gonna try pasting those into my working print-riddled branch :)
[10:00:41] NICE! yeah, I would have gotten there eventually :)
[10:00:47] did it work?
[10:00:52] fa shizzle!
[10:00:59] sweet
[10:01:01] (for sure if you don't speak Snoop Dog)
[10:01:20] I'll step through it after some calls and probably have some questions for ya. Thanks!!!!
[10:24:25] *** Quits: tkulasek (86bfdc49@gateway/web/freenode/ip.134.191.220.73) (Ping timeout: 260 seconds)
[11:35:27] pwodkowx: fix for nbd issue: https://review.gerrithub.io/#/c/400314/
[14:42:58] jimharris, how did that one show up? I also have been seeing an nbd issue while chasing this ASAN thing that I was gonna bring up next. Hell you're making my life easy!
[14:43:35] can U please fix whatever it is I run into next before I run into it? That'd be great, thanks
[14:43:38] :)
[14:43:40] lol
[14:46:16] hmmm - looks like seth already got that fix in on master
[14:49:43] pawel and i both hadn't rebased from that
[16:35:49] yeah, me either. After I step through my branch w/o your fix on the unregister thing to make sure it makes complete sense to me and to understand why i didn't see it before, I'll rebase and see if my nbd thing goes away too
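
The recursion described between 09:25 and 09:27 (unregister(Malloc0) calls remove_cb(Split), Split closes its descriptor, and that triggers unregister(Malloc0) again) can be sketched with a toy model. This is only an illustration under stated assumptions, not SPDK code: every name below (toy_bdev, toy_unregister, toy_close, toy_split_remove_cb, toy_run_demo) is hypothetical, and the toy never frees memory, so it only counts the point where the outer call would be touching a bdev that the inner call already destructed. The static nest counter follows the debugging suggestion made at 09:27:22.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model of the call chain from the log. Not SPDK code. */
struct toy_bdev;
typedef void (*toy_remove_cb)(struct toy_bdev *bdev);

struct toy_bdev {
    const char *name;
    int open_descs;          /* descriptors still open on this bdev */
    toy_remove_cb remove_cb; /* notifies the module holding a descriptor */
    int destructed;          /* set once the full unregister path has run */
};

static int g_nest;          /* current nesting depth of toy_unregister() */
static int g_max_nest;      /* deepest nesting observed */
static int g_stale_touches; /* outer calls resuming on a destructed bdev */

void toy_unregister(struct toy_bdev *bdev);

/* Closing the last descriptor retries the pending unregister. */
void toy_close(struct toy_bdev *bdev)
{
    assert(bdev->open_descs > 0);
    bdev->open_descs--;
    if (bdev->open_descs == 0) {
        toy_unregister(bdev);
    }
}

/* Stand-in for Split's hot-remove callback: it just closes its descriptor. */
void toy_split_remove_cb(struct toy_bdev *bdev)
{
    toy_close(bdev);
}

void toy_unregister(struct toy_bdev *bdev)
{
    g_nest++;
    if (g_nest > g_max_nest) {
        g_max_nest = g_nest;
    }
    printf("enter unregister(%s) nest=%d\n", bdev->name, g_nest);

    if (bdev->open_descs > 0) {
        /* Early-return path: ask the holder to let go. This re-enters
         * toy_unregister() through toy_close(). */
        bdev->remove_cb(bdev);
        if (bdev->destructed) {
            /* In a real allocator-backed version the bdev could already
             * be freed here; the toy only counts the event. */
            g_stale_touches++;
        }
    } else {
        bdev->destructed = 1; /* the one call that runs to completion */
    }

    printf("leave unregister(%s) nest=%d\n", bdev->name, g_nest);
    g_nest--;
}

/* Runs the Malloc0 + single-piece Split case; returns the deepest nesting. */
int toy_run_demo(void)
{
    struct toy_bdev malloc0 = { "Malloc0", 1, toy_split_remove_cb, 0 };

    g_nest = 0;
    g_max_nest = 0;
    g_stale_touches = 0;
    toy_unregister(&malloc0);
    return g_max_nest;
}
```

Calling toy_run_demo() from a main() prints two nested enter lines for the same bdev. That is consistent with both observations in the log: unregister is entered more than once, yet only one call runs the full destruct path.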