Possible race condition in ActionQueue.ExecuteActions

Description

I already put this on the nhibernate-development mailing list because I wasn't sure if this is an actual NHibernate bug (back with 4.0.2GA, but since it apparently still exists over a year later and no replies on the ML, I decided to finally throw this up here (especially because we again tried to upgrade our libraries to latest and 4.1.1GA still shows the issue).

I randomly (and rarely) get an ArgumentOutOfRangeException in ActionQueue.ExecuteActions when committing an ITransaction (created by ISession.BeginTransaction) inside a TransactionScope:

In particular, the local size is something like 3 or 4 (depending on where I get the Exception and the number of entities affected), while list.Count is 0 when it happens. The loop iteration is mostly at the last index when it does; I've never seen it happen with an index lower than size-1 before.

Code looks pretty much like this:

(I noticed that committing the transaction also flushes the Session, so the session.Flush() there may not be necessary...)

The given code sample is called from a single thread (there is no manual multithreading going on, nor is there any async/await, Task.Run, Parallel.ForEach or whatever involved), although there are potentially multiple (distinct, non-related) threads running which also use their own instances of ISession. Sessions are not shared between threads, mostly because our own code isn't particularly thread-safe either; but in part also because Session-use is similar to the one illustrated above - using-Blocks inside methods without anything fancy around them.

I wasn't particularly successful in creating a minimum example that reproduces the issue, but my current workaround is patching the method to create a copy first:

And while this is still potentially problematic, it solved the exceptions we kept seeing every now and then; so it might be related to what ActionQueue.Execute does (either because the passed IExecutable is doing something to the execution list, or because the cleanup actions affect something afterwards)

Environment

None

Activity

Show:

Frédéric Delaporte September 4, 2017 at 1:28 PM

This is somewhat the same issue as NH-2322, which has been resolved with what I see as a crutch (switching a foreach loop for a for, in order to avoid the "collection modified during enumeration" error). But this crutch does not help for NH-4077.

In fact, the auto-flush is mainly disabled during operation which internals may triggers auto-flush, after they have already triggered it if required. The regular flush (included the one from commit) does not disable it. I am checking what it gives to have it disabled from the regular flush too.

Emanuel Wlaschitz September 4, 2017 at 12:06 PM

Actually, it seems that I can force two different exceptions by doing the same thing (causing auto-flush) from different Listeners.
Doing it during a IPostUpdateEventListener causes the ArgumentOutOfRangeException, but doing the same thing from a IPostInsertEventListener leads to a primary key violation due to the same inserts running multiple times.

Test cases on my GitHub fork

Using FlushMode.Commit actually does fix both of them, as well as removing the transaction and only using the session. But since we use this for multi-layered objects that need a consistent state, dropping the transaction isn't really an option. And with FlushMode.Commit, we noticed that some query results (ie. for validations beforehand or other bookkeeping that happens afterwards) were infavorable for what we wanted to do at that point (basically amounting to a dirty read, if we think about it carefully...so something that we'll have to fix as well, one way or the other)

Frédéric Delaporte September 4, 2017 at 10:39 AM
Edited

Well, glad you have found the root of the issue.

I tend to think that this case, having a flush triggered from a session event itself flush related, would be considered as an invalid usage case. So it would need to be fixed by user code rather than by changes in NHibernate code. Now having a test case would help to check if this is really the case. So feel free to add it there anyway.

Side note: if this trouble is triggered only by an auto-flush (flush done prior to a query which may have to touch some dirtied entities), you could check if disabling it (lowering FlushMode to Commit instead of Auto) is enough (and valid) for your case. Of course you need to analyze the consequences, such as queries taking into account only the actual state of data in database, instead of its state amended by ongoing changes in the session.
(Now thinking about it, normally auto-flushes are disabled during flushes, so we may have a bug here in NHibernate if your case is really triggered by what we call auto-flush.)

About mixing scopes and native transactions, guidances were not well defined on NHibernate side either, and are still not indeed. may amend the documentation for sorting it out.

Emanuel Wlaschitz September 4, 2017 at 9:26 AM
Edited

This is on a Windows Service application self-hosting a WCF service, no IIS or ASP.Net. CreateSession is merely a custom class that does nothing else than initialize a session-factory, keep it in a static field (because creating the session-factory is usually the heavy part) and wrap calls to OpenSession:

Nothing ThreadStatic in there, or related to a ThreadContext as far as I can see.

But to be fair, that piece of code has been around for almost 10 years now and the colleague that wrote it has left the company; so I cannot really tell if he knew about not mixing them or not. It probably seemed like a-good-idea™ back then; with the intent of that method to ensure a batch of work (multiple entity inserts/updates) either fully complete atomically, or not at all. It feels a little like he adopted some code that used System.Transactions in such a pattern and simply substituted NHibernate transactions.
I just ran my sample without the TransactionScope and got the same result (ArgumentOutOfRange exception on that very spot), so I'm not entirely sure this is the fault of mixing them - but regardless, your write-up is gold, thanks for that!

After almost of day of debugging, it appears we cause an Auto-Flush on that very same session that is currently being flushed/committed inside some IPostUpdateEventListener that attempts to do some bookkeeping. As a result, ActionQueue.ClearFromFlushNeededCheck clears that list which is currently being iterated inside ActionQueue.ExecuteActions (it's even visible in the stack trace if you drill deep enough).
So apparently it is our own fault after all, and we need to fix this in our code. Infact, doing so fixes the issue.

It is probably appropriate to close this issue...unless you have reason to believe there is indeed something that could/should be done from NHibernate to prevent those misbehaviors. Sorry for taking up your time on that, since we could've invested the time to investigate this properly like a year ago already :/

I got a Test-Case that reproduces the issue; but I'll still have to clean it up a little.

Frédéric Delaporte September 1, 2017 at 12:43 PM
Edited

We will probably not include your workaround, because this would very likely just hide the actual trouble instead of solving it.

The fact that your problem seems somewhat random looks like a threading trouble, although you assert your sessions are not shared among threads. CreateSession looks as some custom code, since the actual NHibernate method is OpenSession. Are you using some ThreadContext backed current session mechanism? If yes, within an IIS web app, Asp.Net thread agility could then cause a session to be used by a concurrent request. (Well, quite unlikely with your using pattern anyway.)

Otherwise maybe your application has added some interceptors/events triggered by those actions which would sometimes alter the session.

I am afraid you will need to achieve producing a test case for this trouble to be solved.

By the way, NHibernate transactions should not be mixed with transaction scopes, because this causes ADO transactions to be mixed with System transactions, which Microsoft advises against. See more details on (long read).

Fixed

Details

Assignee

Reporter

Components

Fix versions

Affects versions

Priority

Who's Looking?

Open Who's Looking?

Created September 1, 2017 at 11:34 AM
Updated September 6, 2017 at 3:16 PM
Resolved September 6, 2017 at 3:12 PM
Who's Looking?