NHibernate
  1. NHibernate
  2. NH-1930

Filter condition on nullable many to one should be on the join, not the where

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0.GA
    • Fix Version/s: None
    • Component/s: Core
    • Labels:

      Description

      Using the code in NH-1919, get produces the following values for nullable property that has a filter on it:

      SELECT invoice0_.id          AS id1_1_, 
             invoice0_.otherprop   AS otherprop1_1_, 
             invoice0_.categoryid  AS categoryid1_1_, 
             category1_.id         AS id0_0_, 
             category1_.validuntil AS validuntil0_0_ 
      FROM   invoice invoice0_ 
             LEFT OUTER JOIN category category1_ 
               ON invoice0_.categoryid = category1_.id 
      WHERE  category1_.validuntil > @p0 
             AND invoice0_.id = @p1
      

      The problem is that the filter should be applied on the join, not on the where, since it filter the parent as well as the child.

        Activity

        Hide
        Dmitry Kustov added a comment -

        I made this in NHibernate.SqlCommand.ANSIJoinFragment. It works fine

        public override bool AddCondition(string condition)

        { //DK return AddCondition(conditions, condition); return AddCondition(buffer, condition); }

        public override bool AddCondition(SqlString condition)
        { //DK return AddCondition(conditions, condition); return AddCondition(buffer, condition); }
        Show
        Dmitry Kustov added a comment - I made this in NHibernate.SqlCommand.ANSIJoinFragment. It works fine public override bool AddCondition(string condition) { //DK return AddCondition(conditions, condition); return AddCondition(buffer, condition); } public override bool AddCondition(SqlString condition) { //DK return AddCondition(conditions, condition); return AddCondition(buffer, condition); }
        Hide
        Ricardo Peres added a comment -

        Same happens when specifying a where clause on a set using fetch mode join.
        I think the restriction should be placed on the join on clause, not the global where.

        Show
        Ricardo Peres added a comment - Same happens when specifying a where clause on a set using fetch mode join. I think the restriction should be placed on the join on clause, not the global where.
        Hide
        Robert Snyder added a comment -

        The filter condition being applied to the where instead of the on clause of the join does provide the wrong data in an outer join - the results are filtered to the same as if it were an inner join. This continues to be a problem for us in 3.3.1. In addition however, it can be a huge performance problem on an inner join - if the condition can filter rows out on the join the working set can be dramatically reduced. I think it very important to be able to specify where the condition is applied. In general it should always be applied in the join, but there are always exceptions - even to the point of wanting to put it in both places (admittedly not common).

        Show
        Robert Snyder added a comment - The filter condition being applied to the where instead of the on clause of the join does provide the wrong data in an outer join - the results are filtered to the same as if it were an inner join. This continues to be a problem for us in 3.3.1. In addition however, it can be a huge performance problem on an inner join - if the condition can filter rows out on the join the working set can be dramatically reduced. I think it very important to be able to specify where the condition is applied. In general it should always be applied in the join, but there are always exceptions - even to the point of wanting to put it in both places (admittedly not common).
        Hide
        Robert Snyder added a comment - - edited

        Attached is a project with database that demonstrates the issue.

        
        -- The NHibernate filter generates
        SELECT this_.Id as Id1_1_,
               this_.Data as Data1_1_,
               this_.TableBId as TableBId1_1_,
               this_.CompanyId as CompanyId1_1_,
               tableb1_.Id as Id2_0_,
               tableb1_.Data as Data2_0_,
               tableb1_.CompanyId as CompanyId2_0_
        FROM [TableA] this_ 
           left outer join [TableB] tableb1_ on this_.TableBId=tableb1_.Id
        WHERE (tableb1_.CompanyId = @p0) and (this_.CompanyId = @p0)
        
        -- But it should generate (or at least be able to be told to generate)
        SELECT this_.Id as Id1_1_,
               this_.Data as Data1_1_,
               this_.TableBId as TableBId1_1_,
               this_.CompanyId as CompanyId1_1_,
               tableb1_.Id as Id2_0_,
               tableb1_.Data as Data2_0_,
               tableb1_.CompanyId as CompanyId2_0_
        FROM [TableA] this_ 
           left outer join [TableB] tableb1_ on this_.CompanyId = tableb1_.CompanyId and this_.TableBId=tableb1_.Id
        WHERE (this_.CompanyId = @p0)
        
        -- or even
        SELECT this_.Id as Id1_1_,
               this_.Data as Data1_1_,
               this_.TableBId as TableBId1_1_,
               this_.CompanyId as CompanyId1_1_,
               tableb1_.Id as Id2_0_,
               tableb1_.Data as Data2_0_,
               tableb1_.CompanyId as CompanyId2_0_
        FROM [TableA] this_ 
           left outer join [TableB] tableb1_ on tableb1_.CompanyId = @p0 and this_.TableBId=tableb1_.Id
        WHERE (this_.CompanyId = @p0)
        

        Leaving columns playing the role that CompanyId plays here out of the join carries severe penalties - in a multi-tenant database especially.
        1 - Wrong Data: for an outer join, putting the condition in the where clause filters the result to produce a set identical to that of an inner join.
        1a - In order to get around that, we have to include an 'OR (tableb1_.CompanyId IS NULL)' clause, which defeats any index on CompanyId.
        2 - Performance: In a multi-tenant db the most efficient index for data is one with the tenancy column(s) first. We do this in SQL Server by defining clustered indexes that start with the tenancy columns. For a single record-fetch by primary key, the penalty is in performance, not working set size, but especially for queries that return multiple records, searching a primary key across the entire space of all tenants is far more expensive than using a tenancy-based clustered index.
        3 - Performance: In cases where sets are being searched based on criteria not including the primary key, not including the filter columns in the join (regardless of tenancy design) results in all records being joined into memory, then filtering off the ones that don't match - a huge penalty to both performance and scalability.
        There should be a way to control whether such conditions are put on the join or the where clause (the join would be a better default in my opinion). Ideally, it should be possible to specify it in BOTH places, but that is a less common need.

        Note that this example uses just two tables, but in practice there will often be a significant chain of tables, each requiring the tenancy columns in the join clause. The deeper the chain, the greater the issue, more or less geometrically.

        Show
        Robert Snyder added a comment - - edited Attached is a project with database that demonstrates the issue. -- The NHibernate filter generates SELECT this_.Id as Id1_1_, this_.Data as Data1_1_, this_.TableBId as TableBId1_1_, this_.CompanyId as CompanyId1_1_, tableb1_.Id as Id2_0_, tableb1_.Data as Data2_0_, tableb1_.CompanyId as CompanyId2_0_ FROM [TableA] this_ left outer join [TableB] tableb1_ on this_.TableBId=tableb1_.Id WHERE (tableb1_.CompanyId = @p0) and (this_.CompanyId = @p0) -- But it should generate (or at least be able to be told to generate) SELECT this_.Id as Id1_1_, this_.Data as Data1_1_, this_.TableBId as TableBId1_1_, this_.CompanyId as CompanyId1_1_, tableb1_.Id as Id2_0_, tableb1_.Data as Data2_0_, tableb1_.CompanyId as CompanyId2_0_ FROM [TableA] this_ left outer join [TableB] tableb1_ on this_.CompanyId = tableb1_.CompanyId and this_.TableBId=tableb1_.Id WHERE (this_.CompanyId = @p0) -- or even SELECT this_.Id as Id1_1_, this_.Data as Data1_1_, this_.TableBId as TableBId1_1_, this_.CompanyId as CompanyId1_1_, tableb1_.Id as Id2_0_, tableb1_.Data as Data2_0_, tableb1_.CompanyId as CompanyId2_0_ FROM [TableA] this_ left outer join [TableB] tableb1_ on tableb1_.CompanyId = @p0 and this_.TableBId=tableb1_.Id WHERE (this_.CompanyId = @p0) Leaving columns playing the role that CompanyId plays here out of the join carries severe penalties - in a multi-tenant database especially. 1 - Wrong Data: for an outer join, putting the condition in the where clause filters the result to produce a set identical to that of an inner join. 1a - In order to get around that, we have to include an 'OR (tableb1_.CompanyId IS NULL)' clause, which defeats any index on CompanyId. 2 - Performance: In a multi-tenant db the most efficient index for data is one with the tenancy column(s) first. We do this in SQL Server by defining clustered indexes that start with the tenancy columns. For a single record-fetch by primary key, the penalty is in performance, not working set size, but especially for queries that return multiple records, searching a primary key across the entire space of all tenants is far more expensive than using a tenancy-based clustered index. 3 - Performance: In cases where sets are being searched based on criteria not including the primary key, not including the filter columns in the join (regardless of tenancy design) results in all records being joined into memory, then filtering off the ones that don't match - a huge penalty to both performance and scalability. There should be a way to control whether such conditions are put on the join or the where clause (the join would be a better default in my opinion). Ideally, it should be possible to specify it in BOTH places, but that is a less common need. Note that this example uses just two tables, but in practice there will often be a significant chain of tables, each requiring the tenancy columns in the join clause. The deeper the chain, the greater the issue, more or less geometrically.
        Hide
        Adrian T Wright added a comment -

        I am experiencing this issue as well. In a large data set, there will be a noticeable performance degradation. A fix would be very helpful for my team.

        Show
        Adrian T Wright added a comment - I am experiencing this issue as well. In a large data set, there will be a noticeable performance degradation. A fix would be very helpful for my team.

          People

          • Assignee:
            Unassigned
            Reporter:
            Ayende Rahien
          • Votes:
            25 Vote for this issue
            Watchers:
            21 Start watching this issue

            Dates

            • Created:
              Updated:

              Who's Looking?