SAP R/3 форум ABAP консультантов
Russian ABAP Developer's Club

Home - FAQ - Search - Memberlist - Usergroups - Profile - Log in to check your private messages - Register - Log in - English
Blogs - Weblogs News

JOINS vs. FOR ALL ENTRIES - Which Performs Better?



 
Post new topic   Reply to topic    Russian ABAP Developer's Club Forum Index -> Performance tuning | Производительность
View previous topic :: View next topic  
Author Message
admin
Администратор
Администратор



Joined: 01 Sep 2007
Posts: 1639

PostPosted: Sat Sep 22, 2007 12:13 pm    Post subject: JOINS vs. FOR ALL ENTRIES - Which Performs Better? Reply with quote

JOINS vs. FOR ALL ENTRIES - Which Performs Better?
Rob Burbank
Company: The University of Toronto
Posted on Mar. 19, 2007 01:46 PM in ABAP

In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important than others. In this blog, I want to look at different ways to construct a simple SELECT statement.

A very common question that is asked in the ABAP forum is “Which is better: a JOIN or FOR ALL ENTRIES ?” I’ve written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified SELECT ; a nested SELECT ; a SELECT using FOR ALL ENTRIES ; a SELECT using an INNER JOIN ; a SELECT using an OUTER JOIN ; and a SELECT using a sub-query. All of these SELECT s are fully qualified in the sense that they use all fields of the primary key. For comparison, I’ve also added a SELECT that doesn’t fully use the primary key.

This task was made more difficult by the fact that it’s not really easy to compare a JOIN with a sub-query. A JOIN assumes that you want the data from more than one table. A sub-query assumes that you need data only from the main table. So the SELECT statements that I have constructed are quite simple and in some cases, not practical. They are just for comparison purposes.

I’ve used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because it’s difficult to compare multiple SELECT s with single SELECT s using this function. GET RUN TIME is not perfect either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I’ve put all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection screen.

In any event, here is the program:
Code:

REPORT ztest_selects LINE-SIZE 80 MESSAGE-ID 00.

DATA: t001      TYPE t001,
      bkpf      TYPE bkpf.

SELECT-OPTIONS: s_bukrs FOR  bkpf-bukrs MEMORY ID buk      OBLIGATORY.

SELECT-OPTIONS: s_belnr FOR  bkpf-belnr MEMORY ID bln      OBLIGATORY.
PARAMETERS:     p_gjahr LIKE bkpf-gjahr MEMORY ID gjr      OBLIGATORY.

SELECTION-SCREEN ULINE.

PARAMETERS:     p_loop1 TYPE i                             OBLIGATORY
                        DEFAULT 5,
                p_loop2 TYPE i                             OBLIGATORY
                        DEFAULT 10.

TYPES: BEGIN OF t001_type,
         bukrs  TYPE t001-bukrs,
       END   OF t001_type,

       BEGIN OF bkpf_type,
         bukrs  TYPE bkpf-bukrs,
         belnr  TYPE bkpf-belnr,
         gjahr  TYPE bkpf-gjahr,
       END   OF bkpf_type.

DATA: t001_int  TYPE TABLE OF t001_type,
      t001_wa   TYPE          t001_type,
      bkpf_int  TYPE TABLE OF bkpf_type,
      bkpf_wa   TYPE          bkpf_type.

DATA: start     TYPE i,
      end       TYPE i,
      dif       TYPE i.

START-OF-SELECTION.

  DO p_loop1 TIMES.
    PERFORM simple_select.
    PERFORM nested_select.
    PERFORM for_all_entries.
    PERFORM inner_join.
    PERFORM outer_join.
    PERFORM sub_query.
    PERFORM unqualified_select.
    SKIP 1.
  ENDDO.

*&---------------------------------------------------------------------*
*&      Form  simple_select
*&---------------------------------------------------------------------*
* First we get documents using a select statement that is
* fully qualified on the primary key. Because buffering may be an issue,
* the first select will be disregarded in this test. However, in real
* life, this would be the important time.
*----------------------------------------------------------------------*
FORM simple_select.

* Do an initial select of the documents we intend to get. Due to
* buffering, the first select may take much longer then the next one.

  SELECT  bukrs belnr gjahr
    FROM  bkpf
    INTO  TABLE bkpf_int
    WHERE bukrs IN s_bukrs
      AND belnr IN s_belnr
      AND gjahr EQ p_gjahr.

  IF sy-subrc <> 0.
    MESSAGE ID '00' TYPE 'E' NUMBER '001' WITH
               'No Data meets selection criteria'.
  ENDIF.

* Next we get the same document using the same fully qualified select
* statement. We will use this in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT  bukrs belnr gjahr
      FROM  bkpf
      INTO  TABLE bkpf_int
      WHERE bukrs IN s_bukrs
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for first SELECT   (fully qualified)',
          055 ':', dif, 'microseconds'.
ENDFORM.                    " simple_select

*&---------------------------------------------------------------------*
*&      Form  nested_select
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM nested_select.

* Use the same fully qualified SELECT, but this time nested. As usual,
* Ignore the first SELECT and use the subsequent ones for comparison.

  REFRESH: bkpf_int.

  SELECT  bukrs FROM t001
    INTO  t001_wa
    WHERE bukrs IN s_bukrs.

    SELECT  bukrs belnr gjahr
      FROM  bkpf
      INTO  bkpf_wa
      WHERE bukrs EQ t001_wa-bukrs
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.

      APPEND bkpf_wa TO bkpf_int.

    ENDSELECT.
  ENDSELECT.

* Next we get the same document using the same fully qualified select
* statement. We will use this in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.

    REFRESH: bkpf_int.

    SELECT  bukrs FROM t001
      INTO  t001_wa
      WHERE bukrs IN s_bukrs.

      SELECT  bukrs belnr gjahr
        FROM  bkpf
        INTO  bkpf_wa
        WHERE bukrs EQ t001_wa-bukrs
          AND belnr IN s_belnr
          AND gjahr EQ p_gjahr.

        APPEND bkpf_wa TO bkpf_int.

      ENDSELECT.
    ENDSELECT.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for second SELECT  (nested)',
          055 ':', dif, 'microseconds'.
ENDFORM.                    " nested_select

*&---------------------------------------------------------------------*
*&      Form  for_all_entries
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM for_all_entries .

* Use the same fully qualified SELECT, but this time with
* FOR ALL ENTRIES.

* Ignore the first set of SELECTs
  SELECT  bukrs
    FROM  t001
    INTO  TABLE t001_int
    WHERE bukrs IN s_bukrs.

  SELECT  bukrs belnr gjahr
    FROM  bkpf
    INTO TABLE bkpf_int
    FOR ALL ENTRIES IN t001_int
    WHERE bukrs EQ t001_int-bukrs
      AND belnr IN s_belnr
      AND gjahr EQ p_gjahr.

* Use these SELECTs in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT  bukrs
      FROM  t001
      INTO  TABLE t001_int
      WHERE bukrs IN s_bukrs.

    SELECT  bukrs belnr gjahr
      FROM  bkpf
      INTO TABLE bkpf_int
      FOR ALL ENTRIES IN t001_int
      WHERE bukrs EQ t001_int-bukrs
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for third SELECT   (using FOR ALL ENTRIES)',
          055 ':', dif, 'microseconds'.

ENDFORM.                    " for_all_entries

*&---------------------------------------------------------------------*
*&      Form  inner_join
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM inner_join .

* Use the same fully qualified SELECT, but this time with an INNER JOIN.

  SELECT  t001~bukrs bkpf~belnr bkpf~gjahr
    FROM  bkpf
      INNER JOIN t001 ON
        t001~bukrs EQ bkpf~bukrs
    INTO TABLE bkpf_int
    WHERE t001~bukrs IN s_bukrs
      AND bkpf~belnr IN s_belnr
      AND bkpf~gjahr EQ p_gjahr.

* Use this select in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT  t001~bukrs bkpf~belnr bkpf~gjahr
      FROM  bkpf
        INNER JOIN t001 ON
          t001~bukrs EQ bkpf~bukrs
      INTO TABLE bkpf_int
      WHERE t001~bukrs IN s_bukrs
        AND bkpf~belnr IN s_belnr
        AND bkpf~gjahr EQ p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for fourth SELECT  (using an INNER JOIN)',
          055 ':', dif, 'microseconds'.

ENDFORM.                    " inner_join

*&---------------------------------------------------------------------*
*&      Form  outer_join
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM outer_join .

* Use the same fully qualified SELECT, but this time with an OUTER JOIN.

  SELECT  t001~bukrs bkpf~belnr bkpf~gjahr
    FROM  bkpf
      LEFT OUTER JOIN t001 ON
        t001~bukrs EQ bkpf~bukrs
    INTO TABLE bkpf_int
    WHERE bkpf~bukrs IN s_bukrs
      AND bkpf~belnr IN s_belnr
      AND bkpf~gjahr EQ p_gjahr.

* Use this select in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT  t001~bukrs bkpf~belnr bkpf~gjahr
      FROM  bkpf
        LEFT OUTER JOIN t001 ON
          t001~bukrs EQ bkpf~bukrs
      INTO TABLE bkpf_int
      WHERE bkpf~bukrs IN s_bukrs
        AND bkpf~belnr IN s_belnr
        AND bkpf~gjahr EQ p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for fifth SELECT   (using an OUTER JOIN)',
          055 ':', dif, 'microseconds'.

ENDFORM.                    " outer_join

*&---------------------------------------------------------------------*
*&      Form  sub_query
*&---------------------------------------------------------------------*
*       text
*----------------------------------------------------------------------*
FORM sub_query .

* And a sub-query

  SELECT bukrs belnr gjahr
    FROM bkpf
    INTO TABLE bkpf_int
    WHERE bukrs IN
      ( SELECT bukrs
          FROM  t001
          WHERE bukrs IN s_bukrs )
      AND belnr IN s_belnr
      AND gjahr EQ p_gjahr.

* Use this select in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT  bukrs belnr gjahr
      FROM  bkpf
      INTO TABLE bkpf_int
      WHERE bukrs IN
        ( SELECT  bukrs
            FROM  t001
            WHERE bukrs IN s_bukrs )
        AND belnr IN s_belnr
        AND gjahr EQ p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for sixth SELECT   (using a sub-query)',
          055 ':', dif, 'microseconds'.

ENDFORM.                    " sub_query

*&---------------------------------------------------------------------*
*&      Form  unqualified_select
*&---------------------------------------------------------------------*
* Compare the above results with a SELECT that is only partially
* qualified.
*----------------------------------------------------------------------*
FORM unqualified_select.

* Ignore the first SELECT
  SELECT  bukrs belnr gjahr
    FROM  bkpf
    INTO  TABLE bkpf_int
    WHERE belnr IN s_belnr
      AND gjahr EQ  p_gjahr.

* Use this select in comparisons.
  GET RUN TIME FIELD start.

  DO p_loop2 TIMES.
    SELECT  bukrs belnr gjahr
      FROM  bkpf
      INTO  TABLE bkpf_int
      WHERE belnr IN s_belnr
        AND gjahr EQ  p_gjahr.
  ENDDO.

  GET RUN TIME FIELD end.
  dif = end - start.
  WRITE: /001 'Time for seventh SELECT (partially qualified)',
          055 ':', dif, 'microseconds'.
ENDFORM.                    " unqualified_select


The program has two SELECT-OPTIONS and one PARAMETER for selecting data: Company code, document number and fiscal year. I ran it four different ways: with a single company code and document number, with a single company code and a range of document numbers, with a range of company codes and a single document number and with ranges of both company codes and document numbers.

I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the results:
  • For the simple case (single company code and document number) all of the methods worked almost equally well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only added about 25% execution time. The nested SELECT was really no worse than the others
  • With a single company code and range of document numbers, the execution times increased, but the overall results were quite similar to the simple case with the exception that the nested SELECT added about 75% to the execution time.
  • With a range of company codes either with a single or range of document numbers, the results were different: the execution times for both the OUTER JOIN and fully qualified SELECT were dramatically higher (500 to 1000 times) than the other methods. This (to me at least) was the really surprising result.


The following statement:
Code:
SELECT bukrs belnr gjahr 
FROM bkpf 
INTO TABLE bkpf_int 
  WHERE bukrs IN s_bukrs
    AND belnr IN s_belnr 
    AND gjahr EQ p_gjahr.

Is far less efficient than:
Code:
SELECT t001~bukrs bkpf~belnr bkpf~gjahr
FROM bkpf
 INNER JOIN t001 ON
 t001~bukrs EQ bkpf~bukrs
 INTO TABLE bkpf_int
 WHERE t001~bukrs IN s_bukrs
   AND bkpf~belnr IN s_belnr
   AND bkpf~gjahr EQ p_gjahr.

when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs

In the final analysis, there is no “one size fits all” answer to the question is “Which is better: a JOIN or FOR ALL ENTRIES ?” In many, if not most cases, my money is on the JOIN , but the difference is not large enough to spend much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences for your particular case, then you must code different SELECTs to find which is best. But then you also have to bear in mind that the same SELECT may behave differently based on the makeup of the WHERE clause.

There are other considerations that come into play as well:
  • INNER JOINs only look at the intersection of the results that meet the WHERE clause.
  • FOR ALL ENTRIES eliminates duplicates from the results.
  • I find JOINs to be more time consuming to code. (I can never find the “~” key.)
  • When using FOR ALL ENTRIES you generally end up with at least two internal tables. This may or may not be a good thing.
  • The example I have shown uses the full primary key. Some preliminary testing I have done comparing JOINs with FOR ALL ENTRIES show that FOR ALL ENTRIES can give better performance in that case.


One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did not use the index effectively. And that is the real point here. All of the techniques that I have shown here work reasonably effectively. The most important thing to remember is to use an index.


Depends also on the DB tuning
Hi Rob,
Thanks for this interesting weblog!

I believe the result also depends on the DB tuning. There are several FOR ALL ENTRIES relevant profile parameters.
It's explained in the following OSS notes:
Note 48230 - Parameters for the SELECT ... FOR ALL ENTRIES statement
Note 652634 - FOR ALL ENTRIES performance with Microsoft SQL Server
Note 634263 - Selects with FOR ALL ENTRIES as of kernel 6.10
It's worth to check these parameters, before making any decision, which way to choose.

Best regards,
Peter
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    Russian ABAP Developer's Club Forum Index -> Performance tuning | Производительность All times are GMT + 4 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


All product names are trademarks of their respective companies. SAPNET.RU websites are in no way affiliated with SAP AG.
SAP, SAP R/3, R/3 software, mySAP, ABAP, BAPI, xApps, SAP NetWeaver and any other are registered trademarks of SAP AG.
Every effort is made to ensure content integrity. Use information on this site at your own risk.