Posted: Sat Sep 22, 2007 12:13 pm Post subject: JOINS vs. FOR ALL ENTRIES - Which Performs Better?
JOINS vs. FOR ALL ENTRIES - Which Performs Better?
Rob Burbank
Company: The University of Toronto
Posted on Mar. 19, 2007 01:46 PM in ABAP
In earlier blogs, I looked at various performance tuning techniques and tried to identify some that are more important than others. In this blog, I want to look at different ways to construct a simple SELECT statement.
A very common question that is asked in the ABAP forum is “Which is better: a JOIN or FOR ALL ENTRIES ?” I’ve written a program that compares six different ways of constructing a SELECT statement: a simple, fully qualified SELECT ; a nested SELECT ; a SELECT using FOR ALL ENTRIES ; a SELECT using an INNER JOIN ; a SELECT using an OUTER JOIN ; and a SELECT using a sub-query. All of these SELECT s are fully qualified in the sense that they use all fields of the primary key. For comparison, I’ve also added a SELECT that doesn’t fully use the primary key.
This task was made more difficult by the fact that it’s not really easy to compare a JOIN with a sub-query. A JOIN assumes that you want the data from more than one table. A sub-query assumes that you need data only from the main table. So the SELECT statements that I have constructed are quite simple and in some cases, not practical. They are just for comparison purposes.
I’ve used the GET RUN TIME statement for comparison rather than the EXPLAIN function of transaction ST05 because it’s difficult to compare multiple SELECT s with single SELECT s using this function. GET RUN TIME is not perfect either, but if you do multiple comparisons, particularly in a system with little activity, the results should be OK. I’ve put all of the SELECTs used in comparisons within loops. You can adjust the number of loop passes on the selection screen.
In any event, here is the program:
Code:
REPORT ztest_selects LINE-SIZE 80 MESSAGE-ID 00.
DATA: t001 TYPE t001,
bkpf TYPE bkpf.
SELECT-OPTIONS: s_bukrs FOR bkpf-bukrs MEMORY ID buk OBLIGATORY.
SELECT-OPTIONS: s_belnr FOR bkpf-belnr MEMORY ID bln OBLIGATORY.
PARAMETERS: p_gjahr LIKE bkpf-gjahr MEMORY ID gjr OBLIGATORY.
SELECTION-SCREEN ULINE.
PARAMETERS: p_loop1 TYPE i OBLIGATORY
DEFAULT 5,
p_loop2 TYPE i OBLIGATORY
DEFAULT 10.
TYPES: BEGIN OF t001_type,
bukrs TYPE t001-bukrs,
END OF t001_type,
BEGIN OF bkpf_type,
bukrs TYPE bkpf-bukrs,
belnr TYPE bkpf-belnr,
gjahr TYPE bkpf-gjahr,
END OF bkpf_type.
DATA: t001_int TYPE TABLE OF t001_type,
t001_wa TYPE t001_type,
bkpf_int TYPE TABLE OF bkpf_type,
bkpf_wa TYPE bkpf_type.
*&---------------------------------------------------------------------*
*& Form simple_select
*&---------------------------------------------------------------------*
* First we get documents using a select statement that is
* fully qualified on the primary key. Because buffering may be an issue,
* the first select will be disregarded in this test. However, in real
* life, this would be the important time.
*----------------------------------------------------------------------*
FORM simple_select.
* Do an initial select of the documents we intend to get. Due to
* buffering, the first select may take much longer then the next one.
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE bukrs IN s_bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
IF sy-subrc <> 0.
MESSAGE ID '00' TYPE 'E' NUMBER '001' WITH
'No Data meets selection criteria'.
ENDIF.
* Next we get the same document using the same fully qualified select
* statement. We will use this in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE bukrs IN s_bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for first SELECT (fully qualified)',
055 ':', dif, 'microseconds'.
ENDFORM. " simple_select
*&---------------------------------------------------------------------*
*& Form nested_select
*&---------------------------------------------------------------------*
* text
*----------------------------------------------------------------------*
FORM nested_select.
* Use the same fully qualified SELECT, but this time nested. As usual,
* Ignore the first SELECT and use the subsequent ones for comparison.
REFRESH: bkpf_int.
SELECT bukrs FROM t001
INTO t001_wa
WHERE bukrs IN s_bukrs.
SELECT bukrs belnr gjahr
FROM bkpf
INTO bkpf_wa
WHERE bukrs EQ t001_wa-bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
APPEND bkpf_wa TO bkpf_int.
ENDSELECT.
ENDSELECT.
* Next we get the same document using the same fully qualified select
* statement. We will use this in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
REFRESH: bkpf_int.
SELECT bukrs FROM t001
INTO t001_wa
WHERE bukrs IN s_bukrs.
SELECT bukrs belnr gjahr
FROM bkpf
INTO bkpf_wa
WHERE bukrs EQ t001_wa-bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
APPEND bkpf_wa TO bkpf_int.
ENDSELECT.
ENDSELECT.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for second SELECT (nested)',
055 ':', dif, 'microseconds'.
ENDFORM. " nested_select
*&---------------------------------------------------------------------*
*& Form for_all_entries
*&---------------------------------------------------------------------*
* text
*----------------------------------------------------------------------*
FORM for_all_entries .
* Use the same fully qualified SELECT, but this time with
* FOR ALL ENTRIES.
* Ignore the first set of SELECTs
SELECT bukrs
FROM t001
INTO TABLE t001_int
WHERE bukrs IN s_bukrs.
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
FOR ALL ENTRIES IN t001_int
WHERE bukrs EQ t001_int-bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
* Use these SELECTs in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
SELECT bukrs
FROM t001
INTO TABLE t001_int
WHERE bukrs IN s_bukrs.
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
FOR ALL ENTRIES IN t001_int
WHERE bukrs EQ t001_int-bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for third SELECT (using FOR ALL ENTRIES)',
055 ':', dif, 'microseconds'.
ENDFORM. " for_all_entries
*&---------------------------------------------------------------------*
*& Form inner_join
*&---------------------------------------------------------------------*
* text
*----------------------------------------------------------------------*
FORM inner_join .
* Use the same fully qualified SELECT, but this time with an INNER JOIN.
SELECT t001~bukrs bkpf~belnr bkpf~gjahr
FROM bkpf
INNER JOIN t001 ON
t001~bukrs EQ bkpf~bukrs
INTO TABLE bkpf_int
WHERE t001~bukrs IN s_bukrs
AND bkpf~belnr IN s_belnr
AND bkpf~gjahr EQ p_gjahr.
* Use this select in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
SELECT t001~bukrs bkpf~belnr bkpf~gjahr
FROM bkpf
INNER JOIN t001 ON
t001~bukrs EQ bkpf~bukrs
INTO TABLE bkpf_int
WHERE t001~bukrs IN s_bukrs
AND bkpf~belnr IN s_belnr
AND bkpf~gjahr EQ p_gjahr.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for fourth SELECT (using an INNER JOIN)',
055 ':', dif, 'microseconds'.
ENDFORM. " inner_join
*&---------------------------------------------------------------------*
*& Form outer_join
*&---------------------------------------------------------------------*
* text
*----------------------------------------------------------------------*
FORM outer_join .
* Use the same fully qualified SELECT, but this time with an OUTER JOIN.
SELECT t001~bukrs bkpf~belnr bkpf~gjahr
FROM bkpf
LEFT OUTER JOIN t001 ON
t001~bukrs EQ bkpf~bukrs
INTO TABLE bkpf_int
WHERE bkpf~bukrs IN s_bukrs
AND bkpf~belnr IN s_belnr
AND bkpf~gjahr EQ p_gjahr.
* Use this select in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
SELECT t001~bukrs bkpf~belnr bkpf~gjahr
FROM bkpf
LEFT OUTER JOIN t001 ON
t001~bukrs EQ bkpf~bukrs
INTO TABLE bkpf_int
WHERE bkpf~bukrs IN s_bukrs
AND bkpf~belnr IN s_belnr
AND bkpf~gjahr EQ p_gjahr.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for fifth SELECT (using an OUTER JOIN)',
055 ':', dif, 'microseconds'.
ENDFORM. " outer_join
*&---------------------------------------------------------------------*
*& Form sub_query
*&---------------------------------------------------------------------*
* text
*----------------------------------------------------------------------*
FORM sub_query .
* And a sub-query
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE bukrs IN
( SELECT bukrs
FROM t001
WHERE bukrs IN s_bukrs )
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
* Use this select in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE bukrs IN
( SELECT bukrs
FROM t001
WHERE bukrs IN s_bukrs )
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for sixth SELECT (using a sub-query)',
055 ':', dif, 'microseconds'.
ENDFORM. " sub_query
*&---------------------------------------------------------------------*
*& Form unqualified_select
*&---------------------------------------------------------------------*
* Compare the above results with a SELECT that is only partially
* qualified.
*----------------------------------------------------------------------*
FORM unqualified_select.
* Ignore the first SELECT
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE belnr IN s_belnr
AND gjahr EQ p_gjahr.
* Use this select in comparisons.
GET RUN TIME FIELD start.
DO p_loop2 TIMES.
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE belnr IN s_belnr
AND gjahr EQ p_gjahr.
ENDDO.
GET RUN TIME FIELD end.
dif = end - start.
WRITE: /001 'Time for seventh SELECT (partially qualified)',
055 ':', dif, 'microseconds'.
ENDFORM. " unqualified_select
The program has two SELECT-OPTIONS and one PARAMETER for selecting data: Company code, document number and fiscal year. I ran it four different ways: with a single company code and document number, with a single company code and a range of document numbers, with a range of company codes and a single document number and with ranges of both company codes and document numbers.
I ran the program a number of times in a 4.7 environment with DB2 databases. I was a bit surprised at some of the results:
For the simple case (single company code and document number) all of the methods worked almost equally well. The single fully qualified SELECT worked best, while the OUTER JOIN was worst. But the worst case only added about 25% execution time. The nested SELECT was really no worse than the others
With a single company code and range of document numbers, the execution times increased, but the overall results were quite similar to the simple case with the exception that the nested SELECT added about 75% to the execution time.
With a range of company codes either with a single or range of document numbers, the results were different: the execution times for both the OUTER JOIN and fully qualified SELECT were dramatically higher (500 to 1000 times) than the other methods. This (to me at least) was the really surprising result.
The following statement:
Code:
SELECT bukrs belnr gjahr
FROM bkpf
INTO TABLE bkpf_int
WHERE bukrs IN s_bukrs
AND belnr IN s_belnr
AND gjahr EQ p_gjahr.
Is far less efficient than:
Code:
SELECT t001~bukrs bkpf~belnr bkpf~gjahr
FROM bkpf
INNER JOIN t001 ON
t001~bukrs EQ bkpf~bukrs
INTO TABLE bkpf_int
WHERE t001~bukrs IN s_bukrs
AND bkpf~belnr IN s_belnr
AND bkpf~gjahr EQ p_gjahr.
when a range of company codes is used. The increase in execution time for the OUTER JOIN is probably due to the fact that I could not use T001~BUKRS in the WHERE clause because of that limitation on OUTER JOINs
In the final analysis, there is no “one size fits all” answer to the question is “Which is better: a JOIN or FOR ALL ENTRIES ?” In many, if not most cases, my money is on the JOIN , but the difference is not large enough to spend much time jumping through hoops to pare off the last microsecond. In the end, if you are interested in the differences for your particular case, then you must code different SELECTs to find which is best. But then you also have to bear in mind that the same SELECT may behave differently based on the makeup of the WHERE clause.
There are other considerations that come into play as well:
INNER JOINs only look at the intersection of the results that meet the WHERE clause.
FOR ALL ENTRIES eliminates duplicates from the results.
I find JOINs to be more time consuming to code. (I can never find the “~” key.)
When using FOR ALL ENTRIES you generally end up with at least two internal tables. This may or may not be a good thing.
The example I have shown uses the full primary key. Some preliminary testing I have done comparing JOINs with FOR ALL ENTRIES show that FOR ALL ENTRIES can give better performance in that case.
One final thing to note: in the above program, the one SELECT that consistently underperformed was the one that did not use the index effectively. And that is the real point here. All of the techniques that I have shown here work reasonably effectively. The most important thing to remember is to use an index.
Depends also on the DB tuning
Hi Rob,
Thanks for this interesting weblog!
I believe the result also depends on the DB tuning. There are several FOR ALL ENTRIES relevant profile parameters.
It's explained in the following OSS notes:
Note 48230 - Parameters for the SELECT ... FOR ALL ENTRIES statement
Note 652634 - FOR ALL ENTRIES performance with Microsoft SQL Server
Note 634263 - Selects with FOR ALL ENTRIES as of kernel 6.10
It's worth to check these parameters, before making any decision, which way to choose.
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum You cannot attach files in this forum You can download files in this forum
All product names are trademarks of their respective companies. SAPNET.RU websites are in no way affiliated with SAP AG. SAP, SAP R/3, R/3 software, mySAP, ABAP, BAPI, xApps, SAP NetWeaver and any other are registered trademarks of SAP AG. Every effort is made to ensure content integrity. Use information on this site at your own risk.