Oracle to Oracle performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
DiscGolfer5000
Participant
Posts: 5
Joined: Thu Jul 29, 2004 1:28 pm
Location: USA

Oracle to Oracle performance

Post by DiscGolfer5000 »

I have a 7.5.2 AIX parallel job that reads 10 million rows from one Oracle database, does minor transformations and does a truncate and load into another Oracle database and it takes 15 hours.

I have split the job into two jobs. First job is the Oracle to transformer to sequential file and that job reads and writes the 10 million rows in 1 hour. The next job reads the sequential file and does the truncate and load into Oracle for 10 million rows and takes 8 minutes which is over 20,000 rows/sec.

All the Ora stages are Oracle Enterprise. Has anyone else run into this? What is the reason for such a difference? Is there a better solution?

Thanks,
Eric
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

What are you doing in the SELECT? Any grouping or join? What are you doing in the Transformer stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsuser_cai
Premium Member
Premium Member
Posts: 151
Joined: Fri Feb 13, 2009 4:19 pm

Post by dsuser_cai »

How complex is your select query?
Are the two tables in the same database?
Did you check for any locks?

Its always a good practice to have a staging area, before loading into the target (but depends on the situation). So design your job for optimal performance.
Thanks
Karthick
DiscGolfer5000
Participant
Posts: 5
Joined: Thu Jul 29, 2004 1:28 pm
Location: USA

Post by DiscGolfer5000 »

The job queries one Oracle database and loads into a separate Oracle database. The query is against a view:

SELECT SDATE, LEVEL1, LEVEL2, LEVEL3, LEVEL4, LEVEL5, LEVEL6,CAST(C_PRED AS DECIMAL(38,10)) C_PRED
FROM BIEO_SL_BASELINE_FCST


The view definition has some grouping and joins:

Code: Select all

CREATE OR REPLACE VIEW bieo_sl_baseline_fcst (sdate,
                                              level1,
                                              level2,
                                              level3,
                                              level4,
                                              level5,
                                              level6,
                                              c_pred,
                                              record_type
                                             )
AS
   SELECT sdate, level1, level2, level3, level4, level5, level6, c_pred,
          record_type
     FROM (SELECT   MIN (t_dates_list.from_sales_date) AS sdate,
                    (CASE
                        WHEN (  MIN (t_dates_list.from_sales_date)
                              - TO_DATE ('05-16-2010', 'MM-DD-YYYY')
                             ) <= 0
                           THEN (CASE
                                    WHEN (  MAX (t_dates_list.from_sales_date)
                                          - TO_DATE ('05-16-2010',
                                                     'MM-DD-YYYY'
                                                    )
                                         ) <= 0
                                       THEN 1
                                    ELSE 3
                                 END
                                )
                        ELSE 2
                     END
                    ) AS record_type,
                    MAX (comb_id) AS comb_id, t_comb_list.level_id1,
                    t_comb_list.level_id2, t_comb_list.level_id3,
                    t_comb_list.level_id4, t_comb_list.level_id5,
                    t_comb_list.level_id6,
                    NVL
                       (SUM
                           (  DECODE
                                 (branch_data.enter_fore,
                                  NULL, (NVL
                                            (branch_data.col_for_over,
                                             (  (  (    NVL
                                                           (branch_data.manual_stat,
                                                            NVL
                                                               (branch_data.sim_val_1,
                                                                  branch_data.fore_2
                                                                * 1
                                                               )
                                                           )
                                                      * (  1.00
                                                         + NVL
                                                              (branch_data.manual_fact,
                                                               0
                                                              )
                                                        )
                                                    + NVL
                                                         (branch_data.fixed_prom,
                                                          0
                                                         )
                                                   )
                                                 * (  1
                                                    - NVL
                                                         (branch_data.rule_coll,
                                                          0
                                                         )
                                                   )
                                                )
                                              + (  NVL (branch_data.int_cost,
                                                        0
                                                       )
                                                 * NVL (branch_data.rule_coll,
                                                        0
                                                       )
                                                )
                                             )
                                            )
                                   ),
                                  branch_data.ff
                                 )
                            * 1
                           ),
                        0
                       ) AS c_pred,
                    (CASE
                        WHEN MAX (branch_data.note_id) > 0
                           THEN 1
                        ELSE 0
                     END
                    ) AS note_id,
                    (CASE
                        WHEN MAX (branch_data.is_promotion) > 0
                           THEN 1
                        ELSE 0
                     END
                    ) AS is_promotion,
                    (CASE
                        WHEN MAX (branch_data.is_supply_plan) > 0
                           THEN 1
                        ELSE 0
                     END
                    ) AS is_supply_plan,
                    (CASE
                        WHEN MAX (branch_data.is_scenario_resource) > 0
                           THEN 1
                        ELSE 0
                     END
                    ) AS is_scenario_resource,
                    1 AS is_base_0
               FROM sales_data branch_data,
                    mdp_matrix,
                    t_ep_item,
                    t_ep_e1_parent_ad_num,
                    t_ep_ebs_tp_zone,
                    t_ep_e1_cust_cat_5,
                    t_ep_ebs_account,
                    t_ep_site,
                    int_comb_602_389 t_comb_list,
                    int_date_602_389 t_dates_list
              WHERE branch_data.sales_date >= t_dates_list.from_sales_date
                AND branch_data.sales_date <= t_dates_list.to_sales_date
                AND branch_data.item_id = mdp_matrix.item_id
                AND branch_data.location_id = mdp_matrix.location_id
                AND mdp_matrix.t_ep_item_ep_id = t_ep_item.t_ep_item_ep_id
                AND mdp_matrix.t_ep_e1_parent_ad_num_ep_id =
                             t_ep_e1_parent_ad_num.t_ep_e1_parent_ad_num_ep_id
                AND mdp_matrix.t_ep_ebs_tp_zone_ep_id =
                                       t_ep_ebs_tp_zone.t_ep_ebs_tp_zone_ep_id
                AND mdp_matrix.t_ep_e1_cust_cat_5_ep_id =
                                   t_ep_e1_cust_cat_5.t_ep_e1_cust_cat_5_ep_id
                AND mdp_matrix.t_ep_ebs_account_ep_id =
                                       t_ep_ebs_account.t_ep_ebs_account_ep_id
                AND mdp_matrix.t_ep_site_ep_id = t_ep_site.t_ep_site_ep_id
                AND t_comb_list.level_id1 = mdp_matrix.t_ep_item_ep_id
                AND t_comb_list.level_id2 =
                                        mdp_matrix.t_ep_e1_parent_ad_num_ep_id
                AND t_comb_list.level_id3 = mdp_matrix.t_ep_ebs_tp_zone_ep_id
                AND t_comb_list.level_id4 =
                                           mdp_matrix.t_ep_e1_cust_cat_5_ep_id
                AND t_comb_list.level_id5 = mdp_matrix.t_ep_ebs_account_ep_id
                AND t_comb_list.level_id6 = mdp_matrix.t_ep_site_ep_id
           GROUP BY t_comb_list.level_id1,
                    t_comb_list.level_id2,
                    t_comb_list.level_id3,
                    t_comb_list.level_id4,
                    t_comb_list.level_id5,
                    t_comb_list.level_id6,
                    t_dates_list.from_sales_date) int_table,
          int_comb_602_389
    WHERE int_comb_602_389.level_id1 = int_table.level_id1
      AND int_comb_602_389.level_id2 = int_table.level_id2
      AND int_comb_602_389.level_id3 = int_table.level_id3
      AND int_comb_602_389.level_id4 = int_table.level_id4
      AND int_comb_602_389.level_id5 = int_table.level_id5
      AND int_comb_602_389.level_id6 = int_table.level_id6;

I didn't check for any locks. I have run the 'ora query -> xfm -> seq' job several times and it's consistently taking around 45 to 50 minutes for the above to execute. The seq file is 1.5 GB.

The transformer stage is straight copying the 8 columns, adding 1 column that is set to @NULL, and adding 2 columns that are set to DSJobStartTimestamp.

Thanks,
Eric
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Initialize a stage variable to DSJobStartTimestamp and use this stage variable rather than re-evaluate the macro twice for every row processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DiscGolfer5000
Participant
Posts: 5
Joined: Thu Jul 29, 2004 1:28 pm
Location: USA

Post by DiscGolfer5000 »

I tried a job using the stage variable to set DSJobStartTimestamp. Don't know why but it aborts after 4 million rows. Ran it again, same abort. Then I re-ran the previous job that queries -> xfm -> seq file and it ran 10 million rows fine. Must be a bug in DS 7.5.2. It sounds like a good tip but I think the main overall issue must have something to do with landing the data to a text file vs. not landing it. I wonder where DS stores the data when it's not landed using a seq file stage.

Thanks,
Eric
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

DiscGolfer5000 wrote:...I think the main overall issue must have something to do with landing the data to a text file vs. not landing it. I wonder where DS stores the data when it's not landed using a seq file stage.

Thanks,
Eric
I was wondering along that same line: Is it possible that the CPU/memory being used is maxing out its resources? The data has to "live" somewhere when it's not being landed to a file, and buffer-to-file is normally going to be faster than buffer to Oracle, or so I assume. Maybe this is a hardware performance issue.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A number of years ago, Ken Bland posted several items here about landing data, including this thread. If you search for "pressure" with "kcbland" as the author, you'll also find a number of discussions on the concept of "back-pressure" and why landing between the extract and load addresses the issue that you are seeing.

Ken is one of our Premium Posters here, so not much of them will be accessable, I'm afraid, but did want to let you know they are out there. A search may turn up more, I don't recall off the top of my head.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply