Tips for implementing a DataStage release in an SDLC


kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Tips for implementing a DataStage release in an SDLC

Post by kcbland »

Here are some tips for implementing a DataStage release:

1. Fully contain your ETL application within a parameterized framework. Strive to have all directory paths, databases, userids, and passwords set up as parameters to be fed at runtime. This will allow you to have coexisting copies of your application on the same server.

2. Promote a new copy of your ETL application into a new project. This allows you to prepare the next release of your ETL application without affecting the current production release. For example, if you're on release 1.0 of your warehouse, put it into a versioned project name, such as EDW_1_0. When you have a new set of jobs and fixes to existing ones, create a project EDW_1_1. You can put your jobs in place ahead of the go-live date. If you've done #1 above, you simply run your jobs out of the new project when you're ready. If this is a parameter set at runtime from your enterprise scheduler, you potentially have just a simple switch to flip (see the job-wrapper sketch after this list).

3. Version your shell scripts and SQL scripts along with your release. You will have to enhance some SQL scripts along with database changes and job changes. Make sure that your directory structure allows you to version there just like you version your projects. Name your directory subtree the same as the project. For example, EDW_1_0 could have a subtree of /var/opt/etl/edw_1_0. You will be able to put release 1_1 into place without disturbing the production application.

4. You will have ETL work tables, views, and stored procedures that should exist within the userid schema that your ETL application uses. Convince the DBA team to version this user schema along with the project and directory naming convention. If you have done #1, then the userid is parameterized and this will work fantastically! You will be able to put enhancements to work tables, views, and stored procedures into the production environment ahead of time, because you are isolated within the newer release's schema. You will not disturb the production application. When you flip the switch between projects and directory paths, the schema goes along with the newer userid.

5. Version your EDW tables. If your tables are small enough, then instead of doing ALTER statements on the tables, have newer versions created with the enhanced DDL. This will allow you to reorganize the columns on the next release, especially if the legacy table always had new columns appended (messy!). You will backfill/convert existing data on implementation day. This again allows you to test the new release in a production environment without having it live. Except for the biggest tables, doing it this way can be unbelievably simple and luxurious.

6. Do as much work as possible ahead of time to prepare the production environment. On implementation day of the new release, you do not want a significant amount of importing, compilation, test runs, database access checking, synonym verification, etc. If you've done #1-#5 above, implementations can happen late in the day, because you've done all of the installing, moving, verification, and test runs ahead of time. You will impact your users only for the short period where the DBA team either renames the current tables to their older version names and renames the newer tables to the current names, or simply updates synonyms to point to a newer version of a table (see the cutover sketch after this list). (Applications like BO that use fully qualified table names with schema need the renaming shuffle as the method, but a simple semantic-layer view could do the trick.)
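
As a minimal sketch of how tips #1 and #2 can come together, the wrapper below shows one way a per-release parameter file and a runtime project switch might look. The file names, parameter names, and paths are illustrative only, and the dsjob options should be verified against your own DataStage release.

Code:

#!/bin/sh
# run_job.sh <release-env-file> <job-name>
# Illustrative wrapper: the enterprise scheduler calls this with a per-release
# environment file, e.g. /var/opt/etl/edw_1_1/edw_1_1.env containing lines such as:
#   ETL_PROJECT=EDW_1_1               # DataStage project holding this release's jobs
#   ETL_HOME=/var/opt/etl/edw_1_1     # directory subtree named after the project
#   ETL_SCHEMA=EDW_ETL_1_1            # work schema owned by this release
#   ETL_PASSWORD=...                  # or however you choose to supply credentials
# Cutting over to release 1.1 is then just pointing the scheduler at the new file.

. "$1"
JOB="$2"

# -param feeds job parameters at runtime; check the dsjob options on your release
dsjob -run \
      -param ETL_HOME="$ETL_HOME" \
      -param DB_SCHEMA="$ETL_SCHEMA" \
      -jobstatus \
      "$ETL_PROJECT" "$JOB"
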
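And a sketch of the implementation-day flip from tips #4-#6; the table, column, and schema names are made up, so treat it as an illustration of the backfill plus rename/synonym shuffle rather than the exact statements intended above.

Code:

#!/bin/sh
# cutover.sh <release-env-file>
# Illustrative implementation-day flip. The new-version table was created, loaded,
# and tested ahead of time under the 1.1 schema, so the live window is only the
# backfill and the rename (or synonym) switch.

. "$1"    # assumes the same per-release env file supplies ETL_SCHEMA and ETL_PASSWORD

sqlplus -s "$ETL_SCHEMA/$ETL_PASSWORD" <<'EOF'
-- backfill the already-created 1.1 table from the current production table
INSERT INTO customer_dim_1_1 (cust_key, cust_name, region_cd)
SELECT cust_key, cust_name, region_cd FROM customer_dim;
COMMIT;

-- option A: the renaming shuffle (needed when tools such as BO use fully
-- qualified table names)
RENAME customer_dim TO customer_dim_1_0;    -- current table keeps the old version name
RENAME customer_dim_1_1 TO customer_dim;    -- new version takes over the current name

-- option B: repoint a synonym instead, if applications resolve through one
-- CREATE OR REPLACE SYNONYM customer_dim FOR edw_etl_1_1.customer_dim_1_1;
EOF
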
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Tips for implementing a DataStage release in an SDLC

Post by Teej »

Here is how we do it, which differs from the methods above:

We are strong believers in total separation of each step of the process - Development, QA, UAT, and Production. We also do not use DataStage Version Control -- relying on DSX exports to migrate between projects. We also discovered that VSS is woefully inadequate for handling DSX files reliably, so we use a file system that is archived daily.

We employ an 8-digit versioning scheme - "01000000" - where the first two digits are the major version, the next two the minor version, the third pair emergency bug fixes in production, and the last two digits the beta release. This is something that raises a lot of debate around here. I personally prefer something simpler - "01.00.000", where the last four digits are combined into three, incremented for every release driven by issues found in Beta/UAT. If a fix/minor upgrade was done to production code, a version that was "01.00.012" would be promoted to "01.01.000" for the next release.
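
As a sketch only, the promotion rule above ("01.00.012" becoming "01.01.000") could be scripted roughly like this; the exact field handling is illustrative rather than our actual tooling.

Code:

#!/bin/sh
# bump_version.sh <current-version>
# Sketch: promotes a version such as 01.00.012 to 01.01.000, assuming the dotted
# major.minor.build format discussed above.
VERSION="$1"

MAJOR=`echo "$VERSION" | cut -d. -f1`
MINOR=`echo "$VERSION" | cut -d. -f2`

MAJOR=`expr "$MAJOR" + 0`      # expr treats the zero-padded fields as decimal
MINOR=`expr "$MINOR" + 1`

printf "%02d.%02d.000\n" "$MAJOR" "$MINOR"
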

In a nutshell, choose a versioning scheme that works for your company and stick to it, period. Don't change it unless there is a very strong and compelling reason to do so.

We do not version our instance names or anything like that. We do time-stamp the ERWIN changes. We only internally version the PL/SQL, along with Pro*C and other programs. Where there are executables, we are able to version them visibly ("ProgramName_01000000").

We also record the version on our Beta Release form.

Now here's something that differs from Ken's viewpoint: we have a very distinct (but not always successful) boundary between the Development, QA, UAT, and Production environments. We also have upwards of 40-50 programs that work within the same environment to produce the entire Data Factory process, all a mixture of Pro*C, PL/SQL, ETL, and Cold Fusion. To lock-step the database along with all of those programs would be a logistical nightmare, and would add a burdensome layer that unduly delays the release and testing process. We have a fairly large staff of DBAs who are already working overtime to keep up with the changes we ask of them; to require this would be a nightmare for them at very little benefit to us.

Yes, it's an easy flip, but it's also easier to just create a test script that points to the environment. We have a universal script when it comes to DataStage - "StartJob" - internally created with its own rules and requirements. Every program that is released must have up-to-date Program Guide documentation listing the parameters required for the program to work properly. QA and UAT are both expected to develop their own scripts based on this requirement. UAT is also expected to ensure that Production's scripts are set up properly, as the Production team practices a philosophy of 'click this button, and ensure the results don't deviate'. They are not allowed to make any changes once it is in production.

So there are usually no shell scripts being passed around. I strongly prefer this, because it also helps QA and UAT expose potential problems by executing from their own perspective, instead of from what the developer thinks things should be.
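
For illustration, a QA-owned wrapper along those lines might look like the sketch below; the StartJob calling convention and every name in it are hypothetical, since the real internal script has its own rules and requirements.

Code:

#!/bin/sh
# qa_run_customer_load.sh
# A QA-owned wrapper written from the program's Program Guide rather than handed
# over by the developer. All names and the StartJob argument style shown here are
# hypothetical.

PROJECT=ETL_QA                      # QA's own project/environment
RUN_DATE=`date +%Y%m%d`

# Parameters below come from the Program Guide; QA points them at its own data,
# not at the developer's environment.
StartJob "$PROJECT" CustomerLoad \
    SRC_DIR=/qa/data/customer/in \
    TGT_SCHEMA=QA_EDW \
    RUN_DATE="$RUN_DATE"
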

As for our repository, we have three separate schemes, one not being used right now because of a time crunch imposed by upper management:

Code:

ETL_Repository
    + Client
        - Program name
    + Library
        + Client
            - Library Name
        + ALL
            - Library Name

ETL_QA
    + Client
        + Program name
            - Version Number
    + Library
        + Client
            + Library Name
                - Version Number
        + ALL
            + Library Name
                - Version Number

ETL_PROD
    + Client
        + Program name
            - Version Number
    + Library
        + Client
            + Library Name
                - Version Number
        + ALL
            + Library Name
                - Version Number
Our system differs from that slightly due to our own specific needs (we have multiple clients within a single universe, so we use [project]_[client]_[program name]).

We also employ a Gatekeeping process, which unfortunately has deviated from my original design to a lazier form because of how slow the new staff were. When I and one other person were on board, we were dishing out releases within an hour of their being handed to us; the new staff would take upwards of three business days to release a project. So it was decided to allow developers to release straight to QA, QA straight to UAT, and UAT through Gatekeeping to Production. The biggest advantage of requiring Gatekeeping between each step is the audit trail it keeps. We were in the process of developing a web-based release program to help automate the QA/UAT releases, but it has been repeatedly killed by upper management over the past two years.

Whose method is correct? It depends. Ken's method obviously works, since he is involved with multiple projects that likely used it. Our methods have also worked for us. But I am sure we both agree: having an audit trail, and a clear separation between versions, is vital during this process. Minimizing the complexity of releasing to Production (which we do by having the UAT step) is also vital. There is no one magic pixie dust for everything - so focus on the underlying goals, not specifics such as how a program's version should be written (8 digits! 4!).

It also helps to have a strong leading voice dedicated to providing constructive compromises that still preserve the underlying goals.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).