Tips for implementing a DataStage release in an SDLC
Posted: Wed Nov 26, 2003 10:48 pm
Here are some tips for implementing a DataStage release:
1. Fully contain your ETL application within a parameterized framework. Strive to have all directory paths, databases, userids, and passwords set up as parameters to be fed at runtime. This will allow you to have coexisting copies of your application on the same server.
2. Promote a new copy of your ETL application into a new project. This allows you to prepare the next release of your ETL application without affecting the current production release. For example, if you're on release 1.0 of your warehouse, put it into a versioned project name, such as EDW_1_0. When you have a new set of jobs and fixes to existing jobs, create a project EDW_1_1. You can put your jobs in place ahead of the go-live date. If you've done #1 above, you simply have to run your jobs out of the new project when you're ready. If the project name is a parameter set at runtime from your enterprise scheduler, you potentially could have a simple switch to flip.
3. Version your shell scripts and SQL scripts along with your release. You will have to enhance some SQL scripts along with database changes and job changes. Make sure that your directory structure lets you version scripts just as you version your projects. Name your directory subtree the same as the project. For example, EDW_1_0 could have a subtree of /var/opt/etl/edw_1_0. You will be able to put release 1_1 into place without disturbing the production application.
4. You will have ETL work tables, views, and stored procedures that should exist within the userid schema that your ETL application uses. Convince the DBA team to version this user schema along with the project and directory naming convention. If you have done #1, then the userid is parameterized and this will work fantastically! You will be able to put enhancements to work tables, views, and stored procedures into the production environment ahead of time, because you are isolated within the newer release schema. You will not disturb the production application. When you flip the switch between projects and directory paths, the schema switches too, because it goes with the newer userid.
5. Version your EDW tables. If your tables are small enough, then instead of running ALTER statements on the tables, have newer versions created with the enhanced DDL. This lets you reorganize the columns on the next release, which is especially helpful if the legacy table always had new columns appended (messy!). You will backfill/convert existing data into the new tables on implementation day. This again will allow you to test the new release in a production environment, without having it live. Except for the biggest tables, doing it this way can be unbelievably simple and luxurious.
6. Do as much work as possible ahead of time to prepare the production environment. On implementation day of the new release, you do not want to have a significant amount of importing, compilation, test runs, database access checking, synonym verification, etc. If you've done #1-#5 above, implementations can happen late in the day, because you've done all of the installing, moving, verifications, and test runs ahead of time. Your users will be affected only for the short period of time in which the DBA team either renames the current tables to older versions and renames the newer tables to the current names, or simply updates synonyms to point to a newer version of a table. (Applications like BO that use fully qualified table names with schema need the renaming shuffle as the method, but simple semantic layer views could do the trick.)
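To make the "one switch to flip" idea from #1-#4 a bit more concrete, here's a rough Python sketch of how a scheduler wrapper might derive every release-dependent name from a single version number. The class, the function, and the ETL_ schema prefix are all made up for illustration; only the EDW_1_0 project name and the /var/opt/etl/edw_1_0 path come from the tips above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseConfig:
    """Runtime parameters for one release (hypothetical structure)."""
    project: str      # versioned DataStage project name, e.g. EDW_1_1
    script_dir: str   # versioned subtree for shell and SQL scripts
    etl_schema: str   # versioned userid/schema for work tables and views


def release_config(app: str, major: int, minor: int,
                   base_dir: str = "/var/opt/etl") -> ReleaseConfig:
    """Build every release-dependent name from one version number."""
    label = f"{app}_{major}_{minor}"
    return ReleaseConfig(
        project=label.upper(),
        script_dir=f"{base_dir}/{label.lower()}",
        # The ETL_ prefix is an assumed naming convention, not from the post.
        etl_schema=f"ETL_{label.upper()}",
    )


current = release_config("EDW", 1, 0)
upcoming = release_config("EDW", 1, 1)
print(current.project, current.script_dir, current.etl_schema)
print(upcoming.project, upcoming.script_dir, upcoming.etl_schema)
```

Because the project, directory, and schema all hang off the same version number, switching releases means changing one value in the scheduler rather than three separate settings that could drift apart.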
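For the cutover described in #5 and #6, the two options (rename shuffle or synonym update) can be prepared as generated statements ahead of time. This is only a sketch: the function names, table names, and suffixes are hypothetical, and the SQL text follows Oracle-style syntax as an assumption, so check it against what your DBA team's database actually accepts.

```python
from typing import List


def rename_shuffle(table: str, old_suffix: str, new_suffix: str) -> List[str]:
    """Statements that retire the live table and promote the new version.

    Assumes the new version was pre-built as <table>_<new_suffix>.
    """
    return [
        f"ALTER TABLE {table} RENAME TO {table}_{old_suffix};",
        f"ALTER TABLE {table}_{new_suffix} RENAME TO {table};",
    ]


def synonym_switch(synonym: str, schema: str, versioned_table: str) -> str:
    """Statement that repoints a synonym at the newer table version."""
    return f"CREATE OR REPLACE SYNONYM {synonym} FOR {schema}.{versioned_table};"


# Hypothetical table and schema names, for illustration only.
for stmt in rename_shuffle("CUSTOMER_DIM", "1_0", "1_1"):
    print(stmt)
print(synonym_switch("CUSTOMER_DIM", "EDW", "CUSTOMER_DIM_1_1"))
```

Generating the statements from the release number means the DBA team can review the exact script days before implementation day, and the cutover itself is just running it.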