Home » Other » General » Deleting data during ETL process
Deleting data during ETL process [message #258981] Tue, 14 August 2007 00:40 Go to next message
adiga
Messages: 6
Registered: August 2007
Junior Member
Hello All,

I am working on a data migration project where we are loading data in CSV files into an Oracle 10g DB using Business Objects ETL tool called Data Integrator.

Data has to be migrated from the current system to the new system that we are developing which uses Oracle DB. There are over 250 hospitals where the current system is being used.

Our schema in the Oracle DB is in third-normal form with some specific exceptions where we have denormalized for increased performance. Also, we have a lot of referential integrity constraints

The question is:
Say, after migrating data from one set of files, I find that there were some errors and I want to rerun my ETL process with the same set of files. I want to delete the data specific to this run from the DB. Note that the DB already has data loaded from different hospitals.
What is the best way to delete data specific to a hospital from the DB? Almost all the tables have hospital number field that I can use to identify the records. However, since there are a lot of referential integrity constraints, though I have a delete script in the proper order, the deletes will take a huge amount of time. Another approach I have thought about is to take a backup of the DB before running my ETL job, and restore the DB from the backup in case of any issues with the ETL job. But this would need the help of the DBA which would be an overhead for me since I keep running and rerunning my ETL process.

Is there any other approach that I can try?

Thanks in advance,
Pradeep
Re: Deleting data during ETL process [message #258983 is a reply to message #258981] Tue, 14 August 2007 00:55 Go to previous messageGo to next message
Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Another way: Better testing your program/process on test db before doing it in production db.

Regards
Michel
Re: Deleting data during ETL process [message #258990 is a reply to message #258983] Tue, 14 August 2007 01:22 Go to previous messageGo to next message
adiga
Messages: 6
Registered: August 2007
Junior Member
Each hospital has its own customization due to which, there might be some data issues during ETL. We log all the errors into an error log table. After checking this table for errors, we might fix the issue in the source system and generate a fresh set of source files that we need to load again into our Oracle DB. Hence, however much we test the ETL process in the test db, in production, we might have to rerun the job frequently.
Re: Deleting data during ETL process [message #258999 is a reply to message #258990] Tue, 14 August 2007 01:39 Go to previous messageGo to next message
Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
So this is the normal process? I don't see any reason why DBA should growl.

Regards
Michel
Re: Deleting data during ETL process [message #259008 is a reply to message #258990] Tue, 14 August 2007 02:15 Go to previous messageGo to next message
djmartin
Messages: 10181
Registered: March 2005
Location: Surges Bay TAS Australia
Senior Member
Account Moderator
Are you writing your own scripts? If so, instead of selecting ALL the recods from the hospital which is being merged, use a 'select' statement that finds those that do not exist and 'insert' them, and use a 'select' statement that finds those that have data differences and 'update' them.

You may have to create indexes that are not needed for daily production work.

David
Re: Deleting data during ETL process [message #259217 is a reply to message #258981] Tue, 14 August 2007 12:35 Go to previous messageGo to next message
adiga
Messages: 6
Registered: August 2007
Junior Member
Actually, the problem is not in inserting the data which we are doing using an ETL tool. The issue we are facing is while deleting data for one particular hospital. If I delete using delete statements, they are going to take a lot of time due to the foreign key constraints and I would have to analyze the schema after deleting so much data.
Re: Deleting data during ETL process [message #259246 is a reply to message #259217] Tue, 14 August 2007 18:34 Go to previous message
djmartin
Messages: 10181
Registered: March 2005
Location: Surges Bay TAS Australia
Senior Member
Account Moderator
Instead of deleting why not just update those that have changed data. As the key data will not have changed you won't have to worry about the foreign key constraint issues.

David
Previous Topic: Enterprise Wide Deployment
Next Topic: About-ORA-00600: internal error code.
Goto Forum:
  


Current Time: Thu Apr 25 05:40:09 CDT 2024