A few questions:
1. Have you tried issuing the inserts as COPY or PREPARE statements instead? (Per http://www.postgresql.org/docs/8.0/interactive/sql-copy.html)
2. Have you tried batch insert operations using JDBC and PreparedStatements? (Per http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/rzaha/batchpre.htm)
3. Would disabling the audit triggers temporarily, then adding rows to the audit_log to indicate the batch insert, be an option for you?
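On point 2, a minimal sketch of what JDBC batching might look like for the item_data loads. The column list and batch size here are illustrative assumptions, not the actual OpenClinica schema; the key idea is one PreparedStatement, addBatch() per row, and a periodic executeBatch() inside a single transaction rather than one round-trip per insert.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ItemDataBatchLoader {
    static final int BATCH_SIZE = 500; // flush every 500 rows; tune for your data

    // Number of executeBatch() calls needed for rowCount rows.
    static int flushCount(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    // Illustrative column list only; the real item_data table differs.
    static void load(Connection conn, List<String[]> rows) throws SQLException {
        conn.setAutoCommit(false); // one transaction per file, not per row
        PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO item_data (item_id, event_crf_id, value) VALUES (?, ?, ?)");
        try {
            int pending = 0;
            for (String[] row : rows) {
                ps.setInt(1, Integer.parseInt(row[0]));
                ps.setInt(2, Integer.parseInt(row[1]));
                ps.setString(3, row[2]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) ps.executeBatch();
            conn.commit();
        } finally {
            ps.close();
        }
    }
}
```

Batching amortizes statement parsing and network round-trips, though it will not by itself help with the audit-trigger overhead described below.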
Sent: Wed 5/31/2006 10:47 AM
To: [email protected]
Subject: [Developers] Has anyone tried to do mass OpenClinica data loading via customized Java code?
After about a month of reading through the source code, we have developed a set of tools to automate data loading behind the scenes. We have frozen data sets stored in SAS (about 30 different CRFs) that we would like to put into OpenClinica for testing purposes. We had to modify a number of the DAO/Bean java files to include the ability to use a java.sql.Connection in place of the DataSource object. We left the DataSource code intact and supplemented it with Connection code. The automation does the following:
1) Call SAS via a command-line procedure to convert the data into a tilde (~) delimited data file and a contents file. We initially tried comma-delimited, but since some notes fields have commas in their data, we chose tildes.
2) Parse the contents file and, using the CRF_Design_Template.xls, auto-generate the respective XLS file for the CRF. Version info is hard-coded to "v1.0" and all items are combined into one section.
3) Auto-generate CRF, CRFVersion, etc. records and use the SpreadSheetTable code to upload the XLS.
4) Parse the tilde-delimited data file and auto-generate subjects, study_subjects, etc.
5) As the tilde-delimited parsing occurs, insert item_data.
6) Loop through the contents of a folder that has all the original SAS files, cleaning up as each file is finished.
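For steps 4 and 5, the per-line parsing can be a simple split that preserves trailing empty fields (a sketch; the input values shown are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class TildeParser {
    // Split one line of the tilde-delimited SAS export into fields.
    // The -1 limit keeps trailing empty strings, so blank CRF items
    // at the end of a record are not silently dropped.
    static List<String> parseLine(String line) {
        return Arrays.asList(line.split("~", -1));
    }
}
```

Using a tilde delimiter avoids the comma-escaping problem noted in step 1, since the notes fields can contain commas freely.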
This is a simplified overview of what the code does. The main issue we have experienced thus far is the slowdown in inserts as more and more data files are parsed. Initial data sets take 0.3 seconds per item_data insert, but as the loads progress, the insert time increases exponentially. The audit triggers on the tables add about 5 inserts for each individual insert, delaying loads further as more and more audit info is stored. Has anyone tried to load substantial amounts of data? If so, have you experienced these issues and determined how to resolve them? Thanks, in advance, for any guidance you can provide. I look forward to hearing any insights you may have.
Stephen J. Granite, M.S.
Director of Database/Software Development
Center for Cardiovascular Bioinformatics and Modeling
Johns Hopkins University
Clark Hall Room 202
3400 North Charles Street
Baltimore, MD 21218-2686
e-mail: [email protected]
Sent: Tuesday, May 16, 2006 12:07 PM
To: [email protected]
Subject: [Developers] CRF DATE issue
Hello, the Akaza team found an issue and would like to share a solution. Note: we are releasing version 1.04 today, which fixes this and a few other issues.
CRFs using the DATE data type load OK but cause a runtime error when attempting to view or perform data entry with the CRF.
The script used to build the initial database schema (OpenClinica_1.0_db_tables_only.sql) that was distributed with the 1.0 and 1.01 versions failed to insert a DATE record into the item_data_type table.
You can verify the record is missing by executing the following SQL using psql or pgAdminIII.
SELECT * FROM item_data_type WHERE code = 'DATE';
If you receive 0 records, this issue probably applies to you.
You can manually add the record using psql or pgAdminIII. The following SQL will add the record.
INSERT INTO item_data_type (code, name, definition, reference) VALUES ('DATE', 'date', 'NULL', 'NULL');
If you have questions or comments please feel free to respond.
One Kendall Square
Bldg. 400, 4th Fl
Cambridge, MA 02139
tel: 617.621.8585 x.15
Email: [email protected]