
Disabling triggers speeds loading up

Tom,

We’re in the process of investigating the first two options you suggested. We implemented option 3, and things sped up significantly for a single CRF load (30 minutes rather than 4 hours). We’ve since run into another issue. We are trying to use the data extraction tool, but none of the data are “available” for extraction. In our code, we set the following flags:

1) Set the Subject_event status to COMPLETED
2) Set the Event_CRF stage to INITIAL_DATA_ENTRY_COMPLETE
3) Set the individual item_data records to UNAVAILABLE

From the web interface, everything looks correct; however, no data is available for export. Do you have any idea which flags might be incorrect? Thanks in advance for your help.
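
In code, those three updates amount to something like the sketch below. The table and column names mirror the wording of this post rather than a verified OpenClinica schema (the real tables may use numeric status_id codes instead of text labels), and the connection details and ids are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MarkExtractable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/openclinica", "clinica", "secret");
             Statement stmt = conn.createStatement()) {
            int eventId = 1; // hypothetical id for a single subject event
            // Flag 1: mark the subject event completed.
            stmt.executeUpdate("UPDATE subject_event SET status = 'COMPLETED'"
                + " WHERE subject_event_id = " + eventId);
            // Flag 2: mark initial data entry complete on the event CRF.
            stmt.executeUpdate("UPDATE event_crf SET stage = 'INITIAL_DATA_ENTRY_COMPLETE'"
                + " WHERE subject_event_id = " + eventId);
            // Flag 3: mark the individual item_data records unavailable.
            stmt.executeUpdate("UPDATE item_data SET status = 'UNAVAILABLE'"
                + " WHERE event_crf_id IN (SELECT event_crf_id FROM event_crf"
                + " WHERE subject_event_id = " + eventId + ")");
        }
    }
}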

Stephen J. Granite, M.S.
Director of Database/Software Development
Center for Cardiovascular Bioinformatics and Modeling
Johns Hopkins University
Clark Hall Room 202
3400 North Charles Street
Baltimore, MD 21218-2686
phone: 410-516-8613
e-mail: [email protected]
From: Tom Hickerson [mailto:[email protected]]
Sent: Wednesday, May 31, 2006 11:49 AM
To: Stephen Granite; [email protected]
Subject: RE: [Developers] Has anyone tried to do mass OpenClinica data loading via customized Java code?

Hi Stephen,

A few questions:

1. Have you tried forming the SQL insert statements as COPY or PREPARE statements instead? (Per http://www.postgresql.org/docs/8.0/interactive/sql-copy.html and http://www.postgresql.org/docs/8.0/interactive/sql-prepare.html; a sketch of options 1 and 3 follows this list.)

2. Have you tried batch insert operations using JDBC and PreparedStatements? (Per http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/rzaha/batchpre.htm)

3. Would disabling the audit triggers temporarily, then adding rows to the audit_log to indicate the batch insert, be an option for you?
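
To make options 1 and 3 concrete, here is a rough sketch of a COPY-based load with the audit triggers disabled, using the PostgreSQL JDBC driver’s CopyManager. The connection details, file name, column list, and audit_log columns are placeholders rather than OpenClinica specifics:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class FastLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/openclinica", "clinica", "secret");
             Statement stmt = conn.createStatement()) {
            // Option 3: disable user triggers (audit triggers included) for the
            // duration of the load. ALTER TABLE ... DISABLE TRIGGER USER needs
            // PostgreSQL 8.1+; on 8.0 you would drop and recreate the triggers.
            stmt.execute("ALTER TABLE item_data DISABLE TRIGGER USER");

            // Option 1: stream the tilde-delimited file through COPY.
            // The column list is illustrative, not the real item_data layout.
            CopyManager copy = ((PGConnection) conn).getCopyAPI();
            long rows = copy.copyIn(
                "COPY item_data (item_id, event_crf_id, value) FROM STDIN WITH DELIMITER '~'",
                new FileReader("crf_data.txt"));

            stmt.execute("ALTER TABLE item_data ENABLE TRIGGER USER");

            // Leave one marker row for the whole batch; audit_log column
            // names here are hypothetical.
            stmt.executeUpdate("INSERT INTO audit_log (event_type, details) VALUES "
                + "('BATCH_LOAD', 'COPY of " + rows + " item_data rows')");
        }
    }
}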

Sincerely,
Tom Hickerson
Sent: Wed 5/31/2006 10:47 AM
To: [email protected]
Subject: [Developers] Has anyone tried to do mass OpenClinica data loading via customized Java code?

Hi everyone,

After about a month of reading through the source code, we have developed a set of tools to automate loading data behind the scenes. We have frozen data sets stored in SAS (about 30 different CRFs) that we would like to put into OpenClinica for testing purposes. We had to modify a number of the DAO/Bean Java files to add the ability to use a java.sql.Connection in place of the DataSource object; we left the DataSource code intact and supplemented it with Connection code (a sketch of this dual-path change follows the list below). The automation does the following:

1) Call SAS via a command-line procedure to convert the data into a tilde (~) delimited data file and a contents file. We initially tried comma-delimited, but since some notes fields contain commas, we chose tildes.
2) Parse the contents file and, using the CRF_Design_Template.xls, auto-generate the respective XLS file for the CRF. Version info is hard-coded to “v1.0” and all items are combined into one section.
3) Auto-generate CRF, CRFVersion, etc. records and use the SpreadSheetTable code to upload the XLS.
4) Parse the tilde-delimited data file and auto-generate subjects, study_subjects, etc.
5) As the tilde-delimited parsing occurs, insert item_data records (a batched sketch of this step appears below).
6) Loop through the contents of a folder that has all the original SAS files, cleaning up as each file is finished.
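
The dual-path DAO change mentioned above looks roughly like this; the class and member names are hypothetical, and only the pattern is the point:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical DAO sketch: the original DataSource path stays intact for
// the web application, and a Connection-based path is added alongside it
// for the batch loader.
public class ItemDataDao {
    private DataSource ds;    // original path
    private Connection conn;  // supplemental path for batch loading

    public ItemDataDao(DataSource ds) { this.ds = ds; }
    public ItemDataDao(Connection conn) { this.conn = conn; }

    protected Connection getConnection() throws SQLException {
        // Prefer the injected Connection when the loader supplies one.
        return (conn != null) ? conn : ds.getConnection();
    }
}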

This is a simplified overview of what the code does. The main issue we have experienced thus far is a slowdown in inserts as more and more data files are parsed. Initial data sets take 0.3 seconds per item_data insert, but as the loads progress, the insert time grows exponentially. The audit triggers on the tables fire about five additional inserts for each data insert, delaying loads further as more and more audit information is stored. Has anyone tried to load substantial amounts of data? If so, have you experienced these issues and determined how to resolve them? Thanks in advance for any guidance you can provide; I look forward to hearing any insights you may have.
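
For reference, step 5 rewritten as a batched PreparedStatement load (option 2 in Tom’s reply above) might look like the following; the column list, file name, and connection details are illustrative rather than the real item_data layout:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchItemDataLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/openclinica", "clinica", "secret");
             BufferedReader in = new BufferedReader(new FileReader("crf_data.txt"))) {
            conn.setAutoCommit(false); // commit once per batch, not per row
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO item_data (item_id, event_crf_id, value) VALUES (?, ?, ?)")) {
                String line;
                int pending = 0;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("~", -1); // keep empty trailing fields
                    ps.setInt(1, Integer.parseInt(f[0]));
                    ps.setInt(2, Integer.parseInt(f[1]));
                    ps.setString(3, f[2]);
                    ps.addBatch();
                    if (++pending % 1000 == 0) { // flush every 1000 rows
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();
                conn.commit();
            }
        }
    }
}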

Stephen J. Granite, M.S.
Director of Database/Software Development
Center for Cardiovascular Bioinformatics and Modeling
Johns Hopkins University
Clark Hall Room 202
3400 North Charles Street
Baltimore, MD 21218-2686
phone: 410-516-8613
e-mail: [email protected]
Sent: Tuesday, May 16, 2006 12:07 PM
To: [email protected]
Subject: [Developers] CRF DATE issue

Hello, the Akaza team found an issue and would like to share a solution. (Note: we are releasing version 1.04 today, which fixes this and a few other issues.)

Issue –
CRFs using the DATE data type load without error but cause a runtime error when you attempt to view the CRF or perform data entry with it.

Cause –
The script used to build the initial database schema (OpenClinica_1.0_db_tables_only.sql) that was distributed with the 1.0 and 1.01 versions failed to insert a DATE record into the item_data_type table.

You can verify that the record is missing by executing the following SQL using psql or pgAdmin III:
SELECT * FROM item_data_type WHERE code = 'DATE';
If this returns zero records, the issue applies to you.

Solution –
You can manually add the missing record using psql or pgAdmin III. The following SQL will add it:

INSERT INTO item_data_type (code, name, definition, reference) VALUES ('DATE', 'date', 'NULL', 'NULL');
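
If you prefer to apply the check and the fix in one step from code, here is a small JDBC sketch (the connection details are placeholders; the two SQL statements are exactly the ones above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FixDateType {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/openclinica", "clinica", "secret");
             Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery(
                "SELECT 1 FROM item_data_type WHERE code = 'DATE'");
            if (!rs.next()) {
                // The DATE record is missing; insert it.
                stmt.executeUpdate(
                    "INSERT INTO item_data_type (code, name, definition, reference)"
                    + " VALUES ('DATE', 'date', 'NULL', 'NULL')");
            }
        }
    }
}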

If you have questions or comments, please feel free to respond.

Thanks,
Jaron
..................................
Jaron Sampson
Software Engineer
Akaza Research
One Kendall Square
Bldg. 400, 4th Fl
Cambridge, MA 02139
tel: 617.621.8585 x.15
fax: 617.621.0065
Email: [email protected]

Comments

  • jordan52 Posts: 10
    I’m pretty out of the loop on the openclinica code base lately… but did you run the stored procedure that fills the export table with data?

    There is a file called pg_view.sql that you run to fill a table (named test_table_three) with data. The data export features use this table (stop me if I’m wrong) to generate the datasets.

    The idea was to set up a cron job that fires off the updateview.cron script, which runs pg_view.sql at the required times. In other words, you cannot immediately extract the data you put into the database; you have to wait, or fire pg_view.sql manually.
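
    If you want to trigger that refresh from your own Java loader instead of waiting for cron, something like the sketch below should work; the script path and connection details are placeholders, and it assumes pg_view.sql is a plain series of SQL statements:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RefreshExportTable {
        public static void main(String[] args) throws Exception {
            // Hypothetical location of the script shipped with OpenClinica.
            String sql = new String(Files.readAllBytes(Paths.get("/usr/local/oc/pg_view.sql")));
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/openclinica", "clinica", "secret");
                 Statement stmt = conn.createStatement()) {
                // The PostgreSQL driver accepts a semicolon-separated script
                // in a single execute() call.
                stmt.execute(sql);
            }
        }
    }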

    I’m typing this message out quickly because I’m late for a lunch date… so let me know if something doesn’t make sense; I can clarify this afternoon.

    Jordan

  • jaron Posts: 30
    Thanks,
    Jaron

    ..................................
    Jaron Sampson
    Software Engineer
    Akaza Research
    One Kendall Square
    Bldg. 400, 4th Fl
    Cambridge, MA 02139
    tel: 617.621.8585 x.15
    fax: 617.621.0065
    Email: [email protected]

    Attachments:
    INSTALL-windows.doc 1.0 MB