outofmemory during data export

Hi all,

I have successfully exported a small dataset to a CSV file, but I get an out-of-memory error when I try to export a dataset that has 380,000+ rows. To make things worse, the datasets our users work with will likely be several times bigger than this. Has anyone modified the export routine to handle large datasets? I would like to know before I start writing one myself.

Thanks,
Gem Yang

Software Developer
Johns Hopkins University
Baltimore, MD 21218

Comments

  • jaron Posts: 30
    Thanks for the report. How much memory do you have on the machine running OpenClinica?

    Thanks,
    Jaron

    ..................................
    Jaron Sampson
    Software Engineer
    Akaza Research
    One Kendall Square
    Bldg. 400, 4th Fl
    Cambridge, MA 02139
    tel: 617.621.8585 x.15
    fax: 617.621.0065
    Email: [email protected]
    Sent: Thursday, June 08, 2006 5:04 PM
    To: [email protected]
    Subject: [Developers] outofmemory during data export

  • Jun Xu Posts: 20
    I don't think the export code itself limits the size of a dataset. Could you post the error log so we can check the details? The amount of memory available to OpenClinica is controlled by the Tomcat and JVM options, so you can try increasing the JVM heap size and see whether that makes a difference. Since your dataset is very large, I'm not sure how much memory is required; you could increase the dataset size and the memory size step by step.
    I suggest first checking the memory currently used by Tomcat: go to the Tomcat manager ( http://localhost:8080/manager/ ), check the server status, and you will see something like the following:
    JVM
    Free memory: 53.42 MB Total memory: 63.31 MB Max memory: 63.31 MB

    Then try to increase the Tomcat memory, using a value lower than the amount of physical RAM in your system. If you use Linux, place the Java options in the CATALINA_OPTS environment variable.
    For example: "-server -Xmx400m".

    Please make sure to check the memory again in the Tomcat manager panel after you increase it.
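
As a rough cross-check of the figures shown in the manager panel, the same three numbers can be read from inside the JVM with the standard `Runtime` API. The sketch below is only an illustration: the class name `HeapReport` is hypothetical, and it reports on whichever JVM runs it, so it would have to be called from code deployed in Tomcat (for example a test servlet or JSP) to reflect the `-Xmx` setting discussed above. The "max" value is what the JVM will attempt to use and may differ slightly from the configured limit.

```java
import java.util.Locale;

final class HeapReport {
    /** Formats byte counts in the same style as the Tomcat status page. */
    static String format(long freeBytes, long totalBytes, long maxBytes) {
        double mb = 1024.0 * 1024.0;
        return String.format(Locale.ROOT,
                "Free memory: %.2f MB Total memory: %.2f MB Max memory: %.2f MB",
                freeBytes / mb, totalBytes / mb, maxBytes / mb);
    }

    /** Reads the current heap figures of the JVM this code is running in. */
    static String current() {
        Runtime rt = Runtime.getRuntime();
        return format(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
    }

    public static void main(String[] args) {
        System.out.println(current());
    }
}
```

If the maximum does not change after editing CATALINA_OPTS, the most likely explanation is that Tomcat was not restarted or was started by a script that does not read that variable.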

    Hope this helps.

    -Jun


    ------------------------------------------
    Jun Xu
    Developer/Analyst
    Akaza Research
    One Kendall Square, Bldg 400, Fourth Floor
    Cambridge, MA 02139
    p 617.621.8585 x20
    f 617.621.0065
    "Open informatics for public research"
    http://www.akazaresearch.com/
    http://www.openclinica.org/
    Sent: Thursday, June 08, 2006 5:55 PM
    To: Gem Yang; [email protected]
    Subject: RE: [Developers] outofmemory during data export
  • gemyang Posts: 9
    Sorry, I did not reply to the list. As I told Jaron, I have bumped up the memory for Tomcat to 1 GB, to no avail.


    You are right that the export code itself does not put a limit on the size of the dataset. The problem is that the export routine saves all the data as objects in memory before writing them out to a file. The call path runs from ExportDataset.processRequest() to DatasetDAO.getDatasetData(), to EntityDAO.select(), to EntityDAO.processRows(), through various ArrayLists into an ExtractBean object, and only then writes the bean to the file. The code has to be modified to write to the file one row at a time. Also, a scrollable result set should be used to minimize the memory footprint of the result set object (which alone is about 300 MB for 380,000 rows in my test).
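
A minimal sketch of the row-at-a-time idea using plain JDBC follows. It is not the OpenClinica code: the class `StreamingResultSetExport`, its methods, and the fetch size of 500 are all hypothetical. It also swaps in a forward-only result set with a fetch-size hint in place of the scrollable one suggested above, because some drivers (the PostgreSQL JDBC driver documents this) only fetch in batches when the result set is forward-only and auto-commit is off. Whether the hint is honored depends on the driver, so memory use should be measured rather than assumed.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StreamingResultSetExport {
    /** Quotes a field when it contains a comma, a quote, or a line break. */
    static String csvField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Writes each row to the Writer as soon as it is read; returns the row count. */
    static long export(Connection conn, String sql, Writer out)
            throws SQLException, IOException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // assumption: needed by some drivers for batched fetching
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(500); // a hint only; the driver may ignore it
            try (ResultSet rs = ps.executeQuery()) {
                int columns = rs.getMetaData().getColumnCount();
                long rows = 0;
                while (rs.next()) {
                    for (int c = 1; c <= columns; c++) {
                        if (c > 1) {
                            out.write(',');
                        }
                        out.write(csvField(rs.getString(c)));
                    }
                    out.write('\n');
                    rows++;
                }
                out.flush();
                return rows;
            }
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }

    public static void main(String[] args) {
        // export() needs a live database connection; this only shows the quoting.
        System.out.println(csvField("Baltimore, MD"));
    }
}
```

With this shape, only the current row and the driver's fetch batch need to be in memory at any time, instead of the full set of beans and ArrayLists.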

    FYI, here are a couple of other problems related to data export:
    - the owner of test_table_three needs to be changed to ‘clinica’ after the table is created, otherwise a permission exception is thrown;
    - after a zip file is downloaded, an exception is thrown at the page-forwarding stage because the response header has already been set before the forward.

    Thanks,
    Gem

    Software Developer
    Johns Hopkins University
    Baltimore, MD 21218

    From: Jun Xu akazaresearch.com [mailto:[email protected]]
    Sent: Friday, June 09, 2006 11:08 AM
    To: Jaron Sampson; Gem Yang; [email protected]
    Subject: RE: [Developers] outofmemory during data export

  • jordan52 Posts: 10
    1. Create a comma report bean (TextReportBean which really is just a ReportBean)

    2. Pull all the data using an ExtractBean (it looks like it runs the queries and fills the ReportBean’s array lists with all the data in one shot.)

    3. Create the CSV file by opening the file and writing to it the results of calling TextReportBean.toString(). Have a look at that code… TextReportBean.toString() simply chugs through the ReportBean’s ginormous ArrayLists and appends everything to a string buffer.



    The way I see it, on export, several copies of the data are stored in memory... You’re going to need a ton of memory to export large datasets.

    Would it be better to pass a Writer around and write the data directly to the file as you pull it from the database? That would skip a lot of the overhead of string copies and ArrayLists.
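
The "pass a writer around" suggestion could look roughly like the sketch below. Everything in it is hypothetical (`RowStreamer`, `writeRows`, `syntheticRows`); the lazily generated rows merely stand in for rows pulled from the database, and the fields are assumed not to need CSV quoting. The point is that the producer hands each row to the Writer and then forgets it, so no list or string buffer ever holds the whole dataset.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class RowStreamer {
    /** Writes rows one at a time to the given Writer; returns how many were written. */
    static int writeRows(Iterator<List<String>> rows, Writer out) throws IOException {
        int count = 0;
        while (rows.hasNext()) {
            out.write(String.join(",", rows.next()));
            out.write('\n');
            count++;
        }
        out.flush();
        return count;
    }

    /** Produces n rows on demand, standing in for rows read from a database. */
    static Iterator<List<String>> syntheticRows(final int n) {
        return new Iterator<List<String>>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < n;
            }

            @Override
            public List<String> next() {
                if (next >= n) {
                    throw new NoSuchElementException();
                }
                int i = next++;
                return Arrays.asList(Integer.toString(i), "subject-" + i);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("export", ".csv");
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            int written = writeRows(syntheticRows(380_000), out);
            System.out.println("Wrote " + written + " rows to " + target);
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

Compared with building one large string in toString(), memory use here stays roughly constant regardless of the number of rows, at the cost of the export no longer being available as a single in-memory object.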

    Jordan

    The materials in this e-mail are private and may contain Protected Health Information. If you are not the intended recipient be advised that any unauthorized use, disclosure, copying, distribution or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone at 314-747-8162 or by return e-mail.
  • Jun Xu Posts: 20
    But I think the output format can be customized to meet different requirements. For example, for a particularly large dataset, as you said, we could just write the data directly to a file as we pull it from the database.

    Jun
    From: Woerndle,Jordan [mailto:[email protected]]
    Sent: Friday, June 09, 2006 11:48 AM
    To: Jun Xu akazaresearch.com; Jaron Sampson; Gem Yang; [email protected]
    Subject: RE: [Developers] outofmemory during data export
    I did a quick run through the code. Stop me if I’m wrong (or if you want me to shut up), but doesn’t the ExtractBean store all its data in a ReportBean, which is a wrapper for a few ArrayLists? My guess is that everything is indeed stored in memory. I say this because I cannot see any code in ExportDatasetServlet that flushes memory out to the file. Call me out if I’m wrong, but the algorithm looks like three steps: create the report bean, pull all the data into it in one shot, then write everything out through toString().
