simultaneous data entry and data export issues

Hi everyone,

I am reaching out to you to try to get some answers to issues we are facing with OpenClinica. I should mention that I am not a computer scientist at all; I am a biologist working in a translational research lab. So my questions may be very naïve, and I apologize in advance!

Here is our story.
A couple of years ago, we decided to use OpenClinica as an EDC for our clinical trial. Without going into the details, we needed to design multi-disease CRFs, and we ended up with 28 CRFs capturing > 5000 items in a single visit!
Time passed, our IT infrastructure improved, and we finally set up an OpenClinica instance on a Linux virtual machine (CentOS) with 4 GB RAM (recently upgraded to 32 GB), 15 GB of disk space, and 8 cores. We started data entry a couple of months ago and quickly ran into several issues (unsolved by the RAM upgrade):
- simultaneous data entry by more than one user crashes the VM
- going through the CRFs, i.e. loading pages one after the other, crashes the VM
- a data export of all the items for all the patients (about 100) never completed

So, I'm here to ask: what's wrong with our design? Is there any solution to improve performance?

Thank you very much in advance for any comment and suggestion!
Cheers
Encarnita


Comments

  • kristiak Posts: 1,266
    Hi Encarnita,
    Like yourself, I'm not a computer scientist, but I have worked with OC since its birth. I'm not very familiar with your operating environment, but in my experience I was never very successful using virtual machines. Since then we have run everything under Windows Server on a dedicated system; today we have thousands of patients with many visits, and it runs very well. We always divide our data into logical units for exports and do not run too many exports at the same time. You can also improve performance by upgrading to 3.13 and running a full 64-bit setup with PostgreSQL 9.5, and you can tune Tomcat as per the OC instructions.
    Good luck

    Krister
  • encarnita Posts: 8
    edited January 19
    Hi Krister,

    Thank you for your reply. I'll look into the OS, but meanwhile, can you clarify:
    - what you meant by "logical units"? Our CRFs, even considered one by one, contain various types of data (integers, free text, calculated scores, dates, ...). Do you suggest that we should select similar types of data per extraction (which is going to be a nightmare ;-)!!!)?
    - what, for you, is "too many exports at the same time"? I just tried 223 heterogeneous items for 10 patients, and it worked!

    Thanks for your comment!!!
    Encarnita
  • kristiak Posts: 1,266
    By logical units I mean selecting values by visit and type, e.g. baseline BP, weight, temperature, etc. Whatever makes sense for your type of data. I do not know how many patients you have, but I would extract all patients for a certain visit and limit the number of items per extraction to maybe 2000-3000. Do not start several extractions at the same time. But first I would upgrade to OC 3.13, 64-bit, with PostgreSQL 9.5.
    /Krister
  • lindsay.stevens Posts: 403
    via Email
    Hi @encarnita,

    I would say your VM specs sound very well equipped to handle a large
    database. However, note that boosting the VM specs doesn't necessarily mean
    OpenClinica's app server (Tomcat) or database server (Postgres) are
    actually using those resources. The default resource usage configuration
    for both is conservative and must be updated and then the servers restarted.
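
    For example, and these values are illustrative only (assuming a dedicated 32 GB machine; they are not a recommendation for your workload), in postgresql.conf you might raise:

    shared_buffers = 8GB          # default is only 128MB
    work_mem = 64MB               # per-sort/per-hash memory; the small default makes big exports spill to disk
    effective_cache_size = 24GB   # a planner hint about OS file cache, not an allocation

    For Tomcat, the JVM heap is raised via JAVA_OPTS (e.g. -Xmx); restart both services afterwards.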

    By design it's not possible for different users to do data entry on the same CRF at the same time. But if you mean you're seeing crashes in other scenarios, such as data entry to different CRFs at the same time, that is worrying; could you please share more details?

    As Krister alluded to, CRF design can impact performance. The detrimental things tend to be CRFs with a huge number of items (particularly in the same section, say over 200), and/or a huge number of rules set to evaluate on data entry (rather than deferring some to run on a schedule). Where possible it's best to break such data models down into smaller CRFs.

    Extracts with a huge number of items similarly tend to take time or in the
    worst case not complete. I can't remember the exact setting but you can
    adjust the pagination in extracts to help with this - please search the
    forum for previous discussion on this. Doing smaller extracts and combining
    the data externally can also work.
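
    As a sketch of that "combine externally" approach (the file names and the StudySubjectID column name are assumptions for illustration; check the exact column name in your extract format), in Python:

    import pandas as pd

    # two smaller extracts, each covering a subset of the items
    part1 = pd.read_csv("extract_part1.tsv", sep="\t")
    part2 = pd.read_csv("extract_part2.tsv", sep="\t")

    # outer join back into one wide table, one row per subject
    combined = part1.merge(part2, on="StudySubjectID", how="outer")
    combined.to_csv("combined_extract.tsv", sep="\t", index=False)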

    If the extract setting doesn't help then the other thing to consider is
    extracting directly from the database. To achieve this task I shared a
    "community datamart" project on my GitHub page that you could try, although
    I no longer have capacity to support it. The next evolution of it is a new
    OpenClinica product called Insight, which I'd be glad to show you if you're
    interested.


    Best regards,
    Lindsay



  • encarnita Posts: 8
    Thank you very much @krister and @lindsay.stevens!

    @krister, we have just 100 patients right now; the main issue, I'm now figuring out, is the size of some of our CRFs! Some of them contain > 1000 items.
    Extracting 100 heterogeneous items from 1 CRF for 100 patients worked in a reasonable time frame, but still, extracting the full CRFs will take very long :-)!

    @lindsay.stevens, regarding the VM specs: yes, we boosted them mostly as a test, although we were convinced it would not change much, based on what we could read in the OC forum and manuals...

    As for the data entry, I guess that yes, several users entering data for different patients but on the same CRF simultaneously has probably occurred, since they are entering the data from paper CRFs all organized in the same order...
    However, this is not the only situation where the instance crashed. Different scenarios occurred:
    - only one person connected to our OC instance, validating CRFs one after the other without entering data
    - a data entry person connected and entering data + the data manager connected and checking the subject matrix
    - a data entry person connected and entering data + the study director checking CRFs one by one (maybe including the CRF the data entry person was on), without validating, saving, or entering data...

    Regarding the size of CRFs, most of our CRFs have 10 to 200 items, except three that have around 1000 items! Splitting them now may require rebuilding the full database (and therefore re-entering all the data, which is not conceivable). Do you think that extracting the data from those CRFs 100 items at a time will work? In any case, extracting our 5000 items for each of the current 100 patients (400 more patients are expected to be recruited) will be a nightmare and take too long.
    We may try datamart, but if it is not maintained anymore, I would rather be interested in the new tool Insight. So yes, it would be great if we could discuss it!

    Cheers,
    Encarnita


  • kristiak Posts: 1,266
    You have not told us which version of OpenClinica you use. If it is not the latest version, an upgrade would probably help. I would also go through your data to see whether you could eliminate some data points; 5000 seems an awful lot.

    Regards

    Krister
  • toskrip Posts: 252
    Before jumping to any conclusions (or an upgrade), I would first recommend checking whether any errors have been logged (the OC and Tomcat log files) and verifying your Java Virtual Machine (JVM) settings, because these are what restrict the performance of Tomcat in the first place (in other words, you cannot scale Tomcat just by adding more memory to the server; the JVM configuration has to be adapted as well). These settings are specified in the JAVA_OPTS string, e.g. in your Tomcat startup script.

    In general there are two types of memory areas used by the JVM: heap and PermGen. For each, you can specify a starting memory allocation and a maximum memory allocation. For simplicity I would recommend setting the starting and maximum values of each memory area to the same value. That way you know that exactly what you have assigned will be allocated by the JVM and consequently available to Tomcat applications.

    Now... you want to increase the heap space if you know that OC will be creating lots of object instances (items in eCRFs, eCRFs, events, subjects, ...). Every new object instance occupies more space in the heap. So if you have 5000 items, and they actually hold values for a specific subject, then 5000 ItemData instances are going to be created in the heap just for that one subject (and of course many more object instances will be present for the other entities).

    e.g. to use 8GB of heap space you can apply these settings:
    -Xms8192m -Xmx8192m

    PermGen, on the other hand, holds the internal representation of Java classes. The number of classes (the definitions according to which object instances are created) should be mostly fixed per application. In that sense, you want to increase PermGen if you know that you have more applications within one Tomcat. Each application (depending on the size of the application itself, not the database) will occupy a certain amount of PermGen space after deployment.

    e.g. to use 512MB of PermGen space you can apply these settings:
    -XX:PermSize=512m -XX:MaxPermSize=512m
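
    Putting the two together, a sketch of what this could look like in Tomcat's bin/setenv.sh (the file location and values are examples; note that on Java 8+ PermGen was replaced by Metaspace, so you would use -XX:MaxMetaspaceSize instead):

    export JAVA_OPTS="$JAVA_OPTS -Xms8192m -Xmx8192m -XX:PermSize=512m -XX:MaxPermSize=512m"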

    You were saying that you are not an IT professional, so I would recommend doing the Tomcat performance tweaking together with somebody who has an IT background.

    best

    Tomas
  • encarnita Posts: 8
    Hi @krister
    You're right. We are using version 3.13.
    As for the data, we set up a trial where 19 diseases are evaluated; each patient has one of those 19 diseases, and some CRFs collect information for all 19 diseases, regardless of the patient's diagnosis. That's why we have a lot of data.
    How would you like to see the data?
  • encarnita Posts: 8
    Hi @toskrip
    Thanks for your suggestions! I'll ask our IT person today if he can have a look and get back to you all!
    Best
    Encarnita
  • RCHENU Posts: 195
    Yes, keep us posted !

    Thank you,

    Romain.