Need urgent help on large dataset extraction

Comments

  • kristiak Posts: 1,249 ✭✭✭
    I would follow Gerben's advice and take a look at the Tomcat configuration. What version of OC and PG are you running?
  • RCHENU Posts: 184
    I'm using OC 3.12 and postgres 8.4.
    I've already updated the server.xml file if you are talking about this one.

    Do you know of any other parameters that should be increased?




  • kristiak Posts: 1,249 ✭✭✭
    Have you set the Tomcat Java parameters? They should be set as follows, or even higher depending on the CPU and memory you have.
    -XX:+UseParallelGC
    -XX:ParallelGCThreads=4
    -XX:MaxPermSize=180m
    -XX:+CMSClassUnloadingEnabled

    You could also consider upgrading to 64-bit PostgreSQL 9.5 and OC 3.13. This would give you a significant improvement in performance.
    What server do you use: which CPU and how much memory? For a large database you should use at least 16 GB of memory.

    What do you mean by a "large database"? 1000 patients or 10000 patients? How many events, and how many parameters per event?


  • RCHENU Posts: 184
    edited July 18
    Thanks a lot for your input !

    This is a Virtual Machine with:
    CentOS 7.2
    Apache Tomcat 7
    PostgreSQL 8.4
    Quad Core Xeon Processor 2.0 Ghz
    4 GB RAM

    And my Java parameters are:
    JAVA_OPTS="-Xmx1280m -XX:MaxPermSize=512m -XX:+UseParallelGC -XX:ParallelGCThreads=1 -XX:+CMSClassUnloadingEnabled -Duser.country=US -Duser.language=en"

    I also tried to tune PostgreSQL with these settings:
    effective_cache_size = 2GB
    checkpoint_segments = 32
    checkpoint_completion_target = 0.9
    maintenance_work_mem = 256MB
    wal_buffers = 16MB
    random_page_cost = 3.0


    There will be fewer than 2000 patients and 38 events, each containing 6 to 18 CRFs. Only one event (unscheduled visit) can have multiple occurrences, and some of my CRFs contain tables (AE, ConcMed...).

    Do you think I can increase the ParallelGCThreads parameter?
    Should I ask the IT guy to increase the RAM?

    Thanks a lot,

    Romain.

  • ebs Posts: 111 ✭✭
    ParallelGCThreads should equal the number of cores, so try 4.
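Before settling on a value, it may be worth checking how many cores the machine actually exposes. The following is a minimal sketch, assuming a Linux host where `nproc` (GNU coreutils) is available; the helper name `gc_threads_flag` is hypothetical and only illustrates the "one GC thread per core" rule of thumb.

```shell
#!/bin/sh
# Hypothetical helper: print a ParallelGCThreads flag for a given core count.
# Falls back to 1 when the argument is empty, non-numeric, or zero.
gc_threads_flag() {
    cores="$1"
    case "$cores" in
        ''|*[!0-9]*) cores=1 ;;
    esac
    [ "$cores" -ge 1 ] || cores=1
    printf '%s\n' "-XX:ParallelGCThreads=$cores"
}

# nproc reports the number of processing units available to this process.
gc_threads_flag "$(nproc)"
```

On a virtual machine, the number reported inside the guest can be lower than the core count of the physical host, so the flag should follow what the guest reports.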
  • kristiak Posts: 1,249 ✭✭✭
    You should clearly change the Java parameters to:
    -XX:ParallelGCThreads=4
    -XX:MaxPermSize=512m

    Initial memory pool: 128 MB
    Maximum memory pool: 1280 MB or more

    The RAM size is definitely too small for a large database; increase it to 16 GB.
    I would also move away from the virtual machine to a dedicated server. We occasionally use virtual servers for small training servers, but they will never work with the type of large extracts that you intend to run. I would clearly move to a different server!
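Put together, the settings recommended above could look like the sketch below. It assumes a Linux Tomcat that picks up `JAVA_OPTS` from `$CATALINA_BASE/bin/setenv.sh`; packaged installs may read the options from a different file, so the location is an assumption. The initial and maximum memory pool values are expressed as `-Xms` and `-Xmx`, and the two `-Duser.*` properties are carried over from the existing `JAVA_OPTS` line.

```shell
# Sketch of $CATALINA_BASE/bin/setenv.sh (file location is an assumption).
# -Xms / -Xmx: initial and maximum heap ("memory pool") sizes.
# ParallelGCThreads=4 assumes the machine really has 4 cores.
JAVA_OPTS="-Xms128m -Xmx1280m \
-XX:MaxPermSize=512m \
-XX:+UseParallelGC -XX:ParallelGCThreads=4 \
-XX:+CMSClassUnloadingEnabled \
-Duser.country=US -Duser.language=en"
export JAVA_OPTS
```

Tomcat has to be restarted for the new options to take effect. Note that `-XX:+CMSClassUnloadingEnabled` appears to apply only to the CMS collector, so it is probably ignored when `-XX:+UseParallelGC` is selected; it is kept here only because it is part of the list above.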
  • RCHENU Posts: 184
    Thanks to both of you for your answers. I'll try what you suggest today and get back to you asap.
    I will also see what can be done for the server...
  • RCHENU Posts: 184
    edited July 18
    After looking at my CentOS system in more detail, it seems that I have only one core (not 4), so I shouldn't change ParallelGCThreads.

    I added the new variables manually, CRF by CRF and event by event. It works. It says that I have 11007 variables in my study.
    The OpenClinica documentation says:
    Note: With some large studies (> 10,000 Items), the 'Select All' function may not work. If this is the case you will have to manually select each Item you want in your dataset.

    So maybe that's why it didn't work...
    I don't know, maybe this is also a problem with my VM.
    I will keep you informed if I can upgrade my server to a better one, because adding the variables manually is awful and I may have missed some...

    Thanks again for your help
    Romain.
  • kristiak Posts: 1,249 ✭✭✭
    Well, our experience with VPS servers is rather poor, and I think the only solution for your relatively large database is to change to a standalone server with a Xeon CPU with 4 cores and 16 GB RAM, running Windows Server 2012. Then you will not have any problems with your exports. We have some rather large databases with more than 2000 patients and 15 events over a two-year period, with some 800 different variables in total. They all run very well on dedicated servers with the above configuration.