Problem while extracting data

Hello all,

Is there any suggestions as to how big the database’s tablespace should be? We’re using Oracle 10g and been running out of it lately when extracting data from the largest studies (either lots of items or lots of subjects). So, the DB guys here have agreed to give us more room, provided we give them an estimation about how much would be necessary. As of now I’m puzzled about how to estimate that, since the SQL statement to get the data is huge (besides, I’m usually lousy at estimations, since I tend to be extremely pessimistic).

Also, if this was commented earlier, either in issues, mails or docs, could someone please point me to them?

Any hint would be appreciated.

Carlos Manuel Tilve Álvarez,
Servicio de Informática, 4ª planta.
Xerencia de Xestión Integrada de A Coruña
Hospital A Coruña
As Xubias de arriba nº 86 cp:15006 A Coruña.
TLF: 981 176 213
981 178 000 ext. 295467

Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada.
Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada.
See more languages: http://www.sergas.es/aviso_confidencialidad.htm

Comments

  • Hi Carlos,

    I can try to estimate something, but first a few questions:

    1. How big is the tablespace currently? How soon do you usually run out of space?

    2. How is your institution set up to use Oracle 10g (i.e. physical servers, or some kind of cloud service where you add resources as necessary)?

    3. What version of OpenClinica are you currently using? About how many items/subjects are you usually extracting? What extract formats are you using?

    Thanks in advance and Happy New Year, Tom

    Tom Hickerson | Senior Tier-2 Engineer
    [email protected]
    www.openclinica.com
    Open Source Platform for Clinical Research
  • cmtilvecmtilve Posts: 54
    Hi, Tom, thanks for taking the time. And happy new year to everyone, I’m amazed I didn’t realize what time of the year it was when I wrote the mail.

    1. Currently we have 2 GB assigned to tablespace. At the beginning, 1GB seemed to be enough, last week we increased it once, but as I mentioned, I can’t ask for another resize until I have an approximate number. We only run out of space when we try to extract these two particular datasets, I’ll describe them below (both of them quite large in comparison to the rest).
    2. That I really don’t know. However, I’m pretty sure they’re not physical servers, and they’ve already tried to increase the tablespace once, so I guess the second approach would be closer.
    3. Sorry, I forgot to mention, we’re currently using version 3.0.4.1 in four different copies of the application pointing to three different databases (horrible, I know, but there’s absolutely no way of avoiding that).
    4. As for the datasets, one of them extracted close to 110 items from 10K subjects and the other 863 items from a little under 1200 subjects. Usually we extract with tab separated files, since we can use the with excel afterwards.

    Thanks again (don’t thank me, it’s you helping me out! ;) ), and kind regards.



    De: [email protected] [mailto:[email protected]] En nombre de Tom Hickerson
    Enviado el: martes, 03 de enero de 2012 11:15
    Para: [email protected]
    Asunto: Re: [Users] Problem while extracting data

    Hi Carlos,

    I can try to estimate something, but first a few questions:

    1. How big is the tablespace currently? How soon do you usually run out of space?

    2. How is your institution set up to use Oracle 10g (i.e. physical servers, or some kind of cloud service where you add resources as necessary)?

    3. What version of OpenClinica are you currently using? About how many items/subjects are you usually extracting? What extract formats are you using?

    Thanks in advance and Happy New Year, Tom

    Tom Hickerson | Senior Tier-2 Engineer
    [email protected]
    www.openclinica.com
    Open Source Platform for Clinical Research
    Sent: Fri 12/30/2011 14:07
    To: [email protected]
    Subject: [Users] Problem while extracting data
    Hello all,

    Is there any suggestions as to how big the database’s tablespace should be? We’re using Oracle 10g and been running out of it lately when extracting data from the largest studies (either lots of items or lots of subjects). So, the DB guys here have agreed to give us more room, provided we give them an estimation about how much would be necessary. As of now I’m puzzled about how to estimate that, since the SQL statement to get the data is huge (besides, I’m usually lousy at estimations, since I tend to be extremely pessimistic).

    Also, if this was commented earlier, either in issues, mails or docs, could someone please point me to them?

    Any hint would be appreciated.

    Carlos Manuel Tilve Álvarez,
    Servicio de Informática, 4ª planta.
    Xerencia de Xestión Integrada de A Coruña
    Hospital A Coruña
    As Xubias de arriba nº 86 cp:15006 A Coruña.
    TLF: 981 176 213
    981 178 000 ext. 295467


    Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada.
    Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada.
    See more languages: http://www.sergas.es/aviso_confidencialidad.htm
    Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada.
    Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada.
    See more languages: http://www.sergas.es/aviso_confidencialidad.htm
  • Hi Carlos,

    The first observation I want to point out is that we changed the Extract procedure post-3.0.4.2 so that it would take up less memory. With 3.0.4.2, there are actually two very large queries hitting your database, one for the item_data and one for the metadata, and then the extract is constructed using data from both. Post-3.0.4.2, we used different queries to construct an extract in ODM XML, and then would transform the data on the application server side using XSLT.

    One way to test this out without going through an upgrade to 3.1.2 would be to try the ODM XML queries of your datasets and compare how quickly you run out of tablespace, as compared to your extracts in tab-separated format. Does running this in XML impact your database the same way?

    Best, Tom

    Tom Hickerson | Senior Tier-2 Engineer
    [email protected]
    www.openclinica.com
    Open Source Platform for Clinical Research
  • cmtilvecmtilve Posts: 54
    Hi again Tom,

    It would seem then that we should consider the upgrade again. We didn’t upgrade before because we had just updated to version 3.0.4.1, Is there a big difference in the interface from the user’s perspective?

    Back on the topic, I tried to extract with the ODM XML, it did extract OK and it took OC much less time to complete the operation. It usually takes up to 6 minutes or so to run out of tablespace, but this finished in less than two. I guess that the answer would be that it doesn’t impact our database in the same way. In fact it seems it doesn’t impact our database in any way.

    Well, this means that even if we’re not allowed to increase the tablespace again, we still have options.

    Again, thanks and kind regards.

    De: [email protected] [mailto:[email protected]] En nombre de Tom Hickerson
    Enviado el: martes, 03 de enero de 2012 14:19
    Para: [email protected]
    Asunto: Re: [Users] Problem while extracting data

    Hi Carlos,

    The first observation I want to point out is that we changed the Extract procedure post-3.0.4.2 so that it would take up less memory. With 3.0.4.2, there are actually two very large queries hitting your database, one for the item_data and one for the metadata, and then the extract is constructed using data from both. Post-3.0.4.2, we used different queries to construct an extract in ODM XML, and then would transform the data on the application server side using XSLT.

    One way to test this out without going through an upgrade to 3.1.2 would be to try the ODM XML queries of your datasets and compare how quickly you run out of tablespace, as compared to your extracts in tab-separated format. Does running this in XML impact your database the same way?

    Best, Tom

    Tom Hickerson | Senior Tier-2 Engineer
    [email protected]
    www.openclinica.com
    Open Source Platform for Clinical Research
    Sent: Tue 1/3/2012 14:46
    To: [email protected]
    Subject: Re: [Users] Problem while extracting data
    Hi, Tom, thanks for taking the time. And happy new year to everyone, I’m amazed I didn’t realize what time of the year it was when I wrote the mail.

    1. Currently we have 2 GB assigned to tablespace. At the beginning, 1GB seemed to be enough, last week we increased it once, but as I mentioned, I can’t ask for another resize until I have an approximate number. We only run out of space when we try to extract these two particular datasets, I’ll describe them below (both of them quite large in comparison to the rest).
    2. That I really don’t know. However, I’m pretty sure they’re not physical servers, and they’ve already tried to increase the tablespace once, so I guess the second approach would be closer.
    3. Sorry, I forgot to mention, we’re currently using version 3.0.4.1 in four different copies of the application pointing to three different databases (horrible, I know, but there’s absolutely no way of avoiding that).
    4. As for the datasets, one of them extracted close to 110 items from 10K subjects and the other 863 items from a little under 1200 subjects. Usually we extract with tab separated files, since we can use the with excel afterwards.

    Thanks again (don’t thank me, it’s you helping me out! ;) ), and kind regards.



    De: [email protected] [mailto:[email protected]] En nombre de Tom Hickerson
    Enviado el: martes, 03 de enero de 2012 11:15
    Para: [email protected]
    Asunto: Re: [Users] Problem while extracting data

    Hi Carlos,

    I can try to estimate something, but first a few questions:

    1. How big is the tablespace currently? How soon do you usually run out of space?

    2. How is your institution set up to use Oracle 10g (i.e. physical servers, or some kind of cloud service where you add resources as necessary)?

    3. What version of OpenClinica are you currently using? About how many items/subjects are you usually extracting? What extract formats are you using?

    Thanks in advance and Happy New Year, Tom

    Tom Hickerson | Senior Tier-2 Engineer
    [email protected]
    www.openclinica.com
    Open Source Platform for Clinical Research
    Sent: Fri 12/30/2011 14:07
    To: [email protected]
    Subject: [Users] Problem while extracting data
    Hello all,

    Is there any suggestions as to how big the database’s tablespace should be? We’re using Oracle 10g and been running out of it lately when extracting data from the largest studies (either lots of items or lots of subjects). So, the DB guys here have agreed to give us more room, provided we give them an estimation about how much would be necessary. As of now I’m puzzled about how to estimate that, since the SQL statement to get the data is huge (besides, I’m usually lousy at estimations, since I tend to be extremely pessimistic).

    Also, if this was commented earlier, either in issues, mails or docs, could someone please point me to them?

    Any hint would be appreciated.

    Carlos Manuel Tilve Álvarez,
    Servicio de Informática, 4ª planta.
    Xerencia de Xestión Integrada de A Coruña
    Hospital A Coruña
    As Xubias de arriba nº 86 cp:15006 A Coruña.
    TLF: 981 176 213
    981 178 000 ext. 295467


    Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada.
    Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada.
    See more languages: http://www.sergas.es/aviso_confidencialidad.htm

    Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada.
    Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada.
    See more languages: http://www.sergas.es/aviso_confidencialidad.htm
    Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada.
    Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada.
    See more languages: http://www.sergas.es/aviso_confidencialidad.htm
  • Hi Carlos,

    Regarding your question about the interface, the change is not a big one - the main change is that the server will split off a separate thread to run the extract while the user can go back to work on other things. You can read about the changes in more detail here:

    http://blog.openclinica.com/2010/12/22/plug-in-architecture-for-openclinica-data-extracts/

    Best, Tom


    Tom Hickerson | Senior Tier-2 Engineer
    [email protected]
    www.openclinica.com
    Open Source Platform for Clinical Research
This discussion has been closed.