We hope you'll join us for our 4/23 webinar on using data tables to apply reference ranges and AE codes in OC4. For more information and to register, visit https://register.gotowebinar.com/register/2882170018956684555

Repeated form extraction

When an event has a CRF that is repeated, such as a daily collection form,
and then I extract the data elements, I would expect output such as the
following, with each daily form on a different line:
Patient# Date1 Item1 Item2 Item3 ...
Patient# Date2 Item1 Item2 Item3 ...
But what actually comes over is:
Item1 Item2 Item3... Item1 Item2 Item3 ...
I am clearly missing something. Thanks for any advice.
- Mike
-- J. Michael Dean, M.D., M.B.A. H.A. and Edna Benning Presidential Professor of Pediatrics Professor of Biomedical Informatics Vice-Chairman of Pediatrics Chief, Division of Pediatric Critical Care Medicine University of Utah School of Medicine Salt Lake City, Utah

Comments

  • Hi Mike,

    I have defined each part of how the item variables are stored and extracted by adding a color key. This shows how the variables are extracted for repeating events as well as repeating items. I hope this helps.


    ITEMNAME_A1_1_GROUPNAME_1

    Item name is as defined in the spreadsheet

    The alpha character depicts from which event in your dataset this variable is coming from. If you include CRFs from two different events, the first event would be ‘A’ the second event would be ‘B’. The numeric value depicts which CRF within the event the variable is coming from. If you include two CRFs from the same event definition they would be ‘A1’ and ‘A2’.
    This ordinal corresponds to the occurrence of the event. If you extracted data for an event which repeated three times you would see: ITEMNAME_A1_1 , ITEMNAME_A1_2, ITEMNAME_A1_3 .
    Group name is as defined in your spreadsheet ( GROUP_LABEL). This will only be included for items which have been assigned to a group.
    This ordinal corresponds to the group repetition. If you had three rows of data in the group, they would be saved as ITEMNAME_A1_1_GROUPNAME_1, ITEMNAME_A1_1_GROUPNAME_2, ITEMNAME_A1_1_GROUPNAME_2


    The repeat function allows you to collect the same set of data multiple times. I have created a table which looks similar to what you would see in the user interface, while what is in red shows how it will be stored in the database.

    Where ITEM* = Item variable as defined in your spreadsheet and GROUPLABEL = the GROUP_LABEL as defined in your spreadsheet




    ITEM1

    ITEM2

    ITEM3

    ITEM4
    Repeat 1

    ITEM1_A1_1_GROUPLABEL_1

    ITEM2_A1_1_GROUPLABEL_1

    ITEM3_A1_1_GROUPLABEL_1

    ITEM4_A1_1_GROUPLABEL_1
    Repeat 2

    ITEM1_A1_1_GROUPLABEL_2

    ITEM2_A1_1_GROUPLABEL_2

    ITEM3_A1_1_GROUPLABEL_2

    ITEM4_A1_1_GROUPLABEL_2
    Repeat 3

    ITEM1_A1_1_GROUPLABEL_3

    ITEM2_A1_1_GROUPLABEL_3

    ITEM3_A1_1_GROUPLABEL_3

    ITEM4_A1_1_GROUPLABEL_3

    I hope this is helpful. Please let me know if you have more questions.

    Best Regards,

    Alicia

    . . . . . . . . . . . . . . . . . . . .

    Alicia Goodwin
    Akaza Research
    400 One Kendall Square
    Cambridge, MA 02139
    tel: 617.621.8585 ext. 24
    [email protected]
    www.akazaresearch.com

    www.openclinica.org
    Open Source Platform for Clinical Research



    -----Original Message-----
    Sent: Tuesday, May 26, 2009 2:03 PM
    To: [email protected]
    Subject: [Users] Repeated form extraction

    When an event has a CRF that is repeated, such as a daily collection form,
    and then I extract the data elements, I would expect output such as the
    following, with each daily form on a different line:

    Patient# Date1 Item1 Item2 Item3 ...
    Patient# Date2 Item1 Item2 Item3 ...

    But what actually comes over is:

    Item1 Item2 Item3... Item1 Item2 Item3 ...

    I am clearly missing something. Thanks for any advice.

    - Mike
    --
    J. Michael Dean, M.D., M.B.A.
    H.A. and Edna Benning Presidential Professor of Pediatrics
    Professor of Biomedical Informatics
    Vice-Chairman of Pediatrics
    Chief, Division of Pediatric Critical Care Medicine
    University of Utah School of Medicine
    Salt Lake City, Utah
  • hevoilahevoila Posts: 126
    Hello Alicia,
    Thanks you for your great description.
    Few more questions about this:
    1. This coding is defined only for flat file? But not with ODM?
    For _A1_
    2. Which limit do we have with the alpha character? (26 From A to Z?)
    3. For the numeric value, does we need to have the same or a different CRF? It's the occurrence of CRF in the event, as I understand.
    4. For every numeric value, what's the limit? (99, 999..?)
    5. What about the version in the CRF? Is It detached from this?
    6. In my opinion, with what you describe, it's important to make an excel CRF spreadsheet as little as possible, isn't it?
    7. The goal of this, is to manage item variable with a data management software, like SAS.
    Do you use FORMAT command or something like this in SAS to help you and split datas in the appropriate array?
    King Regards,
    Hevoila.
    ----- Mail Original -----
    De: "Alicia Goodwin"
    À: [email protected]
    Envoyé: Mardi 26 Mai 2009 20h15:43 GMT +01:00 Amsterdam / Berlin / Berne / Rome / Stockholm / Vienne
    Objet: RE: [Users] Repeated form extraction
    Hi Mike,
    I have defined each part of how the item variables are stored and extracted by adding a color key. This shows how the variables are extracted for repeating events as well as repeating items. I hope this helps.
    ITEMNAME_A1_1_GROUPNAME_1
    • Item name is as defined in the spreadsheet
    • The alpha character depicts from which event in your dataset this variable is coming from. If you include CRFs from two different events, the first event would be ‘A’ the second event would be ‘B’. The numeric value depicts which CRF within the event the variable is coming from. If you include two CRFs from the same event definition they would be ‘A1’ and ‘A2’.
    • This ordinal corresponds to the occurrence of the event. If you extracted data for an event which repeated three times you would see: ITEMNAME_A1_1 , ITEMNAME_A1_2, ITEMNAME_A1_3 .
    • Group name is as defined in your spreadsheet ( GROUP_LABEL). This will only be included for items which have been assigned to a group.
    • This ordinal corresponds to the group repetition. If you had three rows of data in the group, they would be saved as ITEMNAME_A1_1_GROUPNAME_1, ITEMNAME_A1_1_GROUPNAME_2, ITEMNAME_A1_1_GROUPNAME_2
    The repeat function allows you to collect the same set of data multiple times. I have created a table which looks similar to what you would see in the user interface, while what is in red shows how it will be stored in the database.
    Where ITEM* = Item variable as defined in your spreadsheet and GROUPLABEL = the GROUP_LABEL as defined in your spreadsheet


    ITEM1
    ITEM2
    ITEM3
    ITEM4

    Repeat 1
    ITEM1 _A1_1 _GROUPLABEL_1
    ITEM2_ A1_1_ GROUPLABEL_1
    ITEM3_ A1_1_ GROUPLABEL_1
    ITEM4_ A1_1_ GROUPLABEL_1

    Repeat 2
    ITEM1_ A1_1_ GROUPLABEL_2
    ITEM2_ A1_1_ GROUPLABEL_2
    ITEM3_ A1_1_ GROUPLABEL_2
    ITEM4_ A1_1_ GROUPLABEL_2

    Repeat 3
    ITEM1_ A1_1_ GROUPLABEL_3
    ITEM2_ A1_1_ GROUPLABEL_3
    ITEM3_ A1_1_ GROUPLABEL_3
    ITEM4_ A1_1_ GROUPLABEL_3
    I hope this is helpful. Please let me know if you have more questions.
    Best Regards,
    Alicia
    . . . . . . . . . . . . . . . . . . . .
    Alicia Goodwin
    Akaza Research
    400 One Kendall Square
    Cambridge , MA 02139
    tel: 617.621.8585 ext. 24
    [email protected]
    www.akazaresearch.com
    www.openclinica.org
    Open Source Platform for Clinical Research
    -----Original Message-----
    Sent: Tuesday, May 26, 2009 2:03 PM
    To: [email protected]
    Subject: [Users] Repeated form extraction
    When an event has a CRF that is repeated, such as a daily collection form,
    and then I extract the data elements, I would expect output such as the
    following, with each daily form on a different line:
    Patient# Date1 Item1 Item2 Item3 ...
    Patient# Date2 Item1 Item2 Item3 ...
    But what actually comes over is:
    Item1 Item2 Item3... Item1 Item2 Item3 ...
    I am clearly missing something. Thanks for any advice.
    - Mike
    -- J. Michael Dean, M.D., M.B.A. H.A. and Edna Benning Presidential Professor of Pediatrics Professor of Biomedical Informatics Vice-Chairman of Pediatrics Chief, Division of Pediatric Critical Care Medicine University of Utah School of Medicine Salt Lake City , Utah
  • fruitynewtfruitynewt Posts: 21
    For those of you who have an interest in exporting data for SAS a colleague
    of mine is developing a command line utility to do just that. We have a
    working prototype but nothing that we are comfortable making public just
    yet.
    I've included an overview of the functionality below, including some example
    SAS code. If anyone would like to help us test the utility when the time
    comes they can email me at [email protected]
    Coincidentally the same XML file can be imported directly into Access.
    However doing so loses the formats and labels that we derive from the CRF
    metadata.
    =========================================================================
    The idea is to produce two XML files. The first is a data file containing
    the data from the OpenClinica CRFs, events, subject records and
    discrepancies. This file will also contain a 'table' that will be used to
    create the SAS format catalog. The second XML file is a SAS XML map. This
    tells the SAS libname engine how to interpret tha data file. It also
    includes the item and data set labels (derived from the CRF metadata) and
    format/informat assignments.
    These two files allow the data to be read in SAS directly via the libname
    engine almost as if the data were native SAS data sets. Something like this
    (the xml92 libname is a SAS 9.2 enhancement over the existing XML engine):
    /* define the exported data as a SAS library */
    libname ocdata xml92 'M:\Projects\data.xml'
    xmlmap="M:\Projects\map.xml"
    access=readonly;
    /* define a location for the format catalog */
    libname library 'M:\Projects\fmtlib';
    /* create the format catalog from the exported data */
    proc format cntlin=ocdata.fmtlib library = library fmtlib;
    run;
    /* create native SAS data sets from the exported data. This could equally
    well be a data step accessing a table within the XML library. This statement
    is just included by way of example */
    proc datasets library=ocdata;
    copy out=work;
    exclude fmtlib;
    quit;
    There are a few decisions we had to make designing the code.
    1) Because of the possible complexity of CRF design each CRF section becomes
    a separate SAS data set. The assumption here is that a data manager would
    design the CRF to separate logical sets of data into sections. If a
    different structure is required for data management purposes then the data
    can be restructured further when it is in SAS.
    2) A CRF could contain many groups. Two or more groups contained within the
    same CRF section will be 'merged' into one data set. Any ungrouped data
    contained within the same section as a group will be repeated with each
    record from the group. If two groups are not logically related they will
    need to be placed in different CRF sections.
    3) SAS format names must be unique to the SAS session (or 'database'). This
    application reads all the response options for the study and creates a list
    of formats. If two different response lists exist on different CRFs but with
    the same names one of them will be renamed to prevent a clash. If two code
    lists are identical only one format will be created.
    4) We do not assume the data complies with SAS naming rules and we do our
    best to translate invalid SAS names derived from the CRF data into something
    SAS will accept. Typically this involves replacing invalid characters and
    truncating long item names.
    We're currently refactoring the code to make better use of available memory
    so we don't anticipate a version for testing for a few weeks.
    Rick.
    -----Original Message-----
    On Behalf Of [email protected]
    Sent: May-27-09 9:21 AM
    To: [email protected]
    Subject: Re: [Users] Repeated form extraction
    Hello Alicia,
    Thanks you for your great description.
    Few more questions about this:
    1. This coding is defined only for flat file? But not with ODM?
    For _A1_
    2. Which limit do we have with the alpha character? (26 From A to Z?)
    3. For the numeric value, does we need to have the same or a different CRF?
    It's the occurrence of CRF in the event, as I understand.
    4. For every numeric value, what's the limit? (99, 999..?)
    5. What about the version in the CRF? Is It detached from this?
    6. In my opinion, with what you describe, it's important to make an excel
    CRF spreadsheet as little as possible, isn't it?
    7. The goal of this, is to manage item variable with a data management
    software, like SAS.
    Do you use FORMAT command or something like this in SAS to help you and
    split datas in the appropriate array?
    King Regards,
    Hevoila.
    ----- Mail Original -----
    De: "Alicia Goodwin"
    À: [email protected]
    Envoyé: Mardi 26 Mai 2009 20h15:43 GMT +01:00 Amsterdam / Berlin / Berne /
    Rome / Stockholm / Vienne
    Objet: RE: [Users] Repeated form extraction
    Hi Mike,
    I have defined each part of how the item variables are stored and extracted
    by adding a color key. This shows how the variables are extracted for
    repeating events as well as repeating items. I hope this helps.
    ITEMNAME_A1_1_GROUPNAME_1
    • Item name is as defined in the spreadsheet
    • The alpha character depicts from which event in your dataset this
    variable is coming from. If you include CRFs from two different events, the
    first event would be ‘A’ the second event would be ‘B’. The numeric value
    depicts which CRF within the event the variable is coming from. If you
    include two CRFs from the same event definition they would be ‘A1’ and ‘A2’.
    • This ordinal corresponds to the occurrence of the event. If you
    extracted data for an event which repeated three times you would see:
    ITEMNAME_A1_1 , ITEMNAME_A1_2, ITEMNAME_A1_3 .
    • Group name is as defined in your spreadsheet ( GROUP_LABEL). This will
    only be included for items which have been assigned to a group.
    • This ordinal corresponds to the group repetition. If you had three
    rows of data in the group, they would be saved as ITEMNAME_A1_1_GROUPNAME_1,
    ITEMNAME_A1_1_GROUPNAME_2, ITEMNAME_A1_1_GROUPNAME_2
    The repeat function allows you to collect the same set of data multiple
    times. I have created a table which looks similar to what you would see in
    the user interface, while what is in red shows how it will be stored in the
    database.
    Where ITEM* = Item variable as defined in your spreadsheet and GROUPLABEL =
    the GROUP_LABEL as defined in your spreadsheet


    ITEM1
    ITEM2
    ITEM3
    ITEM4

    Repeat 1
    ITEM1 _A1_1 _GROUPLABEL_1
    ITEM2_ A1_1_ GROUPLABEL_1
    ITEM3_ A1_1_ GROUPLABEL_1
    ITEM4_ A1_1_ GROUPLABEL_1

    Repeat 2
    ITEM1_ A1_1_ GROUPLABEL_2
    ITEM2_ A1_1_ GROUPLABEL_2
    ITEM3_ A1_1_ GROUPLABEL_2
    ITEM4_ A1_1_ GROUPLABEL_2

    Repeat 3
    ITEM1_ A1_1_ GROUPLABEL_3
    ITEM2_ A1_1_ GROUPLABEL_3
    ITEM3_ A1_1_ GROUPLABEL_3
    ITEM4_ A1_1_ GROUPLABEL_3
    I hope this is helpful. Please let me know if you have more questions.
    Best Regards,
    Alicia
    . . . . . . . . . . . . . . . . . . . .
    Alicia Goodwin
    Akaza Research
    400 One Kendall Square
    Cambridge , MA 02139
    tel: 617.621.8585 ext. 24
    [email protected]
    www.akazaresearch.com
    www.openclinica.org
    Open Source Platform for Clinical Research
    -----Original Message-----
    On Behalf Of Mike Dean
    Sent: Tuesday, May 26, 2009 2:03 PM
    To: [email protected]
    Subject: [Users] Repeated form extraction
    When an event has a CRF that is repeated, such as a daily collection form,
    and then I extract the data elements, I would expect output such as the
    following, with each daily form on a different line:
    Patient# Date1 Item1 Item2 Item3 ...
    Patient# Date2 Item1 Item2 Item3 ...
    But what actually comes over is:
    Item1 Item2 Item3... Item1 Item2 Item3 ...
    I am clearly missing something. Thanks for any advice.
    - Mike
    -- J. Michael Dean, M.D., M.B.A. H.A. and Edna Benning Presidential Professor of Pediatrics Professor of Biomedical Informatics Vice-Chairman of Pediatrics Chief, Division of Pediatric Critical Care Medicine University of Utah School of Medicine Salt Lake City , Utah
  • mnattermnatter Posts: 2
    Hi Rick,
    I was just investigating doing similar with SAS. But I noticed that SAS has a cdisc-specific component, so was going to use this (proc cdisc) - see http://www.sas.com/industry/pharma/cdisc/index.html
    Have you/are you using this? I believe it's part of the base xml distribution, although I could be wrong. Seems like would make things fairly straightforward, or am I missing the point?
    Marc Natter, MD
    Fellow, Clinical Informatics
    Children's Hospital Informatics Program
    Children's Hospital Boston
    Marc.Natter*at*childrens.harvard.edu
    -----Original Message-----
    Sent: Saturday, May 30, 2009 1:00 PM
    To: [email protected]
    Subject: Users Digest, Vol 38, Issue 35
    Send Users mailing list submissions to
    [email protected]
    To subscribe or unsubscribe via the World Wide Web, visit
    http://www.openclinica.org/mailman/listinfo/users
    or, via email, send a message with subject or body 'help' to
    [email protected]
    You can reach the person managing the list at
    [email protected]
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of Users digest..."
    Today's Topics:
    1. RE: Repeated form extraction (Rick Watts)
    ----------------------------------------------------------------------
    Message: 1
    Date: Fri, 29 May 2009 19:56:00 -0600
    From: "Rick Watts"
    Subject: RE: [Users] Repeated form extraction
    To:
    Message-ID: <[email protected]>
    Content-Type: text/plain; charset="iso-8859-1"
    For those of you who have an interest in exporting data for SAS a colleague
    of mine is developing a command line utility to do just that. We have a
    working prototype but nothing that we are comfortable making public just
    yet.
    I've included an overview of the functionality below, including some example
    SAS code. If anyone would like to help us test the utility when the time
    comes they can email me at [email protected]
    Coincidentally the same XML file can be imported directly into Access.
    However doing so loses the formats and labels that we derive from the CRF
    metadata.
    =========================================================================
    The idea is to produce two XML files. The first is a data file containing
    the data from the OpenClinica CRFs, events, subject records and
    discrepancies. This file will also contain a 'table' that will be used to
    create the SAS format catalog. The second XML file is a SAS XML map. This
    tells the SAS libname engine how to interpret tha data file. It also
    includes the item and data set labels (derived from the CRF metadata) and
    format/informat assignments.
    These two files allow the data to be read in SAS directly via the libname
    engine almost as if the data were native SAS data sets. Something like this
    (the xml92 libname is a SAS 9.2 enhancement over the existing XML engine):
    /* define the exported data as a SAS library */
    libname ocdata xml92 'M:\Projects\data.xml'
    xmlmap="M:\Projects\map.xml"
    access=readonly;
    /* define a location for the format catalog */
    libname library 'M:\Projects\fmtlib';
    /* create the format catalog from the exported data */
    proc format cntlin=ocdata.fmtlib library = library fmtlib;
    run;
    /* create native SAS data sets from the exported data. This could equally
    well be a data step accessing a table within the XML library. This statement
    is just included by way of example */
    proc datasets library=ocdata;
    copy out=work;
    exclude fmtlib;
    quit;
    There are a few decisions we had to make designing the code.
    1) Because of the possible complexity of CRF design each CRF section becomes
    a separate SAS data set. The assumption here is that a data manager would
    design the CRF to separate logical sets of data into sections. If a
    different structure is required for data management purposes then the data
    can be restructured further when it is in SAS.
    2) A CRF could contain many groups. Two or more groups contained within the
    same CRF section will be 'merged' into one data set. Any ungrouped data
    contained within the same section as a group will be repeated with each
    record from the group. If two groups are not logically related they will
    need to be placed in different CRF sections.
    3) SAS format names must be unique to the SAS session (or 'database'). This
    application reads all the response options for the study and creates a list
    of formats. If two different response lists exist on different CRFs but with
    the same names one of them will be renamed to prevent a clash. If two code
    lists are identical only one format will be created.
    4) We do not assume the data complies with SAS naming rules and we do our
    best to translate invalid SAS names derived from the CRF data into something
    SAS will accept. Typically this involves replacing invalid characters and
    truncating long item names.
    We're currently refactoring the code to make better use of available memory
    so we don't anticipate a version for testing for a few weeks.
    Rick.
    -----Original Message-----
    On Behalf Of [email protected]
    Sent: May-27-09 9:21 AM
    To: [email protected]
    Subject: Re: [Users] Repeated form extraction
    Hello Alicia,
    Thanks you for your great description.
    Few more questions about this:
    1. This coding is defined only for flat file? But not with ODM?
    For _A1_
    2. Which limit do we have with the alpha character? (26 From A to Z?)
    3. For the numeric value, does we need to have the same or a different CRF?
    It's the occurrence of CRF in the event, as I understand.
    4. For every numeric value, what's the limit? (99, 999..?)
    5. What about the version in the CRF? Is It detached from this?
    6. In my opinion, with what you describe, it's important to make an excel
    CRF spreadsheet as little as possible, isn't it?
    7. The goal of this, is to manage item variable with a data management
    software, like SAS.
    Do you use FORMAT command or something like this in SAS to help you and
    split datas in the appropriate array?
    King Regards,
    Hevoila.
    ----- Mail Original -----
    De: "Alicia Goodwin"
    ?: [email protected]
    Envoy?: Mardi 26 Mai 2009 20h15:43 GMT +01:00 Amsterdam / Berlin / Berne /
    Rome / Stockholm / Vienne
    Objet: RE: [Users] Repeated form extraction
    Hi Mike,
    I have defined each part of how the item variables are stored and extracted
    by adding a color key. This shows how the variables are extracted for
    repeating events as well as repeating items. I hope this helps.
    ITEMNAME_A1_1_GROUPNAME_1
    ? Item name is as defined in the spreadsheet
    ? The alpha character depicts from which event in your dataset this
    variable is coming from. If you include CRFs from two different events, the
    first event would be ?A? the second event would be ?B?. The numeric value
    depicts which CRF within the event the variable is coming from. If you
    include two CRFs from the same event definition they would be ?A1? and ?A2?.
    ? This ordinal corresponds to the occurrence of the event. If you
    extracted data for an event which repeated three times you would see:
    ITEMNAME_A1_1 , ITEMNAME_A1_2, ITEMNAME_A1_3 .
    ? Group name is as defined in your spreadsheet ( GROUP_LABEL). This will
    only be included for items which have been assigned to a group.
    ? This ordinal corresponds to the group repetition. If you had three
    rows of data in the group, they would be saved as ITEMNAME_A1_1_GROUPNAME_1,
    ITEMNAME_A1_1_GROUPNAME_2, ITEMNAME_A1_1_GROUPNAME_2
    The repeat function allows you to collect the same set of data multiple
    times. I have created a table which looks similar to what you would see in
    the user interface, while what is in red shows how it will be stored in the
    database.
    Where ITEM* = Item variable as defined in your spreadsheet and GROUPLABEL =
    the GROUP_LABEL as defined in your spreadsheet


    ITEM1
    ITEM2
    ITEM3
    ITEM4

    Repeat 1
    ITEM1 _A1_1 _GROUPLABEL_1
    ITEM2_ A1_1_ GROUPLABEL_1
    ITEM3_ A1_1_ GROUPLABEL_1
    ITEM4_ A1_1_ GROUPLABEL_1

    Repeat 2
    ITEM1_ A1_1_ GROUPLABEL_2
    ITEM2_ A1_1_ GROUPLABEL_2
    ITEM3_ A1_1_ GROUPLABEL_2
    ITEM4_ A1_1_ GROUPLABEL_2

    Repeat 3
    ITEM1_ A1_1_ GROUPLABEL_3
    ITEM2_ A1_1_ GROUPLABEL_3
    ITEM3_ A1_1_ GROUPLABEL_3
    ITEM4_ A1_1_ GROUPLABEL_3
    I hope this is helpful. Please let me know if you have more questions.
    Best Regards,
    Alicia
    . . . . . . . . . . . . . . . . . . . .
    Alicia Goodwin
    Akaza Research
    400 One Kendall Square
    Cambridge , MA 02139
    tel: 617.621.8585 ext. 24
    [email protected]
    www.akazaresearch.com
    www.openclinica.org
    Open Source Platform for Clinical Research
    -----Original Message-----
    On Behalf Of Mike Dean
    Sent: Tuesday, May 26, 2009 2:03 PM
    To: [email protected]
    Subject: [Users] Repeated form extraction
    When an event has a CRF that is repeated, such as a daily collection form,
    and then I extract the data elements, I would expect output such as the
    following, with each daily form on a different line:
    Patient# Date1 Item1 Item2 Item3 ...
    Patient# Date2 Item1 Item2 Item3 ...
    But what actually comes over is:
    Item1 Item2 Item3... Item1 Item2 Item3 ...
    I am clearly missing something. Thanks for any advice.
    - Mike
    -- J. Michael Dean, M.D., M.B.A. H.A. and Edna Benning Presidential Professor of Pediatrics Professor of Biomedical Informatics Vice-Chairman of Pediatrics Chief, Division of Pediatric Critical Care Medicine University of Utah School of Medicine Salt Lake City , Utah
  • rwattsrwatts Posts: 102
    No - you are not missing the point. However PROC CDISC contains bugs and
    is being 'deprecated' by SAS. Unofficially I've been told that SAS will
    no longer support it as it is being replaced by a new CDISC Toolkit.
    However the toolkit will only initially support SDTM and SAS could not
    tell me when ODM support will be made available.
    OpenClinica's support for ODM was also limited to CRF data, used naming
    conventions based on the system OIDs, and did not apply SAS formats. It
    did not expand multi-select objects. I use the past tense because I
    understand version 3.0 will address some of these requirements.
    We took the decision to develop our own tool so that we would have total
    control over output. We are an academic site and ease of use is more of
    a priority for us than standards compliance. However, I have no doubt
    that what works for us will not be everyone else's 'cup of tea'!
    R.
    -----Original Message-----
    [mailto:[email protected]] On Behalf Of Natter, Marc
    Sent: June 1, 2009 3:19 PM
    To: [email protected]
    Subject: RE: [Users] Repeated form extraction
    Hi Rick,
    I was just investigating doing similar with SAS. But I noticed that SAS
    has a cdisc-specific component, so was going to use this (proc cdisc) -
    see http://www.sas.com/industry/pharma/cdisc/index.html
    Have you/are you using this? I believe it's part of the base xml
    distribution, although I could be wrong. Seems like would make things
    fairly straightforward, or am I missing the point?
    Marc Natter, MD
    Fellow, Clinical Informatics
    Children's Hospital Informatics Program
    Children's Hospital Boston
    Marc.Natter*at*childrens.harvard.edu
    -----Original Message-----
    [mailto:[email protected]] On Behalf Of
    [email protected]
    Sent: Saturday, May 30, 2009 1:00 PM
    To: [email protected]
    Subject: Users Digest, Vol 38, Issue 35
    Send Users mailing list submissions to
    [email protected]
    To subscribe or unsubscribe via the World Wide Web, visit
    http://www.openclinica.org/mailman/listinfo/users
    or, via email, send a message with subject or body 'help' to
    [email protected]
    You can reach the person managing the list at
    [email protected]
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of Users digest..."
    Today's Topics:
    1. RE: Repeated form extraction (Rick Watts)
    ----------------------------------------------------------------------
    Message: 1
    Date: Fri, 29 May 2009 19:56:00 -0600
    From: "Rick Watts"
    Subject: RE: [Users] Repeated form extraction
    To:
    Message-ID: <[email protected]>
    Content-Type: text/plain; charset="iso-8859-1"
    For those of you who have an interest in exporting data for SAS a
    colleague
    of mine is developing a command line utility to do just that. We have a
    working prototype but nothing that we are comfortable making public just
    yet.
    I've included an overview of the functionality below, including some
    example
    SAS code. If anyone would like to help us test the utility when the time
    comes they can email me at [email protected]
    Coincidentally the same XML file can be imported directly into Access.
    However doing so loses the formats and labels that we derive from the
    CRF
    metadata.
    ========================================================================
    =
    The idea is to produce two XML files. The first is a data file
    containing
    the data from the OpenClinica CRFs, events, subject records and
    discrepancies. This file will also contain a 'table' that will be used
    to
    create the SAS format catalog. The second XML file is a SAS XML map.
    This
    tells the SAS libname engine how to interpret tha data file. It also
    includes the item and data set labels (derived from the CRF metadata)
    and
    format/informat assignments.
    These two files allow the data to be read in SAS directly via the
    libname
    engine almost as if the data were native SAS data sets. Something like
    this
    (the xml92 libname is a SAS 9.2 enhancement over the existing XML
    engine):
    /* define the exported data as a SAS library */
    libname ocdata xml92 'M:\Projects\data.xml'
    xmlmap="M:\Projects\map.xml"
    access=readonly;
    /* define a location for the format catalog */
    libname library 'M:\Projects\fmtlib';
    /* create the format catalog from the exported data */
    proc format cntlin=ocdata.fmtlib library = library fmtlib;
    run;
    /* create native SAS data sets from the exported data. This could
    equally
    well be a data step accessing a table within the XML library. This
    statement
    is just included by way of example */
    proc datasets library=ocdata;
    copy out=work;
    exclude fmtlib;
    quit;
    There are a few decisions we had to make designing the code.
    1) Because of the possible complexity of CRF design each CRF section
    becomes
    a separate SAS data set. The assumption here is that a data manager
    would
    design the CRF to separate logical sets of data into sections. If a
    different structure is required for data management purposes then the
    data
    can be restructured further when it is in SAS.
    2) A CRF could contain many groups. Two or more groups contained within
    the
    same CRF section will be 'merged' into one data set. Any ungrouped data
    contained within the same section as a group will be repeated with each
    record from the group. If two groups are not logically related they will
    need to be placed in different CRF sections.
    3) SAS format names must be unique to the SAS session (or 'database').
    This
    application reads all the response options for the study and creates a
    list
    of formats. If two different response lists exist on different CRFs but
    with
    the same names one of them will be renamed to prevent a clash. If two
    code
    lists are identical only one format will be created.
    4) We do not assume the data complies with SAS naming rules and we do
    our
    best to translate invalid SAS names derived from the CRF data into
    something
    SAS will accept. Typically this involves replacing invalid characters
    and
    truncating long item names.
    We're currently refactoring the code to make better use of available
    memory
    so we don't anticipate a version for testing for a few weeks.
    Rick.
    -----Original Message-----
    [mailto:[email protected]]
    On Behalf Of [email protected]
    Sent: May-27-09 9:21 AM
    To: [email protected]
    Subject: Re: [Users] Repeated form extraction
    Hello Alicia,
    Thanks you for your great description.
    Few more questions about this:
    1. This coding is defined only for flat file? But not with ODM?
    For _A1_
    2. Which limit do we have with the alpha character? (26 From A to Z?)
    3. For the numeric value, does we need to have the same or a different
    CRF?
    It's the occurrence of CRF in the event, as I understand.
    4. For every numeric value, what's the limit? (99, 999..?)
    5. What about the version in the CRF? Is It detached from this?
    6. In my opinion, with what you describe, it's important to make an
    excel
    CRF spreadsheet as little as possible, isn't it?
    7. The goal of this, is to manage item variable with a data management
    software, like SAS.
    Do you use FORMAT command or something like this in SAS to help you and
    split datas in the appropriate array?
    King Regards,
    Hevoila.
    ----- Mail Original -----
    De: "Alicia Goodwin"
    ?: [email protected]
    Envoy?: Mardi 26 Mai 2009 20h15:43 GMT +01:00 Amsterdam / Berlin / Berne
    /
    Rome / Stockholm / Vienne
    Objet: RE: [Users] Repeated form extraction
    Hi Mike,
    I have defined each part of how the item variables are stored and
    extracted
    by adding a color key. This shows how the variables are extracted for
    repeating events as well as repeating items. I hope this helps.
    ITEMNAME_A1_1_GROUPNAME_1
    ? Item name is as defined in the spreadsheet
    ? The alpha character depicts from which event in your dataset this
    variable is coming from. If you include CRFs from two different events,
    the
    first event would be ?A? the second event would be ?B?. The numeric
    value
    depicts which CRF within the event the variable is coming from. If you
    include two CRFs from the same event definition they would be ?A1? and
    ?A2?.
    ? This ordinal corresponds to the occurrence of the event. If you
    extracted data for an event which repeated three times you would see:
    ITEMNAME_A1_1 , ITEMNAME_A1_2, ITEMNAME_A1_3 .
    ? Group name is as defined in your spreadsheet ( GROUP_LABEL). This
    will
    only be included for items which have been assigned to a group.
    ? This ordinal corresponds to the group repetition. If you had three
    rows of data in the group, they would be saved as
    ITEMNAME_A1_1_GROUPNAME_1,
    ITEMNAME_A1_1_GROUPNAME_2, ITEMNAME_A1_1_GROUPNAME_2
    The repeat function allows you to collect the same set of data multiple
    times. I have created a table which looks similar to what you would see
    in
    the user interface, while what is in red shows how it will be stored in
    the
    database.
    Where ITEM* = Item variable as defined in your spreadsheet and
    GROUPLABEL =
    the GROUP_LABEL as defined in your spreadsheet


    ITEM1
    ITEM2
    ITEM3
    ITEM4

    Repeat 1
    ITEM1 _A1_1 _GROUPLABEL_1
    ITEM2_ A1_1_ GROUPLABEL_1
    ITEM3_ A1_1_ GROUPLABEL_1
    ITEM4_ A1_1_ GROUPLABEL_1

    Repeat 2
    ITEM1_ A1_1_ GROUPLABEL_2
    ITEM2_ A1_1_ GROUPLABEL_2
    ITEM3_ A1_1_ GROUPLABEL_2
    ITEM4_ A1_1_ GROUPLABEL_2

    Repeat 3
    ITEM1_ A1_1_ GROUPLABEL_3
    ITEM2_ A1_1_ GROUPLABEL_3
    ITEM3_ A1_1_ GROUPLABEL_3
    ITEM4_ A1_1_ GROUPLABEL_3
    I hope this is helpful. Please let me know if you have more questions.
    Best Regards,
    Alicia
    . . . . . . . . . . . . . . . . . . . .
    Alicia Goodwin
    Akaza Research
    400 One Kendall Square
    Cambridge , MA 02139
    tel: 617.621.8585 ext. 24
    [email protected]
    www.akazaresearch.com
    www.openclinica.org
    Open Source Platform for Clinical Research
    -----Original Message-----
    [mailto:[email protected]]
    On Behalf Of Mike Dean
    Sent: Tuesday, May 26, 2009 2:03 PM
    To: [email protected]
    Subject: [Users] Repeated form extraction
    When an event has a CRF that is repeated, such as a daily collection
    form,
    and then I extract the data elements, I would expect output such as the
    following, with each daily form on a different line:
    Patient# Date1 Item1 Item2 Item3 ...
    Patient# Date2 Item1 Item2 Item3 ...
    But what actually comes over is:
    Item1 Item2 Item3... Item1 Item2 Item3 ...
    I am clearly missing something. Thanks for any advice.
    - Mike
    -- J. Michael Dean, M.D., M.B.A. H.A. and Edna Benning Presidential Professor of Pediatrics Professor of Biomedical Informatics Vice-Chairman of Pediatrics Chief, Division of Pediatric Critical Care Medicine University of Utah School of Medicine Salt Lake City , Utah
  • mnattermnatter Posts: 2
    Thanks for the add'l info, Rick, I wasn't at all aware of the proc cdisc issues. It will be a problem if SAS puts future development of ODM 1.2/1.3 on the back burner, too.
    I'd certainly be interested in helping test out any routines you'd be using. We have some applications with OpenClinica which I will be (at least temporarily) using SAS as the ETL glue layer for.
    - Marc
    Marc Natter, MD
    Fellow, Clinical Informatics
    Children's Hospital Informatics Program
    Children's Hospital Boston
    Marc.Natter*at*childrens.harvard.edu
    ------------------------------
    Message: 2
    Date: Mon, 1 Jun 2009 15:53:08 -0600
    From:
    Subject: RE: [Users] Repeated form extraction
    To:
    Message-ID:
    <[email protected]>
    Content-Type: text/plain; charset="us-ascii"
    No - you are not missing the point. However PROC CDISC contains bugs and
    is being 'deprecated' by SAS. Unofficially I've been told that SAS will
    no longer support it as it is being replaced by a new CDISC Toolkit.
    However the toolkit will only initially support SDTM and SAS could not
    tell me when ODM support will be made available.
    OpenClinica's support for ODM was also limited to CRF data, used naming
    conventions based on the system OIDs, and did not apply SAS formats. It
    did not expand multi-select objects. I use the past tense because I
    understand version 3.0 will address some of these requirements.
    We took the decision to develop our own tool so that we would have total
    control over output. We are an academic site and ease of use is more of
    a priority for us than standards compliance. However, I have no doubt
    that what works for us will not be everyone else's 'cup of tea'!
    R.
    -----Original Message-----
    [mailto:[email protected]] On Behalf Of Natter, Marc
    Sent: June 1, 2009 3:19 PM
    To: [email protected]
    Subject: RE: [Users] Repeated form extraction
    Hi Rick,
    I was just investigating doing similar with SAS. But I noticed that SAS
    has a cdisc-specific component, so was going to use this (proc cdisc) -
    see http://www.sas.com/industry/pharma/cdisc/index.html
    Have you/are you using this? I believe it's part of the base xml
    distribution, although I could be wrong. Seems like would make things
    fairly straightforward, or am I missing the point?
    Marc Natter, MD
    Fellow, Clinical Informatics
    Children's Hospital Informatics Program
    Children's Hospital Boston
    Marc.Natter*at*childrens.harvard.edu
This discussion has been closed.