
Cannot extract data from a study on an upgraded database

Please help, I have an urgent matter:
I have a database that came from OpenClinica 3.0.4.2 on a Postgres 9.0
database and now runs under OpenClinica 3.1.2-Community on a Postgres 8.4
database.
The migration worked fine. The database contains 2 studies.
I can create datasets and extract data from study A with no problems. But
when I do the same on study B, I get the following:
The extract data job failed with the message:
java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
More information may be available in the log files.
The log file (tomcat6-stdout) does not tell me much:
found both id 7 and dataset 16
found odm xml file path C:\Program Files\Apache Software Foundation\Tomcat 6.0\openclinica2.datadatasets\16\2012\10\19\232930271
found xslt file name C:\Program Files\Apache Software Foundation\Tomcat 6.0\openclinica2.dataxslt\ODMToTAB.xsl
== found job date: Fri Oct 19 23:29:30 CEST 2012
Warning: at xsl:variable on line 735 of :
SXWN9001: A variable with no following sibling instructions has no effect
Warning: at xsl:variable on line 1009 of :
SXWN9001: A variable with no following sibling instructions has no effect
Error
java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
trying to retrieve status on 16_EXCEL_Urine_24h_2012-10-19-232930266.xls1350682170272 XsltTriggers
found state: -1
adding a message!
net.sf.saxon.trans.DynamicError: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:283)
at net.sf.saxon.event.Sender.send(Sender.java:144)
at net.sf.saxon.event.Sender.send(Sender.java:46)
at net.sf.saxon.Controller.transform(Controller.java:1340)
at org.akaza.openclinica.job.XsltTransformJob.executeInternal(XsltTransformJob.java:264)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:270)
... 7 more
---------
java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:270)
at net.sf.saxon.event.Sender.send(Sender.java:144)
at net.sf.saxon.event.Sender.send(Sender.java:46)
at net.sf.saxon.Controller.transform(Controller.java:1340)
at org.akaza.openclinica.job.XsltTransformJob.executeInternal(XsltTransformJob.java:264)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
File Name?EXCEL_Urine_24h_2012-10-19-230444192.xls
file to deleted:EXCEL_Urine_24h_2012-10-19-230444192.xlsFile Not to deleted:EXCEL_Urine_24h_2012-10-19-230444192.xls
File Name?Urine_24hD20121019232930+0200.xml
trying to retrieve status on 16_EXCEL_Urine_24h_2012-10-19-232930266.xls1350682170272 XsltTriggers
found state: -1
adding a message!
No matter which extract output format I choose, I get the same error.
Unfortunately, I cannot see which character or line of text causes
this UTFDataFormatException.
Even when I create datasets containing just a single item, I get the same
error. Please note that data extracts from the other study on the same
database work perfectly.
Does anyone have an idea what could be wrong? How do I check which
character or which line causes the exception to be thrown?
Best regards,
Janus

Comments

  • hhonshuku Posts: 50
    This problem is typically observed when a termination character is pasted into CRF data, often from a Microsoft product, e.g., CRF column text pasted from MS Word. Can you open the Study metadata without a similar error? If it also errors, the error usually gives you a better clue as to where in the database the offending character is. If the metadata doesn't error on view, hunting down the offending character is a bit more tricky.
    -Hiro
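
One way to hunt for the offending character outside OpenClinica is to scan the generated ODM XML file for the first byte sequence that is not valid UTF-8. The following is only a minimal sketch in Python, not part of OpenClinica: the function name `find_invalid_utf8` is hypothetical, and it assumes the ODM XML file from the failed job is still on disk. Python's strict UTF-8 decoder is not the Xerces reader from the stack trace, so the reported position should be treated as a hint rather than an exact match.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes, context: int = 20) -> Optional[Tuple[int, int, bytes]]:
    """Locate the first byte sequence in `data` that is not valid UTF-8.

    Returns (byte_offset, line_number, surrounding_bytes), or None when
    the whole input decodes cleanly. `find_invalid_utf8` is a hypothetical
    helper written for this thread, not an OpenClinica API.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as err:
        offset = err.start  # index of the first byte of the bad sequence
        # Line numbers are 1-based: count the newlines before the offset.
        line = data.count(b"\n", 0, offset) + 1
        # Keep a few raw bytes on either side to help recognise the field.
        snippet = data[max(0, offset - context):offset + context]
        return offset, line, snippet
    return None
```

To use it, read the XML file in binary mode (`open(path, "rb").read()`, where `path` is the ODM XML file named in the log) and pass the bytes to the function. The surrounding bytes usually contain enough of the item value or element name to identify the CRF field to look up in the database.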
    On Fri, Oct 19, 2012 at 7:08 PM, Janus Engstrøm wrote:
This discussion has been closed.