Microsoft Encoding Issues

If you copy and paste strings from MS Word into MS Excel while building a CRF, you will most likely run into trouble.  Here are the reasons:
  • OpenClinica, like most of the rest of the world, uses UTF-8 encoding
  • Microsoft chose UTF-16LE encoding for its file system
  • Microsoft Office products still use characters from Microsoft's old, proprietary so-called Extended ASCII (typically the Windows-1252 code page), including:
    • Smart quotes (curly single and double quotation marks)
    • Em dash
    • En dash
  • Microsoft Office copies and pastes invisible control characters to and from the Windows clipboard
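
One way to catch these characters before they reach OpenClinica is to clean the text before it goes into the CRF spreadsheet.  The following is only a minimal sketch in Python, not an OpenClinica feature: the names REPLACEMENTS, sanitize and find_suspects are made up for illustration, and the character list is an assumption that covers only the usual offenders (curly quotes, dashes, no-break space, plus invisible control and format characters).

```python
import unicodedata

# Hypothetical mapping of common Office characters to plain ASCII.
# Extend it if other characters show up in your CRFs.
REPLACEMENTS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u00a0": " ",  # no-break space
}

# Ordinary whitespace that should survive cleaning.
KEEP = {"\t", "\n", "\r"}


def _is_invisible(ch):
    """True for control (Cc) and format (Cf) characters other than KEEP."""
    return unicodedata.category(ch) in ("Cc", "Cf") and ch not in KEEP


def find_suspects(text):
    """Return (index, code point) pairs for characters worth a second look."""
    hits = []
    for i, ch in enumerate(text):
        if ch in REPLACEMENTS or _is_invisible(ch):
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


def sanitize(text):
    """Replace mapped characters and drop invisible ones."""
    out = []
    for ch in text:
        ch = REPLACEMENTS.get(ch, ch)
        if _is_invisible(ch):
            continue
        out.append(ch)
    return "".join(out)
```

For example, find_suspects can be run first to see what a pasted string contains, and sanitize can then be applied to the cell text before it is pasted into the Excel template.  Other non-ASCII characters, such as accented letters, are left untouched on purpose.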

Whenever OpenClinica encounters one or more of these characters in its database, the common symptoms are:

  • Study metadata fails to open
  • Data cannot be extracted
  • Rules crash with a null pointer exception

There is another issue.  As mentioned above, the Microsoft file system uses UTF-16LE encoding.  This may cause trouble with Data Mart in Downloadable format, because OpenClinica/Java produces files in UTF-8.  Windows may corrupt the UTF-8 characters used in OpenClinica.
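
A possible workaround, assuming the corruption happens because a Windows program guesses the wrong encoding for a UTF-8 file that has no byte order mark (Excel opening a CSV file is a common example), is to add a UTF-8 BOM to the downloaded file before opening it.  This is only a sketch in Python; the function names and the file names in the usage note are hypothetical, and it has not been checked against any particular OpenClinica export.

```python
from pathlib import Path


def add_utf8_bom(data):
    """Return UTF-8 bytes that start with exactly one byte order mark.

    Decoding with "utf-8-sig" accepts input with or without a BOM, and
    raises UnicodeDecodeError if the input is not valid UTF-8.
    Encoding with "utf-8-sig" always writes a single BOM.
    """
    text = data.decode("utf-8-sig")
    return text.encode("utf-8-sig")


def convert_file(src, dst):
    """Copy src to dst, adding a UTF-8 BOM; line endings are left as-is."""
    Path(dst).write_bytes(add_utf8_bom(Path(src).read_bytes()))
```

Usage would look like convert_file("extract.csv", "extract_bom.csv").  Working on bytes rather than text mode keeps the original line endings, and a file that is not valid UTF-8 fails loudly instead of being silently damaged.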

