I have had a problem that seems to be related to concurrency, and I was wondering if anyone has experience with something similar.
When I HTTP POST 100 study:listAllrequest requests to the SOAP web services synchronously (one at a time), they are all handled fine, with HTTP 200 OK responses throughout. It is still fine even if I blast it with 2000 duplicate requests. Connections are re-used up to 100 times before closing.
When I HTTP POST 100 study:listAllrequest requests asynchronously (more than one at a time, up to 10), anywhere from 0% to 75% of the requests come back as HTTP 500 Internal Server Error SOAP responses, and the rest are HTTP 200 OK. The proportion varies randomly from run to run. Multiple connections are used in this case (e.g. 2 workers = 2 connections, 10 workers = 10 connections), and they are re-used until all 100 requests are sent, although a 500 response seems to trigger the replacement of that connection in the pool.
I have tried numerous different ways of writing the same code, all with the same result. I've tried the Python 3 synchronous libraries urllib and requests, using a ThreadPoolExecutor to run them concurrently. I've also tried the Python 3 asynchronous library aiohttp, using asyncio's yield from / await features. I've also checked that it isn't a Python-specific thing by running SoapUI 5.0.0 LoadTests with more than 1 thread.
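A minimal sketch of the kind of ThreadPoolExecutor test described above, not the exact script used. The names post_once and run_batch, the endpoint URL and the envelope are all assumptions for illustration. Note that plain urllib opens a new connection per request, so this variant does not reproduce the connection re-use seen with a pooled client.

```python
import collections
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_once(url, body, timeout=30):
    """POST one SOAP envelope (bytes) and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; keep the code so it can be counted.
        return error.code


def run_batch(send, total=100, workers=10):
    """Call send() `total` times on `workers` threads and tally the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send) for _ in range(total)]
        return collections.Counter(future.result() for future in futures)


# Hypothetical usage (URL and envelope are placeholders, not real values):
#   url = "http://localhost:8080/OpenClinica-ws/ws/study/v1"
#   envelope = b"<soapenv:Envelope>...</soapenv:Envelope>"
#   print(run_batch(lambda: post_once(url, envelope), total=100, workers=1))
#   print(run_batch(lambda: post_once(url, envelope), total=100, workers=10))
```

Running the same batch with workers=1 and then workers=10 gives a direct comparison of the 200/500 counts between the one-at-a-time and concurrent cases.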
I've also tried adjusting the Tomcat server.xml settings, but the defaults for threadpools, connections and timeouts are already decently high; increasing them had no effect. Similarly, adjusting the OpenClinica settings in datainfo.properties quartz.threadPool had no effect.
It doesn't seem to happen with the OpenClinica web app. I can send 100 HTTP GETs to the login page, or 100 HTTP POSTs to j_spring_security_check, and it never seems to return an HTTP 500, regardless of whether the requests are sent sync or async, with 2 or 20 workers.
So my conclusion is that the web services don't handle concurrent requests reliably and that requests must be sent one at a time.
Other evidence for this conclusion is that, when running these tests, some bizarre errors show up in the OpenClinica debug logs, such as string and int query parameters getting mixed up, or incorrect timestamp values being sent. Example log lines are copied below.
03/31 19:09:21  AUTH WARN o.a.o.d.l.UserAccountDAO:648 - Exception while processing result rows, EntityDAO.select: : Bad value for type int : coordinator: array length: 1
03/31 19:09:21  AUTH ERROR o.a.o.d.l.UserAccountDAO:649 - Bad value for type int : coordinator
org.postgresql.util.PSQLException: Bad value for type int : coordinator
03/31 18:42:58  AUTH ERROR o.a.o.d.l.UserAccountDAO:649 - Bad value for type date : [[email protected]
org.postgresql.util.PSQLException: Bad value for type date : [[email protected]
Caused by: java.lang.NumberFormatException: Trailing junk on timestamp: ''
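
If the conclusion above holds, one possible client-side workaround is to keep the concurrent code as it is but funnel every web service call through a lock, so that only one request is in flight at a time. This is only a sketch under that assumption: SerializedSender is a hypothetical helper, and `send` stands for whatever function actually performs the POST.

```python
import threading


class SerializedSender:
    """Wrap a send function so that only one call runs at any moment."""

    def __init__(self, send):
        self._send = send
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other threads block here until the current request has finished.
        with self._lock:
            return self._send(*args, **kwargs)
```

Worker threads can then call the wrapped sender freely; the requests still reach the server one at a time, which matches the case that returned only HTTP 200 responses.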