Issue Details (XML | Word | Printable)

Key: MB-558
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Gary Tully
Reporter: Torsten Mielke
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
FUSE Message Broker

JMS clients stalled in case of many messages on a different queue.

Created: 22/Oct/09 12:29 PM   Updated: 29/Oct/09 05:05 PM
Component/s: broker
Affects Version/s: 5.0.0.27-fuse, 5.3.0.4-fuse
Fix Version/s: 5.3.0.5-fuse

Environment: FUSE MB 5.4, direct JDBC persistence to external database, large amount of message on a queue.

External Issue URL: https://issues.apache.org/activemq/browse/AMQ-2470


 Description  « Hide
Consider the following scenario: a queue Q has 2 million messages that are all persistet into the JDBC db. Another producer/consumer pair connects to a different queue X but does not get to produce/consume any messages.

Because of having two million msgs on one queue, the periodically run clean up task takes ages to complete (for me >20 mins) but locks the entire database during that time as it runs an SQL delete statement:

DEBUG DefaultJDBCAdapter - Executing SQL: DELETE FROM ACTIVEMQ_MSGS WHERE ( EXPIRATION<>0 AND EXPIRATION<?)
OR ID <= ( SELECT min(ACTIVEMQ_ACKS.LAST_ACKED_ID) FROM ACTIVEMQ_ACKS WHERE 
ACTIVEMQ_ACKS.CONTAINER=ACTIVEMQ_MSGS.CONTAINER)

During that time no other message (no matter for what queue) can be processed.
Also, the cleanup task is scheduled to run every 2-3 mins, so once it has completed to run over the 2 million messages (>20 mins) it is already late on schedule, hence it gets kicked off again straight away, not leaving much processing time for other threads to insert msgs into the db. So the threads managing the other queues starve and might perhaps never get the processing time to process their messages.

In order to diagnose this further I suggest to enable this logging:

log4j.logger.org.apache.activemq.store.jdbc=DEBUG

This should show the above SQL statement being executed for a long time and being called repeatedly as soon as it has finished.
Also, I suggest to attach jconsole to the broker and take a stack trace of the thread that manages the queue X and has a producer/consumer pair attached. It should be waiting almost forever in the following stack trace:

Name: QueueThread:queue://JMSLoadTest.queue.0
  State: RUNNABLE
  Total blocked: 1  Total waited: 0

  Stack trace: 
  java.net.SocketInputStream.socketRead0(Native Method)
  java.net.SocketInputStream.read(SocketInputStream.java:129)
  com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
  com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
  com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
  com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2428)
  com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
  com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
  com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
  com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
  com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
  com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
  com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
  com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1876)
  org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:91)
  org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:91)
  org.apache.activemq.store.jdbc.adapter.DefaultJDBCAdapter.doRecoverNextMessages(DefaultJDBCAdapter.java:709)
  org.apache.activemq.store.jdbc.JDBCMessageStore.recoverNextMessages(JDBCMessageStore.java:230)
  org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:83)
  org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:75)
  org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:210)
  org.apache.activemq.broker.region.cursors.AbstractStoreCursor.hasNext(AbstractStoreCursor.java:119)
  org.apache.activemq.broker.region.cursors.StoreQueueCursor.hasNext(StoreQueueCursor.java:131)
  org.apache.activemq.broker.region.Queue.doPageIn(Queue.java:1243)
  org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:1378)
  org.apache.activemq.broker.region.Queue.iterate(Queue.java:1086)
  org.apache.activemq.thread.DeterministicTaskRunner.runTask(DeterministicTaskRunner.java:84)
  org.apache.activemq.thread.DeterministicTaskRunner$1.run(DeterministicTaskRunner.java:41)
  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
  java.lang.Thread.run(Thread.java:595)


 All   Comments   Change History      Sort Order: Ascending order - Click to sort in descending order
Torsten Mielke added a comment - 22/Oct/09 12:43 PM
Perhaps a fix could limit the amount of time the cleanup task gets to executed to ensure it doesn't block everything else.

Torsten Mielke added a comment - 23/Oct/09 08:01 AM
This problem does not seem to manifest on every JDBC database. I reproduced it using MySQL. However when Apache Derby is used, the problem is not observed. It seems that Derby does not lock the entire db table when running the cleanup task's SQL statement (perhaps a row locking mechanism is applied).
So whatever solution is worked out for this bug, it should probably be configurable.

Gary Tully added a comment - 23/Oct/09 11:59 AM - edited
the cleanup task is using a fixed rate rather than a fixed delay executor so changing that would make it a lot better. At least if it takes 20minutes, it will wait cleanupInterval before trying again. At the moment a bunch of cleanup executions will have backed up and it will execute immediately again.

That may do it. Another approach would be to limit the number of rows that are queried by the delete statement, but some state needs to be maintained to allow the entire table to be processed over time. ROWNUM > X < Y


Gary Tully added a comment - 29/Oct/09 02:12 PM
opened https://issues.apache.org/activemq/browse/AMQ-2470 at apache to implement some changes. Simplest solution is to have the cleanup on a fixed delay schedule so that if it takes 20 minutes, and with a period of 10 minutes, it will wait 10 minutes from when it finishes. ie: fixed delay. At the moment it is on a fixed rate schedule.
If this is not sufficient, a limit via ROWNUM needs to be explored.

Gary Tully added a comment - 29/Oct/09 05:05 PM
merged fix from https://issues.apache.org/activemq/browse/AMQ-2470 - making cleanup have fixed delay.

cleanup period attribute on the JDBCPeristenceAdapter allows the period to be configured. This period is now the delay from the finish of the task till the next start such that it won't hog the DB.