Key: MB-668
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Gary Tully
Reporter: Rob Davies
Votes: 1
Watchers: 5
FUSE Message Broker

Message Broker Stops Dispatching from Queues

Created: 10/May/10 08:29 AM   Updated: 31/Aug/10 12:36 PM
Component/s: broker
Affects Version/s: 5.3.0.3-fuse
Fix Version/s: None

Environment: AIX 6.1
Description
The application has two queues used for processing persistent messages: an inbound queue and an outbound queue, both with multiple consumers using selectors.
The queue depth never gets beyond 2-3 messages, until dispatching suddenly stops.

Rob Davies added a comment - 10/May/10 08:36 AM
Core file when broker stopped dispatching. Log appender blocked, Message Store full - probably because dispatching stopped

Gary Tully added a comment - 10/May/10 09:46 PM
It appears that a concurrent JMX copy operation via the console may be the problem. The copy makes a temporary change to maxPageSize, which can break dispatch if it happens concurrently with dispatch.

Is there, by any chance, some admin task that uses the console or JMX to copy messages?


Jack Britton added a comment - 07/Jun/10 07:18 PM
Hi Gary. I noticed you were the last to comment on this issue. Is there an update?

Gary Tully added a comment - 09/Jun/10 03:37 PM
Jack, I would like some feedback on my question about the use of JMX at the time the problem occurred, or the use of JMX programmatically. From analysis of the logs, a JMX copy (via JConsole or the web console) seems to be the culprit. Can this be confirmed from the customer's use case or the usage pattern of the application in question?

Jack Britton added a comment - 22/Jul/10 03:53 PM
All of CVS Caremark's AMQ and ESB instances are monitored by HP SiteScope using a JMX interface similar to JConsole. HP SiteScope polls every couple of minutes to check statistics such as queue depths, heap sizes, component states, and the number of open sockets.

Specifically, for the CVS ESP (Enterprise Service Platform), the AMQ broker stopped accepting new messages even when hundreds of publishers were attached trying to enqueue.

Question: Can you help CVS understand why concurrent JMX operations lead to a hung state in a lightly loaded broker?
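
For reference, the kind of read-only JMX polling described in this comment can be sketched as below. This is only an illustration under stated assumptions: it polls the local JVM's platform MBean server and the standard java.lang:type=Memory MBean rather than a broker, and the class and method names (JmxPollSketch, heapUsedBytes, isWritable) are hypothetical. It shows that a poll uses getAttribute, which does not modify the MBean, and how a tool could check whether an attribute is exposed read/write.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Sketch only: polls the local JVM's platform MBean server, not a broker.
class JmxPollSketch {

    // Read-only poll: getAttribute reads a value and never sets one.
    static long heapUsedBytes(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData usage = (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    // Reports whether an attribute is exposed read/write,
    // i.e. whether a JMX tool would be able to change it at all.
    static boolean isWritable(MBeanServerConnection conn, ObjectName name, String attribute)
            throws Exception {
        for (MBeanAttributeInfo info : conn.getMBeanInfo(name).getAttributes()) {
            if (info.getName().equals(attribute)) {
                return info.isWritable();
            }
        }
        throw new IllegalArgumentException("No such attribute: " + attribute);
    }

    public static void main(String[] args) throws Exception {
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        System.out.println("heap used (bytes): " + heapUsedBytes(conn));
        System.out.println("HeapMemoryUsage writable: " + isWritable(conn, memory, "HeapMemoryUsage"));
        System.out.println("Verbose writable: " + isWritable(conn, memory, "Verbose"));
    }
}
```

Against a remote broker the same two calls would be made over a remote MBeanServerConnection instead of the platform server; the object names to query depend on the broker and are not shown here.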


Gary Tully added a comment - 22/Jul/10 04:51 PM
So the problem is only with the use of one attribute on a destination view, maxPageSize. A change to this while dispatch is in operation can lead to dispatch stopping, which seems to be the case from the analysis of the logs.
A JMX copy/move operation changes this attribute as part of normal operation, so this is the most likely culprit, but as the attribute is exposed in read/write mode on the DestinationView, any tool that can change those attributes could be involved.

In any event, we need to fix this concurrency bug around the maxPageSize attribute, but it would be great to know whether it is probable that, via JMX, either that attribute was set or a copy or move operation was executed on that broker.
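
To make the hazard concrete, here is a minimal, hypothetical model of a temporary page-size change racing with dispatch. It is not the broker's code: the class PageSizeSketch, its fields, and the paging rule (page in at most maxPageSize minus the number already paged in) are assumptions chosen for illustration. The interleaving is replayed on a single thread so the outcome is deterministic. The guarded variant shows one possible remedy, which is to make the temporary change and its restore under the same lock that dispatch holds.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model for illustration only; this is not ActiveMQ code.
class PageSizeSketch {
    private final ReentrantLock dispatchLock = new ReentrantLock();
    private int maxPageSize;
    private int pagedIn = 0;
    private int pending;

    PageSizeSketch(int maxPageSize, int pending) {
        this.maxPageSize = maxPageSize;
        this.pending = pending;
    }

    // Dispatch: page in at most (maxPageSize - pagedIn) pending messages.
    int pageIn() {
        dispatchLock.lock();
        try {
            int room = Math.max(0, maxPageSize - pagedIn);
            int moved = Math.min(room, pending);
            pagedIn += moved;
            pending -= moved;
            return moved;
        } finally {
            dispatchLock.unlock();
        }
    }

    // Unsafe variant: the temporary value is visible to dispatch
    // between beginCopyUnsafe and endCopyUnsafe.
    int beginCopyUnsafe(int temporarySize) {
        int saved = maxPageSize;
        maxPageSize = temporarySize;
        return saved;
    }

    void endCopyUnsafe(int saved) {
        maxPageSize = saved;
    }

    // Guarded variant: dispatch can never observe the temporary value,
    // because the change and the restore happen under the dispatch lock.
    void copyGuarded(int temporarySize) {
        dispatchLock.lock();
        try {
            int saved = maxPageSize;
            maxPageSize = temporarySize;
            try {
                // The copy work would run here (omitted in this sketch).
            } finally {
                maxPageSize = saved;
            }
        } finally {
            dispatchLock.unlock();
        }
    }

    public static void main(String[] args) {
        PageSizeSketch unsafe = new PageSizeSketch(2, 20);
        int saved = unsafe.beginCopyUnsafe(10);
        int duringCopy = unsafe.pageIn();   // dispatch sees the temporary size
        unsafe.endCopyUnsafe(saved);
        int afterCopy = unsafe.pageIn();    // paged-in count now exceeds the limit
        System.out.println("unsafe: during=" + duringCopy + " after=" + afterCopy);

        PageSizeSketch guarded = new PageSizeSketch(2, 20);
        guarded.copyGuarded(10);
        System.out.println("guarded: after=" + guarded.pageIn());
    }
}
```

In this model, the unsafe interleaving leaves more messages paged in than the restored limit allows, so later page-in attempts move nothing until enough of them are consumed. Whether the broker stalls in exactly this way is not established by this issue.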


Jack Britton added a comment - 22/Jul/10 06:00 PM
The change to maxPageSize was done via JMX after the hung condition manifested itself (Rob suggested it during troubleshooting). The only other thing we do with JMX is monitoring, so I don't think the SiteScope tools would be doing a copy or move.

Gary Tully added a comment - 23/Jul/10 02:43 PM
That explains why the broker got into the non-recoverable state it was in, but it does not help us understand the original root cause. What information do we have from before the JMX maxPageSize operation was invoked?

Jack Britton added a comment - 26/Jul/10 03:17 PM
Just what was in the submitted logs. This really is an issue here, as everything is monitored with JMX, and if there is a chance that the monitoring is somehow stopping messaging then we need to figure that out.

Gary Tully added a comment - 26/Jul/10 03:51 PM
We can address the issue with concurrent access to maxPageSize via JMX and dispatch in the 5.4 release, but from your comments it looks like this is not the root cause. I have opened http://fusesource.com/issues/browse/MB-706 to track this for 5.4.

To do more analysis we need some more thread dumps and logs from a broker that gets into this state, or a test case that can reproduce the hung scenario.
I need to do one more pass of the existing logs to ensure there is nothing useful being overlooked.


Gary Tully added a comment - 29/Jul/10 02:33 PM
Jeff, yes, that looks like the sort of information we need: essentially a collection of periodic thread dumps from the broker when the hang occurs.

Jack, can you try to pull together the logs and thread dumps that are relevant to a given event? There seems to be information scattered across a bunch of JIRA issues. We need to make sure we are analyzing the correct, latest information.
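
As a rough illustration of what "periodic thread dumps" means, the sketch below captures several dumps at a fixed interval using the standard ThreadMXBean. It is an assumption-laden example: it dumps the JVM it runs in, whereas for a broker the dumps would have to be taken from the broker's own JVM (for example with a tool such as jstack). The class and method names (ThreadDumpSketch, dumpOnce, collect) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch only: dumps the threads of the JVM this code runs in.
class ThreadDumpSketch {

    // Formats one dump: thread name, state, and full stack for every live thread.
    static String dumpOnce(ThreadMXBean threads) {
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Collects `count` dumps, sleeping `intervalMillis` between them.
    static List<String> collect(int count, long intervalMillis) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dumps.add(dumpOnce(threads));
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return dumps;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> dumps = collect(3, 1000);
        for (int i = 0; i < dumps.size(); i++) {
            System.out.println("=== dump " + (i + 1) + " ===");
            System.out.println(dumps.get(i));
        }
    }
}
```

Comparing consecutive dumps is what makes them useful for a hang: a thread that sits in the same frame across all dumps is a candidate for the blocked dispatch path.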