Forum Home » Fuse Distributions » Fuse Mediation Router

Thread: potential quartz/threading or esb bug

 

Permlink Replies: 17 - Last Post: Jun 6, 2010 10:48 PM Last Post By: unwired Threads: [ Previous | Next ]
unwired

Posts: 35
Registered: 08/04/09
potential quartz/threading or esb bug
Posted: Mar 19, 2010 6:29 AM
  Click to reply to this thread Reply
Hi,

We are using servicemix 4.1.0.2 with mediation router (camel 2.2) and have a very consistent, unexplainable experience that we are having. It is happening so consistently now that it is definitely not a "fluke". I will describe what is happening and it will sound impossible :) but we are pretty much 100% certain that it is.

We have about 4 osgi bundles that are using the quartz timer. All of these bundles use common bundles. I only mention that as it may be related but I don't think it is.

This I am very certain of:

what happens is every time we either 'stop' a bundle - or 'refresh' a bundle that is using the quartz timer, the other bundles who are also on the quartz time stop working. When I say stop working, I just mean the quartz no longer executes the route. Through the osgi console they still report as 'Active' and 'Started', and there are NO errors reported in the log. In fact the bundles that did not get restarted just no longer do anything.

This I am still confirming for sure:

Here's some more details that might be relevant.

-I think timer may do this as well but we're not using it much now so not sure but i thought I saw it do it too.
-I believe sometimes even the bundle that gets refreshed also does not work (i.e. sometimes the bundle that you do a refresh on works fine, other times i think all the bundles on the quartz stops working).

Has anyone else seen this? I have not had time to look at the camel or fuse code but I am thinking it is more in the realm of camel but not sure. It seems like a potential threading issue to me but again not certain on this.

any ideas will be helpful because at the moment every time we refresh a bundle that is on a quartz we have to sort out all other bundles on quartz - obviously not ideal! :)

thanks very much
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Mar 19, 2010 7:20 AM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
FUSE ESB 4.2 is the first release which officially supports FUSE MR 2.2.
I recommend to upgrade to use an ESB which can be supported.
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Mar 19, 2010 9:11 AM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
thanks very much for the quick response.

i just noticed i said servicemix 4.1 but clearly you figured out i meant esb.:) we can explore upgrading to esb 4.2 (didn't realize it was available) but FYI this was happening on fuse 4.1.0.2 with both camel 2.0 and 2.0-M3. We've only recently upgraded to camel 2.2 to pick up a fix/improvement with the ftp (reconnects) and file (issue with archiving the file in 2.0-M3) components and hoped that this would go away but it did not.

We will investigate that upgrade when we have time and report back, but i must admit it seems like this issue resides more with camel than esb. it seems odd that only certain components have the problem. our polling consumers do not see this issue.

thanks very much
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Mar 19, 2010 10:26 AM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
The quartz scheduler must be shutdown to properly terminate the quartz tasks.

You should see this DEBUG logging

if (LOG.isDebugEnabled()) {
                LOG.debug("Shutting down Quartz scheduler: " + scheduler.getSchedulerName());
            }


In this logname org.apache.camel.component.quartz

Can you try to see that? Eg it also logs when its starting the quartz scheduler
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Mar 19, 2010 11:26 PM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Ok thanks will look for this in our logs next week (it's our weekend here:)). All our applications are currently logging to one log so I assume I should only see that once if we stop one of the applications using quartz.

i will also confirm that there are no namespace conflicts (i.e. we're using different timer names) but i'm nearly certain we are - and i don't think it would let you even bring up two quartz with the same name anyway but will investigate next week.

thanks
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Mar 21, 2010 9:13 PM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Actually I didn't see that log but I do see this, but I don't think this is when we are 'refreshing' or stop/starting a bundle. I'll put this in for reference but I personally don't think it's too relevant because I don't see this at the time of a deploy (see logs below).

2010-03-19 04:02:35,928 | WARN  | 25: ShutdownTask | DefaultShutdownStrategy          | mel.impl.DefaultShutdownStrategy  181 | Error occurred while shutting down route: Consumer[quartz://collector/AccountVw/?cron=0+*+*+*+*+%3F+*]. This exception will be ignored.
2010-03-19 04:03:07,471 | WARN  | 26: ShutdownTask | DefaultShutdownStrategy          | mel.impl.DefaultShutdownStrategy  181 | Error occurred while shutting down route: Consumer[quartz://collector/DHCP/?cron=0+*+*+*+*+%3F+*]. This exception will be ignored.
2010-03-19 04:03:14,241 | WARN  | 27: ShutdownTask | DefaultShutdownStrategy          | mel.impl.DefaultShutdownStrategy  181 | Error occurred while shutting down route: Consumer[quartz://collector/Usage/?cron=0+25+18+*+*+%3F+*]. This exception will be ignored.


I wonder if the fact that they're all on the same group name has anything to do with this? I just re-read the doco but I'm not certain from that; what is the expected behaviour of quartz timers in the same group. If one application using quartz is on the same group as others and it gets bounced - is it expected the others will be affected? I didn't think they would but now i'm not sure.

I just reproduced in on test (it pretty much happens every time), and here's the relevant logs.

At first the other quartz's are firing away this is healthy....

2010-03-21 20:27:00,003 | DEBUG | zSchedulerThread | SimpleJobFactory                 | rg.quartz.simpl.SimpleJobFactory   50 | Producing instance of Job 'DEFAULT.quartz://collector/AccountVw/?cron=0+*+*+*+*+%3F+*', class=org.apache.came
l.component.quartz.CamelJob
2010-03-21 20:27:00,003 | DEBUG | heduler_Worker-8 | JobRunShell                      | org.quartz.core.JobRunShell       201 | Calling execute on job DEFAULT.quartz://collector/AccountVw/?cron=0+*+*+*+*+%3F+*
2010-03-21 20:27:00,004 | DEBUG | heduler_Worker-8 | QuartzEndpoint                   | .component.quartz.QuartzEndpoint  100 | Firing Quartz Job with context: JobExecutionContext: trigger: 'collector.AccountVw/ job: DEFAULT.quartz://col
lector/AccountVw/?cron=0+*+*+*+*+%3F+* fireTime: 'Sun Mar 21 20:27:00 UTC 2010 scheduledFireTime: Sun Mar 21 20:27:00 UTC 2010 previousFireTime: 'Sun Mar 21 20:26:00 UTC 2010 nextFireTime: Sun Mar 21 20:28:00 UTC 2010 isRecovering: false
 refireCount: 0
2010-03-21 20:27:00,004 | DEBUG | heduler_Worker-8 | DefaultListableBeanFactory       | tory.support.AbstractBeanFactory  214 | Returning cached instance of singleton bean 'collectorHelper'
2010-03-21 20:27:00,005 | DEBUG | zSchedulerThread | SimpleJobFactory                 | rg.quartz.simpl.SimpleJobFactory   50 | Producing instance of Job 'DEFAULT.quartz://collector/DHCP/?cron=0+*+*+*+*+%3F+*', class=org.apache.camel.com
ponent.quartz.CamelJob
2010-03-21 20:27:00,005 | DEBUG | heduler_Worker-2 | JobRunShell                      | org.quartz.core.JobRunShell       201 | Calling execute on job DEFAULT.quartz://collector/DHCP/?cron=0+*+*+*+*+%3F+*
2010-03-21 20:27:00,005 | DEBUG | heduler_Worker-2 | QuartzEndpoint                   | .component.quartz.QuartzEndpoint  100 | Firing Quartz Job with context: JobExecutionContext: trigger: 'collector.DHCP/ job: DEFAULT.quartz://collecto
r/DHCP/?cron=0+*+*+*+*+%3F+* fireTime: 'Sun Mar 21 20:27:00 UTC 2010 scheduledFireTime: Sun Mar 21 20:27:00 UTC 2010 previousFireTime: 'Sun Mar 21 20:26:00 UTC 2010 nextFireTime: Sun Mar 21 20:28:00 UTC 2010 isRecovering: false refireCou
nt: 0
 
 
.................
 
2010-03-21 20:31:00,018 | DEBUG | zSchedulerThread | SimpleJobFactory                 | rg.quartz.simpl.SimpleJobFactory   50 | Producing instance of Job 'DEFAULT.quartz://collector/AccountVw/?cron=0+*+*+*+*+%3F+*', class=org.apache.came
l.component.quartz.CamelJob
2010-03-21 20:31:00,018 | DEBUG | heduler_Worker-6 | JobRunShell                      | org.quartz.core.JobRunShell       201 | Calling execute on job DEFAULT.quartz://collector/AccountVw/?cron=0+*+*+*+*+%3F+*
2010-03-21 20:31:00,018 | DEBUG | zSchedulerThread | SimpleJobFactory                 | rg.quartz.simpl.SimpleJobFactory   50 | Producing instance of Job 'DEFAULT.quartz://collector/DHCP/?cron=0+*+*+*+*+%3F+*', class=org.apache.camel.com
ponent.quartz.CamelJob
2010-03-21 20:31:00,018 | DEBUG | heduler_Worker-6 | QuartzEndpoint                   | .component.quartz.QuartzEndpoint  100 | Firing Quartz Job with context: JobExecutionContext: trigger: 'collector.AccountVw/ job: DEFAULT.quartz://col
lector/AccountVw/?cron=0+*+*+*+*+%3F+* fireTime: 'Sun Mar 21 20:31:00 UTC 2010 scheduledFireTime: Sun Mar 21 20:31:00 UTC 2010 previousFireTime: 'Sun Mar 21 20:30:00 UTC 2010 nextFireTime: Sun Mar 21 20:32:00 UTC 2010 isRecovering: false
 refireCount: 0
2010-03-21 20:31:00,019 | DEBUG | heduler_Worker-1 | JobRunShell                      | org.quartz.core.JobRunShell       201 | Calling execute on job DEFAULT.quartz://collector/DHCP/?cron=0+*+*+*+*+%3F+*
2010-03-21 20:31:00,019 | DEBUG | heduler_Worker-1 | QuartzEndpoint                   | .component.quartz.QuartzEndpoint  100 | Firing Quartz Job with context: JobExecutionContext: trigger: 'collector.DHCP/ job: DEFAULT.quartz://collecto
r/DHCP/?cron=0+*+*+*+*+%3F+* fireTime: 'Sun Mar 21 20:31:00 UTC 2010 scheduledFireTime: Sun Mar 21 20:31:00 UTC 2010 previousFireTime: 'Sun Mar 21 20:30:00 UTC 2010 nextFireTime: Sun Mar 21 20:32:00 UTC 2010 isRecovering: false refireCou
nt: 0


But then I do a refresh on one of the applications (classifier) and the other apps on quartz stop.

There's no explicit mention of anything happening to the other apps on quartz, but it's like the deploy of one of the apps "affects" the quartz bundle and the others lose reference to it or something...

2010-03-21 20:31:19,682 | DEBUG | Timer-1          | DefaultListableBeanFactory       | ort.DefaultSingletonBeanRegistry  476 | Retrieved dependent beans for bean 'camelContext': [org.apache.camel.component.quartz.QuartzComponent, org.ap
ache.camel.component.jdbc.JdbcComponent]
2010-03-21 20:31:19,683 | DEBUG | Timer-1          | DisposableBeanAdapter            | ry.support.DisposableBeanAdapter  148 | Invoking destroy() on bean with name 'camelContext'
2010-03-21 20:31:19,683 | INFO  | Timer-1          | DefaultCamelContext              | e.camel.impl.DefaultCamelContext 1098 | Apache Camel 2.2.0 (CamelContext:camelContext) is stopping
2010-03-21 20:31:19,683 | INFO  | Timer-1          | DefaultShutdownStrategy          | mel.impl.DefaultShutdownStrategy   73 | Starting to graceful shutdown routes (timeout 300 seconds)
2010-03-21 20:31:19,684 | DEBUG |  7: ShutdownTask | DefaultShutdownStrategy          | ultShutdownStrategy$ShutdownTask  274 | There are 1 routes to shutdown
2010-03-21 20:31:19,684 | DEBUG |  7: ShutdownTask | QuartzConsumer                   | pache.camel.impl.DefaultConsumer   69 | Stopping consumer: Consumer[quartz://collector/btsclassifier/?cron=0+5%2C20%2C35%2C50+*+*+*+%3F+*]
2010-03-21 20:31:19,684 | DEBUG |  7: ShutdownTask | DefaultShutdownStrategy          | mel.impl.DefaultShutdownStrategy  187 | Shutdown complete for: Consumer[quartz://collector/btsclassifier/?cron=0+5%2C20%2C35%2C50+*+*+*+%3F+*]
 
........................
 
2010-03-21 20:31:19,689 | DEBUG | Timer-1          | QuartzComponent                  | component.quartz.QuartzComponent  132 | Shutting down Quartz scheduler: DefaultQuartzScheduler
2010-03-21 20:31:19,689 | INFO  | Timer-1          | QuartzScheduler                  | org.quartz.core.QuartzScheduler   570 | Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED shutting down.
2010-03-21 20:31:19,689 | INFO  | Timer-1          | QuartzScheduler                  | org.quartz.core.QuartzScheduler   496 | Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED paused.
2010-03-21 20:31:19,690 | INFO  | Timer-1          | QuartzScheduler                  | org.quartz.core.QuartzScheduler   621 | Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED shutdown complete.
2010-03-21 20:31:19,690 | DEBUG | Timer-1          | SharedProducerServicePool        | he.camel.impl.DefaultServicePool  100 | Stopping service pool: org.apache.camel.impl.SharedProducerServicePool@1d13f38
 
..................
 
2010-03-21 20:31:19,694 | DEBUG | Timer-1          | DefaultManagementAgent           | anagement.DefaultManagementAgent  218 | Unregistered MBean with objectname: org.apache.camel:context=[servername]/camelContext,type=en
dpoints,name="quartz://collector/btsclassifier/\?cron=0+5%2C20%2C35%2C50+\*+\*+\*+%3F+\*"
2010-03-21 20:31:19,694 | INFO  | Timer-1          | DefaultCamelContext              | e.camel.impl.DefaultCamelContext 1141 | Apache Camel 2.2.0 (CamelContext:camelContext) stopped
 
................
 
2010-03-21 20:31:27,191 | DEBUG | xtenderThread-90 | Activator                        | org.apache.camel.osgi.Activator   303 | Found entry: quartz via type: org.apache.camel.component.quartz.QuartzComponent
2010-03-21 20:31:27,191 | DEBUG | xtenderThread-90 | DefaultListableBeanFactory       | ractAutowireCapableBeanFactory$1  383 | Creating instance of bean 'org.apache.camel.component.quartz.QuartzComponent'
2010-03-21 20:31:27,192 | DEBUG | xtenderThread-90 | DefaultListableBeanFactory       | tory.support.AbstractBeanFactory  214 | Returning cached instance of singleton bean 'camelContext'
2010-03-21 20:31:27,192 | DEBUG | xtenderThread-90 | DefaultListableBeanFactory       | tory.support.ConstructorResolver  601 | Autowiring by type from bean name 'org.apache.camel.component.quartz.QuartzComponent' via constructor to bean named 'camelContext'
2010-03-21 20:31:27,193 | DEBUG | xtenderThread-90 | DefaultListableBeanFactory       | ractAutowireCapableBeanFactory$1  411 | Finished creating instance of bean 'org.apache.camel.component.quartz.QuartzComponent'
2010-03-21 20:31:27,195 | DEBUG | xtenderThread-90 | CachedIntrospectionResults       | beans.CachedIntrospectionResults  151 | Not strongly caching class [org.apache.camel.management.mbean.ManagedComponent] because it is not cache-safe
2010-03-21 20:31:27,197 | DEBUG | xtenderThread-90 | DefaultManagementAgent           | anagement.DefaultManagementAgent  304 | Registered MBean with objectname: org.apache.camel:context=[serverName]/camelContext,type=components,name="quartz"
2010-03-21 20:31:27,226 | INFO  | xtenderThread-90 | SimpleThreadPool                 | rg.quartz.simpl.SimpleThreadPool  258 | Job execution threads will use class loader of thread: SpringOsgiExtenderThread-90
2010-03-21 20:31:27,227 | INFO  | xtenderThread-90 | SchedulerSignalerImpl            | uartz.core.SchedulerSignalerImpl   63 | Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2010-03-21 20:31:27,228 | INFO  | xtenderThread-90 | QuartzScheduler                  | org.quartz.core.QuartzScheduler   215 | Quartz Scheduler v.UNKNOWN.UNKNOWN.UNKNOWN created.
2010-03-21 20:31:27,228 | INFO  | xtenderThread-90 | RAMJobStore                      | org.quartz.simpl.RAMJobStore      141 | RAMJobStore initialized.
2010-03-21 20:31:27,228 | INFO  | xtenderThread-90 | StdSchedulerFactory              | .quartz.impl.StdSchedulerFactory 1224 | Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
2010-03-21 20:31:27,229 | INFO  | xtenderThread-90 | StdSchedulerFactory              | .quartz.impl.StdSchedulerFactory 1228 | Quartz scheduler version: UNKNOWN.UNKNOWN.UNKNOWN
2010-03-21 20:31:27,229 | DEBUG | xtenderThread-90 | QuartzComponent                  | component.quartz.QuartzComponent  123 | Starting Quartz scheduler: DefaultQuartzScheduler
2010-03-21 20:31:27,229 | INFO  | xtenderThread-90 | QuartzScheduler                  | org.quartz.core.QuartzScheduler   461 | Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2010-03-21 20:31:27,230 | DEBUG | xtenderThread-90 | DefaultComponent                 | ache.camel.impl.DefaultComponent   78 | Creating endpoint uri=[quartz://collector/btsclassifier/?cron=0+5%2C20%2C35%2C50+*+*+*+%3F+*], path=[collector/btsclassifier/], parameters=[{cron=0 5,20,35,50 * * * ? *}]
2010-03-21 20:31:27,231 | DEBUG | xtenderThread-90 | DefaultCamelContext              | e.camel.impl.DefaultCamelContext  428 | quartz://collector/btsclassifier/?cron=0+5%2C20%2C35%2C50+*+*+*+%3F+* converted to endpoint: Endpoint[quartz://collector/btsclassifier/?cron=0+5%2C20%2C35%2C50+*+*+*+%3F+*] by component: org.apache.camel.component.quartz.QuartzComponent@1e0e184


Thanks very much for any insight you might have.

cheers

Edited by: unwired on Mar 21, 2010 9:43 PM
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Mar 22, 2010 7:47 AM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
You can try upgrading Quartz, there is a new 1.7.3 release.
Also try with 100% separated group names.

I have improved the Camel shutdown logging in the next release so the stacktrace of that WARN will be logged as well.

You can enable event notifier to have it logged though in the current 2.2 release.
http://camel.apache.org/advanced-configuration-of-camelcontext-using-spring.html
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Mar 22, 2010 10:46 AM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Thanks again for the quick response. :)

We will try the upgrade of the quartz component, but i'll see what version we're on as 1.7.3 seems to only be a daylight savings bug fix.

Also when you say you improved the logging are you referring to camel 2.3?

we will also explore the event notifier when we have time. I was trying to think of a way to use aop (through spring) to help us see what's happening but that might be easier.

Unfortunately I'm not sure we can do all these investigations right now as they'll take some time (i can only test properly in our sit env which is in use so i'll have to build another one). Even the group name test will take code changes sadly because we have framework code prepending that 'collector' (our group name). I'll check the priority with my manager as our workaround of always restarting all quartz and then monitoring for a while is a viable alternative at the moment.

if there's any way you have tests you can put in place to simply deploy two applications using quartz components and refresh or stop/start one and see if it affects the other that would be greatly appreciated. we'll update you once we complete our investigations.

thanks for the help!

Edited by: unwired on Mar 22, 2010 10:47 AM
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Jun 2, 2010 2:10 PM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Hi,

We have taken time to investigate this as it has become quite cumbersome to deploy our quartz bundles. I pulled down the latest fuse-esb (4.2), quartz (1.8.0), camel-quartz (2.2.0 - didn't move to 2.3 but as per earlier that was just for better debugging).

I removed the quartz group and tested a very,very simple route which simple gets kicked off from cron every minute and then gets sent to a service which simply logs to a file. The result is very reproducible. You deploy the first quartz bundle; all is ok. Deploy the second, seems fine. If you redeploy the first bundle, the second bundle stops working. I am attaching a number of files as follows:

  • deployed_one_and_did_NOT_stop_two.txt - the log when you deploy a quartz bundle and it does not "stop" the other.

  • deployed_two_DID_stop_one.txt - the log when it DID stop the other bundle.

There are some obvious differences in these files. The thing that strikes me is that in the one that gets hosed, these lines appear:

22:23:21,504 | DEBUG | heduler_Worker-4 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,505 | DEBUG | heduler_Worker-5 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,506 | DEBUG | heduler_Worker-2 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,508 | DEBUG | heduler_Worker-3 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,509 | DEBUG | eduler_Worker-10 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,511 | DEBUG | heduler_Worker-1 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,516 | DEBUG | heduler_Worker-8 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,516 | DEBUG | heduler_Worker-9 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,517 | DEBUG | heduler_Worker-6 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down
22:23:21,519 | DEBUG | heduler_Worker-7 | SimpleThreadPool                 | pl.SimpleThreadPool$WorkerThread  570 | WorkerThread is shutting down


This does NOT happen when the bundle is not stopped. Further, when the problem arises, you can see the worker thread is reset:

22:20:00,003 | INFO  | heduler_Worker-8 | QuartzOneService                 | artzone.service.QuartzOneService   18 | Within handle
 
22:20:00,006 | INFO  | heduler_Worker-9 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle
 
22:21:00,002 | INFO  | eduler_Worker-10 | QuartzOneService                 | artzone.service.QuartzOneService   18 | Within handle
 
22:21:00,004 | INFO  | heduler_Worker-1 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle
 
22:22:00,002 | INFO  | heduler_Worker-2 | QuartzOneService                 | artzone.service.QuartzOneService   18 | Within handle
 
22:22:00,004 | INFO  | heduler_Worker-3 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle
 
22:23:00,002 | INFO  | heduler_Worker-4 | QuartzOneService                 | artzone.service.QuartzOneService   18 | Within handle
 
22:23:00,003 | INFO  | heduler_Worker-5 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle


above... both were printing.. but now just the one prints bc QuartzOne stopped and the worker threads are reset.

22:24:00,009 | INFO  | heduler_Worker-1 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle
22:25:00,002 | INFO  | heduler_Worker-2 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle
22:26:00,002 | INFO  | heduler_Worker-3 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle
22:27:00,002 | INFO  | heduler_Worker-4 | QuartzTwoService                 | artztwo.service.QuartzTwoService   18 | Within handle


I also attached a QUARTZ_TESTING.tar.gz which contains the two simple bundles i tested with. they won't build straight away because the pom uses a common dependencies i'm using that's internal, but at least you can see the code and spring config.

If you could test this locally that would be great because this is actually causing us a lot of pain at the moment.

Just let me know if I need to raise this as a bug elsewhere.

Thanks very much for the help.

cheers

Edited by: unwired on Jun 2, 2010 2:46 PM
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Jun 2, 2010 4:44 PM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
I had a look and it appears to be a Quartz issue in the light that it just returns a single shared instance of the scheduler.

I have created a ticket for Camel to improve on this
https://issues.apache.org/activemq/browse/CAMEL-2784
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Jun 2, 2010 10:57 PM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Great thanks very much for the quick response. We look forward to the confirmation/resolution of the issue.

I'll follow the CAMEL-2784.

cheers
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Jun 3, 2010 5:55 AM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
Hi

I have committed a fix. Can you try it out on your system?
I have attached a new camel-quartz.jar on the JIRA ticket you can install in your osgi container.
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Jun 3, 2010 2:19 PM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
sweet!! i will check it out first thing in the morning (it's just past midnight here) once i'm back in at work. seriously appreciate the efforts and immediate response. will let you know as soon as i've tried it out..

thanks!!
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Jun 4, 2010 12:05 AM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
whoo hoo! :) preliminary tests look much better if not great. I ran my isolated, simple test and wasn't really able to reproduce the issue (once I restart the fuse server - updating the bundle alone didn't work for some reason).

However, i decided to aggressively deploy just for testing and copied my quartzone and quartztwo bundles in back and forth a number of times. The result produced errors in logs (see attached) and the osgi console said the bundles were active....

[ 179] [Active     ] [            ] [       ] [   60] ubo collector Quartzone (1.0.0)
[ 181] [Resolved   ] [            ] [       ] [   60] camel-quartz (2.2.0)
[ 182] [Active     ] [            ] [       ] [   60] ubo collector Quartztwo (1.0.0)
[ 183] [Active     ] [            ] [       ] [   60] camel-quartz (2.4.0.SNAPSHOT)


But the quartz actually stopped working. Even if I deployed the application again, it would report that it was active but the same error resulted in the log and it did not actually trigger. Perhaps this would never come up normally and there isn't a way to stop it? but just thought i'd highlight it.



Also, I modified the manifest to use 2.2 since we have not yet upgraded to 2.4 as required by the original jar. Let me know if you foresee any issues with that but all seemed to work fine.

I will move it to our sit environment and test more thoroughly and let you know within 1-2 days.

Thanks so much for the help!
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Jun 4, 2010 4:25 AM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
Thanks for testing it out.

I have attached a slightly modified JAR to the JIRA ticket trying to fix that potential debug issue from your log.

Would be cool if you could give that a test as well.
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Jun 5, 2010 7:48 AM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Attachment deploy_probs_2.txt (881.4 KB)
Attachment deploy_probs_1.txt (890.1 KB)
Hi,

I tested out the latest version off CAMEL-2784 and it seems the deploy problem may still exist. The logs are very similar (see attached).

I diffed the original and revised jars and only the QuartzComponent class differed so hopefully i'm on the right version.

thanks heaps for the investigation

cheers
davsclaus

Posts: 1,893
Registered: 10/14/08
Re: potential quartz/threading or esb bug
Posted: Jun 6, 2010 7:09 AM   in response to: unwired in response to: unwired
  Click to reply to this thread Reply
Thanks for testing again.

Its definitely an interesting perspective in terms of deployment on OSGi. I really wonder if it would be much simpler on OSGi if it had a queue of deployment tasks and executed them serialized. Or was able to detect deployments which involves shared bundles and thus will be able to queue those deployment, so they are executed in serial order.

What happens from the logs is that one is shutting down, while another is starting, and they get interleaved.
unwired

Posts: 35
Registered: 08/04/09
Re: potential quartz/threading or esb bug
Posted: Jun 6, 2010 10:47 PM   in response to: davsclaus in response to: davsclaus
  Click to reply to this thread Reply
Thanks for the feedback.

Your explanation is interesting because we are actually experiencing issues on restarts of the server, even aside from the quartz issues we've raised. I agree that a serialized execution should sort this out; I can only imagine asynchronous loading would be used to try to save time bc osgi purports a fast startup.

Also, just a little feedback - in the wrong forum now as this is for servicemix -, but we also see issues simply deploying bundles. It is not uncommon at all to deploy a bundle, get an Invalid Bundle Context exception, which then seems to put the whole server in a bad "state". We then have to bounce the whole server - which prior to your solution for the quartz issue was a major pain.. but now at least we can do an easier restart.

We are planning to trial spring's dm server when we have a little time to see if some of these issues aren't resolved. I wonder if they do synchronous loading; I'm sure that's how they load spring beans but not sure if their osgi server follows suit.

Thanks for the feedback anyway. We'll plunge forward with your latest fix.

thanks for the help/feedback!

Edited by: unwired on Jun 6, 2010 10:47 PM