Troubleshooting Bad Time Bombs
There are times when Rulemanager will report a bad time_bomb in the system. Although it’s easy to just ignore when this happens, it’s better to diagnose the issue to understand (and resolve) the root cause.
This post will walk through an approach to troubleshooting bad time bombs.
Time Bombs
Recall that a time_bomb record is created when events happen in the system, such as a case being created or closed, workflow actions (dispatch, accept, assign, yank), etc. Rulemanager is responsible for picking up these time bombs, and comparing them against the defined business rules in the system, to see if something should happen, such as a notification.
Time Bombs Gone Bad
On occasion, Rulemanager will report a bad time_bomb. This will typically show up in the Rulemanager log file. It may also be reported via email, if your Rulemanager is so configured.
What’s nice is that rather than deleting the bad time bomb, Rulemanager will set its escalation data far into the future (in the year 2999). This allows us to still have access to the time bomb record so that we can troubleshoot the core issue.
Dovetail Rulemanager
Here’s an example from a Dovetail Rulemanager log file showing an invalid time bomb:
2012-11-01 14:58:20,400 [Producer] ERROR FChoice.RuleManager.ApplicationEventRepository - Application Event with ID 268444196 is invalid because it does not have a valid Focus Object: FocusTypeID: 5208, FocusTableName: communication, FocusID: 268435547 2012-11-01 14:58:20,408 [Producer] ERROR FChoice.RuleManager.ApplicationEventRepository - Invalid Application Event updated with future escalation time: ID: 268444196, Escalation Time: 1/1/2999 12:00:00 AM
Clarify Rulemanager
Here’s an example from a Clarify/Amdocs Rulemanager log file showing an invalid time bomb:
2012/10/22 12:15:43.980 Info Terse 1 0 ProcessJob JobID=216155882 2012/10/22 12:15:43.994 Info Verbose 2 0 Processing time bomb with id = 216155882 escalation time = 10/22/2012 12:15:41 2012/10/22 12:15:44.000 Critical 2 0 Erros ocurred while processing time bomb 2012/10/22 12:15:44.000 Critical 2 0 Reasons for error: Object of type communication and low oid 1d9fdf does not exist in the database. 2012/10/22 12:15:44.000 Critical 2 0 Information about time bomb 2012/10/22 12:15:44.000 Critical 2 0 Focus object type is 'Communication', Focus object objid is '1941471', Time bomb type is '2', Time bomb objid is '216155882', 2012/10/22 12:15:44.002 Info Terse 2 0 Resetting time for bad bomb to far in the future. 2012/10/22 12:15:44.002 Info Terse 2 0 Resetting bad bomb's escalate time in the database. 2012/10/22 12:15:44.015 Info Terse 2 0 Removing bad bomb from time bomb queue. 2012/10/22 12:15:44.016 Info Terse 2 0 DONE processing bad time bomb.
Reason
In both cases, the content of the problem is pretty clear. We have a time bomb that is for a Communication object that doesn’t exist.
Which brings us to a my number 1 tip for debugging time bomb issues: Enable Rulemanager logging at a verbose/debug level. This will give you lots of good information, and in many cases, the exact reason why the time bomb is bad.
View the time bomb record
Lets take a look at the time bomb record itself using SQL
select * from table_time_bomb where objid = 268444196
Notice that the time_bomb record exists, and its escalate_time has been set to January 1, 2999. Since Rulemanager only processes time bombs whose escalate_date has past, we have over 900 years to troubleshoot this issue before Rulemanager picks it up. Plenty of time.
Time Bomb Details: Focus Type and Focus Lowid
The focus_type is the schema id of the kind of object that the event occurred on. Case = 0, Contact = 45, Communication = 5208. Etc.
The focus_lowid is the objid of the object.
Lets do some verification of that data.
You can easily verify the focus_type using your schema file, or using BOLT.
OK, so we’ve verified that 5208 is a communication object. Good.
We can then use SQL to verify that the communication object exists:
select * from table_communication where objid = (select focus_lowid from table_time_bomb where objid = 268444196 )
This returns zero rows, so we’ve now confirmed what Rulemanager is telling us:
Object of type communication and objid 268435547 does not exist in the database.
Time Bomb Details: employee
The cmit_creator2employee should be a valid objid of an employee record.
Lets confirm that with SQL:
select from table_employee where objid = (select cmit_creator2employee from table_time_bomb where objid = 268444196 )
This returns one row, so that all looks good.
Time Bomb Details: time_period/act_entry
In most cases, the time_period will be a valid objid of an act_entry record.
I say most cases because it depends on the type of time_bomb (rule, non-rule, scheduling, etc.) But the majority of the time, you’ll be debugging rule scheduling time_bombs, so it will be the objid of an act_entry record.
Typically, Clarify will show these as Type 2 time bombs. As shown in the log file: Time bomb type is ‘2’
and Dovetail will show them as Application Events. As shown in the log file: ApplicationEventRepository – Application Event with ID 268444196
Lets confirm we have a valid act_entry with SQL:
select * from table_act_entry where objid = (select time_period from table_time_bomb where objid = 268444196 )
This returns one row, so that all looks good.
We can also verify that the act_entry2user is filled in correctly:
select * from table_user where objid = (select act_entry2user from table_act_entry where objid = (select time_period from table_time_bomb where objid = 268444196 ))
This also returns one row, which tells us that the relation from act_entry to user is valid.
Time Bomb Details: flags
The flags represent the type of event that occurred. It should be a valid number.
This one is a bit tricky to figure out.
If it’s a user-defined event, then the flags is calculated as: flags = (n * 65536) + 2
where n is the event ID of the entry that was added to the event list on form 472.
So if your event was USER1001, then the flags should be 65601538.
Or, to flip the equation, n = (flags – 2) / 65536
These formulas work for user-defined rule events. Baseline Clarify events don’t always follow this formula.
This is covered in more detail in the Advanced Business Rules (part 2) training presentation.
Back around to the initial problem at hand
What we now know:
- The time_bomb is related to a valid act_entry
- The time_bomb is related to a valid employee
- The flags of the time_bomb appear to be correct
- The act_entry is related to a valid user (and we know who that user is)
- The activity (Create Communication) is valid for the type of object (communication)
- The flags looks OK (it’s not zero)
- The focus_type is valid (communication)
- The focus_lowid is invalid (which confirms what Rulemanager logs told us)
So, why do we have a time_bomb and an act_entry for a communication object that doesn’t exist?
Lets take a closer look at the act_entry.
Lets look at the additional info of the act_entry:
select addnl_info from table_act_entry where objid = (select time_period from table_time_bomb where objid = 268444196 )
The additional info of the act_entry is: Communication created for : fred
Next, lets look at the activity name for this act_entry.
select title from table_gbst_elm, table_gbst_lst where rank = ( select act_code from table_act_entry where objid = (select time_period from table_time_bomb where objid = 268444196 )) and gbst_elm2gbst_lst = table_gbst_lst.objid and table_gbst_lst.title = 'Activity Name'
This returns: Create Communication
So, now we know that the activity that did this whole shebang was creating a communication. The additional info confirms this.
So, why do we have a time_bomb and an act_entry for creating a communication object, but the communication object doesn’t exist?
A couple options from here
1. We know who did this activity (as the act_entry is related to a valid user), so we can talk to that person, and see if they might provide some insight as to what they did.
2. Use our knowledge, and some experimenting, and see if we can replicate it.
I went with option #2. My initial thought was that the communication object was created, and then it was deleted somehow. Typically, you don’t delete things in Clarify. But, I started playing around with the Clarify client app, and figured out that I could actually delete a communication object.
Test
So, I then setup my test:
- Stop Rulemanager
- Create a communication and save it as a draft.
- Confirm that a Create Communication activity entry and time bomb were created
- Delete the communication using the Delete Draft button
- Confirm that the communication row has been deleted from the database
- Confirm that the act_entry and time_bomb records still exist in the database
- Start Rulemanager
- Observe Rulemanager logs – confirm that the original error occurs
And my test proved true! We’ve now reproduced the issue.
Final Step – fix the core issue
So now that we can reproduce the issue, our task is to fix the core issue.
In this particular case, it’s really a bug in the Clarify Classic Client. IMO, if you delete an object, you should delete its related records as well.
So if you delete a communication object, you should delete its related time_bomb records. ‘Cause if you don’t, you have problems (as we’re seeing here).
We can do this with a bit of ClearBasic code added to the Delete Draft button. Sample code is on github:https://gist.github.com/3987865
Other examples
Here’s a few other examples that I’ve seen over the years that cause Rulemanager to report errors
- Invalid Rule Properties. Such as paths that are invalid.
- Invalid focus_type. Such as leaving the focus_type as zero (which means Case) when it should be for a different object (such as 192 for Bug/Change Request)
- Business rule that calls a command line script/executable but that script/executable doesn’t exist.
Make sure you know when errors occur
When Rulemanager, for whatever reason, cannot handle a timebomb it changes the escalation time to be far into the future (January 1, 2999) Application administrators may be interested to know when this happens.
Typically, these errors just end up in log files. Rulemanager can be configured to notify someone (such as an administrator) when a time bomb cannot be processed.
Here’s how to set that up.
Dovetail Rulemanager:
http://support.dovetailsoftware.com/selfservice/solutions/show/380
Clarify Rulemanager on Windows:
Using the Configure Clarify Services application (SvcConfig.exe), set the Notify in Case of Problem to a list of email addresses, to which an email message is forwarded if problems
occur.
Clarify Rulemanager on UNIX:
In the rulemansvc.env file, set the NOTIFY_LIST to a list of email addresses, separated by semicolons, to which an email message is forwarded if problems occur.
Summary
Unfortunately, almost every bad time_bomb issue will be slightly different. What’s more important here is the approach, as opposed to the specifics.
Start with the Rulemanager log files! Normally they provide good information that will put you on the right path. Make sure you’re running at the highest log level (Verbose / Debug).
Investigate each of the important fields of the time_bomb record, including its relations.
Investigate the related act_entry, and its related user and related activity code.
If this is a custom (user-defined) time_bomb, review your code that is creating these. It’s not uncommon for developers to create time_bombs that are invalid (because it can be tricky).
Finally, rely on your application support provider. That’s what you pay yearly maintenance fees for.
Oh, and as I mentioned earlier, the Rulemanager Companion Wiki is a great resource for all things related to Rulemanager and Business Rules.
Hope this helps.
Rock on.