BlackBerry Forums Support Community
              

Closed Thread
 
Old 05-20-2008, 10:48 AM   #1
TJefferies
Knows Where the Search Button Is
 
Join Date: Mar 2007
Model: 8310
Carrier: AT&T
Posts: 21
Question 30-60 Minute Delay on Push E-mails

OK, so I'm not entirely a BES noob, but I've reached the limit of what I know to troubleshoot in our situation. Let me first explain what the symptoms are and then I'll provide more information via the MAGT logs, etc.

For approximately two weeks, e-mail receipt on Blackberry handhelds has been delayed by anywhere from 15 minutes to 60 minutes sporadically. Sometimes e-mail is immediately delivered to our devices and it pops in prior to our seeing it in Outlook, other times it doesn't.
It doesn't arrive in batches, as it would if BES were falling back on the 15 minute polling interval because our Exchange integration wasn't working -- it pushes single e-mails one at a time, anywhere from 15 minutes late to 60 minutes late.

We've spent a LOT of time on the phone with RIM. They had us replace our service account with a new one, make sure the permissions were set, make sure the CDO.DLL is proper (which it is), as well as a mess of other vaguely entertaining things, but the end result is still the same -- sporadic losses of connectivity.

We've determined that if we re-start the Blackberry Router service, approximately 45-60 minutes later (after a rescan of our mailboxes), the messages all route properly and immediately -- but after several hours, the slowdown occurs again.

I'd like to state that this server has been active for a fairly extended period of time and WAS previously working 100% perfectly (prior to the 4.1.whatever SP5 update we did).

The service interruption or delay in BES directly correlates in the MAGT logs to the following events:

[30181] (05/20 11:29:52.340):{0x1C40} Performing system health check (BlackBerry Mailbox Agent 1 - BESX Version 4.1.5.40)
[30399] (05/20 11:29:52.374):{0x1C40} Worker Thread: *** Busy Working *** Thread Id=0x24A8, Handle=0x10F0, BusyCount=4, WorkingTime=45 min, LastActivity=0 min, Event: RELOAD_FLDRS, User: ****@****.com, Server: (exchange server name replaced), Activity: Starting
[30399] (05/20 11:29:52.374):{0x1C40} Worker Thread: *** Busy Working *** Thread Id=0x2628, Handle=0x1130, BusyCount=4, WorkingTime=45 min, LastActivity=0 min, Event: RELOAD_FLDRS, User: ****@****.com, Server: (exchange server name replaced), Activity: Starting

These are created for nearly every user who is having issues with receiving slow e-mails. Users that we specifically assign agents to have no issues receiving e-mails during these periods. This leads me to believe that the agents are choking while doing the health check on the mailbox. What I find surprising (which may or may not be to you experienced BES gurus out there) is that these workers show that they've been working for as long as 45 and 50 minutes, but show the LastActivity as 0 minutes.... I know my server didn't just determine it wanted to be slow.... Or at least, I think it didn't? Perfmon/etc. doesn't show an increase in disk activity or cpu usage, there are no queued reads from the disk... Our ping time to our exchange server is <1ms 99/100 times and is 1ms the other 1 time...
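For anyone who wants to pull these entries out of a MAGT log in bulk rather than eyeballing them, here is a minimal Python sketch. It assumes every "Busy Working" line follows the layout of the two sample lines above; the names `BUSY_RE`, `parse_busy_lines` and `long_running` are made up for illustration and are not part of any RIM tool.

```python
import re

# Pattern inferred only from the two sample lines quoted above (assumption);
# other MAGT line shapes are simply skipped.
BUSY_RE = re.compile(
    r"\((?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\):"
    r".*?\*\*\* Busy Working \*\*\*"
    r".*?Thread Id=(?P<thread>0x[0-9A-Fa-f]+)"
    r".*?BusyCount=(?P<busy_count>\d+)"
    r".*?WorkingTime=(?P<working_min>\d+) min"
    r".*?LastActivity=(?P<last_activity_min>\d+) min"
    r".*?Event: (?P<event>\w+)"
)


def parse_busy_lines(lines):
    """Return one dict per 'Busy Working' line; ignore all other lines."""
    rows = []
    for line in lines:
        match = BUSY_RE.search(line)
        if match is None:
            continue
        row = match.groupdict()
        # Convert the numeric fields so they can be compared and sorted.
        for key in ("busy_count", "working_min", "last_activity_min"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def long_running(rows, min_working=30):
    """Keep threads that have been working for at least min_working minutes."""
    return [row for row in rows if row["working_min"] >= min_working]


# The three log lines from the post, used here as sample input.
SAMPLE = [
    "[30181] (05/20 11:29:52.340):{0x1C40} Performing system health check "
    "(BlackBerry Mailbox Agent 1 - BESX Version 4.1.5.40)",
    "[30399] (05/20 11:29:52.374):{0x1C40} Worker Thread: *** Busy Working *** "
    "Thread Id=0x24A8, Handle=0x10F0, BusyCount=4, WorkingTime=45 min, "
    "LastActivity=0 min, Event: RELOAD_FLDRS, User: ****@****.com, "
    "Server: (exchange server name replaced), Activity: Starting",
    "[30399] (05/20 11:29:52.374):{0x1C40} Worker Thread: *** Busy Working *** "
    "Thread Id=0x2628, Handle=0x1130, BusyCount=4, WorkingTime=45 min, "
    "LastActivity=0 min, Event: RELOAD_FLDRS, User: ****@****.com, "
    "Server: (exchange server name replaced), Activity: Starting",
]

if __name__ == "__main__":
    for row in long_running(parse_busy_lines(SAMPLE)):
        print(row["stamp"], row["thread"], row["event"], row["working_min"])
```

Grouping the parsed rows by event or by timestamp should make it easier to see whether the long-running threads cluster around the health checks. The 30 minute cutoff is an arbitrary choice for the sketch.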

Any ideas what/where/who/how to figure this one out?

Thanks in advance for any help!
__________________
8700c -> 8800 -> 8310
Offline  
Old 05-20-2008, 09:54 PM   #2
Aroc
CrackBerry Addict
 
Join Date: Jul 2005
Location: Solon, OH, USA
Model: 9000
OS: 4.6.0.167
PIN: 20878533
Carrier: ATT
Posts: 708
Default

That's odd that RIM wasn't of more help. I assume you've already looked through the Windows Event logs and tried other easy tests like disabling any A/V software (temporarily). If it were me, I'd get another box, get a temp BES SRP ID, install BES on it, then move over one user (and perhaps a second) to see if the symptom still persists for that user. Even though it doesn't draw the best-defined fence troubleshooting-wise, it should give you an indication of whether there is simply some unknown odd issue with your BES box.

Maybe time to roll-back to BES 4.1.4?

This reminds me of a similar problem I had with the attachment service and later memory leaks with nBES.exe on Domino. In both cases I had to wait for RIM to fix it.

I don't have any suggestions RE: correlation to the agents or mail store health checks since we're a Domino shop.
__________________
--
Domino 7.0.4FP1 | BES 4.1.6 MR-7 | 42 handhelds
Offline  
Old 05-21-2008, 08:11 AM   #3
TJefferies
Knows Where the Search Button Is
 
Join Date: Mar 2007
Model: 8310
Carrier: AT&T
Posts: 21
Default

I'm not entirely certain why RIM wasn't of much help. They seemed to be under the belief that after we restarted the Router service and it rescanned our mailboxes that everything was working as it should... We ended up calling them back 3 times that day, once every 2 or 3 hours when it occurred, and they still wouldn't correlate the repeated failures as part of a bigger issue.

Could be that we got a bad tech on a bad day, not entirely certain. We figured after 4 hours of being told to restart the services and check the permissions on our BES Administrator account we could just as easily tell ourselves to do that. If the problem persists, we'll end up calling back and hopefully get someone with more patience for this type of issue.

We have looked through the event log and there are a few 'Failed to construct PIM Information for memo such and such', but we're talking one or two errors over a 24 hour period, so I don't believe those are causing it.. Doesn't mean they're not, but the timing of them doesn't correlate by any way we can determine.

We'll be taking action on the Exchange stores this weekend to check to make sure that all of the stores are fully consistent. I've been reading that heavy load on the Exchange server or failing databases can cause similar issues. We run nightly backups and do fairly regular consistency checks, and everything has come back clean -- (we've done one since the slowdown started occurring, with nothing but clean results) but I feel at this point if the BES server really thinks it's doing what it needs to, it could be related to slow response to RPC calls by the Exchange server... I'll start doing more monitoring to see the avg. disk queue and see if I'm having intermittent hardware problems on that server... Anything important comes up I'll post an update, and if I figure it out, I'll definitely post the solution..

Thanks!
__________________
8700c -> 8800 -> 8310
Offline  
Old 05-22-2008, 01:05 PM   #4
RichieRichR6
Thumbs Must Hurt
 
Join Date: Oct 2007
Model: 9700
PIN: N/A
Carrier: ATT
Posts: 52
Default

I have been seeing the same thing in my environment for the last 2 months. It's killing me. I've tried moving from MSDE to SQL....no luck. Installed a new BES and performed a cutover.....still no luck. Messages seem to be picked up on rescan. Perhaps there is an issue with UDP.

Called RIM about this and they think it could be an Exchange or network issue. I will check that next.

Man, I can't tell you how many Coronas I've put down thinking about this.
Offline  
Old 05-22-2008, 03:24 PM   #5
knottyrope
BlackBerry Elite
 
knottyrope's Avatar
 
Join Date: Jan 2008
Location: Massachusetts
Model: DT60
OS: 123456789
PIN: t of blood has been taken
Carrier: AT&T-US with I dee ten tee errors
Posts: 7,325
Arrow

Ever run the Microsoft Exchange Troubleshooting Assistant?

That can give you some info on what Exchange is doing.
Run it during peak hours to see where the bottleneck might be, if it's Exchange related.

Download details: Exchange Troubleshooting Assistant
__________________
I had to fall
To lose it all
But in the end
It doesn't even matter

Rocking the Motion with out lotion.

Last edited by knottyrope; 05-22-2008 at 03:27 PM.. Reason: speeling cheek
Offline  
Old 05-27-2008, 08:18 AM   #6
SoUnCool
Talking BlackBerry Encyclopedia
 
Join Date: Feb 2007
Location: Toronto
Model: 9800
Carrier: Rogers
Posts: 319
Default

Check your BES and SRP connection -- is it dropping?
Were any new rules added on your firewall to block or allow certain traffic? I would create an exception for BES on the firewall to allow all traffic in and out, just for testing.

Also, is mapi32.dll the same on all Exchange and BES servers?
Offline  
Old 05-27-2008, 08:26 AM   #7
Keyscan
Thumbs Must Hurt
 
Keyscan's Avatar
 
Join Date: Aug 2007
Model: 8800
PIN: N/A
Carrier: Rogers
Posts: 140
Default

Quote:
Originally Posted by TJefferies View Post
I'm not entirely certain why RIM wasn't of much help. They seemed to be under the belief that after we restarted the Router service and it rescanned our mailboxes that everything was working as it should... We ended up calling them back 3 times that day, once every 2 or 3 hours when it occurred, and they still wouldn't correlate the repeated failures as part of a bigger issue.

Could be that we got a bad tech on a bad day, not entirely certain. We figured after 4 hours of being told to restart the services and check the permissions on our BES Administrator account we could just as easily tell ourselves to do that. If the problem persists, we'll end up calling back and hopefully get someone with more patience for this type of issue.

We have looked through the event log and there are a few 'Failed to construct PIM Information for memo such and such', but we're talking one or two errors over a 24 hour period, so I don't believe those are causing it.. Doesn't mean they're not, but the timing of them doesn't correlate by any way we can determine.

We'll be taking action on the Exchange stores this weekend to check to make sure that all of the stores are fully consistent. I've been reading that heavy load on the Exchange server or failing databases can cause similar issues. We run nightly backups and do fairly regular consistency checks, and everything has come back clean -- (we've done one since the slowdown started occurring, with nothing but clean results) but I feel at this point if the BES server really thinks it's doing what it needs to, it could be related to slow response to RPC calls by the Exchange server... I'll start doing more monitoring to see the avg. disk queue and see if I'm having intermittent hardware problems on that server... Anything important comes up I'll post an update, and if I figure it out, I'll definitely post the solution..

Thanks!
That is a good next step. A lot of times these issues come down to RPC or disk i/o latency on your exchange server. Running some performance checks should give you a better idea. As for RIM, must've caught a few bad people on a bad day. Most seem to be very helpful.
__________________
BES 4.1.4 - Exchange 2003
8800 and my trusty 8700r.
To change your PIN to FFFFFFFF, drop the BB in a lake.
Offline  
Old 05-27-2008, 08:44 AM   #8
SoUnCool
Talking BlackBerry Encyclopedia
 
Join Date: Feb 2007
Location: Toronto
Model: 9800
Carrier: Rogers
Posts: 319
Default

See if you can get a BoxTone or Conceivium trial version running; both have nice tools to show latency in BES traffic.
Offline  
Old 05-27-2008, 01:48 PM   #9
RichieRichR6
Thumbs Must Hurt
 
Join Date: Oct 2007
Model: 9700
PIN: N/A
Carrier: ATT
Posts: 52
Default

I've downloaded the Resource Kit tool and found out that most of the messages are being picked up during rescans. Looks like the Messaging Agent is getting overworked. Research...
Offline  
Old 05-27-2008, 01:49 PM   #10
SoUnCool
Talking BlackBerry Encyclopedia
 
Join Date: Feb 2007
Location: Toronto
Model: 9800
Carrier: Rogers
Posts: 319
Default

In that case, check your disk space, disk I/O, and memory usage on the BES box.
Offline  
Old 06-11-2008, 08:00 AM   #11
TJefferies
Knows Where the Search Button Is
 
Join Date: Mar 2007
Model: 8310
Carrier: AT&T
Posts: 21
Default

Quote:
Originally Posted by RichieRichR6 View Post
I've downloaded the Resource Kit tool and found out that most of the messages are being picked up during rescans. Looks like the Messaging Agent is getting overworked. Research...
I'm in the same boat. It's showing heavy load on the Messaging Agent, but all messages are being properly picked up during the rescan.... I've also noted that when the device stops routing (or starts delaying routing) I get a huge chunk of Pending Data Packets in the BB Manager that never go down.

This also brings me to another question -- clearing the statistics makes the Pending packets drop down to 0 - and I've had issues before where things didn't work until I did that. Can anyone tell me definitively if clearing the statistics actually clears any pending data packets? I've noted that sometimes if I just clear the statistics, BB will start routing again..... Personally, that doesn't make any sense to me!
__________________
8700c -> 8800 -> 8310
Offline  
Old 06-11-2008, 08:23 AM   #12
Cryotic
New Member
 
Join Date: Apr 2008
Model: 8900
PIN: N/A
Carrier: KPN
Posts: 6
Default

Are you seeing error events 9646 on either the BES or Exchange servers?
Offline  
Old 06-11-2008, 09:01 PM   #13
hdawg
BlackBerry Genius
 
hdawg's Avatar
 
Join Date: Aug 2006
Model: hdawg
PIN: port3101.org
Carrier: hdawg
Posts: 6,632
Default

If all of your messages are being picked up by rescan, BES either isn't receiving UDP notifications from Exchange, or BES is too heavily loaded to process the notifications ... usually not the latter.

Check out this post http://www.blackberryforums.com/bes-...tml#post643263 ... and look at your Exchange performance.

If you don't want to do 1 minute perfmon counters do 3 or 5 minute ... but run them for a week at least.
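To get a feel for how much data each interval produces, here is a quick back-of-the-envelope sketch in Python. The helper name `samples_per_counter` is made up for illustration; it only does the arithmetic for one counter over one week and says nothing about perfmon's actual log file size.

```python
def samples_per_counter(interval_minutes, days=7):
    """Number of samples a single counter collects over the given number of days."""
    minutes_total = days * 24 * 60
    return minutes_total // interval_minutes


if __name__ == "__main__":
    # The three intervals mentioned above, over a one-week run.
    for interval in (1, 3, 5):
        print(f"{interval} min interval: {samples_per_counter(interval)} samples per counter")
```

Even at a 1 minute interval that is roughly ten thousand samples per counter for the week, so the coarser intervals mainly trade away the ability to spot short spikes.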
Offline  
Old 07-15-2008, 07:57 AM   #14
hdawg
BlackBerry Genius
 
hdawg's Avatar
 
Join Date: Aug 2006
Model: hdawg
PIN: port3101.org
Carrier: hdawg
Posts: 6,632
Default

Take a look at the post about performance issues too.
Offline  
Old 08-26-2008, 11:14 AM   #15
TJefferies
Knows Where the Search Button Is
 
Join Date: Mar 2007
Model: 8310
Carrier: AT&T
Posts: 21
Default

Thank you for the replies -- it looks like I'm dealing with an issue where my Exchange server is the bottleneck now.... Thanks again for the info, I'll post more once everything is resolved!
__________________
8700c -> 8800 -> 8310
Offline  
Old 08-26-2008, 02:00 PM   #16
H.Nayl
Thumbs Must Hurt
 
Join Date: Dec 2005
Location: Union NJ
Model: 8700
Carrier: TMobile
Posts: 59
Default

Disk I/O, Disk I/O, Disk I/O...I went through this same issue for months where we just didn't have enough IOPS per user for the BES to effectively handle MAPI calls to and from our Exchange environment.

It took a split of our Exchange servers onto a separate SAN to bring things back under control. BES hates it when your physical disk latency (the perfmon PhysicalDisk counters Avg. Disk sec/Read and Avg. Disk sec/Write) exceeds 15-20ms.....constantly. Spikes every once in a while are OK, but if you're constantly seeing your disks in this range you will see messaging delays. Higher than that consistently will lead to those "busy" threads you've been seeing, and when it starts to skyrocket (25ms and up) you will start to see the non-responsive "hung threads", which will delay everyone on that particular mail agent.

BES is incredibly I/O hungry. BES 4.0 and up helped a bit by making things such as folder reloads only produce "busy" threads if disk I/O was high, whereas versions prior to 4.0 would hang those folder reload threads outright.

We had 3 backend Exchange boxes housing about 1400-1800 users apiece...and the LUNs for each server that housed the databases were completely choking. Since then we've moved half of each server's population to 3 new Exchange boxes on their own SAN/LUNs. I've found the sweet spot for average disk read/write latency to be anywhere from 4-10ms for messages to appear in OL and simultaneously hit a BB.
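As a rough illustration of those bands, here is a minimal Python sketch that sorts a run of disk latency samples into the ranges described above. It assumes the samples are already in milliseconds (perfmon reports Avg. Disk sec/Read and Avg. Disk sec/Write in seconds, so multiply by 1000 first). The function name `classify_latency`, the use of the median to ignore occasional spikes, and the "borderline" label for the 10-15ms gap are all assumptions made for the example, not anything from RIM or Microsoft.

```python
from statistics import median


def classify_latency(samples_ms):
    """Classify disk latency samples (in ms) using the rough bands from this post."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    # Median rather than mean, so an occasional spike does not dominate the result.
    typical = median(samples_ms)
    if typical >= 25:
        return "hung threads likely"
    if typical >= 15:
        return "delays and busy threads likely"
    if typical <= 10:
        return "healthy"
    # The post gives no guidance for 10-15 ms; this label is an assumption.
    return "borderline"


if __name__ == "__main__":
    # Mostly fast disk with a single spike: still counts as healthy.
    print(classify_latency([5, 6, 7, 40, 6]))
    # Constantly in the high teens: expect delays.
    print(classify_latency([18, 17, 19, 16, 30]))
```

Feeding it a week of samples per LUN gives a quick first pass at which Exchange stores are the ones choking, before digging into the raw perfmon graphs.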

Last edited by H.Nayl; 08-26-2008 at 02:08 PM..
Offline  



Copyright © 2004-2016 BlackBerryForums.com.
The names RIM © and BlackBerry © are registered Trademarks of BlackBerry Inc.