Microsoft 365 Was Throttling Batch Emails

The problem

A client was reporting that a small number of report notification emails were occasionally failing to deliver when a large batch job ran. Not all of them — most would arrive fine — but a handful would consistently go missing after high-volume dispatch events. The application logged the failures, but they were buried in noise and the pattern wasn't obvious until someone counted the missing notifications against the send volume and noticed the failures clustered tightly in time.

Symptoms

  • Intermittent email delivery failures, never on all emails — always a subset
  • Failures clustered immediately after batch report runs, not during low-volume periods
  • Affected emails logged as failed but not automatically retried
  • In one batch, 25 emails succeeded and 2 failed: a failure rate of roughly 5–10%
  • No SMTP authentication errors, no SPF/DKIM issues — the failing emails never even got through the connection handshake

The error in the application log was:

ERROR Error in SendEmail while sending email: 4.3.2 Concurrent connections limit exceeded.
      Visit https://aka.ms/concurrent_sending for more information.
      [Hostname=DM4PR18MB4126.namprd18.prod.outlook.com]

Diagnostic path

The 4.3.2 in the error is an SMTP enhanced status code, and the leading 4 marks it as a transient failure: a temporary rejection, not a permanent one. Microsoft is not saying the email is bad; it's saying "too many connections right now, try later." The aka.ms/concurrent_sending link in the error message points directly to Microsoft's documentation on per-account connection throttling for Office 365.
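As a rough illustration of that distinction, here is a minimal sketch that pulls the enhanced status code out of an error string and classifies it by its first digit. Python is used purely for illustration (the application's own language isn't shown in this post), and classify_smtp_error is a hypothetical helper name, not something from the application.

```python
import re

# Enhanced status codes look like "4.3.2" or "5.7.1". The lookarounds stop the
# pattern from matching inside longer dotted numbers such as IP addresses.
_ENHANCED_CODE = re.compile(r"(?<![\d.])([245])\.(\d{1,3})\.(\d{1,3})(?!\.?\d)")


def classify_smtp_error(message: str) -> str:
    """Return 'success', 'transient', 'permanent', or 'unknown' for an error string."""
    match = _ENHANCED_CODE.search(message)
    if match is None:
        return "unknown"
    # First digit is the class: 2 = success, 4 = transient, 5 = permanent.
    return {"2": "success", "4": "transient", "5": "permanent"}[match.group(1)]


log_line = (
    "ERROR Error in SendEmail while sending email: "
    "4.3.2 Concurrent connections limit exceeded."
)
print(classify_smtp_error(log_line))  # prints "transient"
```

Anything that comes back as "transient" is a candidate for a later retry; anything "permanent" is not.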

My first step was to confirm this was genuinely a Microsoft-side throttle rather than an application misconfiguration. I queried the email log table in the database:

SELECT status, COUNT(*)
FROM email_log
WHERE sent_date >= TRUNC(SYSDATE)
GROUP BY status;

The counts matched exactly what the logs said: a small number of F (failed) records, the rest successful. I then pulled the failed rows to check their error text and timestamps:

SELECT * FROM email_log
WHERE status = 'F'
AND sent_date >= SYSDATE - 1
ORDER BY sent_date DESC;

All failed records had the same 4.3.2 error text and timestamps within a 2–3 second window. That's not a random delivery failure — that's a burst throttle being applied.

The application sends report emails in parallel when a batch job completes. When 20+ reports finish simultaneously, the email service opens that many concurrent SMTP connections to Office 365. Microsoft's per-account limit for concurrent SMTP connections is low (the exact number isn't published, but in practice you hit it with more than a handful of simultaneous connections from a single sending account). The excess connections get the 4.3.2 rejection.

What made this harder to spot earlier: the failed emails were logged but the retry mechanism was a separate manual job rather than automatic. So failed records just sat in the database, and nobody had set up alerting on email_log failure counts.
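The missing alert would not have needed much. As a minimal sketch, assuming the per-status counts from the GROUP BY query above have been loaded into a dictionary, a check like the following could feed whatever alerting channel is available. The function name failure_alert, the max_failed threshold, and the 'S' success status are assumptions for illustration; only the 'F' status comes from the actual table.

```python
from typing import Mapping, Optional


def failure_alert(status_counts: Mapping[str, int], max_failed: int = 0) -> Optional[str]:
    """Return an alert message if failed emails exceed max_failed, else None."""
    failed = status_counts.get("F", 0)  # 'F' = failed, as in email_log
    total = sum(status_counts.values())
    if failed > max_failed:
        rate = failed / total if total else 0.0
        return f"email_log: {failed} of {total} emails failed ({rate:.1%})"
    return None


# Counts shaped like the incident batch; 'S' is an assumed success status.
print(failure_alert({"S": 25, "F": 2}))
# prints "email_log: 2 of 27 emails failed (7.4%)"
```

With max_failed left at 0, a single failed row is enough to raise the alert, which suits a table where failures are otherwise rare.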

The fix

Three parts:

Immediate: Ran the failed-email resend job to recover the undelivered notifications from the current incident. Query to identify what needs resending:

SELECT * FROM email_log
WHERE status = 'F'
ORDER BY sent_date DESC
FETCH FIRST 20 ROWS ONLY;

Short-term: Throttled the application's parallel SMTP dispatch — instead of opening N connections simultaneously for a batch of N reports, introduced a concurrency cap so no more than 3–4 SMTP connections were open at any one time. This trades slightly slower batch completion for reliable delivery.
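The idea behind the cap can be sketched with a bounded worker pool. This is not the application's actual dispatch code: Python is used only for illustration, send_one is a placeholder that simulates an SMTP send and records how many sends overlap, and the cap of 3 is taken from the lower end of the 3–4 range above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

MAX_SMTP_CONNECTIONS = 3  # assumed cap, lower end of the 3-4 range

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of overlapping sends observed


def send_one(report_id: int) -> int:
    """Placeholder for the real SMTP send; only tracks overlapping calls."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # stands in for the SMTP round trip
    with _lock:
        _active -= 1
    return report_id


def send_batch(report_ids: List[int]) -> List[int]:
    """Send every report with at most MAX_SMTP_CONNECTIONS sends in flight."""
    with ThreadPoolExecutor(max_workers=MAX_SMTP_CONNECTIONS) as pool:
        return list(pool.map(send_one, report_ids))


sent = send_batch(list(range(27)))
print(len(sent), peak_active <= MAX_SMTP_CONNECTIONS)  # prints "27 True"
```

The pool size is the whole mechanism: the batch still goes out in full, but the queue drains through a fixed number of workers instead of one connection per report.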

Longer-term: Asked the client's Microsoft 365 administrator to review the sending connector configuration and to evaluate whether requesting a higher throttle limit from Microsoft was warranted, or whether switching to the Microsoft Graph API for mail delivery (which has higher and more predictable limits) made sense for their volume.

Lesson

The 4.3.2 rejection is easy to miss because nothing bounces back to the sender; the only trace is a line in the application log. If you're running batch email delivery through Office 365, the concurrent connection limit will eventually bite you as volume grows. The fix is straightforward, but you have to know to look for it. Monitoring email_log failure counts with an alert threshold is the defensive measure I'd now put in place from day one. Also: automatic retry on 4.3.2 (it is explicitly a temporary rejection) is the difference between "degraded delivery" and "silent data loss."
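A minimal sketch of what automatic retry on a temporary rejection could look like, using Python's standard smtplib exception types for illustration. The helper name send_with_retry, the backoff values, and the 432 reply code in the demo are assumptions (the application log only shows the 4.3.2 enhanced code), so treat this as an outline rather than the application's implementation.

```python
import smtplib
import time
from typing import Callable, List


def send_with_retry(
    send: Callable[[], None],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(); retry 4xx SMTP replies with exponential backoff.

    Returns the number of attempts used. Permanent (5xx) errors and the
    final failed attempt are re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except smtplib.SMTPResponseException as exc:
            transient = 400 <= exc.smtp_code < 500
            if not transient or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s, ...
    raise AssertionError("unreachable")


# Demo: a fake sender that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky_send() -> None:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise smtplib.SMTPResponseException(
            432, b"4.3.2 Concurrent connections limit exceeded."
        )


delays: List[float] = []
attempts = send_with_retry(flaky_send, sleep=delays.append)  # record delays instead of sleeping
print(attempts, delays)  # prints "3 [2.0, 4.0]"
```

Backing off between attempts matters here: retrying immediately would land in the same burst window that triggered the throttle in the first place.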