Salesforce Integration Architecture: A Practitioner's Guide to Choosing the Right Pattern

Architect's Playbook

How experienced architects reason through a requirement when a stakeholder describes it, with real latency and real failure modes in view.

Every Salesforce architect eventually hits the same wall: the data model is clean, the automation is solid, the UI is polished — and then someone asks, "How does this talk to SAP?" or "What happens when the payment gateway is down?" That's the moment architecture stops being theoretical. Integration is where design decisions get tested against real systems, real latency, and real failure modes.

This post is a working framework for thinking through Salesforce integration architecture — not a tool-by-tool feature list, but the decision logic experienced architects actually use when a stakeholder describes a requirement and waits to see how you reason through it.

Why Integration Architecture Is Different From Application Architecture

Inside Salesforce, you control the runtime. Governor limits are documented, behavior is consistent, and you can reason about the whole system. The moment you integrate with an external system, you lose that control. The other system has its own uptime, its own rate limits, its own failure modes, and usually its own opinions about data formats.

💡 Core Principle

Good integration architecture isn't about picking the "best" technology — it's about explicitly designing for the fact that you don't control both ends.

That reframing matters because it changes the first question you should ask in any integration scenario. Not "REST or SOAP?" but "What happens when this fails, and who needs to know?"

The Core Integration Patterns

Most integration requirements map to one of six patterns. Knowing the names is table stakes; knowing which trade-offs each one forces on you is what signals architectural maturity.

1. Request-Reply (Synchronous)

The caller sends a request and blocks until it gets a response. Simple to reason about, but it couples the caller's transaction to the callee's availability and latency. On the Salesforce side, this is an Apex callout or a consumer hitting the REST API directly. The hidden cost: synchronous callouts inside a Salesforce transaction count against callout limits and timeout thresholds, so a slow external system becomes your user's slow page load.

2. Fire-and-Forget (Asynchronous)

The caller publishes and moves on, with no expectation of an immediate response. This is the right default whenever the caller doesn't need to wait — a Salesforce platform event published to notify a downstream system, or a Queueable Apex job firing a callout in the background. It decouples timing, but it introduces a new problem you must design for explicitly: how does the caller know the operation eventually succeeded or failed?
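
One way to answer that question is to attach a correlation ID to every published message and keep its status somewhere the caller can check later. The sketch below is a minimal in-memory illustration, not a Salesforce API: `OutcomeTracker`, `publish`, `confirm`, and the status values are hypothetical names, and the dictionary stands in for whatever store (a custom object, a middleware table) a real design would use.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OutcomeTracker:
    """Tracks fire-and-forget messages by correlation ID (hypothetical helper)."""
    statuses: Dict[str, str] = field(default_factory=dict)

    def publish(self, payload: dict, outbox: List[dict]) -> str:
        # The caller does not wait: it records PENDING and returns immediately.
        correlation_id = str(uuid.uuid4())
        outbox.append({"correlation_id": correlation_id, "payload": payload})
        self.statuses[correlation_id] = "PENDING"
        return correlation_id

    def confirm(self, correlation_id: str, succeeded: bool) -> None:
        # Called later, when the downstream system reports back.
        self.statuses[correlation_id] = "SUCCEEDED" if succeeded else "FAILED"

    def unresolved(self) -> List[str]:
        # Anything still PENDING is a candidate for a timeout alert.
        return [cid for cid, status in self.statuses.items() if status == "PENDING"]


outbox: List[dict] = []
tracker = OutcomeTracker()
first = tracker.publish({"order": "A-1"}, outbox)
second = tracker.publish({"order": "A-2"}, outbox)
tracker.confirm(first, succeeded=True)
print(tracker.unresolved() == [second])  # the second message never reported back
```

The design choice worth noticing is that "no response yet" is a recorded state, so a scheduled check can flag messages that stay pending too long.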

3. Batch Data Synchronization

Large volumes moved on a schedule rather than in real time — nightly product catalog syncs, historical data loads, bulk reconciliation jobs. Bulk API and Batch Apex exist precisely for this. The trade-off is staleness: if the business says "near real-time," batch is the wrong answer regardless of how convenient it is to build.

4. Remote Call-In

An external system initiates a call into Salesforce rather than the other way around — a mobile app, a customer portal, or another internal system invoking Salesforce's REST or SOAP API. The architectural burden here shifts to authentication, rate limiting awareness on the Salesforce side, and making sure the exposed API contract is something you're willing to support long-term, because external consumers will build dependencies on it.

5. Publish-Subscribe (Event-Driven)

One publisher, multiple independent subscribers, fully decoupled. Platform Events and Change Data Capture are Salesforce's native mechanisms here. This pattern shines when several systems need to react to the same business event — an order being placed might need to trigger fulfillment, notify finance, and update a loyalty system, all independently, all without the order-creation process knowing or caring who's listening.
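
The decoupling described above can be illustrated with a tiny in-memory event bus. This is a sketch of the pattern only, assuming hypothetical names (`EventBus`, the `OrderPlaced` topic, the three handlers); it is not the Platform Events or CDC API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Handler = Callable[[Dict], None]


class EventBus:
    """In-memory stand-in for an event bus (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.failures: List[Tuple[str, str]] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict) -> int:
        # The publisher names a topic, never a consumer.
        delivered = 0
        for handler in list(self._subscribers[topic]):
            try:
                handler(event)
                delivered += 1
            except Exception as exc:
                # One failing subscriber must not block the others,
                # but the failure is recorded rather than swallowed.
                self.failures.append((topic, repr(exc)))
        return delivered


log: List[str] = []
bus = EventBus()
bus.subscribe("OrderPlaced", lambda e: log.append(f"fulfillment:{e['order_id']}"))
bus.subscribe("OrderPlaced", lambda e: log.append(f"finance:{e['order_id']}"))
bus.subscribe("OrderPlaced", lambda e: log.append(f"loyalty:{e['order_id']}"))
delivered = bus.publish("OrderPlaced", {"order_id": "A-1"})
print(delivered, log)
```

Adding a fourth consumer is one more `subscribe` call; the order-creation code does not change.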

6. Data Virtualization (The Overlooked Option)

Instead of moving data at all, you expose external data through Salesforce as if it were native — using Salesforce Connect with OData adapters, a custom Apex Connector Framework adapter, or the Cross-Org Adapter. The data never leaves its source system; it's queried live. This is the right call when replication is expensive (massive datasets that blow past storage limits) or when staleness is unacceptable. The trade-off: performance is bounded by the external system's response time, and if it goes down, the data vanishes from the UI.

"A genuinely strong integration design rarely uses just one of these. A real architecture might use data virtualization for high-volume legacy billing data, pub/sub for order-status changes, and batch sync for nightly catalog updates — all within the same org."

Choosing Among Salesforce's Event-Driven Mechanisms

Once you've decided an event-driven approach fits, Salesforce gives you several tools that look similar on the surface but solve different problems.

  • Platform Events: Custom-defined events you design yourself. You control the schema, making them ideal when you're modeling a specific business event (OrderShipped, InventoryThresholdBreached) that needs to fan out to multiple consumers.
  • Change Data Capture (CDC): Tracks changes to actual Salesforce records (inserts, updates, deletes) automatically. Use this when the requirement is genuinely "tell me when this record changes." Reaching for Platform Events to replicate what CDC gives you for free is a common over-engineering mistake.
  • Streaming API (PushTopics): The older mechanism, largely superseded by CDC and Platform Events for new builds, but it still shows up in legacy orgs.
  • Outbound Messages: Declarative, SOAP-based, triggered by workflow rules or flows. They're rarely the right choice for new architecture, but tearing them out blindly can break production environments quietly.

Architecture Rule of Thumb: ask, "Am I modeling a business event whose schema I define, or do I need to react to changes on records that already exist in Salesforce?" That single question routes you to the right tool most of the time.

API Design: Picking the Right Interface

REST, SOAP, Bulk API, and GraphQL aren't competing standards so much as tools tuned for different shapes of problems:

API Type | Best Used For
REST | Default choice for modern CRUD-style operations and lightweight external system integrations.
SOAP | Enterprise environments with legacy ERPs, old middleware, or entrenched finance/healthcare systems.
Bulk API | High-volume processing (thousands of records and up). Work is submitted as asynchronous jobs and processed in batches, avoiding one REST request per record.
GraphQL | Client applications needing specific fields across nested related objects in a single round trip to prevent over-fetching.

Middleware vs. Point-to-Point: Earning the Complexity

Point-to-point integration — Salesforce talking directly to one other system — is simpler to build, cheaper to license, and easier to debug because there's nothing between the two endpoints to inspect. It's also the source of the classic integration mess.

The trap to avoid in an interview is defaulting to "we'd use MuleSoft" for every scenario, because that's a reflex, not a design decision. A two-system integration with simple, stable data shapes — say, a single nightly sync between Salesforce and one well-behaved SaaS tool with a clean REST API — is often genuinely better served by a direct, well-built point-to-point connection using Named Credentials and a scheduled Apex job. Introducing middleware here adds licensing cost, an additional hop of latency, and another system that can go down — and now you're debugging across two platforms instead of one.

"I'd start point-to-point if it's genuinely two systems with simple, stable contracts. I'd introduce middleware the moment a third system enters the picture, or the moment transformation complexity starts living in Apex classes that are really doing an integration platform's job."

That single phrasing demonstrates you're not pattern-matching to a tool — you're reasoning about when complexity is worth paying for. It's also worth noting that many real organizations run a **hybrid approach**: middleware for the complex, multi-system, governed integrations, and lightweight point-to-point connections for simple, isolated ones — because forcing every integration through the iPaaS layer "for consistency" often just adds latency and cost to cases that didn't need it.

Designing for Failure, Not Just for Success

This is the area where junior designs and architect-level designs diverge most visibly. A design that only handles the happy path isn't an architecture — it's a demo. Distributed systems fail in specific, recurring ways, and a mature integration design names each failure mode and answers it rather than hoping it doesn't happen.

1. Idempotency

If a message gets delivered twice — and in distributed systems, it eventually will, whether from a network timeout that triggers a client-side retry or a middleware redelivery after an ack was lost — does processing it twice cause a duplicate order, a double charge, a corrupted record?

The fix is to design operations to be safely repeatable: use a natural or external key (an order number from the source system, not a Salesforce-generated Id) to upsert rather than insert, or maintain an explicit "processed message IDs" log that's checked before any side effect runs. This needs to be a first-class part of the data model, not a patch added after the first duplicate-order incident in production.
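
Both techniques from the paragraph above fit in a few lines. This is a minimal sketch under stated assumptions: `OrderStore` is a hypothetical in-memory stand-in for an object with an external ID field, and `message_id` and `order_number` are illustrative field names supplied by the source system.

```python
from typing import Dict, Set


class OrderStore:
    """In-memory stand-in for an object keyed by an external ID (hypothetical)."""

    def __init__(self) -> None:
        self.orders: Dict[str, dict] = {}     # keyed by source-system order number
        self.processed_ids: Set[str] = set()  # explicit processed-message log

    def handle(self, message: dict) -> str:
        if message["message_id"] in self.processed_ids:
            return "skipped"  # duplicate delivery: no side effect runs again
        key = message["order_number"]
        outcome = "updated" if key in self.orders else "created"
        # Upsert on the external key instead of inserting a new row each time.
        self.orders[key] = {**self.orders.get(key, {}), **message["fields"]}
        self.processed_ids.add(message["message_id"])
        return outcome


store = OrderStore()
msg = {"message_id": "m-1", "order_number": "SO-1001", "fields": {"status": "Placed"}}
results = [store.handle(msg), store.handle(msg)]  # the same message delivered twice
print(results, len(store.orders))
```

Even without the processed-ID check, the upsert alone would keep the second delivery from creating a second order; the log additionally protects side effects that are not simple field writes.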

2. Retry Strategy with Backoff

A failed callout shouldn't be retried instantly and repeatedly — that's how a brief, recoverable outage on the other side turns into a self-inflicted denial-of-service against an already-struggling system. Exponential backoff (retry after 1s, then 2s, 4s, 8s, with a sensible cap) combined with a maximum retry count gives the external system room to recover.
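
The schedule described above is simple arithmetic. A small sketch, assuming a hypothetical helper name and a 60-second cap chosen purely for illustration:

```python
from typing import List


def backoff_delays(max_retries: int, base_seconds: float = 1.0,
                   cap_seconds: float = 60.0) -> List[float]:
    """Delay before each retry: base * 2^attempt, never above the cap."""
    return [min(base_seconds * (2 ** attempt), cap_seconds)
            for attempt in range(max_retries)]


delays = backoff_delays(max_retries=8)
print(delays)  # 1, 2, 4, 8, 16, 32, then capped at 60 for the last two
```

The list length is the maximum retry count, so exhausting it is the signal to stop retrying and hand the message to error handling.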

In Apex, this typically means chaining Queueable jobs with a stored retry count and a calculated delay, rather than looping synchronously inside one transaction. For truly critical paths, a circuit-breaker pattern is worth knowing by name: after N consecutive failures, stop calling the failing system entirely for a cooldown period instead of continuing to hammer it, and surface that state so monitoring can alert a human.
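
A circuit breaker is mostly a small state machine. The sketch below is language-neutral logic written in Python rather than Apex, with hypothetical names and thresholds; time is passed in explicitly so the behavior is easy to follow and test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3        # N consecutive failures before opening
    cooldown_seconds: float = 300.0   # how long to stop calling
    consecutive_failures: int = 0
    opened_at: Optional[float] = None

    def allow_call(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # circuit closed: call normally
        # Circuit open: block until the cooldown has elapsed, then allow a trial call.
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = now  # open (or re-open) the circuit


breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=300)
for t in (0, 1, 2):
    breaker.record_failure(now=t)
print(breaker.allow_call(now=10))   # still cooling down
print(breaker.allow_call(now=302))  # cooldown elapsed, one trial call allowed
```

In a real org the `opened_at` state would need to live somewhere durable and visible to monitoring, since surfacing the open circuit is half the point.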

3. Compensating Actions for Multi-Step Failures

Some integrations aren't single calls — they're a sequence (charge the payment, then create the shipment, then update loyalty points), and if step three fails after steps one and two succeeded, you can't just "retry the whole thing" without risking a double charge. This is where the saga pattern becomes relevant: design an explicit compensating action for each step (refund the payment, cancel the shipment) so a partial failure can be unwound cleanly instead of leaving the business in an inconsistent state that someone discovers manually three days later.
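
The unwinding logic can be sketched as a list of (name, action, compensation) steps. This is an illustrative outline with hypothetical step names; the lambdas stand in for real calls to a payment gateway, a shipping system, and a loyalty service.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    journal: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            journal.append(f"done:{name}")
            completed.append(step)
        except Exception as exc:
            journal.append(f"failed:{name}:{exc}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                journal.append(f"compensated:{done_name}")
            break
    return journal


def fail_loyalty() -> None:
    raise RuntimeError("loyalty service unavailable")


journal = run_saga([
    ("charge_payment", lambda: None, lambda: None),   # compensation: refund
    ("create_shipment", lambda: None, lambda: None),  # compensation: cancel
    ("update_loyalty", fail_loyalty, lambda: None),
])
print(journal)
```

The sketch assumes compensations themselves succeed. In practice a failed compensation is exactly the kind of event that needs to be logged and escalated to a human.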

4. Dead-Letter Handling

When something fails permanently — malformed data, a contract mismatch, a referenced record that no longer exists — where does it go? In Salesforce-side designs, this is often a custom "Integration Error Log" object (or a Big Object if volume is high) capturing the payload, the error, the timestamp, and a status field for manual reprocessing. Silently dropping a failed message, or worse, swallowing the exception with an empty catch block, is a production incident waiting to surface weeks later as "why is this customer's order missing," with no audit trail to explain why.
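
The shape of such an error log is easy to sketch. The example below is an assumption-laden illustration: `ErrorLogEntry` mirrors the fields named above, the status value "New" is hypothetical, and the list stands in for the custom object or Big Object.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class ErrorLogEntry:
    payload: str
    error: str
    logged_at: str
    status: str = "New"  # hypothetical lifecycle: New, then Reprocessed or Abandoned


def process_with_dead_letter(message: dict, handler: Callable[[dict], None],
                             error_log: List[ErrorLogEntry]) -> bool:
    try:
        handler(message)
        return True
    except Exception as exc:
        # Capture everything needed to diagnose and replay the message later.
        error_log.append(ErrorLogEntry(
            payload=json.dumps(message, sort_keys=True),
            error=f"{type(exc).__name__}: {exc}",
            logged_at=datetime.now(timezone.utc).isoformat(),
        ))
        return False


def require_account(message: dict) -> None:
    if "account_id" not in message:
        raise ValueError("missing account_id")


error_log: List[ErrorLogEntry] = []
ok = process_with_dead_letter({"order": "A-1"}, require_account, error_log)
print(ok, error_log[0].error)
```

Storing the full payload is what makes manual reprocessing possible; an entry with only an error message tells you something broke but not what to replay.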

5. Monitoring and Alerting

Someone needs to know an integration is failing before a business user notices the symptom. This is the most neglected piece of the design, and the one separating integrations that survive in production from ones that quietly degrade for months. Practically, this means error counts and failure rates visible on a dashboard, plus a Platform Event or scheduled Apex check that pages someone (via Slack, email, or an incident tool) when the failure rate crosses a threshold.

For Shield or Event Monitoring customers, this means using that tooling to catch anomalies in API usage patterns that might indicate an upstream problem before it cascades. The architectural point to make explicitly in an interview: error handling isn't a try-catch block, it's a feedback loop that has to reach a human in time to act.
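
The threshold check itself is a one-liner; the design work is in choosing the window and the threshold. A minimal sketch, assuming a hypothetical function name and a 5% default that is purely illustrative:

```python
def should_alert(failures: int, total: int, threshold: float = 0.05) -> bool:
    """True when the failure rate over a monitoring window reaches the threshold."""
    if total == 0:
        return False  # no traffic in the window, so there is no rate to judge
    return failures / total >= threshold


print(should_alert(failures=1, total=100))   # 1% failure rate, below threshold
print(should_alert(failures=10, total=100))  # 10% failure rate, page someone
```

A window with zero calls returns False here, which is itself an assumption: for an integration that should always be busy, silence may deserve its own alert.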

Security: The Part That Can't Be an Afterthought

Modern Salesforce integration security has matured significantly with the introduction of External Credentials alongside the older Named Credentials model. The split matters architecturally: a Named Credential now primarily defines the endpoint and callout options (URL, custom headers), while an External Credential holds the authentication configuration separately.

That separation means one External Credential can be reused across multiple Named Credentials and integrations, and rotating a secret or changing an auth parameter doesn't require touching every callout that depends on it — a meaningful operational win over the old model where credentials and endpoint were bundled together.

Choosing the right OAuth flow is one of the questions an interviewer will probe hardest, because it reveals whether you understand who is actually performing the action:

• JWT Bearer Flow: Fits true server-to-server integration with no user in the loop — a nightly batch job, a backend service calling Salesforce on its own behalf. No interactive login, no refresh token to manage, just a signed assertion validated against a pre-registered certificate.
• Client Credentials Flow: The modern, simpler choice for service-to-service integration when you don't need to model a specific running user's permissions — Salesforce treats the call as coming from a designated integration user.
• Web Server (Authorization Code) Flow: Belongs wherever a human is actually authorizing the connection and actions should be attributed to that person — a customer logging into a portal that then acts on their behalf in Salesforce, for instance.
• Username-Password Flow (Avoid): Still technically exists but should generally be avoided in new designs — it requires storing a user's actual credentials, has no natural credential rotation story, and is exactly the kind of legacy pattern that shows up in a security audit with a red flag next to it.
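
To make the "signed assertion" in the JWT Bearer flow concrete, the sketch below builds the unsigned part of the token: an RS256 header plus the issuer, subject, audience, and expiry claims. All values are placeholders, and the function name is hypothetical. Signing with the private key is deliberately left out because it needs a cryptography library outside the standard library; the signed token would then be posted to the token endpoint with the grant type `urn:ietf:params:oauth:grant-type:jwt-bearer`.

```python
import base64
import json


def b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_signing_input(consumer_key: str, username: str, audience: str,
                      now: int, lifetime_seconds: int = 180) -> str:
    """Return '<header>.<claims>', the string that would be signed with RS256."""
    header = {"alg": "RS256"}
    claims = {
        "iss": consumer_key,              # the connected app's consumer key
        "sub": username,                  # the pre-authorized integration user
        "aud": audience,                  # the login URL of the target org
        "exp": now + lifetime_seconds,    # short-lived by design
    }
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())


signing_input = jwt_signing_input(
    consumer_key="EXAMPLE_CONSUMER_KEY",       # placeholder
    username="integration@example.com",        # placeholder
    audience="https://login.salesforce.com",
    now=1_700_000_000,
)
encoded_claims = signing_input.split(".")[1]
decoded = json.loads(base64.urlsafe_b64decode(encoded_claims + "=" * (-len(encoded_claims) % 4)))
print(decoded["sub"], decoded["exp"])
```

Note that the `sub` claim still names a user: "no user in the loop" means no interactive login, not that the call runs without an identity.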

Beyond the OAuth flow, a few more pieces round out a genuinely defensible design:

  • Connected App Scoping: Should follow least privilege — an integration that only needs to read Account data shouldn't be issued a scope that also allows modifying user permissions, because the blast radius of a leaked token should be as small as the integration's actual job.
  • Rotation Management: Certificate and secret rotation needs an owner and a calendar reminder, not just a "we'll deal with it when it breaks" assumption — a certificate expiring silently eighteen months after go-live is one of the most common real-world integration outages.
  • Network Layer Security: IP allowlisting / Trusted IP Ranges add a second layer so a leaked token alone isn't sufficient to call in from anywhere. Encryption in transit (TLS, non-negotiable) and encryption at rest (Shield Platform Encryption) are separate decisions — don't assume one implies the other.

🛡️ Senior Architectural Signal

Is the integration over-scoped on data? If an external system's API returns the full customer record when the integration only needs three fields, that's not just inefficient — it's unnecessary exposure of PII to a system or log file that didn't need to see it. Naming that as a deliberate scoping decision signals real security maturity rather than checkbox compliance.
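
When the external API can't be changed to return less, one mitigation is to trim the response before it is stored or logged. A minimal sketch, assuming a hypothetical helper and made-up field names:

```python
from typing import Dict, Iterable


def scope_fields(record: Dict[str, object], allowed: Iterable[str]) -> Dict[str, object]:
    """Keep only the fields the integration actually needs."""
    allowed_set = set(allowed)
    return {key: value for key, value in record.items() if key in allowed_set}


full_record = {
    "Name": "Ada Example",
    "Email": "ada@example.com",
    "Phone": "555-0100",
    "NationalId": "000-00-0000",  # PII the integration has no use for
    "Tier": "Gold",
}
scoped = scope_fields(full_record, allowed=("Name", "Email", "Tier"))
print(scoped)
```

An allowlist is the safer default than a blocklist: a new sensitive field added upstream is excluded automatically instead of leaking until someone notices.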

Pulling this together: security in integration architecture is rarely about picking the "most secure" option in the abstract. It's about matching the authentication model to who or what is actually performing the action, scoping access to exactly what's needed, and building rotation and monitoring into the design from day one instead of treating them as an operational afterthought discovered during an audit.

Governor Limits Aren't a Constraint — They're a Design Input

Treating Salesforce's limits as an obstacle to work around is a junior framing. An architect treats them as a known constraint that shapes the design from the start — the same way a civil engineer treats the load-bearing capacity of steel as an input, not an annoyance. A few specific limits drive most integration design decisions in practice:

Per-Transaction Ceilings & Bulkification: A single transaction allows up to 100 callouts, with a cumulative callout timeout ceiling across that transaction — which means looping through 500 records and firing one callout each, inside a trigger, isn't a "we'll optimize it later" problem, it's a design that cannot work at scale. The fix is bulkification at the integration layer: batch multiple records into a single payload and make one callout instead of many.
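
The arithmetic behind bulkification is worth seeing once. The sketch below chunks 500 records into payloads; the chunk size of 200 is an arbitrary illustration, since the right size depends on what the external API accepts per request.

```python
from typing import Iterator, List, Sequence


def chunk(records: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive payloads of at most `size` records."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


records = [{"id": i} for i in range(500)]
payloads = list(chunk(records, size=200))
print(len(payloads), [len(p) for p in payloads])  # 3 callouts instead of 500
```

Three callouts sit comfortably under the 100-callout ceiling, where one callout per record would have exceeded it five times over.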

Execution Context Boundaries: Synchronous Apex also can't make a callout after DML in the same transaction while that work is uncommitted (the platform raises an "uncommitted work pending" error), so the ordering of callouts and DML has to be designed deliberately, and long-running external calls simply don't belong inside a trigger context at all. If latency is unpredictable, the call needs to happen asynchronously, via Queueable Apex, a Platform Event publish that's handled by an async consumer, or a middleware-mediated flow.

"I keep synchronous callouts out of trigger context by design, not as an afterthought, because a trigger is the worst place to absorb unpredictable external latency."

Volume & Allocation Caps: Daily API request limits mean a high-frequency polling integration can quietly consume an org's daily allocation before anyone notices — which is itself an argument for event-driven design over polling. Platform Events have daily publish allocations that differ between standard and high-volume types, and assuming "events are free" can hit a wall in high-throughput scenarios. Furthermore, Async Apex (Queueable, Batch, Future) has its own ceiling on jobs per 24-hour period. A naive retry-forever loop can eat into that allocation and start blocking legitimate async processing elsewhere in the org.
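
How quickly polling eats an allocation is easy to estimate. In the sketch below the daily allocation of 100,000 requests is a made-up figure for illustration only; the real number depends on the org's edition and licenses and should be read from the org itself.

```python
def polling_requests_per_day(interval_seconds: int, objects_polled: int = 1) -> int:
    """API requests per day for a poller that issues one request per object per interval."""
    seconds_per_day = 24 * 60 * 60
    return (seconds_per_day // interval_seconds) * objects_polled


def share_of_allocation(requests: int, daily_allocation: int) -> float:
    return requests / daily_allocation


requests = polling_requests_per_day(interval_seconds=10, objects_polled=5)
share = share_of_allocation(requests, daily_allocation=100_000)  # hypothetical allocation
print(requests, f"{share:.1%}")
```

Most of those requests return "nothing changed," which is the practical argument for letting the source system publish events instead.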

The architectural response to all of this isn't just "know the numbers" — it's building monitoring that catches approaching limits before they're hit, using tools like the Limits Apex class defensively inside code, Salesforce Optimizer for org-level visibility, and Event Monitoring for usage trends over time. When an org genuinely outgrows single-org limits at scale, that's the point where a multi-org strategy or a heavier reliance on middleware to absorb volume outside Salesforce entirely becomes a legitimate architectural conversation. The mark of a senior answer here is treating limits as a sizing input from the first conversation with the business.

Bringing It Together: A Worked Example

Scenario: A retailer wants Salesforce Order records to reflect inventory levels from a warehouse management system (WMS) "as close to real-time as possible."

A strong architectural answer doesn't open with a tool name. It starts with questions about constraints: What's the actual latency tolerance? What's the event volume?

• If volume is high and the WMS can publish events, route changes via message queues into Platform Events or the Pub/Sub API to decouple the apps.
• If the WMS cannot publish, fall back to a polling pattern (acknowledging the latency trade-off).
• If the volume is massive and the data only needs to be viewed passively on an Order page, look into Data Virtualization via Salesforce Connect to eliminate data replication costs entirely.
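
The three branches above can be written down as a small decision function. This is a sketch of the reasoning, not a rule engine: the function name, the boolean inputs, and the order in which the checks run are assumptions, and the case the bullets don't cover (low volume with a WMS that can publish) is routed to the event-driven option here as a judgment call.

```python
def choose_pattern(wms_can_publish: bool, high_volume: bool, view_only: bool) -> str:
    """Map the worked example's constraints to a candidate integration pattern."""
    if high_volume and view_only:
        # Data is only displayed, so avoid replicating it at all.
        return "data virtualization (Salesforce Connect)"
    if wms_can_publish:
        return "event-driven (Platform Events or Pub/Sub API)"
    # The WMS cannot push changes, so Salesforce has to ask.
    return "polling (accept the latency trade-off)"


print(choose_pattern(wms_can_publish=True, high_volume=True, view_only=False))
print(choose_pattern(wms_can_publish=False, high_volume=False, view_only=False))
print(choose_pattern(wms_can_publish=False, high_volume=True, view_only=True))
```

Writing the decision as code exposes the inputs that still have to come from the business: nobody can call this function until latency tolerance and volume have been stated in concrete terms.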

The Architect's Checklist

Evaluate your next design against these essential architectural checkpoints:

[] What's the actual latency requirement, stated in concrete terms — not "real-time" as a buzzword?
[] What's the volume profile, and does it change the calculus (e.g., Bulk API instead of REST)?
[] Does this need to move data, or can it be virtualized instead?
[] What happens when the other system is down — and does the business understand that trade-off?
[] Is this idempotent? What happens on duplicate delivery?
[] Who gets alerted when this fails, and how long until they notice?
[] Does the authentication model match who's actually performing the action?
[] Is this synchronous logic accidentally living inside a transaction it doesn't belong in?

Integration architecture isn't a checklist to memorize — it's a habit of asking these questions before reaching for a tool. The architecture is the reasoning that connects a business constraint to a technical decision, with the trade-offs spoken out loud rather than left implicit.
