This chapter covers custom data connectors in Microsoft Sentinel, which are essential for ingesting security data from sources that lack pre-built connectors. On the SC-200 exam, custom data connectors appear in roughly 10-15% of questions related to data ingestion, often testing your ability to choose the right method (e.g., Logic Apps, CEF, Syslog) and configure them correctly. Understanding custom connectors is critical because many real-world environments use niche or legacy systems that require tailored ingestion. This chapter will explain the mechanisms, configuration steps, and common pitfalls to help you pass the exam and succeed as a security operations analyst.
Imagine a large office where all incoming mail is sorted by a central mailroom. The mailroom has standard slots for common departments like HR, IT, and Finance. But what if a new department, say 'Data Analytics', needs its own slot? You can't just add a new slot to the mailroom's sorting machine, because it only supports predefined slots. Instead, you create a custom mailbox for Data Analytics in the office's internal mail system. You then set up a dedicated courier who picks up mail from a specific external source (like a partner company's dropbox) and delivers it to that custom mailbox. The mailroom doesn't need to know about this new mailbox; it's outside the standard sorting process. However, you must ensure the courier follows the same format for envelopes (e.g., standard envelope size, correct addressing) so that the Data Analytics team can read the mail.
Similarly, in Microsoft Sentinel, custom data connectors allow you to ingest logs from sources that don't have a built-in connector. You create a custom connector using Logic Apps or other means, define the data format (e.g., JSON, CEF), and set up the ingestion pipeline. The connector acts like the courier, pulling data from an external source and delivering it to a custom table in a Log Analytics workspace. Sentinel's standard analytics and detection rules can then process this data, just as the Data Analytics team can read the mail once it's in their custom mailbox.
What Are Custom Data Connectors and Why Do They Exist?
Microsoft Sentinel provides over 100 built-in data connectors for popular security solutions like Azure Active Directory, Microsoft 365 Defender, and AWS CloudTrail. However, many organizations use proprietary or less common security tools—such as custom-built applications, legacy IDS/IPS systems, or IoT device logs—that lack a native connector. Custom data connectors fill this gap by allowing you to ingest any log source into Sentinel.
Custom connectors are not a single product but a set of methods to bring data into a Log Analytics workspace. The primary methods include:
- Azure Logic Apps: For REST API sources.
- Logstash: For any source Logstash supports.
- Azure Functions: For serverless ingestion.
- Syslog/CEF: For Linux-based sources.
- Direct API ingestion: Using the Log Analytics Ingestion API.
Each method has specific use cases, latency characteristics, and configuration steps. The SC-200 exam focuses on when to use each and how to set them up.
How Custom Data Connectors Work Internally
Most custom connector methods ultimately deliver log data to a Log Analytics workspace via the Log Analytics Ingestion API (the classic version is also known as the HTTP Data Collector API); Syslog and CEF sources instead arrive through an agent. The classic API accepts data in JSON format only and requires a workspace ID and a shared key for authentication. The data is then stored in custom tables (tables with names ending in _CL for custom logs) or, for agent-collected sources, in standard tables like Syslog or CommonSecurityLog.
The workflow is:
1. Data Source: The external system generates logs (e.g., application logs, firewall logs).
2. Connector Agent: The custom connector (e.g., a Logic App, a Logstash plugin) collects the logs. It may pull from an API or receive pushed data.
3. Transformation: The connector may transform the data into a schema that Sentinel expects. For example, converting CSV to JSON, or mapping fields to the CommonSecurityLog schema.
4. Authentication: The connector authenticates to the Log Analytics workspace using the workspace ID and shared key (or Azure AD authentication for newer APIs).
5. Ingestion: The connector sends the data via HTTPS POST to the Ingestion API endpoint: https://<workspace-id>.ods.opinsights.azure.com/api/logs?api-version=2016-04-01.
6. Storage: The data is stored in a custom table or a standard table. Custom tables are created automatically on first ingestion.
7. Detection: Sentinel analytic rules can query the custom table just like any other table.
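Steps 4 and 5 of the workflow can be sketched as a function that assembles one request without sending it. This is a minimal sketch, not a definitive client: the function name and parameters are hypothetical, the `Log-Type`, `x-ms-date`, and `time-generated-field` header names are taken from the HTTP Data Collector API documentation rather than from this chapter, and the `Authorization` value is assumed to be computed elsewhere.

```python
import json

API_VERSION = "2016-04-01"


def build_request(workspace_id, log_type, records, authorization,
                  rfc1123_date, time_field=None):
    """Return (url, headers, body) for one POST to the classic Ingestion API.

    log_type is the table name; authorization is a precomputed
    "SharedKey <workspace-id>:<signature>" string (assumed to exist).
    """
    # The classic API expects a JSON array of records as the body.
    body = json.dumps(records).encode("utf-8")
    url = (f"https://{workspace_id}.ods.opinsights.azure.com"
           f"/api/logs?api-version={API_VERSION}")
    headers = {
        "Content-Type": "application/json",
        "Authorization": authorization,
        "Log-Type": log_type,
        "x-ms-date": rfc1123_date,
    }
    # Optional: name the record field that should populate TimeGenerated.
    if time_field:
        headers["time-generated-field"] = time_field
    return url, headers, body
```

Returning the pieces instead of sending them keeps the sketch independent of any HTTP library; a Logic App, Azure Function, or script would hand the same URL, headers, and body to whatever HTTPS client it uses.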
Key Components, Values, Defaults, and Timers
Workspace ID and Shared Key: These are found in the Log Analytics workspace under Agents management. The shared key is a base64-encoded string used for HMAC-SHA256 signature in the Authorization header.
Log Analytics Ingestion API: The API has a limit of 30 MB per request (compressed) and a rate limit of 500 MB per minute per workspace. Exceeding this results in HTTP 429 (Too Many Requests).
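Custom Table Naming: Custom table names end with _CL (for custom logs). With the classic API you supply the base name in the Log-Type request header and the service appends _CL automatically. The table name can only contain letters, numbers, and underscores.
Time Generated Field: Each log entry should carry a timestamp in ISO 8601 format (e.g., 2025-03-15T12:00:00Z), which the classic API maps to TimeGenerated when you name that field in the time-generated-field request header. If no such field is supplied, the ingestion time is used.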
Logstash Plugin: The microsoft-sentinel-logstash-output-plugin supports batching and retry. Default batch size is 1000 events; default flush interval is 5 seconds.
Logic App Connector: The Azure Log Analytics Data Collector connector in Logic Apps uses the same Ingestion API. It supports up to 100 actions per run and a timeout of 120 seconds per action.
Azure Functions: On the Consumption plan, a function execution times out after 5 minutes by default and can be configured up to a maximum of 10 minutes. The function should be triggered by a timer (e.g., every 5 minutes) or by an event.
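To make the request-size and batch-size values above concrete, here is a minimal sketch of splitting events into batches that stay under both a byte limit and an event-count limit. The function name is hypothetical, the 30 MB and 1000-event defaults are simply the figures quoted in this section, and the size estimate is deliberately conservative.

```python
import json

MAX_POST_BYTES = 30 * 1024 * 1024  # per-request limit quoted above


def batch_by_size(events, max_bytes=MAX_POST_BYTES, max_events=1000):
    """Yield lists of events whose JSON array encoding fits in max_bytes."""
    batch, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for event in events:
        item_len = len(json.dumps(event).encode("utf-8"))
        if item_len + 2 > max_bytes:
            raise ValueError("single event exceeds the request size limit")
        cost = item_len + 2  # the event plus a ", " separator (overestimate)
        # Close the current batch if adding this event would break a limit.
        if batch and (size + cost > max_bytes or len(batch) >= max_events):
            yield batch
            batch, size = [], 2
        batch.append(event)
        size += cost
    if batch:
        yield batch
```

For example, `list(batch_by_size(events, max_events=2))` on five small events yields batches of two, two, and one.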
Configuration and Verification Commands
For Logstash:
1. Install the plugin: bin/logstash-plugin install microsoft-sentinel-logstash-output-plugin
2. Configure the output in your Logstash config file:
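output {
  # Output plugin installed in step 1
  microsoft-sentinel-logstash-output-plugin {
    workspace_id => "your-workspace-id"        # Log Analytics workspace ID
    workspace_key => "your-shared-key"         # workspace shared key
    table_name => "MyCustomLog_CL"             # destination custom table
    time_generated_field => "TimeGenerated"    # event field holding the timestamp
  }
}
Verify by checking the Logstash logs for successful POST requests.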
For Logic Apps:
1. Create a Logic App with a trigger (e.g., HTTP request, recurrence).
2. Add an action: Azure Log Analytics Data Collector.
3. Provide the workspace ID and shared key (use Azure Key Vault for security).
4. Set the JSON payload in the request body. The schema must match the custom table.
5. Test by running the Logic App and querying the custom table in Log Analytics: MyCustomLog_CL | take 10.
For Azure Functions:
1. Create a function with an HTTP trigger or timer trigger.
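2. Call the Ingestion API over HTTPS from the function code. For the newer DCR-based Logs Ingestion API, Microsoft provides the Azure.Monitor.Ingestion NuGet package (C#) and equivalent Python/Node.js client libraries.
3. Send logs using the LogsIngestionClient class from that library, or a signed HTTPS POST when targeting the classic API.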
4. Monitor function logs for errors.
How Custom Connectors Interact with Related Technologies
Custom data connectors often work with:
- Azure Monitor Agent (AMA): For Syslog and CEF, you can use AMA to collect logs from Linux machines and send them to Sentinel. This is not a custom connector per se, but it's an alternative for standard formats.
- Logstash: Commonly used with Elastic stack; the Sentinel output plugin allows seamless integration.
- Azure Event Hubs: For high-volume data ingestion, you can send logs to Event Hubs and then use a Logic App or Azure Function to process and forward to Sentinel.
- Azure Data Lake Storage: For historical data, you can use Azure Data Factory to copy logs to a staging location and then ingest via Logic App.
Important Exam Note: The SC-200 exam does not require you to write code for custom connectors, but you must understand the architecture and configuration steps. You should know that custom connectors are created in the Sentinel portal under Data connectors > Create data connector (which opens a Logic App template) or by using the Log Analytics Ingestion API directly.
Identify the Data Source
Determine the type of log source you need to ingest. Is it a REST API, a syslog server, a flat file, or a database? For the SC-200 exam, common scenarios include custom applications (REST API), legacy network appliances (syslog/CEF), or IoT devices (MQTT). Understanding the source helps choose the right connector method. For example, if the source supports syslog, use the Syslog connector (built-in) or a custom CEF connector. If it's a proprietary API, you'll need a Logic App or Azure Function. Document the data format (JSON, CSV, XML) and the authentication method (API key, OAuth, basic auth). This step is critical because it dictates the entire ingestion pipeline. Misidentifying the source leads to incorrect connector selection, which is a common exam trap.
Choose the Connector Method
Based on the data source, select one of the custom connector methods: Logic Apps (for API sources with moderate volume), Logstash (for high-volume, real-time ingestion from Linux sources), Azure Functions (for serverless, event-driven ingestion), or direct API (for simple, low-volume sources). The exam tests your ability to match the method to the scenario. For instance, if the source is a REST API with rate limits, Logic Apps with retry policy is appropriate. If the source is a syslog stream, use Logstash with the Sentinel output plugin. Each method has trade-offs: Logic Apps are easier to configure but have higher latency (seconds to minutes); Logstash offers low latency (sub-second) but requires managing a server; Azure Functions are cost-effective for sporadic data but have execution time limits.
Configure Authentication
Obtain the Log Analytics workspace ID and shared key from the Azure portal. These are used to authenticate to the Ingestion API. For security, store the shared key in Azure Key Vault and reference it from Logic Apps or Azure Functions. The authentication uses HMAC-SHA256: you build a string to sign from the HTTP method, content length, content type, `x-ms-date` header, and resource path (`/api/logs`), sign it with the base64-decoded shared key, and base64-encode the result. The Authorization header is: `SharedKey <workspace-id>:<signature>`. The exam may test that you cannot use Azure AD authentication with the classic Ingestion API; you must use the shared key. The newer DCR-based Logs Ingestion API authenticates with Azure AD (Microsoft Entra ID) instead, but the details in this chapter apply to the classic API. Misconfiguring the signature is a common error that leads to 403 Forbidden responses.
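A minimal sketch of the SharedKey signature follows. It assumes the string-to-sign layout documented for the HTTP Data Collector API (method, content length, content type, `x-ms-date` header, resource path, separated by newlines); the function name and its parameters are illustrative, not part of any SDK.

```python
import base64
import hashlib
import hmac


def build_authorization(workspace_id, shared_key, rfc1123_date, content_length,
                        method="POST", content_type="application/json",
                        resource="/api/logs"):
    """Return the Authorization header value for the classic Ingestion API."""
    string_to_sign = (f"{method}\n{content_length}\n{content_type}\n"
                      f"x-ms-date:{rfc1123_date}\n{resource}")
    # The shared key is base64 text; decode it to raw bytes before signing.
    key_bytes = base64.b64decode(shared_key)
    digest = hmac.new(key_bytes, string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"SharedKey {workspace_id}:{signature}"
```

Because the content length and date are part of the signed string, the signature must be recomputed for every request; reusing a header across requests with different bodies is one way to end up with the 403 responses described above.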
Define the Custom Table Schema
Decide the table name (it ends with `_CL`) and the schema. The schema is defined by the JSON payload you send. For example, if your log has fields `event_id`, `event_name`, `timestamp`, you send JSON like: `[{"event_id": 123, "event_name": "Login", "timestamp": "2025-03-15T12:00:00Z"}]`. The first ingestion creates the table with the inferred schema. You can also pre-create the table using the Log Analytics UI or API. Event timestamps should be in UTC. With the classic API, you name the field that holds the event time (here, `timestamp`) in the `time-generated-field` request header so that it populates `TimeGenerated`. If you omit it, Sentinel uses the ingestion time, which may affect time-based detections. The exam often tests that custom tables are automatically created; you do not need to manually create them beforehand.
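Source systems rarely emit timestamps in exactly the form shown above, so a connector usually normalizes them before building the payload. The sketch below is one way to do that; the helper names are hypothetical, and it assumes that naive datetimes are already in UTC, which may not hold for your source.

```python
from datetime import datetime, timezone


def to_iso8601_utc(value):
    """Convert a datetime or epoch seconds to an ISO 8601 UTC string."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    elif value.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        dt = value.replace(tzinfo=timezone.utc)
    else:
        dt = value.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def normalize_events(events, time_field="timestamp"):
    """Return copies of the events with time_field rewritten as ISO 8601 UTC."""
    return [{**e, time_field: to_iso8601_utc(e[time_field])} for e in events]
```

For instance, `normalize_events([{"event_id": 123, "event_name": "Login", "timestamp": datetime(2025, 3, 15, 12, 0, 0)}])` produces a record whose `timestamp` is `"2025-03-15T12:00:00Z"`, matching the example payload.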
Implement and Test the Connector
Deploy the connector (e.g., create the Logic App, configure Logstash, or write the Azure Function). Test by sending sample logs and verifying they appear in the custom table. Use KQL queries like `MyCustomLog_CL | take 10` in Log Analytics. Check for errors: in Logic Apps, look at the run history; in Logstash, check the logs for HTTP errors; in Azure Functions, check Application Insights. Common issues include incorrect table name (missing `_CL`), wrong time format, or exceeding API limits. For high-volume sources, implement batching (e.g., send 1000 events per request) and compression (gzip). The exam may ask about monitoring ingestion using the `Heartbeat` table or the `Operation` table. Once data flows, you can create analytic rules and workbooks on the custom table.
Scenario 1: Ingesting Custom Application Logs from a REST API
A financial services company has a proprietary trading application that logs transaction data to a REST API. The logs include fields like trade_id, symbol, quantity, price, and timestamp. There's no built-in Sentinel connector for this application. The security team needs to ingest these logs to detect suspicious trading patterns. They choose Azure Logic Apps because the API is well-documented and the volume is moderate (about 1000 transactions per minute).
Configuration: They create a Logic App with a recurrence trigger set to run every minute. The Logic App calls the REST API, parses the JSON response, and then uses the Azure Log Analytics Data Collector action to send the data to a custom table named TradingLogs_CL. They store the API key in Key Vault and the workspace shared key in Key Vault as well.
Performance Considerations: The Logic App must handle API pagination (e.g., the API returns a next page token). They implement a loop to fetch all pages within the 120-second action timeout. The Log Analytics API limit of 500 MB per minute is not an issue at this volume.
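The pagination loop described here follows a simple pattern, sketched below in Python for clarity (in a Logic App it would typically be an Until loop). The `fetch_page` callable and its return shape are assumptions standing in for the trading API, whose real contract is not given in this scenario.

```python
def fetch_all_pages(fetch_page, max_pages=100):
    """Collect records from a token-paginated API.

    fetch_page(token) is assumed to return (records, next_token), with
    next_token set to None on the last page. max_pages guards against
    an API that never stops returning tokens.
    """
    records, token = [], None
    for _ in range(max_pages):
        page, token = fetch_page(token)
        records.extend(page)
        if token is None:
            return records
    raise RuntimeError("pagination did not finish within max_pages")
```

The page cap plays the same role as the action timeout in the scenario: it turns a runaway loop into a visible failure instead of a silent hang.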
Common Pitfalls: If the API changes its JSON schema, the Logic App may fail to parse. They set up error handling to send alerts to the security operations center. Also, they ensure the TimeGenerated field is in UTC to avoid timezone confusion.
Scenario 2: High-Volume Syslog from Legacy Network Appliances
A large enterprise has hundreds of legacy network firewalls that send syslog data in a proprietary format (not CEF). The volume is high—about 50,000 events per second. They cannot use the built-in Syslog connector because it expects RFC 3164 or RFC 5424 format. They decide to use Logstash running on a Linux server to collect the syslog, parse it, and forward it to Sentinel.
Configuration: They install Logstash with the microsoft-sentinel-logstash-output-plugin. The input is a syslog listener on port 514. They write a custom filter to parse the proprietary format into key-value pairs. The output sends to the custom table FirewallLogs_CL. They tune the batch size to 5000 events and flush interval to 2 seconds to balance latency and API limits.
Scale Considerations: The Logstash server must have sufficient CPU and memory to handle 50,000 EPS. They deploy multiple Logstash instances behind a load balancer. They monitor the Logstash queue depth to ensure it doesn't grow unbounded. The Log Analytics API rate limit of 500 MB per minute may be a bottleneck; they compress the data (gzip) to reduce size.
What Goes Wrong: If the shared key is rotated without updating Logstash, ingestion stops. They automate key rotation using Azure Key Vault and a script that restarts Logstash with the new key. Another issue: if the syslog format changes, the filter fails, and events are lost. They implement a dead-letter queue to capture unparsed events.
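The dead-letter idea can be illustrated outside Logstash with a few lines of Python. This is only a sketch of the pattern: the `key=value` line format is a hypothetical stand-in for the proprietary firewall format, and the function names are invented for the example.

```python
def parse_line(line):
    """Parse 'key=value key=value' into a dict; raise ValueError otherwise."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"unparseable token: {token!r}")
        fields[key] = value
    if not fields:
        raise ValueError("empty line")
    return fields


def parse_with_dead_letter(lines):
    """Return (parsed_events, dead_letter) so that no input line is dropped."""
    parsed, dead_letter = [], []
    for line in lines:
        try:
            parsed.append(parse_line(line))
        except ValueError as exc:
            # Keep the raw line and the reason so it can be replayed later.
            dead_letter.append({"raw": line, "error": str(exc)})
    return parsed, dead_letter
```

Every input line ends up in exactly one of the two outputs, which is the property that prevents the silent event loss described above when the format changes.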
Exactly What SC-200 Tests on Custom Data Connectors
The SC-200 exam objective 2.1 covers data ingestion, including custom connectors. Specifically, you must know:
The four primary methods: Logic Apps, Logstash, Azure Functions, and direct API.
When to use each method based on data source and volume.
The authentication mechanism (workspace ID and shared key) and that it uses HMAC-SHA256.
That custom tables must end with _CL and are created automatically.
The TimeGenerated field: ISO 8601 format, with ingestion time used when no timestamp is supplied.
The Log Analytics Ingestion API endpoint and its limits (30 MB per request, 500 MB per minute).
How to configure a Logic App as a custom connector (trigger, action, authentication).
How to use Logstash with the Sentinel output plugin.
That Azure Functions can be used for serverless ingestion.
The difference between custom connectors and built-in connectors.
Common Wrong Answers and Why Candidates Choose Them
1. Wrong: "Custom connectors require you to create a custom Log Analytics table manually in advance." Why wrong: Tables are created automatically on first ingestion. You only need to specify the table name in the connector configuration. Candidates choose this because they think you must pre-define schemas, but Sentinel infers the schema from the JSON payload.
2. Wrong: "You can use Azure AD authentication instead of workspace key for the classic Ingestion API." Why wrong: The classic API only supports shared key authentication. Azure AD authentication belongs to the newer DCR-based Logs Ingestion API, which is a separate API. Candidates confuse this with other Azure services where Azure AD is standard.
3. Wrong: "The custom table name can be any name without restrictions." Why wrong: Custom table names always end with _CL, and with the classic API the service appends that suffix to the name you supply. Names may contain only letters, numbers, and underscores. Candidates forget this naming convention.
4. Wrong: "Logstash can only send data to Sentinel using the built-in Syslog connector." Why wrong: Logstash uses a dedicated output plugin that sends directly to the Ingestion API. The Syslog connector is for standard syslog format, not for Logstash. Candidates confuse Logstash with syslog.
Specific Numbers and Terms That Appear on the Exam
30 MB: Maximum request size (compressed).
500 MB/minute: Workspace ingestion rate limit.
`_CL`: Custom log table suffix.
`TimeGenerated`: Timestamp field; falls back to ingestion time if not supplied.
`SharedKey`: Authorization scheme.
`ods.opinsights.azure.com`: Endpoint domain.
`api-version=2016-04-01`: API version.
Logstash output plugin: microsoft-sentinel-logstash-output-plugin.
Logic App action: Azure Log Analytics Data Collector.
Edge Cases and Exceptions the Exam Loves to Test
What if the data source is a database? You can use Azure Data Factory to export to blob storage, then use a Logic App to ingest from blob. Or use an Azure Function to query the database and send logs.
What if the log volume exceeds 500 MB/minute? You must compress data, batch requests, or use multiple workspaces. The exam may ask about scaling strategies.
What if the data is in CSV format? Convert it to JSON before ingestion. The classic API only accepts JSON; CSV is not directly supported.
What if the connector fails? Check the Operation table in Log Analytics for ingestion errors. The exam may ask about troubleshooting.
How to Eliminate Wrong Answers Using the Underlying Mechanism
Understand the mechanism: The Ingestion API is a simple HTTPS POST with shared key auth. Any method that can make an HTTPS POST with the correct headers and body can be a custom connector. Therefore, any answer suggesting a method that cannot make HTTPS requests (e.g., file upload via FTP) is wrong. Also, any answer that says you need to install an agent on the data source is likely wrong for custom connectors (unless it's Logstash or AMA). The exam tests your ability to reason about the underlying pipeline.
Custom data connectors use the Log Analytics Ingestion API with workspace ID and shared key authentication.
Custom table names must end with `_CL` (e.g., `MyLogs_CL`) and are created automatically on first ingestion.
The `TimeGenerated` field should be supplied in ISO 8601 UTC format; if missing, ingestion time is used.
The Ingestion API accepts up to 30 MB per request (compressed) and 500 MB per minute per workspace.
Logic Apps are best for REST API sources with moderate volume; Logstash is best for high-volume real-time data.
Azure Functions can be used for serverless ingestion but have execution time limits (5 minutes by default and 10 minutes maximum on the Consumption plan).
You cannot use Azure AD authentication with the classic Ingestion API; use shared key.
Common exam wrong answers: pre-creating tables, using Azure AD, omitting `_CL` suffix.
Monitor ingestion errors via the `Operation` table in Log Analytics.
For high volume, implement batching and compression to avoid API rate limits.
These come up on the exam all the time. Here's how to tell them apart.
Logic Apps
No-code/low-code, easy to configure for REST API sources.
Higher latency (seconds to minutes) due to polling or trigger delays.
Built-in error handling and retry policies.
Limits: 100 actions per run, 120-second timeout per action.
Best for moderate volume (up to thousands of events per minute).
Logstash
Requires managing a Logstash server (Java, configuration).
Sub-second latency, ideal for real-time ingestion.
Customizable filtering and parsing with Logstash plugins.
High throughput: can handle tens of thousands of events per second.
Best for high-volume, real-time syslog or custom log formats.
Mistake
Custom data connectors require writing custom code in C# or Python.
Correct
While you can write code, many custom connectors are built using no-code/low-code tools like Logic Apps, or using Logstash with a pre-built plugin. The SC-200 exam focuses on configuration, not programming.
Mistake
The custom table name can be anything, like 'MyLogs'.
Correct
Custom tables always end with `_CL` (for custom logs). With the classic API the service appends the suffix to the name you supply, so data sent as 'MyLogs' lands in `MyLogs_CL`.
Mistake
You can use Azure AD authentication to call the Log Analytics Ingestion API.
Correct
The classic Ingestion API (api-version=2016-04-01) only supports shared key authentication. Azure AD authentication is used by the newer DCR-based Logs Ingestion API, but the exam points in this chapter refer to the classic API.
Mistake
Logstash sends data to Sentinel via the Syslog connector.
Correct
Logstash uses a dedicated output plugin that sends data directly to the Log Analytics Ingestion API. The Syslog connector is for standard syslog sources, not for Logstash.
Mistake
You must pre-create the custom table in Log Analytics before sending data.
Correct
The table is created automatically on the first ingestion. You only need to specify the table name in the connector configuration.
How do I create a custom data connector in Microsoft Sentinel?
You don't 'create' a custom connector as a single object. Instead, you configure one of the methods: Logic Apps (via Sentinel's 'Create data connector' button, which opens a Logic Apps template), Logstash (install the output plugin and configure your pipeline), Azure Functions (write a function to call the Ingestion API), or direct API calls. The exam expects you to know which method to use based on the scenario.
What is the Ingestion API endpoint?
The endpoint is `https://<workspace-id>.ods.opinsights.azure.com/api/logs?api-version=2016-04-01`. Replace `<workspace-id>` with your Log Analytics workspace ID. The request must include an Authorization header with the shared key signature, and the body must be JSON. The exam may ask you to identify the correct endpoint URL.
Can I ingest CSV data with a custom connector?
The classic Ingestion API only accepts JSON. If your data is in CSV, you must convert it to JSON before sending. You can use a Logic App or Azure Function to parse CSV and create JSON objects. The converted records land in a `_CL` table like any other custom log.
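The CSV-to-JSON step is small enough to show in full. The sketch below assumes the CSV has a header row and uses field names borrowed from the trading scenario earlier in the chapter; the function name is hypothetical.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Turn CSV text with a header row into a list of JSON-ready dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value stays a string; cast numeric columns explicitly if the
    # destination table should store them as numbers.
    return [dict(row) for row in reader]


sample = "trade_id,symbol,quantity\n1,MSFT,100\n2,AAPL,50\n"
payload = json.dumps(csv_to_json_records(sample))
```

Here `payload` is a JSON array with one object per CSV row, which is the body shape the API expects.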
What happens if I exceed the ingestion rate limit?
The API returns HTTP 429 (Too Many Requests). Your connector should implement retry logic with exponential backoff. For Logstash, the output plugin retries automatically. For Logic Apps, you can configure retry policies. To avoid this, batch requests and compress data (gzip) to reduce size.
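Retry with exponential backoff can be sketched as follows. The `send` callable is an assumption standing in for whatever performs the POST and returns the HTTP status code; the attempt count and base delay are illustrative values, not recommendations from Microsoft.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns something other than 429.

    Waits base_delay, then 2x, 4x, ... between attempts. Returns the last
    status code, which is still 429 if every attempt was throttled.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status
```

Passing `sleep` as a parameter makes the delays testable without waiting; production code would leave the default in place and would usually add random jitter so that many connectors do not retry in lockstep.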
Do custom connectors require an agent on the data source?
Not necessarily. Custom connectors can pull data from APIs without an agent. However, if the source is a syslog stream, you may need a syslog collector (like Logstash or rsyslog) that forwards to Sentinel. The exam distinguishes between agent-based (Azure Monitor Agent) and agentless (Logic Apps) methods.
How do I monitor ingestion from a custom connector?
Use the `Operation` table in Log Analytics: `Operation | where OperationCategory == 'Ingestion'`. You can also use Sentinel's Data connector health workbook. For Logic Apps, check run history. For Logstash, monitor its logs. The exam may ask about using the `Heartbeat` table to verify agent connectivity, but for custom connectors, the `Operation` table is key.
What is the difference between standard tables and custom tables?
Standard tables (like `SecurityEvent`, `Syslog`) have predefined schemas and are used by built-in connectors. Custom tables (ending with `_CL`) are created by users for custom log sources. You can query custom tables with KQL just like standard tables, but they do not have predefined schemas; the schema is inferred from the first JSON payload.