Add CEF processor to Ingest node #122491
I realize this draft is still in progress, and you likely already have plans for these items.
Additionally, there is a CEF v1 specification (our …
I asked Lee H about microbenchmarking: Elasticsearch uses JMH (see https://github.com/elastic/elasticsearch/tree/main/benchmarks#elasticsearch-microbenchmark-suite), so this PR could add a benchmark to that suite.
Will this be comparable to the microbenchmarking done for the Beats processor?
modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CefParser.java
I rewrote the parsing implementation to scan the input manually rather than relying on regexes; it's quite a bit faster this way. I also rewrote the date parsing so that it doesn't rely on a … The changes I made pass all the same tests that were already here (I didn't touch the tests themselves). I'm going to benchmark this tomorrow to compare before and after and quantify the performance difference. My guess is that we're now close enough that performance is no longer a concern, but I might be wrong about that. If you wouldn't mind taking a little time during your next workday to review what I've written, I'd appreciate it.
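For readers unfamiliar with the approach, here is a minimal sketch of what scanning the CEF header by hand (instead of matching it with a regex) can look like. It is not the code in CefParser.java; the class and method names (`CefHeaderScanner`, `split`) are hypothetical, and it only handles the two header escapes from the CEF spec, `\|` and `\\`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the actual CefParser implementation.
final class CefHeaderScanner {

    // Header fields after "CEF:": version, vendor, product, device version,
    // signature ID, name, severity.
    static final int HEADER_FIELDS = 7;

    /**
     * Returns the seven header fields (with \| and \\ unescaped) followed by the
     * raw, still-escaped extension string. Any syslog prefix before "CEF:" is ignored.
     */
    static List<String> split(String message) {
        int start = message.indexOf("CEF:");
        if (start < 0) {
            throw new IllegalArgumentException("not a CEF message");
        }
        List<String> parts = new ArrayList<>(HEADER_FIELDS + 1);
        StringBuilder field = new StringBuilder();
        int i = start + "CEF:".length();
        while (i < message.length() && parts.size() < HEADER_FIELDS) {
            char c = message.charAt(i);
            char next = i + 1 < message.length() ? message.charAt(i + 1) : '\0';
            if (c == '\\' && (next == '|' || next == '\\')) {
                field.append(next); // escaped pipe or backslash: keep the literal character
                i += 2;
            } else if (c == '|') {
                parts.add(field.toString()); // an unescaped pipe ends the current field
                field.setLength(0);
                i++;
            } else {
                field.append(c);
                i++;
            }
        }
        if (parts.size() < HEADER_FIELDS) {
            throw new IllegalArgumentException("incomplete CEF header");
        }
        parts.add(message.substring(i)); // everything after the seventh pipe is the extension
        return parts;
    }
}
```

For the pipe example from the spec, index 5 holds the name `detected a | in message` and index 7 holds the untouched extension `src=10.0.0.1 act=blocked a | dst=1.1.1.1`. The scan is a single left-to-right pass with no backtracking, which is the usual reason a hand-written scanner beats a regex on this kind of input.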
Thanks @joegallo for taking the time to look into this. The new manual parsing looks good; I was just unsure about the error with … Overall, the changes look good to me. Please let me know how the performance testing goes, and we can merge this if everything looks good.
To save myself some future web searching (and for the benefit of future GitHub archaeologists), here's a link to the ArcSight SmartConnectors 25.1 CEF Implementation Standard.
Caveats about microbenchmarking aside, here are the results for this PR before my rewrite of the parser (that is, with the regex-based parser): … And here they are for the same workload after my rewrite: … So at this point we're averaging about 12.7 microseconds per invocation. The workload was generated from the CEF messages in the test fixtures, which explains why there are so many failures: some of the fixture files deliberately contain illegal input to demonstrate that we fail on it.
We'll need to add documentation for the new processor, and the processor will need to be added to the spec for the benefit of Kibana and the clients. I'm okay with merging this PR as-is and then iterating on those things in follow-up PRs.
OK. We can iterate on the docs in a new PR. Thanks a lot @joegallo for taking the time to look into this.
Closes #126201
This PR creates a new CEF ingest node processor. The CEF processor converts Common Event Format (CEF) logs into a JSON structure. It also maps relevant CEF fields to their ECS equivalents, so no additional processors are needed in the ingest pipeline.
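To illustrate what mapping CEF fields to ECS involves, the sketch below writes values into nested maps using dotted ECS paths. The mapping table is a hypothetical five-entry subset chosen for illustration (`src` to `source.ip`, and so on), not the processor's actual table; values stay strings, and unmapped keys fall back to `cef.extensions`. `CefEcsMappingSketch`, `putDotted`, and `toDocument` are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the key table below is an illustrative subset, not the processor's mapping.
final class CefEcsMappingSketch {

    // CEF extension key -> ECS field (dotted path). Assumed subset for illustration.
    static final Map<String, String> CEF_TO_ECS = Map.of(
            "src", "source.ip",
            "dst", "destination.ip",
            "spt", "source.port",
            "dpt", "destination.port",
            "act", "event.action");

    /** Stores value in nested maps, creating intermediate objects: "source.ip" -> {source={ip=...}}. */
    @SuppressWarnings("unchecked")
    static void putDotted(Map<String, Object> root, String path, Object value) {
        String[] segments = path.split("\\.");
        Map<String, Object> current = root;
        for (int i = 0; i < segments.length - 1; i++) {
            current = (Map<String, Object>) current.computeIfAbsent(
                    segments[i], k -> new LinkedHashMap<String, Object>());
        }
        current.put(segments[segments.length - 1], value);
    }

    /** Maps known keys to ECS fields and keeps everything else under cef.extensions. */
    static Map<String, Object> toDocument(Map<String, String> extensions) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : extensions.entrySet()) {
            String ecsField = CEF_TO_ECS.get(e.getKey());
            String target = ecsField != null ? ecsField : "cef.extensions." + e.getKey();
            putDotted(doc, target, e.getValue()); // values stay strings in this sketch
        }
        return doc;
    }
}
```

Given the extensions `src=10.0.0.1`, `spt=443`, `act=blocked`, `cs1=customString1` (in that order), `toDocument` returns `{source={ip=10.0.0.1, port=443}, event={action=blocked}, cef={extensions={cs1=customString1}}}`. Keeping the table flat and expanding dotted paths at write time keeps the mapping easy to read; the processor's real output goes further, converting types (ports are numbers in the example output) and storing unmapped extensions under their full names, such as `deviceCustomString1`.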
Encoding rules from the spec
Ensure the following when encoding symbols in CEF:
- Spaces used in the header are valid. Do not encode a space character as <space>.
- If a pipe (|) is used in the header, it has to be escaped with a backslash (\). Note that pipes in the extension do not need escaping. For example:
  Sep 19 08:26:10 host CEF:0|security|threatmanager|1.0|100|detected a \| in message|10|src=10.0.0.1 act=blocked a | dst=1.1.1.1
- If a backslash (\) is used in the header or the extension, it has to be escaped with another backslash (\). For example:
  Sep 19 08:26:10 host CEF:0|security|threatmanager|1.0|100|detected a \\ in packet|10|src=10.0.0.1 act=blocked a \\ dst=1.1.1.1
- If an equal sign (=) is used in the extension, it has to be escaped with a backslash (\). Equal signs in the header need no escaping. For example:
  Sep 19 08:26:10 host CEF:0|security|threatmanager|1.0|100|detected a = in message|10|src=10.0.0.1 act=blocked a \= dst=1.1.1.1
- Multiple lines are only allowed in the value part of the extension. For example:
  Sep 19 08:26:10 host CEF:0|security|threatmanager|1.0|100|Detected a threat. No action needed.|10|src=10.0.0.1 msg=Detected a threat.\n No action needed
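These escaping rules determine how the extension has to be tokenized: only an unescaped `=` starts a new key, values may contain spaces and unescaped pipes, and escapes must be decoded afterwards. The sketch below is a minimal two-pass version under stated assumptions: keys contain no spaces, `=`, or backslashes, and `\n` and `\r` are decoded as line breaks (our reading of the multi-line rule). `CefExtensionSketch`, `parseExtension`, and `unescape` are hypothetical names, not the processor's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of extension parsing; assumes keys contain no spaces, '=' or backslashes.
final class CefExtensionSketch {

    /** Parses "k1=v1 k2=value with spaces" into an ordered map, honoring CEF escapes. */
    static Map<String, String> parseExtension(String ext) {
        // Pass 1: locate every unescaped '=' and the key immediately before it.
        List<int[]> keys = new ArrayList<>(); // each entry: {keyStart, equalsIndex}
        int lastEquals = -1;
        for (int i = 0; i < ext.length(); i++) {
            char c = ext.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character (for example \= or \\)
            } else if (c == '=') {
                int keyStart = i;
                // Walk back to the preceding space, but never past the previous '='.
                while (keyStart > lastEquals + 1 && ext.charAt(keyStart - 1) != ' ') {
                    keyStart--;
                }
                keys.add(new int[] {keyStart, i});
                lastEquals = i;
            }
        }
        // Pass 2: a value runs from just after its '=' up to the start of the next key.
        Map<String, String> result = new LinkedHashMap<>();
        for (int k = 0; k < keys.size(); k++) {
            int keyStart = keys.get(k)[0];
            int equals = keys.get(k)[1];
            int valueEnd = k + 1 < keys.size() ? keys.get(k + 1)[0] : ext.length();
            String rawValue = ext.substring(equals + 1, valueEnd).trim();
            result.put(ext.substring(keyStart, equals), unescape(rawValue));
        }
        return result;
    }

    /** Decodes extension escapes: \= to '=', \\ to '\', \n to newline, \r to carriage return. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                if (next == 'n') {
                    out.append('\n');
                } else if (next == 'r') {
                    out.append('\r');
                } else {
                    out.append(next); // \= and \\ (other escapes also collapse to the character)
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Applied to the equal-sign example's extension, `src=10.0.0.1 act=blocked a \= dst=1.1.1.1`, this yields `src` = `10.0.0.1`, `act` = `blocked a =`, and `dst` = `1.1.1.1`. Finding keys first and slicing values second is what lets values contain spaces without any quoting.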
Example
An example of CEF parsing looks like this:
CEF log
Parsed CEF content
{ "process": { "name": "procName", "pid": 5678 }, "agent": { "ip": "192.168.0.1", "name": "example.com", "id": "agentId", "type": "agentType", "version": "1.0", "mac": "00:0a:95:9d:68:16" }, "cef": { "severity": 10, "extensions": { "agentTranslatedZoneExternalID": "ext123", "flexDate1": "2021-06-01T11:43:20Z", "deviceCustomString3Label": "cs3Label", "oldFileSize": 2048, "sourceZoneURI": "sourceZoneUri", "deviceCustomIPv6Address4Label": "c6a4Label", "destinationTranslatedZoneURI": "destUri", "agentZoneURI": "zoneUri", "oldFileName": "oldFile", "deviceCustomDate2Label": "customDate2Label", "deviceNtDomain": "example.org", "deviceCustomFloatingPoint4Label": "cfp4Label", "sourceTranslatedZoneURI": "sourceUri", "deviceCustomIPv6Address1": "2001:db8::1", "deviceCustomDate1Label": "customDate1Label", "deviceCustomIPv6Address4": "2001:db8::4", "requestCookies": "cookies", "deviceCustomIPv6Address3": "2001:db8::3", "oldFilePermission": "rw-r--r--", "deviceCustomIPv6Address2": "2001:db8::2", "deviceCustomString2Label": "cs2Label", "deviceCustomFloatingPoint2Label": "cfp2Label", "deviceCustomDate2": "2021-06-01T11:45Z", "agentTranslatedZoneURI": "uri", "deviceCustomDate1": "2021-06-01T11:43:20Z", "deviceCustomIPv6Address2Label": "c6a2Label", "oldFileModificationTime": "2021-06-01T11:45Z", "deviceCustomFloatingPoint1": 1.23, "oldFileHash": "oldHash", "deviceCustomFloatingPoint2": 2.34, "deviceCustomFloatingPoint3": 3.45, "flexString1": "flexString1", "deviceCustomFloatingPoint4": 4.56, "oldFileId": "oldId", "deviceCustomNumber1": 123, "agentTranslatedAddress": "10.0.0.1", "deviceCustomNumber3": 345, "deviceCustomNumber2": 234, "flexString2": "flexString2", "baseEventCount": 1234, "deviceCustomIPv6Address1Label": "c6a1Label", "deviceTranslatedZoneExternalID": "transExtId", "deviceZoneExternalID": "zoneExtId", "agentTimeZone": "UTC", "deviceCustomString6Label": "cs6Label", "deviceCustomNumber2Label": "cn2Label", "deviceCustomString5Label": "cs5Label", 
"deviceCustomFloatingPoint1Label": "cfp1Label", "sourceZoneExternalID": "sourceZoneExtId", "deviceTranslatedZoneURI": "transUri", "destinationTranslatedZoneExternalID": "destExtId", "flexString1Label": "flexString1Label", "deviceCustomNumber1Label": "cn1Label", "categoryDeviceType": "catDeviceType", "deviceZoneURI": "zoneUri", "flexString2Label": "flexString2Label", "deviceCustomNumber3Label": "cn3Label", "deviceCustomString1": "customString1", "externalId": "extId", "oldFilePath": "/old/path", "deviceCustomString3": "customString3", "deviceCustomString2": "customString2", "deviceCustomString1Label": "cs1Label", "deviceCustomString5": "customString5", "deviceCustomString4": "customString4", "agentZoneExternalID": "zoneExtId", "oldFileCreateTime": "2021-06-01T11:43:20Z", "deviceCustomString6": "customString6", "deviceCustomIPv6Address3Label": "c6a3Label", "deviceEventCategory": "category", "deviceCustomString4Label": "cs4Label", "deviceCustomFloatingPoint3Label": "cfp3Label", "destinationZoneExternalID": "destZoneExtId", "flexDate1Label": "flexDate1Label", "sourceTranslatedZoneExternalID": "sourceExtId", "agentNtDomain": "example.org", "oldFileType": "oldType", "destinationZoneURI": "destZoneUri" }, "device.version": "1.0", "name": "trojan successfully stopped", "device.vendor": "security", "device.product": "threatmanager", "device.event_class_id": 100, "version": 0 }, "log": { "syslog": { "facility": { "code": 16 } } }, "destination": { "nat": { "port": 8080, "ip": "10.0.0.2" }, "geo": { "location": { "lon": -122.4194, "lat": 37.7749 } }, "registered_domain": "destNtDomain", "process": { "name": "destProc", "pid": 1234 }, "port": 80, "bytes": 91011, "service": { "name": "destService" }, "domain": "destHost", "ip": "192.168.0.2", "user": { "name": "destUser", "id": "destUserId", "group": { "name": "admin" } }, "mac": "00:0a:95:9d:68:16" }, "source": { "geo": { "location": { "lon": -122.4194, "lat": 37.7749 } }, "nat": { "port": 8081, "ip": "10.0.0.4" }, 
"registered_domain": "sourceNtDomain", "process": { "name": "sourceProc", "pid": 1234 }, "port": 443, "service": { "name": "sourceService" }, "bytes": 5678, "ip": "192.168.0.4", "domain": "sourceDomain", "user": { "name": "sourceUser", "id": "sourceUserId", "group": { "name": "sourcePriv" } }, "mac": "00:0a:95:9d:68:16" }, "message": "message", "url": { "original": "url" }, "network": { "protocol": "HTTP", "transport": "TCP", "direction": "inbound" }, "observer": { "ingress": { "interface": { "name": "eth0" } }, "registered_domain": "example.com", "product": "threatmanager", "hostname": "host1", "vendor": "security", "ip": "192.168.0.3", "name": "extId", "version": "1.0", "mac": "00:0a:95:9d:68:16", "egress": { "interface": { "name": "eth1" } } }, "file": { "inode": 5678, "path": "/path/to/file", "size": 1024, "created": "2021-06-01T11:43:20Z", "name": "file.txt", "mtime": "2021-06-01T11:45Z", "type": "txt", "hash": "abcd1234", "group": "rw-r--r--" }, "@timestamp": "2021-06-01T11:43:20Z", "organization": { "name": "custUri", "id": "custExtId" }, "host": { "nat": { "ip": "10.0.0.3" } }, "http": { "request": { "referrer": "referrer", "method": "GET" } }, "event": { "reason": "reason", "ingested": "2021-06-01T11:43:20Z", "original": "rawEvent", "code": 100, "kind": 1, "created": "2021-06-01T11:43:20Z", "timezone": "UTC", "start": "2021-06-01T11:43:20Z", "action": "blocked", "end": "2021-06-01T11:45Z", "id": "evt123", "outcome": "success" }, "user_agent": { "original": "Mozilla" } }gradle check?