[Djigzo users] dlp regexp behaves differently on pre-made

CipherMail support support at ciphermail.com
Thu Dec 3 12:50:30 CET 2015


On 12/03/2015 12:34 PM, CipherMail support wrote:
> On 12/03/2015 12:24 PM, Raymond Bakker wrote:
>>>> Hello,
>>>>
>>>> ==Summary== 
>>>> We are seeing different DLP behavior for complex regular expressions on two installations.
>>>>
>>>>
>>>> ==System==
>>>> Version:  ciphermail-virtual-appliance-2.10.0-3.
>>>>   1. Ubuntu pre-made virtual appliance (on my laptop)
>>>>   2. Red Hat & CentOS gateway package (on a test server)
>>>>
>>>>
>>>> ==Configuration==
>>>> DLP: several triggers with "Must Encrypt"
>>>> Settings: Encrypt Mode "No Encryption"
>>>> Settings: DLP Patterns added
>>>>
>>>>
>>>> ==Example==
>>>> We want to search a message for [any text][four numbers][any text]
>>>> So we try this RegEx: *.\d{4}.*
>>>>
>>>> This works perfectly on the Ubuntu VA, but it encrypts EVERY message on CentOS.
>>>> Everything is back to normal when we disable the complex RegEx on CentOS.
>>>>
>>>> We also tried something a little simpler: [0-9][0-9][0-9][0-9]
>>>> The Ubuntu version is fine; the CentOS version encrypts every message.
>>>>
>>>>
>>>> ==DLP Trigger Comparison ==
>>>> Ubuntu version:
>>>>   - Single words work as expected
>>>>   - Mail header works as expected
>>>>   - Complex *.\d{4}.* works as expected
>>>>
>>>> CentOS version:
>>>>   - Single words work as expected
>>>>   - Mail header works as expected
>>>>   - Complex *.\d{4}.* behaves DIFFERENTLY
>>>>
>>>>
>>>> Does anyone have experience with this situation?
>>>>
>>>> Is our installation perhaps incorrect?
>>>
>>> It's quite likely that a message contains 4 digits. Could it be that the
>>> mail sent via the CentOS gateway is sent with some other mail app than
>>> the mail sent via the virtual appliance?
>>>
>>>>> We will look at this tomorrow, but I'm quite sure it is a default
>>>>> installation as described in the CipherMail guide.
>>>
>>> The DLP text extractor also extracts header values. So for example a
>>> date header will also be extracted. Since almost all mails contain a
>>> date header, almost any mail will contain 4 digits.
>>>
> 
>> That's true. The original pattern uses 8 digits (to simulate a Dutch
>> personal ID), but I get the point. What I don't understand (yet) is why
>> it works on the Ubuntu version when my testing method and messages are
>> the same on Ubuntu and CentOS.
> 
> Are you sure that the messages sent via the Ubuntu version are exactly
> the same as the message sent via the CentOS version? If for example the
> message sent via the Ubuntu system is sent by Zimbra but the message
> sent via the CentOS version is sent via Exchange then it's kind of
> comparing apples and oranges. It might be that one mail client (server?)
> adds certain headers with 8 digits and the other mail client (server?) does not.

To reduce the chance of false positives, it might help to require exactly
8 digits for a match. With your original reg exp, any digit sequence of 8
or more digits would trigger.
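To see why the unbounded pattern fires so often, a few lines of sample text can be run through it. This is a minimal sketch, assuming the original pattern had the form `.*\d{8}.*` (the exact pattern is not shown in this thread) and using Python's `re` module as a stand-in for the gateway's regex engine; the sample header values are made up. It also runs the 4-digit search from the first message against an ordinary Date header, as discussed earlier in the thread.

```python
import re

# Assumption: the original 8-digit pattern had the form [any text][8 digits][any text].
UNBOUNDED_EIGHT = re.compile(r".*\d{8}.*")
# The 4-digit search from the first message; the .* padding does not change
# whether re.search finds a match, so it is left out here.
FOUR_DIGITS = re.compile(r"\d{4}")


def triggers(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Hypothetical sample lines, not taken from the messages in this thread.
samples = [
    "ID: 12345678",                                        # exactly 8 digits
    "Reference: 123456789",                                # 9 digits
    "Message-ID: <20151203115030.4711@mail.example.com>",  # 14-digit timestamp
    "Date: Thu, 3 Dec 2015 12:50:30 +0100",                # longest digit run is 4
]

for text in samples:
    print(f"8+ digits: {triggers(UNBOUNDED_EIGHT, text)!s:5}  "
          f"4 digits: {triggers(FOUR_DIGITS, text)!s:5}  {text}")
```

Every sample line triggers the 4-digit search, and only the Date line escapes the 8-or-more-digit version, even though just the first line is meant to look like an ID.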

The following reg exp only triggers on digit sequences of exactly 8 digits:

\b\d{8}\b

Note: \b matches a word boundary (the position between a word character and a non-word character).
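The effect of the word boundaries can be checked the same way. This is again a minimal sketch with Python's `re` module standing in for the gateway's regex engine and made-up sample lines; `find_ids` is a hypothetical helper name. Because \b only matches between a word character (letter, digit or underscore) and a non-word character, longer digit runs and digits glued to letters are skipped.

```python
import re

# Exactly 8 digits with a word boundary on each side.
EXACT_EIGHT = re.compile(r"\b\d{8}\b")


def find_ids(text: str) -> list[str]:
    """Return every standalone 8-digit sequence in the text (hypothetical helper)."""
    return EXACT_EIGHT.findall(text)


# Hypothetical sample lines.
print(find_ids("ID: 12345678."))                                        # ['12345678']
print(find_ids("Reference: 123456789"))                                 # [] (9 digits)
print(find_ids("Message-ID: <20151203115030.4711@mail.example.com>"))  # [] (14 digits)
print(find_ids("Customer ID12345678"))                                  # [] (no boundary between "D" and "1")
```

This reduces false positives but does not remove them: any standalone 8-digit number, for example a date written as 20151203 in a header, still matches.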

Kind regards,

CipherMail support

-- 
CipherMail email encryption

Email encryption with support for S/MIME, OpenPGP, PDF encryption and
secure webmail pull.

https://www.ciphermail.com

Twitter: http://twitter.com/CipherMail


