Exact Data Match (New experience)

The new Exact Data Match (EDM) experience simplifies the process of creating a custom sensitive info type classifier. But first: why would you need Exact Data Match?

Microsoft Purview includes a set of out-of-the-box sensitive info types. However, nuances in your data format, or custom data in your organization, may not be detected by these out-of-the-box classifiers. You may also have sensitive data that is updated periodically, for example employee records that include name, date of birth, and employee ID. Exact Data Match uses uploaded data to create a custom sensitive info type, which reduces the rate of false positives.

Tip

  • Keep the name of the EDM classifier simple with no spaces
  • When uploading data from a CSV or TSV file, ensure there are no spaces or underscores in the column names
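
The column-name tip above can be checked before uploading. The following is a minimal Python sketch, not part of the EDM tooling: the function name find_bad_column_names and the example header are hypothetical, and it only looks for the two characters the tip mentions (spaces and underscores).

```python
import csv
import io

def find_bad_column_names(header_line: str, delimiter: str = ",") -> list[str]:
    """Return the column names that contain a space or an underscore."""
    # Parse the header with the csv module so quoted names are handled.
    columns = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    return [name for name in columns if " " in name or "_" in name]

# Hypothetical header row for an employee data file.
header = "EmployeeID,First Name,Last_Name,DateOfBirth"
print(find_bad_column_names(header))  # ['First Name', 'Last_Name']
```

For a TSV file, pass delimiter="\t".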

Step 1: Create a sample upload file

The columns in the sample upload file need to match the columns used in the final upload file. Gather the source data in CSV or TSV format.

Keep the sample file below 2.5 MB. The limits for the actual upload file are:

  • Up to 100 million rows of sensitive data
  • Up to 32 columns (fields) per data source
  • Up to ten columns (fields) marked as searchable
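
As a rough pre-flight check, the size and column limits above could be tested locally. This is a sketch under assumptions: check_sample_file is a hypothetical helper, 1 MB is taken to be 1024 * 1024 bytes, and the file is assumed to be UTF-8 with a header row. It does not check the row limit or the searchable-column limit.

```python
import csv
import os
import tempfile
from pathlib import Path

SAMPLE_MAX_BYTES = int(2.5 * 1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes
MAX_COLUMNS = 32                           # columns (fields) per data source

def check_sample_file(path, delimiter=","):
    """Return a list of problems found in the sample file (empty if none)."""
    problems = []
    size = Path(path).stat().st_size
    if size >= SAMPLE_MAX_BYTES:
        problems.append(f"sample file is {size} bytes; keep it below 2.5 MB")
    # Only the header row is needed to count the columns.
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    if len(header) > MAX_COLUMNS:
        problems.append(f"{len(header)} columns found; the limit is {MAX_COLUMNS}")
    return problems

# Demonstration with a small, made-up sample file in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.csv")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("EmployeeID,FirstName,LastName\n1001,Ana,Silva\n")
    print(check_sample_file(sample))  # []
```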

Step 2: Create the EDM classifier in Microsoft Purview

Microsoft Purview > Data Classification > Classifiers > Create EDM Classifier (using the new experience)

  • Provide a name and description
  • Upload the sample file (automatically defines the schema)
  • OR: Manually define the schema
  • Validate uploaded data and column names
  • Specify the primary elements (up to 10)
    • Pick primary elements that are unique, for example SSN rather than names or dates of birth
  • Specify whether the data is case sensitive and whether to ignore delimiters
  • Two rules are created automatically, with High and Medium confidence
    • Customize the rules if required
  • After the EDM classifier is created, note down the schema name from the flyout (used in Step 5)

Step 3: Create a security group in Microsoft 365 named EDM_DataUploaders

  • To this security group, add the members who will hash and upload data

Step 4: Decide whether to use a single device or separate devices to hash and upload data

Option 1: Single device to hash and upload

Typically used when the device is secure and there are no concerns about plain text sensitive data residing on it.

Option 2: Separate device to hash and upload

Typically used when you hash data on a managed, secure device and upload from a public-facing device, for example when there are concerns about plain text sensitive data residing on the device used in the upload process.

If using two devices, ensure the EDM upload tool is installed and authorized on both machines.

Download the EDM Upload tool from here

Step 5: Prepare and authorize the hash and upload devices

The steps below use an example directory location, which you do not need to follow strictly. This is a one-time setup to prepare and authorize the devices.

  • Create a directory: C:\EDM
  • Create a folder in the directory: C:\EDM\Hash
    • Your hashed data is automatically created here
  • Create a folder in the directory: C:\EDM\Data
    • Place your plain text upload file here
  • Run PowerShell as an administrator, change directory to the EDM upload tool location, and run the following (Remember: in PowerShell, an executable in the current directory must be prefixed with a dot and a backslash, for example .\EdmUploadAgent.exe)
    • EdmUploadAgent.exe /Authorize
    • You will be prompted to authenticate. Ensure the account you use is a member of the M365 Security group created earlier (see Step 3)
  • Download your schema and note down the name of the XML file (used in Step 6)
    • EdmUploadAgent.exe /SaveSchema /DataStoreName <replace with schema name> /OutputDir c:\EDM\Data\

Step 6a: Hash and Upload data from a single device

  • Validate your upload data against the schema
    • EdmUploadAgent.exe /ValidateData /DataFile c:\EDM\Data\<replace with upload file name> /Schema c:\EDM\Data\<replace with schema XML filename>
  • Hash and upload the data in a single step
    • EdmUploadAgent.exe /UploadData /DataStoreName <replace with schema name> /DataFile c:\EDM\Data\<replace with upload file name> /HashLocation c:\EDM\Hash\ /Schema c:\EDM\Data\<replace with schema XML filename> /AllowedBadLinesPercentage 0
  • Verify that the upload has completed
    • EdmUploadAgent.exe /GetDataStore
  • Go back to Microsoft Purview to check on the EDM indexing status

Step 6b: Hash and Upload data from separate devices

  • Create a hash file
    • EdmUploadAgent.exe /CreateHash /DataFile c:\EDM\Data\<replace with upload file name> /HashLocation c:\EDM\Hash\ /Schema c:\EDM\Data\<replace with schema XML filename> /AllowedBadLinesPercentage 0
  • Transfer the hash files automatically created and stored in c:\EDM\Hash to the device that will perform the upload process
  • Upload the hashed data
    • EdmUploadAgent.exe /UploadHash /DataStoreName <replace with schema name> /HashFile c:\EDM\Hash\<replace with .EdmHash file name>
  • Verify that the upload has completed
    • EdmUploadAgent.exe /GetDataStore
  • Go back to Microsoft Purview to check on the EDM indexing status
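
On the upload device, finding the transferred hash file and filling in the /UploadHash command can be scripted. This is a minimal sketch, assuming the hash file carries the .EdmHash extension referenced in the command above; find_hash_files, build_upload_hash_command, and the schema name EmployeeData are hypothetical.

```python
from pathlib import Path

def find_hash_files(hash_dir):
    """Return the .EdmHash files in hash_dir, newest first."""
    files = [p for p in Path(hash_dir).iterdir()
             if p.is_file() and p.suffix.lower() == ".edmhash"]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)

def build_upload_hash_command(schema_name, hash_file,
                              agent="EdmUploadAgent.exe"):
    """Return the /UploadHash command as an argument list."""
    return [agent, "/UploadHash", "/DataStoreName", schema_name,
            "/HashFile", str(hash_file)]

# Example directory from Step 5; nothing is printed if it does not exist.
hash_dir = Path("c:/EDM/Hash")
if hash_dir.is_dir():
    for hash_file in find_hash_files(hash_dir)[:1]:
        print(" ".join(build_upload_hash_command("EmployeeData", hash_file)))
```

Picking the newest file is a convenience choice for the sketch; if several hash files are present, confirm which one belongs to the data you intend to upload.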