The new Exact Data Match (EDM) experience further simplifies the process of creating a custom sensitive info type classifier. But first, why do you need Exact Data Match?
Microsoft Purview includes a set of out-of-the-box sensitive info types. However, nuances in your data format, or custom data in your organization, may not be detected by these out-of-the-box classifiers. You may also have sensitive data that is periodically updated, for example employee records that include name, date of birth, employee ID, etc. Exact Data Match uses uploaded data to create a custom sensitive info type, which reduces the rate of false positives.

Tip
- Keep the name of the EDM classifier simple with no spaces
- When uploading data from a CSV or TSV file, ensure there are no spaces or underscores in the column names
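The column-name tip above can be checked before any upload. The following is a minimal Python sketch, not part of the EDM tooling; the function name and the sample header row are hypothetical and used only for illustration.

```python
import csv
import io


def find_bad_column_names(header_line: str, delimiter: str = ",") -> list[str]:
    """Return the column names that contain a space or an underscore."""
    # csv.reader handles quoted column names; only the header row is parsed.
    columns = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    return [name for name in columns if " " in name or "_" in name]


# Hypothetical header row; replace with the first line of your own file.
print(find_bad_column_names("EmployeeID,First Name,Date_Of_Birth,SSN"))
```

For a TSV file, pass delimiter="\t". An empty result means no column name needs renaming.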
Step 1: Create a sample upload file
The sample upload file columns need to match the columns used in the actual/final upload file. Gather the source data in CSV or TSV format.
Keep the sample file below 2.5MB. The limits for the actual upload file are:
- Up to 100 million rows of sensitive data
- Up to 32 columns (fields) per data source
- Up to ten columns (fields) marked as searchable
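The upload limits listed above lend themselves to a quick pre-flight check. This is a rough Python sketch under the assumption that you already know the row, column, and searchable-column counts of your file; the function and constant names are made up for this example.

```python
# Limits for the actual upload file, as listed above.
MAX_ROWS = 100_000_000
MAX_COLUMNS = 32
MAX_SEARCHABLE = 10


def check_upload_limits(row_count: int, column_count: int, searchable_count: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means all limits are met."""
    problems = []
    if row_count > MAX_ROWS:
        problems.append(f"{row_count} rows exceeds the limit of {MAX_ROWS}")
    if column_count > MAX_COLUMNS:
        problems.append(f"{column_count} columns exceeds the limit of {MAX_COLUMNS}")
    if searchable_count > MAX_SEARCHABLE:
        problems.append(f"{searchable_count} searchable columns exceeds the limit of {MAX_SEARCHABLE}")
    return problems


# Hypothetical counts for illustration.
print(check_upload_limits(row_count=250_000, column_count=8, searchable_count=2))
```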
Step 2: Create the EDM classifier in Microsoft Purview
Microsoft Purview > Data Classification > Classifiers > Create EDM Classifier (using the new experience)
- Provide a name and description
- Upload the sample file (automatically defines the schema)
- OR: Manually define the schema
- Validate uploaded data and column names
- Specify the primary elements (up to 10)
- Pick primary elements that are unique, for example SSN rather than names or dates of birth
- Specify if the data is case sensitive, or if you want to ignore delimiters
- Two rules are automatically created, one with High and one with Medium confidence
- Customize the rules if required
- After the EDM classifier is created, note down the schema name from the flyout (used in Step 5)
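To judge whether a column is unique enough to serve as a primary element, you can measure how often its values repeat in the sample file. The sketch below is one possible way to do that in Python; the function name, column names, and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_ratio(csv_text: str, column: str) -> float:
    """Fraction of rows whose value in `column` also appears in at least one other row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    counts = Counter(row[column] for row in rows)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(rows)


# Hypothetical sample data for illustration.
sample = """EmployeeID,LastName
1001,Smith
1002,Smith
1003,Jones
1004,Lee
"""
print(duplicate_ratio(sample, "EmployeeID"))  # 0.0, every value is unique
print(duplicate_ratio(sample, "LastName"))    # 0.5, two of the four rows share a value
```

A ratio near zero suggests a good primary element; a high ratio suggests the column is better used as a supporting element.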
Step 3: Create a security group in Microsoft 365 named EDM_DataUploaders
- Add members who will hash and upload data to this Security group
Step 4: Decide whether to use a single device or separate devices to hash and upload data
Option 1: Single device to hash and upload
Typically used when the device is secure and there are no concerns with plain text sensitive data residing on the device
Option 2: Separate devices to hash and upload
Typically used when hashing data on a managed and secure device and uploading from a public-facing device, or when there are concerns with plain text sensitive data residing on the device used in the upload process
If using two devices, ensure the EDM upload tool is installed and authorized on both machines
Download the EDM Upload tool from here
Step 5: Prepare and authorize the hash and upload devices
The steps below use an example directory location, which you do not need to follow exactly. This process is a one-time setup to prepare and authorize the devices.
- Create a directory: C:\EDM
- Create a folder in the directory: C:\EDM\Hash
- Your hashed data is automatically created here
- Create a folder in the directory: C:\EDM\Data
- Place your plain text upload file here
- Run PowerShell as an administrator, change directory to the EDM upload tool location, and run the following (Remember: in PowerShell, an executable in the current directory must be prefixed with a dot and slash, for example .\EdmUploadAgent.exe)
- EdmUploadAgent.exe /Authorize
- You will be prompted to authenticate, ensure this account is added to the M365 Security group created earlier (see Step 3)
- Download your schema and note down the name of the XML file (to be used in Step 6)
- EdmUploadAgent.exe /SaveSchema /DataStoreName <replace with schema name> /OutputDir c:\EDM\Data\
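The folder layout used in this step can also be created with a short script instead of by hand. This is a minimal Python sketch, assuming the example C:\EDM location from above; the function name is hypothetical, and any other root folder works equally well.

```python
from pathlib import Path


def create_edm_folders(root: Path) -> tuple[Path, Path]:
    """Create the Hash and Data folders under `root` and return both paths."""
    hash_dir = root / "Hash"  # hashed data is written here by the upload tool
    data_dir = root / "Data"  # place the plain text upload file and the schema XML here
    # exist_ok=True makes the function safe to run more than once.
    hash_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)
    return hash_dir, data_dir


# Example call on the Windows device (not executed here):
#   create_edm_folders(Path("C:/EDM"))
```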
Step 6a: Hash and Upload data from a single device
- Validate your upload data against the schema
- EdmUploadAgent.exe /ValidateData /DataFile c:\EDM\Data\<replace with upload file name> /Schema c:\EDM\Data\<replace with schema XML filename>
- To hash and upload the data in a single step
- EdmUploadAgent.exe /UploadData /DataStoreName <replace with schema name> /DataFile c:\EDM\Data\<replace with upload file name> /HashLocation c:\EDM\Hash\ /Schema c:\EDM\Data\<replace with schema XML filename> /AllowedBadLinesPercentage 0
- Verify that the upload has completed
- EdmUploadAgent.exe /GetDataStore
- Go back to Microsoft Purview to check on the EDM indexing status
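If you refresh the data regularly, it can help to assemble the single-step upload command in a script so the placeholders are filled in consistently. The sketch below only builds the argument list from the flags shown above; it does not run the tool. The function name and the example schema and file names (employeedb, employees.csv, employeedb.xml) are hypothetical.

```python
def build_upload_command(
    schema_name: str,
    data_file: str,
    schema_file: str,
    hash_dir: str = "c:\\EDM\\Hash\\",
    agent: str = ".\\EdmUploadAgent.exe",
) -> list[str]:
    """Build the argument list for the hash-and-upload command used in this step."""
    return [
        agent,
        "/UploadData",
        "/DataStoreName", schema_name,
        "/DataFile", data_file,
        "/HashLocation", hash_dir,
        "/Schema", schema_file,
        "/AllowedBadLinesPercentage", "0",
    ]


# Hypothetical names for illustration; replace with your own schema and files.
cmd = build_upload_command(
    schema_name="employeedb",
    data_file="c:\\EDM\\Data\\employees.csv",
    schema_file="c:\\EDM\\Data\\employeedb.xml",
)
print(" ".join(cmd))
```

On the upload device, the resulting list could be passed to subprocess.run(cmd, check=True) from the folder that contains the upload tool. Passing a list rather than a single string avoids quoting problems if a path contains spaces.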
Step 6b: Hash and Upload data from separate devices
- Create a hash file
- EdmUploadAgent.exe /CreateHash /DataFile c:\EDM\Data\<replace with upload file name> /HashLocation c:\EDM\Hash\ /Schema c:\EDM\Data\<replace with schema XML filename> /AllowedBadLinesPercentage 0
- Transfer the hash files that were automatically created in c:\EDM\Hash to the device that will perform the upload
- Upload the hashed data
- EdmUploadAgent.exe /UploadHash /DataStoreName <replace with schema name> /HashFile c:\EDM\Hash\<replace with .EdmHash file name>
- Verify that the upload has completed
- EdmUploadAgent.exe /GetDataStore
- Go back to Microsoft Purview to check on the EDM indexing status
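On the upload device in the two-device flow, the .EdmHash file name has to be filled into the upload command by hand. As a rough sketch, the Python below locates that file in the transferred folder and builds the /UploadHash argument list from the flags shown above. It does not run the tool; the function names and the schema name in the usage comment are hypothetical, and it assumes the folder holds exactly one .EdmHash file.

```python
from pathlib import Path


def find_hash_file(hash_dir: Path) -> Path:
    """Return the single .EdmHash file in `hash_dir`, or raise if there is not exactly one."""
    candidates = sorted(p for p in hash_dir.iterdir() if p.suffix.lower() == ".edmhash")
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one .EdmHash file, found {len(candidates)}")
    return candidates[0]


def build_upload_hash_command(
    schema_name: str,
    hash_file: Path,
    agent: str = ".\\EdmUploadAgent.exe",
) -> list[str]:
    """Build the argument list for uploading previously hashed data."""
    return [agent, "/UploadHash", "/DataStoreName", schema_name, "/HashFile", str(hash_file)]


# Example use on the upload device (not executed here):
#   hash_file = find_hash_file(Path("c:/EDM/Hash"))
#   cmd = build_upload_hash_command("employeedb", hash_file)
```

Raising an error when zero or several .EdmHash files are present is a deliberate choice in this sketch, so that a stale hash from an earlier run is not uploaded by accident.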