On-Premise Scanner (Microsoft Information Protection)

Content Explorer in Microsoft Purview (formerly the Microsoft 365 compliance center) is a useful tool for identifying data within your M365 environment. It is a key first step in your data classification journey, commonly referred to as ‘Know your data’.

In order to effectively identify and classify data that is stored on-premises, it is necessary to install and configure the Information Protection scanner. This scanner functions as a service on a Windows Server, maintaining logs of the scanning progress in SQL, and generating a comprehensive report that highlights data matches.


Before you get started, here are a few items that need to be set up prior to installing the scanner.

Scanner Server

SQL server instance

  • Preferably on a different machine from the Scanner server
  • SQL server requirements: Here

Accounts setup

  • Create a service account with the following permissions or requirements: Full list here
    • Active Directory account synced to Azure AD
    • Log on locally
    • Log on as a service
    • Publish at least 1 label to this account
    • Full control permissions in SharePoint
    • Site collection auditor in SharePoint to allow targeted scanning only
    • Read, Write, Modify permissions to File Shares
    • Sysadmin role on SQL server instance
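Some of these prerequisites can be spot-checked from the scanner server before installing anything. The following is a minimal sketch, not a full validation: the host name is a placeholder, the port assumes a default SQL instance listening on 1433 (named instances often use a different port), and the module check assumes the scanner cmdlets come from the AzureInformationProtection module that ships with the unified labeling client.

```powershell
# Placeholder values (assumptions): replace with your own SQL host and port.
$sqlHost = 'SQLSERVER01'
$sqlPort = 1433   # default instance port; named instances may differ

# The scanner cmdlets used later (Install-AIPScanner and others) live in this module.
$aipModule = Get-Module -ListAvailable -Name AzureInformationProtection |
    Select-Object -First 1
if ($aipModule) {
    "AzureInformationProtection module found (version $($aipModule.Version))"
} else {
    Write-Warning 'AzureInformationProtection module not found; install the client first.'
}

# Basic TCP reachability test from the scanner server to the SQL host.
if (Get-Command Test-NetConnection -ErrorAction SilentlyContinue) {
    $result = Test-NetConnection -ComputerName $sqlHost -Port $sqlPort -WarningAction SilentlyContinue
    "SQL host reachable on port ${sqlPort}: $($result.TcpTestSucceeded)"
}
```

A successful TCP test only shows that the port is open. It does not confirm that the service account holds the Sysadmin role on the instance.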

Define the Scan Cluster and Scan Job

Log in to Microsoft Purview: compliance.microsoft.com > Settings > Information protection scanner

  • Create a new cluster (represents a group of scanners that share the scanning load)
    • Typically located in the same geo-location
    • Connected to the same SQL instance
    • Give it a simple name (avoid special characters if possible)
  • Create a new content scan job
    • Provide a scan job name
    • Select the previously created cluster from the dropdown
    • Schedule
      • Manual: Use this for initial discovery
      • Automatic: Once your scan jobs have been thoroughly tested, switch to Automatic
    • Info types to be discovered
      • Policy only: Use this if labels have sensitive information type (SIT) auto-labeling conditions defined
      • All: Use this if labels are not configured with SIT conditions
    • Treat recommended labeling as automatic
      • Off: Use this if you have automatic classification defined in your label configuration
      • On: Use this if your configuration is set to recommend a label
    • Enable DLP policy rules
      • On: Use this if you want to enforce your DLP policy scoped to on-premises repositories
      • Off: No DLP policy evaluation needed
    • Enforce sensitivity labeling policy
      • Off: Use this option when running the scan in discovery mode
      • On: Scan and apply a label
    • Label files based on content
      • On: Use this option to inspect content and apply a label
      • Off: Apply a default label
    • Default label
      • None: Do not apply a default label to unlabeled files
      • Policy only: Apply a default label specified in the policy
      • Custom: Select one of your published labels as the default label
    • Relabel files
      • Off: Do not relabel a file, unless the new label has a higher classification level
      • On: Always relabel a file if there is a condition match
    • Preserve modification dates
      • On: Preferably preserve the original dates
      • Off: The modification dates are changed based on when the scanner modifies the file
    • Include or Exclude file types
      • Leave default values or customize per your org requirements
    • Default owner:
      • Scanner Account (Default)
      • Custom: Use this to customize the Owner property on the file
    • Set repository owner
      • Off
      • Specify SAMAccountName, UPN or SID. Grants owner full permissions on the file if the classification is updated by the scanner
  • Save the content scan job. After saving, you can specify target repositories.
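    Examples: a file share as a UNC path (\\Server\Folder) or a SharePoint Server site or library URL (http://sp2013/SharedDocuments). Both names are placeholders.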

Now, open a text editor and copy the following text:
**The following have been set up so far, update them in your text file**
Scanner account:
SQL Instance:
Scan cluster name:
**The following items will be setup in the next section**
App Name:
AppId:
AppSecret:
TenantID:
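
If you prefer to keep these notes in a PowerShell session rather than a text file, one option is an ordered hashtable. This is only a sketch: every value below is a placeholder, and the variable names are illustrative rather than anything the scanner requires.

```powershell
# All values are placeholders (assumptions); replace them with your own.
$scannerNotes = [ordered]@{
    ScannerAccount  = 'CONTOSO\svc-aipscanner'
    SqlInstance     = 'SQLSERVER01\AIPSCANNER'
    ScanClusterName = 'EastUS'
    AppName         = 'AIP-Scanner'
    AppId           = ''   # filled in after the app registration
    AppSecret       = ''   # treat as a password; avoid saving it to disk in plain text
    TenantId        = ''   # filled in after the app registration
}

# List the entries that still need a value.
$missing = @($scannerNotes.Keys | Where-Object {
    [string]::IsNullOrWhiteSpace($scannerNotes[$_])
})
"Still to fill in: $($missing -join ', ')"
```

The empty entries are the ones collected during the app registration in the next section.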


Register a new application in Azure AD

Log in to Azure AD: portal.azure.com > App registrations > New registration

  • Provide a name: example: AIP-Scanner
  • Select ‘Accounts in this organizational directory only’
  • Redirect URI:
  • Click ‘Register’
  • ***Note down the App Name, AppId and TenantID in your file
  • After registering your app, you are taken to the app overview
  • Go to Certificates & secrets > New client secret > enter a description (for example, AIPScannerSecret) > select a validity period > Add
  • Note down the AppSecret in your file. Once you leave this screen, the secret value can no longer be viewed, so make sure you have copied it to your file first.
  • Next, we specify API permissions
  • Go to API Permission > Add a permission
  • Under the ‘Microsoft APIs’ tab, select: Azure Rights Management Services > Select ‘Application permissions’, then select the following:
    • Content.DelegatedReader
    • Content.DelegatedWriter
    • Content.SuperUser – optional (can scan all protected files)
  • Go to API Permission > Add a permission
  • Under the ‘APIs my organization uses’ tab, select: Microsoft Information Protection Sync service > Select ‘Application permissions’, then select the following
    • UnifiedPolicy.Tenant.Read
  • Grant admin consent when done

Install the scanner on Windows Server

• Log on to the Windows Server, then open PowerShell as an administrator
• Substitute the Scanner account from your text file into the following and run:
$serviceacct=Get-Credential -UserName domain\user -Message ScannerAccount


• Substitute the SQL instance info and cluster name from your text file into the following and run:
Install-AIPScanner -SqlServerInstance server\instance -Profile cluster -ServiceUserCredentials $serviceacct
• The scanner installation will now commence, and the database creation will take place automatically. This may take longer than expected.


• Once complete, substitute the AppId, TenantId, AppSecret and Scanner account (UPN format: user@domain.com) info from your text file into the following and run:
Set-AIPAuthentication -AppId "AppId" -AppSecret "AppSecret" -TenantId "TenantId" -DelegatedUser "Scanner account" -OnBehalfOf $serviceacct


• Verify the installation. If no issues are identified, all results are returned in green.
Start-AIPScannerDiagnostics -OnBehalfOf $serviceacct
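
The installation steps above can also be collected into one function so that the values from your text file are passed in once. This is a sketch under assumptions: Install-ScannerNode is a hypothetical helper name (it is not part of the AIP module), all example values are placeholders, and the function simply calls the same three cmdlets in the same order.

```powershell
function Install-ScannerNode {   # hypothetical helper, not an AIP cmdlet
    param(
        [Parameter(Mandatory)] [string]$SqlInstance,        # server\instance
        [Parameter(Mandatory)] [string]$ClusterName,        # scan cluster name from Purview
        [Parameter(Mandatory)] [string]$AppId,
        [Parameter(Mandatory)] [string]$AppSecret,
        [Parameter(Mandatory)] [string]$TenantId,
        [Parameter(Mandatory)] [string]$ScannerUpn,         # user@domain.com
        [Parameter(Mandatory)] [pscredential]$ServiceAccount
    )

    # Stop at the first failing step instead of continuing with a half-installed node.
    $ErrorActionPreference = 'Stop'

    # Step 1: install the scanner service and create its database.
    Install-AIPScanner -SqlServerInstance $SqlInstance -Profile $ClusterName -ServiceUserCredentials $ServiceAccount

    # Step 2: authenticate the service account with the registered app (splatted for readability).
    $auth = @{
        AppId         = $AppId
        AppSecret     = $AppSecret
        TenantId      = $TenantId
        DelegatedUser = $ScannerUpn
        OnBehalfOf    = $ServiceAccount
    }
    Set-AIPAuthentication @auth

    # Step 3: run the built-in diagnostics.
    Start-AIPScannerDiagnostics -OnBehalfOf $ServiceAccount
}

# Example call (placeholder values):
# $serviceacct = Get-Credential -UserName 'CONTOSO\svc-aipscanner' -Message ScannerAccount
# Install-ScannerNode -SqlInstance 'SQLSERVER01\AIPSCANNER' -ClusterName 'EastUS' `
#     -AppId '<AppId>' -AppSecret '<AppSecret>' -TenantId '<TenantId>' `
#     -ScannerUpn 'svc-aipscanner@contoso.com' -ServiceAccount $serviceacct
```

Running the steps one at a time, as described above, is still the easier route for a first installation because each result can be inspected before moving on.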

Log in to Microsoft Purview: compliance.microsoft.com > Settings > Information protection scanner

A scanner node should now appear, in an idle status.


Running a scan
Log in to Microsoft Purview: compliance.microsoft.com > Settings > Information protection scanner > Content Scan Job. Select the scan job checkbox, and click ‘Scan Now’ OR

Log on to the Windows Server where the scanner is installed, open PowerShell as an administrator and run ‘Start-AIPScan’ (see below)

Options:

  • Start-AIPScan
  • Get-AIPScannerStatus
  • Stop-AIPScan
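
These cmdlets can be combined to start a scan and check on it periodically. The sketch below assumes the scanner is already installed on the machine. Watch-ScannerStatus is a hypothetical helper name, and the check count and interval are arbitrary. It prints whatever Get-AIPScannerStatus returns rather than relying on specific property names.

```powershell
function Watch-ScannerStatus {   # hypothetical helper, not an AIP cmdlet
    param(
        [int]$Checks = 5,            # how many times to query the status
        [int]$IntervalSeconds = 60   # pause between queries
    )

    # Bail out early if the scanner cmdlets are not available on this machine.
    if (-not (Get-Command Get-AIPScannerStatus -ErrorAction SilentlyContinue)) {
        Write-Warning 'Get-AIPScannerStatus not found; run this on the scanner server.'
        return
    }

    for ($i = 1; $i -le $Checks; $i++) {
        Write-Host "Status check $i of $Checks at $(Get-Date -Format T)"
        Get-AIPScannerStatus | Format-List | Out-Host
        if ($i -lt $Checks) { Start-Sleep -Seconds $IntervalSeconds }
    }
}

# Example: start a scan, then check the status every 30 seconds, ten times.
# Start-AIPScan
# Watch-ScannerStatus -Checks 10 -IntervalSeconds 30
```

If a scan needs to be interrupted, Stop-AIPScan can be run in the same session.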

Post scan

After a scan is complete, the Reports are stored on the scanner server at the following location: %localappdata%\Microsoft\MSIP\Scanner\Reports
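
To find the most recent reports quickly, the folder can be listed from PowerShell. This is a sketch with some assumptions: Get-ScannerReport is a hypothetical helper name, and the default path only resolves correctly when the session runs under the scanner service account, since %localappdata% is per user. From another account, pass that profile's Reports folder explicitly.

```powershell
function Get-ScannerReport {   # hypothetical helper, not an AIP cmdlet
    param(
        # Default: the Reports folder in the current user's profile.
        [string]$ReportDir = (Join-Path $env:LOCALAPPDATA 'Microsoft\MSIP\Scanner\Reports'),
        [int]$Newest = 5
    )

    if (-not (Test-Path -Path $ReportDir)) {
        Write-Warning "Report folder not found: $ReportDir"
        return
    }

    # Return the newest files first.
    Get-ChildItem -Path $ReportDir -File |
        Sort-Object -Property LastWriteTime -Descending |
        Select-Object -First $Newest
}

# Example: list the five newest report files.
# Get-ScannerReport | Format-Table Name, Length, LastWriteTime
#
# Example: preview the newest CSV report, if one exists.
# $csv = Get-ScannerReport -Newest 50 | Where-Object Extension -eq '.csv' | Select-Object -First 1
# if ($csv) { Import-Csv -Path $csv.FullName | Select-Object -First 10 }
```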


Troubleshooting