contentCrawler for SharePoint Online - Getting Started


On-Demand Webinar

Learn how to download the contentCrawler for SharePoint Online VM and run the complimentary audit.


contentCrawler can be used immediately (without a paid license) to run a ‘Trial Audit.’ It is recommended to run the audit to understand how many documents in SharePoint are non-searchable and/or can be compressed.

We encourage you to discuss the audit results with DocsCorp before running in production. We can advise whether a larger multi-core Azure VM or scaling out to multiple Azure VMs would better suit your circumstances and any timeframes you are working towards.

For more complete administration information on contentCrawler, please see the Administration Guide.

How to run a contentCrawler Trial Audit

1. Double-click the contentCrawler Administration Console icon on the Windows Desktop.

2. Select File > New.

3. Under the Backlog service icon, click New.

4. Select the OCR and/or Compress checkboxes to choose which processes are reviewed in this audit.
- For the audit process, select both options.

5. Connect to your SharePoint Online service.
- Select SharePoint Online as the content repository.
- Enter a site URL for contentCrawler to search. This can be the top-level site or any particular site location. Sub-sites under this location will also be included in the search.
- Choose the ‘Claims Based Authentication’ option from the drop-down.
- Enter the username and password of a SharePoint Online account with sufficient permissions to read from and write to the document libraries in your sites.

NOTE: Your password details are stored only on this VM and are encrypted.

6. Set the ‘date range’ and ‘file types’ to be searched.
- You can edit these settings if you have particular requirements for this audit, but the defaults are suitable in most cases.

7. Indicate whether you would like contentCrawler to search minor versions.
- If any of your document libraries use minor versions, contentCrawler will by default search only major versions (e.g. V1.0, V2.0) and not minor versions (e.g. V1.1, V1.2).
- If your minor versions are more than work-in-progress drafts, checking this option may suit your needs.
- contentCrawler looks for image file types such as TIFF, PDF documents, and email MSG files. These are in any case less likely to have multiple versions.

8. Assessment Settings
- It is recommended these settings are left as default for the audit.
- The default settings are used by the vast majority of contentCrawler customers in audit and production mode.

9. Compression Settings
- It is recommended these settings are left as default for the audit.

10. Select whether to save processed documents back to SharePoint as New Version or Replace Original.
- Backup Details are required. Click Browse to locate or create a local folder, and enter the Admin Username (i.e. domain name\username) and Password required to access it.
- contentCrawler creates a backup folder of the original documents before replacing the original. A backup will also be taken when adding a new version if a SharePoint document version limit has been reached. For more information refer to contentCrawler SharePoint Online Versioning.

NOTE: Your password details are stored only on this VM and are encrypted.

11. Specify a name for the Trial Audit Service and select ‘Start service now’ to start the service immediately.
- Trial Mode is automatically enabled. It is recommended these settings are left as default for the audit. Note: the maximum run time in trial mode is 48 hours, but the audit information can still be accessed after the 48 hours have elapsed.

12. Click Finish. You will be returned to the contentCrawler Administration Console Dashboard.

13. The trial audit will begin searching all existing documents in your SharePoint Online site for documents that match the criteria specified in your service.
- An overview with basic service information, including how many documents have been searched and assessed, can be viewed.
- For more detailed information and real-time reporting, navigate to the Service Summary Dashboard by clicking the service link (in our example, Backlog – SharePoint Online Test).

14. The Backlog Service Summary will be displayed.

15. Once the search of your repository has been completed, contentCrawler will assess a percentage of the documents found to determine if they are eligible for OCR and/or Compression (depending on the process types specified in the service creation).

16. Select ‘Save Audit Report’ in the Audit Summary pane to save a summary of overall statistics of the audit.
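
The search criteria set in steps 5 to 7 (site URL, date range, file types, and major versus minor versions) can be pictured as a simple filter over documents. The Python sketch below is illustrative only and is not contentCrawler's implementation: the Document fields, the function names, the contoso URLs used with it, the extension set, and the choice to compare the date range against a last-modified date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed extensions for the file types the guide names (TIFF, PDF, MSG).
ASSUMED_EXTENSIONS = frozenset({"tif", "tiff", "pdf", "msg"})


@dataclass
class Document:
    """Hypothetical record for one document version found by a search."""
    url: str        # full URL of the document
    modified: date  # assumed: the date range applies to last-modified date
    version: str    # version label such as "V1.0" or "1.2"


def is_major_version(label: str) -> bool:
    """Major versions look like N.0; minor versions look like N.M with M > 0."""
    major, _, minor = label.lstrip("vV").partition(".")
    return major.isdigit() and minor in ("", "0")


def in_scope(doc_url: str, site_url: str) -> bool:
    """True if the document sits in the given site or a sub-site beneath it."""
    doc, site = urlsplit(doc_url), urlsplit(site_url)
    site_path = site.path.rstrip("/").lower() + "/"
    same_host = doc.netloc.lower() == site.netloc.lower()
    return same_host and doc.path.lower().startswith(site_path)


def matches(doc, site_url, start, end,
            extensions=ASSUMED_EXTENSIONS, include_minor=False):
    """Apply all four criteria; minor versions are skipped unless requested."""
    ext = PurePosixPath(urlsplit(doc.url).path).suffix.lower().lstrip(".")
    return (
        in_scope(doc.url, site_url)
        and start <= doc.modified <= end
        and ext in extensions
        and (include_minor or is_major_version(doc.version))
    )
```

With the default include_minor=False, a document labelled V1.2 is skipped while V2.0 is kept, which mirrors the default behaviour described in step 7.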

Important information

During the search and assessment phase of the audit, no documents in your SharePoint Online site are physically updated for OCR or Compression. To enable you to evaluate the processing end to end, trial audit mode allows up to 100 documents to be processed and saved back to your SharePoint Online site.

Documents that have been queued for processing are Held for Review at the Search and Assess stage. Unchecking Hold for Review will pass 100 random documents through to the process stage.

Once a document has been successfully processed (OCR’d and/or compressed), it will be ‘Held for Review’ at the process stage before being saved back to your SharePoint Online site.

These documents can be manually reviewed and released to Save individually. Alternatively, unchecking Hold for Review at the process stage will allow the 100 processed documents through to Save.

NOTE: Saved documents are going into your SharePoint Online document libraries.
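
As a rough mental model of the trial limit described above, releasing a hold can be thought of as drawing up to 100 random documents from the held queue and leaving the rest where they are. The Python sketch below is an illustration under that assumption and does not show how contentCrawler selects documents internally; the function name, the document IDs, and the optional rng parameter are hypothetical.

```python
import random

# From the guide: trial audit mode allows up to 100 documents to be processed.
TRIAL_DOCUMENT_LIMIT = 100


def release_from_hold(held, limit=TRIAL_DOCUMENT_LIMIT, rng=None):
    """Pick up to `limit` random documents to pass on; the rest stay held.

    `held` is a collection of unique, hashable document IDs (hypothetical).
    Returns a (released, still_held) pair of lists.
    """
    rng = rng or random.Random()
    held = list(held)
    released = rng.sample(held, min(limit, len(held)))
    released_ids = set(released)
    still_held = [doc for doc in held if doc not in released_ids]
    return released, still_held
```

For a queue of 250 held documents, this returns 100 released documents and 150 that remain held; for a queue of 40, all 40 are released.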

For full details on processing workflows, please see the Administration Guide.