Getting Started with contentCrawler

Running contentCrawler in trial mode (15:00)


contentCrawler can be used immediately (without a paid license) to run a ‘Trial Audit.’ We recommend running the audit to find out how many documents in your NetDocuments repository are non-searchable and/or can be compressed.

We encourage you to discuss the audit results with DocsCorp before running in production. We can advise whether a larger multi-core Azure VM or scaling out to multiple Azure VMs would better suit your circumstances and any timeframes you are working towards.

For complete administration information on contentCrawler, please see the Administration Guide.

How to run a contentCrawler Trial Audit

1. Double-click the contentCrawler Administration Console icon on the Windows Desktop.

2. Select File > New.

3. Under the Backlog service icon, click New.

4. Select the OCR and/or Compress checkboxes to choose which process types this audit will review.

-       For the audit, select both options.

5. Connect to your NetDocuments Repository.

-       Select NetDocuments as the content repository.

-       Select your NetDocuments datacenter location.

-       Choose to authenticate to the selected datacenter. The NetDocuments authentication screen will be displayed.

-       Specify a user name and password that provides FULL CONTROL permissions to access all documents in your NetDocuments content repository.

-       Click Login to authenticate. NOTE: your password details are not stored on the VM.

-       Once NetDocuments has authenticated you, a list of your repository cabinets will be populated. Select one cabinet per service.

IMPORTANT: Authentication to NetDocuments is valid for 12 months. You must re-authenticate before the expiry date shown to ensure the service continues without error. Re-authentication can be performed via the service settings screen. Refer to the full Administration Guide for further details.

6. The first time you authenticate to NetDocuments, you will be prompted to grant contentCrawler access to your NetDocuments repository so that it can process documents.

-       Once access has been granted, contentCrawler will be listed under ‘My Integrated Apps’ in NetDocuments under Settings > Manage App Access.

IMPORTANT: Revoking contentCrawler's access WILL stop processing for all contentCrawler services.

7. Set the ‘date range’ and ‘file types’ to be searched.

-       You can edit these settings if you have particular requirements for this audit, but the defaults are suitable in most cases.

IMPORTANT: Version 1 of an MSG file cannot be replaced. contentCrawler will process version 1 of an MSG file and save the result as a new official version.

8. Assessment Settings

-       It is recommended that these settings be left at their defaults for the audit.

-       The default settings are used by the vast majority of contentCrawler customers in audit and production mode.

9. Compression Settings

-       It is recommended that these settings be left at their defaults for the audit.

10. Choose whether to save processed documents back to NetDocuments as a New Version or to Replace Original.

-       New versions are saved as the ‘Official’ version to ensure that the new OCR content is indexed and available for searching in NetDocuments. Unofficial versions are not indexed by NetDocuments.

-       Backup details are required: click Browse to locate or create a local folder, then enter the Admin Username (i.e. domain name\username) and Password required to access it.

-       contentCrawler backs up the original documents to this folder before replacing them. NOTE: your password details are encrypted and stored only on this VM.

11. Specify a name for the Trial Audit Service and select ‘Start service now’ to start the service immediately.

-       Trial Mode is enabled automatically.

-       It is recommended that these settings be left at their defaults for the audit.

Note: The maximum run time in trial mode is 48 hours, but the audit information remains accessible after the 48 hours have elapsed.

12. Click Finish. You will be returned to the contentCrawler Administration Console Dashboard.

13. The trial audit will begin searching the existing documents in your NetDocuments repository for those that match the criteria specified in your service.

-       The Dashboard shows an overview and basic service information, including how many documents have been searched and assessed.

-       For more detailed information and real-time reporting, open the Service Summary Dashboard by clicking the <Named Backlog Service> link (for example, Backlog – NetDocuments).

14. The Backlog Service Summary will be displayed.

15. Once the search of your repository is complete, contentCrawler will assess a percentage of the documents found to determine whether they are eligible for OCR and/or compression (depending on the process types selected when the service was created).

16. Select ‘Save Audit Report’ in the Audit Summary pane to save a summary of the audit's overall statistics.

Important information

During the search and assessment phase of the audit, no documents in your NetDocuments repository are physically updated for OCR or compression. So that you can evaluate the processing end to end, trial audit mode allows up to 100 documents to be processed and saved back to your NetDocuments repository.

Documents queued for processing are held for review at the Search and Assess stage. Unchecking Hold for review passes 100 randomly selected documents through to the Process stage.

Once a document has been successfully processed (OCR’d and/or compressed), it is ‘held for review’ at the Process stage before being saved back to your NetDocuments repository.

You can manually review these documents and release them individually for saving. Alternatively, unchecking Hold for review at the Process stage allows all 100 processed documents through to be saved.

NOTE: Saved documents are written to your actual NetDocuments document cabinets.

For full details on processing workflows please see the Administration Guide.