Last updated: August 10, 2018

About the project

Public Record Office Victoria (PROV) has undertaken a Proof of Concept project with CenITex to test an eDiscovery tool on a sample set of Lotus Notes emails. Emails are a vital part of doing business and considered public records under the Public Records Act 1973. Emails enable exchange of ideas, enactment of decisions and support collaboration between an increasingly dispersed workforce. In government, emails also provide evidence essential for accountability and need to be preserved as public records into the future.


Why did PROV undertake this project?

Over twenty years of routine backup has resulted in an unwieldy backlog of Victorian Government emails, including 67,000 tapes and 28 petabytes of content. Accessing and retrieving emails for analysis or as evidence of decisions can be difficult, expensive and time consuming. This compromises the Government’s reputation for transparency and accountability.

PROV undertook a Proof of Concept project to develop and test a process that could address these failures and be made available for future re-use across departments. It involved exploring the use of an eDiscovery tool to review and facilitate disposal of large volumes of emails, including:
• an initial assessment to quantify and qualify a sample email data set
• identifying duplicates within the data set
• identifying low value versus high value records within the data set
• assigning contextual information to the de-duplicated set
• a manual review of results to determine level of accuracy.


What was involved in the project?

The tool was used to identify duplicate emails within the sample, and then low-value emails among those remaining after de-duplication. To identify the low-value emails, we reviewed a list of email domains to find those that could reasonably be expected to produce irrelevant, non-business-related emails. The top results, which included common subscription emails and Google Alerts, were selected and saved as filters. The presence of Fwd: in the subject line was also used as a filter.
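The two steps above can be illustrated with a minimal Python sketch. This is not the eDiscovery tool's actual method, which is not described here. It assumes duplicates are emails with the same sender, subject and body after trimming whitespace and ignoring case, and the domain names in LOW_VALUE_DOMAINS are hypothetical placeholders for the saved filters.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    subject: str
    body: str


# Hypothetical domains standing in for the saved low-value filters.
LOW_VALUE_DOMAINS = {"newsletter.example.com", "alerts.example.com"}


def fingerprint(email):
    """Hash the trimmed, lower-cased sender, subject and body (an assumed duplicate rule)."""
    parts = (email.sender, email.subject, email.body)
    normalised = "\x1f".join(part.strip().lower() for part in parts)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(emails):
    """Keep the first email seen for each fingerprint."""
    seen = set()
    unique = []
    for email in emails:
        key = fingerprint(email)
        if key not in seen:
            seen.add(key)
            unique.append(email)
    return unique


def is_low_value(email):
    """Flag emails from a filtered domain or with Fwd: at the start of the subject."""
    domain = email.sender.rsplit("@", 1)[-1].strip().lower()
    forwarded = email.subject.strip().lower().startswith("fwd:")
    return domain in LOW_VALUE_DOMAINS or forwarded


def triage(emails):
    """De-duplicate first, then split the remainder into low-value and kept emails."""
    unique = deduplicate(emails)
    low_value = [e for e in unique if is_low_value(e)]
    kept = [e for e in unique if not is_low_value(e)]
    return unique, low_value, kept
```

Filtering runs only on the de-duplicated set, mirroring the order described above, so a duplicate is counted once rather than as both a duplicate and a low-value email.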

Next we tried a second approach on the sample, searching the remaining emails for key search terms.

Using a third approach we were able to apply additional contextual information to the emails, which would allow them to be grouped by areas of responsibility within the organisation. This allowed us to assess and prioritise the emails to be kept long term.


What were the results?

Of the 4.6 million emails in the sample, we found that 43% were duplicates and 7% were of low value.

Two pie charts showing 43% duplication and 7% low value emails


The eDiscovery tool allowed us to identify emails eligible for disposal, as well as to assess and prioritise the remaining emails, with between 98% and 100% accuracy. Up to 50% of the sample was identified for potential disposal. The tool also allowed us to apply additional metadata to every email in the set, making emails easier to identify at a high level and supporting future decisions about retention.
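As a rough check on the figures above, the short Python sketch below converts the percentages into approximate email counts. It assumes both percentages are shares of the whole 4.6 million sample, which is consistent with the 50% disposal figure but is not stated explicitly. Because the sample size is rounded, the counts are approximate.

```python
SAMPLE_SIZE = 4_600_000   # "4.6 million" emails, a rounded figure
DUPLICATE_PERCENT = 43    # share of the sample found to be duplicates
LOW_VALUE_PERCENT = 7     # share of the sample found to be low value (assumed to be of the whole sample)

duplicates = SAMPLE_SIZE * DUPLICATE_PERCENT // 100
low_value = SAMPLE_SIZE * LOW_VALUE_PERCENT // 100
potential_disposal = duplicates + low_value

print(f"Duplicates:         {duplicates:,}")          # 1,978,000
print(f"Low value:          {low_value:,}")           # 322,000
print(f"Potential disposal: {potential_disposal:,}")  # 2,300,000
```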

An eDiscovery tool may be used to assist agencies to reduce their email backlogs and unlock greater value from their email assets, though manual testing on a larger sample is recommended prior to implementing disposal. Note that an eDiscovery tool may be beyond the means of smaller agencies that nonetheless struggle with similar email backlog issues. An investigation into email backup for smaller agencies, and potential testing of free, open source solutions, is recommended.

For more information, download our proof of concept summary as a PDF


Next steps

This Proof of Concept was just the first step in making Lotus Notes emails more available, valued and better managed. Phase two of the project will begin later in 2018-19. This page will be updated as Phase two progresses, so please check back.

If you'd like further information about this project feel free to contact David Brown, Assistant Director Government Services,