About the projects
Public Record Office Victoria (PROV) undertook a Proof of Concept (PoC) project with CenITex during 2017/18 to test an eDiscovery tool on a sample set of Lotus Notes emails, focusing on disposal outcomes.
During 2019/20, PROV is undertaking a Pilot project to further test the eDiscovery tool, as well as other available tools, this time focusing on appraisal outcomes. Outcomes of the Pilot project will be made available on this page once the project has progressed.
Why is PROV undertaking these projects?
Emails are important records
Emails are a vital part of doing business and are considered public records under the Public Records Act 1973. Emails enable the exchange of ideas and the enactment of decisions, and support collaboration across an increasingly dispersed workforce. In government, emails also provide evidence essential for accountability and need to be preserved as public records into the future.
Emails have not been well-managed to date
Over twenty years of routine backup has resulted in an unwieldy backlog of Victorian Government emails, including 67,000 tapes and 28 petabytes of content. Accessing and retrieving emails for analysis and as evidence of decisions can be difficult, expensive and time-consuming. This compromises the Government’s reputation for transparency and accountability.
About the PoC project
Summary of tasks
The Proof of Concept project involved exploring the use of an eDiscovery tool to review and facilitate disposal of large volumes of emails, including:
- an initial assessment to quantify and qualify a sample email data set
- identifying duplicates within the data set
- identifying non-records within the data set
- assigning contextual information to the de-duplicated set
- a manual review of results to determine level of accuracy.
The tool was used to identify duplicate emails within the sample, and then low-value emails among those remaining after de-duplication. To identify low-value emails, we reviewed a list of email domains to find those that could reasonably be expected to produce irrelevant, non-business-related emails. The top results, which included common subscription emails and Google Alerts, were selected and saved as filters. The presence of Fwd: in the subject line was also used as a filter.
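As an illustration only, the de-duplication and filtering steps described above could be sketched in Python as follows. This is not the eDiscovery tool's method, which is not described here. The fingerprint fields, the `LOW_VALUE_DOMAINS` set and the `triage` function are all hypothetical, and the domains are placeholders.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sender domains judged likely to produce non-business emails.
LOW_VALUE_DOMAINS = {"newsletter.example.com", "alerts.example.com"}


def fingerprint(raw: str) -> str:
    """Hash the sender, date, subject and plain body of one raw email.

    Assumption: two emails with identical values for these fields are
    treated as duplicates. A real tool may use different criteria.
    """
    msg = message_from_string(raw)
    # Multipart bodies are ignored in this sketch to keep it short.
    body = "" if msg.is_multipart() else str(msg.get_payload())
    key = "\n".join(
        [
            str(msg.get("From", "")),
            str(msg.get("Date", "")),
            str(msg.get("Subject", "")),
            body,
        ]
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def triage(raw_messages):
    """Split raw emails into (kept, duplicates, low_value) lists."""
    seen = set()
    kept, duplicates, low_value = [], [], []
    for raw in raw_messages:
        digest = fingerprint(raw)
        if digest in seen:
            # An identical email was already processed.
            duplicates.append(raw)
            continue
        seen.add(digest)

        msg = message_from_string(raw)
        _, address = parseaddr(str(msg.get("From", "")))
        domain = address.rpartition("@")[2].lower()
        subject = str(msg.get("Subject", "")).lower()

        # Filter on sender domain, or on "Fwd:" appearing in the subject.
        if domain in LOW_VALUE_DOMAINS or "fwd:" in subject:
            low_value.append(raw)
        else:
            kept.append(raw)
    return kept, duplicates, low_value
```

Calling `triage` on a list of raw email strings returns the emails to keep for further review, the duplicates, and the low-value emails. De-duplication runs first, so the filters are only applied to emails that remain.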
Next, we tried a second approach on the sample, searching the remaining emails for key search terms.
Using a third approach, we applied additional contextual information to the emails so that they could be grouped by areas of responsibility within the organisation. This allowed us to assess and prioritise the emails to be kept long term.
Of the 4.6 million emails in the sample, we found that 43% were duplicates and 7% were of low value.
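To put these percentages in absolute terms, the short calculation below converts them to approximate email counts. It assumes both percentages are shares of the full 4.6 million sample, which the figures alone do not confirm; the variable names are illustrative.

```python
SAMPLE_SIZE = 4_600_000   # emails in the sample
DUPLICATE_PCT = 43        # percentage found to be duplicates
LOW_VALUE_PCT = 7         # percentage found to be low value (assumed share of the full sample)

# Integer arithmetic avoids floating-point rounding in the counts.
duplicates = SAMPLE_SIZE * DUPLICATE_PCT // 100
low_value = SAMPLE_SIZE * LOW_VALUE_PCT // 100
disposal_candidates = duplicates + low_value
remaining = SAMPLE_SIZE - disposal_candidates

print(f"Duplicates:          {duplicates:,}")
print(f"Low value:           {low_value:,}")
print(f"Disposal candidates: {disposal_candidates:,}")
print(f"Remaining:           {remaining:,}")
```

Under that assumption, roughly 1.98 million emails are duplicates and about 0.32 million are low value, giving around 2.3 million emails (50% of the sample) as candidates for disposal.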
The eDiscovery tool allowed us to identify emails eligible for disposal, and to assess and prioritise the remaining emails, with between 98% and 100% accuracy. Up to 50% of the sample was identified for potential disposal. The tool also allowed us to apply additional metadata to every email in the set, making emails easier to identify at a high level and supporting future decisions about retention.
An eDiscovery tool may be used to help agencies reduce their email backlogs and unlock greater value from their email assets, though manual testing on a larger sample is recommended before implementing disposal. Note that an eDiscovery tool may be beyond the means of smaller agencies that nonetheless struggle with similar email backlog issues. An investigation into email backup for smaller agencies, along with potential testing of free, open source solutions, is recommended.
For more information about the PoC project outcomes, please download our proof of concept summary report.
Please contact David Brown, Assistant Director Government Services, firstname.lastname@example.org for further information.