Rowe, N. C. (2016) Identifying forensically uninteresting files in a large corpus. EAI Endorsed Transactions on Security and Safety, 3 (7). e2. ISSN 2032-9393
Available under License Creative Commons Attribution No Derivatives.
Abstract
For digital forensics, eliminating the uninteresting is often more critical than finding the interesting. We discuss methods exploiting the metadata of a large corpus. Tests were done with an international corpus of 262.7 million files obtained from 4018 drives. For malware investigations, we show that using a Bayesian ranking formula on metadata can increase malware recall by 5.1 times while increasing precision by 1.7 times over inspecting executables alone. For more general investigations, we show that requiring two of nine criteria for uninteresting files, with exceptions for some special interesting files, can exclude 77.4% of our corpus. For a test set that was manually inspected, 0.18% of interesting files were identified as uninteresting and 29.31% of uninteresting files were identified as interesting. The generality of the methods was confirmed by separately testing two halves of our corpus. This work provides both new uninteresting hash values and programs for finding more.
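The "two of nine criteria, with exceptions" rule mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the nine criteria or the exceptions, so every criterion, exception, field name, and threshold below is a hypothetical placeholder, and the function `is_uninteresting` is an illustrative name rather than anything taken from the paper's programs. Only the overall shape (exceptions override, otherwise count how many criteria match) follows the abstract.

```python
from typing import Callable, Dict, List

FileMeta = Dict[str, object]          # one file's metadata record (hypothetical schema)
Criterion = Callable[[FileMeta], bool]


def is_uninteresting(meta: FileMeta,
                     criteria: List[Criterion],
                     exceptions: List[Criterion],
                     min_votes: int = 2) -> bool:
    """Return True when a file can be excluded from further inspection."""
    # Exceptions take priority: a file matching any of them is always kept.
    if any(exc(meta) for exc in exceptions):
        return False
    # Otherwise require at least `min_votes` independent criteria to agree.
    votes = sum(1 for crit in criteria if crit(meta))
    return votes >= min_votes


# Hypothetical stand-ins for the paper's criteria (three shown instead of nine).
criteria: List[Criterion] = [
    lambda m: m["hash_drive_count"] >= 5,          # same content seen on many drives
    lambda m: m["extension"] in {"dll", "ttf"},    # extension typical of system files
    lambda m: m["path"].startswith("windows/"),    # located in an OS directory
]

# Hypothetical exception: mail stores are kept regardless of the vote.
exceptions: List[Criterion] = [
    lambda m: m["extension"] in {"eml", "pst"},
]

system_file = {"hash_drive_count": 12, "extension": "dll",
               "path": "windows/system32/x.dll"}
user_report = {"hash_drive_count": 1, "extension": "docx",
               "path": "users/a/report.docx"}
mail_store = {"hash_drive_count": 9, "extension": "pst",
              "path": "windows/mail.pst"}
single_vote = {"hash_drive_count": 7, "extension": "jpg",
               "path": "users/a/p.jpg"}

for name, meta in [("system_file", system_file), ("user_report", user_report),
                   ("mail_store", mail_store), ("single_vote", single_vote)]:
    print(name, is_uninteresting(meta, criteria, exceptions))
```

Requiring agreement between at least two criteria, instead of trusting any single one, is what keeps a lone weak signal (such as a common hash on a user photo) from excluding a file; the exception list is checked first so that it cannot be outvoted.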
Item Type: | Article |
---|---|
Uncontrolled Keywords: | digital forensics, metadata, files, corpus, data reduction, hashes, triage, whitelists, classification, malware, camouflage. |
Subjects: | H Social Sciences > H Social Sciences (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Depositing User: | EAI Editor IV |
Date Deposited: | 26 Mar 2021 13:51 |
Last Modified: | 26 Mar 2021 13:51 |
URI: | https://eprints.eudl.eu/id/eprint/2047 |