
CUI Identification: The Overwhelming Challenge of Manual Detection in Enterprise Environments

  • Feb 17
Manual CUI identification in enterprises with 94 million+ files

The CUI program, established by Executive Order 13556, standardizes the safeguarding of unclassified information that requires protection across executive branch agencies and their contractors. DoD Instruction 5200.48 provides specific guidance for designation, marking, handling, dissemination, and training related to CUI. Defense contractors supporting DoD contracts must apply these requirements to all information systems that process, store, or transmit CUI, including cloud environments such as Microsoft 365.


Key Challenges in CUI Identification


Despite clear policy intent, manual CUI identification remains exceptionally difficult for defense contractors. The official CUI Registry lists over 120 distinct categories, each carrying unique marking, dissemination, and handling rules. Contractors must evaluate every document, email, drawing, and spreadsheet against these categories, often without uniform agency guidance.


Legacy markings such as For Official Use Only or Sensitive But Unclassified frequently require manual re-evaluation and potential re-marking as CUI, adding significant time and error risk. Determining system scope and boundaries is another persistent challenge, as CUI may exist in unmarked files, subcontractor portals, Teams channels, or email threads developed in support of contracts. Department of Defense Office of Inspector General audits have repeatedly identified weaknesses in contractor compliance, including inadequate verification of CUI protection measures and inconsistent marking practices. These findings demonstrate that manual processes are inherently prone to human error, oversight, and inconsistency, particularly in large-scale environments.
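
As a rough illustration of what a first-pass check for legacy markings might look like, the sketch below searches text for the two legacy labels named above. The abbreviations FOUO and SBU, the pattern list, and the function name are assumptions made for this example, not an official or complete marking list. A hit only flags a file for human re-evaluation; it does not determine whether the content is CUI.

```python
import re

# Hypothetical pattern list: the two legacy labels named in the text,
# plus their common abbreviations (an assumption for this sketch).
LEGACY_MARKINGS = re.compile(
    r"\b(?:FOUO|SBU|For Official Use Only|Sensitive But Unclassified)\b",
    re.IGNORECASE,
)


def find_legacy_markings(text: str) -> list[str]:
    """Return each legacy marking found in the text, in order of appearance."""
    return [match.group(0) for match in LEGACY_MARKINGS.finditer(text)]


sample = "UNCLASSIFIED//FOUO\nDrawing 12 is For Official Use Only."
print(find_legacy_markings(sample))
```

Even a simple check like this depends on the marking being present as extractable text, which is not the case for unmarked files or unprocessed scans.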


The Scale of Enterprise Data: A Practical Illustration


To illustrate the scale of the challenge, consider a defense contractor with 4,000 employees licensed for Microsoft 365 E3 or equivalent plans. SharePoint Online, a primary repository for technical data packages, engineering drawings, contracts, and compliance documentation, allocates 1 TB base storage plus 10 GB per licensed user, totaling approximately 41 TB of tenant storage. This represents a standard configuration per Microsoft documentation, though organizations may purchase additional storage.


Assuming the storage contains typical defense-related files (PDFs of technical data, engineering drawings, spreadsheets, Word documents, and scanned records), industry benchmarks indicate an average file size of around 435 KB. This results in roughly 94 million files across the 41 TB environment.
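
The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes); the function names are illustrative only.

```python
TB = 10**12  # decimal terabyte in bytes (assumption for this sketch)
GB = 10**9
KB = 10**3


def tenant_storage_bytes(users: int, base_tb: int = 1, per_user_gb: int = 10) -> int:
    """SharePoint Online pooled storage: base allocation plus a per-user amount."""
    return base_tb * TB + users * per_user_gb * GB


def estimated_file_count(storage_bytes: int, avg_file_kb: float) -> int:
    """Number of files that fit if every file has the average size."""
    return round(storage_bytes / (avg_file_kb * KB))


storage = tenant_storage_bytes(4_000)
files = estimated_file_count(storage, 435)
print(f"{storage / TB:.0f} TB, about {files / 1e6:.0f} million files")
```

The file count is only as good as the average file size, so it is best read as an order-of-magnitude figure.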


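Manually reviewing each file for CUI indicators at a conservative rate of 20 seconds per file requires approximately 1.88 billion seconds. This equates to:


  • About 31.3 million minutes
  • About 522,000 hours
  • About 21,760 days, or close to 60 years, of continuous, around-the-clock review
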
For a single compliance or security professional working an 8-hour day, with approximately 250 working days per year (accounting for holidays, leave, and weekends), the effort translates to over 65,000 workdays, or approximately 260 years.
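
The conversion from file count to review effort can be sketched as follows. The defaults mirror the figures in the text (20 seconds per file, 8-hour days, 250 working days per year); the function name and the returned keys are illustrative assumptions.

```python
def review_effort(
    files: int,
    seconds_per_file: float = 20,
    hours_per_day: float = 8,
    workdays_per_year: float = 250,
) -> dict[str, float]:
    """Convert a file count into continuous and single-reviewer timelines."""
    total_seconds = files * seconds_per_file
    hours = total_seconds / 3600
    workdays = hours / hours_per_day
    return {
        "total_seconds": total_seconds,
        "continuous_years": hours / 24 / 365,  # around-the-clock review
        "workdays": workdays,
        "working_years": workdays / workdays_per_year,
    }


effort = review_effort(94_000_000)
print(f"{effort['total_seconds']:,.0f} seconds")
print(f"{effort['continuous_years']:.1f} years of continuous review")
print(f"{effort['workdays']:,.0f} workdays, {effort['working_years']:.1f} working years")
```

Dividing the working years by the number of reviewers gives a rough team timeline, which ignores training, quality checks, and files added during the review.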


Real-world file-size variations amplify the impracticality. Using the same decimal units, smaller files, such as plain text technical notes or email exports at 50 KB, could increase the count to around 820 million files, extending continuous review to roughly 520 years and a single reviewer's effort to nearly 2,300 working years. Larger files, such as high-resolution 5 MB scanned engineering drawings or CAD exports, might reduce the count to roughly 8.2 million files, shortening continuous review to about 5 years, or around 23 years for a single reviewer. Even in the most favorable scenario, manual processes remain unsustainable for any defense contractor subject to CMMC Level 2 or higher requirements.
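
To see how sensitive the estimate is to average file size, the sketch below recomputes the scenario for the three sizes discussed. It assumes decimal units (41 TB = 41 × 10^12 bytes); with binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes) the file counts come out several percent higher. The function name and parameters are illustrative.

```python
def scenario(
    avg_file_bytes: float,
    storage_bytes: float = 41 * 10**12,  # decimal TB, an assumption of this sketch
    seconds_per_file: float = 20,
) -> tuple[float, float, float]:
    """Return (file count, continuous years, single-reviewer working years)."""
    files = storage_bytes / avg_file_bytes
    hours = files * seconds_per_file / 3600
    continuous_years = hours / (24 * 365)
    working_years = hours / (8 * 250)  # 8-hour days, 250 working days per year
    return files, continuous_years, working_years


for label, size in [("50 KB", 50e3), ("435 KB", 435e3), ("5 MB", 5e6)]:
    files, continuous, single = scenario(size)
    print(
        f"{label:>6}: {files / 1e6:6.1f} million files, "
        f"{continuous:6.1f} continuous years, {single:7.1f} working years"
    )
```

The spread across these three rows is about two orders of magnitude, yet even the smallest result exceeds two decades for one reviewer.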


Why Manual Processes Are Unsustainable


Additional complexities compound the challenge. SharePoint version history multiplies review effort if historical versions must be assessed. OneDrive for Business provides 1 TB per user, potentially adding another 4,000 TB across the organization if fully utilized, nearly 100 times the 41 TB SharePoint allocation, with review timelines scaling accordingly. Emails, Teams chats, and subcontractor-shared portals often contain unmarked CUI, requiring separate discovery processes. Human factors such as fatigue, context switching, and interruptions further reduce accuracy and efficiency.


The cumulative effect is clear: manual CUI identification cannot scale to modern enterprise data volumes while meeting the rigorous standards of NIST SP 800-171 and CMMC. Contractors risk audit failures, false negatives in CUI discovery, unauthorized disclosure, and loss of competitive positioning.


Why Automation is Essential for CUI Identification


Automated, purpose-built CUI identification solutions address these limitations directly. Solutions like Teramis scan environments systematically, detect CUI indicators with high precision, classify data against relevant categories, and provide continuous monitoring for new or modified instances. By automating discovery and classification, these solutions reduce human burden, minimize errors, ensure consistent application of marking and protection rules, and generate auditable evidence for CMMC assessments.


Implementing such technology aligns with the intent of federal requirements to protect CUI effectively without imposing impractical manual burdens. Defense contractors can focus resources on core mission activities while maintaining compliance and security posture.


In conclusion, the volume and complexity of data in defense contractor environments render manual CUI identification unsustainable and high-risk. Transitioning to automated scanning and monitoring is essential for achieving and demonstrating compliance in an increasingly regulated and data-intensive landscape.



