The Storage Problem Nobody Talks About
Ask an IT director how much storage their organisation is consuming in Microsoft 365 and you will almost always get the same answer: a shrug followed by a rough estimate. The reason is simple — Microsoft 365 storage grows silently. Every Teams channel file upload, every SharePoint document version, every OneDrive sync conflict, every shared mailbox attachment accumulates without any automatic cleanup mechanism. Over a period of three to five years, the average UK organisation accumulates an enormous volume of data that nobody has asked for and nobody is managing.
The shape of this problem is remarkably consistent across organisations of different sizes and sectors. SharePoint sites created for projects that ended two years ago remain live, populated with superseded documents and conflicting versions. OneDrive accounts belonging to former employees sit untouched, consuming their full allocation. Teams channels contain dozens of copies of the same presentation, uploaded by different team members at different stages of revision. Version history — enabled by default and rarely limited — stores thirty or forty iterations of documents that nobody will ever retrieve.
Microsoft's storage model amplifies this issue financially. Every Microsoft 365 tenant includes a base pool of 1TB plus 10GB per licensed user. An organisation with 200 users therefore has a 3TB pool included in their subscription. Beyond that threshold, Microsoft charges for additional storage — and the overage rates are not trivial. Organisations that have allowed storage to grow unchecked frequently discover they are paying a substantial recurring overage charge on top of their licence fees, for data that is almost entirely without ongoing business value. Duplicate files, old version histories, orphaned sites, and stale personal drives account for the overwhelming majority of this cost.
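The pool arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming decimal units (1TB = 1,000GB, which is what the 200-user example implies); the function names are hypothetical and are not part of any Microsoft or TSO API.

```python
GB_PER_TB = 1000  # assumption: decimal units, matching the 200-user example


def included_pool_tb(licensed_users: int,
                     base_tb: float = 1.0,
                     gb_per_user: float = 10.0) -> float:
    """Included tenant storage in TB: base pool plus a per-user allowance."""
    if licensed_users < 0:
        raise ValueError("licensed_users must be non-negative")
    return base_tb + licensed_users * gb_per_user / GB_PER_TB


def overage_tb(used_tb: float, licensed_users: int) -> float:
    """Storage consumed beyond the included pool, never negative."""
    return max(0.0, used_tb - included_pool_tb(licensed_users))


if __name__ == "__main__":
    print(included_pool_tb(200))  # 3.0, the pool quoted for 200 users
```

Anything returned by overage_tb is the volume that attracts additional charges; the price per terabyte is left out because the text does not state it.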
The problem is compounding. As organisations add users, run more projects, and generate more content, storage grows faster than included allocations. Without intervention, the storage line on the M365 bill only ever goes in one direction, and for a growing organisation the cost of inaction can double every two to three years.
Why Storage Bloat Is a Copilot Problem, Not Just a Budget Problem
When organisations deploy Microsoft Copilot, they are deploying an AI system that indexes and retrieves information from across the Microsoft 365 estate. Copilot does not distinguish between a document that was accurate last week and one that was superseded eighteen months ago. It cannot tell the difference between the final version of a contract and the fourteen drafts that preceded it. It draws on everything the user asking the question has permission to access, and in tenants with broad sharing settings that means almost everything.
The consequences for Copilot quality are significant and are only beginning to be understood. When a user asks Copilot to summarise the current pricing terms with a supplier, it may draw on three different versions of the master service agreement stored across two SharePoint sites and a Teams channel. When it is asked to generate a document in line with company policy, it may reference a policy document that was revoked and replaced. The result is not an error message — Copilot surfaces an answer that looks authoritative but is built on contradictory or outdated source material. These are the hallucinations that erode trust in AI tools and generate significant rework costs.
The research behind our 3x hallucination statistic reflects a consistent finding across Copilot deployments: tenants with high proportions of stale, duplicate, or orphaned content generate measurably worse Copilot outputs than tenants with clean, well-governed data. The underlying mechanism is straightforward — retrieval-augmented generation, the architecture Copilot uses to answer questions about your organisation's data, is only as good as the data it retrieves. Poor signal-to-noise ratio in the underlying content pool degrades response quality in proportion to the volume of noise.
Copilot agents built on messy data fail silently, which is the most dangerous failure mode of all. A Copilot agent configured to handle supplier queries, for example, may appear to function correctly while routinely drawing on outdated contract terms. The agent gives confident, well-structured answers. Those answers are wrong. A real example from a financial services deployment: a Copilot agent grounded in a tenant containing multiple overlapping versions of a commercial pricing schedule consistently quoted rates from an agreement that had been renegotiated and replaced. The error was only discovered when a client queried a discrepancy, by which point it had propagated through several client communications. Clean data is not a luxury for Copilot deployments. It is a prerequisite.
What TSO Reveals in a Single Scan
TSO — Tenant Storage Optimisation — is a read-only diagnostic tool that connects to your Microsoft 365 tenant via the standard Microsoft consent flow and produces a comprehensive picture of your storage estate in minutes. Unlike manual audits, which typically require weeks of effort from IT staff and produce incomplete results, TSO crawls every accessible node of the tenant automatically — SharePoint sites, OneDrive accounts, Exchange mailboxes, Teams channels — and aggregates the findings into a structured report that supports immediate action.
The scan surfaces several categories of information that are genuinely difficult to obtain through native Microsoft tooling. At the most basic level, it shows storage economics: your total actual usage versus your included pool versus the overage calculation at current Microsoft pricing. For many organisations, this single view is the first time anyone has seen the real storage bill in a format that connects consumption to cost. The numbers are frequently surprising — not because the data is hidden, but because Microsoft's native reporting makes it genuinely difficult to aggregate across workloads.
Beyond the headline numbers, TSO identifies the largest storage consumers broken down by SharePoint site, by user, and by file type. This decomposition is commercially useful because it identifies where cleanup effort will have the greatest impact. A single SharePoint site for a concluded project, for example, may account for 15% of total storage — and may be entirely safe to archive or delete. TSO highlights these high-value targets explicitly rather than leaving IT teams to work through exhaustive lists manually.
The most actionable output is the content quality analysis. TSO identifies files that have not been accessed for 12 months or more, the category that represents the bulk of the 68% statistic in our opening data. It surfaces duplicate files across sites, version histories that have grown beyond any reasonable retention requirement, and orphaned sites that have no active owner and have not been accessed in over a year. Each of these categories carries a specific cost and a specific cleanup recommendation.
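To make two of these categories concrete, the sketch below flags stale files and groups duplicates from a list of file metadata records. It is an illustration only: the FileRecord fields, the 365-day threshold, and the use of a content hash to detect duplicates are all assumptions, and nothing here describes how TSO itself performs the analysis.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record for one file in the tenant."""
    path: str
    size_bytes: int
    last_accessed: datetime
    content_hash: str  # assumption: any stable digest of the file content


def stale_files(records, as_of, max_age_days=365):
    """Files whose last access is older than the cutoff."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r for r in records if r.last_accessed < cutoff]


def duplicate_groups(records):
    """Groups of two or more files that share a content hash."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record.content_hash].append(record)
    return [group for group in by_hash.values() if len(group) > 1]


def reclaimable_bytes(groups):
    """Bytes freed if one copy per group is kept (same hash, same size)."""
    return sum(group[0].size_bytes * (len(group) - 1) for group in groups)


SAMPLE = [
    FileRecord("/sites/project-a/deck.pptx", 4_000_000, datetime(2021, 3, 1), "h1"),
    FileRecord("/teams/sales/deck.pptx", 4_000_000, datetime(2024, 5, 1), "h1"),
    FileRecord("/sites/hr/policy.docx", 1_000_000, datetime(2024, 6, 1), "h2"),
]

if __name__ == "__main__":
    as_of = datetime(2024, 7, 1)
    print([r.path for r in stale_files(SAMPLE, as_of)])
    print(reclaimable_bytes(duplicate_groups(SAMPLE)))
```

In the sample data, the 2021 copy of the deck is both stale and a duplicate, which is the kind of file that makes a safe early cleanup target.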
Finally, TSO produces three cleanup scenarios that give IT and finance leaders a range of options based on appetite for disruption. The Conservative scenario models a 10% storage reduction using only the lowest-risk cleanup actions — archiving orphaned sites and removing clearly superseded file versions. The Expected scenario models a 20% reduction by adding deduplication across key SharePoint sites. The Aggressive scenario models a 35% reduction through comprehensive cleanup including personal drive archiving and full version history management. Each scenario is costed against actual Microsoft pricing so that the financial case is immediate and verifiable.
The Economics: What Storage Cleanup Actually Saves
To make the financial case concrete, consider a representative UK organisation with 143 licensed Microsoft 365 users. Their included storage pool is 1TB base plus 1.43TB (10GB per user) — approximately 2.43TB total. TSO's scan reveals actual consumption of 29.9TB, driven by years of accumulated Teams content, multiple generations of SharePoint project sites, and OneDrive allocations that have never been subject to any lifecycle management. At Microsoft's current overage pricing, the organisation is paying approximately £66,000 per year above their included allocation for this excess storage.
Applying TSO's three cleanup scenarios to this organisation produces a concrete savings range. The Conservative approach — archiving orphaned sites, removing version history beyond ten versions per document, and archiving OneDrive accounts for leavers — reduces total consumption to approximately 26.9TB and eliminates approximately £11,000 of the annual overage. The Expected scenario, which adds deduplication across the five largest SharePoint site collections and removes files not accessed in three or more years, reduces consumption to approximately 23.9TB and saves approximately £18,000 annually. The Aggressive scenario — comprehensive cleanup including personal drive archiving, full deduplication, and version history limits of five per document — reduces consumption to approximately 19.4TB and saves approximately £29,000 per year.
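The consumption figures in this worked example follow directly from the scenario percentages, and can be reproduced with the short sketch below. It assumes each scenario is a flat percentage reduction applied to total consumption (10%, 20%, and 35%, as described earlier) and uses the 2.43TB pool from the example. It stops at terabytes on purpose: the pound figures depend on the overage price, which the text does not state, so savings are not modelled here. The names are hypothetical.

```python
# Scenario reductions as fractions of total consumption (from the text).
SCENARIOS = {"Conservative": 0.10, "Expected": 0.20, "Aggressive": 0.35}


def scenario_table(used_tb: float, pool_tb: float) -> dict:
    """Remaining consumption and remaining overage (both in TB) per scenario."""
    table = {}
    for name, reduction in SCENARIOS.items():
        remaining = used_tb * (1 - reduction)
        table[name] = {
            "used_tb": round(remaining, 1),
            "overage_tb": round(max(0.0, remaining - pool_tb), 1),
        }
    return table


if __name__ == "__main__":
    # 143-user example: 29.9TB consumed against a 2.43TB included pool.
    for name, row in scenario_table(29.9, 2.43).items():
        print(f"{name}: {row['used_tb']}TB used, {row['overage_tb']}TB over pool")
```

Even the Aggressive scenario leaves this organisation well above its included pool, which is why the scenarios reduce the overage rather than eliminate it.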
These savings are achievable within a single quarter. The cleanup actions required for the Conservative and Expected scenarios typically take one to two days of IT effort using Microsoft's native tools, guided by TSO's prioritised target list. The Aggressive scenario requires more careful change management — particularly around personal drive archiving, which benefits from a user communication campaign — but remains achievable within six to eight weeks for most organisations of this size.
The financial case does not stop at direct storage savings. The secondary benefit is Copilot quality uplift (faster responses, more accurate answers, fewer re-prompts), which is harder to quantify precisely but consistently observed across TSO deployments. Organisations that clean their tenant before or immediately after Copilot deployment report measurably higher user satisfaction scores and lower rates of the "Copilot gave me the wrong answer" complaints that drive disengagement and adoption stall. For organisations already paying for Copilot licences, the return on one to two days of cleanup effort is immediate and compounding.
"TSO gave us a complete picture of our M365 storage for the first time. We reduced our SharePoint footprint by 28% in six weeks and cut our overage bill entirely." — IT Director, UK Financial Services
How to Run Your First TSO Scan
TSO is designed to be accessible to IT teams without specialist data engineering skills. The tool operates on a read-only basis throughout — it never modifies, moves, or deletes any content in your tenant, and no data leaves your Microsoft 365 environment at any point. The scan is a diagnostic tool, not a cleanup tool: it shows you what is there and what it costs, and gives your team the information needed to take action using Microsoft's own native management capabilities.
Connecting TSO to your tenant takes approximately two minutes. The process uses Microsoft's standard OAuth consent flow — the same mechanism used by thousands of enterprise applications connecting to Microsoft 365. An administrator with Global Admin or SharePoint Admin rights authorises the connection, and TSO receives read-only access to the storage metadata it needs for the analysis. The consent screen clearly lists the permissions requested. No credentials are stored; TSO uses a short-lived access token for the duration of the scan only.
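For readers unfamiliar with the consent flow, the sketch below builds the kind of URL an administrator is sent to when authorising an application through the Microsoft identity platform's admin consent endpoint. The endpoint path and query parameters are the documented ones; the client ID, redirect URI, and state value are placeholders, and TSO's actual app registration and requested permissions are not given in the text.

```python
from urllib.parse import urlencode


def admin_consent_url(tenant_id: str, client_id: str,
                      redirect_uri: str, state: str) -> str:
    """Build a Microsoft identity platform admin consent URL."""
    query = urlencode({
        "client_id": client_id,
        # .default requests the permissions configured on the app registration
        "scope": "https://graph.microsoft.com/.default",
        "redirect_uri": redirect_uri,
        "state": state,  # opaque value echoed back, used to match the response
    })
    return f"https://login.microsoftonline.com/{tenant_id}/v2.0/adminconsent?{query}"


if __name__ == "__main__":
    # All values below are placeholders, not TSO's real registration details.
    print(admin_consent_url(
        tenant_id="contoso.onmicrosoft.com",
        client_id="00000000-0000-0000-0000-000000000000",
        redirect_uri="https://example.com/consent/callback",
        state="abc123",
    ))
```

The permissions an administrator actually grants are whatever the consent screen lists, so that screen remains the place to confirm that only read-only scopes are being requested.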
For tenants of up to 10,000 users, the scan completes in minutes. Larger tenants complete within an hour. The scan runs asynchronously, so the administrator does not need to remain connected while it runs, and delivers a notification when the report is ready. For most UK mid-market organisations, the practical experience is: authorise the connection before lunch, and have a complete storage report waiting when you return.
The report is available for download in both PDF and CSV formats. The PDF is formatted for presentation to IT leadership or finance directors — a structured summary of findings, the three cleanup scenarios with their associated cost savings, and a prioritised list of cleanup actions. The CSV provides the underlying data at site and user level for IT teams who want to work through the detail or import findings into their own tooling. Both formats include everything needed to move directly from scan to action without any additional data gathering.
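As an example of working with the CSV export in your own tooling, the sketch below ranks sites by storage and shows each site's share of the total. The column names site_url and storage_gb are assumptions made for illustration; the actual CSV schema is not described in the text, so adjust them to match the real export.

```python
import csv
import io


def top_consumers(csv_text: str, limit: int = 3):
    """Return (site, storage_gb, percent_of_total) for the largest sites."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sizes = [(row["site_url"], float(row["storage_gb"])) for row in rows]
    total = sum(size for _, size in sizes)
    if total == 0:
        return []
    ranked = sorted(sizes, key=lambda item: item[1], reverse=True)
    return [(site, size, round(100 * size / total, 1))
            for site, size in ranked[:limit]]


# Hypothetical export with assumed column names.
SAMPLE_CSV = """site_url,storage_gb
/sites/project-apollo,450
/sites/finance,150
/sites/hr,400
"""

if __name__ == "__main__":
    for site, size, share in top_consumers(SAMPLE_CSV, limit=2):
        print(f"{site}: {size:.0f}GB ({share}% of total)")
```

A ranking like this is a quick way to confirm where the first cleanup effort should go before working through the full prioritised list.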
Conclusion
The connection between SharePoint storage hygiene and Microsoft Copilot performance is not a theoretical concern — it is a practical reality that every organisation deploying Copilot will encounter. Storage bloat inflates Microsoft 365 costs, degrades the quality of AI-generated answers, and creates the conditions for the kind of silent failures that erode trust in enterprise AI tools. TSO addresses both problems simultaneously: a single scan reveals the true cost of storage waste and provides the prioritised information needed to clean it, at a fraction of the time and effort required by any alternative approach. Clean data means a lower bill and a better Copilot. The two goals are not in tension — they are the same goal, addressed by the same action.