Every business has assets on its balance sheet that determine its long-term value. Property, plant, and equipment depreciate, but they house the operations we run every day. Customer lists generate revenue, and some even carry monetary valuations because, depending on the industry, contact information is worth its weight in gold. Trademarks and brands build moats around your customers and against your competition. Even non-compete and non-solicitation agreements for executives are valuable assets for businesses trying to protect "trade secrets" or strategies their competitors may try to mimic. These are what historical models focused on when valuing a business, and PE firms still look at them closely. However, there is now a new asset class that is going to tip the scales toward higher valuations, and almost no mid-market company has it yet. That asset is a properly built dimensional data model.
This is not an IT project that just tracks data in a data warehouse. It is not just using Power BI, Adaptive Insights, or Tableau to visualize the data that has been collected. It's a deliberate tool that defines how a business's strategy is actually executed. It translates what success looks like for each individual business, how those KPIs connect to one another, and what questions can be asked across all the systems that connect them. The companies that have built this asset early are about to compound their growth over the next 5 to 10 years. The ones that think this is just an IT or Finance project are quickly going to realize that all their future investments in AI will not produce any return. Worse, those investments will lead to more confusion than clarity.
Why Now
The shift to data integrity and data readiness has already been top of mind for large acquisitions at PE firms like TPG, New Mountain Capital, Blackstone, and Bain. In a private sale process I was a part of in 2024, in the first week, some of these top-level PE firms were already asking how interconnected our data model was on the back end. They weren't asking to evaluate our reporting strength or IT system stack. They wanted to know how they could leverage our data in other AI investments built into their portfolio companies. Those questions weren't being asked in 2021 in the conversations I was a part of. In 2021, they were asking the typical questions around EBITDA, customer concentration, market strength, cash flow, and backlog or recurring revenue already contracted. In 2024, and now in 2026, they are asking about data architecture as if it's a standalone asset, because it's increasingly becoming its own asset, and one that can become the most valuable asset in the history of the company.
These questions are coming from their direct clients too. A 2026 FirmLever analysis of accounting firm M&A projects that firms with documented technology and data infrastructure will trend toward 2.5x revenue at exit, against an industry baseline of 1.0x. They describe these multiples as "unheard of for a traditional firm." This isn't the only report echoing the trend. Bain reports that technology valuations across all industries rose more than 75% year-over-year through 2025, with nearly half of the larger deals involving AI-native companies or citing AI benefits in the deal memo. The premium for being a data-prepared company is being paid right now, and will continue to be paid as AI improves by the week.
What changed, specifically, is the usage of AI. EY-Parthenon explicitly added AI as a "third pillar of value enhancement" in private equity, alongside financial engineering and operational excellence. This enhances the reliance on a clean and usable data model. AI didn't create the importance of the data model overnight. Anyone who's tried running an operation over $50M in recurring revenue has known the importance of clean, joinable, dimensional data for decades. What AI did was turn the data model from "important" to a true gating item that determines whether companies can deploy AI tools quickly. The companies that spent the time and money to build their data architecture before 2024 can now deploy AI tools that work. The companies that didn't are deploying AI tools that only work half the time, and getting outputs that mislead rather than inform.
This is exactly what Harvard Business Review discussed in June 2025: "few report significant returns on generative AI investments so far." The PE world is investing heavily in AI to stay on top of the growing industry, but the returns are uneven. There are two reasons for this. Wolf and Co names the first one directly: every company using AI must have clean, reliable data with clear governance, and the large majority of companies don't. AI deployed on broken data amplifies the brokenness instead of fixing it. The second reason is harder to see, but more important. Most of the AI investment going on right now is aimed at the wrong jobs: visual content creation, marketing copy, custom websites, and automating manual processes that should have been automated years ago. All useful, but none of it is transformative. The transformative use of AI is in accelerating customer acquisition, operational decisions, the delivery of the right information to the right customer or vendor, and the pace at which a business can act on what its data is telling it, and in identifying similar materials that are cheaper to use across your entire portfolio for cost savings. These all require a data model the AI can reason with and make sense of. Most companies don't have one.
The major firms have figured this out. The largest portfolio companies and corporate buyers are operationalizing it. The mid-market companies, between $15M and $200M in revenue, that make up the large majority of the U.S. economy, are about two years behind where they need to be. That gap is closing, and it's closing fast in one direction. The companies that build this asset early aren't going to be slightly more efficient than the ones that don't. They are going to be operating at a completely different tempo, on data their competitors can't even query, making moves their competitors won't see coming for another 6 to 12 months.
The Two Layers of a Working Data Model
A working data model has two layers, and most mid-market companies have neither of them built correctly.
The first layer is the dimensions. These are the building blocks used to reason about the business, the lenses every measurement gets sliced through. The second layer is the structural properties that determine whether the model actually functions across those dimensions. Both layers depend on a prior decision: the model has to be aligned to the strategy of the business, not to its industry. Two construction companies with similar revenue can need completely different data models if one is built on service contracts and the other on large projects. The service-contract business is fundamentally measuring recurring relationships across time. The project business is fundamentally measuring discrete delivery against scope. The same dynamic shows up everywhere. A SaaS company built around enterprise contracts has different fundamental dimensions than one built around self-serve volume, even at the same revenue. A manufacturer built around long-cycle equipment sales operates differently than one built around consumable supply. Two healthcare practices in the same specialty can be built around acute episodes or around chronic-care relationships, and the data model each needs is unrecognizable to the other. The word "customer" means different things in each. The unit of measurement is different. The questions the model has to answer are different. A data model built without this strategic alignment looks correct on paper and produces useless output in practice.
This is why the data model can't be bought. No off-the-shelf product, no enterprise data platform, no AI tool knows what your strategy is. Strategic alignment is human work, and it has to happen before either layer of the framework can be built.
Layer 1: The Dimensions
Every business has a small set of core dimensions it slices, measures, and reasons about. There are eight that apply to nearly every mid-market company.
Customer
Who buys from you. Sometimes a person, sometimes a building, sometimes a corporate parent that contains many sub-customers. The hardest of the eight to define cleanly because customer hierarchies are almost always more complex than they appear.
Product or Service Type
What you sell. The dimension that determines whether you're really three businesses pretending to be one, and whether your margin story is what the dashboard says it is.
Geography or Location Detail
Where the work or the business happens. Region, market, building, jobsite, server region, customer location. The dimension that surfaces whether your business is truly scaling or just expanding into new markets that haven't started underperforming yet.
Vendor
Who you buy from. Underweighted in most mid-market data models, which is why most mid-market companies have no idea what their vendor concentration risk actually is.
Contract or Work Item
The unit of revenue commitment. A service contract, a project, a work order, a recurring subscription. Different businesses have wildly different shapes here, and the choice of granularity matters enormously.
Division or Company
The legal or operational entity. Critical for any multi-entity business and especially critical for any roll-up. Most platforms treat division as an afterthought instead of as a core dimension.
Employee or Team
Who does the work. Technician, salesperson, project manager, account team, line operator. The dimension that determines whether you can answer questions about productivity, training ROI, and which people drive which outcomes.
Time
When everything happened. Sounds simple, isn't. Time is the dimension that breaks most often in mid-market data models because the source systems change while the business keeps running, and yesterday's customer table doesn't match today's.
These are dimensions, not measures. Revenue is a measure. Margin is a measure. Renewal rate is a measure. The dimensions are the lenses you slice those measures through, and a working model lets you ask any measure across any combination of dimensions. Revenue by customer by product type by geography by quarter. You should get an answer in seconds, not weeks.
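The distinction between dimensions and measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, customer names, and revenue figures are hypothetical, and `slice_measure` is a made-up helper, not part of any product mentioned in this article.

```python
from collections import defaultdict

# Hypothetical fact rows: each row carries dimension values plus one measure (revenue).
FACTS = [
    {"customer": "Acme", "product_type": "Service", "geography": "Midwest", "quarter": "2025-Q1", "revenue": 120_000},
    {"customer": "Acme", "product_type": "Project", "geography": "Midwest", "quarter": "2025-Q1", "revenue": 80_000},
    {"customer": "Birch", "product_type": "Service", "geography": "South", "quarter": "2025-Q1", "revenue": 50_000},
    {"customer": "Acme", "product_type": "Service", "geography": "Midwest", "quarter": "2025-Q2", "revenue": 130_000},
]


def slice_measure(facts, measure, dimensions):
    """Sum one measure across any combination of dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)  # the "lens" for this row
        totals[key] += row[measure]
    return dict(totals)


# The same measure, viewed through two different combinations of dimensions.
by_customer_quarter = slice_measure(FACTS, "revenue", ["customer", "quarter"])
by_product = slice_measure(FACTS, "revenue", ["product_type"])
```

The point of the sketch is that the function never changes. Only the list of dimensions does, which is what "any measure across any combination of dimensions" looks like once the dimensions are agreed on.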
The single most common mid-market data failure isn't a tooling failure. It's that the company never decided what its dimensions actually were in the first place. They have a CRM that defines customer one way, an ERP that defines the same customer differently, an HR system that uses a third notion (clients, accounts, contracts), and a project system with a fourth. Four definitions of one dimension means the dimension doesn't really exist. There are just four almost-versions of it that don't reconcile, and the data model has nothing to dimensionalize against.
This work, defining the eight dimensions, getting every system to agree on those definitions, and rebuilding the historical record so the dimensions hold across time, is slow, political, and where most data modernization projects fail. It's also the work AI cannot do, because the question of what your dimensions should be is identical to the question of how your business actually operates.
Layer 2: The Structural Properties
Once the dimensions are defined and aligned, five structural properties determine whether the model functions. Three are make-or-break. Two are supporting.
One Way to Count. The data model's job is to make sense of everyone's processes in business terms. Your CRM has a unique customer list based on sales processes and commission planning. Your ERP has a completely different customer list based on how the customer wants to be invoiced. Your data warehouse has another list that combines them all but doesn't quite match any one of them, depending on what system was treated as the "system of record" at any given point in time. For something as simple as "who is our customer?" you can have at least three different answers, which will drastically change your reporting results. Then decisions are once again being made on gut feel rather than reality.
Your data model should translate, consolidate, and make sense of all the processes that are occurring in the company for each individual process. Not just the major ones. This is where most mid-sized companies fail in their data model build. One "exception" to the rule compounds on another, and now you live in exception reporting instead of reality reporting. These challenges can be as simple as "who is our customer?" or as complex as "what type of work did we give to that customer?"
The data model needs to be rooted in how the business actually operates, not how we think the business operates. Another simple question that every business tends to get confused about and constantly debate is "what is service?" From a construction perspective, are you servicing the building or the equipment? In a surgeon's line of work, does reopening the same patient a second time count as a completely new surgery, or as servicing work that was already done? In a SaaS business, does a custom integration count as service, or only bug fixes and customer support tickets? In manufacturing, is service only when the line is shut down and the equipment itself needs work?
Something as simple as what type of work is being done for a customer leads to many questions. The answers to those questions become the base your data model is built on. The model needs to be rooted in how you view the business and what the strategic ambitions of the company are long term. If the goal is to continue growing a "service" line, then defining that early in the process is incredibly important and will dictate how your data model is built. You can only live off of one definition that everyone agrees upon. Otherwise, you will constantly be in a room full of people pointing fingers over which numbers to actually look at. Is it the CRM, the financials, or the random spreadsheet some department actually runs operations from?
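One way to picture "one definition that everyone agrees upon" is a crosswalk that maps every system-level identifier to a single agreed customer key. The sketch below is an assumption-laden illustration: the system names, identifiers, and the `CROSSWALK` table are hypothetical, and a real crosswalk would come out of the definitional work described above rather than a hard-coded dictionary.

```python
# Hypothetical crosswalk: every (system, local id) maps to one agreed customer key.
CROSSWALK = {
    ("crm", "ACCT-0042"): "CUST-001",
    ("erp", "BILL-7781"): "CUST-001",  # same customer, invoiced under a different id
    ("erp", "BILL-7782"): "CUST-001",  # second bill-to for the same customer
    ("crm", "ACCT-0107"): "CUST-002",
}


def canonical_customer(system, local_id):
    """Return the agreed customer key, or raise so gaps are surfaced, not guessed."""
    try:
        return CROSSWALK[(system, local_id)]
    except KeyError:
        raise LookupError(f"No agreed customer for {system}:{local_id}") from None


def count_customers(records):
    """Count distinct customers across systems using the single agreed definition."""
    return len({canonical_customer(r["system"], r["id"]) for r in records})


records = [
    {"system": "crm", "id": "ACCT-0042"},
    {"system": "erp", "id": "BILL-7781"},
    {"system": "erp", "id": "BILL-7782"},
    {"system": "crm", "id": "ACCT-0107"},
]
raw_ids = len({(r["system"], r["id"]) for r in records})  # ids as the systems see them
customers = count_customers(records)                      # customers as the business sees them
```

With this sample data the systems hold four identifiers that resolve to two customers. Raising on an unmapped identifier is a deliberate choice in the sketch: an exception is easier to fix than a silent fifth definition.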
Time Behaves Properly. The data model is where reality lives. Not your budget that FP&A wants you to compare against, not your CFO's latest forecast that got built in Excel two hours before the meeting started, not the rolling reforecast FP&A is constantly tweaking. The data model holds the actual history of what the business has done, and Time Behaves Properly means that history holds together as the business and its systems change.
That sounds easy. It isn't. Take 2025 year-to-date against 2024 year-to-date. You've lost customers and gained new ones, large projects on existing customers have ramped up, the market shifted because of the political environment, and midway through 2024 you acquired a competitor and didn't get them onto your systems for six to twelve months. Then there's the calendar itself. A 4-4-5 fiscal calendar adds a leap week every five or six years. February has different working days each year. A 4-week versus a 5-week pay cycle changes payroll expense by 25% in the same month, year over year. None of these are bugs. They're what actually happened in the business and the calendar. The data model has to absorb all of them and still produce trends that are real trends, not artifacts of how the data happened to be structured at different points in time.
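Two of the calendar effects above are easy to check with arithmetic. The sketch below counts Monday-to-Friday days in February for two years and computes the payroll swing between a four-week and a five-week month. It ignores holidays, and the weekly payroll figure is a made-up number used only for illustration.

```python
import calendar
from datetime import date


def weekdays_in_month(year, month):
    """Count Monday-Friday days in a month (holidays are ignored in this sketch)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(
        1
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() < 5  # 0-4 are Monday-Friday
    )


def pay_period_change(weekly_payroll, weeks_prior, weeks_current):
    """Relative change in monthly payroll expense caused only by the pay calendar."""
    prior = weekly_payroll * weeks_prior
    current = weekly_payroll * weeks_current
    return (current - prior) / prior


feb_2024 = weekdays_in_month(2024, 2)
feb_2025 = weekdays_in_month(2025, 2)
swing = pay_period_change(100_000, 4, 5)  # hypothetical weekly payroll of $100,000
```

February 2024 has 21 weekdays and February 2025 has 20, so a flat daily run rate already produces a year-over-year difference of roughly 5%. The payroll swing is 0.25, the 25% described above, with no change in headcount or wages.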
Most mid-market BI fails this test silently. The dashboard shows a five-year trend. The trend is wrong because something drifted between years two and three, and decisions get made on trend lines that aren't real.
A critical part of ensuring that time is behaving properly is identifying what data is comparable across time periods by your dimensions. For example, all newly acquired companies need to have their historical records reconstructed to match your data model going back at least four years. Without that work, the question "how is the platform performing year over year" is unanswerable for any company acquired in the last two years, and the company as a whole would always look like it's killing it because part of the business has no history to compare against. The historical reconstruction work is harder than the technical integration itself. It is also where almost all of the value of an accurate data model comes from. The same principle applies to identifying "standard" variances across time, so you can have accurate conversations about the reality of the business, not what a positive or negative trend line is suggesting.
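The effect of missing history on a year-over-year number can be shown with a small worked example. The entities and revenue figures below are hypothetical, and `None` stands in for an acquired company whose prior-year records have not yet been reconstructed.

```python
# Hypothetical revenue by entity and year; None means no reconstructed history yet.
REVENUE = {
    "Legacy Co": {2024: 10_000_000, 2025: 10_500_000},
    "Acquired Co": {2024: None, 2025: 4_000_000},
}


def reported_growth(revenue, prior, current):
    """Naive growth: missing prior-year history is silently treated as zero."""
    prior_total = sum(v[prior] or 0 for v in revenue.values())
    current_total = sum(v[current] or 0 for v in revenue.values())
    return (current_total - prior_total) / prior_total


def comparable_growth(revenue, prior, current):
    """Growth over only the entities that have figures in both periods."""
    comparable = [
        v for v in revenue.values()
        if v[prior] is not None and v[current] is not None
    ]
    prior_total = sum(v[prior] for v in comparable)
    current_total = sum(v[current] for v in comparable)
    return (current_total - prior_total) / prior_total


naive = reported_growth(REVENUE, 2024, 2025)
like_for_like = comparable_growth(REVENUE, 2024, 2025)
```

In this example the naive figure is 45% growth while the comparable figure is 5%. Excluding the acquired company is only a stopgap. Reconstructing its history is what lets it rejoin the comparison.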
Strategic Granularity. The data is granular enough to support the strategic decisions the business actually makes, but not so granular that the noise drowns out the signal. The right level depends entirely on what the business is trying to do. This is where most companies fail without realizing it. They have data, the data is clean, but it's aggregated to the wrong level for the decisions that matter, or it takes another mini project to get to the answers.
The failure looks like this. There is a monthly review where revenue has decreased for the 3rd month in a row. The CEO asks the question for the 3rd month, "why is our revenue decreasing?" The dashboard shows revenue and margin by service line by quarter, and it's clear there is a decline in a specific service line over the past 3 months. However, the renewal rate isn't there because renewal happens at the customer level, and the dashboard rolls customer up into service line. The head of sales throws out anecdotes as to why sales has not been as strong the past few months due to "margin constraints" or some other market factor. The head of finance says they will do a deep dive analysis on renewal rates and sales pipeline analysis to determine where the miss is for next month's review. Next month it doesn't get pulled together because the analyst who could pull it together is busy with the never-ending close cycle at a mid-market business, and even if they did, the answer would take three different system extracts and a manual reconciliation that nobody fully trusts. So the question goes unanswered, and we are now 4 months into discussing what the problem is, instead of discussing solutions to a problem. After enough of these, decisions stop being made on what the data shows and start being made on whoever in the room argues most confidently and sounds most convincing, or who brought "random" data to support their claim.
The simplest way to find your right granularity is to start with the questions you actually ask. Take your most important strategic question, then ask the three follow-up questions about what's driving the answer. The data model needs to be granular enough to answer all four. Then go one layer deeper than that, because the question you didn't anticipate is always the one that matters most. If your first question is "what is driving our service contract growth?" the follow-ups are something like "which customers are growing fastest," "which service types are they buying," and "which geographies are they in." That tells you the data model needs customer, service type, and geography as queryable dimensions, all rolled up to month or quarter. The layer deeper is whatever supports those dimensions: the technician who serviced the customer, the product mix on the contract, the specific job site rather than the metro region. You won't query it most weeks. The week you need it, it's there.
The purpose of a data model is to remove the reliance on the financial month-end close cycle, which always leaves decisions at least a month behind. A data model that is updated consistently throughout the week gives you answers before the finance and accounting team has finished its month-end close and built the traditional price-volume-mix analysis that typically confuses the room more than it explains what is happening in the business. This is also why time granularity should be weekly, not monthly. Weekly granularity lets you see what's occurring throughout the month instead of waiting for the end of the quarter to make a decision.
The operational decisions that matter most to a company are consistently three or four layers deeper than what was originally asked, but not at the level of what occurs every 15 seconds in a company (unless you are a security-based or SaaS-based business). That depth is the difference between meetings that produce decisions and meetings that produce next month's deep-dive analysis.
Those three are make-or-break. The other two are no less important, but they are downstream of getting the first three right.
Connectedness Across the Operational Picture. The HR system, the financial system, the operations system, and the sales system have to be joinable through the unified dimensions. The most valuable questions a business can ask require crossing these systems, and a model that can't cross them can only answer trivial questions. The most valuable cross-system question we built at Crete was "which technicians on which job types at which acquired companies produce the highest customer renewal rates." That question required joining HR, operations, sales, and customer success data, and it could only be asked because we had unified dimensions first. Without that foundation, the question was unanswerable. With it, the question revealed which technician training programs deserved investment across the platform.
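A cross-system question of this shape can be sketched as a join on shared dimension keys. Everything below is hypothetical and is not the implementation described above: the three extracts, the company names, the identifiers, and the renewal outcomes are invented to show why the join only works when technician and customer mean the same thing in every system.

```python
from collections import defaultdict

# Hypothetical extracts from three systems, already keyed on shared dimension ids.
HR = {"T1": {"company": "North HVAC"}, "T2": {"company": "South Mechanical"}}
OPERATIONS = [  # one row per completed job
    {"job_id": "J1", "technician_id": "T1", "job_type": "Maintenance", "customer_id": "C1"},
    {"job_id": "J2", "technician_id": "T1", "job_type": "Maintenance", "customer_id": "C2"},
    {"job_id": "J3", "technician_id": "T2", "job_type": "Repair", "customer_id": "C3"},
    {"job_id": "J4", "technician_id": "T2", "job_type": "Repair", "customer_id": "C4"},
]
RENEWALS = {"C1": True, "C2": True, "C3": True, "C4": False}  # customer success data


def renewal_rate_by(operations, hr, renewals):
    """Renewal rate per (company, technician, job type), joined on shared ids."""
    customers = defaultdict(set)
    for job in operations:
        company = hr[job["technician_id"]]["company"]  # HR joined via technician id
        key = (company, job["technician_id"], job["job_type"])
        customers[key].add(job["customer_id"])
    return {
        key: sum(renewals[c] for c in served) / len(served)  # renewals joined via customer id
        for key, served in customers.items()
    }


rates = renewal_rate_by(OPERATIONS, HR, RENEWALS)
```

If the operations system used a different technician id than HR, or a different customer id than the renewal data, both lookups would fail or, worse, match the wrong rows. The unified dimensions are what make the two joins safe.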
Auditable Data. When a number on a dashboard is wrong, you can trace it back to its source and find the error in under ten minutes. Every figure on every report has known provenance. This is the property that determines whether the team actually uses the data model or quietly works around it. At Crete, a number being wrong wasn't unusual. Data flowing in from 47 acquired companies is messy, and surprises are inevitable. What was unusual was that we could find errors fast. Trust collapses when you can't, and once trust collapses, the team builds shadow spreadsheets in Excel and the data model becomes irrelevant.
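Known provenance can be as simple as a reported figure that keeps a record of every source row it was built from. The class below is a minimal sketch of that idea, not a description of any particular tool. The `Figure` class, the system names, and the record ids are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    """A reported number that carries the source rows it was built from."""
    name: str
    value: float = 0.0
    sources: list = field(default_factory=list)

    def add(self, amount, system, record_id):
        """Add an amount and remember exactly where it came from."""
        self.value += amount
        self.sources.append({"system": system, "record_id": record_id, "amount": amount})

    def trace(self, system=None):
        """List contributing source rows, optionally limited to one system."""
        return [s for s in self.sources if system is None or s["system"] == system]


revenue = Figure("Q1 service revenue")
revenue.add(120_000, "erp", "INV-1001")
revenue.add(50_000, "erp", "INV-1002")
revenue.add(-5_000, "crm", "CREDIT-17")

crm_rows = revenue.trace("crm")  # the rows to inspect if the CRM side looks wrong
```

When the figure looks wrong, `trace` narrows the search to specific records in a specific system, which is the difference between a ten-minute fix and a multi-day reconciliation.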
These two layers, the eight dimensions and the five structural properties, are the structural definition of a working data model. None of them are technical problems. All of them are business problems with technical implications, and they're choices about what the company measures, how those measurements relate, and what questions the company is built to answer. AI can implement the model once it's defined, but AI cannot define it, because the question of what your dimensions should be is identical to the question of how your business actually operates. That's the human work this asset requires, and it's why building it well is one of the highest-leverage things a mid-market operator can spend the next two years on.
What To Do Monday Morning
The framework above has a specific first move, and it's the same whether you're a PE operating partner six months into a portfolio company's hold period or a founder-CEO running a $30M services business in your sixth year.
Don't start with the data.
Start with a half-day on the calendar, the CEO and the two or three people who run the operational core of the business in the room, and one question on the whiteboard: What are the three to five business questions this company needs to be able to answer in the next 18 months?
The answers should be specific and tied to the strategy. "Are we growing?" doesn't qualify. It's too abstract to drive a decision. "What is driving our service contract growth?" qualifies, because it's a real question a CEO asks, and answering it well requires the data model to work. So does "Which product lines are gaining margin and which are losing it?" Or "Which customer segments are renewing at higher rates than expected, and which aren't?" Or "Where are we acquiring customers we shouldn't be acquiring?" These sound like sentences a person would actually say in a meeting. They are also, underneath, sophisticated dimensional questions. Answering "what is driving our service contract growth" means slicing growth across customer, service type, geography, and time, then comparing what you find to what your strategy assumed.
Write the questions down. Look at them. For each one, ask whether your team could answer it cleanly tomorrow morning. If they could, the data model is doing its job. If they couldn't, and they usually can't, that's the gap.
That document, with the questions written out and the gap identified for each one, is the strategic alignment artifact. Everything downstream of it gets easier. Without it, every conversation about "should we invest in a data warehouse" is a conversation in the wrong direction. With it, the next several months of work practically prioritize themselves.
This is the work that has to happen before any system, any tool, any AI deployment. It's also the work that most CEOs and operating partners avoid, because it requires admitting which questions the business currently can't answer. That admission is the precondition for fixing it.
The Diagnostic
There's one question that diagnoses whether your data model is doing its job, and it's not a question for a vendor, a consultant, or a board deck. It's a question for your own team.
Pick the most important strategic question your business is currently trying to answer. The one that, if answered cleanly, would change a real decision in the next 12 months. Then ask your team to actually answer it. Not strategically. Operationally. Have them produce the actual numbers, with the actual dimensional breakouts, by tomorrow afternoon.
If you don't have a question yet, here are four to choose from. Each has the version a CEO would casually ask in a conversation, and the version that actually tests the data model.
The growth question
"What is driving our growth?"
The version that actually tests the model
"How much of our growth this year came from existing customers buying more versus new customers we acquired, and which products or services are doing most of that work?"
The profitability question
"Are we still making good money on our best products?"
The version that actually tests the model
"Across our top product lines, how is gross margin trending over the last two years, and is the trend the same across our biggest customers and our smallest?"
The retention question
"Are our customers staying with us?"
The version that actually tests the model
"Of the customers we acquired in 2023, how many are still customers today, and is the retention pattern different for the ones who came in through different channels?"
The portfolio question
"Which parts of the business are working and which aren't?"
The version that actually tests the model
"Across our locations or divisions, which ones are growing revenue but losing margin, and which are doing the opposite, and what's different about how they operate?"
Pick the one that matters most for your situation. Then run the test.
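As one illustration of what running the test involves, the model-testing version of the growth question reduces to a decomposition like the one below. It is a sketch under simple assumptions: revenue is already keyed by one agreed customer definition, and the customer names and figures are hypothetical.

```python
def decompose_growth(prior, current):
    """Split the revenue change into existing-, new-, and lost-customer parts."""
    existing = sum(current[c] - prior[c] for c in current if c in prior)
    new = sum(v for c, v in current.items() if c not in prior)
    lost = -sum(v for c, v in prior.items() if c not in current)
    return {"existing": existing, "new": new, "lost": lost, "total": existing + new + lost}


# Hypothetical annual revenue by customer.
REVENUE_2024 = {"Acme": 500_000, "Birch": 300_000, "Cedar": 200_000}
REVENUE_2025 = {"Acme": 600_000, "Birch": 280_000, "Dover": 150_000}

parts = decompose_growth(REVENUE_2024, REVENUE_2025)
```

Here total growth is $30,000, made up of $80,000 from existing customers, $150,000 from new customers, and a $200,000 loss from a customer that left. The arithmetic is trivial. What takes teams days is producing the two inputs with one consistent customer definition across both years.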
If your team can answer it cleanly, with the dimensional breakouts that matter, by tomorrow afternoon, your data model is doing its job. The framework above describes what you've already built. Use this article to communicate to your buyer or your board what you have and why it's valuable.
If they can't, and most teams cannot, the answer they bring back will be approximate, hedged, or accompanied by a list of caveats about why the systems aren't quite reconciled. They'll explain that this number comes from one system, that number comes from another, the two don't quite agree, and the reconciliation takes a few days. Or they'll go pull the data themselves into a spreadsheet, which is a sign you've been operating without a working data model for longer than anyone has acknowledged.
That hedge is the diagnostic. The gap between what you can answer in 24 hours and what you should be able to answer in 24 hours is exactly the work this article has been about. Closing that gap is the highest-leverage investment most mid-market businesses can make right now. The companies that close it are the ones that will compound.
The Window
Most mid-market businesses have a narrow window to do this work. The companies that start now, the ones that book the half-day, name their dimensions, run the diagnostic, and follow what they find, will compound for the next decade on an asset their competitors don't have. The companies that wait will discover, 18 to 36 months from now, that they're doing the same work under more pressure: in the middle of a sale process, after a failed AI deployment, or while a board demands answers the data can't yet produce. The window is open right now precisely because most companies haven't moved through it yet. It will close.
The reason this asset compounds rather than commoditizes is the same reason it can't be bought off the shelf. AI can implement a data model once it's defined, but AI cannot define one, because defining one is identical to deciding how the business actually operates. Which dimensions matter, which questions need answering, which trade-offs the company is willing to accept. That work is human work. It requires judgment about strategy that no model can possess on its own, because no model knows what the company is built to do.
This is why most of the AI investment happening right now is producing thin returns. The AI tools are pointed at the wrong jobs. Content generation, marketing automation, productivity hacks for technical staff. None of it transforms the business. The transformative use of AI requires a working data model underneath it, and most companies don't have one. The ones that do are about to compound for the next decade, on data their competitors can't query, with AI working on top of an asset their competitors don't own.
This is the work mid-market businesses can do over the next 18 to 24 months. Not buy. Not outsource. Build, deliberately, slowly, politically, with the right people in the room making strategic choices that match what the business is actually trying to be.
There is no other operational asset that pays back across enterprise value, decision speed, and AI leverage simultaneously. There is no other asset that gets more valuable as AI gets better. And there is no other asset that has to be built rather than bought.
That is what makes it the most important capital project in the room.