I was looking into the new, probably AI, data center being built in town and noticed it’s being built by a private-equity-backed firm. The data center was rejected by the city and has to operate with a standard corporate building water supply. They said they’re switching to air cooling only and reducing the compute capacity to keep power usage the same. This has caused Amazon, the alleged operator, to back out. So they’re building a giant reduced-capacity data center with no operator and apparently still think that’s a good idea. My understanding of the private equity bubble is that the firms can hide “under-performing” assets because it’s all private. From what I read, possibly $3.2 trillion of it. I feel like this new data center is going on the “under-performing” pile.
I would not call PE a “bubble”. It’s not something people are just tossing money into because there are nebulous promises and the numbers are going up. PE is involved in EVERYTHING - restaurants, housing, tech, manufacturing, finance, marketing. It’s not an industry, just a way of investing that bypasses pretty much all of the safeguards and regulations societies have put in place for public trading. And I don’t expect it to “pop”. Either it continues, and all of the wealth continues to be concentrated towards the top, or the populace manages to take enough power back to get legislation, regulation, and enforcement to add transparency and rules to private equity.
The thing with PE is they only invest what they’re willing to lose, which the vast majority of their investments do lose, but the tiny fraction that succeed make enough money to fund profits and cover the losses.
If 95% of the companies in the stock market lost money, that’d be the end of days, but that’s because, generally, by the time you graduate to an IPO you have to be pretty profitable.
I’m not going to come running to the defense of private equity (PE) firms, but compared to so-called AI companies, the PE firms are at least building tangible things that have an ostensible alternative use. A physical data center building – even one located far away from the typical metropolitan areas that have better connectivity to the world’s fibre networks – will still be an asset with some utility when/if the AI bubble pops.
In that scenario, the PE firm would certainly take a haircut on their investment, but they’d still get something because an already-built data center will sell for some non-zero price, with possible buyers being the conventional, non-AI companies that just happen to need some cheap rack space. Looking at the AI companies though, what assets do they have which carry some intrinsic value?
It is often said that during the California Gold Rush, the richest people were not those who staked out the best gold mining sites, but those who sold pickaxes to the miners. At least until gold fever gave way to the sober realization that it was overhyped. So too would PE firms pivot to whatever comes next, selling off their remaining interest from the prior hype cycle and moving on to the next.
I’ve opined before that because no one knows when the bubble will burst, it is financially dangerous both to 1) invest in that market segment and 2) exit from that market segment. And so if a PE firm has already bet most of the farm, then they might just have to follow through with it and pray for the best.
I acknowledge your point about alternate uses, but we also need to look at a data center we may or may not need as a “power consumption plant.” These jackasses just keep loading and loading up the grid, looking to make a private dollar on public infrastructure. It’s wasteful and not necessarily a baseline Good Thing™ even if AI goes flop.
Used for AI, I agree that a faraway, loud, energy-hungry data center comes with a huge host of negatives for the locals, to the point that I’m not sure why they keep getting building approval.
But my point is that in an eventual post-bubble-puncture world where AI has had its market correction, there will be at least some salvage value in a building that already has power and data connections. A loud, energy-hungry data center can be tamed into something quiet and energy-sipping depending on what hardware it’s filled with. Remove the GPUs and add some plain servers and it’s a run-of-the-mill data center, the likes of which have been neighbors to urbanites for decades.
I suppose I’d rehash my opinion as such: building new data centers can be wasteful, but I think changing out the workload can do a lot to reduce the impacts (aka harm reduction), making it less like reopening a landfill, and more like rededicating a warehouse. If the building is already standing, there’s no point in tearing it down without cause. Worst case, it becomes climate-controlled paper document storage, which is the least impactful use-case I can imagine.
It’s important to note that in some previous bubbles, the leftovers of the crash ended up spurring beneficial new growth afterward.
GPU-like computing power available at scale for essentially free after the AI crash could be used in all sorts of potential ways.
Maybe it makes rendering movies with special effects super cheap, and available even to tiny indie studios. Maybe scientists grab it for running physics simulations or disease treatment computations.
The problem is that the depreciation/obsolescence/lifetime cycles of GPUs are WAY more rapid than anyone in the “AI” circlejerk bubble is willing to admit. Aside from the generational upgrades that you tend to see in GPUs, which make older models far less valuable as an investment, server hardware simply cannot function at peak load indefinitely - and running GPUs at peak load constantly MASSIVELY shortens the MTBF.
TL;DR: the way GPUs are used in ML applications means that they tend to cook themselves WAY quicker than the GPU you have in your gaming machine or console - as in, they often have a couple of years of lifetime, max, and that failure rate is a bell curve.
You’re pulling shit out of your ass at this point. There are some doom reports from people suggesting that may be a problem, but there are also reports from other companies (Meta, for example) with documentation saying the rate is much lower and the mean time to failure is 6+ years.
The other leftovers from the crash also won’t have that problem. It’s not just about GPUs. Data centers and their infrastructure last a lot longer, and the electric generation/transmission networks will also potentially be useful for various alternative applications if the AI use case flops.
MTBF is absolutely not six years if you’re running your H100 nodes at peak load and heat-soaking the shit out of them. ML workloads are especially hard on GPU RAM, and sustained heat load on that particular component type is known to degrade performance and integrity.
As to Meta’s (or MS’s, or OpenAI’s, or what have you) doc on MTBF: I don’t really trust them on that, because they’re a big player in the “AI” bubble, so of course they’d want to give the impression that the hardware they’re using in their data centers still has a bunch of useful life left. That’s a direct impact on their balance sheet. If they can misrepresent extremely expensive components that they have a shitload of as still being worth a lot, instead of essentially being salvage/parts only, I would absolutely expect them to do that. Especially in the regulatory environment in which we now exist.
I mean, we really don’t have the data to prove this either way.
Meta’s training of the Llama 3 405B model had a 1.34% failure rate for GPUs over the 54 days it ran, across 16,387 GPUs. It’s not likely that all of those faults led to bricked hardware either; they could have just lost part of their performance or memory.
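For rough scale, here’s a back-of-envelope extrapolation of that 1.34%-over-54-days figure, assuming a constant per-GPU hazard rate. That assumption is mine, not Meta’s, and it ignores bathtub effects entirely, so treat it as scale math rather than a lifetime model:

```python
import math

# Extrapolate the quoted Llama 3 run numbers under a constant daily hazard.
# Pure back-of-envelope; real GPUs don't fail at a constant rate.
failed_fraction = 0.0134   # 1.34% of GPUs hit a fault during the run
run_days = 54

# Daily hazard implied by that loss rate over 54 days.
daily_hazard = -math.log(1 - failed_fraction) / run_days

for years in (1, 3, 5):
    surviving = math.exp(-daily_hazard * years * 365)
    print(f"{years} yr: ~{(1 - surviving) * 100:.0f}% faulted, "
          f"~{surviving * 100:.0f}% still healthy")
```

Under that assumption you lose roughly a third of the fleet over five years, which is a long way from either “bricked in two years” or “fine for a decade.”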
The real question is whether that result holds over the long term; with hardware like this there’s often a bathtub curve for failures. If the units used were brand new, many of the failures could have just been the initial wave of failures, and there could be a long period of relative stability that hadn’t even been seen yet.
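For anyone who hasn’t run into the term: a bathtub curve is just a failure hazard that starts high (infant mortality), flattens out, and then climbs again as parts wear out. A toy sketch with completely made-up parameters, only to show the shape:

```python
# Toy bathtub-shaped hazard: decreasing (infant mortality) + constant
# (random faults) + increasing (wear-out) Weibull components.
# The parameters are invented for illustration, not fitted to any GPU data.
def weibull_hazard(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t_years):
    infant = weibull_hazard(t_years, shape=0.5, scale=2.0)    # falls over time
    random_faults = 0.02                                      # flat baseline
    wear_out = weibull_hazard(t_years, shape=4.0, scale=6.0)  # rises late
    return infant + random_faults + wear_out

for t in (0.1, 0.5, 1, 2, 4, 6, 8):
    print(f"t = {t:>3} yr  hazard ≈ {bathtub_hazard(t):.2f} / yr")
```

If a 54-day window sits in the left-hand wall of that curve, it tells you very little about year four.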
GPU-based coin mining demonstrated that GPUs often had a lifespan of over 5 years of constant use before failure, on consumer cards, often in less-than-ideal operating conditions.
GPUs as used for genAI aren’t really suitable for normal loads like aerodynamic simulations. GenAI uses low-precision data types like FP8 and FP4, and Blackwell and the like are optimized for that so hard that you can’t really do anything else on them.
They’re still somewhat functional for those workloads, and they can even use those low-precision units to emulate higher precision using libraries like cuBLAS, though obviously not as fast as hardware that can do it natively.
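The general trick, at least in principle (this is a sketch of the split-and-recombine idea, not the actual cuBLAS implementation, which I haven’t seen), is to represent each high-precision value as a sum of low-precision pieces and recombine the partial products in a wider accumulator:

```python
import numpy as np

# Emulating higher precision from low-precision pieces, on the CPU:
# split each float64 into a float32 "hi" part plus a float32 "lo" remainder,
# then rebuild the dot product from partial products accumulated in float64.
# GPU libraries do the analogous thing with tensor-core-friendly formats;
# this is only the principle, not anyone's production code.

def split(x64):
    hi = x64.astype(np.float32)                             # coarse part
    lo = (x64 - hi.astype(np.float64)).astype(np.float32)   # rounding leftover
    return hi, lo

rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

a_hi, a_lo = split(a)
b_hi, b_lo = split(b)

exact = np.dot(a, b)              # full float64 reference
naive32 = np.dot(a_hi, b_hi)      # plain low-precision dot product

# Low-precision inputs, wide accumulation; the tiny lo*lo term is dropped
# because it sits below the remaining error anyway.
f64 = np.float64
emulated = (np.dot(a_hi.astype(f64), b_hi.astype(f64))
            + np.dot(a_hi.astype(f64), b_lo.astype(f64))
            + np.dot(a_lo.astype(f64), b_hi.astype(f64)))

print("float32-only error  :", abs(naive32 - exact))
print("split/emulated error:", abs(emulated - exact))
```

The emulated version does three low-precision passes instead of one, which is exactly the “slower than native” trade-off.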
It’s not like they can’t do it at all. It’s just that Hopper was better at FP64 than Blackwell is, but if Blackwell chips become effectively free due to an AI crash, then you could likely still use them in that capacity.
It’s hacky and wrong. Or you could use different hardware, maybe from the competition, because the result isn’t worth the electricity it used.
Cloud compute was attractive for 3D rendering for a while, since you could put your non-urgent renders on the cloud at the lowest priority and take advantage of off-peak pricing. Now model-training demand has wiped out off-peak pricing and pushed the cost of cloud rendering way above rendering locally.
Running those power-hungry GPU-based data centers is going to cost beaucoup bucks regardless of the usage.
The power costs are nothing compared to the hardware costs for those things.
Getting the massive amount of power required is difficult for data centers, but the power cost per unit of compute is actually quite low.
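Back-of-envelope, with round numbers that are all assumptions (roughly an H100-class card, a typical industrial electricity rate, and a modest cooling overhead; plug in your own figures):

```python
# Rough electricity vs. hardware cost for one accelerator over its life.
# Every number here is an assumption for illustration, not a quoted price.
card_price_usd = 30_000          # assumed purchase price, H100-class
board_power_w = 700              # assumed sustained draw under heavy load
pue = 1.3                        # assumed datacenter overhead (cooling etc.)
electricity_usd_per_kwh = 0.08   # assumed industrial rate
years = 5

kwh = board_power_w / 1000 * 24 * 365 * years * pue
energy_cost = kwh * electricity_usd_per_kwh

print(f"energy over {years} yr: {kwh:,.0f} kWh ≈ ${energy_cost:,.0f}")
print(f"hardware: ${card_price_usd:,} "
      f"(energy ≈ {energy_cost / card_price_usd:.0%} of hardware cost)")
```

With those assumptions the five-year power bill comes out to roughly a tenth of the card price; the grid buildout is the hard part, not the running cost per unit of compute.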
Absolutely, yes. I didn’t want to elongate my comment further, but one odd benefit of the Dot Com bubble collapsing was all of the dark fibre optic cable laid in the ground. Those would later be lit up, to provide additional bandwidth or private circuits, and some even became fibre to the home, since some municipalities ended up owning the fibre network.
In a strange twist, the company that produced a lot of this fibre optic cable and nearly went under when the bubble popped – Corning Glass – would later become instrumental in another boom, because their glass expertise meant they knew how to produce durable smartphone screens. They are the maker of Gorilla Glass.
Rack space is literally the only valuable thing that would be left. Those GPUs are useless for non-LLM computation, between the optimization of the chips and the massive amounts of soldered RAM. They are purpose-made, and they were also manufactured cheaply, without common longevity and endurance design features. They will degrade and start failing after less than 5 years or so. Most would be inoperable in a decade. Those data centers are massive piles of e-waste, an absolute misuse of sand.
Racks/cabinets, fiber optic cables, PDUs, CAT6 (OOBM network), top-of-rack switches, aggregation switches, core switches, core routers, external multi-homed ISP/transit connectivity, megawatt three-phase power feeds from the electric utility, internal power distribution and step-down transformers, physical security and alarm systems, badge access, high-strength raised floor, plenum spaces for hot/cold aisles, massive chiller units.
Yes, that’s rack space. It is not even half the cost of a data center. I know because I’ve worked in data centers and read the financial breakdowns of those materials. They are also useless without actual servers and depreciate in value really fast.
Why would you be curious about an unreliable, easily manipulated economic pile of shit like the stock market? “Value” hardly means anything anymore.
Very valid question



