Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.
The post-Xitter web has spawned so many “esoteric” right wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.
(Semi-obligatory thanks to @dgerard for starting this.)
Maybe this is common knowledge, but I had no idea before. What an absolutely horrible decision from google to allow this. What are they thinking?? This is great for phishing and malware, but I don’t know what else. (Yeah ok, the reason probably has something to do with “line must go up”.)
I recall seeing this sort of thing from goog for about 12~18mo now - every so often a researcher’s post does the rounds where someone finds Yet Another way goog is fucking it up
the advertising dept has completely captured all mindshare and it is (demonstrably) the only part that goog-the-business cares about
CIDR 2025 is ongoing (Conference on Innovative Data Systems Research). It’s a very good conference in computer science, specifically database research (in CS, top conferences play the role journals do in other sciences). And they have a whole session on LLMs called “LLMs ARE THE NEW NO-SQL”
I didn’t have time to read the papers yet, believe me I will, but the abstracts are spicy
We systematically develop benchmarks to study [the problem] and find that standard methods answer no more than 20% of queries correctly, confirming the need for further research in this area.
(Text2SQL is Not Enough: Unifying AI and Databases with TAG, Biswal et al.)
Hey guys and gals, I have a slightly different conclusion: maybe a baseline 20% correctness is a great reason not to invest a second more of research time into this nonsense? Jesus DB Christ.
I’d also like to shoutout CIDR for setting up a separate “DATABASES AND ML” session, which is an actual research direction with interesting results (e.g. query optimizers powered by an ML model achieving better results than conventional query optimizers). At least actual professionals are not conflating ML with LLMs.
Polish commentary on Hitlergruß: https://bsky.app/profile/smutnehistorie.bsky.social/post/3lgaoyezhgc2c
Translation:
- it’s just a Hindu symbol of prosperity
- a normal Roman salute
- regular rail car
- wait a second
From the “flipping through LessWrong for entertainment” department:
What effect does LLM use have on the quality of people’s thinking / knowledge?
- I’d expect a large positive effect from just making people more informed / enabling them to interpret things correctly / pointing out fallacies etc.
You’d think the AI safety chuds would have more reservations about using GPT, which they believe has sapience, to learn things. They have the concept of an AI being a good convincer, so, hey, idiots, how have none of you considered that the great convincing has started? Also, how have none of you realised that maybe you should be a little harder to convince in general???
It is a long-established truth that it’s significantly easier to con someone who thinks they’re smarter than you. Also, thinking about it a little, there seems to be a reasonable corollary of their approach to Bayesian thinking: never question anything that matches your expectations, which is exactly how you get taken advantage of by the kind of grifter they’re attached to. Like, they’ve been thinking about the singularity for long enough that the Sams (Bankman-Fried, Altman, etc.) have a well-developed script for what they expect the first stages to look like, and it is, as demonstrated, very easy to fake that.
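To put a toy number on that corollary (my numbers, made up purely for illustration): in a straight Bayes update, evidence you already expected barely moves the posterior, which is what makes a scripted grifter so cheap to believe:

```python
# Toy Bayes update: posterior = P(E|H) * prior / P(E).
# If a grifter can fake the expected evidence about as reliably as
# the real thing would produce it, the likelihood ratio is ~1 and
# your belief barely moves -- "confirmation" carrying no information.

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

prior = 0.9  # "the singularity is starting" prior for a true believer

# Impressive demos are nearly as likely from a well-scripted faker
# as from the real thing (both values invented for this sketch):
print(posterior(prior, p_e_given_h=0.95, p_e_given_not_h=0.90))
# -> ~0.905: the posterior barely moved, but it *feels* like confirmation
```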
The findings revealed a significant negative correlation between frequent AI tool usage and critical thinking abilities, mediated by increased cognitive offloading
i think it was posted somewhere in techtakes https://www.mdpi.com/2075-4698/15/1/6
yeah, I posted it in the previous stubsack I think
https://xcancel.com/kailentit/status/1881476039454699630
“We did not have superintelligent relations with that…”
This is a thought I’ve been entertaining for some time, but this week’s discussion about Ars Technica’s article on Anthropic, as well as the NIH funding freeze, finally prodded me to put it out there.
A core strategic vulnerability that Musk, his hangers-on, and geek culture more broadly haven’t cottoned onto yet: Space is 20th-century propaganda. Certainly, there is still worthwhile and inspirational science to be done with space probes and landers; and the terrestrial satellite network won’t dwindle in importance. I went to high school with a guy who went on to do his PhD and get into research through working with the first round of micro-satellites. Resources will still be committed to space. But as a core narrative of technical progress to bind a nation together? It’s gassed. The idea that “it might be ME up there one day!” persisted through the space shuttle era, but it seems more and more remote. Going back to the moon would be a remake of an old television show that went off the air because people ended up getting bored with it the first time. Boots on Mars (at least healthy boots with a solid chance to return home) are decades away, even if we start throwing Apollo money at it immediately. The more outlandish ideas like orbital data centers and asteroid mining don’t have the same inspirational power, because they are meant to be private enterprises operated by thoroughly unlikeable men who have shackled themselves to a broadly destructive political program.
For better or worse, biotechnology and nanotechnology are the most important technical programs of the 21st century, and by backgrounding this and allowing Trump to threaten funding, the tech oligarchs kowtowing to him right now are undermining themselves. Biotech should be obvious, although regulatory capture and the impulse for rent-seeking will continue to hold it back in the US. I expect even more money to be thrown at nanotechnology manufacturing going into the 2030s, to try to overcome the fact that semiconductor scaling is hitting a wall, although most of what I’ve seen so far is still pursuing the Drexlerian vision of MEMS emulating larger mechanical systems… which, if it’s not explicitly biocompatible, is likely going down a cul-de-sac.
Everybody’s looking for a positive vision of the future to sell, to compete with and overcome the fraudulent tech-fascists who lead the industry right now. A program of accessible technology at the juncture of those two fields would not develop overnight, but could be a pathway there. Am I off base here?
This seems like yet another disconnect between the general public and myself, courtesy of however the fuck science communication has been failing us.
Like when you say space I think, fuck yeah, space! Those crisp pictures of Pluto! Pictures of black holes! The amazing JWST data! Gravitational waves detection! Recreating the conditions of the early universe in particle accelerators to unlock the secrets of spacetime! Just most amazing geek shit that makes me as excited as I was when I was 12 looking at the night sky through my cheap-ass telescope.
Who gives a single fuck about sending people up there when we have probes and rovers, true marvels of engineering, feeding us data back here? Did you know Voyager 1, Voyager Fucking ONE, an almost-50-year-old probe over 150 AU away from Earth, is STILL SENDING US DATA? We engineered the fuck out of that bolt bucket so that even the people who designed it are surprised by how long it’s lasted. You think a human would last 50 years in the interstellar medium? I don’t fucking think so.
We’re unlocking the secrets of the universe and confirming theories from decades ago; has there been a more exciting time to be a scientist? Wouldn’t you want to run a particle accelerator? Do science on the ISS? Be the engineer behind the next legendary probe that will benefit mankind even after you’re gone? If you can’t spin this into a narrative of technical progress and humans being amazing then that’s a skill issue, you lack fucking whimsy.
And I don’t think there’s a person in the world less whimsical than Elon fucking Musk.
It’s really about the ultimate white flight.
Agree with space travel being retro-futurist fluff. It’s very rich men badly remembering mediocre science fiction.
The US could lead the world in innovation in green technology but that’s now tainted by wokeness.
Hmm, any sort of vision for generating public support for development of a technology has to have either ideological backing or a profit incentive. I don’t say this to mean that the future must be profitable; rather, that you don’t get the space race if western powers aren’t afraid of communism appearing as a viable alternative to capitalism, on both ideological and commercial fronts.
Unfortunately, a vision of that kind is necessarily technofascist. Rather than look for a tech-forward vision of the future, we need to deprogram ourselves and unlearn the unspoken narratives that prop up capitalism and liberal democracy as the only viable forms of society. We need to dismantle the systems and structures that require complex political buy-in for projects that are clearly good for society at large.
Uh, I guess I’ve kind of gone completely orthogonal to your point of discussion. I’m kind of saying the collapse of the US is inevitable.
On another somewhat orthogonal point, I suspect AI has likely soured the public on any kinda tech-forward vision for the foreseeable future.
Both directly and indirectly, the AI slop-nami has caused a lot of bad shit for the general public - from plagiarism to misinformation, from shit-tier AI art to screwing human artists, the public has come to view AI as an active blight on society, and use of AI as a virtual “Kick Me” sign.
I’ve been struggling with what the appropriate level of engagement for all the tech shit is.
I can stick to making fun of the AI crap and whatever else the tech people shit out because it’s tangible for me, and I can more or less be an effective gatekeeper for my community, but the problems go beyond just a bunch of rich tech weirdos floating bad ideas, it’s what they’re trying to paper over. The fact that they’re incompetent at it is very funny, but I’ve been laughing with gritted teeth for too long.
I just want it all to stop.
For the US to avoid collapse, the Democrats would have to sweep the board in multiple successive elections and be more unified and committed to deep reform than they ever have been.
I will pause for the laughter to fade.
Snark answer: for the US to avoid collapse, the democrats will have to do literally anything, so yeah collapse is inevitable.
Optimistic answer: a third, actually leftist, anti-liberal party suddenly gains popularity and power and reforms the US entirely.
Realistic answer: trump and the republicans will fully construct a fascist chokehold over the US probably by the end of this year at the earliest. Anyone who has any hope in non-violent action is deluding themselves.
A necessary precondition for the Democrats to do anything is Democrats regaining the Senate, which pretty much requires winning a Senate seat in North Carolina, where the state supreme court is taking the attitude that no Democratic win is legitimate. So, yeah: There’s basically no institutional way for this country to come back from where it has gone.
In completely related news I’m strongly considering getting my affairs in order and moving ~~anywhere in the entire world besides the united states~~ somewhere in Europe, as it’s apparently no longer safe for trans people ~~or C++ developers~~* in the US. So if anyone has any advice (or job leads) please do share.

* This is a memory safety joke
from what I’ve been told, a digital nomad visa and EU citizenship by descent are a couple of routes worth looking into. I have frustratingly little detail on the expectations around the visa though, and citizenship by descent laws vary by country.
Estonia has an immigration thing for tech workers I believe!
as it’s apparently no longer safe for trans people or C++ developers
Sorry but Rust knowledge is now a hard requirement for visas so you better hit the book
might be relevant https://lemmy.world/post/21995141
No actually, I think what you have to say is in line with my broader point. As the top source of global consumer demand, America is primarily held together by its supply chains at this point. To be crude about it, the best reasons to be an American in the 21st century are the swag and the cheap gas. When the MAGA and Fox News crowd are pointing fingers and ranting about Marxism, they’re actively trying to obscure materialism and keep people from thinking about material conditions. Having a material program that at least has elements that can be built from the bottom up is at least as crucial as having an electoral program.

I know the Four Thieves people got rightfully shredded here a few weeks back, and that kind of technical pushback on amateur dreams is necessary, so it’s a tough needle to thread. But for instance, consider Gavin Newsom’s plan to have California operate its own insulin production, within existing systems and regulations: https://calmatters.org/health/2025/01/insulin-production-gavin-newsom/

This is a Newsom policy I actually think is a fantastic idea, and a big credit to him if it happens! But it’s bogged down in the production-line validation stage; we already know how to synthesize insulin and that it’s effective. And the production may not even be in California when it happens! There’s plenty of room for improvement here.
Space and centralized, rent-seeking “AI” are not material programs that improve conditions for the broader population. The original space program was successful because a more tightly controlled media environment gave the opportunity to use it to cover for the missile development that was the enduring practical outcome. Positive consumer outcomes from all that have always felt, to me, like something that was bolted onto the history later. We wouldn’t have Tang and transistors if not for Apollo! Well, one is kind of shitty and useless, the other is so overwhelmingly advantageous that it surely would have happened anyway.
And to your last point, I somewhat sadly feel like a lot of doomer shit I was reading ~15 years ago actually prepared me to at least be unsurprised about the situation we’re in. A lot of those writers (James Howard Kunstler, John Michael Greer for instance) have either softly capitulated, or else happily slotted themselves into the middle of the red-brown alliance. I think that’s a big part of why we’re at where we’re at: a lot of people who were actually willing to consider the idea of American collapse were perfectly fine with letting it happen.
what? the space race was a thinly disguised ICBM development program
ah, am conflating the cold war and the space race. Though, why the nations wanted to develop ICBMs is entirely relevant.
Reposting this for the new week thread since it truly is a record of how untrustworthy sammy and co are. Remember how OAI claimed that o3 had displayed superhuman levels on the mega-hard FrontierMath benchmark, developed with input from Fields Medalists? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months and funded its creation, and refused to let the creators of the test publicly acknowledge this until after OAI did their big stupid magic trick.
From Subbarao Kambhampati via LinkedIn:
"𝐎𝐧 𝐭𝐡𝐞 𝐬𝐞𝐞𝐝𝐲 𝐨𝐩𝐭𝐢𝐜𝐬 𝐨𝐟 “𝑩𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒂𝒏 𝑨𝑮𝑰 𝑴𝒐𝒂𝒕 𝒃𝒚 𝑪𝒐𝒓𝒓𝒂𝒍𝒍𝒊𝒏𝒈 𝑩𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌 𝑪𝒓𝒆𝒂𝒕𝒐𝒓𝒔” hashtag#SundayHarangue. One of the big reasons for the increased volume of “𝐀𝐆𝐈 𝐓𝐨𝐦𝐨𝐫𝐫𝐨𝐰” hype has been o3’s performance on the “frontier math” benchmark–something that other models basically had no handle on.
We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1–and that the benchmark creators were not allowed to disclose this *until after o3 *.
That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of “𝒐1/𝒐3 𝒘𝒆𝒓𝒆 𝒋𝒖𝒔𝒕 𝒃𝒆𝒊𝒏𝒈 𝒕𝒓𝒂𝒊𝒏𝒆𝒅 𝒐𝒏 𝒔𝒊𝒎𝒑𝒍𝒆 𝒎𝒂𝒕𝒉, 𝒂𝒏𝒅 𝒕𝒉𝒆𝒚 𝒃𝒐𝒐𝒕𝒔𝒕𝒓𝒂𝒑𝒑𝒆𝒅 𝒕𝒉𝒆𝒎𝒔𝒆𝒍𝒗𝒆𝒔 𝒕𝒐 𝒇𝒓𝒐𝒏𝒕𝒊𝒆𝒓 𝒎𝒂𝒕𝒉”–that the AGI tomorrow crowd seem to have–that 𝘖𝘱𝘦𝘯𝘈𝘐 𝘸𝘩𝘪𝘭𝘦 𝘯𝘰𝘵 𝘦𝘹𝘱𝘭𝘪𝘤𝘪𝘵𝘭𝘺 𝘤𝘭𝘢𝘪𝘮𝘪𝘯𝘨, 𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘭𝘺 𝘥𝘪𝘥𝘯’𝘵 𝘥𝘪𝘳𝘦𝘤𝘵𝘭𝘺 𝘤𝘰𝘯𝘵𝘳𝘢𝘥𝘪𝘤𝘵–is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don’t completely believe that OpenAI didn’t have access to the Olympiad/Frontier Math data before hand… )
I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )
𝑫𝒐𝒊𝒏𝒈 𝒘𝒆𝒍𝒍 𝒐𝒏 𝒉𝒂𝒓𝒅 𝒃𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌𝒔 𝒕𝒉𝒂𝒕 𝒚𝒐𝒖 𝒉𝒂𝒅 𝒑𝒓𝒊𝒐𝒓 𝒂𝒄𝒄𝒆𝒔𝒔 𝒕𝒐 𝒊𝒔 𝒔𝒕𝒊𝒍𝒍 𝒊𝒎𝒑𝒓𝒆𝒔𝒔𝒊𝒗𝒆–𝒃𝒖𝒕 𝒅𝒐𝒆𝒔𝒏’𝒕 𝒒𝒖𝒊𝒕𝒆 𝒔𝒄𝒓𝒆𝒂𝒎 “𝑨𝑮𝑰 𝑻𝒐𝒎𝒐𝒓𝒓𝒐𝒘.”
We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than “𝘸𝘦 𝘥𝘪𝘥𝘯’𝘵 𝘴𝘦𝘦 𝘵𝘩𝘢𝘵 𝘴𝘱𝘦𝘤𝘪𝘧𝘪𝘤 𝘱𝘳𝘰𝘣𝘭𝘦𝘮 𝘪𝘯𝘴𝘵𝘢𝘯𝘤𝘦 𝘥𝘶𝘳𝘪𝘯𝘨 𝘵𝘳𝘢𝘪𝘯𝘪𝘯𝘨” (see “In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities” https://lnkd.in/gZ2wBM_F ).
At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out commercial entities."
Big stupid snake oil strikes again.
Every time they go ‘this wasn’t in the data’ it turns out it was. A while back they did the same with translating rareish languages. Turns out it was trained on it. Fucked up. But also, wtf how are they expecting this to stay secret and there being no backlash? This world needs a better class of criminals.
The conspiracy theorist who lives in my brain wants to say it’s intentional, to make us more open to blatant cheating as something that’s just a “cost of doing business.” (I swear I saw this phrase a half dozen times in the orange site thread about this)
The earnest part of me tells me no, these guys are just clowns, but I dunno, they can’t all be this dumb right?
holy shit, that’s the excuse they’re going for? they cheated on a benchmark so hard the results are totally meaningless, sold their most expensive new models yet on the back of that cheated benchmark, further eroded the scientific process both with their cheating and by selling those models as better for scientific research… and these weird fucks want that to be fine and normal? fuck them
they can’t even really sell o3 - in o3 high mode, which is what’s needed for this level of query, it’s about $1000 per query lol
do you figure it’s $1000/query because the algorithms they wrote with their insider knowledge to cheat the benchmark are very expensive to run, or is it $1000/query because they’re grifters and all high mode does is use the model trained on frontiermath and allocate more resources to the query? and like any good grifter, they’re targeting whales and institutional marks who are so invested that throwing away $1000 on horseshit feels like a bargain
so, for an extremely unscientific demonstration, here (warning: AWS may try hard to get you to engage with Explainer[0]) is an instance of an aws pricing estimate for big handwave “some gpu compute”
and when I say “extremely unscientific”, I mean “I largely pulled the numbers out of my ass”. even so, they’re not entirely baseless, nor just picking absolute maxvals and laughing
~~parameters~~ assumptions made:

- “somewhat beefy” gpu instances (g4dn.4xlarge, selected through the tried and tested “squint until it looks right” method)
- 6-day traffic pattern, excluding sunday[1]
- daily “4h peak” total peak load profile[2]
- 50 instances minimum, 150 maximum (let’s pretend we’re not openai but are instead some random fuckwit flybynight modelfuckery startup)
- us west coast
- spot instances, convertible spot reserves, 3y full prepay commit (yeah I know full vs partial is a big diff; once again, snore)
(and before we get any fucking ruleslawyering dumb motherfuckers rolling in here about accuracy or whatever: get fucked kthx. this is just a very loosely demonstrative example)
so you’d have a variable buffer of 50…150 instances, featuring 3.2…9.6TiB of RAM for working set size, 800…2400 vCPU, 50…150 nvidia t4 gpus, and 800…2400GiB gpu vram
let’s presume a perfectly spherical ops team of uniform capability[3] and imagine that we have some lovely and capable active instance prewarming and correct host caching and whatnot. y’know, things to reduce user latency. let’s pretend we’re fully dynamic[4]
so, by the numbers, then
1y times 4h daily gives us 1460h (in seconds, that’s 5256000). this extremely inaccurate full-of-presumptions number gives us “service-capable life time”. the times your concierge is at the desk, the times you can get pizza delivered.
x3 to get to lifetime matching our spot commit, x50…x150 to get to “total possible instance hours”. which is the top end of our sunshine and rainbows pretend compute budget. which, of course, we still have exactly no idea how to spend. because we don’t know the real cost of servicing a query!
but let’s work backwards from some made-up shit, using numbers The Poor Public gets (vs numbers Free Microsoft Credits will imbue unto you), and see where we end up!
so that means our baseline:
- upfront cost: $4,527,400.00
- monthly: $1460.00 (x3 x12 = $52560)
- whatever the hell else is incurred (s3, bandwidth, …)
- ≈ $200k/y per ops/whatever person we have
3y of 4h-daily at 50 instances = 788400000 seconds. at 150 instances, 2365200000 seconds.
so we can say that, for our deeply Whiffs Ever So Slightly values, a second’s compute on the low instance-count end is $0.00574252 and $0.00191417 at the higher instance-count end! which gives us a bit of a handle!
this, of course, entirely ignores parallelism, n-instance job/load/whatever distribution, database lookups, network traffic, allllllll kinds of shit. which we can’t really have good information on without some insider infrastructure leaks anyway. if we pretend to look at the compute alone.
so what does $1000/query mean, in the sense of our very ridiculous and fantastical numbers? since the units are now The Same, we can simply divide things!
at the 50 instance mark, we’d need to hypothetically spend 174139.68 instance-seconds. that’s 2.0155 days of linear compute!
at the 150 instance mark, 522419.05 instance-seconds! 6.0465 days of linear compute!
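(and if you want to poke at this yourself, here’s the same chain of made-up arithmetic as a quick python script. same ass-pulled assumptions as above, nothing new added)

```python
# Back-of-envelope from the assumptions above; every input is one of
# the made-up values from this comment, not real infrastructure data.

HOURS_PER_DAY = 4                      # the "4h peak" daily load profile
SECONDS_PER_YEAR = HOURS_PER_DAY * 365 * 3600   # 5,256,000 s
YEARS = 3                              # matching the 3y spot commit
UPFRONT_USD = 4_527_400.00             # the calculator's upfront cost

for instances in (50, 150):
    inst_seconds = SECONDS_PER_YEAR * YEARS * instances
    usd_per_inst_second = UPFRONT_USD / inst_seconds
    query_inst_seconds = 1000 / usd_per_inst_second    # the $1000/query claim
    print(f"{instances} instances: {inst_seconds:,} inst-s, "
          f"${usd_per_inst_second:.8f}/inst-s, $1000 buys "
          f"{query_inst_seconds:,.2f} inst-s = "
          f"{query_inst_seconds / 86_400:.4f} days of linear compute")

# 50 instances:  788,400,000 inst-s,   $0.00574252/inst-s -> ~2.0155 days
# 150 instances: 2,365,200,000 inst-s, $0.00191417/inst-s -> ~6.0465 days
```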
so! what have we learned? well, we’ve learned that we couldn’t deliver responses to prompts in Reasonable Time at these hardware presumptions! which, again, are linear presumptions. and there’s gonna be a fair chunk of parallelism and other parts involved here. but even so, turns out it’d be a bit of a sizable chunk of compute allocated. to even a single prompt response.
[0] - a product/service whose very existence I find hilarious; the entire suite of aws products is designed to extract as much money from every possible function whatsoever, leading to complexity, which they then respond to by… producing a chatbot to “guide users”
[1] - yes yes I know, the world is not uniform and the fucking promptfans come from everywhere. I’m presuming amerocentric design thinking (which imo is probably not wrong)
[2] - let’s pretend that the calculators’ presumption of 4h persistent peak load and our presumption of short-duration load approaching 4h cumulative are the same
[3] - oh, who am I kidding, you know it’s gonna be some dumb motherfuckers with ansible and k8s and terraform and chucklefuckery
when digging around I happened to find this thread which has some benchmarks for a diff model
it’s apples to square fenceposts, of course, since one llm is not another. but it gives something to presume from. if g4dn.2xl gave them 214 tok/s, and if we make the extremely generous presumption that tok==word (which, well, no; cf. strawberry), then any Use Deserving Of o3 (let’s say 5~15k words) would mean you need a tok-rate of 1000~3000 tok/s for a “reasonable” response latency (“5-ish seconds”).

so you’d need something like 5x g4dn.2xl just to shit out 5000 words with dolphin-llama3 in “quick” time. which, again, isn’t even whatever the fuck people are doing with openai’s garbage.
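(same deal: here’s that napkin math as a script. the 214 tok/s is the figure from that thread; the latency target and word counts are my own hand-waving, and tok==word remains a generous lie)

```python
import math

# Napkin check: instances needed to hit a word target in "quick"
# wall-clock time. 214 tok/s is the g4dn.2xl dolphin-llama3 figure
# from the linked thread; 1 token == 1 word is the generous
# presumption from above (it isn't true in practice).

TOK_PER_SEC_PER_INSTANCE = 214
TARGET_LATENCY_S = 5                  # "5-ish seconds"

for words in (5_000, 15_000):
    needed_rate = words / TARGET_LATENCY_S               # tok/s required
    instances = math.ceil(needed_rate / TOK_PER_SEC_PER_INSTANCE)
    print(f"{words:>6} words in {TARGET_LATENCY_S}s -> "
          f"{needed_rate:.0f} tok/s -> ~{instances} instances")

#  5000 words -> 1000 tok/s -> ~5 instances
# 15000 words -> 3000 tok/s -> ~15 instances
```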
utter, complete, comprehensive clownery. era-redefining clownery.
but some dumb motherfucker in a bar will keep telling me it’s the future. and I get to not boop 'em on the nose. le sigh.
Yeah we would like to stop lying and cheating, but the number, you see.
They understand that all of the major model providers are doing it, but since the major model providers are richer than they are, they can’t possibly ask OpenAI and friends to stop, so in their heads, it is what it is and therefore must be allowed to continue.
Or at least, that’s my face-value read of it; I certainly hope I’m simplifying things too much.
also they are rationalists and hence the most gullible mfs on any of this stuff
But also, wtf how are they expecting this to stay secret and there being no backlash?
No, they bet on it not mattering and they’ve been completely right thus far.
it’s enough if it ends up not mattering long enough for them to cash out, then they don’t care
Ah right yes.