In a similar fashion, the Apple Podcasts app decided to download 120GB of podcasts for no apparent reason and never deleted them. It even showed up as "System Data" and made me look for external drive solutions.
Use whatever you like, but I don't think Podcasts app users are rare by any stretch of the imagination.
I use my MacBook for a mix of dev work and music production, and between Docker, music libraries, update caches and the like, it’s not unusual for me to need a fresh install once every year or two.
Once the disk fills up, it’s pretty much impossible to figure out where the giant block of used space is.
I should not have to hack through /Library files to regain space on a TB drive because OS X wanted to put 200GB of crap there in an opaque manner without giving the user ANY direct way to reclaim it.
You can enable "Calculate all sizes" in Finder's view options (Cmd+J). I think it only works in list view, however.
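Outside Finder, a short standard-library Python script can do roughly what `du -hs ~/Library/Caches/* | sort -rh` does: rank the entries of a folder by allocated size. This is a minimal sketch assuming macOS or Linux (it relies on `st_blocks`, counted in 512-byte units on both); the helper names are illustrative, unreadable entries are skipped silently, and unlike the shell glob `*`, `os.scandir` also includes dot-entries.

```python
import os

# Illustrative helpers, not from any existing tool.

def tree_bytes(path):
    """Allocated bytes for a file, or for every file under a directory (like du)."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_blocks * 512  # st_blocks is in 512-byte units
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # does not descend into symlinked dirs
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_blocks * 512
            except OSError:
                pass  # unreadable or deleted mid-walk: skip it (du would warn and continue)
    return total

def human(n):
    """Format a byte count roughly the way `du -h` does (1024-based units)."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n}{unit}" if unit == "B" else f"{n:.1f}{unit}"
        n /= 1024

def largest_children(path, limit=10):
    """Return (bytes, name) pairs for the biggest entries directly inside `path`."""
    sizes = []
    for entry in os.scandir(path):
        try:
            sizes.append((tree_bytes(entry.path), entry.name))
        except OSError:
            continue
    sizes.sort(reverse=True)
    return sizes[:limit]

if __name__ == "__main__":
    caches = os.path.expanduser("~/Library/Caches")
    if os.path.isdir(caches):
        for size, name in largest_children(caches):
            print(f"{human(size):>8}  {name}")
```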
du -hs ~/Library/Caches/*
And then, should you try to set up OneDrive (despite Microsoft's shenanigans, it does simplify taking care of non-tech-savvy relatives), it will refuse to sync the photos folder because 'it contains another cloud storage' and you'll genuinely wonder how or why anyone uses computers anymore.
I downloaded several macOS installers, not for the MacBook I use, but to create a partitioned USB installer (they were for macOS versions my current MacBook couldn't even run). Then, after creating the USB, since I was short on space, I deleted the installers, including from the Trash.
Weirdly, I did not reclaim any space, and I wondered why. After scratching my head for a while, I asked an LLM, which directed me to check the system snapshots. I had previously disabled Time Machine backups and snapshots, and yet I saw these huge system snapshots containing the files I had deleted, and the kicker was, there was no way to delete them!
Again I scratched my head for a while, looking for a solution other than wiping the MacBook and reinstalling macOS, and then I had the idea to just restart. Lo and behold, the snapshots were gone after restarting. I was relieved, but also pretty pissed off at Apple.
She said "Oh, you bought a toy computer. How cute!"
I've owned every architecture of Mac since then, and I still think of it as my toy computer.
Like, assuming they need the data and it's too large to fit conveniently in RAM, where and how should they store and access it if not on the primary disk?
But none of this applies to caches and temporary files, or to system-managed snapshots. Caches could reasonably be managed for 99% of users by adding a "clear all caches" checkbox to the reboot dialog, with a warning that doing this is likely to slow down the system and increase battery usage for the next few hours; snapshots mostly just need better UI and documentation.
UI transparency is my only real complaint. A reasonable amount of data the system wants to make difficult to delete is fine, so long as it clearly explains what it is and why. "System Data" is only acceptable as a description for the root of what should be a well-documented hierarchy.
Your friend is called ncdu and can be used as follows:
sudo ncdu -x -e --exclude Volumes /System/Volumes/Data/
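For anyone without ncdu installed, a rough standard-library equivalent of `ncdu -x --exclude Volumes` fits in one function. This is a sketch assuming macOS or Linux: it stays on the starting filesystem by comparing `st_dev`, prunes any directory named in `exclude` at any depth (its own simplification, not necessarily ncdu's exact pattern semantics), never follows symlinks, and counts each (device, inode) pair once so hard links aren't double-counted. SIP-protected paths will still be unreadable, even as root.

```python
import os

def disk_usage(top, exclude=("Volumes",)):
    """Illustrative helper: allocated bytes of files under `top`, roughly like `ncdu -x`.

    - stays on the filesystem `top` lives on (the -x behaviour)
    - skips any directory whose name is in `exclude`
    - never follows symlinks, and counts each (device, inode) once
    """
    top_dev = os.lstat(top).st_dev
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(top, followlinks=False):
        keep = []
        for name in dirnames:
            if name in exclude:
                continue
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == top_dev:
                    keep.append(name)
            except OSError:
                pass  # permission denied or vanished: leave it out
        dirnames[:] = keep  # pruning in place stops os.walk from descending
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            key = (st.st_dev, st.st_ino)
            if key not in seen:  # hard links share an inode: count it once
                seen.add(key)
                total += st.st_blocks * 512
    return total
```

Usage would be something like `disk_usage("/System/Volumes/Data")`, run with enough privileges to read what you care about.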
The exclude for Volumes is necessary because otherwise ncdu ends up in an infinite loop - "/Volumes/Macintosh\ HD/Volumes/" can be repeated ad nauseam and ncdu's -x flag doesn't catch that for whatever reason.
Spotify just gave up while trying to show me my podcasts. I can't listen to anything not already downloaded right now.
Yet at 3am I'll be able to download a 100GB LLM without difficulty onto the same device that can't stream a podcast right now.
Unfortunately I don't think 5G is the streaming panacea you have in mind. Maybe one day...
I also prompt Warp/Gemini CLI to identify unnecessary caches and similar data and delete them.
One would think that's an extremely common use case, and it will only grow the longer iMessage exists. Just offload them to the cloud; charge me for it if you want, but every other free messaging service has no problem doing that.
I don't know what the formula it uses is, but it's insufficient.
Note that if your Photos library is already larger than you want it to be, you may need to make sure it's synced, delete it, and create a new library on the drive. It will then sync with iCloud. But that's a hassle, and I would back up the library before you do this.
I'd rather pay for cloud space that I'm already using anyway than have it take up limited space on a laptop I can't upgrade.
I don't want to babysit my attachments or delete old conversations just because Apple doesn't put effort into that app. Probably my fault for still using it, but Telegram, WhatsApp and Signal all manage to do it better.
sudo du -sh ~/Library/Messages
Password:
du: /Users/cvaske/Library/Messages: Operation not permitted
Wow, SIP is a bit more insidious than I remember. Maybe I should try it in Terminal.app rather than a third party app... I wonder if there will ever be a way to tell the OS "this action really was initiated by the user, not malware, let them try to do what they say they want to do".
Edit: investigating a bit more, apparently the lack of a sudo-equivalent, an "elevate this one process temporarily" command, is intentional in the design, so that malicious scripts can't take advantage of that "this is really the user" approval path. I can't say I agree with that design decision, but at least it's an ethos.
That's one way to drive sales for higher priced SSDs in Apple products. I'm pretty sure that that sort of move shows up as a real blip on Apple's books.
Did you mean "than accurate" rather than "and accurate"? Having a more accurate issue description only sounds like a good thing to me.
Imagine a user has a vague idea of something that is broken: the LLM will then interpret their comment as whatever it thinks is the most likely underlying problem, without actually checking anything.
I’ve run into the issue trying to use Claude to instrument and analyze some code for performance. It would make claims like “around 500MB of RAM is being used in this allocation” without evidence.
> Filed via Claude Code
I assume part of it is true, but determining which part is true is the hard part. I’ve lost a lot of time chasing AI-written bug reports that were actually something else wrong with the user’s computer. I’m assuming the claims of “75% faster” and other numbers are just AI junk, but at least someone could verify if the 10GB VM exists.
So contrary to the GitHub issue, my problem is that there's not enough space. The fix is to navigate to ~/Library/Application\ Support/Claude/vm_bundles and ask Claude Code to upsize the disk to a sparse 60 GiB file, giving Cowork much more room to work in without immediately taking up 60 GiB.
Bigger picture, what this teaches me though, is that my knowledge is still useful in guiding the AI to be able to do things, so I'm not obsolete yet!
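The "sparse 60 GiB file" trick works because extending a file with `truncate` records a new size without writing any data, so the filesystem (APFS, ext4 and others) leaves a hole that takes no space until the guest writes into it. A minimal sketch, with an illustrative helper name; it deliberately refuses to shrink, since cutting a disk image short would destroy whatever the guest stored at its end.

```python
import os

def grow_sparse(path, new_size):
    """Illustrative helper: extend `path` to `new_size` bytes without writing data.

    Returns (apparent size, bytes actually allocated on disk).
    """
    with open(path, "ab"):  # create the file if it does not exist yet
        pass
    current = os.stat(path).st_size
    if new_size < current:
        raise ValueError(f"refusing to shrink {path} from {current} to {new_size} bytes")
    os.truncate(path, new_size)  # the new tail is a hole, not zero-filled blocks
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Something like `grow_sparse("<bundle>/disk.img", 60 * 1024**3)` would be the idea, but the image filename there is hypothetical: check what actually lives in `vm_bundles`, quit Claude first, and keep a backup. Growing the image file also doesn't by itself grow the partition or filesystem inside the guest, which typically has to be resized from within the VM.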
Yeah, that's a recipe for problems.
AI really gives users the ability to develop a complete product, but the quality is decreasing. Professional developers will be in demand once the products/features become popular.
The first batch of users of new products need to take on more responsibility for testing them, like lab rats.
Looking at the number of issues, outages and rookie mistakes the employees are making leads me to believe that most of them are below junior level.
If anyone were to re-interview everyone at Anthropic for their own roles with their own interview questions, I would guess that >75% of them would not pass their own interviews.
The only teams that would pass them are the Bun team and some of the other recently acquired startups.
Using 'software engineering benchmarks' and 'leaderboards' to mask those issues makes no sense in scenarios that require a rapid or urgent response. Even granting them, I would expect fewer outages, but it is in fact the opposite: one outage occurs and another appears right afterwards, often the very next day.
Try this if you have Claude Code: run `ls -a` in your home dir and see all the garbage Claude creates.
I also noticed this 10GB VM from Cowork, and was also surprised at just how much space various things seem to use for no particular reason. There doesn't seem to be any sort of cleanup process in most apps that actually slims down their storage, judging by all the cruft.
Even Xcode. The command line tools install and keep around SDKs for a bunch of different OSes, even though I haven't launched Xcode in months. It also keeps a copy of the iOS simulator even though I haven't launched one in over a year.
Not a new problem, unfortunately. DevCleaner is commonly used to keep it under control: https://github.com/vashpan/xcode-dev-cleaner
I kept telling myself this, BUT NEVER ELECTRON AGAIN.
this is the usual reason for divorce /s
If you'd like to verify for yourself: On your mac, right click on the Claude app icon and click on "Show Package Contents" and then navigate to Contents > Frameworks > Electron Framework.framework.
Might be virtualization woes or something adjacent.
Pondering... Noodling... Some other nonsense...
https://developer.hashicorp.com/vagrant is still a thing.
The market for Cowork is normals, getting to tap into an executive assistant who can code. Pros are running their consumer "claws" on a separate Mac Mini. Normals aren't going to do that, and offices aren't going to provision two machines to everyone.
The VM is an obvious answer for this early stage of scaled-up research into collaborative computing.
Some reasons aren't even optional. Small but regulated entities exist, and most "Team" sized businesses aren't in Google apps or "the cloud" as they think about it, but are in M365, and do pay for cyber insurance.
Cowork with skills plugins that leverage Python or bash is a remarkably enabling framework given how straightforward it is. A skill engineer can sit with an individual contributor domain expert, conversationally decompose the expert's toil into skills and subcommands, iterate a few times, and like magic the IC gets hours back a day.
Cowork is Agents-On-Rails™ for LLM apps, like Rails was to PHP for web apps.
The VM makes that anti-fragile.
For any SaaS builders reading this: by far most white collar small business work is in Microsoft Office. The scarce "Continue with Microsoft" OIDC reaches more potential SMB desks than the ubiquitous "Continue with Google" and you don't have to learn the legacy SAML dance.
Anthropic seems to understand this. It's refreshing when a firm discovers how to cater to the 25–150 seat market. There's an uncanny valley between early adopters and enterprise contracts, but the world runs on SMBs.
Sign them all up!
However, I’m also curious about using NixOS for dev environments. I think there’s untapped potential there.
containers contain stuff the way an open bookcase contains books, they're just namespaces and cgroups on a file system overlay, more or less, held together by willpower not boundaries:
https://jvns.ca/blog/2016/10/10/what-even-is-a-container/
https://github.com/p8952/bocker
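On Linux you can see the "just namespaces and cgroups" point directly: every process exposes its namespace memberships under `/proc/<pid>/ns` and its cgroup under `/proc/<pid>/cgroup`. A minimal Linux-only sketch (helper names are illustrative); run it on the host and inside a container, and differing `mnt`, `pid` or `net` identifiers are the entire "boundary". It won't run on macOS, where container runtimes sit inside a Linux VM.

```python
import os

def namespaces(pid="self"):
    """Illustrative helper: map namespace type -> identifier for a process (Linux only).

    Each entry in /proc/<pid>/ns is a symlink such as 'net:[4026531840]';
    processes on the same host share a namespace when those identifiers match.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # other processes' entries may need extra privileges
    return result

def cgroup(pid="self"):
    """Illustrative helper: raw cgroup membership lines for a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return f.read().splitlines()
```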
as a firm required to care about infosec, we appreciate the stance in their (2). and macOS VMs are so fast now, they might as well be containers except, you know, they work. (if not fast, that should be fixed.)
that said, yes, running local minikube and the like remain incredibly useful for mocking container envs where the whole environment is inside a machine(s) boundary. containers are _almost_ as awesome as bookcases…
On macOS, Lima has been a godsend. I have Claude Code in an image, and I just mount the directory I want the VM to have access to. It works flawlessly and has been a replacement for Vagrant for me for some time. Though, I owe a lot to Vagrant. It was a lifesaver for me back in the day.
Storage should be cheaper; complain about Apple making you pay a premium.
- Santa is a very common tool used by macOS admins to lock down binary and file access privileges for apps, usually on managed machines
- Disk Inventory X and GrandPerspective are well-known disk space usage tools for macOS (I personally use DaisyDisk but that requires a license)
- WizTree and WinDirStat are very common tools from Windows admin toolkits
The only one here I can say is potentially suspect is ClearDisk. I haven't used it before, but it does appear to be useful for specifically tracking down developer caches that eat up disk space.
Claude Cowork uses the Claude Code agent harness running inside a Linux VM (with additional sandboxing, network controls, and filesystem mounts). We run that through Apple's virtualization framework or Microsoft's Host Compute System. This buys us three things we like a lot:
(1) A computer for Claude to write software in, because so many user problems can be solved really well by first writing custom-tailored scripts against whatever task you throw at it. We'd like that computer to not be _your_ computer so that Claude is free to configure it in the moment.
(2) Hard guarantees at the boundary: Other sandboxing solutions exist, but for a few reasons, none of them satisfy as much or allow us to make similarly sound guarantees about what Claude will and won't be able to do.
(3) As a product of 1+2, more safety for non-technical users. If you're reading this, you're probably equipped to evaluate whether or not a particular script or command is safe to run - but most humans aren't, and even the ones who are so often experience "approval fatigue". Not having to ask for approval is valuable.
It's a real trade-off though and I'm thankful for any feedback, including this one. We're reading all the comments and have some ideas on how to maybe make this better - for people who don't want to use Cowork at all, who don't want it inside a VM, or who just want a little bit more control. Thank you!
I am building one now that works locally. But back in the day, I saw how extremely efficient VMs can be at AWS. MicroVMs power Lambda, btw.
> All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.
https://code.claude.com/docs/en/devcontainer
It does work but I found pretty quickly that I wanted to base my robot sandbox on an image tailored for the project and not the other way around.
Also, please allow Cowork to work on directories outside the homedir!
I'd like to explore that topic more too, but I feel like "we deferred to MacOS/Windows" is highly relevant context here. I'd even argue that should be the default position and that "extensive justification" is required to NOT do that.
That made me realize it also wants to run an Apple Virtualization framework VM but can’t, since it’s already inside one. IMO the error messaging here could be better, or, considering that it is already in a VM, it could perhaps bypass the VM altogether. As it stands, I still haven't been able to try Cowork because of this error.
1. Yes, but only Linux guests
2. Yes, but only M3+
It would be really nice to ask the user, “Are you sure you want to use Cowork, it will download and install a huge VM on your disk.”
I also discovered that VM image eating 10GB for no reason. I have Claude Desktop installed, but almost never use it (mostly Claude Code).
This is the most interesting requirement.
So all the sandbox solutions that were recently developed all over GitHub fell short of your expectations?
This is half surprising, since many people using AI to solve the sandboxing issue have claimed to have done so over the past several months, and the best we have is Apple containers.
What were the few reasons? Surely there has to be some strict requirement that everyone else is missing.
But still having a 10 GB claude.vmbundle doesn't make any sense.
Sorry for asking here, but I'm not aware of other avenues of support: tickets on the Claude Code repo keep getting closed because it is not a CC issue.
Claude mangles XML files that use <name> as a tag, turning it into <n>.
Ironically, VMs are typically blocked because the infosec team isn't sure how to look inside them and watch you, unlike containers where whatever's running is right there in the `ps` list.
They don't look inside the JVM or .exes either, but they don't think about them the same way. If they treat a VM like an app or an exe, and the VM is as bounded as an app or an exe, with what's inside staying inside, they can get over their concerns. (If not, build them a VM with their sensors inside it as well, and move on.)
This conversation can take a while, and several packs of whiteboard markers.
Speaking as a tiny but regulated SMB that's dabbling in skill plugins with Cowork: we strongly appreciate and support this stance. We hope you don't relax your standards, and need you not to. We strongly agree with (1), (2), and (3).
If working outside the sandbox becomes available, Cowork becomes a more interesting exfil vector. A vbox should also be able to be made non-optional — even if MDM allows users to elevate privileges.
We've noticed you're making other interesting infosec tradeoffs too. Your M365 connector aggressively avoids enumeration, which we figured was intentional as a seatbelt for keeping looky-loos in their lane.* Caring about foot-guns goes a long way in giving a sense of you being responsible. Makes it feel less irresponsible to wade in.
In the 'thankful for feedback' spirit, here's a concrete UX gap: we agree approval fatigue matters, and we appreciate your team working to minimize prompts.
But the converse is, when a user rejects a prompt — or it ends up behind a window — there's no clear way to re-trigger. Claude app can silently fail or run forever when it can't spin up the workspace, wasn't allowed to install Python, or was told it can't read M365 data.
Employees who've paid attention to their cyber training (reasonably!) click "No" and then they're stuck without diagnostics or breadcrumbs.
For a CLI example of this done well, see `m365-cli`'s `auth` and `doctor` commands. The tool supports both interactive and script modes through config (backed by a setup wizard):
https://pnp.github.io/cli-microsoft365/cmd/cli/cli-doctor/
Similarly, first party MCPs may run but be invisible to Cowork. Show it its own logs and it says "OK, yes, that works but I still can't see it, maybe just copy and paste your context for now." A doctor tool could send the user to a help page or tell them how to reinstall.
Minimal diagnostics for managed machines — running without local admin but able to be elevated if needed — would go a long way for the SMBs that want to deploy this responsibly.
Maybe a resync perms button or Settings or Help Menu item that calls cowork's own doctor cli when invoked?
---
* When given IDs, the connector can read anything the user can anyway. We're able to do everything we need, just had to ship ID signposts in our skill plugin that taps your connector. Preferred that hack over a third party MCP or CLI, thanks to the responsibility you look to be iteratively improving.
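The `doctor` idea suggested above boils down to a list of named checks that all run (never stopping at the first failure), each paired with a next step for the user. A minimal sketch of that pattern; `Check`, `run_doctor` and the two example checks are hypothetical and not Cowork's or `m365-cli`'s actual interface. A real tool would probe its own workspace, permission grants and connectors instead.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Check:  # hypothetical shape for one diagnostic
    name: str
    run: Callable[[], bool]  # returns True when healthy
    hint: str                # next step to show the user on failure

def run_doctor(checks: List[Check]) -> Tuple[List[str], int]:
    """Run every check and return (report lines, number of failures)."""
    lines, failures = [], 0
    for check in checks:
        try:
            ok, detail = bool(check.run()), ""
        except Exception as exc:  # a crashing check is a finding, not a crash
            ok, detail = False, f" (check raised {exc!r})"
        if ok:
            lines.append(f"[ok]   {check.name}")
        else:
            failures += 1
            lines.append(f"[FAIL] {check.name}: {check.hint}{detail}")
    return lines, failures

# Hypothetical example checks standing in for real workspace/permission probes.
EXAMPLE_CHECKS = [
    Check("python3 on PATH", lambda: shutil.which("python3") is not None,
          "install Python or ask IT to allow it"),
    Check("temp dir writable", lambda: os.access(tempfile.gettempdir(), os.W_OK),
          "check free disk space and folder permissions"),
]
```

In script mode, the failure count maps naturally onto a non-zero exit code, so the same checks can back both a "Help > Run diagnostics" menu item and an unattended CLI run.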
It seems to me that the main issue here is painful disconnects between the VM and the host system. The kernel in the VM wants to manage memory and disk usage and that management ultimately means the host needs to grant the guest OS large blocks of disk and memory.
Is anyone thinking about or working on narrowing that requirement? Like, I may want 99% of what a VM does, but I really want my host system to ultimately manage both memory and disk. I'd love it if in the Linux VM I had a bridge for file IO which interacted directly with the host file system, and a bridge in the memory management system which ultimately called the host system's memory allocation API directly and disabled the kernel's memory management system.
containers and cgroups are basically how Linux does this. But that's a pretty big surface area that I doubt any non-Linux system could adopt.
Unfortunately, unlike Linux, macOS doesn't have a great out-of-the-box story there; even Apple's first-party OCI runtime is based on per-container Linux VMs.
And after looking into Jails, it looks like BSD also supports Linux cgroups... that's actually really impressive. [1]
[1] https://docs.freebsd.org/en/books/handbook/linuxemu/#linuxemu
As for memory ballooning, the main issue with it is that it (generally) only gets triggered when the host runs out of memory.
For a host which is only running VMs, this is fine. But for the typical consumer host it becomes cumbersome, as you still need to give the VM a giant memory block and hope that your VM of choice is good enough to free memory in time. It's also uncoordinated. When swapping needs to happen, if the VM were using the host for allocation, the host could much more efficiently decide what needs to go into swap.
And if the host was in charge of both the memory and file system, then things like a system cache could be done more efficiently on top of all that.
NFS doesn't have to be slow. If you avoid traversing the TCP/IP stack, performance is fine. Linux guests can use vsock to communicate with the hypervisor directly, and macOS hosts can use the Virtualization framework to map a guest vsock to a host UNIX socket.
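For a sense of what the guest side of vsock looks like: addresses are (context ID, port) pairs instead of (IP, port), and context ID 2 means "the host", so no IP addresses, routes or TCP/IP stack are involved. A minimal sketch assuming a Linux guest and Python's `socket.AF_VSOCK` support; the helper names and any port you pass are hypothetical, and whatever listens on the host side (such as the UNIX socket the comment describes) is outside this sketch.

```python
import socket

# VMADDR_CID_HOST from <linux/vm_sockets.h>: context ID 2 addresses the host.
# Hardcoded so this module still imports on platforms without vsock support.
HOST_CID = 2

def host_address(port: int) -> tuple:
    """Illustrative helper: an AF_VSOCK address is a (context_id, port) pair."""
    return (HOST_CID, port)

def connect_to_host(port: int, timeout: float = 5.0) -> socket.socket:
    """Illustrative helper: open a stream socket from a Linux guest to a host service."""
    if not hasattr(socket, "AF_VSOCK"):
        raise OSError("AF_VSOCK is not available here (Linux guests only)")
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(host_address(port))
    except OSError:
        sock.close()
        raise
    return sock
```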
For personal use, where I have a Pro subscription and venture into exploring all the other features/products they have... I mean, the experience outside of Claude Code and the terminal has been... bad.
It is quite normal for me to have to force-close Claude Desktop.
yeah they're shipping too fast and everything is buggy as shit
- fork conversation button doesn't even work anymore in vscode extension
- sometimes when I reconnect to my remote SSH in VSCode, previously loaded chats become inaccessible. The chats are still there in the .jsonl files but for some reason the CC extension becomes incapable of reading them.
So crazy. On a Windows desktop I'd at most complain if it were hardcoded to the system drive (looking at you, Ollama).
ChatGPT's code execution container contains 56 vCPUs!! Back then, simonw mentioned:
> It appears to have 4GB of RAM and 56 (!?) CPU cores https://chatgpt.com/share/6977e1f8-0f94-8006-9973-e9fab6d24418
I'm seeing something similar on a free account too: https://chatgpt.com/share/69a5bbc8-7110-8005-8622-682d5943dcd9
On my paid account, I was able to verify this. I was also able to get a CPU-bound workload running on all cores. Interestingly, it was not able to fully saturate them, though - despite trying for 20-odd minutes. I asked it to test with stress-ng, but it looks like it had no outbound connectivity to install the tool: https://chatgpt.com/share/69a5c698-28bc-8005-96b6-9c089b0cc561
Anyways, that's a lot of compute. Not quite sure why it's necessary for a Plus account. I'd love to hear some thoughts on this.
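For anyone wanting to repeat the "all cores" experiment without stress-ng, the standard library is enough. A minimal sketch with illustrative helper names: it uses processes rather than threads, because on standard CPython builds the GIL keeps pure-Python threads from running a CPU-bound loop in parallel, and it reports both the CPU count the OS advertises and the CPUs this process may be scheduled on. Neither number reflects a cgroup CPU quota, so a container can advertise many cores while being throttled to far fewer; whether that explains the results above is not known.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def burn(n: int) -> int:
    """Illustrative helper: a pure-Python CPU-bound loop that keeps one core busy."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def visible_cpus():
    """(CPUs the OS reports, CPUs this process may run on); the latter is Linux-only."""
    usable = len(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else None
    return os.cpu_count(), usable

def saturate(workers=None, n=20_000_000):
    """Run one burn() per worker process and return (workers, wall-clock seconds)."""
    workers = workers or os.cpu_count() or 1
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(burn, [n] * workers))
    return workers, time.perf_counter() - start
```

Comparing `saturate(1)` with `saturate()` (called from under an `if __name__ == "__main__":` guard, which process pools need on spawn-based platforms such as macOS) shows whether wall-clock time stays roughly flat as workers are added, which is what full saturation would look like.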
Out of curiosity, why are you running Cowork inside a VM in the first place? What does that get you that letting Cowork use its own VM wouldn’t?