โŒ

Reading view

There are new articles available, click to refresh the page.

Stop crashing your Python scripts: How to handle massive datasets on any laptop

I started a climate modeling project assuming I'd be dealing with "large" datasets. Then I saw the actual size: 2 terabytes. I wrote a straightforward NumPy script, hit run, and grabbed a coffee. Bad idea. When I came back, my machine had frozen. I restarted and tried a smaller slice. Same crash. My usual workflow wasn't going to work. After some trial and error, I eventually landed on Zarr, a Python library for chunked array storage. Instead of loading an entire array into RAM the way a plain NumPy script does, Zarr stores the data as many small compressed chunks on disk and reads only the chunks you index into. That let me process the entire 2TB dataset on my laptop without any crashes. Here's what I learned:

โŒ