Posts

Showing posts from June, 2025

Multi-Band Synchronization

Getting synchronized multi-band cutouts took a full week to solve. The trick, which seems so obvious now, was to calculate the bounding box just once from a reference World Coordinate System (WCS) and then apply that same box to every filter band (g-band, r-band, etc.). Before this, I was wrestling with cases where the g-band had data but the r-band would throw an exception. The goal was to "handle missing bands gracefully." When I asked for a definition of "gracefully," we settled on a simple one: "don't crash." Mission accomplished.
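
Here is a minimal sketch of the "one box, every band" idea. It is not the actual LsstDataFetcher code: `BBox`, `bbox_around`, `fetch_all_bands`, and the `fetch_band` callable are hypothetical names, and the pixel centre is assumed to have already been derived from the reference WCS. A missing band is assumed to surface as a `LookupError`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class BBox:
    """Pixel bounding box: lower-left corner plus width and height (hypothetical type)."""
    x0: int
    y0: int
    width: int
    height: int


def bbox_around(x_center: float, y_center: float, size: int) -> BBox:
    """Build a square box of `size` pixels centred on a pixel position."""
    half = size // 2
    return BBox(int(round(x_center)) - half, int(round(y_center)) - half, size, size)


def fetch_all_bands(
    fetch_band: Callable[[str, BBox], Any],
    bands: Iterable[str],
    bbox: BBox,
) -> Dict[str, Optional[Any]]:
    """Apply the SAME box to every band; a missing band becomes None instead of a crash."""
    results: Dict[str, Optional[Any]] = {}
    for band in bands:
        try:
            results[band] = fetch_band(band, bbox)
        except LookupError:
            # Assumption: a band with no data raises LookupError.
            results[band] = None
    return results


def fake_fetch(band: str, bbox: BBox) -> str:
    """Stand-in for a real per-band fetch; the r-band is deliberately missing."""
    if band == "r":
        raise LookupError(f"no data for band {band}")
    return f"{band}:{bbox.width}x{bbox.height}"


box = bbox_around(1000.4, 2000.6, 64)  # computed once
cutouts = fetch_all_bands(fake_fetch, ["g", "r", "i"], box)
```

Because the box is computed once and passed around as an immutable value, every band that does come back covers exactly the same pixels, and callers can check for `None` to see which bands were missing.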

Batch Processing

ThreadPoolExecutor is my new best friend. I've redesigned the data flow (for the third time), and now the system can process over 100 cutouts per minute! The parallel fetching pipeline works like a charm. 
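
For the curious, the parallel fetching pattern looks roughly like the sketch below. `fetch_many` and `fetch_one` are hypothetical names, the requests are assumed to be hashable (for example, tuples), and the worker count is an arbitrary choice rather than a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


def fetch_many(
    fetch_one: Callable[[Hashable], Any],
    requests: Iterable[Hashable],
    max_workers: int = 8,
) -> Tuple[Dict[Hashable, Any], Dict[Hashable, Exception]]:
    """Run fetch_one over all requests in a thread pool.

    Returns (results, errors), both keyed by request, so one failed
    cutout does not abort the rest of the batch.
    """
    results: Dict[Hashable, Any] = {}
    errors: Dict[Hashable, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the request that produced it.
        futures = {pool.submit(fetch_one, req): req for req in requests}
        for future in as_completed(futures):
            req = futures[future]
            try:
                results[req] = future.result()
            except Exception as exc:  # collect the failure, keep going
                errors[req] = exc
    return results, errors
```

Threads suit this kind of work because fetching is mostly waiting on I/O; collecting errors per request keeps a single bad cutout from sinking the whole batch.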

Cache Me If You Can

The LRU (Least Recently Used) cache is implemented! It's strangely satisfying to watch the cache hit rate climb above 80% on repeated queries for the same tracts and patches. The performance difference is night and day. Since we're dealing with "astronomical" images (pun very much intended), I also added some memory monitoring to make sure the cache doesn't eat all our RAM.
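
A stripped-down sketch of an LRU cache with a memory budget and hit-rate tracking is below. `SizedLRUCache` is a hypothetical name, not the project's class, and the caller is assumed to know each entry's size in bytes (for an image array, that could be its `nbytes`).

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class SizedLRUCache:
    """LRU cache bounded by total bytes rather than entry count."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self.hits = 0
        self.misses = 0
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()  # key -> (value, nbytes)

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value (marking it most recently used) or None on a miss."""
        if key in self._items:
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key][0]
        self.misses += 1
        return None

    def put(self, key: Hashable, value: Any, nbytes: int) -> None:
        """Insert or replace an entry, evicting least recently used entries to fit."""
        if key in self._items:
            self.current_bytes -= self._items.pop(key)[1]
        if nbytes > self.max_bytes:
            return  # larger than the whole budget: skip caching it
        self._items[key] = (value, nbytes)
        self.current_bytes += nbytes
        while self.current_bytes > self.max_bytes:
            _, (_, evicted_bytes) = self._items.popitem(last=False)
            self.current_bytes -= evicted_bytes

    @property
    def hit_rate(self) -> float:
        """Fraction of get() calls that were hits (0.0 before any lookups)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Bounding the cache by bytes instead of entry count is what keeps a handful of very large images from quietly eating all the RAM.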

The subtle art of chugging coffee

When code doesn't work the way you want, you debug. When debugging doesn't work the way you want, you don't sleep. When you don't sleep, you chug coffee, and when you chug coffee, you don't sleep. When you don't sleep, you debug code. When you debug code and find a solution, you release dopamine and sleep like a baby (albeit in the afternoon). That's what the whole week has been like.

Down the Error Handling Rabbit Hole

You know what they say about the best-laid plans? I initially thought I'd just need to handle a few common Butler exceptions. I was wrong. So, so wrong. I've now encountered DataIdValueError, LookupError, connection timeouts, missing datasets, invalid coordinate systems (WCS), and more. I’ve had to build an entire hierarchy of custom exceptions to manage it all. But on the bright side, the code can now fail gracefully instead of just crashing!
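
To give a flavour of what such a hierarchy can look like, here is a small sketch. Every class name and the `translate_errors` decorator are hypothetical, not the project's real ones, and it only maps built-in exception types. Treating `ValueError` as "bad data ID" is an assumption made for the example.

```python
import functools
from typing import Any, Callable


class DataFetchError(Exception):
    """Base class: callers can catch this one type for any fetch failure."""


class DatasetNotFoundError(DataFetchError):
    """The requested dataset does not exist."""


class InvalidDataIdError(DataFetchError):
    """The data ID (tract, patch, band, ...) was malformed or out of range."""


class FetchTimeoutError(DataFetchError):
    """The connection to the data service timed out."""


def translate_errors(func: Callable[..., Any]) -> Callable[..., Any]:
    """Convert low-level exceptions into the custom hierarchy above."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except DataFetchError:
            raise  # already one of ours; pass it through untouched
        except TimeoutError as exc:
            raise FetchTimeoutError(str(exc)) from exc
        except LookupError as exc:
            raise DatasetNotFoundError(str(exc)) from exc
        except ValueError as exc:
            # Assumption: malformed data IDs surface as ValueError.
            raise InvalidDataIdError(str(exc)) from exc

    return wrapper
```

Using `raise ... from exc` keeps the original exception attached as `__cause__`, so the full traceback is still there when debugging, while calling code only has to know about `DataFetchError` and its subclasses.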

The LsstDataFetcher Awakens

Three days into Phase 1, and the LsstDataFetcher class is already becoming a beast. It started as a simple idea: a tool to fetch a small image cutout. Now, it's ballooned to handle bounding box parameters, multi-band image synchronization, data quality checks, and even partial data recovery. This is quickly evolving from a simple wrapper into a full-fledged data access layer. My coffee consumption has officially doubled.

Phase 0 Complete! (Finally)

Success! All seven of my comprehensive environment tests are passing. The foundation for the project feels rock-solid. I've confirmed that PyTorch gets along with the LSST stack (which feels like a small miracle) and even managed to get GPU support working with our Quadro P600. Feeling really good about where things stand. With the setup phase complete, tomorrow I start on Phase 1: implementing the actual data access tools. I'm nervous but incredibly excited!