Object Storage
Object storage is a data storage model designed for large, unstructured files. It sits alongside relational databases and caches in most production architectures, handling the data those systems are not suited for.
Contents
- What Is Object Storage?
- Flat Structure and Keys
- BLOBs
- How Access Works
- When to Use Object Storage
- Summary
1. What Is Object Storage?
Object storage is a system for storing arbitrary binary data as discrete objects. Each object is an independent unit containing the file data itself, a key that uniquely identifies it within its container (a bucket), and optional metadata describing it.
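The three parts of an object can be pictured as a small record. The sketch below is only an illustration of that shape; the class name `StoredObject` and its field names are assumptions made for this example, not any provider's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical model of one object: key, raw bytes, optional metadata."""

    key: str                                      # identifies the object
    data: bytes                                   # file contents, opaque to the store
    metadata: dict = field(default_factory=dict)  # e.g. content type, uploader


photo = StoredObject(
    key="photos/2024/january/img_001.jpg",
    data=b"\xff\xd8\xff\xe0",                     # placeholder bytes for illustration
    metadata={"content-type": "image/jpeg"},
)
print(photo.key, len(photo.data), photo.metadata)
```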
Major cloud providers offer managed object storage as a service: AWS S3, Azure Blob Storage, and Google Cloud Storage are the most widely used. The core model is the same across all of them.
The key differences from a traditional filesystem or relational database:
- No hierarchy. There are no real directories or folders. Objects sit in a flat namespace.
- Write once. Objects can be created, retrieved, and replaced as a whole, but not partially updated in place.
- Accessed by key. Objects are retrieved by their unique key, not by traversing a path.
- Optimised for large files. Object storage handles gigabyte and terabyte-scale files efficiently, where a relational database would not.
2. Flat Structure and Keys
Object storage has no real folder hierarchy. When a file appears to live at a path like photos/2024/january/img_001.jpg, that full string is simply the object's key. The slashes are part of the name, not actual directory separators. The storage system does not know or care about them.
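One way to see this is to model a bucket as a flat mapping from key to bytes. In the sketch below, the dictionary `flat_bucket` and the helper `list_keys` are hypothetical names chosen for illustration; "listing a folder" turns out to be nothing more than string prefix matching on keys.

```python
# A bucket modelled as a flat mapping from key to bytes (illustrative stand-in).
flat_bucket = {
    "photos/2024/january/img_001.jpg": b"...",
    "photos/2024/january/img_002.jpg": b"...",
    "photos/2024/february/img_003.jpg": b"...",
    "readme.txt": b"...",
}


def list_keys(bucket: dict, prefix: str = "") -> list:
    """Return all keys that start with `prefix`, sorted alphabetically."""
    return sorted(key for key in bucket if key.startswith(prefix))


# Looks like listing a directory, but it is only a prefix filter.
print(list_keys(flat_bucket, "photos/2024/january/"))

# The prefix does not have to end at a slash, because slashes are not special.
print(list_keys(flat_bucket, "photos/2024/jan"))
```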
Because the namespace is flat, keys must be unique within a bucket. Two objects cannot share the same key, and writing to an existing key replaces the object stored there. In AWS S3, bucket names themselves must be globally unique across all customers. Unique keys are typically generated using UUIDs, timestamps, or hash-based identifiers.
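The three key-generation approaches mentioned here can be sketched with the Python standard library. The function names and the key layout (`prefix/identifier.extension`) are assumptions for this example; real applications choose their own naming scheme.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def uuid_key(prefix: str, extension: str) -> str:
    """Random key: a fresh UUID each call, so collisions are very unlikely."""
    return f"{prefix}/{uuid.uuid4()}.{extension}"


def content_hash_key(prefix: str, data: bytes, extension: str) -> str:
    """Deterministic key: identical bytes always produce the same key."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{prefix}/{digest}.{extension}"


def timestamped_key(prefix: str, name: str) -> str:
    """UTC timestamp plus a short random suffix to separate same-second uploads."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{stamp}-{uuid.uuid4().hex[:8]}-{name}"


print(uuid_key("avatars", "png"))
print(content_hash_key("uploads", b"hello", "txt"))
print(timestamped_key("logs", "app.log"))
```

A content-hash key has the side effect of deduplicating identical uploads, since the same bytes always land on the same key; a UUID key keeps every upload separate.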
Because retrieval is by key, searches over object contents are not possible natively. If an application needs to find objects matching certain criteria, it stores that metadata in a relational database and uses the database to look up the corresponding key, then fetches the object from storage.
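A minimal sketch of this lookup pattern follows, using an in-memory SQLite table for the metadata and a plain dictionary as a stand-in for object storage. The table name `files`, its columns, the sample rows, and the helper `fetch_by_uploader` are all hypothetical.

```python
import sqlite3

# Stand-in for object storage: key -> bytes. A real system would call a storage API.
object_store = {
    "uploads/a1.pdf": b"%PDF-1.7 report",
    "uploads/b2.jpg": b"\xff\xd8 holiday photo",
}

# The database holds searchable metadata plus the key of each object.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (title TEXT, uploader TEXT, object_key TEXT UNIQUE)")
db.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("Quarterly report", "alice", "uploads/a1.pdf"),
        ("Holiday photo", "bob", "uploads/b2.jpg"),
    ],
)


def fetch_by_uploader(uploader: str) -> list:
    """Query the database for matching keys, then fetch each object by key."""
    rows = db.execute(
        "SELECT object_key FROM files WHERE uploader = ? ORDER BY object_key",
        (uploader,),
    ).fetchall()
    return [object_store[key] for (key,) in rows]


print(fetch_by_uploader("alice"))
```

The search happens entirely in the database; object storage is only ever asked for an exact key.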
3. BLOBs
The data stored in object storage is referred to as a BLOB: Binary Large Object. A BLOB is any arbitrary sequence of bytes. Object storage does not interpret or understand the content; it stores and returns bytes.
Typical BLOBs include:
- Images (JPEG, PNG, WebP)
- Videos (MP4, MOV)
- Audio files (MP3, WAV)
- Documents (PDF, DOCX)
- Database backups and exports
- Compiled software artifacts and build outputs
- Log archives
The "large" in BLOB is relative: object storage handles files from a few bytes up to terabytes, but its design is optimised for files too large to store efficiently in a database column.
📝 Object Storage Is Not a Database
Object storage has no query language, no transactions, and no schema. It is a key-value store for binary data. For structured data that needs to be queried, updated, or joined, a relational or NoSQL database is still the right tool. Object storage handles the files those databases reference.
4. How Access Works
Objects are accessed over HTTP, using standard REST verbs. Most object storage systems expose a simple API:
- PUT /bucket/key uploads a new object.
- GET /bucket/key retrieves the object.
- DELETE /bucket/key removes the object.
There is no PATCH or partial update. If an object needs to change, the entire object must be re-uploaded under the same key, replacing the previous version.
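The consequence of whole-object writes can be shown with a toy in-memory bucket. The class `InMemoryBucket` below is a hypothetical stand-in whose methods mirror PUT, GET, and DELETE; it makes no network calls and is not a client for any real service.

```python
class InMemoryBucket:
    """Toy stand-in for a bucket; methods mirror PUT, GET, and DELETE."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # PUT stores the whole object, replacing any bytes already at this key.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        # Raises KeyError when the key does not exist.
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


store = InMemoryBucket()
store.put("notes/todo.txt", b"buy milk\n")

# "Appending" a line means: download everything, change it locally,
# then upload the complete new version under the same key.
current = store.get("notes/todo.txt")
store.put("notes/todo.txt", current + b"call bank\n")

print(store.get("notes/todo.txt"))
```

For a small text file this round trip is cheap; for a multi-gigabyte object it is the reason frequently edited data belongs elsewhere.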
For private files that need to be shared selectively, object storage providers also support pre-signed URLs: time-limited URLs that grant temporary access to a specific object without requiring the requester to have storage credentials. This is commonly used to serve images and videos directly to users without routing the binary data through the application server.
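The idea behind a pre-signed URL can be sketched with an HMAC over the object path and an expiry time. This is a simplified illustration of the concept, not the signing scheme of any real provider; the secret, the host `storage.example.com`, the query parameter names, and the functions `presign` and `is_valid` are all assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; known only to the signer and the storage service


def _signature(path: str, expires: int) -> str:
    """Sign the method, object path, and expiry so none can be altered."""
    message = f"GET\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a URL that stays valid for `ttl_seconds` from `now`."""
    issued = now if now is not None else time.time()
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"https://storage.example.com{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Check what the storage side would check: not expired, signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if current > expires:
        return False
    expected = _signature(parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())


url = presign("/bucket/photos/img_001.jpg", ttl_seconds=60)
print(url)
print(is_valid(url))
```

Because the path and expiry are both covered by the signature, a client cannot extend the lifetime or point the URL at a different object without invalidating it.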
5. When to Use Object Storage
Object storage is the right choice when the data fits one or more of these profiles:
- Large binary files. Images, videos, audio, PDFs. Storing these in a database column is technically possible but wastes database resources and degrades performance.
- Write-once or rarely updated. Profile pictures, uploaded documents, media assets. The inability to partially update an object is not a limitation when the files themselves do not change frequently.
- Accessed directly by clients. Videos and images can be served to users directly from object storage via a CDN, bypassing the application server entirely.
- Long-term archival. Backups, logs, and audit trails that need to be retained but not queried.
Object storage is not appropriate for structured data that needs to be queried, filtered, or joined. Those use cases belong in a database. The common pattern in most applications is: store structured metadata (title, uploader, timestamp, file key) in a relational database, and store the actual file in object storage.
💡 Pair Object Storage with a CDN for Media
Serving large files directly from object storage on every request is slow for users far from the storage region and expensive in request and data-transfer charges. A CDN (content delivery network) sits in front of object storage and caches files at edge locations close to users. The first request fetches from storage; subsequent requests are served from the cache. This is the standard architecture for any application serving images or video at scale.
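The first-request and subsequent-request behaviour described above is a read-through cache. The sketch below models a single edge location; the classes `Origin` and `EdgeCache` are hypothetical, and a real CDN adds expiry times, eviction, and many edge locations that this example leaves out.

```python
class Origin:
    """Stand-in for object storage; counts how often it is actually read."""

    def __init__(self, objects: dict) -> None:
        self.objects = objects
        self.fetch_count = 0

    def get(self, key: str) -> bytes:
        self.fetch_count += 1
        return self.objects[key]


class EdgeCache:
    """One edge location: serve from cache, fall back to the origin on a miss."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self._cache = {}

    def get(self, key: str) -> bytes:
        if key not in self._cache:
            # Cache miss: fetch once from storage and keep a copy at the edge.
            self._cache[key] = self.origin.get(key)
        return self._cache[key]


origin = Origin({"videos/intro.mp4": b"video-bytes"})
edge = EdgeCache(origin)

for _ in range(3):
    edge.get("videos/intro.mp4")

print(origin.fetch_count)  # prints 1: only the first request reached storage
```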
6. Summary
| Concept | Key Takeaway |
|---|---|
| Object storage | A flat key-value store for arbitrary binary files. No hierarchy, no schema, no partial updates. |
| BLOB | Binary Large Object. The raw file data stored in object storage: images, video, audio, documents, backups. |
| Flat namespace | Folder paths are an illusion. The full path string is simply the object's key. There are no real directories. |
| Unique keys | Every object must have a key that is unique within its bucket. Writing to an existing key overwrites the object stored there. |
| Write once | Objects cannot be partially updated. To change a file, the entire object must be re-uploaded. |
| HTTP access | Objects are read and written over standard HTTP using GET, PUT, and DELETE. |
| Pre-signed URLs | Time-limited URLs that grant temporary access to a specific object. Used to serve files directly to clients. |
| Common pattern | Store structured metadata and the file key in a relational database. Store the actual file in object storage. |
| CDN pairing | Object storage is typically fronted by a CDN to cache files at edge locations and reduce latency for end users. |