Using pickle in Python: Serialize & Save Objects

This lesson teaches you how to store Python objects (lists, dicts, custom classes, etc.) as bytes or files and restore them later using pickle. It also covers best practices, common mistakes, and mini assignments.

Topic: Serialization
Module: pickle
Level: Beginner → Intermediate
Focus: Safety + Patterns
✅ Works offline 📦 dump/load 🧠 custom classes ⚠️ security

1) What is pickle?

pickle is a Python module that converts a Python object into a stream of bytes (called pickling or serialization) and converts those bytes back into the original object (called unpickling or deserialization).

Use case: Save your program’s data/state to disk and load it later, quickly, in Python. Example: caching a trained ML model object, saving a configuration dict, or storing a computed result.
Important: Pickle is Python-specific (not a universal format like JSON). If you need cross-language compatibility, prefer JSON / CSV / SQLite / Parquet (depending on data).

Key terms

Term | Meaning
Serialize / Pickle | Convert an object → bytes.
Deserialize / Unpickle | Convert bytes → object.
Protocol | Pickle format version (affects size/speed/compatibility).
Binary file | Pickle data is bytes, so files must be opened in binary mode (rb, wb).

2) Mental Model: Bytes vs File

Think of pickle in two layers:

  • In memory: object ⇄ bytes (use dumps / loads)
  • On disk: object ⇄ file (use dump / load)
# In memory:
obj --pickle.dumps--> bytes --pickle.loads--> obj

# On disk:
obj --pickle.dump(file)--> .pkl file --pickle.load(file)--> obj
Rule of thumb: If you already have a file handle, use dump/load. If you want the raw bytes (for DB, cache, network), use dumps/loads.

3) Quick Start (Two Simple Examples)

Example A: Save and load a dictionary to a file

import pickle

data = {
    "name": "Champak",
    "city": "Varanasi",
    "scores": [95, 88, 91],
}

# Save (write binary)
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Load (read binary)
with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded)
print(type(loaded))

Example B: Convert object to bytes and back

import pickle

numbers = [1, 2, 3, 4]

b = pickle.dumps(numbers)     # object -> bytes
again = pickle.loads(b)       # bytes -> object

print(b[:20], "...")          # just show a small slice
print(again)
What you learned: dump/load = file-based, dumps/loads = in-memory bytes.

4) dumps() and loads() (In-memory)

Use these when you want bytes directly—for example storing in a database, sending over a socket, or putting into a cache.

Basic usage

import pickle

obj = {"a": 1, "b": [2, 3]}
blob = pickle.dumps(obj)
restored = pickle.loads(blob)

print(type(blob))      # bytes
print(restored)
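
As a minimal sketch of the "store in a database" case, the bytes from dumps can be written to a BLOB column. The table name cache and its columns are made up for this illustration, and the database lives in memory so nothing is left on disk.

```python
import pickle
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
# Hypothetical table layout, chosen only for this example.
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB)")

obj = {"a": 1, "b": [2, 3]}

# dumps() returns bytes, which sqlite3 stores as a BLOB.
conn.execute(
    "INSERT INTO cache (key, value) VALUES (?, ?)",
    ("demo", pickle.dumps(obj)),
)

row = conn.execute("SELECT value FROM cache WHERE key = ?", ("demo",)).fetchone()

# loads() is only acceptable here because we wrote this blob ourselves.
restored = pickle.loads(row[0])

print(restored == obj)
conn.close()
```

The same idea applies to a cache or a socket: anything that accepts bytes can carry a pickle, as long as both ends are trusted.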

Using protocols with dumps

import pickle

obj = {"x": list(range(1000))}

b_default = pickle.dumps(obj)                 # default protocol
b_v4 = pickle.dumps(obj, protocol=4)          # explicit protocol
b_highest = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

print(len(b_default), len(b_v4), len(b_highest))
Note: Protocols differ in output size and speed. pickle.HIGHEST_PROTOCOL is usually the most efficient choice on modern Python, but a pickle written with a newer protocol cannot be read by a Python version that predates it (protocol 4 needs Python 3.4+, protocol 5 needs Python 3.8+).
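
To see what your own interpreter offers, the sketch below prints the default and highest protocol numbers and compares the size of the same object under each protocol. The helper protocol_of is a name invented for this example; it relies on the fact that pickles written with protocol 2 or later start with the byte 0x80 followed by the protocol number.

```python
import pickle

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

def protocol_of(blob):
    """Return the protocol number stored in a pickle, or None for protocols 0 and 1."""
    # Protocol 2+ pickles begin with the PROTO opcode (0x80) and the version byte.
    if len(blob) >= 2 and blob[0] == 0x80:
        return blob[1]
    return None

obj = {"x": list(range(1000))}

sizes = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(obj, protocol=proto)
    sizes[proto] = len(blob)
    print(f"protocol {proto}: {len(blob)} bytes, header says {protocol_of(blob)}")
```

The exact sizes depend on the object and the Python version, so treat the numbers as a comparison rather than a benchmark.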

5) dump() and load() (Files)

Use these to write and read pickled objects from files. The biggest gotcha: open files in binary mode.

Saving one object

import pickle

settings = {"theme": "light saffron", "font_size": 16}

with open("settings.pkl", "wb") as f:
    pickle.dump(settings, f, protocol=pickle.HIGHEST_PROTOCOL)

Loading one object

import pickle

with open("settings.pkl", "rb") as f:
    settings = pickle.load(f)

print(settings)

Saving multiple objects to the same file (one after another)

import pickle

with open("many.pkl", "wb") as f:
    pickle.dump("first", f)
    pickle.dump([1, 2, 3], f)
    pickle.dump({"k": "v"}, f)

with open("many.pkl", "rb") as f:
    a = pickle.load(f)
    b = pickle.load(f)
    c = pickle.load(f)

print(a, b, c)
Tip: If you store multiple objects in one file, you must read them back in the same order.
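
When you do not know in advance how many objects a file holds, one common approach is to keep calling load until it raises EOFError. The helper name load_all is an assumption made for this sketch, and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

def load_all(path):
    """Yield every pickled object in the file, in the order it was written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # Nothing left to read: the previous object was the last one.
                return

path = os.path.join(tempfile.mkdtemp(), "many.pkl")

with open(path, "wb") as f:
    pickle.dump("first", f)
    pickle.dump([1, 2, 3], f)

# Append mode adds another pickle after the existing ones.
with open(path, "ab") as f:
    pickle.dump({"k": "v"}, f)

items = list(load_all(path))
print(items)
```

A simpler alternative is to pickle a single list or dict that contains all the objects, which avoids the ordering question entirely.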

6) File Modes (Binary is Mandatory)

Pickle outputs bytes, so use these modes:

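Task | Mode | Example
Write new pickle file | wb | open("x.pkl","wb")
Read pickle file | rb | open("x.pkl","rb")
Append more objects | ab | open("x.pkl","ab")
Common mistake: Using "w" or "r" (text mode). In Python 3, dumping to a text-mode file raises a TypeError because pickle writes bytes, and loading from a text-mode file fails as well. Always use wb/rb.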
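
The sketch below shows the text-mode mistake next to the correct binary-mode version. It writes to a temporary directory, and the file name is only an example.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "x.pkl")
data = {"k": "v"}

error = None
try:
    with open(path, "w") as f:      # text mode: wrong for pickle
        pickle.dump(data, f)
except TypeError as exc:
    # pickle hands bytes to f.write(), but a text-mode file only accepts str.
    error = exc
    print("text mode failed:", exc)

with open(path, "wb") as f:         # binary mode: correct
    pickle.dump(data, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

Note that the failed attempt still created the file, which is why an empty .pkl file is a common leftover of this mistake.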

7) Protocols & Compatibility

Pickle protocols are versions of the pickle format. Newer protocols are usually faster and smaller, but older Python versions may not be able to read them.

Choosing a protocol

  • For modern projects: use pickle.HIGHEST_PROTOCOL
  • If you must support older Python: specify a lower protocol (commonly 4 or 3)
import pickle

obj = {"numbers": list(range(10))}

with open("p4.pkl", "wb") as f:
    pickle.dump(obj, f, protocol=4)
Note: A pickle of a custom object stores the class's module path and name, not its code. If you later rename the class or move it to another module, old pickles may stop loading. See the section on custom classes.

8) What Can Be Pickled (and What Usually Cannot)

Commonly picklable

  • Basic types: int, float, str, bool
  • Containers: list, tuple, dict, set
  • Nested combinations of the above
  • Many user-defined classes (if defined at module top-level)

Often not picklable (or risky)

  • Open file handles
  • Database connections
  • Threads, locks
  • Lambdas and locally defined functions (common issue)
  • Objects tied to external resources (sockets, GUI handles)
Practical idea: Instead of pickling a DB connection, pickle the connection settings (host/user/db) and recreate the connection when you load the file.
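
A quick way to find out whether something can be pickled is to try it and catch the error. The sketch below does this for a lock, a lambda, and a plain settings dict; the dict contents are placeholder values. The exact exception type varies between object kinds and Python versions, so several are caught.

```python
import pickle
import threading

candidates = {
    "lock": threading.Lock(),
    "lambda": lambda x: x + 1,
    "settings dict": {"host": "localhost", "db": "school"},  # placeholder values
}

results = {}
for name, obj in candidates.items():
    try:
        pickle.dumps(obj)
        results[name] = "ok"
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        # Record which kind of error this object produced.
        results[name] = type(exc).__name__

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The settings dict pickles cleanly, which is the point of the practical idea above: persist the description of a resource, not the live resource.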

9) Pickling Custom Classes (Most Useful Real-World Skill)

You can pickle instances of classes, but there are rules. The class should usually be defined at the top level of a module. That means: not inside another function.

Example: Pickle a custom object

import pickle

class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks

    def average(self):
        return sum(self.marks) / len(self.marks)

s1 = Student("Asha", [95, 88, 91])

with open("student.pkl", "wb") as f:
    pickle.dump(s1, f)

with open("student.pkl", "rb") as f:
    s2 = pickle.load(f)

print(s2.name, s2.marks, s2.average())

Common pitfall: defining the class inside a function

import pickle

def make_student():
    class Student:          # ❌ local class (often causes pickling problems)
        def __init__(self, name):
            self.name = name
    return Student("Ravi")

s = make_student()

# This may fail with: "Can't pickle local object ..."
with open("bad.pkl", "wb") as f:
    pickle.dump(s, f)
Fix: Move the class definition to the top level of the module (outside the function).

10) Advanced Control: __getstate__ & __setstate__

Sometimes your object contains something you don’t want to pickle (like a cache, a file handle, or a connection). You can control what gets pickled using __getstate__ and rebuild the missing parts in __setstate__.

Example: Ignore a “non-picklable” attribute (and rebuild it)

import pickle
import time

class Timer:
    def __init__(self, label):
        self.label = label
        self.started_at = time.time()
        self._last_report = None   # pretend this is a cache

    def __getstate__(self):
        state = self.__dict__.copy()
        # do not pickle runtime-only cache
        state["_last_report"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # rebuild runtime-only fields if needed
        self._last_report = None

t = Timer("study-session")

with open("timer.pkl", "wb") as f:
    pickle.dump(t, f)

with open("timer.pkl", "rb") as f:
    t2 = pickle.load(f)

print(t2.label, t2.started_at, t2._last_report)
Use this when: Your object has some parts that should not be persisted, or cannot be serialized.
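
The Timer example resets an attribute that could have been pickled anyway. As a complementary sketch, the hypothetical Counter class below holds a threading.Lock, which genuinely cannot be pickled, so __getstate__ removes it and __setstate__ creates a fresh one.

```python
import pickle
import threading

class Counter:
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()    # locks cannot be pickled

    def increment(self):
        with self._lock:
            self.count += 1

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]               # leave the lock out of the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()    # fresh lock for the restored object

c = Counter()
c.increment()

c2 = pickle.loads(pickle.dumps(c))
c2.increment()                           # works because the lock was rebuilt

print(c.count, c2.count)
```

Without the two methods, pickle.dumps(c) would fail on the lock attribute.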

11) Common Errors & How to Fix Them

Error / Symptom | Likely Cause | Fix
TypeError: cannot pickle ... (older versions: can't pickle ...) | Object contains something that cannot be pickled (file handle, lock, socket) | Remove it, store only settings, or use __getstate__/__setstate__
AttributeError: Can't pickle local object ... | Class or function is defined inside another function (lambdas fail in a similar way, usually with PicklingError) | Move definitions to top-level (module scope)
EOFError: Ran out of input | File is empty or you read past the last object | Check file path, ensure it was written, read the correct number of objects
UnpicklingError | File is not a pickle, is corrupted, or is incompatible | Verify source, re-create file, ensure correct protocol and Python version
Loading fails or the object behaves oddly after code changes | Class moved/renamed; old pickle refers to old module path | Keep stable module paths, use migrations, or avoid pickling long-term
Debug tip: Check the file size. If it’s 0 bytes, it likely never wrote successfully. Also ensure you used "wb" not "w".
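
The checks from the table and the debug tip can be combined into a small diagnostic helper. This is a sketch: the function name check_pickle_file and its return strings are invented here, and it should only be pointed at files you created yourself, because it calls pickle.load.

```python
import os
import pickle
import tempfile

def check_pickle_file(path):
    """Return a short diagnosis for the file at path."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path, "rb") as f:
            pickle.load(f)
    except Exception as exc:
        # Broad on purpose: a damaged pickle can raise several error types.
        return f"unreadable ({type(exc).__name__})"
    return "ok"

folder = tempfile.mkdtemp()
good = os.path.join(folder, "good.pkl")
empty = os.path.join(folder, "empty.pkl")
garbage = os.path.join(folder, "garbage.pkl")

with open(good, "wb") as f:
    pickle.dump({"a": 1}, f)
open(empty, "wb").close()                # zero-byte file
with open(garbage, "wb") as f:
    f.write(b"\x00not a pickle")         # bytes that are not valid pickle data

for p in (good, empty, garbage, os.path.join(folder, "nope.pkl")):
    print(os.path.basename(p), "->", check_pickle_file(p))
```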

12) Security (Very Important)

Never unpickle data from an untrusted source. Unpickling can execute arbitrary code during loading. If the pickle file can be modified by others (or downloaded), it can be dangerous.
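
To make the risk concrete without running anything harmful, the sketch below follows the "restricting globals" approach described in the Python documentation: a pickle.Unpickler subclass whose find_class refuses every global except a short allow-list. The names SAFE_BUILTINS and restricted_loads are choices made for this example. The hand-written payload asks the unpickler to look up os.system; the restricted unpickler rejects it before anything is called.

```python
import builtins
import io
import pickle

# Only these builtins may be referenced by a pickle (illustrative allow-list).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else, including os.system, is refused.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(blob):
    """Like pickle.loads, but with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(blob)).load()

# A pickle that would call os.system("echo hello world") under plain pickle.loads.
payload = b"cos\nsystem\n(S'echo hello world'\ntR."

blocked = False
try:
    restricted_loads(payload)
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)

# Plain data never needs find_class, so it still loads.
print(restricted_loads(pickle.dumps({"a": [1, 2, 3]})))
```

An allow-list narrows what a pickle can reach, but it is a mitigation, not a guarantee. The rule above still stands: do not unpickle data you do not trust.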

Safe alternatives (depending on data)

  • JSON: for plain data (dict/list/strings/numbers)
  • CSV: for tabular data
  • SQLite: for structured data and queries
  • MessagePack: compact data exchange (still not as universal as JSON, but common)
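
For plain data, the JSON route looks almost the same as the pickle one. The sketch below reuses the dictionary from Example A and also shows one difference to keep in mind: JSON has no tuple type, so tuples come back as lists.

```python
import json

data = {"name": "Champak", "city": "Varanasi", "scores": [95, 88, 91]}

text = json.dumps(data, indent=2)    # human-readable text, not bytes
restored = json.loads(text)          # safe to call on untrusted input

print(text)
print(restored == data)

# Tuples are written as JSON arrays and therefore load as lists.
round_tripped = json.loads(json.dumps({"point": (1, 2)}))
print(round_tripped)
```

Loading JSON only ever builds dicts, lists, strings, numbers, booleans, and None, which is why it is a safer choice for data that crosses a trust boundary.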

When is pickle acceptable?

  • Local-only files you create yourself
  • Data stored in a controlled environment (your machine, your server) with proper permissions
  • Quick caching where security boundaries are clear

13) When to Use Pickle (and When Not)

Use pickle when

  • You need to store Python objects quickly
  • You control the data source
  • You want speed and convenience over portability
  • It’s okay if the file is Python-only

Avoid pickle when

  • You need human-readable files
  • Other languages/tools must read the data
  • You need a stable long-term format across code refactors
  • The data may come from untrusted sources
Good practical guideline: Pickle is excellent for short-term persistence and internal caches. For long-term storage and sharing, prefer more stable formats.

14) Useful Patterns

Pattern A: Save “state” with a version number

Adding a version helps you upgrade later if your data structure changes.

import pickle

STATE_VERSION = 1

state = {
    "version": STATE_VERSION,
    "user": "Champak",
    "progress": {"lesson": "pickle", "done": True},
}

with open("state.pkl", "wb") as f:
    pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("state.pkl", "rb") as f:
    loaded = pickle.load(f)

if loaded.get("version") != STATE_VERSION:
    print("State version mismatch. Consider migrating data!")
else:
    print("OK:", loaded)
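
Instead of only warning on a mismatch, the version number can drive an upgrade step. The sketch below assumes a hypothetical version 2 that adds a "settings" key; the migrate function and the new key are invented for illustration.

```python
import pickle

STATE_VERSION = 2   # hypothetical newer version of the state layout

def migrate(state):
    """Return a copy of state upgraded to STATE_VERSION, or raise ValueError."""
    state = dict(state)                  # do not modify the caller's dict
    version = state.get("version")
    if version == 1:
        # Assumed change for this example: version 2 adds a "settings" key.
        state["settings"] = {"theme": "light"}
        state["version"] = version = 2
    if version != STATE_VERSION:
        raise ValueError(f"cannot migrate state with version {version!r}")
    return state

# Simulate a file written by the older program version.
old_blob = pickle.dumps({
    "version": 1,
    "user": "Champak",
    "progress": {"lesson": "pickle", "done": True},
})

loaded = migrate(pickle.loads(old_blob))
print(loaded["version"], loaded["settings"])
```

Each future version would add one more step of the same shape, so old files are upgraded one version at a time.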

Pattern B: Atomic write (reduce corruption risk)

Write to a temporary file first, then replace the original.

import os
import pickle
import tempfile

data = {"a": 1, "b": 2}
target = "safe.pkl"

dir_name = os.path.dirname(target) or "."
fd, tmp_path = tempfile.mkstemp(prefix="tmp_", suffix=".pkl", dir=dir_name)

try:
    with os.fdopen(fd, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

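    # os.replace swaps the file in a single step (atomic on POSIX); the temp
    # file was created in the target's directory, so both are on one filesystem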
    os.replace(tmp_path, target)
finally:
    # if something went wrong before replace, clean up temp
    if os.path.exists(tmp_path):
        try:
            os.remove(tmp_path)
        except OSError:
            pass

Pattern C: Compress pickles for large data

import pickle
import gzip

big = {"nums": list(range(200000))}

with gzip.open("big.pkl.gz", "wb") as f:
    pickle.dump(big, f, protocol=pickle.HIGHEST_PROTOCOL)

with gzip.open("big.pkl.gz", "rb") as f:
    loaded = pickle.load(f)

print(len(loaded["nums"]))
Reminder: Compression helps size, but adds CPU cost. Use it when files are large.
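
To judge whether compression is worth it for your data, compare the sizes in memory before writing anything. This sketch uses gzip.compress on the output of dumps; the actual ratio depends heavily on the data.

```python
import gzip
import pickle

big = {"nums": list(range(200000))}

raw = pickle.dumps(big, protocol=pickle.HIGHEST_PROTOCOL)
packed = gzip.compress(raw)

print("raw bytes: ", len(raw))
print("gzip bytes:", len(packed))
print("ratio:     ", round(len(packed) / len(raw), 2))

# Round trip: decompress first, then unpickle.
restored = pickle.loads(gzip.decompress(packed))
print(restored == big)
```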

15) Mini Assignments (Practice)

Assignment 1: Save & load a student report

  • Create a dict: name, class, marks list, total, average
  • Pickle it to report.pkl, then load and print neatly

Assignment 2: Store multiple objects in one file

  • Pickle 3 objects (string, list, dict) into multi.pkl
  • Load them back in correct order and verify types

Assignment 3: Make a custom class picklable

  • Create class Notebook with title + pages(list)
  • Pickle an instance and restore it
  • Verify the restored object methods still work

Assignment 4: Exclude a non-picklable attribute

  • Create a class that holds a “cache” attribute
  • Use __getstate__ to remove/reset it before pickling
  • Load and confirm cache is rebuilt/empty

Assignment 5: Create a small “settings manager”

  • Functions: save_settings(dict, path) and load_settings(path)
  • Store a version number in the dict
  • On load: if version mismatches, print a warning
Goal: Together, these assignments cover the pickle features you are most likely to need in everyday work.

16) Quick Self-Check (Mini Quiz)

  1. What is the difference between pickle.dump and pickle.dumps?
  2. Why must you open files in rb / wb mode?
  3. What does “protocol” mean in pickle?
  4. Why is unpickling untrusted data dangerous?
  5. If you get Can't pickle local object, what should you change?
Suggested answers
  1. dump writes to a file; dumps returns bytes in memory.
  2. Pickle data is bytes, so the file must be opened in binary mode; text mode expects str and fails.
  3. Protocol is the pickle format version controlling size/speed/compatibility.
  4. Pickle can execute code during loading; malicious pickles can run harmful actions.
  5. Move the class/function to module top-level (not inside another function).

17) Summary Cheat Sheet

# 1) Save object to file
import pickle
with open("file.pkl", "wb") as f:
    pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

# 2) Load object from file
with open("file.pkl", "rb") as f:
    obj = pickle.load(f)

# 3) In-memory bytes
b = pickle.dumps(obj)
obj = pickle.loads(b)

# 4) Always use binary modes: rb / wb / ab

# 5) SECURITY RULE:
# Never unpickle data from untrusted sources.