1) What is pickle?
pickle is a Python module that converts a Python object into a stream of bytes (called
pickling or serialization) and converts those bytes back into an equivalent object
(called unpickling or deserialization).
Key terms
| Term | Meaning |
|---|---|
| Serialize / Pickle | Convert an object → bytes. |
| Deserialize / Unpickle | Convert bytes → object. |
| Protocol | Pickle format version (affects size/speed/compatibility). |
| Binary file | Pickle data is bytes, so files must be opened in binary mode (rb, wb). |
2) Mental Model: Bytes vs File
Think of pickle in two layers:
- In memory: object ⇄ bytes (use dumps/loads)
- On disk: object ⇄ file (use dump/load)
# In memory:
obj --pickle.dumps--> bytes --pickle.loads--> obj
# On disk:
obj --pickle.dump(file)--> .pkl file --pickle.load(file)--> obj
If you want to save an object to a file, use dump/load.
If you want the raw bytes (for DB, cache, network), use dumps/loads.
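The two layers are interchangeable, because a file is only a place to put the bytes. The following is a minimal sketch of that idea; it uses io.BytesIO as a stand-in for a real file, which is an assumption made here to keep the example self-contained.

```python
import io
import pickle

obj = {"name": "Champak", "scores": [95, 88, 91]}

# "On disk" layer: dump() writes the pickle bytes into a file-like object.
buffer = io.BytesIO()
pickle.dump(obj, buffer)

# "In memory" layer: loads() can read the bytes that dump() wrote.
from_dump = pickle.loads(buffer.getvalue())

# And the reverse: load() can read bytes that dumps() produced.
from_dumps = pickle.load(io.BytesIO(pickle.dumps(obj)))

print(from_dump == obj, from_dumps == obj)
```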
3) Quick Start (Two Simple Examples)
Example A: Save and load a dictionary to a file
import pickle
data = {
    "name": "Champak",
    "city": "Varanasi",
    "scores": [95, 88, 91],
}
# Save (write binary)
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)
# Load (read binary)
with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded)
print(type(loaded))
Example B: Convert object to bytes and back
import pickle
numbers = [1, 2, 3, 4]
b = pickle.dumps(numbers) # object -> bytes
again = pickle.loads(b) # bytes -> object
print(b[:20], "...") # just show a small slice
print(again)
dump/load = file-based, dumps/loads = in-memory bytes.
4) dumps() and loads() (In-memory)
Use these when you want bytes directly—for example storing in a database, sending over a socket, or putting into a cache.
Basic usage
import pickle
obj = {"a": 1, "b": [2, 3]}
blob = pickle.dumps(obj)
restored = pickle.loads(blob)
print(type(blob)) # bytes
print(restored)
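As one illustration of the database case, the bytes from dumps can be stored in a BLOB column. This is only a sketch: the in-memory SQLite database, the table name cache, and its columns are hypothetical choices made for this example.

```python
import pickle
import sqlite3

# Hypothetical table layout, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB)")

obj = {"a": 1, "b": [2, 3]}
conn.execute(
    "INSERT INTO cache (key, value) VALUES (?, ?)",
    ("demo", pickle.dumps(obj)),
)

# The BLOB comes back as bytes, ready for loads().
row = conn.execute("SELECT value FROM cache WHERE key = ?", ("demo",)).fetchone()
restored = pickle.loads(row[0])
print(restored)
conn.close()
```

The same caution applies as for files: only call loads on bytes that your own code wrote.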
Using protocols with dumps
import pickle
obj = {"x": list(range(1000))}
b_default = pickle.dumps(obj) # default protocol
b_v4 = pickle.dumps(obj, protocol=4) # explicit protocol
b_highest = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
print(len(b_default), len(b_v4), len(b_highest))
pickle.HIGHEST_PROTOCOL usually gives better efficiency for modern Python,
but you may need an older protocol for compatibility with older Python versions.
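To see which protocols your interpreter offers, you can inspect pickle.DEFAULT_PROTOCOL and pickle.HIGHEST_PROTOCOL. The sketch below also checks the stream header: for protocol 2 and later, the first byte is the PROTO opcode (0x80) and the second byte is the protocol number. The headers dictionary is a helper name introduced here.

```python
import pickle

obj = {"x": list(range(1000))}

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

headers = {}
for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(obj, protocol=proto)
    # Protocol 2+ streams start with the PROTO opcode and the version byte.
    headers[proto] = (blob[0], blob[1])
    print("protocol", proto, "size", len(blob), "header", headers[proto])
```

The printed sizes will vary with the object and the Python version, so treat them as a measurement rather than a rule.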
5) dump() and load() (Files)
Use these to write and read pickled objects from files. The biggest gotcha: open files in binary mode.
Saving one object
import pickle
settings = {"theme": "light saffron", "font_size": 16}
with open("settings.pkl", "wb") as f:
    pickle.dump(settings, f, protocol=pickle.HIGHEST_PROTOCOL)
Loading one object
import pickle
with open("settings.pkl", "rb") as f:
    settings = pickle.load(f)
print(settings)
Saving multiple objects to the same file (one after another)
import pickle
with open("many.pkl", "wb") as f:
    pickle.dump("first", f)
    pickle.dump([1, 2, 3], f)
    pickle.dump({"k": "v"}, f)
with open("many.pkl", "rb") as f:
    a = pickle.load(f)
    b = pickle.load(f)
    c = pickle.load(f)
print(a, b, c)
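If you do not know how many objects a file holds, one common approach is to keep calling load until it raises EOFError. The sketch below assumes that approach; iter_pickles is a hypothetical helper name, and the file lives in a temporary folder so the example cleans up easily.

```python
import os
import pickle
import tempfile

def iter_pickles(path):
    """Yield every pickled object stored back to back in the file at `path`."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # reached the end of the file

path = os.path.join(tempfile.mkdtemp(), "many.pkl")
with open(path, "wb") as f:
    for item in ("first", [1, 2, 3], {"k": "v"}):
        pickle.dump(item, f)

# Append mode ("ab") adds one more object without rewriting the file.
with open(path, "ab") as f:
    pickle.dump(42, f)

items = list(iter_pickles(path))
print(items)
```

Note that EOFError is also what you get from a file that was cut off in the middle of an object, so this loop cannot tell a clean end from a truncated one.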
6) File Modes (Binary is Mandatory)
Pickle outputs bytes, so use these modes:
| Task | Mode | Example |
|---|---|---|
| Write new pickle file | wb | open("x.pkl","wb") |
| Read pickle file | rb | open("x.pkl","rb") |
| Append more objects | ab | open("x.pkl","ab") |
Text mode ("w" or "r") does not work for pickles: in Python 3, pickle.dump fails with a TypeError when the file was opened in text mode.
Always use wb/rb (or ab to append).
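A short sketch of what goes wrong in text mode, assuming Python 3 and a temporary file path created only for this demonstration:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "x.pkl")

text_mode_error = None
try:
    with open(path, "w") as f:      # text mode: wrong for pickle
        pickle.dump({"a": 1}, f)
except TypeError as exc:
    # A text-mode file only accepts str, but pickle writes bytes.
    text_mode_error = exc
    print("text mode failed:", exc)

with open(path, "wb") as f:         # binary mode: correct
    pickle.dump({"a": 1}, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded)
```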
7) Protocols & Compatibility
Pickle protocols are versions of the pickle format. Newer protocols are usually faster and smaller, but older Python versions may not be able to read them.
Choosing a protocol
- For modern projects: use pickle.HIGHEST_PROTOCOL
- If you must support older Python: specify a lower protocol (commonly 4 or 3)
import pickle
obj = {"numbers": list(range(10))}
with open("p4.pkl", "wb") as f:
    pickle.dump(obj, f, protocol=4)
8) What Can Be Pickled (and What Usually Cannot)
Commonly picklable
- Basic types: int, float, str, bool
- Containers: list, tuple, dict, set
- Nested combinations of the above
- Many user-defined classes (if defined at module top-level)
Often not picklable (or risky)
- Open file handles
- Database connections
- Threads, locks
- Lambdas and locally defined functions (common issue)
- Objects tied to external resources (sockets, GUI handles)
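When in doubt, the simplest check is to try. The helper below is a sketch, not part of the pickle module: is_picklable is a hypothetical name, and the set of exceptions it catches is an assumption that covers the common failures.

```python
import pickle
import threading

def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True

print("nested containers:", is_picklable({"nums": [1, 2, 3], "tags": {"a", "b"}}))
print("lock:", is_picklable(threading.Lock()))
print("lambda:", is_picklable(lambda x: x + 1))
```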
9) Pickling Custom Classes (Most Useful Real-World Skill)
You can pickle instances of classes, but there are rules. The class should usually be defined at the top level of a module. That means: not inside another function.
Example: Pickle a custom object
import pickle
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks
    def average(self):
        return sum(self.marks) / len(self.marks)
s1 = Student("Asha", [95, 88, 91])
with open("student.pkl", "wb") as f:
    pickle.dump(s1, f)
with open("student.pkl", "rb") as f:
    s2 = pickle.load(f)
print(s2.name, s2.marks, s2.average())
Common pitfall: defining the class inside a function
import pickle
def make_student():
    class Student: # ❌ local class (often causes pickling problems)
        def __init__(self, name):
            self.name = name
    return Student("Ravi")
s = make_student()
# This may fail with: "Can't pickle local object ..."
with open("bad.pkl", "wb") as f:
    pickle.dump(s, f)
10) Advanced Control: __getstate__ & __setstate__
Sometimes your object contains something you don’t want to pickle (like a cache, a file handle, or a connection).
You can control what gets pickled using __getstate__ and rebuild the missing parts in __setstate__.
Example: Ignore a “non-picklable” attribute (and rebuild it)
import pickle
import time
class Timer:
    def __init__(self, label):
        self.label = label
        self.started_at = time.time()
        self._last_report = None # pretend this is a cache
    def __getstate__(self):
        state = self.__dict__.copy()
        # do not pickle runtime-only cache
        state["_last_report"] = None
        return state
    def __setstate__(self, state):
        self.__dict__.update(state)
        # rebuild runtime-only fields if needed
        self._last_report = None
t = Timer("study-session")
with open("timer.pkl", "wb") as f:
    pickle.dump(t, f)
with open("timer.pkl", "rb") as f:
    t2 = pickle.load(f)
print(t2.label, t2.started_at, t2._last_report)
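The same two methods also handle attributes that truly cannot be pickled. The sketch below uses a hypothetical Counter class guarded by a threading.Lock: __getstate__ removes the lock from the saved state, and __setstate__ creates a fresh one after loading. It assumes the class is defined at the top level of a script or importable module.

```python
import pickle
import threading

class Counter:
    """Hypothetical class: a counter guarded by a lock (locks cannot be pickled)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]              # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()   # rebuild it after loading

c = Counter()
c.increment()
c2 = pickle.loads(pickle.dumps(c))      # works because the lock is excluded
c2.increment()
print(c.value, c2.value)
```

Without __getstate__, the dumps call would fail on the lock attribute.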
11) Common Errors & How to Fix Them
| Error / Symptom | Likely Cause | Fix |
|---|---|---|
| TypeError: can't pickle ... | Object contains a non-picklable thing (file handle, lock, lambda) | Remove it, store only settings, or use __getstate__/__setstate__ |
| AttributeError: Can't pickle local object ... | Class or function is defined inside another function | Move definitions to top-level (module scope) |
| EOFError: Ran out of input | File is empty or you read past the last object | Check file path, ensure it was written, read correct number of objects |
| UnpicklingError | File is not a pickle, is corrupted, or incompatible | Verify source, re-create file, ensure correct protocol & Python version |
| Loaded object behaves weirdly after code changes | Class moved/renamed; old pickle refers to old module path | Keep stable module paths, use migrations, or avoid pickling long-term |
If the file is 0 bytes, it was likely never written successfully.
Also ensure you used "wb", not "w".
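For files that may be missing or empty, a defensive loader can turn these errors into a fallback value. This is a sketch under stated assumptions: load_or_default is a hypothetical helper, and it only handles the errors from the table above. Corrupted data can raise other exceptions as well.

```python
import os
import pickle
import tempfile

def load_or_default(path, default):
    """Load a pickle from `path`; fall back to `default` if it is missing or unreadable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default                  # file was never written
    except (EOFError, pickle.UnpicklingError):
        return default                  # empty, truncated, or not a pickle

folder = tempfile.mkdtemp()
empty_path = os.path.join(folder, "empty.pkl")
open(empty_path, "wb").close()          # create a 0-byte file
missing_path = os.path.join(folder, "missing.pkl")

print(load_or_default(empty_path, {"fallback": True}))
print(load_or_default(missing_path, {"fallback": True}))
```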
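12) Security (Very Important)
Never unpickle data from untrusted sources. Pickle can execute code during loading, so a malicious pickle can run harmful actions on your machine.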
Safe alternatives (depending on data)
- JSON: for plain data (dict/list/strings/numbers)
- CSV: for tabular data
- SQLite: for structured data and queries
- MessagePack: compact data exchange (still not as universal as JSON, but common)
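For plain data, the JSON route looks almost the same as the pickle one. A minimal sketch using the standard json module:

```python
import json

data = {"name": "Champak", "city": "Varanasi", "scores": [95, 88, 91]}

text = json.dumps(data)          # plain text, readable by other languages
restored = json.loads(text)      # parsing JSON only builds plain data types
print(restored == data)
```

The trade-off is that JSON only covers basic types: tuples come back as lists, and sets or custom objects need to be converted first.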
When is pickle acceptable?
- Local-only files you create yourself
- Data stored in a controlled environment (your machine, your server) with proper permissions
- Quick caching where security boundaries are clear
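If you still have to read pickles across a boundary you do not fully control, the risk can be reduced by overriding Unpickler.find_class so that only an allow-list of globals can be loaded. The sketch below follows that approach; the contents of SAFE_BUILTINS and the name restricted_loads are assumptions for this example, and this is a mitigation, not a guarantee of safety.

```python
import builtins
import io
import pickle

# Assumed allow-list for this sketch; adjust it to what your data really needs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve a short list of harmless builtins; refuse everything else.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside SAFE_BUILTINS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))
print(allowed)

blocked = False
try:
    restricted_loads(pickle.dumps(print))   # 'builtins.print' is not on the list
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```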
13) When to Use Pickle (and When Not)
Use pickle when
- You need to store Python objects quickly
- You control the data source
- You want speed and convenience over portability
- It’s okay if the file is Python-only
Avoid pickle when
- You need human-readable files
- Other languages/tools must read the data
- You need a stable long-term format across code refactors
- The data may come from untrusted sources
14) Useful Patterns
Pattern A: Save “state” with a version number
Adding a version helps you upgrade later if your data structure changes.
import pickle
STATE_VERSION = 1
state = {
    "version": STATE_VERSION,
    "user": "Champak",
    "progress": {"lesson": "pickle", "done": True},
}
with open("state.pkl", "wb") as f:
    pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
with open("state.pkl", "rb") as f:
    loaded = pickle.load(f)
if loaded.get("version") != STATE_VERSION:
    print("State version mismatch. Consider migrating data!")
else:
    print("OK:", loaded)
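Once a version number is stored, a mismatch can trigger an upgrade instead of only a warning. The sketch below is hypothetical: it assumes a version 2 in which "user" became a dict, and migrate is a helper name introduced here.

```python
import pickle

STATE_VERSION = 2

def migrate(state):
    """Upgrade an older state dict step by step until it matches STATE_VERSION."""
    state = dict(state)  # work on a copy
    if state.get("version") == 1:
        # Hypothetical change in version 2: "user" became a dict.
        state["user"] = {"name": state["user"]}
        state["version"] = 2
    if state.get("version") != STATE_VERSION:
        raise ValueError(f"cannot migrate state version {state.get('version')!r}")
    return state

# Simulate a pickle that was written by the older version of the program.
old_blob = pickle.dumps({"version": 1, "user": "Champak", "progress": {"done": True}})

loaded = migrate(pickle.loads(old_blob))
print(loaded)
```

Each future version adds one more `if` step, so old files can be upgraded through every intermediate format.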
Pattern B: Atomic write (reduce corruption risk)
Write to a temporary file first, then replace the original.
import os
import pickle
import tempfile
data = {"a": 1, "b": 2}
target = "safe.pkl"
dir_name = os.path.dirname(target) or "."
fd, tmp_path = tempfile.mkstemp(prefix="tmp_", suffix=".pkl", dir=dir_name)
try:
    with os.fdopen(fd, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    # replace is atomic on many systems
    os.replace(tmp_path, target)
finally:
    # if something went wrong before replace, clean up temp
    if os.path.exists(tmp_path):
        try:
            os.remove(tmp_path)
        except OSError:
            pass
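The same pattern can be wrapped in a reusable function. This is a sketch, not a hardened implementation: atomic_pickle_dump is a hypothetical name, and the added flush and os.fsync calls are an extra step beyond the snippet above that asks the operating system to write the data to disk before the replace.

```python
import os
import pickle
import tempfile

def atomic_pickle_dump(obj, target):
    """Pickle `obj` to `target` via a temporary file in the same directory."""
    dir_name = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(prefix="tmp_", suffix=".pkl", dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())        # ask the OS to push the bytes to disk
        os.replace(tmp_path, target)
    except BaseException:
        # The write or the replace failed: remove the leftover temp file.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise

folder = tempfile.mkdtemp()
target = os.path.join(folder, "safe.pkl")
atomic_pickle_dump({"a": 1}, target)
atomic_pickle_dump({"a": 1, "b": 2}, target)    # overwrites in one step

with open(target, "rb") as f:
    result = pickle.load(f)
print(result, os.listdir(folder))
```

Keeping the temporary file in the same directory as the target matters, because a rename across different filesystems is not atomic.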
Pattern C: Compress pickles for large data
import pickle
import gzip
big = {"nums": list(range(200000))}
with gzip.open("big.pkl.gz", "wb") as f:
    pickle.dump(big, f, protocol=pickle.HIGHEST_PROTOCOL)
with gzip.open("big.pkl.gz", "rb") as f:
    loaded = pickle.load(f)
print(len(loaded["nums"]))
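To judge whether compression is worth it for your data, you can compare sizes in memory before touching the disk. A small sketch using gzip.compress on the bytes from dumps; the actual ratio depends heavily on the data.

```python
import gzip
import pickle

big = {"nums": list(range(200000))}

raw = pickle.dumps(big, protocol=pickle.HIGHEST_PROTOCOL)
packed = gzip.compress(raw)

print("raw bytes:", len(raw), "compressed bytes:", len(packed))

# Round trip: decompress first, then unpickle.
restored = pickle.loads(gzip.decompress(packed))
print(restored == big)
```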
15) Mini Assignments (Practice)
Assignment 1: Save & load a student report
- Create a dict: name, class, marks list, total, average
- Pickle it to report.pkl, then load and print neatly
Assignment 2: Store multiple objects in one file
- Pickle 3 objects (string, list, dict) into multi.pkl
- Load them back in correct order and verify types
Assignment 3: Make a custom class picklable
- Create class Notebook with title + pages (list)
- Pickle an instance and restore it
- Verify the restored object methods still work
Assignment 4: Exclude a non-picklable attribute
- Create a class that holds a “cache” attribute
- Use __getstate__ to remove/reset it before pickling
- Load and confirm cache is rebuilt/empty
Assignment 5: Create a small “settings manager”
- Functions: save_settings(dict, path) and load_settings(path)
- Store a version number in the dict
- On load: if version mismatches, print a warning
16) Quick Self-Check (Mini Quiz)
- What is the difference between pickle.dump and pickle.dumps?
- Why must you open files in rb/wb mode?
- What does “protocol” mean in pickle?
- Why is unpickling untrusted data dangerous?
- If you get Can't pickle local object, what should you change?
Suggested answers
- dump writes to a file; dumps returns bytes in memory.
- Pickle is bytes, and binary mode avoids text encoding/decoding issues.
- Protocol is the pickle format version controlling size/speed/compatibility.
- Pickle can execute code during loading; malicious pickles can run harmful actions.
- Move the class/function to module top-level (not inside another function).
17) Summary Cheat Sheet
# 1) Save object to file
import pickle
with open("file.pkl", "wb") as f:
    pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
# 2) Load object from file
with open("file.pkl", "rb") as f:
    obj = pickle.load(f)
# 3) In-memory bytes
b = pickle.dumps(obj)
obj = pickle.loads(b)
# 4) Always use binary modes: rb / wb / ab
# 5) SECURITY RULE:
# Never unpickle data from untrusted sources.