
Data Serialization and Deserialization in Python

Serialization is the process of encoding a structured object into a sequence of bytes in such a way that a second operation can convert those bytes back into the original object. The bytes can be stored in a file system or database, or sent across a network.

Deserialization is the reverse of serialization: it transforms a series of bytes back into a structured object. Restoring an object this way can sometimes be faster than rebuilding it from scratch, for example when constructing it involves expensive computation.

[Figure: the serialization and deserialization process]
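
The round trip described above can be sketched in memory with Python's standard pickle module, which the next section covers in detail. The `record` dictionary is a made-up sample, and `dumps()`/`loads()` are used here only because they work on bytes objects rather than files.

```python
import pickle

# Hypothetical sample record, used only for illustration.
record = {"name": "Dan", "age": 20, "city": "London"}

# Serialization: structured object -> bytes.
payload = pickle.dumps(record)
print(type(payload))        # <class 'bytes'>

# Deserialization: bytes -> an equal but distinct object.
restored = pickle.loads(payload)
print(restored == record)   # True
print(restored is record)   # False
```

The restored dictionary has the same content as the original but is a new object, which is what one would expect after the data has passed through a file or a network connection.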

Serialization and Deserialization using pickle library

Python's standard module for serialization is pickle. Pickle is a binary, Python-specific protocol: it converts Python objects into bytes when encoding and loads those bytes back into Python objects when decoding. Because the format is specific to Python, pickled data cannot be exchanged with programs written in other languages.

Pickle is a built-in standard library module, so nothing needs to be installed. Since it is a binary protocol, files must be opened in binary mode ('wb' or 'rb') when writing or reading pickled data.

Pickle is not suitable for processing untrusted data, because loading a pickle can execute arbitrary code. Only unpickle data that comes from a source you trust.
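
To illustrate why untrusted pickles are risky, the sketch below uses the `__reduce__` hook, through which an object tells pickle which callable to invoke when it is loaded. The `Innocent` class is hypothetical, and the callable chosen here is the harmless built-in `sorted`; a malicious payload could name a far more dangerous function instead.

```python
import pickle


class Innocent:
    """Hypothetical class, used only to demonstrate the __reduce__ hook."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" instead of the object's state.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocent())

# Loading runs the stored call; the result is not an Innocent instance.
result = pickle.loads(payload)
print(result)   # [1, 2, 3]
```

The receiver never asked for `sorted` to be called; the bytes alone decided that, which is why pickled data should only be loaded from trusted sources.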

Pickle can represent a large number of Python data types, including sets, tuples, and nested containers.
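
As a rough illustration of that range, the following sketch round-trips a handful of built-in types through pickle in memory. The sample values are arbitrary and chosen only for this example.

```python
import pickle

# Illustrative values covering several built-in types.
samples = [
    {1, 2, 3},                                  # set
    (1, "two", 3.0),                            # tuple
    frozenset({"a", "b"}),                      # frozenset
    3 + 4j,                                     # complex
    b"raw bytes",                               # bytes
    {"nested": [None, True, {"k": (1, 2)}]},    # nested containers
]

for value in samples:
    restored = pickle.loads(pickle.dumps(value))
    # Both the value and its exact type survive the round trip.
    assert restored == value and type(restored) is type(value)
    print(type(restored).__name__, restored)
```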

dump() function

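The dump() function serializes an object and writes the result to an open binary file.

  • Syntax : pickle.dump(python object, file object)
  • Returns : None; the serialized content is written to the binary file.
#Serialize
import pickle
friends = {"Dan" : [20,"London", 3234], "Maria" : [22,"Paris",7876]}
# Binary mode ('wb') is required because pickle writes bytes
with open('friends.dat','wb') as f:
    # The object comes first, then the file
    pickle.dump(friends, f)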

Output

The dictionary will be encoded into binary format and stored in friends.dat.

load() function

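The load() function reads a pickled representation from an open file and rebuilds the object.

  • Syntax : pickle.load(f), where f is a file object opened in binary mode.
  • Returns : reconstituted object
#Deserialize
import pickle
# Binary mode ('rb') is required because pickle reads bytes
with open('friends.dat','rb') as f:
    obj = pickle.load(f)
    print(type(obj))
    print(obj)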

Output

<class 'dict'>
{'Dan': [20, 'London', 3234], 'Maria': [22, 'Paris', 7876]}

Several objects can be serialized and deserialized together by grouping them in a tuple:

#Serialize
import pickle
friend1 = {"Dan" : [20,"London", 3234], "Maria" : [22,"Paris",7876]}
friend2 = {"Joey" : [23,"Newyork", 32394], "Ross" : [25,"Washington",786776]}
# Group both dictionaries so that a single dump() call stores them
friends = (friend1,friend2)
with open('friends.dat','wb') as f:
    pickle.dump(friends, f)

#Deserialize
with open('friends.dat','rb') as f:
    obj = pickle.load(f)
    print(type(obj))
    print(obj)

Output

<class 'tuple'>
({'Dan': [20, 'London', 3234], 'Maria': [22, 'Paris', 7876]}, {'Joey': [23, 'Newyork', 32394], 'Ross': [25, 'Washington', 786776]})
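
Grouping objects in a tuple is one option. Another possibility, sketched here as an alternative rather than as the approach used above, is to call dump() once per object and then call load() repeatedly until the file is exhausted. The temporary directory is only there to keep the example from touching real files.

```python
import os
import pickle
import tempfile

friend1 = {"Dan": [20, "London", 3234]}
friend2 = {"Joey": [23, "Newyork", 32394]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "friends.dat")

    # Each dump() call writes one pickled object after the previous one.
    with open(path, "wb") as f:
        pickle.dump(friend1, f)
        pickle.dump(friend2, f)

    # load() returns one object per call and raises EOFError at end of file.
    loaded = []
    with open(path, "rb") as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

print(loaded)
```

This form suits cases where the number of objects is not known in advance, since the reader does not need the whole collection to be built before writing starts.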

Data Serialization using JSON

JSON is easy for humans to read and for machines to parse. Unlike pickle, JSON is interoperable with non-Python platforms. It is also a common choice when a document database such as MongoDB is used, since MongoDB stores JSON-like documents.

Parsing JSON does not execute code, so it is safer than pickle for working with untrusted data. JSON is generally preferable to pickle, except that it can represent only a subset of Python data types.

Python's built-in json module encodes and decodes JSON data. It provides two functions for serialization: dump() and dumps().

dump() function

The dump() function writes the JSON string representation of a Python object to a file on disk.

  • Syntax : json.dump(python object, file object)
import json
students = {"Ram" : [14592,"chennai",9.18], "John" : [14122,"Madurai",9.09], "Sanchay" : [13467,"Mysore",8.07]}
with open('students.json','wt') as jfile:
   json.dump(students, jfile)

Output

#students.json
{"Ram": [14592, "chennai", 9.18], "John": [14122, "Madurai", 9.09], "Sanchay": [13467, "Mysore", 8.07]}

With a large amount of data, the default single-line output is dense and difficult to read. The indent parameter adds line breaks and indentation, which makes the output more readable.

import json
students = {"Ram" : [14592,"chennai",9.18], "John" : [14122,"Madurai",9.09], "Sanchay" : [13467,"Mysore",8.07]}
with open('students.json','wt') as jfile:
   json.dump(students, jfile, indent=4)

Output

#students.json
{
    "Ram": [
        14592,
        "chennai",
        9.18
    ],
    "John": [
        14122,
        "Madurai",
        9.09
    ],
    "Sanchay": [
        13467,
        "Mysore",
        8.07
    ]
}

dumps() function

The dumps() function converts a Python object into a JSON-formatted string instead of writing it to a file.

  • Syntax : json.dumps(python object)
  • Returns : Encoded string object
import json
students={"Ram" : [14592,"chennai",9.18], "John" : [14122,"Madurai",9.09], "Sanchay" : [13467,"Mysore",8.07]}
json_result = json.dumps(students)
print(json_result)

Output

{"Ram": [14592, "chennai", 9.18], "John": [14122, "Madurai", 9.09], "Sanchay": [13467, "Mysore", 8.07]}

The indent parameter can also be used with the dumps() function to improve readability.

import json
students={"Ram" : [14592,"chennai",9.18], "John" : [14122,"Madurai",9.09], "Sanchay" : [13467,"Mysore",8.07]}
json_result = json.dumps(students,indent=4)
print(json_result)

Output

{
    "Ram": [
        14592,
        "chennai",
        9.18
    ],
    "John": [
        14122,
        "Madurai",
        9.09
    ],
    "Sanchay": [
        13467,
        "Mysore",
        8.07
    ]
}

Data Deserialization using JSON

Deserialization is the process of converting JSON-formatted data into the corresponding Python data type. A JSON object, for example, is decoded into a Python dictionary.

load() function

  • Syntax : json.load(file object)
  • Returns : python object
import json
with open('students.json','rt') as file:
    obj = json.load(file)
    print(type(obj))
    print(obj)

Output

<class 'dict'>
{'Ram': [14592, 'chennai', 9.18], 'John': [14122, 'Madurai', 9.09], 'Sanchay': [13467, 'Mysore', 8.07]}

loads() function

This function deserializes from the JSON string.

  • Syntax : json.loads(JSON string)
  • Returns : python object
import json
# A JSON document held in a Python string
json_string = """
{
    "Ram": [14592, "chennai", 9.18],
    "John": [14122, "Madurai", 9.09],
    "Sanchay": [13467, "Mysore", 8.07]
}
"""
obj = json.loads(json_string)
print(type(obj))
print(obj)

Output

<class 'dict'>
{'Ram': [14592, 'chennai', 9.18], 'John': [14122, 'Madurai', 9.09], 'Sanchay': [13467, 'Mysore', 8.07]}
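
When the JSON string comes from an outside source, it may be malformed. The sketch below shows one way to handle that case; `parse_json` is a hypothetical helper written for this example, while `json.JSONDecodeError` and its `lineno`, `colno`, and `msg` attributes are part of the standard json module.

```python
import json


def parse_json(text):
    """Hypothetical helper: return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return None


good = parse_json('{"Ram": [14592, "chennai", 9.18]}')
bad = parse_json('{"Ram": [14592, chennai", 9.18]}')  # opening quote is missing

print(good)
print(bad)   # None
```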

JSON does not support all Python data types.

PYTHON        JSON
dict          object
list, tuple   array
set           not serializable
str           string
int, float    number
True          true
False         false
None          null
This table shows the data type for python vs JSON.
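
A few rows of the table can be checked directly. The last step, which passes `default=sorted` to turn a set into a list, is just one possible workaround and an assumption of this sketch, not something the table prescribes.

```python
import json

# Tuples are written as JSON arrays, so they come back as lists.
print(json.loads(json.dumps((1, 2, 3))))   # [1, 2, 3]

# True, False and None map to true, false and null.
print(json.dumps([True, False, None]))     # [true, false, null]

# Sets are rejected by the default encoder.
try:
    json.dumps({1, 2, 3})
except TypeError as err:
    print("TypeError:", err)

# Possible workaround (an assumption of this example): the `default` hook
# is called for unsupported values, here converting the set to a sorted list.
encoded = json.dumps({"ids": {3, 1, 2}}, default=sorted)
print(encoded)                             # {"ids": [1, 2, 3]}
```

Note that the conversion is one-way: after decoding, the value is a list, so the receiver has to rebuild the set if it needs one.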