The Requests library is used to make HTTP requests to websites and API endpoints. With it we can request a URL, retrieve information from a website such as its HTML content, download images, and perform authentication for logins and form submissions.
Install Requests package
Enter the following command in the terminal to install the requests package. The command is the same for Windows and Mac operating systems.
pip install requests
Once the requests package is downloaded and installed, we can import the module into our program using the import
keyword.
To check the version of the requests package that got installed we can use the .__version__
attribute.
import requests
print(requests.__version__)
# Output
2.26.0
Types of Requests
The Requests module exposes the standard HTTP methods for performing actions on a server, such as retrieving, adding, or deleting data. A client can access the data on the server using request methods such as:-
- GET
- POST
- PUT
- DELETE
- PATCH
- HEAD
GET Requests
GET is the HTTP method used to request data from the server. The data is retrieved from the URL that we pass to the get() function.
The Syntax for making a GET request is given below:-
response = requests.get(URL)
Let’s try performing a GET request on https://chercher.tech/sample/api/product/read, and analyze the response retrieved using the GET request.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
Once we make a request on the server, it returns a response and the response object has a few properties that can help us understand the data that is retrieved.
We can view these properties of the response object using the dir()
function.
dir(response)
Among all the properties the most frequently used properties are listed below:-
- json
- content
- text
- elapsed
- headers
- history
- status_code
- cookies
- url
- encoding
- links
- raw
JSON response
When the server returns a successful response in JSON format, meaning the data is stored as key-value pairs, we can retrieve the data using the response.json()
method.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
print(response.json())
# Output
{'records': [{'id': '6275', 'name': 'ram', 'description': 'executed',
'price': '1200', 'created': '2021-12-17 16:18:27'},
- - -
Since the data is in JSON format we can parse through the data using keys and indexes.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
json_data = response.json()
print(json_data['records'][0])
# Output
{'id': '6275',
'name': 'ram',
'description': 'executed',
'price': '1200',
'created': '2021-12-17 16:18:27'
}
If you want to get a specific entry from the above output you can access the element similar to accessing the items of a dictionary.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
json_data = response.json()
print(json_data['records'][0]['price'])
# Output
1200
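Indexing with fixed keys raises a KeyError or IndexError when the payload has a different shape. Below is a minimal sketch of a more defensive lookup, assuming a payload shaped like the output above. The sample dictionary and the helper name first_price are illustrative and not part of the requests API.

```python
# Sample payload copied from the shape of the output above (illustrative data).
sample_payload = {
    'records': [
        {'id': '6275', 'name': 'ram', 'description': 'executed',
         'price': '1200', 'created': '2021-12-17 16:18:27'},
    ]
}


def first_price(payload):
    """Return the price of the first record as an int, or None if absent."""
    records = payload.get('records') or []
    if not records:
        return None
    price = records[0].get('price')
    return int(price) if price is not None else None


print(first_price(sample_payload))   # 1200
print(first_price({'records': []}))  # None
```

In a real program the payload would come from response.json() instead of the hard-coded dictionary.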
Text response
If the response is in text format, we can read the response using response.text
attribute.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
print(response.text)
Binary response
If the response is in a binary format, such as an image or media file, we can retrieve it as raw bytes using response.content
. We can check the datatype of the returned value using type()
and confirm that the data is in bytes format.
print(response.content)
print(type(response.content))
# Output
b'{"records": [{"id": "6275","name": "ram", - - - - - - "created": "2020-10-06 17:45:22"}]}'
<class 'bytes'>
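The three views of a response body are closely related. The sketch below mimics the relationship with the standard library only, assuming the body is UTF-8 encoded JSON; the bytes literal is illustrative and merely shaped like the output above.

```python
import json

# Illustrative bytes, shaped like the response.content shown above.
raw_body = b'{"records": [{"id": "6275", "name": "ram"}]}'

# Decoding the bytes is roughly what response.text does for us.
text_body = raw_body.decode('utf-8')

# Parsing the decoded text is roughly what response.json() does for us.
parsed = json.loads(text_body)

print(type(raw_body), type(text_body), type(parsed))
print(parsed['records'][0]['name'])
```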
Note:- The properties and methods of the response object for POST, PUT, and PATCH requests are almost the same as for a GET request.
POST Requests
POST is the HTTP method used to send data to the server in the body of the request. It is mostly used to send form data to the server.
It is also used to upload files of various formats to the server and store them. The main advantage of a POST request is that it allows us to send a large amount of data to the server in a single request.
- Performing multiple POST requests with the same data on the same URL creates an equal number of copies with different IDs
- A POST request creates a new subordinate (child) of a resource if the resource already exists.
- Responses to POST requests are not cached by default, but they can be made cacheable by using the
Expires
header or the Cache-Control
header
The Syntax to perform a post request on a URL is given below:-
requests.post(URL, data = {key : value})
The data is sent as a dictionary using the data
parameter. A POST request expects form data or a file to be sent to the server, which is why we pass it through the data
parameter.
In the below example, we are sending data to the URL using a post request. Since we are performing a post request we need to use requests.post()
import requests
info = {
'name' : 'chercher_requests',
'description' : 'post example',
}
response = requests.post('https://httpbin.org/post', data = info)
print(response.text)
We can also send JSON data to the same URL. When passing it through the data parameter, the dictionary has to be serialized to a JSON string before it is sent to the server.
To serialize the data we use json.dumps()
and pass it the dictionary. In the below example, we serialize the same data and send it to the server.
import requests
import json
info = {
'name' : 'chercher_requests',
'description' : 'post example',
}
response = requests.post('https://httpbin.org/post', data = json.dumps(info))
print(response.text)
In the previous example, we sent form data using a POST request. Now that we have sent JSON data, httpbin echoes it under the json key of the response instead.
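Requests also accepts a json parameter that serializes the dictionary and sets the Content-Type header for us. The sketch below builds both kinds of request with Request(...).prepare() without sending them, so the difference in the outgoing body can be inspected offline. It is an illustration only, not the output of the examples above.

```python
import requests

info = {
    'name': 'chercher_requests',
    'description': 'post example',
}

# Build (but do not send) a form-encoded request and a JSON request.
form_request = requests.Request('POST', 'https://httpbin.org/post', data=info).prepare()
json_request = requests.Request('POST', 'https://httpbin.org/post', json=info).prepare()

# data= produces a URL-encoded form body.
print(form_request.headers['Content-Type'])
print(form_request.body)

# json= produces a serialized JSON body and a JSON Content-Type header.
print(json_request.headers['Content-Type'])
print(json_request.body)
```

Using json=info is usually simpler than data=json.dumps(info), because the Content-Type header is set automatically.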
PUT Requests
PUT is the HTTP method used to update a resource, with the new data sent in the request body. If the resource already exists, a PUT request updates it with the new data.
PUT is mostly useful when we want to modify a single resource that already exists or is part of a resource collection.
- Calling the same PUT request multiple times updates the same resource each time and leaves a single resource as the end result
- PUT is idempotent, which means that making multiple identical requests has the same effect as making a single request
- Responses to PUT requests should not be cached
Unlike POST, when we perform the same PUT request multiple times, the end result remains the same, as shown in the figure below:-
The syntax for making a put request remains the same as post request, we just need to change the request to requests.put()
requests.put(URL, data = {key : value})
The data is sent as a dictionary using the data
parameter. In the below example, we are sending data to the URL using a put request. Since we are performing a put request we need to use requests.put()
import requests
info = {
'name' : 'chercher_requests',
'description' : 'put example',
}
response = requests.put('https://httpbin.org/put', data = info)
print(response.text)
PUT vs POST
- PUT is used to update/modify a single resource
- Multiple identical PUT requests return the same result as a single request and do not create a new resource
- The PUT method is idempotent, so repeated requests result in the same resource
- Responses to the PUT method should not be cached
- PUT should be used when the same request is repeated for update operations
- ‘N’ PUT operations result in a single resource
- POST is used to create a child resource of the original resource
- Multiple identical POST requests create multiple subordinate (child) resources
- The POST method is not idempotent since it creates a new subordinate for every request
- Responses to the POST method can be cached using specific header parameters
- POST should be used when different requests are performed to create resources
- ‘N’ POST operations could result in ‘N’ resources
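The difference in idempotency can be sketched with a toy in-memory model. ToyStore below is a hypothetical stand-in for a server-side collection, written only to illustrate the points above; it is not how any particular server is implemented.

```python
import itertools


class ToyStore:
    """Hypothetical in-memory stand-in for a server-side collection."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def post(self, data):
        # POST: every call creates a new child resource with a fresh ID.
        new_id = next(self._ids)
        self.resources[new_id] = dict(data)
        return new_id

    def put(self, resource_id, data):
        # PUT: every call writes to the same ID, replacing what was there.
        self.resources[resource_id] = dict(data)
        return resource_id


info = {'name': 'chercher_requests', 'description': 'example'}

post_store = ToyStore()
for _ in range(3):
    post_store.post(info)

put_store = ToyStore()
for _ in range(3):
    put_store.put(1, info)

print(len(post_store.resources))  # 3
print(len(put_store.resources))   # 1
```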
DELETE Requests
DELETE requests are performed to delete data. They delete the resource at a specific URL.
requests.delete(URL)
The URL of the resource is passed to the delete request. The server deletes the resource and returns a response. We can also delete a specific resource using the resource ID.
import requests
response = requests.delete('https://httpbin.org/delete')
print(response.json())
PATCH Requests
PATCH is the HTTP method used for partially updating a resource, which makes it different from PUT.
- The PATCH method performs a partial update of a resource, whereas the PUT method replaces a resource with a new version
- With PUT and PATCH the end result could be the same, but the amount of data sent might differ
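The difference between replacing and partially updating can be sketched with plain dictionaries. The helper names apply_put and apply_patch and the sample resource are hypothetical; a real server decides for itself how it applies each method.

```python
# A resource as it might currently exist on the server (illustrative data).
current = {'name': 'chercher', 'description': 'original', 'price': '1200'}
changes = {'description': 'patch example'}


def apply_put(existing, new_version):
    # PUT semantics: the new version replaces the resource entirely.
    return dict(new_version)


def apply_patch(existing, partial):
    # PATCH semantics: only the supplied fields are changed.
    merged = dict(existing)
    merged.update(partial)
    return merged


after_put = apply_put(current, changes)
after_patch = apply_patch(current, changes)

print(after_put)    # {'description': 'patch example'}
print(after_patch)  # {'name': 'chercher', 'description': 'patch example', 'price': '1200'}
```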
To perform a PATCH request we can use requests.patch()
method and pass the URL and the data that needs to be updated.
import requests
info = {
'name' : 'chercher',
'description' : 'patch example',
}
response = requests.patch('https://httpbin.org/patch', data = info)
print(response.json())
HEAD Requests
The HEAD method performs a request similar to the GET method, but instead of requesting the whole response, it requests only the response headers.
The response headers contain metadata, that is, information about the server and the resource.
The Syntax for the HEAD request is given below:-
requests.head(URL)
When we perform a HEAD request, we can try to read the response body, but it will be empty. Unlike other HTTP methods, we cannot retrieve the response body since only the headers are returned by the server.
In the below example, we are trying to retrieve the response body using response.content
which returns a bytes object. Since the HEAD method returns only headers, response.content
returns an empty bytes value.
import requests
response = requests.head('https://httpbin.org/')
print(response.content)
# Output
b''
We can access the response headers using response.headers
, which returns the information about the server.
print(response.headers)
Response Status Codes
Apart from reading the response body, we can tell whether the request was successful using response.status_code. Status codes are issued by the server in response to the requests made by the client.
We can check whether the request is successful or not using the status_code
property of the response object.
response.status_code
The response status codes are classified into 5 categories. The first digit of the status code identifies its class.
- 1xx – Informational: the request was received and is being processed by the server
- 2xx – Successful: the request was successfully received, understood, and accepted
- 3xx – Redirection: the client has to perform additional actions to complete the request
- 4xx – Client Error: the error seems to have been caused by the client
- 5xx – Server Error: the server has encountered an error or is incapable of performing the request
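The classification above can be sketched as a small lookup on the first digit. The helper name status_class is hypothetical; requests.codes is the library's own lookup of named status codes.

```python
import requests

STATUS_CLASSES = {
    1: 'Informational',
    2: 'Successful',
    3: 'Redirection',
    4: 'Client Error',
    5: 'Server Error',
}


def status_class(status_code):
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES.get(status_code // 100, 'Unknown')


print(status_class(requests.codes.ok))         # Successful
print(status_class(requests.codes.not_found))  # Client Error
print(status_class(503))                       # Server Error
```

With a real response object, status_class(response.status_code) gives the class of the reply.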
Headers
Headers pass additional information such as metadata while performing a request/response on a server. Metadata contains data about the data.
In HTTP request headers, we can pass information about the request context from the client side, such as browser information, Accept, Authorization, etc. The most commonly used HTTP request headers are listed below:-
- Accept – Determines which media types the client can accept in the response
- Accept-Encoding – Lists the content encodings (such as gzip) that the client can understand. The server may encode the response with one of them; a server that cannot satisfy the request may answer with a 406 (Not Acceptable) status code
- Authorization – Data related to authentication, such as login credentials, can be passed using this header
- Cache-Control – Used in both requests and responses to control whether and how the request is served through a cache
- Connection – Controls whether the connection stays open after the request completes. A persistent TCP connection can be reused for further requests
- Cookie – Cookies are small pieces of data that store information about the user’s activity. This header sends previously stored cookies back to the server
- Content-Type – Describes the media type of the data sent in the body of the request
The syntax for passing headers while making a request is given below. In the below example we use the Accept header to ask the server to respond with JSON. If the server cannot respond in that type, it may return an error status code such as 406 (Not Acceptable).
import requests
response = requests.get('https://chercher.tech/sample/api/product/read',
                        headers = {'Accept' : 'application/json'})
print(response.headers['Content-Type'])
# Output
application/json; charset=UTF-8
For more information about other Requests Header parameters please click the link Request Header
HTTP response headers are returned by the server and contain information about the server and the response. We can get the list of header names returned by the server using the response.headers
attribute.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
response_headers = []
for i in response.headers.keys():
    response_headers.append(i)
print(response_headers)
# Output
['Date', 'Content-Type', 'Transfer-Encoding', 'Connection', 'access-control-allow-origin',
'cache-control', 'content-encoding', 'display', 'response', - -
- - - ]
Generally, Response Headers are returned in dictionary key-value pairs and can be accessed using the keys.
import requests
response = requests.get('https://chercher.tech/sample/api/product/read')
print(response.headers['Content-Type'])
print(response.headers['response'])
print(response.headers['Server'])
# Output
application/json; charset=UTF-8
200
cloudflare
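response.headers is a CaseInsensitiveDict, so header names can be looked up in any letter case. The sketch below builds one by hand with values copied from the output above, so it runs without a network call; the X-Request-Id name is a hypothetical header used to show a safe lookup.

```python
from requests.structures import CaseInsensitiveDict

# Hand-built stand-in for response.headers (values copied from the output above).
headers = CaseInsensitiveDict({
    'Content-Type': 'application/json; charset=UTF-8',
    'Server': 'cloudflare',
})

# Lookups ignore the letter case of the header name.
print(headers['content-type'])
print(headers['SERVER'])

# .get() avoids a KeyError when a header is missing.
print(headers.get('X-Request-Id', 'not present'))
```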
Cookies in Request/Response
Cookies are small pieces of data that record user activity when we perform requests on a server. When a request is performed, the server returns cookies that the client stores, and a corresponding record may be kept on the server as well.
Cookies can also be used to continue a session from the point where a user left off. Basically, once a request is completed with a response, the connection is terminated.
Instead, we can retrieve the cookies from the response and use them in the next request.
This makes the user experience much better while exchanging requests and responses with a server. We can view the cookies using the response.cookies
attribute.
import requests
response = requests.get('https://chercher.tech/python-programming/introduction-to-python')
print(response.cookies)
# Output
<RequestsCookieJar[<Cookie active_template::243221=pub_site.1643878303 for .chercher.tech/>,
<Cookie ezoab_243221=mod1 for .chercher.tech/>, <Cookie ezoadgid_243221=-1 for .chercher.tech/>,
- - - -
We can also retrieve the cookies in dictionary format by creating a session object and applying the get_dict() method to its cookies.
import requests
request = requests.Session()
response = request.get('https://chercher.tech/python-programming/introduction-to-python')
print(request.cookies.get_dict())
# Output
{'active_template::243221': 'pub_site.1643878303', 'ezoab_243221': 'mod1',
'ezoadgid_243221': '-1', 'ezopvc_243221': '1', 'ezoref_243221': '',
'ezosuibasgeneris-1': '43a05457-4ade-4bc8-41a2-1eeb9df21457',
- - - -
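Cookies can also be sent with a request through the cookies parameter. Below is a minimal sketch, assuming a hypothetical cookie called name. The request is only prepared, not sent, so the resulting Cookie header can be inspected offline.

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a cookie jar by hand; the cookie name and value are illustrative.
jar = RequestsCookieJar()
jar.set('name', 'chercher', domain='httpbin.org', path='/cookies')

# Prepare (but do not send) a request that carries the jar.
prepared = requests.Request(
    'GET', 'https://httpbin.org/cookies', cookies=jar
).prepare()

print(jar.get_dict())
print(prepared.headers['Cookie'])
```

Passing cookies=jar to requests.get() sends the same header over the network.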
HTTP Sessions
HTTP Sessions help us to persist information across multiple requests. Using sessions we can send data of an individual user to the server, which can be tracked by a session ID.
The data of the user is stored on the server and can be retrieved by sending the session ID in a cookie.
To perform multiple requests in a single session we need to create a session object. To create a session object we can use requests.Session()
import requests
s = requests.Session()
The session object has all the in-built methods similar to the requests library. Using the session object created in the previous step we can perform requests on the server and pass data.
import requests
import json
s = requests.Session()
data = {
'name' : 'chercher',
'description' : 'Session example',
}
s.get('https://httpbin.org/cookies/set', params = data)
response = s.get('https://httpbin.org/cookies')
print(response.text)
From the below output, we can view the data sent to the server. This data can be accessed as long as the session is live. Sessions can also help process requests faster, since the underlying connection is reused.
# Output
{
"cookies": {
"description": "Session example",
"name": "chercher"
}
}
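A session also carries default headers and cookies into every request made through it. The sketch below uses Session.prepare_request() to show this without sending anything; the X-Demo-Header name and the cookie are hypothetical values chosen for illustration.

```python
import requests

session = requests.Session()

# Defaults set once on the session (illustrative values).
session.headers.update({'X-Demo-Header': 'demo'})
session.cookies.set('name', 'chercher')

# Prepare two different requests through the same session without sending them.
first = session.prepare_request(
    requests.Request('GET', 'https://httpbin.org/cookies'))
second = session.prepare_request(
    requests.Request('GET', 'https://httpbin.org/headers'))

# Both prepared requests carry the session's header and cookie.
for prepared in (first, second):
    print(prepared.url,
          prepared.headers['X-Demo-Header'],
          prepared.headers['Cookie'])
```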
Requests Timeouts
Timeouts abort a request if the server takes longer than a certain limit to return a response. Without one, a slow or unresponsive server can leave the program waiting indefinitely.
To avoid this we can set a timeout
parameter such that if a request takes longer than the time specified, the request is aborted with an exception and control returns to the program.
In the below example, we perform a request on a URL along with a timeout. The URL lets us specify how many seconds the server should delay its response.
URL:- https://httpbin.org/delay/n
[ n is the delay in seconds ]
import requests
try:
    response = requests.get('https://httpbin.org/delay/5', timeout = 2)
except requests.exceptions.RequestException as exception:
    print(exception)
Since the request takes longer than the timeout that we have specified, it is aborted. This can be really helpful when you are testing the response time of a server.
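The timeout parameter also accepts a (connect, read) tuple, so the time allowed to establish the connection and the time allowed to wait for data can be limited separately. Below is a minimal sketch; the helper name fetch and the default values 3.05 and 10 are illustrative choices, not recommendations from this article.

```python
import requests


def fetch(url, connect_timeout=3.05, read_timeout=10):
    """Return the response, or None if either timeout expires."""
    try:
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.Timeout as error:
        # Timeout is the common parent of ConnectTimeout and ReadTimeout.
        print('Timed out:', error)
        return None


# Both specific timeout errors can be caught through the shared parent class.
timeout_types = (requests.exceptions.ConnectTimeout,
                 requests.exceptions.ReadTimeout)
all_are_timeouts = all(
    issubclass(t, requests.exceptions.Timeout) for t in timeout_types)
print(all_are_timeouts)
```

Calling fetch('https://httpbin.org/delay/5', read_timeout=2) would be expected to return None, since the server delays longer than the read timeout.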
Errors and Exceptions
While performing requests on a URL we might encounter a few errors. The most commonly occurring errors are ConnectionError and HTTPError.
A ConnectionError is raised when the connection itself fails, for example because of a DNS failure or a refused connection, whereas an HTTPError represents an unsuccessful HTTP response, for example when the URL is broken.
When an error occurs, the part of the program that depends on the request might be affected. To avoid such scenarios we need to handle these errors. One of the ways is to use raise_for_status()
. It raises an HTTPError exception if the response has a 4xx or 5xx status code.
These HTTP errors occur because the server cannot process the request, which might be caused by a broken URL or by the site not functioning.
import requests
r = requests.get("https://httpbin.org/status/404")
r.raise_for_status()
print('Status code:', r.status_code)
As we can notice from the output, r.raise_for_status()
raises the HTTP error it encountered, and the code after it is not executed.
In the below example, when we made a request on the URL, the server returned an HTTP error. We call raise_for_status()
so that it throws an exception and control passes to the except block, where the error is handled.
import requests
try:
    r = requests.get("https://httpbin.org/status/404")
    r.raise_for_status()
except requests.exceptions.HTTPError as error:
    print('HTTP error', r.status_code)
print('continue')
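The behaviour of raise_for_status() can be tried without a network connection. In the sketch below the Response objects are built by hand purely for illustration; the helper names describe and fake_response are hypothetical, and real code would receive the response from requests.get() or a similar call.

```python
import requests


def describe(response):
    """Return 'ok' for successful responses, or a short error label."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as error:
        # The exception keeps a reference to the response that caused it.
        return f'HTTP error {error.response.status_code}'
    return 'ok'


def fake_response(status_code):
    # Built by hand only so the sketch runs offline.
    response = requests.Response()
    response.status_code = status_code
    response.url = 'https://httpbin.org/status/%d' % status_code
    return response


print(describe(fake_response(200)))  # ok
print(describe(fake_response(404)))  # HTTP error 404
print(describe(fake_response(503)))  # HTTP error 503
```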
Proxy Requests
Proxies act as a gateway between the client and the internet. Proxies enable us to access the internet using a different IP address, so the client’s IP address is hidden from the destination server. Some proxies also act as a firewall.
- The main advantage of proxies is that they can cache the data for common requests, which can reduce the response time for those requests.
- Also, while performing web scraping, some sites block an IP address that performs many requests. Using proxies we can perform multiple requests without getting blocked
To perform requests using a proxy we need to pass a proxy dictionary with the URL scheme (http or https) as the key and the proxy URL as the value. Proxies can be obtained from various websites that offer them for free.
proxy = {'https' : 'http://IP_address:Port', 'http' : 'http://IP_address:Port'}
The proxy dictionary is passed to the request through the “proxies” parameter.
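The key of the proxy dictionary is matched against the scheme of the URL being requested. The sketch below uses select_proxy from requests.utils, a helper the library uses internally, to show which entry would be chosen. The address 203.0.113.10 is a documentation-only placeholder and the ports are made up, so substitute a real proxy before sending anything.

```python
from requests.utils import select_proxy

# Placeholder proxy URLs; 203.0.113.10 is reserved for documentation.
proxies = {
    'http': 'http://203.0.113.10:8080',
    'https': 'http://203.0.113.10:8443',
}

# The entry is chosen by the scheme of the target URL.
print(select_proxy('http://httpbin.org/ip', proxies))
print(select_proxy('https://httpbin.org/ip', proxies))
```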
First, let’s check the IP address of our own device. When we make a request on “https://httpbin.org/ip” it returns the IP address of the client as a response.
import requests
r = requests.get("https://httpbin.org/ip")
print(r.text)
To perform the same request under a different IP address we need to pass the proxies dictionary as a parameter while making the request.
import requests
proxies = {'https' : 'http://12.218.209.130:53281', 'http' : 'http://12.218.209.130:53281'}
r = requests.get("https://httpbin.org/ip", proxies = proxies)
print(r.text)
Authentication Requests
When a client requests a protected resource, the server needs to validate whether the user is allowed to access it. The server performs this validation using authentication.
The client needs to pass credentials such as a username and password while making the request. Authentication is then performed on the server side; if the credentials are valid, the server returns a 200 response code along with the requested resource.
There are different types of authentication that can be implemented:-
- Basic Authentication
- Digest Authentication
Basic Authentication
Basic Authentication is the simplest form of authentication that can be implemented. In Basic Authentication the client needs to pass the user_name and password by embedding them in the request.
In Basic Auth, the user_name and password are sent as base64-encoded text, which is not encrypted, so it is better to use HTTPS rather than HTTP to prevent the password from being captured and re-used by others.
To implement basic auth we need to import HTTPBasicAuth
from the requests.auth
module
from requests.auth import HTTPBasicAuth
Now we can pass user_name and password to the requested URL.
import requests
from requests.auth import HTTPBasicAuth
basic = HTTPBasicAuth('user_1', '1234')
r = requests.get('https://httpbin.org/basic-auth/user_1/1234', auth=basic)
print(r.status_code)
print(r.json())
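That the credentials are only encoded, not encrypted, can be checked by preparing the request without sending it and decoding the Authorization header. This is a minimal sketch using the same example credentials as above.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

# Prepare (but do not send) a request with basic auth attached.
prepared = requests.Request(
    'GET',
    'https://httpbin.org/basic-auth/user_1/1234',
    auth=HTTPBasicAuth('user_1', '1234'),
).prepare()

# The header has the form "Basic <base64 of user:password>".
scheme, token = prepared.headers['Authorization'].split(' ', 1)
decoded = base64.b64decode(token).decode('utf-8')

print(scheme)
print(decoded)  # the credentials are trivially recoverable
```

Anyone who can read the header can recover the password, which is why HTTPS matters here.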
Digest Authentication
Digest Authentication is more secure than Basic Authentication. In this type of authentication, a hash function is applied to the password before it is sent to the server.
Making a request with Digest Authentication is almost the same as with Basic Authentication, but internally the hashing and security features differ. The points below explain how it works internally:-
- In Digest Authentication, the client first requests a protected resource without any credentials, so the server returns a 401 “unauthorized” response code.
- In a browser, a pop-up window opens and the user is prompted to enter a username and password
- These credentials are hashed together with values supplied by the server and then passed to the server. If the credentials are valid, the resource is returned as the response
- Digest Authentication adds a client nonce, which can mitigate chosen-plaintext attacks, and a QOP (Quality of Protection) value
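The hashing step can be sketched with the standard library. The code below follows the MD5 scheme with qop=auth described in RFC 2617 and uses the example values from that RFC; the function names are hypothetical, and HTTPDigestAuth performs an equivalent computation for us, so this is only an illustration of what happens internally.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop='auth'):
    # HA1 hashes the credentials, HA2 hashes the request being made.
    ha1 = md5_hex(f'{username}:{realm}:{password}')
    ha2 = md5_hex(f'{method}:{uri}')
    # The final value mixes in the server nonce and the client nonce.
    return md5_hex(f'{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}')


# Example values taken from RFC 2617 (not from this article).
response_hash = digest_response(
    username='Mufasa',
    realm='testrealm@host.com',
    password='Circle Of Life',
    method='GET',
    uri='/dir/index.html',
    nonce='dcd98b7102dd2f0e8b11d0f600bfb0c093',
    nc='00000001',
    cnonce='0a4f113b',
)
print(response_hash)
```

Only the resulting hash travels over the network, never the password itself.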
To implement digest auth we need to import HTTPDigestAuth
from the requests.auth
module.
import requests
from requests.auth import HTTPDigestAuth
digest = HTTPDigestAuth('user_2', '1234')
r = requests.get('https://httpbin.org/digest-auth/auth/user_2/1234', auth=digest)
print('digest_auth status code:- ',r.status_code)
print(r.json())
Event Hooks
Event hooks let us execute particular functions when an event occurs. In this case, the event is the response returned for a request.
The idea behind event hooks is the same in every scenario, such as scraping or making requests, but the implementation might vary slightly from one to another.
In Requests, event hooks are triggered when a response is returned for a request. Each event hook is associated with a callback function.
We can associate as many callback functions as required with a single event hook. Below is the syntax for associating an event hook with a callback function.
def func(response, *args, **kwargs):
    print('response attributes or anything')
hook = {'response' : func}
If we want to associate multiple functions, then instead of passing a single function as the dictionary value, we can pass them as a tuple: hook = {'response' : (func_1, func_2, .. .. .. )}
In the below example we associate callback functions that accept the response and print response.status_code
and the response headers. While making the request we need to pass the event_hook dictionary to the hooks parameter along with the URL.
import requests
def status_code(response, *args, **kwargs):
    print('Status_Code:- ', response.status_code)
def headers(response, *args, **kwargs):
    print('headers:- ', response.headers)
event_hook = {'response' : (status_code, headers)}
response = requests.get('https://httpbin.org/get', hooks = event_hook)
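Hooks can also be registered once on a session so that they run for every response received through it. The sketch below registers a callback on session.hooks and then triggers it by hand with dispatch_hook, a helper from requests.hooks that the library uses internally, on a hand-built Response. This is done only so the example runs offline; in normal use the session calls the hook automatically after each request.

```python
import requests
from requests.hooks import dispatch_hook

seen_status_codes = []


def record_status(response, *args, **kwargs):
    # Callback: remember the status code of every response.
    seen_status_codes.append(response.status_code)


session = requests.Session()
session.hooks['response'].append(record_status)

# Hand-built response, used only to trigger the hook without a network call.
fake = requests.Response()
fake.status_code = 200

dispatch_hook('response', session.hooks, fake)
print(seen_status_codes)  # [200]
```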
SSL Protocol
Secure Sockets Layer (SSL), succeeded by TLS, is a protocol that uses certificates to establish a secure connection between a client, such as a browser, and a server. A certificate is a small data file that carries the server’s public key.
During the handshake the server presents its certificate, and the client verifies it before any data is shared.
Once the certificate is verified, the two sides agree on keys and the data is exchanged in an encrypted format that is decrypted at the other end. Thus the client completes the SSL/TLS handshake before requesting data from the server.
Generally, the SSL handshake is done between the client and the server; when we perform requests using the requests library, the handshake is handled internally by the requests module.
While making a request we can pass a self-signed certificate to verify against. To create a self-signed certificate use the command below:-
% openssl req -new -x509 -days 365 -nodes -out cert.pem -keyout cert.pem
The command asks for details such as an email address and writes the certificate to cert.pem. The certificate can then be used while performing requests by passing its path to the verify
parameter along with the URL.
import requests
response = requests.get('https://httpbin.org/get', verify = 'user/downloads/self_signed_certicate')
If we pass verify = False
, certificate verification is skipped: the connection is still encrypted, but the server’s identity is not checked. We can either pass a certificate path or leave verify at its default. If we assign False, requests throws a warning that the connection is not secure.
import requests
response = requests.get('https://httpbin.org/get', verify = False)
Redirect Requests
Redirection is performed by the server, which responds with a new location for the client to request. It is applied in different scenarios such as a login portal once the user has been served an authorization or access token, clicking an advertisement, etc.
The requests module by default follows URL redirections for HTTP methods such as GET, POST, PUT, and DELETE. For the HEAD method, redirections are not followed by default and have to be enabled explicitly with allow_redirects=True.
For example, when we perform GET, POST, PUT, or DELETE requests on http://github.com, by default the user is redirected from the HTTP website to the HTTPS website.
import requests
response = requests.get('http://github.com')
print(response.url)
print(response.status_code)
We can check the status_code
returned and the URL
. It returns a 200 status_code and a new URL, indicating that the user was redirected permanently
We can also avoid the redirection by setting the allow_redirects
parameter to False. In this case response.status_code returns the 301 code
response = requests.get('http://github.com', allow_redirects= False)
print(response.url)
print(response.status_code)
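When redirects are followed, the intermediate responses are kept in response.history. The sketch below lists every hop of a redirect chain. The Response objects are built by hand, with URLs assumed from the example above, only so that it runs offline; the helper names redirect_chain and fake_response are hypothetical.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fake_response(status_code, url, history=()):
    # Built by hand only so the sketch runs without a network call.
    response = requests.Response()
    response.status_code = status_code
    response.url = url
    response.history = list(history)
    return response


first_hop = fake_response(301, 'http://github.com/')
final = fake_response(200, 'https://github.com/', history=[first_hop])

for status, url in redirect_chain(final):
    print(status, url)
```

With a real request, redirect_chain(requests.get('http://github.com')) would list the hops actually taken.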