
Introduction to spaCy

spaCy is a free, open-source library for advanced natural language processing (NLP), written in Python and Cython. It is very fast because its performance-critical parts are written in Cython, which compiles to C.

spaCy is designed mainly for building production software. It also supports deep learning workflows, since it can work with models built in PyTorch and TensorFlow.

If you are working with text data, you will eventually want to learn about spaCy. It helps you answer questions such as how many keywords related to a product appear in a text, or what a word means in its context.
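As a small illustration of the keyword-counting case, the sketch below uses spaCy's PhraseMatcher to count case-insensitive occurrences of a few product keywords. It assumes spaCy is installed; spacy.blank("en") creates a tokenizer-only English pipeline, so no model download is needed. The keyword list and review text are made-up examples.

```python
from collections import Counter

import spacy
from spacy.matcher import PhraseMatcher

# Tokenizer-only English pipeline: no trained model download required
nlp = spacy.blank("en")

# Hypothetical product keywords, matched case-insensitively via the LOWER attribute
keywords = ["battery", "screen"]
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("PRODUCT_KEYWORD", [nlp.make_doc(word) for word in keywords])

review = "The battery is great, but the screen scratches easily. Battery drains fast."
doc = nlp(review)

# Each match is a (match_id, start, end) tuple of token indices into doc
counts = Counter(doc[start:end].text.lower() for _, start, end in matcher(doc))
print(counts)
```

Matching on LOWER rather than the default ORTH attribute is what lets "Battery" at the start of a sentence count as the same keyword as "battery".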

What can spaCy do?

spaCy provides accurate syntactic analysis and offers many features, including:

  • Part-of-speech (POS) tagging
  • Named entity recognition (NER)
  • Syntactic (dependency) parsing
  • Tokenization
  • Word vectors and similarity
  • Many convenient methods for cleaning and normalizing text

spaCy models

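spaCy's English models come in three sizes, small (en_core_web_sm), medium (en_core_web_md), and large (en_core_web_lg), so you can choose one to suit your use case. The larger the model, the longer it takes to load; the large model takes a few seconds. The small model is about 12 MB, the medium model about 43 MB, and the large model about 741 MB, though sizes vary between versions.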

Install the spaCy library

You can install spaCy with either pip or conda (Anaconda).

To install spaCy with pip, open a command line and run the following command:

pip install -U spacy

To install spaCy with conda, run the following command in the Anaconda Prompt:

conda install -c conda-forge spacy

Next, download a trained language model. Here we download the small English model, which is the one loaded in the rest of this article:

python -m spacy download en_core_web_sm
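To confirm that the installation worked, you can check the spaCy version and whether the model package is available from Python. This is a small sketch: spacy download installs a model as a regular Python package, and spacy.util.is_package reports whether such a package is installed. It checks for en_core_web_sm, the model loaded in the next section.

```python
import spacy
from spacy.util import is_package

# Version of the installed spaCy library
print("spaCy version:", spacy.__version__)

# The model loaded in the rest of this article
MODEL_NAME = "en_core_web_sm"
if is_package(MODEL_NAME):
    print(MODEL_NAME, "is installed")
else:
    print(MODEL_NAME, "is missing; run: python -m spacy download", MODEL_NAME)
```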

Now let's import the spaCy library.

import spacy

Load the spaCy model

To use a spaCy model, we first load it into a variable. By convention, this variable is named nlp.

nlp = spacy.load("en_core_web_sm")

The spacy.load() function loads an installed model (it does not download anything) and returns a Language object, which we store in the nlp variable. Loading can take a couple of seconds, especially for larger models.

Note: Here we use an English model, but spaCy also offers models for many other languages, which you can download as your use case requires.
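Trained models for other languages follow the same naming pattern, for example de_core_news_sm for German or fr_core_news_sm for French, and are downloaded the same way. If you only need tokenization, spacy.blank() creates an empty pipeline for a language code without downloading anything, as in this sketch:

```python
import spacy

# Empty German pipeline: language-specific tokenizer rules only, no trained components
nlp_de = spacy.blank("de")
doc = nlp_de("Ich wohne in Berlin.")
print([token.text for token in doc])
```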

Example with spaCy

Here we use Python's built-in len() function on a processed text. We load the small spaCy model into the nlp variable, pass a string to nlp to get a Doc object, and then pass the Doc to len(). Note that len(doc) returns the number of tokens in the text, not the number of characters in the string.

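import spacy
# Load the small English model into the nlp variable
nlp = spacy.load("en_core_web_sm")
text = ' 2021 is far worse than 2020 due to covid'
# Process the text; the result is a Doc, a sequence of tokens
doc = nlp(text)
# len() on a Doc returns the number of tokens, not characters
print(len(doc))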

# Output
10
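Because len(doc) counts tokens, it usually differs from len(text), which counts characters. This sketch makes the difference visible using spacy.blank("en"), which applies the default English tokenization rules without loading a trained model; the leading space from the example above is dropped here.

```python
import spacy

# Tokenizer-only English pipeline (no trained model needed)
nlp = spacy.blank("en")
text = "2021 is far worse than 2020 due to covid"
doc = nlp(text)

print([token.text for token in doc])  # the individual tokens
print(len(doc))   # number of tokens: 9
print(len(text))  # number of characters: 40
```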

Conclusion

In this article, we saw what spaCy is, where it is used, and what it can do with text data. In upcoming articles, we will look at spaCy's features in more detail.