
Lexical Attributes in spaCy

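Lexical attributes are attributes of a Token object that describe the token's text itself rather than its role in the sentence: for example, whether it is punctuation, looks like a number, or is a stop word. In this article, you will learn about a few more significant lexical attributes.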

spaCy provides many lexical attributes. Here, we are going to look at some of the most frequently used ones:

  • Finding the length of the string
  • Finding if there is any punctuation in the string
  • Finding if there is any number in the string
  • Finding if there is any percentage symbol in the string
  • Finding if there is an email in the string
  • Finding if there is any city name in the string
  • Finding if there are any alphabetic tokens in the string
  • Finding if there are any stop words in the string
  • Finding if there is any digit in the string
  • Finding if there is any lowercase token in the string
  • Finding if there are any URLs in the string
  • Finding if there is any additional space in the string
  • Finding if there are any brackets in the string

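Importing the spaCy library.

import spacy

Loading the small English model (en_core_web_sm) into an nlp variable. The model has to be downloaded once beforehand, for example with python -m spacy download en_core_web_sm.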

nlp = spacy.load("en_core_web_sm")

Example 1: Finding the length of the string

Here, we will see how to find the length of a string with Python's built-in len() function. We load the small spaCy model into an nlp variable, pass a string to it to get a doc, and then pass the doc to len(). Note that len(doc) counts tokens, not characters; use len(text) if you need the number of characters.

import spacy
nlp = spacy.load("en_core_web_sm")
text = ' 2021 is far worse than 2020 due to covid'
doc = nlp(text)
print(len(doc))

# Output
10

The output is 10 even though the string contains only 9 words. This is because the string starts with a space: spaCy's tokenizer turns that leading space into a token of its own, so it is counted as well.
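To see the difference between token count and character count, and to look at the leading whitespace token directly, here is a minimal sketch. It assumes spaCy v3 and uses spacy.blank("en") (a tokenizer-only English pipeline) instead of en_core_web_sm, on the assumption that tokenization and lexical attributes are all this example needs.

```python
import spacy

# Assumption: a blank English pipeline is enough, since len(doc) and
# token.is_space depend only on the tokenizer and lexical attributes.
nlp = spacy.blank("en")

text = " 2021 is far worse than 2020 due to covid"
doc = nlp(text)

n_tokens = len(doc)   # number of tokens, including the leading space token
n_chars = len(text)   # number of characters in the raw string
first = doc[0]        # the token created from the leading space

print(n_tokens, n_chars)
print(repr(first.text), first.is_space)
```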

Example 2: Finding if punctuation is there in the string

Here, we will see how to find punctuation in a string with the is_punct attribute. As before, we load the small spaCy model into an nlp variable and pass the string to it to get a doc.

We then loop over every token and print the ones whose is_punct attribute is True.

import spacy
nlp = spacy.load("en_core_web_sm")
text = 'Hey! Good Morning'
doc = nlp(text)
for token in doc:
    if token.is_punct:
        print(token)

# Output
!

The only punctuation token is the exclamation mark (!). Although it is written directly after Hey, the tokenizer splits it off into a separate token.
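A common use of is_punct is separating punctuation from words. The sketch below collects both groups with list comprehensions; it assumes spaCy v3 and uses spacy.blank("en") rather than the trained model, since is_punct is a lexical attribute that does not need one.

```python
import spacy

# Assumption: a tokenizer-only English pipeline is sufficient for is_punct
nlp = spacy.blank("en")
doc = nlp("Hey! Good Morning, everyone.")

# Punctuation tokens and the remaining (non-punctuation) tokens
punct = [token.text for token in doc if token.is_punct]
words = [token.text for token in doc if not token.is_punct]

print(punct)
print(words)
```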

Example 3: Finding if there is any number in the string

Here, we will see how to find numbers in a string with the like_num attribute. We loop over every token and print the ones whose like_num attribute is True.

import spacy
nlp = spacy.load("en_core_web_sm")
text = '2021 is far worse than 2020 due to covid'
doc = nlp(text)
for token in doc:
    if token.like_num:
        print(token)

# Output
2021
2020
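like_num is broader than "made of digits": in English it also accepts numbers written with separators and number words. A minimal sketch, assuming spaCy v3 and a blank English pipeline (spacy.blank("en")) in place of the trained model:

```python
import spacy

# Assumption: like_num is a lexical attribute, so no trained model is needed
nlp = spacy.blank("en")
doc = nlp("I paid 1,000 dollars for ten books")

# Both the comma-separated figure and the number word count as number-like
numbers = [token.text for token in doc if token.like_num]
print(numbers)
```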

Example 4: Finding if there is any percentage symbol in the string

Here, we will see how to find numbers followed by a percentage symbol, using the like_num attribute.

For every token that looks like a number, we look at the token right after it (doc[token.i + 1]) and check whether its text is %. This works because spaCy splits a string like 60% into two tokens, 60 and %.

import spacy
nlp = spacy.load("en_core_web_sm")
text = ' 2021 is far worse than 2020 due to covid as there is a hike of 60% cases'
doc = nlp(text)
for token in doc:
    # Look ahead only if a next token exists; otherwise doc[token.i + 1]
    # raises an IndexError when a number is the last token
    if token.like_num and token.i + 1 < len(doc):
        next_token = doc[token.i + 1]
        if next_token.text == '%':
            print(token.text, '%')

# Output
60 %
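The same "number followed by %" check can also be expressed declaratively with spaCy's rule-based Matcher instead of manual look-ahead. This is an alternative technique, not the method used above; it assumes spaCy v3 (where Matcher.add takes a key and a list of patterns) and a blank English pipeline. The match key "PERCENT" is a hypothetical name chosen for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical rule name "PERCENT": a number-like token followed by a "%" token
matcher.add("PERCENT", [[{"LIKE_NUM": True}, {"ORTH": "%"}]])

doc = nlp("There is a hike of 60% cases and a drop of 5% in tests")

# Each match is (match_id, start, end); slice the doc to get the matched span
percents = [doc[start:end].text for _, start, end in matcher(doc)]
print(percents)
```

Because the pattern checks the next token itself, there is no index arithmetic and no risk of reading past the end of the doc.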

Example 5: Finding if there is an email in the string

Here, we will see how to find an email address in a string with the like_email attribute.

import spacy
nlp = spacy.load("en_core_web_sm")
text = 'My email id is user@example.com'  # placeholder address
doc = nlp(text)
for token in doc:
    if token.like_email:
        print(token)

# Output
user@example.com
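To pull every email address out of a text, you can collect the matching tokens into a list. A minimal sketch, assuming spaCy v3 and a blank English pipeline; the addresses are placeholders on reserved example domains.

```python
import spacy

# Assumption: like_email is a lexical attribute, so a blank pipeline works
nlp = spacy.blank("en")

# Placeholder addresses on reserved example domains
doc = nlp("Contact support@example.com or sales@example.org for details.")

emails = [token.text for token in doc if token.like_email]
print(emails)
```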

Example 6: Finding if there is any city name in the string

Here, we will see how to find a city name in a string. spaCy has no built-in lexical attribute for this, so we create a custom attribute with the set_extension() method. As before, we load the spaCy model into an nlp variable.

We define a getter, city_getter, that returns True if any of a few predefined city names appears in the doc's text, and register it on Doc as the has_city extension. spaCy calls the getter whenever doc._.has_city is accessed.

import spacy
from spacy.tokens import Doc
nlp = spacy.load("en_core_web_sm")

city_getter = lambda doc: any(city in doc.text for city in ("New York", "Paris", "Berlin"))
Doc.set_extension("has_city", getter=city_getter)
doc = nlp("I like Paris")
print(doc._.has_city)

# Output
True
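The same idea works at token level, which tells you which tokens are cities rather than just whether the doc contains one. In this sketch the CITIES set and the is_city attribute are hypothetical names for illustration; it assumes spaCy v3 and a blank English pipeline, and passes force=True so re-running the snippet does not fail because the extension already exists.

```python
import spacy
from spacy.tokens import Token

nlp = spacy.blank("en")

# Hypothetical set of single-word city names, for illustration only
CITIES = {"Paris", "Berlin", "Tokyo"}

# Hypothetical token-level extension "is_city", computed by a getter
Token.set_extension("is_city", getter=lambda token: token.text in CITIES, force=True)

doc = nlp("I flew from Paris to Berlin")
found = [token.text for token in doc if token._.is_city]
print(found)
```

A token-level check cannot match multi-word names such as New York, which span two tokens; the Doc-level substring check above can, but it would also fire on words like Parisian.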

Example 7: Finding if there are any alphabetic tokens in the string

Here, we will see how to find alphabetic tokens in a string with the is_alpha attribute, which is True only when every character of the token is a letter.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey! Good Morning")
for token in doc:
    print(token.is_alpha)

# Output
True
False
True
True
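Because is_alpha requires every character to be a letter, a token that mixes letters and digits is not alphabetic. A minimal sketch, assuming spaCy v3 and a blank English pipeline, that pairs each token with its flag so the output is easier to read:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Covid19 cases rose")

# (text, is_alpha) pairs; "Covid19" contains digits, so it is not alphabetic
flags = [(token.text, token.is_alpha) for token in doc]
print(flags)
```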

Example 8: Finding if there are any stop words in the string

Here, we will see how to find stop words in a string with the is_stop attribute. Stop words are very common words, like "and", "or" and "but", that usually add little meaning to a sentence.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey! Good Morning and have a good day")
for token in doc:
    print(token.is_stop)

# Output
False
False
False
False
True
True
True
False
False
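A typical use of is_stop is filtering stop words (and punctuation) out before further processing. A minimal sketch, assuming spaCy v3 and a blank English pipeline; nlp.Defaults.stop_words is the set of English stop words the attribute is based on.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Hey! Good Morning and have a good day")

# Keep only tokens that are neither stop words nor punctuation
content_words = [t.text for t in doc if not t.is_stop and not t.is_punct]
print(content_words)

# The underlying stop word list can be inspected directly
print("and" in nlp.Defaults.stop_words)
```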

Example 9: Finding if there is any digit present in the string

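Here, we will see how to find digits in a string with the is_digit attribute. This example reads the attribute from the Lexeme, the context-independent vocabulary entry returned by doc.vocab[word.text]; the same value is also available directly as token.is_digit.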

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey my number is 2345")
for word in doc:
    lexeme = doc.vocab[word.text]
    print(lexeme.is_digit)

# Output
False
False
False
False
True
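is_digit is stricter than like_num from Example 3: it is True only if every character is a digit. The sketch below compares the two side by side, reading is_digit directly from the token. It assumes spaCy v3 and a blank English pipeline.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("2345 2,345 twenty")

# (text, is_digit, like_num) for each token
rows = [(t.text, t.is_digit, t.like_num) for t in doc]
for row in rows:
    print(row)
```

Only the plain digit string passes is_digit, while all three tokens are number-like.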

Example 10: Finding if there is any lowercase token present in the string

Here, we will see whether any words are in lowercase with the is_lower attribute.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey! Good morning")
for token in doc:
    print(token.is_lower)

# Output
False
False
False
True
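is_lower has two companions for other casing patterns, is_upper and is_title. A minimal sketch, assuming spaCy v3 and a blank English pipeline, that shows all three for an all-caps, a title-case, and a lowercase word:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("NASA Paris rocket")

# (text, is_lower, is_upper, is_title) for each token
rows = [(t.text, t.is_lower, t.is_upper, t.is_title) for t in doc]
for row in rows:
    print(row)
```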

Example 11: Finding if there are any URLs in the string

Here, we will see how to find URLs in a string with the like_url attribute.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(" Hey! Good login here https://www.google.com/")
for token in doc:
    if token.like_url:
        print(token.text)

# Output
https://www.google.com/
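like_url also recognizes addresses that have no scheme, such as those starting with www. A minimal sketch, assuming spaCy v3 and a blank English pipeline, that collects every URL-like token into a list:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Visit www.example.com or https://spacy.io/usage for docs")

urls = [t.text for t in doc if t.like_url]
print(urls)
```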

Example 12: Finding if there is any additional space in the string

Here, we will see whether the string has any additional space with the is_space attribute. In this example, there is an extra space before the word Hey.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(" Hey! Good morning")
for token in doc:
    print(token.is_space)

# Output
True
False
False
False
False
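Extra whitespace in the middle of a text, such as a double space or a newline, also ends up as separate tokens, and is_space is a convenient way to drop them. A minimal sketch, assuming spaCy v3 and a blank English pipeline:

```python
import spacy

nlp = spacy.blank("en")

# Two spaces after "Hey!" and a newline between "Good" and "morning"
doc = nlp("Hey!  Good\nmorning")

spaces = [t.text for t in doc if t.is_space]       # whitespace-only tokens
cleaned = [t.text for t in doc if not t.is_space]  # everything else
print(spaces)
print(cleaned)
```

A single space between two words is not a token of its own; it is stored as trailing whitespace on the previous token, which is why only the second space and the newline show up here.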

Example 13: Finding if the string contains bracket

Here, we will see whether the string contains brackets such as ‘()’ with the is_bracket attribute.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(" Hey! Good morning (all)")
for token in doc:
    print(token.is_bracket)

# Output
False
False
False
False
False
True
False
True
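is_bracket is not limited to round brackets; square and curly brackets are flagged too. A minimal sketch, assuming spaCy v3 and a blank English pipeline:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("See [1] (2) {3}")

# The tokenizer splits each bracket off the number it encloses
brackets = [t.text for t in doc if t.is_bracket]
print(brackets)
```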

Final Conclusion

Here we saw what lexical attributes are and how to use several of them in spaCy, along with a custom attribute created with set_extension().