Lexical attributes are attributes of a token object that describe the token itself, for example whether it is a number, a punctuation mark, or a URL. In this article, you will learn about some of the more significant lexical attributes.
There are many lexical attributes in spaCy. Here, we are going to see some of the most frequently used ones.
- Finding the length of the string
- Finding if punctuation is there in the string
- Finding if there is any number in the string
- Finding if there is any percentage symbol in the string
- Finding if there is an email in the string
- Finding if there is any city name in the string
- Finding if there are any alpha characters in the string
- Finding if there are any stop words in the string
- Finding if there is any digit in the string
- Finding if there is any lowercase token in the string
- Finding if there is any URL link in the string
- Finding if there is any additional space in the string
- Finding if there are any brackets in the string
Importing the spacy library.
import spacy
Loading the small spaCy model into an nlp variable.
nlp = spacy.load("en_core_web_sm")
Example 1: Finding the length of the string
Here, we will see how to find the length of the string, measured in tokens, by using Python's built-in len() function on a spaCy doc. We load the small spaCy model into an nlp variable, pass a string to it to get a doc, and then pass the doc into len() to count its tokens.
import spacy
nlp = spacy.load("en_core_web_sm")
text = ' 2021 is far worse than 2020 due to covid'
doc = nlp(text)
print(len(doc))
# Output
10
The output is 10, but the string contains only 9 words, so how come we get 10? It is because the string starts with a space, and spaCy counts that leading space as a token of its own.
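To see where the extra token comes from, it can help to print every token together with its is_space flag. The following is a minimal sketch, not part of the original example: it assumes that a blank English pipeline created with spacy.blank("en") is enough here because only the tokenizer is involved, and the name stripped_doc is illustrative.

```python
import spacy

# Assumption: a blank English pipeline suffices, since only the tokenizer is needed
nlp = spacy.blank("en")

text = ' 2021 is far worse than 2020 due to covid'
doc = nlp(text)

# Print each token with repr() so that whitespace is visible, plus its is_space flag
for token in doc:
    print(repr(token.text), token.is_space)

# Stripping the leading space before processing removes the whitespace token
stripped_doc = nlp(text.strip())
print(len(doc), len(stripped_doc))
```

If the count should reflect words only, stripping the text first (or skipping tokens whose is_space is True) is one simple option.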
Example 2: Finding if punctuation is there in the string
Here, we will see how to find punctuation in the string by using the is_punct attribute. We load the small spaCy model into an nlp variable and pass a string to it to get a doc.
Then we loop over the tokens and print every token for which is_punct is True.
import spacy
nlp = spacy.load("en_core_web_sm")
text = 'Hey! Good Morning'
doc = nlp(text)
for token in doc:
    if token.is_punct:
        print(token)
# Output
!
The output is the exclamation mark (!) attached to the word Hey, which spaCy splits off as a separate token.
Example 3: Finding if there is any number in the string
Here, we will see how to find numbers in the string by using the like_num attribute. We loop over the tokens and print every token for which like_num is True.
import spacy
nlp = spacy.load("en_core_web_sm")
text = '2021 is far worse than 2020 due to covid'
doc = nlp(text)
for token in doc:
    if token.like_num:
        print(token)
# Output
2021
2020
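The like_num attribute checks whether a token resembles a number, so for English it is not limited to digits and also covers number words. The sketch below illustrates this under the assumption that a blank English pipeline from spacy.blank("en") is sufficient; the sentence and the variable name numbers are made up for the illustration.

```python
import spacy

# Assumption: lexical attributes do not need a trained model, so a blank pipeline is used
nlp = spacy.blank("en")

doc = nlp("She bought ten apples in 2021")

# Collect every token that looks like a number, whether written as a word or in digits
numbers = [token.text for token in doc if token.like_num]
print(numbers)
```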
Example 4: Finding if there is any percentage symbol in the string
Here, we will see how to find a number followed by a percentage symbol by using the like_num attribute.
We first check whether a token looks like a number by using like_num. If it does, we look up the next token in the doc and check whether it is a percentage symbol.
import spacy
nlp = spacy.load("en_core_web_sm")
text = ' 2021 is far worse than 2020 due to covid as there is a hike of 60% cases'
doc = nlp(text)
for token in doc:
    # Only look ahead if the number is not the last token in the doc
    if token.like_num and token.i + 1 < len(doc):
        next_token = doc[token.i + 1]
        if next_token.text == '%':
            print(token.text, '%')
# Output
60 %
Example 5: Finding email in the string
Here, we will see how to find an email address in the string by using the like_email attribute.
import spacy
nlp = spacy.load("en_core_web_sm")
text = 'My email id is [email protected]'
doc = nlp(text)
for token in doc:
    if token.like_email:
        print(token)
# Output
[email protected]
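For a version that can be run as is, here is a small sketch of the same check. The address user@example.com is a hypothetical placeholder on the reserved example.com domain, not the address used in the example above, and a blank English pipeline is assumed to be enough.

```python
import spacy

# Assumption: a blank English pipeline suffices for the like_email lexical attribute
nlp = spacy.blank("en")

# Hypothetical placeholder address, chosen only for illustration
doc = nlp("My email id is user@example.com")

# Keep every token that looks like an email address
emails = [token.text for token in doc if token.like_email]
print(emails)
```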
Example 6: Finding if there is any city name in the string
Here, we will see how to check for a city name in the string by registering a custom attribute with the Doc.set_extension() method in spaCy. We will load the spaCy model into an nlp variable.
Next, we define a city_getter function that checks whether any of a few predefined city names occurs in the text of the doc, and register it as the has_city extension. The result is then available as doc._.has_city.
import spacy
from spacy.tokens import Doc
nlp = spacy.load("en_core_web_sm")
city_getter = lambda doc: any(city in doc.text for city in ("New York", "Paris", "Berlin"))
Doc.set_extension("has_city", getter=city_getter)
doc = nlp("I like Paris")
print(doc._.has_city)
# Output
True
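Because the getter above tests substrings of doc.text, a word such as Parisian would also count as a match for Paris. One possible alternative, sketched here rather than taken from the original example, is to match whole tokens with spaCy's PhraseMatcher. The sketch assumes the spaCy v3 signature of PhraseMatcher.add, and has_city_token is a hypothetical attribute name.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc

# Assumption: a blank English pipeline is enough, since matching only needs the tokenizer
nlp = spacy.blank("en")

# Build token-level patterns for the same city names (spaCy v3 add() signature assumed)
matcher = PhraseMatcher(nlp.vocab)
matcher.add("CITY", [nlp.make_doc(city) for city in ("New York", "Paris", "Berlin")])

# has_city_token is a hypothetical name; force=True allows re-running without an error
Doc.set_extension("has_city_token", getter=lambda doc: len(matcher(doc)) > 0, force=True)

print(nlp("I like Paris")._.has_city_token)
print(nlp("I like Parisian food")._.has_city_token)
```

The trade-off is that token-level matching is stricter: it finds Paris as a whole token but ignores it inside longer words.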
Example 7: Finding if there are any alpha characters in the string
Here, we will see how to find alphabetic tokens in the string by using the is_alpha attribute, which is True for tokens that consist only of alphabetic characters.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey! Good Morning")
for token in doc:
    print(token.is_alpha)
# Output
True
False
True
True
Example 8: Finding if there are any stop words in the string
Here, we will see how to find the stop words in the string by using the is_stop attribute. Stop words are very common words, such as and, or, and but, that carry little meaning on their own and are often filtered out.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey! Good Morning and have a good day")
for token in doc:
    print(token.is_stop)
# Output
False
False
False
False
True
True
True
False
False
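A common use of is_stop is to filter stop words out rather than just print the flags. The following is a minimal sketch of that idea, assuming a blank English pipeline (which still carries the English stop word list); the name content_words is illustrative.

```python
import spacy

# Assumption: the blank English pipeline includes the default English stop word list
nlp = spacy.blank("en")

doc = nlp("Hey! Good Morning and have a good day")

# Keep only tokens that are neither stop words nor punctuation
content_words = [token.text for token in doc if not token.is_stop and not token.is_punct]
print(content_words)
```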
Example 9: Finding if there is any digit present in the string
Here, we will see how to find digits in the string by using the is_digit attribute. In this example, we look up the lexeme of each token in the vocabulary and read is_digit from it.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey my number is 2345")
for word in doc:
    lexeme = doc.vocab[word.text]
    print(lexeme.is_digit)
# Output
False
False
False
False
True
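The is_digit attribute is stricter than like_num from Example 3: it is True only when the token consists entirely of digits. The sketch below contrasts the two and reads is_digit directly from the token instead of going through the lexeme. The sentence is made up for the illustration, and a blank English pipeline is assumed.

```python
import spacy

# Assumption: a blank English pipeline suffices for these lexical attributes
nlp = spacy.blank("en")

doc = nlp("I paid 10.5 dollars for ten of the 2345 items")

# For every number-like token, record whether it consists of digits only
flags = {token.text: token.is_digit for token in doc if token.like_num}
for text, is_digit in flags.items():
    print(text, is_digit)
```

Here the decimal and the number word are number-like but not pure digits, which is why only 2345 is flagged by is_digit.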
Example 10: Finding if there is any lowercase token present in the string
Here, we will see whether each token is in lowercase by using the is_lower attribute.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hey! Good morning")
for token in doc:
    print(token.is_lower)
# Output
False
False
False
True
Example 11: Finding if there is any URL link in the string
Here, we will see how to find a URL in the string by using the like_url attribute.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(" Hey! Good login here https://www.google.com/")
for token in doc:
    if token.like_url:
        print(token.text)
# Output
https://www.google.com/
Example 12: Finding if there is any additional space in the string
Here, we will see whether any token consists of whitespace by using the is_space attribute. In this example, the string has an additional space before the word Hey.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(" Hey! Good morning")
for token in doc:
    print(token.is_space)
# Output
True
False
False
False
False
Example 13: Finding if the string contains brackets
Here, we will see whether any token is a bracket, such as ( or ), by using the is_bracket attribute.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(" Hey! Good morning (all)")
for token in doc:
    print(token.is_bracket)
# Output
False
False
False
False
False
True
False
True
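Several of the attributes from the examples above can also be inspected side by side in a single pass over the doc. This is only a sketch: the sentence, the selection of attributes, and the names attributes and rows are chosen for illustration, and a blank English pipeline is assumed.

```python
import spacy

# Assumption: a blank English pipeline suffices for these lexical attributes
nlp = spacy.blank("en")

doc = nlp(" Hey! Good morning (all) 2021 https://www.google.com/")

# Attribute names covered in the examples above
attributes = ("is_alpha", "is_punct", "is_space", "is_bracket", "like_num", "like_url")

# For every token, read each attribute by name and store the results
rows = [(token.text, {name: getattr(token, name) for name in attributes}) for token in doc]

# Print each token followed by the names of the attributes that are True for it
for text, flags in rows:
    active = [name for name, value in flags.items() if value]
    print(repr(text), active)
```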
Final Conclusion
Here we saw what lexical attributes are and how to use several of them in spaCy.