
NumPy String Functions

String Operations

The NumPy package includes the numpy.char module, which provides a set of vectorized string operations for arrays of type str or bytes. The functions covered here are:

  • add()
  • multiply()
  • center()
  • capitalize()
  • join()
  • lower()
  • upper()
  • title()
  • swapcase()
  • partition()
  • replace()
  • split()
  • strip()
  • zfill()
  • encode() and decode()

add():

The add() function of the numpy.char module performs element-wise string concatenation of two input arrays of type str (Unicode) or bytes.

Syntax:

char.add(x1, x2)

The x1 and x2 parameters are the input arrays of str or bytes. They must have the same shape or be broadcastable to a common shape.

import numpy as np
a = np.array(['Welcome', ' the', ' String'])
b = np.array([' to', ' NumPy', ' functions'])
print(np.char.add(a, b))

#Output:
['Welcome to' ' the NumPy' ' String functions']
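Besides two arrays of equal shape, add() also accepts a plain Python string, which is broadcast against every element. The following is a minimal sketch of that usage; the array contents and the ".txt" suffix are made up for illustration.

```python
import numpy as np

names = np.array(["alpha", "beta", "gamma"])

# The scalar ".txt" is broadcast and appended to every element.
with_suffix = np.char.add(names, ".txt")
print(with_suffix)  # ['alpha.txt' 'beta.txt' 'gamma.txt']
```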

multiply():

The multiply() function of the numpy.char module repeats each string of the input array (of type str or bytes) a given number of times, element-wise, and returns the result as an ndarray.

Syntax:

char.multiply(a, i)

The parameter a is the input array of str or bytes, and the i parameter is the number of repetitions (an integer or a sequence of integers). Values less than 0 are treated as 0, which yields an empty string.

a = np.array(['Hello ', 'World ', '! '])
print(np.char.multiply(a, [2, 3, 0]))

#Output:
['Hello Hello ' 'World World World ' '']
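The handling of zero and negative repetition counts can be checked with a short sketch. The input words are arbitrary sample values.

```python
import numpy as np

words = np.array(["ab", "cd", "ef"])

# A count of 0 and a negative count both produce an empty string.
repeated = np.char.multiply(words, [2, 0, -1])
print(repeated)  # ['abab' '' '']
```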

center():

The center() function returns a copy of the input array with its elements centered in a string of length width.

Syntax:

char.center(a, width, fillchar=' ')

The parameter a is the input array of str or bytes, the width parameter is the length of the resulting strings, and the fillchar parameter is the padding character to use (the default is a space).

a = np.array(['Hey', 'there','Mr.','String']) 
print(np.char.center(a, width= 6, fillchar = '_'))

#Output:
['_Hey__' 'there_' '_Mr.__' 'String']

CaSe Handling:

  • capitalize(): The capitalize() function converts the first character of each element of the input array to uppercase and the remaining characters to lowercase.
  • lower(): The lower() function returns an array with the elements converted to lowercase.
  • upper(): The upper() function returns an array with the elements converted to uppercase.
  • title(): The title() function returns the element-wise title-cased version of the input array, in which every word starts with an uppercase character.
  • swapcase(): The swapcase() function returns an element-wise copy of the input array with uppercase characters converted to lowercase and lowercase characters converted to uppercase.

For all the case handling functions, the parameter a is the input array of strings.

a = np.array(['Hey', 'how','are','you'])
print(np.char.capitalize(a))

#Output:
['Hey' 'How' 'Are' 'You']

print(np.char.lower(a))
#Output:
['hey' 'how' 'are' 'you']

print(np.char.upper(a))
#Output:
['HEY' 'HOW' 'ARE' 'YOU']

print(np.char.title(a))
#Output:
['Hey' 'How' 'Are' 'You']

print(np.char.swapcase(a))
#Output:
['hEY' 'HOW' 'ARE' 'YOU']
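The difference between capitalize() and title() only becomes visible with multi-word or mixed-case strings. The sketch below uses two made-up phrases to show it, together with swapcase().

```python
import numpy as np

phrases = np.array(["hello world", "NUMPY strings"])

# capitalize() uppercases the first character and lowercases the rest.
capitalized = np.char.capitalize(phrases)
print(capitalized)  # ['Hello world' 'Numpy strings']

# title() uppercases the first character of every word.
titled = np.char.title(phrases)
print(titled)  # ['Hello World' 'Numpy Strings']

# swapcase() inverts the case of every character.
swapped = np.char.swapcase(phrases)
print(swapped)  # ['HELLO WORLD' 'numpy STRINGS']
```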

join():

The join() function of the numpy.char module joins the characters of each string in the input array with the given separator, element-wise. It does not concatenate the array elements with each other.

Syntax:

char.join(sep, seq)

The sep parameter is the separator (a single string or an array of strings, one per element), and the seq parameter is the input array of strings.

a = np.array(['Hey', 'how','are','you'])
sep = ['_', '-', '*', '^']
print(np.char.join(sep, a))

#Output:
['H_e_y' 'h-o-w' 'a*r*e' 'y^o^u']
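Because join() works inside each element, it is easy to confuse with Python's str.join. As a small sketch with sample words: a single separator is applied to every element, while joining the elements themselves into one string is left to the built-in str.join.

```python
import numpy as np

words = np.array(["Hey", "how", "are", "you"])

# One separator is broadcast and placed between the characters of each element.
dashed = np.char.join("-", words)
print(dashed)  # ['H-e-y' 'h-o-w' 'a-r-e' 'y-o-u']

# To concatenate the elements into one string, use Python's own str.join.
sentence = " ".join(words)
print(sentence)  # Hey how are you
```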

partition():

The partition() function splits each element of the input array at the first occurrence of the separator and returns three strings per element: the part before the separator, the separator itself, and the part after the separator. If the separator is not found, the result holds the whole element followed by two empty strings.

Syntax:

char.partition(a, sep)

The parameter a is the input array of strings and the sep parameter is the separator to split each string element in the input array.

a = np.array(['This0string', 'is0separated', 'around', 'the white0spaces.'])
print(np.char.partition(a, sep = '0'))

#Output:
[['This' '0' 'string']
 ['is' '0' 'separated']
 ['around' '' '']
 ['the white' '0' 'spaces.']]
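The result of partition() has one extra dimension of length 3, so the three parts can be picked out by column. The sketch below assumes simple "key=value" style strings, which are invented for illustration, and includes one element without a separator.

```python
import numpy as np

pairs = np.array(["key=value", "flag"])
parts = np.char.partition(pairs, "=")

# One row per input element, three columns: before, separator, after.
print(parts.shape)  # (2, 3)

keys = parts[:, 0]
values = parts[:, 2]
print(keys)    # ['key' 'flag']
print(values)  # ['value' '']
```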

replace():

The replace() function returns a copy of each element of the input array in which occurrences of the substring old are replaced by the substring new.

Syntax:

char.replace(a, old, new, count=None)

The parameter a is the input array of strings, old is the substring to be replaced, new is the substring that takes its place, and the optional count parameter limits the replacement to the first count occurrences in each element.

a = np.array(['Thistheisthe', 'theatheString',
 'thewiththe', 'thethethe replaced values'])
print(np.char.replace(a, old ='the', new =' ', count = 2))

#Output:
['This is ' ' a String' ' with ' '  the replaced values']
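To make the effect of count clearer, the following sketch compares a call without count, which replaces every occurrence, with a call limited to the first occurrence. The sample strings are arbitrary.

```python
import numpy as np

words = np.array(["a-b-c-d", "x-y"])

# Without count, every occurrence is replaced.
all_replaced = np.char.replace(words, "-", "+")
print(all_replaced)  # ['a+b+c+d' 'x+y']

# With count=1, only the first occurrence in each element is replaced.
first_replaced = np.char.replace(words, "-", "+", count=1)
print(first_replaced)  # ['a+b-c-d' 'x+y']
```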

split():

The split() function splits each element of the input array into a list of words, using the separator as the delimiter string. The result is an array of Python lists.

Syntax:

char.split(a, sep=None, maxsplit=None)

The parameter a is the input array of strings, the optional sep parameter is the delimiter (if not given, elements are split on whitespace), and the optional maxsplit parameter is the maximum number of splits to perform on each element.

a = np.array(['this_is_a_split_string'])
print(np.char.split(a, sep ='_', maxsplit= 3))

#Output:
[list(['this', 'is', 'a', 'split_string'])]
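When sep is omitted, split() falls back to whitespace, and because the elements can yield different numbers of words, the result is an object array of lists. A short sketch with made-up input lines:

```python
import numpy as np

lines = np.array(["one two  three", "four five"])

# No separator: split on runs of whitespace.
tokens = np.char.split(lines)
print(tokens.dtype)  # object
print(tokens[0])     # ['one', 'two', 'three']

# Each element is an ordinary Python list, so its length can differ.
lengths = [len(t) for t in tokens]
print(lengths)  # [3, 2]
```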

strip(), rstrip() and lstrip():

  • strip(): The strip() function removes the specified characters from the leading and trailing positions of each element in the input array.
  • rstrip(): The rstrip() function returns a copy of the input array with the specified trailing characters removed, element-wise.
  • lstrip(): The lstrip() function returns a copy of the input array with the specified leading characters removed, element-wise.

For all three strip functions, the parameter a is the input array of strings, and the optional chars parameter is the set of characters to strip (if not given, whitespace is stripped).

a = np.array(['ab1ab2a', 'AaAab1ab2AC', 'AacAacAa'])

print(np.char.strip(a, chars = 'aA'))
#Output:
['b1ab2' 'b1ab2AC' 'cAac']

print(np.char.rstrip(a, chars = 'a'))
#Output:
['ab1ab2' 'AaAab1ab2AC' 'AacAacA']

print(np.char.lstrip(a, chars = 'Aa'))
#Output:
['b1ab2a' 'b1ab2AC' 'cAacAa']
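Two details of the strip functions are worth a quick check: the default behavior without chars, and the fact that chars is treated as a set of characters rather than as a fixed prefix or suffix. The sketch below uses invented sample strings.

```python
import numpy as np

raw = np.array(["  padded  ", "xxhixx", "xyhiyx"])

# Without chars, leading and trailing whitespace is removed.
no_space = np.char.strip(raw)
print(no_space)  # ['padded' 'xxhixx' 'xyhiyx']

# chars="xy" removes any mix of 'x' and 'y' from both ends.
no_xy = np.char.strip(raw, "xy")
print(no_xy)  # ['  padded  ' 'hi' 'hi']
```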

zfill():

The zfill() function left-fills each element of the input array with zeros until it reaches the given width, element-wise. It is mainly intended for numeric strings.

Syntax:

char.zfill(a, width)

The parameter a is the input array of strings, and the width parameter is the width to which the elements are left-filled.

If the width is less than the number of characters in an element, the result depends on the NumPy version: older releases truncate the element to width characters, as in the second example below, whereas Python's own str.zfill() never truncates and newer NumPy releases appear to return the element unchanged.

a= np.array(['Hello', 'World'])
print(np.char.zfill(a, width=7))

#Output:
['00Hello' '00World']

print(np.char.zfill(a, width=3))

#Output:
['Hel' 'Wor']
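Since zfill() targets numeric strings, a sketch with signed values may be more representative. The codes below are sample data. As with Python's str.zfill, a leading sign is expected to stay in front of the inserted zeros.

```python
import numpy as np

codes = np.array(["7", "42", "-3", "+15"])

# Pad every element to 5 characters; a leading sign stays in front.
padded = np.char.zfill(codes, 5)
print(padded)  # ['00007' '00042' '-0003' '+0015']
```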

encode() and decode():

  • encode(): The encode() function encodes the input array of strings into bytes, element-wise.
  • decode(): The decode() function decodes the input array of bytes into strings, element-wise.

For the encode() and decode() functions, the parameter a is the input array (strings for encode(), bytes for decode()), and the encoding parameter is the name of the encoding to use.

a = ['Welcome', 'to', 'Python']

b = np.char.encode(a, encoding = 'Cp500')
print(b)
#Output:
[b'\xe6\x85\x93\x83\x96\x94\x85' b'\xa3\x96' b'\xd7\xa8\xa3\x88\x96\x95']

print(np.char.decode(b, encoding = 'Cp500'))
#Output:
['Welcome' 'to' 'Python']
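The same round trip works with more common encodings. As a minimal sketch, the example below uses UTF-8 with two non-ASCII sample words and checks that decoding restores the original strings.

```python
import numpy as np

words = np.array(["café", "naïve"])

# encode() turns the str array into a bytes array (dtype kind 'S').
encoded = np.char.encode(words, encoding="utf-8")
print(encoded.dtype.kind)  # S
print(encoded)             # [b'caf\xc3\xa9' b'na\xc3\xafve']

# decode() with the same encoding restores the original text.
decoded = np.char.decode(encoded, encoding="utf-8")
print(decoded)  # ['café' 'naïve']
```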

String Comparisons

The numpy.char module also provides a set of functions for comparing string values in arrays.

  • equal(): The equal() function of the numpy.char module performs an element-wise equality (==) comparison of two input string arrays.
  • not_equal(): The not_equal() function performs an element-wise inequality (!=) comparison of two input string arrays.
  • greater_equal(): The greater_equal() function performs an element-wise ">=" comparison of two input string arrays.
  • less_equal(): The less_equal() function performs an element-wise "<=" comparison of two input string arrays.
  • greater(): The greater() function performs an element-wise ">" comparison of two input string arrays.
  • less(): The less() function performs an element-wise "<" comparison of two input string arrays.

All these functions strip trailing whitespace from each element before the comparison. The x1 and x2 parameters are the two input string arrays to be compared (they must have the same shape or be broadcastable to a common shape).

a = np.array(["Hey", "abc", "xyz"])
b = np.array(["Hey", "cba", "xyz"])

print(np.char.equal(a, b))
#Output:
[ True False  True]

print(np.char.not_equal(a, b))
#Output:
[False  True False]

print(np.char.greater_equal(a, b))
#Output:
[ True False  True]

print(np.char.less_equal(a, b))
#Output:
[ True  True  True]

print(np.char.greater(a, b))
#Output:
[False False False]

print(np.char.less(a, b))
#Output:
[False  True False]

compare_chararrays():

The compare_chararrays() function of the numpy.char module returns the element-wise comparison of two input string arrays using the specified comparison operator.

Syntax:

char.compare_chararrays(a, b, cmp_op, rstrip)

The a and b parameters are the two input string arrays to be compared, the cmp_op parameter is the comparison operator ("<", ">", ">=", "<=", "==" or "!="), and the rstrip parameter, if True, removes the spaces at the end of the strings before the comparison.

a = np.array(["Hey", "Hello", "Hie"])
b = np.array(["Hey", "Hello", "Hei"])
print(np.char.compare_chararrays(a, b,"!=", True))

#Output:
[False False  True]
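The effect of the rstrip flag is easiest to see with strings that differ only in trailing spaces. The sketch below, using sample data, runs the same "==" comparison with rstrip set to True and to False, and also shows that equal() strips trailing whitespace before comparing.

```python
import numpy as np

left = np.array(["Hey ", "Hello"])
right = np.array(["Hey", "Hello"])

# rstrip=True: trailing spaces are ignored.
stripped = np.char.compare_chararrays(left, right, "==", True)
print(stripped)  # [ True  True]

# rstrip=False: the trailing space makes the first pair differ.
exact = np.char.compare_chararrays(left, right, "==", False)
print(exact)  # [False  True]

# equal() strips trailing whitespace before comparing.
via_equal = np.char.equal(left, right)
print(via_equal)  # [ True  True]
```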

String Information

The numpy.char module provides several functions for inspecting the strings in a NumPy array.

  • count(): The count() function of the numpy.char module returns an array with the number of non-overlapping occurrences of a substring in each element, within the given range.
  • startswith(): The startswith() function returns a boolean array which is True where the string element in the input array starts with prefix, otherwise False.
  • endswith(): The endswith() function returns a boolean array which is True where the string element in the input array ends with suffix, otherwise False.
  • find(): The find() function of the numpy.char module returns, for each element, the lowest index at which the substring is found, or -1 if it is not found.
  • index(): The index() function also finds a substring in the input array, element-wise, but raises a ValueError when the substring is not found.
  • isalpha(): The isalpha() function returns True for each element in which all characters are alphabetic and there is at least one character, otherwise False.
  • isalnum(): The isalnum() function returns True for each element in which all characters are alphanumeric and there is at least one character, otherwise False.
  • str_len(): The str_len() function of the numpy.char module returns the length of each element in the input string array.

For all the string information functions, the parameter a is the input string array, sub is the substring to search for, prefix is the starting substring to test for, suffix is the ending substring to test for, and the optional start and end parameters are interpreted as in slice notation to specify the range of each element to examine.

arr = np.array(["array","an array","is array", "are all arrays", "the arrays"])
a = np.array(['a','are','a01']) 

print(np.char.count(arr, "a"))
#Output:
[2 3 2 4 2]

print(np.char.count(arr, "s", start = 3, end = 10))
#Output:
[0 0 0 0 1]

print(np.char.startswith(arr, "ar"))
#Output:
[ True False False  True False]

print(np.char.startswith(arr, "ar", start = 3, end = 10))
#Output:
[False  True  True False False]

print(np.char.endswith(arr, "y"))
#Output:
[ True  True  True False False]

print(np.char.endswith(arr, "ys", start = 3, end = 15))
#Output:
[False False False  True  True]

print(np.char.find(arr, 'arr'))
#Output:
[0 3 3 8 4]

print(np.char.index(arr, 'y'))
#Output:
[ 4  7  7 12  8]

print(np.char.isalpha(a))
#Output:
[ True  True False]

print(np.char.isalnum(a))
#Output:
[ True  True  True]

print(np.char.str_len(a))
#Output:
[1 3 3]
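The difference between find() and index() only shows up when the substring is missing from some element. As a brief sketch with sample data, find() reports -1 for those elements while index() raises a ValueError for the whole call.

```python
import numpy as np

arr = np.array(["array", "list", "tuple"])

# find() reports a missing substring as -1.
positions = np.char.find(arr, "rr")
print(positions)  # [ 1 -1 -1]

# index() raises ValueError because some elements lack the substring.
try:
    np.char.index(arr, "rr")
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)  # True
```

If missing substrings are expected, find() combined with a check for -1 avoids the exception handling.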