Python string slicing, string concatenation, and Unicode string encoding and decoding

In the previous post we discussed how to use Python strings in a program. Here we will see how to perform string slicing, string concatenation, and Unicode string encoding and decoding. If you do not yet know what a string is or how to use one, visit the link given below.

Link: Python string data type


String concatenation

String concatenation means joining two strings end to end, and it is easily done with the ‘+’ (plus) operator.

>>> 'New' + ' World' # works
'New World'
>>> str1 = "One"
>>> str2 = "Piece"
>>> str1 + ' ' + str2 # works
'One Piece'
>>> str3 = str1 + " " + str2 # also works
>>> str3
'One Piece'
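The script below is a small sketch that repeats the ‘+’ example and adds two related points not covered above: str.join, a common alternative when many pieces are joined with the same separator, and the TypeError raised when ‘+’ mixes a string with a number. The word list and the count value are arbitrary illustrations.

```python
# Concatenation with '+': both operands must be strings.
str1 = "One"
str2 = "Piece"
str3 = str1 + " " + str2  # the space has to be added explicitly
print(str3)  # One Piece

# str.join glues a list of strings together, putting the separator between them.
words = ["Happy", "birthday", "to", "you"]
sentence = " ".join(words)
print(sentence)  # Happy birthday to you

# '+' between a str and an int raises TypeError, so convert with str() first.
count = 3
try:
    label = "Count: " + count
except TypeError:
    label = "Count: " + str(count)
print(label)  # Count: 3
```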

Next, let’s look at string slicing.



String slicing

String slicing means extracting a part (a substring) of a string. Slicing returns a new string and leaves the original unchanged. Slicing follows a fixed format; a few things to note:

i) Slicing is usually applied to a string stored in a variable, so that we can refer to it by name. (A string literal can also be sliced directly, e.g. 'Hello'[1:3] gives 'el'.)

ii) The index values are written inside square brackets ‘[]’ in the following format:

string_object_name[start:stop:step]

The ‘start’ index is the position where slicing begins (this character is included).
The ‘stop’ index is the position where slicing ends (this character is excluded).
The ‘step’ value is the stride: slicing takes every step-th character between ‘start’ and ‘stop’. An example will clarify this.
The colons separating the values are important: s[i] without a colon is indexing and returns a single character, whereas a colon makes it a slice.

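You can give only the ‘start’ index, only the ‘stop’ index, both, or all three, and the string is sliced accordingly. Any value you leave out takes a default: ‘start’ defaults to 0 (the beginning), ‘stop’ to the end of the string, and ‘step’ to 1. Note that string indices always start counting from 0.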
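As a sketch of how the defaults work, the assertions below compare short slices with their fully written-out forms (start 0, stop len(s), step 1). The built-in slice() object is another way to write the same thing, with None standing for an omitted value; the variable names head and tail are only for illustration.

```python
s = 'Happy birthday to you'

# Omitted values use the defaults start=0, stop=len(s), step=1.
assert s[6:] == s[6:len(s):1]
assert s[:6] == s[0:6:1]
assert s[6:10] == s[6:10:1]

# s[start:stop:step] is shorthand for indexing with a slice object;
# None plays the role of an omitted value.
tail = s[slice(6, None)]
head = s[slice(None, 6)]
print(repr(tail))  # 'birthday to you'
print(repr(head))  # 'Happy '
```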

>>> s='Happy birthday to you'
>>> # a single index without a colon is indexing: it returns one character
>>> s[0]
'H'
>>> s[5]
' '
>>> s[6] # counting from 0, index 6 is 'b'
'b'
>>> # putting a colon after the 'start' index makes it a slice
>>> s[6:] # slicing begins at index 6 and runs to the end
'birthday to you'
>>> s[3:] # slicing begins at index 3 and runs to the end
'py birthday to you'
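Strictly speaking, s[0] and s[6] above are indexing rather than slicing. A minimal sketch of the practical difference (standard Python behaviour) follows: a one-character slice gives the same text as a single index, but an index past the end raises IndexError, whereas a slice starting past the end simply returns an empty string. The position 100 is an arbitrary out-of-range value.

```python
s = 'Happy birthday to you'

# Indexing returns one character; a one-character slice returns the same text.
char = s[6]
piece = s[6:7]
print(char, piece)  # b b

# Indexing past the end is an error...
try:
    _ = s[100]
    index_error = False
except IndexError:
    index_error = True

# ...but slicing past the end just gives an empty string.
empty = s[100:]
print(index_error, repr(empty))  # True ''
```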

If the colon is placed after the index, slicing begins at that index and runs to the end of the string. What happens if we put the colon before the index instead?

>>> #Using the string s='Happy birthday to you'
>>> s[:6] # 6 is treated as the 'stop' index
'Happy '
>>> s[:10] # 10 is treated as the 'stop' index
'Happy birt'

If the colon is placed before the index, the index is treated as the ‘stop’ index, and the string is sliced from the beginning up to, but not including, the ‘stop’ index.
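Because the ‘stop’ index is excluded, splitting a string at any position k with s[:k] and s[k:] gives two pieces that join back into the original with no character lost or repeated. The short check below is a sketch of that property using the same example string.

```python
s = 'Happy birthday to you'

# s[:k] ends just before index k and s[k:] starts at index k,
# so the two halves never overlap and never skip a character.
for k in range(len(s) + 1):
    assert s[:k] + s[k:] == s

head, tail = s[:6], s[6:]
print(repr(head), repr(tail))  # 'Happy ' 'birthday to you'
```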

The next example uses both the ‘start’ and ‘stop’ indices.

>>> #Using the string s='Happy birthday to you'
>>> s[6:10] # slicing begins at index 6 and ends just before index 10
'birt'
>>> s[11:19] # slicing begins at index 11 and ends just before index 19
'day to y'
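In practice the ‘start’ and ‘stop’ values are often computed rather than typed by hand. The sketch below uses str.find() to get the start position of a word and adds the word's length to get the stop position; the variable names are only for illustration. It also checks that when both values lie inside the string, s[start:stop] contains stop - start characters.

```python
s = 'Happy birthday to you'

word = 'birthday'
start = s.find(word)      # index of the first character of 'birthday' (-1 if missing; not handled here)
stop = start + len(word)  # one past its last character
found = s[start:stop]
print(start, stop, found)  # 6 14 birthday

# Within bounds, the slice length is stop - start.
assert len(s[6:10]) == 10 - 6
assert len(s[11:19]) == 19 - 11
```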

Using all three values: ‘start’, ‘stop’ and ‘step’

>>> #Using the string s='Happy birthday to you'
>>> s[2:19:4] # start at index 2, stop before index 19, take every 4th character
'pbh y'

What s[2:19:4] means is that slicing begins at index 2, i.e. ‘p’, and then moves forward 4 positions at a time: index 6 is ‘b’, index 10 is ‘h’, index 14 is ‘ ’ (a space) and index 18 is ‘y’. Moving forward 4 again would give index 22, which is not before the ‘stop’ index 19, so slicing ends there. To put it simply, s[2:19:4] selects the characters at positions 2, (2+4), (2+4+4), (2+4+4+4) and (2+4+4+4+4), that is 2, 6, 10, 14 and 18.

Here is another example.

>>> ss="Candcplusplus is the best!"
>>> ss[4:15:3] #select all characters lying at 4,4+3,4+3+3,4+3+3+3
'cul '
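The position-counting described above is what Python's built-in range(start, stop, step) produces. The helper below, manual_slice (a hypothetical name used only for illustration), rebuilds a slice by collecting the characters at those positions. It is a sketch that assumes non-negative ‘start’ and ‘stop’ values within the string and a positive ‘step’, and its results can be compared with Python's own slicing for both examples above.

```python
def manual_slice(text, start, stop, step):
    """Hypothetical helper: rebuild text[start:stop:step] by hand.

    Assumes 0 <= start <= stop <= len(text) and step > 0.
    """
    positions = range(start, stop, step)  # start, start+step, ... while < stop
    return ''.join(text[i] for i in positions)


s = 'Happy birthday to you'
ss = "Candcplusplus is the best!"

print(list(range(2, 19, 4)))             # [2, 6, 10, 14, 18]
print(repr(manual_slice(s, 2, 19, 4)))   # 'pbh y'
print(repr(manual_slice(ss, 4, 15, 3)))  # 'cul '
```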

The last case is a colon with no index values at all, which selects the entire string. Because strings are immutable, this is equivalent to making a copy.

>>> ss="Candcplusplus is the best!"
>>> ss1=ss[:] #copy the entire 'ss' string
>>> ss1
'Candcplusplus is the best!'
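Since Python strings are immutable, no slicing operation can change ss itself, and trying to assign to a position fails. To get a "modified" string you build a new one, typically by combining slices with concatenation. The sketch below illustrates this; replacing the first letter with a lowercase ‘c’ is an arbitrary choice.

```python
ss = "Candcplusplus is the best!"

# Strings are immutable: item assignment raises TypeError.
try:
    ss[0] = 'c'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Instead, build a new string from a slice and concatenation.
changed = 'c' + ss[1:]
print(assignment_failed)  # True
print(changed)            # candcplusplus is the best!
# The original string is left unchanged.
print(ss)                 # Candcplusplus is the best!
```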

Unicode string encoding and decoding

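Every Unicode character has a corresponding integer value, called its code point, which maps to the character and vice versa (code points are usually written in hexadecimal, e.g. U+00C6 for ‘Æ’). Encoding a Unicode string means converting its characters into a sequence of bytes according to an encoding such as UTF-8, in which one character takes between one and four bytes. Python displays the resulting bytes object with each non-ASCII byte written as a hexadecimal escape such as \xc3. Consider the code below.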

>>> ss="ÆÐÑ Øß ñð" #a Unicode string instance 
>>> encoded_ss =ss.encode('utf-8') #encode 'ss' in utf-8 format
>>> encoded_ss #display the encoded string
b'\xc3\x86\xc3\x90\xc3\x91 \xc3\x98\xc3\x9f \xc3\xb1\xc3\xb0'

Each of these accented characters is encoded as two UTF-8 bytes, shown in hexadecimal, while the spaces stay single bytes. To verify that ‘\xc3\x86’ stands for ‘Æ’, and to check the other characters, visit https://www.utf8-chartable.de/unicode-utf8-table.pl; the table on that page lists each character together with its UTF-8 bytes in hexadecimal.
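To see how each character's code point relates to its UTF-8 bytes, the sketch below prints, for every character of the example string, its code point in hexadecimal (via ord()) and the bytes produced by encoding that character alone. It also compares lengths: the string has 9 characters, but the encoded bytes object is longer because each accented character takes two bytes.

```python
ss = "ÆÐÑ Øß ñð"

for ch in ss:
    code_point = ord(ch)             # integer value (code point) of the character
    utf8_bytes = ch.encode('utf-8')  # its UTF-8 byte representation
    print(repr(ch), f"U+{code_point:04X}", utf8_bytes)
# first line printed: 'Æ' U+00C6 b'\xc3\x86'

encoded_ss = ss.encode('utf-8')
print(len(ss), len(encoded_ss))  # 9 16
```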

Decoding reverses the process: it converts the encoded bytes back into the original characters. It must use the same encoding that was used for encoding, as in the example given below.

>>> encoded_ss.decode('utf-8') # decode back to the original string
'ÆÐÑ Øß ñð'
>>> # or store the decoded result in a variable
>>> ss12 = encoded_ss.decode('utf-8') # works
>>> ss12
'ÆÐÑ Øß ñð'
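Decoding only works reliably with the same encoding that was used for encoding. The sketch below checks the UTF-8 round trip and then shows what happens with a mismatched codec (ASCII is an arbitrary choice here): decoding these bytes as ASCII raises UnicodeDecodeError, and the errors='replace' argument of bytes.decode() substitutes the replacement character U+FFFD for each byte it cannot decode.

```python
ss = "ÆÐÑ Øß ñð"
encoded_ss = ss.encode('utf-8')

# Round trip: decoding with the same codec restores the original string.
assert encoded_ss.decode('utf-8') == ss

# Decoding with the wrong codec fails...
try:
    encoded_ss.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# ...unless errors='replace' is used, which inserts U+FFFD for each bad byte.
lossy = b'\xc3\x86'.decode('ascii', errors='replace')
print(decode_failed)            # True
print(lossy == '\ufffd\ufffd')  # True
```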
