Python string literals
A Python string literal is the notation used in source code to write a constant string value. Such a literal has a fixed form, and a simplified version of its lexical definition is shown below.
string_literal=[Prefix](string_characters)
A string literal is a combination of [Prefix] and (string_characters). What each part stands for is explained below.
(If you do not know what the string data type is, check the link String data type)
*Note: The lexical definition of string literals in the Python documentation is more detailed; here I have simplified it for easier understanding. The link is https://docs.python.org/3/reference/lexical_analysis.html?highlight=string%20literal#string-and-bytes-literals
[Prefix]: The string prefix is optional and can be a single character or two characters. The prefix gives the string a certain property. The prefix can be any of the characters “r”, “u”, “R”, “U”, “f” and “F”, or one of their combinations: “fr”, “Fr”, “fR”, “FR”, “rf”, “rF”, “Rf” and “RF”. String prefixes are discussed in more detail below.
(string_characters): The (string_characters) can be a single character or a sequence of characters. Any Unicode character may be used, such as English letters, other Latin characters and characters from other scripts. Python reads source files as UTF-8 by default; you can declare a different source encoding with an encoding declaration at the top of the file.
The (string_characters) can also be just an escape sequence, or a combination of text and escape sequences. Escape sequences are explained below.
*Note: The (string_characters) are enclosed in single quotes ('string_characters'), double quotes ("string_characters") or triple quotes. Triple quotes means three single quotes ('''string_characters''') or three double quotes ("""string_characters""").
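To make the note above concrete, here is a small sketch showing that the quoting style does not change the value. The variable names are only for illustration.

```python
# All four quoting styles build ordinary str objects.
single = 'spam'
double = "spam"
triple_single = '''spam'''
triple_double = """spam"""

# The quotes are only delimiters; the resulting values are identical.
all_equal = single == double == triple_single == triple_double
print(all_equal)

# A triple-quoted literal may span several lines; the line break
# becomes part of the string value.
multi_line = """first
second"""
print(repr(multi_line))
```

Triple quotes are mostly useful when the text spans several lines or contains both kinds of quotation marks.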
Python escape sequence
An escape sequence lets you insert a character that would otherwise be difficult or impossible to type directly into the literal. An escape sequence always starts with the backslash (\) character, followed by the character or code that carries the special meaning.
A simple instance of an escape sequence is when you want to insert a single quote in a string that is itself enclosed in single quotes. Here you can put a backslash before the inner quote and insert it without any problems. If you do not use the backslash, the inner quote ends the string early and you get an error.
>>> st='John's New car!!'
#Error!! SyntaxError: invalid syntax
>>> st='John\'s New car!!' #work fine
>>> print(st)
John's New car!!
>>> #Similarly you can do so for double quote
>>> wd="Emma said \"C++ rocks!\" "
>>> print(wd)
Emma said "C++ rocks!" 
More escape sequences are given below.
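As a side note to the example above, escaping is not the only option: you can also enclose the string in the other kind of quote. A minimal sketch (variable names are illustrative):

```python
# Escaping the inner quote with a backslash...
escaped = 'John\'s New car!!'
# ...or enclosing the string in double quotes gives the same value.
alternate = "John's New car!!"
print(escaped == alternate)

# The same idea works the other way round for double quotes.
quote_escaped = "Emma said \"C++ rocks!\""
quote_alternate = 'Emma said "C++ rocks!"'
print(quote_escaped == quote_alternate)
```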
Escape Sequence | Meaning |
---|---|
\newline | Backslash and newline ignored |
\\ | Backslash (\) |
\' | Single quote (') |
\" | Double quote (") |
\a | ASCII Bell (BEL) |
\b | ASCII Backspace (BS) |
\f | ASCII Formfeed (FF) |
\n | ASCII Linefeed (LF) |
\r | ASCII Carriage Return (CR) |
\t | ASCII Horizontal Tab (TAB) |
\v | ASCII Vertical Tab (VT) |
\ooo | Character with octal value ooo |
\xhh | Character with hex value hh |
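In a string literal, the escape sequences “\ooo” and “\xhh” denote the Unicode character with the given octal or hexadecimal code point. Because “\xhh” takes exactly two hex digits and “\ooo” at most three octal digits, they only reach low code points; characters from scripts such as Japanese, Chinese or Russian need one of the escape sequences in the next table.
Escape sequences that are recognized only in string literals (not in bytes literals) are: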
Escape Sequence | Meaning |
---|---|
\N{name} | Character named name in the Unicode database |
\uxxxx | Character with 16-bit hex value xxxx |
\Uxxxxxxxx | Character with 32-bit hex value xxxxxxxx |
Any Unicode character can be written with the third escape sequence, ‘\Uxxxxxxxx’, which takes exactly eight hex digits.
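The three Unicode escapes from the table can be compared side by side. The sketch below writes the Cyrillic letter Д (code point U+0414) in each form; the variable names are only for illustration.

```python
# The same character written three ways, plus the character typed directly.
by_name = '\N{CYRILLIC CAPITAL LETTER DE}'
by_u16 = '\u0414'        # exactly four hex digits
by_u32 = '\U00000414'    # exactly eight hex digits
typed = 'Д'
print(by_name == by_u16 == by_u32 == typed)

# Code points above U+FFFF do not fit in \uxxxx and need \Uxxxxxxxx.
beyond_bmp = '\U0001F600'
print(len(beyond_bmp), hex(ord(beyond_bmp)))
```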
Let us look at some of the examples using the escape sequence.
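>>> st="Backslash (\) character"
>>> print(st)
Backslash (\) character
>>> st="Backslash (\\) character"
>>> print(st)
Backslash (\) character
Both strings print the same here, but only because ‘\)’ is not a recognized escape sequence, so Python leaves the backslash in the string. Recent Python versions warn about such unrecognized escapes, so ‘\\’ is the reliable way to write a single backslash.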
>>> sound='\a'
>>> print(sound)
The ‘\a’ escape sequence should produce a sound if you run your Python programs from cmd. Python IDLE users will not hear any sound.
>>> #using '\n'
>>> nl="New \nyear"
>>> print(nl)
New 
year
>>> #Using '\f'
>>> ff='New\f ngr'
>>> print(ff)
New ngr
>>> #Using '\t'
>>> tb="New\tYear"
>>> print(tb)
New     Year
In the code below the ‘\x4d’ and ‘\115’ escape sequences stand for ‘M’.
>>> hx="Hello \x4dO\115!"
>>> print(hx)
Hello MOM!
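If you want to check why both escapes give ‘M’, you can compare their code points with ord(). This is only a small verification sketch; the variable names are illustrative.

```python
# Hex 4d and octal 115 are both the decimal number 77, the code point of 'M'.
hex_m = '\x4d'
octal_m = '\115'
print(ord(hex_m), ord(octal_m), ord('M'))

# Each escape sequence produces exactly one character.
greeting = "Hello \x4dO\115!"
print(greeting, len(greeting))
```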
Python string literals prefix
The prefix of the string literals can be any of the following “r”, “u”, “R”, “U”, “f”, “F”, “fr”, “Fr”, “fR”, “FR”, “rf”, “rF”, “Rf” and “RF”. Here we will see what they represent.
‘r’ or ‘R’ prefix
The prefixes ‘r’ and ‘R’ are equivalent, and a string prefixed with either one is called a raw string. In a raw string, escape sequences are not interpreted; the backslash and the character after it are treated as normal characters.
Look at the code below.
>>> st="New \n happy"
>>> print(st)
New 
 happy
>>> rs=r"New \n happy"
>>> print(rs)
New \n happy
In the normal string the ‘\n’ escape sequence is interpreted as a newline character, whereas in the raw string ‘rs’ the ‘\n’ remains as it is and is not treated as an escape sequence.
Here is another example.
>>> #Using \t
>>> st=R"a tab \t end"
>>> print(st)
a tab \t end
>>> #using \xhh
>>> hx="Hello \x4dO\115!"
>>> print(hx)
Hello MOM!
>>> #In raw string
>>> hr=R"Hello \x4dO\115!"
>>> print(hr)
Hello \x4dO\115!
Notice the difference in the raw string and the normal string.
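The difference is easy to see from the lengths of the strings. The following sketch (with illustrative variable names) compares a normal and a raw version of the same literal text.

```python
normal = "a\tb"   # \t is one tab character
raw = r"a\tb"     # backslash and t are two separate characters

print(len(normal), len(raw))
print(repr(normal), repr(raw))

# A raw string is still an ordinary str; the same value can be
# written as a normal literal by doubling the backslash.
same_value = raw == "a\\tb"
print(same_value)
```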
Points to note in a raw string
i) We can still escape a single or double quote, but the backslash remains in the string.
>>> st=R"\""
>>> print(st)
\"
>>> s='\''
>>> print(s)
'
>>> s=r'\''
>>> print(s)
\'
ii) A raw string cannot end in a single backslash (more precisely, in an odd number of backslashes). The final backslash would escape the closing quote, leaving the string without its ending quote.
>>> s=r'\'
#Error!! SyntaxError: EOL while scanning string literal
>>> ss=r'\\'
>>> print(ss)
\\
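If you do need a raw string that ends in a backslash, one common workaround is to add the last backslash from a normal string. The sketch below assumes a made-up Windows-style folder name purely for illustration.

```python
# The raw part keeps \t as two characters; the trailing backslash
# comes from a normal literal, where it is written as '\\'.
folder = r'C:\temp' + '\\'
print(folder)

# In a raw string two backslashes stay two backslashes.
double = r'\\'
print(len(double))
```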
iii) Using the ‘\newline’ escape sequence (press ENTER after ‘\’) in a normal string specifies that the text written on the next line is a continuation of the previous line, so neither the backslash nor the line break ends up in the string. In a raw string, however, the ‘\newline’ sequence is taken literally: both the backslash and the line break stay in the string.
Consider the code below.
>>> #Normal string
>>> sss='''New \
... dome'''
>>> print(sss)
New dome
>>> #In a raw string
>>> sss=R'''New \
... dome'''
>>> print(sss)
New \
dome
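Using repr() makes it clearer what each string really contains. This is a small sketch of the same example written as a script; the variable names are illustrative.

```python
# Normal string: backslash + line break are both dropped.
normal = '''New \
dome'''

# Raw string: backslash and line break are both kept.
raw = r'''New \
dome'''

print(repr(normal))
print(repr(raw))
```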
‘u’ or ‘U’ prefix
The ‘u’ or ‘U’ prefix marks a Unicode literal. Unicode is a character standard that covers almost all the known scripts in the world, so it is what you need for characters beyond the English or Latin alphabets. In Python 3, however, every string literal is already a Unicode string, so the prefix has no effect; it is kept for compatibility with Python 2 code, and any Unicode character can be used without it.
Consider the code example below.
>>> #Without 'U'
>>> s='Д дД дЖ жЖ ж'
>>> print(s)
Д дД дЖ жЖ ж
>>> #With 'U'
>>> us=U'Д дД дЖ жЖ ж'
>>> print(us)
Д дД дЖ жЖ ж
By the way, the characters used above are Russian characters (I don’t know why I always use Russian characters to test a Unicode program).
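A quick way to confirm that the prefix changes nothing in Python 3 is to compare the two strings and their types. A minimal sketch, with illustrative variable names:

```python
plain = 'Д д Ж ж'
prefixed = u'Д д Ж ж'

# Both literals produce the same value and the same type (str).
print(plain == prefixed)
print(type(plain).__name__, type(prefixed).__name__)
```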
‘f’ or ‘F’ prefix
Any string with the prefix ‘f’ or ‘F’ is known as a formatted string literal. Formatted string literals can contain replacement fields. The final value of a formatted string is known only when the program runs, because the fields are evaluated at run time (of course, programmers usually know the resulting string beforehand).
A replacement field is written inside braces. Inside the braces goes an expression, most simply the name of the variable whose value should be inserted.
Consider the code below.
>>> name="Darwin's"
>>> st=f"The {name} theory of evolution."
>>> print(st)
The Darwin's theory of evolution.
In the program, the ‘st’ string contains the replacement field {name}, which is replaced with the value of the variable ‘name’, that is “Darwin’s”.
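A replacement field is not limited to a plain variable name; it can hold an expression. The sketch below is only an illustration with made-up values, not a full tour of the formatting rules.

```python
name = "Darwin"

# Method calls and function calls are evaluated inside the braces.
shout = f"{name.upper()} has {len(name)} letters"
print(shout)

# Doubled braces produce literal braces instead of a replacement field.
literal_braces = f"{{name}} is replaced by {name}"
print(literal_braces)
```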
There are many more rules that can be applied to replacement fields in a formatted string; they are not discussed here. A thorough discussion of formatted strings is given in another post.
Link : Python formatted string literals
“fr”, “Fr”, “fR”, “FR”, “rf”, “rF”, “Rf” and “RF” prefix
The “fr”, “Fr”, “fR”, “FR”, “rf”, “rF”, “Rf” and “RF” prefixes combine the raw string rule and the formatted string rule. The order and case of the two letters do not matter; each of them produces a raw formatted string, in which replacement fields are evaluated but escape sequences are not.
Consider the code below.
>>> field='New york'
>>> rs=R'I am from {field} \t and hamster'
>>> print(rs)
I am from {field} \t and hamster
>>> #The '\t' remains as it is
>>> fs=F'I am from {field} \t and hamster'
>>> print(fs)
I am from New york      and hamster
>>> #The '\t' and {field} are evaluated
>>> frs=FR'I am from {field} \t and hamster'
>>> print(frs)
I am from New york \t and hamster
In the ‘frs’ string the ‘\t’ escape sequence remains as it is and the {field} is replaced with the corresponding string.
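One place where the combined prefix is handy is building regular expressions, where backslashes should reach the re module untouched while part of the pattern comes from a variable. The following is a sketch under that assumption; the word and the sample sentences are made up for illustration.

```python
import re

word = "tab"

# \b stays as backslash + b (raw rule), {word} is filled in (f-string rule).
pattern = rf"\b{word}\b"
print(pattern)

# The pattern matches "tab" only as a whole word.
whole_word = re.search(pattern, "a tab end") is not None
inside_word = re.search(pattern, "a table end") is not None
print(whole_word, inside_word)
```

If the inserted text could contain characters that are special in regular expressions, passing it through re.escape() first would be the safer choice.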