Python 3 is the latest version of Guido van Rossum's powerful general-purpose programming language. It breaks backwards compatibility with the 2.x line but has cleaned up some syntax issues. This article is the first in a series that talks about the changes that affect the language and backwards compatibility, and it provides examples of new features.
Python version 3, also known as Python 3000 or Py3K (a nickname that's a pun on the Microsoft® Windows® 2000 operating system), is the latest version of Guido van Rossum's general-purpose programming language. Although many improvements have been made to the core language, the new version will break backwards compatibility with the 2.x line. Other changes have been anticipated for a while, such as:
- True division—for example, 1/2 returns .5.
- The
longandinttypes have been unified into a single type, with the trailing L now removed. True,False, andNoneare now keywords.
This article—the first in a series on Python 3—covers the new
print() function, input(), changes to input/output
(I/O), the new bytes data type, changes to strings and string
formatting, and finally, changes to the built-in dict type. This
article is meant for programmers already familiar with Python who are curious
about the changes but don't want to wade through the long list of Python
Enhancement Proposals (PEPs).
You'll have to retrain your fingers to stop typing print "hello"
and start typing print("hello"), because print is now
a function, not a statement. I know, it's painful. Every Python programmer I
know—as soon as they install version 3 and get the error "incorrect
syntax"—screams in agony. I know the extra two characters are annoying; I know
this will break backwards compatibility. But there are advantages.
Consider cases in which you need to redirect standard output (stdout) to a
log. The following example opens the file log.txt for appending and assigns the
object to fid. A string is then redirected to the file,
fid, using print>>:
>>>fid = open("log.txt", "a") |
Another example is to redirect to standard error (sys.stderr):
>>>print>>sys.stderr, "an error occurred" |
Both of the previous examples are nice, but there's a better solution. The
new syntax is now to simply pass in a value to the keyword argument
file in the print() function. For example:
>>>fid = open("log.txt", "a") |
This code has much cleaner syntax. Another advantage is the ability to change
the separator by passing a string to the sep keyword argument and
change the end string by passing another string to the end keyword
argument. To change the separator, you could use:
>>>print("Foo", "Bar", sep="%") |
In general, the new syntax is:
print([object, ...][, sep=' '][, end='endline_character_here'][, file=redirect_to_here]) |
where the code inside the square brackets ([]) is optional. By
default, calling print() by itself appends a newline character (
\n).
In Python version 2.x, raw_input() reads an input from standard
input (sys.stdin) and returns a string with the trailing newline character
stripped from the end. The following example uses raw_input() to
grab a string from the command prompt, then assigns the value to
quest.
>>>quest = raw_input("What is your quest? ") |
In contrast, the input() function in Python 2.x expects a valid
Python expression, such as 3+5.
Originally, it was suggested that both input() and
raw_input() be removed from the Python built-in namespace
altogether, thereby requiring an import to have any kind of input capability.
This seemed pedagogically unsound; suddenly, simply typing:
>>>quest = input("What is your quest?") |
would have turned into:
>>>import sys |
which is far more verbose for just a simple input and a lot more to explain
to a novice. This would have required teaching what modules and
imports are, printing a string, and the dot operator. (This suspiciously
feels a little too much like the Java™ language...) So, in Python 3,
raw_input() is renamed input(), and no import is
required to get data from the standard input. If you need to retain the version
2.x input() functionality, use eval(input()), which
works identically.
|
The new data type, the bytes literal, as well as the bytes
object are used for storing binary data. This object is an immutable sequence of
integers between 0 and 127, or ASCII-only characters. Really, it's an immutable
version of the bytearray object from version 2.5. A bytes
literal is a string with a b before it—for example, b'byte
literal'. Evaluation of a bytes literal produces a new bytes
object. You can create a new bytes object with the
bytes() function. The constructor for a bytes object
is:
bytes([initializer[, encoding]]) |
For example:
>>>b = (b'\xc3\x9f\x65\x74\x61') |
creates a bytes object but is redundant, because you can create
a bytes object by simply assigning a byte literal. (I just wanted
to demonstrate that you can do this: I'm not actually suggesting you do it.) If
you wanted to use an iso-8859-1 encoding, you could try this:
>>>b = bytes('\xc3\x9f\x65\x74\x61', 'iso-8859-1') |
If the initializer is a string, you must provide an encoding. If the initializer is a bytes literal, you need not specify the encoding type: Remember, bytes literals are not strings. But like strings, you can concatenate bytes:
>>>b'hello' b' world' |
You use the bytes() method to represent both binary data and
encoded text. To convert from bytes to str, the
bytes object must be decoded (more on this later). Binary data is
decoded with the decode() method. For example:
>>>b'\xc3\x9f\x65\x74\x61'.decode() |
You can also read binary data directly from a file. The code:
>>>data = open('dat.txt', 'rb').read() |
opens for reading a file object in binary mode and reads in the entire file.
|
Python has a single string type, str, that behaves similarly to
the version 2.x unicode type. In other words, all strings are unicode strings.
Also—and very conveniently for non-Latin text users—non-ASCII identifiers are
now permissible. For example:
>>>césar = ["author", "consultant"] |
In previous versions of Python, the repr() method converts 8-bit
strings to ASCII. For example:
>>>repr('é') |
It now returns a unicode string:
>>>repr('é') |
which, as mentioned earlier, is the built-in string type.
String objects and byte objects are incompatible. If you want the string
representation of a byte, use its decode() method. Conversely, use
the encode() method of a string object if you want a bytes literal
from that string.
|
Many Python programmers felt that the built-in % operator for
formatting strings was too constrained, because:
- It is a binary operator and can take at most two arguments.
- Exempting the format string argument, all other arguments must be squeezed in with either a tuple or a dictionary.
This style is somewhat inflexible, so Python 3 introduces a new way of doing
string formatting. (Both the % operator and the
string.Template module are retained in version 3.) String objects
now have a method, format(), that accepts positional and keyword
arguments, which are passed into replacement fields. Replacement fields
are denoted by curly brackets ({}) inside a string. The element
inside a replacement field is simply called a field. Here's a simple
example:
>>>"I love {0}, {1}, and {2}".format("eggs", "bacon", "sausage") |
The fields {0}, {1}, and {2} are passed in the
positional parameters eggs, bacon, and
sausage to the format() method. The following example
shows how to use format() with keyword arguments passed in to
format:
>>>"I love {a}, {b}, and {c}".format(a="eggs", b="bacon", c="sausage") |
Here's another example that combines positional parameters and keyword arguments:
>>>"I love {0}, {1}, and {param}".format("eggs", "bacon", param="sausage") |
Remember that it's a syntax error to have a non-keyword argument after a keyword argument. To escape the curly braces, double them, like so:
>>>"{{0}}".format("can't see me") |
The positional parameter can't see me isn't printed, because
there's no field to print to. Note that this does not cause an error.
The new format() built-in function formats a single value. For
example:
>>>print(format(10.0, "7.3g")) |
In other words, the g stands for general format, which prints a fixed-width number. The first number before the dot specifies the minimum width, and the number after the dot specifies the precision. The complete syntax for format specifiers is beyond the scope of this article, but you can find links to more information in the Resources section.
|
Changes to the built-in dict type
Another major change in 3.0 is the removal of the
dict.iterkeys(), dict.itervalues(), and
dict.iteritems() methods in dictionaries. Instead, you use
.keys(), .values(), and .items(), which
have been revamped to return lightweight, set-like container objects instead of
a list that's a copy of the keys or values. The advantage here is the ability to
perform set operations on keys and items without having to copy
them. For example:
>>>d = {1:"dead", 2:"parrot"} |
Note: In Python, sets are unordered collections of unique elements.
Here, I created a dictionary with two keys and values, then printed the
values of d.items(), which returns an object, not a list of values.
You can test the membership of an element just like a set
object:
>>>1 in d # test for membership |
Here's an example of iterating over the items of the dict_values
object:
>>>for values in d.items(): |
But, if you really want a list of values, you can always cast the returned
dict object. For example:
>>>keys = list(d.keys()) |
|
|
Before delving in to the new mechanisms for I/O, it's necessary to review abstract base classes (ABCs). A more in-depth treatment is provided on this topic in the second part of this series.
ABCs are classes that can't be instantiated. To use an ABC, a subclass
must inherit from the ABC and override its abstract methods. A method is
abstract if it's preceded with the decorator @abstractmethod. The
new ABC framework also provides the @abstractproperty decorator for
defining abstract properties. You access the new framework by importing the
standard library module abc. Listing 1 provides a simple
example.
Listing 1. A simple abstract base class
from abc import ABCMeta |
The register() method call takes a class as an argument and
makes the ABC a subclass of the registered class. You can verify this by calling
the assert statement on the last line. Listing 2 provides another
example that uses decorators.
Listing 2. An implemented abstract base class with decorators
from abc import ABCMeta, abstractmethod |
Now that you know about ABCs, let's continue with the new I/O system.
Previous Python releases lacked important yet exotic functions, such as
seek(), for some stream-like objects. Stream-like objects
are file-like objects with read() and write()
methods—sockets or files, for example. Python 3 has multiple layers for I/O on
stream-like objects—a raw I/O layer, a buffered I/O layer, and a text I/O
layer—each defined with it own ABC with implementations.
You still open a stream using the built-in open(fileName)
function, although you can also call io.open(fileName)). Doing so
returns a buffered text file; read() and readline()
return strings. (Remember that all strings in Python 3 are unicode.) You can
also open a buffered binary file by using the form open(fileName,
'b'). In this case, read() returns bytes, but you can't use
readline().
The constructor for the built-in open() function is:
open(file,mode="r",buffering=None,encoding=None,errors=None,newline=None,closefd=True) |
The possible modes are:
r: Readingw: Open for writinga: Open for appendingb: Binary modet: Text mode+: Open a disk file for updatingU: Universal newline mode
The default mode is rt, or open for reading text mode.
The buffering keyword argument expects one of three integers to
determine the buffering policy:
0: Switches buffering off1: Line buffering> 1: Full buffering (default)
The default encoding is platform dependent. The close file descriptor, or
closefd, can be True or False. If False, the file descriptor is
kept after the file is closed. Providing a file name won't work, in which case,
closefd must be set to True.
The object that open() returns depends on the mode you set.
Table 1 shows the return types.
Table 1. Return types for different open modes
| Mode | Object returned |
|---|---|
| Text mode | TextIOWrapper |
| Binary | BufferedReader |
| Write binary | BufferedWriter |
| Append binary | BufferedWriter |
| Read/Write mode | BufferedRandom |
Note: Text mode can be w, r,
wt, rt, and so on.
The example in Listing 3 opens a buffered binary stream for reading.
Listing 3. Open a buffered binary stream for reading
>>>import io |
The BufferedReader object has access to several useful methods,
such as isatty, peek, raw,
readinto, readline, readlines,
seek, seekable, tell,
writable, write, and writelines, to name
a few. To see the full list, run a dir() on a
BufferedReader object.
|
Whether the Python community will accept version 3 is anyone's guess. The
breaking of backwards compatibility will mean supporting two different versions
in parallel. Some project developers may not want to migrate their projects,
even with the 2to3 converter. Personally, I found that migrating from Python
version 2 to 3 was primarily a matter of relearning a few things: It certainly
wasn't as drastic a change as moving from Python to say the Java or Perl
languages. Many of the changes have been long anticipated, such as true division
and changes to dict. Performing a print() is a whole
lot easier than System.out.println() in Java, so the learning curve
is relatively small and there are advantages to be gained.
I'm guessing from reading entries in the blogosphere that many Pythonistas consider some of the changes—such as the backwards compatibility break—deal breakers. Lambda had originally been scheduled for removal but has been retained, and in its original form. For the complete list of things that are staying, visit the Python core development site. If you're adventurous enough to rummage through the PEPs, you can find more in-depth information there.
The next installment in this series will cover more advanced topics, such as metaclass syntax, ABCs, decorators, integer literal support, base types, and exceptions.
discuss this topic to forum
