  • Web Python Tutorial

    Author: 2007-08-25 10:09:20 From:

    This tutorial covers two web programming models that can be used with Python: the CGI standard and mod_python.

    As a general rule, if you are in a shared hosting environment your only option will be to run Python scripts as CGI. Some specialized hosting providers will let you use and configure mod_python. mod_python's performance and flexibility are much superior to CGI's.

    There is no need to read the mod_python section if you want CGI, and vice versa; the two are independent of each other. To follow this tutorial you need basic Python programming knowledge. If you need to learn how to program, visit the Python Tutorial first.

    Please post any questions, suggestions or criticism at the Web Python Board and help make this a better tutorial.

    Some host providers only let you run CGI scripts in a certain directory, often named cgi-bin. In this case all you have to do to run the script is to call it like this:

    http://my_server.tld/cgi-bin/my_script.py

    The script will have to be made executable by "others". Give it 755 permissions (chmod 755 my_script.py in a shell), or check the executable boxes if your FTP client has a graphical interface.
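
    The same permission can also be set from Python itself. The following is only a sketch, written for Python 3 on a Unix-like system; make_executable is a hypothetical helper and the temporary file merely stands in for my_script.py.

```python
import os
import stat
import tempfile

def make_executable(path):
    """Set rwxr-xr-x (octal 755) on path and return the resulting mode bits."""
    os.chmod(path, 0o755)
    return stat.S_IMODE(os.stat(path).st_mode)

# Throwaway file standing in for my_script.py (an assumption for the demo).
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    tmp.write(b"#!/usr/bin/env python\n")
demo_path = tmp.name

mode = make_executable(demo_path)
print(oct(mode))
os.remove(demo_path)
```

    The 5 in the last two digits grants read and execute, but not write, to the group and to others, which is what the web server needs to run the script.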

    Some hosts let you run CGI scripts in any directory. In some of these hosts you don't have to do anything to configure the directories. In others you will have to add these lines to a file named .htaccess in the directory you want to run CGI scripts from:

    Options +ExecCGI
    AddHandler cgi-script .py

    If the file does not exist, create it. All directories below a directory with a .htaccess file inherit its configuration, so if you want to be able to run CGI scripts from all directories, create this file in the document root.

    If you are using your own server then you probably won't need to do anything to run a CGI script from the cgi-bin directory. Just make sure there is a line like the following in httpd.conf and that it is not commented out. The trailing slashes are required.

    ScriptAlias /cgi-bin/ "/path/to/cgi-bin/directory/"

    If you are using the line above and want HTML files in the cgi-bin directory to be handled correctly, add the following to httpd.conf. Here the path has no trailing slash.

    <Directory /path/to/cgi-bin/directory>
       AddHandler default-handler .html .htm
    </Directory>

    To run a script saved at the root:

    http://my_server.tld/my_script.py

    If it was saved in some directory:

    http://my_server.tld/some_dir/some_subdir/my_script.py

    If your desktop is the server then execute it like this:

    http://localhost/cgi-bin/my_script.py

    On Windows, Apache will sometimes listen on port 8080. In this case the address above must include the port:

    http://localhost:8080/cgi-bin/my_script.py

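    Make sure all text files you upload to the server are uploaded as text (not binary), especially if you are on Windows; otherwise you will have problems. A binary upload keeps the Windows CRLF line endings, and a carriage return at the end of a script's first line can stop a Unix server from finding the interpreter.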
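
    If a text-mode upload is not available, the line endings can be converted before uploading. This is a minimal sketch in Python 3; to_unix_line_endings is a hypothetical helper, and the sample script bytes are invented for the demonstration.

```python
def to_unix_line_endings(data):
    """Return data (bytes) with CRLF and lone CR line endings converted to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# A script as it might be saved by a Windows editor.
windows_script = b"#!/usr/bin/env python\r\nprint('hi')\r\n"
unix_script = to_unix_line_endings(windows_script)
print(unix_script)
```

    Working on bytes rather than text avoids any guess about the file's character encoding.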

    This is the classic "Hello World" in Python CGI fashion:

    #!/usr/bin/env python
    print "Content-Type: text/html"
    print
    print """\
    <html>
    <body>
    <h2>Hello World!</h2>
    </body>
    </html>
    """

    To test your setup, save it with the .py extension, upload it to your server as text, and make it executable before trying to run it.

    The first line of a Python CGI script tells the server where to find the Python interpreter. Ask your provider which path is correct; if it is wrong the script will fail. Some examples:

    #!/usr/bin/python
    #!/usr/bin/python2.3
    #!/usr/bin/python2.4
    #!c:\Python24\python.exe
    #!c:\Python25\python.exe

    The first 3 lines above are Linux paths and the last 2 are Windows paths.
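
    When you have shell access, a quick way to check a script's first line is to read it and test whether the named program exists. The sketch below is written for Python 3; shebang_interpreter and shebang_is_valid are hypothetical helpers, and the parsing assumes the interpreter path contains no spaces.

```python
import os
import sys
import tempfile

def shebang_interpreter(script_path):
    """Return the program named on the script's first line, or None."""
    with open(script_path, "rb") as script:
        first_line = script.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    # For "#!/usr/bin/env python" this returns /usr/bin/env.
    return parts[0] if parts else None

def shebang_is_valid(script_path):
    """True if the first line names a program that exists on this machine."""
    interpreter = shebang_interpreter(script_path)
    return interpreter is not None and os.path.exists(interpreter)

def write_demo_script(first_line):
    """Create a throwaway script with the given first line (demo only)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(first_line + "\nprint('hello')\n")
    return tmp.name

good_script = write_demo_script("#!" + sys.executable)
bad_script = write_demo_script("#!/no/such/dir/python")
print(shebang_is_valid(good_script), shebang_is_valid(bad_script))
```

    The check only means something when it runs on the server itself, since the path must exist there and not on your desktop.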

    The script must output the HTTP header. The header consists of one or more header lines followed by a blank line. If the output of the script is to be interpreted as HTML then the content type will be text/html. The blank line signals the end of the header and is required.

    print "Content-Type: text/html"
    print

    Often the blank line is produced by ending the header string with \n, to which print adds its own line break:

    print "Content-Type: text/html\n"

    If you change the content type to text/plain the browser will not interpret the script's output as HTML but as plain text, and you will only see the HTML source. Try it now so you never forget. A page refresh may be necessary for the change to show.
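
    The header, blank line and body layout described above can be wrapped in a small function. This is a sketch in Python 3, not part of any library; cgi_response and its extra_headers parameter are hypothetical names.

```python
def cgi_response(body, content_type="text/html", extra_headers=None):
    """Return the header lines, the mandatory blank line, then the body."""
    lines = ["Content-Type: " + content_type]
    for name, value in (extra_headers or {}).items():
        lines.append("%s: %s" % (name, value))
    # "\n\n" ends the last header line and adds the blank line after it.
    return "\n".join(lines) + "\n\n" + body

page = "<html><body><h2>Hello World!</h2></body></html>"
as_html = cgi_response(page)
as_text = cgi_response(page, content_type="text/plain")
print(as_html)
```

    Printing as_text instead of as_html sends exactly the same body; only the Content-Type line changes, and with it the way the browser displays the page.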

    Client versus Server

    All Python code is executed on the server only. The client's agent (for example the browser) will never see a single line of Python; it only gets the script's output. This is really important to understand.

    When programming for the Web you are in a client-server environment. Do not do things like trying to open a file on the client's computer as if the script were running there. It isn't.

    To catch syntax errors, run the script in a local shell before uploading it to the server. Header errors are hard to catch unless you have access to the server logs. If you do, look for error_log and access_log on Linux and for error.log and access.log on Windows.
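
    A syntax check can also be done without running the script, by compiling its source. The sketch below uses the built-in compile() under Python 3; syntax_error_in is a hypothetical helper. The check follows the grammar of the interpreter that runs it, so it should be run with the same Python version the server uses.

```python
def syntax_error_in(source, filename="<script>"):
    """Return a short description of the first syntax error, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s, line %s: %s" % (filename, err.lineno, err.msg)
    return None

good_source = "print('Content-Type: text/html')\nprint()\n"
bad_source = "print('Content-Type: text/html'\n"   # missing closing parenthesis

print(syntax_error_in(good_source, "good.py"))
print(syntax_error_in(bad_source, "bad.py"))
```

    Compiling only finds syntax errors; a missing header or a wrong file path still shows up only when the script actually runs.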

    For a nicely formatted exception report there is the cgitb module. It shows a traceback together with the surrounding source context. By default the report is sent to standard output as HTML:

    #!/usr/bin/env python
    print "Content-Type: text/html"
    print
    import cgitb; cgitb.enable()
    print 1/0

    The handler() function can be used to report only caught exceptions:

    #!/usr/bin/env python
    print "Content-Type: text/html"
    print
    import cgitb
    try:
       f = open('non-existent-file.txt', 'r')
    except:
       cgitb.handler()

    There is also a cruder approach: set the content type to text/plain and redirect standard error to standard output:

    #!/usr/bin/env python
    print "Content-Type: text/plain"
    print
    import sys
    sys.stderr = sys.stdout
    f = open('non-existent-file.txt', 'r')

    This will output:

    Traceback (most recent call last):
      File "/var/www/html/teste/cgi-bin/text_error.py", line 6, in ?
        f = open('non-existent-file.txt', 'r')
    IOError: [Errno 2] No such file or directory: 'non-existent-file.txt'

    Warning: These techniques expose information that can be used by an attacker. Use them only while developing or debugging, and disable them once in production.
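
    One possible arrangement for production is to write the traceback to a log and show the visitor only a generic message. The following is a sketch in Python 3 using the standard traceback module; run_safely and broken_handler are hypothetical names, and the StringIO object stands in for a log file kept outside the web root. cgitb.enable() also accepts display and logdir arguments that can be used to similar effect.

```python
import io
import traceback

GENERIC_PAGE = ("Content-Type: text/html\n\n"
                "<html><body><p>Internal error.</p></body></html>")

def run_safely(handler, log):
    """Run handler(); on failure log the traceback and return a generic page."""
    try:
        return handler()
    except Exception:
        log.write(traceback.format_exc())
        return GENERIC_PAGE

def broken_handler():
    # Fails the same way as the examples above.
    with open("non-existent-file.txt", "r") as f:
        return f.read()

log = io.StringIO()
page = run_safely(broken_handler, log)
print(page)
```

    The visitor learns only that something went wrong, while the file name and line numbers stay in the log.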

    The FieldStorage class of the cgi module has all that is needed to handle submitted forms.

    import cgi
    form = cgi.FieldStorage() # instantiate only once!

    It is transparent to the programmer whether the data was submitted by GET or by POST. The interface is exactly the same.

    Suppose we have this HTML form which submits a field named name to a python CGI script named process_form.py:

    <html><body>
    <form method="get" action="process_form.py">
    Name: <input type="text" name="name">
    <input type="submit" value="Submit">
    </form>
    </body></html>

    This is the process_form.py script:

    #!/usr/bin/env python
    import cgi
    form = cgi.FieldStorage() # instantiate only once!
    name = form.getfirst('name', 'empty')
    
    # Avoid script injection by escaping the user input
    name = cgi.escape(name)
    
    print """\
    Content-Type: text/html\n
    <html><body>
    <p>The submitted name was "%s"</p>
    </body></html>
    """ % name

    The getfirst() method returns the first value of the named field. If no field with that name was submitted, or if it is empty, it returns the default passed as the second argument, or None when no default is given. If there is more than one field with the same name, only the first value is returned.

    If the HTML form method is changed from get to post, the process_form.py script stays the same.
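
    To see why one interface can serve both methods, it helps to look at where the data arrives: a GET request carries it in the QUERY_STRING environment variable, while a POST request sends the same encoding on standard input, with its size in CONTENT_LENGTH. The sketch below imitates that in Python 3 with urllib.parse instead of the cgi module; read_form and get_first are hypothetical helpers, not the FieldStorage implementation, and only urlencoded forms are handled.

```python
import io
from urllib.parse import parse_qs

def read_form(environ, stdin):
    """Return {name: [values]} for a urlencoded GET or POST request."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8")
    else:
        raw = environ.get("QUERY_STRING", "")
    # Like FieldStorage, parse_qs drops empty values by default.
    return parse_qs(raw)

def get_first(form, name, default=None):
    """First value submitted for name, or default if there is none."""
    values = form.get(name)
    return values[0] if values else default

get_form = read_form(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Ana"}, io.BytesIO(b""))

post_body = b"name=Ana"
post_form = read_form(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(post_body))},
    io.BytesIO(post_body))

print(get_first(get_form, "name", "empty"), get_first(post_form, "name", "empty"))
```

    In a real CGI script the two arguments would be os.environ and the binary standard input stream.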

    If user input is to be shown in an HTML document then it must be escaped, or else everything inside < > will be interpreted by the HTML parser, including JavaScript code like
    <script type="text/javascript"> malicious code here </script>

    The cgi.escape() function will transform the above into safe HTML text:
    &lt;script type="text/javascript"&gt; malicious code here &lt;/script&gt;

    This is useful not only to prevent script injection but also to make it possible to display HTML source code as has just been done above.
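
    In Python 3 the same job is done by html.escape() from the standard library, since cgi.escape() was removed in Python 3.8. The short sketch below reproduces the transformation shown above; the variable names are only for illustration.

```python
import html

user_input = '<script type="text/javascript"> malicious code here </script>'

# quote=False leaves double quotes alone, matching the output shown above.
safe_text = html.escape(user_input, quote=False)

# The default (quote=True) also escapes quotes, which is needed when the
# value is placed inside an HTML attribute.
safe_attr = html.escape(user_input)

print(safe_text)
print(safe_attr)
```

    The default quote=True is the safer choice when in doubt, because the escaped text is then usable both between tags and inside attribute values.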

    If there is more than one field with the same name, as with HTML check boxes, then the method to use is getlist(). It returns a list containing as many items (the values) as there are checked boxes. If no check box was checked the list will be empty.

    Sample HTML with check boxes:

    <html><body>
    <form method="post" action="process_check.py">
    Red<input type="checkbox" name="color" value="red">
    Green<input type="checkbox" name="color" value="green">
    <input type="submit" value="Submit">
    </form>
    </body></html>

    And the corresponding process_check.py script:

    #!/usr/bin/env python
    import cgi
    form = cgi.FieldStorage()
    
    # getlist() returns a list containing the
    # values of the fields with the given name
    colors = form.getlist('color')
    
    print "Content-Type: text/html\n"
    print '<html><body>'
    print 'The colors list:', cgi.escape(str(colors))
    for color in colors:
       print '<p>', cgi.escape(color), '</p>'
    print '</body></html>'
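
    The behaviour of getlist() can be imitated with urllib.parse.parse_qs, which collects repeated field names into one list. This is a Python 3 sketch only; get_list is a hypothetical helper and the query strings are made-up examples of what the check box form would submit.

```python
from urllib.parse import parse_qs

def get_list(form, name):
    """Return every value submitted for name, or an empty list."""
    return form.get(name, [])

both_checked = parse_qs("color=red&color=green")
none_checked = parse_qs("")

print(get_list(both_checked, "color"))
print(get_list(none_checked, "color"))
```

    Because the result is always a list, the loop in the script above needs no special case for zero or one checked box.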

    The two Python scripts below will try to save an uploaded file in a directory named files inside the directory where the script is running. If the script is running in /path/to/dir then the /path/to/dir/files directory must exist; if it does not, the upload will fail.
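
    Instead of creating the directory by hand, a script could make sure it exists before saving anything. This is a sketch for Python 3 (the exist_ok argument is not available in Python 2); ensure_upload_dir is a hypothetical helper and the temporary directory stands in for /path/to/dir. The web server's user still needs write permission there.

```python
import os
import tempfile

def ensure_upload_dir(script_dir, name="files"):
    """Create script_dir/name if it is missing and return its path."""
    upload_dir = os.path.join(script_dir, name)
    os.makedirs(upload_dir, exist_ok=True)   # no error if it already exists
    return upload_dir

script_dir = tempfile.mkdtemp()   # stands in for /path/to/dir
upload_dir = ensure_upload_dir(script_dir)
print(upload_dir)
```

    In a CGI script, script_dir would typically be os.path.dirname(os.path.abspath(__file__)).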

    To upload a file the HTML form must have the enctype attribute set to multipart/form-data. The input tag with the file type will create a "Browse" button.

    <html><body>
    <form enctype="multipart/form-data" action="save_file.py" method="post">
    <p>File: <input type="file" name="file"></p>
    <p><input type="submit" value="Upload"></p>
    </form>
    </body></html>

    The getfirst() and getlist() methods only return the content of the file(s). To also get the filename it is necessary to access the nested FieldStorage instance, which is obtained by indexing the top FieldStorage instance with the field name.

    #!/usr/bin/env python
    import cgi, os
    import cgitb; cgitb.enable()
    
    try: # Windows needs stdio set for binary mode.
        import msvcrt
        msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
        msvcrt.setmode (1, os.O_BINARY) # stdout = 1
    except ImportError:
        pass
    
    form = cgi.FieldStorage()
    
    # A nested FieldStorage instance holds the file
    fileitem = form['file']
    
    # Test if the file was uploaded
    if fileitem.filename:
       
       # strip leading path from file name to avoid directory traversal attacks
       fn = os.path.basename(fileitem.filename)
       open('files/' + fn, 'wb').write(fileitem.file.read())
       message = 'The file "' + cgi.escape(fn) + '" was uploaded successfully'
       
    else:
       message = 'No file was uploaded'
       
    print """\
    Content-Type: text/html\n
    <html><body>
    <p>%s</p>
    </body></html>
    """ % (message,)
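
    To make the idea of a nested item with its own filename more concrete, the sketch below takes a multipart/form-data body apart with the standard email package in Python 3. It is an illustration of the format only, not how FieldStorage works internally; parse_multipart, the boundary XYZ and the sample body are all invented for the demonstration, and the email parser is not meant for large binary uploads.

```python
from email import message_from_bytes

def parse_multipart(content_type, body):
    """Return a list of (field name, filename, content bytes) tuples."""
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    fields = []
    for part in message.get_payload():
        # Each part carries its own headers, including the field name
        # and, for file inputs, the filename sent by the browser.
        name = part.get_param("name", header="content-disposition")
        fields.append((name, part.get_filename(), part.get_payload(decode=True)))
    return fields

sample_body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="notes.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello upload\r\n"
    b"--XYZ--\r\n"
)

fields = parse_multipart("multipart/form-data; boundary=XYZ", sample_body)
print(fields)
```

    Each part of the body has its own small header block, which is why the filename has to be read from the nested item rather than from the form as a whole.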

    A directory traversal attack is one where the attacker submits a file whose name has a leading path, as in ../../attacker_program. This way the attacker can save a program wherever the Apache user has write permission, or read a file if the target script reads files.
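
    On a Unix server os.path.basename() does not treat the backslash as a separator, so a name submitted with Windows-style separators passes through unchanged. A slightly stricter filter might look like the sketch below, written for Python 3; safe_filename is a hypothetical helper and the rules it applies are one possible choice, not a complete defence.

```python
import posixpath

def safe_filename(submitted_name):
    """Return the bare file name, or None if nothing usable is left."""
    # Treat Windows separators like Unix ones so both styles are stripped.
    normalized = submitted_name.replace("\\", "/")
    name = posixpath.basename(normalized)
    if name in ("", ".", ".."):
        return None
    return name

for candidate in ("report.txt", "../../attacker_program", "..\\..\\evil.exe", ".."):
    print(repr(candidate), "->", repr(safe_filename(candidate)))
```

    A script using it would refuse the upload when None is returned instead of building a path from the submitted name.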

    To handle big files without using all the available memory, a generator can be used. The generator returns the file in small chunks:

    #!/usr/bin/env python
    import cgi, os
    import cgitb; cgitb.enable()
    
    try: # Windows needs stdio set for binary mode.
        import msvcrt
        msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
        msvcrt.setmode (1, os.O_BINARY) # stdout = 1
    except ImportError:
        pass
    
    form = cgi.FieldStorage()
    
    # Generator to buffer file chunks
    def fbuffer(f, chunk_size=10000):
       while True:
          chunk = f.read(chunk_size)
          if not chunk: break
          yield chunk
          
    # A nested FieldStorage instance holds the file
    fileitem = form['file']
    
    # Test if the file was uploaded
    if fileitem.filename:
    
       # strip leading path from file name to avoid directory traversal attacks
       fn = os.path.basename(fileitem.filename)
       f = open('files/' + fn, 'wb', 10000)
    
       # Read the file in chunks
       for chunk in fbuffer(fileitem.file):
          f.write(chunk)
       f.close()
       message = 'The file "' + cgi.escape(fn) + '" was uploaded successfully'
    
    else:
       message = 'No file was uploaded'
       
    print """\
    Content-Type: text/html\n
    <html><body>
    <p>%s</p>
    </body></html>
    """ % (message,)
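
    The chunking generator can be tried without a web server. The sketch below runs under Python 3 and uses in-memory BytesIO objects as stand-ins for fileitem.file and for the output file; the 25000-byte payload is arbitrary. It also shows shutil.copyfileobj(), a standard library function that performs the same read and write loop.

```python
import io
import shutil

def fbuffer(f, chunk_size=10000):
    """Yield successive chunks of at most chunk_size bytes from file object f."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        yield chunk

upload = io.BytesIO(b"x" * 25000)    # stands in for fileitem.file
chunk_sizes = [len(chunk) for chunk in fbuffer(upload)]
print(chunk_sizes)

# The same copy with the standard library helper.
upload.seek(0)
saved = io.BytesIO()                 # stands in for the opened output file
shutil.copyfileobj(upload, saved, 10000)
print(len(saved.getvalue()))
```

    Either way, no more than one chunk is held in memory at a time, whatever the size of the upload.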
