Table of contents
- What is cross site scripting
- What is cross site request forgery
- Who is to blame
- How can users protect themselves
- How do you know what type of login a site is using
- How can Web sites protect themselves
- Cross site scripting
- Escaping data
- Never trust URLs you are given
- Being careful with scripts
- Using HTTP-only cookies
- Using HTTP authentication
- Storing safer cookies
- Allowing only certain HTML input
- Using BB code
- Embedding content from other sites
- Cross site request forgery
- Encode a session ID in the URL
- Check referrers
- Prompting for passwords
- Pass unique IDs in form submission
- Secure sites
- Cross site scripting
What is cross site scripting
Cross site scripting (XSS) is where one site manages to run a script on another site, with the privileges of you, the user.
In many pages, this would be completely harmless. But now imagine that you have logged into site A, and that site has used a session cookie to store your identity. If site B manages to make you load a page on site A containing a script they have injected into it, that script could take the cookie for site A, and send it to site B. The person running site B can now use your cookie in their own browser, and can now use site A, making it think they are you.
In the case of site A being a blog or forum, they could erase or alter your posts, add new abusive posts, or erase your account. In the case of Web mail systems, they could send abusive email to your colleagues, delete any emails, or read all the passwords you have been sent in your email, which may give them access to even more systems. In the case of it being a banking site, they could make large cash transactions using your bank account. In the case of banking or shopping sites, they could obtain your banking details, and use them to make their own purchases.
XSS can also be a problem from users on shared sites, such as forums or blog comments, where users may find a way to inject scripts into page content, where the exploit can survive much longer than just a single page load.
Cookies are not the only target of cross site scripting, but they are a very easy way to exploit a simple mistake made by the site author. In some cases, it may be possible to inject a script onto the login form of the site, and convince you to fill it in, and then they can make it send them your password. Or they could simply make it load another page on the site, submitting form data to it, or using other means to perform actions on your behalf.
Unlike phishing scams where a site tries to trick users into thinking it is another site, XSS is the real site, not a fake. It has just allowed another site to run a script on it, in the context of one of its users.
What is cross site request forgery
Cross Site Request Forgery (XSRF or CSRF), also known as Cross Site Reference Forgery, is similar in some respects to XSS, but very different in one important respect. It does not rely at all on being able to inject a script. It is less reliable, but its effects can be just as damaging.
The general idea of XSRF is that while you are logged into site A, you also look at a page on site B. Site B is evil, and submits form data to site A, or requests Web pages from site A using some other means. Since you are logged into site A, it uses the form data as if you yourself sent it. That may then do all of the same things as XSS attacks, such as creating forum posts, or making bank transactions.
Having strong passwords that cannot be guessed or calculated is always a good idea, but XSRF (and XSS) bypasses that part of the protection, as it works once the user has logged themself into the site using their strong password.
Who is to blame
Blame is a fairly harsh word, since it is the attacking site B that is really to blame, but evil sites are a fact of life, and site A should protect its users. XSS in particular is always the result of a mistake on the part of the site with the vulnerability. XSS is also worryingly common; many sites make the basic mistakes that allow XSS attacks.
How can users protect themselves
Most users are not even aware that XSS and XSRF are possible. And they should not need to be. Users should not be expected to be security experts – if they were, you as a Web developer would be out of a job. However, users who wish to protect themselves can take a few steps to do so.
The basic step is to never open other sites while you are logged into a site. This means that while you are logged into your bank, shopping site, blog, forum, web mail, site admin section, etc., never open another site in any window. Do not click links in emails, or other applications. If you use Internet Explorer (I recommend against this if you value your security), do not run programs like Word (that includes opening attachments in email), or generally any other programs that view Web pages, as many Windows programs use the Internet Explorer engine either directly or via macros, and may be used as a vector for these attacks.
You may also want to disable iframes if your browser (such as Opera) allows that. This step should not be necessary as long as you do not browse other sites at the same time, but if you do, it makes it a little harder for XSRF attacks to be carried out, as most (but by no means all) of them use iframes.
All of these measures are extremely limiting, and certainly not something most users would want to do. So the final step will always be the one that is preferred: Make sure the sites you log into have actually checked their sites for XSS and XSRF attacks. Not just that they are aware that they exist, or believe they are safe, but that they have actually run checks to make sure they are protected against those attacks.
How do you know what type of login a site is using
Cookie logins are fairly easy to identify. They almost always appear as a form on a Web page, asking you for a username or email address and password. HTTP authentication (or a similar type of authentication) is shown as a dialog that appears in its own little window in front of the Web page, asking you for a username and password.
Shopping sites almost always use either a cookie or a URL encoded session ID, and virtually never use HTTP authentication. In general, you can tell which they are using by looking at the page address. If it contains a large amount of seemingly random characters (usually near the end of the address), then it is probably using a URL encoded session ID. Otherwise it will be using a cookie.
Working out if a cookie is a session cookie or not is a little harder. Some browsers may allow you to see the properties of stored cookies, or to prompt for them with their details when the server tries to set them. However, a simple test is to log into the site, then without logging out, close all browser windows, then restart the browser, and try reloading pages on the site. If you are still logged in, then it is not a session cookie.
There are some alternative types of login, such as using a client certificate (which you will normally have been asked to install at some point), or IP based authentications (typically used on local intranets). It is not normally possible to log out from either of these, even by restarting your browser. In general, you can identify these either because you had been asked to install a client certificate (not a normal root certificate), or because you never had to log in in the first place.
How can Web sites protect themselves
These are very complex issues, and there are no simple solutions, but there are certain things that the site should always do. In almost all cases, it is server side scripting that needs to be changed or fixed.
Cross site scripting
This is by far one of the most common mistakes made by Web authors, and turns up on a substantial number of sites – even those you would expect to be written by knowledgeable authors. Some even dismiss these mistakes as harmless, or trivial, ignoring the dangers that those mistakes can present.
Escaping data
Let us take this simple example; somewhere on a site, there is a form that a user can fill in. If they fill it in with invalid details, the form is displayed again, with what they typed last time shown in the form inputs, allowing them to change whatever was wrong. This is often done with login forms, although it could in fact be any form on the site while they are logged in. On its own, this is not dangerous at all, and is in fact a very good thing.
The problem is that some sites forget to escape the data before putting it back into the form. Assume that the form had this:
<input name="foo" value="">
Now assume that the site displays it to the user like this (I will use PHP here, but it could in fact be any server side language):
<input name="foo" value="<?php print $foo; ?>">
With that single, simple print command, the site has opened itself up to XSS attacks. Imagine now that the evil site uses an iframe, link, image URL or form submission to the URL:
The server side script would then create this in the page source code:
<input name="foo" value=""><script>insert-evil-script-here</script><"">
The implications of this are immediately obvious. The evil script could then load other pages on the site using XMLHttpRequest, even taking any URL encoded session ID into account, and in doing so it could add or change data, or make form submissions, etc. It could also read any session cookies, and send them back to the evil site as part of a URL using an iframe, image, script, stylesheet, or just about any type of external content. The possibilities are fairly limitless. This simple change would have protected the site:
<input name="foo" value="<?php print htmlspecialchars($foo); ?>">
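As a minimal sketch of the difference, the following PHP renders the same field with and without escaping. The renderField helper and the sample value are hypothetical, chosen only to reproduce the injected markup shown above; neither is part of PHP or of any library.

```php
<?php
// Hypothetical helper: renders the input from the example, optionally escaping the value.
function renderField(string $foo, bool $escape): string
{
    $value = $escape ? htmlspecialchars($foo, ENT_QUOTES, 'UTF-8') : $foo;
    return '<input name="foo" value="' . $value . '">';
}

// A value an attacker might pass as the "foo" parameter (illustrative only).
$attack = '"><script>insert-evil-script-here</script><"';

echo renderField($attack, false), "\n"; // the script element breaks out of the attribute
echo renderField($attack, true), "\n";  // the whole value stays inside the attribute
```

In the escaped version the quotes and angle brackets reach the browser as entities, so the browser treats the whole string as the value of the input.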
Although forms are the most common place where this happens, it is not the only time this can be a problem. The same situation occurs if a site writes unescaped content as part of any page content – for example, many pages use a single page that writes whatever information it was passed as a page heading.
Note that even if scripting is prevented by other means, an attack could, for example, display a false login form to the user, that sends the details to another site. They could also display misleading information. Though not as harmful as a XSS vulnerability, as it needs the user to be tricked into following those instructions, this is still a problem that needs to be prevented.
So the solution is to ensure that if contents like this are entered into the form, the server side script escapes them before adding them to the page content. HTML offers a simple way to escape these; use HTML entities for the < > & and " characters. Yes, for virtually all situations, this really is all it takes. PHP offers a simple function to do this; htmlspecialchars. Other languages sometimes offer ways to do this, but some do not. One of the big offenders is JSP which, to my knowledge, has no equivalent method. Authors simply do not realise they should create one for themselves. Many JSP pages are left open to XSS attacks as a result.
It is not enough to escape just < and > characters, since quotes can be just as damaging inside an attribute. If quotes are not escaped, the attribute can be ended, and a new event handler attribute started, that triggers when the user clicks it, or focuses it, or moves their mouse over it. If you are putting the content inside an attribute, make sure the attribute value is wrapped in " (double) quotes, as an unquoted attribute value can be ended simply by including a space. If you use ' (single) quotes around the attribute value, make sure you tell PHP’s htmlspecialchars function to convert those as well (the ENT_QUOTES flag).
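The effect of the quote handling can be sketched as follows. The sample value is hypothetical; ENT_COMPAT and ENT_QUOTES are the standard htmlspecialchars flags, passed explicitly here because the default flag has changed between PHP versions.

```php
<?php
// Value that tries to close a single-quoted attribute and add an event handler.
$foo = "' onmouseover='alert(1)";

// ENT_COMPAT leaves single quotes alone, so this is NOT safe inside '...'.
$unsafe = "<input name='foo' value='" . htmlspecialchars($foo, ENT_COMPAT, 'UTF-8') . "'>";

// ENT_QUOTES converts both kinds of quote, so the value cannot end the attribute.
$safe = "<input name='foo' value='" . htmlspecialchars($foo, ENT_QUOTES, 'UTF-8') . "'>";

echo $unsafe, "\n", $safe, "\n";
```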
Form data must also be escaped before using it as part of a database query, typically by putting backslashes before quotes (again, PHP has inbuilt methods for doing this). Failure to escape it could allow people to end the query, and start a second one, deleting random data, corrupting databases, or at worst, being able to run shell commands, and take over the server. A similar situation could occur if your script uses that data to construct a shell command.
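An alternative to adding backslashes by hand is to pass the data separately from the query, using a prepared statement. This is a different technique from the escaping described above, shown here as a minimal sketch only. It assumes the pdo_sqlite extension is available, and the table, column and form value are hypothetical.

```php
<?php
// Assumes the pdo_sqlite extension; an in-memory database keeps the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('alice')");

// Hypothetical form value that would end the query if it were pasted into the SQL string.
$name = "alice'; DROP TABLE users; --";

// The placeholder keeps the value as data; it is never parsed as SQL.
$stmt = $db->prepare('SELECT COUNT(*) FROM users WHERE name = ?');
$stmt->execute([$name]);
$matches = (int) $stmt->fetchColumn();

// The table is still intact after the attempted injection.
$total = (int) $db->query('SELECT COUNT(*) FROM users')->fetchColumn();
echo "matches=$matches total=$total\n";
```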
Never trust URLs you are given
Being careful with scripts
An example of where this usually occurs is an image gallery script, where the image to display is passed as a parameter to the page address, and a script then extracts it to display the image. If a script accepts URLs as a parameter, it must always check that the URL starts with a trusted protocol, such as ‘http:’ or ‘https:’, or it will leave itself open to this sort of attack:
Similarly, if the data is evaluated by the page using the eval or an equivalent method, attackers can simply feed their script directly into that parameter. A script must never evaluate something passed as a parameter to the page.
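The protocol check can be sketched in PHP as below; the same test could equally be written in a client side script. The function name is hypothetical, and this version is slightly stricter than the rule in the text, since it also insists on the '//' after the protocol.

```php
<?php
// Hypothetical check: accept only URLs that start with a trusted protocol.
function hasTrustedProtocol(string $url): bool
{
    // \A anchors at the very start, so leading whitespace or control characters fail the check.
    return preg_match('#\Ahttps?://#i', $url) === 1;
}

var_dump(hasTrustedProtocol('http://example.com/photo.jpg')); // bool(true)
var_dump(hasTrustedProtocol('javascript:alert(1)'));          // bool(false)
```

Anything that fails the check should be rejected outright, instead of being repaired and used.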
Using HTTP-only cookies
Cookies that are set via HTTP (such as authentication cookies) are also available to scripts. One of the most common demonstrations of cross site scripting is taking another user’s login cookie, and then performing some action as them. If the cookie were not available to scripts, they could not take it. Internet Explorer and recent versions of some other browsers allow an optional ‘httponly’ parameter when setting HTTP cookies, which prevents them from being accessible to scripts.
This is not a solution, as it has only limited scope. For a start, this is only useful if all browsers support it – as I have already said, the exploit only needs to work once in one browser for it to be successful. More importantly, however, cookies are rarely used in real exploits. Someone who manages to inject a script into someone else’s page is not very likely to use their cookie themselves, as that would immediately give away their IP address, making it easier to locate and prosecute them. They are far more likely to run a script there and then, to do the damage through the user themself. HTTP-only cookies give a false sense of security; they may protect some people from demonstrations, but they will not protect from real attacks.
Of course, the main point is that it should never be allowed to get to this stage. XSS should be prevented at all costs. If you have a XSS vulnerability in your site, then cookie stealing is the least of your problems. Fix the real problem, not the symptom.
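For sites that still want to set the flag as one extra layer, a minimal sketch in PHP might look like this. The helper name and cookie name are hypothetical, and the options-array form of setcookie assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper returning the options for a session cookie.
function sessionCookieOptions(): array
{
    return [
        'expires'  => 0,    // session cookie: discarded when the browser closes
        'path'     => '/',
        'httponly' => true, // ask the browser to hide the cookie from scripts
    ];
}

// In a real page this would run before any output is sent:
// setcookie('session', $sessionId, sessionCookieOptions());
```

This only hides the cookie from scripts. It does nothing to stop an injected script from acting as the user directly, which is the real problem.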
Using HTTP authentication
HTTP authentication is like the HTTP-only cookie, except that it works in all browsers. It still suffers from the same false sense of security, however, and in addition, no browser currently allows you to log out of it, meaning it is more susceptible to delayed XSRF attacks.
Storing safer cookies
Some sites take the simple approach of saving the user’s username and password in the cookie. This is an instant giveaway if a XSS attack manages to get the cookie, as they have the username and password. Even if the user logs out, the attacker can log in again. It is better to store a unique session ID. That way, if they log out and the server invalidates the session, the attacker can no longer do anything with the cookie. To make it even harder for an attacker, the server can tie the session ID to the user’s IP address. Attackers would have to be able to use the same IP address for them to exploit it – this is possible (for example, they may be behind the same NAT), but it makes it much harder.
However, again, cookies are only a minor concern considering that the XSS vulnerability can be exploited in a number of ways, that do not need any cookie at all.
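The session ID approach can be sketched as follows. The in-memory array stands in for whatever session store the server really uses, and all function names are hypothetical.

```php
<?php
// Hypothetical session store: maps session ID => IP address it was issued to.
function createSession(array &$sessions, string $ip): string
{
    $id = bin2hex(random_bytes(16)); // 32 hex characters from a secure random source
    $sessions[$id] = $ip;
    return $id;
}

function isValidSession(array $sessions, string $id, string $ip): bool
{
    // The ID must exist on the server and be used from the address it was issued to.
    return isset($sessions[$id]) && $sessions[$id] === $ip;
}

function destroySession(array &$sessions, string $id): void
{
    unset($sessions[$id]); // logging out invalidates the ID on the server
}

$sessions = [];
$id = createSession($sessions, '192.0.2.10');
var_dump(isValidSession($sessions, $id, '192.0.2.10'));   // bool(true)
var_dump(isValidSession($sessions, $id, '198.51.100.7')); // bool(false)
destroySession($sessions, $id);
var_dump(isValidSession($sessions, $id, '192.0.2.10'));   // bool(false)
```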
Allowing only certain HTML input
Some people want to allow certain HTML to be used, but not others. Typically, this is for forums, where users should only be allowed to enter basic HTML that does not affect other users, or blogs, where comments should only use basic HTML. This is certainly not trivial, and unless you are very experienced in avoiding XSS attacks, I suggest you leave well alone, and escape everything.
However, if you feel that you know enough to do this, then prepare to step into a minefield.
The basic idea is not to remove anything you think is dangerous, but to remove everything unless you know it is safe. The number of ways that scripts can be added to a document is quite staggering – some of these only work in certain browsers, but it only takes one of these to work in one browser for the exploit to be a success:
- A script tag.
- A script tag that has a namespace prefix in its tag name;
<div xmlns:foo="http://www.w3.org/1999/xhtml"> <foo:script>...script here...</foo:script> </div>
- Event handler attributes – these typically begin with ‘on’, and may have spaces before and after the ‘=’ sign (and can also have a namespace prefix).
- Any of those within a custom namespace.
- CSS -moz-binding or behavior (these can also be in imported or linked stylesheets).
- CSS expression (these can also be in imported or linked stylesheets).
- HTML style attributes that use any of those CSS methods.
- Iframes, frames, links, etc. with ‘data:’ URLs of pages containing scripts (currently these are treated by some browsers – but not all – as a script from another domain, but that is not a requirement, and browsers may change in future, since a same-domain response is expected and more useful).
- Objects, embeds or applets that then run a script on the parent page (in most browsers this is allowed without any of the usual cross domain restrictions).
- XML entities, which can contain any other scriptable content, and hide it behind a harmless-looking entity reference:
<!DOCTYPE foo [ <!ENTITY bar '<script xmlns="http://www.w3.org/1999/xhtml">...script here...</script>'> ]> <foo>&bar;</foo>
These can also be defined in a remote file, which is loaded through a harmless-looking URL:
<!ENTITY bar SYSTEM "http://example.com/extra.xml">
Or even indirectly via a custom DOCTYPE, which then contains the entity references:
<!DOCTYPE foo SYSTEM "http://example.com/untrusted.dtd">
- XSLT which creates scripts using any of the other points (XSLT itself can also be very damaging).
- XBL which makes additional elements or attributes become scriptable.
- XUL which contains script elements or scriptable attributes.
- Conditional comments, which can then contain any other HTML, but appear to be only a comment.
- Script within SVG images (or equivalent namespaced script elements).
- XML events ‘listener’ elements or namespaced attributes.
- VoiceXML and VoiceXML events.
- XML processing instructions (like
There are certainly many other ways to put a script into a page, and that is why I call this a minefield. You absolutely must not blacklist elements or attributes you know are dangerous. You must whitelist those that you know are safe. Even seemingly safe elements such as LINK (or the related CSS @import rules) can end up importing a stylesheet from an untrusted source that contains the harmful content described above.
As well as whitelisting elements, you must also whitelist the attributes that each of them may have. Anything that is not on your whitelist must be removed, or deliberately altered so that it no longer functions as the element or attribute it is intended to be. PHP has a function that is supposed to help do this, called strip_tags. However, this copes very badly with invalid HTML, and it is possible to bypass it by feeding it specially broken HTML.
Stripping tags is a fine art, and can be exceptionally difficult, as you must be able to cope with intentionally broken HTML, designed so that after the tags have been stripped, what remains is another tag that was created by the removal of another one. An example would be this:
Stripping them multiple times would be equally ineffective (unless a matching ‘while’ loop was used until the tags had been removed), as they could be nested to indefinite levels, but could end up with something that browsers understand.
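To illustrate how removing one tag can create another, here is a sketch using a deliberately naive filter. The functions and the input are hypothetical, and the filter is an example of what goes wrong, not a recommended approach.

```php
<?php
// Hypothetical naive filter: removes script tags in a single pass.
function stripOnce(string $html): string
{
    return str_ireplace(['<script>', '</script>'], '', $html);
}

// Repeats the removal until nothing changes, like the 'while' loop mentioned above.
function stripRepeatedly(string $html): string
{
    do {
        $before = $html;
        $html = stripOnce($html);
    } while ($html !== $before);
    return $html;
}

$input = '<scr<script>ipt>alert(1)</scr</script>ipt>';
echo stripOnce($input), "\n";       // <script>alert(1)</script>
echo stripRepeatedly($input), "\n"; // alert(1)
```

Even the looping version only recognises the exact strings it searches for, so a script tag with attributes or odd spacing would pass straight through it.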
Remember that “LiNk”, “LINK” and “link” are all considered to be the same tag in HTML. In XHTML, namespaced elements can also be the same as non-namespaced ones. For the sake of simplicity, it is easiest to remove anything that is namespaced; if someone is trying to use a namespace in a forum post or blog comment, then they are probably trying to exploit something anyway.
For removing tags and attributes, you may find it more effective to use a simple XML parser that only allows the non-namespaced tags and attributes you have decided to allow. Anything else can throw an error (make sure it does not attempt to display the rendered output in the error message, or you will be back where you started).
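A minimal sketch of the parser approach, assuming PHP's DOM extension is available. The function name and the format of the whitelist are hypothetical, and the sketch does not check attribute values, so any URL in an allowed attribute would still need its own check.

```php
<?php
// Sketch: accept a fragment only if every element and attribute is on the whitelist.
// $allowed maps element name => list of permitted attribute names (hypothetical format).
function isAllowedMarkup(string $fragment, array $allowed): bool
{
    // Refuse DOCTYPEs and processing instructions, which can declare entities or load files.
    if (stripos($fragment, '<!DOCTYPE') !== false || strpos($fragment, '<?') !== false) {
        return false;
    }

    $previous = libxml_use_internal_errors(true); // keep parser errors out of the page
    $doc = new DOMDocument();
    $parsed = $doc->loadXML('<root>' . $fragment . '</root>', LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$parsed) {
        return false; // not well-formed: reject it instead of guessing
    }

    foreach ($doc->documentElement->getElementsByTagName('*') as $element) {
        // Namespaced elements fail here: they have a namespace URI or a prefixed name.
        if ($element->namespaceURI !== null || !isset($allowed[$element->nodeName])) {
            return false;
        }
        foreach ($element->attributes as $attribute) {
            if (!in_array($attribute->nodeName, $allowed[$element->nodeName], true)) {
                return false;
            }
        }
    }
    return true;
}

$allowed = ['b' => [], 'i' => [], 'a' => ['href']];
var_dump(isAllowedMarkup('<b>bold</b> and <a href="http://example.com/">a link</a>', $allowed));
var_dump(isAllowedMarkup('<b onclick="evil()">bold</b>', $allowed));
```

Because XML is case sensitive, this sketch also rejects tags such as "LiNk" or "B" unless they are listed in exactly that form, which errs on the side of refusing input.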
Future versions of major browsers may also support other potentially dangerous protocols. Remember that more ways to trick browsers into running scripts are discovered all the time, and you will need to keep your pages protected against them. An easy way to do this is to always insist that every URL a user provides must start with ‘http:’ or ‘https:’ (or ‘ftp:’, if you want to allow that). This is by far the best protection, as it ensures that only those safe protocols can be linked to, even if it may be slightly more inconvenient for the user to type. Other protocols you might want to consider safe are: ‘mailto:’, ‘irc:’, ‘ircs:’, ‘gopher:’, ‘news:’, ‘nntp:’, ‘feed:’, ‘wap:’, ‘wtai:’, ‘about:’, ‘opera:’, ‘smsto:’, ‘mmsto:’, ‘tel:’, ‘fax:’, ‘ed2k:’, ‘dchub:’, ‘magnet:’. I do not recommend whitelisting streaming media protocols, for reasons given above and below. Be warned that with META refresh tags, some browsers allow multiple instances of the target URL, and any one could contain the scripted protocol.
Failing to cope with all of these possibilities could lead to a fully fledged attack being launched against your site. The MySpace worm is a good example of the lengths that you will need to go to, to protect yourself against these attacks.
Using BB code
BB code is like a simplistic version of HTML. There are several variations – wikis generally have their own syntax that serves a similar purpose. The general idea is that instead of using normal HTML, the user is allowed to enter only a small selection of HTML equivalents, that are then converted into the HTML they represent, with all other content escaped.
This makes it easier to work out what is or is not allowed – if it does not match the exact patterns, it is not converted. This can be less difficult than having to detect which parts to remove, as the parts of HTML that end up being used are generated by the server, and will not include anything that is considered dangerous.
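As a minimal sketch of the idea, the following converts only two hypothetical BB code tags, [b] and [i], and escapes everything else. Real BB code dialects allow more tags and handle nesting, which this sketch does not.

```php
<?php
// Sketch: escape everything first, then convert only the exact patterns [b]...[/b] and [i]...[/i].
function renderBbCode(string $input): string
{
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
    // Only the server-generated <b> and <i> tags ever reach the page as markup.
    return preg_replace('#\[(b|i)\](.*?)\[/\1\]#s', '<$1>$2</$1>', $escaped) ?? $escaped;
}

echo renderBbCode('[b]Hello[/b] <script>alert(1)</script>'), "\n";
// <b>Hello</b> &lt;script&gt;alert(1)&lt;/script&gt;
```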
Embedding content from other sites
It is possible to use content from other sites, such as images or scripts (a practice sometimes known as “hotlinking”), using an absolute URL:
In the case of Flash, there is an optional ALLOWSCRIPTING parameter that can be set to
The same problem is true in reverse. If you produce plugin content, and that content has access to sensitive information, some other site may embed your content in their own page, and start interacting with it using scripts. If the information can be accessed through scripts, then it can be accessed by any page that embeds your plugin content. This is of particular importance to Flash-based shopping sites, or plugin-based security systems. The plugin itself may offer some form of protection (such as checking the hosting domain), but this is up to the individual plugin, and you should refer to that plugin’s documentation for more information about protecting your content from remote scripting.
Cross site request forgery
XSRF attacks are based on knowing what the URL will look like, and knowing exactly what data the server expects to be passed, in order to perform an action, such as changing database data, or purchasing items.
They also rely on the target site thinking that the user themself submitted the form, or requested the action from the site itself, and not another site.
Any solution must make it impossible for another site to do either of these.
XSRF attacks also rely on the user being logged in, and visiting the exploiting page, while the attack is carried out. These conditions require a certain amount of social engineering, and the success rate will also depend on timing. However, it only needs to be successful once for the effects to be extremely damaging. The solutions I will present are not exhaustive; you may also find others, but I recommend you use a combination of these approaches.
Some proposed solutions attempt to use multi-page forms to ensure the correct submission path is followed, and use POST instead of GET as the form method. Neither of these offers effective protection. Both make things a little harder for the attacker, but can fairly easily be circumvented. They can use a real form to get POST to work, and use timed submission in frames, iframes, or popups to simulate multi-page submission.
Although XSRF attacks are usually referred to with two separate sites being involved, this is not a requirement. Blogs and forums are very easy targets. For example, if you post an entry on your blog, and somebody comments, they can put HTML code in the comment that causes the blog post to be deleted as soon as you look through your comments.
These attacks can also be carried out through BB code or wiki syntax, as long as an element is allowed that has a URL value. Considering how many elements have URI values, this is a fairly reliable attack. It also has the added benefit that users will usually be logged in while viewing comments on their own blog or forum. This particular type of attack can be partially protected against by insisting that forms that request actions use POST instead of GET, but as I have already said, POST is definitely not a complete solution to the XSRF problem.
Encode a session ID in the URL
This is a fairly simple way to make it virtually impossible for a malicious site to predict what the URL of the target page will be. Make sure that the session ID is sufficiently long and unpredictable, so that the site cannot simply try multiple combinations until one works. 20 random characters should usually be sufficient, but you may want to use more.
Unfortunately, this means that the site will need to generate every page dynamically, to make sure that the session ID is included in every page, every link, and every form (as a hidden input). It is not convenient, but it is very effective protection.
Check referrers
If a page containing a form or link is supposed to be the only page that can send data to a server-side processing script to request an action, then that processing page should check the referrer header to make sure that the page that requested the action was the correct page. Any requests from other pages (including if no referrer is sent, or if it is blank) should not cause any processing, and instead should display a warning saying that the referrer header was not correct.
Note that some browsers can disable the referrer header if the user requests it – they should be asked to enable it again. Some browsers never send a referrer header. If you intend to use the referrer header as a security precaution, then these browsers will simply not be able to use the site. It is important not to allow requests that do not have a referrer, as an exploiting site could use a trick to prevent a browser sending the header, and this must not be mistaken for a browser that never sends one.
This on its own is not a complete solution for multi-user sites such as blogs, blog comments, or forums, as the attacker may be able to create forms or equivalent links on the page itself and convince you to click a button to initiate the action.
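The referrer check itself can be sketched like this. The function name and URLs are hypothetical; in a real processing page the header value would be read from $_SERVER['HTTP_REFERER'].

```php
<?php
// Hypothetical check: $expected is the exact URL of the page containing the form.
function isExpectedReferrer(?string $referrer, string $expected): bool
{
    // A missing or blank header is rejected, exactly like a wrong one.
    return $referrer !== null && $referrer !== '' && $referrer === $expected;
}

// In a real processing page: $referrer = $_SERVER['HTTP_REFERER'] ?? null;
$formPage = 'https://example.com/account/edit.php';
var_dump(isExpectedReferrer($formPage, $formPage));                   // bool(true)
var_dump(isExpectedReferrer('https://evil.example/page', $formPage)); // bool(false)
var_dump(isExpectedReferrer(null, $formPage));                        // bool(false)
```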
Prompting for passwords
This is a very unpopular idea, but it is a very effective way of ensuring that the user is themself, and not a page that has posted form data as that user. The form that submits data to the processing page should also have a field where the user must enter their password (yes, even though they are already logged in). If the password is not correct, then the processing page must not process the data. Attacking pages will not know the user’s password, so they will not be able to fake the form submission.
Pass unique IDs in form submission
Instead of having to encode a unique session ID in every page, include it in a hidden input in the form that submits to the processing page. This can be the same as the user’s session ID that is held in a cookie. With XSRF attacks, the attacker does not know what the user’s session ID is, so they will not be able to send that part of the form data. The processing page should then check for that session ID, and if it does not find it, it should reject the submission.
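A minimal sketch of the hidden input and the matching check on the processing page. The field name "token" and both function names are hypothetical; in a real page the submitted data would come from $_POST and the session ID from the user's cookie.

```php
<?php
// Sketch: the hidden field carries the session ID, and the processing page compares it.
function hiddenTokenField(string $sessionId): string
{
    return '<input type="hidden" name="token" value="'
        . htmlspecialchars($sessionId, ENT_QUOTES, 'UTF-8') . '">';
}

function isValidSubmission(array $post, string $sessionId): bool
{
    // Reject the submission when the field is missing or does not match.
    return isset($post['token'])
        && is_string($post['token'])
        && hash_equals($sessionId, $post['token']);
}

$sessionId = bin2hex(random_bytes(16)); // stands in for the ID held in the user's cookie
echo hiddenTokenField($sessionId), "\n";
var_dump(isValidSubmission(['token' => $sessionId], $sessionId)); // bool(true)
var_dump(isValidSubmission(['amount' => '100'], $sessionId));     // bool(false)
```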
Secure sites
Many sites, such as shopping and banking sites, use encrypted connections to allow users to ensure that they are talking to the correct site, and to prevent attackers from sniffing network data packets. These require a whole new level of attack (as well as the XSS and XSRF attacks), but considering the amounts of money involved, these attacks are profitable enough to be worth attempting.
Encrypted connections do a lot more than just encrypting data sent by the user. They also encrypt pages sent to the user, and offer a certificate path that allows the user to ensure they are talking to the real site before they give it any sensitive information.
Typical attacks would involve intercepting and rewriting a page before the user receives it. This could be done through a compromised router, for example. Another would be to use a compromised DNS server to point the user to the wrong server that pretends to be the real site – the user’s address bar will of course show the correct site, and it could even be encrypted. Strictly speaking, these are not cross site scripting attacks, but the effects are the same; some content of the page is changed by a third party, so that sensitive information can be sent to them instead.
Secure connections can deal with both of these situations. Firstly, an encrypted connection can be intercepted, but the attacker cannot read or rewrite the page content, unless they can break the encryption fast enough. This is why it is important to use high level (typically 128 bit) encryption, as it is not currently possible to break within the lifespan of the attacker. Some of the lower level older encryptions (56 bit) can be broken within just a few seconds.
Encrypted connections also offer the ability to check the certification path. This is also virtually impossible to fake, so a user can check the certificate to make sure it is the right company. The browser can check the certification path to ensure the certificates are valid, and that the certification path is correct. Any failures will cause a browser to display warnings to the user so they are aware that the site may not be who it claims to be.
The first and one of the biggest mistakes a site can make is to use both secure and insecure content on the same page. An attacker only needs to compromise one file in order to carry out a successful attack. If they compromise the insecure content (such as replacing a safe script file with an unsafe one), the secure content is compromised as well. This mix of content security happens on quite a few sites, and browsers usually display warnings, but are moving towards denying it altogether.
The next most stupid mistake is to have the login form on an insecure page, that posts the login information to the secure page. It assumes that since the data is encrypted when it is sent, everything is OK. This happens on a disturbingly high number of bank sites, especially those in the USA.
The problem with this approach is that the user should be able to check the site is real before they give it their information. If the DNS has been compromised, they would only find that out after they have sent their login details to the wrong site. If the page has been altered by a compromised router, for example, to change the action of the form, the user would not know about it until after they sent their data to the wrong site (or if it then sent them to the real site, they would never know).
Very occasionally, there is the problem that an encrypted site sends data – via forms, XMLHttpRequest, or any other means – to an insecure page, either directly or via a redirect. Packet sniffing and rewriting means that an attacker has immediate access to that information.
Secure sites need to ensure that they do not make any of these mistakes, as well as not allowing XSS and XSRF attacks.