Pentesting considerations and analysis on the possibility of full pentest automation

The fact that even Optimus Prime and the Autobots needed human help to save Earth is a good hint of how I’ll end this post.

So, when we talk about automation in this context, it’s important to distinguish between three properties that the process ought to guarantee:

Accurately rating the findings;
Not causing collateral damage;
Detecting the issues in the first place.

So let’s take a look at each.

Rating issues: example with HSTS HTTP header

I’ve written about the importance of the HTTP Strict Transport Security (HSTS) header before, and Nessus easily detects its absence, as that is technically simple to do. The issue here lies in automating the rating process: by default, Nessus reports it as a simple informational finding on every single detection.

Of course, you can configure Nessus to rate it however you want, but it will then rate it the same way in every case, even though the actual risk varies a lot depending on where you find it. And this is not specific to this finding: the same could be said of most findings.

For example, the lack of an HSTS header shouldn’t be an issue if the target is an empty or static set of pages. In that case I’d simply ignore it, since the risk is too low (due to the low business impact).

On the other hand, let’s say Facebook asked me to pentest their main site and they didn’t have the header set. I’d go as far as reporting it as high or even critical, given that you’ll definitely find one of their users on any public wireless network, which pushes the likelihood way up the scale.

Attributing a rating that varies with context is a common limitation, since it’s very hard to address through automation. You can use existing tools, or develop your own, to attempt rating attribution on a best-effort basis, but at least for now you’ll still need someone to QA each finding before presenting it to the client.
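To illustrate why the rating depends on context rather than on the detection itself, here is a minimal Python sketch. The TargetContext fields and the severity thresholds are hypothetical stand-ins for judgement calls a tester normally makes by hand; the genuinely hard part, filling in those fields automatically, is simply assumed here.

```python
from dataclasses import dataclass


@dataclass
class TargetContext:
    """Hypothetical context a tester would normally judge by hand."""
    handles_sensitive_data: bool  # logins, payments, personal data
    high_profile: bool            # widely used, many users on public Wi-Fi


def rate_missing_hsts(headers: dict, ctx: TargetContext) -> str:
    """Return a severity for a missing HSTS header, or 'none' if it is present."""
    # HTTP header names are case-insensitive, so normalise before looking up
    if "strict-transport-security" in {name.lower() for name in headers}:
        return "none"
    if not ctx.handles_sensitive_data:
        # e.g. an empty or static set of pages: low business impact
        return "informational"
    if ctx.high_profile:
        # many users on untrusted networks push the likelihood of a downgrade up
        return "high"
    # hypothetical middle ground for sensitive but lower-profile targets
    return "medium"


static_site = TargetContext(handles_sensitive_data=False, high_profile=False)
social_network = TargetContext(handles_sensitive_data=True, high_profile=True)
print(rate_missing_hsts({"Content-Type": "text/html"}, static_site))     # informational
print(rate_missing_hsts({"Content-Type": "text/html"}, social_network))  # high
```

The detection line is the same in both cases; only the hand-supplied context changes the outcome, which is exactly the part that still needs a person.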

Minimising collateral damage: example with SQLi

SQL has statement types other than the commonly used SELECT. Working out which type of statement your input ends up in is crucial to minimising collateral, unintended damage to the client’s infrastructure, and that’s something very hard to automate.

In trying to maximise detection while ignoring which SQL statement is being used, you’ll often run sqlmap, for example, with “--risk=3”, the highest level, which among other things adds OR-based payloads to the tests. The problem is that, as the name implies, you’re also maximising the risk of collateral damage.
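Since the safe risk level depends on which statement a parameter feeds, one option is to record that judgement explicitly and derive the sqlmap invocation from it. A minimal Python sketch: build_sqlmap_argv is a hypothetical helper, the statement type is the tester’s own manual assessment, target.example is a placeholder host, and only sqlmap’s well-known -u, --risk and --batch options are used.

```python
def build_sqlmap_argv(url: str, statement_type: str) -> list[str]:
    """Hypothetical helper: pick sqlmap's --risk from the tester's assessment.

    statement_type is a human judgement (e.g. "SELECT", "UPDATE", "unknown");
    tools cannot reliably tell which statement a parameter ends up in.
    """
    stmt = statement_type.strip().upper()
    # Only a read-only SELECT is treated as safe for OR-based (risk 3) payloads;
    # UPDATE, DELETE or anything unknown falls back to the default risk 1.
    risk = 3 if stmt == "SELECT" else 1
    return ["sqlmap", "-u", url, f"--risk={risk}", "--batch"]


print(build_sqlmap_argv("http://target.example/search?q=test", "SELECT"))
print(build_sqlmap_argv("http://target.example/account?id=1", "UPDATE"))
```

The resulting list could be handed to subprocess.run once you are ready to launch it. Defaulting anything other than SELECT to risk 1 errs on the side of not touching the client’s data.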

Burp Scanner is another commonly used SQLi detection tool. On a (very) tight schedule, it will actually be my only SQLi detection tool, and if it finds anything, I’ll then turn to sqlmap to extract the database. Even on a tight schedule, though, I’ll send the GET/POST request to Intruder (Payload Positions tab) and choose “Actively scan defined insertion points”, which sends the parameters I marked to Burp Scanner for testing. I like to control and know exactly what I’m testing, mostly so that I know what I did if something goes wrong. Burp Scanner will also use OR-based tests if the Active Intrusive scan type is on.

Let’s take the example of an HTML form that updates the user’s password.

The SQL statement would be something along the lines of:

UPDATE users SET password='…' WHERE user = '…' AND password = '…'

Now let’s assume the “current password” field is vulnerable to boolean-based SQLi, which means injecting into the last password field of the statement above, and that I inject the following:

xxx' OR 'a'='a

With a wrong password, the AND expression evaluates to False, since AND takes precedence over OR. But once OR’ed with the always-true 'a'='a', the whole WHERE clause is True for every row, so the statement updates the passwords of ALL users in the users table, effectively DoS’ing all of the client’s users, as they no longer know their “new” passwords.
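The precedence problem can be reproduced safely with Python’s built-in sqlite3 module and an in-memory database. This is a minimal sketch: the table, the users and the change_password_vulnerable helper are hypothetical, and SQLite stands in for whatever engine the client runs (AND binding more tightly than OR is standard SQL, so the logic carries over).

```python
import sqlite3


def change_password_vulnerable(conn, user, current, new):
    # Deliberately vulnerable: input is concatenated straight into the SQL string
    query = ("UPDATE users SET password='" + new + "' WHERE user = '" + user
             + "' AND password = '" + current + "'")
    return conn.execute(query).rowcount  # number of rows the UPDATE touched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2"), ("carol", "c3")])

# The WHERE clause becomes: user = 'alice' AND password = 'xxx' OR 'a'='a'
# which is parsed as:      (user = 'alice' AND password = 'xxx') OR 'a'='a'
changed = change_password_vulnerable(conn, "alice", "xxx' OR 'a'='a", "pwned")
print(changed)  # 3: every user's password was overwritten
print(conn.execute("SELECT DISTINCT password FROM users").fetchall())  # [('pwned',)]
```

A single OR-based probe against this form is enough to lock every user out, which is why the statement type matters before choosing an aggressive risk level.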

On the other hand, “--risk=3” on sqlmap or an Intrusive scan type on Burp won’t cause collateral damage on a plain SELECT statement, so using them on search queries, for example, is harmless.
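For contrast, here is the same OR-based payload hitting a SELECT, again as a minimal sqlite3 sketch with a hypothetical products table and search_vulnerable helper. The injection widens what is read, but nothing is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("lamp",), ("desk",), ("chair",)])


def search_vulnerable(conn, term):
    # Deliberately vulnerable search query built by string concatenation
    query = "SELECT name FROM products WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()


before = conn.execute("SELECT name FROM products ORDER BY name").fetchall()
# The OR-based payload makes the WHERE clause always true, so every row is read...
rows = search_vulnerable(conn, "xxx' OR 'a'='a")
after = conn.execute("SELECT name FROM products ORDER BY name").fetchall()
print(len(rows))        # 3
print(before == after)  # True: ...but the table itself is unchanged
```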

Another interesting example of minimising collateral damage is injecting a stored XSS payload and then not being able to remove it. I can totally understand a pentester’s reasoning that “if the client told me to test it, then I can do it”, but trust me: if you deface a payment or booking page that makes the client a lot of money by the hour, effectively rendering it useless, and they struggle to restore it… you’re the one they’ll blame.

And because I don’t like getting caught with my pants down, if I’m, for example, on a forum page and want to inject an XSS payload into a message that I know will be stored, I’ll make damn sure beforehand that I can delete that message afterwards. If I can’t (there’s no feature to do so), I’ll explain the possible consequences to the client, have them on standby before I start testing that part of the site, and only proceed with their permission (usually once they’re sure they can undo on the back end whatever I’m about to do).

As you might imagine, this would be very hard to automate.

Automating detection… or not

Detecting the lack of an HTTP header such as HSTS, or autocomplete=on on login forms, is fairly straightforward, and so is automating the process.
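Both checks boil down to a few lines of code, which is exactly why scanners handle them well. A minimal Python sketch using only the standard library’s html.parser, assuming the response headers and body have already been fetched; check_response and PasswordAutocompleteFinder are hypothetical names.

```python
from html.parser import HTMLParser


class PasswordAutocompleteFinder(HTMLParser):
    """Collects password inputs whose autocomplete attribute is not 'off'."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased; bare ones map to None
        if tag == "input" and (attrs.get("type") or "").lower() == "password":
            # A missing autocomplete attribute behaves like autocomplete=on
            if (attrs.get("autocomplete") or "on").lower() != "off":
                self.findings.append(attrs.get("name") or "<unnamed>")


def check_response(headers: dict, html: str) -> list[str]:
    issues = []
    # Header names are case-insensitive
    if "strict-transport-security" not in {h.lower() for h in headers}:
        issues.append("missing HSTS header")
    finder = PasswordAutocompleteFinder()
    finder.feed(html)
    issues += [f"autocomplete enabled on password field {n}" for n in finder.findings]
    return issues


print(check_response({"Content-Type": "text/html"},
                     '<form><input type="password" name="pwd"></form>'))
# ['missing HSTS header', 'autocomplete enabled on password field pwd']
```

Detection is the easy half here; whether either finding is worth reporting still depends on the context discussed earlier.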

Side note: autocomplete=on is just an example of something that is commonly reported. I stopped reporting it some time ago, as I agree with Mozilla’s stance on the issue:

“Even without a master password, in-browser password management is generally seen as a net gain for security. Since users do not have to remember passwords that the browser stores for them, they are able to choose stronger passwords than they would otherwise.”

Back to the point: there are, however, plenty of findings whose detection is extremely hard to automate with a single algorithm.

I once had a case of a login form that I tested for input validation. Given that it didn’t reflect back any of the submitted data, I didn’t test it for XSS. But while browsing the application, I found a log page, only accessible when logged in as the administrator, which showed all login attempts. To be honest, I can’t recall whether it showed only the usernames or the passwords as well, but even with only the usernames, I clearly remember recognising some of Burp’s SQLi payloads. So I went back to the login page and injected a specific XSS payload (adapted to that scenario), which was indeed triggered when the login log page was opened.
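One partial way to catch this kind of second-order reflection is to submit a unique, harmless marker per parameter and later search every page you can reach for those markers. A minimal Python sketch, assuming the pages have already been crawled into a URL-to-HTML dict; the function names, the marker format and the example URLs are hypothetical. Note that it only helps if the crawl includes the admin-only log page, which is precisely the part that is hard to automate.

```python
import secrets


def make_canaries(params):
    """Give every tested parameter its own unique, harmless marker."""
    return {p: "cnry" + secrets.token_hex(4) for p in params}


def find_second_order_reflections(canaries, pages):
    """Map each parameter to the pages (possibly unrelated ones) echoing its marker."""
    hits = {}
    for param, marker in canaries.items():
        for url, body in pages.items():
            if marker in body:
                hits.setdefault(param, []).append(url)
    return hits


canaries = make_canaries(["username", "password"])
# Pretend these were fetched after submitting the markers to the login form
pages = {
    "/admin/login-log": "<td>" + canaries["username"] + "</td>",
    "/home": "<p>Welcome</p>",
}
print(find_second_order_reflections(canaries, pages))
```

Any hit is then a lead for a targeted, manually adapted XSS payload on that parameter, as in the case above.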

Another particular case was a login form vulnerable to boolean-based SQLi. Even though the injection was basic (as simple as setting the password to x' OR '1'='1), Burp Scanner did not detect it. Fortunately, especially when I have a decent testing window and haven’t found much, I like to go into manual mode on at least the most prominent parameters, and I found it myself.

The issue was that, after a successful login (or a successful SQLi), the web application would set a session cookie which, if sent along with any subsequent login form submission (even with wrong credentials), would take you straight to the dashboard/home page until you actually logged out, which invalidated the cookie. OK, but how did Burp miss it? Burp won’t report a boolean-based SQLi by sending a single payload. It actually sends two: a True statement and a False statement. Given that my base request used an existing username and a fake password, Burp expected the response to x' OR '1'='1 to differ from the response to x' OR '1'='2 (Burp actually uses other numbers in order to avoid filter detection). But because Burp’s requests were sent with a cookie that got authenticated by the True statement, the False statement, sent with that same (already authenticated) cookie, returned the same response as the True one (an HTTP 200 OK with the dashboard page). This made Burp believe it was a false positive, so it didn’t report the SQLi.

That’s why I’ll usually remove the cookie from the requests I send to Burp’s Intruder/Repeater/Scanner when testing in such scenarios. But I sure wouldn’t remove the cookie in an authenticated scenario. Again, you can see how hard it would be to automate all of this.
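The false negative is easy to reproduce with a toy model. The Python sketch below is hypothetical: FakeLoginApp imitates the behaviour described above with a crude string check instead of a real query, and boolean_sqli_detected is a generic True/False differential check, not Burp Scanner’s actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLoginApp:
    """Toy model of the application described above (hypothetical)."""
    authenticated_sessions: set = field(default_factory=set)

    def login(self, username, password, cookie=None):
        # Once a session cookie is authenticated, any login POST lands on the dashboard
        if cookie is not None and cookie in self.authenticated_sessions:
            return "dashboard"
        # Crude stand-in for the injectable query: an always-true OR bypasses the check
        if (username == "admin" and password == "secret") or password.endswith("' OR '1'='1"):
            if cookie is not None:
                self.authenticated_sessions.add(cookie)
            return "dashboard"
        return "login failed"


def boolean_sqli_detected(app, username, cookie=None):
    """Differential check: report only if the True and False payloads differ."""
    true_resp = app.login(username, "x' OR '1'='1", cookie)
    false_resp = app.login(username, "x' OR '1'='2", cookie)
    return true_resp != false_resp


print(boolean_sqli_detected(FakeLoginApp(), "admin", cookie="session123"))  # False: missed
print(boolean_sqli_detected(FakeLoginApp(), "admin", cookie=None))          # True: detected
```

With the cookie, the True payload authenticates the session and the False payload then rides on it, so both responses match; dropping the cookie restores the difference the check relies on.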

And of course there are multiple cases where you’ll have to:

develop custom Burp macros: to bypass CSRF tokens, or to handle applications that log you out for sending forbidden characters, in which case the macro has to log you back in before each payload is sent;
use existing Burp extensions, or develop one yourself;
write custom sqlmap tamper scripts;
use specific sqlmap parameters to tell the tool exactly how to recognise a positive result (--string / --not-string / --regexp / --code).

These allow you to adapt your attack vectors to very specific scenarios that no tool will handle by default, and such scenarios come up more often than not.
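Those sqlmap options essentially replace the tool’s generic page comparison with a condition the tester defines. The following Python sketch is a hypothetical, simplified model of that idea, written only to show what each option expresses (a string or regex that marks a True page, a string that marks a False page, an HTTP status code for True); it is not how sqlmap implements them, and requiring every supplied condition to hold is an assumption of this sketch.

```python
import re
from typing import Optional


def page_is_true(body: str, status: int,
                 string: Optional[str] = None,
                 not_string: Optional[str] = None,
                 regexp: Optional[str] = None,
                 code: Optional[int] = None) -> bool:
    """Hypothetical model of a tester-defined True/False oracle.

    Loosely mirrors the documented meaning of sqlmap's --string, --not-string,
    --regexp and --code. Every condition that is supplied must hold.
    """
    if string is None and not_string is None and regexp is None and code is None:
        raise ValueError("supply at least one condition")
    if string is not None and string not in body:
        return False
    if not_string is not None and not_string in body:
        return False
    if regexp is not None and re.search(regexp, body) is None:
        return False
    if code is not None and status != code:
        return False
    return True


print(page_is_true("<a href='/logout'>Logout</a>", 200, string="Logout"))        # True
print(page_is_true("Invalid credentials", 200, not_string="Invalid credentials"))  # False
```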

Conclusion

Obviously, the whole pentest process won’t be easy to automate in the near future (if ever). Nevertheless, there is a lot that can be done to make the process more efficient. As mentioned before, even the low-hanging fruit will still have to be validated by a person, but if done well, automation can free up the pentester’s time to focus on the more complex findings that no tool can detect. That makes the penetration testing service the company delivers much better: with the same project deadline as others, it can extract more value from each engagement.

The devil, of course, is in the “if done well”. I’ll be writing more on that in the near future.

Pentester's life

Try harder… and then go a little deeper.
