Lemon Aquarium

Challenge Information

Project: pdfbox

Type: full

Harnesses: 6

Vulnerabilities: 8

GitHub • Challenge Download

AFC Challenge Performance

Number of Unique Vulnerabilities Discovered: #

Number of Teams with Scoring PoVs: 5

Number of Teams with Scoring Patches: 3

Number of Teams with Scoring Bundles: 2

Total Points Scored for this Challenge: 83.76047069101571

What design decisions were considered for this challenge?

Unlike commons-compress, where the focus was on individual challenges, this challenge embeds a large number of vulnerabilities into one full repository scan.

Why this set of vulnerabilities?

While there is a heavy focus on the Type1 font parser, some vulnerabilities instead target the general structure of the PDF file.

Delta vs Full and why?

Full repository scan, with a large number of vulnerabilities to test breadth of detection across the codebase.

Challenge Harnesses

DomXfaParserFuzzer
DomXmpParserFuzzer
PDFExtractTextFuzzer
PDFStreamParserFuzzer
PDFWriteReadFuzzer
PDFOCRFuzzer

Challenge Timeouts

enabled

Challenge Vulnerabilities

SSRF in xmp

Vulnerability Information

Author: Tim Allison

Harness: DomXmpParserFuzzer

CWE Classification: CWE-611, CWE-918

What functions and functionality is relevant?

Parsing of Extensible Metadata Platform (XMP) within a PDF.

Why is this vulnerable?

The XML parser is not securely configured.
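For comparison, a hardened JAXP setup might look like the sketch below. It is not the PDFBox code: the class name `SecureXml` and its methods are hypothetical, and only standard JAXP calls are used. The `disallow-doctype-decl` feature is the one understood by the Xerces-derived parser bundled with the JDK; other parser implementations may need different settings.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
public class SecureXml {

    // Builds a DOM parser that refuses DOCTYPE declarations, so external
    // entities (the XXE/SSRF vector) can never be declared or resolved.
    static DocumentBuilder newSecureBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Parses untrusted XML bytes (for example an XMP packet) with the hardened builder.
    static Document parse(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        return newSecureBuilder().parse(new ByteArrayInputStream(xml));
    }
}
```

With this configuration, a packet that carries a DOCTYPE is rejected with a `SAXException` before any entity is fetched, while ordinary metadata XML parses as usual.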

Is this a replay and/or is inspired by anything?

This is a replay of CVE-2016-2175.

What makes it interesting?

This vulnerability is buried fairly deeply in the codebase. The vulnerability should be easy to fix, but finding it in the full codebase and generating a proof of vulnerability are both good challenges.

SSRF in xfa

Vulnerability Information

Author: Tim Allison

Harness: DomXfaParserFuzzer

CWE Classification: CWE-611, CWE-918

What functions and functionality is relevant?

Parsing of XML Forms Architecture (XFA) within a PDF.

Why is this vulnerable?

The code fails to configure the XML DOM builder securely.

Is this a replay and/or is inspired by anything?

This is a replay of a code refactoring that was part of the DRY fixes for CVE-2019-0228. At the time of that fix, the XFA code was already secured against XXE.

What makes it interesting?

Infinite loops in type 1 font parser

Vulnerability Information

Author: Tim Allison

Harness: PDFExtractTextFuzzer

CWE Classification: CWE-835, CWE-834

What functions and functionality is relevant?

Parsing a Type1 font embedded in a PDF.

Why is this vulnerable?

Failure to check if next value is null.
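The pattern can be illustrated with a minimal sketch. The `Lexer` class and `skipUntil` method below are hypothetical stand-ins, not the actual Type1Parser code; the only assumption carried over from the description is that the lexer returns null once the input is exhausted.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical names throughout; this is not the PDFBox implementation.
public class TokenLoopSketch {

    // Toy lexer: peekToken() and nextToken() return null at end of input.
    static final class Lexer {
        private final Deque<String> tokens;

        Lexer(String... tokens) {
            this.tokens = new ArrayDeque<>(Arrays.asList(tokens));
        }

        String peekToken() { return tokens.peek(); }

        String nextToken() { return tokens.poll(); }
    }

    // Vulnerable shape (shown only as a comment):
    //   while (!terminator.equals(lexer.peekToken())) { lexer.nextToken(); }
    // Once the input runs out, peekToken() keeps returning null, the
    // condition never becomes false, and the loop spins forever.

    // Guarded shape: skips tokens up to and including the terminator and
    // fails fast if the input ends first. Returns the number skipped.
    static int skipUntil(Lexer lexer, String terminator) throws IOException {
        int skipped = 0;
        while (true) {
            String token = lexer.nextToken();
            if (token == null) {
                throw new IOException("Unexpected end of input while looking for '" + terminator + "'");
            }
            if (token.equals(terminator)) {
                return skipped;
            }
            skipped++;
        }
    }
}
```

A truncated font program then surfaces as an `IOException` instead of a hang.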

Is this a replay and/or is inspired by anything?

This reintroduces an infinite loop fixed in PDFBOX-5624.

What makes it interesting?

Finding the vulnerability is non-trivial.

Additional details

https://issues.apache.org/jira/browse/PDFBOX-5624
https://github.com/apache/pdfbox/commit/aa7dc6ccd1c3055b70c8084d7bf383f799047ad5

Infinite loops in type 1 font parser

Vulnerability Information

Author: Tim Allison

Harness: PDFExtractTextFuzzer

CWE Classification: CWE-835, CWE-834

What functions and functionality is relevant?

Parsing a Type1 font embedded in a PDF.

Why is this vulnerable?

Failure to check for null when calling “nextToken”.

Is this a replay and/or is inspired by anything?

This reintroduces an infinite loop fixed in PDFBOX-5624.

What makes it interesting?

This is similar to vuln_3, but located in a slightly different location within the Type1Parser.

Additional details

https://issues.apache.org/jira/browse/PDFBOX-5624
https://github.com/apache/pdfbox/commit/aa7dc6ccd1c3055b70c8084d7bf383f799047ad5

Type1Font lexer OutOfMemoryError

Vulnerability Information

Author: Tim Allison

Harness: PDFExtractTextFuzzer

CWE Classification: CWE-789

What functions and functionality is relevant?

Parsing a Type1 font in a PDF.

Why is this vulnerable?

The code reads a value from user input and then allocates that amount of memory without any checks.
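One common mitigation is to validate the declared length before allocating. The sketch below is an illustration under assumptions, not the challenge code: the class `BoundedRead`, the 4-byte big-endian length prefix, and the 16 MiB cap are all hypothetical choices.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating "validate, then allocate".
public class BoundedRead {

    // Assumed upper bound for one segment; a real parser would pick a
    // limit that fits its format and memory budget.
    static final int MAX_SEGMENT_BYTES = 16 * 1024 * 1024;

    // Reads a 4-byte big-endian length followed by that many bytes.
    static byte[] readSegment(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int declared = data.readInt();
        // Reject negative or oversized lengths before any allocation happens.
        if (declared < 0 || declared > MAX_SEGMENT_BYTES) {
            throw new IOException("Declared length out of range: " + declared);
        }
        byte[] segment = new byte[declared];
        // readFully throws EOFException (an IOException) if the stream is shorter
        // than the declared length.
        data.readFully(segment);
        return segment;
    }
}
```

The cap bounds the allocation itself, and `readFully` catches inputs that declare more data than they contain.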

Is this a replay and/or is inspired by anything?

This is inspired by the unchecked “read length then allocate” pattern that is common in MS Office OLE-based file formats and several compression formats. However, this is an organic memory usage vulnerability.

What makes it interesting?

Type1Font int overflow into OOM

Vulnerability Information

Author: Tim Allison

Harness: PDFExtractTextFuzzer

CWE Classification: CWE-789

What functions and functionality is relevant?

Parsing Printer Font Binary (PFB) Type 1 fonts within a PDF.

Why is this vulnerable?

Integer overflow in a check that is intended to prevent an out-of-memory allocation. With a very small crafted file, the parser can allocate 2 GB of memory.
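The general class of bug can be shown with a small sketch. The exact check used in the challenge is not reproduced here; the class `OverflowCheck`, both method names, and the 64 MiB limit are hypothetical.

```java
// Hypothetical illustration of a size check defeated by int overflow.
public class OverflowCheck {

    // Assumed limit on the total number of bytes a parser may buffer.
    static final int MAX_TOTAL = 64 * 1024 * 1024;

    // Broken: total + size is computed in 32-bit arithmetic, so a large
    // attacker-controlled size wraps to a negative sum and passes the check.
    static boolean brokenFits(int total, int size) {
        return total + size <= MAX_TOTAL;
    }

    // Safer: reject negative inputs and widen to long before adding,
    // so the comparison sees the true sum.
    static boolean safeFits(int total, int size) {
        if (total < 0 || size < 0) {
            return false;
        }
        return (long) total + (long) size <= MAX_TOTAL;
    }
}
```

For example, `brokenFits(Integer.MAX_VALUE, 1)` returns true because the sum wraps to `Integer.MIN_VALUE`, while `safeFits(Integer.MAX_VALUE, 1)` returns false. `Math.addExact`, which throws on overflow, is another option.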

Is this a replay and/or is inspired by anything?

This is an organic vulnerability. It is based on read-length-then-allocate vulnerabilities that are common in other file formats. The twist to this is that there was an incorrect fix to add a heuristic record limit, but that fix, in turn, fails to account for integer overflow. There was no check in the actual PDFBox codebase.

What makes it interesting?

As with the other Type1 font vulnerabilities, the POV was fairly easily generated with a custom harness and a custom seed corpus. However, neither of these resources was made available in the competition, so generating the POV is challenging. Finding the vulnerability should be straightforward with static analysis, but it would be very difficult to find via fuzzing with the harnesses supplied during the competition.

Additional details

https://issues.apache.org/jira/browse/PDFBOX-6044

PageTrees -> PageForests

Vulnerability Information

Author: Tim Allison

Harness: PDFExtractTextFuzzer

CWE Classification: CWE-834

What functions and functionality is relevant?

Parsing a PDF’s page tree.

Why is this vulnerable?

There’s no check on which objects have been processed, and a crafted PDF may contain a loop in the page tree.
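A traversal that records which nodes it has already processed avoids the problem. The following is a minimal sketch with a hypothetical `Node` type standing in for page tree dictionaries; it is not the PDFBox page tree code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a page tree, for illustration only.
public class PageTreeWalk {

    // A node with no kids is treated as a page (leaf).
    static final class Node {
        final List<Node> kids = new ArrayList<>();
    }

    // Counts leaf pages, rejecting any node that is reached twice.
    static int countLeaves(Node root) throws IOException {
        // Identity-based set: nodes are tracked by object identity, not equals().
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return countLeaves(root, visited);
    }

    private static int countLeaves(Node node, Set<Node> visited) throws IOException {
        if (!visited.add(node)) {
            throw new IOException("Loop detected in page tree");
        }
        if (node.kids.isEmpty()) {
            return 1;
        }
        int count = 0;
        for (Node kid : node.kids) {
            count += countLeaves(kid, visited);
        }
        return count;
    }
}
```

The visited set also rejects a node shared by two parents, which a well-formed tree should not contain. A production traversal would likely be iterative or depth-limited as well, since deep recursion carries its own risk.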

Is this a replay and/or is inspired by anything?

Replay of PDFBOX-4623, but rewritten to trigger a timeout instead of a StackOverflow.

What makes it interesting?

A crafted PDF with a loop in the page tree triggers a timeout rather than a StackOverflow, making detection and diagnosis less straightforward.

Additional details

https://issues.apache.org/jira/browse/PDFBOX-4623
The POV is https://issues.apache.org/jira/secure/attachment/12993517/loop_in_page_tree.pdf

Ye olde infinite XRefs

Vulnerability Information

Author: Tim Allison

Harness: PDFExtractTextFuzzer

CWE Classification: CWE-834

What functions and functionality is relevant?

Parsing an xref table in a PDF.

Why is this vulnerable?

There’s no check for circular references in the xref table.
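As a sketch of what such a check could look like, assume the circularity arises in the chain of /Prev pointers that links one xref section to an earlier one. That assumption, the class `XrefChain`, and the map-based model of the file are all illustrative and are not taken from the PDFBox source.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each xref section is identified by its byte offset.
public class XrefChain {

    // prevOf maps a section's offset to the offset named by its /Prev entry;
    // a missing key means the section has no /Prev. Returns the offsets in
    // the order they are visited, newest section first.
    static List<Long> resolve(long startXref, Map<Long, Long> prevOf) throws IOException {
        List<Long> order = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        Long offset = startXref;
        while (offset != null) {
            // Seeing the same offset twice means the chain loops back on itself.
            if (!seen.add(offset)) {
                throw new IOException("Circular /Prev reference at offset " + offset);
            }
            order.add(offset);
            offset = prevOf.get(offset);
        }
        return order;
    }
}
```

Because every offset can be visited at most once, the loop is bounded by the number of distinct sections in the file.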

Is this a replay and/or is inspired by anything?

This is a replay of a famous infinite loop/Denial of Service vulnerability that was fixed in PDFBOX-3919. Andreas Bogk presented this vulnerability at Chaos Communication Camp in 2011. It affected poppler, qpdf, and PDFBox, and probably many other PDF parsers.

What makes it interesting?

This is a very famous vulnerability. It would be challenging to identify and patch without historical context.

Additional details

The POV is taken from: https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/825554

See also:

https://www.ieee-security.org/TC/SPW2014/papers/5103a198.PDF
https://www.bleepingcomputer.com/news/software/six-year-old-loop-bug-re-discovered-to-affect-almost-all-major-pdf-viewers/
https://blog.fuzzing-project.org/59-Six-year-old-PDF-loop-bug-affects-most-major-implementations.html