When Adobe introduced the portable document format (PDF) in 1993, a consultant from Gartner called it “the dumbest idea I’ve ever heard in my life”. Users would have to twiddle their thumbs waiting for the megabyte-sized files to download over their dial-up internet, then wait again for their PCs to render them. The software-maker’s board wanted to kill the project. But the PDF triumphed, particularly after the Internal Revenue Service, America’s tax authority, began to use it for digital tax forms. Today more than 2.5trn PDFs float in the ether. But will the format survive the AI revolution?
PDFs still have drawbacks. They are a pain to view on a smartphone. Copying data from them is fiddly. Software tools that read screens for blind people struggle with PDFs. The file type, which Adobe relinquished control over in 2008, is also a vehicle for malware: a fifth of email-based cyber-attacks utilise PDF attachments, according to Check Point, a cyber-security firm.
Lately another source of criticism has emerged. The large language models (LLMs) underpinning generative AI are often bamboozled by PDFs, reading a page set in several columns from left to right rather than top to bottom, say, or getting confused by headers and footers. Trouble parsing PDFs is one of the reasons AI chatbots occasionally “hallucinate” nonsense.
Enter the disrupters. Startups such as Factify are on a mission to build a new file type that is better suited to the technology. Matan Gavish, its boss, talks of his “megalomaniac” vision of displacing the PDF.
Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, points out Leonard Rosenthol, the software-maker’s PDF guru. Google, a leader in AI, has also rolled out a tool for developers who use its Gemini models that makes it easier to ingest PDFs. The format’s reign may yet continue.