Scribble, Watermarks, and the Vault Leaks

In the news today: the Vault series of leaks includes information about “Scribble”, a watermark beacon inserted into CIA docs to track leakers.

Well, not to rain on that parade, but frankly, anyone who gets some leaked docs from ANY agency (and most companies) ought to be bright enough to know they are likely to be watermarked (have a hidden unique pattern of bits in the binary part marking each copy and who checked it out) and take steps to block that.

This isn’t all that hard.

On a sterile machine (new install, no network connection) you ‘display’ each document and copy it. Note this is a copy of the TEXT not the BINARY. You can even go so far as to print it out and OCR (Optical Character Recognition) it back to a pristine binary. When done, scrub the machine. Personally, I’d go for print to paper and OCR into a separate sterile machine, but I’m like that ;-)

Now there are issues even with that. For example, you can put specific changes of text and / or spelling into a doc to mark it. Countermeasures to that are a bit more complicated, but a good spell check is a start. Similarly, ensure that some things like font and margins are changed so the text reflows and repaginates. If lives depend on it, run it through a translator to another language and back, then proofread for translation errors.
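The "reflow" step is easy to sketch for plain text. This is only an illustration, assuming the document has already been reduced to a text string with blank lines between paragraphs; the function name `reflow` and the default width are my own choices, not anything from a particular tool. It throws away the original line breaks and spacing and re-wraps at a new width:

```python
import textwrap

def reflow(text: str, width: int = 60) -> str:
    """Re-wrap each paragraph to a new width, discarding original line breaks."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    wrapped = []
    for paragraph in paragraphs:
        # split() with no argument collapses every run of spaces, tabs
        # and newlines, so odd spacing does not survive the copy.
        words = paragraph.split()
        wrapped.append(textwrap.fill(" ".join(words), width=width))
    return "\n\n".join(wrapped)

if __name__ == "__main__":
    original = "First  line of a\nparagraph.\n\nSecond   paragraph here."
    print(reflow(original, width=20))
```

It does nothing about wording or spelling changes; those still need the spell check and the translate-and-back pass.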

I know, lots of folks still get caught just on Microsoft metadata in docs and images. But if you are in the game of taking on TLAs (Three Letter Agencies), you really ought to be expecting watermarks and the need to remove them.


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Political Current Events, Tech Bits.

20 Responses to Scribble, Watermarks, and the Vault Leaks

  1. Zeke says:

    Assange trains his eye on Pompeo’s self-contradictory speech:

    “Words matter, and I assume that Pompeo meant his when he said, “Julian Assange has no First Amendment freedoms. He’s sitting in an embassy in London. He’s not a U.S. citizen.” As a legal matter, this statement is simply false. It underscores just how dangerous it is for an unelected official whose agency’s work is rooted in lying and misdirection to be the sole arbiter of the truth and the interpreter of the Constitution.

    Pompeo demonstrated a remarkable lack of irony when he suggested that WikiLeaks “focus instead on the autocratic regimes in this world that actually suppress free speech and dissent” — even as he called for a crackdown of such speech. In fact, Pompeo finds himself in the unsavory company of Recep Tayyip Erdogan of Turkey (257,934 documents published by WikiLeaks); Bashar al-Assad of Syria (2.3 million documents); and the dictatorship in Saudi Arabia (122,609 documents), to name just a few who have tried and failed to censor WikiLeaks.”

  2. Zeke says:

    Assange appeals to the Presidency on behalf of free speech, and I second this:

    “President Theodore Roosevelt understood the danger of giving in to those “foolish or traitorous persons who endeavor to make it a crime to tell the truth about the Administration when the Administration is guilty of incompetence or other shortcomings.” Such “endeavor is itself a crime against the nation,” Roosevelt wrote. President Trump and his officials should heed that advice.”

    Assange, though no lover of the USA, “gets it”:

    “All democratic governments are managed by imperfect human beings. And autocracies are much worse — the “benign dictator” is a myth. These human beings, democratic and autocratic alike, make mistakes and commit crimes, and often serve themselves rather than their countries. They are the focus of WikiLeaks’ publications.”

    If you hate that intelligence agency, from its infamous Nazi beginnings to its worldwide crimes, it does not mean you hate the US. The Administration should understand that the American people do not feel that that intelligence agency is keeping them safe, and some feel that it has been nothing but the world’s largest criminal organization from its inception. The President may have to keep up appearances of being pro-law enforcement and therefore pro-CIA, but he must not allow anything to happen to Julian Assange.

  3. tom0mason says:

    I have done similar tricks years ago while working for a company that was less than honest with its employees.
    I would hex-edit my PDF files so they said they were generated by ‘TMason Distiller’ or some such, and assign them weird version numbers. PDF readers never check that these values are valid, but will reveal this information if capable.
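To see what a PDF says about its generator, you can look for the info-dictionary keys in the raw bytes. This is a rough sketch, not a PDF parser: it assumes the `/Producer`, `/Creator` and `/Author` entries are stored as simple, uncompressed literal strings with no escaped parentheses, which is not true of every file (they can also be hex strings, live in compressed object streams, or be duplicated in XMP metadata). The function name `pdf_info_fields` is made up for the example.

```python
import re

# Matches e.g.  /Producer (Some Distiller 1.0)  in uncompressed PDF bytes.
INFO_RE = re.compile(rb"/(Producer|Creator|Author)\s*\(([^)]*)\)")

def pdf_info_fields(data: bytes) -> dict:
    """Return the simple literal-string info fields found in raw PDF bytes."""
    fields = {}
    for key, value in INFO_RE.findall(data):
        fields[key.decode("ascii")] = value.decode("latin-1")
    return fields

if __name__ == "__main__":
    sample = b"1 0 obj\n<< /Producer (TMason Distiller 99.7) /Creator (Word) >>\nendobj"
    print(pdf_info_fields(sample))
```

An empty result only means nothing was found in that simple form, not that the file carries no metadata.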

    In early versions of MS Word there was plenty of room in the metadata area to hide information. Note that Word documents can now carry Information Rights Management (IRM) protection; such protected documents are encrypted in the same way that Office XP documents are encrypted using a password.

    Simpler ‘tricks’ are to hide information in apparently unused Headers, footnotes, bookmarks, etc. all in a very small white font, thus hidden in plain sight. These days MS Word has embedded XML as well as OLE to play with.

  4. E.M.Smith says:


    Why I like the “print and scan” as it covers anything that can’t be seen… even new stuff…

  5. Larry Ledwick says:

    Notepad is also handy for stripping out everything but normal ASCII text.
    Copy and paste to Notepad, then copy and paste that text back into whatever output document software you want to use. Once formatted, do a print screen to capture the text as a plain bitmap, etc. Lots of ways to strip out the metadata. The trick is most folks don’t even know all those layers of hidden data exist.
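The same "nothing but normal ASCII" idea can be sketched in a few lines. This is an assumption-laden illustration, not what Notepad actually does internally: it keeps newline, tab and printable ASCII, and drops everything else, including zero-width characters that would otherwise ride along invisibly in a text copy. The name `to_plain_ascii` is made up for the example.

```python
import unicodedata

def to_plain_ascii(text: str) -> str:
    """Keep newline, tab and printable ASCII (codes 32..126); drop the rest."""
    # NFKD splits accented letters into a base letter plus a combining mark
    # and turns a no-break space into an ordinary space.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        if ch in "\n\t" or 32 <= ord(ch) < 127:
            kept.append(ch)
    return "".join(kept)

if __name__ == "__main__":
    marked = "se\u200bcret caf\u00e9\u00a0ok"   # zero-width space, e-acute, no-break space
    print(to_plain_ascii(marked))               # secret cafe ok
```

Note that this also silently deletes typographic quotes and dashes, so a real pass would want to map those to plain equivalents first.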

    One gotcha most people don’t realize exists is that modern color printers print a “fingerprint” of fine yellow dots on the page that identifies the printer the document was printed on. It can be seen if the document is viewed in blue light; otherwise it is not visible to the naked eye.

    So if you print out such a stripped image on your home printer, you leave a fingerprint showing which printer actually printed the document, even though you carefully stripped out the embedded metadata.

  6. E.M.Smith says:

    That’s why you OCR the print…

    Big margins, OCR the non-margin area, inspect for artifacts and remove any (though what isn’t visible to the eye usually isn’t seen by the OCR process either… though that is tunable to some extent):

    Once, when handling a 4000+ layoff, HR would not give us a machine readable copy of the personnel being let go. Only 8.5 x 11 inch oxblood paper, as that was “uncopyable”. Well, since doing 4000 account closures by hand typing of black on nearly black would screw up a lot and take forever (thus be a big security exposure), we scanned it with extreme contrast settings, then OCRed the scan, then made the machine scripts to drive with that file. Only a couple of names needed fixing in the scan…

    At the next layoff managers prep meeting I had my copy, on white paper, just to make a point… got dirty looks from HR but the VP Engineering smiled ;-)

    HR never refused my request for a machine readable copy again…

    Knowing your printer, scanner and good OCR software are core skills…

  7. Ralph B says:

    Take a picture of the screen and print that…no worries about font changes, no worries about something nefarious getting put on your printer. PC stays completely quarantined.

  8. Zeke says:

    beththeserf says, “Who guards the guardians?”


  9. jim2 says:

    I only recently discovered that SD cards are very difficult to erase. They have many more memory cells than they need and “wear-level” them, alternating use to slow the degradation of the cells, which have a limited number of write-erase cycles.

    And you can’t count on the manufacturer to supply a “full wipe” capability, either. There is a standard for it, but it isn’t always implemented. Looks like if you have something sensitive on it, like your personally identifiable information, the best thing to do is nuke it in the microwave then bust it up with a hammer for good measure :)

  10. jim2 says:

    “Only 8.5 x 11 inch oxblood paper”

    Was the print black on red? Or dark red on red?

    In the first case, you can set the scanner for red drop-out. That is done a lot on documents made for scanners. That way the dropped-out lines don’t interfere with OCR.

    If it was red on red, I can see why the high contrast might have been necessary. However, in RGB, dark red is a different color than light red, so it should be easy to make the background disappear.
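A toy version of red drop-out, just to show the idea: look only at the red channel, where black ink is low and red paper is higher, and threshold on that. Everything here is an assumption for illustration (the pixel values, the threshold, the function name `red_dropout`); a real scanner or image library does this in hardware or with tuned settings.

```python
def red_dropout(pixels, threshold=128):
    """Return rows of 0 (ink) / 1 (background) using only the red channel."""
    return [
        [0 if r < threshold else 1 for (r, g, b) in row]
        for row in pixels
    ]

if __name__ == "__main__":
    OXBLOOD = (90, 10, 10)   # assumed value for dark red paper
    TONER = (15, 15, 15)     # assumed value for black toner
    scan = [[OXBLOOD, TONER, OXBLOOD]]
    # The threshold has to sit between the ink's red value and the paper's.
    print(red_dropout(scan, threshold=50))   # [[1, 0, 1]]
```

With dark paper the gap between ink and background is narrow, which is why the threshold (or contrast setting) matters so much.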

  11. E.M.Smith says:


    This was in the early 1980s, so black toner on oxblood paper (that I think was a mix of red and black dyes). Our scanner was only B&W so contrast was the available control… a color scanner does make things easier…

    Per SD cards:

    That kind of thing is why after an erase, I fill a device with crap (preferably copies of Microsoft software :-) and then delete that. For sensitive things, I do it a few times… For things involving TLAs, law enforcement risks, death, or blackmail risk, well, a blowtorch is your friend :-)
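A narrower, related sketch: overwriting one file in place with random bytes a few times. This is not the "fill the whole device" approach described above, and the function name `overwrite_file` and its parameters are made up for the example. On flash media with wear-levelling, the new writes may land on different cells than the old data, so treat this as a gesture rather than a guarantee.

```python
import os

def overwrite_file(path: str, passes: int = 3, chunk_size: int = 1 << 20) -> None:
    """Overwrite an existing file's bytes with random data, several times."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            # Push the data out of Python's and the OS's buffers each pass.
            f.flush()
            os.fsync(f.fileno())
```

Copy-on-write and journaling filesystems can also keep old blocks around, which is one more argument for the blowtorch.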

    Slag doesn’t say much…MAPP gas is enough… but it looks like the current stuff is not the same as real MAPP, so make sure you get slag with it…

  12. philjourdan says:

    The Beauty of Notepad. Copy to notepad, then copy back to Word. All metadata is gone (not if you use the same computer!).

    Standard practice (for other reasons) for me.

  13. jim2 says:

    Yep. I like MAPP gas for plumbing. The faster the joint gets hot enough to solder, the less the rest of the pipe heats.

  14. p.g.sharrow says:

    jim2 says “I like MAPP gas for plumbing”
    As do I! Especially the self commencing torch. Pull the trigger, instant fire! Release and dead out. Very handy while crawling around under a building to do work, a lot safer as well. Battery power tools with built in LED lights are also great. Not at all like the “good old days” of my youth.
    Advancements in technology available to us can make things a lot easier as well as more complex. I have been spending days getting comfortable with Acad (again) so my grandson can “print” up some parts for a small fume scrubber we are working on to clean the fumes generated by the action of the 3D printer. 8-( something about an Old Dog learning new tricks…pg

  15. jim2 says:

    PS Pad will show formatting characters. One could, I suppose, encode something in those white space characters. /r/r /r/r/r/ /r
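Encoding something in white space really is that easy. Here is a toy sketch, with made-up function names and a made-up scheme (one bit per line: trailing space for 0, trailing tab for 1), plus the obvious countermeasure of stripping trailing white space:

```python
def hide_bits(lines, bits):
    """Append a trailing space (0) or tab (1) to each line, one bit per line."""
    marked = []
    for i, line in enumerate(lines):
        if i < len(bits):
            line = line + ("\t" if bits[i] else " ")
        marked.append(line)
    return marked

def read_bits(lines):
    """Recover the bits from lines that end in a space or a tab."""
    return [1 if line.endswith("\t") else 0
            for line in lines if line.endswith((" ", "\t"))]

def strip_trailing(lines):
    """Countermeasure: remove all trailing white space from every line."""
    return [line.rstrip() for line in lines]

if __name__ == "__main__":
    text = ["alpha", "beta", "gamma"]
    marked = hide_bits(text, [1, 0, 1])
    print(read_bits(marked))                   # [1, 0, 1]
    print(read_bits(strip_trailing(marked)))   # []
```

An editor that shows formatting characters makes the marked lines visible; a strip pass makes them go away.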

  16. pouncer says:

    One of the Tom Clancy / Jack Ryan spy thrillers discussed the “canary trap” concept. Each document had chapter summaries with very quotable headers. Each COPY of the document had slightly DIFFERENT quotes.

    So the copy provided the Senator discussed how the NorK missiles were “protected in narrow, twisty caverns ” while

    the copy provided the Congressman discussed how missiles were “within protective twisty narrow caverns” and

    the copy provided the Secretary of State discussed missiles in “secure caverns, twisty and narrow”

    So when a quote leaked, the exact wording would specify which office was the singing canary.

    I believe such a method would defeat the notion of printing, scanning, and OCR reproduction.
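The matching side of a canary trap is simple to sketch: compare the leaked wording against each copy's wording and pick the closest. The phrases are the ones quoted above; the function name `likely_source` and the use of `difflib` similarity are my own assumptions for illustration.

```python
import difflib

COPIES = {
    "Senator": "protected in narrow, twisty caverns",
    "Congressman": "within protective twisty narrow caverns",
    "Secretary of State": "secure caverns, twisty and narrow",
}

def likely_source(leaked: str, copies: dict) -> str:
    """Return the recipient whose copy is most similar to the leaked wording."""
    def score(name):
        matcher = difflib.SequenceMatcher(None, leaked.lower(), copies[name].lower())
        return matcher.ratio()
    return max(copies, key=score)

if __name__ == "__main__":
    print(likely_source("secure caverns, twisty and narrow", COPIES))
    # Secretary of State
```

It only works when the leak keeps the wording close to verbatim, which is exactly what printing, scanning and OCR preserve.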

  17. Oliver K. Manuel says:

    JoNova has an encouraging report on the impact of Bret Stephens’s first report in the NY Times:

  18. E.M.Smith says:


    Um, I think I referenced that idea in the article, though more indirectly:

    “Now there are issues even with that. For example, you can put specific changes of text and / or spelling into a doc to mark it. ”

    Then suggested running it through a ‘translate and back’ to change all the exact phrasing just a little….

  19. Larry Ledwick says:

    Or instead of quoting the document you can summarize it using different phrasing. This defeats all the subtle tags like simple typos, odd punctuation and phrase construction, but opens you up to your personal communication style and traffic analysis on how you tend to say things. Anonymizing it by saying “a source (or sources) tells me xyz” creates a plausible deniability and makes it much more difficult to tie to a specific source as long as all or most of the original documents contain the same essential elements as your summary and none of them exclusively contains a point you summarize. Pretty soon you get into a labyrinth of guessing which items are common to all sources and which are unique.

    Depending on how skillful the originator is, you can get burned no matter how you try to disguise the information content.

    Mailing lists flag their source by including pseudo addresses. If you send your content to that fake address, you had to get it from their spiked list. The same happens with maps. Ever see a small one horse town on a map, and when you got to it there was nothing there but hay fields? Likely that small fictional town was a telltale for copyright infringement on the map. One map lists the distance between two towns as 59 miles, another lists it as 62 miles. The error is small enough that people will not pay much attention to it, but if all such faulty distances are repeated in another map you have strong evidence that the source map was just copied.
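The spiked-list check boils down to a set intersection. A minimal sketch, with hypothetical list names and example.com addresses invented for illustration:

```python
def spiked_lists_hit(recipients, seeds_by_list):
    """Report which lists' planted addresses appear among the recipients."""
    hits = {}
    for list_name, seeds in seeds_by_list.items():
        found = sorted(set(recipients) & set(seeds))
        if found:
            hits[list_name] = found
    return hits

if __name__ == "__main__":
    seeds = {
        "list_a": ["canary1@example.com"],
        "list_b": ["canary2@example.com"],
    }
    mailing = ["real.person@example.com", "canary2@example.com"]
    print(spiked_lists_hit(mailing, seeds))   # {'list_b': ['canary2@example.com']}
```

The fake map town and the off-by-a-few-miles distances work the same way: planted values that should only ever show up in a copy.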
