• 6 Posts
  • 6.1K Comments
Joined 2 years ago
Cake day: July 9, 2023




  • We had a situation where the banks forced us to split into two companies. Our part would become a newly formed company under a former department manager. Our choice was to join the new company or be let go. So I told the CEO that I would like to work in the new company, but if he gave us that guy as our boss, we’d be bankrupt in two or three months. Most of my coworkers told him much the same.

    The brakes were put on the process. The company owner was called in; he talked to us, too, and called a general staff meeting to tell us that the original candidate would not be available for the role of CEO for “health reasons”. We were split off under a different CEO, and since then we have been prospering like never before.

  • My wife and I had booked a hotel with “shower en suite”. We assumed this meant that the hotel room’s bathroom had a shower instead of, e.g., a tub.

    Nope. In the middle of the room, there was a plastic booth not unlike those portable toilets you can find at festivals. This was the shower. You had to drop in coins to get warm water in the shower.

    There was no bathroom as such. A shared toilet was half a flight of stairs down from the room, and it ran out of toilet paper over the weekend.

    The breakfast was rather spartan, with a lot of “either this or that, but not both” choices.

    This was very embarrassing for our friends who had recommended the place (and who helped us out with a roll of toilet paper). They had stayed in that hotel some 30 or 40 years before, when it still had style…

  • The problem lies in the PDFs themselves. Inside them are objects that represent lines of glyphs, if you are lucky. A conversion tool can guess which of those lines belong together and produce the text.

    It cannot know the intentions behind them, though. Take a numbered list. Its first line consists of two line objects: the number plus the . or the ), and the first line of text. The conversion tool now has to guess. Since the blocks with the numbers all sit to the left of the blocks with the text, this could be a numbered list. Or it could be a two-column table. Nothing in the PDF gives any hint.
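    To make the ambiguity concrete, here is a toy sketch in Python. It assumes a hypothetical, simplified model in which each positioned text run is just an x/y position plus its text (`Fragment`, `group_rows`, `as_list` and `as_table` are made-up names, not any real library's API). Grouping runs by baseline is the kind of guess a converter has to make; after that, the list reading and the table reading fit the same geometry equally well.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical model of a positioned text run: where it sits and what it says."""
    x: float
    y: float  # baseline; in PDF coordinates y grows upward
    text: str


def group_rows(fragments, tolerance=2.0):
    """Guess which runs share a line: runs whose baselines differ by at most
    `tolerance` points end up in the same row. Rows go top to bottom,
    runs within a row left to right."""
    rows = []
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if rows and abs(rows[-1][0].y - frag.y) <= tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [sorted(row, key=lambda f: f.x) for row in rows]


def as_list(rows):
    """Reading 1: a numbered list, each row joined into one line of text."""
    return "\n".join(" ".join(f.text for f in row) for row in rows)


def as_table(rows):
    """Reading 2: a table, each run in a row becomes its own cell."""
    return [[f.text for f in row] for row in rows]


# Two rows, each a number run left of a text run. Nothing here says which reading is meant.
fragments = [
    Fragment(72, 700, "1."), Fragment(90, 700, "Preheat the oven."),
    Fragment(72, 686, "2."), Fragment(90, 686, "Mix the dough."),
]
rows = group_rows(fragments)
print(as_list(rows))
print(as_table(rows))
```

    Both outputs are "correct" for the same input; the choice between them is exactly the guess the converter cannot avoid.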

    And that is the easy part. It assumes that the document either uses standard fonts or keeps its embedded fonts untouched. If it uses embedded fonts and a PDF optimizer that embeds only the characters actually used and renumbers them, without keeping a map back to the original characters, any copy or conversion tool is bound to fail.
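    A toy model of what such an optimizer does, again in Python and with hypothetical function names: each character used is replaced by a small sequential code, and only a separate mapping turns the codes back into text (in real PDFs, a ToUnicode CMap can play this role). This is a simplification; real fonts work with glyph IDs and encodings, but the effect on extraction is the same once the mapping is gone.

```python
def subset_and_renumber(text):
    """Simplified optimizer: keep only the characters actually used and
    renumber them 1, 2, 3, ... in order of first appearance.
    Returns the codes written to the page and the mapping that made them."""
    mapping = {}
    codes = []
    for ch in text:
        if ch not in mapping:
            mapping[ch] = len(mapping) + 1
        codes.append(mapping[ch])
    return codes, mapping


def naive_extract(codes):
    """Without the mapping, a tool can only take the codes at face value,
    which yields control characters instead of the original text."""
    return "".join(chr(c) for c in codes)


def extract_with_map(codes, mapping):
    """With the mapping kept, the codes can be turned back into text."""
    reverse = {code: ch for ch, code in mapping.items()}
    return "".join(reverse[c] for c in codes)


text = "hello world"
codes, mapping = subset_and_renumber(text)
print(codes)                              # [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]
print(repr(naive_extract(codes)))         # unreadable control characters
print(extract_with_map(codes, mapping))   # hello world
```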

    The same goes for protected PDFs, where you cannot copy the text in the first place.

    And then there are PDFs that consist only of scanned pages. Here you need OCR software to get anything readable out of them.

    PDF is an archival output format, the end of a process, not something to work from.

    Always preserve the original file. Keep it safe. If you change tools, make sure you have a conversion path into something editable. The PDF is for giving away, nothing else.