• 33 Posts
  • 52 Comments
Joined 5 months ago
Cake day: July 1st, 2024




  • wget has a --load-cookies file option. It wants the original Netscape cookie file format. Depending on your GUI browser you may have to convert it. I recall in one case I had to parse the session ID out of a cookie file then build the expected format around it. I don’t recall the circumstances.
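
A rough sketch of building that format around a session ID, in Python. The function names, the cookie name `sessionid` and the domain are made up for illustration; the field order (domain, subdomain flag, path, secure flag, expiry, name, value, tab-separated) is the Netscape layout as I understand it.

```python
def netscape_cookie_line(domain, name, value, expires, path="/", secure=True):
    """Return one tab-separated record in the Netscape cookie file format."""
    # Field order: domain, include-subdomains flag, path, secure flag,
    # expiry (Unix epoch seconds), cookie name, cookie value.
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    fields = [
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        str(int(expires)),
        name,
        value,
    ]
    return "\t".join(fields)


def write_cookie_file(filename, lines):
    """Write the records under the header line that cookie readers look for."""
    with open(filename, "w", encoding="utf-8") as fh:
        fh.write("# Netscape HTTP Cookie File\n")
        for line in lines:
            fh.write(line + "\n")
```

Something like `write_cookie_file("cookies.txt", [netscape_cookie_line(".example.com", "sessionid", "abc123", 2000000000)])` followed by `wget --load-cookies cookies.txt URL` would be the idea. I would pick an expiry in the future, since a stale timestamp may get the cookie ignored.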

    Another problem: some anti-bot mechanisms crudely look at user-agent headers and block curl attempts on that basis alone.

    (edit) when cookies are not an issue, wkhtmltopdf is a good way to get a PDF of a webpage. So you could have a script do a wget to get the HTML faithfully, and wkhtmltopdf to get a PDF, then pdfattach to put the HTML inside the PDF.
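
A minimal sketch of such a script, in Python, assuming `wget`, `wkhtmltopdf` and poppler's `pdfattach` are all installed and on the PATH. The function names and output file names are hypothetical, and I have not tuned any options.

```python
import subprocess


def build_archive_commands(url, stem):
    """Return the three commands: fetch HTML, render PDF, attach HTML to PDF."""
    html = f"{stem}.html"
    pdf = f"{stem}.pdf"
    bundle = f"{stem}-with-html.pdf"
    return [
        ["wget", "-O", html, url],         # save the HTML as served
        ["wkhtmltopdf", url, pdf],         # render the page to a PDF
        ["pdfattach", pdf, html, bundle],  # embed the HTML inside a new PDF
    ]


def archive(url, stem):
    """Run the commands in order, stopping at the first failure."""
    for argv in build_archive_commands(url, stem):
        subprocess.run(argv, check=True)
```

Calling `archive("https://example.com/", "page")` would leave `page.html`, `page.pdf` and `page-with-html.pdf` in the current directory. Note that wkhtmltopdf fetches the URL a second time here, so the PDF and the saved HTML could differ on a page that changes between requests.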

    (edit2) It’s worth noting there is a project called curl-impersonate which makes curl look more like a GUI browser to get more equal treatment. I think they go as far as adding a javascript engine or something.


  • It’s perhaps the best way for someone who has a good handle on it. Docs say it “sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.” So you would need to tune it so that it’s not grabbing objects that are irrelevant to the view, and probably exclude some file types like videos and audio. If you get a well-tuned command worked out, that would be quite useful. But I do see a couple shortcomings nonetheless:

    • If you’re on a page that required you to log in and do some interactive things to get there, then I think passing the cookie from the GUI browser to wget would be non-trivial.
    • If you’re on a capped internet connection, you might want to save from the browser’s cache rather than refetch everything.

    But those issues aside I like the fact that wget does not rely on a plugin.
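
As a starting point for that tuning, here is a sketch in Python that assembles a wget command from the documented expansion of the mirror option, but with a bounded depth instead of `-l inf`, page requisites turned on, and some media types rejected. The function name, the default depth and the reject list are my own guesses and untested against real sites.

```python
def build_wget_page_command(url, depth=1,
                            reject=("mp4", "webm", "mp3", "ogg", "wav")):
    """Return a wget argv that saves a page and what it needs to render."""
    return [
        "wget",
        "--recursive",          # -r
        "--timestamping",       # -N
        f"--level={depth}",     # bounded depth instead of "-l inf"
        "--page-requisites",    # images, CSS and similar objects for the view
        "--convert-links",      # rewrite links so the copy works offline
        "--adjust-extension",   # add .html and similar where missing
        "--no-parent",          # do not climb above the starting directory
        "--reject=" + ",".join(reject),  # skip video and audio files
        url,
    ]
```

The resulting list can be handed to `subprocess.run(..., check=True)`. With `depth=1` it still follows links one hop from the starting page, so a link-heavy page may want a stricter setup.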



  • IIUC you are referring to this extension, which is Firefox-only (unlike the save page WE, which has a Chromium version).

    Indeed the beauty of ZIP is stability. But the contents are not. HTML changes so rapidly, I bet if I unzip an old MAFF file it would not have stood the test of time well. That’s why I like the PDF wrapper. Nonetheless, this WebScrapBook could stand in place of the MHTML from the save page WE extension. In fact, save page WE usually fails to save all objects for some reason. So WebScrapBook is probably more complete.

    (edit) Apparently webscrapbook gives a choice between htz and maff. I like that it timestamps the content, which is a good idea for archived docs.

    (edit2) Do you know what happens with JavaScript? I think JS can be quite disruptive to archival. If webscrapbook saves the JS, it’s saving an app, in effect, and that language changes. The JS also may depend on being able to access the web, which makes a shitshow of archival because obviously you must be online and all the same external URLs must still be reachable. OTOH, saving the JS is probably desirable if doing the hybrid PDF save because the PDF version would always contain the static result, not the JS. Yet the JS could still be useful to have a copy of.

    (edit3) I installed webscrapbook but it had no effect. Right-clicking does not give any new functions.








  • Your assertion that the document is malicious without any evidence is what I’m concerned about.

    I did not assert malice. I asked questions. I’m open to evidence proving or disproving malice.

    At some point you have to decide to trust someone. The comment above gave you reason to trust that the document was in a standard, non-malicious format. But you outright rejected their advice in a hostile tone. You base your hostility on a youtube video.

    There was too much uncertainty there to inspire trust. Getoffmylan had no idea why the data was organised as serialised java.

    You should read the essay “on trusting trust” and then make a decision on whether you are going to participate in digital society or live under a bridge with a tinfoil hat.

    I’ll need a more direct reference because that phrase gives copious references. Do you mean this study? Judging from the abstract:

    To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

    I seem to have received software pretending to be a document. Trust would naturally not be a sensible reaction to that. In the infosec discipline we would be incompetent fools to loosely trust whatever comes at us. We make it a point to avoid trust and when trust cannot be avoided we seek justification for trust. We have a zero-trust principle. We also have the rule of least privilege, which means not extending trust/permissions where it’s not necessary for the mission. Why would I trust a PDF when I can take steps to access the PDF in a way that does not need excessive trust?

    The masses (security naive folks) operate in the reverse-- they trust by default and look for reasons to distrust. That’s not wise.

    In Canada, and elsewhere, insurance companies know everything about you before you even apply, and it’s likely true elsewhere too.

    When you move, how do they find out if you don’t tell them? Tracking would be one way.

    Privacy is about control. When you call it paranoia, the concept of agency has escaped you. If you have privacy, you can choose what you disclose. What would be good rationale for giving up control?

    Even if they don’t have personally identifiable information, you’ll be in a data bucket with your neighbours, with risk profiles based on neighbourhood, items being insured, claim rates for people with similar profiles, etc. Very likely every interaction you have with them has been going into an LLM even prior to the advent of ChatGPT, and they will have scored those interactions against a model.

    If we assume that’s true, what do you gain by giving them more solid data to reinforce surreptitious snooping? You can’t control everything but it’s not in your interest to sacrifice control for nothing.

    But what you will end up doing instead is triggering fraudulent behaviour flags. There’s something called “address fraud”, where people go out of their way to disguise their location, because some lower risk address has better rates or whatever.

    Indeed for some types of insurance policies the insurer has a legitimate need to know where you reside. But that’s the insurer’s problem. This does not rationalize a consumer who recklessly feeds surreptitious surveillance. Street wise consumers protect themselves from surveillance. Of course they can (and should) disclose their new address via proper channels if they move.

    Why? Because someone might take a vacation somewhere and interact from another state. How long is a vacation? It’s for the consumer to declare where they intend to live, e.g. via “declaration of domicile”. Insurance companies will harass people if their intel has an inconsistency. Where is that trust you were talking about? There is no reciprocity here.

    When you do everything you can to scrub your location, this itself is a signal that you are operating as a highly paranoid individual and that might put you in a bucket.

    Sure, you could end up in that bucket if you are in a strong minority of street wise consumers. If the insurer wants to waste their time chasing false positives, the time waste is on them. I would rather laugh at that than join the street unwise club that makes the street wise consumers stand out more.











  • I should also add that some people come for asylum but they do not follow the legal process because they are reasonably concerned that the process will fail to protect them (especially if they entered under the Trump regime). If someone enters without filing, then gets targeted (e.g. a hospital rats them out), and only then claims asylum, I don’t know what happens, but obviously we need the process to be competent at separating the genuine cases from the rest. I suppose that’s the scenario you are referring to.



  • In fact, borderline human rights compromise is actually a good incentive for people to leave. Would perhaps be good for the country if those in Texas who respect human rights would move from Texas to Pennsylvania for a human rights upgrade (where also the death penalty was repealed).

    But I doubt your statement is accurate considering inbound refugees are fleeing from even worse conditions w.r.t. human rights. Refugees still technically have their human right to access emergency medical treatment, they just risk getting harassed and tagged for deportation.