I just wanted to make sure you were aware: there seem to be long load times, especially when loading the community pages (posts load fairly quickly).

  • plantteacher@mander.xyz
    5 days ago

    I’ve not tried the onion instance since reporting the data loss issue, but in principle the onion host could be a good candidate for read-only access (scraping).

    Would it perhaps make sense to redirect the greedy subnet to the onion instance? I wonder if it’s even possible. The privacyinternational website used to auto-detect requests from Tor exit nodes and automatically redirect them to its onion site. In the case of mander, it would do that for the problematic subnet. They are obviously not using Tor to visit your site, but they could have Tor installed. You would effectively be sending the msg “hey, plz do your scraping on the onion node,” which is gentler than blocking in case there is more legit traffic from the same subnet (rough sketch of the idea below). That is assuming your problem is not scraping in general, but just that they are hogging bandwidth that competes with other users. The Tor network has some built-in anti-DDoS logic now, supposedly, so they would naturally get bottlenecked IIUC.

    I guess the next question is whether the onion site has a separate allocation of bandwidth. But even if it doesn’t, Tor has a natural bottleneck b/c traffic can only move as fast as the slowest of the 3 hops the circuit goes through.
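
    To make that concrete, here is a minimal sketch of that kind of subnet-based redirect as a WSGI middleware. It is only an illustration of the idea, not how mander is set up: the subnet, the onion address, and the application-level approach are all assumptions, and in practice the reverse proxy would be a more natural place for it.

    ```python
    # Minimal sketch (assumptions: placeholder onion address, example subnet,
    # and that doing this in application code rather than the reverse proxy
    # is acceptable). Redirects requests from one subnet to the onion host.
    import ipaddress

    GREEDY_SUBNET = ipaddress.ip_network("43.173.0.0/16")  # example subnet from the thread
    ONION_BASE = "http://exampleexampleexample.onion"      # placeholder onion address

    class RedirectGreedySubnet:
        """WSGI middleware that sends one subnet to the onion front-end."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            client = environ.get("REMOTE_ADDR", "")
            try:
                in_subnet = ipaddress.ip_address(client) in GREEDY_SUBNET
            except ValueError:
                in_subnet = False
            if in_subnet:
                # Temporary redirect pointing the scraper at the onion node.
                location = ONION_BASE + environ.get("PATH_INFO", "/")
                start_response("307 Temporary Redirect", [("Location", location)])
                return [b""]
            return self.app(environ, start_response)
    ```

    Of course this only helps if whoever is scraping actually follows the redirect and has Tor available, which is the big “if” above.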

    • Salamander@mander.xyzM
      2 days ago

      I have experienced issues both over Tor and over clearnet. The Tor front-end runs on its own server, but it connects to the mander server. So, the server that hosts the front-end via Tor will see the exit node connecting to it, and then the mander server gets the requests via that Tor server. Ultimately some bandwidth is used on both servers, because the data travels from mander to the Tor front-end and then to the exit node. There is also another server that hosts and serves the images.

      What I see is not a bandwidth problem, though. It seems like the database queries are the bottleneck. There is a limited number of connections to the database, and some of the queries are complex and use a lot of CPU. It is the intense searching through the database that appears to throttle the website.
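
      For what it’s worth, PostgreSQL’s pg_stat_statements view is a quick way to see which queries dominate. The sketch below is generic and rests on assumptions (a Lemmy-style PostgreSQL backend, the extension enabled, a placeholder connection string); it is not a description of how mander is actually monitored.

      ```python
      # Assumption-heavy sketch: list the queries that consume the most total
      # execution time, via pg_stat_statements (column names per PostgreSQL 13+).
      import psycopg2

      DSN = "dbname=lemmy user=lemmy host=localhost"  # placeholder connection string

      TOP_QUERIES = """
          SELECT calls,
                 round(total_exec_time::numeric, 1) AS total_ms,
                 round(mean_exec_time::numeric, 1)  AS mean_ms,
                 left(query, 120)                   AS query_start
          FROM pg_stat_statements
          ORDER BY total_exec_time DESC
          LIMIT 10;
      """

      with psycopg2.connect(DSN) as conn:
          with conn.cursor() as cur:
              cur.execute(TOP_QUERIES)
              for calls, total_ms, mean_ms, query_start in cur.fetchall():
                  print(f"{calls:>8} calls  {total_ms:>12} ms total  {mean_ms:>8} ms avg  {query_start}")
      ```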

      • plantteacher@mander.xyz
        1 day ago

        “So, the server that hosts the front-end via Tor will see the exit node connecting to it”

        The onion eliminates the use of exit nodes. But I know what you mean.

        I appreciate the explanation. It sounds like replicating the backend and DB on the Tor node would help. Not sure how complex it would be to have the DBs synchronise during idle moments.

        Perhaps a bit radical, but I wonder if it would be interesting to do a nightly DB export to JSON or CSV files that are reachable from the onion front end. Scrapers would prefer that over crawling the site, and it would be less intrusive on the website. Though I don’t know how tricky it would be to exclude non-public data from the dataset.
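
        Just to illustrate the idea, something along these lines could run nightly from cron. The table and column names are invented for the example and would have to match the real schema, with anything deleted, removed, or otherwise non-public filtered out.

        ```python
        # Rough sketch of a nightly public-data export (schema names are assumptions).
        import gzip
        import psycopg2

        DSN = "dbname=lemmy user=lemmy host=localhost"  # placeholder connection string

        # Hypothetical "public rows only" queries, one compressed CSV file each.
        EXPORTS = {
            "posts.csv.gz": "SELECT id, name, url, published FROM post WHERE NOT deleted AND NOT removed",
            "comments.csv.gz": "SELECT id, post_id, content, published FROM comment WHERE NOT deleted AND NOT removed",
        }

        with psycopg2.connect(DSN) as conn:
            with conn.cursor() as cur:
                for filename, select_sql in EXPORTS.items():
                    with gzip.open(filename, "wt", newline="") as f:
                        # COPY ... TO STDOUT streams the result straight into the file.
                        cur.copy_expert(f"COPY ({select_sql}) TO STDOUT WITH CSV HEADER", f)
        ```

        The exported files could then be published under a static path on the onion front end, so scrapers get the bulk data without touching the database at all.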

  • Shdwdrgn@mander.xyz
    13 days ago

    Got a few minutes of 503 gateway timeout errors here, but the pages just loaded back up again.

  • Salamander@mander.xyzM
    11 days ago

    This morning I woke up to find a new IP subnet (43.173.0.0/16) excessively hitting the site from multiple IPs, probably scraping, which made the site unresponsive. I blocked that subnet and the site is responsive again.
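
    For reference, a greedy subnet like that can be surfaced by counting requests per /16 in the access log. The sketch below is generic rather than a record of how it was actually spotted; the log path and the “client IP is the first field” layout are assumptions matching nginx’s default combined format.

    ```python
    # Count requests per IPv4 /16 in a web server access log and print the
    # busiest subnets. Log path and format are assumptions (nginx combined log).
    import ipaddress
    from collections import Counter

    LOG_PATH = "/var/log/nginx/access.log"  # placeholder path

    hits_per_subnet = Counter()
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            ip_field = line.split(" ", 1)[0]
            try:
                ip = ipaddress.ip_address(ip_field)
            except ValueError:
                continue  # skip malformed lines
            if ip.version == 4:
                subnet = ipaddress.ip_network(f"{ip}/16", strict=False)
                hits_per_subnet[str(subnet)] += 1

    for subnet, count in hits_per_subnet.most_common(10):
        print(f"{count:>8}  {subnet}")
    ```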