P.G.: You Will Like Calamari – A Squid Testimonial

I know, I said I was going to bed… and I almost did… but it’s hard for a geek in a coding frenzy to just ‘let go’. I was only going to browse one or two other possible config issues on the tablet as the lights were out.

Then I ran into a reference to using “squid” with “tor” (and “privoxy” with “tor”… for another posting later…) as “chained proxies”. Why would you ever want to do that? I pondered… And at that moment I knew “Curiosity Compelled The Geek” was in the wind.

Well, it’s about an hour later. I’ve installed “squid” and I’m liking it a whole lot.

What’s squid? It is a “caching proxy”. You load a web page or ad or whatever once, and it puts it in a nice fat cache. Next time you access it, it comes from the cache. Seems this sucker is used all over the internet to lighten the load on web servers (the squid takes a surge of load locally instead of funneling it all back to the main server; sort of an Akamai light).

http://www.squid-cache.org/

Making the most of your Internet Connection

Squid is used by hundreds of Internet Providers world-wide to provide their users with the best possible web access. Squid optimises the data flow between client and server to improve performance and caches frequently-used content to save bandwidth. Squid can also route content requests to servers in a wide variety of ways to build cache server hierarchies which optimise network throughput.

Website Content Acceleration and Distribution

Thousands of web-sites around the Internet use Squid to drastically increase their content delivery. Squid can reduce your server load and improve delivery speeds to clients. Squid can also be used to deliver content from around the world – copying only the content being used, rather than inefficiently copying everything. Finally, Squid’s advanced content routing configuration allows you to build content clusters to route and load balance requests via a variety of web servers.

“[The Squid systems] are currently running at a hit-rate of approximately 75%, effectively quadrupling the capacity of the Apache servers behind them. This is particularly noticeable when a large surge of traffic arrives directed to a particular page via a web link from another site, as the caching efficiency for that page will be nearly 100%.” – Wikimedia Deployment Information.

75% thinks I… no way…

So I did a simple test. Shut off anything that blocks ads or does other load lightening stuff. Hit WUWT and do the 1 Mississippi 2 Mississippi… 80 Mississippi and it’s done.

Turn on the proxy connection. Hit it again, another 80 count (as the squid cache loads). Hit it again… 22 Mississippi… and it is done. Yup, about 72% less load time, which suggests roughly 3/4 less bandwidth used.

Now I’ve not tested on a lot of stuff, but this was a darned quick and easy “wow” factor.

Hitting Tallbloke’s site was 24 and 8. About a 2/3 (67%) lighter load.
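The arithmetic behind those two percentages, as a small POSIX shell sketch (percent_saved is just an illustrative name). It takes the before and after Mississippi counts and prints the percentage reduction, rounded to the nearest whole number. Keep in mind this measures load time, which is only a rough stand-in for bandwidth.

```shell
#!/bin/sh
# percent_saved BEFORE AFTER: percentage reduction from BEFORE to AFTER,
# rounded to the nearest whole number using integer shell arithmetic.
percent_saved() {
  before=$1
  after=$2
  echo $(( (100 * (before - after) + before / 2) / before ))
}

percent_saved 80 22   # WUWT: 80 count uncached, 22 cached
percent_saved 24 8    # Tallbloke's: 24 count uncached, 8 cached
```

This prints 73 and 67, so "about 75%" and "about 66%" are fair round-offs of 72.5% and 66.7%.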

Now I don’t know how much it will help in general browsing, as you don’t usually hit ‘reload’ on sites. BUT I often revisit a site or web page a couple of times, and ads and other “junk” are often very repetitious. For general site visits, I expect the speedup to rise over time as the cache builds. (Privoxy is a proxy that does a whole lot of ad removal, banner removal, dancing animated gif removal, etc. After I get it installed and tested I’ll report on it, too. I suspect chaining the two for a mix of removal and cache might be interesting. Then again, maybe it won’t do much… We’ll see.)

But the first thing I thought of was P.G. on the end of his slow link. And John on his boat… If any would benefit from a large local cache, it would be folks with a slow link.

How hard was it? Darned easy. I installed it on the CentOS box (probably the hardest one to do, as CentOS is picky about things and often not in sync on release cycle with the current trendy stuff). It was basically just install the package, change one line in the default config file, and turn on the service. Then point the browser proxy at port 3128. The site says they have versions for Windows too. I’m going to try it on the XP box “some other day” (unless folks need a ‘how to’ sooner).

There’s a long list of binary versions available here:

http://wiki.squid-cache.org/SquidFaq/BinaryPackages

As this is a proxy meant to serve many clients, it usually runs on a server, so one could also put squid on an R.Pi doing routing and then have one place all your systems can point. I’d have done that, but the router card is out of the Pi right now and this box was easiest to get up quick and ‘try it’.

So what did I do?

yum install squid

vi /etc/squid/squid.conf

# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 100 16 256

service squid start

That’s IT

The first time I tried to start the service, it would not start. Removing the # from in front of the cache_dir line let it start. (Hard to start with no place to put the cache, I guess.) The numbers are the cache size in MB, the number of first-level directories, and the number of second-level directories under each of those. So 100 MB of disk for the cache. I’ll likely make all of those larger over time. Maybe.
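To make those three numbers concrete, here is a small shell sketch (describe_cache_dir is a made-up helper, not part of squid) that splits a cache_dir line into its fields. With 16 first-level directories and 256 second-level directories under each, squid spreads its cached objects over 16 × 256 = 4096 directories. If squid still refuses to start after a cache_dir change, running "squid -z" as root creates any missing cache directories; whether the init script already does this for you likely depends on the distribution.

```shell
#!/bin/sh
# describe_cache_dir LINE: explain a squid.conf line of the form
#   cache_dir TYPE PATH SIZE_MB L1_DIRS L2_DIRS
describe_cache_dir() {
  # Word-split the line into this function's positional parameters
  set -- $1
  # $3 = cache path, $4 = size in MB, $5 = first-level dirs, $6 = second-level dirs per first-level dir
  echo "path=$3 size_mb=$4 dirs=$(( $5 * $6 ))"
}

describe_cache_dir "cache_dir ufs /var/spool/squid 100 16 256"
```

For the line used above this prints "path=/var/spool/squid size_mb=100 dirs=4096".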

Then in Firefox, under Edit : Preferences : Advanced : Network : Connection (Settings), click the ‘manual proxy’ radio button, put 127.0.0.1 in the HTTP proxy name (since we are using the local host), put 3128 in the port number, check the ‘use this proxy for all’ box, and click OK.

More trouble to set the proxy in Firefox than to get the service up, in some ways.
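To check from the command line that requests really go through squid and get served from its cache, one option is curl with its -x (proxy) option. This sketch is hedged two ways: it assumes squid adds an X-Cache response header (squid releases of this vintage do, reporting HIT or MISS), and it only helps for plain http:// pages, since https:// traffic passes through the proxy as an encrypted tunnel that squid cannot cache. The function names are made up for this example.

```shell
#!/bin/sh
# Assumes squid is listening on the local host, port 3128, as set up above.
PROXY=http://127.0.0.1:3128

# cache_status: read HTTP response headers on stdin and print the first word
# after "X-Cache:" (HIT or MISS), or NONE if there is no such header.
cache_status() {
  awk 'tolower($1) == "x-cache:" { print $2; found = 1; exit }
       END { if (!found) print "NONE" }'
}

# check_url URL: fetch URL through the proxy, throw away the body,
# and report the cache status from the response headers.
check_url() {
  curl -s -o /dev/null -D - -x "$PROXY" "$1" | cache_status
}

# Usage (needs a running squid and network access):
#   check_url http://www.squid-cache.org/   # first fetch: expect MISS
#   check_url http://www.squid-cache.org/   # repeat: HIT if squid kept a copy
```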

For non-Fedora, non-CentOS boxes (like the Pi) I’d expect the package manager line to be something like “apt-get install squid” instead of the yum command. That, too, will wait for tomorrow and the R.Pi test. As for Windows, I expect it’s the usual unzip-and-run-the-.exe process.
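A small shell sketch of that guess (the function name is hypothetical): it prints, but does not run, the install command for whichever of yum or apt-get is present. On some older Debian-based releases the package was called squid3 rather than squid, with its config under /etc/squid3, so "apt-cache search squid" is worth a look to see what your package list actually offers.

```shell
#!/bin/sh
# squid_install_cmd: print (not run) the squid install command for the
# package manager found on this system; return 1 if neither is found.
squid_install_cmd() {
  if command -v yum >/dev/null 2>&1; then
    echo "yum install squid"        # CentOS / Fedora
  elif command -v apt-get >/dev/null 2>&1; then
    echo "apt-get install squid"    # Debian / Raspbian (some older releases: squid3)
  else
    echo "no yum or apt-get found" >&2
    return 1
  fi
}

# Usage:
#   squid_install_cmd
```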

The one “negative” I’ve run into is that loading the “Tech Rebound – The Musical” page gives black squares where the YouTube images ought to be. I suspect there is a config setting to do something special with them. Just be advised that it looks like it blocks YouTube by default, and one would need to either point away from the proxy or perhaps change the config file to have them visible in pages. For P.G. that’s likely a ‘feature’ ;-0

In Conclusion

As it stands, I’m very very happy with squid. Anything that cuts repeat loads of things by 2/3 to 3/4 is a nice thing to have around. I often hit the same web sites every day, and I’m sure a lot of the ads, widgets, and ‘whatever’ like images and banners are constant. Even while doing this posting, things are “snappier”, as many of the parts of the page are cached, both on the writing/editing page and on the preview page.

I’m going to take privoxy for a test drive in the next few days too, while getting Tor to work, likely using the “other” config listings I’ve gotten. Then I’ll try chaining different combinations to see how they do. Several sites recommended different combos of squid-tor and privoxy-tor (depending on your goal), but the squid site suggested using privoxy for privacy instead of squid, so chaining them might not do much, since one removes the very stuff (ads, banners, graphics) that the other one caches.
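For the chaining idea, one common approach (an assumption here, not something tested yet) is to make privoxy a "parent" of squid. The usual reason given is that squid can't forward to Tor's SOCKS port, while privoxy can. Privoxy listens on port 8118 by default and Tor's SOCKS port defaults to 9050. The helper below (a made-up name) just writes the squid.conf lines to a file of your choosing, so they can be reviewed before pasting into /etc/squid/squid.conf.

```shell
#!/bin/sh
# write_chain_conf FILE: write squid.conf lines that send every request to a
# parent proxy on 127.0.0.1:8118 (privoxy's default port) instead of going direct.
write_chain_conf() {
  cat > "$1" <<'EOF'
# Use privoxy on the local host as the parent proxy (ICP port 0: no ICP queries)
cache_peer 127.0.0.1 parent 8118 0 no-query default
# Never fetch directly from the origin server; always go via the parent
never_direct allow all
EOF
}

# The matching line in privoxy's own config file would hand requests on to Tor:
#   forward-socks5t / 127.0.0.1:9050 .
```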

It is highly likely that I’ll leave squid installed on several of my machines, even if I have an R.Pi squid-tor server running, just because sometimes I tear down systems to play with the parts, and it would be nice to have it there just one swap of proxy settings away…

So “happy hacking”, and if you try it, please let us know about any differences in your experience (good or bad). It will be a while before I can give it a real ‘shakedown’, as I really really am going to bed “real soon now” ;-) and a broader test base would help others decide to try it, or not.

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

6 Responses to P.G.: You Will Like Calamari – A Squid Testimonial

  1. E.M.Smith says:

    Looks like YouTube actively fights caching. This link has some things that sometimes work:

    http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube

    Outline

    The default configuration of squid older than 3.1 prevents the caching of dynamic content and youtube.com specifically implement several ‘features’ that prevent their flash videos being effectively distributed by caches.

    This page details the publicly available tactics used to overcome at least some of this and allow caching of a lot of youtube.com content. Be advised this demonstrated configuration has a mixed success rate, it works for some but others have reported it strangely not working at all.

    Each configuration action is detailed with its reason and effect so if you find one that is wrong or missing please let us know.

    Contents

    Caching YouTube Content
    Outline
    Partial Solution 1: Local Web Server
    squid.conf configuration
    Partial Solution 2: Squid Storage De-duplication
    Missing Pieces
    Squid Configuration File
    Discussion
    Caching YT is impossible with Squid only now
    Knowing what to cache
    To cache that content:
    The bug
    Temporary work around
    Fixed

    Though it does say “older” so maybe it’s just another of those CentOS has “older” stuff on it issues… We’ll see when the Pi gets done…

  2. beng135 says:

    Interesting post; will take a while to digest & maybe try out. The standard Firefox cache seems to work well most of the time (refreshing a page online is almost instantaneous), but long ago w/dialup I’d load pages w/o reading, then read at leisure “offline”. Lately tho, that doesn’t work well: many pages won’t read offline anymore & require “online”. Frustrating (yes, I know, most people just stay “online” all the time). Maybe ads, trackers & javascript are responsible for this behavior.

  3. E.M.Smith says:

    Life with Squid has been interesting. Overall, a benefit. I’ve installed it on the RaPiM2 and it is working fine. As expected, the .config file is longer, with more options. Setting the disk cache caused it to give a ‘refused connections’ return to the browser (I’d doubled all the sizes), so “someday” I’ll go back and play with the different sizes individually. But the default disk cache seems big enough anyway. I did double the memory cache. 8 MB is nearly nothing in a 1 GB memory pool, and I’ve got a 2 GB real disk swap mounted too… I’ll likely double it again, as I’ve seen nearly no negative impact (“top” reports squid using about 1% of memory at the moment). I think I did notice faster speeds after the doubling.
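    For anyone wanting to make the same memory cache change, a sketch that assumes squid.conf contains a cache_mem line, active or commented out, of the form "cache_mem 8 MB" or "# cache_mem 8 MB". The helper name set_cache_mem is made up; it prints the edited file rather than changing it in place, so the result can be checked before it replaces the real config.

```shell
#!/bin/sh
# set_cache_mem CONF MB: print CONF with its cache_mem line (active, or
# commented out with a single "#") set to MB megabytes.
set_cache_mem() {
  sed "s/^#\{0,1\} *cache_mem [0-9][0-9]* MB/cache_mem $2 MB/" "$1"
}

# Usage, as root (then restart squid):
#   set_cache_mem /etc/squid/squid.conf 16 > /tmp/squid.conf.new
#   diff /etc/squid/squid.conf /tmp/squid.conf.new
#   cp /tmp/squid.conf.new /etc/squid/squid.conf
```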

    The video-heavy “Tech Rebound, The Musical” posting now shows the top image (i.e. not a black square but the visual of the video and the ‘play’ triangle), but clicking on it does not run it.

    So, the workaround for video:

    IMHO there is a nearly trivial workaround for this lack of youtube. 2 Browsers.

    I usually have at least 2, and often 3, installed. Some Firefox / IceApe / IceWeasel variation, Opera if available, and Chrome if it runs there. The Pi adds Epiphany as a light weight browser on the new releases and Midori on the old ones. So just point one at the Squid and the other one not. Hit a site with a video? Copy the link, paste in the other browser, hit return. Faster than changing the proxy settings (which is also very fast).

    I don’t know why Squid seems to offer more speedup than the native Firefox cache. I suspect it is a lot of the little adverts and other such active content (that may have some HTML tag saying ‘go fetch a new advert / copy and send a notice’ while Squid may just say “Nah, the old one is good enough”)… but it is noticeably faster.

    It isn’t all that visible an effect all the time, but when you’ve hit the long slow load page it seems to flip it into an acceptable range, and acceptable is far better than WTF am I waiting for?…

    So I’m going to make Squid a regular part of my install process. Now on the “someday” list is learning how to ‘tune it up’ for even better performance. I’m just using the defaults, mostly, and it has a zillion settings you can make…

  4. E.M.Smith says:

    What a difference a disk makes…

    I noticed that squid puts the cache in /var/spool/squid. That’s on the / (root) partition. That lives on the SD card. Read / write to an SD card is “not fast”…

    I stopped squid:

    service squid stop

    Made a directory on the WD111 disk:

    mkdir /WD/Squid

    Copied /var/spool/squid into it (so it would not need to be recreated):

    rsync -ac /var/spool/squid/ /WD/Squid

    Then moved it over:

    mv /var/spool/squid /var/spool/squid.save

    and made a symbolic link to the real disk:

    ln -s /WD/Squid /var/spool/squid

    And finally touched up ownership and group on /WD/Squid with chown proxy, chgrp proxy, and chmod 750.

    Restarted squid:

    service squid start

    And went back to the browser.

    It is significantly faster. No “pause” while it gets a cache load from the SD card. Nice.

    I’m tempted to say “snappy”… even…
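    The same steps, wrapped up as a shell function so they can be reused for other directories (move_dir_to_disk is a made-up name). One substitution: it copies with "cp -Rp" instead of rsync so it runs where rsync isn't installed; the rsync above works just as well. The owner depends on the system: Debian/Raspbian runs squid as the user "proxy", while CentOS uses "squid".

```shell
#!/bin/sh
# move_dir_to_disk SRC DEST: copy the contents of SRC into DEST, keep the
# original as SRC.save, and leave a symbolic link at SRC pointing to DEST.
move_dir_to_disk() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  cp -Rp "$src"/. "$dest"/ || return 1   # "/." copies the contents, not the directory itself
  mv "$src" "$src.save" || return 1
  ln -s "$dest" "$src"
}

# Usage on the Pi, as root (stop squid first so the cache isn't changing):
#   service squid stop
#   move_dir_to_disk /var/spool/squid /WD/Squid
#   chown -R proxy:proxy /WD/Squid && chmod 750 /WD/Squid
#   service squid start
```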

    Now I’m wondering if a distinct /var on real USB disk might benefit even more things. It would be very easy to make a disk partition and put it into /etc/fstab for automatic mounting. Copy all of the default /var into it. Then mount it right over the old one.

    Now all of /var goes to the disk. Yet, if for some reason the mount doesn’t happen (disk not plugged in or whatever) the system sees the old /var space and just keeps on keeping on…

    I think that needs to be part of my default build process.
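    A sketch of that /var idea, assuming an ext4 partition on the USB disk; /dev/sda2 and /mnt/newvar below are hypothetical names, so substitute your own. The "nofail" mount option asks the system to carry on booting if the disk is absent, which is what lets the old /var on the SD card show through. How a missing disk is handled at boot varies by init system, so it is worth testing once with the disk unplugged. The helper fstab_line (a made-up name) only prints the entry so it can be checked before it goes into /etc/fstab.

```shell
#!/bin/sh
# fstab_line DEVICE MOUNTPOINT FSTYPE: print one /etc/fstab entry for a disk that
# may be missing at boot ("nofail"), checked by fsck after the root filesystem (pass 2).
fstab_line() {
  printf '%s\t%s\t%s\tdefaults,nofail\t0\t2\n' "$1" "$2" "$3"
}

# Typical sequence, as root; a reboot afterwards is the cleanest way to switch over:
#   mkdir -p /mnt/newvar
#   mount /dev/sda2 /mnt/newvar
#   cp -a /var/. /mnt/newvar/
#   umount /mnt/newvar
#   fstab_line /dev/sda2 /var ext4 >> /etc/fstab
```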

    There is some forensic value in /var entries, so most likely that /var on USB disk ought to be an encrypted space, especially if you expect to need a ‘pull the plug and chip and go’ security ‘bug out’ process. For the “daily driver” I’d not bother. For the “secure chip”, I’d leave it on the encrypted chip and not have the complexity of it.

    I’m now wondering what all else might speed up by a move off of the SD card… Berryboot has a choice of ‘install to USB’. As soon as I have an empty USB drive to play with again, I think I’ll try that with the OS on a real spinning USB disk and see what happens…

  5. p.g.sharrow says:

    @EMSmith; Not sure if I understand everything I have read on the links you have provided about “Squid”. But it appears to me that Squid might “remember” things that you would not care to be remembered in a “TallBloke” event. For an organization serving many page demands it could be a great way to efficiently use bandwidth, but a single user setup might not be worth it.
    I would think that the SD chip software needs to be the minimum to make the computer useful for its task. Specialized services should go through the USB ports. Squid would seem to be best served with a spinning disk, not the best thing in bug out or in minimal resource conditions.

    Not sure which of us benefits the most from your efforts. Those of us that follow your postings or the guy doing the research ;-)…pg

  6. E.M.Smith says:

    Well, I just moved my home directory on my “daily driver” onto the Western Digital disk. Now all the i/o that was going to .mozilla (cached files by the browser and such) is much faster as it does not take any SD write delays. Loading web pages is even ‘snappier’…

    How?

    rsync -ac /home/pi/ /Diskname/home/pi/

    then edit /etc/passwd to change the home directory field for ‘pi’ from :/home/pi: to :/Diskname/home/pi: (or whatever mount point / directory you used), and give the disk partition an /etc/fstab entry that mounts it on /Diskname/home.

    Now I ‘got a bit tricky about it’ and, before mounting the file system, put a symbolic link at /Diskname/home/pi that points back to /home/pi. That way, if the disk is not mounted on /Diskname/home/ at boot time, you can still log in with the old standard home directory contents.

    mkdir /Diskname
    mkdir /Diskname/home
    cd /Diskname/home
    ln -s /home/pi pi
    

    Now if you do a

    mount /Diskname/home
    

    All your stuff will be on the disk. If you forget to mount it, or it isn’t plugged in, then the symbolic link will be visible, and when you log in it tosses you back into /home/pi and everything still works.
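    Two small shell helpers for the home-directory move (both names are made up). home_of reads the home directory field, the sixth colon-separated field of /etc/passwd, which is the ":/home/pi:" part mentioned above; on most Linux systems "usermod -d /Diskname/home/pi pi" changes that field without hand-editing the file, ideally while pi is not logged in. make_fallback_link creates the fallback symbolic link on the bare mount-point directory, so it has to be run while the disk is not mounted.

```shell
#!/bin/sh
# home_of USER [PASSWD_FILE]: print USER's home directory (6th colon-separated field).
home_of() {
  awk -F: -v u="$1" '$1 == u { print $6 }' "${2:-/etc/passwd}"
}

# make_fallback_link OLDHOME MOUNTDIR: on the unmounted mount-point directory,
# create a link named after the user's home directory that points back to OLDHOME.
# Once the disk is mounted over MOUNTDIR, the link is hidden by the real directory.
make_fallback_link() {
  mkdir -p "$2" && ln -s "$1" "$2/$(basename "$1")"
}

# Usage, as root, with the disk NOT mounted:
#   make_fallback_link /home/pi /Diskname/home
#   usermod -d /Diskname/home/pi pi
#   home_of pi    # should now print /Diskname/home/pi
```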

    @P.G:

    Yes, there are 2 distinct and divergent scenarios.

    1) The secure “go dark” chip. It ought to be encrypted, and as small as possible. I’d use a Class 10 or better micro-SD chip in the R.Pi board, with the Berryboot encrypted build. Probably a 32 GB chip, but that would depend on how much you wanted to store on it. It’s a dinky little thing, so just click it out of the Pi when the doorbell rings (or the door gets broken down) and drop it down the heating vent. All gone…

    2) A “daily driver” that you use only for things that are entirely of no interest to forensics types or useful for anything against you. Browsing WUWT or here. Looking up ball game scores. Reading the NYT (nice liberal bias paper to get you in good with the Judge looking at the warrant… 9-)

    This posting is more about case #2. Making that daily driver experience faster. Especially if you are at the end of a slow wire.

    Squid is a ‘caching proxy’. That means you point your browser at it, and it goes and fetches the pages from the internet, and stores a copy in its cache. /var/spool/squid/*

    Next time you ask for any page with some of that stuff in it (like my picture up top), it fetches it from the disk cache, not from the internet over the wires. (There are a hundred config settings to tune how big, how long, what types, etc. etc.) For someone on the end of a slow wire, this speeds things up a LOT.

    By default, squid caches things onto the SD card, so it would also work on that encrypted card, but with some faster ‘wear’ of the card. I’ve chosen on this system to ‘go for speed’ as it is used almost 100% for doing this blog, and all this is public anyway. I found that putting things on ‘real disk’ speeds them up even more. The SD cards are not very fast at writes… (and I think these small 8 GB cards are older Class-4 speed anyway, i.e. very slow)

    I presently have Squid putting the files it caches onto a spinning disk. This is faster. Just now I moved my home directory onto that same spinning disk. It got even faster (by eliminating SD card IO for the browser actions / cache). A remarkable speedup.

    ( It is an older Western Digital that as near as I can tell does not do the ‘sleep’ that the newer ones do. It isn’t very fast if you have to take a 4 second wake-up on your 2 TB Toshiba disk on your first web page open … I’ve also found that putting ‘swap’ on the Toshiba causes a freeze up when the disk sleeps and the computer wants to swap a page… a page request does not seem to issue the same ‘wake up’ call as a regular disk read. Works fine on the WD… Another reason to get the Cubietruck with the SATA disk)

    Now this DOES put all that stuff you might want to ‘jerk and drop’ onto an external spinning USB disk. Not Good for case #1, but fine for #2. (In fact, might be advantageous as folks would tend to think that is ALL your stuff… And yes, I know I’m giving a ‘road map’ here for how to search for ‘my stuff’. I’m willing to do that since I don’t have anything of interest, really… It’s the principle of the thing to me ;-)

    In the “end game”, I plan to buy a small external USB ‘real disk’ and encrypt it, then put all those active spaces on it for the ‘daily driver’, so it is still not readable without extorting my password from me. (Well, really, I’ll have a premade command to nuke the header and render it non-recoverable… and the backup will be in a foreign country somewhere…) But that is still for the #2 case. The #1 case stays on a dedicated very tiny encrypted micro-SD card. Since swapping the chips is easy and takes about 2 seconds, not exactly a hardship.

    Some notes on forensics:

    In a forensics effort, your home directory has a load of crap in it that folks go looking through. Just do a:

    du -ks .[a-z]* 
    

    That counts up all the disk space used by all the “hidden” files whose names start with a dot, like the .mozilla directory. It is full of cached web pages. (And .thunderbird has your mail and… and… and… )
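    A variation on that du command that also sorts the results, so the biggest hidden directories (browser caches and mail, typically) come out at the bottom. The pattern .[!.]* is a small change from .[a-z]*: it also catches names that start with a capital letter or a digit, while still skipping the "." and ".." entries. The function name is made up.

```shell
#!/bin/sh
# biggest_dotfiles DIR: list hidden files and directories in DIR with their
# size in KB, smallest first, so the biggest ones end up at the bottom.
biggest_dotfiles() {
  ( cd "$1" && du -ks .[!.]* 2>/dev/null | sort -n )
}

# Usage:
#   biggest_dotfiles "$HOME"
```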

    Well, Squid is just like a giant version of that .mozilla cache, which caches even more, and holds onto it longer. If, for example, you went cruising porn sites or visiting the DNC page, that would stay in both the SQUID cache and potentially your .mozilla cache. So don’t do that.

    When doing that, you go to the Edit : Preferences : Advanced : proxy menu and set the proxy to “none”, and choose a “secure browsing” setting if your browser has one; or you boot up that special purpose SD card with the encrypted data that doesn’t use a proxy… (Or have two browsers: one uses the proxy for faster browsing, the other is not routed to the proxy.) And set your browser to either “don’t cache” or, when done, purge the cache.

    But, for day to day stuff, that use of SQUID will mean a LOT less stuff gets pulled down your wire over and over and over again. If two folks in the house (or you, on two different computers) visit this site, the picture at the top only gets pulled down one time. Next day, it doesn’t get pulled down at all (though you can set the timeout interval…).
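    In squid.conf that "timeout interval" is controlled by refresh_pattern lines, which take a regular expression followed by a minimum age in minutes, a percentage, and a maximum age in minutes. Squid uses the first line whose pattern matches, and the stock config ends with a catch-all "refresh_pattern ." line. The sketch below (add_refresh_rule is a made-up name) prints the config with a new rule placed just before that catch-all. The example rule, keeping images for at least a day and up to a week, is only an illustration, not a recommendation.

```shell
#!/bin/sh
# add_refresh_rule CONF RULE: print CONF with RULE inserted just before the
# catch-all "refresh_pattern ." line (or at the end if there is none).
# RULE is passed through the environment so awk does not alter its backslashes.
add_refresh_rule() {
  RULE=$2 awk '
    !done && $1 == "refresh_pattern" && $2 == "." { print ENVIRON["RULE"]; done = 1 }
    { print }
    END { if (!done) print ENVIRON["RULE"] }
  ' "$1"
}

# Example: treat images as fresh for 1 day (1440 min), up to 1 week (10080 min).
#   add_refresh_rule /etc/squid/squid.conf \
#     'refresh_pattern -i \.(gif|png|jpe?g)$ 1440 50% 10080' > /tmp/squid.conf.new
```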

    Since you are on the end of a slow wire, it ought to help you a lot.

    Oh, and take a look at “privoxy”. It is a proxy that tries to filter out all the junk. As it never sends the request up the wire for ads or banners or junk, it ought to be a big speed up too. I’ll be taking it for a ‘test drive’ in a week or two.

    So basically there’s a 2 way split between things to make speeds faster, and things to make stuff secure. (“Fast, good, cheap – pick any two” A truth in this industry… and the Pi already chose cheap… I would assert “private and secure” is part of “good”.)

    Having a local DNS that is set to cache, speeds up things a lot. It also means that for a little while it knows what you have looked up. Speed, at the cost of more exposure. You could build it on an encrypted system, but now at boot time you must be there to type in the password to boot. Similarly a caching proxy server. (Anything caching, really, including your browser).

    Now for my needs, putting swap on a real partition, and both my home directory and Squid cache on the same real disk, gives so much speed up I’m “good with that” on my daily driver. Not enough risk to matter.

    For the “secure system”, I’ll not do any of that. It’s all on one small chip and I can crush it in my teeth if need be. Someone comes and collects my gear, they will get the (encrypted) external disk and a R.Pi with a generic OS on it. A nice set of the collected works of NOAA and Hadley, some blog postings drafts and archives, and not much else. Oh, and my browsing history showing I read a lot of this site, WUWT, Iceagenow.info, and Tallblokes site; but they would know that if they read the blog ;-)

    Basically it is like when you have a ‘work computer’ and the company logs all browser traffic, email, and does backups. You just don’t do anything really private on it. To do your banking (or browsing non-work sites) you pop out the tablet on the public network… Same thing, only you have them both as chips on the R.Pi.

    Maybe it is having spent 30 years under the thumb of “Employee Handbook Rules”, but it is just 2nd nature to me to have a “professional at work” mode and a distinct “My space at home” mode. Even when both machines are at home…

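    So ‘give it a whirl’ and set up a squid proxy. You can always nuke the cache by stopping squid, emptying the cache directory, and having squid rebuild its empty directory tree:

    service squid stop
    rm -rf /var/spool/squid/*
    squid -z
    service squid start

    (Emptying the contents, rather than removing /var/spool/squid itself, matters if that path is a symbolic link to another disk: rm -rf on the link removes only the link and leaves the cached files in place.)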

    Or for real paranoia, look up the ‘shred’ command ;-)

    As to who gets the most out of it: I think it is both. For me, these postings are my ‘notebook’. So if a year from now I think “Hey, I’d like to set up a caching proxy again. How did I do that?” I’ve got the notes on what worked, what strange config thing needed a tweak, all that.

    For other folks, they get all the gain without the need to do the trial and error and error and error and Hey it worked!
