Several times I’ve commented that you can make your own local copy of the repositories and then point multiple machine builds at it. That cuts your network usage a lot (by roughly BuildSize x (NumberOfBuilds-1) - RepositorySize), and it greatly reduces the risk of a Man-In-The-Middle attack on your updates: the only exposure is at the repository creation step and then at each update to it, and those updates now happen when you choose rather than being pot luck for each build.
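As a rough sketch of the arithmetic above, here is that estimate as a POSIX shell function. The function name mirror_savings is hypothetical; it simply encodes the post's formula, and all three sizes must be in the same unit (GB, say).

```shell
#!/bin/sh
# Hypothetical helper: encodes the post's rough estimate of network traffic
# saved by building from a local mirror instead of the internet.
#   $1 = BuildSize (data one build pulls down)
#   $2 = NumberOfBuilds
#   $3 = RepositorySize (one-time download for the local mirror)
# All three must use the same unit; integer arithmetic only.
mirror_savings() {
    build_size=$1
    builds=$2
    repo_size=$3
    echo $(( build_size * (builds - 1) - repo_size ))
}
```

For example, `mirror_savings 2 200 270` gives 128, while `mirror_savings 2 10 270` gives -252. A negative result means the one-time mirror download outweighs the per-build savings, so on bytes alone the mirror pays off only with many builds.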
So what’s the process and how big are those Repositories, anyway? How long does it take to make a local copy and what kind of resources are we talking about?
Well, last week I decided to take a stab at it and see how much work it really was in a home network environment.
The thumbnail sketch is that a straight Debian / Devuan repository goes very fast and doesn’t take much. The Raspbian Repository download is still running after 4 days and is sucking down a lot of disk space (pushing 300 GB and rising). No, I don’t know why. It could be any of several things. Perhaps Raspbian carries several variations accumulated over the years, while my Debian copy is only one architecture? Hard to say without digging into it.
At present, the Raspbian archive copy is only up to the “T”s in file names.
So I’m hoping it’s just u, v, w, x, y, and z to go… (but the X stuff could be huge…)
I’m using the Orange Pi with the LVM volume set on it for the download (and eventual provisioning to builds). It is running at about 15% load, and the process doesn’t look bandwidth limited at this end either: we can run both TVs and it doesn’t seem to change much nor interfere with the TV. By contrast, the Debian download did cause a visible pause in the TV start up (until it claimed some bandwidth and the queued response packets dropped off a little). So I’m pretty sure the Raspbian download is limited by server bandwidth.
As of now, the sizes of the repositories in kb are:
177688972  apt-mirror
293033000  pisync
apt-mirror is the Debian / Devuan set and pisync is what I called the Raspbian one. The Raspbian copy is supposed to be about 300 GB, so hopefully it will finish “soon”.
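The post doesn’t say which command produced those figures. Assuming it was something like `du -sk`, note that du’s -k counts 1024-byte blocks, so the unit matters when comparing against the “GB” figures quoted elsewhere. A minimal sketch of both conversions; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in KiB (as printed by `du -sk`,
# which counts 1024-byte blocks) to decimal gigabytes (10^9 bytes).
kib_to_gb() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib * 1024 / 1000000000 }'
}

# Hypothetical helper: the same size in binary gibibytes (2^30 bytes).
kib_to_gib() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / 1048576 }'
}
```

So, if the pisync figure is in KiB, `kib_to_gb 293033000` prints 300.1 while `kib_to_gib 293033000` prints 279.5. The same directory can honestly be reported as “300 GB” or “280 GB” depending on the convention.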
Inside the apt-mirror directory we can see the different bits:
root@orangepione:/LVM/Repository# ls apt-mirror
mirror  skel  var
root@orangepione:/LVM/Repository# !!/*
ls apt-mirror/*
apt-mirror/mirror:
archive.raspbian.org  archive.ubuntu.com  packages.devuan.org
apt-mirror/skel:
archive.raspbian.org  archive.ubuntu.com  packages.devuan.org
apt-mirror/var:
ALL  dep11-log.0  index-log.10  translation-log.1  archive-log.0  dep11-log.1  index-log.11  translation-log.10  [...]
The original run was done with some Raspbian sites included and so there’s that raspbian.org bit, then the ubuntu base and the devuan overlay parts. The Raspbian bit is 55 GB and I can throw it away once the actual raspbian.org direct copy is done. (Well, I could throw it away now… more on that below).
The command generally used to make a mirror is named “apt-mirror”. It uses a config file, /etc/apt/mirror.list, that looks a lot like a /etc/apt/sources.list file (where you tell the system where to get updates). I’d originally copied the Raspberry Pi entry into it, then found out that Raspbian uses a different preferred method AND that my cut/paste was of a mirror redirector, not an actual archive site, and that “caused issues” (or at least I had issues when it was in place). It is possible that the “issues” were due to my not having set a defaultarch yet, but I’ve not gone back to re-test it, as I moved on to the rsync method Raspbian recommends.
So, some details:
Raspbian is the easier one to point at / explain, so it goes first. You start at the mirrors page on the Raspbian web site.
It lists all their various mirrors where you can get a code download / update. It also lists their preferred way of making a duplicate mirror of your own:
Mirroring the Raspbian Repository
Mirroring of the full Raspbian repository can be accomplished with rsync at archive.raspbian.org::archive.
Sample syntax to mirror the archive would be:
rsync --archive --verbose --delete --delete-delay --delay-updates \
    archive.raspbian.org::archive /path/to/local/mirror
The ‘--dry-run’ option can be used to first test out connectivity before attempting a full mirror.
On your first run, --delete and --delete-delay don’t do anything, as there is nothing to delete. In later update runs, obsolete files will only be deleted at the end of the run, so a failure to complete doesn’t leave you in a disjoint, broken state. Similarly, --delay-updates holds updated files back and puts them all in place at the end of the run. The rest is pretty clear: make an archive, do it in a chatty (verbose) way so I can see what is happening, get it from archive.raspbian.org and put it in /path/to/local/mirror.
My “script-let” looks like this:
rsync --archive --verbose --delete --delete-delay --delay-updates archive.raspbian.org::archive /LVM/Repository/pisync
Where you can see I chose to put it in an LVM volume and change the name of the archive to pisync to remind me how I made it ;-)
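Building on the script-let above, here is a minimal sketch that wraps the same rsync options in a function and adds the ‘--dry-run’ test pass the Raspbian page suggests. The function name and the MIRROR_SRC / MIRROR_DEST variables are hypothetical; the rsync options are exactly the ones quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper around the Raspbian-recommended rsync command.
# Source and destination can be overridden from the environment.
MIRROR_SRC="${MIRROR_SRC:-archive.raspbian.org::archive}"
MIRROR_DEST="${MIRROR_DEST:-/LVM/Repository/pisync}"

# Usage: mirror_raspbian [dry]
#   dry -> add --dry-run so rsync only reports what it would transfer
mirror_raspbian() {
    if [ "${1:-}" = "dry" ]; then
        set -- --dry-run
    else
        set --
    fi
    # --delete-delay postpones deletions, and --delay-updates puts updated
    # files in place, only at the end of the transfer.
    rsync --archive --verbose --delete --delete-delay --delay-updates \
        "$@" "$MIRROR_SRC" "$MIRROR_DEST"
}
```

Typical use would be `mirror_raspbian dry | tail` to confirm the server answers, then a plain `mirror_raspbian` for the real (multi-day) run.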
Currently, the repository itself is about 250GB in size with approximately a further 20GB in images etc. Anyone hosting a Raspbian repository mirror should plan on devoting about 350GB to handle future anticipated growth of the repository.
So it ought to be done at 270 GB, yet I’m at 293 GB and rising, so I think that “future growth” is already growing…
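Given that growth, it seems worth checking free space before starting a multi-day run. A minimal sketch, assuming df’s POSIX output format (-P) in 1024-byte blocks (-k), where the fourth column of the second line is the available space. The function names are hypothetical; the 350 GB figure in the usage line comes from the Raspbian guidance quoted above.

```shell
#!/bin/sh
# Hypothetical helper: free space, in KiB, on the filesystem holding $1.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; line 2, field 4
# is the space available to unprivileged users.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed if $1 KiB is at least $2 decimal GB.
enough_space() {
    need_kib=$(( $2 * 1000000000 / 1024 ))
    [ "$1" -ge "$need_kib" ]
}
```

For example: `enough_space "$(free_kib /LVM/Repository)" 350 || echo "not enough room for a Raspbian mirror"`.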
This site has a decent tutorial on setting up an apt-mirror based local mirror for Debian / Ubuntu. It is what I followed to set up mine the first time. Note that I didn’t need to add any special source to my /etc/apt/sources.list for the apt-mirror command install, so you can just skip down to “apt-get install apt-mirror” and it’s installed. At least it was for me on Armbian on the Orange Pi.
There was some confusion over the use of “apt-mirror -c filename” where it seems like it just uses /etc/apt/mirror.list in any case, so when running it I just said “apt-mirror” without any arguments.
The other complication is the mirror.list file itself. It seems to work fine for non-Raspbian things, but using the Raspbian mirror director as a site “had issues” (or needed defaultarch set, maybe?). But I’m now using the Raspbian recommended rsync command anyway, so that bit is being removed from my mirror.list file. Here’s what mine looks like now (the parameters I changed are base_path, defaultarch and nthreads; note the commented-out Raspbian mirror director):
root@orangepione:/LVM/Repository# cat /etc/apt/mirror.list
############# config ##################
#
set base_path /LVM/Repository/apt-mirror
#
#set mirror_path $base_path/mirror
#set skel_path $base_path/skel
#set var_path $base_path/var
#set cleanscript $var_path/clean.sh
set defaultarch armhf
#set postmirror_script $var_path/postmirror.sh
set run_postmirror 0
set nthreads 8
set _tilde 0
#
############# end config ##############
So I changed the base directory, as the default under /var/spool was on a small chip, not a big disk… and I set the default arch to armhf (I’ll add explicit deb:arch types for the other armel and arm64 later). Then I set the number of threads down to 8 from the default 20. I only have 4 cores and one small network pipe on this micro-SBC anyway, so running 2 threads / core plus the OS is likely all that can be done effectively.
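To double-check which settings apt-mirror will actually pick up (lines starting with “#”, like the “#set” ones above, are comments), a small awk-based reader helps. This is a sketch assuming the simple “set key value” layout shown above; it does not expand variables like $base_path. The function names are hypothetical. The second helper just encodes the two-threads-per-core rule of thumb, using coreutils’ nproc when no core count is given.

```shell
#!/bin/sh
# Hypothetical helper: print the value of an active "set <key> <value>"
# line in an apt-mirror config file. Commented lines do not match because
# their first field is "#set", not "set". If a key appears twice, this
# helper reports the last one. Prints an empty line if the key is unset.
mirror_setting() {
    awk -v key="$1" '$1 == "set" && $2 == key { val = $3 } END { print val }' "$2"
}

# Hypothetical helper: two download threads per core, the rule of thumb
# used for nthreads above. Pass a core count, or omit it to use nproc.
suggest_nthreads() {
    cores=${1:-$(nproc)}
    echo $(( cores * 2 ))
}
```

For instance, `mirror_setting nthreads /etc/apt/mirror.list` should print 8 with the config above, and `suggest_nthreads 4` prints 8 for the Orange Pi’s four cores.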
Then the config file lists the source archives you wish to mirror. The first three lines were from the /etc/apt/sources.list file on the Pi M3 running Raspbian. I commented out the mirror director and swapped in this sjc02 actual mirror instead. Then the rest are from the Armbian (generic i.e. not Devuan) on the Orange Pi. Sometime later I need to add the bits for other OS types like Fedora or ‘whatever’ as desired:
#deb http://mirrordirector.raspbian.org
deb http://mirror.sjc02.svwh.net/raspbian/raspbian main rpi
deb-src http://archive.raspbian.org/raspbian/ jessie main contrib non-free rpi
deb http://packages.devuan.org/merged jessie main
deb http://archive.ubuntu.com/ubuntu xenial main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu xenial-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu xenial-updates main restricted universe multiverse
#deb http://archive.ubuntu.com/ubuntu xenial-proposed main restricted universe multiverse
#deb http://archive.ubuntu.com/ubuntu xenial-backports main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu xenial main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu xenial-security main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu xenial-updates main restricted universe multiverse
#deb-src http://archive.ubuntu.com/ubuntu xenial-proposed main restricted universe multiverse
#deb-src http://archive.ubuntu.com/ubuntu xenial-backports main restricted universe multiverse
#clean http://archive.ubuntu.com/ubuntu
root@orangepione:/LVM/Repository#
I’ve also commented out the “clean” setting until I have something to clean…
So why those ubuntu archives? Well, here’s the /etc/apt/sources.list file on the Orange Pi:
root@orangepione:/LVM/Repository# cat /etc/apt/sources.list
deb http://ports.ubuntu.com/ xenial main restricted universe multiverse
#deb-src http://ports.ubuntu.com/ xenial main restricted universe multiverse
deb http://ports.ubuntu.com/ xenial-security main restricted universe multiverse
#deb-src http://ports.ubuntu.com/ xenial-security main restricted universe multiverse
deb http://ports.ubuntu.com/ xenial-updates main restricted universe multiverse
#deb-src http://ports.ubuntu.com/ xenial-updates main restricted universe multiverse
deb http://ports.ubuntu.com/ xenial-backports main restricted universe multiverse
#deb-src http://ports.ubuntu.com/ xenial-backports main restricted universe multiverse
So those bits are what Armbian is using. I need to clean this up somewhat, as this is just a first experimental run. In particular, I need to figure out what I really want in the way of different architectures and builds. So armel vs armhf vs arm64 vs pc types etc. Then Armbian vs Debian vs Raspbian vs Devuan vs Ubuntu vs etc. etc. I likely don’t have enough TB lying around to hold all possible choices…
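One way to get from a machine’s sources.list to mirror.list entries is to copy its active deb / deb-src lines verbatim, which keeps the exact hosts that machine uses. Worth noting: the Orange Pi’s file points at ports.ubuntu.com, which is where Ubuntu publishes its ARM (armhf, arm64) packages, while archive.ubuntu.com carries the amd64 / i386 ones, so for an armhf mirror the ports host is likely the one to copy. A minimal sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the active (uncommented) deb and deb-src
# lines from one or more sources.list files, ready to review and paste
# into /etc/apt/mirror.list. Lines such as "#deb-src ..." are skipped.
active_sources() {
    awk '$1 == "deb" || $1 == "deb-src" { print }' "$@"
}
```

For example, `active_sources /etc/apt/sources.list > /tmp/mirror-entries.txt`, then check the result by hand before appending it to mirror.list.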
For now, I’ve swapped over to doing it “the Raspbian way” for the Raspbian parts. Then I’ll test a generic Raspbian armhf against them, then add a Devuan “upconversion” test. After that, it’s likely to be Armbian as I’m using it on a couple of the Odroids and the Orange Pi (and that Devuan upconversion).
You can have multiple hardware architectures and multiple build types in a repository. All it takes is entries in the mirror.list file and disk space…
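For the multiple-architecture case, apt-mirror (as I understand it; check the comments in your own mirror.list or the apt-mirror documentation) accepts per-line prefixes of the form deb-<arch> that override defaultarch for that entry. A hedged sketch that generates such lines; the function name is hypothetical and the ports.ubuntu.com URI is the one from the Orange Pi’s sources.list above.

```shell
#!/bin/sh
# Hypothetical helper: emit one apt-mirror "deb-<arch>" line per architecture.
#   $1 = repository URI, $2 = suite, $3 = components (quoted as one arg)
#   remaining args = architectures, e.g. armhf arm64
# Assumes apt-mirror's per-line architecture prefix; verify on your version.
arch_lines() {
    uri=$1 suite=$2 comps=$3
    shift 3
    for arch in "$@"; do
        printf 'deb-%s %s %s %s\n' "$arch" "$uri" "$suite" "$comps"
    done
}
```

`arch_lines http://ports.ubuntu.com/ xenial "main restricted universe multiverse" armhf arm64` prints a deb-armhf line and a deb-arm64 line for the same suite, so adding an architecture becomes one more argument rather than another hand-edited block.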
For even more about setting up a mirror, see this link:
A Security Comment
Now one of the nice things about doing this is the added security and privacy of not reaching out to the internet for every update and installation. You can also keep better track of “what changed” since the last update. Another interesting bit is that you can choose your source for the initial load and updates from any of many other mirrors. This means you can choose who to trust for that download, and anyone wanting to do a Man In The Middle has to figure out which site you are using. So, for example, on the Raspbian we have dozens of choices like:
Europe United Kingdom University of Oxford
North America United States Michigan Tech Linux Users Group
South America Brazil Universidade de São Paulo – USP
So you can, for example, keep your own traffic local to, say, South America or Asia if you want to make it harder for the US Government to be in the telco closet… Similarly, you could set up a VPN with an encrypted tunnel from, say, New York City to Brazil and make it just that much harder for someone to see your update traffic in the USA. Yes, things would slow down, but now you have an N x M search space for someone trying to bugger just YOUR feed of code. They would need to get to ALL the repositories or break into the encrypted tunnel (a not very easy thing to do…)
Also note that a lot of work goes into proving the repositories are sound and by using a couple of disjoint ones, you could see if the code base diverged and was suspect. (There’s encryption and key signing used at various levels to help assure things are as they ought to be, but more is being done on the Debian side to create provable binaries, where you can prove a given binary was compiled from a given source file. Over time this will get even more secure.)
For me, I’ve chosen a site just down the street. Folks wanting to get in the middle need to be near here as the bits will not normally be routed far far away.
North America United States Silicon Valley Web Hosting (http|rsync)://mirror.sjc02.svwh.net/raspbian/raspbian/
But that’s on the apt-mirror copy. I’m going to compare them a bit to the rsync copy and look for what’s more interesting / easy.
A true paranoid sys admin could make repository mirrors from several sites and compare them for differences. (Making sure the release levels match…) Flagging any odd things for further inspection.
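For that kind of comparison, a minimal sketch: hash every file in two local mirror trees and diff the sorted lists. It assumes GNU coreutils (sha256sum, sort) and that both copies were taken at the same release level; the function names are hypothetical. On a 300 GB tree this reads every byte, so expect it to take a while; comparing only the Release / Packages index files would likely be a much cheaper first pass.

```shell
#!/bin/sh
# Hypothetical helper: print "sha256  ./relative/path" for every regular
# file under directory $1, sorted by path so two trees line up.
checksum_tree() {
    ( cd "$1" && find . -type f -exec sha256sum {} + ) | LC_ALL=C sort -k 2
}

# Hypothetical helper: compare two mirror trees. Prints a diff of any
# files that differ or exist on only one side. Exit status follows diff:
# 0 = identical, 1 = differences found, 2 = trouble (mktemp or diff).
compare_mirrors() {
    list_a=$(mktemp) || return 2
    list_b=$(mktemp) || { rm -f "$list_a"; return 2; }
    checksum_tree "$1" > "$list_a"
    checksum_tree "$2" > "$list_b"
    diff "$list_a" "$list_b"
    status=$?
    rm -f "$list_a" "$list_b"
    return "$status"
}
```

Any line in the diff output names a file worth a closer look before trusting either copy.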
For me, just changing source site every so often ought to be ‘enough’.
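Rotating the source site can be as simple as keeping a short list of mirror URLs and picking one at random for each update. A sketch assuming GNU coreutils’ shuf; the function name and the list file are hypothetical, and only the sjc02 URL in the usage note comes from this post.

```shell
#!/bin/sh
# Hypothetical helper: print one randomly chosen mirror URL from a file
# containing one URL per line. Blank lines and "#" comments are ignored.
pick_mirror() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | shuf -n 1
}
```

With a hypothetical mirrors.txt holding lines like rsync://mirror.sjc02.svwh.net/raspbian/raspbian/ plus a few others, `pick_mirror mirrors.txt` yields the source to feed to that month’s rsync run.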
Note, too, that with a portable Pi system and a TB USB drive, you could also do the updates from places like public libraries or even your local WiFi Hotspot. Unless a targeted Man In The Middle attack is following you around town, with on the move reconfiguration and traffic monitoring, they will have a devil of a time staying in the middle. Now they must deal with N x M x P complexity. What source site you use, what VPN you use, what destination you are at. LOTS of moving windows that would need to be lined up… (For me, I have no such need. I run a blog where I publish my config files, fer cry’en out loud… it’s not like I’m hiding what I’m doing. But if running a truly security critical operation, that kind of thing can add just another level of complexity to the attacker…) But for folks on a long slow wire in the middle of nowhere, being able to load up a repository on that weekend stay in the hotel in the city can be a feature… Why watch a rerun of Conan on the TV if you can use the same bandwidth to load up your build box? ;-)
So I’d started this Repository download (in a window to the Orange Pi) on my “internal only” box (the Pi M3). After a couple of days of not posting, I realized the flaw of having only one monitor and swapping systems to do posting… So I’ve violated the “only internal use” rule on that box to make this posting. Looks like a 2nd monitor would be a Very Good Thing when doing downloads that take days on a ‘dedicated internal only’ system. OK, this chip was my Daily Driver for a year or so, so it’s not a large exposure to use it that way again for a day or two, but it does illustrate “issues” that can come up with dedicated boxes for each use case…
I’m also getting a pristine Raspbian rsync based archive that will be kept as is. I’ll likely only update it once a month or so (I like stability and consistency more than ‘pot luck du jour’), and only after checking that the current build / push is not suffering “issues”. (i.e. as a way to prevent the “update and die suddenly” surprise issues… so update a test system, see it’s OK, and then update the archive…)
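One way to keep a known-good copy around while still updating monthly is a hard-linked snapshot taken just before each update, so a bad update can be rolled back. This technique is swapped in for illustration rather than taken from this post. It assumes GNU cp’s -l (hard link instead of copy) and that rsync replaces changed files by writing a new file and renaming it (its default behavior without --inplace), so the snapshot keeps the old contents. Hard links cost inodes and metadata, not another 300 GB. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make a dated, hard-linked snapshot of a mirror tree
# (e.g. /LVM/Repository/pisync -> /LVM/Repository/pisync.YYYY-MM-DD).
# Prints the snapshot path on success; refuses to overwrite an existing one.
snapshot_mirror() {
    src=${1%/}
    snap="$src.$(date +%Y-%m-%d)"
    if [ -e "$snap" ]; then
        echo "snapshot $snap already exists" >&2
        return 1
    fi
    # -a keeps permissions and timestamps; -l hard-links files instead of
    # copying their data, so unchanged files share disk blocks.
    cp -al "$src" "$snap" && echo "$snap"
}
```

Run `snapshot_mirror /LVM/Repository/pisync` before the monthly rsync; if the test system then misbehaves, the previous tree is still there to serve from.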
The other bits, the apt-mirror ones, need a bit of a think to figure out just which releases of what architectures I really want. Devuan and Debian for sure. Ubuntu to the extent I must have it for Armbian. But what about others? And do I care, at all, about Intel based PC releases any more?
This was a test run for apt-mirror, and it worked very well and very fast. Now I need to figure out what I really want; and likely do that run all over again… but it was just a couple of hours as the whole of Silicon Valley has high speed MAN (Metro Area Network) optical networks all over it… so only “last mile” to me was limiting near as I can tell.
Hopefully you can see that, while slow to build a mirror and taking a good chunk of disk space, it isn’t all that hard to do. The other “next step” is to change my systems’ /etc/apt/sources.list files to point at my local archive and try updating a system. But that is for another day.
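When that day comes, the change itself is one substitution per entry. A hedged sketch, assuming the local mirror is served over HTTP by some web server on the Orange Pi (not set up in this post) and using sed -i with an attached backup suffix, which GNU and BSD sed both accept. The function name and the local URL in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace one mirror URL prefix with another in a
# sources.list-style file, keeping a .bak copy of the original.
#   $1 = old URL prefix, $2 = new URL prefix, $3 = file
# "|" is the sed delimiter so URL slashes need no escaping. The old prefix
# is a basic regex (its dots match any character, harmless for normal
# URLs); neither URL may contain "|" or "&".
repoint_sources() {
    sed -i.bak "s|$1|$2|g" "$3"
}
```

For example, `repoint_sources http://mirrordirector.raspbian.org/raspbian http://orangepione/raspbian /etc/apt/sources.list`, followed by `apt-get update` to see whether the local copy answers.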