Fix for Raspberry Pi Model 2 Crashing

A long long time ago, I got a Raspberry Pi Model 2 and was waxing enthusiastic about using it more or less daily.

Then I “loaded it up” and it started crashing. ANY time it was over about 50% CPU and always when over about 75% CPU, the sucker would crash. Often with not much useful in the syslog file. Looked vaguely like it couldn’t “walk and chew gum” as it was worst when each processor “core” was loaded up with a different process. Especially if I was using a lot of swap. Clues, all. I suspected either low volts on heavy power draw (CPU and swap disk and memory and…) or general difficulty with handling swap well.

In general, testing showed that the Pi was not well suited to using multiple cores at once and swapping (with one test of parallel FORTRAN showing it would spread a task over 4 cores and get exactly nothing of benefit in return…).

I was dismayed, so went on to other things.

But as is my way, never let it go entirely. I’d “poke at it” from time to time. About 4? months ago I ran into a page saying “Do THIS to fix it!”… Time passes as I was not “in the mood” to go down that rat hole again.

But today I did. A slow day. I needed some “tech progress”. I wanted to use the R.Pi or know I needed to buy a different solution. OK, pop up that page and test it.

http://stevenhickson.blogspot.com/2012/10/fixing-raspberry-pi-crashes.html

Now normally I’d just quote a partial and point to the page for the rest, however:

When I went to write this article, on 2 different systems, one of them the Pi itself using IceApe browser, I was slammed over into a Google Account Login Demand. Now since I don’t use a Google Account on the Pi or on the Tablet, I didn’t see much reason to log into one just to see a web page. So, for “another day” is an exploration of just why wanting to look at a web page results in a Google demand to “log in” to a Google Account… So I’m posting this from CentOS on the old 64 bit $80 or so machine ;-)

Here’s that page, just in case you, too, get a slap in the face from Google for wanting to see it:

Fixing Raspberry Pi Crashes
The most common crash I’ve experienced/heard about posts an error that says:
raspberrypi kernel: <1-1.1:1.0: eth0: kevent 2 may have been dropped

This happens to a lot of people who are torrenting (probably using transmission) and especially to external HDDs. It tends to turn up in /var/log/messages and/or /var/log/kernel and/or /var/log/dmesg. You can cat these to see if the error is there.
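A rough way to do that check, offered as a sketch rather than anything from the original page: the helper name find_kevent_drops is made up here, and which log files exist depends on the distribution, so the paths in the usage comment are assumptions.

```shell
#!/bin/sh
# find_kevent_drops FILE...
# Hypothetical helper: print every line in the given log files that
# contains the "kevent N may have been dropped" message.
find_kevent_drops() {
    for log in "$@"; do
        # Skip paths that do not exist or are not readable on this system.
        [ -r "$log" ] || continue
        # -H prefixes each matching line with the file it came from.
        grep -H 'kevent .* may have been dropped' "$log"
    done
    # A log without the message is not an error for our purposes.
    return 0
}

# Typical use (log paths are assumptions; adjust for your distribution):
#   find_kevent_drops /var/log/messages /var/log/syslog
#   dmesg | grep 'may have been dropped'
```

No output means none of the readable files contained the message.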

There are a couple of reasons that this happens and the following has been the way I have managed to fix it (Supposedly there is a bug fix in a distant future kernel release).

Reason 1:

Memory isn't available fast enough and for some reason the kernel crashes.
I did two things to fix this.

1: Increase the value of vm.min_free_kbytes by editing sysctl.conf
(using vim or nano or whatever)

Start by opening a terminal then type the following to edit the proper files.

sudo nano /etc/sysctl.conf

At the end of the file, add the following line:

vm.min_free_kbytes = 16384

*Note, if that doesn't help, try increasing the number
Example:

vm.min_free_kbytes = 32768

Then save and exit the file.

The second thing I did was to add an option to the boot command

sudo nano /boot/cmdline.txt

At the end of the line, add: smsc95xx.turbo_mode=N

Save, exit the file, and then reboot (sudo reboot)

Reason 2:
Your USB hub has a problem where it is creating a feedback loop.

Tape over the +5V pin on the USB cord (You should use a multimeter to find it). Fixed, though a bit sketchily.

As this happened, hub or not, I just did #1.

As of now, I have set (using vi, not nano, as my editor)

vm.min_free_kbytes = 32768

Figuring why not go “all the way” up front and if it works, fine. I can always play with backing off later.
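For anyone who would rather script the change than hand-edit the file, here is a minimal sketch. The function name set_min_free_kbytes is invented for illustration, it assumes GNU sed (for -i), and it has to run as root to touch /etc/sysctl.conf. It replaces an existing vm.min_free_kbytes line if there is one and appends otherwise, so re-running it with a different number does not pile up duplicate lines.

```shell
#!/bin/sh
# set_min_free_kbytes VALUE [CONF_FILE]
# Hypothetical helper: make CONF_FILE (default /etc/sysctl.conf) contain
# exactly one "vm.min_free_kbytes = VALUE" line.
set_min_free_kbytes() {
    value=$1
    conf=${2:-/etc/sysctl.conf}
    if grep -q '^[[:space:]]*vm\.min_free_kbytes' "$conf" 2>/dev/null; then
        # A setting already exists: rewrite it in place (GNU sed -i).
        sed -i "s/^[[:space:]]*vm\.min_free_kbytes.*/vm.min_free_kbytes = $value/" "$conf"
    else
        # No setting yet: append it at the end of the file.
        printf 'vm.min_free_kbytes = %s\n' "$value" >> "$conf"
    fi
}

# Typical use, as root:
#   set_min_free_kbytes 32768
#   sysctl -p                            # reload /etc/sysctl.conf without a reboot
#   cat /proc/sys/vm/min_free_kbytes     # confirm the value the kernel is using
```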

I also set:

smsc95xx.turbo_mode=N

but I’m going to back that out first and see if it really mattered.
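Since /boot/cmdline.txt has to stay a single line, the option goes on the end of that line rather than on a new one. A small sketch of doing that without opening an editor follows; the function name add_cmdline_option is made up, GNU sed is assumed, and the option text is assumed to contain no "/" or "&" characters (true for the one used here).

```shell
#!/bin/sh
# add_cmdline_option OPTION [FILE]
# Hypothetical helper: append OPTION to the end of the first line of FILE
# (default /boot/cmdline.txt) unless it is already present.
add_cmdline_option() {
    option=$1
    file=${2:-/boot/cmdline.txt}
    # -F treats the option as a fixed string, so the "." is not a wildcard.
    if grep -qF -- "$option" "$file"; then
        return 0
    fi
    # Replace any trailing whitespace on line 1 with " OPTION".
    sed -i "1s/[[:space:]]*\$/ $option/" "$file"
}

# Typical use, as root, followed by a reboot:
#   cp /boot/cmdline.txt /boot/cmdline.txt.bak   # keep a copy to back the change out
#   add_cmdline_option smsc95xx.turbo_mode=N
```

Keeping the backup copy makes it easy to restore the old line later and see whether the option really mattered.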

IMHO, the basic problem is likely that the original 8K number was set for one core in the original Pi, and nobody thought to “fix it” for a 4 core Pi with 4 times the memory demand. So make it 32k.

As of now, the PiM2 has been up and running, very stable, for a few hours. All the time running a BOINC set of processes (3 cores at 100% most of the time, occasionally 4 cores). Before, this would crash it in about 10 minutes, sometimes less.

UPDATE: Just as I was set to hit “publish”, the Pi crashed again. OK, a lot longer than before, but not a full “fix”. The rest of the article is unchanged, but I’m going to try an even larger memory size next. Sigh.

I also launched 2 browsers while those were going. The machine didn’t crash, though the browsers would after a few minutes of doing “fancy things” like trying to post this posting… (WordPress is terribly “chatty” and a bit of a resource hog, especially posting, as every word gets spell checked again every minute or so). That might be more of a browser issue, as IceApe is a bit old and, er, deprecated. Epiphany is just a strange duck anyway and nobody seems to care about doing things that make it crash.

The “bottom line” for me is that this memory fix seems to have made the R.PiM2 now stable for heavy core use as a background processor. It isn’t quite tuned up yet enough to be run 100% “to the wall” and layer interactive things (like browsers) on top of that; but that’s OK. I usually divide those roles unless stress testing a box…

I guess now I can get back to all those Raspberry Pi Server projects I put on hold when the system wasn’t able to walk and chew gum at the same time ;-) (I can’t bring myself to build systems on top of a box that crashes whenever the load goes over 75%…) Now that it is able to run 3 cores “at the wall”, along with X-windows, and more, well, I’m “good with that”. Even if a bit more tuning is needed to make it as robust as an old VAX-780 when being hammered with tasks ;-)

Though I do note that this “difficulty” continues to point to the combined ethernet / USB / IO to Chip system as the weak spot. The I/O just isn’t balanced against the 4 cores. (Not really surprising as the original design balance was for 1 core). The implication is that for things like file servers and heavy I/O servers, the systems with a built in SATA are likely to be superior. Which implies that my present inventory of Pis is enough for any use that is not I/O intensive and my next “buy” will be something non-Pi.

Final note: Not sure what that whole “Google tongue down my throat login or nothing” behaviour was about, but I’m not the sort to put up with it. I’m happy to not visit “blogspot” web pages if that’s what it takes. Especially not sure why it would show up in the R.Pi browser (something thinking it is a tablet or picked up a Google Cookie somewhere or ???) If anyone “has clue” how to spike it, feel free to post a pointer. For now I’m happy to just move over to the old CentOS box and “flow around it”.

Despite what Google might think, the world is not all possessed of Google Accounts and even those that have one, don’t feel compelled to log into it constantly all day long just so Google can harvest our privacy. I’m happy to just not use Google “services” if that is what they want…

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

9 Responses to Fix for Raspberry Pi Model 2 Crashing

  1. E.M.Smith says:

    Well, with an hour and 12 minutes on the restart, it is being stable again. This time I took the cover off of the case… This particular case is entirely enclosed plastic (no vents) and I had my doubts about the heat transfer characteristics.

    The prior crashes had all been with a R.Pi M2 with heat sinks and with a well ventilated case, and this one had it ‘way bad’ too. That the ‘fix’ worked on this one “for a while” made me wonder about “heat too?”…

    So at this point I think this “fix” likely IS a fix, and that running the chip 100% to the wall without a heat sink inside a closed case is a separate ‘issue’.

    We’ll see. I’m leaving it running a BOINC task (case top removed) and if it makes it, it makes it.

    FWIW, the chip is “very hot” to the touch, but only marginally painful. So I’d put it at about 140 F surface temp (lid off the case, full air flow).
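    A rough sketch for getting a number instead of a fingertip estimate, assuming the usual Linux thermal sysfs file is present on the board. That file reports the die temperature in millidegrees C, which will read higher than the package surface, so it is not the same measurement as the 140 F guess above. The function name millic_to_f is made up for illustration.

```shell
#!/bin/sh
# millic_to_f MILLIDEGREES_C
# Hypothetical helper: convert millidegrees Celsius to whole degrees
# Fahrenheit using F = C * 9/5 + 32 (integer arithmetic, so it truncates).
millic_to_f() {
    echo $(( $1 * 9 / 5000 + 32 ))
}

# Typical use on the Pi (path assumed to exist on this kernel):
#   millic_to_f "$(cat /sys/class/thermal/thermal_zone0/temp)"
# On Raspbian, "vcgencmd measure_temp" reports the same sensor in degrees C.
```

    For reference, 60 C (60000 in the sysfs file) works out to 140 F.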

    My preliminary conclusion from this is that for “100% load servers” you need the memory fix, and heat sinks and a ventilated case. This case is likely OK for the usual sporadic load desktop use envelope. I’ll likely move this particular board to a ventilated case and “move on” if this test is OK (I have a spare ventilated case) or I’ll dedicate it to low CPU demand things like file server / DNS server / PXE boot server…

    Oh, the joys of being a guy who likes to slam hardware / software against the wall and ask it to “make my day” ;-0

  2. John Silver says:

    Bigger heat sink and a fan or put in a chimney, open top and bottom.

  3. p.g.sharrow says:

    @EMSmith; Good to hear that you have been able to get all 4 cores to work under full load. 140F with no real cooling is amazing for this kind of computing load. Your WAG of 140F for really hot but not too hot sounds about right to me. No doubt that the old Pi case has inadequate ventilation designed in. Upright card mount style for natural convection might be good enough for most uses. Hard to imagine a real use for continuous 100% 4 core loading under most conditions. But you likely will come up with something…;-)…pg

  4. E.M.Smith says:

    Well, I’ve managed to scrog the chip in this board. (Scrog may be jargon… it means to really chew up and screw up something). I did an apt-get update and an apt-get upgrade but didn’t bother to shut off the BOINC runs… it crashed mid upgrade. Then complained on reboot and so I did an apt-get -f and it then proceeded to not ever boot again…

    Don’t know if I made a backup copy of this BOINC chip, or exactly what my BOINC accounts were ;-)

    I’d be more bothered if I wasn’t so non-bothered ;-)

    @John Silver:

    My other R.PiM2 board has heat sink fins and ventilated case, so I’ll likely try it when I find where it’s gone off to… the stick on heat sinks are very cheap and I’ll order a set or two…

    I’ve moved the board to my other ventilated case anyway. This case can have the lid left off.

    @P.G.:

    You would be amazed what all I can find that takes all available computes in the world and then some… Ignoring the obvious encryption cracking…

    There’s Golomb rulers,
    Pi, Phi, any transcendental you like, really,
    BOINC in all its dozen or so variations of projects,
    BitCoin mining,
    all the FooCoin knockoff mining,
    Stock pattern analysis and prediction,
    AI stock trading,
    A Giant Beowulf for general purpose uses,
    Climate Models, the Dark Knight Attacks!,
    Robotic Intelligence (that buddy who does robots… his machines need a bigger brain, IMHO),
    The Berzerker Against Robots (for when the above works a bit too well…)
    and so much more…

    I could run about 20,000 cores of the fastest Intel Chip full boat boogie for a few years and not have run out of ‘things to do’…

    I just torment the Raspberry Pi dinky 4 cores out of a sense of duty to not let cycles go to waste ;-)

    At any rate, it looks like “more to do” at this point with the issue of heat load clearly “part of it” but with the memory management not fully resolved yet either.

    I know, I know… I could just use 1 or 2 cores at a time and only burst to 4 a few seconds out of every minute… but where’s the fun in that? Besides, I bought a 4 core machine. That means 4 cores, 100% duty cycle, 24 x 7 365; preferably 10% overclocked ;-)

  5. pinroot says:

    I noticed a problem I once had which had something to do with the ethernet/USB/IO chip while trying to run a specific distro of XBMC (I forget which one, there were three popular ones for the RPi out at the time). Basically, I could use ethernet, or USB, but not both at the same time. I never could figure it out (it took a while to actually realize what was happening) and eventually switched to a different distro of XBMC, which ran with no problems.

    Another thing that I’ve seen when running XBMC on an RPi was a small rainbow colored square in the upper right-hand of the screen that sort of faded in and out. I never could figure out what it was, but I did notice that any time I wanted to use the media center, I had to reboot it. It was either locked up or crashed. Eventually while looking into something entirely different, I stumbled across a post somewhere stating that if you saw that rainbow colored square, it meant your power supply was under-powered. Upgrading the power supply fixed it.

    I’ve also noticed that they tend to run more stable with the heatsinks and ventilated case. Right now all I’m using one for is a file and ftp server, so I don’t even bother to run X, which saves me some cycles, but I do have some things I’d eventually like to get around to doing… one of these days :)

  6. wyoskeptic says:

    Just to add cycles to your musing, E.M., I hit the website with Firefox (& Adblock Plus) via a Toshiba laptop with winnders 7 and it came up just fine.

    I note that it is a Blogger dot com site. I tried using that place for a time to blog and the easiest way to access the admin for me was via a Gmail account sign-on. I hate that “community gotta know what you are doing” over all the various sites Google has its thumbs into, so I don’t do much there any more. Anyway, it might be some sort of cookie type issue or some of that reputed Google “Secret” cookie issues I have heard some mumbling about here and there. I got too much on my plate to do any more digging than that at the moment.

    (Funny, I thought retirement meant you had lots of time to do anything you wanted …)

  7. J says:

    Do you have the little aluminum heatsink on the ARM CPU?
    I had a new Pi2 kit from canakit:

    http://www.canakit.com/raspberry-pi-starter-ultimate-kit.html

    You can see a picture of the heatsink at the link.
    I did attach it to the processor.
    I probably never loaded it to the same level you are trying.

  8. E.M.Smith says:

    @J:

    Yeah, if you look at my Pi.M2 unboxing story, I show photos of my sticking on the little heat sinks on my first one. It was the one that first manifested the crashing “issue”. This latest run was on my newer one, without heatsinks, and it crashed much less with the memory change; but still crashed. It is the one with my estimated temperature from finger application.

    It was in a more or less completely enclosed ‘red with white top’ plastic case. Taking the top off improved things some. Next I was going to test the ‘with heat sinks and memory change without top’ but blew my test BOINC chip doing an update upgrade when it crashed… so I have to recover it first.

    In the mean time, I’m using the ‘with heatsinks and ventilated case with memory change’ as my daily driver / test bed but without the “4 cores to the wall” BOINC load.

    FWIW, as a Sysadmin Guy into Big Iron, one of the things you always do as part of performance evaluation and / or acceptance testing is put it “pedal to the metal” and see if anything smokes after a day or three… I’d regularly run machines at 100% capacity for “days at a time” at various employers. For a Cray, it’s SOP. You run some low priority ‘scavenger’ program niced to oblivion that is just there to keep the machine fully loaded up to 100% and gain value from those cycles. At about $1500 / CPU-hour you don’t want to be losing a few CPU-hours per day to the idle daemon…

    So when testing hardware, one of the things I do is load it to 100% CPU with some CPU intensive tasks and see what happens. Similarly, I’ll set some I/O killer loose and see how it handles heavy constant I/O load (note that a prior posting on SD cards / USB drives found that PNY was very fast, for a little while, then slowed WAY down after some number of MB or small GB. So they have a fast buffer of some sort to make it LOOK fast, but most of the capacity is quite slow. Not nice! That’s the kind of thing you find when you make the machine grind it out for a few hours…) Oh, and memory thrashing can be fun, plus making swap heavily used. And the video drivers… Then, when all those are shown OK, you try loading up as much of all of them as possible at the same time… Really robust hardware and software doesn’t even glitch, just gets smoothly bogged down but at full total throughput. Crappy hardware and software gets glitchy, then bursty and with long pauses, and eventually hangs and crashes…

    But damn it, if I PAID for 4 cores at 1000 MHz, then I ought to be able to RUN 4 cores at 1000 MHz 24 x 365… (the 1/4 day is for annual maintenance ;-)

    I guess it’s a High Performance Computing Attitude thing ;-)

  9. Pingback: A Tale Of Two Linux Minimal Boxes | Musings from the Chiefio
