TECH SOLUTION

Sunday, 20 October 2013

TROUBLESHOOTING OF SOFT KERNEL PANIC



How to Troubleshoot a Soft Kernel Panic

Soft panic symptoms:
1. Much less severe than a hard panic.
2. Usually results in a segmentation fault.
3. An oops message is logged; search /var/log/messages for the string Oops.
4. The machine is still somewhat usable (but should be rebooted after information is collected).
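
The search in symptom 3 can be sketched in a few lines of Python. This is only a minimal illustration: the function name find_oops_lines and the sample log lines are made up for the example, and a real log may look different.

```python
def find_oops_lines(log_lines):
    """Return (line_number, text) pairs for every line that mentions an Oops."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if "Oops" in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample lines, invented for illustration only.
sample_log = [
    "Oct 20 10:15:01 host1 kernel: eth0: link up\n",
    "Oct 20 10:15:07 host1 kernel: Oops: 0002\n",
    "Oct 20 10:15:07 host1 kernel: CPU:    0\n",
]

for number, text in find_oops_lines(sample_log):
    print(number, text)
```

On a real machine the same function could be fed an open file, for example find_oops_lines(open("/var/log/messages")), assuming the log is readable by the current user.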
 
Soft panic causes:

  •  Almost anything that makes a module crash while it is not inside an interrupt handler can cause a soft panic.
  •  In this case the driver itself crashes, but it does not cause catastrophic system failure, because it was not locked in the interrupt handler.
  •  Soft panics share the same possible causes as hard panics (e.g. accessing a null pointer at runtime).


Soft panic information to collect:

  •  When a soft panic occurs, the kernel generates a dump that contains kernel symbols. This information is logged in /var/log/messages.
  •  To begin troubleshooting, use the ksymoops utility to turn the kernel symbols into meaningful data.

To generate a ksymoops file:

  1. Create a new file from the text of the stack trace found in /var/log/messages. Make sure to strip off the timestamps, otherwise ksymoops will fail.
  2. Run ksymoops on the new stack trace file:

Generic: ksymoops -o [location of Dialogic drivers] filename

Example: ksymoops -o /lib/modules/2.4.18-5/misc ksymoops.log
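
Stripping the timestamps in step 1 is tedious by hand, so here is a rough Python sketch of it. It assumes the classic syslog prefix layout ("Mon DD HH:MM:SS hostname kernel: "); the function name strip_timestamps and the sample lines are hypothetical, and a system that logs in another format would need a different pattern.

```python
import re

# Assumed prefix layout: "Oct 20 10:15:07 host1 kernel: " (classic syslog).
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s?"
)


def strip_timestamps(log_lines):
    """Remove the syslog prefix from each line; other lines pass through unchanged."""
    return [SYSLOG_PREFIX.sub("", line, count=1) for line in log_lines]


# Hypothetical stack trace lines, invented for illustration only.
raw_trace = [
    "Oct 20 10:15:07 host1 kernel: Oops: 0002\n",
    "Oct  5 09:01:02 host1 kernel: CPU:    0\n",
    "a line without any prefix\n",
]

for line in strip_timestamps(raw_trace):
    print(line, end="")
```

The cleaned lines can then be written to a file such as ksymoops.log and passed to ksymoops as in the example above.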

All other defaults should work fine.
For more detail, see the ksymoops man page.


So you are trying to start Linux for the first time, and what you get are messages like:

  •  Unable to mount root device.

  •  Kernel panic - not syncing.


What to do now? 

(1)  The first part of the system that starts running is the boot loader, usually Grub. This is the program that loads Linux, and/or Windows if you so desire. (The master boot record, or MBR, enables the computer to load Grub.)

(2)  The first thing that Grub needs to know is where is the kernel? It gets this from the /boot/grub/grub.conf file.

(3)  The way that you specify the correct drive and partition in Grub, like (hd0,0), is a little different from what you use in ordinary Linux. The kernel will be in some file named vmlinuz.

(4)    Once Grub has loaded the kernel into memory, the first thing that the kernel needs to know is, where is the root filesystem? The root= parameter is passed to the kernel to provide this information.

(5)  Notice that now you are talking to Linux, and you identify devices in Linux terms, like /dev/hda2.

(6)  Given this information, Linux is going to try to mount the root filesystem, that is, prepare it for use.

(7)  The most common mistake at this point is that you have specified the wrong device in the root= parameter of step (4).

(8)  Unfortunately, the message that results is rather nasty looking. When Linux doesn't know how to proceed, as in this case, it says kernel panic and it stops.

(9)  But, even then, it tries to go down gracefully. Writing anything to disk that hasn't been written out yet is an operation called syncing (for some darn-fool reason), and the words not syncing report that the kernel is skipping that step on its way down.

(10)  What's totally misleading about this message combination is that it implies, incorrectly, that the reason for the panic is not syncing, when actually the reason for the panic will be found in the preceding few lines.

(11)  You might see the message tried to kill init. That really means that a program called init died, which it is not allowed to ever do.


(12)  init is a very special program in Linux: the first program created when the machine starts.

(13)  So, basically, when you get these messages on startup, the situation really looks a lot more dreadful than it actually is.


(14)  You have probably just made a typo when entering the information in grub.conf. (Another common place to make a typo is in /etc/fstab, which tells Linux where all the other drives are.)

(15)  So what do you do? If you are doing a first-time install, you can just start over. Otherwise, you need to boot from a separate CD-ROM, which will give you a stand-alone Linux installation from which you can edit the offending files.
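
To make the two naming schemes and the typo hunt more concrete, here is a small Python sketch. It is an illustration under assumptions, not a real tool: the helper names (grub_to_linux, kernel_root_param, fstab_root_device) and the sample grub.conf and fstab contents are all made up, and it assumes legacy Grub, where disks and partitions are counted from 0 while Linux numbers partitions from 1. The mapping from hd0 to a particular Linux disk depends on the BIOS drive order, so the "hd" prefix used here is also an assumption.

```python
import re


def grub_to_linux(grub_device, disk_prefix="hd"):
    """Translate a legacy Grub name such as "(hd0,1)" into "/dev/hda2"."""
    match = re.fullmatch(r"\(hd(\d+),(\d+)\)", grub_device.strip())
    if match is None:
        raise ValueError("not a Grub device name: %r" % grub_device)
    disk, partition = int(match.group(1)), int(match.group(2))
    # Grub counts from 0; Linux letters disks from 'a' and partitions from 1.
    return "/dev/%s%s%d" % (disk_prefix, chr(ord("a") + disk), partition + 1)


def kernel_root_param(grub_conf_text):
    """Return the root= value from the first kernel line, or None."""
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            for word in words[1:]:
                if word.startswith("root="):
                    return word[len("root="):]
    return None


def fstab_root_device(fstab_text):
    """Return the device that fstab mounts on /, or None."""
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[1] == "/":
            return fields[0]
    return None


# Hypothetical file contents, invented for illustration only.
SAMPLE_GRUB_CONF = """\
default=0
timeout=10
title Linux
    root (hd0,0)
    kernel /vmlinuz-2.4.18-5 ro root=/dev/hda2
    initrd /initrd-2.4.18-5.img
"""

SAMPLE_FSTAB = """\
# device   mountpoint  type  options   dump pass
/dev/hda2  /           ext3  defaults  1 1
/dev/hda1  /boot       ext3  defaults  1 2
"""

print("Grub (hd0,0) is", grub_to_linux("(hd0,0)"))
print("kernel root= is", kernel_root_param(SAMPLE_GRUB_CONF))
print("fstab root is  ", fstab_root_device(SAMPLE_FSTAB))
```

If the last two values disagree, or neither names a partition that actually holds the root filesystem, that is a good place to start looking for the typo. Note that some distributions write root=LABEL=/ instead of a device name, in which case a plain string comparison like this only tells you whether the two files use the same label.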

Explained: kernel panic - not syncing - attempted to kill init

(16)  When the kernel gets into a situation where it does not know how to proceed (most often during booting, but also at other times), it issues a kernel panic by calling the panic(msg) routine defined in kernel/panic.c. (Good name, huh?)

(17)  This is a call from which No One Ever Returns. The panic() routine adds text to the front of the message, telling you more about what the system was actually doing when the panic occurred: basically, how big and bad the trail of debris in the filesystem is likely to be.

(18)  This is where the not syncing part comes from. (Older versions of panic() did try to issue a sync() system call to push all buffered data out to the hard disks before going down; not syncing reports that this step is being skipped.)

(19)  The second part of the message is what was provided by the original call to panic(). For example, we find panic("Tried to kill init") in kernel/exit.c.

(20)  So, what does this actually mean? Well, in this case it really doesn't mean that someone tried to kill the magical init process (process #1), but simply that it tried to die.

(21)  This process is not allowed to die or to be killed. When you see this message, it's almost always at boot time, and the real messages, the cause of the actual failure, will be found in the startup messages immediately preceding this one.

(22) This is often the case with kernel-panics.

(23)  init encountered something really bad, and it didn't know what to do, so it died, so the kernel died too.

(24)  BTW, the kernel-panic code is rather cute. It can blink lights and beep the system speaker in Morse code. It can reboot the system automatically. Obviously the people who wrote this stuff encountered it a lot.
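
Since the real cause usually sits in the lines just before the panic message, a captured boot log can be trimmed down to that context. The following Python sketch shows the idea; the function name lines_before_panic and the sample messages are made up for illustration, and the exact wording of panic messages varies between kernel versions.

```python
def lines_before_panic(log_lines, count=5):
    """Return up to `count` lines that precede the first kernel panic line."""
    for index, line in enumerate(log_lines):
        if "Kernel panic" in line:
            start = max(0, index - count)
            return [text.rstrip("\n") for text in log_lines[start:index]]
    return []  # no panic line found


# Hypothetical boot messages, invented for illustration only.
sample_boot_log = [
    "hda: hda1 hda2\n",
    "VFS: Cannot open root device \"hda3\"\n",
    "Please append a correct \"root=\" boot option\n",
    "Kernel panic - not syncing: VFS: Unable to mount root fs\n",
]

for text in lines_before_panic(sample_boot_log, count=2):
    print(text)
```

In this made-up log, the two lines printed are the ones that actually explain the failure: the kernel was told to use a root device that does not exist.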
