Topic: Suspect bad RAM - how to test?  (Read 3727 times)
ronys
Newbie
Karma: 0
Posts: 10
« on: February 18, 2010, 05:39:38 AM »

Hi,

I'm getting crashes whenever I do any non-trivial I/O (aptitude safe-upgrade, for example, or just copying files from one filesystem to another). This happens whether I boot off a USB drive, NFS or an SD card. (I can't manage to boot off NAND, which may or may not be related.)

Finally, the U-Boot 'mtest' command freezes as follows:
Quote
U-Boot 1.1.4 (Dec 23 2009 - 13:32:43) Marvell version: 3.4.27

Marvell>> mtest
Pattern 00000000  Writing...

Any idea how I can run more diagnostics?
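For anyone wondering what the "Pattern 00000000  Writing..." line refers to: a RAM test of this kind writes a known value to every word in a range, then reads each word back and counts mismatches. The C sketch below shows that write-then-verify idea on an ordinary buffer. It is not U-Boot's mtest code; the function names and the particular set of patterns are assumptions chosen for illustration. Note that a board which hangs during the write phase, as in the output above, never gets as far as the compare step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only, not U-Boot's implementation.
 * Write `pattern` to every word of buf, then read each word back.
 * `volatile` stops the compiler from optimising the read-back away.
 * Returns the number of words that did not hold the pattern. */
size_t pattern_test(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    size_t errors = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < words; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != pattern)
            errors++;
    return errors;
}

/* Address-in-address test: store each word's own index, so that
 * aliased or stuck address lines show up as mismatches. */
size_t address_test(volatile uint32_t *buf, size_t words)
{
    size_t errors = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < words; i++)
        buf[i] = (uint32_t)i;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != (uint32_t)i)
            errors++;
    return errors;
}

/* Run a few classic fixed patterns (an assumed set), then the address test. */
size_t run_patterns(volatile uint32_t *buf, size_t words)
{
    static const uint32_t patterns[] = {
        0x00000000u, 0xFFFFFFFFu, 0xAAAAAAAAu, 0x55555555u
    };
    size_t errors = 0;
    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        errors += pattern_test(buf, words, patterns[p]);
    errors += address_test(buf, words);
    return errors;
}
```

On real hardware the range under test is a span of physical addresses (U-Boot's mtest takes start and end addresses), which a normal userspace program cannot address directly; here the buffer just stands in for that range.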

I'm attaching the serial-port output from one such crash.
The system was in the middle of apt-get install rsync:

# apt-get install rsync
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
Reading task descriptions... Done
The following NEW packages will be installed:
  rsync
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 340kB of archives. After unpacking 672kB will be used.
Writing extended state information... Done
Get:1 http://ftp.uk.debian.org lenny/main rsync 3.0.3-2 [340kB]
Fetched 340kB in 3s (97.2kB/s)
Timeout, server not responding.

The dump starts with:
[ 1569.690000] Unable to handle kernel paging request at virtual address 00c90068
[ 1569.700000] Unable to handle kernel paging request at virtual address 0b97ffb0
[ 1569.700000] pgd = df054000
[ 1569.700000] [0b97ffb0] *pgd=00000000
[ 1569.700000] Internal error: Oops: 805 [#1]

Any ideas?

Rony



* minicom.cap (25.6 KB - downloaded 132 times.)
« Last Edit: February 18, 2010, 06:45:03 AM by ronys »

marcus
Jr. Member
Karma: 5
Posts: 83
« Reply #1 on: February 18, 2010, 01:49:20 PM »

Setting the environment variable run_diag will make U-Boot perform a memory test:

http://plugcomputer.org/plugforum/index.php?topic=174.0

Juanisan
Newbie
Karma: 0
Posts: 20
« Reply #2 on: March 03, 2010, 06:34:36 PM »

I have the same memory issues on one of my plugs.

I can't get the diagnostics to run when I set run_diag to yes, though...

Marvell>> nand bad

Device 0 bad blocks:
Marvell>>

Quote
         __  __                      _ _
        |  \/  | __ _ _ ____   _____| | |
        | |\/| |/ _` | '__\ \ / / _ \ | |
        | |  | | (_| | |   \ V /  __/ | |
        |_|  |_|\__,_|_|    \_/ \___|_|_|
 _   _     ____              _
| | | |   | __ )  ___   ___ | |_
| | | |___|  _ \ / _ \ / _ \| __|
| |_| |___| |_) | (_) | (_) | |_
 \___/    |____/ \___/ \___/ \__|
 ** MARVELL BOARD: SHEEVA PLUG LE

U-Boot 1.1.4 (Apr 29 2009 - 13:10:05) Marvell version: 3.4.16

U-Boot code: 00600000 -> 0067FFF0  BSS: -> 006CF100

Soc: 88F6281 A0 (DDR2)
CPU running @ 1200Mhz L2 running @ 400Mhz
SysClock = 400Mhz , TClock = 200Mhz

DRAM CAS Latency = 5 tRP = 5 tRAS = 18 tRCD=6
DRAM CS[0] base 0x00000000   size 256MB
DRAM CS[1] base 0x10000000   size 256MB
DRAM Total size 512MB  16bit width
Flash:  0 kB
Addresses 8M - 0M are saved for the U-Boot usage.
Mem malloc Initialization (8M - 7M): Done
NAND:nand_bbt: ECC error while reading bad block table
512 MB

CPU : Marvell Feroceon (Rev 1)

Streaming disabled
Write allocate disabled


USB 0: host mode
PEX 0: interface detected no Link.
Net:   egiga0 [PRIME], egiga1
Hit any key to stop autoboot:  0
Marvell>> nand bad

Device 0 bad blocks:
Marvell>>

I double-checked my environment before resetting:
Code:
Marvell>> printenv
baudrate=115200
loads_echo=0
rootpath=/mnt/ARM_FS/
netmask=255.255.255.0
CASset=min
MALLOC_len=1
ethprime=egiga0
bootargs_end=:::DB88FXX81:eth0:none
image_name=uImage
standalone=fsload 0x2000000 $(image_name);setenv bootargs $(console) root=/dev/mtdblock0 rw ip=$(ipaddr):$(serverip)$(bootargs_end) $(mvPhoneConfig); bootm 0x2000000;
ethaddr=00:50:43:e4:06:26
ethmtu=1500
mvPhoneConfig=mv_phone_config=dev0:fxs,dev1:fxs
mvNetConfig=mv_net_config=(00:11:88:0f:62:81,0:1:2:3),mtu=1500
usb0Mode=host
yuk_ethaddr=00:00:00:EE:51:81
nandEcc=1bit
netretry=no
rcvrip=169.254.100.100
loadaddr=0x02000000
autoload=no
ethact=egiga0
serverip=192.168.1.3
ipaddr=192.168.1.1
bootargs_console=console=ttyS0,115200
bootargs_root=root=/dev/sda1 waitforroot=10 rootfs=ext3
bootcmd_usb=ext2load usb 0:1 0x0800000 /uinitrd; ext2load usb 0:1 0x400000 /uImage
bootcmd=usb start; setenv bootargs $(bootargs_console) $(bootargs_root); run bootcmd_usb ; bootm 0x400000 0x0800000 ; reset
run_diag=yes
stdin=serial
stdout=serial
stderr=serial
console=console=ttyS0,115200 mtdparts=nand_mtd:0x100000@0(uboot)ro,0x0@0x100000(uImage),0x1ff00000@0x100000(rootfs)rw
mainlineLinux=no
enaMonExt=no
enaCpuStream=no
enaWrAllo=no
pexMode=RC
disL2Cache=no
setL2CacheWT=yes
disL2Prefetch=yes
enaICPref=yes
enaDCPref=yes
sata_dma_mode=yes
netbsd_en=no
vxworks_en=no
bootdelay=3
disaMvPnp=no
enaAutoRecovery=yes

Environment size: 1371/131068 bytes
Marvell>> reset

How do I get my plug to rescan memory for bad blocks?

When I boot from an SD card and try to mount /dev/mtdblock2, I get the scan and then:

[  754.523860] Empty flash at 0x04a4fffc ends at 0x04a50000
[  754.529199] CLEANMARKER node found at 0x04a50000 has totlen 0xc != normal 0x0
[  754.537467] CLEANMARKER node found at 0x04a60000 has totlen 0xc != normal 0x0
[  754.582444] Empty flash at 0x04a6fffc ends at 0x04a70000
[  754.587788] CLEANMARKER node found at 0x04a70000 has totlen 0xc != normal 0x0
[  754.596218] CLEANMARKER node found at 0x04a80000 has totlen 0xc != normal 0x0
[  754.629328] CLEANMARKER node found at 0x04a90000 has totlen 0xc != normal 0x0
[  754.639567] CLEANMARKER node found at 0x04aa0000 has totlen 0xc != normal 0x0
[  757.404109] uncorrectable error :
[  757.407361] uncorrectable error :
[  757.410788] uncorrectable error :
[  757.414229] uncorrectable error :
[  757.417655] mtd->read(0x400 bytes from 0x1fae0000) returned ECC error
[  757.462907] JFFS2 notice: (1125) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
root@sheeva:/#
« Last Edit: March 03, 2010, 07:50:38 PM by Juanisan »

ronys
Newbie
Karma: 0
Posts: 10
« Reply #3 on: March 03, 2010, 10:02:46 PM »

Hi,

I think you're seeing an issue with the NAND, which is the built-in flash memory.

I'm having problems with the RAM, which is separate. In any case, I've just received an RMA for the device, and am shipping it back to Globalscale Technologies for repair.

Rony

sunmao
Newbie
Karma: 0
Posts: 4
« Reply #4 on: March 16, 2010, 10:35:14 AM »

Hi!

I think I have the same problem! Please see this: http://plugcomputer.org/plugforum/index.php?topic=1455.0
Can somebody help me?

NYDOC
Guest
« Reply #5 on: June 28, 2010, 07:03:51 AM »

The simplest solution to test for bad RAM is to swap it out.

tylernt
Jr. Member
Karma: 2
Posts: 56
« Reply #6 on: June 28, 2010, 09:11:41 AM »

Quote
The simplest solution to test for bad RAM is to swap it out.
Better warm up your soldering iron.

I've seen random segfaults too: a few when compiling MythTV, and one when starting mythfrontend. Each time, though, I just tried again and it worked.

I found a memory test utility that runs from within a booted Linux system (the popular memtest86 seems to be for x86, not ARM). Obviously, it will miss the areas of RAM in use by the kernel, but I used it to test what RAM I could on my Sheeva and it reported no errors. Unfortunately, that means I don't know why things were segfaulting. Given GlobalScale's (in)famous issues with bad Sheeva power supplies and overheating on the GuruPlugs, I definitely suspect dodgy quality control on the hardware. Mine hasn't segfaulted in a while, so I am keeping my fingers crossed.

One thing occurs to me: I think my segfaults all happened while I was using the debug USB-serial console. Can anyone else correlate segfaults (or kernel crashes) with the serial console being in use?

Here is the link to the RAM test utility I used: http://pyropus.ca/software/memtester/
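To illustrate what testing RAM from within a booted Linux system involves, here is a minimal C sketch, assuming a Linux/POSIX system. It is not memtester's code, and the function names are hypothetical. It allocates a buffer, tries to pin it in physical RAM with mlock() so the pages are not swapped out mid-test, and runs a single walking-ones pass over it. As the post says, a userspace test like this can only exercise memory the kernel hands to the process.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper. Walking-ones: set exactly one bit per pass,
 * so a data bit stuck at 0 or coupled to a neighbour reads back wrong. */
static size_t walking_ones(volatile uint32_t *buf, size_t words)
{
    size_t errors = 0;
    assert(buf != NULL || words == 0);
    for (unsigned bit = 0; bit < 32; bit++) {
        uint32_t v = UINT32_C(1) << bit;
        for (size_t i = 0; i < words; i++)
            buf[i] = v;
        for (size_t i = 0; i < words; i++)
            if (buf[i] != v)
                errors++;
    }
    return errors;
}

/* Hypothetical entry point: allocate `bytes` of RAM, try to lock it,
 * and run the walking-ones pass over it.
 * Returns the error count, or (size_t)-1 if allocation failed. */
size_t test_user_ram(size_t bytes)
{
    size_t words = bytes / sizeof(uint32_t);
    uint32_t *buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return (size_t)-1;

    /* mlock() can fail without privileges or with a small
     * RLIMIT_MEMLOCK; the test still runs, but pages may be swapped. */
    int locked = mlock(buf, words * sizeof *buf) == 0;
    if (!locked)
        perror("mlock");

    size_t errors = walking_ones(buf, words);

    if (locked)
        munlock(buf, words * sizeof *buf);
    free(buf);
    return errors;
}
```

Calling test_user_ram(64u << 20), for example, would exercise 64 MB. memtester itself runs walking ones alongside many other patterns (random values, checkerboard, bit flips and so on), so it remains the better tool in practice.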

tylernt
Jr. Member
Karma: 2
Posts: 56
« Reply #7 on: August 08, 2010, 05:39:40 PM »

Follow-up to my own post: I think the segfaults are heat-related. I generally get them (or worse, a kernel oops requiring a reboot) when I'm driving the CPU to 100% compiling software etc. in the evening, when it's hottest (no A/C). Serial console or no serial console doesn't matter.

Yesterday I pointed an 80mm fan at my Sheeva and was able to compile MythTV for a couple of hours with no segfaults. Everyone talks about the GuruPlug heat problems, but apparently the Sheeva can have problems too at 84°F ambient.

I have an external power supply on the way; hopefully it will help keep some heat out of the enclosure.
