fw_printenv offsets
e-squizo:
Again, trying to figure out the layout by squinting at a hex dump of the OOB data that is present every 2048 bytes, it looks to me like it contains:
24 bytes set to 0xff, regardless of the payload values;
40 bytes of ECC data, apparently structured in four 10-byte blocks, each corresponding to a 512-byte data block. The ECC for 512 contiguous 0x00 bytes seems to be all zeros, the ECC for all 0xff seems to be all 0xff.
This is consistent with info I gathered by searching the web for "nand OOB layout 4-bit ECC" (and about a dozen other combinations of search terms), although it would seem to me that those 24 initial bytes set to 0xff ought to be put to some use, but maybe they are only used if the sector is bad or something.
By the way, only data in /dev/mtd0 seems to have this layout: /dev/mtd1's and /dev/mtd2's OOB data seems to have an entirely different layout, which is consistent with the theory that /dev/mtd0 uses a different ECC than the rest.
Now, all we need is to identify the ECC and find an algorithm to implement it. Of course, the code to do this ought to be available inside u-boot, since it can read/write the environment, but I haven't been able to find where in u-boot this computation is done :-(
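As a rough illustration of the layout guessed above, here is a minimal Python sketch that splits one 64-byte OOB area into the 24 leading bytes and the four 10-byte ECC blocks. The sizes come straight from the hex-dump observations in this post; the function and constant names are made up for the example, and nothing here is confirmed against the U-Boot sources.

```python
# Sizes as observed in the hex dump (assumptions, not taken from U-Boot code).
OOB_SIZE = 64        # spare bytes following each 2048-byte page
PAD_SIZE = 24        # leading bytes that always read 0xff
ECC_BLOCK_SIZE = 10  # ECC bytes per 512-byte data block
ECC_BLOCKS = 4       # 4 x 512 bytes = one 2048-byte page


def split_oob(oob: bytes):
    """Split one OOB area into (padding, [ecc0, ecc1, ecc2, ecc3])."""
    if len(oob) != OOB_SIZE:
        raise ValueError(f"expected {OOB_SIZE} OOB bytes, got {len(oob)}")
    padding = oob[:PAD_SIZE]
    ecc = [
        oob[PAD_SIZE + i * ECC_BLOCK_SIZE:PAD_SIZE + (i + 1) * ECC_BLOCK_SIZE]
        for i in range(ECC_BLOCKS)
    ]
    return padding, ecc
```

If the guess is right, the padding should come back as 24 bytes of 0xff for every page in /dev/mtd0, and ecc[i] should cover bytes 512*i to 512*i+511 of that page.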
pingtoo:
Quote from: e-squizo on March 15, 2010, 07:26:30 AM
...
Now, all we need is to identify the ECC and find an algorithm to implement it. Of course, the code to do this ought to be available inside u-boot, since it can read/write the environment, but I haven't been able to find where in u-boot this computation is done :-(
I wonder where I got the impression it was using CRC7 :-[
Anyway, I think I figured out why fw_setenv/fw_printenv are not working on the MTD device from Linux: the utilities read and write the MTD device as a raw stream without considering the OOB blocks, so the end result will always have a CRC error because of the extra OOB data in the mix.
I propose we skip finding the ECC calculation routines and instead let the existing utilities do the work. We can use a program from mtd-utils to dump the content of mtd0, then use the fw_setenv supplied with sheeva-installer to modify the dump, then use nandwrite to put the modified dump back to mtd0.
I am currently at the office and do not have access to the plug, but I envision we could use the following code:
Code:
linux root# nanddump -o -b -s 0xa0000 -l 0x20000 -f /tmp/uboot-env.bin /dev/mtd0
linux root# cd /tmp
linux root# fw_setenv env_var_name=new_value #<= fw_setenv from sheeva-installer
linux root# nandwrite -a -p -s 0xa0000 /dev/mtd0 /tmp/uboot-env.bin
The above code may contain errors. I am not sure whether the mtd-utils tools accept hexadecimal arguments. I also suspect nandwrite may have a different idea of how to fill in the OOB area, which may need some patches.
e-squizo:
Quote from: pingtoo on March 15, 2010, 09:03:40 AM
Anyway, I think I figured out why fw_setenv/fw_printenv are not working on the MTD device from Linux: the utilities read and write the MTD device as a raw stream without considering the OOB blocks, so the end result will always have a CRC error because of the extra OOB data in the mix.
No, that is definitely not the problem. fw_{print,set}env are not reading /dev/mtd0 in raw mode, and they don't get to see the OOB data. This can be seen both in the data that they read, and in the fact that whenever you use them to access the NAND, the operating system complains about unrecoverable errors once for each 2KB block (you can see the messages on the console, or in dmesg).
This is an indication of the real problem: those programs attempt to read the data in normal mode, so the kernel tries to use the ECCs it expects in the OOB data to correct any errors. But the kernel expects a different ECC than the one that is actually stored in the OOB data, and so it detects an unrecoverable error, but not before it has attempted a correction or two, so the data that the programs get is actually different from the data that is stored, hence the CRC mismatch.
All of this isn't guessing: I have actually compared the data that fw_printenv gets with the data read by nanddump -s 0xa0000 -n -o /dev/mtd0, and they are different.
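For anyone who wants to repeat the comparison, a minimal Python sketch along these lines would show which 2KB pages differ between two captures of the same region. It assumes both captures have already been saved as payload-only files of equal length; how the "normal mode" capture is obtained is not covered here, and the function name is made up for the example.

```python
PAGE_SIZE = 2048  # NAND page size mentioned in this thread


def differing_pages(a: bytes, b: bytes, page_size: int = PAGE_SIZE) -> dict:
    """Return {page_index: [offsets inside that page where a and b differ]}."""
    if len(a) != len(b):
        raise ValueError("captures must have the same length")
    diffs = {}
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # Group each mismatch under the page it falls in.
            diffs.setdefault(offset // page_size, []).append(offset % page_size)
    return diffs
```

An empty result would mean the two reads agree; a handful of differing bytes per page would fit the "attempted a correction or two" explanation above.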
Quote
I propose we skip finding the ECC calculation routines and instead let the existing utilities do the work. We can use a program from mtd-utils to dump the content of mtd0, then use the fw_setenv supplied with sheeva-installer to modify the dump, then use nandwrite to put the modified dump back to mtd0.
This is the course of action I am pursuing at the moment, but I am afraid that it won't be so simple. Here's what works, and what doesn't.
To read the environment data, we can simply do nanddump -s 0xa0000 -n -o -l 0x20000 to dump it to stdout. This is the Really Simple Approach, because it doesn't take into account the possibility that the NAND may indeed have flipped a bit, in which case we are in deep doo-doo. The better solution would be to use nanddump -s 0xa0000 -n -l 0x20000 to get the data with OOB data inline, and feed that to a filter that interprets the ECC, corrects any errors it finds, and maybe even checks that the CRC is kosher.
Either use dd, tr and sed to convert the output of the last command into a text file that can be edited with your favourite editor, or let the filter described in the previous point do this job.
Once the file looks the way you want it, re-format it into a 128KB file, with the CRC32 in big-endian order in the first 4 bytes, the environment immediately afterwards, and padding with 0x00, and write that using nandwrite. But this won't work if we attempt the simplistic approach of just doing nandwrite -s 0xa0000.
The problem with the last point is that if we attempt to write the data like that, nandwrite will write the data with a 1-bit ECC, while u-boot expects 4-bit ECC... we have the opposite of the problem we had at the beginning: we can read the data, but U-Boot can't. We could use nandwrite -n to write the data without ECC, but that will leave the NAND in an inconsistent state, which u-boot will not be able to read either.
The only clean way I see out of this is to compute the ECC ourselves in user space, and use nandwrite -o to write both the payload and the precomputed OOB data to the NAND. The program that does this could also do the reformatting, padding and CRC computation described in the last step.
This would make everybody happy. The only thing standing between me and working code is that I still don't know how the ECC is computed (and no, the functions in drivers/mtd/nand/nand_ecc.c are no help: they seem to be doing 1-bit ECC).
If we figure that out, we're there.
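The reformatting, padding and CRC part of that program is the easy bit, and can be sketched in Python as below. This is only a sketch under assumptions: the 128KB size, the 0x00 padding and the big-endian CRC32 in the first 4 bytes are as described above (the byte order is left as a parameter in case it turns out otherwise), the variables are stored as NUL-terminated name=value strings with an extra NUL at the end, and the CRC is taken over everything after the first 4 bytes. It does not compute any ECC or OOB data, and the function name is made up for the example.

```python
import struct
import zlib

ENV_SIZE = 0x20000  # 128KB environment area, as used in this thread


def build_env_image(variables: dict, env_size: int = ENV_SIZE,
                    big_endian_crc: bool = True) -> bytes:
    """Build a CRC32 + environment + 0x00 padding image of env_size bytes."""
    # Each variable is "name=value" followed by NUL; an extra NUL ends the list.
    body = b"".join(
        f"{name}={value}".encode("ascii") + b"\0"
        for name, value in variables.items()
    ) + b"\0"
    data_size = env_size - 4  # room left after the 4-byte CRC
    if len(body) > data_size:
        raise ValueError("environment does not fit in the image")
    data = body.ljust(data_size, b"\0")  # pad with 0x00 up to the full size
    crc = zlib.crc32(data) & 0xFFFFFFFF  # assumed: CRC covers the whole data area
    fmt = ">I" if big_endian_crc else "<I"
    return struct.pack(fmt, crc) + data
```

For example, build_env_image({"bootdelay": "3"}) returns a 128KB bytes object that could be saved to a file; writing it to the NAND in a form U-Boot accepts still needs the ECC problem solved.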
pingtoo:
Quote from: e-squizo on March 15, 2010, 09:53:25 AM
...
No, that is definitely not the problem. fw_{print,set}env are not reading /dev/mtd0 in raw mode, and they don't get to see the OOB data. This can be seen both in the data that they read, and in the fact that whenever you use them to access the NAND, the operating system complains about unrecoverable errors once for each 2KB block (you can see the messages on the console, or in dmesg).
I think this may be a misunderstanding or wrong terminology. fw_{print,set}env use the read/write system calls to do their I/O, and I see no evidence of them skipping the OOB blocks, which is why I think that is the problem.
Since /dev/mtd0 is a block device (I think), maybe we can still do my suggestion with /dev/mtdchar0?
Quote from: e-squizo on March 15, 2010, 09:53:25 AM
...
This would make everybody happy. The only thing standing between me and working code is that I still don't know how the ECC is computed (and no, the functions in drivers/mtd/nand/nand_ecc.c are no help: they seem to be doing 1-bit ECC).
If we figure that out, we're there.
Do you have the u-boot source from the CD, or did you download it from the wiki? I see there is a drivers/mtd/nand/nand_ecc_rs.c; maybe that is where the 4-bit calculation code is. I read the file but it is beyond what my little head can understand ;D
e-squizo:
Quote from: pingtoo on March 15, 2010, 10:29:05 AM
I think this may be a misunderstanding or wrong terminology. fw_{print,set}env use the read/write system calls to do their I/O, and I see no evidence of them skipping the OOB blocks, which is why I think that is the problem.
That is why it's called "out-of-band" data: read/write system calls by themselves will skip the OOB data. If you want to read or write OOB data, you must do it through the MEMREADOOB and MEMWRITEOOB ioctls, respectively.
Quote
Since /dev/mtd0 is a block device (I think), maybe we can still do my suggestion with /dev/mtdchar0?
No, you are thinking of /dev/mtdblock0. /dev/mtd0 is a character device.
Quote
Do you have the u-boot source from the CD, or did you download it from the wiki? I see there is a drivers/mtd/nand/nand_ecc_rs.c; maybe that is where the 4-bit calculation code is. I read the file but it is beyond what my little head can understand ;D
I had got mine from denx's git repository. I have now downloaded the sources from Marvell, and indeed, there it is! It's an implementation of Reed-Solomon, a popular ECC, which fits the interpretation of the data. I'll be working on the filters, and I'm confident that I'll have a solution available soon!