NAND technical review

NAND technical review

Jonathan Larmour-2
As per my ecos-discuss mail just now, I would like to get going straight
away with a public discussion of the _technical_ merits of both NAND
implementations. There is a risk of rehashing old ground, but I'm sure in
both cases things have moved on a bit since the last time round, not least
in response to comments, so it would also be good to clarify the current
state.

I think at first the ball is really in Ross/eCosCentric's court to give
the technical rationale for the decision, so I'd like to ask him first to
give his rationale and his own perspective of the comparison of the
pros/cons. I think the primary onus of the legwork is on eCosCentric, not
least because they saw Rutger's version before implementation - although
that was an early version, so it's entirely possible things have changed
now. Obviously I would especially like Rutger's view on whether any
purported benefits of eCosCentric's implementation are really the case,
and any claimed disadvantages of his own are plausible. I suspect some of
this will come down to subjective opinion, of course.

But this is an open discussion, so I'd appreciate anyone's views. I'd
especially value Simon Kallweit's views as someone who has actually used
both code implementations which gives him a very good perspective.
Although if anyone wants to contribute, please keep it on topic, within
this thread, and technical.

Thanks. Over to Ross....

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

Re: NAND technical review

Ross Younger-3
Jonathan Larmour wrote:
> I think at first the ball is really in Ross/eCosCentric's court to give
> the technical rationale for the decision, so I'd like to ask him first
> to give his rationale and his own perspective of the comparison of the
> pros/cons.

Here goes with a comparison between the two in something close to their
current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).
For brevity, I will refer to the two layers as "E" (eCosCentric) and "R"
(Rutger) from time to time.

Note that this is only really a comparison of the two NAND layers. I have
not attempted to compare the two YAFFS porting layers, though I do mention
them in a couple of places where it seemed relevant.

BTW: I will be off-net tomorrow and all next week, so please don't think I
am ignoring the discussion...


1. NAND 101 -------------------------------------------------------------

(Those familiar with NAND chips can skip this section, but I appreciate
that not everybody on-list is in the business of writing NAND device
drivers :-) )

(i) Conceptual

A chip comprises a number of blocks (a round power of two).

Each block comprises a number of pages (another power of two).

Each page has a "main" data area (512 or 2048 bytes on current devices) and
a "spare" - aka out-of-band or OOB - area (16 or 64 bytes respectively).
It's up to the driver and application to decide how they will use the spare
area, but it's usual for some of it to be given over to storing ECC data,
and there is space for a factory-bad marker (see below).

Programming the chip must be performed a page at a time (sometimes a 512
byte subpage).

Erasing must be performed a whole block at a time.

By way of illustration, in the chip spec sheet I have to hand (Samsung
K9F1G08 series):
* 1 page = 2k byte + 64 spare
* 1 block = 64 pages
* The whole chip has 1024 blocks, making for 128MB (1Gbit) of data and 4MB
(32Mbit) of spare area.
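
To make the arithmetic concrete, here is a minimal sketch in C that derives the capacity figures from the geometry. The struct and function names are invented for illustration and do not come from either NAND layer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry descriptor; field names are illustrative only. */
struct nand_geometry {
    uint32_t page_bytes;       /* main data area per page */
    uint32_t spare_bytes;      /* spare (OOB) area per page */
    uint32_t pages_per_block;  /* pages erased together */
    uint32_t blocks;           /* blocks on the chip */
};

/* Total main-area capacity of the chip, in bytes. */
uint64_t nand_data_bytes(const struct nand_geometry *g)
{
    assert(g != 0);
    return (uint64_t)g->page_bytes * g->pages_per_block * g->blocks;
}

/* Total spare-area capacity of the chip, in bytes. */
uint64_t nand_spare_bytes(const struct nand_geometry *g)
{
    assert(g != 0);
    return (uint64_t)g->spare_bytes * g->pages_per_block * g->blocks;
}

/* Erase granularity: main-area bytes covered by one block erase. */
uint32_t nand_block_bytes(const struct nand_geometry *g)
{
    assert(g != 0);
    return g->page_bytes * g->pages_per_block;
}

/* The figures quoted above for the Samsung K9F1G08 series. */
const struct nand_geometry k9f1g08 = { 2048, 64, 64, 1024 };
```

With these numbers a single block erase covers 128KB of main data, which is why erase is the coarsest operation an application has to plan around.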


Now, I mentioned ECC data. NAND technology has a number of underlying
limitations, importantly that it has reliability issues. I don't have a full
picture - the manufacturers seem to be understandably coy - but my
understanding is that on each page, a driver ought to be able to cope with a
single bit having flipped either on programming or on reading. The
recommended way to achieve this is by storing an ECC in the spare area: the
algorithm published by Samsung is popular, requiring 22 bits of ECC per 256
bytes of data and able to correct a 1 bit error and detect a 2 bit error.
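
As a rough illustration of how 22 bits can locate one flipped bit among 2048, here is a sketch of a Hamming-style parity code in C. It keeps two parity bits for each of the 11 bit-address lines, one over the positions where that address bit is set and one over those where it is clear. The bit packing is my own assumption for readability and is not claimed to match the layout published by Samsung or used by either NAND layer.

```c
#include <assert.h>
#include <stdint.h>

#define ECC_DATA_BYTES 256u
#define ECC_ADDR_BITS  11u      /* 256 bytes * 8 = 2048 = 2^11 bit positions */
#define ECC_ADDR_MASK  0x7FFu

/* Compute a 22-bit code over 256 bytes.
 * Bits 0..10:  parity of data bits whose position has address bit i set.
 * Bits 11..21: parity of data bits whose position has address bit i clear. */
uint32_t ecc_calc(const uint8_t *data)
{
    uint32_t set = 0, clr = 0;

    assert(data != 0);
    for (uint32_t pos = 0; pos < ECC_DATA_BYTES * 8u; pos++) {
        if ((data[pos >> 3] >> (pos & 7u)) & 1u) {
            set ^= pos;                    /* toggles parity i for each set bit of pos */
            clr ^= ~pos & ECC_ADDR_MASK;   /* toggles parity i for each clear bit of pos */
        }
    }
    return set | (clr << ECC_ADDR_BITS);
}

/* Compare stored and freshly computed codes and repair the data if possible.
 * Returns 0: no error; 1: single data bit corrected;
 *         2: single bit wrong in the stored code, data untouched;
 *        -1: uncorrectable (for example, two data bits flipped). */
int ecc_correct(uint8_t *data, uint32_t stored, uint32_t computed)
{
    uint32_t diff = (stored ^ computed) & 0x3FFFFFu;
    uint32_t set = diff & ECC_ADDR_MASK;
    uint32_t clr = diff >> ECC_ADDR_BITS;

    if (diff == 0)
        return 0;
    if ((set ^ clr) == ECC_ADDR_MASK) {
        /* Exactly one parity of every pair disagrees: 'set' is the bit position. */
        data[set >> 3] ^= (uint8_t)(1u << (set & 7u));
        return 1;
    }
    if ((diff & (diff - 1u)) == 0)
        return 2;
    return -1;
}
```

A double flip at positions p and q toggles both halves by p^q, so the pairs agree with each other and the error is detected but not misattributed to a single bit.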

There is also the question of bad blocks. Again, full details are sketchy. A
chip may be shipped with a number of "factory-bad" blocks (e.g. up to 20 on
this Samsung chip); they are marked as such in their spare area. (What
constitutes a "bad" block is not published; one imagines that the factory
have access to more test information than users do and that there may be
statistical techniques involved in judging the likely reliability of the
block.) Blocks may also fail during the life of the device, usually by the
chip reporting a failure during a program or erase operation. Because of
this, the manufacturers recommend that chip drivers scan the device for
factory-bad markers then create and maintain a Bad Block Table throughout
the life of the device. How this is done is not prescribed, but the
behaviour of the Linux MTD layer is something approximating a de facto standard.
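
The initial scan can be sketched as follows. This is an in-RAM bitmap only, far simpler than a persistent table, and it assumes the convention I believe large-page Samsung parts use (a non-0xFF first spare byte in the first or second page of a block marks it factory-bad). The callback type, the names and the demo chip are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 1024u

/* Hypothetical hook: return the first spare byte of 'page' within 'block'. */
typedef uint8_t (*read_marker_fn)(uint32_t block, uint32_t page);

/* One bit per block: 1 = bad. */
struct bbt {
    uint8_t bits[NUM_BLOCKS / 8u];
};

void bbt_mark_bad(struct bbt *t, uint32_t block)
{
    assert(block < NUM_BLOCKS);
    t->bits[block >> 3] |= (uint8_t)(1u << (block & 7u));
}

int bbt_is_bad(const struct bbt *t, uint32_t block)
{
    assert(block < NUM_BLOCKS);
    return (t->bits[block >> 3] >> (block & 7u)) & 1u;
}

/* Scan every block for a factory-bad marker; returns the number found. */
unsigned bbt_scan(struct bbt *t, read_marker_fn rd)
{
    unsigned bad = 0;

    memset(t->bits, 0, sizeof t->bits);
    for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
        if (rd(b, 0) != 0xFF || rd(b, 1) != 0xFF) {
            bbt_mark_bad(t, b);
            bad++;
        }
    }
    return bad;
}

/* Stand-in for real hardware: pretends blocks 7 and 300 shipped bad. */
uint8_t demo_read_marker(uint32_t block, uint32_t page)
{
    (void)page;
    return (block == 7 || block == 300) ? 0x00 : 0xFF;
}
```

Blocks that fail later in life would be added with the same bbt_mark_bad() call when a program or erase reports failure.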


(ii) Chip comms protocol

Getting data into and out of the chip involves a simple protocol sequence.

Commands are single bytes; addresses are sequences of a few bytes depending
on the chip size and the operation invoked.

For example, the sequence to read a page of data, per the spec sheet I have
to hand, is:
* Write 0x00 into the command latch
* Write the four address bytes in turn into the address latch
* Write 0x30 into the command latch
* Chip signals Busy; wait for it to signal Ready
* Read out (up to) 2112 bytes of data.
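
The steps above can be sketched in C as below. The bus hooks are hypothetical stand-ins for whatever the platform provides, and the address packing (two column bytes followed by two row bytes) is my reading of how a 1Gbit large-page part lays out its four address cycles; other chips will differ. A small recording bus is included so the sequence can be checked on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level access hooks supplied by the platform. */
struct nand_bus {
    void    (*write_cmd)(uint8_t cmd);    /* byte into the command latch */
    void    (*write_addr)(uint8_t byte);  /* byte into the address latch */
    void    (*wait_ready)(void);          /* block until the chip is Ready */
    uint8_t (*read_data)(void);           /* one byte from the data bus */
};

/* Split a page number and column offset into four address cycles. */
void nand_pack_addr(uint32_t page, uint16_t column, uint8_t out[4])
{
    out[0] = (uint8_t)(column & 0xFFu);
    out[1] = (uint8_t)((column >> 8) & 0x0Fu);
    out[2] = (uint8_t)(page & 0xFFu);
    out[3] = (uint8_t)((page >> 8) & 0xFFu);
}

/* Read 'len' bytes (up to 2112) from the start of 'page'. */
void nand_read_page(const struct nand_bus *bus, uint32_t page,
                    uint8_t *buf, size_t len)
{
    uint8_t addr[4];

    assert(bus != 0 && buf != 0 && len <= 2112u);
    nand_pack_addr(page, 0, addr);
    bus->write_cmd(0x00);
    for (size_t i = 0; i < 4; i++)
        bus->write_addr(addr[i]);
    bus->write_cmd(0x30);
    bus->wait_ready();
    for (size_t i = 0; i < len; i++)
        buf[i] = bus->read_data();
}

/* Host-side stand-in: records latch writes and returns a counting pattern. */
uint8_t trace[16];
size_t  trace_len;
uint8_t fake_next;

void fake_latch(uint8_t b) { if (trace_len < sizeof trace) trace[trace_len++] = b; }
void fake_wait(void) { }
uint8_t fake_read(void) { return fake_next++; }

const struct nand_bus fake_bus = { fake_latch, fake_latch, fake_wait, fake_read };
```

On a board with a memory controller, the three latch hooks often reduce to single MMIO writes, as discussed under board hook-up below.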

However, not all chips are quite the same. The ONFI initiative is an attempt
to standardise chip protocols and most new chips should comply with it. A
number of chips on the market are _nearly_ ONFI-compliant: deviations
typically occur over the format of the ReadID response and that of an
address. I believe that older chips did their own thing entirely.


(iii) Electrical

Most, if not all, NAND chips have the same broad electrical interface.

There is a master Chip Enable line; nothing happens if this is not active.

Data flows into and out of the chip via its data bus, which is 8 or 16 bits
wide, mediated by Read Enable and Write Enable lines.

Commands and addresses are sent on the data bus, but routed to the
appropriate latches by asserting the Address Latch Enable or Command Latch
Enable lines at the same time.

There is also a ready/busy line which the driver can use to tell when an
operation is in progress. Typical operation times from the Samsung spec
sheet I have to hand are 25us for a page read, 300us for a page program, and
2ms for a block erase.


(iv) Board hook-up

What's more interesting is how the lines are hooked up to the board.

It is quite commonplace for a board based on a SoC to make good use of an
onboard memory controller or dedicated NAND controller. This allows the
controller to be programmed with the electrical profile the chip expects,
which makes life easy for the device driver: often, you just have to write
bytes to the relevant MMIO register address as fast as you wish and the
controller takes care of the rest.

If the NAND lines are connected to the CPU only as GPIO, the driver has a
lot of work to do in conforming to the correct signal profile at every step
of the chip protocol. (I haven't had to produce such a port, and I don't
think Rutger has needed one either, though he has produced an untested
example driver.)

In the case of a dedicated NAND controller, it is common to provide
hardware-assistance for ECC calculation. Where available, this provides a
significant speed-up (about 40% per page in my benchmarking).

Sometimes the ready/busy line isn't wired in or requires a jumper to be set
to route it. This can be worked around: for a read operation, one can just
insert a delay loop for the prescribed maximum time, while for programs and
erases, most (all?) chips have a "Read Status" command which can be used to
query whether the operation has completed.
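
A minimal sketch of the Read Status workaround, assuming the conventional command byte (0x70) and status bits (bit 6 set when ready, bit 0 set on failure); check these against the relevant spec sheet. The port hooks and the fake chip are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NAND_CMD_READ_STATUS 0x70u
#define NAND_STATUS_FAIL     0x01u   /* last program/erase failed */
#define NAND_STATUS_READY    0x40u   /* chip is no longer busy */

/* Hypothetical access hooks for the command latch and data bus. */
struct status_port {
    void    (*write_cmd)(uint8_t cmd);
    uint8_t (*read_data)(void);
};

/* Poll the status register after a program or erase.
 * Returns 0 on success, -1 if the chip reports failure,
 * -2 if it is still busy after max_polls reads. */
int nand_wait_status(const struct status_port *p, unsigned max_polls)
{
    assert(p != 0);
    p->write_cmd(NAND_CMD_READ_STATUS);
    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t st = p->read_data();
        if (st & NAND_STATUS_READY)
            return (st & NAND_STATUS_FAIL) ? -1 : 0;
    }
    return -2;
}

/* Host-side stand-in: reports busy for a while, then a final status. */
unsigned fake_busy_polls;
uint8_t  fake_final_status;

void fake_write_cmd(uint8_t cmd) { (void)cmd; }

uint8_t fake_read_status(void)
{
    if (fake_busy_polls > 0) {
        fake_busy_polls--;
        return 0x00;
    }
    return fake_final_status;
}

const struct status_port fake_port = { fake_write_cmd, fake_read_status };
```

A failure result here is the usual trigger for marking the block bad in the bad block table.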

It can be beneficial to be able to set up the ready/busy line as an
interrupt source, as opposed to having to poll it. Whilst there is an
overhead involved in context-switching, if other application threads have
much to do it may be advantageous overall for the thread waiting for the
NAND to sleep until woken by interrupt.

Of course, it is possible to put multiple chips on a board. In that case
there needs to be a way to route between them; I would expect this to be
done with the Chip Select line, addressed either by different MMIO addresses
or a separate GPIO or CPLD step. Theoretically, multiple chips could be
hooked up in parallel to give something that looks like a 16 or 32-bit
"wide" chip, but I have never encountered this in the NAND world, and it
would impose a certain extra level of complexity on the driver.


2. Application interface -----------------------------------------------

Both layers have broadly similar application interfaces.

In both layers, an application must first use a `lookup' call which provides
a pointer to a device context struct. In Rutger's layer, devices are
identified by device number; in eCosCentric's, by a textual name set in the
board HAL.

Both layers provide a means of finding out about the device. R's provides
a call which returns an info block; E's provides macros which retrieve
information from the device struct (which may also be queried directly).

The basic operations required are reading a page, programming a page and
erasing a block, and both layers provide these.

The page-oriented operations optionally allow read/write of the page spare
area. These operations also automatically calculate and check an ECC, if the
device has been configured to do so. Rutger's layer has an extra hook in
place where an application may explicitly request the use of cached reading
and writing where the device supports this.

Both layers also support the necessary ancillary operations of querying the
status of a block in the bad-block table, and marking a block as bad.


(a) Partitions

E's application interface also provides logic implementing partitions.
That is to say, all access to a NAND array must be via a `partition';
the NAND layer sanity-checks whether the requested flash page or block
address is within the given partition. This is quite a lightweight
layer and hasn't added much overhead of either code footprint or
execution time.
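
The sanity check amounts to a pair of comparisons per call. Here is a sketch, assuming absolute (chip-relative) block and page addresses and an inclusive block range; the struct and function names are invented and are not E's actual API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partition descriptor: an inclusive range of blocks. */
struct nand_partition {
    uint32_t first_block;
    uint32_t last_block;
    uint32_t pages_per_block;
};

/* Non-zero if the chip-relative block number lies within the partition. */
int part_block_ok(const struct nand_partition *p, uint32_t block)
{
    assert(p != 0);
    return block >= p->first_block && block <= p->last_block;
}

/* Non-zero if the chip-relative page number lies within the partition. */
int part_page_ok(const struct nand_partition *p, uint32_t page)
{
    assert(p != 0 && p->pages_per_block != 0);
    return part_block_ok(p, page / p->pages_per_block);
}
```

For a partition covering blocks 16 to 31 of a 64-pages-per-block chip, pages 1024 to 2047 pass and anything outside is rejected before the driver is called.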

The presence of partitions in E's model was controversial, as are its
fine details. Nevertheless, some notion of partitioning turns out to be
essential on some boards. In some recent work for a customer we identified
three separate regions of NAND: somewhere to put the boot loader (primary,
as booted by ROM, and RedBoot), somewhere for the application image itself
(perhaps FIS-like rather than a full filesystem), and a filesystem for the
application to use as it pleases.


R's interface does not have such a facility. It appears that, in the event
that the flash is shared between two or more logical regions, it's up to
higher-level code to be configured with the correct block ranges to use.


(b) Dynamic memory allocation

R's layer mandates the provision of malloc and free, or compatible
functions. These must be provided to the cyg_nand_init() call.

E's doesn't; instead it declares a small number of static buffers.

Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
issue because the memory needs of that layer are well-bounded; I think I
broadly agree, though the situation is not ideal in that it forces somebody
who wants to use a lean, mean eCos configuration to work around it.

Also note that if you're going to run a full file system like YAFFS, you
can't avoid needing malloc, but in an application making simpler use of
NAND, it's an overhead that you may prefer to avoid.


3. Driver model --------------------------------------------------------

The major architectural difference between the two NAND layers is in their
driver models and the degree of abstraction enforced.

In Rutger's layer, controllers and chips are both formally abstracted. The
application talks to the Abstract NAND Chip, which has (hard-coded) the
basic sequences of commands, addresses and data required to talk to a NAND
chip. This layer talks to a controller driver, which provides the nuts and
bolts of reading and writing to the device. The chip driver is also called
by the ANC layer, and provides the really chip-specific parts.

The call flow looks something like this (best viewed in fixed-width font):

Application --(H)-> ANC --(L)-> Controller driver
                       \
                        \-(C)-> Chip driver

H: high-level interface (read page, program page, erase block; chip
(de)selection)
L: low-level interface (read/write commands, addresses, data; query the busy
line)
C: chip-specific details (chip init, parse ReadID, query factory-bad marker)


In eCosCentric's layer, a NAND driver is a single abstraction covering chip
init and querying the factory-bad status as well as the high level functions
(reading a page, etc). It is left to the driver to determine the sequence of
commands to send. How the driver interacts with the device is considered to
be a contract only between the driver and the relevant platform HAL, so is
not formally abstracted by the NAND layer.

E's chip drivers are written as .inl files, intended to be included by the
relevant platform HALs by whichever source file provides the required
low-level functions. The lack of a formal abstraction is an attempt to
provide a leaner and meaner experience at runtime: the low-level functions
can be (and indeed are, so far) provided as static inlines.

The flow looks like this:

Application --(H1)-> NAND layer --(H2)-> NAND driver --(L*)-> Platform HAL

H1: high-level calls (read page, program page, erase block)
H2: high-level calls (as H1, plus device init and query factory-bad marker)
L*: low-level calls, like L above but not formally abstracted
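
To illustrate the difference in shape between the two call flows, here is a sketch in C. Every name in it is invented; neither struct matches the real headers of either layer. In the R-style half the command sequence lives in generic code and the controller supplies only primitives, while in the E-style half the driver owns the whole operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* R-style: generic sequence in the ANC, primitives from the controller. */
struct r_controller {                    /* the "L" interface */
    void (*write_cmd)(uint8_t cmd);
    void (*write_addr)(uint8_t byte);
    void (*wait_ready)(void);
    void (*read_data)(uint8_t *buf, size_t len);
};

struct r_chip {                          /* data a chip driver might supply */
    unsigned row_addr_cycles;
};

int r_anc_read_page(const struct r_controller *ctl, const struct r_chip *chip,
                    uint32_t page, uint8_t *buf, size_t len)
{
    assert(ctl != 0 && chip != 0);
    ctl->write_cmd(0x00);
    ctl->write_addr(0);                  /* column 0, low byte */
    ctl->write_addr(0);                  /* column 0, high byte */
    for (unsigned i = 0; i < chip->row_addr_cycles; i++)
        ctl->write_addr((uint8_t)(page >> (8u * i)));
    ctl->write_cmd(0x30);
    ctl->wait_ready();
    ctl->read_data(buf, len);
    return 0;
}

/* E-style: the driver implements each high-level operation itself. */
struct e_driver {                        /* the "H2" interface */
    int (*devinit)(void);
    int (*read_page)(uint32_t page, uint8_t *buf, size_t len);
    int (*is_factory_bad)(uint32_t block);
};

int e_nand_read_page(const struct e_driver *drv, uint32_t page,
                     uint8_t *buf, size_t len)
{
    assert(drv != 0);
    return drv->read_page(page, buf, len);
}

/* Host-side stand-ins so both paths can be exercised. */
unsigned fake_cmds, fake_addrs;
void fake_cmd(uint8_t c) { (void)c; fake_cmds++; }
void fake_addr(uint8_t b) { (void)b; fake_addrs++; }
void fake_wait(void) { }
void fake_read(uint8_t *buf, size_t len) { memset(buf, 0x5A, len); }
const struct r_controller fake_ctl = { fake_cmd, fake_addr, fake_wait, fake_read };
const struct r_chip fake_chip = { 2 };

int fake_e_init(void) { return 0; }
int fake_e_read(uint32_t page, uint8_t *buf, size_t len)
{
    (void)page;
    memset(buf, 0xA5, len);
    return 0;
}
int fake_e_bad(uint32_t block) { (void)block; return 0; }
const struct e_driver fake_drv = { fake_e_init, fake_e_read, fake_e_bad };
```

The E-style read_page body is free to call static inline platform functions, which is where the expected saving in call overhead comes from.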


The two models have pros and cons in both directions.

- As hinted at above, the static inline model of E's low-level access
functions is expected to turn out to have a lower function call (and,
generally, code size) overhead than R's.

- R's model shares the command sequence logic amongst all chips,
differentiating only between small- and large-page devices. (I do not know
whether this is correct for all current chips, though going forwards it seems
less likely to be an issue as fully-ONFI-compliant chips become the norm.)
If multiple chips of different types are present in a build, E's model
potentially duplicates code (though this could be worked around; also, an
ONFI driver ought to be written).

- A corollary of arguably inconsequential import: R's model forces the synth
driver to emulate an entire NAND chip and its protocol. E's synth doesn't
need to.

- E's high-level driver interface makes it harder to add new functions
later, necessitating a change to that API (H2 above). R's does not; the
requisite logic would only need to be added to the ANC. It is not thought
that more than a handful of such changes will ever be required, and it may be
possible to maintain backwards compatibility. (As a case in point, support
for hardware ECC is currently work-in-progress within eCosCentric, and does
require such a change, but now is not the right time to discuss that.)


It would perhaps be interesting to compare the complexities of drivers for
the two models, but it's not readily apparent how we would do that fairly.

Perhaps porting a driver from one NAND layer to the other would be a useful
exercise, and would also allow us to compare code sizes. Any suggestions or
(he says hopefully) volunteers? I've got a lot on my plate this month...


4. Feature/implementation differences ------------------------------------

(I don't consider these to be significant issues; whilst noteworthy, I don't
think they would take much effort to resolve.)

(a) Documentation

The two layers' documentation differs in depth and layout; these are
difficult for me to compare objectively, and I would suggest that a fresh
pair of eyes compare them.

I can only offer the comment that I documented the E layer bearing in mind
what I considered to be missing from the R layer documentation: it was not
clear how the controller and chip layers inter-related, nor where to start
in creating a driver. (I also had a lot less experience of NAND chips then
than I do now, and what I need to know now is different from what a newbie
would.)

(b) Availability of drivers

R provides support for:
- One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
- One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
presumably only tested on the x8 chip on the BlackFin board?)
- A synthetic controller/chip package
- A template for a GPIO-based controller (untested, intended as an example only)

I seem to remember rumours of the existence of a driver for a further
chip+board combination, but I haven't seen it.

E provides support for:
- Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
complete, based on work by Simon K; some enhancements planned)
- Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
NANDxxxx3A (small page, x8) (based on work by Simon K)
- Synthetic target. This offers more features than R's: bad block injection,
logging, and a GUI interface via the synth I/O auxiliary.
- Further (customer-confidential) board ports.

(c) RedBoot support

E have added some commands for NAND operations and tested on the EA LPC2468
board. (YAFFS support works via the existing RB fileio layer; nothing really
needed to be done.)

(d) Degree of testing

There are presumably differences of coverage here; both E and R assert they
have carried out stress tests. Properly comparing the depth of the two would
be a job for fresh eyes.

E have:
- a handful of unit and functional tests of the NAND layer, and a benchmarker
- a number of YAFFS functional tests, one of which includes benchmarking,
and a further severe YAFFS stress test: these indirectly test the NAND
layer. (The latter has been run under the synth driver with bad-block
injection turned on, and has revealed some subtle bugs which we probably
wouldn't otherwise have caught.)
- the ability to run continual test cycles in their test farm


5. Works in progress -----------------------------------------------------

I can of course only comment on eCosCentric's plans, but the following work
is in the pipeline:

* Expansion of the device interface to better allow efficient hardware ECC
support (in progress)
* Hardware ECC for the STM3210E board driver
* Performance tuning of software ECC and of NAND low-level drivers
* Partition addressing: make addressing relative to the start of the
partition, once and for all
* Simple raw NAND "filesystem" for use by RedBoot (see
http://ecos.sourceware.org/ml/ecos-devel/2009-07/msg00004.html et seq; those
are the latest public mails but not the latest version of my thinking, which
I will update in due course)
* More RedBoot NAND utility commands
* Support for booting Linux off NAND and for sharing a (YAFFS) NAND-resident
filesystem
* Part-page read support (would provide a big speed-up to parts of YAFFS2
inbandTags mode as needed by small-page devices like that on the STM3210E)

--------------------------------------------------------------------------


Ross

--
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com

Re: NAND technical review

Jonathan Larmour-2
Hi Ross,

First thanks very much for all this. Quite a bit to digest but only
because it's extremely useful. Sorry for the number of questions I have -
it's not meant to be inquisitorial, but obviously I need to get to the
bottom of certain issues.

I've added Rutger to the CC as he may be able to comment on some of the
issues I raise.

You can assume tacit acceptance/understanding of whatever I haven't
commented on.

Ross Younger wrote:
> Here goes with a comparison between the two in something close to their
> current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).

FWIW, Rutger is now up to r666.

> However, not all chips are quite the same. The ONFI initiative is an attempt
> to standardise chip protocols and most new chips should comply with it. A
> number of chips on the market are _nearly_ ONFI-compliant: deviations
> typically occur over the format of the ReadID response and that of an
> address. I believe that older chips did their own thing entirely.

Good ONFI support should be the highest priority as that's the way
everything is likely to go, although we do need the others too. OTOH, my
experience of NOR flash chip interfaces is that standard specs are all
well and good, but manufacturers still like to add their own touches. So I
suspect ONFI will probably correspond to a common subset of functionality,
but more would want to be done to improve support for individual chips in
due course.

> It can be beneficial to be able to set up the ready/busy line as an
> interrupt source, as opposed to having to poll it. Whilst there is an
> overhead involved in context-switching, if other application threads have
> much to do it may be advantageous overall for the thread waiting for the
> NAND to sleep until woken by interrupt.

Personally I would expect use as an interrupt line as the main role of the
ready line.

> Of course, it is possible to put multiple chips on a board. In that case
> there needs to be a way to route between them; I would expect this to be
> done with the Chip Select line, addressed either by different MMIO addresses
> or a separate GPIO or CPLD step. Theoretically, multiple chips could be
> hooked up in parallel to give something that looks like a 16 or 32-bit
> "wide" chip, but I have never encountered this in the NAND world, and it
> would impose a certain extra level of complexity on the driver.

Have you found on-chip (SoC's) NAND controllers permit such a
configuration? If not, I would assume that it's not an expected hardware
configuration. Rutger's layer does allow multiple chips per controller,
but AFAICT that's just in the straightforward way.

What problems would you see, if any, using your layer with the same
controller and two completely different chips, of different geometry? Can
you still have a common codebase with other (different) platforms?

Is anyone aware of NAND chips with different sized blocks? Analogous to
bootblocks with NOR (I haven't, but others will undoubtedly have seen more
parts than I). Although it's possible that even if they're not around or
common now, they may be in future. Unfortunately from what I can tell
neither layer would be able to support that directly, although I think it
may be possible for the eCosCentric layer to allow the driver to pretend
there is a different NAND chip. Do you think so too?

> 2. Application interface -----------------------------------------------
>
> Both layers have broadly similar application interfaces.
>
> In both layers, an application must first use a `lookup' call which provides
> a pointer to a device context struct. In Rutger's layer, devices are
> identified by device number; in eCosCentric's, by a textual name set in the
> board HAL.

A device number does seem to be a bit limiting, and less deterministic.
OTOH, a textual name arguably adds a little extra complexity.

I note Rutger's layer needs an explicit init call, whereas yours DTRT
using a constructor, which is good.

> The basic operations required are reading a page, programming a page and
> erasing a block, and both layers provide these.

However I believe Rutger's supports partial page writes (use of 'column'),
whereas I don't believe eCosCentric's does.

> The page-oriented operations optionally allow read/write of the page spare
> area. These operations also automatically calculate and check an ECC, if the
> device has been configured to do so. Rutger's layer has an extra hook in
> place where an application may explicitly request the use of cached reading
> and writing where the device supports this.

That seems like a useful potential optimisation, exploiting underlying
capabilities. Any reason you didn't implement this?

I could also believe that NAND controllers can also optimise by doing
multiple block reads, where this hint would also prove useful.

> Both layers also support the necessary ancillary operations of querying the
> status of a block in the bad-block table, and marking a block as bad.

Does your implementation _require_ a BBT in its current implementation?
For simpler NAND usage, it may be overkill e.g. an application where the
number of rewrites is very small, so the factory bad markers may be
considered sufficient.

> (a) Partitions
[snip]
> R's interface does not have such a facility. It appears that, in the event
> that the flash is shared between two or more logical regions, it's up to
> higher-level code to be configured with the correct block ranges to use.

In yours, the block ranges must be configured in CDL. Is there much
difference? I can see an advantage in writing platform-independent test
programs. But in applications within products possibly less so. Especially
since the flash geometry, including size, can be programmatically queried.

If there was to be a single firmware supporting multiple board
revisions/configurations (as can definitely happen), which could include
different sizes of NAND, I think R's implementation would be able to adapt
better than E's, as the high-level program can divide up the sizes based
on what it sees.
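
As a sketch of what I mean by dividing things up at run time: given the block count reported by the device, fixed-size boot and image regions are carved off the front and the filesystem takes whatever remains. The three regions follow Ross's example; the names and sizes are hypothetical and this is not code from either layer.

```c
#include <assert.h>
#include <stdint.h>

struct region {
    uint32_t first_block;
    uint32_t num_blocks;
};

struct layout {
    struct region boot;    /* boot loader(s) */
    struct region image;   /* application image */
    struct region fs;      /* filesystem: everything left over */
};

/* Build a layout for a chip of 'total_blocks'.
 * Returns 0 on success, -1 if no blocks would remain for the filesystem. */
int layout_for_chip(uint32_t total_blocks, uint32_t boot_blocks,
                    uint32_t image_blocks, struct layout *out)
{
    assert(out != 0);
    if (boot_blocks > total_blocks ||
        image_blocks >= total_blocks - boot_blocks)
        return -1;

    out->boot.first_block  = 0;
    out->boot.num_blocks   = boot_blocks;
    out->image.first_block = boot_blocks;
    out->image.num_blocks  = image_blocks;
    out->fs.first_block    = boot_blocks + image_blocks;
    out->fs.num_blocks     = total_blocks - boot_blocks - image_blocks;
    return 0;
}
```

The same firmware would then give a 1024-block part a 952-block filesystem and a 2048-block part a 1976-block one, with no CDL change.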

> (b) Dynamic memory allocation
>
> R's layer mandates the provision of malloc and free, or compatible
> functions. These must be provided to the cyg_nand_init() call.

That's unfortunate - that limits its use in smaller boot loaders - a key
application.

> E's doesn't; instead it declares a small number of static buffers.

I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are no
other variables. Again I'm thinking of the scenario of single firmware -
different board revs. Can you confirm?

> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
> issue because the memory needs of that layer are well-bounded; I think I
> broadly agree, though the situation is not ideal in that it forces somebody
> who wants to use a lean, mean eCos configuration to work around it.

The overhead of including something like malloc/free in the image may
compare badly with the amount of memory R's needs to allocate in the first
place. I also note that if R's implementation has program verifies enabled
it allocates and frees a page _every_ time. If nothing else this could
lead to heap fragmentation.

OTOH your implementation doesn't support program verifies in the higher
level anyway (I note your code comment about it being unnecessary as the
device should report a successful program - your faith in correct hardware
behaviour is considerable :-) ).

> Also note that if you're going to run a full file system like YAFFS, you
> can't avoid needing malloc, but in an application making simpler use of
> NAND, it's an overhead that you may prefer to avoid.

It's true that YAFFS is likely to be the most common application though.

> 3. Driver model --------------------------------------------------------
>
[snip]
>
> In eCosCentric's layer, a NAND driver is a single abstraction covering chip
> init and querying the factory-bad status as well as the high level functions
> (reading a page, etc). It is left to the driver to determine the sequence of
> commands to send. How the driver interacts with the device is considered to
> be a contract only between the driver and the relevant platform HAL, so is
> not formally abstracted by the NAND layer.

Indeed it's not dissimilar to the existing NOR flash layer.

> - R's model shares the command sequence logic amongst all chips,
> differentiating only between small- and large-page devices. (I do not know
> whether this is correct for all current chips, though going forwards it seems
> less likely to be an issue as fully-ONFI-compliant chips become the norm.)

Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it may
be too prescriptive to be robustly future-proof.

> If multiple chips of different types are present in a build, E's model
> potentially duplicates code (though this could be worked around; also, an
> ONFI driver ought to be written).

Worked around in a way likely to increase single-device footprint though.
Shame about the lack of ONFI driver, although I guess the parts still
aren't widely used which can't help. The Samsung K9 is close at least.

> - A corollary of arguably inconsequential import: R's model forces the synth
> driver to emulate an entire NAND chip and its protocol. E's synth doesn't
> need to.

One could say that makes it a more realistic emulation. But yes I can see
disadvantages with a somewhat rigid world view. Thinking out loud, I
wonder if Rutger's layer could work with something like Samsung OneNAND.

> - E's high-level driver interface makes it harder to add new functions
> later, necessitating a change to that API (H2 above). R's does not; the
> requisite logic would only need to be added to the ANC. It is not thought
> that more than a handful of such changes will ever be required, and it may be
> possible to maintain backwards compatibility. (As a case in point, support
> for hardware ECC is currently work-in-progress within eCosCentric, and does
> require such a change, but now is not the right time to discuss that.)

In my view allowing hardware ECC support is a vital part of an API. If an
API doesn't permit exploiting hardware ECC that would be quite a negative.
R's does appear to. OTOH I can't imagine it being a difficult thing to add
in yours. In fact, because of the requirement for the drivers to call
CYG_NAND_FUNS, it doesn't seem difficult at all to be backwardly
compatible. Am I right? Nevertheless, it would be unfortunate to have an
API which already needs its low level driver interface updating to a rev 2.

Incidentally I note Rutger has a "Samsung" ECC implementation, whereas you
support Samsung K9 chips, but use the normal ECC algorithm. Did Samsung
change their practice?

> 4. Feature/implementation differences ------------------------------------
>
> (I don't consider these to be significant issues; whilst noteworthy, I don't
> think they would take much effort to resolve.)
>
> (a) Documentation
>
> The two layers' documentation differs in depth and layout; these are
> difficult for me to compare objectively, and I would suggest that a fresh
> pair of eyes compare them.

Your documentation does appear very thorough and well-structured (although
the Samsung and EA LPC2468 docs really should be broken out into their own
packages). Rutger's does also seem fine though so I don't think there's a
strong difference either way.

> I can only offer the comment that I documented the E layer bearing in mind
> what I considered to be missing from the R layer documentation: it was not
> clear how the controller and chip layers inter-related, nor where to start
> in creating a driver. (I also had a lot less experience of NAND chips then
> than I do now, and what I need to know now is different from what a newbie
> would.)

It's possible that those layer interrelations were at the level where
really the code would be the better guide. Although there's always room
for improvement.

That being said, experience shows that the best "documentation" for driver
internals (i.e. beneath the application API) is in fact real concrete
drivers, which brings us to...

> (b) Availability of drivers
>
> R provides support for:
> - One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
> presumably only tested on the x8 chip on the BlackFin board?)
> - A synthetic controller/chip package
> - A template for a GPIO-based controller (untested, intended as an example only)
>
> I seem to remember rumours of the existence of a driver for a further
> chip+board combination, but I haven't seen it.
>
> E provides support for:
> - Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
> complete, based on work by Simon K; some enhancements planned)
> - Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
> NANDxxxx3A (small page, x8) (based on work by Simon K)
> - Synthetic target. This offers more features than R's: bad block injection,
> logging, and a GUI interface via the synth I/O auxiliary.
> - Further (customer-confidential) board ports.

I would certainly appreciate feedback from anyone who has used R's layer.
What you say would seem to imply that both small page and ONFI are
untested in R's layer.

> (c) RedBoot support
>
> E have added some commands for NAND operations and tested on the EA LPC2468
> board. (YAFFS support works via the existing RB fileio layer; nothing really
> needed to be done.)

I think that patch needs some work (I can go into detail if you like), but
its presence is still a positive thing.

> (d) Degree of testing
>
> There are presumably differences of coverage here; both E and R assert they
> have carried out stress tests. Properly comparing the depth of the two would
> be a job for fresh eyes.
>
> E have:
> - a handful of unit and functional tests of the NAND layer, and a benchmarker
> - a number of YAFFS functional tests, one of which includes benchmarking,
> and a further severe YAFFS stress test: these indirectly test the NAND
> layer. (The latter has been run under the synth driver with bad-block
> injection turned on, and has revealed some subtle bugs which we probably
> wouldn't otherwise have caught.)
> - the ability to run continual test cycles in their test farm

Bad block injection sounds like an extremely useful feature. I infer from
the latter that we're now talking about many hours of testing?

I'd need feedback from Rutger as to what level of testing has been done
with his.

> 5. Works in progress -----------------------------------------------------
>
> I can of course only comment on eCosCentric's plans, but the following work
> is in the pipeline:
>
> * Expansion of the device interface to better allow efficient hardware ECC
> support (in progress)

Rough ETA? All I'm interested in knowing is whether the device interface
changes for this are likely to be concluded within the timeframe of this
discussion.

> * Partition addressing: make addressing relative to the start of the
> partition, once and for all

That's quite a major API change, which seems problematic to me.

> * Part-page read support (would provide a big speed-up to parts of YAFFS2
> inbandTags mode as needed by small-page devices like that on the STM3210E)

Do you foresee this happening within any particular timeframe? Do you
expect the changes to be backwardly compatible?

If you got this far, well done! Since you say you'll be away, you may
prefer to reply to this email in sections rather than sucking up your time
and doing it all at once.

Thanks in advance.

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

Re: Re: NAND technical review

Lambrecht Jürgen
In reply to this post by Ross Younger-3
Ross Younger wrote:

> Jonathan Larmour wrote:
>  
>> I think at first the ball is really in Ross/eCosCentric's court to give
>> the technical rationale for the decision, so I'd like to ask him first
>> to give his rationale and his own perspective of the comparison of the
>> pros/cons.
>>    
>
> Here goes with a comparison between the two in something close to their
> current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).
> For brevity, I will refer to the two layers as "E" (eCosCentric) and "R"
> (Rutger) from time to time.
>
> Note that this is only really a comparison of the two NAND layers. I have
> not attempted to compare the two YAFFS porting layers, though I do mention
> them in a couple of places where it seemed relevant.
>
> BTW: I will be off-net tomorrow and all next week, so please don't think I
> am ignoring the discussion...
>  
<snip>

> (a) Partitions
>
> E's application interface also provides logic implementing partitions.
> That is to say, all access to a NAND array must be via a `partition';
> the NAND layer sanity-checks whether the requested flash page or block
> address is within the given partition. This is quite a lightweight
> layer and hasn't added much overhead of either code footprint or
> execution time.
>
> The presence of partitions in E's model was controversial, as are its
> fine details. Nevertheless, some notion of partitioning turns out to be
> essential on some boards. In some recent work for a customer we identified
> three separate regions of NAND: somewhere to put the boot loader (primary,
> as booted by ROM, and RedBoot), somewhere for the application image itself
> (perhaps FIS-like rather than a full filesystem), and a filesystem for the
> application to use as it pleases.
>
>
> R's interface does not have such a facility. It appears that, in the event
> that the flash is shared between two or more logical regions, it's up to
> higher-level code to be configured with the correct block ranges to use.
>
>
> (b) Dynamic memory allocation
>
> R's layer mandates the provision of malloc and free, or compatible
> functions. These must be provided to the cyg_nand_init() call.
>
> E's doesn't; instead it declares a small number of static buffers.
>
> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
> issue because the memory needs of that layer are well-bounded; I think I
> broadly agree, though the situation is not ideal in that it forces somebody
> who wants to use a lean, mean eCos configuration to work around.
>
> Also note that if you're going to run a full file system like YAFFS, you
> can't avoid needing malloc, but in an application making simpler use of
> NAND, it's an overhead that you may prefer to avoid.
>
>
> 3. Driver model --------------------------------------------------------
>
> The major architectural difference between the two NAND layers is in their
> driver models and the degree of abstraction enforced.
>
> In Rutger's layer, controllers and chips are both formally abstracted. The
> application talks to the Abstract NAND Chip, which has (hard-coded) the
> basic sequences of commands, addresses and data required to talk to a NAND
> chip. This layer talks to a controller driver, which provides the nuts and
> bolts of reading and writing to the device. The chip driver is also called
> by the ANC layer, and provides the really chip-specific parts.
>
> The call flow looks something like this (best viewed in fixed-width font):
>
> Application --(H)-> ANC --(L)-> Controller driver
>                        \
>                         \-(C)-> Chip driver
>
> H: high-level interface (read page, program page, erase block; chip
> (de)selection)
> L: low-level interface (read/write commands, addresses, data; query the busy
> line)
> C: chip-specific details (chip init, parse ReadID, query factory-bad marker)
>
>
> In eCosCentric's layer, a NAND driver is a single abstraction covering chip
> init and querying the factory-bad status as well as the high level functions
> (reading a page, etc). It is left to the driver to determine the sequence of
> commands to send. How the driver interacts with the device is considered to
> be a contract only between the driver and the relevant platform HAL, so is
> not formally abstracted by the NAND layer.
>
> E's chip drivers are written as .inl files, intended to be included by the
> relevant platform HALs by whichever source file provides the required
> low-level functions. The lack of a formal abstraction is an attempt to
> provide a leaner and meaner experience at runtime: the low-level functions
> can be (and indeed are, so far) provided as static inlines.
>
> The flow looks like this:
>
> Application --(H1)-> NAND layer --(H2)-> NAND driver --(L*)-> Platform HAL
>
> H1: high-level calls (read page, program page, erase block)
> H2: high-level calls (as H1, plus device init and query factory-bad marker)
> L*: low-level calls, like L above but not formally abstracted
>
>
> The two models have pros and cons in both directions.
>
> - As hinted at above, the static inline model of E's low-level access
> functions is expected to turn out to have a lower function call (and,
> generally, code size) overhead than R's.
>
> - R's model shares the command sequence logic amongst all chips,
> differentiating only between small- and large-page devices. (I do not know
> whether this is correct for all current chips, though going forwards seems
> less likely to be an issue as fully-ONFI-compliant chips become the norm.)
> If multiple chips of different types are present in a build, E's model
> potentially duplicates code (though this could be worked around; also, an
> ONFI driver ought to be written).
>
> - A corollary of arguably inconsequential import: R's model forces the synth
> driver to emulate an entire NAND chip and its protocol. E's synth doesn't
> need to.
>
> - E's high-level driver interface makes it harder to add new functions
> later, necessitating a change to that API (H2 above). R's does not; the
> requisite logic would only need to be added to the ANC. It is not thought
> that more than a handful such changes will ever be required, and it may be
> possible to maintain backwards compatibility. (As a case in point, support
> for hardware ECC is currently work-in-progress within eCosCentric, and does
> require such a change, but now is not the right time to discuss that.)
>
>
>  
Therefore we prefer R's model.

Is it possible that R's model better follows the "general" structure of
drivers in eCos?
I mean: (I follow our CVS, could maybe differ from the final commit of
Rutger to eCos)
1. with the low-level chip-specific code in /devs
(devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and
devs/flash/micron/nand)
2. with the "middleware" in /io (io/flash_nand/current/src and there
/anc, /chip, /controller)
3. with the high-level code in /fs

Is it correct that R's abstraction makes it possible to add partitioning
easily?
(because that is an interesting feature of E's implementation)

We also prefer R's model of course because we started with R's model and
use it now.
> It would perhaps be interesting to compare the complexities of drivers for
> the two models, but it's not readily apparent how we would do that fairly.
>
> Perhaps porting a driver from one NAND layer to the other would be a useful
> exercise, and would also allow us to compare code sizes. Any suggestions or
> (he says hopefully) volunteers? I've got a lot on my plate this month...
>  
Same for us: no time now. Perhaps at the beginning of next year?

>
> 4. Feature/implementation differences ------------------------------------
>
> (I don't consider these to be significant issues; whilst noteworthy, I don't
> think they would take much effort to resolve.)
>
> (a) Documentation
>
> The two layers' documentation differ in their depth and layout; these are
> difficult for me to compare objectively, and I would suggest that a fresh
> pair of eyes compare them.
>
> I can only offer the comment that I documented the E layer bearing in mind
> what I considered to be missing from the R layer documentation: it was not
> clear how the controller and chip layers inter-related, nor where to start
> in creating a driver. (I also had a lot less experience of NAND chips then
> than I do now, and what I need to know now is different from what a newbie
> would.)
>
> (b) Availability of drivers
>
> R provides support for:
> - One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
>  
- Two: also our "automatic announcement" board, which stores MP3s and has
an Atmel ARM9 AT91SAM9260 with 16MB of SDRAM.
> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
> presumably only tested on the x8 chip on the BlackFin board?)
>  
- Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
page size, x8)
Because of this chip, Rutger adapted the hardware ECC controller code,
since our chip uses more bits (for details, ask Stijn or Rutger).

> - A synthetic controller/chip package
> - A template for a GPIO-based controller (untested, intended as an example only)
>
> I seem to remember rumours of the existence of a driver for a further
> chip+board combination, but I haven't seen it.
>
> E provides support for:
> - Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
> complete, based on work by Simon K; some enhancements planned)
> - Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
> NANDxxxx3A (small page, x8) (based on work by Simon K)
> - Synthetic target. This offers more features than R's: bad block injection,
> logging, and a GUI interface via the synth I/O auxiliary.
> - Further (customer-confidential) board ports.
>
> (c) RedBoot support
>
> E have added some commands for NAND operations and tested on the EA LPC2468
> board. (YAFFS support works via the existing RB fileio layer; nothing really
> needed to be done.)
>
> (d) Degree of testing
>
> There are presumably differences of coverage here; both E and R assert they
> have carried out stress tests. Properly comparing the depth of the two would
> be a job for fresh eyes.
>
> E have:
> - a handful of unit and functional tests of the NAND layer, and a benchmarker
> - a number of YAFFS functional tests, one of which includes benchmarking,
> and a further severe YAFFS stress test: these indirectly test the NAND
> layer. (The latter has been run under the synth driver with bad-block
> injection turned on, and has revealed some subtle bugs which we probably
> wouldn't otherwise have caught.)
> - the ability to run continual test cycles in their test farm
>  
We have tested it very thoroughly, including:
- an automatic (continual) NAND flash test in a climate chamber
- stress tests: filling the flash over and over again via FTP (both with
a few big files and with many small files) and checking the remaining heap:
  * Put 25 files of 10,000,000 bytes each on the filesystem
  * Put 2500 files of 100,000 bytes each on the filesystem
  * Put 7000 files of 10,000 bytes each on the filesystem
  Conclusion: storing smaller files needs more heap, but we still have
plenty left with our 16MB
  * Write a bundle of files over and over again on the filesystem. Each
time we put 1000 files of 100,000 bytes on the flash drive.
- use in the final mp3-player application

Kind regards,
Jürgen

>
> 5. Works in progress -----------------------------------------------------
>
> I can of course only comment on eCosCentric's plans, but the following work
> is in the pipeline:
>
> * Expansion of the device interface to better allow efficient hardware ECC
> support (in progress)
> * Hardware ECC for the STM3210E board driver
> * Performance tuning of software ECC and of NAND low-level drivers
> * Partition addressing: make addressing relative to the start of the
> partition, once and for all
> * Simple raw NAND "filesystem" for use by RedBoot (see
> http://ecos.sourceware.org/ml/ecos-devel/2009-07/msg00004.html et seq; those
> are the latest public mails but not the latest version of my thinking, which
> I will update in due course)
> * More RedBoot NAND utility commands
> * Support for booting Linux off NAND and for sharing a (YAFFS) NAND-resident
> filesystem
> * Part-page read support (would provide a big speed-up to parts of YAFFS2
> inbandTags mode as needed by small-page devices like that on the STM3210E)
>
> --------------------------------------------------------------------------
>
>
> Ross
>
> --
> Embedded Software Engineer, eCosCentric Limited.
> Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
> Registered in England no. 4422071.                  www.ecoscentric.com
>  



Re: NAND technical review

Rutger Hofman-2
In reply to this post by Ross Younger-3
Ross Younger wrote:
[snip]

> Getting data into and out of the chip involves a simple protocol sequence.
>
> Commands are single bytes; addresses are sequences of a few bytes depending
> on the chip size and the operation invoked.
>
> For example, to read a page of data on the spec sheet I have to hand is:
> * Write 0x00 into the command latch
> * Write the four address bytes in turn into the address latch
> * Write 0x30 into the command latch
> * Chip signals Busy; wait for it to signal Ready
> * Read out (up to) 2112 bytes of data.

AFAIK, there are two kinds of chips on the market: Large-page chips (2K
data pages) and Small-page chips (512B pages). These speak a different
command language, but in their wiring they are the same. The large-page
chips are (nearly) ONFI-compliant, the Small-page chip command language
is different. Ancient chips aside, if a chip gives its Device Type Byte,
NAND flash code can look up in its tables what the chip parameters are
(page size, block size, number of blocks, 8 or 16 bit data bus, etc).
Miracle: Device Type Bytes are shared across manufacturers, so the table
is limited in size.
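
A minimal sketch of the table lookup described above. The struct layout and function names are hypothetical, and the entries are illustrative only: the ID bytes and geometries should be checked against the relevant datasheets before being relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parameters a NAND layer might derive from the Device Type Byte. */
typedef struct {
    uint8_t  dev_id;           /* device type byte from the ReadID response */
    uint16_t page_size;        /* data bytes per page */
    uint16_t pages_per_block;
    uint16_t blocks;
    uint8_t  bus_width;        /* 8 or 16 */
} demo_chip_info;

/* Illustrative entries only; verify against datasheets. */
static const demo_chip_info demo_chip_table[] = {
    { 0x73,  512, 32, 1024,  8 },   /* small page,  16 MiB */
    { 0x76,  512, 32, 4096,  8 },   /* small page,  64 MiB */
    { 0xF1, 2048, 64, 1024,  8 },   /* large page, 128 MiB */
    { 0xDA, 2048, 64, 2048,  8 },   /* large page, 256 MiB */
    { 0xCA, 2048, 64, 2048, 16 },   /* large page, 256 MiB, x16 */
};

/* Return the table entry for a device type byte, or NULL if unknown. */
static const demo_chip_info *demo_chip_lookup(uint8_t dev_id)
{
    size_t i;
    for (i = 0; i < sizeof demo_chip_table / sizeof demo_chip_table[0]; i++)
        if (demo_chip_table[i].dev_id == dev_id)
            return &demo_chip_table[i];
    return NULL;
}

/* Total data capacity in bytes (spare area excluded). */
static uint32_t demo_chip_capacity(const demo_chip_info *c)
{
    assert(c != NULL);
    return (uint32_t)c->page_size * c->pages_per_block * c->blocks;
}
```

Because the key is the device type byte and not the manufacturer byte, one entry covers compatible parts from several vendors, which is what keeps the table short.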

I saw an announcement of 4K-page chips, but the datasheets are
confidential. Is there anybody who can comment on these?

> However, not all chips are quite the same. The ONFI initiative is an attempt
> to standardise chip protocols and most new chips should comply with it. A
> number of chips on the market are _nearly_ ONFI-compliant: deviations
> typically occur over the format of the ReadID response and that of an
> address. I believe that older chips did their own thing entirely.

[snip]

> 3. Driver model --------------------------------------------------------
>
> The major architectural difference between the two NAND layers is in their
> driver models and the degree of abstraction enforced.
>
> In Rutger's layer, controllers and chips are both formally abstracted. The
> application talks to the Abstract NAND Chip, which has (hard-coded) the
> basic sequences of commands, addresses and data required to talk to a NAND
> chip. This layer talks to a controller driver, which provides the nuts and
> bolts of reading and writing to the device. The chip driver is also called
> by the ANC layer, and provides the really chip-specific parts.
>
> The call flow looks something like this (best viewed in fixed-width font):
>
> Application --(H)-> ANC --(L)-> Controller driver
>                        \
>                         \-(C)-> Chip driver

The code aims for both flexibility and code reuse. Its structure is
as follows:

Application --(H)-> ANC --(H2)-> Controller Common --(L)-> Controller
device-specific --(L)-> Chip

= ANC just wants to hide the presence of multiple controllers and
multiple chips, in any degree of heterogeneity.

= Controller Common implements the command languages for Large-page
chips and Small-page chips, does ECC generation/checking/repair. Its API
is much like the ANC's API: page_read, page_write, block_erase, but on a
specific controller+chip.

= Controller device-specific is (usually) the only part that must be
ported for a new controller/board/setup. Its API is in terms of the
commands described by Ross: push a command on the chip's bus, push/read
data on the chip's bus etc. The sample GPIO driver that I bundled shows
how little work can be involved in doing a port. I think that support
for hardware ECC of some controllers may add more to the device-specific
code than the command implementation!

= Chip has support for ONFI, Large-page, and Small-page. Only chips
that don't fit in these categories (and there will be museums that have
them) require writing a chip driver.
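
The split between shared command logic and device-specific code can be sketched roughly as follows. This is not R's actual interface: the struct, the function names and the mock controller are all hypothetical, and the command bytes are the large-page read sequence quoted earlier in the thread (0x00, address bytes, 0x30, wait, read).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical low-level interface that a controller port would supply. */
typedef struct demo_ctrl {
    void (*write_cmd)(struct demo_ctrl *c, uint8_t cmd);
    void (*write_addr)(struct demo_ctrl *c, uint8_t addr);
    void (*wait_ready)(struct demo_ctrl *c);
    void (*read_data)(struct demo_ctrl *c, uint8_t *buf, size_t len);
    void *priv;                       /* port-specific state */
} demo_ctrl;

/* Shared large-page read: written once, reused by every controller port. */
static void demo_read_page_large(demo_ctrl *c, uint32_t page, int row_cycles,
                                 uint8_t *buf, size_t len)
{
    int i;
    assert(c != NULL && buf != NULL);
    c->write_cmd(c, 0x00);            /* READ, first cycle */
    c->write_addr(c, 0x00);           /* column address, low byte */
    c->write_addr(c, 0x00);           /* column address, high byte */
    for (i = 0; i < row_cycles; i++)  /* row (page) address, LSB first */
        c->write_addr(c, (uint8_t)(page >> (8 * i)));
    c->write_cmd(c, 0x30);            /* READ, second cycle */
    c->wait_ready(c);                 /* chip is busy fetching the page */
    c->read_data(c, buf, len);
}

/* Mock controller: records the order of bus operations for inspection. */
typedef struct { char trace[32]; size_t n; } demo_trace;

static void demo_log(demo_ctrl *c, char tag)
{
    demo_trace *t = (demo_trace *)c->priv;
    if (t->n + 1 < sizeof t->trace)
        t->trace[t->n++] = tag;
}
static void mock_cmd(demo_ctrl *c, uint8_t v)  { (void)v; demo_log(c, 'C'); }
static void mock_addr(demo_ctrl *c, uint8_t v) { (void)v; demo_log(c, 'A'); }
static void mock_wait(demo_ctrl *c)            { demo_log(c, 'W'); }
static void mock_read(demo_ctrl *c, uint8_t *buf, size_t len)
{ memset(buf, 0xFF, len); demo_log(c, 'R'); }

static void demo_mock_init(demo_ctrl *c, demo_trace *t)
{
    memset(t, 0, sizeof *t);
    c->write_cmd = mock_cmd;
    c->write_addr = mock_addr;
    c->wait_ready = mock_wait;
    c->read_data = mock_read;
    c->priv = t;
}
```

A port to new hardware would replace only the four small callbacks; with two row cycles the mock records one command, four address bytes, a second command, the wait and the data read.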

I realize that support for various chip and ECC types increases the
code. It will be trivial to add a few #ifdef's to disable unneeded code
for your configuration; the .cdl can specify what is needed (like: only
large-page 'regular' chips, which means: no small-page, no ONFI
interrogation).

[snip]

> - E's high-level driver interface makes it harder to add new functions
> later, necessitating a change to that API (H2 above). R's does not; the
> requisite logic would only need to be added to the ANC.

'ANC' should read: Controller Common code.

> ... (As a case in point, support
> for hardware ECC is currently work-in-progress within eCosCentric, and does
> require such a change, but now is not the right time to discuss that.)

R has used the BlackFin's on-board hardware ECC from the start. The
interface between Controller Common and the device-specific controller
code is designed to support this flexibly.

> It would perhaps be interesting to compare the complexities of drivers for
> the two models, but it's not readily apparent how we would do that fairly.
>
> Perhaps porting a driver from one NAND layer to the other would be a useful
> exercise, and would also allow us to compare code sizes. Any suggestions or
> (he says hopefully) volunteers? I've got a lot on my plate this month...

Yes, this would definitely be interesting. It would show whether R's
attempts at ease of porting and code reuse bring real benefits.

> (b) Availability of drivers
>
> R provides support for:
> - One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
> presumably only tested on the x8 chip on the BlackFin board?)

Correction: any 'regular' chip, of which the ST Micro is an example.

I tested on my synth target with x8 and x16 chips, also different ones
in one 'board'. I tested with various page sizes, also different ones on
one 'board'.

> - A synthetic controller/chip package
> - A template for a GPIO-based controller (untested, intended as an example only)
>
> I seem to remember rumours of the existence of a driver for a further
> chip+board combination, but I haven't seen it.

See Jurgen Lambrecht's response.

[snip]

> 5. Works in progress -----------------------------------------------------
>
> I can of course only comment on eCosCentric's plans, but the following work
> is in the pipeline:
>
> * Expansion of the device interface to better allow efficient hardware ECC
> support (in progress)
> * Hardware ECC for the STM3210E board driver
> * Performance tuning of software ECC and of NAND low-level drivers
> * Partition addressing: make addressing relative to the start of the
> partition, once and for all
> * Simple raw NAND "filesystem" for use by RedBoot (see
> http://ecos.sourceware.org/ml/ecos-devel/2009-07/msg00004.html et seq; those
> are the latest public mails but not the latest version of my thinking, which
> I will update in due course)
> * More RedBoot NAND utility commands
> * Support for booting Linux off NAND and for sharing a (YAFFS) NAND-resident
> filesystem
> * Part-page read support (would provide a big speed-up to parts of YAFFS2
> inbandTags mode as needed by small-page devices like that on the STM3210E)

R is designed with support for hardware ECC in mind.

R has part-read and part-write support. One thing that has always
puzzled me is how this interacts with ECC. ECC often works on a complete
subpage, like 256 bytes on a 2KB page chip; then I understand. But what
if the read/write is not of such a subpage?
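
One possible answer for the read side, offered as an assumption and not as a description of what either layer does: widen the request to ECC chunk boundaries, so that every chunk the caller touches is read and checked in full. A sketch, with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* A byte range within the data area of one page. */
typedef struct {
    size_t start;
    size_t len;
} demo_span;

/* Round [off, off+len) outwards to whole ECC chunks of 'chunk' bytes. */
static demo_span demo_widen_to_ecc(size_t off, size_t len, size_t chunk)
{
    demo_span s;
    size_t end;

    assert(chunk != 0 && len != 0);
    s.start = (off / chunk) * chunk;                    /* round down */
    end = ((off + len + chunk - 1) / chunk) * chunk;    /* round up   */
    s.len = end - s.start;
    return s;
}
```

With 256-byte chunks, a 10-byte read at offset 300 becomes a read of the single chunk at 256..511, while a 10-byte read at offset 250 straddles a boundary and needs two chunks. Partial writes remain the harder case, since the ECC stored for a chunk has to cover the whole chunk.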

Rutger

Re: NAND technical review

Rutger Hofman-2
In reply to this post by Jonathan Larmour-2
I should have stated this in my first mail...

I am not at all qualified to say anything about E's work, because I
didn't have time to do any kind of review of it. So, I will mainly limit
myself to comments on things that concern R's work, and where I say
anything on E it will be based on E's mails on the list.

Jonathan Larmour wrote:
[snip]
> A device number does seem to be a bit limiting, and less deterministic.
> OTOH, a textual name arguably adds a little extra complexity.

This will be straightforward to change either way.

> I note Rutger's layer needs an explicit init call, whereas yours DTRT using a constructor, which is good.

I followed flash v2 in this. If the experts think a constructor is
better, that's easy to change too.

> Does your implementation _require_ a BBT in its current implementation?
> For simpler NAND usage, it may be overkill e.g. an application where the
> number of rewrites is very small, so the factory bad markers may be
> considered sufficient.

This is a bit hairy in my opinion, and one reason is that there is no
Standard Layout for the spare areas. One case where a BBT is forced: my
BlackFin NFC can be used to boot from NAND, but it enforces a spare
layout that is incompatible with MTD or anybody. It is even incompatible
with most chips' specification that the first byte of spare in the first
page of the block is the Bad Block Marker. BlackFin's boot layout uses
this first byte in a way that suits it, and it may be 0 -- which would
otherwise mean Bad Block.

Also, what to do if a block grows bad during usage, and that block
doesn't allow writing a marker in its spare area? BBT seems a solution.
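
A rough sketch of the idea: an in-RAM bad block table that keeps the state of each block separately from the spare area, so a block can be retired even when its marker byte cannot be written or has been repurposed by a boot layout. The names, the two-bit encoding and the block count are assumptions for illustration, not either layer's format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCKS 1024u   /* assumed chip size in blocks */

enum demo_bb_state {
    DEMO_GOOD        = 0,
    DEMO_FACTORY_BAD = 1,   /* marked bad by the manufacturer */
    DEMO_WORN_BAD    = 2    /* failed an erase or program in use */
};

/* Two bits per block, packed four blocks to a byte. */
typedef struct {
    uint8_t bits[DEMO_BLOCKS / 4];
} demo_bbt;

static void demo_bbt_init(demo_bbt *t)
{
    memset(t->bits, 0, sizeof t->bits);     /* everything starts as good */
}

static void demo_bbt_set(demo_bbt *t, unsigned blk, enum demo_bb_state s)
{
    unsigned shift = (blk % 4) * 2;
    assert(blk < DEMO_BLOCKS);
    t->bits[blk / 4] = (uint8_t)((t->bits[blk / 4] & ~(3u << shift))
                                 | ((unsigned)s << shift));
}

static enum demo_bb_state demo_bbt_get(const demo_bbt *t, unsigned blk)
{
    assert(blk < DEMO_BLOCKS);
    return (enum demo_bb_state)((t->bits[blk / 4] >> ((blk % 4) * 2)) & 3u);
}

/* Conventional factory marker as described above: the first spare byte of
 * the block's first page; anything other than 0xFF means "bad". */
static int demo_factory_marker_bad(const uint8_t *spare)
{
    return spare[0] != 0xFF;
}
```

A real table would also have to be persisted in flash (typically with a mirror copy), which this sketch leaves out.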

>> (b) Dynamic memory allocation
>>
>> R's layer mandates the provision of malloc and free, or compatible
>> functions. These must be provided to the cyg_nand_init() call.
>
> That's unfortunate - that limits its use in smaller boot loaders - a key
> application.

Well, it is certainly possible to calculate statically how much space
R's NAND layer is going to use, to allocate that statically, and write a
tiny function to hand it out piecemeal at the NAND layer's request.
There is no call to free() here except at shutdown, so nothing
malloc-like is necessary. (An exception is in the debug handling, see
below.)
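
The "tiny function" could look something like this sketch: a fixed static pool handed out piecemeal, with a no-op in place of free(). The pool size and all names are hypothetical; the real bound would have to be worked out from the chip and controller configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_POOL_BYTES 4096u   /* assumed upper bound on the layer's needs */

/* The union forces pointer alignment on the pool. */
static union {
    unsigned char bytes[DEMO_POOL_BYTES];
    void *align;
} demo_pool;
static size_t demo_pool_used;

/* malloc-compatible: hands out the next slice of the pool, or NULL. */
static void *demo_nand_alloc(size_t n)
{
    const size_t align = sizeof(void *);
    size_t rounded;
    void *p;

    assert(n != 0);
    if (n > DEMO_POOL_BYTES)
        return NULL;
    rounded = (n + align - 1) / align * align;   /* keep slices aligned */
    if (rounded > DEMO_POOL_BYTES - demo_pool_used)
        return NULL;
    p = &demo_pool.bytes[demo_pool_used];
    demo_pool_used += rounded;
    return p;
}

/* free-compatible: nothing is ever returned to the pool. */
static void demo_nand_free(void *p)
{
    (void)p;
}
```

These two functions would then be passed wherever the layer expects malloc and free. The no-op free is only acceptable as long as nothing allocates repeatedly at run time, which is why the program-verify buffer is the exception.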

>> E's doesn't; instead it declares a small number of static buffers.
>
> I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are
> no other variables. Again I'm thinking of the scenario of single
> firmware - different board revs. Can you confirm?
>
>> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a
>> major
>> issue because the memory needs of that layer are well-bounded; I think I
>> broadly agree, though the situation is not ideal in that it forces
>> somebody
>> who wants to use a lean, mean eCos configuration to work around.
>
> The overhead of including something like malloc/free in the image may
> compare badly with the amount of memory R's needs to allocate in the
> first place. I also note that if R's implementation has program verifies
> enabled it allocates and frees a page _every_ time. If nothing else this
> could lead to heap fragmentation.

Program verifies should be considered a very deep debugging trait.
Still, another possible implementation for this page buffer would be on
the stack (not!), or in the controller struct. That would grow then by
8KB + spare.

[snip]

>> - R's model shares the command sequence logic amongst all chips,
>> differentiating only between small- and large-page devices. (I do not
>> know
>> whether this is correct for all current chips, though going forwards
>> seems
>> less likely to be an issue as fully-ONFI-compliant chips become the
>> norm.)
>
> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it
> may be too prescriptive to be robustly future-proof.

Well, there is no way I can see into the future, but I definitely think
that the wire command model for NAND chips is going to stay -- it is in
ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
NAND chips use it too. There are chips that use a different interface,
like SSD or MMC or OneNand, but then these chips come with on-chip bad
block management, wear leveling of some kind, and are completely
different in the way they must be handled. I'd say E's and R's
implementations are concerned only with 'raw' NAND chips.

> One could say that makes it a more realistic emulation. But yes I can
> see disadvantages with a somewhat rigid world view. Thinking out loud, I
> wonder if Rutger's layer could work with something like Samsung OneNAND.

See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says:
"MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array
using a NOR Flash interface."

> Incidentally I note Rutger has a "Samsung" ECC implementation, whereas
> you support Samsung K9 chips, but use the normal ECC algorithm. Did
> Samsung change their practice?

The ECC algorithm is not something that is related to chips. It is
either software, or it is in the controller's ECC hardware and may need
software support. Controller ECC hardware seems to use one of two public
algorithms that are known as 'Toshiba ECC' and 'Samsung ECC'.
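
For readers unfamiliar with software ECC, here is a sketch of a Hamming-style code that uses 22 bits per 256 bytes and can repair one flipped data bit. It shows the principle only: the bit layout is an assumption made for clarity and is NOT compatible with the 'Toshiba', 'Samsung' or Linux MTD layouts, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHUNK     256u     /* data bytes covered by one code */
#define DEMO_ADDR_MASK 0x7FFu   /* 11 bits address 2048 data bits */

/* 22-bit code: the low 11 bits are the XOR of the bit addresses of all
 * set data bits, the high 11 bits the XOR of the complemented addresses. */
static uint32_t demo_ecc_calc(const uint8_t *data)
{
    uint32_t p = 0, q = 0;
    unsigned i, b;

    for (i = 0; i < DEMO_CHUNK; i++)
        for (b = 0; b < 8; b++)
            if (data[i] & (1u << b)) {
                uint32_t addr = (i << 3) | b;
                p ^= addr;
                q ^= ~addr & DEMO_ADDR_MASK;
            }
    return (q << 11) | p;
}

/* Returns 0 if clean, 1 if one data bit was repaired in place, 2 if only
 * the stored code was damaged, -1 if the error is uncorrectable. */
static int demo_ecc_correct(uint8_t *data, uint32_t stored)
{
    uint32_t diff = (stored ^ demo_ecc_calc(data)) & 0x3FFFFFu;
    uint32_t dp = diff & DEMO_ADDR_MASK;
    uint32_t dq = diff >> 11;

    if (diff == 0)
        return 0;
    if ((dp ^ dq) == DEMO_ADDR_MASK) {       /* halves are complements */
        data[dp >> 3] ^= (uint8_t)(1u << (dp & 7));
        return 1;
    }
    if ((diff & (diff - 1)) == 0)            /* single bit flipped in code */
        return 2;
    return -1;
}
```

A single flipped data bit changes one half of the code by its address and the other half by the complement of that address, so the two differences XOR to all ones and the first half names the bit to flip back. Two flipped data bits change both halves identically, which is detected but cannot be repaired.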

> I would certainly appreciate feedback from anyone who has used R's
> layer. What you say would seem to imply that both small page and ONFI
> are untested in R's layer.

That is correct. I would love some small-page testing. I have seen no
ONFI chips on the market yet, so testing will be future work for both E
and R.

> I'd need feedback from Rutger as to what level of testing has been done
> with his.

I ran YAFFS tests, some took more than an hour to complete on my
BlackFin. But for serious testing, see Jurgen Lambrecht's mail.

Rutger



Re: NAND technical review

Rutger Hofman-2
In reply to this post by Lambrecht Jürgen
Jürgen Lambrecht wrote:
> Ross Younger wrote:
>> Jonathan Larmour wrote:
[snip]

> Is it possible that R's model follows better the "general" structure of
> drivers in eCos?
> I mean: (I follow our CVS, could maybe differ from the final commit of
> Rutger to eCos)
> 1. with the low-level chip-specific code in /devs
> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and
> devs/flash/micron/nand)
> 2. with the "middleware" in /io (io/flash_nand/current/src and there
> /anc, /chip, /controller)
> 3. with the high-level code in /fs

As far as I know, this has been the case for some releases already.

> Is it correct that R's abstraction makes it possible to add partitioning
> easily?
> (because that is an interesting feature of E's implementation)

I think it would not be hard to add. It might involve a change in API
though, which is no problem as long as the number of clients is small,
and all the more when those clients desire it.
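
As a sketch of how thin such a partition layer could be: a partition is a block range, and each access is bounds-checked and translated to an absolute block number. The names and block ranges are made up; the three regions mirror the boot loader / application image / filesystem split mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* A partition is a contiguous range of erase blocks. */
typedef struct {
    unsigned first_block;
    unsigned num_blocks;
} demo_partition;

/* Hypothetical layout for a 1024-block chip. */
static const demo_partition demo_layout[3] = {
    {  0,   8 },    /* boot loader       */
    {  8,  64 },    /* application image */
    { 72, 952 },    /* filesystem        */
};

/* Translate a partition-relative block number to an absolute one.
 * Returns -1 if the block lies outside the partition. */
static long demo_part_to_abs(const demo_partition *p, unsigned rel_block)
{
    assert(p != NULL);
    if (rel_block >= p->num_blocks)
        return -1;
    return (long)p->first_block + (long)rel_block;
}
```

The API question is whether callers pass such a partition handle on every call or keep using absolute addresses with the range merely checked.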

Rutger

Re: Re: NAND technical review

Lambrecht Jürgen
In reply to this post by Rutger Hofman-2
Rutger Hofman wrote:

<snip>

>>> - R's model shares the command sequence logic amongst all chips,
>>> differentiating only between small- and large-page devices. (I do not
>>> know
>>> whether this is correct for all current chips, though going forwards
>>> seems
>>> less likely to be an issue as fully-ONFI-compliant chips become the
>>> norm.)
>>>      
>> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it
>> may be too prescriptive to be robustly future-proof.
>>    
>
> Well, there is no way I can see into the future, but I definitely think
> that the wire command model for NAND chips is going to stay -- it is in
> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
> NAND chips use it too. There are chips that use a different interface,
> like SSD or MMC or OneNand, but then these chips come with on-chip bad
> block management, wear leveling of some kind, and are completely
> different in the way they must be handled. I'd say E's and R's
> implementations are concerned only with 'raw' NAND chips.
>
>  
Correct, only for raw NAND chips to be soldered on a board. The others
have an embedded controller and are already packaged.

>> One could say that makes it a more realistic emulation. But yes I can
>> see disadvantages with a somewhat rigid world view. Thinking out loud, I
>> wonder if Rutger's layer could work with something like Samsung OneNAND.
>>    
>
> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says:
> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array
> using a NOR Flash interface."
>
>  
Indeed, a OneNAND is to be treated as a NOR flash, just as a pseudo-SRAM is
a DRAM with an SRAM interface.
And an SSD has a hard disk drive interface, just like MMC and SD cards; they
mostly have a FAT file system on them, but also UFS ...

Kind regards,
Jürgen


Re: Re: NAND technical review

Lambrecht Jürgen
In reply to this post by Ross Younger-3
Just some explanatory remarks below, hardware related.

Ross Younger wrote:

<snip>
> 1. NAND 101 -------------------------------------------------------------
>
> (Those familiar with NAND chips can skip this section, but I appreciate
> that not everybody on-list is in the business of writing NAND device
> drivers :-) )
>
> (i) Conceptual
>  
<snip>
>
> Now, I mentioned ECC data. NAND technology has a number of underlying
> limitations, importantly that it has reliability issues. I don't have a full
> picture - the manufacturers seem to be understandably coy - but my
> understanding is that on each page, a driver ought to be able to cope with a
> single bit having flipped either on programming or on reading. The
>  
Such a "broken bit" arises because the transistor that holds the bit is
physically broken and is stuck at 1 or at 0 (I don't know whether it can be
either). You can then no longer erase it (flip it back to 1) or program it
(flip it to 0).

I thought only programming or erasing could break a bit, not reading?
Is anybody sure about this?

> recommended way to achieve this is by storing an ECC in the spare area: the
> algorithm published by Samsung is popular, requiring 22 bits of ECC per 256
> bytes of data and able to correct a 1 bit error and detect a 2 bit error.
>
> There is also the question of bad blocks. Again, full details are sketchy. A
> chip may be shipped with a number of "factory-bad" blocks (e.g. up to 20 on
> this Samsung chip); they are marked as such in their spare area. (What
> constitutes a "bad" block is not published; one imagines that the factory
> have access to more test information than users do and that there may be
> statistical techniques involved in judging the likely reliability of the
> block.) Blocks may also fail during the life of the device, usually by the
>  
NAND flash chips are very dense chips (many bits in a small area) and
there is a trade-off in manufacturing between reliability and density.
To make them dense (hence cheap), faults have to be tolerated.
The manufacturer just tries to program all bits a first time to check
for manufacturing errors. When a broken bit is discovered, the entire
block is marked bad.
> chip reporting a failure during a program or erase operation. Because of
> this, the manufacturers recommend that chip drivers scan the device for
> factory-bad markers then create and maintain a Bad Block Table throughout
> the life of the device. How this is done is not prescribed, but the
> behaviour of the Linux MTD layer is something approximating a de facto standard.
>  
<snip>
> (iii) Electrical
>
> Most, if not all, NAND chips have the same broad electrical interface.
>
> There is a master Chip Enable line; nothing happens if this is not active.
>  
(A hardware designer's note follows :-)
Be careful here: a standard chip enable is only active during the
actual read or write. But an access to a NAND flash is a complete cycle
during which the NAND flash's embedded control logic needs to keep its state!
Therefore, the Chip Enable (or Chip Select) of the NAND flash is (on my
ARM9 anyhow) connected to a GPIO pin (general-purpose input/output pin),
and the SW has to assert this pin at the start of an access and
de-assert it at the end.
The real hardware Chip Select pin is not connected.
(In R's SW in the io/flash_nand/../controller: cyg_nand_ctl_chip_select,
that calls chip_select implemented in the board-specific driver in
/devs/flash/[uC brand])
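
A minimal sketch of the pattern described above, assuming CE is wired to a GPIO: the software asserts it once, runs the complete command/address/data cycle, and only then releases it. The names board_nand_chip_enable, nand_run_cycle and example_cycle are hypothetical, and the GPIO is simulated by a variable; this is not the code behind cyg_nand_ctl_chip_select.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated GPIO state; on real hardware this would be a pin register. */
static bool nand_ce_active = false;

/* Hypothetical board hook: drive the GPIO wired to the NAND's CE pin. */
static void board_nand_chip_enable(bool active)
{
    nand_ce_active = active;
}

/* One complete NAND access (command, address, data, status). */
typedef int (*nand_cycle_fn)(void *arg);

/* Hold CE active across the whole cycle so the chip keeps its state. */
static int nand_run_cycle(nand_cycle_fn cycle, void *arg)
{
    int rc;

    assert(cycle != NULL);
    board_nand_chip_enable(true);
    rc = cycle(arg);
    board_nand_chip_enable(false);
    return rc;
}

/* Example cycle: fails unless CE is still held when it runs. */
static int example_cycle(void *arg)
{
    (void)arg;
    return nand_ce_active ? 0 : -1;
}
```

Calling example_cycle directly fails because CE is not held, whereas nand_run_cycle(example_cycle, NULL) succeeds and leaves CE released afterwards.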

> Data flows into and out of the chip via its data bus, which is 8 or 16 bits
> wide, mediated by Read Enable and Write Enable lines.
>
> Commands and addresses are sent on the data bus, but routed to the
> appropriate latches by asserting the Address Latch Enable or Command Latch
> Enable lines at the same time.
>
> There is also a ready/busy line which the driver can use to tell when an
> operation is in progress. Typical operation times from the Samsung spec
> sheet I have to hand are 25us for a page read, 300us for a page program, and
> 2ms for a block erase.
>
>
> (iv) Board hook-up
>  
<snip>
> Sometimes the ready/busy line isn't wired in or requires a jumper to be set
> to route it. This can be worked around: for a read operation, one can just
> insert a delay loop for the prescribed maximum time, while for programs and
> erases, most (all?) chips have a "Read Status" command which can be used to
> query whether the operation has completed.
>  
We started our driver this way.
> It can be beneficial to be able to set up the ready/busy line as an
> interrupt source, as opposed to having to poll it. Whilst there is an
> overhead involved in context-switching, if other application threads have
> much to do it may be advantageous overall for the thread waiting for the
> NAND to sleep until woken by interrupt.
>  
To speed things up, we now poll the ready/busy line. Using it as an
interrupt is still on the todo list.
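
A rough sketch of polling the ready/busy line with a time budget, so that a missing or stuck line cannot hang the driver. The budget uses the typical times quoted above for one Samsung part, multiplied by a caller-chosen safety margin; the hook types, function names and the simulated device are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NAND_OP_READ, NAND_OP_PROGRAM, NAND_OP_ERASE } nand_op_t;

/* Typical busy times quoted in this thread, in microseconds. */
static unsigned nand_typical_busy_us(nand_op_t op)
{
    switch (op) {
    case NAND_OP_READ:    return 25;
    case NAND_OP_PROGRAM: return 300;
    default:              return 2000;  /* block erase */
    }
}

/* Hypothetical board hooks: sample the ready/busy pin, delay about 1us. */
typedef bool (*nand_ready_fn)(void *ctx);
typedef void (*nand_delay_us_fn)(unsigned us);

/* Poll until ready; give up after margin * typical time. */
static bool nand_poll_ready(nand_ready_fn is_ready, void *ctx,
                            nand_delay_us_fn delay_us,
                            nand_op_t op, unsigned margin)
{
    unsigned budget_us = nand_typical_busy_us(op) * margin;

    assert(is_ready != NULL && delay_us != NULL);
    for (unsigned waited = 0; waited <= budget_us; waited++) {
        if (is_ready(ctx))
            return true;
        delay_us(1);
    }
    return false;  /* timed out: caller should treat this as an error */
}

/* Simulation: the device becomes ready after *ctx further polls. */
static bool sim_ready(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}

static void sim_delay_us(unsigned us)
{
    (void)us;
}
```

On a board without the ready/busy line, the same loop could call a "Read Status" wrapper as is_ready for programs and erases.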
> Of course, it is possible to put multiple chips on a board. In that case
> there needs to be a way to route between them; I would expect this to be
> done with the Chip Select line, addressed either by different MMIO addresses
> or a separate GPIO or CPLD step. Theoretically, multiple chips could be
> hooked up in parallel to give something that looks like a 16 or 32-bit
> "wide" chip, but I have never encountered this in the NAND world, and it
> would impose a certain extra level of complexity on the driver.
>  
Indeed, this would be difficult: a NAND is not a simple memory-mapped
device like a NOR flash or SRAM, which are easy to put in parallel.
Bad block management alone makes putting them in parallel
difficult: they cannot be paralleled in hardware, they need to be
addressed separately. Then they must be made parallel virtually in software.

Regards,
Jürgen



Re: NAND technical review

Ross Younger-3
In reply to this post by Rutger Hofman-2
Rutger Hofman wrote:
> R has part-read and part-write support. One thing that has always
> puzzled me is how this interacts with ECC. ECC often works on a complete
> subpage, like 256 bytes on a 2KB page chip; then I understand. But what
> if the read/write is not of such a subpage?

This is a very good question - I revisited it the other day when working on
hardware ECC support for the customer port I'm working on - and I don't have
a particularly good answer for it.

If the read is less than an ECC stride[*], one could perhaps fill in the ECC
calculation by reading the rest of that stride's worth anyway and not
passing it to the caller. Similarly, a write that is less than a stride
could be "filled in" with 0xFF for the purposes of computing its ECC. How
this would be achieved efficiently is an exercise for the reader as a bit of
refactoring is likely to be involved...
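
A minimal sketch of the "fill in with 0xFF" idea for a short write, assuming a 256-byte ECC stride: the caller's bytes are placed at their column offset in a scratch buffer that otherwise holds 0xFF (the erased state), and the ECC would then be computed over that whole buffer. The function name and the fixed stride size are assumptions, not either layer's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ECC_STRIDE 256u

/* Build the full stride image that the ECC will be computed over. */
static void stride_fill_for_ecc(uint8_t stride[ECC_STRIDE], size_t offset,
                                const uint8_t *src, size_t len)
{
    assert(offset <= ECC_STRIDE && len <= ECC_STRIDE - offset);
    memset(stride, 0xFF, ECC_STRIDE);   /* unwritten bytes stay erased */
    memcpy(stride + offset, src, len);  /* caller's partial data */
}
```

This only holds if the rest of the stride really is still erased on the chip; if it already carries data, the driver would have to read it back first.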

[*] I'm using "stride" here to mean the amount of data that an ECC
calculation operates over. The Samsung algorithm which computes 22 bits of
ECC over 256 bytes of data is common, not least because that's the
one used by the Linux MTD layer.
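
To make "22 bits over 256 bytes, correct one bit, detect two" concrete, here is a small sketch of a Hamming-style code with those parameters. It is deliberately NOT the bit layout used by Linux MTD or published by Samsung: it simply XORs together the 11-bit index of every set data bit (and the complement of that index), which gives 11 + 11 = 22 check bits. All names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIDE_BYTES 256u
#define INDEX_MASK   0x7FFu   /* 11 bits address the 2048 data bits */

typedef struct { uint16_t odd; uint16_t even; } ecc22_t;

/* XOR of bit indices (odd) and of their complements (even). */
static ecc22_t ecc22_compute(const uint8_t *data)
{
    ecc22_t e = { 0, 0 };

    assert(data != NULL);
    for (unsigned i = 0; i < STRIDE_BYTES * 8u; i++) {
        if (data[i >> 3] & (1u << (i & 7u))) {
            e.odd  ^= (uint16_t)i;
            e.even ^= (uint16_t)(~i & INDEX_MASK);
        }
    }
    return e;
}

/* Returns 0 if clean, 1 if a single-bit error was handled, -1 if not. */
static int ecc22_correct(uint8_t *data, ecc22_t stored)
{
    ecc22_t now = ecc22_compute(data);
    uint16_t s_odd  = (uint16_t)(stored.odd ^ now.odd);
    uint16_t s_even = (uint16_t)(stored.even ^ now.even);
    uint32_t syn = ((uint32_t)s_even << 11) | s_odd;

    if (syn == 0)
        return 0;
    if ((uint16_t)(s_odd ^ s_even) == INDEX_MASK) {
        /* One data bit flipped; s_odd is its index. */
        data[s_odd >> 3] ^= (uint8_t)(1u << (s_odd & 7u));
        return 1;
    }
    if ((syn & (syn - 1u)) == 0)
        return 1;   /* one bit of the stored ECC itself flipped */
    return -1;      /* two or more errors: detected, not correctable */
}
```

A single flipped data bit changes every index bit in exactly one of the two halves, which is why the two syndromes XOR to all ones; two flipped bits cancel that pattern and are reported as uncorrectable.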

I did wonder about not supporting less-than-page reads and writes at all,
but my code currently tries its best on the grounds of being liberal in what
it accepts.

In passing, I note that some large page devices allow the data and spare
areas to be written in subpages (e.g. this Samsung K9 chip to hand - 2048
main + 64 spare per page - allows writes in units of 512 main and 16 spare);
there might be a use to be found here in allowing an application to treat a
large page device as if it were a small-page device.


Ross

--
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com

Re: NAND technical review

Ross Younger-3
In reply to this post by Jonathan Larmour-2
(resend, having fallen foul of sourceware's spamtrap)

Jonathan Larmour wrote:
 > > Good ONFI support should be the highest priority as that's the way
 > > everything is likely to go, although we do need the others too.

Agreed. As the Samsung K9 is nearly ONFI already, adapting my driver is
likely to be very quick; all other things being equal, I would just do this
as and when there was a demand (and suitable hardware on my desk).


 > > Personally I would expect use as an interrupt line as the main role of
 > > the ready line.

IMLE the overhead of sleeping and context switching is quite significant. In
the drivers I've written to date, where there is a possibility to use the
ready line as an interrupt source I have provided this as an option in CDL.


 >> >> Theoretically, multiple chips could be
 >> >> hooked up in parallel to give something that looks like a 16 or 32-bit
 >> >> "wide" chip, but I have never encountered this in the NAND world [...]
 > >
 > > Have you found on-chip (SoC's) NAND controllers permit such a
 > > configuration? If not, I would assume that it's not an expected hardware
 > > configuration.

Not on the small number of controllers I have looked at in detail.


 > > What problems would you see, if any, using your layer with the same
 > > controller and two completely different chips, of different geometry?
 > > Can you still have a common codebase with other (different) platforms?

I don't see any issue: controllers don't IME care about the chip geometry,
they just take care of the electrical side, and some calculate ECC in
passing. For that matter I don't see an issue with a single controller on
one board driving two chips of different geometries at once.


 > > Is anyone aware of NAND chips with different sized blocks? Analogous to
 > > bootblocks with NOR (I haven't, but others will undoubtedly have seen
 > > more parts than I). Although it's possible that even if they're not
 > > around or common now, they may be in future.

I don't think there's a way to express such a chip in the ONFI chip
interrogation logic, and such a chip would I think comprehensively break the
Linux MTD layer into the bargain.

 > > Unfortunately from what I
 > > can tell neither layer would be able to support that directly, although
 > > I think it may be possible for the eCosCentric layer to allow the driver
 > > to pretend there is a different NAND chip. Do you think so too?

Two chip drivers exposing different geometries but with essentially the same
underlying access functions would probably do the trick. There would have to
be careful address translation or partitioning between the two, and a single
mutex protecting both devices in the chip driver layer, but I think it'd be
a goer.


 >> >> 2. Application interface -----------------------------------------------

 >> >> The basic operations required are reading a page, programming a page and
 >> >> erasing a block, and both layers provide these.
 > >
 > > However I believe Rutger's supports partial page writes (use of
 > > 'column'), whereas I don't believe eCosCentric's does.

As covered in the other subthread, is this actually useful, and how to sort
out the ECC?


 >> >> Rutger's layer has an extra hook in
 >> >> place where an application may explicitly request the use of cached reading
 >> >> and writing where the device supports this.
 > >
 > > That seems like a useful potential optimisation, exploiting underlying
 > > capabilities. Any reason you didn't implement this?
 > >
 > > I could also believe that NAND controllers can also optimise by doing
 > > multiple block reads, where this hint would also prove useful.

Not particularly. Looking at cache-assisted read and program operations for
multi-page operations is sitting on my TODO list, languishing :-). I would
note in passing that YAFFS doesn't make use of these, preferring only to
read and write single pages fully synchronously; this might be a worthwhile
enhancement in dealing with larger files, though YAFFS's own internal NAND
interface is strictly page-oriented at the moment and so this would require
a bit of brain surgery - something best done in conjunction with Charles
Manning, I think.


 > > Does your implementation _require_ a BBT in its current implementation?
 > > For simpler NAND usage, it may be overkill e.g. an application where the
 > > number of rewrites is very small, so the factory bad markers may be
 > > considered sufficient.

I suppose it would be possible to provide a CDL option to switch the
persistent BBT off if you really wanted to. Caution is required, though:
after you have ever written to the chip, it can be impossible to distinguish
a genuine factory-bad marker from application data in the OOB area that
happens to resemble it. This can be worked around with very careful
management of what the application puts into the OOB or by tweaking the OOB
layout to simply avoid ever writing to the relevant byte(s).
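
As an illustration of why the factory markers are best harvested once, before anything is written, here is a sketch of an initial scan that records them in an in-memory bitmap. It assumes the marker is a single spare-area byte per block that reads 0xFF on a good block; the real marker position depends on the chip, and the hook, names and tiny block count are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 16u
#define BBT_BYTES  ((NUM_BLOCKS + 7u) / 8u)

/* Hypothetical hook: read the bad-block marker byte of one block. */
typedef uint8_t (*read_marker_fn)(void *dev, unsigned block);

/* Scan every block once; returns the number of factory-bad blocks. */
static unsigned bbt_scan_factory(read_marker_fn read_marker, void *dev,
                                 uint8_t bbt[BBT_BYTES])
{
    unsigned bad = 0;

    memset(bbt, 0, BBT_BYTES);
    for (unsigned b = 0; b < NUM_BLOCKS; b++) {
        if (read_marker(dev, b) != 0xFF) {   /* anything else: bad */
            bbt[b / 8u] |= (uint8_t)(1u << (b % 8u));
            bad++;
        }
    }
    return bad;
}

static int bbt_is_bad(const uint8_t bbt[BBT_BYTES], unsigned block)
{
    assert(block < NUM_BLOCKS);
    return (bbt[block / 8u] >> (block % 8u)) & 1u;
}

/* Simulated chip: dev points at an array of marker bytes. */
static uint8_t sim_read_marker(void *dev, unsigned block)
{
    return ((const uint8_t *)dev)[block];
}
```

A persistent BBT would then write this bitmap somewhere safe on the chip, so that later application data in the OOB area can never be mistaken for a factory marker.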


 >> >> (a) Partitions
 > > [snip]
 >> >> R's interface does not have such a facility. It appears that, in the
 >> >> event
 >> >> that the flash is shared between two or more logical regions, it's up to
 >> >> higher-level code to be configured with the correct block ranges to use.
 > >
 > > In yours, the block ranges must be configured in CDL. Is there much
 > > difference? I can see an advantage in writing platform-independent test
 > > programs. But in applications within products possibly less so.

I provide CDL for manual config, but have included a partition layout
initialisation hook. If there was an on-chip partition table, all that's
needed would be some code to go into that hook to interrogate it and
translate to my layer's in-memory layout. This is admittedly not well
documented, but hinted at by "Planning a port"
(http://www.ecoscentric.com/ecospro/doc.cgi/html/ecospro-ref/nand-devs-writing.html)
and should be readily apparent on examining code for existing chip drivers.

 > > Especially since the flash geometry, including size, can be
 > > programmatically queried.

Flash geometry can only be programmatically queried up to a point in
non-ONFI chips. Look at the k9_devinit function in k9fxx08x08.inl: while the
ReadID response of Samsung chips encodes the page, block and spare area
sizes, it doesn't tell you about the chip block count or overall size - you
have to know based on the device identifier byte. Linux, for example, has a
big table of these in drivers/mtd/nand/nand_ids.c.
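
A sketch of the split described above: page, spare and block sizes decoded from the fourth ReadID byte, and the overall size looked up from the device identifier byte. The bit layout follows what I understand to be the common Samsung large-page convention, so check it against the datasheet for any real part; the table entries are illustrative only, and this is not a transcription of k9_devinit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t page_bytes, spare_bytes, block_bytes, bus_bits, block_count;
} nand_geometry;

/* Illustrative device-ID to capacity table (sizes in MiB). */
typedef struct { uint8_t dev_id; uint32_t size_mib; } nand_id_entry;

static const nand_id_entry known_ids[] = {
    { 0xF1, 128 }, { 0xDA, 256 }, { 0xDC, 512 }, { 0xD3, 1024 },
};

/* id[1] = device identifier, id[3] = extended geometry byte. */
static int nand_decode_id(const uint8_t id[4], nand_geometry *g)
{
    uint8_t ext = id[3];

    assert(g != NULL);
    g->page_bytes  = 1024u << (ext & 0x3u);
    g->spare_bytes = (8u << ((ext >> 2) & 0x1u)) * (g->page_bytes / 512u);
    g->block_bytes = (64u * 1024u) << ((ext >> 4) & 0x3u);
    g->bus_bits    = ((ext >> 6) & 0x1u) ? 16u : 8u;

    /* The block count cannot be derived from ext: it needs the table. */
    for (size_t i = 0; i < sizeof known_ids / sizeof known_ids[0]; i++) {
        if (known_ids[i].dev_id == id[1]) {
            uint64_t bytes = (uint64_t)known_ids[i].size_mib << 20;
            g->block_count = (uint32_t)(bytes / g->block_bytes);
            return 0;
        }
    }
    return -1;  /* unknown device: size cannot be determined */
}
```

With the example ID bytes EC F1 00 95 this yields 2048-byte pages, 64 spare bytes, 128 KiB blocks on an 8-bit bus, and 1024 blocks once the table supplies 128 MiB.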

 > > If there was to be a single firmware supporting multiple board
 > > revisions/configurations (as can definitely happen), which could include
 > > different sizes of NAND, I think R's implementation would be able to
 > > adapt better than E's, as the high-level program can divide up the sizes
 > > based on what it sees.

I see no reason why E's wouldn't adapt just as well, given suitably written
driver(s) and init hooks.
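
A sketch of what such an init hook could look like when one firmware image has to cope with different NAND sizes: the partition table is filled in at run time from the block count the chip driver discovered, rather than from fixed CDL values. The structures and the hook name are invented for illustration and are not the eCosCentric layer's actual types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARTITIONS 4u

typedef struct { uint32_t first_block; uint32_t num_blocks; } nand_partition;

typedef struct {
    uint32_t       block_count;   /* discovered by the chip driver */
    nand_partition part[MAX_PARTITIONS];
    unsigned       num_parts;
} nand_device;

/* Hypothetical hook: fixed-size boot area, remainder for a filesystem. */
static int board_partition_hook(nand_device *dev, uint32_t boot_blocks)
{
    assert(dev != NULL);
    if (boot_blocks == 0 || boot_blocks >= dev->block_count)
        return -1;

    dev->part[0].first_block = 0;
    dev->part[0].num_blocks  = boot_blocks;
    dev->part[1].first_block = boot_blocks;
    dev->part[1].num_blocks  = dev->block_count - boot_blocks;
    dev->num_parts = 2;
    return 0;
}
```

The same hook is where code reading an on-chip partition table would go, translating it into the in-memory layout instead of computing the split.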


 >> >> (b) Dynamic memory allocation
 >> >>
 >> >> R's layer mandates the provision of malloc and free, or compatible
 >> >> functions. These must be provided to the cyg_nand_init() call.
 > >
 > > That's unfortunate - that limits its use in smaller boot loaders - a key
 > > application.
 > >

 >> >> E's doesn't; instead it declares a small number of static buffers.
 > >
 > > I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are
 > > no other variables. Again I'm thinking of the scenario of single
 > > firmware - different board revs. Can you confirm?

Chip drivers are expected to require in CDL that CYGNUM_NAND_PAGEBUFFER be
large enough, and to set up a static byte array for their Bad Block Table.
Efficiently supporting two differently-sized chips on a single board - I
mean only allocating enough static space for the largest known BBT - would
not be difficult.


 > > OTOH your implementation doesn't support program verifies in the higher
 > > level anyway (I note your code comment about it being unnecessary as the
 > > device should report a successful program - your faith in correct
 > > hardware behaviour is considerable :-) ).

Verifying after programming is also on my todo list :-)



 >> >> If multiple chips of different types are present in a build, E's model
 >> >> potentially duplicates code (though this could be worked around; also, an
 >> >> ONFI driver ought to be written).
 > >
 > > Worked around in a way likely to increase single-device footprint
 > > though. Shame about the lack of an ONFI driver, although I guess the parts
 > > still aren't widely used which can't help. The Samsung K9 is close at
 > > least.

As I said, when one lands on my desk I'll gladly get writing :-)

 > > In fact, because of the requirement for the
 > > drivers to call CYG_NAND_FUNS, it doesn't seem difficult at all to be
 > > backwardly compatible. Am I right? Nevertheless, it would be unfortunate
 > > to have an API which already needs its low level driver interface
 > > updating to a rev 2.

Adding hardware ECC support and making the driver interface
backwards-compatible turned out to break layering, so I chose to change the
interface.

It's a relatively straightforward change in that I have broken up page read
and program operations into three: initialise, to read/write a stride of
data (length chosen by the NAND layer to mesh with whatever ECC length is
provided by the controller), and finalise. The flow inside my NAND layer for
programming a page becomes:

* Call chip driver to initialise the write (we expect it to send the command
and address)
* For each ECC-sized stride of data:
** If hardware ECC, call the ECC driver to tell it we're about to start a stride
** Call chip driver to write a stride of data
** If hardware ECC, call the ECC driver to get the ECC for the stride now
completed and stash it away

* If software ECC, compute it for the page
* Finalise the spare layout using the ECC wherever it came from
* Call chip driver to finalise the write, passing the final spare layout (we
expect it to write the spare area and send the program-confirm command).
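
The flow above can be sketched roughly as follows. The driver entry points (write_begin, write_stride, write_finish, ecc_start, ecc_fetch) are names invented for this example, not the real device interface, and the stride and ECC sizes are assumed. A mock driver records the order of calls in a string so the sequencing can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES   2048u
#define STRIDE_BYTES 512u
#define ECC_BYTES    3u
#define NUM_STRIDES  (PAGE_BYTES / STRIDE_BYTES)

/* Hypothetical driver entry points. */
typedef struct {
    int  (*write_begin)(void *dev, unsigned page);
    int  (*write_stride)(void *dev, const uint8_t *data, size_t len);
    int  (*write_finish)(void *dev, const uint8_t *spare, size_t len);
    void (*ecc_start)(void *dev);                /* NULL: no hardware ECC */
    void (*ecc_fetch)(void *dev, uint8_t *ecc);  /* NULL: no hardware ECC */
} nand_ops;

typedef void (*soft_ecc_fn)(const uint8_t *data, size_t len, uint8_t *ecc);

static int nand_program_page(const nand_ops *ops, void *dev, unsigned page,
                             const uint8_t *data, soft_ecc_fn soft_ecc)
{
    uint8_t spare[NUM_STRIDES * ECC_BYTES];
    int hw = ops->ecc_start != NULL && ops->ecc_fetch != NULL;
    int rc;

    assert(hw || soft_ecc != NULL);
    memset(spare, 0xFF, sizeof spare);
    rc = ops->write_begin(dev, page);            /* command + address */
    for (unsigned s = 0; rc == 0 && s < NUM_STRIDES; s++) {
        if (hw) ops->ecc_start(dev);
        rc = ops->write_stride(dev, data + s * STRIDE_BYTES, STRIDE_BYTES);
        if (rc == 0 && hw) ops->ecc_fetch(dev, &spare[s * ECC_BYTES]);
    }
    if (rc != 0) return rc;
    if (!hw) {                                   /* software ECC */
        for (unsigned s = 0; s < NUM_STRIDES; s++)
            soft_ecc(data + s * STRIDE_BYTES, STRIDE_BYTES,
                     &spare[s * ECC_BYTES]);
    }
    return ops->write_finish(dev, spare, sizeof spare); /* spare + confirm */
}

/* Mock driver: each call appends one letter to a trace string. */
static char trace[32];
static void mark(char c)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace) { trace[n] = c; trace[n + 1] = '\0'; }
}
static int mock_begin(void *d, unsigned p) { (void)d; (void)p; mark('B'); return 0; }
static int mock_stride(void *d, const uint8_t *x, size_t n) { (void)d; (void)x; (void)n; mark('W'); return 0; }
static int mock_finish(void *d, const uint8_t *s, size_t n) { (void)d; (void)s; (void)n; mark('F'); return 0; }
static void mock_ecc_start(void *d) { (void)d; mark('S'); }
static void mock_ecc_fetch(void *d, uint8_t *e) { (void)d; memset(e, 0, ECC_BYTES); mark('E'); }
static void mock_soft_ecc(const uint8_t *x, size_t n, uint8_t *e) { (void)x; (void)n; memset(e, 0, ECC_BYTES); mark('C'); }
```

With hardware ECC the mock sees begin, then start/write/fetch once per stride, then finish; without it, the strides are written back to back and the ECC is computed in software before the finish call.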


I have not yet finished this work, but will update all my existing drivers
when it is done. In a way, the drawn-out nature of this process has provided
extra time for my state of the art to evolve ;-)


 > > Incidentally I note Rutger has a "Samsung" ECC implementation, whereas
 > > you support Samsung K9 chips, but use the normal ECC algorithm. Did
 > > Samsung change their practice?

The "Samsung" ECC implementation has nothing to do with the underlying chip;
it's just an algorithm whose details they published, I think in conjunction
with some of the higher-level NAND-based products they ship which feature an
FTL (USB sticks, SD cards, etc). There is in general no requirement to use
any particular ECC algorithm with any particular chip; all the spec sheets
tend to say is "use ECC".

If I have understood the code correctly, Rutger provides two ECC algorithms:

* nand_ecc.c implements the "standard" Linux MTD algorithm (indeed the code
is lifted, with acknowledgement). This is an algorithm created by Toshiba,
with a 256 byte data block and 22 bit ECC and the layer uses it by default
where no other algorithm is provided.

* io_nand_ecc_samsung.c provides a Samsung algorithm of the same parameters,
  used by the BlackFin board driver.

My layer only provides the Linux MTD algorithm at the moment (also by
lifting the code with acknowledgement).

In passing I note that the 22 bits for 256 bytes algorithm is a bit wasteful
of space as it's relatively simple to add an extra pair of row-parity bits and
have 24 bits of ECC for 512 bytes of data. If you were happy that the chip
wouldn't suffer too many single-bit dropouts at once, and you decided you
didn't want to worry about subpage support you could go for 26/1024 or
28/2048. Would you believe it, writing 24/512 (and perhaps 26/1024 and
28/2048) algorithms is also on my todo list ...
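
The arithmetic behind those figures can be sketched as follows: for this family of codes each doubling of the stride adds one pair of parity bits, so the ECC needs 2 * log2(data bits) bits. The helper names are invented, and rounding each stride's ECC up to whole bytes in the spare area is an assumption about the layout.

```c
#include <assert.h>
#include <stdint.h>

/* 2 parity bits per address bit of the data: 2 * log2(stride bits). */
static unsigned ecc_bits_for_stride(uint32_t stride_bytes)
{
    uint32_t data_bits = stride_bytes * 8u;
    unsigned addr_bits = 0;

    /* Only power-of-two strides are meaningful here. */
    assert(stride_bytes != 0 && (stride_bytes & (stride_bytes - 1u)) == 0);
    while ((1u << addr_bits) < data_bits)
        addr_bits++;
    return 2u * addr_bits;
}

/* Spare-area bytes used per page, rounding each ECC up to whole bytes. */
static unsigned ecc_spare_bytes_per_page(uint32_t page_bytes,
                                         uint32_t stride_bytes)
{
    unsigned per_stride = (ecc_bits_for_stride(stride_bytes) + 7u) / 8u;

    return (page_bytes / stride_bytes) * per_stride;
}
```

For a 2048-byte page this gives 24 spare bytes with 22/256 but only 12 with 24/512, at the price of correcting one bit per 512 bytes instead of one per 256.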


 > > Your documentation does appear very thorough and well-structured
 > > (although the Samsung and EA LPC2468 docs really should be broken out
 > > into their own packages). Rutger's does also seem fine though so I don't
 > > think there's a strong difference either way.

The Samsung K9 is in its own (single-chapter) docs package as of a few weeks
ago, and the board-specific bits for the EA LPC2468 have been moved into
that HAL.


 > > [synth target]
 > > Bad block injection sounds like an extremely useful feature. I infer
 > > from the latter that we're now talking about many hours of testing?

We are. We have run our YAFFS severe stress testing with bad block injection
for over a week at a time.


 >> >> * Expansion of the device interface to better allow efficient hardware
 >> >> ECC support (in progress)
 > >
 > > Rough ETA? All I'm interested in knowing is whether the device interface
 > > changes for this are likely to be concluded within the timeframe of this
 > > discussion.

It's part and parcel of the customer port that I'm currently working on, so
"real soon now" - top of my priority list apart from this discussion ;-)
With a following wind I would hope to be able to finish it up, synch my
changes with the anoncvs side and push out maybe a week or so after I'm back
from holiday.


 >> >> * Partition addressing: make addressing relative to the start of the
 >> >> partition, once and for all
 > >
 > > That's quite a major API change, which seems problematic to me.

This is why it has to be worked out sooner rather than later, and is
currently very close to the top of my todo list ;-). Bart in particular has
been encouraging me to make this change for a while.
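
A minimal sketch of what partition-relative addressing means in practice: callers pass a block number counted from the start of their partition, and the layer translates and bounds-checks it in one place. The structure and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first_block; uint32_t num_blocks; } nand_partition;

/* Translate a partition-relative block to a chip block.
 * Returns 0 on success, -1 if the block lies outside the partition. */
static int partition_to_chip_block(const nand_partition *p,
                                   uint32_t rel_block, uint32_t *abs_block)
{
    assert(p != NULL && abs_block != NULL);
    if (rel_block >= p->num_blocks)
        return -1;
    *abs_block = p->first_block + rel_block;
    return 0;
}
```

Doing the check here means a filesystem can never reach outside its own partition, whatever block numbers it computes.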


 >> >> * Part-page read support (would provide a big speed-up to parts of YAFFS2
 >> >> inbandTags mode as needed by small-page devices like that on the
 >> >> STM3210E)
 > >
 > > Do you foresee this happening within any particular timeframe? Do you
 > > expect the changes to be backwardly compatible?

No timescale as yet as it's relatively far down my todo list. I think
support would require an addition to the device interface to support reading
from a column address, not a break - so existing drivers would continue
working. But I need to think about this a bit more when I get there, as it
may require work on the YAFFS side, and it tickles the sleeping dragon that
is support for ECC on part-pages.


Cheers,


Ross

--
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                 www.ecoscentric.com

Re: NAND technical review

Jonathan Larmour-2
In reply to this post by Lambrecht Jürgen
Jürgen Lambrecht wrote:

> Ross Younger wrote:
>> Now, I mentioned ECC data. NAND technology has a number of underlying
>> limitations, importantly that it has reliability issues. I don't have
>> a full
>> picture - the manufacturers seem to be understandably coy - but my
>> understanding is that on each page, a driver ought to be able to cope
>> with a
>> single bit having flipped either on programming or on reading. The
>>  
>
> Such a "broken bit" is because the transistor that contains the bit is
> physically broken, and is stuck at 1 or at 0 (I don't know if it can be
> both). So you cannot anymore erase it (flip it back to 1) or program it
> (flip to 0).
>
> I thought only programming or erasing could break it, not reading?
> Is somebody sure about this?

I've had experience of dodgy flash that spontaneously started getting bit
errors either over time or on reads - couldn't tell which. Really it was
NOR, rather than NAND, but that should be /more/ reliable! I think it's
probably best to assume that if it's hardware, it can go wrong :-).

[ NB I'll be replying to other mails in this thread tomorrow, but it's a
bit late here at the moment for me to start ]

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

Re: NAND technical review

Jonathan Larmour-2
In reply to this post by Ross Younger-3
[ Lots of snippage throughout - assume "ack" or comprehension]
Ross Younger wrote:
> Jonathan Larmour wrote:
>  > > Personally I would expect use as an interrupt line as the main role of
>  > > the ready line.
>
> IMLE the overhead of sleeping and context switching is quite
> significant. In
> the drivers I've written to date, where there is a possibility to use the
> ready line as an interrupt source I have provided this as an option in CDL.

For reads polling is good, sure; for programs interrupts are probably
better; for erases interrupts will almost certainly be better. I note that
that's what you arrange for interrupt mode on the EA2468 port example,
which is good.

But I digress, as this isn't something specific to your implementation.

>  > > What problems would you see, if any, using your layer with the same
>  > > controller and two completely different chips, of different geometry?
>  > > Can you still have a common codebase with other (different) platforms?
>
> I don't see any issue: controllers don't IME care about the chip geometry,
> they just take care of the electrical side, and some calculate ECC in
> passing. For that matter I don't see an issue with a single controller on
> one board driving two chips of different geometries at once.

Hmm, I guess the key thing here is that in E's implementation most of the
complexity has been pushed into the lower layers; at least compared to
R's. R's has a more consistent interface through the layers. Albeit at the
expense of some rigidity and noticeable function overhead.

It's not likely E's will be able to easily share controller code, given of
course you don't know what chips, and so what chip driver APIs they'll be
connected to. But OTOH, maybe this isn't a big deal since a lot of the
controller-specific munging is likely to be platform-specific anyway due
to characteristics of the attached NAND (e.g. timings etc.) and the only
bits that would be sensibly shared would potentially happen in the
processor HAL anyway at startup time. What's left may not be that much and
isn't a problem in the platform HAL. However the likely exception to that
is hardware-assisted ECC. A semi-formal API for that would be desirable.

>  >> >> 2. Application interface
> -----------------------------------------------
>
>  >> >> The basic operations required are reading a page, programming a
> page and
>  >> >> erasing a block, and both layers provide these.
>  > >
>  > > However I believe Rutger's supports partial page writes (use of
>  > > 'column'), whereas I don't believe eCosCentric's does.
>
> As covered in the other subthread, is this actually useful, and how to sort
> out the ECC?

Read back the whole page (which is a drop in the ocean compared to the
time to do a full page program of course). memcmp the partially written
section for validity, then regenerate the ECC. Unless the partial write
was most of the page anyway (and a heuristic could deal with that), you
should still end up ahead.
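
A sketch of the read-back step suggested above, under the assumption that the layer can read a whole page into a caller-supplied buffer: the partially written span is compared with memcmp, and on success the buffer holds the full page image over which the ECC can be regenerated. The function names and the RAM-backed "chip" are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 2048u

/* Hypothetical hook: read one whole page into out. 0 on success. */
typedef int (*page_read_fn)(void *dev, unsigned page, uint8_t *out);

/* Returns 0 if the span reads back as written, 1 on mismatch,
 * -1 if the read itself failed. page_out receives the whole page. */
static int verify_partial_write(page_read_fn read_page, void *dev,
                                unsigned page, size_t column,
                                const uint8_t *written, size_t len,
                                uint8_t *page_out)
{
    assert(column <= PAGE_BYTES && len <= PAGE_BYTES - column);
    if (read_page(dev, page, page_out) != 0)
        return -1;
    return memcmp(page_out + column, written, len) == 0 ? 0 : 1;
}

/* Simulated chip: dev points at a single page held in RAM. */
static int ram_read_page(void *dev, unsigned page, uint8_t *out)
{
    (void)page;
    memcpy(out, dev, PAGE_BYTES);
    return 0;
}
```

Whether the regenerated ECC can then be stored depends on the spare area for that page still being programmable, which this sketch does not address.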

Alternatively, some people may not want or need ECC. Higher layers may be
able to deal or have their own checking. Or the write patterns could be
sufficiently infrequent that it's not an issue worth solving (e.g.
firmware upgrades). In some cases you may not use ECC in one part managed
by e.g. a simple boot loader which you want to keep small; and then in a
different region on the same NAND there's a filesystem which does exploit
ECCs.

>  >> >> Rutger's layer has an extra hook in
>  >> >> place where an application may explicitly request the use of
> cached reading
>  >> >> and writing where the device supports this.
>  > >
>  > > That seems like a useful potential optimisation, exploiting underlying
>  > > capabilities. Any reason you didn't implement this?
>  > >
>  > > I could also believe that NAND controllers can also optimise by doing
>  > > multiple block reads, where this hint would also prove useful.
>
> Not particularly. Looking at cache-assisted read and program operations for
> multi-page operations is sitting on my TODO list, languishing :-). I would
> note in passing that YAFFS doesn't make use of these, preferring only to
> read and write single pages fully synchronously; this might be a worthwhile
>  enhancement in dealing with larger files, though YAFFS's own internal NAND
> interface is strictly page-oriented at the moment and so this would require
> a bit of brain surgery - something best done in conjunction with Charles
> Manning, I think.

Given things like
http://osdir.com/ml/linux.file-systems.yaffs/2008-09/msg00010.html this
may well change in future.

Plus contiguous reads are more likely to be useful in other NAND using
applications than a general-purpose FS. Contiguous writes admittedly would
be less useful to exploit, but if you can have the facility for reads you
may as well have the writes.

>  > > Does your implementation _require_ a BBT in its current
> implementation?
>  > > For simpler NAND usage, it may be overkill e.g. an application
> where the
>  > > number of rewrites is very small, so the factory bad markers may be
>  > > considered sufficient.
>
> I suppose it would be possible to provide a CDL option to switch the
> persistent BBT off if you really wanted to. Caution is required, though:
> after you have ever written to the chip, it can be impossible to
> distinguish
> a genuine factory-bad marker from application data in the OOB area that
> happens to resemble it. This can be worked around with very careful
> management of what the application puts into the OOB or by tweaking the OOB
> layout to simply avoid ever writing to the relevant byte(s).

Oh I'm sure that most people will use a BBT if they can, but for simple
booting applications it may be overkill and the management has a penalty.
Factory markers and use of the OOB in appropriate ways can avoid the need
for a BBT for simple applications e.g. by relying only on ECCs, or its own
"this verified ok" marker in the OOB area.

>  >> >> (a) Partitions
>  > > [snip]
>  >> >> R's interface does not have such a facility. It appears that, in the
>  >> >> event
>  >> >> that the flash is shared between two or more logical regions,
> it's up to
>  >> >> higher-level code to be configured with the correct block ranges
> to use.
>  > >
>  > > In yours, the block ranges must be configured in CDL. Is there much
>  > > difference? I can see an advantage in writing platform-independent
> test
>  > > programs. But in applications within products possibly less so.
>
> I provide CDL for manual config, but have included a partition layout
> initialisation hook. If there was an on-chip partition table, all that's
> needed would be some code to go into that hook to interrogate it and
> translate to my layer's in-memory layout. This is admittedly not well
> documented, but hinted at by "Planning a port"
> (http://www.ecoscentric.com/ecospro/doc.cgi/html/ecospro-ref/nand-devs-writing.html)
>
> and should be readily apparent on examining code for existing chip drivers.

Ok, that sounds like quite a good thing. It also sounds harder for R's to
play nicely with Linux.

>  > > Especially since the flash geometry, including size, can be
>  > > programmatically queried.
>
> Flash geometry can only be programmatically queried up to a point in
> non-ONFI chips. Look at the k9_devinit function in k9fxx08x08.inl: while the
> ReadID response of Samsung chips encodes the page, block and spare area
> sizes, it doesn't tell you about the chip block count or overall size - you
> have to know based on the device identifier byte. Linux, for example, has a
> big table of these in drivers/mtd/nand/nand_ids.c.

Ahh, ok.
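
To illustrate the split between what ReadID encodes and what has to come from a table, here is a minimal sketch. It is not code from either implementation: the struct and function names are hypothetical, the bit layout follows the decoding commonly used for Samsung-style large-page parts, and the table entries are example values that should be checked against the relevant datasheets.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical geometry record; not taken from either NAND layer. */
struct nand_geometry {
    uint32_t page_size;   /* data bytes per page */
    uint32_t spare_size;  /* spare (OOB) bytes per page */
    uint32_t block_size;  /* data bytes per erase block */
    uint32_t bus_width;   /* 8 or 16 */
    uint32_t chip_mib;    /* total size in MiB, 0 if unknown */
};

/* Device ID -> chip size. Example entries only; verify against datasheets. */
static const struct { uint8_t dev_id; uint32_t mib; } size_table[] = {
    { 0xF1, 128 }, { 0xDA, 256 }, { 0xDC, 512 }, { 0xD3, 1024 },
};

/* Decode the fourth ReadID byte, then look up the chip size by device ID.
 * Returns 0 on success, -1 if the device ID is not in the table (page and
 * block sizes are still filled in, but the total size stays unknown). */
static int nand_decode_id(uint8_t dev_id, uint8_t ext_id,
                          struct nand_geometry *g)
{
    size_t i;

    g->page_size  = 1024u << (ext_id & 0x3);
    g->spare_size = (8u << ((ext_id >> 2) & 0x1)) * (g->page_size / 512u);
    g->block_size = (64u * 1024u) << ((ext_id >> 4) & 0x3);
    g->bus_width  = ((ext_id >> 6) & 0x1) ? 16 : 8;
    g->chip_mib   = 0;

    for (i = 0; i < sizeof size_table / sizeof size_table[0]; i++) {
        if (size_table[i].dev_id == dev_id) {
            g->chip_mib = size_table[i].mib;
            return 0;
        }
    }
    return -1;
}
```

The point of the sketch is that the first four fields come from the chip itself, while chip_mib can only be filled in from prior knowledge of the device identifier.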

>  > > If there was to be a single firmware supporting multiple board
>  > > revisions/configurations (as can definitely happen), which could
>  > > include different sizes of NAND, I think R's implementation would be
>  > > able to adapt better than E's, as the high-level program can divide up
>  > > the sizes based on what it sees.
>
> I see no reason why E's wouldn't adapt just as well, given suitably written
> driver(s) and init hooks.

Ok. I also see both your chip drivers possess these hooks - which is good
as people will tend to use existing drivers as templates rather than write
their own from scratch.
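
As a rough illustration of what such an init hook could do when one firmware has to cope with several NAND sizes, the sketch below divides up a chip based on the block count found at runtime. Everything here is an assumption made for the example: the struct, the function name, the fixed 8-block boot region and the 1/8 split are hypothetical and do not come from either layer.

```c
#include <stdint.h>

#define MAX_PARTITIONS 4

/* Hypothetical in-memory partition record; block numbers are inclusive. */
struct nand_partition {
    uint32_t first_block;
    uint32_t last_block;
};

/* Lay out three partitions (boot, config, filesystem) from the detected
 * block count. Returns the number of partitions written, or -1 if the
 * chip is too small for this layout. */
static int layout_partitions(uint32_t block_count,
                             struct nand_partition parts[MAX_PARTITIONS])
{
    const uint32_t boot_blocks = 8;      /* assumed fixed boot area */
    uint32_t rest, config_blocks;

    if (block_count < boot_blocks + 8)
        return -1;

    rest = block_count - boot_blocks;
    config_blocks = rest / 8;            /* assumed 1/8 of the remainder */

    parts[0].first_block = 0;
    parts[0].last_block  = boot_blocks - 1;
    parts[1].first_block = boot_blocks;
    parts[1].last_block  = boot_blocks + config_blocks - 1;
    parts[2].first_block = boot_blocks + config_blocks;
    parts[2].last_block  = block_count - 1;
    return 3;
}
```

The same function gives a sensible layout for a 1024-block and a 2048-block chip, which is the adaptation being discussed; an on-chip partition table reader could fill the same records instead.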

>  > > In fact, because of the requirement for the
>  > > drivers to call CYG_NAND_FUNS, it doesn't seem difficult at all to be
>  > > backwardly compatible. Am I right? Nevertheless, it would be
>  > > unfortunate to have an API which already needs its low level driver
>  > > interface updating to a rev 2.
>
> Adding hardware ECC support and making the driver interface
> backwards-compatible turned out to break layering, so I chose to change the
> interface.
>
> It's a relatively straightforward change in that I have broken up page read
> and program operations into three steps: initialise, read/write a stride of
> data (length chosen by the NAND layer to mesh with whatever ECC length is
> provided by the controller), and finalise. The flow inside my NAND layer
> for programming a page becomes:
>
> * Call chip driver to initialise the write (we expect it to send the
>   command and address)
> * For each ECC-sized stride of data:
> ** If hardware ECC, call the ECC driver to tell it we're about to start
>    a stride
> ** Call chip driver to write a stride of data
> ** If hardware ECC, call the ECC driver to get the ECC for the stride now
>    completed and stash it away
> * If software ECC, compute it for the page
> * Finalise the spare layout using the ECC wherever it came from
> * Call chip driver to finalise the write, passing the final spare layout
>   (we expect it to write the spare area and send the program-confirm
>   command).

NB Some hardware ECC's will only compute for the whole page, e.g. AT91SAM9's.
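
The program flow quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the chip_ops and ecc_ops structs, all function names, the 2048-byte page, the 3 ECC bytes per stride and the placeholder soft_ecc() are hypothetical and are not E's actual driver interface. A controller that only computes ECC over the whole page would simply report a stride equal to the page size.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   2048u
#define SOFT_STRIDE 256u   /* assumed software ECC block size */
#define ECC_BYTES   3u     /* assumed ECC bytes per stride */
#define MAX_STRIDES 8u

/* Hypothetical driver interfaces; names are illustrative only. */
struct chip_ops {
    void (*write_begin)(void *priv, uint32_t page);   /* command + address */
    void (*write_stride)(void *priv, const uint8_t *d, size_t n);
    int  (*write_finish)(void *priv, const uint8_t *spare, size_t n);
};
struct ecc_ops {               /* pass NULL to use software ECC */
    size_t stride;             /* bytes per ECC; may equal PAGE_SIZE */
    void (*start)(void *priv);
    void (*read)(void *priv, uint8_t out[ECC_BYTES]);
};

/* Placeholder only: XOR of the bytes, repeated. NOT a real ECC. */
static void soft_ecc(const uint8_t *d, size_t n, uint8_t out[ECC_BYTES])
{
    uint8_t x = 0;
    size_t i;
    for (i = 0; i < n; i++)
        x ^= d[i];
    memset(out, x, ECC_BYTES);
}

static int program_page(const struct chip_ops *chip, const struct ecc_ops *hw,
                        void *priv, uint32_t page, const uint8_t *data)
{
    uint8_t spare[MAX_STRIDES * ECC_BYTES];
    size_t stride = hw ? hw->stride : SOFT_STRIDE;
    size_t n, i;

    if (stride == 0 || stride > PAGE_SIZE || PAGE_SIZE % stride != 0)
        return -1;
    n = PAGE_SIZE / stride;
    if (n > MAX_STRIDES)
        return -1;

    chip->write_begin(priv, page);
    for (i = 0; i < n; i++) {
        if (hw) hw->start(priv);                        /* arm ECC engine */
        chip->write_stride(priv, data + i * stride, stride);
        if (hw) hw->read(priv, spare + i * ECC_BYTES);  /* stash the ECC */
    }
    if (!hw)                          /* software ECC over the whole page */
        for (i = 0; i < n; i++)
            soft_ecc(data + i * stride, stride, spare + i * ECC_BYTES);

    /* Spare layout is reduced to "ECC bytes only" in this sketch. */
    return chip->write_finish(priv, spare, n * ECC_BYTES);
}

/* Fake chip that records what it was given, for experimentation. */
struct fake_chip {
    uint32_t page;
    size_t fill, spare_len;
    uint8_t data[PAGE_SIZE], spare[MAX_STRIDES * ECC_BYTES];
};
static void fake_begin(void *p, uint32_t page)
{ struct fake_chip *c = p; c->page = page; c->fill = 0; }
static void fake_stride(void *p, const uint8_t *d, size_t n)
{ struct fake_chip *c = p; memcpy(c->data + c->fill, d, n); c->fill += n; }
static int fake_finish(void *p, const uint8_t *s, size_t n)
{ struct fake_chip *c = p; memcpy(c->spare, s, n); c->spare_len = n; return 0; }
static const struct chip_ops fake_ops = { fake_begin, fake_stride, fake_finish };
```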

> I have not yet finished this work, but will update all my existing drivers
> when it is done. In a way, the drawn-out nature of this process has
> provided extra time for my state of the art to evolve ;-)

Well that's fair enough. I think it's fair to make allowances for work
that's actually under active development (rather than vapourware or just
promises). Especially since you say further down your mail that it is
likely to be done in the next couple of weeks. (I'm not asking you for a
concrete commitment - as with anything involving volunteer effort).

>  > > Incidentally I note Rutger has a "Samsung" ECC implementation, whereas
>  > > you support Samsung K9 chips, but use the normal ECC algorithm. Did
>  > > Samsung change their practice?
>
> The "Samsung" ECC implementation has nothing to do with the underlying
> chip; it's just an algorithm whose details they published,

Indeed, but I sort of expected them to be using it in that context :).

> I think in conjunction
> with some of the higher-level NAND-based products they ship which
> feature an FTL (USB sticks, SD cards, etc). There is in general no
> requirement to use any particular ECC algorithm with any particular chip;
> all the spec sheets tend to say is "use ECC".

Sure. But I was anticipating it may be industry practice, e.g. if
Linux-MTD does the same. Maybe due to...

> * io_nand_ecc_samsung.c provides a Samsung algorithm of the same
>   parameters, used by the BlackFin board driver.

...it is indeed industry practice, but perhaps only rarely.

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

Re: NAND technical review

Jonathan Larmour-2
In reply to this post by Lambrecht Jürgen
Jürgen Lambrecht wrote:

> Ross Younger wrote:
>> - E's high-level driver interface makes it harder to add new functions
>> later, necessitating a change to that API (H2 above). R's does not; the
>> requisite logic would only need to be added to the ANC. It is not thought
>> that more than a handful of such changes will ever be required, and it
>> may be possible to maintain backwards compatibility. (As a case in point,
>> support for hardware ECC is currently work-in-progress within
>> eCosCentric, and does require such a change, but now is not the right
>> time to discuss that.)
>
> Therefore we prefer R's model.
>
> Is it possible that R's model follows better the "general" structure of
> drivers in eCos?
> I mean: (I follow our CVS, could maybe differ from the final commit of
> Rutger to eCos)
> 1. with the low-level chip-specific code in /devs
> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and
> devs/flash/micron/nand)
> 2. with the "middleware" in /io (io/flash_nand/current/src and there
> /anc, /chip, /controller)
> 3. with the high-level code in /fs

I don't see E's model as being much different in that perspective. There
is stuff in devs/flash, io/nand and (presumably) fs as well.

The difference is more the separation out of the controller functionality
into a different layer.

> Is it correct that R's abstraction makes it possible to add partitioning
> easily?
> (because that is an interesting feature of E's implementation)

As Rutger said, it could be done - there's nothing in his design which
prevents it. It's not there now though, so unless someone's working on it
it's probably not something to consider in the decision process.
Especially since it would be a big user API change.

> We also prefer R's model of course because we started with R's model and
> use it now.

You haven't done any profiling by any luck have you? Or code size
analysis? Although I haven't got into the detail of R's version yet (since
I was starting with dissecting E's), both the footprint and the cumulative
function call and indirection time overhead are concerns of mine.

>> (b) Availability of drivers
[snip]
>> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
>> presumably only tested on the x8 chip on the BlackFin board?)
>>  
>
> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
> page size, x8)
> Because of this chip, Rutger adapted the hardware ECC controller code,
> because our chip uses more bits (for details, ask Stijn or Rutger).

I'd be interested in what the issue was. From admittedly a quick look I
can't find anything about this in the code.

>> (d) Degree of testing
[snip]

> We have it very well tested, among other things with:
> - an automatic (continual) NAND flash test in a climate chamber
> - stress tests: filling it up over and over again via FTP (both with
> a few big and many small files) and checking the heap remaining:
>  * Put 25 files with a filesize of 10.000.000 bytes on the filesystem
>  * Put 2500 files with a filesize of 100.000 bytes on the filesystem
>  * Put 7000 files with a filesize of 10.000 bytes on the filesystem
>  Conclusion: storing smaller files needs more heap, but we still have
> plenty left with our 16MB
>  * Write a bundle of files over and over again on the filesystem. We put
> 1000 files of 100.000 bytes each on the flash drive every time.
> - used in the final mp3-player application

That's extremely useful to know, thanks! But a couple of further questions
on this: Did any bad blocks show up at any point? Were you using a bad
block table? Presumably there were factory-marked bad blocks on some?

Thanks,

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

Re: NAND technical review

Lambrecht Jürgen
Jonathan Larmour wrote:

> Jürgen Lambrecht wrote:
>  
>> Ross Younger wrote:
>>    
>>> - E's high-level driver interface makes it harder to add new functions
>>> later, necessitating a change to that API (H2 above). R's does not; the
>>> requisite logic would only need to be added to the ANC. It is not thought
>>> that more than a handful of such changes will ever be required, and it
>>> may be possible to maintain backwards compatibility. (As a case in point,
>>> support for hardware ECC is currently work-in-progress within
>>> eCosCentric, and does require such a change, but now is not the right
>>> time to discuss that.)
>>>      
>> Therefore we prefer R's model.
>>
>> Is it possible that R's model follows better the "general" structure of
>> drivers in eCos?
>> I mean: (I follow our CVS, could maybe differ from the final commit of
>> Rutger to eCos)
>> 1. with the low-level chip-specific code in /devs
>> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and
>> devs/flash/micron/nand)
>> 2. with the "middleware" in /io (io/flash_nand/current/src and there
>> /anc, /chip, /controller)
>> 3. with the high-level code in /fs
>>    
>
> I don't see E's model as being much different in that perspective. There
> is stuff in devs/flash, io/nand and (presumably) fs as well.
>
> The difference is more the separation out of the controller functionality
> into a different layer.
>
>  
>> Is it correct that R's abstraction makes it possible to add partitioning
>> easily?
>> (because that is an interesting feature of E's implementation)
>>    
>
> As Rutger said, it could be done - there's nothing in his design which
> prevents it. It's not there now though, so unless someone's working on it
> it's probably not something to consider in the decision process.
> Especially since it would be a big user API change.
>
>  
>> We also prefer R's model of course because we started with R's model and
>> use it now.
>>    
>
> You haven't done any profiling by any luck have you? Or code size
> analysis? Although I haven't got into the detail of R's version yet (since
> I was starting with dissecting E's), both the footprint and the cumulative
> function call and indirection time overhead are concerns of mine.
>
>  
No...

>>> (b) Availability of drivers
>>>      
> [snip]
>  
>>> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
>>> presumably only tested on the x8 chip on the BlackFin board?)
>>>
>>>      
>> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
>> page size, x8)
>> Because of this chip, Rutger adapted the hardware ECC controller code,
>> because our chip uses more bits (for details, ask Stijn or Rutger).
>>    
>
> I'd be interested in what the issue was. From admittedly a quick look I
> can't find anything about this in the code.
>  
Maybe Rutger can answer this better. Otherwise Stijn can look up his mail on
this issue.

>  
>>> (d) Degree of testing
>>>      
> [snip]
>  
>> We have it very well tested, among other things with:
>> - an automatic (continual) NAND flash test in a climate chamber
>> - stress tests: filling it up over and over again via FTP (both with
>> a few big and many small files) and checking the heap remaining:
>>  * Put 25 files with a filesize of 10.000.000 bytes on the filesystem
>>  * Put 2500 files with a filesize of 100.000 bytes on the filesystem
>>  * Put 7000 files with a filesize of 10.000 bytes on the filesystem
>>  Conclusion: storing smaller files needs more heap, but we still have
>> plenty left with our 16MB
>>  * Write a bundle of files over and over again on the filesystem. We put
>> 1000 files of 100.000 bytes each on the flash drive every time.
>> - used in the final mp3-player application
>>    
>
> That's extremely useful to know, thanks! But a couple of further questions
> on this: (1) Did any bad blocks show up at any point? (2) Were you using a bad
> block table? (3) Presumably there were factory-marked bad blocks on some?
>  
(3) Yes, there are almost always factory-marked bad blocks.
(2) Yes.
(1) Yes, certainly! We get bad blocks from time to time, and they are
handled correctly.

Kind regards,
Jürgen
> Thanks,
>
> Jifl
> --
> --["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
>  
totally agree ;-)


Re: NAND technical review

Rutger Hofman-2
In reply to this post by Jonathan Larmour-2
Jonathan Larmour wrote:
[snip]
>> We also prefer R's model of course because we started with R's model
>> and use it now.
>
> You haven't done any profiling by any luck have you? Or code size
> analysis? Although I haven't got into the detail of R's version yet
> (since I was starting with dissecting E's), both the footprint and the
> cumulative function call and indirection time overhead are concerns of
> mine.

As a first step towards mitigating the 'footprint pressure', I have added CDL
options to configure in/out support for the various chip types, to wit:
- ONFI chips;
- 'regular' large-page chips;
- 'regular' small-page chips.
It is in r678 on my download page
(http://www.cs.vu.nl/~rutger/software/ecos/nand-flash/). As I had
suggested before, this was a very small refactoring (although code has
moved about in io_nand_chip.c to save on the number of #ifdefs).

One more candidate for a reduction in code footprint: I can add a CDL
option to configure out support for heterogeneous controllers/chips. The
ANC layer will become paper-thin then. If this change would make any
difference, I will do it within, say, a week's time.

As regards the concerns for (indirect) function call overhead: my
intuition is that the NAND operations themselves (page read, page write,
block erase) will dominate. It takes 200..500us just to transfer a page
over the data bus to the NAND chip; one recent data sheet mentions
program time 200us, erase time 1.5ms. I think only a very slow CPU would
show the overhead of less than 10 indirect function calls.
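
The estimate above can be put into numbers with a small helper. This is only a back-of-envelope sketch: the function names are made up, and the 20 cycles per indirect call and the 50 MHz clock used in the note below are assumptions for a deliberately slow CPU, not measurements of either layer.

```c
#include <stdint.h>

/* Time spent on call overhead, in nanoseconds. */
static uint32_t call_overhead_ns(uint32_t calls, uint32_t cycles_per_call,
                                 uint32_t cpu_mhz)
{
    return calls * cycles_per_call * 1000u / cpu_mhz;
}

/* Overhead relative to the NAND operation time, in parts per thousand.
 * Nanoseconds divided by microseconds gives permille directly. */
static uint32_t overhead_permille(uint32_t overhead_ns, uint32_t nand_op_us)
{
    return overhead_ns / nand_op_us;
}
```

With 10 indirect calls at an assumed 20 cycles each on a 50 MHz CPU, call_overhead_ns(10, 20, 50) gives 4000 ns. Against a 200us transfer plus a 200us program time, overhead_permille(4000, 400) gives 10, i.e. about 1% of the operation.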

Rutger

Re: NAND technical review

Rutger Hofman-2
In reply to this post by Jonathan Larmour-2
Jonathan Larmour wrote:

> Hmm, I guess the key thing here is that in E's implementation most of
> the complexity has been pushed into the lower layers; at least compared
> to R's. R's has a more consistent interface through the layers. Albeit
> at the expense of some rigidity and noticeable function overhead.
>
> It's not likely E's will be able to easily share controller code, given
> of course you don't know what chips, and so what chip driver APIs
> they'll be connected to. But OTOH, maybe this isn't a big deal since a
> lot of the controller-specific munging is likely to be platform-specific
> anyway due to characteristics of the attached NAND (e.g. timings etc.)
> and the only bits that would be sensibly shared would potentially happen
> in the processor HAL anyway at startup time. What's left may not be that
> much and isn't a problem in the platform HAL. However the likely
> exception to that is hardware-assisted ECC. A semi-formal API for that
> would be desirable.

This is the largest difference in design philosophy between E and R. Is
it OK if I expand?

NAND chips are all identical in their wire setup. They all have a data
'bus', and control lines to indicate whether what is on the bus is a
command, an address, or data.

NAND chips differ in how their command language works, but only to a limited extent.
What is on the market now is 'regular' large-page chips that all speak
the same command language, and small-page chips that have a somewhat
different command language. ONFI chips are large-page chips except in
interrogation at startup and in bad-block marking.

E.g. a page read for a large-page chip (my running example) looks like this:
. write a command 0x00 (READ_START)
. write address bytes of the page(+offset) to be read
. write a command 0x30 (READ_CONFIRM)
. read the data on the bus
. insofar as supported retrieve hw-calculated ECC
For small-page chips the sequence is different because a page's data is
read in multiple chunks, using READ_1_A (0x00), READ_1_B (0x01), and for
spare area READ_2 (0x05).
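
As a sketch of how the large-page sequence above maps onto controller primitives, consider the following. The nand_ctrl struct and its callbacks are hypothetical names for this example, not R's actual interface; the address is assumed to take two column cycles followed by three row cycles, which varies with chip size; and the optional hardware ECC retrieval step is left out. The small trace recorder exists only so the sequence can be observed.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NAND_CMD_READ_START   0x00
#define NAND_CMD_READ_CONFIRM 0x30

/* Hypothetical controller primitives, one per kind of bus cycle. */
struct nand_ctrl {
    void (*write_cmd)(void *priv, uint8_t cmd);
    void (*write_addr)(void *priv, uint8_t byte);
    void (*wait_ready)(void *priv);
    void (*read_data)(void *priv, uint8_t *buf, size_t len);
    void *priv;
};

/* Generic large-page read: the same steps for any controller. */
static void large_page_read(const struct nand_ctrl *c, uint32_t page,
                            uint16_t column, uint8_t *buf, size_t len)
{
    c->write_cmd(c->priv, NAND_CMD_READ_START);
    c->write_addr(c->priv, (uint8_t)(column & 0xFF));       /* column low  */
    c->write_addr(c->priv, (uint8_t)(column >> 8));          /* column high */
    c->write_addr(c->priv, (uint8_t)(page & 0xFF));          /* row cycles  */
    c->write_addr(c->priv, (uint8_t)((page >> 8) & 0xFF));
    c->write_addr(c->priv, (uint8_t)((page >> 16) & 0xFF));
    c->write_cmd(c->priv, NAND_CMD_READ_CONFIRM);
    c->wait_ready(c->priv);                /* chip fetches the page */
    c->read_data(c->priv, buf, len);
}

/* Trace recorder standing in for real hardware. */
struct trace { char kind[16]; uint8_t val[16]; size_t n; };

static void t_log(struct trace *t, char kind, uint8_t val)
{
    if (t->n < sizeof t->kind) {
        t->kind[t->n] = kind;
        t->val[t->n] = val;
        t->n++;
    }
}
static void t_cmd(void *p, uint8_t v)  { t_log(p, 'C', v); }
static void t_addr(void *p, uint8_t v) { t_log(p, 'A', v); }
static void t_wait(void *p)            { t_log(p, 'W', 0); }
static void t_data(void *p, uint8_t *buf, size_t len)
{
    t_log(p, 'D', 0);
    memset(buf, 0xFF, len);   /* pretend the page is erased */
}
```

Only the four callbacks are controller-specific here; the sequence itself is the part that can be shared.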

These 2 languages are all the variation there is for NAND chips (plus,
at another level, 2 timing values for read cycle and write cycle)! The
wide-ranging differences for devices for NAND are in the controllers.

Controllers work by accepting input like 'write a command of
value 0x..', 'write an address of value 0x.....', etc, and doing their job
on the NAND chip's wires. They cannot really operate at a higher level,
if only because they must support both small-page and large-page chips
(and ONFI), and this is the level of common protocol for the chips.

So controller code has to bridge between API calls like page_read and
the interface of the controller as described above. R's implementation
presumes that a lot of the code to make this translation is generic: a
large-page read translates to the controller steps as given above in the
running example, in any controller implementation. Moreover, the generic
code handles spare layout: where in the spare is the application's spare
data folded, where is the ECC, where is the bad-block mark. OTOH, the
generic code has hooks for handling any ECC that the controller has
computed in hardware -- how ECC is supported in hardware varies across
controllers. But the way the ECC check is handled (case in point is
where a correctable bit error is flagged) is generic again.

So, lots of code can (and will) be shared across controller
implementations -- whether by code sharing or by code duplication.

Rutger

Re: NAND technical review

Rutger Hofman-2
In reply to this post by Jonathan Larmour-2
Jonathan Larmour wrote:
> Jürgen Lambrecht wrote:
[snip]
>> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
>> page size, x8)
>> Because of this chip, Rutger adapted the hardware ECC controller code,
>> because our chip uses more bits (for details, ask Stijn or Rutger).
>
> I'd be interested in what the issue was. From admittedly a quick look I
> can't find anything about this in the code.

As things go with NAND, this was not a chip issue but a controller
issue. This controller has a different approach to hardware ECC than
most; it doesn't export the ECC sum values, but the ECC syndromes --
values that in their bit pattern indicate where any bit errors are. I
added ECC_SYNDROME support to my generic controller code. If I compare
with MTD, I think with this addition, R kind of covers the range of ECC
hardware support types that currently are in existence.
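
The difference between a controller that exports ECC sums and one that exports syndromes can be shown with a toy code. The sketch below is not the ECC used by either layer or by any real controller: it is a simple position-XOR scheme, chosen only because a single flipped bit makes the XOR of stored and recomputed sums equal that bit's position. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy ECC "sum": XOR of the 1-based positions of every set bit. */
static uint32_t toy_ecc_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    unsigned b;

    for (i = 0; i < len; i++)
        for (b = 0; b < 8; b++)
            if (buf[i] & (1u << b))
                sum ^= (uint32_t)(i * 8 + b + 1);
    return sum;
}

/* Apply a syndrome to the data.
 * Returns 0 if there was no error, 1 if one bit was corrected,
 * -1 if the syndrome does not point inside the buffer. */
static int toy_ecc_apply_syndrome(uint8_t *buf, size_t len, uint32_t syndrome)
{
    uint32_t pos;

    if (syndrome == 0)
        return 0;
    if (syndrome > len * 8)
        return -1;
    pos = syndrome - 1;
    buf[pos / 8] ^= (uint8_t)(1u << (pos % 8));
    return 1;
}
```

With a sum-exporting controller, generic code would compute stored_sum ^ toy_ecc_sum(data) itself and then call toy_ecc_apply_syndrome(); with a syndrome-exporting controller the hardware hands over that XOR directly, so only the second step is needed.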

I don't know whether Televic (Stijn) actually uses the ECC_SYNDROME
code. Last thing I heard, coincident with my adding ECC_SYNDROME, is
that they had already solved their performance issues differently, but I
don't know what happened after that.

Rutger

RE: NAND technical review

Lambrecht Jürgen


> -----Original Message-----
> From: [hidden email] [mailto:ecos-devel-
> [hidden email]] On Behalf Of Rutger Hofman
> Sent: Tuesday 13 October 2009 16:25
> To: Jonathan Larmour
> Cc: Lambrecht Jürgen; Ross Younger; eCos developers; Deroo Stijn
> Subject: Re: NAND technical review
>
> Jonathan Larmour wrote:
> > Jürgen Lambrecht wrote:
> [snip]
> >> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
> >> page size, x8)
> >> Because of this chip, Rutger adapted the hardware ECC controller code,
> >> because our chip uses more bits (for details, ask Stijn or Rutger).
> >
> > I'd be interested in what the issue was. From admittedly a quick look I
> > can't find anything about this in the code.
>
> As things go with NAND, this was not a chip issue but a controller
> issue. This controller has a different approach to hardware ECC than
> most; it doesn't export the ECC sum values, but the ECC syndromes --
> values that in their bit pattern indicate where any bit errors are. I
> added ECC_SYNDROME support to my generic controller code. If I compare
> with MTD, I think with this addition, R kind of covers the range of ECC
> hardware support types that currently are in existence.
>
> I don't know whether Televic (Stijn) actually uses the ECC_SYNDROME
> code. Last thing I heard, coincident with my adding ECC_SYNDROME, is
> that they had already solved their performance issues differently, but I
> don't know what happened after that.
Indeed, we have not yet used it. Maybe by the end of the year.
Regards,
Jürgen

>
> Rutger

Re: NAND technical review

Jonathan Larmour-2
In reply to this post by Rutger Hofman-2
[ Sorry for getting back to this late - I wanted to continue with Ross
before he went on holiday ]

Rutger Hofman wrote:
> Jonathan Larmour wrote:
>
>> A device number does seem to be a bit limiting, and less
>> deterministic. OTOH, a textual name arguably adds a little extra
>> complexity.
>
>
> This will be straightforward to change either way.

Noted, thanks.

>> I note Rutger's layer needs an explicit init call, whereas yours DTRT
>> using a constructor, which is good.
>
>
> I followed flash v2 in this. If the experts think a constructor is
> better, that's easy to change too.

Flash v2 doesn't use a constructor, but only for legacy reasons and because
some last-minute discussions before the v3 release couldn't reach a
conclusion about constructor priority, given things like SPI flash.
cyg_flash_init() is going to be properly eliminated in due course.

These issues don't really affect your layer so much as you don't have any
legacy burden, so moving straight to a constructor is better.

>> Does your implementation _require_ a BBT in its current
>> implementation? For simpler NAND usage, it may be overkill e.g. an
>> application where the number of rewrites is very small, so the factory
>> bad markers may be considered sufficient.
>
>
> This is a bit hairy in my opinion, and one reason is that there is no
> Standard Layout for the spare areas. One case where a BBT is forced: my
> BlackFin NFC can be used to boot from NAND, but it enforces a spare
> layout that is incompatible with MTD or anybody. It is even incompatible
> with most chips' specification that the first byte of spare in the first
> page of the block is the Bad Block Marker. BlackFin's boot layout uses
> this first byte in a way that suits it, and it may be 0 -- which would
> otherwise mean Bad Block.

I infer that your layer can cope with that? I didn't see the handling for
that in io_nand_chip_bad_block.c.
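
To make the question concrete, a factory-marker check that knows when the marker byte cannot be trusted might look roughly like this. It is a sketch under assumptions, not the contents of io_nand_chip_bad_block.c: the struct and enum names are hypothetical, and it relies on the common convention that a good block carries 0xFF in the marker byte of its first page's spare area.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical description of a spare layout's bad-block marker. */
struct spare_layout {
    size_t bbm_offset;    /* marker position within the spare area */
    bool   bbm_reliable;  /* false if the layout reuses that byte */
};

enum bb_status { BLOCK_GOOD, BLOCK_BAD, BLOCK_UNKNOWN };

/* Inspect the spare area of a block's first page.
 * BLOCK_UNKNOWN means the caller has to consult a BBT instead. */
static enum bb_status check_factory_marker(const struct spare_layout *l,
                                           const uint8_t *spare,
                                           size_t spare_len)
{
    if (!l->bbm_reliable || l->bbm_offset >= spare_len)
        return BLOCK_UNKNOWN;
    return spare[l->bbm_offset] == 0xFF ? BLOCK_GOOD : BLOCK_BAD;
}
```

A boot layout like the BlackFin one described above would set bbm_reliable to false, which is the case where a BBT becomes mandatory.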

Is your BBT compatible with Linux MTD? Including your use of a mirror?

> Also, what to do if a block grows bad during usage, and that block
> doesn't allow writing a marker in its spare area? BBT seems a solution.

Well I was making the explicit assumption that it wasn't rewritten very
often in the lifetime of the device. Think of things like in-field
firmware upgrades.

>>> (b) Dynamic memory allocation
>>>
>>> R's layer mandates the provision of malloc and free, or compatible
>>> functions. These must be provided to the cyg_nand_init() call.
>>
>>
>> That's unfortunate - that limits its use in smaller boot loaders - a
>> key application.
>
>
> Well, it is certainly possible to calculate statically how much space
> R's NAND layer is going to use, to allocate that statically, and write a
> tiny function to hand it out piecemeal at the NAND layer's request.

If you know what it's going to be (at most), surely it could just be
allocated statically and used directly? That's got the lowest overheads.

E's implementation had a good idea of a CDL variable for the maximum
supported block size. Then individual HALs or driver packages can use a
CDL 'requires' to ensure it's >= the block size of the chips really in use.
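
For comparison, the "tiny function to hand it out piecemeal" idea could be as small as the sketch below. It is an illustration only: NAND_POOL_BYTES stands in for a value that would be derived from CDL, its default here is arbitrary, and the function names are hypothetical rather than the allocator hooks R's cyg_nand_init() actually takes.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef NAND_POOL_BYTES
#define NAND_POOL_BYTES (8192u + 512u)  /* assumed worst case, from config */
#endif

/* Static pool; the union members force suitable alignment. */
static union {
    uint8_t   bytes[NAND_POOL_BYTES];
    long long align_ll;
    void     *align_ptr;
} nand_pool;
static size_t nand_pool_used;

/* malloc-compatible bump allocator: hands out pieces, never reuses them. */
static void *nand_pool_alloc(size_t size)
{
    const size_t align = sizeof(void *);
    size_t rounded;
    void *p;

    if (size > NAND_POOL_BYTES)      /* also keeps the rounding from wrapping */
        return NULL;
    rounded = (size + align - 1) & ~(align - 1);
    if (rounded > NAND_POOL_BYTES - nand_pool_used)
        return NULL;
    p = nand_pool.bytes + nand_pool_used;
    nand_pool_used += rounded;
    return p;
}

/* free-compatible no-op; the pool lives for the lifetime of the image. */
static void nand_pool_free(void *p)
{
    (void)p;
}
```

This only suits allocations made once at init time. Anything allocated and freed repeatedly would exhaust the pool, which is one argument for the fully static approach.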

>>> E's doesn't; instead it declares a small number of static buffers.
>>
>> I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are
>> no other variables. Again I'm thinking of the scenario of single
>> firmware - different board revs. Can you confirm?
>>
>>> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a
>>> major issue because the memory needs of that layer are well-bounded; I
>>> think I broadly agree, though the situation is not ideal in that it
>>> forces somebody who wants to use a lean, mean eCos configuration to work
>>> around.
>>
>>
>> The overhead of including something like malloc/free in the image may
>> compare badly with the amount of memory R's needs to allocate in the
>> first place. I also note that if R's implementation has program
>> verifies enabled it allocates and frees a page _every_ time. If
>> nothing else this could lead to heap fragmentation.
>
>
> Program verifies should be considered a very deep debugging trait.

I'm not sure about that. Experience with NOR Flash has shown that despite
promises of error reporting in the datasheets, sometimes the only way to
be sure of data integrity is an explicit verify step. It's up to the user,
but I would consider it to have more use than just for debugging a driver.

> Still, another possible implementation for this page buffer would be on
> the stack (not!), or in the controller struct. That would grow then by
> 8KB + spare.

Or a single one for all chips maybe (since chances of clashes seem pretty
small, so just protected with a mutex). And only if the program verify
option is enabled of course. As per above, the page buffer size could be
derived from the configuration, with appropriate CDL.
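
A single shared verify buffer along those lines could be sketched like this. The names are hypothetical, VERIFY_PAGE_MAX stands in for a CDL-derived maximum page size, and verify_lock()/verify_unlock() are placeholders for a real mutex (in eCos that would presumably be a cyg_mutex_t with cyg_mutex_lock() and cyg_mutex_unlock()).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define VERIFY_PAGE_MAX 8192u  /* stand-in for a CDL-derived maximum */

/* Placeholders for a real mutex; they only count nesting here. */
static int verify_lock_depth;
static void verify_lock(void)   { verify_lock_depth++; }
static void verify_unlock(void) { verify_lock_depth--; }

/* One buffer shared by all chips, instead of malloc/free per program. */
static uint8_t verify_buf[VERIFY_PAGE_MAX];

typedef int (*page_read_fn)(void *chip, uint32_t page,
                            uint8_t *buf, size_t len);

/* Read a page back and compare it with what was programmed.
 * Returns 0 on match, 1 on mismatch, -1 on read failure or oversize. */
static int verify_page(page_read_fn rd, void *chip, uint32_t page,
                       const uint8_t *expected, size_t len)
{
    int rc;

    if (len > VERIFY_PAGE_MAX)
        return -1;
    verify_lock();
    rc = rd(chip, page, verify_buf, len);
    if (rc == 0)
        rc = memcmp(verify_buf, expected, len) != 0 ? 1 : 0;
    else
        rc = -1;
    verify_unlock();
    return rc;
}

/* Fake reader for experimentation: treats "chip" as the stored bytes. */
static int fake_read(void *chip, uint32_t page, uint8_t *buf, size_t len)
{
    (void)page;
    memcpy(buf, chip, len);
    return 0;
}
```

The buffer and the lock would only be compiled in when the program-verify option is enabled.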

> [snip]
>
>>> - R's model shares the command sequence logic amongst all chips,
>>> differentiating only between small- and large-page devices. (I do not
>>> know whether this is correct for all current chips, though going forwards
>>> seems less likely to be an issue as fully-ONFI-compliant chips become the
>>> norm.)
>>
>>
>> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it
>> may be too prescriptive to be robustly future-proof.
>
>
> Well, there is no way I can see into the future, but I definitely think
> that the wire command model for NAND chips is going to stay -- it is in
> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
> NAND chips use it too.

I don't entirely disagree. But people do have a habit of inventing new
things, particularly if it allows them to differentiate their products
from their competitors.

> There are chips that use a different interface,
> like SSD or MMC or OneNand, but then these chips come with on-chip bad
> block management, wear leveling of some kind, and are completely
> different in the way they must be handled. I'd say E's and R's
> implementations are concerned only with 'raw' NAND chips.
>
>> One could say that makes it a more realistic emulation. But yes I can
>> see disadvantages with a somewhat rigid world view. Thinking out loud,
>> I wonder if Rutger's layer could work with something like Samsung
>> OneNAND.
>
>
> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says:
> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array
> using a NOR Flash interface."

OneNAND isn't like SSD or MMC which essentially provide a block interface
and an advanced controller hiding the details of NAND. It isn't like NOR
flash because you can't address the entire array - as shown by the fact it
only has a 16-bit address bus. Instead with OneNAND you get an SRAM buffer
as a "window" into the NAND array. There are commands to load data from
NAND pages into the SRAM buffers, or write them back. It has onboard ECC
logic, but it has a very different way of controlling the NAND. You do get
access to both data and spare areas too.

You can consider this the sort of thing I mean when I say that
manufacturers can come up with interesting things which break rigid
assumptions of how you talk to NAND chips. So my concern is not (just)
that your layer can't support OneNAND, but it couldn't support anything
which also had a different interface.

Obviously you already support small versus large page, which require
different protocols, but they are still relatively similar in how they're
controlled. Would it even be possible to sensibly extend your generic
layer to support something like OneNAND? Without having a large number of
kludges?

>> I would certainly appreciate feedback from anyone who has used R's
>> layer. What you say would seem to imply that both small page and ONFI
>> are untested in R's layer.
>
>
> That is correct. I would love some small-page testing. I have seen no
> ONFI chips on the market yet, so testing will be future work for both E
> and R.

Ross said that the Samsung K9 is pretty similar to ONFI, other than how
you read the device ID etc. Is your layer equally close?

Thanks,

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine