FLASH CACHE BASICS
PCIe expansion card used as a scalable read cache in NetApp storage systems
Enables a disk-limited storage system to achieve its maximum I/O potential; using fewer disks to reach that maximum saves resources (power, rack space, money)
Helps achieve lower read latency due to the faster access times of solid-state memory
Flash Cache hits reduce latency by a factor of 10 for reads
Specifications:
Standard-height, three-quarter-length, x8 PCIe card
72 NAND flash chips, 36 on each side of the card. Chip density differs between the two card sizes: the 256-GB version has 72 32-Gb flash chips, while the 512-GB version has 72 64-Gb flash chips.
Each card consumes a single PCIe Gen1 connection
256-GB & 512-GB cards are supported by Data ONTAP 7.3.2 and later (NOTE: not supported in 8.0, only in 8.0.1 and later)
Under the aluminum heat sink in the center of the card is a custom-designed controller; the PCIe bracket is at the end of the card.
Indicators:
Two LEDs are located on the card -- you can see them from the back of the controller through the perforated PCIe cover.
The amber LED should be off during normal operation; if it is on, there is a problem with the card, and the card is taken offline when a fault is detected.
The green LED indicates activity and provides a 0.5-Hz heartbeat blink when idle; the blink rate scales with the card's I/O rate as follows: 0.5 Hz (<1,000 I/O per second); 1.4 Hz (1,000-10,000); 5.0 Hz (10,000-100,000); 10.0 Hz (>100,000).
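The green LED thresholds above can be sketched as a simple lookup (an illustrative function of my own, not NetApp firmware):

```python
# Hypothetical sketch: map an I/O rate to the green LED heartbeat frequency
# using the thresholds listed above.

def blink_rate_hz(iops: int) -> float:
    """Return the green LED blink frequency (Hz) for a given I/O rate."""
    if iops < 1_000:
        return 0.5   # idle heartbeat
    if iops <= 10_000:
        return 1.4
    if iops <= 100_000:
        return 5.0
    return 10.0

print(blink_rate_hz(500))      # 0.5
print(blink_rate_hz(50_000))   # 5.0
```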
Power:
Draws all required power from the 12-V rail on the PCIe connector
18-W power consumption, which is below the 25-W maximum required of all supported PCIe platforms
10°C-40°C ambient operating temperature range, lower than most PCIe components
Improved airflow allows the memory components to operate at lower ambient temperatures
Uses 95% less electricity than the shelf of 14 10,000-RPM disks that the card allows to be eliminated
Flash Cache Field Programmable Gate Array (FPGA)
One x8 PCIe 1.1 Interface
DMA access engine
Four independent 72-bit async NAND interfaces, each with 18 flash devices
The flash data interfaces that connect to the flash devices run at 40 MHz, giving the card a raw bandwidth of 1.28 GB/s; when one interface is busy, another can pick up the reads
Each interface can operate on 9 flash devices in parallel at a time
Each of the 18 flash devices on an interface contains multiple 8-Gb NAND cores
256-GB card: 288 cores (4 cores/device)
512-GB card: 576 cores (8 cores/device)
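As a sanity check on the figures above (my own arithmetic, assuming "Gb" means gigabits, that 8 data bytes of each 72-bit transfer carry data with the ninth byte as parity, and an 8-of-9 data/parity layout):

```python
# Back-of-the-envelope check of the bandwidth and capacity figures above
# (illustrative arithmetic, not vendor data).

INTERFACES = 4
DATA_BYTES_PER_XFER = 8          # assumption: 72-bit bus = 8 data bytes + 1 parity byte
CLOCK_HZ = 40_000_000            # 40 MHz

raw_bw = INTERFACES * DATA_BYTES_PER_XFER * CLOCK_HZ
print(raw_bw / 1e9)              # 1.28 GB/s

# 256-GB card: 288 cores of 8 Gb each; 8 of every 9 cores hold data.
raw_gb = 288 * 8 / 8             # 288 cores x 8 Gb, gigabits -> gigabytes
usable_gb = raw_gb * 8 / 9       # exclude the parity cores
print(usable_gb)                 # 256.0
```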
FPGA Low Level Specifics:
Each NAND core is made up of blocks and cache wears out in increments of blocks
Each block contains pages, and pages are units of storage that data can be written into and read from
Across 9 parallel cores, 8 cores are for data and one is for parity (this group is a bank)
8 banks for 256GB
If a NAND core loses too many blocks, it can be taken out of use without functional disruption
The DMA engine supports one write queue and one erase queue per flash interface
The multithreaded DMA engine supports 8 read queues for each interface
The DMA engine supports 520-byte sectors; it performs the write operations to flash memory and informs the flash controller of any issues
Flash memory contents are protected by 4-bit BCH codes
If a WAFL block read from Flash Cache fails because of an uncorrectable BCH error in flash memory, the data is fetched from disk instead
If a core fails the card continues to operate without any loss of capacity using parity to reconstruct data
If an entire bank of cores fails, the card continues to work with a reduced capacity
The dynamic interrupt mechanism speeds up or slows down to match the host processing rate; the FPGA is upgradable from a backup image and can be matched to the power and thermal limits of the platform
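The 8-data + 1-parity bank scheme above can be illustrated with XOR parity (a minimal sketch of the general technique, not the card's actual encoding):

```python
# Toy XOR-parity example: with 8 data cores plus 1 parity core, the
# contents of any single failed core can be rebuilt from the survivors.
import functools
import operator

def parity(cores: list[bytes]) -> bytes:
    """XOR the corresponding bytes of every core together."""
    return bytes(functools.reduce(operator.xor, col) for col in zip(*cores))

data = [bytes([i] * 4) for i in range(8)]   # 8 data cores, 4 bytes each
p = parity(data)                            # the parity core

# Lose core 3, then rebuild it from the 7 surviving data cores plus parity.
survivors = data[:3] + data[4:] + [p]
rebuilt = parity(survivors)
assert rebuilt == data[3]
```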
Flash Cache FPGA Enhanced Resiliency
Wear Leveling: uses algorithms to ensure each block receives an equal amount of wear
Bad Block Detection and Remapping: FPGA monitors and IDs worn out blocks, failed blocks replaced by FPGA
BCH Error Correction Engine: soft errors during reads handled here
Protection RAID: 8 data and 1 parity chips, can tolerate the loss of an entire chip
Dynamic Flash Remapping and Reduction: if two chips from the same bank are lost, software can map out that region of memory; when a significant portion is lost, an ASUP message is generated
WAFL Checksums: additionally, software stores a checksum with every WAFL block; if the checksum fails on a read, the data from Flash Cache is discarded and the data is then obtained from disk
Flash Cache Subsystem:
WAFL: helps reduce the demand for random disk reads by reading user data and metadata from the external cache, it interfaces with the WAFL filesystem, and then controls and tracks the cache state.
WAFL External Cache (EC): is a software module that is used to cache WAFL data in external memory cards. The EC can be used with either PAM1 or Flash Cache (Data ONTAP 7.3.2 and later). Also supports Predictive Cache Statistics (PCS). Contains three flow control processes: primary cache eviction, cache lookup, and I/O completion. Per 0.5 TB of Flash Cache, 1.5 to 2.0 GB of storage-system main memory are preallocated for tag storage.
Flash Adaptation Layer (FAL): is responsible for mapping a simple block address space onto one or more Flash Cache cards. The FAL manages cache writes in a way that produces excellent wear leveling, load balancing, and throughput while minimizing the read variance caused by resource conflicts. The FAL transparently implements bad block mapping, which gradually reduces flash capacity as flash blocks wear out. Models flash memory as a single circular log across all blocks on the cache. Blocks must be erased before being overwritten; the number of erasures is limited, so wear leveling is important. Schedules writes round-robin, and passes reads and writes on to the Flash Cache driver. Achieves wear leveling by placing EC writes in a circular log within a bank.
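The circular-log idea can be sketched as follows (a toy model of my own, simplified far beyond the real FAL): writes always go to the next block in sequence, so erase/program cycles spread evenly, and worn-out blocks are skipped, shrinking capacity gradually.

```python
# Toy circular write log: sequential allocation yields even wear, and bad
# blocks are skipped rather than causing a failure.

class CircularLog:
    def __init__(self, num_blocks: int):
        self.blocks = [0] * num_blocks   # erase count per block
        self.bad = set()                 # worn-out blocks mapped out of use
        self.head = 0

    def next_block(self) -> int:
        # Advance the head past any bad blocks; capacity shrinks as blocks wear out.
        while self.head in self.bad:
            self.head = (self.head + 1) % len(self.blocks)
        blk = self.head
        self.blocks[blk] += 1
        self.head = (self.head + 1) % len(self.blocks)
        return blk

log = CircularLog(8)
for _ in range(32):
    log.next_block()
print(log.blocks)   # every block erased exactly 4 times -> even wear
```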
Flash Cache Driver (FCD): manages all communication with the Flash Cache hardware, including request queues, interrupts, fault handling, initialization, and the FPGA. Manages all Flash Cache cards; multiple cards are aggregated behind this FCD interface. Provides memory unification, load balancing, and queuing across all cards. Communicates through EMS by issuing messages for hardware status and error conditions. Automatically enabled when the hardware is detected.
Flash Cache Hardware: The card itself.
Bad Blocks:
Two copies of the bad-block discovery table are kept (the table is not stored in each flash block), and only one copy of the table is erased at a time. On power-up the driver goes through discovery; because the table is persisted, initial power-up time is reduced.
Flash Management Module
operates at a higher level, viewing the components of a Flash Cache as domains.
These domains are interfaces, flash banks, lanes, blocks, and cores.
FMM assists in maintaining availability and providing serviceability. It begins running when the storage system boots up and immediately starts discovering flash devices. FMM is enabled by default in the Data ONTAP operating system. When a flash device, such as Flash Cache, is discovered, the driver of the flash device registers it with the FMM for reliability, availability, and serviceability (RAS).
TROUBLESHOOTING, INSTALLING, DIAGNOSTICS
Shut down controller
Open storage system
Remove an existing module if necessary
Install the Flash Cache card
Close and boot the system
Run diagnostics on the new Flash Cache card (for a first-time install)
(also enable the WAFL EC software and configuration options for a first-time install)
Complete the installation process
Enable the WAFL external cache software license:
license add <license_code>
Enable WAFL external cache software:
options flexscale.enable on
If in an active-active (HA) configuration, perform on both systems
Run sysconfig -v to show the slot in which the cache is installed. Three states: "Enabled|Disabled|Failed". Further details may also be listed if the state is failed, e.g. "Failed Firmware".
WAFL EC Config Options:
Cache normal user data blocks
Cache low-priority user data blocks
Cache only system metadata
To integrate the FlexShare QoS tool's buffer cache policies with the WAFL external cache options, use the priority command.
Default Flash Cache configuration:
options flexscale
flexscale.enable on
flexscale.lopri_blocks off << recommended to turn this on
flexscale.normal_data_blocks on
Note: when caching normal data, all caching options are in effect until the Flash Cache card is 70% full; once the card reaches 70% full, the configured options take effect. In the default caching mode, a block is cached when it is evicted from the main memory cache. When the data is accessed at a later time, it is obtained from the Flash Cache card, which is much larger than main memory. In this mode, Flash Cache acts as an extension of main memory.
Flash Cache caches the file data and metadata. Metadata is not displayed as an option for the options command because metadata is cached instantly. Metadata is the data that is used to maintain the file-level data structure and directory structure for NFS and CIFS data. In the default mode, Flash Cache also caches normal data, which primarily consists of the random reads.
The recommended configuration for Flash Cache is to turn on caching for normal data blocks and for low-priority data, which includes random reads and some of the writes.
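The default "cache on eviction" behavior described above can be sketched as a victim cache (my own minimal illustration, not Data ONTAP internals): reads check main memory first, then the external cache, then fall through to disk, and blocks enter the external cache only when evicted from main memory.

```python
# Toy victim cache: the external (flash) cache holds blocks evicted from
# the main-memory cache; reads fall through main -> flash -> disk.
from collections import OrderedDict

class VictimCache:
    def __init__(self, main_size: int, ext_size: int):
        self.main = OrderedDict()        # main-memory cache (LRU)
        self.ext = OrderedDict()         # external flash cache
        self.main_size, self.ext_size = main_size, ext_size

    def read(self, block: str) -> str:
        if block in self.main:
            self.main.move_to_end(block)
            return "main"
        source = "flash" if self.ext.pop(block, None) is not None else "disk"
        self.main[block] = True
        if len(self.main) > self.main_size:
            victim, _ = self.main.popitem(last=False)   # LRU eviction
            self.ext[victim] = True                      # cache on eviction
            if len(self.ext) > self.ext_size:
                self.ext.popitem(last=False)
        return source

c = VictimCache(main_size=2, ext_size=4)
print(c.read("A"))   # disk
print(c.read("B"))   # disk
print(c.read("C"))   # disk -> "A" is evicted into flash
print(c.read("A"))   # flash hit
```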
Predictive Cache Statistics:
Uses a sampling approach instead of an entirely new EC tag store, which PAM1 requires.
Only the sampled portion of the tag store is allocated and updated, which reduces CPU and memory usage. Sampling rates are 1% and 2%; the default is 1%.
options flexscale
flexscale.enable pcs
flexscale.pcs_high_res off <<< turn on to use 2%
flexscale.pcs_size 1024GB <<< can change to test if more cache would help
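To see why sampling saves memory, here is a rough estimate (my own arithmetic, using the 1.5-2.0 GB of tag storage per 0.5 TB figure quoted earlier and the 1% PCS sampling rate):

```python
# Rough estimate of tag-store memory for PCS emulating 1 TB of cache
# (flexscale.pcs_size 1024GB) vs. a real card of the same size.

TAG_GB_PER_HALF_TB = 2.0                 # upper bound from the text
emulated_tb = 1.0                        # flexscale.pcs_size 1024GB

full_tag_gb = TAG_GB_PER_HALF_TB * emulated_tb / 0.5
pcs_tag_mb = full_tag_gb * 0.01 * 1024   # PCS keeps only a 1% sample

print(full_tag_gb)   # 4.0 GB if a real card of that size were installed
print(pcs_tag_mb)    # ~41 MB for PCS
```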
LED Notifications:
Fault LED amber off // Green LED blinking -- NORMAL
Fault LED amber on // Green LED blinking -- hardware OK, but taken offline by software
Fault LED amber on // Green LED solid -- unknown hardware problem, may need replacement
Fault LED amber off // Green LED off -- hardware problem with power supply -- replace
Fault LED amber on // Green LED off -- hardware problem with FPGA config -- replace
Use sysconfig -v, EMS logs, and the stats show ext_cache_obj command to display the Flash Cache cards in use and the number of blocks that can be stored on each card. A 512-GB card holds approximately 134 million 4-KB blocks.
FMM generates ASUP email notification messages. ASUP must be enabled on the storage system; /etc/log/fmm_data is where information and settings are stored. Case types are DEGRADED, OFFLINED, and FAILED.