US20130227236A1 - Systems and methods for storage allocation - Google Patents
- Publication number
- US20130227236A1 (application number US 13/865,153)
- Authority
- US
- United States
- Prior art keywords
- storage
- data
- logical
- lids
- request
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/0292—User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/188—Virtual file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0631—Configuration or reconfiguration of storage systems by allocating resources to storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0688—Non-volatile semiconductor memory arrays
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/26—Sensing or reading circuits; Data output circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/34—Determination of programming status, e.g. threshold voltage, overprogramming or underprogramming, retention
- G11C16/349—Arrangements for evaluating degradation, retention or wearout, e.g. by counting erase cycles
Definitions
- This disclosure relates to storage systems and, in particular, to managing an address space of a storage system.
- A computing system may provide a logical address space of a storage device and/or system.
- The logical address space may comprise identifiers used by storage clients to reference storage resources.
- The computing system may further comprise a logical-to-physical translation layer configured to map identifiers of the logical address space with the storage location of data associated with the identifiers.
- The translation layer may comprise any-to-any mappings between identifiers and physical addresses.
- The logical address space may be independent of the underlying physical storage resources, and may exceed the capacity of the physical storage resources.
- Storage clients may allocate portions of the logical address space to perform storage operations. Maintaining allocation metadata pertaining to the logical address space may, however, impose significant overhead.
- The disclosed methods may comprise one or more machine-executable operations and/or steps.
- The disclosed operations and/or steps may be embodied as program code stored on a computer readable storage medium.
- Embodiments of the methods disclosed herein may be embodied as a computer program product comprising a computer readable storage medium storing computer usable program code executable to cause a computing device to perform one or more method operations and/or steps.
- Embodiments of the disclosed method may comprise a computing device providing an address space of a storage device, the address space configured such that at least two or more addresses of the address space are associated with a different physical storage capacity, and allocating one of the at least two or more addresses to a storage client in response to a storage request.
- The allocation granularity may pertain to allocation of logical addresses within the address space. Alternatively, or in addition, the allocation granularity may pertain to the data segment size corresponding to one or more logical addresses of the address space.
- The method may further comprise allocating a logical identifier within a first section of the address space corresponding to a first sector size on the storage device and allocating a logical identifier within a different section of the address space corresponding to a different sector size on the storage device.
- The method includes allocating storage resources within a selected section of the address space in response to a request from a storage client, and selecting the section based on one or more of: a size of data associated with the request, a size of a data structure associated with the request, a size of a storage entity associated with the request, a file associated with the request, an application associated with the request, a parameter of the request, a storage client associated with the request, an input/output (I/O) control (ioctl) parameter, an fadvise parameter, and availability of unallocated logical addresses within the sections.
- Dividing the address space may comprise partitioning logical addresses within the address space into an identifier portion and an offset portion, wherein relative sizes of the identifier portions to the offset portions vary between the sections.
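- The identifier/offset partitioning described above can be illustrated with a minimal sketch. It assumes 64-bit logical addresses and two sections whose names and offset widths (`SECTION_OFFSET_BITS`) are hypothetical and chosen only for illustration; the disclosure does not fix these values.

```python
from typing import Dict, Tuple

LID_BITS = 64  # assumed width of a logical address

# Hypothetical sections: name -> number of low-order bits used as the offset.
# A wider offset portion leaves a narrower identifier portion, and vice versa.
SECTION_OFFSET_BITS: Dict[str, int] = {"small_files": 16, "large_files": 40}


def split_lid(lid: int, section: str) -> Tuple[int, int]:
    """Split a logical address into (identifier, offset) for the given section."""
    offset_bits = SECTION_OFFSET_BITS[section]
    identifier = lid >> offset_bits
    offset = lid & ((1 << offset_bits) - 1)
    return identifier, offset


def join_lid(identifier: int, offset: int, section: str) -> int:
    """Compose a logical address from an identifier and an offset."""
    offset_bits = SECTION_OFFSET_BITS[section]
    if offset >= (1 << offset_bits):
        raise ValueError("offset does not fit in this section's offset portion")
    if identifier >= (1 << (LID_BITS - offset_bits)):
        raise ValueError("identifier does not fit in this section's identifier portion")
    return (identifier << offset_bits) | offset
```

- In this sketch, a section with a 16-bit offset portion supports many identifiers that each span a small contiguous range, while a section with a 40-bit offset portion supports fewer identifiers that each span a much larger range.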
- Some embodiments of the method may further comprise moving data stored on the storage device to a different section of the address space.
- Moving the data may comprise associating the data with a different logical address than a logical address stored with the data on the storage device and/or updating persistent metadata of the data to reference the different logical address in response to relocating the data on the storage device.
- Disclosed herein are embodiments of an apparatus comprising a translation module configured to manage a logical address space of a storage device, a partitioning module configured to segment the logical address space into a plurality of different regions, the individual regions having a different respective allocation granularity, and an allocation module configured to allocate logical identifiers within the regions in accordance with the allocation granularities of the regions.
- The apparatus may further include an interface module configured to provide for specifying a region of the logical address space in which to perform one or more of an allocation operation and a storage operation.
- The interface module may be configured to provide information pertaining to the allocation granularities of one or more of the regions to a storage client.
- The different respective allocation granularities of the regions may pertain to one or more of: a logical identifier block size for allocation operations performed within the respective regions and a data sector size associated with the logical identifiers of the respective regions.
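- As a rough illustration of regions with different allocation granularities, the following sketch models each region as a range of logical identifiers handed out in fixed-size blocks. The `Region` class, the bump-allocation strategy, and the size-based `select_region` policy are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    start: int        # first LID in the region
    end: int          # one past the last LID in the region
    granularity: int  # number of LIDs handed out per allocation
    next_free: int = field(init=False)

    def __post_init__(self) -> None:
        self.next_free = self.start

    def allocate(self) -> Tuple[int, int]:
        """Return (first_lid, count) for one block, using simple bump allocation."""
        if self.next_free + self.granularity > self.end:
            raise MemoryError("region exhausted")
        first = self.next_free
        self.next_free += self.granularity
        return first, self.granularity


def select_region(regions: List[Region], lids_needed: int) -> Region:
    """Pick the region with the smallest granularity that covers the request in one block."""
    fitting = [r for r in regions if r.granularity >= lids_needed]
    if not fitting:
        raise ValueError("no region has a large enough allocation granularity")
    return min(fitting, key=lambda r: r.granularity)
```

- A request for a single identifier would be served from the fine-grained region, while a request for ten identifiers would be served as one 64-identifier block from the coarser region, assuming regions with granularities of 1 and 64.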
- The allocation module may be configured to associate logical identifiers within the logical address space with one of a plurality of data sector sizes.
- The apparatus includes a data read module configured to read a data segment associated with a logical identifier on the storage device, wherein a size of the data segment corresponds to a data sector size associated with the logical identifier.
- The disclosed apparatus also includes a reallocation module configured to reallocate a set of logical identifiers corresponding to data stored on the storage device to a different set of logical identifiers.
- The reallocation module may be configured to modify a size of a block of logical identifiers associated with the data, and the reallocation module may be configured to move one or more of the logical identifiers to another region of the logical address space. Each region may comprise one or more blocks of logical identifiers within the logical address space.
- The reallocation module may be configured to combine a plurality of blocks allocated within a first region of the logical address space into a single, larger block of logical identifiers within a different region of the logical address space.
- The reallocation module may be configured to reallocate a block of logical identifiers within a first region of the logical address space as one or more smaller blocks of logical identifiers within a different region of the logical address space.
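- Combining several small blocks into one larger block can be sketched as a pure index update. The example assumes the logical-to-physical mapping is a plain dictionary from logical identifier to media address and that the target range is currently unbound; `combine_blocks` is a hypothetical helper, not a function named in the disclosure.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (first LID, number of LIDs)


def combine_blocks(index: Dict[int, int], blocks: List[Block], new_first: int) -> Block:
    """Rebind the LIDs of several blocks to one contiguous block starting at new_first.

    Only the LID -> media address index is updated; no data is moved on the media.
    """
    total = sum(count for _, count in blocks)
    if any(lid in index for lid in range(new_first, new_first + total)):
        raise ValueError("target range is already bound")
    cursor = new_first
    for first, count in blocks:
        for lid in range(first, first + count):
            if lid in index:  # unbound LIDs leave a hole in the new block
                index[cursor] = index.pop(lid)
            cursor += 1
    return new_first, total
```

- Splitting a block into smaller blocks in another region would follow the same pattern in reverse: each source identifier is rebound to an identifier in one of the smaller target blocks.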
- The apparatus may further include a log storage module configured to store data on the storage device in association with respective logical identifiers corresponding to the data.
- The reallocation module may be configured to modify the logical identifier associated with a data segment such that the logical identifier associated with the data segment on the storage device is inconsistent with the modified logical identifier.
- The apparatus may include a translation module configured to reference the data segment associated with the inconsistent logical identifier on the storage device by use of the modified logical identifier.
- The log storage module may be configured to store the data segment in association with the modified logical identifier on the storage device in response to grooming a storage division comprising the data segment.
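- The interplay between a modified logical identifier, the stale identifier stored with the data, and grooming can be sketched as follows. The `LogStore` class is a simplified, hypothetical in-memory model: the list position of a packet stands in for its media address, and grooming rewrites valid packets without actually erasing anything.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    lid: int     # logical identifier recorded with the data on the media
    data: bytes


class LogStore:
    def __init__(self) -> None:
        self.log: List[Packet] = []        # append-only; list position acts as media address
        self.forward: Dict[int, int] = {}  # current LID -> media address

    def write(self, lid: int, data: bytes) -> None:
        self.log.append(Packet(lid, data))
        self.forward[lid] = len(self.log) - 1

    def move(self, old_lid: int, new_lid: int) -> None:
        """Rebind data to a new LID without rewriting it; the stored LID becomes stale."""
        self.forward[new_lid] = self.forward.pop(old_lid)

    def read(self, lid: int) -> bytes:
        # Translation uses the modified LID, not the LID stored with the packet.
        return self.log[self.forward[lid]].data

    def groom(self) -> None:
        """Rewrite each valid packet at the log head, stamped with its current LID."""
        for lid, addr in list(self.forward.items()):
            self.log.append(Packet(lid, self.log[addr].data))
            self.forward[lid] = len(self.log) - 1
```

- After `move`, reads through the new identifier succeed even though the packet on the media still carries the old identifier; after `groom`, the rewritten packet is consistent with the index again.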
- Disclosed herein are embodiments of a method comprising associating logical addresses of an address space with respective sector sizes, wherein the sector size associated with a logical address corresponds to a physical storage capacity on a storage device corresponding to the logical address, determining a sector size of one of the logical addresses in response to a request, and performing a storage operation on the storage device in accordance with the determined sector size.
- The method may further include selecting a sector size for the logical address based on one or more of a file associated with the logical address, an application associated with the logical address, the storage client associated with the logical address, an input/output (I/O) control parameter, and an fadvise parameter.
- The method further includes determining an available physical storage capacity of the storage device based on sector sizes of logical addresses of the address space that are associated with valid data on the storage device and/or assigning a different respective sector size to each of a plurality of segments of the address space, wherein determining the sector size of the logical address comprises associating the logical address with one of the segments.
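- A minimal sketch of per-segment sector sizes and the resulting capacity calculation is shown below. The segment boundaries and sector sizes in `SEGMENTS` are hypothetical values chosen for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical segments: (first LID, one past last LID, sector size in bytes).
SEGMENTS: List[Tuple[int, int, int]] = [
    (0, 1 << 20, 512),
    (1 << 20, 1 << 21, 4096),
    (1 << 21, 1 << 22, 65536),
]


def sector_size(lid: int) -> int:
    """Determine the sector size of a logical address from the segment containing it."""
    for first, end, size in SEGMENTS:
        if first <= lid < end:
            return size
    raise ValueError("logical address is outside the address space")


def available_capacity(total_bytes: int, valid_lids: Iterable[int]) -> int:
    """Physical capacity left after accounting for every LID bound to valid data."""
    used = sum(sector_size(lid) for lid in valid_lids)
    return total_bytes - used
```

- Because each logical address may consume a different amount of physical storage, the remaining capacity cannot be derived from a count of allocated addresses alone; it depends on which segments those addresses fall in.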
- FIG. 1 is a block diagram of one embodiment of a storage system
- FIG. 2 is a block diagram of another embodiment of a storage system
- FIG. 3A is a block diagram of another embodiment of a storage system
- FIG. 3B depicts one example of a contextual data format
- FIG. 3C is a block diagram of an exemplary log storage format
- FIG. 3D depicts one embodiment of an index
- FIG. 4 is a block diagram of one embodiment of an apparatus to allocate data storage space
- FIG. 5 is a block diagram of another embodiment of an apparatus to allocate data storage space
- FIG. 6 is a flow diagram of one embodiment of a method for allocating data storage space
- FIG. 7 is a flow diagram of one embodiment of a method for servicing a physical capacity request
- FIG. 8 is a flow diagram of one embodiment of a method for reserving physical storage space
- FIG. 9 is a flow chart diagram of one embodiment of a method for binding allocated logical identifiers to media storage locations
- FIG. 10 is a flow diagram of another embodiment of a method for binding allocated logical identifiers to media storage locations
- FIG. 11 is a flow diagram of one embodiment of a method for servicing an allocation query at a storage device
- FIG. 12 is a schematic diagram of exemplary embodiments of indexes to associate logical identifiers with storage locations of a storage device
- FIG. 13 is a schematic diagram of exemplary embodiments of indexes to associate logical identifiers with storage locations of a storage device
- FIG. 14 depicts an example of an index for maintaining unallocated logical capacity
- FIG. 15 is a flow diagram of one embodiment of a method for allocating a storage device
- FIG. 16 is a flow diagram of one embodiment of a method for allocating a storage device
- FIG. 17 is a schematic diagram of exemplary embodiments of storage metadata
- FIG. 18 is a schematic diagram of exemplary embodiments of physical reservation metadata
- FIG. 19A depicts a logical identifier that has been segmented into a first portion and a second portion
- FIG. 19B is a schematic diagram of exemplary embodiments of storage metadata for segmented logical identifiers
- FIG. 19C is a schematic diagram of exemplary embodiments of physical reservation metadata for segmented logical identifiers
- FIG. 20A is a schematic diagram of exemplary embodiments of a file system storage client accessing a storage layer using segmented logical identifiers
- FIG. 20B is a schematic diagram of exemplary embodiments of a file system storage client accessing a storage layer using segmented logical identifiers
- FIG. 21 is a flow diagram of one embodiment of a method for providing a storage layer
- FIG. 22 is a flow diagram of one embodiment of a method for segmenting logical identifiers of a logical address space
- FIG. 23 is a flow diagram of one embodiment of a method for providing crash recovery and data integrity in a storage layer
- FIG. 24A is a flow diagram of one embodiment of a method for servicing queries pertaining to the status of a logical identifier
- FIG. 24B is a flow diagram of one embodiment of a method of servicing queries pertaining to a media storage location
- FIG. 25A depicts one embodiment of a contextual, log-based data format
- FIG. 25B depicts one embodiment of a persistent note
- FIG. 25C is a flow diagram of one embodiment of a method for designating ephemeral data
- FIG. 26 is a flow diagram of one embodiment of a method for reconstructing storage metadata and/or determining the status of media storage locations using a contextual, log-based data format
- FIG. 27 is a flow diagram of one embodiment of a method for ordering storage operations using barriers
- FIGS. 28A-E depict embodiments of clone operations
- FIG. 28F depicts another embodiment of a storage layer
- FIGS. 28G-J depict embodiments of clone operations using reference entries
- FIG. 28K depicts one embodiment of an indirection layer
- FIG. 28L depicts one embodiment of a clone operation performed using intermediate mapping layers
- FIG. 28M depicts one embodiment of a deduplication operation
- FIG. 28N depicts embodiments of snapshot operations
- FIGS. 28O-S depict embodiments of range move operations
- FIG. 29A depicts one embodiment of a storage layer configured to perform logical storage operations for a file system
- FIG. 29B depicts one embodiment of a storage layer configured to implement mmap checkpoints
- FIG. 29C depicts one embodiment of a storage layer configured to implement atomic storage operations
- FIG. 30 is a flow diagram of one embodiment of a method for managing a logical interface of data storage in a contextual format on a non-volatile storage media
- FIG. 31 is a flow diagram of one embodiment of a method for managing a logical interface of contextual data
- FIG. 32 is a flow diagram of another embodiment of a method for managing a logical interface of contextual data
- FIGS. 33A-B depict exemplary clone operations
- FIG. 34 is a flow diagram of one embodiment of a method for managing a clone of contextual data
- FIG. 35 is a flow diagram of one embodiment of a method for folding a clone of contextual data
- FIG. 36A depicts another embodiment of a storage layer
- FIG. 36B depicts one embodiment of a logical address space comprising a plurality of allocation regions
- FIG. 36C depicts another embodiment of a logical address space comprising a plurality of allocation regions
- FIG. 37A depicts one example of a move operation within a segmented logical address space
- FIG. 37B depicts one example of an allocation operation within a segmented logical address space
- FIG. 37C depicts another example of an allocation operation within a segmented logical address space
- FIGS. 38-41 are flow diagrams of embodiments of methods for managing storage allocation.
- A storage layer manages one or more storage devices.
- The storage device(s) may comprise non-volatile storage devices, such as solid-state storage device(s), that are arranged and/or partitioned into a plurality of addressable, media storage locations.
- A media storage location refers to any physical unit of storage (e.g., any physical storage media quantity on a storage device).
- Media storage units may include, but are not limited to: pages, storage divisions, erase blocks, sectors, blocks, collections or sets of physical storage locations (e.g., logical pages, logical erase blocks, etc., described below), or the like.
- The storage layer may be configured to present a logical address space to one or more storage clients.
- A logical address space refers to a logical representation of storage resources.
- The logical address space may comprise a plurality (e.g., range) of logical identifiers.
- A logical identifier (“LID”) refers to any identifier for referencing a storage resource (e.g., data), including, but not limited to: a logical block address (“LBA”), a cylinder/head/sector (“CHS”) address, a file name, an object identifier, an inode, a Universally Unique Identifier (“UUID”), a Globally Unique Identifier (“GUID”), a hash code, a signature, an index entry, a range, an extent, or the like.
- A logical interface refers to a handle, identifier, path, process, or other mechanism for referencing and/or interfacing with a storage resource.
- A logical interface may include, but is not limited to: a LID, a range or extent of LIDs, a reference to a LID (e.g., a link between LIDs, a pointer to a LID, etc.), a reference to a virtual storage unit, or the like.
- A logical interface may be used to reference data through a storage interface and/or application programming interface (“API”).
- The storage layer may maintain storage metadata, such as a forward index, to map LIDs of the logical address space to media storage locations on the storage device(s).
- the storage layer may provide for arbitrary, “any-to-any” mappings to physical storage resources. Accordingly, there may be no pre-defined and/or pre-set mappings between LIDs and particular media storage locations and/or media addresses.
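- A forward index with arbitrary mappings can be sketched as a sorted list of range entries searched with binary search. The `ForwardIndex` class is a hypothetical illustration; it assumes that bound ranges never overlap and that each range of LIDs maps to consecutive media addresses, neither of which is required by the disclosure.

```python
import bisect
from typing import List, Optional, Tuple


class ForwardIndex:
    """Sorted (first_lid, count, media_address) entries; any LID range may map anywhere."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._entries: List[Tuple[int, int, int]] = []

    def bind(self, first_lid: int, count: int, media_address: int) -> None:
        # Sketch only: assumes the new range does not overlap an existing entry.
        pos = bisect.bisect_left(self._starts, first_lid)
        self._starts.insert(pos, first_lid)
        self._entries.insert(pos, (first_lid, count, media_address))

    def lookup(self, lid: int) -> Optional[int]:
        """Return the media address bound to lid, or None if the LID is unbound."""
        pos = bisect.bisect_right(self._starts, lid) - 1
        if pos < 0:
            return None
        first, count, media_address = self._entries[pos]
        if lid < first + count:
            return media_address + (lid - first)
        return None
```

- Only bound ranges occupy memory in such a structure, so the index can describe a logical address space far larger than the physical media behind it.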
- A media address refers to an address of a storage resource that uniquely identifies one storage resource from another to a controller that manages a plurality of storage resources.
- A media address includes, but is not limited to: the address of a media storage location, a physical storage unit, a collection of physical storage units (e.g., a logical storage unit), a portion of a media storage unit (e.g., a logical storage unit address and offset, range, and/or extent), or the like.
- The storage layer may map LIDs to physical data resources of any size and/or granularity, which may or may not correspond to the underlying data partitioning scheme of the storage device(s).
- The storage controller is configured to store data within logical storage units that are formed by logically combining a plurality of physical storage units, which may allow the storage controller to support many different virtual storage unit sizes and/or granularities.
- A logical storage element refers to a set of two or more non-volatile storage elements that are, or are capable of being, managed in parallel (e.g., via an I/O and/or control bus).
- A logical storage element may comprise a plurality of logical storage units, such as logical pages, logical storage divisions (e.g., logical erase blocks), and so on.
- Each logical storage unit may be comprised of storage units on the non-volatile storage elements in the respective logical storage element.
- A logical storage unit refers to a logical construct combining two or more physical storage units, each physical storage unit on a respective solid-state storage element in the respective logical storage element (each solid-state storage element being accessible in parallel).
- A logical storage division refers to a set of two or more physical storage divisions, each physical storage division on a respective solid-state storage element in the respective logical storage element.
- The logical address space presented by the storage layer may have a logical capacity, which may comprise a finite set or range of LIDs.
- The logical capacity of the logical address space may correspond to the number of available LIDs in the logical address space and/or the size and/or granularity of the data referenced by the LIDs.
- The logical capacity of a logical address space comprising 2^32 unique LIDs, each referencing 2048 bytes (2 KB) of data, may be 2^43 bytes.
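- The capacity figure in the example above follows from multiplying the number of LIDs by the bytes referenced per LID. A small sketch, with the hypothetical helper `logical_capacity`, makes the arithmetic explicit:

```python
def logical_capacity(lid_bits: int, bytes_per_lid: int) -> int:
    """Logical capacity in bytes of an address space with 2**lid_bits LIDs."""
    return (1 << lid_bits) * bytes_per_lid


# 2^32 LIDs x 2^11 bytes per LID = 2^43 bytes
capacity = logical_capacity(32, 2048)
```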
- The logical address space may be “thinly provisioned.”
- A thinly provisioned logical address space refers to a logical address space having a logical capacity that exceeds the physical storage capacity of the underlying storage device(s).
- The storage layer may present a 64-bit logical address space to the storage clients (e.g., a logical address space referenced by 64-bit LIDs), which exceeds the physical storage capacity of the underlying storage devices.
- The large logical address space may allow storage clients to allocate and/or reference contiguous ranges of LIDs, while reducing the chance of naming conflicts.
- the storage layer may leverage the “any-to-any” mappings between LIDs and physical storage resources to manage the logical address space independently of the underlying physical storage devices. For example, the storage layer may add and/or remove physical storage resources seamlessly, as needed, and without changing the logical interfaces used by the storage clients.
- the storage layer may be configured to store data in a contextual format.
- a contextual format refers to a “self-describing” data format in which persistent metadata is associated with the data on the physical storage media (e.g., stored with the data in a packet, or other data structure).
- the persistent metadata provides context for the data with which it is stored.
- the persistent metadata uniquely identifies the data with which the persistent metadata is stored.
- the persistent metadata may uniquely identify a sector of data owned by a storage client from other sectors of data owned by the storage client.
- the persistent metadata identifies an operation that is performed on the data.
- the persistent metadata identifies an order of a sequence of operations performed on the data.
- the persistent metadata identifies security controls, a data type, or other attributes of the data. In certain embodiments, the persistent metadata identifies at least one of a plurality of aspects, including data type, a unique data identifier, an operation, and an order of a sequence of operations performed on the data.
- the persistent metadata may include, but is not limited to: a logical interface of the data, an identifier of the data (e.g., a LID, file name, object id, label, unique identifier, or the like), reference(s) to other data (e.g., an indicator that the data is associated with other data), a relative position or offset of the data with respect to other data (e.g., file offset, etc.), data size and/or range, and the like.
- the contextual data format may comprise a packet format comprising a data segment and one or more headers. Alternatively, a contextual data format may associate data with context information in other ways (e.g., in a dedicated index on the non-volatile storage media, a storage division index, or the like).
- a contextual data format refers to a data format that associates the data with a logical interface of the data (e.g., the “context” of the data).
- a contextual data format is self-describing in that the contextual data format includes the logical interface of the data.
- the contextual data format may allow data context to be determined (and/or reconstructed) based upon the contents of the non-volatile storage media, and independently of other storage metadata, such as the arbitrary, “any-to-any” mappings discussed above. Since the media storage location of data is independent of the logical interface of the data, it may be inefficient (or impossible) to determine the context of data based solely upon the media storage location or media address of the data. Storing data in a contextual format on the non-volatile storage media may allow data context to be determined without reference to other storage metadata. For example, the contextual data format may allow the logical interface of data to be reconstructed based only upon the contents of the non-volatile storage media (e.g., reconstruct the “any-to-any” mappings between LID and media storage location).
- the storage controller may be configured to store data on an asymmetric, write-once storage media, such as solid-state storage media.
- a “write once” storage media refers to a storage media that is reinitialized (e.g., erased) each time new data is written or programmed thereon.
- asymmetric storage media refers to storage media having different latencies for different storage operations. Many types of solid-state storage media are asymmetric; for example, a read operation may be much faster than a write/program operation, and a write/program operation may be much faster than an erase operation (e.g., reading the media may be hundreds of times faster than erasing, and tens of times faster than programming the media).
- the storage media may be partitioned into storage divisions that can be erased as a group (e.g., erase blocks) in order to, inter alia, account for the asymmetric properties of the media.
- modifying a single data segment “in-place” may require erasing the entire erase block comprising the data, and rewriting the modified data to the erase block, along with the original, unchanged data. This may result in inefficient “write amplification,” which may excessively wear the media.
- the storage controller may be configured to write data “out-of-place.”
- writing data “out-of-place” refers to writing data to different media storage location(s) rather than overwriting the data “in-place” (e.g., overwriting the original physical location of the data). Modifying data “out-of-place” may avoid write amplification, since existing, valid data on the erase block with the data to be modified need not be erased and recopied. Moreover, writing data “out-of-place” may remove erasure from the latency path of many storage operations (the erasure latency is no longer part of the “critical path” of a write operation).
- the storage controller may comprise one or more processes that operate outside of the regular path for servicing of storage operations (the “path” for performing a storage operation and/or servicing a storage request).
- the “regular path for servicing a storage request” or “path for servicing a storage operation” (also referred to as a “critical path”) refers to a series of processing operations needed to service the storage operation or request, such as a read, write, modify, or the like.
- the path for servicing a storage request may comprise receiving the request from a storage client, identifying the logical interface of the request (e.g., LIDs pertaining to the request), performing one or more storage operations on a non-volatile storage media, and returning a result, such as acknowledgement or data.
- Processes that occur outside of the path for servicing storage requests may include, but are not limited to: a groomer, deduplication, and so on. These processes may be implemented autonomously, and in the background, such that they do not interfere with or impact the performance of other storage operations and/or requests. Accordingly, these processes may operate independently of servicing storage requests.
- the storage controller comprises a groomer, which is configured to reclaim storage divisions (erase blocks) for reuse.
- the write out-of-place paradigm implemented by the storage controller may result in obsolete or invalid data (data that has been erased, modified, and/or overwritten) remaining on the storage device. For example, overwriting data X with data Y may result in storing Y on a new storage division (rather than overwriting X in place), and updating the “any-to-any” mappings of the storage metadata to identify Y as the valid, up-to-date version of the data.
- the obsolete version of the data X may be marked as “invalid,” but may not be immediately removed (e.g., erased), since, as discussed above, erasing X may involve erasing an entire storage division, which is a time-consuming operation and may result in write amplification. Similarly, data that is no longer in use (e.g., deleted or trimmed data) may not be immediately removed.
- the non-volatile storage media may accumulate a significant amount of “invalid” data.
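- The overwrite of X with Y described above can be sketched as follows. This is an illustrative model only: the class `OutOfPlaceStore`, its attributes, and the use of a Python list as the media are assumptions made for the example, not structures named in the disclosure.

```python
class OutOfPlaceStore:
    """Toy model of out-of-place writes with an any-to-any forward mapping."""

    def __init__(self):
        self.media = []       # append-only media; list index acts as the media address
        self.forward = {}     # LID -> media address of the valid version
        self.invalid = set()  # media addresses that hold obsolete data

    def write(self, lid, data):
        old = self.forward.get(lid)
        if old is not None:
            # The previous version is only marked invalid; it is not erased here.
            self.invalid.add(old)
        self.media.append((lid, data))
        self.forward[lid] = len(self.media) - 1

    def read(self, lid):
        return self.media[self.forward[lid]][1]


store = OutOfPlaceStore()
store.write(7, b"X")
store.write(7, b"Y")  # stored at a new media address; X stays on the media as invalid data
```

- In this sketch the obsolete copy of X remains at media address 0 until a separate reclamation step removes it, which is how invalid data accumulates.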
- a groomer process may operate outside of the “critical path” for servicing storage operations. The groomer process may reclaim storage divisions so that they can be reused for other storage operations.
- reclaiming a storage division refers to erasing the storage division so that new data may be stored/programmed thereon.
- Reclaiming a storage division may comprise relocating valid data on the storage division to a new storage location.
- the groomer may identify storage divisions for reclamation based upon one or more factors, which may include, but are not limited to: the amount of invalid data in the storage division, the amount of valid data in the storage division, wear on the storage division (e.g., number of erase cycles), time since the storage division was programmed or refreshed, and so on.
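- One way the listed factors might be combined is sketched below. The scoring formula, the `max_cycles` default, and all names are hypothetical; the disclosure lists the factors but does not specify how they are weighted.

```python
from dataclasses import dataclass


@dataclass
class Division:
    division_id: int
    invalid_bytes: int
    valid_bytes: int
    erase_cycles: int
    age_seconds: float  # time since the division was programmed or refreshed


def reclaim_score(d: Division, max_cycles: int = 3000) -> float:
    """Higher score means a better reclamation candidate (assumed heuristic)."""
    total = d.invalid_bytes + d.valid_bytes
    invalid_fraction = d.invalid_bytes / total if total else 0.0
    # Favor divisions with little wear so erase cycles are spread out.
    wear_headroom = max(0.0, 1.0 - d.erase_cycles / max_cycles)
    return invalid_fraction * wear_headroom


def pick_division(divisions):
    # Age is used only as a tie-breaker between equal scores.
    return max(divisions, key=lambda d: (reclaim_score(d), d.age_seconds))


candidates = [
    Division(0, invalid_bytes=100, valid_bytes=900, erase_cycles=10, age_seconds=5.0),
    Division(1, invalid_bytes=800, valid_bytes=200, erase_cycles=10, age_seconds=1.0),
    Division(2, invalid_bytes=900, valid_bytes=100, erase_cycles=3000, age_seconds=9.0),
]
victim = pick_division(candidates)
```

- Division 1 is selected here: it holds mostly invalid data, so little valid data must be relocated, and it has not reached the assumed wear limit.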
- the storage controller may be further configured to store data in a log format.
- a log format refers to a data format that defines an ordered sequence of storage operations performed on a non-volatile storage media.
- the log format comprises storing data in a pre-determined sequence within the media address space of the non-volatile storage media (e.g., sequentially within pages and/or erase blocks of the media).
- the log format may further comprise associating data (e.g., each packet or data segment) with respective sequence indicators.
- the sequence indicators may be applied to data individually (e.g., applied to each data packet) and/or to data groupings (e.g., packets stored sequentially on a storage division, such as an erase block).
- sequence indicators may be applied to storage divisions when the storage divisions are reclaimed (e.g., erased), as described above, and/or when the storage divisions are first used to store data.
- the log format may comprise storing data in an “append only” paradigm.
- the storage controller may maintain a current append point within a media address space of the storage device.
- the append point may be a current storage division and/or offset within a storage division.
- Data may then be sequentially appended from the append point.
- the sequential ordering of the data may, therefore, be determined based upon the sequence indicator of the storage division of the data in combination with the sequence of the data within the storage division.
- the storage controller may identify the “next” available storage division (the next storage division that is initialized and ready to store data).
- the groomer may reclaim storage divisions comprising invalid, stale, and/or deleted data, to ensure that data may continue to be appended to the media log.
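- The append point and the selection of the next available storage division can be sketched as follows. The class `MediaLog` and its fields are hypothetical, and the sketch assumes sequence indicators are assigned when a storage division is first used, which is one of the options described above.

```python
class MediaLog:
    """Toy append-only log over a fixed set of storage divisions."""

    def __init__(self, division_count, units_per_division):
        self.units_per_division = units_per_division
        self.free = list(range(division_count))  # initialized (erased) divisions
        self.sequence = {}                        # division -> sequence indicator
        self.next_seq = 0
        self.division = None                      # append point: current division
        self.offset = 0                           # append point: offset in division

    def _advance(self):
        if not self.free:
            raise RuntimeError("no initialized storage division; grooming needed")
        self.division = self.free.pop(0)
        self.sequence[self.division] = self.next_seq
        self.next_seq += 1
        self.offset = 0

    def append(self):
        """Return the (division, offset) where the next storage unit is written."""
        if self.division is None or self.offset == self.units_per_division:
            self._advance()
        location = (self.division, self.offset)
        self.offset += 1
        return location

    def reclaim(self, division):
        """Model the groomer returning an erased division to the free pool."""
        self.sequence.pop(division, None)
        self.free.append(division)


log = MediaLog(division_count=2, units_per_division=2)
locations = [log.append() for _ in range(4)]
```

- Once every division is full, `append` fails until `reclaim` supplies an erased division, which mirrors the dependency of the log on the groomer.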
- the log format described herein may allow valid data to be distinguished from invalid data based upon the contents of the non-volatile storage media, and independently of the storage metadata. As discussed above, invalid data may not be removed from the storage media until the storage division comprising the data is reclaimed. Therefore, multiple “versions” of data having the same context may exist on the non-volatile storage media (e.g., multiple versions of data having the same logical interface and/or same LID).
- the sequence indicators associated with the data may be used to distinguish “invalid” versions of data from the current, up-to-date version of the data; the data that is the most recent in the log is the current version, and all previous versions may be identified as invalid.
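- The rule that the most recent version in the log is the current version can be sketched as a replay over the media contents. The tuple layout `(division_seq, offset, lid, media_addr)` and the function name are assumptions made for the example.

```python
def rebuild_forward_index(packets):
    """Replay packets in log order; the latest packet for each LID wins.

    packets: iterable of (division_seq, offset, lid, media_addr) tuples, where
    division_seq is the sequence indicator of the packet's storage division
    and offset is the packet's position within that division.
    """
    forward = {}   # LID -> media address of the current version
    invalid = []   # media addresses of superseded versions
    for division_seq, offset, lid, media_addr in sorted(packets):
        if lid in forward:
            invalid.append(forward[lid])
        forward[lid] = media_addr
    return forward, invalid


scanned = [
    (2, 0, 10, "B0"),  # newer version of LID 10
    (1, 0, 10, "A0"),  # older version of LID 10
    (1, 1, 11, "A1"),
]
forward, invalid = rebuild_forward_index(scanned)
```

- The replay uses only information read from the media (LIDs and sequence indicators), so it does not depend on any separately maintained storage metadata.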
- a logical interface of data stored in a contextual format is modified.
- the contextual format of the data may be inconsistent with the modified logical interface.
- an inconsistent contextual data format refers to a contextual data format that defines a logical interface to data on storage media that is inconsistent with the logical interface of the data.
- the logical interface of the data may be maintained by a storage layer, storage controller, or other module.
- the inconsistency may include, but is not limited to: the contextual data format associating the data with a different LID than the logical interface; the contextual data format associating the data with a different set of LIDs than the logical interface; the contextual data format associating the data with a different LID reference than the logical interface; or the like.
- the storage controller may provide access to the data in the inconsistent contextual format and may update the contextual format of the data on the non-volatile storage media to be consistent with the modified logical interface.
- the update may require rewriting the data out-of-place and, as such, may be deferred.
- a consistent contextual data format refers to a contextual data format that defines the same (or an equivalent) logical interface as the logical interface of the data, which may include, but is not limited to: the contextual data format associating the data with the same LID(s) (or equivalent LID(s)) as the logical interface; the contextual data format associating the data with the same set of LIDs as the logical interface; the contextual data format associating the data with the same reference LID as the logical interface; or the like.
- a storage controller and/or storage layer performs a method for managing a logical address space, comprising: modifying a logical interface of data stored in a contextual format on a non-volatile storage media, wherein the contextual format of the data on the non-volatile storage media is inconsistent with the modified logical interface of the data; accessing the data in the inconsistent contextual format through the modified logical interface; and updating the contextual format of the data on the non-volatile storage media to be consistent with the modified logical interface.
- the logical interface of the data may be modified in response to a request (e.g., a request from a storage client).
- the request may comprise a move, clone (e.g., copy), deduplication, or the like.
- the request may “return” (e.g., be acknowledged by the storage layer) before the contextual format of the data is updated on the non-volatile storage media.
- Modifying the logical interface may further comprise storing a persistent note on the non-volatile storage media indicative of the modification to the logical interface (e.g., associate the data with the modified logical interface).
- the contextual format of the data may be updated out-of-place, at other media storage locations on the non-volatile storage media. Updates to the contextual format may be deferred and/or made outside of the path of other storage operations (e.g., independent of servicing other storage operations and/or requests). For example, the contextual format of the data may be updated as part of a grooming process.
- data that is in an inconsistent contextual format may be identified and updated as the data is relocated to new media storage locations.
- Providing access to the data through the modified logical interface may comprise referencing the data in the inconsistent contextual format through one or more reference entries and/or indirect entries in an index.
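- The sequence described above (modify the logical interface, acknowledge, access the data in its inconsistent format, then update the format later) can be sketched for a clone request. Everything here is a simplified assumption: the class `CloneStore`, the dictionary-based packets, and the shape of the persistent note are illustrative and are not the index structures of the disclosure.

```python
class CloneStore:
    """Toy model of a clone with a deferred contextual-format update."""

    def __init__(self):
        self.log = []      # append-only media; list index acts as the media address
        self.forward = {}  # LID -> media address

    def write(self, lid, data):
        self.log.append({"kind": "data", "lids": [lid], "data": data})
        self.forward[lid] = len(self.log) - 1

    def clone(self, src_lid, dst_lid):
        addr = self.forward[src_lid]
        self.forward[dst_lid] = addr  # data is accessible via dst_lid immediately
        # Persistent note recording the modified logical interface.
        self.log.append({"kind": "note", "addr": addr, "add_lid": dst_lid})

    def is_consistent(self, lid):
        """True if the packet on the media names this LID in its metadata."""
        return lid in self.log[self.forward[lid]]["lids"]

    def groom(self, lid):
        """Deferred update: rewrite the packet out-of-place with matching LIDs."""
        addr = self.forward[lid]
        lids = sorted(x for x, a in self.forward.items() if a == addr)
        self.log.append({"kind": "data", "lids": lids, "data": self.log[addr]["data"]})
        new_addr = len(self.log) - 1
        for x in lids:
            self.forward[x] = new_addr


store = CloneStore()
store.write(1, b"abc")
store.clone(1, 2)                            # returns before any data is rewritten
readable = store.log[store.forward[2]]["data"]
inconsistent = not store.is_consistent(2)    # packet on media still names only LID 1
store.groom(2)                               # later, outside the request path
```

- After `groom`, the rewritten packet names both LIDs, so the contextual format again matches the logical interface.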
- FIG. 1 is a block diagram of one embodiment of a system 100 for allocating storage resources.
- the storage system 102 comprises a storage controller 140 and storage layer 130 , which may be configured for operation on a computing device 110 .
- the computing device 110 may comprise a processor 111 , volatile memory 112 , a communication interface 113 , and the like.
- the processor 111 may comprise one or more central processing units, one or more general-purpose processors, one or more application-specific processors, one or more virtual processors (e.g., the computing device 110 may be a virtual machine operating within a host), one or more processor cores, or the like.
- the communication interface 113 may comprise one or more network interfaces configured to communicatively couple the computing device 110 (and/or storage layer 130 ) to a communication network, such as an Internet Protocol network, a Storage Area Network, or the like.
- the computing device 110 may further comprise machine-readable storage media 114 .
- the machine-readable storage media 114 may comprise machine-executable instructions configured to cause the computing device 110 (e.g., processor 111 ) to perform steps of one or more of the methods disclosed herein.
- the storage layer 130 and/or one or more modules thereof may be embodied as one or more machine-readable instructions stored on the non-transitory storage media 114 .
- the machine-readable storage medium 114 may comprise one or more persistent, non-transitory storage devices.
- the storage layer 130 may be configured to provide storage services to one or more storage clients 116 .
- the storage clients 116 may include local storage clients 116 operating on the computing device 110 and/or remote storage clients 116 accessible via the network 115 (and communication interface 113 ).
- the storage clients 116 may include, but are not limited to: operating systems, file systems, database applications, server applications, kernel-level processes, user-level processes, applications, and the like.
- the storage layer 130 comprises and/or is communicatively coupled to one or more storage devices 120 A-N.
- the storage devices 120 A-N may include different types of storage devices including, but not limited to: solid-state storage devices, hard drives, SAN storage resources, or the like.
- the storage devices 120 A-N may comprise respective controllers 126 A-N and non-volatile storage media 122 A-N.
- the storage layer 130 may comprise an interface 138 configured to provide access to storage services and/or metadata 135 maintained by the storage layer 130 .
- the interface 138 may comprise, but is not limited to: a block I/O interface 131 , a virtual storage interface 132 , a cache interface 133 , and the like.
- Storage metadata 135 may be used to manage and/or track storage operations performed through any of the block I/O interface 131 , virtual storage interface 132 , cache interface 133 , or other, related interfaces.
- the cache interface 133 may expose cache-specific features accessible through the storage layer 130 .
- the virtual storage interface 132 presented to the storage clients 116 provides access to data transformations implemented by the non-volatile storage device 120 and/or the non-volatile storage media controller 126 .
- the storage layer 130 may provide storage services through one or more interfaces, which may include, but are not limited to: a block I/O interface, an extended virtual storage interface, a cache interface, and the like.
- the storage layer 130 may present a logical address space 136 to the storage clients 116 through one or more interfaces.
- the logical address space 136 may comprise a plurality of LIDs, each corresponding to respective media storage locations on one or more of the storage devices 120 A-N.
- the storage layer 130 may maintain storage metadata 135 comprising “any-to-any” mappings between LIDs and media storage locations, as described above.
- the logical address space 136 and storage metadata 135 may, therefore, define a logical interface of data stored on the storage devices 120 A-N.
- the storage layer 130 may further comprise a log storage module 137 that is configured to store data in a contextual, log format.
- the contextual, log data format may comprise associating data with persistent metadata, such as the logical interface of the data (e.g., LID), or the like.
- the contextual, log format may further comprise associating data with respective sequence identifiers on the non-volatile storage media 122 A-N, which define an ordered sequence of storage operations performed on the storage devices 120 A-N, as described above.
- the storage layer 130 may further comprise a storage device interface 139 configured to transfer data, commands, and/or queries to the storage devices 120 A-N over a bus 127 , which may include, but is not limited to: a peripheral component interconnect express (“PCI Express” or “PCIe”) bus, a serial Advanced Technology Attachment (“ATA”) bus, a parallel ATA bus, a small computer system interface (“SCSI”), FireWire, Fibre Channel, a Universal Serial Bus (“USB”), a PCIe Advanced Switching (“PCIe-AS”) bus, a network, Infiniband, SCSI RDMA, or the like.
- the storage device interface 139 may communicate with the storage devices 120 A-N using input-output control (“IO-CTL”) command(s), IO-CTL command extension(s), remote direct memory access, or the like.
- the non-volatile storage devices 120 A-N may comprise non-volatile storage media 122 A-N, which may include but is not limited to: NAND flash memory, NOR flash memory, nano random access memory (“nano RAM” or “NRAM”), magneto-resistive RAM (“MRAM”), dynamic RAM (“DRAM”), phase change RAM (“PRAM”), magnetic storage media (e.g., hard disk, tape), optical storage media, or the like.
- Portions of the storage layer 130 may be implemented by use of one or more drivers, kernel-level applications, user-level applications, and the like, which may be configured to operate within an operating system, guest operating system (e.g., in a virtualized computing environment), or the like.
- Other portions of the storage layer 130 may be implemented by use of hardware components, such as one or more controllers, Field-Programmable Gate Arrays (“FPGAs”), Application-Specific Integrated Circuits (“ASICs”), and/or the like.
- the storage layer 130 may present a logical address space 136 to the storage clients 116 (through one or more of the interfaces 131 , 132 , and/or 133 of the interface 138 ).
- the storage layer 130 may maintain storage metadata 135 comprising “any-to-any” mappings between LIDs in the logical address space 136 and media storage locations on one or more non-volatile storage devices 120 A-N.
- the storage layer 130 may further comprise a log storage module 137 configured to store data on the storage device(s) 120 A-N in a contextual, log format.
- the contextual, log data format may comprise storing data in association with persistent metadata, such as the logical interface of the data.
- the contextual, log format may further comprise associating data with respective sequence identifiers that define an ordered sequence of storage operations performed through the storage layer 130 .
- FIG. 2 depicts another embodiment of a system 200 comprising a storage controller 140 configured to write and/or read data in a contextual, log-based format.
- the system 200 may comprise a non-volatile storage device 120 comprising non-volatile storage media 222 .
- the non-volatile storage media 222 may comprise a plurality of non-volatile storage elements 223 , which may be communicatively coupled to the storage media controller 126 via a bus 127 .
- the storage media controller 126 may manage groups of non-volatile storage elements 223 (logical storage elements 229 ).
- the storage media controller 126 may comprise a storage request receiver module 228 configured to receive storage requests from the storage layer 130 via a bus 127 .
- the storage request receiver 228 may be further configured to transfer data to/from the storage layer 130 and/or storage clients 116 via the bus 127 .
- the storage request receiver module 228 may comprise one or more direct memory access (“DMA”) modules, remote DMA modules, bus controllers, bridges, buffers, and so on.
- the storage media controller 126 may comprise a data write module 240 that is configured to store data on the non-volatile storage media 222 in a contextual format.
- the requests may include and/or reference data to be stored on the non-volatile storage media 222 , may include the logical interface of the data (e.g., LID(s) of the data), and so on.
- the data write module 240 may comprise a contextual write module 242 and a write buffer 244 .
- the contextual format may comprise storing a logical interface of the data (e.g., LID of the data) in association with the data on the non-volatile storage media 222 .
- the contextual write module 242 is configured to format data into packets, and may include the logical interface of the data in a packet header (or other packet field).
- the write buffer 244 may be configured to buffer data for storage on the non-volatile storage media 222 .
- the data packets may comprise an arbitrary amount of data.
- the write buffer 244 may comprise one or more synchronization buffers to synchronize a clock domain of the storage media controller 126 with a clock domain of the non-volatile storage media 222 (and/or bus 127 ).
- the data write module 240 may be configured to store data in arbitrarily-sized structures (packets) on the non-volatile storage media 222 .
- the log storage module 248 may be configured to select media storage location(s) for the data and may provide addressing and/or control information to the non-volatile storage elements 223 via the bus 127 .
- the log storage module 248 is configured to store data sequentially in a log format within the media address space of the non-volatile storage media.
- the log storage module 248 may be further configured to groom the non-volatile storage media, as disclosed above.
- the storage media controller 126 may be configured to update storage metadata 135 (e.g., a forward index) to associate the logical interface of the data (e.g., the LIDs of the data) with the media address(es) of the data on the non-volatile storage media 222 .
- the storage metadata 135 may be maintained on the storage media controller 126 ; for example, the storage metadata 135 may be stored on the non-volatile storage media 222 , on a volatile memory (not shown), or the like.
- the storage metadata 135 may be maintained within the storage layer 130 (e.g., on a volatile memory 112 of the computing device 110 of FIG. 1 ).
- the storage metadata 135 may be maintained in a volatile memory by the storage layer 130 , and may be periodically stored on the non-volatile storage media 222 .
- the storage media controller 126 may further comprise a data read module 241 that is configured to read contextual data from the non-volatile storage media 222 in response to requests received via the storage request receiver module 228 .
- the requests may comprise a LID of the requested data, a media address of the requested data, and so on.
- the contextual read module 243 may be configured to read data stored in a contextual format from the non-volatile storage media 222 and to provide the data to the storage layer 130 and/or a storage client 116 .
- the contextual read module 243 may be configured to determine the media address of the data using a logical interface of the data and the storage metadata 135 .
- the storage layer 130 may determine the media address of the data and may include the media address in the request.
- the log storage module 248 may provide the media address to the non-volatile storage elements 223 , and the data may stream into the data read module 241 via the read buffer 245 .
- the read buffer 245 may comprise one or more read synchronization buffers for clock domain synchronization, as described above.
- the storage media controller 126 may further comprise a multiplexer 249 that is configured to selectively route data and/or commands to/from the data write module 240 and the data read module 241 .
- the storage media controller 126 may be configured to read data while filling the write buffer 244 and/or may interleave one or more storage operations on one or more banks of non-volatile storage elements 223 (not shown).
- FIG. 3A is a block diagram depicting another embodiment of a storage layer 130 .
- the non-volatile storage elements 223 may be partitioned into storage divisions (e.g., erase blocks) 251 , and each storage division 251 may be partitioned into physical storage units (e.g., pages) 252 .
- An exemplary physical storage unit (page) 252 may be capable of storing 2048 bytes (“2 kb”).
- Each non-volatile storage element 223 may further comprise one or more registers for buffering data to be written to a page 252 and/or data read from a page 252 .
- the non-volatile storage elements 223 may be further arranged into a plurality of independent banks (not shown).
- the storage media controller 126 may manage the non-volatile storage elements 223 as a logical storage element 229 .
- the logical storage element 229 may be formed by coupling the non-volatile storage elements 223 in parallel using the bus 127 . Accordingly, storage operations may be performed on the non-volatile storage elements 223 concurrently, and in parallel (e.g., data may be written to and/or read from the non-volatile storage elements 223 in parallel).
- the logical storage element 229 may comprise a plurality of logical storage divisions (e.g., logical erase blocks) 253 ; each comprising a respective storage division of the non-volatile storage elements 223 .
- the logical storage divisions 253 may comprise a plurality of logical storage units (e.g., logical pages) 254 ; each comprising a respective physical storage unit of the non-volatile storage elements 223 .
- the storage capacity of a logical storage unit 254 may be a multiple of the number of parallel non-volatile storage elements 223 comprising the logical storage unit 254 ; for example, the capacity of a logical page comprised of 2 kb pages on 25 non-volatile storage elements 223 is 50 kb. In other embodiments, comprising 25 non-volatile storage elements 223 having an 8 kb page size, the logical page may have a storage capacity of 200 kb.
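- The two capacity examples above follow from multiplying the physical page size by the number of storage elements coupled in parallel. A minimal sketch, with a hypothetical function name:

```python
def logical_page_capacity_kb(element_count: int, page_size_kb: int) -> int:
    """Capacity of a logical page spanning one physical page on each parallel element."""
    return element_count * page_size_kb


small_pages = logical_page_capacity_kb(25, 2)  # 25 elements with 2 kb pages
large_pages = logical_page_capacity_kb(25, 8)  # 25 elements with 8 kb pages
```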
- the storage controller 140 may be configured to store data within large constructs, such as logical storage divisions 253 and/or logical storage units 254 , formed from a plurality of non-volatile storage elements 123 .
- the storage controller 140 may, therefore, be capable of handling data storage operations of different sizes, independent of the underlying physical partitioning and/or arrangement of the non-volatile storage elements 123 .
- the storage layer 130 may be configured to store data in 16 kb segments (sectors) within logical pages 254 , despite the fact that the page size of the underlying non-volatile storage elements is only 2 kb.
- Although FIG. 3A depicts a particular embodiment of a logical storage element 229 , the disclosure is not limited in this regard and could be adapted to differently sized logical storage elements 229 comprising any number of non-volatile storage elements 223 .
- the size and number of erase blocks, pages, planes, or other logical and physical divisions within the non-volatile storage elements 223 are expected to change over time with advancements in technology; it is to be expected that many embodiments consistent with new configurations are possible and are consistent with the embodiments disclosed herein.
- the contextual write module 242 may be configured to store data in a contextual format.
- the contextual format comprises a packet format.
- FIG. 3B depicts one example of a contextual data format (packet format 360 ).
- a packet 360 includes data (e.g., a data segment 362 ) that is associated with one or more LIDs.
- the data segment 362 comprises compressed, encrypted, and/or whitened data.
- the data segment 362 may be a predetermined size (e.g., a fixed data “block” or “segment” size) or a variable size.
- the packet 360 may comprise persistent metadata 364 that is stored on the non-volatile storage media 222 with the data segment 362 (e.g., in a header of the packet format 360 as depicted in FIG. 3B ).
- the persistent metadata 364 may include logical interface metadata 365 that defines the logical interface of the data segment 362 .
- the logical interface metadata 365 may associate the data segment 362 with one or more LIDs, LID references (e.g., reference entries), a range, a size, and so on.
- the logical interface metadata 365 may be used to determine the context of the data independently of the storage metadata 135 and/or may be used to reconstruct the storage metadata 135 (e.g., reconstruct the “any-to-any” mappings, described above).
- the persistent metadata 364 may comprise other metadata, which may include, but is not limited to: data attributes (e.g., an access control list), data segment delimiters, signatures, links, metadata flags 367 (described below), and the like.
- the packet 360 may be associated with a log sequence indicator 368 .
- the log sequence indicator 368 may be persisted on the non-volatile storage media (e.g., page) with the data packet 360 and/or on the storage division (e.g., erase block) of the data packet 360 . Alternatively, the sequence indicator 368 may be persisted in a separate storage division.
- a sequence indicator 368 is applied when a storage division is reclaimed (e.g., erased), when the first or last storage unit is programmed, or the like.
- the log sequence indicator 368 may be used to determine an order of the packet 360 in a sequence of storage operations performed on the non-volatile storage media 222 , as described above.
- the contextual write module 242 may be configured to generate data packets of any suitable size.
- Data packets may be of a fixed size or a variable size. Due to the independence between the logical interface of data and the underlying media storage location of the data, the size of the packets generated by the contextual write module 242 may be independent of the underlying structure and/or partitioning of the non-volatile storage media 222 .
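- The packet format 360 described above can be sketched as a header of persistent metadata followed by a data segment. This is a minimal illustration only: the field layout, field widths, and the names `HEADER`, `pack_packet`, and `unpack_packet` are assumptions and are not taken from the disclosure.

```python
import struct

# Hypothetical header layout (an assumption, not the disclosed format):
# first LID, LID count, log sequence indicator, metadata flags, segment length.
HEADER = struct.Struct("<QIQHI")


def pack_packet(lid, count, sequence, flags, segment):
    """Prepend persistent metadata to a data segment."""
    return HEADER.pack(lid, count, sequence, flags, len(segment)) + segment


def unpack_packet(raw):
    """Recover the logical interface metadata and the data segment."""
    lid, count, sequence, flags, length = HEADER.unpack_from(raw)
    segment = raw[HEADER.size:HEADER.size + length]
    return {"lid": lid, "count": count, "sequence": sequence,
            "flags": flags, "segment": segment}


packet = pack_packet(lid=72, count=12, sequence=7, flags=0, segment=b"payload")
meta = unpack_packet(packet)
```

- Because the LIDs travel with the data, the context of a packet can be determined from the media alone, without consulting a separate index.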
- the data write module 240 may further comprise an ECC write module 346 , which may be configured to encode the contextual data (e.g., data packets) into respective error-correcting code (“ECC”) words or chunks.
- the ECC encoding may be configured to detect and/or correct errors introduced through transmission and storage of data on the non-volatile storage media 222 .
- data packets stream to the ECC write module 346 as un-encoded blocks of length N (“ECC blocks”).
- An ECC block may comprise a single packet, multiple packets, or a portion of one or more packets.
- the ECC write module 346 may calculate a syndrome of length S for the ECC block, which may be appended and streamed as an ECC chunk of length N+S.
- N and S may be selected according to testing and experience and may be based upon the characteristics of the non-volatile storage media 222 (e.g., error rate of the media 222 ) and/or performance, efficiency, and robustness constraints.
- the relative size of N and S may determine the number of bit errors that can be detected and/or corrected in an ECC chunk.
- a packet may comprise more than one ECC block; the ECC block may comprise more than one packet; a first packet may end anywhere within the ECC block, and a second packet may begin after the end of the first packet within the same ECC block.
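- The relationship between un-encoded blocks of length N and encoded chunks of length N+S can be sketched as follows. As a stand-in for a real error-correcting code, this sketch appends a CRC-32, which can only detect errors and cannot correct them; the values of `N` and `S` and the function names are illustrative assumptions.

```python
import zlib

N = 16  # un-encoded block length in bytes (illustrative value)
S = 4   # length of the appended check value in bytes (CRC-32)


def encode(stream):
    """Split a packet stream into N-byte blocks and emit (N+S)-byte chunks."""
    chunks = []
    for i in range(0, len(stream), N):
        block = stream[i:i + N].ljust(N, b"\x00")  # pad the final block
        chunks.append(block + zlib.crc32(block).to_bytes(S, "big"))
    return chunks


def check(chunk):
    """Return True if the check value stored in the chunk matches its block."""
    block, stored = chunk[:N], chunk[N:]
    return zlib.crc32(block).to_bytes(S, "big") == stored


# Packet boundaries ("|") do not align with block boundaries.
chunks = encode(b"packet-one|packet-two-is-longer|p3")
corrupted = bytes([chunks[0][0] ^ 0x01]) + chunks[0][1:]
```

- Note that the blocks are cut from the stream without regard to packet boundaries, so a packet may span blocks and a block may hold parts of several packets.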
- the ECC algorithm implemented by the ECC write module 346 and/or ECC read module 347 may be dynamically modified and/or may be selected according to a preference (e.g., communicated via the bus 127 ), in a firmware update, a configuration setting, or the like.
- the ECC read module 347 may be configured to decode ECC chunks read from the non-volatile storage medium 222 . Decoding an ECC chunk may comprise detecting and/or correcting errors therein.
- the contextual read module 243 may be configured to depacketize data packets read from the non-volatile storage media 222 . Depacketizing may comprise removing and/or validating contextual metadata of the packet, such as the logical interface metadata 365 , described above. In some embodiments, the contextual read module 243 may be configured to verify that the logical interface information in the packet matches a LID in the storage request.
- the log storage module 248 is configured to store contextual formatted data, sequentially, in a log format.
- log storage refers to storing data in a format that defines an ordered sequence of storage operations, which may comprise storing data at sequential media addresses within the media address space of the non-volatile storage media (e.g., sequentially within one or more logical storage units 254 ).
- sequential storage may refer to storing data in association with a sequence indicator, such as a sequence number, timestamp, or the like, such as the sequence indicator 368 , described above.
- the log storage module 248 may store data sequentially at an append point.
- An append point may be located where data from the write buffer 244 will next be written. Once data is written at an append point, the append point moves to the end of the data. This process typically continues until a logical erase block 253 is full. The append point is then advanced to the next available logical erase block 253 .
- the sequence of writing to logical erase blocks is maintained (e.g., using sequence indicators) so that if the storage metadata 135 is corrupted or lost, the log sequence of storage operations can be replayed to rebuild the storage metadata 135 (e.g., rebuild the “any-to-any” mappings of the storage metadata 135 ).
- FIG. 3C depicts one example of sequential, log-based data storage.
- FIG. 3C depicts a physical storage space 302 of a non-volatile storage media, such as the non-volatile storage media 222 of FIG. 3A .
- the physical storage space 302 is arranged into storage divisions (e.g., logical erase blocks 253 A-N), each of which can be initialized (e.g., erased) in a single operation.
- each logical erase block 253 A-N may comprise an erase block 251 of a respective non-volatile storage element 223 .
- each logical erase block 253 A-N may comprise a plurality of logical storage units (e.g., logical pages) 254 .
- each logical page 254 may comprise a page of a respective non-volatile storage element 223 .
- Storage element delimiters are omitted from FIG. 3C to avoid obscuring the details of the embodiment.
- the logical storage units 254 may be assigned respective media addresses; in the FIG. 3C example, the media addresses range from zero (0) to N.
- the log storage module 248 may store data sequentially, at the append point 380 ; data may be stored sequentially within the logical page 382 and, when the logical page 382 is full, the append point 380 advances 381 to the next available logical page in the logical erase block, where the sequential storage continues.
- Each logical erase block 253 A-N may comprise a respective sequence indicator. Accordingly, the sequential storage operations may be determined based upon the sequence indicators of the logical erase blocks 253 A-N, and the sequential order of data within each logical erase block 253 A-N.
- an “available” logical page refers to a logical page that has been initialized (e.g., erased) and has not yet been programmed. Some non-volatile storage media 222 can only be reliably programmed once after erasure. Accordingly, an available logical erase block may refer to a logical erase block that is in an initialized (or erased) state.
- the logical erase blocks 253 A-N may be reclaimed by a groomer (or other process), which may comprise erasing the logical erase block 253 A-N and moving valid data thereon (if any) to other storage locations. Reclaiming logical erase block 253 A-N may further comprise marking the logical erase block 253 A-N with a sequence indicator, as described above.
- the logical erase block 253 B may be unavailable for storage due to, inter alia: not being in an erased state (e.g., comprising valid data), being out-of-service due to high error rates or the like, and so on.
- the append point 380 may skip the unavailable logical erase block 253 B, and continue at the next available logical erase block 253 C.
- the log storage module 248 may store data sequentially starting at logical page 383 , and continuing through logical page 385 , at which point the append point 380 continues at a next available logical erase block, as described above.
- After storing data on the “last” storage unit (e.g., storage unit N 389 of storage division 253 N), the append point 380 wraps back to the first division 253 A (or the next available storage division, if storage division 253 A is unavailable). Accordingly, the append point 380 may treat the media address space 302 as a loop or cycle.
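- The append point behavior described above (sequential programming, skipping unavailable logical erase blocks, and wrapping at the end of the media address space) can be sketched as follows. The class and method names are hypothetical, and the model assumes the initial block is erased.

```python
class AppendPoint:
    """Tracks the next media address to program (hypothetical model)."""

    def __init__(self, num_blocks, pages_per_block):
        self.num_blocks = num_blocks
        self.pages_per_block = pages_per_block
        self.available = [True] * num_blocks  # erased and in service
        self.block, self.page = 0, 0

    def _next_available_block(self, start):
        # Scan forward, wrapping from the last block back to the first.
        for step in range(self.num_blocks):
            candidate = (start + step) % self.num_blocks
            if self.available[candidate]:
                return candidate
        raise RuntimeError("no erased logical erase block available")

    def append(self):
        """Return the (block, page) address programmed by this write."""
        if self.page == self.pages_per_block:  # current block is full
            self.available[self.block] = False
            self.block = self._next_available_block(self.block + 1)
            self.page = 0
        address = (self.block, self.page)
        self.page += 1
        return address


log = AppendPoint(num_blocks=3, pages_per_block=2)
log.available[1] = False                    # e.g., block 253B is out of service
first = [log.append() for _ in range(3)]    # fills block 0, skips block 1
log.available[0] = True                     # block 0 has been reclaimed (erased)
wrapped = [log.append() for _ in range(2)]  # finishes block 2, wraps to block 0
```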
- the storage controller 140 may be configured to modify and/or overwrite data out-of-place. Accordingly, a storage request to overwrite data A stored at physical storage location 391 with data A′ may be stored out-of-place on a different location (media address 393) within the physical address space 302 . Storing the data A′ may comprise updating the storage metadata 150 to associate A′ with the new media address 393 and/or to invalidate the data A at media address 391.
- the groomer module 370 may be configured to scan the physical address space 302 to reclaim storage resources comprising invalidated data that no longer needs to be preserved on the storage device 120 , such as the obsolete version of data A at media address 391.
- the storage metadata 135 may be reconstructed based on contextual, log-based storage format disclosed herein.
- the current version of data A′ may be distinguished from the obsolete data A based on log ordering information on the storage device 120 . Accordingly, the reconstructed index may identify the data A′ at media address 393 as the current, valid version of the data, and determine that the data A at media address 391 is obsolete and can be removed from the device.
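- Reconstruction of the index from the log can be sketched as a replay in sequence order, in which the most recent record for a LID supersedes earlier ones. The record layout `(sequence, lid, media_address)` and the function name are assumptions made for illustration.

```python
def rebuild_index(log_entries):
    """Replay (sequence, lid, media_address) records in log order."""
    index = {}        # LID -> media address of the current version
    obsolete = set()  # media addresses that hold superseded data
    for sequence, lid, media_address in sorted(log_entries):
        if lid in index:
            obsolete.add(index[lid])
        index[lid] = media_address
    return index, obsolete


# Data A at media address 391 is later overwritten out-of-place at 393.
entries = [(5, "A", 393), (1, "A", 391), (3, "B", 392)]
index, obsolete = rebuild_index(entries)
```

- In this sketch, the obsolete set identifies media addresses that a groomer could reclaim.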
- the storage controller 324 may comprise a groomer module 370 that is configured to reclaim logical erase blocks, as described above.
- the groomer module 370 may monitor the non-volatile storage media and/or storage metadata 135 to identify logical erase blocks 253 for reclamation.
- the groomer module 370 may reclaim logical erase blocks in response to detecting one or more conditions, which may include, but are not limited to: a lack of available storage capacity, detecting a percentage of data marked as invalid within a particular logical erase block 253 reaching a threshold, a consolidation of valid data, an error detection rate reaching a threshold, improving data distribution, data refresh, or the like.
- the groomer module 370 may operate outside of the path for servicing storage operations and/or requests. Therefore, the groomer module 370 may operate as an autonomous, background process, which may be suspended and/or deferred while other storage operations are in process.
- the groomer 370 may manage the non-volatile storage media 222 so that data is systematically spread throughout the logical erase blocks 253 , which may improve performance and data reliability and avoid overuse and underuse of any particular storage locations, thereby lengthening the useful life of the solid-state storage media 222 (e.g., wear-leveling, etc.).
- Although the groomer module 370 is depicted in the storage layer 130 , the disclosure is not limited in this regard. In some embodiments, the groomer module 370 may operate on the storage media controller 126 , may comprise a separate hardware component, or the like.
- the groomer 370 may interleave grooming operations with other storage operations and/or requests. For example, reclaiming a logical erase block 253 may comprise relocating valid data thereon to another storage location.
- the groomer read and groomer write bypass modules 363 and 362 may be configured to allow data packets to be read into the data read module 241 and then be transferred directly to the data write module 240 without being routed out of the storage media controller 126 .
- the groomer read bypass module 363 may coordinate reading data to be relocated from a reclaimed logical erase block 253 .
- the groomer module 370 may be configured to interleave relocation data with other data being written to the non-volatile storage media 222 via the groomer write bypass 362 . Accordingly, data may be relocated without leaving the storage media controller 126 .
- the groomer module 370 may be configured to fill the remainder of a logical page (or other data storage primitive) with relocation data, which may improve groomer efficiency, while minimizing the performance impact of grooming operations.
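- One of the reclamation conditions listed above, the fraction of invalid data in a logical erase block reaching a threshold, can be sketched as follows. The data layout, the threshold value, and the function name `reclaim` are illustrative assumptions; a real groomer would also weigh wear, error rates, and available capacity.

```python
def reclaim(blocks, threshold, next_sequence):
    """Reclaim blocks whose invalid fraction meets the threshold.

    `blocks` maps a block id to {"pages": [(data, valid), ...], "sequence": int}.
    Returns the valid data that must be rewritten at the append point.
    """
    relocated = []
    for block in blocks.values():
        pages = block["pages"]
        if not pages:
            continue  # already erased
        invalid = sum(1 for _, valid in pages if not valid)
        if invalid / len(pages) >= threshold:
            relocated.extend(data for data, valid in pages if valid)
            block["pages"] = []                # erase the block
            block["sequence"] = next_sequence  # mark with a new indicator
            next_sequence += 1
    return relocated


blocks = {
    "253A": {"pages": [("a", False), ("b", True), ("c", False), ("d", False)],
             "sequence": 1},
    "253B": {"pages": [("e", True), ("f", True)], "sequence": 2},
}
moved = reclaim(blocks, threshold=0.5, next_sequence=3)
```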
- the storage layer 130 may further comprise a deduplication module 374 , which may be configured to identify duplicated data on the storage device 120 .
- the deduplication module 374 may be configured to identify duplicated data and to modify a logical interface of the data, such that one or more LIDs reference the same set of data on the storage device 120 as opposed to referencing separate copies of the data.
- the deduplication module 374 may operate outside of the path for servicing storage operations and/or requests, as described above.
- the storage controller may maintain an index corresponding to the logical address space 136 .
- FIG. 3D depicts one example of such an index 1204 .
- the index 1204 may comprise one or more entries 1205 A-N. Each entry 1205 A may correspond to a LID (or LID range or extent) 1217 in the logical address space 136 .
- the entries 1205 A-N may represent LIDs that have been allocated for use by one or more storage clients 116 .
- the index 1204 may comprise “any-to-any” mappings between LIDs and media storage locations on one or more storage devices 120 .
- the entry 1205 B binds LIDs 072-083 to media storage locations 95-106.
- An entry 1205 D may represent LIDs that have been allocated, but have not yet been used to store data, and as such, the LIDs may not be bound to any particular media storage locations (e.g., the LIDs 178-192 are “unbound”). As described above, deferring the allocation of physical storage resources may allow the storage controller 140 to more efficiently manage storage resources (e.g., prevent premature reservation of physical storage resources, so that the storage resources are available to other storage clients 116 ).
- One or more of the entries 1205 A-N may comprise additional metadata 1219 , which may include, but is not limited to: access control metadata (e.g., identify the storage client(s) authorized to access the entry), reference metadata, logical interface metadata, and so on.
- the index 1204 may be maintained by the storage layer 130 (e.g., translation module 134 ), and may be embodied as storage metadata 135 on a volatile memory 112 and/or a non-transitory machine-readable storage media 114 and/or 120 .
- the index 1204 may be configured to provide for fast and efficient entry lookup.
- the index 1204 may be implemented using one or more datastructures, including, but not limited to: a B-tree, a content addressable memory (“CAM”), a binary tree, a hash table, or other datastructure that facilitates quickly searching a sparsely populated logical address space.
- the datastructure may be indexed by LID, such that, given a LID, the entry 1205 A-N corresponding to the LID (if any) can be identified in a computationally efficient manner.
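- A range-based index that is searchable by LID can be sketched with a sorted list and binary search, as a simple stand-in for the B-tree or other datastructures named above. The class name `LidIndex` and its methods are hypothetical; the example entries mirror the bound range 072-083 and the unbound range 178-192 described above, and ranges are assumed not to overlap.

```python
import bisect


class LidIndex:
    """Sparse index of non-overlapping LID ranges (hypothetical sketch)."""

    def __init__(self):
        self.starts = []   # sorted first LIDs of each entry
        self.entries = {}  # first LID -> (last LID, first media address or None)

    def insert(self, first, last, media_start=None):
        # media_start=None models an allocated but unbound entry.
        bisect.insort(self.starts, first)
        self.entries[first] = (last, media_start)

    def lookup(self, lid):
        """Return the media address bound to `lid`, or None if unbound."""
        pos = bisect.bisect_right(self.starts, lid) - 1
        if pos < 0:
            return None
        first = self.starts[pos]
        last, media_start = self.entries[first]
        if lid > last or media_start is None:
            return None
        return media_start + (lid - first)


index = LidIndex()
index.insert(72, 83, media_start=95)  # LIDs 072-083 -> media locations 95-106
index.insert(178, 192)                # allocated, not yet bound
```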
- the index 1204 comprises one or more entries (not shown) to represent unallocated LIDs (e.g., LIDs that are available for allocation by one or more storage clients 116 ).
- the unallocated LIDs may be maintained in the index 1204 and/or in a separate index 1444 as depicted in FIG. 14 .
- the index 1204 may comprise one or more sub-indexes, such as a “reference index.”
- the reference index 1222 may comprise data that is being referenced by one or more other entries 1205 A-N in the index (e.g., indirect references).
- the storage layer 130 may be configured to incorporate any type of storage metadata embodied using any suitable datastructure.
- FIG. 4 is a schematic block diagram illustrating an embodiment of an apparatus 400 to allocate data storage space.
- the apparatus 400 includes an allocation request module 402 , a logical capacity module 404 , and an allocation reply module 406 , which are described below.
- the allocation request module 402 , the logical capacity module 404 , and the allocation reply module 406 are depicted in the storage layer 130 in general, but all or part of the allocation request module 402 , the logical capacity module 404 , and the allocation reply module 406 may be in a storage layer 130 , storage media controller 126 , or the like.
- the apparatus 400 includes an allocation request module 402 that receives from a requesting device an allocation request to allocate logical capacity.
- the requesting device may be storage client 116 , or any other device or component capable of sending an allocation request.
- the storage layer 130 may comprise and/or be communicatively coupled to one or more storage devices 120 (as depicted in FIG. 1 ).
- the logical capacity associated with the allocation request may refer to storing data on a particular storage device 120 or on any of a plurality of storage devices 120 A-N.
- the allocation request may include a logical allocation request or may include a request to store data.
- a logical allocation request may comprise a request to allocate LIDs to a storage client 116 .
- a data storage request may comprise a request to store data corresponding to one or more LIDs that are allocated to the storage client 116 , which are then bound to media storage locations.
- binding the LIDs may comprise associating the LIDs with media storage locations comprising the data in an index maintained in the storage metadata 135 (e.g., the index 1204 ).
- the LIDs may be bound to media storage locations at the time of allocation (e.g., the allocation request may comprise a request to store data).
- allocating LIDs to the data may be in a separate step from binding the LIDs to the media storage locations.
- requests may come from a plurality of storage clients 116 . Consequently, a client identifier may be associated with the request, and the apparatus 400 may use the client identifier to implement an access control with respect to allocations for that storage client 116 and/or with respect to the LIDs available to allocate to the storage client 116 .
- the client identifier may be used to manage how much physical capacity is allocated to a particular storage client 116 or set of storage clients 116 .
- the apparatus 400 includes a logical capacity module 404 that determines if a logical address space 136 of the data storage device includes sufficient unallocated logical capacity to satisfy the allocation request.
- the logical capacity module 404 may determine if the logical address space 136 has sufficient unbound and/or unallocated logical capacity using an index (or other datastructure) maintaining LID bindings and/or LID allocations.
- the logical capacity module 404 may search a logical-to-physical map or index maintained in the storage metadata 135 and/or an unallocated index 1444 described below.
- unbound LIDs may refer to LIDs that do not correspond to valid data stored on a media storage location.
- An unbound LID may be allocated to a client 116 or may be unallocated.
- the logical-to-physical map is configured such that there are no other logical-to-logical mappings between the LIDs in the map and media addresses associated with the LIDs.
- the logical capacity module 404 searches the logical-to-physical index 1204 (or other datastructure) to identify unbound LIDs and identifies unallocated logical space therein. For example, if a logical address space 136 includes a range of logical addresses from 0000 to FFFF and the logical-to-physical map indicates that the logical addresses 0000 to F000 are allocated and bound, the logical capacity module 404 may determine that LIDs F001 to FFFF are not allocated. If the LIDs F001 to FFFF are not allocated to another storage client 116 , they may be available for allocation to satisfy the allocation request.
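- The search for unallocated logical space in the example above can be sketched as a scan for gaps between allocated ranges. The function name and the representation of allocations as `(start, end)` tuples are assumptions.

```python
def unallocated_ranges(allocated, first=0x0000, last=0xFFFF):
    """Return the gaps in [first, last] not covered by allocated ranges.

    `allocated` is an iterable of inclusive (start, end) LID ranges.
    """
    gaps, cursor = [], first
    for start, end in sorted(allocated):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= last:
        gaps.append((cursor, last))
    return gaps


# Logical addresses 0000 to F000 are allocated and bound.
gaps = unallocated_ranges([(0x0000, 0xF000)])
sparse = unallocated_ranges([(0x10, 0x1F), (0x30, 0x3F)], first=0, last=0x4F)
```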
- the translation module 134 may maintain a plurality of different logical address spaces, such as a separate logical address space for each storage client 116 . Accordingly, each storage client 116 may operate in its own, separate logical storage space 136 .
- the storage layer 130 may, therefore, comprise separate storage metadata 135 (e.g., indexes, capacity indicators, and so on), for each storage client 116 (or group of storage clients 116 ).
- Storage clients 116 may be distinguished by an identifier, which may include, but is not limited to: an address (e.g., network address), credential, name, context, or other identifier. The identifiers may be provided in storage requests and/or may be associated with a communication channel or protocol used by the storage client 116 to access the storage layer 130 .
- the index 1204 may comprise an allocation index or allocation entries configured to track logical capacity allocations that have not yet been bound to media storage locations.
- a LID (or other portion of logical capacity) may be allocated to a client, but may not be associated with data stored on a storage device 120 .
- the logical capacity may be “unbound,” and as such, may not be included in the logical-to-physical index.
- the logical capacity module 404 may consult additional datastructures (e.g., allocation index, allocation entries, and/or an unallocated index 1444 ).
- the allocation entry may be included in the logical-to-physical index (e.g., entry 1205 D), and may comprise an indicator showing that the entry is not bound to any particular media storage locations.
- An allocation request may include a request for a certain number of LIDs.
- the logical capacity module 404 may determine if the available logical capacity (e.g., unbound and/or unallocated logical capacity) is sufficient to meet or exceed the requested amount of logical addresses. In another example, if the allocation request specifies a list or range of LIDs to allocate, the logical capacity module 404 can determine if all or a portion of the requested LIDs are unallocated or unbound.
- the apparatus 400 may further comprise an allocation reply module 406 that communicates a reply to the requesting device indicating whether the request can be satisfied. For example, if the logical capacity module 404 determines that the unallocated logical space is insufficient to satisfy the allocation request, the allocation reply module 406 may include in the reply that the allocation request failed, and if the logical capacity module 404 determines that the unallocated logical space is sufficient to satisfy the allocation request (and/or the specified LIDs are unallocated), the allocation reply module 406 may include in the reply an affirmative response.
- An affirmative response may comprise a list of allocated LIDs, a range of LIDs, or the like.
- the allocation request is for a specific group of LIDs and the allocation reply module 406 may reply with the requested LIDs.
- the allocation request is part of a write request.
- the write request includes specific LIDs and the allocation reply module 406 may reply with the requested LIDs.
- the write request only includes data or an indication of an amount of data and the allocation reply module 406 may reply by allocating LIDs sufficient for the write request and returning the allocated LIDs. Alternatively, if an indication of an amount of data is provided the reply may include LIDs that are unallocated.
- the allocation reply module 406 may reply before or after the data is written.
- the allocation reply module 406 may reply in response to the logical capacity module 404 determining if the logical space of the data storage device has sufficient unallocated logical space to satisfy an allocation request.
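- The request and reply flow described above can be sketched as a single function that handles both a request for specific LIDs and a request for a number of LIDs. The function name, the reply dictionary, and the use of a set to track allocated LIDs are illustrative assumptions.

```python
def handle_allocation(allocated, capacity, count=0, lids=None):
    """Return a reply dict; `allocated` (a set of LIDs) is updated on success."""
    if lids is not None:
        # The request names specific LIDs: all must be in range and unallocated.
        wanted = list(lids)
        if any(lid in allocated or not 0 <= lid < capacity for lid in wanted):
            return {"ok": False, "lids": []}
    else:
        # The request names only an amount: pick the lowest unallocated LIDs.
        wanted = []
        for lid in range(capacity):
            if len(wanted) == count:
                break
            if lid not in allocated:
                wanted.append(lid)
        if len(wanted) < count:
            return {"ok": False, "lids": []}
    allocated.update(wanted)
    return {"ok": True, "lids": wanted}


allocated = {0, 1}
by_count = handle_allocation(allocated, capacity=8, count=3)
conflict = handle_allocation(allocated, capacity=8, lids=[4])
too_many = handle_allocation(allocated, capacity=8, count=4)
by_list = handle_allocation(allocated, capacity=8, lids=[5, 6])
```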
- the storage layer 130 may expose portions of the logical address space maintained by the translation module 134 (e.g., index 1204 ) directly to storage clients 116 via the virtual storage interface 132 (or other interface).
- the storage clients 116 may use the virtual storage interface 132 to perform various functions including, but not limited to: identifying available logical capacity (e.g., particular LIDs or general LID ranges), determining available physical capacity, querying the health of the storage media 122 , identifying allocated LIDs, identifying LIDs that are bound to media storage locations, etc.
- the interface 138 can expose all or a subset of the features and functionality of the apparatus 400 directly to clients which may leverage the virtual storage interface 132 to delegate management of the logical address space 136 and/or LIDs to the storage layer 130 .
- FIG. 5 is a schematic block diagram illustrating another embodiment of an apparatus 500 to allocate data storage space.
- the apparatus 500 includes an allocation request module 402 , a logical capacity module 404 , and an allocation reply module 406 , which are substantially similar to those described above in relation to the apparatus 400 of FIG. 4 .
- the apparatus 500 includes a physical capacity request module 502 , a physical capacity allocation module 504 , a physical capacity reply module 506 , an allocation module 508 , an allocation query request module 510 , an allocation query determination module 512 , an allocation query reply module 514 , a logical space management module 516 , a mapping module 518 , a physical space reservation request module 520 , a physical space reservation module 522 , a physical space reservation return module 524 , a physical space reservation cancellation module 526 , a LID binding module 528 , a DMA module 530 , and a deletion module 532 , which are described below.
- the modules 402 - 406 and 502 - 532 of the apparatus 500 of FIG. 5 may be included in the storage layer 130 , a storage media controller 126 , or any other appropriate location known to one of skill in the art.
- the apparatus 500 includes, in one embodiment, a physical capacity request module 502 , a physical capacity allocation module 504 , and a physical capacity reply module 506 .
- the physical capacity request module 502 receives from a requesting device a physical capacity request.
- the physical capacity request is received at the data storage device and includes a request for an amount of available physical storage capacity in the data storage device (and/or physical storage capacity allocated to the requesting device).
- the physical capacity request may include a quantity of physical capacity or may indirectly request physical storage capacity, for example by indicating a size of a data unit to be stored.
- Another indirect physical storage capacity request may include logical addresses of data to be stored which may correlate to a data size.
- One of skill in the art will recognize other forms of a physical capacity request.
- the physical capacity allocation module 504 determines the amount of available physical storage capacity on one or more storage devices 120 and/or 120 A-N.
- the amount of available physical storage capacity includes a physical storage capacity of unbound media storage locations.
- the amount of available physical storage capacity may be “budgeted,” for example, only a portion of the physical storage capacity of a storage device 120 may be available to the requesting device.
- the amount of available physical storage capacity may be budgeted based on a quota associated with each storage client 116 or group of storage clients 116 .
- the apparatus 500 may enforce these quotas.
- the allocation of available physical storage capacity may be determined by configuration parameter(s), may be dynamically adjusted according to performance and/or quality of service policies, or the like.
- the physical capacity allocation module 504 may determine the amount of available physical storage capacity using an index (or other datastructure), such as the index 1204 described above.
- Index 1204 may identify the media storage locations that comprise valid data (e.g., entries 1205 A-N that comprise bound media storage locations).
- the available storage capacity may be a total (or budgeted) physical capacity minus the capacity of the bound media storage locations.
- an allocation index (or other datastructure) may maintain an indicator of the available physical storage capacity. The indicator may be updated responsive to storage operations performed on the storage device including, but not limited to: grooming operations, deallocations (e.g., TRIM), writing additional data, physical storage capacity reservations, physical storage capacity reservation cancellations, and so on. Accordingly, the module 504 may maintain a “running total” of available physical storage capacity that is available on request.
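- The “running total” described above can be sketched as a counter that is adjusted by each storage operation, with available capacity computed as the budget minus bound and reserved capacity. The class and method names are hypothetical, and the model is simplified: a deallocation releases capacity immediately rather than after grooming.

```python
class PhysicalCapacity:
    """Running total of available physical capacity (hypothetical model)."""

    def __init__(self, budget):
        self.budget = budget  # total or budgeted capacity, in bytes
        self.bound = 0        # capacity of bound media storage locations
        self.reserved = 0     # capacity held by reservations

    @property
    def available(self):
        return self.budget - self.bound - self.reserved

    def write(self, size):
        if size > self.available:
            raise ValueError("insufficient physical capacity")
        self.bound += size

    def trim(self, size):
        # A deallocation (e.g., TRIM) releases bound capacity in this model.
        self.bound -= min(size, self.bound)

    def reserve(self, size):
        if size > self.available:
            return False
        self.reserved += size
        return True

    def cancel_reservation(self, size):
        self.reserved -= min(size, self.reserved)


capacity = PhysicalCapacity(budget=1000)
capacity.write(300)
granted = capacity.reserve(200)
denied = capacity.reserve(600)
after_reserve = capacity.available
capacity.cancel_reservation(200)
capacity.trim(100)
```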
- the physical capacity reply module 506 communicates a reply to the requesting device in response to the physical capacity allocation module 504 determining the amount of available physical storage capacity on the data storage device.
- the physical capacity allocation module 504 tracks bound media storage locations, unbound media storage locations, reserved physical storage capacity, unreserved physical storage capacity, and the like.
- the physical capacity allocation module 504 may track these parameters using a logical-to-physical map, a validity map, a free media address pool, a used media address pool, a physical-to-logical map, or other means known to one of skill in the art.
- the reply may take many forms.
- the reply may include an amount of available physical storage capacity.
- the reply may include an acknowledgement that the data storage device has the requested available physical storage capacity.
- One of skill in the art will recognize other forms of a reply in response to a physical capacity request.
- the apparatus 500 with a physical capacity request module 502 , a physical capacity allocation module 504 , and a physical capacity reply module 506 is advantageous for storage devices 120 where a logical-to-physical mapping is not a one-to-one mapping.
- a file server storage client 116 may track physical storage capacity of a storage device 120 by tracking the LBAs that are bound to media storage locations.
- tracking LIDs may not provide any indication of physical storage capacity.
- LIDs may be used to support snapshots, cloning (e.g., logical copies), deduplication and/or backup. Examples of systems and methods for managing many-to-one LID to media storage location logical interfaces are disclosed in further detail below.
- the apparatus 500 may track available physical storage space and may communicate the amount of available physical storage space to storage clients 116 , which may allow the storage clients 116 to offload allocation management and physical capacity management to the storage layer 130 .
- media storage locations are bound to corresponding LIDs.
- LIDs associated with the data are bound to the media storage location where the data is stored.
- the location where the data is stored is not apparent from the LID, even if the LID is an LBA. Instead, the data is stored at an append point and the address where the data is stored is mapped to the LID. If the data is a modification of data stored previously, the LID may be mapped to the current data as well as to a location where the old data is stored. There may be several versions of the data mapped to the same LID.
- the apparatus 500 includes an allocation module 508 that allocates the unallocated logical space sufficient to satisfy the allocation request of the requesting device.
- the allocation module 508 may allocate the unallocated logical space in response to the logical capacity module 404 determining that the logical space has sufficient unallocated logical space to satisfy the allocation request.
- the allocation request is part of a pre-allocation where logical space is not associated with a specific request to store data.
- a storage client 116 may request, using an allocation request, logical space and then may proceed to store data over time to the allocated logical space.
- the allocation module 508 allocates LIDs to the storage client 116 in response to an allocation request and to the logical capacity module 404 determining that the logical space has sufficient unallocated logical space to satisfy the allocation request.
- the allocation module 508 may also allocate LIDs based on an allocation request associated with a specific storage request. For example, if a storage request includes specific LIDs and the logical capacity module 404 determines that the LIDs are available, the allocation module 508 may allocate the LIDs in conjunction with storing the data of the storage request. In another example, if the storage request does not include LIDs and the logical capacity module 404 determines that there are sufficient LIDs for the storage request, the allocation module 508 may select and allocate LIDs for the data and the allocation reply module 406 may communicate the allocated LIDs.
- the allocation module 508 may be configured to locate unallocated LIDs to satisfy an allocation request. In some embodiments, the allocation module 508 may identify unallocated LIDs by receiving a list of requested LIDs to allocate from the storage client 116 and verifying that these LIDs are available for allocation. In another example, the allocation module 508 may identify unallocated LIDs by searching for unallocated LIDs that meet criteria received in conjunction with the request. The criteria may be LIDs that are associated with a particular storage device 120 A-N, that are available in a RAID, that have some assigned metadata characteristic, etc.
- the allocation module 508 may identify unallocated LIDs by creating, from a pool of available LIDs, a subset of LIDs that meet criteria received in conjunction with the request.
- the LIDs may be a subset of LIDs that have already been allocated to the client 116 . For example, if a set or group of LIDs is allocated to a particular user, group, employer, etc., a subset of the LIDs may be allocated. A specific example is if a set of LIDs is allocated to an organization and then a subset of the allocated LIDs is further allocated to a particular user in the organization.
- the allocation module 508 can identify one or more unallocated LIDs.
- the allocation module 508 can expand the LIDs allocated to a storage client 116 by allocating LIDs in addition to LIDs already allocated to the storage client 116 .
- LIDs allocated to a storage client 116 may be decreased by deallocating certain LIDs so that they return to a pool of unallocated LIDs.
- subsets of allocated LIDs may be allocated, deallocated, increased, decreased, etc. For example, LIDs allocated to a user in an organization may be deallocated so that the LIDs allocated to the user are still allocated to the organization but not to the user.
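The allocation behavior described above may be illustrated with a minimal sketch. It assumes a logical space of integer LIDs numbered from zero and a simple ownership table; the class `LidAllocator` and its method names are hypothetical and are not drawn from the disclosure.

```python
class LidAllocator:
    """Hypothetical tracker for allocated LIDs in a logical address space."""

    def __init__(self, logical_capacity):
        self.logical_capacity = logical_capacity  # total number of LIDs
        self.owner = {}  # LID -> client holding the allocation

    def unallocated_count(self):
        return self.logical_capacity - len(self.owner)

    def allocate(self, client, count=None, lids=None):
        """Allocate the named LIDs, or select `count` unallocated LIDs.

        Returns the allocated LIDs, or None if the request cannot be met.
        """
        if lids is not None:
            # The client named the LIDs: verify each is in range and free.
            for lid in lids:
                if lid in self.owner or not 0 <= lid < self.logical_capacity:
                    return None
            chosen = list(lids)
        else:
            if count > self.unallocated_count():
                return None
            chosen = []
            lid = 0
            while len(chosen) < count:
                if lid not in self.owner:
                    chosen.append(lid)
                lid += 1
        for lid in chosen:
            self.owner[lid] = client
        return chosen

    def deallocate(self, client, lids):
        """Return LIDs held by `client` to the pool of unallocated LIDs."""
        for lid in lids:
            if self.owner.get(lid) == client:
                del self.owner[lid]
```

Expanding an allocation corresponds to a further `allocate` call for the same client, and shrinking it corresponds to `deallocate`.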
- the apparatus 500 includes an allocation query request module 510 , an allocation query determination module 512 , and an allocation query reply module 514 .
- the allocation query request module 510 receives an allocation query from some requesting device, such as a storage client 116 , etc.
- An allocation query may include a request for information about allocating logical space or associated management of the allocated logical space.
- an allocation query may be a request to identify allocated LIDs, identify bound LIDs, identify allocated LIDs that are not bound to media storage locations, unallocated LIDs or a range of LIDs, and the like.
- the allocation query may include information about logical allocation, logical capacity, physical capacity, or other information meeting criteria in the allocation query.
- the information may include metadata, status, logical associations, historical usage, flags, control, etc.
- One of skill in the art will recognize other allocation queries and the type of information returned in response to the allocation query.
- the allocation query includes some type of criteria that allows the allocation query determination module 512 to service the allocation request.
- the allocation query determination module 512 identifies one or more LIDs that meet the criteria specified in the allocation query.
- the identified LIDs include allocated LIDs that are bound to media storage locations, allocated LIDs that are unbound, unallocated LIDs, and the like.
- the allocation query reply module 514 communicates the results of the query to the requesting device, such as the client 110 , or to another device as directed in the allocation query.
- the results of the allocation query may include a list of the identified LIDs, an acknowledgement that LIDs meeting the criteria were found, an acknowledgement that LIDs meeting the criteria in the allocation query were not found, bound/unbound status of LIDs, logical storage capacity, or the like.
- the allocation query reply module 514 returns status information and the information returned may include any information related to managing and allocating LIDs known to those of skill in the art.
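One way an allocation query could be serviced is sketched below. The representation (a set of allocated LIDs plus a dictionary of bindings) and the function name `allocation_query` are assumptions made for illustration only.

```python
def allocation_query(allocated, bindings, lid_range, state):
    """Return the LIDs in `lid_range` that are in the requested state.

    allocated: set of allocated LIDs (hypothetical representation)
    bindings:  dict mapping bound LIDs to media addresses
    state:     "bound", "unbound" (allocated but not bound), or "unallocated"
    """
    result = []
    for lid in lid_range:
        if lid in bindings:
            current = "bound"
        elif lid in allocated:
            current = "unbound"
        else:
            current = "unallocated"
        if current == state:
            result.append(lid)
    return result
```

An empty result corresponds to an acknowledgement that no LIDs meeting the criteria were found.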
- the apparatus 500 , in another embodiment, includes a logical space management module 516 that manages the logical space of the data storage device from within the data storage device.
- the logical space management module 516 may manage the logical space from a storage layer 130 or driver associated with a storage device 120 of the data storage device.
- the logical space management module 516 may track unbound LIDs and bound LIDs, for example, in the logical-to-physical map, in an index, or in another datastructure.
- a bound LID refers to a LID corresponding to data; a bound LID is a LID associated with valid data stored on a media storage location of the storage device 120 .
- the logical space management module 516 may service allocation requests and allocation queries as described above, and other functions related to allocation.
- the logical space management module 516 may also receive a deallocation request from a requesting device.
- the deallocation request typically includes a request to return one or more allocated LIDs to an unallocated state; the logical space management module 516 then communicates the successful deallocation to the requesting device or to another designated device.
- the deallocation request may include a request to return one or more storage locations associated with the allocated LIDs, followed by communicating the successful deallocation to the requesting device or to another designated device. This might be transparent, or might require that the deallocation request be extended to include an indication that a logical/physical deallocation should accompany the request.
- Deallocation requests may be asynchronous and tied to the groomer. Thus, the deallocation request may be virtual (in time) until completed.
- the management of the allocations (logical and physical) may diverge from the actual available space at any point in time.
- the management module 516 is configured to deal with these differences.
- the logical space management module 516 may also receive a LID group command request from a requesting device and may communicate to the requesting device a reply indicating a response to the LID group command request.
- the LID group command request may include an action to take on, for example, two or more LIDs (“LID group”), metadata associated with the LID group, the data associated with the LID group, and the like. For example, if several users are each allocated LIDs and the users are part of a group, a LID group command may be to deallocate the LIDs for several of the users, allocate additional LIDs to each user, return usage information for each user, etc.
- the action taken in response to the LID group command may also include modifying the metadata, backing up the data, backing up the metadata, changing control parameters, changing access parameters, deleting data, copying the data, encrypting the data, deduplicating the data, compressing the data, decompressing the data, etc.
- One of skill in the art will recognize other logical space management functions that the logical space management module 516 may also perform.
- the apparatus 500 includes a mapping module 518 that binds, in a logical-to-physical map (e.g., the index 1204 ), bound LIDs to media storage locations.
- the logical capacity module 404 determines if the logical space has sufficient unallocated logical space using the logical-to-physical map mapped by the mapping module 518 .
- the index 1204 may be used to track allocation of the bound LIDs, the unbound LIDs, the allocated LIDs, the unallocated LIDs, the allocated LID capacity, the unallocated LID capacity, and the like.
- the mapping module 518 binds LIDs to corresponding media addresses in multiple indexes and/or maps.
- a reverse map may be used to quickly access information related to a media address and to link to a LID associated with the media address.
- the reverse map may be used to identify a LID from a media address.
- a reverse map may be used to map addresses in a data storage device 120 into erase regions, such as erase blocks, such that a portion of the reverse map spans an erase region of the storage device 120 erased together during a storage space recovery operation. Organizing a reverse map by erase regions facilitates tracking information useful during grooming operations.
- the reverse map may include which media addresses in an erase region have valid data and which have invalid data. When valid data is copied from an erase region and the erase region erased, the reverse map can easily be changed to indicate that the erase region does not include data and is ready for sequential storage of data.
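A reverse map organized by erase region may be sketched as follows. The sketch assumes a fixed number of media addresses per erase block; the class `ReverseMap` and its methods are hypothetical names introduced for illustration.

```python
class ReverseMap:
    """Hypothetical reverse map grouped by erase block."""

    def __init__(self, block_size):
        self.block_size = block_size  # media addresses per erase block (assumed fixed)
        self.blocks = {}  # erase block number -> {media address: (lid, valid)}

    def _block(self, media_addr):
        return media_addr // self.block_size

    def record(self, media_addr, lid):
        """Note that `media_addr` holds valid data for `lid`."""
        self.blocks.setdefault(self._block(media_addr), {})[media_addr] = (lid, True)

    def invalidate(self, media_addr):
        """Mark the data at `media_addr` as invalid (e.g. superseded)."""
        entries = self.blocks[self._block(media_addr)]
        lid, _ = entries[media_addr]
        entries[media_addr] = (lid, False)

    def lid_for(self, media_addr):
        """Identify the LID associated with a media address."""
        return self.blocks[self._block(media_addr)][media_addr][0]

    def valid_addresses(self, block):
        """Addresses a groomer would copy forward before erasing `block`."""
        entries = self.blocks.get(block, {})
        return sorted(addr for addr, (_, valid) in entries.items() if valid)

    def erase(self, block):
        """Drop all entries for an erased block, leaving it ready for reuse."""
        self.blocks.pop(block, None)
```

Grouping entries by erase block means the groomer only inspects one dictionary per erase region when deciding what to copy forward.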
- the apparatus 500 includes a physical space reservation request module 520 , located in the storage layer 130 , that receives a request from a storage client 116 to reserve available physical storage capacity on the data storage device (i.e. the storage device 120 that is part of the data storage device) [hereinafter a “physical space reservation request”].
- the physical space reservation request includes an indication of an amount of physical storage capacity requested by the storage client 116 .
- the indication of an amount of physical storage capacity requested may be expressed in terms of physical capacity.
- the request to reserve physical storage capacity may also include a request to allocate the reserved physical storage capacity to a logical entity.
- the indication of an amount of physical storage capacity may be expressed indirectly as well.
- a storage client 116 may indicate a number of logical blocks and the data storage device may determine a particular fixed size for each logical block and then translate the number of logical blocks to a physical storage capacity.
- One of skill in the art will recognize other indicators of an amount of physical storage capacity in a physical space reservation request.
- the physical space reservation request, in one embodiment, is associated with a write request.
- the write request is a two-step process, and the physical space reservation request and the write request are separate.
- the physical space reservation request is part of the write request or the write request is recognized as having an implicit physical space reservation request.
- the physical space reservation request is not associated with a specific write request, but may instead be associated with planned storage, reserving storage space for a critical operation, etc., where mere allocation of storage space is insufficient.
- the data may be organized into atomic data units.
- the atomic data unit may be a packet, a page, a logical page, a logical packet, a block, a logical block, a set of data associated with one or more logical block addresses (the logical block addresses may be contiguous or noncontiguous), a file, a document, or other grouping of related data.
- an atomic data unit is associated with a plurality of noncontiguous and/or out of order logical block addresses or other identifiers that the data write module 240 handles as a single atomic data unit.
- writing noncontiguous and/or out of order logical blocks in a single write operation is referred to as an atomic write.
- a hardware controller processes operations in the order received and a software driver of the client sends the operations to the hardware controller for a single atomic write together so that the data write module 240 can process the atomic write operation as normal. Because the hardware processes operations in order, this guarantees that the different logical block addresses or other identifiers for a given atomic write travel through the data write module 240 together to the nonvolatile memory.
- the client, in one embodiment, can back out, reprocess, or otherwise handle failed atomic writes and/or other failed or terminated operations upon recovery once power has been restored.
- apparatus 500 may mark blocks of an atomic write with a metadata flag indicating whether a particular block is part of an atomic write.
- metadata marking is to rely on the log write/append only protocol of the nonvolatile memory together with a metadata flag, or the like.
- the use of an append only log for storing data and prevention of any interleaving blocks enables the atomic write membership metadata to be a single bit.
- the flag bit may be a 0, unless the block is a member of an atomic write, and then the bit may be a 1, or vice versa.
- the metadata flag may be a 0 to indicate that the block is the last block of the atomic write.
- different hardware commands may be sent to mark different headers for an atomic write, such as the first block in an atomic write, middle member blocks of an atomic write, tail of an atomic write, or the like.
- the apparatus 500 scans the log on the nonvolatile storage in a deterministic direction (for example, in one embodiment the start of the log is the tail and the end of the log is the head and data is always added at the head).
- the power management apparatus scans from the head of the log toward the tail of the log.
- the block is either a single block atomic write or a non-atomic write block.
- the power management apparatus continues scanning the log until the metadata flag changes back to a 0; at that point in the log, the previous block scanned is the last member of the atomic write and the first block stored for the atomic write.
- the nonvolatile memory uses a sequential, append only write structured writing system where new writes are appended on the front of the log (i.e. at the head of the log).
- the storage controller reclaims deleted, stale, and/or invalid blocks of the log using a garbage collection system, a groomer, a cleaner agent, or the like.
- the storage controller uses a forward map to map logical block addresses to media addresses to facilitate use of the append only write structure and garbage collection.
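The head-to-tail scan of the single-bit atomic write flag may be sketched as below. The sketch assumes the variant in which a 1 marks a member of an atomic write other than its last block and a 0 marks either the last block of an atomic write or a non-atomic block; under that assumption, a run of 1 flags at the head of the log with no closing 0 suggests an atomic write that did not complete. The function name is hypothetical.

```python
def incomplete_atomic_tail(log_flags):
    """Count blocks at the head of the log that belong to an unfinished atomic write.

    log_flags holds one metadata flag bit per block, ordered from the tail
    (oldest) to the head (newest). Assumed convention: 1 marks a member of
    an atomic write that is not its last block; 0 marks the last block of
    an atomic write or a non-atomic block.
    """
    count = 0
    # Scan from the head toward the tail until the flag changes back to 0.
    for flag in reversed(log_flags):
        if flag == 0:
            break
        count += 1
    return count
```

A nonzero result identifies how many of the newest blocks a recovery process might back out or reprocess.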
- the apparatus 500 includes a physical space reservation module 522 that determines if the data storage device (i.e. storage device 120 ) has an amount of available physical storage capacity to satisfy the physical storage space request. If the physical space reservation module 522 determines that the amount of available physical storage capacity is adequate to satisfy the physical space reservation request, the physical space reservation module 522 reserves an amount of available physical storage capacity on the storage device 120 to satisfy the physical storage space request. The amount of available physical storage capacity reserved to satisfy the physical storage space request is the reserved physical capacity.
- the amount of reserved physical capacity may or may not be equal to the amount of storage space requested in the physical space reservation request.
- the storage layer 130 may need to store additional information with data written to a storage device 120 , such as metadata, index information, error correcting code, etc.
- the storage layer 130 may encrypt and/or compress data, which may affect storage size.
- the physical space reservation request includes an amount of logical space and the indication of an amount of physical storage capacity requested is derived from the requested logical space.
- the physical space reservation request includes one or more LIDs and the indication of an amount of physical storage capacity requested is derived from an amount of data associated with the LIDs.
- the data associated with the LIDs is data that has been bound to the LIDs, such as in a write request.
- the data associated with the LIDs is a data capacity allocated to each LID, such as would be the case if a LID is an LBA and a logical block size could be used to derive the amount of requested physical storage capacity.
- the physical space reservation request is a request to store data.
- the physical space reservation request may be implied and the indication of an amount of physical storage capacity requested may be derived from the data and/or metadata associated with the data.
- the physical space reservation request is associated with a request to store data. In this embodiment, the indication of an amount of physical storage capacity requested is indicated in the physical space reservation request and may be correlated to the data of the request to store data.
- the physical space reservation module 522 may also then factor metadata, compression, encryption, etc. to determine an amount of required physical capacity to satisfy the physical space reservation request.
- the amount of physical capacity required to satisfy the physical space reservation request may be equal to, larger than, or smaller than an amount indicated in the physical space reservation request.
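The translation from a requested number of logical blocks to a required physical capacity may be sketched as follows. The parameters `metadata_per_block` and `compression_ratio` are illustrative assumptions standing in for the metadata, error correcting code, and compression effects mentioned above.

```python
import math


def required_physical_capacity(logical_blocks, block_size,
                               metadata_per_block=0, compression_ratio=1.0):
    """Estimate the bytes of physical capacity needed for a reservation.

    block_size is the fixed logical block size in bytes, metadata_per_block
    is assumed per-block overhead (headers, ECC, index data), and
    compression_ratio is stored size divided by original size.
    """
    payload = logical_blocks * block_size * compression_ratio
    overhead = logical_blocks * metadata_per_block
    return math.ceil(payload + overhead)
```

Depending on the overhead and compression assumed, the result may be equal to, larger than, or smaller than the nominal `logical_blocks * block_size`.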
- the physical space reservation module 522 determines if one or more storage devices 120 A-N, either individually or combined, have enough available physical storage capacity to satisfy the physical space reservation request.
- the request may be for space on a particular storage device (e.g. 120 A), a combination of storage devices 120 A-N, such as would be the case if some of the storage devices 120 A-N are in a RAID configuration, or for available space generally.
- the physical space reservation module 522 may tailor a determination of available capacity to specifics of the physical space reservation request.
- the physical space reservation module 522 will typically retrieve available physical storage capacity information from each logical-to-physical map of each storage device 120 or a combined logical-to-physical map of a group of storage devices 120 A-N.
- the physical space reservation module 522 typically surveys bound media addresses. Note that the physical space reservation module 522 may not have enough information to determine available physical capacity by looking at bound LIDs, because there is typically not a one-to-one relationship between LIDs and media storage locations.
- the physical space reservation module 522 reserves physical storage capacity, in one embodiment, by maintaining enough available storage capacity to satisfy the amount of requested capacity in the physical space reservation request. Typically, in a log structured file system or other sequential storage device, the physical space reservation module 522 would not reserve a specific media region or media address range in the storage device 120 , but would instead reserve physical storage capacity.
- a storage device 120 may have 500 gigabytes (“GB”) of available physical storage capacity.
- the storage device 120 may be receiving data and storing the data at one or more append points, thus reducing the storage capacity.
- a garbage collection or storage space recovery operation may be running in the background that would return recovered erase blocks to the storage pool, thus increasing storage space.
- the locations where data is stored and freed are constantly changing so the physical space reservation module 522 , in one embodiment, monitors storage capacity without reserving fixed media storage locations.
- the physical space reservation module 522 may reserve storage space in a number of ways. For example, the physical space reservation module 522 may halt storage of new data if the available physical storage capacity on the storage device 120 decreases to the reserved storage capacity, may send an alert if the physical storage capacity on the storage device 120 is reduced to some level above the reserved physical storage capacity, or may take some other action or combination of actions that would preserve an available storage capacity above the reserved physical storage capacity.
- the physical space reservation module 522 reserves a media region, range of media addresses, etc. on the data storage device. For example, if the physical space reservation module 522 reserved a certain quantity of erase blocks, data associated with the physical space reservation request may be stored in the reserved region or address range. The data may be stored sequentially in the reserved storage region or range. For example, it may be desirable to store certain data at a particular location.
- One of skill in the art will recognize reasons to reserve a particular region, address range, etc. in response to a physical space reservation request.
- the apparatus 500 includes a physical space reservation return module 524 that transmits to the storage client 116 an indication of availability or unavailability of the requested amount of physical storage capacity in response to the physical space reservation module 522 determining if the data storage device has an amount of available physical storage space that satisfies the physical space reservation request. For example, if the physical space reservation module 522 determines that the available storage space is adequate to satisfy the physical space reservation request, the physical space reservation return module 524 may transmit a notice that the physical space reservation module 522 has reserved the requested storage capacity or other appropriate notice.
- the physical space reservation return module 524 may transmit a failure notification or other indicator that the requested physical storage space was not reserved.
- the indication of availability or unavailability of the requested storage space may be used prior to writing data to reduce a likelihood of failure of a write operation.
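Capacity-based reservation, in which no fixed media addresses are set aside, may be sketched as follows. The class `PhysicalReservations` is hypothetical; it tracks only two byte counts and refuses unreserved writes that would reduce the available capacity below the reserved amount.

```python
class PhysicalReservations:
    """Hypothetical capacity-based reservation tracker (no fixed media addresses)."""

    def __init__(self, available):
        self.available = available  # bytes of free physical capacity
        self.reserved = 0           # bytes promised to outstanding reservations

    def reserve(self, amount):
        """Return True and record the reservation if capacity allows."""
        if self.available - self.reserved < amount:
            return False  # indication of unavailability
        self.reserved += amount
        return True

    def write(self, size, against_reservation=False):
        """Consume capacity; unreserved writes may not use reserved space."""
        if against_reservation:
            if size > self.reserved:
                return False
            self.reserved -= size
        elif self.available - size < self.reserved:
            return False  # halt new data to preserve the reservation
        self.available -= size
        return True

    def recover(self, size):
        """Storage space recovery returned `size` bytes to the pool."""
        self.available += size
```

The boolean returned by `reserve` plays the role of the indication of availability or unavailability sent back to the storage client.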
- the apparatus 500 , in another embodiment, includes a physical space reservation cancellation module 526 that cancels all or a portion of reserved physical storage space in response to a cancellation triggering event.
- the cancellation triggering event may come in many different forms.
- the cancellation triggering event may include determining that data to be written to the storage device 120 and associated with available space reserved by the physical space reservation module 522 has been previously stored by the storage layer 130 .
- if a deduplication process determines that the data has already been stored, the data may not need to be stored again since the previously stored data could be mapped to two or more LIDs.
- the cancellation triggering event could be completion of storing data of the write request.
- the physical space reservation cancellation module 526 may reduce or cancel the reserved physical storage capacity.
- the physical space reservation cancellation module 526 may merely reduce the reserved amount, or may completely cancel the reserved physical storage capacity associated with the write request. Writing to less than the reserved physical space may be due to writing a portion of a data unit where the data unit is the basis of the request, where data associated with a physical space reservation request is written incrementally, etc.
- physical storage space is reserved by the physical storage space reservation module 522 to match a request and then due to compression or similar procedure, the storage space of the data stored is less than the associated reserved physical storage capacity.
- the cancellation triggering event is a timeout. For example, if a physical space reservation request is associated with a write request, the physical space reservation module 522 reserves physical storage capacity, and the data associated with the write request is not written before the expiration of a certain amount of time, the physical space reservation cancellation module 526 may cancel the reservation of physical storage space.
- the physical space reservation cancellation module 526 may cancel the reservation of physical storage space.
- the physical space reservation module 522 may increase or otherwise change the amount of reserved physical storage capacity.
- the physical space reservation request module 520 may receive another physical space reservation request, which may or may not be associated with another physical space reservation request. Where the physical space reservation request is associated with previously reserved physical storage capacity, the physical space reservation module 522 may increase the reserved physical storage capacity. Where the physical space reservation request is not associated with previously reserved physical storage capacity, the physical space reservation module 522 may separately reserve physical storage capacity and track the additional storage capacity separately.
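The cancellation triggering events and the growth of an existing reservation may be sketched together as follows. The class `ReservationTable`, its reservation identifiers, and the numeric clock passed as `now` are all assumptions made for illustration.

```python
class ReservationTable:
    """Hypothetical table of outstanding physical capacity reservations."""

    def __init__(self):
        self.entries = {}  # reservation id -> [amount, expiry time]

    def add(self, res_id, amount, now, timeout):
        """Create a reservation, or grow one that already exists."""
        if res_id in self.entries:
            self.entries[res_id][0] += amount
        else:
            self.entries[res_id] = [amount, now + timeout]

    def reduce(self, res_id, amount):
        """Shrink a reservation, e.g. after a partial write; cancel at zero."""
        entry = self.entries[res_id]
        entry[0] -= amount
        if entry[0] <= 0:
            del self.entries[res_id]

    def cancel(self, res_id):
        """Cancel fully, e.g. when deduplication finds the data already stored."""
        self.entries.pop(res_id, None)

    def expire(self, now):
        """Timeout trigger: cancel every reservation past its expiry time."""
        expired = [r for r, (_, expiry) in self.entries.items() if expiry <= now]
        for r in expired:
            del self.entries[r]
        return expired

    def total_reserved(self):
        return sum(amount for amount, _ in self.entries.values())
```

Reservations with distinct identifiers are tracked separately, while a repeated identifier increases the capacity already reserved.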
- Standard management should include some kind of thresholds, triggers, alarms and the like for managing the physical storage capacity, providing indicators to the user that action needs to be taken. Typically, this would be done in the management system. But, either the management system would have to poll the devices under management or said devices would have to be configured/programmed to interrupt the manager when a criterion was met (preferred).
- the apparatus 500 , in another embodiment, includes a LID binding module 528 that, in response to a request from a storage client 116 to write data, binds one or more unbound LIDs to media storage locations comprising the data and transmits the LIDs to the storage client 116 .
- the LID binding module 528 allows on-the-fly allocation and binding of LIDs.
- the request to write data, in another embodiment, may be a two-step process.
- the LID binding module 528 may allocate LIDs in a first step for data to be written and then in a second step the data may be written along with the allocated LIDs.
- the LID allocation module 402 allocates LIDs in a contiguous range.
- the LID binding module 528 may also allocate LIDs in a consecutive range. Where a logical space is large, the LID allocation module 402 may not need to fragment allocated LIDs but may be able to choose a range of LIDs that are consecutive.
- the LID allocation module 402 binds LIDs that may not be contiguous and may use logical spaces that are interspersed with other allocated logical spaces.
- the apparatus 500 , in another embodiment, includes a DMA module 530 that pulls data from a client 110 in a direct memory access (“DMA”) and/or a remote DMA (“RDMA”) operation.
- the data is first identified in a request to store data, such as a write request, and then the storage layer 130 executes a DMA and/or RDMA to pull data from the storage client 116 to a storage device 120 .
- the write request does not use a DMA or RDMA, but instead the write request includes the data. Again the media storage locations of the data are bound to the corresponding LIDs.
- the apparatus 500 includes a deletion module 532 .
- the deletion module 532 removes the mapping between storage space where the deleted data was stored and the corresponding LID.
- the deletion module 532 may also unbind the one or more media storage locations of the deleted data and also may deallocate the one or more logical addresses associated with the deleted data.
- FIG. 6 is a flow diagram of one embodiment of a method 600 for allocating data storage space.
- the method 600 , and the other methods and/or processes disclosed herein, may be embodied as instructions stored on a computer-readable storage medium.
- the instructions may be configured for execution by a computing device, and may be configured to cause the computing device to perform one or more of the disclosed method steps and/or operations.
- one or more of the disclosed method steps and/or operations may be implemented by use of hardware components, such as special-purpose circuitry, logic elements, processors, ASICs, FPGAs, and/or the like.
- Step 602 may comprise receiving an allocation request from a storage client 116 .
- the allocation request may be received through the interface 138 of the storage layer 130 .
- the logical capacity module 404 determines 604 if a logical address space 136 includes sufficient unallocated logical capacity to satisfy the allocation request where the determination includes a search of a logical-to-physical map (e.g., index 1204 , or other datastructure).
- the logical-to-physical map includes bindings between LIDs of the logical space and corresponding media storage locations comprising data of the bound LIDs, wherein a bound LID differs from the one or more media storage location addresses bound to the LID.
- the allocation reply module 406 communicates 606 a reply to the requesting device and the method 600 ends.
- FIG. 7 is a schematic flow chart diagram illustrating one embodiment of a method 700 for allocating data storage space.
- the method 700 begins and the physical capacity request module 502 receives 702 from a requesting device a physical capacity request.
- the physical capacity request is received at the data storage device.
- the physical capacity request includes a request of an amount of available physical storage capacity in the data storage device.
- the physical capacity request, for example, may be a specific amount of physical capacity, may be derived from a request to store data, etc.
- the physical capacity allocation module 504 determines 704 the amount of available physical storage capacity on the data storage device where the amount of available physical storage capacity includes a physical storage capacity of unbound storage locations in the data storage device.
- the physical capacity reply module 506 communicates 706 a reply to the requesting device in response to the physical capacity allocation module 504 determining the amount of available physical storage capacity on the data storage device, and the method 700 ends.
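The determination at step 704 and the reply at step 706 may be sketched as follows. The sketch assumes uniformly sized storage locations and treats every location not holding valid data as available, which ignores space still awaiting storage space recovery; the function name and reply format are hypothetical.

```python
def physical_capacity_reply(total_locations, location_size, bound_addresses, requested):
    """Answer a physical capacity request from the count of unbound locations.

    bound_addresses is the set of media addresses currently holding valid
    data; every other location is treated as available (an assumption).
    """
    unbound = total_locations - len(bound_addresses)
    available = unbound * location_size
    return {"available": available, "sufficient": available >= requested}
```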
- FIG. 8 is a schematic flow chart diagram illustrating one embodiment of a method 800 for reserving physical storage space.
- the method 800 begins and the physical space reservation request module 520 receives 802 a physical space reservation request to reserve available physical storage space.
- the physical space reservation request includes an indication of an amount of physical storage capacity requested.
- the indication of an amount of physical storage capacity could take many forms, such as a number of bytes or a number of logical blocks, a request to store specific data, or other indirect indication where the indication of an amount of physical storage is derived from the request.
- the physical space reservation module 522 determines 804 if the data storage device has available physical storage capacity to satisfy the physical storage space request. If the physical space reservation module 522 determines 804 that the data storage device has available physical storage capacity to satisfy the physical storage space request, the physical space reservation module 522 reserves 806 physical storage capacity adequate to service the physical space reservation request and the physical space reservation return module 524 transmits 808 to the requesting storage client 116 an indication that the requested physical storage space is reserved.
- the physical allocation module 404 maintains 810 enough available physical storage capacity to maintain the reservation of physical storage capacity until the reservation is used by storing data associated with the reservation or until the reservation is cancelled, and the method 800 ends. If the physical space reservation module 522 determines 804 that the data storage device does not have available physical storage capacity to satisfy the physical storage space request, the physical space reservation return module 524 transmits 812 to the requesting storage client 116 an indication that the requested physical storage space is not reserved or an indication of insufficient capacity, and the method 800 ends.
- FIG. 9 is a schematic flow chart diagram illustrating one embodiment of a method 900 for binding LIDs to media storage locations.
- the method 900 begins and the LID binding module 528 receives 902 a write request from a storage client 116 .
- the write request is a request to write data to one or more storage devices 120 .
- Step 902 may comprise determining whether the request is associated with any LIDs (e.g., determining whether LIDs have been allocated for the request).
- Step 904 may comprise allocating LIDs to the storage client to service the write request (if necessary), as disclosed above.
- Step 904 may further comprise identifying LIDs allocated to the storage client for use in referencing the data of the write request.
- Step 904 may comprise indicating that the identified LIDs are allocated by the storage client and are currently being used to reference valid data on a storage device.
- Step 904 may further comprise allocating and/or reserving physical storage capacity for the write request (by use of the physical capacity allocation module 504 , as disclosed above).
- Step 906 may comprise servicing the write request by, inter alia, storing data of the write request onto one or more storage device(s) 120 .
- the data may be stored in a contextual, log-based format, as disclosed herein.
- the data may be stored at one or more physical storage locations, which may be referenced by respective media addresses.
- Step 908 may comprise binding the LIDs identified at step 904 to the media addresses of step 906 .
- Step 908 may, therefore, comprise the mapping module 518 binding the media addresses to the LIDs identified at step 904 (e.g., binding the LIDs to the media addresses in one or more entries 1205 A-N of an index).
- the media addresses may be determined concurrently with (or after) the data is stored at step 906 .
- step 910 further comprises providing an indication of the LIDs used to satisfy the write request (the LIDs identified at step 904 ) to the storage client 116 .
- the LIDs may be communicated in an acknowledgement message, a return value, a callback, or other suitable mechanism.
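- The flow of method 900 can be illustrated with a minimal sketch. The `LogStore` class, its field names, and the use of a list position as the media address are assumptions made for illustration only; they are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class LogStore:
    """Hypothetical append-only log standing in for a storage device."""
    log: list = field(default_factory=list)    # media address == position in the log
    index: dict = field(default_factory=dict)  # forward index: LID -> media address
    next_lid: int = 0                          # naive allocator for new LIDs

    def write(self, blocks, lids=None):
        """Service a write request and return the LIDs bound to the data."""
        # Step 904 (sketch): allocate LIDs when the request names none.
        if lids is None:
            lids = list(range(self.next_lid, self.next_lid + len(blocks)))
        if len(lids) != len(blocks):
            raise ValueError("one LID is required per block")
        if lids:
            self.next_lid = max(self.next_lid, max(lids) + 1)
        for lid, block in zip(lids, blocks):
            # Step 906 (sketch): append the data together with its LID.
            self.log.append((lid, block))
            # Step 908 (sketch): bind the LID to the new media address.
            self.index[lid] = len(self.log) - 1
        # Step 910 (sketch): report the LIDs used to the client.
        return lids

    def read(self, lid):
        """Return the most recently written data for a bound LID."""
        return self.log[self.index[lid]][1]
```

- In this sketch, rewriting a LID appends a new log record and rebinds the LID; the older record stays in the log until it is reclaimed by some separate process.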
- FIG. 10 is a schematic flow chart diagram illustrating another embodiment of a method 1000 for binding allocated LIDs in data storage device 120 .
- the method 1000 begins and the LID binding module 528 receives 1002 a request to bind LIDs to data where the LIDs are allocated to the storage client 116 making the request.
- the LID binding module 528 binds 1004 LIDs to media storage locations comprising the data.
- the LID binding module 528 communicates 1006 the bound LIDs to the storage client 116 .
- the storage layer 130 receives 1008 a write request to write data to a storage device 120 where the data is already associated with bound LIDs.
- the write request is to store the data on more than one storage device 120 in the storage system 102 , such as would be the case if the storage devices 120 are RAIDed or if the data is written to a primary storage device 120 and to a mirror storage device 120 .
- the storage controller 140 stores 1010 the data on the storage device 120 and the mapping module 518 maps 1012 one or more media storage locations where the data is stored to the bound LIDs (e.g., updates the binding between the LIDs and media storage locations in the index 1204 ).
- Step 1014 may further comprise communicating an indication that the request of step 1002 was successfully completed.
- FIG. 11 is a schematic flow chart diagram illustrating an embodiment of a method 1100 for servicing an allocation query at a storage device.
- the allocation query request module 510 receives 1102 an allocation query at the data storage device.
- the allocation query determination module 512 identifies 1104 one or more LIDs that meet criteria specified in the allocation query.
- the identified LIDs include allocated LIDs that are bound, allocated LIDs that are unbound, and/or unallocated LIDs.
- the allocation query reply module 514 communicates 1106 the results of the allocation query to a requesting device or other designated device and the method 1100 ends.
- the results may include a list of the identified LIDs, an acknowledgement that LIDs meeting the criteria were found, an acknowledgement that LIDs meeting the criteria in the allocation query were not found, etc.
- FIG. 12 depicts another example of an index 1204 for associating LIDs with storage locations on a non-volatile storage device.
- the index 1204 may comprise a tree (or other datastructure) comprising a plurality of entries (e.g., entries 1208 , 1214 , 1218 and so on). Each entry in the index 1204 may associate a LID (or LID range, extent, or set) with one or more media storage locations, as described above.
- the LIDs may be contiguous (e.g. 072-083).
- Other entries, such as 1218 , may comprise a discontiguous set of LIDs (e.g., LID 454-477 and 535-598).
- the index 1204 may be used to represent variable sized storage entries (e.g., storage entries corresponding to one or more storage locations of the non-volatile storage device 120 comprising data of an arbitrary set or range of LIDs).
- the storage entries may further comprise and/or reference metadata 1219 , which may comprise metadata pertaining to the LIDs, such as age, size, LID attributes (e.g., client identifier, data identifier, file name, group identifier), and so on. Since the metadata 1219 is associated with the storage entries, which are indexed by LID (e.g., address 1215), the metadata 1219 may remain associated with the storage entry 1214 regardless of changes to the location of the underlying storage locations on the non-volatile storage device 120 (e.g., changes to the storage locations 1217).
- the index 1204 may be used to efficiently determine whether the non-volatile storage device 120 comprises a storage entry referenced in a client request and/or to identify a storage location of data on the device 120 .
- the non-volatile storage device 120 may receive a request to allocate a particular LID.
- the request may specify a particular LID, a LID and a length or offset (e.g., request 3 units of data starting from LID 074), a set of LIDs or the like.
- the client request may comprise a set of LIDs, LID ranges (continuous or discontinuous), or the like.
- the non-volatile storage device 120 may determine whether a storage entry corresponding to the requested LIDs is in the index 1204 using a search operation. If a storage entry comprising the requested LIDs is found in the index 1204 , the LID(s) associated with the request may be identified as being allocated and bound. Accordingly, data corresponding to the LID(s) may be stored on the non-volatile storage device 120 . If the LID(s) are not found in the index 1204 , the LID(s) may be identified as unbound (but may be allocated). Since the storage entries may represent sets of LIDs and/or LID ranges, a client request may result in partial allocation.
- a request to allocate 068-073 may successfully allocate LIDs 068 to 071, but may fail to allocate 072 and 073 since these are included in the storage entry 1214 .
- the entire allocation request may fail, the available LIDs may be allocated and other LIDs may be substituted for the failed LIDs, or the like.
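- The partial-allocation example above can be sketched as follows. The function name `split_request` and the representation of index entries as inclusive `(start, end)` tuples are assumptions for illustration; a tree-based index would use a range search rather than the linear scan shown here.

```python
def split_request(entries, first, last):
    """Split the inclusive LID request [first, last] into free and taken LIDs.

    `entries` is a list of inclusive (start, end) LID ranges already present
    in the index (a hypothetical stand-in for the storage entries).
    """
    free, taken = [], []
    for lid in range(first, last + 1):
        in_use = any(start <= lid <= end for start, end in entries)
        (taken if in_use else free).append(lid)
    return free, taken


# One existing entry covering LIDs 072-083, as in the example above.
index_entries = [(72, 83)]
free, taken = split_request(index_entries, 68, 73)
# free  -> [68, 69, 70, 71]
# taken -> [72, 73]
```

- Whether the caller then fails the whole request, allocates only the free LIDs, or substitutes other LIDs for the taken ones is a policy decision layered on top of this check.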
- the storage entry corresponding to the storage request is in the index 1204 (storage entry 1214 ), and, as such, the LIDs associated with the request are identified as allocated and bound. Therefore, if the client request is to read data at the specified LIDs, data may be read from the storage locations 1217 identified in the storage entry 1214 and returned to the originator of the request. If the request is to allocate the identified LIDs, the allocation request may fail (and/or substitute LIDs may be allocated as described above).
- a merge operation may occur.
- an existing storage entry may be “merged” with one or more other storage entries.
- a new storage entry for LIDs 084-088 may be merged with entry 1214 .
- the merge may comprise modifying the LID 1215 of the storage entry to include the new addresses (e.g., 072-088) and/or modifying the storage locations 1217 to include the storage location on which the data was stored.
- Although the storage entries in the index 1204 are shown as comprising references to storage locations (e.g., addresses 1217), the disclosure is not limited in this regard.
- the storage entries comprise reference or indirect links to the storage locations.
- the storage entries may include a storage location identifier (or reference to the reverse map 1222 ).
- FIG. 12 depicts another example of an index comprising a reverse map 1222 , which may associate storage locations of the non-volatile storage device 120 with LIDs in the logical address space 136 .
- the reverse map 1222 may also associate a storage location with metadata, such as a validity indicator 1230 , and/or other metadata 1236 .
- the storage location address 1226 and/or length 1228 may be explicitly included in the reverse map 1222 .
- the storage location address 1226 and/or data length 1228 may be inferred from a location and/or arrangement of an entry in the reverse map 1222 and, as such, the address 1226 and/or data length 1228 may be omitted.
- the reverse map 1222 may include references to LIDs 1234.
- the reverse map 1222 may comprise metadata 1236 , which may include metadata pertaining to sequential storage operations performed on the storage locations, such as sequence indicators (e.g., timestamp) to indicate an ordered sequence of storage operations performed on the storage device (e.g., as well as an “age” of the storage locations and so on).
- the metadata 1236 may further include metadata pertaining to the storage media, such as wear level, reliability, error rate, disturb status, and so on.
- the metadata 1236 may be used to identify unreliable and/or unusable storage locations, which may reduce the physical storage capacity of the non-volatile storage device 120 .
- the reverse map 1222 may be organized according to storage divisions (e.g., erase blocks) of the non-volatile storage device 120 .
- the entry 1220 that corresponds to storage entry 1218 is located in erase block n 1238 .
- Erase block n 1238 is preceded by erase block n-1 1240 and followed by erase block n+1 1242 (the contents of erase blocks n-1 and n+1 are not shown).
- An erase block may comprise a predetermined number of storage locations.
- An erase block may refer to an area in the non-volatile storage device 120 that is erased together in a storage recovery operation.
- the validity indicator 1230 may be used to selectively “invalidate” data.
- Data marked as invalid in the reverse index 1222 may correspond to obsolete versions of data (e.g., data that has been overwritten and/or modified in a subsequent storage operation).
- data that does not have a corresponding entry in the index 1204 may be marked as invalid (e.g., data that is no longer being referenced by a storage client 116 ). Therefore, as used herein, “invalidating” data may comprise marking the data as invalid in the storage metadata 135 , which may include removing a reference to the media storage location in the index 1204 and/or marking a validity indicator 1230 of the data in the reverse map.
- the groomer module 370 uses the validity indicators 1230 to identify storage divisions (e.g., erase blocks) for recovery.
- the erase block may be erased and valid data thereon (if any) may be relocated to new storage locations on the non-volatile storage media.
- the groomer module 370 may identify the data to relocate using the validity indicator(s) 1230 . Data that is invalid may not be relocated (may be deleted), whereas data that is still valid (e.g., still being referenced within the index 1204 ) may be relocated. After the relocation, the groomer module 370 (or other process) may update the index 1204 to reference the new media storage location(s) of the valid data.
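- A minimal sketch of this recovery step follows. The dictionary layout of the reverse-map entries, the `append` callback, and the function name `groom` are hypothetical and chosen only to illustrate how validity indicators can drive relocation.

```python
def groom(erase_block, forward_index, append):
    """Recover one erase block and return the number of relocated entries.

    erase_block:   list of reverse-map entries, each a dict with the keys
                   'lid', 'data', and 'valid' (hypothetical layout).
    forward_index: dict mapping LID -> media address.
    append:        callable(lid, data) that stores the data at a new
                   location and returns the new media address.
    """
    relocated = 0
    for entry in erase_block:
        if entry["valid"]:
            # Valid data is rewritten elsewhere and the index is updated.
            forward_index[entry["lid"]] = append(entry["lid"], entry["data"])
            relocated += 1
        # Invalid data is simply not carried forward.
    erase_block.clear()  # stands in for erasing the storage division
    return relocated
```

- Because only entries marked valid are rewritten, the cost of recovering a block in this sketch grows with the amount of valid data it still holds.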
- marking data as “invalid” in the storage metadata 135 may cause data to be removed from the non-volatile storage media 122 .
- the removal of the data may not occur immediately (when the data is marked “invalid”), but may occur in response to a grooming operation or other process that is outside of the path for servicing storage operations and/or requests.
- the groomer module 370 may be configured to determine whether the contextual format of the data should be updated by referencing the storage metadata 135 (e.g., the reverse map 1222 and/or index 1204 ).
- the validity metadata 1230 may be used to determine an available physical storage capacity of the non-volatile storage device 120 (e.g., a difference between physical capacity (or budgeted capacity) and the storage locations comprising valid data).
- the reverse map 1222 may be arranged by storage division (e.g., erase blocks) or erase region to enable efficient traversal of the physical storage space (e.g., to perform grooming operations, determine physical storage capacity, and so on). Accordingly, in some embodiments, the available physical capacity may be determined by traversing the storage locations and/or erase blocks in the reverse map 1222 to identify the physical storage capacity that is available (and/or that is being used to store valid data).
- the reverse map 1222 may comprise an indicator 1239 to track the available physical capacity of the non-volatile storage device 120 .
- the available physical capacity indicator 1239 may be initialized to the physical storage capacity (or budgeted capacity) of the non-volatile storage device 120 , and may be updated as storage operations are performed.
- the storage operations resulting in an update to the available physical storage capacity indicator 1239 may include, but are not limited to: storing data on the storage device 120 , reserving physical capacity on the storage device 120 , canceling a physical capacity reservation, storing data associated with a reservation where the size of the stored data differs from the reservation, detecting unreliable and/or unusable storage locations and/or storage division (e.g., taking storage locations out of service), and so on.
- the metadata 1204 and/or 1222 may be configured to reflect reservations of physical storage capacity.
- a storage client may reserve physical storage capacity for an operation that is to take place over time. Without a reservation, the storage client may begin the operation, but other clients may exhaust the physical capacity before the operation is complete.
- the storage client 116 issues a request to reserve physical capacity before beginning the storage operation.
- the storage layer 130 may update storage metadata 135 (e.g., the indexes 1204 and/or 1222 , disclosed herein), to indicate that the requested portion has been reserved.
- the reserved portion may not be associated with any particular media storage locations; rather, the reservation may indicate that the storage layer 130 is to maintain at least enough physical storage capacity to satisfy the reservation.
- the indicator 1239 of remaining physical storage capacity may be reduced by the amount of reserved physical storage capacity. Requests subsequent to the reservation may be denied if satisfying the requests would exhaust the remaining physical storage capacity in the updated indicator 1239 .
- a reservation of physical storage capacity may be valid for a pre-determined time, until released by the storage client, until another, higher-priority request is received, or the like. The reservation may expire once the storage client that reserved the physical capacity consumes the reserved physical storage capacity in one or more subsequent storage operations. If the storage operations occur over a series of storage operations (as opposed to a single operation), the reservation may be incrementally reduced accordingly.
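- The reservation accounting described above can be sketched as follows. The `CapacityTracker` class, its method names, and the per-client bookkeeping are assumptions for illustration; the `available` attribute plays the role of the available physical capacity indicator.

```python
class CapacityTracker:
    """Hypothetical tracker of available and reserved physical capacity."""

    def __init__(self, capacity):
        self.available = capacity  # unreserved, unused capacity
        self.reservations = {}     # client -> remaining reserved units

    def reserve(self, client, amount):
        """Reserve capacity; deny the request if it would exhaust capacity."""
        if amount > self.available:
            return False
        self.available -= amount
        self.reservations[client] = self.reservations.get(client, 0) + amount
        return True

    def cancel(self, client):
        """Release whatever remains of a client's reservation."""
        self.available += self.reservations.pop(client, 0)

    def store(self, client, amount):
        """Consume capacity, drawing on the client's reservation first."""
        reserved = self.reservations.get(client, 0)
        from_reservation = min(reserved, amount)
        extra = amount - from_reservation
        if extra > self.available:
            return False
        self.available -= extra
        if from_reservation:
            remaining = reserved - from_reservation
            if remaining:
                self.reservations[client] = remaining
            else:
                del self.reservations[client]
        return True
```

- In this sketch a reservation is reduced incrementally as the reserving client stores data and disappears once fully consumed; expiry by time or by priority is not modeled.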
- FIG. 13 depicts another example of an index 1304 for managing storage allocation of a non-volatile storage device.
- the index 1304 may be modified to include one or more allocation entries (e.g., allocated entry 1314 ).
- An allocation entry may be used to track LIDs that are allocated to a client, but are not yet bound (e.g., are not associated with data stored on the non-volatile storage device 120 ). Therefore, unlike the storage entries (e.g., entries 1308 , 1316 , and 1318 ), an allocation entry 1314 may not include references to storage locations 1317; these references may be set to “unbound,” NULL, or may be omitted.
- metadata 1319 associated with the allocation entry 1314 may indicate that the entry is not bound and/or associated with data.
- the index 1304 may be used to determine an available logical capacity of the logical address space 136 (e.g., by traversing the index 1304 ).
- the available logical capacity may consider LIDs that are bound (using the storage entries), as well as LIDs that are allocated, but not yet bound (using the allocation entries, such as 1314 ).
- the allocation entries 1314 may be maintained in the index 1304 with the storage entries. Alternatively, allocation entries may be maintained in a separate index (or other datastructure). When an allocation entry becomes associated with data on the non-volatile storage device 120 (e.g., as associated with storage locations), the allocation entry may be modified and/or replaced by a storage entry.
- the index 1304 may comprise an indicator 1330 to track the available logical capacity of the logical address space 136 .
- the available logical capacity may be initialized according to the logical address space 136 presented by the storage device 120 .
- Changes to the index 1304 may cause the available logical capacity indicator 1330 to be updated. The changes may include, but are not limited to: addition of new allocation entries, removal of allocation entries, addition of storage entries, removal of storage entries, or the like.
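- The relationship between allocation entries, storage entries, and the available logical capacity indicator can be sketched as follows. The `LogicalIndex` class tracks single LIDs rather than ranges, and all of its names are hypothetical simplifications.

```python
class LogicalIndex:
    """Hypothetical index of allocation and storage entries, keyed by LID."""

    UNBOUND = None  # allocation entries carry no media address

    def __init__(self, address_space_size):
        self.entries = {}                    # LID -> media address or UNBOUND
        self.available = address_space_size  # available logical capacity

    def allocate(self, lid):
        """Add an allocation entry; fail if the LID is already in the index."""
        if lid in self.entries:
            return False
        self.entries[lid] = self.UNBOUND
        self.available -= 1
        return True

    def bind(self, lid, media_address):
        """Turn an allocation entry into a storage entry (allocating if needed)."""
        if lid not in self.entries:
            self.available -= 1
        self.entries[lid] = media_address

    def release(self, lid):
        """Remove an entry of either kind and return its LID to the free pool."""
        if lid in self.entries:
            del self.entries[lid]
            self.available += 1

    def is_bound(self, lid):
        """True when the LID has an entry that references a media address."""
        return self.entries.get(lid, self.UNBOUND) is not self.UNBOUND
```

- Binding an already allocated LID leaves the indicator unchanged in this sketch, since the LID was counted when it was allocated.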
- FIG. 14 depicts an example of an unallocated index 1444 , which may be used to allocate storage in a non-volatile storage device.
- the index 1444 may comprise entries 1450 , which may correspond to “holes” in the LID indexes 1204 and/or 1304 described above. Accordingly, an entry 1450 in the available index 1444 may correspond to a LID (and/or LID range, set, or the like) that is available (e.g., is neither allocated nor bound).
- the index 1444 may be used to quickly determine the logical storage capacity of a logical storage space and/or to identify LIDs to allocate in response to client requests.
- the entries in the index 1444 are shown as being indexed by LID.
- the index 1444 may be indexed in other (or additional) ways.
- the unallocated index 1444 may be indexed by LID range (e.g., by the size of the LID range) as well as LID. This indexing may be used to identify unallocated LIDs sized according to client requests (e.g., to efficiently fill “holes” in the logical address space 136 ).
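- Size-based lookup of unallocated ranges can be sketched as follows. The `UnallocatedIndex` class, the best-fit policy, and the hole values in the usage lines are assumptions for illustration; the sketch keeps only a size-ordered view and omits the LID-ordered view.

```python
import bisect


class UnallocatedIndex:
    """Hypothetical index of free LID ranges ("holes"), ordered by size."""

    def __init__(self, holes):
        # holes: iterable of (start_lid, length); stored as (length, start_lid)
        # so that sorting orders the holes by size first.
        self.by_size = sorted((length, start) for start, length in holes)

    def allocate(self, count):
        """Best fit: carve `count` LIDs out of the smallest hole that fits.

        Returns the first allocated LID, or None if no hole is large enough.
        """
        i = bisect.bisect_left(self.by_size, (count, -1))
        if i == len(self.by_size):
            return None
        length, start = self.by_size.pop(i)
        if length > count:
            # Return the unused tail of the hole to the index.
            bisect.insort(self.by_size, (length - count, start + count))
        return start


# Illustrative holes only: LIDs 0-71, 84-453, and 478-534 are free.
unallocated = UnallocatedIndex([(0, 72), (84, 370), (478, 57)])
first_lid = unallocated.allocate(60)  # fits in the 72-LID hole at LID 0
```

- Best fit is only one possible policy; ordering by size simply makes it cheap to find a hole that matches the requested length.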
- FIG. 15 is a flow diagram of one embodiment of a method 1500 for allocating storage. As described above, steps of the method 1500 may be tied to particular machine components and/or may be implemented using machine-readable instructions stored on a non-transitory machine-readable storage medium.
- a non-volatile storage device may be initialized for use.
- the initialization may comprise allocating resources for the non-volatile storage device (e.g., solid-state storage device 120 ), such as communications interfaces (e.g., bus, network, and so on), allocating volatile memory, accessing solid-state storage media, and so on.
- the initialization may further comprise presenting a logical address space 136 to storage clients 116 , initializing one or more indexes (e.g., the indexes described above in conjunction with FIGS. 12-14 ), and so on.
- the non-volatile storage device may present a logical space to one or more clients.
- Step 1520 may comprise implementing and/or providing an interface (e.g., API) accessible to one or more clients, or the like.
- the non-volatile storage device may maintain metadata pertaining to logical allocation operations performed by the method 1500 .
- the logical allocation operations may pertain to operations in the logical address space 136 presented at step 1520 , and may include, but are not limited to: allocating logical capacity, binding logical capacity to media storage locations, and so on.
- the metadata may include, but is not limited to: indexes associating LIDs in the logical address space 136 with media storage locations on the non-volatile storage device; indexes associating storage locations with LIDs (e.g., index 1204 of FIG. 12 ), allocation entries indicating allocated LIDs having no associated storage location (e.g., index 1304 of FIG. 13 ), an unallocated index (e.g. index 1444 of FIG. 14 ), maintaining an indicator of unallocated logical capacity (e.g., indicator 1330 of FIG. 13 ), and so on.
- a client request pertaining to a LID in the logical address space 136 may be received.
- the client request may comprise a query to determine if a particular LID and/or logical capacity can be allocated, a request to allocate a LID and/or logical capacity, a request to store data on the non-volatile storage device, or the like.
- the metadata maintained at step 1530 may be referenced to determine whether the client request can be satisfied.
- Step 1550 may comprise referencing the metadata (e.g., indexes and/or indicators) maintained at step 1530 to determine an available logical capacity of the logical address space 136 and/or to identify available LIDs (or LID range) as described above.
- the method 1500 may provide a response to the client request, which, if the request cannot be satisfied, may comprise an indication to that effect.
- Providing the response may comprise one or more of: an indicator that the allocation can be satisfied, allocating LIDs satisfying the request, providing allocated LIDs satisfying the request, providing one or more requested LIDs and/or one or more additional LIDs, (e.g., if a portion of a requested set of LIDs can be allocated), or the like.
- After step 1560 , the flow may return to step 1530 , where the method 1500 may update the metadata (e.g., indexes, indicators, and so on) according to the allocation operation (if any) performed at step 1560 .
- FIG. 16 is a flow diagram depicting an embodiment of a method 1600 for allocating storage. As described above, steps of the method 1600 may be tied to particular machine components and/or may be implemented using machine-readable instructions stored on a non-transitory machine-readable storage medium.
- the method 1600 may be initialized, present a logical storage space to one or more clients, and/or maintain metadata pertaining to logical operations performed by the method 1600 .
- the method 1600 may maintain metadata pertaining to physical storage operations performed by the method 1600 .
- the storage operations may include, but are not limited to: reserving physical storage capacity, canceling physical storage capacity reservations, storing data on the non-volatile storage device, deallocating physical storage capacity, grooming operations (e.g., garbage collection, error handling, and so on), physical storage space budgeting, and so on.
- metadata maintained at step 1632 may include, but is not limited to: indexes associating LIDs in the logical address space 136 with storage locations on the non-volatile storage device; indexes associating storage locations with LIDs (e.g., index 1204 of FIG. 12); allocation entries indicating allocated LIDs having no associated storage location (e.g., index 1304 of FIG. 13); an unallocated index (e.g., index 1444 of FIG. 14); an indicator of unallocated logical address space 136 (e.g., indicator 1330 of FIG. 13); and so on.
- a client request pertaining to physical storage capacity of the non-volatile storage device may be received.
- the client request may comprise a query to determine if physical storage capacity is available, a request to reserve physical storage capacity, a request to store data, a request to deallocate data (e.g., TRIM), or the like.
- the metadata maintained at steps 1630 and/or 1632 may be referenced to determine whether the client request can be satisfied.
- Step 1650 may comprise referencing the metadata at steps 1630 and/or 1632 to determine an available physical storage capacity of the non-volatile storage device and/or to identify storage locations associated with particular LIDs (e.g., in a deallocation request or TRIM) as described above.
- the method 1600 may provide a response to the client request, which, if the request cannot be satisfied, may comprise an indication to that effect.
- Providing the response may comprise one or more of: indicating that the client request can and/or was satisfied, reserving physical storage capacity for the client; cancelling a physical storage capacity reservation, storing data on the non-volatile storage device, deallocating physical storage capacity, or the like.
- the storage layer 130 may be configured to maintain allocations of the logical address space 136 and/or bindings between LIDs and media storage locations using, inter alia, the storage metadata 135 .
- the storage layer 130 may be further configured to store data in contextual format; as disclosed above, the contextual format may comprise associating persistent, contextual metadata (e.g., logical interface) with the data. Accordingly, contextual metadata pertaining to the data may be determined independent of the storage metadata 135 .
- the storage layer 130 may be configured to store data in a sequential log, such that a sequence of storage operations performed through the storage layer 130 can be replayed and/or the storage metadata 135 may be reconstructed, based upon the contents of the storage device 120 .
- the storage layer 130 may maintain a large, thinly provisioned logical address space 136 , which may simplify logical allocation operations for the storage clients (e.g., allow the storage clients 116 to operate within large, contiguous LID ranges, with low probability of LID collisions).
- the storage layer 130 may be further configured to defer the reservation of media storage locations until needed, to prevent premature exhaustion or over-reservation of physical storage resources.
- the storage layer 130 may expose access to the logical address space 136 and/or storage metadata 135 to the storage clients 116 through one or more interfaces 140 .
- storage clients 116 may delegate certain functions to the storage layer 130 .
- Storage clients 116 may leverage the virtual storage interface 132 to perform various operations, including, but not limited to: logical address space 136 management, media storage location management (e.g., mappings between LIDs and media storage locations, such as thin provisioning), deferred physical resource reservation, crash recovery, logging, backup (e.g., snapshots), data integrity, transactions, data move operations, cloning, deduplication, and so on.
- storage clients 116 may leverage the contextual, log format to delegate crash recovery and/or data integrity functionality to the storage layer 130 . For instance, after an invalid shutdown and reconstruction operation, the storage layer 130 may provide access to the reconstructed storage metadata 135 to storage clients 116 through the interface 138 . The storage clients 116 may, therefore, delegate crash-recovery and/or data integrity to the storage layer 130 . File system storage clients 116 may require crash-recovery and/or data integrity services for certain data, such as I-node tables, file allocation tables, and so on. The storage client 116 may have to implement these services itself, which may impose significant overhead and/or complexity. The storage client 116 may be relieved from this overhead by delegating crash recovery and/or data integrity to the storage layer 130 , as disclosed herein.
- storage clients 116 may also delegate logical allocation operations and/or physical storage reservations to the storage layer 130 .
- a storage client 116 such as a file system, may maintain its own metadata to track logical and physical allocations for files; the storage client 116 may maintain a set of logical addresses that “mirrors” the media storage locations of the non-volatile storage device 120 . If the underlying storage device 120 provides a one-to-one mapping between logical block address and media storage locations, as with conventional storage devices, the block storage layer performs appropriate LBA-to-media address translations and implements the requested storage operations.
- If the underlying non-volatile storage device does not support one-to-one mappings (e.g., the underlying storage device is a sequential, or write-out-of-place device, such as a solid-state storage device), another redundant set of translations is needed (e.g., a Flash Translation Layer, or other mapping).
- the redundant set of translations and the requirement that the storage client 116 maintain logical address allocations may represent a significant overhead, and may make allocating contiguous LBA ranges difficult or impossible without time-consuming “defragmentation” operations.
- the storage client 116 may delegate such allocation functionality to the storage layer 130 .
- the storage layer 130 may leverage a thinly provisioned logical address space 136 to manage large, contiguous LID ranges for the storage client 116 , without the need for redundant address translation layers.
- FIG. 17 depicts one exemplary embodiment of an index 1804 for maintaining allocations within a logical address space, such as the logical address space 136 , described above.
- the index 1804 may be embodied as a datastructure on a volatile memory 112 and/or non-transitory, machine-readable storage media 114 (e.g., part of the storage metadata 135 ).
- the index 1804 may comprise an entry for each allocated range of LIDs.
- the allocated LIDs may or may not be associated with media storage locations on the non-volatile storage device (e.g., non-volatile storage device 120 ).
- the entries may be indexed and/or linked by LID.
- the entries in the index 1804 may include LIDs that are allocated, but that are not associated with media storage locations on a non-volatile storage device. Like the index 1204 described above, inclusion in the index 1804 may indicate that a LID is both allocated and associated with valid data on the non-volatile storage device 120 . Alternatively, the index 1804 may be implemented similarly to the index 1304 of FIG. 13 . In this case, the index 1804 may comprise entries that are associated with valid data on the non-volatile storage device 120 along with entries that are allocated but are not associated with stored data. The entries that are associated with valid data may identify the media storage location of the data, as described above. Entries that are not associated with valid, stored data (e.g., “allocation entries” such as the entry 1314 of FIG. 13 ) may have a “NULL” media storage location indicator or some other suitable indicator.
- the index 1804 may comprise security-related metadata, such as access control metadata, or the like.
- the security related metadata may be associated with each respective entry (e.g., entry 1812 ) in the index 1804 .
- the storage layer 130 may access and/or enforce the security-related metadata (if any) in the corresponding entry.
- the storage layer 130 delegates enforcement of security-related policy to another device or service, such as an operating system, access control system, or the like. Accordingly, when implementing storage operations, the storage layer 130 may access security-related metadata and verify that the requester is authorized to perform the operation using a delegate. If the delegate indicates that the requester is authorized, the storage layer 130 implements the requested storage operations; if not, the storage layer 130 returns a failure condition.
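- Delegated authorization can be sketched as follows. The function name `service_request`, the `"acl"` key, and the `(status, result)` return shape are hypothetical; the delegate is modeled as a plain callable supplied by the caller.

```python
def service_request(entry, requester, operation, delegate):
    """Run `operation` on an index entry only if the delegate authorizes it.

    entry:     dict for one index entry; an optional "acl" key holds the
               security-related metadata (hypothetical layout).
    delegate:  callable(requester, acl) -> bool, standing in for an external
               access control service.
    Returns ("ok", result) on success or ("failure", None) when denied.
    """
    acl = entry.get("acl")
    if acl is not None and not delegate(requester, acl):
        return ("failure", None)
    return ("ok", operation(entry))


def membership_delegate(requester, acl):
    """Toy delegate: authorize requesters listed in the entry's ACL."""
    return requester in acl
```

- Entries without security-related metadata are serviced unconditionally in this sketch; a stricter design could deny such requests by default.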
- the storage layer 130 may access the storage metadata 135 , such as the index 1804 , to allocate LIDs in the logical address space 136 , to determine a remaining logical capacity of the logical address space 136 , to determine the remaining physical storage capacity of the non-volatile storage device(s) 120 , and so on.
- the storage layer 130 may respond to queries for the remaining logical capacity, remaining physical storage capacity, and the like via the virtual storage interface 132 .
- the storage layer 130 may service requests to reserve physical storage capacity on the non-volatile storage device 120 .
- a storage client 116 may wish to perform a sequence of storage operations that occur over time (e.g., receive a data stream, perform a DMA transfer, or the like).
- the storage client 116 may reserve sufficient logical and/or physical storage capacity to perform the sequence of storage operations up-front to ensure that the operations can be completed. Reserving logical capacity may comprise allocating LIDs through the storage layer 130 (using the virtual storage interface 132 ). Physical capacity may be similarly allocated. The storage client 116 may request to reserve physical capacity through the virtual storage interface 132 . If a sufficient amount of physical capacity is available, the storage layer 130 acknowledges the request and updates the storage metadata accordingly (and as described above in conjunction with FIGS. 8 and 12 ).
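The up-front reservation described above can be sketched as simple bookkeeping. This is a minimal illustration only; the class name `CapacityTracker` and its fields are hypothetical and do not come from the disclosure.

```python
class CapacityTracker:
    """Hypothetical bookkeeping for physical capacity reservations."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks  # total media storage locations
        self.bound = 0                          # locations holding valid data
        self.reserved = 0                       # locations promised to clients

    def available(self):
        # Capacity that is neither bound to data nor already reserved.
        return self.physical_blocks - self.bound - self.reserved

    def reserve(self, blocks):
        """Acknowledge (True) only if enough physical capacity remains."""
        if blocks < 0 or blocks > self.available():
            return False
        self.reserved += blocks
        return True
```

A client that must absorb a data stream would call `reserve()` once before starting, and proceed only if the request is acknowledged.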
- the storage layer 130 and/or storage metadata 135 is not limited to the particular, exemplary datastructures described above.
- the storage metadata 135 may comprise any suitable datastructure (or datastructure combination) for efficiently tracking logical address space 136 allocations and/or associations between LIDs and media storage locations.
- the index 1804 may be adapted such that entries in the index 1804 comprise and/or are linked to respective physical binding metadata.
- the physical binding metadata may comprise a “sub-index” of associations between LIDs in a particular allocated range and corresponding media storage locations on the non-volatile storage medium. Each “sub-range” within the allocated LID comprises an entry associating the sub-range with a corresponding media storage location (if any).
- FIG. 18 depicts one embodiment of an index entry comprising physical binding metadata.
- the entry 1818 represents an allocated LID having a range from 31744 through 46080 in the logical address space.
- the entries of the physical binding metadata associate sub-ranges of the LID with corresponding media storage locations (if any).
- the physical binding metadata 1819 may be indexed by LID as described above.
- the LID sub-range comprising 31817 to 46000 of entry 1822 is not associated with valid data on the non-volatile storage device and, as such, is associated with a “NULL” media storage location.
- the entry 1824 for the sub-range 46001 to 46080 is associated with valid data.
- the entry 1824 identifies the media storage location of the data on the non-volatile storage device (locations 12763 through 12842).
- the entry 1826 identifies the media storage location of the valid data associated with the sub-range for 31744-31816.
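The sub-index of entry 1818 can be sketched as a sorted list of sub-ranges searched by LID. This is an illustrative sketch rather than the disclosed datastructure; the media location `500` for the first sub-range is a made-up placeholder (the text does not state it), and the function name is hypothetical.

```python
import bisect

# (first LID, last LID, first media location, or None for a "NULL" binding).
# Sub-ranges of entry 1818 (LIDs 31744 through 46080).
SUB_INDEX = [
    (31744, 31816, 500),    # entry 1826: valid data; location 500 is a placeholder
    (31817, 46000, None),   # entry 1822: allocated but not bound ("NULL")
    (46001, 46080, 12763),  # entry 1824: media locations 12763 through 12842
]
STARTS = [first for first, _, _ in SUB_INDEX]

def media_location(lid):
    """Return the media storage location bound to a LID, or None if unbound."""
    i = bisect.bisect_right(STARTS, lid) - 1
    if i < 0:
        raise KeyError(lid)          # below the allocated range
    first, last, base = SUB_INDEX[i]
    if lid > last:
        raise KeyError(lid)          # above the allocated range
    return None if base is None else base + (lid - first)
```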
- the storage layer 130 is configured to segment the LIDs in the logical address space 136 into two or more portions. As shown in FIG. 19A , a LID 1900 is segmented into a first portion 1952 and a second portion 1954 . In some embodiments, the first portion 1952 comprises “high-order” bits of the LID 1900, and the second portion comprises “low-order” bits. However, the disclosure is not limited in this regard and could segment LIDs using any suitable segmentation scheme.
- the first portion 1952 may serve as a reference or identifier for a storage entity.
- a storage entity refers to any data or data structure that is capable of being persisted to the non-volatile storage device 120 ; accordingly, a storage entity may include, but is not limited to: file system objects (e.g., files, streams, I-nodes, etc.), a database primitive (e.g., database table, extent, or the like), streams, persistent memory space, memory mapped files, virtual storage unit (VSU), logical unit number (LUN), virtual logical unit number (VLUN), logical storage unit (LSU), block storage device, or the like.
- the second portion 1954 may represent an offset into the storage entity.
- the storage layer 130 may reference the logical address space 136 comprising 64-bit LIDs (the logical address space 136 may comprise 2^64 unique LIDs).
- the storage layer 130 may partition the LIDs into a first portion 1952 comprising the high-order 32 bits of the 64-bit LID and a second portion 1954 comprising the low-order 32 bits of the LID.
- the resulting logical address space 136 may be capable of representing 2^32-1 unique storage entities (e.g., using the first portion of the LIDs), each having a maximum size (or offset) of 2^32 virtual storage locations (e.g., 2 TB for a virtual storage location size of 512 bytes).
- the disclosure is not limited in this regard, however, and could be adapted to use any suitable segmentation scheme.
- the first portion 1952 may comprise a larger proportion of the LID.
- the first portion 1952 may comprise 42 bits (providing 2^42-1 unique identifiers), and the second portion may comprise 22 bits (providing a maximum offset of 2^22 virtual storage locations, e.g., 2 GB for a virtual storage location size of 512 bytes).
- the segmentation scheme may be similarly modified.
- the storage layer 130 may present larger logical address spaces (e.g., 128 bits and so on) in accordance with the requirements of the storage clients 116 , configuration of the computing device 110 , and/or configuration of the non-volatile storage device 120 .
- the storage layer 130 segments the logical address space 136 in response to a request from a storage client 116 or other entity.
- the storage layer 130 may allocate LIDs based on the first portion 1952 . For example, in a 64 bit address space, when the storage layer 130 allocates a LID comprising a first portion 1952 [0000 0000 0000 0000 0000 0100] (e.g., first portion 1952 logical address 4 ), the storage layer 130 is effectively allocating a logical address range comprising 2^32 unique LIDs 1956 (4,294,967,296 unique LIDs), ranging from the LID whose second portion 1954 is all zeros to the LID whose second portion 1954 is all ones.
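The segmentation arithmetic can be sketched as bit operations on an integer LID. The sketch assumes the 32/32 split and the 512-byte virtual storage location size used as examples in the text; the function names are hypothetical.

```python
OFFSET_BITS = 32   # Y: bits in the second portion (the 32/32 example)
BLOCK_SIZE = 512   # bytes per virtual storage location (example in the text)

def make_lid(first, offset, offset_bits=OFFSET_BITS):
    """Combine a first portion (entity identifier) and an offset into a LID."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the second portion")
    return (first << offset_bits) | offset

def split_lid(lid, offset_bits=OFFSET_BITS):
    """Return (first portion, second portion) of a LID."""
    return lid >> offset_bits, lid & ((1 << offset_bits) - 1)

def lid_range(first, offset_bits=OFFSET_BITS):
    """Return the first and last LID covered by allocating a first portion."""
    start = first << offset_bits
    return start, start + (1 << offset_bits) - 1

def max_entity_bytes(offset_bits=OFFSET_BITS, block_size=BLOCK_SIZE):
    """Maximum storage entity size implied by the second portion."""
    return (1 << offset_bits) * block_size
```

Under these assumptions, `lid_range(4)` spans `0x0000000400000000` through `0x00000004FFFFFFFF`, which is 2^32 LIDs, and `max_entity_bytes()` is 2^41 bytes (2 TB).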
- the storage layer 130 uses the segmentation of the LIDs to simplify the storage metadata 135 .
- the number of bits in the first portion 1952 is X
- the number of bits in the second portion 1954 is Y.
- the storage layer 130 may determine that the maximum number of unique LIDs that can be allocated is 2^X, and that the allocated LIDs can be referenced using only the first portion of the LID (e.g., the set of X bits). Therefore, the storage layer 130 may simplify the storage metadata index to use entries comprising only the first portion of a LID.
- the storage layer 130 may determine that the LIDs are allocated in fixed-sized ranges of 2^Y. Accordingly, each entry in the storage metadata 135 (e.g., index 1904 ) may be of the same extent. Therefore, the range portion of the metadata entries may be omitted.
- FIG. 19B depicts one example of an allocation index 1904 that has been simplified by segmenting the logical address space 136 .
- the first portion 1952 of the LIDs in the logical address space 136 managed by the index 1904 is depicted using eight (8) bits.
- the remaining portion of the LID e.g., remaining 54 bits
- other portions of the LID may be used for other logical address space 136 segmentation schemes, such as logical volume identifiers, partition identifiers, and so on.
- Each entry 1912 in the index 1904 may be uniquely identified using the first portion (eight bits) of a LID. Accordingly, the entries 1912 may be indexed using only the first portion 1952 (e.g., 8 bits). This simplification may reduce the amount of data required to identify an entry 1912 from 64 bits to 8 bits (assuming a 64-bit LID with an 8-bit first portion). Moreover, the LIDs may be allocated in fixed sized logical ranges (e.g., in accordance with the second portion 1954 ). Therefore, each entry 1912 may represent the same range of allocated LIDs. As such, the entries 1912 may omit explicit range identifiers, which may save an additional 64 bits per entry 1912 .
- the storage layer 130 may use the simplified index 1904 to maintain LID allocations in the logical address space 136 and/or identify LIDs to allocate in response to requests from storage clients 116 .
- the storage layer 130 maintains a listing of “first portions” that are unallocated. Since, in some embodiments, allocations occur in a pre-determined way (e.g., using only the first portion 1952 , and within a fixed range 1956), the unallocated LIDs may be expressed in a simple list or map as opposed to an index or other datastructure. As LIDs are allocated, they are removed from the datastructure; when they are deallocated, they are returned to it.
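A listing of unallocated first portions can be sketched with a plain set. This is a minimal sketch assuming the 8-bit first portion of the FIG. 19B example; the class and method names are hypothetical.

```python
class FirstPortionAllocator:
    """Tracks unallocated first portions; each one implies a fixed LID range."""

    def __init__(self, first_bits=8):
        self.limit = 1 << first_bits
        self.free = set(range(self.limit))   # every first portion starts free

    def allocate(self, first=None):
        """Allocate a requested first portion, or any free one if None."""
        if first is None:
            if not self.free:
                return None                  # logical address space exhausted
            first = min(self.free)
        elif first not in self.free:
            return None                      # already allocated or out of range
        self.free.remove(first)
        return first

    def deallocate(self, first):
        if not 0 <= first < self.limit:
            raise ValueError("first portion out of range")
        self.free.add(first)                 # returned to the free listing
```

No range field is stored per allocation, since every first portion covers the same fixed extent.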
- FIG. 19C depicts an example of physical binding metadata for use in a segmented logical addressing scheme.
- LIDs are segmented such that the first portion 1952 comprises 56 bits, and the second portion 1954 comprises 8 bits (the reverse of FIG. 19B ).
- the entry 1914 is identified using the first portion 0000 0000 0000 0000 0000 0000 0000 0111 1010.
- the entries 1922 of the index 1919 may be simplified to reference only offsets within the entry 1914 (e.g., within the second portion, which comprises 8 bits in the FIG. 19C example).
- the head entry 1926 may omit the top-end of the second portion (e.g., may omit 1111 1111 since it can be determined that the top-most entry will necessarily include the maximal extent of the range defined by the second portion).
- the tail entry 1924 may omit the bottom-end of the second portion 1954 (e.g., may omit 0000 0000 since it can be determined that the bottom-most entry will necessarily include the beginning of the range defined by the second portion 1954 ).
- Each entry 1914 associates a range within the second portion with valid data on the non-volatile storage device (if any), as described above.
- storage clients 116 may delegate LID allocation to the storage layer 130 using the virtual storage interface 132 .
- the delegation may occur in a number of different ways. For example, a storage client 116 may query the storage layer 130 (via the interface 138 of the storage layer 130 ) for any available LID. If a LID is available, the storage layer 130 returns an allocated LID to the storage client 116 . Alternatively, the storage client 116 may request a particular LID for allocation. The request may comprise the first portion of the LID or an entire LID (with an offset). The storage layer 130 may determine if the LID is unallocated and, if so, may allocate the LID for the client and return an acknowledgement.
- the storage layer 130 may allocate an alternative LID and/or may return an error condition.
- the storage layer 130 may indicate whether particular LIDs are allocated and/or whether particular LIDs are bound to media storage locations on the non-volatile storage device 120 .
- the queries may be serviced via the virtual storage interface 132 .
- the storage layer 130 may expose the segmentation scheme to the storage clients 116 .
- storage clients 116 may query the storage layer 130 to determine the segmentation scheme currently in use.
- the storage clients 116 may also configure the storage layer 130 to use a particular LID segmentation scheme adapted to the needs of the storage client 116 .
- the storage layer 130 may allocate LIDs using only the first portion 1952 of a LID. If the LID is unallocated, the storage layer 130 acknowledges the request, and the storage client 116 is allocated a range of LIDs in the logical address space 136 corresponding to the first portion 1952 and comprising the range defined by the second portion 1954 . Similarly, when allocating a “nameless LID” (e.g., any available LID selected by the storage layer 130 ), the storage layer 130 may return only the first portion of the allocated LID. In some embodiments, when a client requests a LID using the first portion and the second portion, the storage layer 130 extracts the first portion from the requested LID, and allocates a LID corresponding to the first portion to the client (if possible).
- the disclosed embodiments support such a large number of addresses for the second portion over such a high number of contiguous addresses that storage requests that cross a LID boundary are anticipated to be very rare.
- the storage layer 130 may even prevent allocations that cross LID boundaries (as used herein, a LID boundary is between two contiguous LIDs, the first being the last addressable LID in a second portion of a LID and the second being the first addressable LID in a next successive first portion of a LID). If the request crosses a boundary between pre-determined LID ranges, the storage layer 130 may return an alternative LID range that is properly aligned to the LID segmentation scheme, return an error, or the like. In other embodiments, if the request crosses a boundary between pre-determined LID ranges, the storage layer 130 may allocate both LIDs (if available).
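Detecting a request that crosses a LID boundary reduces to comparing the first portions of the first and last addressed LIDs. The sketch below assumes a 32-bit second portion; the function name is hypothetical.

```python
def crosses_boundary(lid, length, offset_bits=32):
    """True if the range [lid, lid + length) spans more than one first portion."""
    if length < 1:
        raise ValueError("length must be at least 1")
    last = lid + length - 1
    return (lid >> offset_bits) != (last >> offset_bits)
```

A storage layer following the embodiments above could use such a check to decide whether to return an aligned alternative range, return an error, or allocate both LIDs.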
- FIG. 20A is a block diagram depicting a file system storage client 2016 leveraging the storage layer 130 to perform file system operations.
- the file system storage client 2016 accesses the storage layer 130 via the virtual storage interface 132 to allocate LIDs for storage entities, such as file system objects (e.g., files).
- the allocation request may be implemented as described above. If the requested LIDs can be allocated, the storage layer 130 returns an allocated LID to the file system storage client 2016 .
- the LID may be returned as a LID and an offset (indicating an initial size for the file), a LID range, a first portion of a LID, or the like.
- the FIG. 20A example shows the storage layer 130 implementing a segmented LID range and, as such, the storage layer 130 may return the first portion of a LID 2062 in response to an allocation request.
- the file system storage client 2016 may implement a fast and efficient mapping between LIDs and storage entities. For example, when the first portion of the LID is sufficiently large, the file system storage client 2016 may hash file names into LID identifiers (into hash codes of the same length as the first portion of the LID 2062). When a new file is created, the file system storage client 2016 hashes the file name to generate the first portion of the LID 2062 and issues a request to the storage layer 130 to allocate the LID. If the LID is unallocated (e.g., no hash collisions have occurred), the storage layer 130 may grant the request.
- the file system storage client 2016 may not need to maintain an entry in the file system table 2060 for the new file (or may only be required to maintain an abbreviated version of a table entry 2061 ), since the LID 2062 can be derived from the file name. If a name collision occurs, the storage layer 130 may return an alternative LID, which may be derived from the hash code (or file name), which may obviate the need for the file system table 2060 to maintain the entire identifier.
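Deriving the first portion of a LID from a file name can be sketched as follows. The disclosure does not name a hash function or a collision policy; SHA-256 and linear probing are assumptions made only for this illustration, as are the function names.

```python
import hashlib

def first_portion_for(name, first_bits=32):
    """Hash a file name into a code as wide as the first portion (<= 64 bits)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - first_bits)

def allocate_for_file(name, allocated, first_bits=32):
    """Allocate the hashed first portion, or a nearby alternative on collision."""
    mask = (1 << first_bits) - 1
    candidate = first_portion_for(name, first_bits)
    for _ in range(1 << first_bits):
        if candidate not in allocated:
            allocated.add(candidate)
            return candidate
        candidate = (candidate + 1) & mask   # assumed policy: linear probing
    raise RuntimeError("no unallocated first portion remains")
```

Because the alternative is derived from the hash code, a client would only need to record how far the allocated identifier is from the hash, not the entire identifier.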
- the file system storage client 2016 may maintain a file system table 2060 to associate file system objects (e.g., files) with corresponding LIDs in the logical address space 136 of the storage layer 130 .
- file system table 2060 is persisted on the non-volatile storage device 120 at a pre-determined LID. Accordingly, the file system storage client 2016 may delegate crash recovery and/or data integrity for the file system table 2060 (as well as the file system objects themselves) to the storage layer 130 .
- the file system storage client 2016 may reference files using the file system table 2060 .
- the file system storage client 2016 may access a file system entry 2061 corresponding to the file (e.g., using a file name lookup or another identifier, such as an I-node, or the like).
- the entry 2061 comprises a LID of the file, which, in the FIG. 20C example, is a first portion of a LID 2062.
- the file system storage client 2016 performs storage operations using the first portion 2062 of the LID along with an offset (the second portion 2064 ).
- the file system storage client 2016 may combine the file identifier (first portion 2062 ) with an offset 2064 to generate a full LID 2070.
- the LID 2070 may be sent to the storage layer 130 in connection with requests to perform storage operations within the logical address space 136 .
- the storage layer 130 performs storage operations using the storage metadata 135 .
- Storage requests to persist data in the logical address space 136 comprise the storage layer 130 causing the data to be stored on the non-volatile storage device 120 in a contextual, log-based format, as disclosed above.
- the storage layer 130 updates the storage metadata 135 to associate LIDs in the logical address space 136 with media storage locations on the non-volatile storage comprising data stored in the storage operation.
- Storage operations to access persisted data on the non-volatile storage device may comprise a storage client, such as the file system storage client 2016 , requesting the data associated with one or more LIDs 2070 in the logical address space.
- the file system storage client 2016 may identify the LIDs using the file system table 2060 or another datastructure.
- the storage layer 130 determines the media storage location of the LIDs 2070 on the non-volatile storage device 120 using the storage metadata 135 , which is used to access the data.
- storage clients may deallocate a storage entity.
- Deallocating a storage entity may comprise issuing a deallocation request to the storage layer 130 via the virtual storage interface 132 .
- the storage layer 130 removes the deallocated LIDs from the storage metadata 135 and/or may mark the deallocated LIDs as unallocated.
- the storage layer 130 may also invalidate the media storage locations corresponding to the deallocated LIDs in the storage metadata 135 and/or the non-volatile storage device 120 (e.g., using a reverse map, as disclosed above).
- a deallocation may be a “hint” to a groomer 370 of the non-volatile storage device 120 that the media storage locations associated with the deallocated LIDs are available for recovery.
- the virtual storage interface 132 may provide an interface through which storage clients may issue a deallocation “directive” (as opposed to a hint).
- the deallocation directive may configure the storage layer 130 to return a pre-determined value (e.g., “0” or “NULL”) for subsequent accesses to the deallocated LIDs (or the media storage locations associated therewith), even if the data is still available on the non-volatile storage device 120 .
- the pre-determined value may continue to be returned until the LIDs are reallocated for another purpose.
- the storage layer 130 implements a deallocation directive by removing the deallocated LIDs from the storage metadata and returning a pre-determined value in response to requests for LIDs that are not allocated in the storage metadata 135 and/or are not bound (e.g., are not associated with valid data on the non-volatile storage device).
- the storage layer 130 may cause the corresponding media storage locations on the non-volatile storage device 120 to be erased.
- the storage layer 130 may provide the file system storage client 2016 with an acknowledgement when the erasure is complete. Since erasures may take a significant amount of time to complete relative to other storage operations, the acknowledgement may be issued asynchronously.
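The effect of a deallocation directive can be sketched with two dictionaries: one for LID bindings and one standing in for the media. This is a toy model under stated assumptions; the class `TinyStore` and the choice of a single zero byte as the pre-determined value are hypothetical.

```python
class TinyStore:
    """Toy model: a deallocated LID reads back a pre-determined value."""

    PREDETERMINED = b"\x00"

    def __init__(self):
        self.bindings = {}   # LID -> media storage location
        self.media = {}      # media storage location -> data
        self.next_loc = 0

    def write(self, lid, data):
        # Data is written out-of-place to the next media location.
        self.media[self.next_loc] = data
        self.bindings[lid] = self.next_loc
        self.next_loc += 1

    def read(self, lid):
        loc = self.bindings.get(lid)
        return self.PREDETERMINED if loc is None else self.media[loc]

    def deallocate(self, lid):
        # Only the binding is removed; the data may remain on the media
        # until the location is recovered or erased.
        self.bindings.pop(lid, None)
```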
- FIG. 20B is a block diagram depicting another embodiment 2001 of a storage client leveraging the storage layer 130 .
- the storage layer 130 presents a logical address space 136 to the file system storage client 2016 and maintains storage metadata 135 as described above.
- the storage layer 130 maintains name-to-LID association metadata 2036 .
- This metadata 2036 may comprise associations between LIDs in the logical address space 136 and storage entity identifiers of storage clients 116 .
- a file system storage client 2016 may request LID allocations using a storage entity identifier or name 2071 (e.g., file name) as opposed to a LID.
- an allocation in which the file system storage client 2016 relies on the storage layer 130 to select an available LID (as opposed to specifying a particular LID) is referred to as a “nameless write” or “nameless allocation.”
- the storage layer 130 allocates a LID for the file system storage client 2016 within the logical address space 136 .
- the storage layer 130 may maintain an association between the allocated LID and the name 2071 in name-to-LID metadata 2036 .
- File system storage clients 2016 may request subsequent storage operations on the storage entity using the name 2071 (along with an offset, if needed).
- the file system table 2060 of the file system storage client 2016 may be simplified since entries 2063 need only maintain the name of a file as opposed to the name and LID.
- the storage layer 130 accesses the name-to-LID metadata 2036 to determine the LID associated with the name 2071 and implements the storage request as described above.
- the name-to-LID metadata 2036 may be included with the storage metadata 135 .
- entries in the index 1804 of FIG. 18 may be indexed by name in addition to (or in place of) a LID.
- the storage layer 130 may persist the name-to-LID metadata 2036 on the non-volatile storage device 120 , such that the integrity of the metadata 2036 is maintained despite invalid shutdown conditions.
- the name-to-LID metadata 2036 may be reconstructed using the contextual, log-based data format on the non-volatile storage device 120 .
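Name-to-LID association metadata can be sketched as a map from names to allocated first portions. The sketch assumes a 32-bit second portion and a simple sequential choice of LIDs; the class and method names are hypothetical.

```python
class NameToLidMap:
    """Toy name-to-LID metadata: the layer, not the client, selects the LID."""

    def __init__(self, offset_bits=32):
        self.offset_bits = offset_bits
        self.by_name = {}       # name -> allocated first portion
        self.next_first = 0     # assumed policy: hand out first portions in order

    def allocate(self, name):
        """Allocate a LID for a name, or return the existing association."""
        if name not in self.by_name:
            self.by_name[name] = self.next_first
            self.next_first += 1
        return self.by_name[name]

    def resolve(self, name, offset=0):
        """Translate (name, offset) into a full LID for a storage operation."""
        return (self.by_name[name] << self.offset_bits) | offset
```

With such a map in the storage layer, a client table entry would need to hold only the name of a file.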
- FIG. 21 is a flow diagram of one embodiment of a method 2100 for providing a storage layer.
- the method 2100 presents a logical address space 136 for the non-volatile device to storage clients.
- the logical address space 136 may be defined independently of the non-volatile storage device. Accordingly, the logical capacity of the logical address space 136 (e.g., the size of the logical address space 136 and/or the size of the virtual storage blocks thereof) may exceed the physical storage capacity of the non-volatile storage device.
- the logical address space 136 is presented via an application-programming interface (API) that is accessible to storage clients, such as operating systems, file systems, database applications, and the like.
- storage metadata is maintained.
- the storage metadata may track allocations of LIDs within the logical address space 136 , as well as bindings between LIDs and media storage locations of the non-volatile storage device.
- the metadata may further comprise indications of the remaining logical capacity of the logical address space 136 , the remaining physical storage capacity of the non-volatile storage device, the status of particular LIDs, and so on.
- the metadata is maintained in response to storage operations performed within the logical address space.
- the storage metadata is updated to reflect allocations of LIDs by storage clients. When storage clients persist data to allocated LIDs, bindings between the LIDs and the media storage locations comprising the data are updated.
- storage operations are performed using a log-based sequence.
- the storage layer 130 (and non-volatile storage device) may be configured to store data in a log-based format, such that an ordered sequence of storage operations performed on the storage device can be reconstructed in the event of an invalid shutdown (or other loss of storage metadata 135 ).
- the ordered sequence of storage operations allows storage clients to delegate crash recovery, data integrity, and other functionality to the storage layer 130 .
- FIG. 22 is a flow diagram of one embodiment of a method 2200 for segmenting LIDs of a logical address space.
- the method 2200 segments LIDs of a logical address space 136 into at least a first portion and a second portion.
- the segmentation of step 2220 may be performed as part of a configuration process of the storage layer 130 and/or non-volatile storage device (e.g., when the device is initialized).
- the segmentation of step 2220 may be performed in response to a request from a storage client.
- the storage client may request a particular type of LID segmentation, according to the storage requirements thereof. For example, if the storage client has a need to store a large number of relatively small storage entities, the storage client may configure the LID segmentation to dedicate a larger proportion of the LID to identification bits and a smaller proportion to offset bits. Alternatively, a storage client who requires a relatively small number of very large storage entities may configure the method 2200 to implement a different type of segmentation that uses a larger proportion of the LID for offset bits (allowing for larger storage entities).
- the storage layer 130 uses the first portion of the LID to reference storage client allocations (e.g., as a reference for storage entities).
- Step 2230 may comprise reconfiguring the storage metadata to allocate LIDs using only the first portion of the LID (e.g., the upper X bits of a LID).
- the size of the first portion may determine the number of unique storage entities that can be expressed in the storage metadata (e.g., as 2^X-1, where X is the number of bits in the first portion). Accordingly, a first portion comprising 32 bits may support approximately 2^32 unique storage entities.
- the reconfiguration may simplify the storage metadata, since each entry may be identified using a smaller amount of data (only the first portion of the LID as opposed to the entire LID).
- the storage layer 130 uses the second portion of the LID as an offset into a storage entity.
- the size of the second portion may define the maximum size of a storage entity (under the current segmentation scheme).
- the size of a LID may be defined as the virtual block size times 2^Y, where Y is the number of bits in the second portion.
- a virtual block size of 512 bytes and a second portion comprising 32 bits result in a maximum storage entity size of 2 TB.
- Step 2240 may comprise reconfiguring the storage metadata to reference LID to media storage location bindings using only the second portion of the LID. This may allow the storage metadata entries (e.g., entries in physical binding metadata) to be simplified, since the bindings can be expressed using a smaller number of bits.
- the storage layer 130 uses the LID segmentation of step 2220 to allocate LIDs comprising contiguous logical address ranges in the logical address space.
- Step 2250 may comprise the storage layer 130 allocating LIDs using only the first portion of the LID (e.g., the upper X bits).
- the allocated LID may comprise a contiguous logical address range corresponding to the number of bits in the second portion, as described above.
- allocating a LID at step 2250 does not cause corresponding logical storage locations to be reserved or “bound” thereto.
- the bindings between allocated LIDs and media storage locations may not occur until the storage client actually performs storage operations on the LIDs (e.g., stores data in the LIDs).
- the delayed binding prevents the large, contiguous LID allocations from exhausting the physical storage capacity of the non-volatile storage device.
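Delayed binding can be sketched by keeping allocations and bindings in separate structures, so that allocating a large range consumes no physical capacity. This is a minimal sketch assuming a 32-bit second portion; the class `ThinLayer` is hypothetical.

```python
class ThinLayer:
    """Toy model: allocation is logical only; binding happens on first write."""

    def __init__(self, physical_blocks, offset_bits=32):
        self.physical_blocks = physical_blocks
        self.offset_bits = offset_bits
        self.allocated = set()   # allocated first portions
        self.bindings = {}       # full LID -> media storage location

    def allocate(self, first):
        self.allocated.add(first)            # no media location is consumed

    def write(self, lid):
        if (lid >> self.offset_bits) not in self.allocated:
            raise KeyError("LID is not allocated")
        if lid not in self.bindings:
            if len(self.bindings) >= self.physical_blocks:
                raise RuntimeError("out of physical capacity")
            self.bindings[lid] = len(self.bindings)   # bind on first write

    def physical_used(self):
        return len(self.bindings)
```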
- FIG. 23 is a flow diagram of one embodiment of a method 2300 for providing crash recovery and data integrity in a storage layer 130 .
- the storage layer 130 presents a logical address space 136 to one or more storage clients 116 (e.g., through the interface 138 ).
- Step 2330 may comprise maintaining metadata 135 configured to associate LIDs in the logical address space 136 with media storage locations on the non-volatile storage device 120 .
- the storage layer 130 causes data to be stored on the non-volatile storage device in a contextual, log-based format.
- the contextual, log-based formatting of the data is configured such that, in the event of an invalid shutdown, the data (and metadata pertaining thereto) can be reconstructed.
- the storage layer 130 reconstructs data stored on the non-volatile storage device using the data formatted in the contextual, log-based format.
- the log-based format may comprise storing LID identifiers with data on the non-volatile storage device.
- the LID identifiers may be used to associate the data with LIDs in the logical address space 136 (e.g., reconstruct the storage metadata).
- Sequence indicators stored with the data on the non-volatile storage device are used to determine the most current version of data associated with the same LID; since data is written out-of-place, updated data may be stored on the non-volatile storage device along with previous, obsolete versions.
- the sequence indicators allow the storage layer 130 to distinguish older versions from the current version.
- the reconstruction of step 2350 may comprise reconstructing the storage metadata, determining the most current version of data for a particular LID (e.g., identifying the media storage location that comprises the current version of the data), and so on.
- the storage layer 130 provides access to the reconstructed data to storage clients. Accordingly, the storage clients may delegate crash recovery and/or data integrity functionality to the storage layer 130 , which relieves the storage clients from implementing these features themselves and allows the storage clients to be simpler and more efficient.
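Reconstruction of the LID-to-location bindings from the log can be sketched as a single scan that keeps, for each LID, the record with the highest sequence indicator. The record layout `(media_location, lid, sequence)` is an assumption made for this illustration, and the function name is hypothetical.

```python
def rebuild_index(log):
    """Rebuild LID -> media location bindings from log records.

    log: iterable of (media_location, lid, sequence) tuples, in any order.
    """
    newest = {}   # lid -> (sequence, media_location) of the most current version
    for location, lid, sequence in log:
        if lid not in newest or sequence > newest[lid][0]:
            newest[lid] = (sequence, location)
    return {lid: location for lid, (_, location) in newest.items()}
```

Records whose LID appears again with a higher sequence indicator correspond to the previous, obsolete versions left behind by out-of-place writes.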
- FIG. 24A is a flow diagram of one embodiment of a method 2400 for servicing queries pertaining to the status of a LID.
- Step 2420 may comprise receiving a request pertaining to the status of a particular LID in the logical address space 136 presented by the storage layer 130 .
- the request may pertain to the logical address space 136 as a whole (e.g., a query for the remaining logical capacity of the logical address space 136 , or the like).
- the query may pertain to the physical storage capacity of the non-volatile storage device, such as a query regarding the physical storage capacity that is bound to LIDs in the logical address space 136 (e.g., currently occupied), available physical storage capacity, and so on.
- the storage layer 130 accesses storage metadata to determine the status of the requested LID, logical capacity, physical storage capacity, or the like.
- the access may comprise identifying an entry for the LID in a logical-to-physical map, in an allocation index, or the like. If the particular LID falls within an entry in an allocation index and/or logical to physical index, the storage layer 130 may determine that the LID is allocated and/or may determine whether the LID is bound to a media storage location.
- the access may further comprise traversing a metadata index to identify unallocated LIDs, unused media storage locations, and so on.
- the traversal may further comprise identifying allocated (or unallocated) LIDs to determine current LID allocation (or unallocated LID capacity), to determine bound physical storage capacity, determine remaining physical storage capacity, or the like.
- the storage layer 130 returns the status determined at step 2430 to the storage client 116 .
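A LID status query of this kind can be sketched against two structures: a set of allocated first portions and a map of bound LIDs. The status strings, the function names, and the 32/32 segmentation are assumptions made for this illustration.

```python
def lid_status(lid, allocated_firsts, bindings, offset_bits=32):
    """Classify a LID as unallocated, allocated (unbound), or bound."""
    if (lid >> offset_bits) not in allocated_firsts:
        return "unallocated"
    return "bound" if lid in bindings else "allocated"

def remaining_logical_capacity(allocated_firsts, first_bits=32):
    """Number of first portions that are still available for allocation."""
    return (1 << first_bits) - len(allocated_firsts)
```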
- FIG. 24B is a flow diagram of one embodiment of a method 2401 for servicing queries pertaining to the status of a media storage location (or range of media storage locations) of a non-volatile storage device.
- the storage layer 130 receives a request pertaining to the status of a particular media storage location on a non-volatile storage device.
- the media storage location may be associated with a LID in the logical address space 136 presented by the storage layer 130 .
- the query may be “iterative” and may pertain to all media storage locations on the non-volatile storage device (e.g., a query regarding the status of all media storage locations on the device).
- the query may pertain to the physical storage capacity of the non-volatile storage device, such as a query regarding the physical storage capacity that is bound to LIDs in the logical address space 136 (e.g., currently occupied), available physical storage capacity, and so on.
- a second non-volatile storage device may be configured to mirror the contents of a first non-volatile storage device.
- the data stored on the first logical storage device may be stored sequentially (e.g., in a contextual, log-based format).
- the first non-volatile storage device may comprise “invalid” data (e.g., data was deleted, was made obsolete by a subsequent storage operation, etc.).
- the query of step 2421 may be issued by the second, non-volatile storage device to determine which media storage locations on the first, non-volatile storage device “exist” (e.g., are valid), and should be mirrored on the second non-volatile storage device. Accordingly, the query of step 2421 may be issued in the form of an iterator, configured to iterate over (e.g., discover) all media storage locations that comprise “valid data,” and the extent of the valid data.
- Step 2431 comprises accessing storage metadata, such as the index 1204 or reverse map 1222 described above in conjunction with FIG. 12 , to determine whether the specified media storage location comprises valid data and/or to determine the extent (or range) of valid data in the specified media storage location.
- the storage layer 130 returns the status determined at step 2431 to the requester.
- methods 2400 and 2401 are used to implement conditional storage operations.
- a conditional storage operation refers to a storage operation that is to occur if one or more conditions are met.
- a conditional write may comprise a storage client requesting that data be written to a particular set of LIDs.
- the storage layer 130 may implement the conditional write if the specified LIDs do not exist (e.g., are not already allocated to another storage client), and the non-volatile storage comprises sufficient physical storage capacity to satisfy the request.
- a conditional read may comprise a storage client requesting data from a particular set of LIDs.
- the storage layer 130 may implement the conditional read if the specified LIDs exist and are bound to valid data (e.g., are in storage metadata maintained by the storage layer 130 , and are bound to media storage locations).
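- The conditional operations described above can be sketched as follows. This is a simplified model under stated assumptions: the class `TinyStore`, its methods, and the one-location-per-LID capacity accounting are hypothetical and are not part of the disclosure.

```python
class ConditionFailed(Exception):
    """Raised when the condition of a conditional operation is not met."""


class TinyStore:
    """Hypothetical store: one media storage location per LID."""

    def __init__(self, physical_capacity: int):
        self.capacity = physical_capacity  # free media storage locations
        self.bound = {}                    # LID -> data

    def conditional_write(self, lids, blocks):
        lids, blocks = list(lids), list(blocks)
        if len(lids) != len(blocks):
            raise ValueError("one block is required per LID")
        # Condition 1: none of the LIDs may already exist.
        if any(lid in self.bound for lid in lids):
            raise ConditionFailed("LID already exists")
        # Condition 2: enough physical capacity must remain.
        if len(lids) > self.capacity:
            raise ConditionFailed("insufficient physical capacity")
        for lid, block in zip(lids, blocks):
            self.bound[lid] = block
        self.capacity -= len(lids)

    def conditional_read(self, lids):
        lids = list(lids)
        # Condition: every LID must exist and be bound to data.
        if not all(lid in self.bound for lid in lids):
            raise ConditionFailed("LID not bound to valid data")
        return [self.bound[lid] for lid in lids]
```

- Both conditions are checked before any state changes, so a failed conditional write leaves the store untouched.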
- the storage layer 130 provides for “nameless” reads and writes, in which a storage client presents an identifier, and the storage layer 130 determines the LIDs associated with the identifier, and services the storage request accordingly (e.g., “nameless” writes as described above). In this case, the storage layer 130 offloads management of identifier-to-LID mappings for the storage client.
- the storage metadata maintained by the storage layer may provide for designating certain portions of the logical address space 136 as being “temporary” or “ephemeral.”
- an ephemeral address range is an address range that is set to be automatically deleted under certain conditions. The conditions may include, but are not limited to: a restart operation, a shutdown event (planned or unplanned), expiration of a pre-determined time, resource exhaustion, etc.
- Data may be identified as ephemeral in storage metadata maintained by the storage layer 130 , in metadata persisted to the solid-state storage media, or the like.
- an entry 1214 in the index 1204 (forward map) may be identified as ephemeral in the metadata 1219 thereof.
- the storage layer 130 may designate a portion of the large logical address space 136 as comprising ephemeral data. Any entries in the ephemeral address range may be designated as ephemeral in the index without additional modifications to entry metadata.
- an ephemeral indicator may be included in a media storage location on the non-volatile storage media.
- FIG. 25A depicts one example of a contextual data format (e.g., packet format) 2500 , which may be used to store a data segment 2520 on a non-volatile storage media.
- packets 2500 may be subject to further processing before being persisted on a media storage location (e.g., packets may be encoded into ECC codewords by an ECC generator 304 as described above).
- the packet format 2500 may comprise persistent metadata 2564 , which may include logical interface metadata 2565 , as described above.
- the packet format 2500 may comprise and/or be associated with a sequence indicator 2518 , which may include, but is not limited to a sequence number, timestamp, or other suitable sequence indicator.
- the sequence indicator 2518 may be included in the persistent metadata 2564 (e.g., as another field, not shown).
- a sequence indicator 2518 may be stored elsewhere on the non-volatile storage media 122 .
- a sequence indicator 2518 may be stored on a page (or virtual page) basis, on an erase-block basis, or the like.
- each logical erase block may be marked with a respective marking, and packets may be stored sequentially therein. Accordingly, the sequential order of packets may be determined by a combination of the logical erase block sequence indicators (e.g., indicators 2518 ) and the sequence of packets 2500 within each logical erase block.
- the storage layer 130 may be configured to reconstruct the storage metadata (e.g., index, etc.) using the contextual, log-based formatted data stored on the non-volatile storage media 122 .
- Reconstruction may comprise the storage layer 130 (or another process) reading packets 2500 formatted in the contextual, log-based format from media storage locations of the solid-state storage media 122 .
- as each packet 2500 is read, a corresponding entry in the storage metadata (e.g., the indexes described above) may be created.
- the LID range associated with the entry is derived from the LID 2516 in the header 2512 of the packet.
- the sequence indicator 2518 associated with the data packet may be used to determine the most up-to-date version of data 2514 for a particular LID.
- the storage layer 130 may write data “out-of-place” due to, inter alia, wear leveling, write amplification, and other considerations. Accordingly, data intended to overwrite an existing LID may be written to a different media storage location than the original data.
- the overwritten data is “invalidated” as described above; this data, however, remains on the solid-state storage media 122 until the erase block comprising the data is groomed (e.g., reclaimed and erased).
- the sequence identifier may be used to determine which of two (or more) contextual, log-based packets 2500 corresponding to the same LID comprises the current, valid version of the data.
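- The use of sequence indicators to select the current version of data for a LID can be sketched as follows. This is a minimal illustration; the `Packet` tuple and `rebuild_forward_map` function are hypothetical names, and the sequence indicator is assumed to be a monotonically increasing integer.

```python
from typing import Dict, Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical view of a packet header read from the media."""
    lid: int
    seq: int         # sequence indicator (assumed monotonically increasing)
    media_addr: int  # media storage location holding the packet


def rebuild_forward_map(packets: Iterable[Packet]) -> Dict[int, Packet]:
    """Keep, for each LID, the packet with the highest sequence indicator."""
    current: Dict[int, Packet] = {}
    for p in packets:
        prev = current.get(p.lid)
        if prev is None or p.seq > prev.seq:
            current[p.lid] = p  # newer data supersedes the older packet
    return current
```

- Because the comparison uses the sequence indicator rather than the read order, the result does not depend on the order in which media storage locations are scanned.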
- the header 2512 includes an ephemeral indicator 2568 .
- the ephemeral indicator 2568 may be used to identify data that should be invalidated (e.g., deleted). Invalidating ephemeral data may comprise omitting the LIDs 2514 referenced in the logical interface 2565 of the packet 2500 , marking the data segment 2520 as invalid in a reverse-index, and so on. Similarly, if data marked as ephemeral is more “up-to-date” than other data per the sequence indicator 2518 , the original, “older” data may be retained and the ephemeral data may be ignored.
- the storage layer 130 may provide an API through which storage clients may designate certain LID ranges (or other identifiers) as being ephemeral. Alternatively, or in addition, the storage layer 130 may implement higher-level interfaces using ephemeral data. For example, a multi-step atomic write (e.g., multi-block atomic write), may be implemented by issuing multiple write requests, each of which designates the data as being ephemeral. When all of the writes are completed, the ephemeral designation may be removed. If a failure occurs during the multi-step atomic write, data that was previously written can be ignored (no “roll-back” is necessary), since the data will be removed the next time the device is restarted.
- a “transaction” refers to a plurality of operations that are completed as a group. If any one of the transaction operations is not completed, the other transaction operations are rolled-back. As a transaction is implemented, the constituent storage operations may be marked as ephemeral. Successful completion of the transaction comprises removing the ephemeral designation from the storage operations. If the transaction fails, the ephemeral data may be ignored.
- ephemeral data may be associated with a time-out indicator.
- the time-out indicator may be associated with the operation of a storage reclamation process, such as a groomer.
- when the groomer evaluates a storage division (e.g., erase block, page, etc.) for reclamation, ephemeral data therein may be treated as invalid data.
- the ephemeral data may be omitted during reclamation processing (e.g., not considered for storage division selection and/or not stored in another media storage location during reclamation).
- ephemeral data may not be treated as invalid until its age exceeds a threshold.
- the age of ephemeral data may be determined by the sequence indicator 2518 associated therewith.
- the threshold may be set on a per-packet basis (e.g., in the header 2512 ), may be set globally (through an API or setting of the storage layer), or the like.
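- The groomer behavior described above can be sketched as follows. This is a minimal illustration that assumes age is measured as the difference between a current sequence value and the sequence indicator of the packet; `StoredPacket`, `packets_to_relocate`, and the single global threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredPacket:
    """Hypothetical per-packet state as seen by a groomer."""
    lid: int
    seq: int               # sequence indicator of the packet
    valid: bool            # False once the data has been invalidated
    ephemeral: bool = False


def packets_to_relocate(erase_block: List[StoredPacket],
                        current_seq: int,
                        max_ephemeral_age: int) -> List[StoredPacket]:
    """Select the packets that must be rewritten before the block is erased."""
    keep = []
    for p in erase_block:
        if not p.valid:
            continue  # already invalid: nothing to preserve
        if p.ephemeral and current_seq - p.seq > max_ephemeral_age:
            continue  # aged-out ephemeral data is treated as invalid
        keep.append(p)
    return keep
```

- Ephemeral data younger than the threshold is still relocated, which leaves time for the ephemeral designation to be removed.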
- removing an ephemeral designation may comprise updating storage metadata (e.g., index 1204 ) to indicate that a particular entry is no longer to be considered to be ephemeral.
- the storage layer 130 may update the ephemeral indicator stored on the solid-state storage media (e.g., in persistent metadata 2564 of a packet 2500 ).
- since the solid-state storage media is write-out-of-place, it may not be practical to overwrite (or rewrite) these indicators. Therefore, in some embodiments, the storage layer 130 persists a “note” on the solid-state storage media (e.g., writes a persistent note to a media storage location of the solid-state storage media).
- a persistent note refers to a “metadata note” that is persistently stored on the solid-state storage media. Removing the ephemeral designation may comprise persisting a metadata note indicating the removal to the solid-state storage media.
- a persistent note 2501 may comprise a reference 2511 that identifies one or more packets 2500 on a media storage location.
- the reference 2511 may comprise any suitable identifying information including, but not limited to: a logical interface, a LID, a range, a media storage location identifier, a sequence indicator, or the like.
- the persistent note 2501 may also include a directive 2513 , which, in the FIG. 25B example may be a directive to remove an ephemeral designation from the identified packets. Additional details regarding persistent notes are disclosed in U.S. patent application Ser. No. 13/330,554, entitled “Apparatus, System, and Method for Persistent Metadata,” filed Dec. 19, 2011, and which is hereby incorporated by reference.
- the logical address space 136 presented by the storage layer 130 may include an “ephemeral” LID range.
- an ephemeral LID range comprises references to ephemeral data (e.g., LIDs that are to be “auto-deleted” on restart, or another condition). This segmentation may be possible due to the storage layer 130 maintaining a large (e.g., sparse) logical address space 136 , as described above.
- the storage layer 130 maintains ephemeral data in the ephemeral logical address range; as such, each entry therein is considered to be ephemeral.
- An ephemeral indicator may also be included in contextual, log-based formatted data bound to the LIDs within the ephemeral range.
- FIG. 25C depicts one example of a method for using ephemeral designations to implement a multi-step operation.
- the method 2503 may start and be initialized as described above.
- the method receives a request to allocate a range of LIDs in a logical address space.
- the request may indicate that the LIDs are to be designated as ephemeral.
- the request may be received from a storage client (e.g., an explicit allocation request).
- the request may be made as part of a higher-level API provided by the storage layer 130 , which may include, but is not limited to: a transaction API, a clone API, move API, deduplication API, an atomic-write API, or the like.
- At step 2540 , the requested LIDs are allocated as described above (unless already allocated by another storage client).
- Step 2540 may further comprise updating storage metadata to indicate that the LIDs are ephemeral, which may include, but is not limited to: setting an indicator in an entry for the LIDs in the storage metadata (e.g., index), or allocating the LIDs in an “ephemeral range” of the index.
- the storage client may request one or more persistent storage operations on the ephemeral LIDs of step 2540 .
- the storage operations may comprise a multi-block atomic write, operations pertaining to a transaction, a snapshot operation, a clone (described in additional detail below), or the like.
- Step 2550 may comprise marking contextual, log-based data associated with the persistent storage operations as ephemeral as described above (e.g., in a header of a packet comprising the data).
- At step 2560 , if the method receives a request to remove the ephemeral designation, the flow continues to step 2562 ; otherwise, the flow continues to step 2570 .
- the request of step 2560 may be issued by a storage client and/or the request may be part of a higher-level API as described above. For example, the request may be issued when the constituent operations of a transaction or atomic operation are complete.
- At step 2562 , the ephemeral designations applied at steps 2540 and 2550 are removed.
- Step 2562 may comprise removing metadata indicators from storage metadata, “folding” the ephemeral range into a “non-ephemeral range” of the storage metadata index, or the like (folding is described in additional detail below).
- Step 2562 may further comprise storing one or more persistent notes on the non-volatile storage media that remove the ephemeral designation from data corresponding to the formerly ephemeral data as described above.
- at step 2570 , the method 2503 may determine whether the ephemeral data should be removed. If not, the flow continues back to step 2560 ; otherwise, the flow continues to step 2780 .
- the ephemeral data is removed (or omitted) when the storage metadata is persisted (as part of a shutdown or reboot operation). Alternatively, or in addition, data that is designated as ephemeral on the non-volatile storage media may be ignored during a reconstruction process.
- At step 2790 , the flow ends until a next request is received, at which point the flow continues at step 2530 .
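- The multi-step atomic write built on ephemeral designations can be sketched as follows. This is a simplified model, not the disclosed implementation: `EphemeralLog`, `atomic_write`, and the `fail_after` parameter (which simulates a crash) are hypothetical, and the position of a packet in the log stands in for its sequence indicator.

```python
class EphemeralLog:
    """Hypothetical append-only log with ephemeral flags and persistent notes."""

    def __init__(self):
        self.packets = []  # (lid, data, ephemeral) in write order
        self.notes = []    # each note lists sequence numbers to make permanent

    def write(self, lid, data, ephemeral=False):
        self.packets.append((lid, data, ephemeral))
        return len(self.packets) - 1  # log position used as sequence indicator

    def remove_ephemeral(self, seqs):
        # Data is never rewritten in place; a note records the removal instead.
        self.notes.append(frozenset(seqs))

    def recover(self):
        """Rebuild LID -> data, ignoring packets that are still ephemeral."""
        released = set().union(*self.notes)
        state = {}
        for seq, (lid, data, ephemeral) in enumerate(self.packets):
            if ephemeral and seq not in released:
                continue  # incomplete multi-step operation: ignore
            state[lid] = data
        return state


def atomic_write(log, blocks, fail_after=None):
    """Write all blocks as ephemeral, then remove the designation at the end."""
    seqs = []
    for i, (lid, data) in enumerate(blocks.items()):
        if i == fail_after:
            return False  # simulated failure before the operation completes
        seqs.append(log.write(lid, data, ephemeral=True))
    log.remove_ephemeral(seqs)
    return True
```

- No roll-back step is needed after a failure, because recovery skips every packet whose ephemeral designation was never removed.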
- FIG. 26 depicts one example of a method for reconstructing storage metadata from data stored on a non-volatile storage medium in a contextual, log-based format.
- Step 2620 may comprise receiving a request to reconstruct storage metadata from the contents of a non-volatile storage medium or device.
- the request may be received in response to storage metadata maintained by the storage layer 130 (or another entity) being lost or out-of-sync with the contents of the physical storage media.
- portions of the storage metadata described herein (e.g., the index 1204 and/or reverse map 1222 ) may be maintained in volatile memory. In an invalid shutdown, the contents of the volatile memory may be lost before the storage metadata can be stored in non-volatile storage.
- a second storage device may be configured to mirror the contents of a first storage device; accordingly, the second storage device may maintain storage metadata describing the contents of the first storage device.
- the second storage device may lose communication with the first storage device and/or may need to be rebuilt (e.g., initialized).
- the initialization may comprise reconstructing storage metadata from the contents of the first storage device (e.g., through queries to the first storage device as described above in conjunction with FIG. 24B ).
- the method iterates over media storage locations of the storage device.
- the iteration may comprise accessing a sequence of media storage locations on the non-volatile storage medium, as described above in conjunction with FIG. 23 .
- the method 2600 accesses data formatted in the contextual, log-based format described above.
- the method 2600 may reconstruct the storage metadata using information determined from the contextual, log-based data format on the non-volatile storage media 122 .
- the method 2600 may determine the LIDs associated with the data, may determine whether the data is valid (e.g., using persistent notes and/or sequence indicators as described above), and so on.
- step 2640 may comprise issuing queries to another storage device to iteratively determine which media storage locations comprise valid data.
- the iterative query approach (described above in conjunction with FIG. 24B ) may be used to mirror a storage device.
- At step 2650 , the method 2600 determines whether a particular data packet is designated as being ephemeral. The determination may be based on an ephemeral indicator in a header of the packet. The determination may also comprise determining whether a persistent note that removes the ephemeral designation exists (e.g., a persistent note as described above in conjunction with FIG. 25B ). Accordingly, step 2650 may comprise the method 2600 maintaining the metadata for the packet in a temporary (e.g., ephemeral) location, until the iteration of step 2630 completes and the method 2600 can determine whether a persistent note removing the ephemeral designation exists.
- If step 2650 determines that the data is ephemeral, the flow continues to step 2660 ; otherwise, the flow continues to step 2670 .
- the method 2600 removes the ephemeral data. Removing the data may comprise omitting LIDs associated with the data from storage metadata (e.g., the index 1204 described above), marking the media storage location as “invalid” and available to be reclaimed (e.g., in the reverse map 1222 ), or the like.
- At step 2670 , the method reconstructs the storage metadata as described above.
- step 2670 may further comprise determining whether the data is valid (as described above in conjunction with FIG. 24B ). If the data is valid, the method 2600 may be configured to perform further processing. For example, if the method 2600 is being used to construct a mirror of another storage device, step 2670 may comprise transferring the valid data to the mirror device.
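- The reconstruction flow of method 2600 can be sketched as follows. This is a minimal sketch under stated assumptions: the record tuples, the `reconstruct` function, and the use of a set of invalid media addresses in place of a reverse map are hypothetical. Ephemeral packets are held aside until the whole log has been read, so that a persistent note appearing later in the log can still release them.

```python
def reconstruct(log):
    """Rebuild a forward index from log records.

    Hypothetical record shapes:
      ("packet", seq, lid, media_addr, ephemeral)
      ("note", [seq, ...])   # removes the ephemeral designation
    Returns (index, invalid): LID -> media address, and reclaimable addresses.
    """
    index = {}       # LID -> (seq, media_addr)
    pending = []     # ephemeral packets awaiting a decision
    released = set() # sequence numbers named by persistent notes
    invalid = set()

    def bind(seq, lid, addr):
        old = index.get(lid)
        if old is None or seq > old[0]:
            if old is not None:
                invalid.add(old[1])  # superseded by newer data
            index[lid] = (seq, addr)
        else:
            invalid.add(addr)        # an obsolete version of the LID

    for rec in log:
        if rec[0] == "note":
            released.update(rec[1])
            continue
        _, seq, lid, addr, ephemeral = rec
        if ephemeral:
            pending.append((seq, lid, addr))
        else:
            bind(seq, lid, addr)

    # The iteration is complete: decide the fate of each ephemeral packet.
    for seq, lid, addr in pending:
        if seq in released:
            bind(seq, lid, addr)
        else:
            invalid.add(addr)        # still ephemeral: omit from the index

    return {lid: addr for lid, (_, addr) in index.items()}, invalid
```

- Media addresses collected in `invalid` correspond to locations that could be marked as available for reclamation.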
- the storage layer 130 may provide an API to order storage operations performed thereon.
- the storage layer 130 may provide a “barrier” API to determine the order of operations.
- a “barrier” refers to a primitive that enforces an order of storage operations.
- a barrier may specify that all storage operations that were issued before the barrier are completed before the barrier, and that all operations that were issued after the barrier complete after the barrier.
- a barrier may mark a “point-in-time” in the sequence of operations implemented on the non-volatile storage device.
- a barrier is persisted to the non-volatile storage media as a persistent note.
- a barrier may be stored on the non-volatile storage media, and may, therefore, act as a persistent record of the state of the non-volatile storage media at a particular time (e.g., a particular time within the sequence of operations performed on the non-volatile storage media).
- the storage layer 130 may issue an acknowledgement when all operations issued previous to the barrier are complete.
- the acknowledgement may include an identifier that specifies the “time” (e.g., sequence pointer) corresponding to the barrier.
- the storage layer 130 may maintain a record of the barrier in the storage metadata maintained thereby.
- Barriers may be used to guarantee the ordering of storage operations. For example, a sequence of write requests may be interleaved with barriers. Enforcement of the barriers may be used to guarantee the ordering of the write requests. Similarly, interleaving barriers between write and read requests may be used to remove read-before-write hazards.
- Barriers may be used to enable atomic operations (similarly to the ephemeral designation described above). For example, the storage layer 130 may issue a first barrier as a transaction is started, and then issue a second barrier when complete. If the transaction fails, the storage layer 130 may “roll back” the sequence of storage operations between the first and second barriers to effectively “undo” the partial transaction. Similarly, a barrier may be used to obtain a “snapshot” of the state of the non-volatile storage device at a particular time. For instance, the storage layer 130 may provide an API to discover changes to the storage media that occurred between two barriers.
- barriers may be used to synchronize distributed storage systems.
- a second storage device may be used to mirror the contents of a first storage device.
- the first storage device may be configured to issue barriers periodically (e.g., every N storage operations).
- the second storage device may lose communication with the first storage device for a certain period of time.
- the second storage device may transmit its last barrier to the first storage device, and then may mirror only those changes that occurred since the last barrier.
- Distributed barriers may also be used to control access to and/or synchronize shared storage devices.
- storage clients may be issued a credential that allows access to a particular range of LIDs (read only access, read/write, delete, etc.).
- the credentials may be tied to a particular point or range in time (e.g., as defined by a barrier).
- the credential may be updated.
- the credential may expire.
- before being allowed access to the distributed storage device, the client may first be required to access a new set of credentials and/or ensure that local data (e.g., cached data, etc.) is updated accordingly.
- FIG. 27 is a flow chart of one embodiment of a method for providing barriers in storage system 102 .
- the method 2700 starts and is initialized as disclosed herein.
- a request to issue a barrier is received.
- the request may be received from a storage client and/or as part of a high-level API provided by the storage layer 130 (e.g., an atomic write, transaction, snapshot, or the like).
- At step 2730 , the method 2700 enforces the ordering constraints of the barrier. Accordingly, step 2730 may comprise causing all previously issued storage requests to complete. Step 2730 may further comprise queuing all subsequent requests until the previously issued requests complete, and the barrier is acknowledged (at step 2740 ).
- At step 2740 , the method 2700 determines if the ordering constraints are met, and if so, the flow continues to step 2750 ; otherwise, the flow continues at step 2730 .
- the barrier is acknowledged, which may comprise returning a current “time” (e.g., sequence indicator) at which the operations issued before the barrier were completed.
- Step 2750 may further comprise storing a persistent note of the barrier on the non-volatile storage.
- the method resumes operation on storage requests issued subsequent to the barrier at step 2720 .
- the flow ends until a next request for a barrier is received.
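- The barrier behavior of method 2700 can be sketched as follows. This is a simplified single-threaded model under stated assumptions: `BarrierLog` and its methods are hypothetical, requests complete only when drained, and a position in the log stands in for the sequence indicator returned in the acknowledgement.

```python
from collections import deque


class BarrierLog:
    """Hypothetical model of barriers persisted as notes in an append-only log."""

    def __init__(self):
        self.log = []             # persisted records, in completion order
        self.in_flight = deque()  # issued but not yet completed writes

    def submit(self, lid, data):
        self.in_flight.append((lid, data))

    def barrier(self):
        # Complete every request issued before the barrier.
        while self.in_flight:
            lid, data = self.in_flight.popleft()
            self.log.append(("write", lid, data))
        # Persist a note marking this point in the sequence of operations.
        self.log.append(("barrier",))
        return len(self.log) - 1  # acknowledgement carries the log position

    def changes_between(self, first, second):
        """List the writes that completed between two barrier positions."""
        return [r for r in self.log[first + 1:second] if r[0] == "write"]
```

- Given two acknowledged barrier positions, `changes_between` illustrates how changes that occurred between two barriers could be discovered, for example to bring a mirror up to date.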
- the storage layer 130 leverages the logical address space 136 to manage “logical copies” of data (e.g., clones).
- a “clone” or “logical cloning operation” refers to replicating a range (or set of ranges) of LIDs within the logical address space 136 and/or other addressing system.
- the cloned range may comprise different set(s) of LIDs, which may be bound to the same media storage locations as the original LIDs (source LIDs), allowing two or more LIDs and/or LID ranges to reference the same data.
- Clone operations may be used to perform higher-level operations, such as deduplication, snapshots, logical copies, atomic operations (e.g., atomic writes, transactions, etc.), and the like.
- Creating a clone may comprise modifying the logical interface of data stored in a non-volatile storage device 120 in order to, inter alia, allow the data to be referenced by use of two or more different LIDs and/or LID extents. Accordingly, creating a clone of a LID (or set of LIDs) may comprise allocating new LIDs in the logical address space 136 (or dedicated portion thereof), and associating the new LIDs with the same media storage location(s) as the original LIDs in the storage metadata 135 . Creating a clone may, therefore, comprise adding one or more entries to a forward index 1204 configured to associate the new set of LIDs with the data.
- FIG. 28A depicts one embodiment of a range clone operation.
- the range clone operation of FIG. 28A may be implemented in response to a request from a storage client 116 and/or as part of a higher-level API provided by the storage layer 130 , such as an atomic operation, snapshot, logical copy, or the like.
- the interface 138 of the storage layer 130 may be configured to provide interfaces and/or APIs for performing clone operations.
- FIG. 28A depicts one embodiment of an index 2804 before the clone is created.
- the index 2804 of FIG. 28A comprises, inter alia, an entry 2814 that binds LIDs 1024-2048 to media storage locations 3453-4477. Other entries are omitted from FIG. 28A to avoid obscuring the details of the depicted embodiment.
- the entry 2814 and the bindings thereof, may define a logical interface 2811 A through which storage clients 116 may reference the corresponding data (e.g., data segment 2812 ); storage clients 116 may access and/or reference the data segment 2812 (and/or portions thereof) through the storage layer 130 by use of the LIDs 1024-2048.
- the storage controller 140 may be configured to store data in a contextual format on a storage device 120 .
- the contextual format may comprise associating data with corresponding persistent metadata that defines and/or references, inter alia, the logical interface of the data.
- the data stored at media addresses 3453-4477 comprises a packet format 2818 that includes persistent metadata 2864 .
- the persistent metadata 2864 may comprise the logical interface of the data segment 2812 (logical interface metadata 2865 ), and as such, may associate the data segment 2812 with the LIDs 1024-2048 of the entry 2814 .
- the contextual data format 2818 may enable the index 2804 (and/or other metadata 135 ) to be reconstructed from the contents of the storage device 120 ; in the FIG. 28A embodiment, the entry 2814 may be reconstructed by associating the data stored at media addresses 3453-4477 with the LIDs 1024-2048 identified in the persistent metadata 2864 of the packet 2818 .
- Although FIG. 28A depicts a single packet 2818 , the disclosure is not limited in this regard.
- the data of the entry 2814 may be stored in multiple, different packets 2818 , each comprising respective persistent metadata 2864 (e.g., a separate packet for each media storage location, etc.).
- Creating a clone of the entry 2814 may comprise allocating one or more LIDs in the logical address space 136 , and binding the new LIDs to the same data segment 2812 as the entry 2814 (e.g., the data segment at media storage location 3453-4477). Creating the clone may, therefore, comprise modifying the storage metadata 135 without requiring the underlying data segment 2812 to be copied and/or replicated.
- FIG. 28B depicts storage metadata 135 of one embodiment of a clone operation.
- Cloning the LIDs 1024-2048 may comprise allocating a new set of LIDs within the logical address space 136 , and associating the new set of LIDs with the media addresses of LIDs 1024-2048.
- cloning the LIDs 1024-2048 may comprise allocating a new set of LIDs 6144-7168 (represented in index entry 2824 ), and associating the LIDs with the media addresses 3453-4477 of entry 2814 .
- the data stored at media addresses 3453-4477 may, therefore, be referenced through both the LIDs 1024-2048 of entry 2814 and the LIDs 6144-7168 of entry 2824 . Accordingly, the clone operation modifies the logical interface 2811 B of the data segment 2812 , such that the data can be referenced through either the LIDs of entry 2814 and/or the LIDs of entry 2824 .
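- The range clone of FIGS. 28A and 28B can be sketched as follows, using the LID and media address ranges from the example above. The `IndexEntry` type and the `clone_range` and `resolve` functions are hypothetical, and the index is modeled as a simple list rather than any particular data structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical forward-index entry binding a LID range to media addresses."""
    first_lid: int
    last_lid: int
    first_media: int
    last_media: int


def clone_range(index: List[IndexEntry], source: IndexEntry,
                new_first_lid: int) -> IndexEntry:
    """Add an entry binding new LIDs to the same media addresses as `source`."""
    length = source.last_lid - source.first_lid
    clone = IndexEntry(new_first_lid, new_first_lid + length,
                       source.first_media, source.last_media)
    for e in index:
        if clone.first_lid <= e.last_lid and e.first_lid <= clone.last_lid:
            raise ValueError("target LIDs are already allocated")
    index.append(clone)  # only metadata changes; no data is copied
    return clone


def resolve(index: List[IndexEntry], lid: int) -> Optional[int]:
    """Translate a LID to a media address, or None if the LID is unbound."""
    for e in index:
        if e.first_lid <= lid <= e.last_lid:
            return e.first_media + (lid - e.first_lid)
    return None
```

- After `clone_range(index, entry, 6144)`, a LID in 1024-2048 and the LID at the same offset in 6144-7168 resolve to the same media address.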
- the modified logical interface 2811 B of the data may be inconsistent with the contextual format of the data segment 2812 on the storage device 120 .
- the persistent metadata 2864 of the data segment 2812 comprises logical interface metadata 2865 that associates the data segment 2812 with LIDs 1024-2048 of the logical interface 2811 A, and not LIDs 6144-7168 of the modified logical interface 2811 B.
- the contextual format of the data 2818 may be updated to be consistent with the modified logical interface 2811 B (e.g., updated to associate the data with LIDs 1024-2048 and 6144-7168, as opposed to only LIDs 1024-2048).
- Updating the contextual format of the data segment 2812 may comprise updating the persistent metadata 2864 on the storage device 120 .
- the persistent metadata 2864 may be updated by overwriting and/or updating the persistent metadata 2864 without relocating the data segment 2812 and/or packet 2818 .
- the storage controller 140 may be configured to append data to a log and/or update data out-of-place on the storage device 120 .
- updating the contextual format of the data segment 2812 may comprise relocating and/or rewriting the data segment 2812 on the storage device 120 , which may be a time-consuming process, and may be particularly inefficient if the data segment 2812 is large and/or the clone comprises a large number of LIDs. Therefore, in some embodiments, the storage layer 130 may defer updating the contextual format of cloned data and/or may update the contextual format in one or more background operations. In the meantime, the storage layer 130 may be configured to provide access to the data while stored in the inconsistent contextual format 2818 .
- the storage layer 130 may be configured to acknowledge completion of clone operations before the contextual format of the corresponding data is updated.
- the data may be subsequently rewritten (e.g., relocated) in the updated contextual format on the storage device 120 in another process, which may be outside of the “critical path” of the clone operation and/or other storage operations (e.g., in one or more background operations).
- the data segment 2812 is relocated using the groomer 370 , or the like. Accordingly, storage clients 116 may be able to access the data segment 2812 through the modified logical interface 2811 B (both 1024-2048 and 6144-7168) without waiting for the contextual format of the data segment 2812 to be updated to be consistent with the modified logical interface 2811 B.
- the modified logical interface 2811 B of the data segment 2812 may exist only in the index 2804 . Therefore, if the index 2804 is lost, due to, inter alia, power failure or data corruption, the clone operation may not be reflected in the reconstructed storage metadata 135 (the clone operation may not be persistent and/or crash safe).
- the contextual format of the data at 3453-4477 is accessed, the logical interface metadata 2865 of the persistent metadata 2864 indicates that the data is associated only with LIDs 1024-2048, not 1024-2048 and 6144-7168. Therefore, only entry 2814 will be reconstructed (as in FIG. 28A ), and 2824 will be omitted; moreover, subsequent attempts to access the data segment 2812 through the modified logical interface 2811 B (e.g., through 6144-7168) may fail.
- a clone operation may further comprise storing a persistent note on the storage device 120 to make a clone operation persistent and/or crash safe.
- the persistent note may comprise an indication of the modified logical interface of the data.
- the persistent note 2866 corresponding to the depicted clone operation may comprise a persistent indicator 2868 that associates the data stored at media addresses 3453-4477 with both LID ranges 1024-2048 and 6144-7168.
- the persistent note 2866 may indicate that the data segment 2812 is associated with both LID ranges, such that both entries 2814 and 2824 can be reconstructed.
- the storage layer 130 may acknowledge completion of a clone operation in response to updating the metadata 135 (e.g., creating the index entry 2824 ) and storing the persistent note 2866 on the storage device 120 .
- the persistent note 2866 may be invalidated and/or marked for erasure in response to updating the contextual format of the data segment 2812 to be consistent with the logical interface 2811 B (e.g., relocating the data segment 2812 by the groomer module 370 as disclosed above).
- the storage controller 140 may be configured to store the data segment 2812 in an updated contextual format that is consistent with the modified logical interface 2811 B.
- the updated contextual format may comprise associating the data segment 2812 with the LIDs of both entries 2814 and 2824 (e.g., both LIDs 1024-2048 and 6144-7168).
- FIG. 28C depicts one embodiment of an updated contextual format (packet 2888 ) of the data segment 2812 .
- the logical interface metadata 2865 of the updated packet 2888 indicates that the data segment 2812 is associated with both LID ranges 1024-2048 and 6144-7168 (as opposed to only 1024-2048).
- the updated contextual data format (packet 2888 ) may be written out-of-place, at different media addresses (64432-65456), which is reflected in the entries 2814 and 2824 in the index 2804 .
- the corresponding persistent note 2866 (if any) may be invalidated (removed and/or marked for subsequent removal) from the storage device 120 .
- removing the persistent note 2866 may comprise issuing one or more TRIM messages indicating that the persistent note 2866 no longer needs to be retained on the storage device 120 .
- portions of the index 2804 may be stored in a persistent crash safe storage location (e.g., non-transitory storage media 114 and/or the storage device 120 ).
- the persistent note 2866 may be removed, even if the contextual format 2818 of the data has not yet been updated on the storage device 120 .
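- The crash-safety role of the persistent note described above can be illustrated with a minimal sketch. The `Packet`, `PersistentNote`, and `rebuild_index` names are hypothetical and do not appear in the disclosure; the log is modeled as a simple in-order list, overwrites and invalidation are ignored, and each LID range is identified by its first LID only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Packet:
    """Data segment plus the LID range starts recorded in its persistent metadata."""
    media_start: int
    length: int
    lid_starts: Tuple[int, ...]


@dataclass(frozen=True)
class PersistentNote:
    """Later log record stating the modified logical interface of stored data."""
    media_start: int
    lid_starts: Tuple[int, ...]


LogRecord = Union[Packet, PersistentNote]


def rebuild_index(log: List[LogRecord]) -> Dict[int, Tuple[int, int]]:
    """Replay the log in order and return {first LID: (media_start, length)}."""
    interface: Dict[int, Tuple[int, Tuple[int, ...]]] = {}
    for record in log:
        if isinstance(record, Packet):
            interface[record.media_start] = (record.length, record.lid_starts)
        elif record.media_start in interface:
            # A note overrides the stale logical interface stored in the packet.
            length, _ = interface[record.media_start]
            interface[record.media_start] = (length, record.lid_starts)
    index: Dict[int, Tuple[int, int]] = {}
    for media_start, (length, lid_starts) in interface.items():
        for lid in lid_starts:
            index[lid] = (media_start, length)
    return index


# Packet written before the clone: associated only with LIDs starting at 1024.
packet_2818 = Packet(media_start=3453, length=1025, lid_starts=(1024,))
# Note written by the clone operation: both LID ranges reference the data.
note_2866 = PersistentNote(media_start=3453, lid_starts=(1024, 6144))

without_note = rebuild_index([packet_2818])            # clone is lost
with_note = rebuild_index([packet_2818, note_2866])    # clone is recovered
```

- In this sketch, reconstruction from the packet alone yields only the entry for LIDs starting at 1024, whereas replaying the note as well restores the entry for the cloned range starting at 6144.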
- Clones may operate in different modes. In a “copy on write” mode, storage operations that occur after creating the clone may cause the clones to diverge from one another (e.g., the entries 2814 and 2824 may refer to different media addresses).
- FIG. 28D depicts one embodiment of a storage operation performed within a cloned range in a copy-on-write mode.
- the storage controller 140 has written the data segment 2812 in the updated contextual data format (packet 2888 ) that is configured to associate the data with both LID ranges 1024-2048 and 6144-7168 (as depicted in FIG. 28C ).
- a storage client 116 may then issue one or more storage requests to modify and/or overwrite data corresponding to the LIDs 6657-7168.
- the storage request comprises modifying and/or overwriting LIDs 6657-7168.
- the storage controller 140 may store the new and/or modified data on the storage device 120 , which may comprise appending a data segment 2852 to the log in a contextual format (packet 2889 ).
- the packet 2889 may associate the data segment 2852 with the LIDs 6657-7424 as disclosed herein (e.g., by use of LID indicators 2875 within persistent metadata 2874 of the packet 2889 ).
- the index 2804 may be updated to associate the LIDs 6657-7424 with the data segment 2852 , which may comprise splitting the entry 2824 into an entry 2832 configured to continue to reference the unmodified portion of the data in the data segment 2812 and an entry 2833 that references the new data segment 2852 stored at media addresses 78512-79024.
- the entry 2814 corresponding to the LIDs 1024-2048 may continue to reference the data segment 2812 at media addresses 64432-65456.
- modifications within the LID range 1024-2048 may result in similar divergent changes affecting the entry 2814 .
- the storage request(s) are not limited to modifying and/or overwriting data. Other operations may comprise expanding a LID range (appending data), removing LIDs (deleting and/or trimming data), and/or the like.
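- The entry split performed by a copy-on-write overwrite can be sketched as follows. The `Entry` type and `overwrite` function are hypothetical names introduced for illustration; LID ranges are treated as inclusive, and the example assumes the overwrite covers LIDs 6657-7168 of the cloned range.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    first: int   # first LID of the range (inclusive)
    last: int    # last LID of the range (inclusive)
    media: int   # media address bound to `first`


def overwrite(index: List[Entry], first: int, last: int, new_media: int) -> List[Entry]:
    """Bind LIDs first..last to newly appended data, splitting overlapping entries."""
    out: List[Entry] = []
    for e in index:
        if e.last < first or e.first > last:
            out.append(e)                      # no overlap: entry is unchanged
            continue
        if e.first < first:                    # unmodified head keeps old media
            out.append(Entry(e.first, first - 1, e.media))
        if e.last > last:                      # unmodified tail keeps old media
            out.append(Entry(last + 1, e.last, e.media + (last + 1 - e.first)))
    out.append(Entry(first, last, new_media))  # entry for the new data segment
    return sorted(out)


index = [Entry(1024, 2048, 64432), Entry(6144, 7168, 64432)]
diverged = overwrite(index, 6657, 7168, 78512)
```

- The entry for LIDs 1024-2048 is untouched, so the two clones now diverge: only the cloned range references the new media addresses.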
- the storage controller 130 may support other clone modes, such as a “synchronized clone” mode.
- In a synchronized clone mode, changes made within a cloned LID range may be reflected in one or more other, corresponding LID ranges.
- implementing the described storage operations in a “synchronized clone” mode may comprise updating the entry 2814 to reference the new data segment 2852 , as disclosed herein, which may comprise, inter alia, splitting the entry 2814 into an entry configured to associate LIDs 1024-1536 with the original data segment 2812 , and adding an entry configured to associate the LIDs 1537-2048 with the new data segment 2852 .
- the storage layer 130 may be further configured to manage clone merge operations.
- a “range merge” or “clone merge” refers to an operation to combine two or more different sets of LIDs.
- a range merge operation may comprise merging the entry 2814 with the cloned entries 2832 and 2833 .
- the storage layer 130 may be configured to implement range merge operations according to a merge policy, such as a recency policy in which more recent changes override earlier changes, a priority-based policy based on the relative priority of storage operations (e.g., based on properties of the storage client(s) 116 , applications, and/or users associated with the storage operations), a completion indicator (e.g., completion of an atomic storage operation, failure of an atomic storage operation, or the like), fadvise parameters, ioctl parameters, and/or the like.
- FIG. 28E depicts one embodiment of a range merge operation.
- the range merge operation of FIG. 28E may be performed in accordance with a recency merge policy.
- the range merge operation may comprise merging the range 6144-6656 into the range 1024-2048.
- the range merge operation may comprise selectively applying changes made within the LID range 6144-6656 to the LID range 1024-2048 in accordance with the merge policy.
- the modify/overwrite operation to LIDs 6657-7424 may be applied to the merged LID range 1024-2048 in accordance with the recency merge policy.
- the range merge operation may, therefore, comprise updating the LID range 1024-2048 to associate LIDs 1537-2048 with the media addresses 78512-79024 comprising the new/modified data segment 2852 .
- the resulting LID range may be split into two separate entries 2815 and 2817 in the index 2804 ; entry 2815 may be configured to associate LIDs 1024-1536 with portions of the data segment 2812 ; and entry 2817 may be configured to associate LIDs 1537-2048 with the data segment 2852 . Portions of the data segment 2812 no longer referenced by the LIDs 1537-2048 may be invalidated, as disclosed herein.
- the merged LID range 6144-7168 may be deallocated and/or removed from the index 2804 .
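- A recency merge policy of the kind described above can be sketched as a per-offset comparison of log order. The `merge_recency` function, the per-offset dictionaries, and the sequence numbers are assumptions made for illustration; the small offsets stand in for the LID ranges of FIG. 28E.

```python
from typing import Dict, Tuple

# offset within the LID range -> (media address, log sequence number)
RangeMap = Dict[int, Tuple[int, int]]


def merge_recency(dst: RangeMap, src: RangeMap) -> RangeMap:
    """Merge src into dst; for each offset the most recently written data wins."""
    merged = dict(dst)
    for offset, (media, seq) in src.items():
        if offset not in merged or seq > merged[offset][1]:
            merged[offset] = (media, seq)
    return merged


# Both ranges start out referencing the same cloned data (sequence 1).
original = {0: (100, 1), 1: (101, 1), 2: (102, 1), 3: (103, 1)}
# The clone later overwrote its upper half (sequence 2).
clone = {0: (100, 1), 1: (101, 1), 2: (200, 2), 3: (201, 2)}

merged = merge_recency(original, clone)
# After the merge, the clone's LID range would be deallocated from the index.
```

- Other policies named in the text (priority, completion indicators) could replace the sequence comparison without changing the overall structure.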
- the range merge operation illustrated in FIG. 28E may result in modifying the logical interface 2811 C to portions of the data.
- the contextual format 2889 of the data segment 2852 may associate the data with LIDs 6657-7168, rather than LIDs 1537-2048.
- the storage layer 130 may provide access to the data stored in the inconsistent contextual format.
- the storage controller 140 may be configured to store the data in an updated contextual format, in which the data segment 2852 is associated with LIDs 1537-2048 in one or more background operations (e.g., grooming operations).
- the range merge operation may further comprise storing a persistent note 2866 on the storage device 120 to associate the data segment 2852 with the updated logical interface 2811 C (e.g., associate the data at media addresses 78512-79024 with LIDs 1537-2048).
- the persistent note 2866 may be used to ensure that the range merge operation is persistent and crash safe.
- the persistent note 2866 may be removed in response to relocating the data segment 2852 in a contextual format that is consistent with the logical interface 2811 C (e.g., associates the data segment 2852 with the LIDs 1537-2048).
- the logical clone operations disclosed in conjunction with FIGS. 28A-E may be used to implement other logical operations, such as a range move operation.
- the clone operation of entry 2814 comprises modifying the logical interface associated with the data segment 2812 to associate the data segment 2812 with the LIDs 1024-2048 of entry 2814 and the LIDs 6144-7168 of entry 2824 .
- the cloning operation further includes storing a persistent note 2866 indicating the updated logical interface 2811 B of the data and rewriting the data segment 2812 in accordance with the updated logical interface 2811 B in one or more background storage operations (e.g., grooming operations).
- a range move operation refers to modifying the logical interface of one or more data segments to associate the data segments with a different set of LIDs.
- a range move operation may, therefore, comprise updating storage metadata 135 (e.g., the index 2804 ) to associate the one or more data segments with the updated logical interface, storing a persistent note 2866 on the storage device 120 comprising the updated logical interface of the data segments, and rewriting the data segments in a contextual format (packet 2888 ) that is consistent with the updated logical interface (e.g., includes the updated logical interface 2865 in the persistent metadata 2864 ), as disclosed herein.
- the storage layer 130 may implement range move operations using the same mechanisms and/or processing steps as those disclosed above in conjunction with FIGS. 28A-E .
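- The first two steps of a range move (updating the index and recording a persistent note) can be sketched as follows; the deferred rewrite of the data is omitted. The `range_move` function and the tuple layouts are hypothetical, and the collision check is simplified to the first LID of the destination range.

```python
from typing import Dict, List, Tuple

Index = Dict[int, Tuple[int, int]]   # first LID -> (media_start, length)
Note = Tuple[int, int, int]          # (media_start, length, new first LID)


def range_move(index: Index, log: List[Note], src: int, dst: int) -> None:
    """Rebind the data at LID range `src` to LID range `dst` without copying it."""
    if dst in index:
        raise ValueError("destination LID range is already bound")
    media_start, length = index.pop(src)       # data is no longer reachable via src
    index[dst] = (media_start, length)         # same media addresses, new LIDs
    log.append((media_start, length, dst))     # persistent note for crash safety


index: Index = {1024: (3453, 1025)}
log: List[Note] = []
range_move(index, log, 1024, 6144)
```

- Unlike a clone, the source range is removed from the index, so the data has exactly one logical interface after the move.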
- storing data in a contextual format may comprise associating the data with each LID that references the data.
- the persistent metadata 2864 comprises references to both LID ranges 1024-2048 and 6144-7168. Increasing the number of references to a data segment may, therefore, impose a corresponding increase in persistent metadata overhead.
- the size of the persistent metadata 2864 may be limited, which may limit the number of references and/or clones that can reference a particular data segment. Moreover, inclusion of multiple LID references may complicate groomer operations.
- the number of index entries needed to be updated in a grooming operation may vary in accordance with the number of LIDs that reference the data that is to be relocated.
- relocating the data segment 2812 in a grooming and/or storage recovery operation may comprise updating two separate index entries 2814 and 2824 .
- Relocating a data segment referenced by N different clones (e.g., N different LIDs) may comprise updating N different index entries.
- storing the data segment may comprise writing N entries into the persistent metadata 2864 . This variable overhead may reduce the performance of background grooming operations and may limit the number of concurrent clones and/or references that can be supported.
- the storage layer 130 may comprise and/or leverage an intermediate mapping layer to reduce the overhead imposed by clone operations.
- the intermediate mapping layer may comprise “reference entries” configured to facilitate efficient cloning operations (as well as other operations, as disclosed in further detail herein).
- a reference entry refers to an entry that only exists while it is being referenced by one or more entries in the logical address space 136 . Accordingly, a reference entry does not exist in its own right, but only exists as long as it is being referenced by one or more other index entries.
- reference entries may be immutable. Multiple clones may reference the same set of data through a single reference entry.
- the contextual format of cloned data may be simplified to associate the data with a reference entry which, in turn, is associated with N other references through other persistent metadata (e.g., persistent notes 2866 ).
- Relocating cloned data may, therefore, comprise updating a single mapping between the reference entry and the new media address of the data.
- FIG. 28F depicts one embodiment of a storage layer 130 configured to implement an intermediate mapping layer.
- the metadata module 135 of the storage layer may comprise a forward index 2804 pertaining to the logical address space 136 that is exposed to the storage clients via the interface 138 .
- the metadata 2804 may include information pertaining to LID allocations, bindings between LIDs and media addresses, and so on.
- the metadata module 135 may further comprise a reference index 2809 comprising reference entries. As disclosed above, the reference entries may be used to reference cloned data.
- the translation module 134 may monitor reference entries of the reference index 2809 , and may remove reference entries that are no longer needed (e.g., are no longer being referenced by other entries in the forward index 2804 ).
- reference entries may be maintained in a separate portion of the storage metadata 135 (within a separate reference index 2809 ).
- the reference entries may be identified by use of reference identifiers, which may be maintained in a separate namespace from the index 2804 . Accordingly, the reference entries may be part of an intermediate, “virtual” or “reference” address space that is separate and distinct from the logical address space 136 exposed to the storage clients 116 .
- reference entries may be assigned LIDs selected from pre-determined ranges and/or portions of the logical address space 136 that are not directly accessible by storage clients 116 .
- a clone operation may comprise linking one or more LID entries in the logical address space 2804 to reference entries in the reference index 2809 .
- the reference entries may comprise the media address(es) of the cloned data. Accordingly, LIDs that are associated with cloned data may reference the cloned data indirectly through the reference index 2809 .
- Such entries may be referred to as “indirect entries.”
- an indirect entry refers to an entry in the index 2804 that references and/or is linked to a reference entry in the reference index 2809 . Indirect entries may be assigned a LID within the logical address space 136 , and may be accessible to the storage clients 116 .
- storage clients 116 may perform storage operations within one or more of the cloned ranges, which may cause the clones to diverge from one another (in accordance with the clone mode).
- changes made to a particular clone may not be reflected in the other cloned ranges.
- changes made to a clone may be reflected in “local” entries within an indirect entry.
- a “local entry” or “local LID” refers to a portion of an indirect entry that is directly mapped to one or more media addresses on the storage device 120 . Accordingly, local entries and/or local LIDs may be configured to reference data that has been changed in a particular clone and/or differs from the contents of other clones.
- the translation module 134 may be configured to access data associated with cloned data. In some embodiments, the translation module 134 is configured to determine the media addresses associated with an indirect entry by use of the corresponding reference entries in the reference index 2809 .
- the translation module 134 may further comprise a cascade lookup module 2855 configured to manage indirect entries that comprise local LIDs. The cascade lookup module 2855 may be configured to traverse local LIDs of indirect entries first and, if the LID is not found within local entries, the cascade lookup module 2855 may continue searching within the reference entries to which the indirect entry is linked.
- the log storage module 137 and groomer module 370 may be configured to manage the contextual format of cloned data.
- cloned data refers to data that is referenced by two or more LIDs and/or LID ranges within the index 2804 .
- the logical interface metadata stored with cloned data segments may correspond to a single reference entry as opposed to identifying each LID and/or LID range of the clone.
- Creating a clone may, therefore, comprise updating the contextual format of the cloned data in one or more background operations by use of, inter alia, the groomer module 370 .
- FIG. 28G depicts one embodiment of a clone operation using a reference index 2809 .
- an entry corresponding to logical identifier 10 extent 2 in the logical address space 136 may directly reference data at media address 20000 on the storage device 120 .
- Other entries are omitted from FIG. 28G to avoid obscuring the details of the disclosed embodiment.
- the storage controller 130 is configured to create a clone of the entry 10 , 2 at LID 400 (denoted 400 , 2 in FIG. 28G ).
- the storage controller 130 may be configured to create the clone in response to a request from a storage client 116 and/or as part of a higher-level operation, such as an atomic storage operation, snapshot, or the like.
- creating a clone of entry 10 , 2 may comprise creating an entry in the logical address space 136 (index 2804 ) to represent the clone.
- the clone may be created at LID 400 (entry 400 , 2 in FIG. 28G ).
- Creating the clone may further comprise creating an entry in the reference index 2809 through which the entries 10 , 2 and 400 , 2 may reference the cloned data at media address 20000.
- the reference entry may correspond to a particular portion of the logical address space 136 and/or may be part of a separate, reference address space.
- the reference entry is identified in FIG. 28G as entry 100000 , 2 .
- the clone operation may further comprise associating the entries 10 , 2 and 400 , 2 within the index 2804 with the reference entry 100000 , 2 as illustrated at state 2813 C.
- associating the entries 10 , 2 and 400 , 2 , with the reference entry 100000 , 2 may comprise indicating that the entries 10 , 2 and 400 , 2 are indirect entries.
- State 2813 C may further comprise storing a persistent note 2866 on the storage device 120 to associate the data at media address 20000 with the reference entry 100000 , 2 and/or to associate the entries 10 , 2 and 400 , 2 with the reference entry 100000 , 2 in the reference index 2809 .
- the storage layer 130 may provide access to the data at media address 20000 through either LID 10 or LID 400 (and by reference to the reference entry 100000 , 2 ).
- the translation module 134 may determine that the corresponding entry in the index 2804 is an indirect entry that is associated with an entry in the reference index 2809 .
- the cascade lookup module 2855 may determine the media address associated with the LID by use of local entries (if any) and the corresponding reference entry 100000 , 2 .
- the data stored at media address 20000 may be stored in a contextual format that is inconsistent with the clone configuration (e.g., the data may be associated with LID 10,2 as opposed to the reference entry 100000 , 2 and/or LID 400).
- the data may be stored in an updated contextual format (in state 2813 D) in one or more background and/or grooming operations.
- the data may be stored with persistent metadata that associates the data with the reference entry 100000 , 2 as opposed to the separate LID ranges 10,2 and 400,2. Relocating the cloned data may only require updating a single entry in the reference index 2809 as opposed to multiple entries corresponding to each LID that references the data (e.g., entries 10 , 2 and 400 , 2 ).
- any number of LIDs in the index 2804 may reference the cloned data, without increasing the size of the persistent metadata associated with the cloned data and/or complicating the operation of the groomer module 370 .
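- The difference in relocation cost can be sketched by counting index updates with and without a reference entry. Both functions and the dictionary layouts are hypothetical; the identifiers 10, 400, 100000, and 20000 follow FIG. 28G, and the new media address 30000 is an arbitrary example value.

```python
from typing import Dict


def relocate_direct(index: Dict[int, int], old_media: int, new_media: int) -> int:
    """Without a reference entry: every LID bound to the data must be updated."""
    updated = 0
    for lid, media in index.items():
        if media == old_media:
            index[lid] = new_media
            updated += 1
    return updated


def relocate_via_reference(reference_index: Dict[int, int], ref_id: int, new_media: int) -> int:
    """With a reference entry: only the reference-to-media binding changes."""
    reference_index[ref_id] = new_media
    return 1


# Two clones bound directly to media address 20000.
direct_index = {10: 20000, 400: 20000}
direct_updates = relocate_direct(direct_index, 20000, 30000)

# The same clones as indirect entries linked to reference entry 100000.
forward_index = {10: 100000, 400: 100000}
reference_index = {100000: 20000}
indirect_updates = relocate_via_reference(reference_index, 100000, 30000)
```

- The direct approach scales with the number of clones, while the indirect approach performs a single update regardless of how many LIDs are linked to the reference entry.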
- FIG. 28H depicts another embodiment of a clone operation implemented using a reference index 2890 .
- the storage layer 130 may be configured to create a reference entry 2891 in a designated portion of the index 2804 (e.g., the reference index 2890 ), or within a separate namespace.
- the reference entry 2891 may represent the cloned data segment 2812 . Any number of LIDs and/or LID ranges in the index 2804 may reference the data through the reference index entry 2891 .
- the reference entry 2891 may be bound to the media storage locations of the cloned data segment 2812 (media addresses 3453-4477).
- the entries 2894 and 2895 may reference the media addresses indirectly, through the reference entry 2891 .
- the reference entry 2891 may be assigned identifiers 0Z-1023Z. As disclosed above, the identifier(s) of the reference entry 2891 may correspond to a particular portion of the logical address space 136 or may correspond to a different, separate namespace.
- the storage layer links the entries 2894 and 2895 to the reference entry 2891 by use of, inter alia, metadata 2819 and 2829 . Alternatively, or in addition, the indirect entries 2894 and 2895 may replace media address information with references and/or links to the reference entry 2891 .
- the reference entry 2891 may not be directly accessible by storage clients 116 via the storage layer 130 and/or interface 138 .
- the clone operation may further comprise modifying the logical interface 2811 D of the data segment 2812 ; the modified logical interface 2811 D may allow the data segment 2812 to be referenced through the LIDs 1024-2048 of the indirect entry 2894 and/or the LIDs 6144-7168 of the indirect entry 2895 .
- Although the reference entry 2891 may not be used by storage clients 116 to reference the data segment 2812 , FIG. 28H depicts the reference entry 2891 as part of the modified logical interface 2811 D of the data segment 2812 , since the reference entry 2891 is used to access the data by the translation module 132 (through the indirect entries 2894 and 2895 ).
- Creating the clone may further comprise storing a persistent note 2866 on the storage device 120 .
- the persistent note 2866 may identify the reference entry 2891 associated with the data segment 2812 . Accordingly, the persistent note 2866 may associate the media addresses 64432-65456 with the identifier(s) of the reference entry 2891 .
- the clone operation may further comprise storing another persistent note 2867 configured to associate the LIDs of entries 2894 and 2895 (LIDs 1024-2048 and 6144-7168) with the reference entry 2891 .
- metadata pertaining to the association between entries 2894 and 2895 and the reference entry 2891 may be included in the persistent note 2866 .
- the persistent notes 2866 and/or 2867 may be retained on the storage device 120 until the data segment 2812 is relocated in an updated contextual format and/or the index 2804 (and/or reference index 2890 ) are persisted. As disclosed above, storage of the persistent note(s) 2866 and/or 2867 may ensure that the clone operation is persistent and crash safe.
- the modified logical interface 2811 D of the data segment 2812 may be inconsistent with the contextual format of the data 2898 A; the logical interface metadata 2865 A of the persistent metadata 2864 A may reference LIDs 1024-2048 rather than the identifiers of the reference entry 2891 and/or the cloned entry 2895 .
- the storage controller 140 may be configured to store the cloned data segment 2812 in an updated contextual format 2864 B that is consistent with the modified logical interface 2811 D; the logical interface metadata 2865 B of the persistent metadata 2864 B may associate the data segment 2812 with the reference entry 2891 , as opposed to separately identifying the LIDs within each cloned range (LIDs of entries 2894 and 2895 ).
- the use of the indirect entry 2894 allows the logical interface 2811 D of the data segment 2812 to comprise any number of LIDs, independent of size limitations of the contextual data format 2898 A-B (e.g., independent of the number of LIDs that can be included in the logical interface metadata 2865 ).
- additional logical copies of the reference entry 2891 may be made without updating the contextual format 2864 B of the data; such updates may be made by associating the LID ranges with the reference entry 2891 in the index 2804 and/or by use of, inter alia, persistent notes 2867 .
- the indirect entries 2894 and/or 2895 may initially reference the data segment 2812 through the reference entry 2891 .
- Storage operations performed after creating the clones 2894 and/or 2895 may be reflected by use of local LIDs within the respective entries 2894 and/or 2895 .
- FIG. 28I depicts one embodiment of the result of a storage operation pertaining to LIDs 1024-1052 performed after completing the clone operation of FIG. 28H .
- a storage client 116 may modify data associated with one or more of the clones. In the FIG. 28I embodiment, a storage client 116 modifies and/or overwrites data corresponding to LIDs 1024-1052 of entry 2894 , which may comprise appending a new data segment 2892 to the storage device 120 .
- the data segment 2892 may be stored in a contextual format 2898 comprising persistent metadata 2864 configured to associate the data segment 2892 stored at media addresses 7823-7851 with logical interface metadata 2865 (LIDs 1024-1052).
- the storage layer 130 may be configured to associate the data segment 2892 with the LIDs 1024-1052 in a local LID entry 2896 .
- the local LID entry 2896 may reference the updated data directly, as opposed to referencing the data through a reference entry (e.g., reference entry 2891 ).
- the cascade lookup module 2855 may search for references to the LIDs in a cascading lookup operation, which may comprise searching for references to local LIDs (if available) followed by the reference entries 2891 .
- the local entry 2896 may be used to satisfy requests pertaining to LIDs 1024-1052 (media addresses 7823-7851 rather than 64432-64460 per the reference entry 2891 ). Requests for LIDs that are not found in local entries (e.g., LIDs 1053-2048) may continue to be serviced through the reference entry 2891 .
- the storage layer 130 may use the indirect entry 2894 and reference entry 2891 to implement a cascade lookup for LIDs pertaining to the clone range 1024-2048.
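- The cascade lookup described above can be sketched as a two-step search: local entries first, then the linked reference entry. The `IndirectEntry` type and `cascade_lookup` function are hypothetical, reference identifiers are modeled as plain integers (0 standing in for 0Z), and the reference entry is assumed to bind identifier r to media address 64432 + r.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class IndirectEntry:
    first: int                 # first LID of the cloned range (inclusive)
    last: int                  # last LID of the cloned range (inclusive)
    ref_first: int             # reference identifier linked to `first`
    # Local entries: (first LID, last LID, media address bound to first LID)
    local: List[Tuple[int, int, int]] = field(default_factory=list)


def cascade_lookup(entry: IndirectEntry, reference_index: Dict[int, int], lid: int) -> Optional[int]:
    """Resolve a LID to a media address: local entries first, then the reference entry."""
    if not entry.first <= lid <= entry.last:
        return None
    for first, last, media in entry.local:
        if first <= lid <= last:
            return media + (lid - first)       # data changed in this clone only
    ref_id = entry.ref_first + (lid - entry.first)
    return reference_index.get(ref_id)         # data still shared with other clones


reference_index = {r: 64432 + r for r in range(1025)}
entry_2894 = IndirectEntry(first=1024, last=2048, ref_first=0)
entry_2894.local.append((1024, 1052, 7823))    # overwrite of LIDs 1024-1052

local_hit = cascade_lookup(entry_2894, reference_index, 1030)
shared_hit = cascade_lookup(entry_2894, reference_index, 1053)
```

- A LID inside the local entry resolves to the newly appended data, while a LID outside it falls through to the reference entry and resolves to the original cloned data.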
- the logical interface 2811 E of the data may, therefore, comprise one or more local entries 2896 and/or one or more indirect and/or reference entries.
- a storage client 116 may modify data of the clone through another one of the LIDs of the logical interface 2811 E (e.g., LIDs 6144-6162); the logical interface delimiters are not shown in FIG. 28J to avoid obscuring the details of the illustrated embodiment.
- the modified data may be referenced using a local entry 2897 , as disclosed above. Since each of the clones now has its own, respective version of the original clone data 0Z-52Z, neither clone references that portion of the reference entry 2891 .
- the storage layer 130 may determine that the corresponding clone data (and reference identifiers) are no longer being referenced, and may remove them (as depicted in FIG. 28J ). The clones may continue to diverge, until neither 2894 nor 2895 references any portion of the reference entry 2891 , at which point the reference entry 2891 may be removed.
- Although FIGS. 28I and 28J depict local entries 2896 and 2897 that overlap with the corresponding indirect entries 2894 and 2895 , the disclosure is not limited in this regard.
- the storage operation of FIG. 28I may be reflected by creating the local entry 2896 and modifying the indirect entry to reference LIDs 1053-2048.
- the operation of FIG. 28J may comprise creating the local entry 2897 and modifying the indirect entry to reference LIDs 6163-7168.
- each entry in the reference index 2809 comprises metadata that includes a reference count (not shown).
- the reference count may be incremented as new references or links to the reference entry are added, and may be decremented in response to removing references to the entry.
- reference counts may be maintained for each reference identifier in the reference index 2809 .
- reference counts may be maintained for reference entries as a whole. When the reference count of a reference entry reaches 0, the reference entry 2891 (or a portion thereof) may be removed from the reference index 2809 . Removing a reference entry (or portion of a reference entry) may comprise invalidating the corresponding data on the storage device 120 , as disclosed herein (indicating that the data no longer needs to be retained on the storage device 120 ).
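- Reference counting of reference entries can be sketched as follows. The `ReferenceIndex` class and its methods are hypothetical; counts are kept per reference identifier, and `unlink` returns the media address that may be invalidated once the last reference is removed.

```python
from typing import Dict, Optional


class ReferenceIndex:
    """Reference entries with a per-identifier reference count."""

    def __init__(self) -> None:
        self.media: Dict[int, int] = {}      # reference id -> media address
        self.refcount: Dict[int, int] = {}   # reference id -> number of links

    def add(self, ref_id: int, media_addr: int) -> None:
        self.media[ref_id] = media_addr
        self.refcount[ref_id] = 0

    def link(self, ref_id: int) -> None:
        """Called when an indirect entry starts referencing ref_id."""
        self.refcount[ref_id] += 1

    def unlink(self, ref_id: int) -> Optional[int]:
        """Drop one reference; return the media address to invalidate, if any."""
        self.refcount[ref_id] -= 1
        if self.refcount[ref_id] > 0:
            return None
        del self.refcount[ref_id]
        return self.media.pop(ref_id)        # entry removed, data may be invalidated


refs = ReferenceIndex()
refs.add(100000, 20000)
refs.link(100000)                            # indirect entry for LID 10
refs.link(100000)                            # indirect entry for LID 400
first_unlink = refs.unlink(100000)           # one clone diverges or is deleted
second_unlink = refs.unlink(100000)          # last reference is removed
```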
- the storage layer 130 may remove reference entries using a “mark-and-sweep” approach.
- the storage layer 130 (or other process, such as the translation module 134 and/or groomer 370 ) may periodically check references to entries in the reference index 2809 by, inter alia, following links to the reference entries from indirect entries (or other types of entries) in the index 2804 . Entries that are not referenced by any entries during the mark-and-sweep may be removed, as disclosed above.
- the mark-and-sweep may operate as a background process, and may periodically perform a mark-and-sweep operation to garbage collect reference entries that are no longer in use.
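- A mark-and-sweep pass over the reference index can be sketched as follows. The `mark_and_sweep` function and the dictionary layouts are hypothetical; the forward index maps each LID either to a reference identifier (indirect entry) or to `None` (entry that is bound directly to media).

```python
from typing import Dict, List, Optional


def mark_and_sweep(forward_index: Dict[int, Optional[int]],
                   reference_index: Dict[int, int]) -> List[int]:
    """Remove reference entries that no forward-index entry links to."""
    # Mark: follow links from indirect entries to the reference entries they use.
    marked = {ref for ref in forward_index.values() if ref is not None}
    # Sweep: drop every reference entry that was not marked.
    swept = sorted(ref for ref in reference_index if ref not in marked)
    for ref in swept:
        del reference_index[ref]
    return swept


forward_index = {10: 100000, 400: 100000, 500: None}
reference_index = {100000: 20000, 100002: 30000}
removed = mark_and_sweep(forward_index, reference_index)
```

- Unlike reference counting, this approach needs no bookkeeping on each link or unlink, at the cost of a periodic scan of the forward index.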
- the reference index disclosed in conjunction with FIGS. 28F-28J may be created on demand (e.g., in response to creation of a clone, or other indirect data reference).
- all data may be referenced through intermediate, two-layer mappings.
- storage clients 116 may allocate indirect, virtual identifiers (VIDs) in a virtual address space, which may be linked to and/or reference media addresses through an intermediate mapping layer, such as the logical address space 136 of the storage layer 130 . These embodiments may result in an additional mapping layer between storage clients 116 and the storage device(s) 120 .
- Storage clients may reference data using VIDs of a virtualized address space that map to logical identifiers of the logical address space 136 , which, in turn, are associated with media addresses on respective storage device(s) 120 .
- FIG. 28K depicts one embodiment of an indirection layer 2830 configured to implement cloning operations using a two-layer, virtualized address space.
- the indirection layer 2830 may be configured to present a virtual address space 2836 to the storage clients 116 .
- the indirection layer 2830 may implement the virtual address space 2836 using the same modules and/or interfaces for managing the logical address space 136 disclosed herein.
- the virtual address space 2836 may comprise 64-bit VIDs, which may be defined independently of the underlying logical address space 136 and/or storage device(s) 120 .
- the indirection layer 2830 may comprise VID metadata 2835 , which may comprise a VID index 2884 .
- the VID index 2884 may be implemented as the index 2804 disclosed herein.
- the indirection layer 2830 may further comprise a VID translation module 2834 configured to map VIDs to LIDs within the logical address space 136 of the storage layer 130 .
- the indirection layer 2830 may provide access to the virtual address space 2836 through the interface 2838 .
- the interface 2838 may comprise one or more of a block device interface, virtual storage interface, cache interface, and the like, as disclosed herein.
- the clone module 2831 may be configured to manage clone operations within the virtual address space 2836 .
- Although FIG. 28K depicts the indirection layer 2830 separately from the storage layer 130 , the disclosure is not limited in this regard.
- virtual address space 2836 , VID index 2884 , VID translation module 2834 , and/or the clone module 2831 may be implemented as part of the storage layer 130 .
- the VIDs of the virtual address space may be used to, inter alia, perform efficient cloning operations.
- the additional mapping layer may be leveraged to enable logical clone operations on random access, write-in-place storage devices 120 , such as hard disks.
- Storage clients 116 may perform storage operations in reference to VIDs of the virtual address space 2836 . Accordingly, storage operations may comprise two (or more) translation layers.
- the VID index 2884 may comprise a first translation layer between VIDs of the virtual address space 2836 and LIDs of the logical address space 136 .
- the index 2804 of the storage layer 130 may implement a second translation layer between the LIDs and media address(es) on respective storage devices 120 .
- the indirection layer 2830 may be configured to manage allocations within the virtual address space 2836 by use of, inter alia, the VID metadata 2835 , VID index 2884 , and/or VID translation module 2834 .
- the VID translation module 2834 may be configured to maintain associations between VIDs of the virtual address space 2836 and LIDs of the logical address space 136 (by use of the VID index 2884 ).
- allocating a VID in the virtual address space 2836 may comprise allocating one or more corresponding LIDs in the logical address space 136 . Accordingly, each VID allocated in the virtual address space 2836 may be mapped to one or more LIDs in the logical address space 136 .
- the mappings may be sparse and/or any-to-any, as disclosed herein.
- the logical address space 136 may not be directly accessible to the storage clients 116 (e.g., the logical address space 136 may be used as an intermediate mapping layer).
- Performing a storage operation through the indirection layer 2830 may comprise: a) identifying the LIDs corresponding to one or more VIDs referenced in the storage operation by use of the VID translation module 2834 and/or VID index 2884 ; and b) implementing the storage operation within the storage layer 130 in reference to the identified LIDs.
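- Steps a) and b) above can be sketched as two small classes. This is a simplified model under stated assumptions: the class names, the one-LID-per-VID allocation policy, and the starting LID value are hypothetical and chosen only to keep the example short.

```python
class StorageLayer:
    """Second translation layer: LID -> media address (toy append-only log)."""

    def __init__(self):
        self.index = {}   # LID -> media address
        self.media = []   # media address is the list position

    def write(self, lid, data):
        self.media.append(data)              # data is appended out-of-place
        self.index[lid] = len(self.media) - 1

    def read(self, lid):
        return self.media[self.index[lid]]


class IndirectionLayer:
    """First translation layer: VID -> LID."""

    def __init__(self, storage_layer, first_lid=100000):
        self.storage = storage_layer
        self.vid_index = {}      # VID -> LID
        self.next_lid = first_lid

    def allocate(self, vid):
        # Allocating a VID also allocates a corresponding LID.
        self.vid_index[vid] = self.next_lid
        self.next_lid += 1

    def write(self, vid, data):
        lid = self.vid_index[vid]        # a) identify the LID for the VID
        self.storage.write(lid, data)    # b) perform the operation on the LID

    def read(self, vid):
        return self.storage.read(self.vid_index[vid])


storage = StorageLayer()
layer = IndirectionLayer(storage)
layer.allocate(10)
layer.write(10, b"hello")
result = layer.read(10)
```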
- FIG. 28L depicts one embodiment of a clone operation using an indirection layer 2830 .
- the VID index 2884 may correspond to a virtual address space 2836 that is indirectly mapped to media addresses through the logical address space 136 of the storage layer 130 .
- the indirection layer 2830 may provide access to the virtual address space 2836 through an interface 2838 .
- Storage clients 116 may allocate portions of the virtual address space 2836 and/or perform storage operations using VIDs of the virtual address space through the indirection layer 2830 (and storage layer 130 ), as disclosed herein.
- the VID index 2884 may comprise an entry 10 , 2 that represents two VIDs ( 10 and 11 ) in the virtual address space 2836 .
- the VID translation module 2834 may be configured to map the VID entry 10 , 2 to LIDs within the logical address space 136 (using the VID index 2884 ).
- the VID index 2884 maps the VID entry 10 , 2 to the LID entry 100000 , 2 .
- the entry 10 , 2 may be allocated to a storage client 116 , which may perform storage operations in reference to the VIDs.
- the storage layer 130 may be configured to map the LID entry 100000 , 2 to one or more media addresses on the storage device 120 (media address 20000).
- the indirection layer 2830 is configured to implement a clone operation.
- the clone operation may comprise creating a clone of the VID entry 10 , 2 .
- the clone is identified as VID index entry 400 , 2 .
- the clone operation may further comprise associating the cloned entry 400 , 2 with corresponding LID entry 100000 , 2 in the VID index 2884 .
- the corresponding entry 100000 , 2 in the index 2804 may remain unchanged.
- a reference count (or other indicator) of the LID entry 100000 , 2 may be updated to indicate that the entry is being referenced by multiple VID entries.
- the contextual format of the data stored at media address 20000 may be left unchanged (e.g., continue to associate the data with the LID entry 100000 , 2 ).
- the clone operation may further comprise storing a persistent note 2866 and/or 2867 on the storage device 120 to persist the association between VID entry 400 , 2 and the LID entry 100000 , 2 .
- the clone operation may be made persistent and/or crash safe by persisting the VID index 2884 .
- the data at media address 20000 may be relocated to media address 40000.
- the relocation may occur as part of a standard grooming operation, rather than for the purpose of updating the contextual format of the cloned data.
- Relocating the data may comprise updating a single entry in the index 2804 .
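- The clone and relocation steps above can be traced with a short sketch that reuses the example values from the text (VID entry 10 , 2 ; LID entry 100000 , 2 ; media addresses 20000 and 40000). The tuple representation of entries and the function names are assumptions for illustration.

```python
# Entries are modeled as (start, count) tuples.
vid_index = {(10, 2): (100000, 2)}   # VID entry -> LID entry
index = {(100000, 2): 20000}         # LID entry -> media address
ref_counts = {(100000, 2): 1}        # VID entries referencing each LID entry


def clone_vid_entry(src, dst):
    """Clone a VID entry by pointing the new entry at the same LID entry."""
    lid_entry = vid_index[src]
    vid_index[dst] = lid_entry
    ref_counts[lid_entry] += 1       # the LID entry itself is unchanged


def relocate(lid_entry, new_media_address):
    """Grooming moved the data; only the LID entry is rebound."""
    index[lid_entry] = new_media_address


def resolve(vid_entry):
    """Translate a VID entry to a media address through both layers."""
    return index[vid_index[vid_entry]]


clone_vid_entry((10, 2), (400, 2))
before = (resolve((10, 2)), resolve((400, 2)))
relocate((100000, 2), 40000)
after = (resolve((10, 2)), resolve((400, 2)))
```

- Because both VID entries resolve through the same LID entry, the relocation touches one entry in the index regardless of how many clones exist.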
- the clone implementations disclosed herein may be used to efficiently implement storage operations, such as range clone operations, range move operations, snapshots, deduplication, atomic writes, and the like.
- an address range refers to a logical address range, a virtual address range, or the like.
- the storage layer 130 may comprise a deduplication module 374 configured to identify duplicate data on the storage device 120 and/or non-volatile storage media 122 .
- Duplicate data may be identified using any suitable mechanism.
- duplicated data is identified by scanning the contents of the storage device 120 , generating signature values for various data segments, and comparing data signature values to identify duplicates.
- the signature values may include, but are not limited to: cryptographic signatures, hash codes, cyclic codes, and/or the like.
- Signature information may be stored within storage metadata 135 , such as the index 2804 (e.g., in metadata associated with the entries) and/or may be maintained and/or indexed in one or more separate datastructures (not shown).
- the deduplication module 374 may compare data signatures and, upon detecting a signature match, may perform one or more deduplication operations.
- the deduplication operations may comprise verifying the signature match (e.g., performing a byte-by-byte data comparison), and performing one or more range clone operations to reference the duplicate data within two or more LIDs and/or LID ranges.
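- The signature comparison, verification, and range clone steps can be sketched as follows. SHA-256 is used here only as one example of a suitable signature; the function names and the dictionary-based index and media model are hypothetical.

```python
import hashlib


def find_duplicates(segments):
    """segments: dict of LID range -> bytes. Returns (kept, duplicate) pairs."""
    by_signature = {}
    pairs = []
    for lid_range, data in segments.items():
        signature = hashlib.sha256(data).digest()
        kept = by_signature.get(signature)
        # Verify the signature match with a full data comparison.
        if kept is not None and segments[kept] == data:
            pairs.append((kept, lid_range))
        else:
            by_signature.setdefault(signature, lid_range)
    return pairs


def deduplicate(index, media):
    """index: LID range -> media address; media: media address -> bytes."""
    segments = {lid_range: media[addr] for lid_range, addr in index.items()}
    for kept, duplicate in find_duplicates(segments):
        stale = index[duplicate]
        index[duplicate] = index[kept]   # range clone: share a single copy
        if stale not in index.values():
            del media[stale]             # the redundant copy may be groomed


index = {(1024, 2048): 3453, (6144, 7168): 7024}
media = {3453: b"duplicated segment", 7024: b"duplicated segment"}
deduplicate(index, media)
```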
- FIG. 28M depicts one embodiment of a deduplication operation.
- the index 2804 may comprise entries 2814 and 2884 , which may reference duplicated data (e.g., duplicated data segment 2812 ) stored at different respective media addresses 3453-4477 and 7024-8048.
- the entries 2814 and 2884 may correspond to different, respective logical interfaces 2814 M and 2884 M.
- the duplicated data segment 2812 may be identified and/or verified by the deduplication module 374 , as disclosed above.
- the duplicated data may be identified as data is received for storage at the storage layer 130 . Accordingly, the data may be deduplicated before an additional copy of the data is stored on the storage device 120 .
- the storage layer 130 may be configured to deduplicate the data, which may comprise creating one or more range clones.
- creating a range clone may comprise modifying the logical interface 2811 G of the duplicated data segment 2812 to associate a single version of the data segment 2812 with both sets of LIDs 1024-2048 and 6144-7168.
- the range clone may be implemented using any of the clone embodiments disclosed herein including the range clone embodiments of FIGS. 28A-E , the reference entry embodiments of FIGS. 28F-J , and/or the two-layer mapping embodiments of FIGS. 28K-L .
- both LID ranges 1024-2048 and 6144-7168 may be modified to reference a single version of the data segment 2812 (the other data segment 2812 may be removed and/or groomed from the storage device 120 ).
- the FIG. 28M embodiment uses the reference entry implementation of FIGS. 28F-J .
- the deduplication operation may comprise creating a reference entry 2891 to represent the deduplicated data segment 2812 (the cloned data).
- the deduplication operation may further comprise modifying and/or converting the entries 2814 M and 2884 M to indirect entries 2894 and 2895 , which may be mapped to the data segment 2812 through the reference entry 2891 , as disclosed above.
- the deduplication operations may comprise modifying the logical interface 2811 G to reference the data segment 2812 through both sets of LIDs 1024-2048 and 6144-7168 (as well as the reference entry 2891 ).
- the deduplication operations may further comprise storing a persistent note on the non-volatile storage media 122 to associate the data segment 2812 with the updated logical interface 2811 G (e.g., associate the data segment 2812 with the reference entry 2891 and/or the linked indirect entries 2894 and 2895 ), as disclosed herein.
- the deduplication operations may further comprise updating the contextual format of the data segment 2812 to be consistent with the modified logical interface 2811 G, as disclosed above. Updating the contextual format may comprise relocating (e.g., rewriting) the data segment 2812 in an updated contextual format 2898 to new media storage locations (e.g., media storage locations 84432-84556) in one or more background operations.
- the updated contextual format 2898 may comprise persistent metadata 2864 that includes logical interface metadata 2865 to associate the data segment 2812 with the reference entry 2891 (e.g., identifiers 0Z-1023Z).
- FIGS. 28A-G depict cloning and/or deduplicating a single entry or range of LIDs
- a plurality of LID ranges may be cloned in a single clone operation.
- a cloning operation may clone the entry 1214 along with all of its child entries.
- a clone operation may comprise copying the entire contents of the index 1204 (e.g., all of the entries in the index 1204 ).
- This type of clone operation may be used to create a “snapshot” of a logical address space 136 (or a particular LID range).
- a snapshot refers to the state of a storage device (or set of LIDs) at a particular point in time. The snapshot may persist the state of a logical address range despite changes to the original.
- FIG. 28N depicts one embodiment of a storage layer configured to perform snapshot operations.
- the FIG. 28N embodiment pertains to an address range within a logical address space 136 (logical address range 2904).
- the disclosure is not limited in this regard, however, and could be adapted for use with other types of address ranges, such as ranges and/or extents within the virtual address space 2836 , as disclosed above.
- the storage layer 130 may be configured to create a snapshot of the logical address range LAS 1 .
- a snapshot of an address range refers to an operation that is configured to maintain the state of the address range at a particular time (e.g., freeze the address range).
- the snapshot operation may comprise preserving the state of the (LAS 1 ) at a particular time.
- the snapshot operation may further comprise preserving the logical address range while allowing subsequent storage operations to be performed within the logical address range.
- the storage layer 130 may be configured to store data in an ordered log by use of, inter alia, the log storage module 137 .
- the log order of storage operations may be determined using sequence information associated with the data, such as sequence indicators on storage divisions 253 of a solid-state storage medium (e.g., logical storage element 229 of FIG. 3A ) and/or sequential storage locations within the physical address space of the storage device 120 (as disclosed in conjunction with FIG. 3C ).
- the storage controller 140 may be further configured to maintain other types of ordering and/or timing information, such as the relative time ordering of data in the log.
- the log order of data may not accurately reflect timing information.
- the groomer module 370 may be configured to relocate data on the storage device 120 . Relocating data may comprise reading the data from its original storage location on the storage device 120 and appending the data at a current append point in the log. As such, older, relocated data may be stored with newer, current data in the log.
- the log storage module 137 is configured to associate data with timing information, which may be used to establish relative timing information of the storage operations performed in the log.
- the timing information may comprise respective timestamps (maintained by the timing module 2862 ), which may be applied to each data packet stored in the log. The timestamps may be stored within persistent metadata 2864 of the data packets (e.g. in packet headers).
- the timing module 2862 may be configured to track timing information at a higher-level of granularity.
- the timing module 2862 maintains one or more global timing indicators (an epoch identifier).
- an “epoch identifier” refers to an identifier used to determine relative timing of storage operations performed through the storage layer 130 .
- the log storage module 137 may be configured to include an indicator 2869 of the current epoch identifier in the persistent metadata 2864 ; the epoch indicator 2869 may correspond to the epoch in which the data segment 2812 was written to the log.
- the timing module 2862 may be configured to increment the global epoch identifier in response to certain events, such as the creation of new snapshots, user requests, and/or the like.
- the epoch indicator 2869 of the data segment 2812 may remain unchanged through relocation and/or other grooming operations. Accordingly, the epoch indicator 2869 may correspond to the original storage time of the data segment 2812 independent of the relative position of the contextual data format (packet 2918 ) in the log.
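- A minimal sketch of epoch stamping follows, assuming a toy in-memory log. The class and method names are hypothetical. The point illustrated is that a relocated packet keeps its original epoch indicator even though its position in the log changes.

```python
class EpochLog:
    """Toy append-only log whose packets carry an epoch indicator."""

    def __init__(self):
        self.epoch = 0       # global epoch identifier
        self.packets = []    # log order is list order

    def append(self, lid, data):
        # Each packet is stamped with the epoch current at write time.
        self.packets.append({"lid": lid, "data": data, "epoch": self.epoch})
        return len(self.packets) - 1

    def increment_epoch(self):
        self.epoch += 1
        return self.epoch

    def relocate(self, position):
        # Grooming re-appends the packet at the head of the log; the
        # epoch indicator is copied unchanged.
        packet = self.packets[position]
        self.packets[position] = None    # old storage location reclaimed
        self.packets.append(dict(packet))
        return len(self.packets) - 1


log = EpochLog()
first = log.append(5, b"x")       # written during epoch 0
log.increment_epoch()             # e.g., a snapshot was created
second = log.append(6, b"y")      # written during epoch 1
moved = log.relocate(first)       # now later in the log than `second`
```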
- a snapshot operation may comprise preserving the state of a particular logical address range (LAS 1 ) at a particular time.
- a snapshot operation may, therefore, comprise preserving data pertaining to the LAS 1 on the storage device 120 .
- Preserving the data may comprise a) identifying data pertaining to a particular timeframe (epoch), and b) preserving the identified data on the storage device 120 (e.g., preventing the identified data being removed from the storage device 120 in, inter alia, grooming operations).
- Data that needs to be preserved for a particular snapshot may be identified by use of the epoch indicators 2869 disclosed above.
- the storage layer 130 may receive a request to implement a snapshot operation through the interface 138 .
- the snapshot module 2860 may determine the current value of the epoch identifier maintained by the timing module 2862 .
- the current value of the epoch identifier may be referred to as the current “snapshot epoch.”
- the snapshot epoch is 0.
- the snapshot module 2860 may be further configured to cause the timing module 2862 to increment the current, global epoch indicator (e.g., increment the epoch identifier to 1).
- Creating the snapshot may further comprise storing a persistent note 2966 on the storage device 120 .
- the persistent note 2966 may indicate the current, updated epoch indicator, and may further indicate that data pertaining to the snapshot epoch is to be preserved.
- the persistent notes 2966 may be used during a metadata reconstruction operation to a) determine the current epoch identifier and b) configure the snapshot module 2860 and/or groomer module 370 to preserve data associated with the snapshot epoch e 0 .
- the snapshot module 2860 may be further configured to instruct the groomer 370 to preserve data associated with the snapshot epoch.
- the groomer 370 may be configured to a) identify data to preserve for the snapshot (snapshot data), and b) prevent the identified data from being removed from the storage device 120 in, inter alia, grooming operations (e.g., storage recovery operations).
- the groomer module 370 may identify snapshot data in reference to the epoch indicators 2869 associated with the data. As disclosed in conjunction with FIG. 3C , data may be written out-of-place on the storage device 120 .
- the most current version of data associated with a particular LID (or LID range) may be determined based on the order of the corresponding data packets 2918 within the log.
- the groomer 370 may be configured to identify the most current version of data within the snapshot epoch as data that needs to be preserved. Data that has been rendered obsolete by other data in the snapshot epoch may be removed. Referring to the FIG. 3C embodiment, if the data A and A′ (associated with the same LIDs) were both marked with the snapshot epoch 0 , the groomer module 370 would identify the most current version of the data in epoch 0 as A′, and would identify the data A for removal.
- the snapshot module 2860 may be configured to preserve data pertaining to the snapshot LAS 1 (data associated with epoch e 0 ), while allowing storage operations to continue to be performed during subsequent epochs (e.g., epoch e 1 ).
- the storage operations may comprise storing data on the storage device.
- the data may be stored with an indicator of the current epoch (e 1 ).
- the snapshot module 2860 may be configured to preserve data that is rendered obsolete and/or invalidated by storage operations performed during epoch e 1 (and subsequent epochs). Referring back to the FIG.
- the groomer module 370 may identify data A′ as data to preserve for the snapshot LAS 1 (the data A′ may be the most current version within epoch e 0 ).
- the snapshot module 2860 and/or groomer 370 may be configured to preserve the data A′ even if the corresponding LIDs are trimmed and/or deleted during epoch e 1 .
- the data A′ may be preserved in response to overwriting the data with a new version A′′ during epoch e 1 .
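- The groomer's selection of data to preserve can be sketched with the A, A′, A′′ example. The sketch assumes that list position reflects write order (no relocation) and interprets "data pertaining to a snapshot" as the newest version of each LID written at or before the snapshot epoch; the function name and tuple layout are hypothetical.

```python
def positions_to_preserve(log, snapshot_epochs):
    """Return log positions that grooming must not reclaim.

    log: list of (lid, epoch, data) tuples, oldest first.
    snapshot_epochs: iterable of epochs that have snapshots.
    """
    keep = set()
    # One pass for the live view (no epoch limit), then one per snapshot.
    for limit in [None, *snapshot_epochs]:
        newest = {}
        for position, (lid, epoch, _data) in enumerate(log):
            if limit is None or epoch <= limit:
                newest[lid] = position   # later entries supersede earlier
        keep.update(newest.values())
    return keep


log = [
    (7, 0, "A"),     # obsoleted within epoch 0
    (7, 0, "A'"),    # most current version within epoch 0
    (7, 1, "A''"),   # overwrite performed during epoch 1
]
keep = positions_to_preserve(log, {0})
removable = set(range(len(log))) - keep
```

- Here A′ is retained for the snapshot even though A′′ has superseded it in the live address range, while A remains eligible for removal.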
- the snapshot for LAS 1 (data marked with epoch indicator e 0 ) may be preserved until it is deleted.
- the snapshot may be deleted in response to a request received through the interface 138 .
- the epoch 0 may persist on the storage device 120 even after other, intervening epochs (epochs e 1 -eN) have been created and/or deleted. Deleting the epoch e 0 may comprise configuring the snapshot module 2860 and/or groomer module 370 to remove invalid/obsolete data associated with the epoch e 0 .
- the storage operations performed after creating the snapshot at 2873 A may modify the logical address space 136 and specifically, the index 2804 .
- the modifications may comprise updating LID-to-media address bindings in response to appending data to the storage device 120 , adding LIDs, removing and/or trimming LIDs, and so on.
- the snapshot module 2860 is configured to preserve the LAS 1 index in a separate storage location, such as a separate location in the logical address space 136 , in a separate namespace, or the like.
- the snapshot module 2860 may allow the changes to take place in the index 2804 without preserving the original version of the index 2804 LAS 1 at time t 1 .
- the snapshot module 2860 may be configured to reconstruct the index 2804 for LAS 1 at time t 1 using the data stored in the contextual, log-based data format on the storage device 120 .
- the LAS 1 at time t 1 may be reconstructed as disclosed above, which may comprise sequentially accessing data stored on the storage device 120 (in a log-order), and creating index entries based on persistent metadata 2864 associated with the data packet 2918 .
- the LAS 1 may be reconstructed by referencing data packets 2918 that are marked with the epoch indicator 2869 e 0 (or lower). Data associated with epoch indicators 2869 greater than e 0 may be ignored (since such data corresponds to operations after creation of the snapshot LAS 1 ).
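- Reconstructing the index for a snapshot from the log can be sketched as a single sequential pass that skips packets from later epochs. The packet fields and function name are assumptions for illustration, and trim or delete records are omitted for brevity.

```python
def reconstruct_index(log, snapshot_epoch):
    """Rebuild LID -> media address bindings as of the snapshot epoch.

    log: list of packet dicts in log order, each with hypothetical keys
         "lid", "epoch", and "media_address".
    """
    index = {}
    for packet in log:
        if packet["epoch"] > snapshot_epoch:
            continue   # written after the snapshot was created; ignore
        # Later packets in log order replace earlier bindings.
        index[packet["lid"]] = packet["media_address"]
    return index


log = [
    {"lid": 1, "epoch": 0, "media_address": 10},
    {"lid": 2, "epoch": 0, "media_address": 11},
    {"lid": 1, "epoch": 1, "media_address": 12},   # overwrite after snapshot
    {"lid": 3, "epoch": 1, "media_address": 13},   # new LID after snapshot
]
snapshot_index = reconstruct_index(log, snapshot_epoch=0)
current_index = reconstruct_index(log, snapshot_epoch=1)
```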
- FIG. 28O depicts one example of a move operation.
- the index 2804 includes entries 2877 configured to bind LIDs 1023-1025 to respective data segments on the storage device.
- the entries 2877 are depicted separately to better illustrate details of the embodiment, however, the entries 2877 could be included in a single entry comprising a range of LIDs 1023-1025.
- the entries 2877 define a logical interface 2811 A of the data stored at media storage locations 32, 3096, and 872.
- data of the entries 2877 may be stored in a contextual format that associates the data with the corresponding LIDs 1023, 1024, and 1025.
- the storage layer 130 may be configured to implement a move operation.
- the move operation may comprise modifying the logical interface to the data 2811 B by, inter alia, replacing the association between the LIDs 1023, 1024, and 1025 and the data at the respective media storage locations 32, 3096, and 872, with a new logical interface 2811 B for the data that includes a new set of LIDs (e.g., 9215, 9216, and 9217).
- the move operation may be performed in response to a request received via the interface 138 and/or as part of a higher-level storage operation (e.g., a request to rename a file, operations to balance and/or defragment the index 2804 , or the like).
- the move operation may be implemented in accordance with one or more of the cloning embodiments disclosed above.
- the move operation may comprise associating the media addresses mapped to LIDs 1023, 1024, and 1025 with the destination LIDs 9215, 9216, and 9217, which may result in modifying the logical interface 2811 B of the data in accordance with the move operation.
- the move operation may further comprise storing a persistent note 2866 on the storage device 120 to ensure that the move operation is persistent and crash safe.
- the data stored at media addresses 32, 872, and 3096 may be re-written in accordance with the updated logical interface 2811 B in one or more background operations, as disclosed above.
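- A minimal sketch of the move operation of FIG. 28O follows, using the LIDs and media storage locations from the text. The function name and the list standing in for persistent notes are hypothetical; a real implementation would write the note to the storage device.

```python
def move_range(index, persistent_notes, src_lids, dst_lids):
    """Rebind the media addresses of src_lids to dst_lids."""
    if len(src_lids) != len(dst_lids):
        raise ValueError("source and destination ranges differ in length")
    for src, dst in zip(src_lids, dst_lids):
        index[dst] = index.pop(src)   # the data itself is not rewritten
    # Record the new logical interface so the move survives a crash even
    # though the on-media contextual format still names the old LIDs.
    persistent_notes.append(
        {"op": "move", "src": list(src_lids), "dst": list(dst_lids)}
    )


index = {1023: 32, 1024: 3096, 1025: 872}
persistent_notes = []
move_range(index, persistent_notes, [1023, 1024, 1025], [9215, 9216, 9217])
```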
- FIG. 28P depicts another embodiment of a move operation.
- the move operation may comprise moving the data associated with LIDs 1023-1025 to LIDs 9215-9217.
- the move operation of FIG. 28P may utilize the reference entries as disclosed in FIGS. 28F-J .
- the move operation may comprise creating reference entries 2899 in a reference index 2809 to represent the logical move operation.
- the reference entries 2899 may comprise the pre-move LIDs 1023, 1024, and 1025, which may be associated with the addresses 32, 3096, and 872.
- the new logical interface 2811 B of the data may, therefore, comprise the indirect LID entries 2879 and the corresponding reference entries 2899 .
- the move operation may further comprise storing a persistent note 2866 on the storage device 120 to ensure that the move operation is persistent and crash safe.
- the contextual format of the data on the media addresses 32, 3096, and 872 may be inconsistent with the updated logical interface 2811 B; the contextual format of the data may associate the respective data segments with LIDs 1023, 1024, and 1025 as opposed to 9215, 9216, and 9217.
- the persistent note 2866 may comprise the updated logical interface for the data, so that the storage metadata 135 (e.g., index 2804 ) can be correctly reconstructed if necessary.
- the storage layer 130 may provide access to the data in the inconsistent contextual format through the modified logical interface 2811 B (LIDs 9215, 9216, and 9217).
- the data may be rewritten and/or relocated in a contextual format that is consistent with the modified logical interface 2811 B subsequent to the move operation (outside of the path of the move operation and/or other storage operations).
- the data at media addresses 32, 3096, and/or 872 may be rewritten by a groomer module 370 in one or more background grooming operations, as described above. Therefore, the move operation may complete (and/or return an acknowledgement) in response to updating the index 2804 (and/or storing the persistent note 2866 ).
- the index 2804 may be updated in response to storing data in the consistent contextual format.
- the data segment 2823 at media storage location 32 may be relocated in a grooming operation, which may comprise storing the data in a contextual format 2883 that is consistent with the modified logical interface 2811 B of the move operation (e.g., includes persistent metadata 2864 comprising the logical interface 2865 that associates the data segment 2823 with LID 9215).
- the index 2804 may be updated to reference the data in the updated contextual format, which may comprise modifying the entry for LID 9215, such that it no longer is linked to the reference entry for 1023.
- the entry for LID 9215 may revert from an indirect node to a standard index entry and the reference entry for LID 1023 may be removed.
- a storage client 116 may modify data associated with LID 9217, which may comprise storing the modified data, out-of-place (at media address 772).
- the data may be written in a contextual format that is consistent with the modified logical interface 2811 B (e.g., associates the data with LID 9217).
- the index 2804 may be updated to associate the entry 9217 with the media storage location of the modified data (e.g., media storage location 772), and to remove the reference entry for LID 1025, as disclosed above.
- the reference index 2809 may be maintained separately from the index 2804 , such that the entries therein (e.g., entries 2899 ) cannot be directly referenced by storage clients 116 .
- This segregation of the logical address space 136 may allow storage clients to operate more efficiently. For example, rather than stalling operations until data is rewritten and/or relocated in the updated contextual format, data operations may proceed while the data is rewritten in one or more processes outside of the path for servicing storage operations and/or requests. Referring to FIG. 28S , following the move operation disclosed above, a storage client 116 may store data in connection with the LID 1024.
- the reference entry 2899 corresponding to LID 1024 may be included in the reference index 2809 , due to inter alia the data at 3096 not yet being rewritten in the updated contextual format. However, since the reference index 2809 is maintained separately from the index 2804 , a name collision may not occur, and the storage operation may complete.
- the index 2804 may include a separate entry 2857 comprising the logical interface for the data stored at media storage location 4322, while continuing to provide access to the data formerly bound to 1024 through the logical interface 2811 B (by use of the reference index 2809 ).
- When the entries 2879 no longer reference any entries in the reference index 2809 (due to, inter alia, rewriting, relocating, modifying, deleting, and/or overwriting the data), the last of the reference entries 2899 may be removed, and the entries 2879 may no longer be linked to reference entries in the reference index 2809 .
- the persistent note associated with the move operation may be invalidated and/or removed from the storage device 120 , as disclosed above.
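- The reference-entry variant of the move operation (FIG. 28P onward) can be sketched as follows. The dictionary layout and function names are assumptions for illustration; the LIDs and media addresses 772 and 4322 come from the text.

```python
def move_with_reference_entries(index, reference_index, src_lids, dst_lids):
    """Move by turning destination LIDs into indirect entries."""
    for src, dst in zip(src_lids, dst_lids):
        # The pre-move LID becomes a reference entry holding the address.
        reference_index[src] = index.pop(src)["media"]
        index[dst] = {"ref": src}


def resolve(index, reference_index, lid):
    entry = index[lid]
    if "ref" in entry:
        return reference_index[entry["ref"]]
    return entry["media"]


def rebind(index, reference_index, lid, new_media_address):
    """Data for lid was rewritten in a consistent contextual format."""
    entry = index[lid]
    if "ref" in entry:
        del reference_index[entry["ref"]]     # reference entry not needed
    index[lid] = {"media": new_media_address}  # back to a standard entry


index = {1023: {"media": 32}, 1024: {"media": 3096}, 1025: {"media": 872}}
reference_index = {}
move_with_reference_entries(
    index, reference_index, [1023, 1024, 1025], [9215, 9216, 9217]
)
rebind(index, reference_index, 9217, 772)   # client modifies LID 9217
index[1024] = {"media": 4322}               # client stores new data at LID 1024
```

- Because the reference entries live in a separate dictionary, the new entry for LID 1024 does not collide with the reference entry that still backs LID 9216.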
- the interface 138 of the storage layer 130 may be configured to provide APIs and/or interfaces for performing the storage operations disclosed herein.
- the APIs and/or interfaces may be exposed through one or more of the block interface 131 , an extended virtual storage interface 132 , and/or the like.
- the block interface 131 may be extended to include additional APIs and/or functionality by use of interface extensions such as fadvise parameters, I/O control parameters, and the like.
- the interface 138 may provide APIs to perform range clone operations, range move operations, range merge operations, apply attributes and/or metadata to ranges (e.g., freeze a range), manage range snapshots, and the like.
- a range clone operation comprises creating a logical copy of a set of one or more source LIDs.
- Range clone operations may be implemented using any of the embodiments disclosed herein including, but not limited to: the range clone embodiments depicted in FIGS. 28A-E (including the range merge embodiment of FIG. 28E ), the reference entry embodiments of FIGS. 28F-J , and/or the two-layer mapping embodiments of FIGS. 28K-L .
- the disclosed embodiments may be further configured to implement range move operations.
- the lower-level interfaces disclosed herein may be used to implement higher-level operations, such as deduplication, file-level snapshots, efficient file copy operations (logical file copies), address space management, mmap checkpoints, atomic writes, and the like. These higher-level operations may also be exposed through the interface 138 of the storage layer 130 .
- FIG. 29A depicts one embodiment of a storage layer 130 configured to provide storage services to a file system 2916 .
- the file system 2916 may be configured to leverage functionality of the storage layer 130 to reduce complexity, overhead, and the like. For example, the file system 2916 may delegate crash recovery functionality to the storage layer 130 , as disclosed above.
- the file system 2916 may be further configured to leverage the range clone functionality of the storage layer to implement efficient file-level snapshot and/or copy operations.
- the file system 2916 may be configured to implement such operations in response to a request (e.g., a copy command, a file snapshot ioctl, or the like).
- the file system 2916 may be configured to implement efficient file copy and/or file-level snapshot operations on a source file by, inter alia: a) flushing dirty pages of the source file (if any), b) creating a new destination file to represent the copied file and/or file-level snapshot, and c) instructing the storage module to perform a range clone operation configured to clone the source file to the destination file.
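- The three steps a) through c) can be sketched with a toy file system. All names here are hypothetical, and pages, LIDs, and media addresses are modeled with plain Python containers rather than any real file system interface.

```python
class ToyFileSystem:
    def __init__(self):
        self.files = {}    # file name -> list of LIDs, one per page
        self.index = {}    # LID -> media address
        self.media = []    # append-only media; address is list position
        self.dirty = {}    # (file name, page number) -> unflushed bytes
        self.next_lid = 0

    def create(self, name, pages):
        self.files[name] = list(range(self.next_lid, self.next_lid + pages))
        self.next_lid += pages

    def write_page(self, name, page, data):
        self.dirty[(name, page)] = data    # buffered until flushed

    def flush(self, name):
        for key in [k for k in self.dirty if k[0] == name]:
            self.media.append(self.dirty.pop(key))
            self.index[self.files[name][key[1]]] = len(self.media) - 1

    def range_clone(self, src_lids, dst_lids):
        for src, dst in zip(src_lids, dst_lids):
            if src in self.index:
                self.index[dst] = self.index[src]   # share, do not copy

    def copy(self, src, dst):
        self.flush(src)                                     # a) flush dirty pages
        self.create(dst, len(self.files[src]))              # b) create destination
        self.range_clone(self.files[src], self.files[dst])  # c) range clone

    def read_page(self, name, page):
        return self.media[self.index[self.files[name][page]]]


fs = ToyFileSystem()
fs.create("src", 2)
fs.write_page("src", 0, b"p0")
fs.write_page("src", 1, b"p1")
fs.copy("src", "dst")
```

- After the copy, both files resolve to the same two media locations; no file data is duplicated.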
- FIG. 29A depicts various embodiments for implementing the range clone operation.
- the storage layer 130 may be configured to maintain a logical address space 136 in which LIDs of the source file are mapped to source file data in the index 2804 (e.g., as disclosed in FIGS. 28A-E ).
- the corresponding range clone operation depicted in state 2911 B may comprise mapping the LIDs of the source file and the destination file to the source file data.
- the range clone operation may further comprise storing a persistent note 2866 on the storage device 120 to indicate that the file data is associated with both the source file LIDs and the destination file LIDs.
- the range clone operation may further comprise storing the file data in accordance with the updated contextual format, as disclosed herein.
- the clone operation may leverage a reference index 2809 (e.g., as disclosed in FIGS. 28F-J ).
- the LIDs of the source file may be directly mapped to the corresponding file data in the index 2804 .
- Creating the range clone in state 2911 D may comprise creating a reference entry in the reference index 2809 and associating the source file LIDs and destination file LIDs with the reference entry.
- the range clone operation may further comprise storing a persistent note 2866 on the storage device and/or updating the contextual format of the file data, as disclosed herein.
- the storage layer 130 may implement the clone operation using a two-layer mapping embodiment (e.g., as disclosed in FIGS. 28K-L ).
- the source file may correspond to virtual identifiers (VIDs) in a virtual address space (VID index 2884 ), which may be mapped to file data LIDs in the logical address space 136 (in the index 2804 ).
- Performing the range clone operation may comprise associating the destination file VIDs with the LIDs of the intermediate mapping layer.
- the range clone operation may further comprise storing a persistent note on the storage device 120 indicating that the destination VIDs are associated with the file data LIDs. Since the file data is already bound to the intermediate file data LIDs, the contextual format of the file data may not need to be updated.
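The two-layer mapping embodiment above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class `TwoLayerIndex`, its method names, and the tuple used for the persistent note are hypothetical and do not come from the disclosure.

```python
class TwoLayerIndex:
    """Toy two-layer mapping: VID -> file data LID -> media address (hypothetical model)."""

    def __init__(self):
        self.vid_to_lid = {}    # stands in for the VID index
        self.lid_to_media = {}  # stands in for the index of the logical address space
        self.notes = []         # stands in for persistent notes appended to the device

    def write(self, vid, lid, media_addr):
        # Bind a VID to an intermediate LID, and the LID to stored data.
        self.vid_to_lid[vid] = lid
        self.lid_to_media[lid] = media_addr

    def range_clone(self, src_vids, dst_vids):
        # Associate destination VIDs with the same intermediate LIDs.
        for src, dst in zip(src_vids, dst_vids):
            self.vid_to_lid[dst] = self.vid_to_lid[src]
        # One note records the new association; the data itself is not rewritten.
        self.notes.append(("clone", tuple(src_vids), tuple(dst_vids)))

    def read(self, vid):
        return self.lid_to_media[self.vid_to_lid[vid]]
```

Because the data stays bound to the intermediate LIDs, the clone in this sketch touches only the VID layer, which mirrors the observation that the contextual format may not need to be updated.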
- the file system 2916 may be further configured to leverage the storage layer 130 to checkpoint mmap operations.
- an “mmap” operation refers to an operation in which the contents of files are accessed as pages of memory through standard load and store operations rather than the standard read/write interfaces provided by the file system 2916 .
- An “msync” operation refers to an operation to flush the dirty pages of the file (if any) to the storage device 120 .
- the use of mmap operations may make file checkpointing difficult. File operations are performed in memory and an msync is issued when the state has to be saved. However, the state of the file after msync represents the current in-memory state and the last saved state is lost. If the file system 2916 were to crash during an msync, the file could be left in an inconsistent state.
- the file system 2916 is configured to checkpoint the state of an mmap-ed file during calls to msync.
- Checkpointing the file may comprise creating a file-level snapshot (and/or range clone), as disclosed above.
- the file-level snapshot may be configured to save the state of the file before the changes are applied.
- another clone may be created to reflect the changes applied in the msync operation.
- file 1 may be associated with LIDs 10-13 and corresponding media addresses P1-P4.
- the file system 2916 may perform a range clone operation through the interface 138 of the storage layer 130 , which may comprise creating a cloned file 1 . 1 .
- the cloned file 1 . 1 may be associated with a different set of LIDs 40-43 that reference the same data (same media addresses P1-P4).
- the files may be cloned using a reference entry embodiment and/or two-layer mapping embodiment, as disclosed above.
- the file system 2916 may perform another range clone operation (through the interface 138 ).
- the range clone operation associated with the msync operation may comprise updating the file 1 with the contents of one or more dirty pages (media addresses P5 and P6) and cloning the updated file 1 as file 1 . 2 .
- the file 1 . 1 may reflect the state of the file before the msync operation.
- the file system 2916 may be capable of reconstructing the previous state of the file 1 .
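The checkpoint sequence above (file 1 at LIDs 10-13, clone 1.1 at LIDs 40-43, then an msync that writes P5 and P6) can be walked through with a toy forward index. The class `CloneIndex`, the choice of which two pages are dirty, and the LIDs 50-53 used for file 1.2 are assumptions made for illustration only.

```python
class CloneIndex:
    """Toy forward index: LID -> media address (hypothetical model)."""

    def __init__(self):
        self.index = {}
        self.notes = []  # stands in for persistent notes

    def range_clone(self, src_base, dst_base, count):
        # Map the destination LIDs to the same media addresses as the source.
        for i in range(count):
            self.index[dst_base + i] = self.index[src_base + i]
        self.notes.append(("clone", src_base, dst_base, count))


idx = CloneIndex()
idx.index.update({10: "P1", 11: "P2", 12: "P3", 13: "P4"})  # file 1
idx.range_clone(10, 40, 4)  # file 1.1: state saved before the msync is applied
idx.index[10] = "P5"        # msync writes dirty pages out-of-place (assumed pages)
idx.index[11] = "P6"
idx.range_clone(10, 50, 4)  # file 1.2: state after the msync (assumed LIDs 50-53)
```

After these steps, LIDs 40-43 still reference P1-P4, so the pre-msync state of file 1 remains reachable in this model.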
- the storage layer 130 may be further configured to implement efficient atomic storage operations.
- the storage layer 130 comprises an atomic storage module 2932 .
- an atomic storage operation refers to a storage operation that is either fully completed, or is rolled back as a whole. Accordingly, atomic storage operations may not remain in a “partially completed” state.
- Implementing atomic storage operations, and particularly, atomic storage operations comprising multiple steps and/or pertaining to multiple different LID ranges or vectors may impose high overhead costs. For example, some database systems implement atomic storage operations using multiple sets of redundant write operations.
- the atomic storage module 2932 may leverage the range clone, range move, and/or other operations disclosed herein to increase the efficiency of atomic storage operations.
- the interface 138 provides APIs and/or interfaces for performing vectored atomic storage operations.
- a vector may be defined as a data structure, such as:
- the iov_base parameter may reference a memory or buffer location comprising data of the vector
- iov_len may refer to a length or size of the data buffer
- dest_lid may refer to the destination logical identifier(s) for the vector (e.g., base logical identifier, the length of the logical identifier range may be implied and/or derived from the input buffer iov_len).
- a vector storage request to write data to one or more vectors may, therefore, be defined as follows:
- vector_write ( int fileids, const struct iovect *iov, uint32 iov_cnt, uint32 flag)
- the vector write operation above may be configured to gather data from each of the vector data structures referenced by the *iov pointer and/or specified by the vector count parameter (iov_cnt), and write the data to the destination logical identifier(s) specified in the respective iovect structures (e.g., dest_lid).
- the flag parameter may specify whether the vector write operation should be implemented as an atomic vector operation.
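The gather-and-write behavior described above can be sketched as follows. This is a hedged illustration, not the interface of the storage layer: the `IoVect` dataclass mirrors the three fields named in the text, while the `ATOMIC` flag value, the dictionary `store` used in place of `fileids`, and the 512-byte block size are assumptions.

```python
from dataclasses import dataclass

ATOMIC = 0x1  # hypothetical flag bit; the source does not define flag values


@dataclass
class IoVect:
    iov_base: bytes  # buffer holding the vector's data
    iov_len: int     # number of bytes of iov_base to write
    dest_lid: int    # base destination LID; the range length derives from iov_len


def vector_write(store, iov, iov_cnt, flag=0, block_size=512):
    """Gather data from the first iov_cnt vectors and write it to their LIDs."""
    staged = {}
    for vec in iov[:iov_cnt]:
        data = vec.iov_base[:vec.iov_len]
        # Split the buffer into blocks and assign consecutive LIDs from dest_lid.
        for n, off in enumerate(range(0, len(data), block_size)):
            staged[vec.dest_lid + n] = data[off:off + block_size]
    # This toy always stages first and applies in one step, so the flag
    # (e.g., ATOMIC) is accepted but does not change behavior here.
    store.update(staged)
    return len(staged)
```

For example, a 1024-byte vector with `dest_lid` 10 and a 512-byte vector with `dest_lid` 36 would populate LIDs 10, 11, and 36 in this model.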
- a vector storage request may comprise performing the same operation on each of a plurality of vectors (e.g., implicitly perform a write operation pertaining to one or more different vectors).
- a vector storage request may specify different I/O operations for each constituent vector.
- each iovect data structure may comprise a respective operation indicator.
- the iovect structure may be extended as follows:
- the iov_flag parameter may specify the storage operation to perform on the vector.
- the iov_flag may specify any suitable storage operation, which may include, but is not limited to, a write, a read, an atomic write, a trim or discard request, a delete request, a format request, a patterned write request (e.g., request to write a specified pattern), a write zero request, an atomic write operation with verification request, an allocation request, or the like.
- the vector storage request interface described above may be extended to accept vector structures:
- vector_request ( int fileids, const struct iovect *iov, uint32 iov_cnt, uint32 flag)
- the flag parameter may specify whether the vector operations of the vector_request are to be performed atomically.
- the atomic storage module 2932 may be configured to redirect storage operations pertaining to an atomic storage operation to a pre-determined range (an “in-process” range).
- the in-process range may be a designated portion of the logical address space 136 that is not accessible to the storage clients 116 .
- the in-process range may be implemented in a separate address namespace.
- the atomic storage module 2932 may perform an atomic range move operation to move the data from the in-process range to the destination range(s).
- the range move operation may comprise writing a single persistent note 2866 to the storage device 120 .
- a storage client 116 may issue an atomic write request pertaining to vectors 2940 A and 2940 B.
- the LIDs 10-13 of vector 2940 A may be bound to media addresses P1-P4 and the LIDs 36-38 of vector 2940 B may be bound to media addresses P6-P8.
- the atomic storage module 2932 may be configured to redirect the atomic storage operations to an in-process index 2836 .
- the in-process index 2836 may comprise a designated region of the logical address space 136 and/or may be implemented within a separate index and/or address namespace.
- the vector 2942 A within the in-process index 2836 may correspond to the LIDs 10-13 of vector 2940 A and the in-process vector 2942 B may correspond to the LIDs 36-38 of vector 2940 B.
- the vectors 2942 A and 2942 B may comprise metadata configured to reference the corresponding vectors 2940 A and 2940 B in the index 2804 .
- Implementing the atomic storage operations in state 2915 B may comprise appending data to the storage device 120 in association with the in-process LIDs Z0-Z3 and/or Z6-Z8 of the in-process vectors 2942 A and 2942 B.
- Other storage operations may be performed concurrently with and/or interleaved within the atomic vector operations within the in-process index 2936 .
- the original data of vectors 2940 A and 2940 B may be unaffected.
- the data associated with the in-process entries (the data at P9-P13 and/or P100-P102) may be identified as part of an incomplete atomic storage operation (due to the association between the data and identifiers within the in-process index 2836 ), and the data may be removed.
- the atomic storage operation(s) may be completed within the in-process index 2936 .
- Completion of the atomic storage request may comprise performing a range move operation to move the data written to the in-process vectors 2942 A and 2942 B into the logical address space 136 .
- the range move operation may comprise performing an atomic storage operation to store a persistent note on the storage device to bind the media addresses P9-P13 to LIDs 10-13 and P100-P102 to LIDs 36-38.
- the range move operation may be implemented in other ways including, but not limited to: the reference entry embodiments of FIGS. 28F-J and/or the two-layer mapping embodiments of FIGS. 28K-L .
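The redirect-then-move pattern above can be sketched with a toy store. The class `AtomicStore`, the `Z<n>` naming of in-process identifiers, the log record shapes, and the `fail_before_commit` switch used to simulate an interrupted operation are all assumptions introduced for illustration.

```python
class AtomicStore:
    """Toy model of an atomic write staged in an in-process range (hypothetical)."""

    def __init__(self):
        self.index = {}       # LID -> media address (visible to clients)
        self.in_process = {}  # in-process ID -> (destination LID, media address)
        self.log = []         # append-only record of what reached the device

    def atomic_write(self, vectors, fail_before_commit=False):
        """vectors: dict of destination LID -> new media address."""
        for n, (lid, addr) in enumerate(vectors.items()):
            zid = f"Z{n}"
            self.in_process[zid] = (lid, addr)
            self.log.append(("data", zid, addr))  # data tagged with in-process ID
        if fail_before_commit:
            # The data is still tied to in-process IDs, so it is recognizably
            # part of an incomplete operation and can be discarded.
            self.in_process.clear()
            return False
        moves = {lid: addr for lid, addr in self.in_process.values()}
        self.log.append(("note", moves))  # a single persistent note binds all ranges
        self.index.update(moves)
        self.in_process.clear()
        return True
```

In this model the original bindings are untouched until the single note is appended, which is what lets the operation complete or roll back as a whole.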
- FIG. 30 is a flow diagram of one embodiment of a method 3000 for managing a logical interface of data stored in a contextual format on a non-volatile storage medium.
- Step 3020 may comprise modifying a logical interface of data stored in a contextual format on a non-volatile storage media.
- the logical interface may be modified at step 3020 in response to performing an operation on the data, which may include, but is not limited to: a clone operation, a deduplication operation, a move operation, or the like.
- the request may originate from a storage client 116 , the storage layer 130 (e.g., deduplication module 374 ), or the like.
- Modifying the logical interface may comprise modifying the LID(s) associated with the data, which may include, but is not limited to: referencing the data using one or more additional LIDs (e.g., clone, deduplication, etc.), changing the LID(s) associated with the data (e.g., a move), or the like.
- the modified logical interface may be inconsistent with the contextual format of the data on the non-volatile storage media 122 , as described above.
- Step 3020 may further comprise storing a persistent note on the non-volatile storage media 122 that identifies the modification to the logical interface.
- the persistent note may be used to make the logical operation persistent and crash safe, such that the modified logical interface (e.g., storage metadata 135 ) of the data may be reconstructed from the contents of the non-volatile storage media 122 (if necessary).
- Step 3020 may further comprise acknowledging that the logical interface has been modified (e.g., returning from an API call, returning an explicit acknowledgement, or the like). The acknowledgement (and access through the modified logical interface at step 3030 ) may occur before the contextual format of the data is updated on the non-volatile storage media 122 .
- the logical operation may not wait until the data is rewritten and/or relocated; as discussed below, updating the contextual format of the data may be deferred and/or implemented in a process that is outside of the “critical path” of the method 3000 and/or the path for servicing other storage operations and/or requests.
- Step 3030 may comprise providing access to the data in the inconsistent contextual format through the modified logical interface of step 3020 .
- updating the contextual format of the data to be consistent with the modified logical interface may comprise rewriting and/or relocating the data on the non-volatile storage media, which may impose additional latency on the operation of step 3020 and/or other storage operations pertaining to the modified logical interface. Therefore, the storage layer 130 may be configured to provide access to the data in the inconsistent contextual format while (or before) the contextual format of the data is updated.
- Providing access to the data at step 3030 may comprise referencing and/or linking to one or more reference entries corresponding to the data (via one or more indirect entries), as described above.
- Step 3040 may comprise updating the contextual format of the data on the non-volatile storage media 122 to be consistent with the modified logical interface of step 3020 .
- Step 3040 may comprise rewriting and/or relocating the data to another media storage location on the non-volatile storage media 122 and/or on another non-volatile storage device 120 A-N.
- step 3040 may be implemented using a process that is outside of the critical path of step 3020 and/or other storage requests performed by the storage layer 130 .
- step 3040 may be implemented by another, autonomous module, such as groomer module 370 , deduplication module 374 , or the like. Accordingly, the contextual format of the data may be updated independent of servicing other storage operations and/or requests.
- step 3040 may comprise deferring an immediate update of the contextual format of the data, and updating the contextual format of the data in one or more “background” processes, such as a groomer process.
- updating the contextual format of the data may occur in response to (e.g., along with) other storage operations. For example, a subsequent request to modify the data may cause the data to be rewritten out-of-place and in the updated contextual format (e.g., as described above in connection with FIG. 29C ).
- Step 3040 may further comprise updating storage metadata 135 as the contextual format of the data is updated.
- the storage layer 130 may update the storage metadata 135 (e.g., index) accordingly.
- the updates may comprise removing one or more links to reference entries in a reference index and/or replacing indirect entries with local entries, as described above.
- Step 3040 may further comprise invalidating and/or removing a persistent note from the non-volatile storage media 122 in response to updating the contextual format of the data and/or persisting the storage metadata 135 , as described above.
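The three steps of method 3000 can be illustrated with a toy store in which each stored packet carries the LID it was written under. The class `ContextualStore` and its method names are hypothetical, and a move is used as the example modification of the logical interface.

```python
class ContextualStore:
    """Toy model: data is stored together with the LID it was written under."""

    def __init__(self):
        self.media = {}     # media address -> (LID stored with the data, payload)
        self.index = {}     # LID -> media address (storage metadata)
        self.notes = {}     # new LID -> old LID: outstanding persistent notes
        self.next_addr = 0

    def write(self, lid, payload):
        addr = self.next_addr
        self.next_addr += 1
        self.media[addr] = (lid, payload)
        self.index[lid] = addr

    def move(self, old_lid, new_lid):
        # Step 3020: change the logical interface and record a persistent note.
        self.index[new_lid] = self.index.pop(old_lid)
        self.notes[new_lid] = old_lid

    def read(self, lid):
        # Step 3030: serve the data even though its stored LID is stale.
        return self.media[self.index[lid]][1]

    def is_consistent(self, lid):
        return self.media[self.index[lid]][0] == lid

    def update_format(self, lid):
        # Step 3040 (deferred): rewrite the data under its current LID and
        # drop the note, which is no longer needed.
        _, payload = self.media.pop(self.index[lid])
        self.write(lid, payload)
        self.notes.pop(lid, None)
```

In this sketch `move` returns immediately, and `update_format` can be called later by a background task, which corresponds to keeping the rewrite outside the critical path.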
- FIG. 31 is a flow diagram of another embodiment of a method 3100 for managing a logical interface of data stored in a contextual format on a non-volatile storage media.
- the method 3100 may be implemented by one or more modules and/or components of the storage controller 140 , such as the groomer module 370 , disclosed herein.
- Step 3120 comprises selecting a storage division for recovery, such as an erase block or logical erase block.
- the selection of step 3120 may be based upon a number of different factors, such as a lack of available storage capacity, detecting a percentage of data marked as invalid within a particular logical erase block reaching a threshold, a consolidation of valid data, an error detection rate reaching a threshold, improving data distribution, data refresh, or the like.
- the selection criteria of step 3120 may include whether the storage division comprises data in a contextual format that is inconsistent with a corresponding logical interface thereof, as described above.
- recovering (or reclaiming) a storage division may comprise erasing the storage division and relocating valid data thereon (if any) to other storage locations on the non-volatile storage media.
- Step 3130 may comprise determining whether the contextual format of data to be relocated in a grooming operation should be updated (e.g., is inconsistent with the logical interface of the data).
- Step 3130 may comprise accessing storage metadata 135 , such as the indexes described above, to determine whether the persistent metadata (e.g., logical interface metadata) of the data is consistent with the storage metadata 135 of the data. If the persistent metadata is not consistent with the storage metadata 135 (e.g., associates the data with different LIDs, as described above), the flow continues at step 3140 ; otherwise, the flow continues at step 3150 .
- Step 3140 may comprise updating the contextual format of the data to be consistent with the logical interface of the data.
- Step 3140 may comprise modifying the logical interface metadata to reference a different set of LIDs (and/or reference entries), as described above.
- Step 3150 comprises relocating the data to a different storage location in a log format that, as described above, preserves an ordered sequence of storage operations performed on the non-volatile storage media. Accordingly, the relocated data (in the updated contextual format) may be identified as the valid and up-to-date version of the data when reconstructing the storage metadata 135 (if necessary). Step 3150 may further comprise updating the storage metadata 135 to bind the logical interface of the data to the new media storage locations of the data, remove indirect and/or reference entries to the data in the inconsistent contextual format, and so on, as disclosed herein.
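A grooming pass along the lines of steps 3130 through 3150 might look like the sketch below. The function `groom`, the tuple of LIDs stored with each packet, and the `("log", n)` form of the new media addresses are assumptions made to keep the example small.

```python
def groom(division, index, log):
    """Recover one storage division (hypothetical model).

    division: dict of media address -> (tuple of LIDs stored with the data, payload)
    index:    dict of LID -> media address (storage metadata)
    log:      list acting as the append point of the log
    """
    # Invert the index to find which LIDs currently reference each address.
    owners = {}
    for lid, addr in index.items():
        owners.setdefault(addr, set()).add(lid)

    for addr, (stored_lids, payload) in division.items():
        current = owners.get(addr)
        if not current:
            continue  # invalid data is not relocated
        if set(stored_lids) != current:           # step 3130: format is stale
            stored_lids = tuple(sorted(current))  # step 3140: update the format
        log.append((stored_lids, payload))        # step 3150: relocate in log order
        new_addr = ("log", len(log) - 1)
        for lid in current:
            index[lid] = new_addr                 # rebind the LIDs to the new location
    division.clear()                              # the division can now be erased
```

Data that was cloned after being written (one address referenced by two LIDs) is rewritten with both LIDs in this model, while unreferenced data is simply dropped.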
- FIG. 32 is a flow diagram of another embodiment of a method 3200 for managing logical interfaces of data stored in a contextual format.
- Step 3215 may comprise identifying duplicate data on one or more storage devices 120 .
- Step 3215 may be performed by a deduplication module 374 operating within the storage layer 130 .
- step 3220 may be performed by the storage layer 130 as storage operations are performed.
- Step 3215 may comprise determining and/or verifying that the non-volatile storage media 122 comprises duplicate data (or already comprises data of a write and/or modify request). Accordingly, step 3220 may occur within the path of a storage operation (e.g., as or before duplicate data is written to the non-volatile storage media 122 ) and/or may occur outside of the path of servicing storage operations (e.g., identify duplicate data already stored on the non-volatile storage media 122 ). Step 3220 may comprise generating and/or maintaining data signatures in storage metadata 135 , and using the signature to identify duplicate data.
- the storage layer 130 may modify a logical interface of a copy of the data, such that a single copy may be referenced by two (or more) sets of LIDs.
- the modification to the logical interface at step 3220 may comprise updating storage metadata 135 and/or storing a persistent note on the non-volatile storage media 122 , as described above.
- Step 3220 may further comprise invalidating and/or removing other copies of the data on the non-volatile storage media, as described above.
- steps 3230 and 3240 may comprise providing access to the data in the inconsistent contextual format through the modified logical interface and updating the contextual format of the data on the non-volatile storage media 122 , as described above.
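Signature-based identification of duplicate data in the write path can be sketched as below. The use of SHA-256 as the data signature, the byte-for-byte verification step, and the class `DedupStore` are assumptions; the text only states that signatures may be maintained in the storage metadata.

```python
import hashlib


class DedupStore:
    """Toy in-line deduplication: one stored copy, many LIDs (hypothetical)."""

    def __init__(self):
        self.media = []       # append-only payloads; list position = media address
        self.index = {}       # LID -> media address
        self.signatures = {}  # data signature -> media address

    def write(self, lid, payload):
        sig = hashlib.sha256(payload).digest()
        addr = self.signatures.get(sig)
        # Verify the match rather than trusting the signature alone.
        if addr is not None and self.media[addr] == payload:
            self.index[lid] = addr  # reference the existing copy
            return addr
        self.media.append(payload)
        addr = len(self.media) - 1
        self.signatures[sig] = addr
        self.index[lid] = addr
        return addr
```

Writing the same payload under two LIDs leaves a single stored copy referenced by both, which is the modified logical interface the method describes.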
- clone operations may be used to perform atomic operations, such as multi-step writes or transactions.
- An atomic operation to modify data in a particular logical address range may comprise creating a clone of the logical address range, implementing storage operations within the clone, and, when the operations complete, “folding” the clone back into the logical address space 136 (e.g., overlaying the original logical address range with the clone).
- “folding” a logical address range refers to combining two or more address ranges together (e.g., folding a logical address range with a clone thereof).
- the folding may occur according to one of a plurality of operational modes, which may include, but are not limited to: an “overwrite” mode, in which the contents of one logical address range “overwrite” the contents of another logical address range, a “merge” mode, in which the contents of the logical address ranges are merged together (e.g., in a logical OR operation), or the like.
- FIG. 33A depicts one example of a clone between entries 2814 and 2824 in the index 3304 .
- a storage client modified the data within the clone 972-983, with the updated data being stored at media storage locations 195-206.
- Folding the clone 2824 back into the entry 2814 in an “overwrite” mode results in the entry 2814 being bound to the media storage locations of the clone 2824 (195-206). Portions of the clone 2824 that were not modified (if any) may remain unchanged in the entry 2814 .
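The two folding modes can be illustrated on plain mappings from range offsets to media storage locations. The function `fold` is hypothetical, and the reading of “merge” as keeping the original binding where both ranges are populated is one possible interpretation of the logical OR, not something the text specifies.

```python
def fold(original, clone, mode="overwrite"):
    """Fold a clone back into its original range (hypothetical model).

    original, clone: dict of offset -> media storage location.
    Returns the folded mapping; neither input is modified.
    """
    if mode == "overwrite":
        # Bindings of the clone replace those of the original; offsets the
        # clone never touched keep the original binding.
        folded = dict(original)
        folded.update(clone)
        return folded
    if mode == "merge":
        # Union of populated offsets; where both are populated, this sketch
        # keeps the original binding (an assumption).
        folded = dict(clone)
        folded.update(original)
        return folded
    raise ValueError(f"unknown fold mode: {mode}")
```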
- clones may be tied to one another (e.g., using entry metadata 2819 and/or 2829 ).
- An extension to a clone, such as entry 2824 , may be predicated on the logical address range being available to the original entry 2814 .
- the link between the entries may be predicated on the “mode” of the clone as described above. For example, if the entries are not to be “folded” at a later time, the clones may not be linked.
- FIG. 33B depicts another example of a folding operation using reference and indirect entries.
- the clones 3314 and 3324 are linked to reference entries 3395 in a reference index 3390 associated with data of the clone.
- a storage client 116 may modify one clone 3324 , resulting in modified data being bound to the clone 3324 (e.g., entry 9217 is bound to media storage location 8923). Accordingly, the clone 3324 has diverged from the clone 3314 .
- the modified data of 9217 may overwrite the original data (e.g., the data at media storage location 872).
- clones may be “tied” together, according to an operational mode of the clones. For example, changes to a clone may be automatically mirrored in the other clone. This mirroring may be uni-directional, bi-directional, or the like.
- the nature of the tie between clones may be maintained in storage metadata (e.g., metadata entries 2819 and 2829 and/or in reference entries 3395 ).
- the storage layer 130 may access the metadata entries 2819 and/or 2829 when storage operations are performed within the LID ranges 2815 and/or 2825 to determine what, if any, synchronization operations are to be performed.
- data of a clone may be designated as ephemeral, as described above. Accordingly, if upon reboot (or another condition), the ephemeral designation is not removed, the clone may be deleted (e.g., invalidated as described above).
- FIG. 34 is a flow diagram of another embodiment of a method for cloning ranges of a logical address space 136 .
- Step 3420 may comprise receiving a request to create a clone.
- the request may be received from a storage client 116 through an interface 138 and/or may be part of a higher-level API provided by the storage layer 130 .
- the request may include an “operational mode” of the clone, which may include, but is not limited to: how the clones are to be synchronized, if at all, how folding is to occur, whether the copy is to be designated as ephemeral, and so on.
- Step 3430 may comprise allocating LIDs in the logical address space 136 to service the request.
- the allocation of step 3430 may further comprise reserving physical storage space to accommodate changes to the clone.
- the reservation of physical storage space may be predicated on the operational mode of the clone. For instance, if all changes are to be synchronized between the clone and the original address range, only a small portion (if any) of physical storage space may be reserved.
- Step 3430 may further comprise allocating the clone within a designated portion or segment of the logical address space 136 (e.g., a range dedicated for use with clones).
- Step 3440 may comprise updating the logical interface of data of the clone, as described above.
- Step 3440 may further comprise storing a persistent note on the non-volatile storage media to make the clone persistent and crash safe, as described above.
- Step 3450 may comprise receiving a storage request and determining whether the storage request pertains to the original LID range and/or the clone of the LID range. If so, the flow continues to step 3460 ; otherwise, the flow remains at step 3450 .
- Step 3460 may comprise determining what (if any) operations are to be taken on the other associated LID ranges (e.g., synchronize changes, allocate logical and/or physical storage resources, or the like). The determination of step 3460 may comprise accessing storage metadata describing the operational mode of the clone and/or the nature of the “tie” (if any) between the original LIDs and the clone thereof.
- Step 3470 may comprise performing the operations (if any) determined at step 3460 along with the requested storage operation. If one or more of the synchronization operations cannot be performed (e.g., additional logical address space 136 cannot be allocated), the underlying storage operation may fail.
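Steps 3450 through 3470 can be sketched as a write path that consults tie metadata before applying a request. The class `TiedRanges` and the mode names `"none"`, `"uni"`, and `"bi"` are hypothetical stand-ins for the operational modes described above.

```python
class TiedRanges:
    """Toy model of clones whose changes may be mirrored (hypothetical)."""

    def __init__(self):
        self.index = {}  # LID -> media address
        self.ties = []   # (source base, clone base, count, mode)

    def clone(self, src, dst, count, mode="none"):
        for i in range(count):
            if src + i in self.index:
                self.index[dst + i] = self.index[src + i]
        self.ties.append((src, dst, count, mode))

    def write(self, lid, addr):
        self.index[lid] = addr
        # Steps 3450/3460: does the request touch a tied range, and what follows?
        for src, dst, count, mode in self.ties:
            if mode in ("uni", "bi") and src <= lid < src + count:
                self.index[dst + (lid - src)] = addr  # step 3470: mirror to clone
            elif mode == "bi" and dst <= lid < dst + count:
                self.index[src + (lid - dst)] = addr  # step 3470: mirror to original
```

With a `"uni"` tie, writes to the original range appear in the clone, while writes to the clone leave the original unchanged.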
- FIG. 35 is a flow diagram of another embodiment of a method for managing clones of contextual data.
- Step 3521 may comprise creating a clone of a logical address range as disclosed herein.
- At step 3531 , one or more storage operations within the original logical address range and/or the clone thereof are performed along with additional synchronization operations (if any), as described above.
- a request to fold the clone is received.
- the request may specify an operational mode of the fold and/or the operational mode may have been specified when the clone was created at step 3521 .
- Step 3551 comprises folding the clone back into the logical address space 136 of the original logical range.
- Step 3551 may comprise overwriting the contents of the original logical address range with the contents of the clone, merging the logical address ranges (e.g., in an OR operation), or the like.
- the merging comprises deleting (e.g., invalidating) the clone, which may comprise removing entries of the clone from the storage metadata index, removing shared references to media storage locations from a reference count datastructure, and the like.
- Step 3551 may further comprise modifying a logical interface of the merged data, as described above.
- the modified logical interface may change the LIDs used to reference the data.
- the modified logical interface may be inconsistent with the contextual format of the data on the non-volatile storage media 122 . Therefore, step 3551 may further comprise providing access to the data in the inconsistent contextual format and/or updating the contextual format of the data, as described above.
- the storage layer 130 may be configured to segment the logical address space 136 into a plurality of contiguous LID ranges.
- a LID (e.g., address) 1900 is segmented into a first portion 1952 and a second portion 1954 .
- the first portion 1952 comprises “high-order” bits of the LID 1900
- the second portion comprises “low-order” bits.
- the first portion 1952 may serve as a reference or identifier
- the second portion 1954 may represent a range (e.g., block size) offset within a contiguous range of LIDs.
- the storage layer 130 may logically segment or divide the sparse logical address space into segments of contiguous LIDs that can be efficiently allocated as a group.
- segmenting LIDs into 32 high-order and 32 low-order bits may result in a logical address space 136 that is capable of representing 2^32-1 unique LID allocation ranges (e.g., using the first portion 1952 of the LIDs), each of which has a maximum size (or offset) of 2^32 virtual storage locations (e.g., 2 TB for a virtual storage location size of 512 bytes).
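The 32/32 segmentation above amounts to simple bit arithmetic on a 64-bit LID. The helper names `split_lid` and `make_lid` are hypothetical; the constants follow the figures given in the text.

```python
LOW_BITS = 32                     # width of the second (offset) portion
LOW_MASK = (1 << LOW_BITS) - 1    # mask selecting the low-order bits


def split_lid(lid):
    """Return (range identifier, offset) for a 64-bit LID."""
    return lid >> LOW_BITS, lid & LOW_MASK


def make_lid(range_id, offset):
    """Compose a LID from a range identifier and an offset within the range."""
    if not 0 <= offset <= LOW_MASK:
        raise ValueError("offset does not fit in the low-order portion")
    return (range_id << LOW_BITS) | offset


# Capacity of one allocation range with 512-byte virtual storage locations:
# 2**32 locations * 512 bytes = 2**41 bytes, i.e., 2 TB (binary).
RANGE_BYTES = (1 << LOW_BITS) * 512
```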
- different segmentation schemes may be used.
- the first portion 1952 may comprise a larger proportion of the LID address range than the second portion 1954 (e.g., first portion 1952 comprising 42 bits providing 2^42-1 unique identifiers).
- the ratio between the size of the first and second address portions 1952 and 1954 may be reversed.
- the LID segmentation scheme disclosed herein may be used to define an allocation granularity of the logical address space 136 .
- the allocation granularity is fixed according to the segmentation of the logical addresses 1900 ; each allocation operation in the logical address space 136 comprises allocating X LIDs where X is determined according to the size of the second portion 1954 of the logical addresses 1900 .
- the allocation granularity may also determine the number of unique storage entities that can be represented within the logical address space: in the FIG. 19A embodiment, the logical address space is capable of supporting Y unique storage entities (Y unique LID ranges of size X) where Y is determined according to the size of the first portion 1952 .
- the fixed allocation granularity may result in wasted storage resources.
- each file is allocated a pre-determined range of contiguous LIDs (e.g., 2^32-1 LIDs)
- a large proportion of the LIDs allocated for small files will likely never be used, which may result in increased metadata overhead and/or may reduce the number of unique files that can be represented within the logical address space 136 .
- large files that do not fit within a single LID allocation range (e.g., require more than 2^32 LIDs)
- LIDs may be limited to 48 bits rather than 64, due to, inter alia, operating system limitations, addressing limitation, addressing overhead (e.g., use of a portion of a LID to represent different virtual storage units), and so on.
- the storage layer 130 may be configured to implement an adaptive and/or variable allocation scheme in which different portions of the logical address space 136 are configured to provide a different, respective allocation granularity.
- allocation granularity refers to the amount of storage resources that are allocated in a single allocation operation.
- the allocation granularity of a region may refer to the size of LID blocks or ranges allocated in the region.
- the allocation granularity of the logical address space 1900 was determined according to the size of the first and second portions 1952 and 1954 of the segmented LIDs 1900.
- allocation granularity may refer to physical storage allocations and/or operations.
- LIDs in the logical address space 136 may correspond to (be bound to) physical storage resources, such as physical sectors.
- a “physical sector,” “data sector,” or “sector” refers to physical storage capacity capable of storing a particular amount of data.
- the physical sector size may, therefore, determine the granularity of data storage operations performed on the storage device 120 ; the data sector size may determine the smallest granularity of write/read operations that can be performed on the storage device 120 .
- storage clients 116 may be configured to align storage operations in accordance with a particular data sector size.
- storage clients 116 may adapt storage operations to fall within the 512 byte boundaries.
- the sector size is based on physical characteristics of the underlying storage devices; a storage device may, for example, be physically partitioned into sectors or pages having a particular, pre-determined size.
- the storage layer 130 disclosed herein may be capable of storing data within large, logical constructs, such as logical storage divisions 253 and/or logical pages 254 , of the logical storage element 229 , disclosed above in FIG. 3A .
- the storage layer 130 may be capable of performing storage operations according to arbitrarily-sized physical sectors that are independent of the underlying partitioning of the storage device 120 and/or individual, non-volatile storage elements 123 .
- the physical sector size implemented by the storage layer 130 may be configurable and/or variable. As disclosed in further detail below, the physical sector size may vary within different regions of the logical address space 136 ; a LID in a first allocation region of the logical address space 136 may correspond to a 512 byte sector and a LID in a different allocation region may correspond to a 4 kb sector, and so on.
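A per-region sector size can be modeled as a lookup from a LID to the allocation region that contains it. The region boundary at 2^40 and the names `REGION_STARTS`, `SECTOR_SIZES`, and `sector_size` are assumptions chosen for illustration; only the 512-byte and 4 kb sizes come from the text.

```python
import bisect

# Hypothetical layout: each region starts at the given LID and uses the
# sector size at the same position in SECTOR_SIZES.
REGION_STARTS = [0, 1 << 40]
SECTOR_SIZES = [512, 4096]


def sector_size(lid):
    """Return the sector size, in bytes, of the region containing lid."""
    if lid < 0:
        raise ValueError("LIDs are non-negative")
    # bisect_right finds the first region start greater than lid; the region
    # containing lid is the one just before it.
    return SECTOR_SIZES[bisect.bisect_right(REGION_STARTS, lid) - 1]
```

A storage client that prefers 4 kb sectors would, in this model, allocate its LIDs at or above the second region's start.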
- Storage clients 116 may be configured to operate in different allocation regions in accordance with a preferred physical sector size.
- FIG. 36A is a block diagram of a system 3600 comprising another embodiment of a storage layer 130 .
- the storage layer 130 of the FIG. 36A embodiment may comprise an allocation module 3360 configured to manage allocation within one or more of the logical address space 136 and storage device 120 .
- the storage layer 130 may be configured to store data on a storage device 120 in a contextual, log-based format.
- the storage device 120 may comprise a plurality of independent non-volatile storage elements 223 .
- the non-volatile storage elements 223 may comprise solid-state storage elements, packages, die, chips, and/or the like.
- the storage controller 140 may be configured to manage the independent, non-volatile storage elements 223 as a logical storage element 229 .
- the storage layer 130 may, therefore, be capable of storing data within logical storage units (e.g., logical pages) 254 , which may be formed by combining physical storage units (e.g., pages) 252 of a plurality of the non-volatile storage elements 223 . Accordingly, the storage layer 130 may be capable of storing data segments of different sizes (e.g., different physical sector sizes), independent of the underlying partitioning and/or configuration of the non-volatile storage elements 223 . In some embodiments, for example, the non-volatile storage elements 223 may comprise 2 kb physical pages.
- the logical storage units 254 may comprise 25 physical pages of separate non-volatile storage elements 223 , which may allow the storage layer 130 to perform read/write operations ranging from 0 to 50 kb.
- the disclosure is not limited in this regard, however, and could be adapted to include logical storage elements 229 comprising any number of non-volatile storage elements 223 having any suitable page size.
- the storage controller 140 may be further configured to store data in a contextual, log-based format (a packet format).
- the data write module 240 may be configured to generate packets corresponding to any suitable physical sector size (comprising any sized data segment).
- the size of the packets may be independent of the underlying partitioning and/or arrangement of the non-volatile storage elements 223 . Therefore, the storage layer 130 may be capable of performing storage operations corresponding to any suitable physical sector size and/or physical granularity from a few bytes (e.g., 256 byte sector sizes) to 50 kb, or more.
- the storage layer 130 may be configured to store a packet comprising a 512 byte data segment within a logical page 254 along with a packet comprising a 2 kb data segment 3612 B.
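The capacity arithmetic above (25 physical pages of 2 kb each forming one 50 kb logical page) and the mixing of differently sized packets within one logical page can be sketched as follows. This is only an illustration: the `LogicalPage` class, its `append` method, and the 16 byte per-packet header are hypothetical, and "kb" is assumed to mean 1024 bytes.

```python
from dataclasses import dataclass, field

KB = 1024  # assumption: "kb" in the text means 1024 bytes


@dataclass
class LogicalPage:
    """Hypothetical model of a logical page 254 spanning several storage elements."""
    physical_page_bytes: int = 2 * KB   # page size of each non-volatile storage element
    element_count: int = 25             # one physical page contributed per element
    header_bytes: int = 16              # hypothetical per-packet metadata size
    packets: list = field(default_factory=list)  # (offset, segment_bytes) pairs
    used: int = 0

    @property
    def capacity(self) -> int:
        # 25 pages * 2 kb = 50 kb in the example from the text
        return self.physical_page_bytes * self.element_count

    def append(self, segment_bytes: int):
        """Append a packet with the given data segment size; return its offset or None."""
        packet_bytes = self.header_bytes + segment_bytes
        if self.used + packet_bytes > self.capacity:
            return None  # packet does not fit in the remaining space
        offset = self.used
        self.packets.append((offset, segment_bytes))
        self.used += packet_bytes
        return offset


page = LogicalPage()
first = page.append(512)       # packet with a 512 byte data segment
second = page.append(2 * KB)   # packet with a 2 kb data segment, stored right after it
```

The packet sizes are independent of the 2 kb physical page size; only the total capacity of the logical page bounds what can be appended.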
- the allocation module 3660 comprises a partition module 3662 configured to partition and/or segment the logical address space into two or more allocation regions.
- the allocation regions may correspond to different allocation granularities.
- the allocation granularity of a particular region may refer to the allocation of physical storage resources (e.g., physical sector size) and/or logical allocation granularity such as LID allocation block size.
- the allocation module 3660 may further comprise an allocation policy module 3664 configured to determine an allocation granularity for storage client 116 , storage requests, and/or storage entities and/or to selectively reallocate storage resources.
- the reallocation module 3666 may be configured to reallocate storage resources, which may comprise performing one or more of the range clone and/or range move operations, as disclosed herein.
- FIG. 36B illustrates one embodiment of a logical address space 136 comprising allocation regions 3638 A, 3638 B, through 3638 N.
- the partitioning module 3662 may be configured to partition the logical address space 136 into any number of partitions, corresponding to any suitable physical allocation granularity (e.g., any suitable sector size).
- the allocation regions of FIG. 36B may correspond to the granularity of physical storage allocation: LIDs within the region 3638 A may correspond to relatively small data sectors (e.g., 512 bytes); region 3638 B may correspond to larger data sectors (e.g., 2 kb); and region 3638 N may correspond to larger 4 k data sectors.
- storage operations pertaining to LIDs within the region 3638 A may correspond to 512 byte physical sectors; LIDs within the region 3638 A may correspond to data packets 3688 A comprising 512 byte data segments 3612 A. Storage operations performed within the region 3638 A may, therefore, operate at a 512 byte sector granularity (the smallest read/write operation in region 3638 A is 512 bytes).
- the LID 3636A in region 3638 A is bound to data packet 3688 A (in the index 2804 ).
- the data packet 3688 A may comprise a 512 byte data segment 3612 A in accordance with the physical allocation granularity of the region 3638 A.
- the persistent metadata 3864 of the packets 3688 A-N comprise respective size indicators 3687 A-N indicating a size of the corresponding data segments 3612 A-N.
- the data segment size may be indicated in the index 2804 and/or other metadata 135 .
- the LID 3636B in region 3638 B may be bound to data packet 3688 B.
- the data packet 3688 B may comprise a 2 k data segment 3612 B in accordance with the physical sector size of region 3638 B.
- LID 3636N may be bound to data packet 3688 N, which may comprise a 4 k data segment 3612 N in accordance with the physical allocation granularity of region 3638 N.
- the differently sized data packets 3688 A-N may be stored at arbitrary physical storage locations within the storage device 120 . In some embodiments, the data packets 3688 A-N may be stored within large, logical storage units 254 of a logical storage element 229 , as disclosed above.
- Although FIG. 36B depicts a particular embodiment of logical address partitioning, the disclosure is not limited in this regard and could be adapted to partition the logical address space 136 into any number of different allocation regions 3638 A-N corresponding to any suitable physical and/or logical allocation granularity.
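One way to picture the region-to-sector-size relationship is a sorted table of region boundaries that is searched by LID. The sketch below is an assumption-laden illustration: the `PhysicalRegion` type, the `region_for_lid` function, and the region boundaries (multiples of 2^20 LIDs) are hypothetical; only the 512 byte, 2 kb, and 4 k sector sizes come from the text.

```python
import bisect
from typing import NamedTuple


class PhysicalRegion(NamedTuple):
    name: str
    first_lid: int      # hypothetical lower bound of the region
    sector_bytes: int   # physical allocation granularity of the region


# Regions sorted by first LID; the boundaries are illustrative only.
REGIONS = [
    PhysicalRegion("3638A", 0, 512),
    PhysicalRegion("3638B", 1 << 20, 2048),
    PhysicalRegion("3638N", 1 << 21, 4096),
]
_STARTS = [r.first_lid for r in REGIONS]


def region_for_lid(lid: int) -> PhysicalRegion:
    """Return the allocation region (and therefore the sector size) of a LID."""
    if lid < 0:
        raise ValueError("LIDs are non-negative")
    # bisect_right finds the first region starting after lid; step back one.
    return REGIONS[bisect.bisect_right(_STARTS, lid) - 1]


print(region_for_lid(5).sector_bytes)
print(region_for_lid(1 << 20).sector_bytes)
```

A table like this is only one option; the text also allows the segment size to be recorded per packet or in the index.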
- Certain storage clients 116 may operate more efficiently at specific sector sizes. For example, an application that processes large amounts of contiguous data may operate most efficiently with large 4 kb sector sizes. Other applications that rely on a large number of relatively small transactions may operate more efficiently using smaller sector sizes.
- the interface 138 provides mechanisms for specifying a desired sector size for particular storage and/or allocation operations.
- a file system storage client 2916 may, for example, specify that storage operations pertaining to a particular file 2919 A be performed at a 2 k sector size.
- the allocation module 3660 may allocate LIDs for the file 2919 A within the region 3638 B of the logical address space 136 .
- the file system 2916 (and/or other storage clients 116 ) may query the interface 138 for information pertaining to the available allocation regions 3638 A-N and/or data sector sizes supported by the storage layer 130 .
- the storage clients 116 may selectively allocate LIDs within the regions 3638 A-N in accordance with a desired physical allocation granularity (sector size).
- the file system storage client 2916 may, therefore, be configured to allocate LIDs having different sector sizes for different files 2919 A-N according to the access characteristics of the files 2919 A-N.
- the file system storage client 2916 may be capable of supporting files 2919 A-N having different respective data sector sizes.
- users may specify a desired file sector size through, inter alia, ioctrl parameters, an fadvise API, and/or the like.
- the log storage module 137 may be configured to provide for storing data according to the sector size assigned to the data (corresponding to the LID associated with the data).
- the log storage module 137 may determine the sector size in reference to, inter alia, the storage metadata 135 , index 2804 , and/or allocation module 3660 .
- the log storage module 137 may configure the storage device controller 126 (data write module 240 ) to packetize the data in accordance with the sector size for storage within the log on the storage device 120 .
- the log storage module 137 may be further configured to provide for reading data of various, different data sector sizes. In response to a read request pertaining to a particular LID, the log storage module 137 may determine the sector size corresponding to the LID (as above), and may configure the data read module 241 to read the corresponding data packet size.
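The write and read paths described above can be sketched with a packet header that carries a size indicator. The header layout (a 64-bit LID followed by a 32-bit segment size), the zero padding of a short final segment, and the function names `packetize` and `read_packet` are assumptions made for illustration, not the packet format of the disclosure.

```python
import struct
from typing import List, Tuple

# Hypothetical header: little-endian 64-bit LID followed by 32-bit segment size.
HEADER = struct.Struct("<QI")


def packetize(first_lid: int, data: bytes, sector_bytes: int) -> List[bytes]:
    """Split data into packets whose data segments match the LID's sector size."""
    packets = []
    for i in range(0, len(data), sector_bytes):
        # Pad the final segment so every packet holds exactly one sector.
        segment = data[i:i + sector_bytes].ljust(sector_bytes, b"\0")
        lid = first_lid + i // sector_bytes
        packets.append(HEADER.pack(lid, sector_bytes) + segment)
    return packets


def read_packet(log: bytes, offset: int) -> Tuple[int, bytes, int]:
    """Read one packet at offset; return (lid, segment, offset of next packet)."""
    lid, size = HEADER.unpack_from(log, offset)
    start = offset + HEADER.size
    return lid, log[start:start + size], start + size


# Write 5000 bytes at a 2 kb sector size, then read the packets back in order.
log = b"".join(packetize(100, b"x" * 5000, 2048))
lid, segment, next_offset = read_packet(log, 0)
```

Because the size indicator travels with each packet, a reader can walk a log containing mixed sector sizes without consulting any other metadata.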
- FIG. 36C illustrates another embodiment of a logical address space 136 that has been partitioned into a plurality of allocation regions.
- the partitioning module 3662 may be configured to partition the logical address space 136 into 4 regions: 3650 A, 3650 B, 3650 C, and 3650 D.
- Each region 3650 A-D may correspond to a different, respective allocation granularity.
- allocation operations within the regions 3650 A-D correspond to different logical allocation granularities; logical allocation operations within the regions 3650 A-D correspond to differently sized LID extents (blocks of LIDs).
- each region 3650 A-D may result in allocating a different number of contiguous LIDs (different range of contiguous LIDs).
- the region 3650 A may comprise large contiguous LID ranges 3651A, such that allocation operations therein result in allocating a large number of contiguous LIDs (e.g., 2^34 LIDs).
- the region 3650 A may, therefore, be suited for large storage entities (e.g., large files).
- the region 3650 D may correspond to relatively small contiguous LID ranges 3651D, such that allocation operations therein result in allocating a smaller number of contiguous LIDs (e.g., 2^12 LIDs).
- the region 3650 D may be suited for use with smaller storage entities (e.g., small files, objects, database tables, or the like).
- the other regions 3650 B and 3650 C may comprise respective contiguous LID ranges 3651B and 3651C, each having a different respective allocation granularity.
- Although FIG. 36C depicts regions 3650 A-D as being of approximately the same size (e.g., the logical address space 136 is equally segmented into four regions 3650 A-D), the disclosure is not limited in this regard.
- the partitioning module 3662 may be configured to segment the logical address space 136 into differently sized regions 3650 A-D and/or into different numbers of regions 3650 A-D.
- the logical address space 136 may be segmented into two regions, a first region for large files and a second region for small files, and the large file region may be allocated a larger proportion of the logical address space than the small file region, or vice versa.
- each region 3650 A-D may comprise and/or result in a different segmentation of the LIDs 1901A-D.
- the portion of a LID 1901A-D comprising the “identifier” portion of the LID 1952A-D versus the “offset” or “range” portion of the LID 1954A-D may vary depending on the size of the underlying contiguous LID range 3651A-D.
- the LIDs 1901A of region 3650 A comprise a larger proportion of offset bits 1954 A as compared to the LIDs 1901D of region 3650 D.
- the LIDs 1901D of region 3650 D comprise a larger number of identifier bits 1952 D as compared to the LIDs 1901A of region 3650 A.
- the LIDs 1901A-D may further comprise bits for specifying the region 3650 A-D of the LID, specifying a logical storage unit of the LID, and so on.
- each LID 1901A-D may comprise two bits for specifying one of the regions 3650 A-D.
- the storage layer 130 may track LID region relationships based upon pre-determined LID values or ranges (in the index 2804 and/or other metadata 135 ), such that no region-specifying overhead is needed.
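The region-dependent split of a LID into region, identifier, and offset portions can be sketched with plain bit arithmetic. The sketch assumes 64-bit LIDs with the two region-specifying bits in the most significant positions; the 34-bit and 12-bit offset widths follow the 2^34 and 2^12 examples above, while the widths for regions 3650 B and 3650 C and the function names are hypothetical.

```python
LID_BITS = 64      # assumption: 64-bit logical identifiers
REGION_BITS = 2    # two bits select one of the four regions 3650A-D

# Offset (range) bits per region code: 0=3650A, 1=3650B, 2=3650C, 3=3650D.
# 34 and 12 follow the examples in the text; 26 and 18 are illustrative.
OFFSET_BITS = {0: 34, 1: 26, 2: 18, 3: 12}


def make_lid(region: int, identifier: int, offset: int) -> int:
    """Compose a LID from its region code, identifier portion, and offset portion."""
    offset_bits = OFFSET_BITS[region]
    id_bits = LID_BITS - REGION_BITS - offset_bits
    if not (0 <= identifier < (1 << id_bits)) or not (0 <= offset < (1 << offset_bits)):
        raise ValueError("identifier or offset out of range for this region")
    return (region << (LID_BITS - REGION_BITS)) | (identifier << offset_bits) | offset


def split_lid(lid: int):
    """Decompose a LID into (region, identifier, offset) using the region's layout."""
    region = lid >> (LID_BITS - REGION_BITS)
    offset_bits = OFFSET_BITS[region]
    id_bits = LID_BITS - REGION_BITS - offset_bits
    offset = lid & ((1 << offset_bits) - 1)
    identifier = (lid >> offset_bits) & ((1 << id_bits) - 1)
    return region, identifier, offset


lid = make_lid(3, 7, 100)   # region 3650D: 50 identifier bits, 12 offset bits
print(split_lid(lid))
```

Under these assumptions a region 3650 A LID has 28 identifier bits and 34 offset bits, while a region 3650 D LID has 50 identifier bits and 12 offset bits, which matches the trade-off described above.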
- the storage layer 130 may provide access to allocation information through the interface 138 .
- the interface 138 may be configured to publish information pertaining to the allocation regions of the logical address space 136 , indicate the remaining, unallocated and/or unbound resources within a particular region and/or LID block, and the like.
- the interface 138 may be further configured to allow storage clients 116 to specify a desired allocation granularity, physical sector size, and/or the like.
- an allocation request may specify the number of contiguous LIDs requested for allocation.
- the allocation module 3660 may allocate the LIDs within the appropriate region.
- the region 3650 A may comprise contiguous LID ranges 3651A comprising 65536 LIDs
- region 3650 B may comprise contiguous LID ranges 3651B comprising 16384 LIDs
- region 3650 C may comprise contiguous LID ranges 3651C comprising 4096 LIDs
- region 3650 D may comprise contiguous LID ranges 3651D comprising 1024 LIDs.
- the storage layer 130 may allocate an available contiguous LID range 3651B within region 3650 B.
- the storage layer 130 may allocate a contiguous LID range 3651B in region 3650 B and a contiguous LID range 3651C in region 3650 C.
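Using the range sizes listed above (65536, 16384, 4096, and 1024 LIDs), a request for a given number of contiguous LIDs can be mapped to one or more ranges. The policy below is a hypothetical sketch, not the allocation policy of the disclosure: it rounds the request up to the smallest range size and then covers it greedily with the largest ranges that fit. The request sizes in the usage lines are illustrative.

```python
# Contiguous LID range size per logical allocation region (from the text).
RANGE_SIZES = {"3650A": 65536, "3650B": 16384, "3650C": 4096, "3650D": 1024}


def plan_allocation(lid_count: int) -> list:
    """Return the regions from which contiguous LID ranges would be allocated."""
    if lid_count <= 0:
        raise ValueError("lid_count must be positive")
    smallest = min(RANGE_SIZES.values())
    # Round up to a whole number of the smallest ranges (ceiling division).
    remaining = -(-lid_count // smallest) * smallest
    plan = []
    # Greedily take the largest ranges first; each size divides the next larger one.
    for region, size in sorted(RANGE_SIZES.items(), key=lambda kv: kv[1], reverse=True):
        count, remaining = divmod(remaining, size)
        plan.extend([region] * count)
    return plan


print(plan_allocation(16384))   # a single range in region 3650B
print(plan_allocation(20000))   # one range in 3650B plus one in 3650C
```

Other policies are equally plausible, for example always allocating a single larger range when the request is expected to grow.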
- the allocation module 3660 comprises an allocation policy module 3664 that is configured to select a suitable allocation granularity (region 3650 A-D and/or physical granularity region 3638 A- 3638 N) based on one or more allocation policies, which may include, but are not limited to: availability of contiguous LID ranges in the regions 3650 A-D, whether the LID range is expected to grow, information pertaining to the storage client 116 associated with the request, information pertaining to an application associated with the request, information pertaining to a storage entity associated with the request (e.g., file information), explicit requests, request parameters (ioctrl, fadvise, etc.), and/or the like.
- LID allocation requests may specify a particular allocation region (e.g., LID region 3650 A-D).
- a storage client 116 may initially allocate a small LID range, but may know that the LID range may be required to grow over time (e.g., the storage client 116 may be receiving a stream of data over a network). Accordingly, the storage client 116 may request an initially small LID allocation, but may specify that the LID allocation be serviced in the region 3650 A.
- the allocation module 3660 may initially allocate LIDs in the smallest granularity region 3650 D, and may move storage entities to larger regions as needed. As such, even if a storage client requests a larger number of LIDs, the allocation module 3660 may defer allocation of additional LIDs until needed.
- the allocation module 3660 may comprise a reallocation module configured to, inter alia, relocate storage entities between different allocation regions (e.g., physical allocation regions 3638 A-N and/or logical allocation regions 3650 A-D).
- a file storage entity may be initially managed using LIDs within the region 3650 D. However, the file may grow to require more than a single, contiguous LID range 3651D.
- the storage layer 130 may allocate additional contiguous LID ranges 3651D within the region 3650 D.
- reallocation module 3666 may determine that the storage entity should be relocated, which may comprise a range move operation from the region 3650 D to another region 3650 A-C. As disclosed above in conjunction with FIG.
- the range move operation may comprise a) modifying the logical interface of the data corresponding to the storage entity (in the index 2804 and/or other storage metadata 135 ), b) storing a persistent note 2866 on the storage device 120 associating the data with the updated logical interface, and/or c) rewriting the data in the updated logical interface in one or more background operations.
- the range move operation may comprise modifying a two-layer mapping between the logical identifiers and one or more intermediate mapping layers as disclosed above in connection with FIGS. 28K-28L .
- FIG. 37A illustrates one embodiment of an operation to move a storage entity (a file 3720 ) that occupies three contiguous LID ranges within region 3650 C to region 3650 B.
- the original allocations for the file 3720 are represented in the index 2804 in respective entries 3722
- the entries 3722 may comprise a respective LID range 3723A-C and corresponding physical storage locations (media addresses) 3725 A-C comprising data of the file on the storage device 120 , as disclosed herein.
- the entries 3722 may be combined into a single entry (not shown).
- the entries 3722 may be maintained in the region 3650 C of the logical address space 136 (range of LIDs corresponding to the region 3650 C).
- the index 2804 may comprise other entries corresponding to other storage entities (e.g., other files) within various regions 3650 A-D of the logical address space 136 .
- the reallocation module 3666 may determine that the file 3720 should be moved from region 3650 C to region 3650 B of the logical address space 136 by use of, inter alia, the policy module 3664 .
- the policy module 3664 may identify files that should be reallocated (moved) based on one or more of: requests to allocate additional capacity for the file 3720 , in response to a balancing operation within the logical address space 136 , in response to availability issues (e.g., lack of availability in the region 3650 C), in response to a move request from a storage client 116 , and/or the like.
- the reallocation module 3666 may be configured to periodically balance the logical address space 136 by moving relatively large files (files comprising a number of contiguous LID ranges) into larger regions, so that the files may benefit from larger contiguous LID ranges. Similarly, files that have not used their allocated capacity for a predetermined time period may be moved into smaller granularity regions.
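A balancing decision of this kind might be sketched as a small policy function. Everything here is hypothetical: the function name, the thresholds (`max_ranges`, `min_used`), and the use of a capacity fraction in place of the "predetermined time period" mentioned above, which is omitted for brevity.

```python
# Logical allocation regions ordered from smallest to largest range size.
REGION_ORDER = ["3650D", "3650C", "3650B", "3650A"]


def rebalance_target(region: str, ranges_held: int, used_fraction: float,
                     max_ranges: int = 2, min_used: float = 0.25) -> str:
    """Pick the region a file should live in after a balancing pass.

    ranges_held   -- number of contiguous LID ranges the file occupies
    used_fraction -- fraction of the file's allocated LIDs that are bound to data
    """
    i = REGION_ORDER.index(region)
    # A file spread over many ranges benefits from one larger contiguous range.
    if ranges_held > max_ranges and i < len(REGION_ORDER) - 1:
        return REGION_ORDER[i + 1]
    # A file using little of a single range can move to a smaller granularity.
    if ranges_held == 1 and used_fraction < min_used and i > 0:
        return REGION_ORDER[i - 1]
    return region


print(rebalance_target("3650C", ranges_held=3, used_fraction=0.9))
print(rebalance_target("3650B", ranges_held=1, used_fraction=0.1))
```

The first call corresponds to the FIG. 37A scenario, in which a file occupying three ranges in region 3650 C is moved to region 3650 B.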
- Moving the file 3720 may comprise allocating one or more contiguous LID ranges 3651B in the region 3650 B.
- the reallocation module 3666 is configured to move the file 3720 from the region 3650 C to the region 3650 B.
- Moving the file 3720 may comprise allocating a contiguous LID range 3651B in region 3650 B, and performing a move operation, as disclosed above (e.g., modifying the logical interface of the file data 3720 , storing a persistent note 2866 on the storage device 120 , and/or updating the contextual format of the data to be consistent with the logical interface).
- Moving the file 3720 may allow the file to be managed using contiguous LIDs of a single entry 3732 in the region 3650 B of the index 3704 , as opposed to multiple entries 3722 .
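The index-side effect of such a range move, in which several entries collapse into one without relocating data, can be sketched as follows. The index is modeled as a plain dictionary from the first LID of a range to the media addresses bound to that range, the persistent note is modeled as a tuple appended to a list, and the file's data is assumed to be dense across the old ranges in order. All of these representations, and the LID and address values, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical index: first LID of a range -> media addresses, one per bound LID.
Index = Dict[int, List[int]]


def range_move(index: Index, old_starts: List[int], new_start: int,
               notes: List[Tuple[Tuple[int, ...], int]]) -> None:
    """Rebind data from several LID ranges to one contiguous range.

    Only the logical interface changes; the media addresses stay the same, so no
    data is rewritten here. A note recording the change stands in for the
    persistent note 2866 stored on the storage device.
    """
    media: List[int] = []
    for start in old_starts:
        media.extend(index.pop(start))   # drop the old entry, keep its bindings
    index[new_start] = media             # single entry in the destination region
    notes.append((tuple(old_starts), new_start))


# Three entries (as in region 3650C) become one entry (as in region 3650B).
index = {3000: [10, 11], 3100: [12], 3200: [13, 14]}
notes = []
range_move(index, [3000, 3100, 3200], 2000, notes)
print(index)
```

Rewriting the data so that its on-media contextual format reflects the new LIDs could then happen later, in a background operation.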
- the file 3720 has been moved to region 3650 B. Following the range move operation, the file 3720 may grow by a relatively small increment. The increment may require additional capacity beyond the contiguous LID region 3733 A allocated to the file 3720 in region 3650 B.
- the storage layer 130 may allocate additional LIDs in the region 3650 B (e.g., another contiguous LID range 3651B). However, if the increase to LID capacity is relatively small, allocating another, relatively large contiguous LID range 3651B may be inefficient (e.g., result in a large number of unused LIDs). As such, the allocation module 3660 may allocate LIDs in a different region. In the FIG.
- the storage layer 130 allocates the additional LIDs in the region 3650 D, which comprises relatively small contiguous LID ranges 3651D.
- the LID allocation may be represented in an entry 3742 in the index 3704 (within a region 3650 D of the index 3704 ).
- the entry 3742 may comprise a range of LIDs 3743A allocated to the file 3720 , along with corresponding physical storage locations (e.g., physical addresses) 3745A, as described above.
- the file 3720 may, therefore, be managed using two noncontiguous sets of LIDs.
- the entries 3732 and 3742 may be linked (through respective metadata, reference information, or the like), to indicate that the entries 3732 and 3742 correspond to the same file 3720 .
- a file system (and/or the storage layer 130 ) may maintain references to the entries 3732 and 3742 (e.g., an i-node or other data structure).
- allocating additional LIDs comprises moving the file 3720 into the region 3650 A.
- the file 3720 may be moved in response to a request to expand the file 3720 ; in response, the reallocation module 3666 may be configured to move the file 3720 from the region 3650 B to the region 3650 A.
- the move operation may comprise a) allocating a contiguous range of LIDs 3651A in the region 3650 A (represented by entry 3752 ) and b) performing a range move operation to modify the logical interface of the file data to the new LIDs 3615A, as disclosed herein.
- Although FIGS. 37A-C depict range move operations to move data to different logical allocation regions within the logical address space 136 ,
- the disclosure is not limited in this regard; the same range move operations may be used to move data to/from different physical allocation regions 3638 A-N of FIG. 36B .
- data stored in a plurality of packets 3688 A comprising 512 byte data segments 3612 A may be moved to a smaller number of packets 3688 B comprising 2 k data segments 3612 B within region 3638 B in one or more range move operations, as disclosed herein.
- the range move operation may comprise maintaining the data in the smaller packets 3688 A until the data is rewritten in one or more background processes (e.g., grooming operations).
- Storage metadata 135 associated with the data may be configured to indicate that data of the LIDs in region 3638 B are stored with smaller physical segment sizes until the data is rewritten in the updated packet format 3688 B.
- the interface 138 of the storage layer 130 may be configured to provide logical and/or physical allocation information to storage clients 116 .
- the file system 2916 may leverage such information to streamline file management operations.
- the file system 2916 may perform journaling operations to, inter alia, persist metadata pertaining to allocation operations performed for the files 2919 A-N managed thereby.
- the journaling operations may comprise storing metadata pertaining to logical and/or physical storage allocation operations.
- the file system 2916 may leverage allocation metadata to streamline such operations.
- the interface 138 may, for example, provide an indication of the remaining logical capacity of one or more of the files 2919 A-N.
- the file 2919 A may be allocated within region 3650 B of the logical address space 136 and, as such, may be allocated a particular range of LIDs.
- the file 2919 A may only occupy a limited subset of the allocated LIDs.
- the file system 2916 may query the storage layer 130 (through the interface 138 ) to determine the remaining, allocated LID capacity for the file 2919 A, such that subsequent file expansions can be performed without explicit allocation requests.
- the file system 2916 may be further configured to identify an appropriate allocation region 3650 A-D in accordance with an expected file size.
- the reallocation module 3666 may be configured to move data to/from different regions 3638 A-N of the logical address space 136 .
- the reallocation module 3666 may move a storage entity (e.g., a file) in response to determining that the storage entity is stored at an unsuitable physical granularity.
- a storage client 116 may perform a large number of small write operations to data stored in a large granularity region 3638 N.
- the small write operations may, for example, comprise modifying 256 bytes of data within large 4 kb data sectors.
- the reallocation module 3666 may be configured to move the data to the region 3638 A that has a smaller 512 byte granularity to improve the performance of the small write operations.
- the move may comprise a range move operation as disclosed above.
- the range move may further comprise rewriting one or more data packets 3688 N comprising 4 kb data segments 3612 N as a plurality of data packets 3688 A comprising smaller 512 byte data segments 3612 A.
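The benefit of moving small-write workloads to a smaller granularity can be seen with a little arithmetic. The sketch assumes that a write smaller than a sector causes every sector it touches to be rewritten in full; the function name `bytes_rewritten` is hypothetical.

```python
def bytes_rewritten(offset: int, length: int, sector_bytes: int) -> int:
    """Bytes written when a modification must rewrite every sector it touches."""
    if length <= 0:
        return 0
    first = offset // sector_bytes                 # first sector touched
    last = (offset + length - 1) // sector_bytes   # last sector touched
    return (last - first + 1) * sector_bytes


# A 256 byte modification in a 4 kb sector region versus a 512 byte region.
large = bytes_rewritten(0, 256, 4096)
small = bytes_rewritten(0, 256, 512)
print(large // 256, small // 256)   # write amplification relative to 256 bytes
```

Under this assumption the 256 byte update costs a 4096 byte rewrite in region 3638 N but only a 512 byte rewrite in region 3638 A, a reduction from 16 times to 2 times the modified data.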
- FIG. 38 is a flow diagram of one embodiment of a method 3800 for managing storage allocation.
- Step 3820 may comprise defining a plurality of allocation regions within the logical address space 136 .
- the allocation regions may correspond to a logical allocation granularity (e.g., LID block size) and/or a physical allocation granularity (e.g., data sector size).
- Step 3820 may comprise partitioning the logical address space 136 into different regions and/or sections, as disclosed above.
- step 3820 may comprise defining arbitrary ranges and/or sections of the logical address space to correspond to particular allocation regions.
- Step 3830 may comprise receiving an allocation request.
- the allocation request may be received through the interface 138 of the storage layer 130 .
- the allocation request may comprise a request to allocate one or more LIDs.
- the allocation request may comprise a request to perform a storage operation (e.g., write data to the storage device 120 in a nameless write operation, or the like).
- Step 3830 may, therefore, comprise selecting an allocation region for the request by use of, inter alia, the policy module 3664 .
- the policy module 3664 may be configured to select the allocation region based on one or more request parameters, file-level knowledge (e.g., information about the data to be stored in connection with the allocated LIDs), application-level knowledge (e.g., information about the storage client 116 associated with the request, data access characteristics, and the like), and the like.
- Step 3840 may comprise allocating storage resources within one of the defined allocation regions. Step 3840 may comprise allocating a contiguous range of LIDs within a particular LID allocation region 3650 A-D. Alternatively, or in addition, step 3840 may comprise allocating LIDs and/or storing data at a particular physical granularity (e.g., having a particular data sector size in accordance with a selected region 3638 A-N).
- FIG. 39 is a flow diagram of another embodiment of a method 3900 for allocating storage resources.
- Step 3920 may comprise defining a plurality of regions 3638 A-N within a logical address space 136 .
- the regions defined at step 3920 may correspond to different, respective physical granularities. Accordingly, the LIDs of the defined regions 3638 A-N may correspond to different physical sector sizes.
- Step 3930 may comprise associating a LID with a particular data sector size based on, inter alia, the regions defined at step 3920 .
- Step 3930 may be performed in response to receiving a storage request pertaining to the LID, such as a request to write and/or modify data associated with the LID, a request to read data associated with the LID, and/or the like.
- the sector size may be determined in reference to storage metadata 135 , the index 2804 , the allocation module 3660 , and/or the like.
- Step 3940 may comprise performing one or more storage operations in accordance with the determined sector size.
- Step 3940 may comprise configuring the data write module 240 to store data packets in accordance with the identified sector size.
- step 3940 may comprise configuring the data read module 241 to read one or more data packets of a particular size, as disclosed above.
- FIG. 40 is a flow diagram of another embodiment of a method 4000 for allocating storage resources.
- Step 4020 may comprise defining a plurality of regions 3650 A-D within the logical address space.
- the regions 3650 A-D may correspond to respective logical allocation granularities, as disclosed above.
- step 4020 may comprise segmenting LIDs into respective identifier portions and/or offset or range portions.
- the segmentation of the LIDs may vary by region, as disclosed above. For example, regions comprising large contiguous LID ranges may use LIDs having a relatively large offset or range portion, and regions comprising relatively small contiguous LID ranges may use LIDs having a relatively small offset or range portion (and a larger identifier portion).
- the segmentation of step 4020 may comprise segmenting the logical address space 136 into equally sized regions. Alternatively, the regions may vary in size and/or extent.
- Step 4030 may comprise allocating one or more LIDs to a storage client 116 within a selected region of the logical address space 136 .
- Step 4030 may comprise selecting a region of the logical address space 136 . Selection of the region may be based upon, inter alia, a size of the request, a request parameter (e.g., the storage client may request allocation within a particular region and/or allocation of a particular range of contiguous LIDs), configuration and/or preferences of the storage client, availability, request parameters (ioctrl, fadvise), and/or the like.
- Allocating the one or more LIDs may comprise allocating a contiguous range of LIDs in accordance with the allocation granularity of the selected region of the logical address space 136 .
- the contiguous range of LIDs allocated at step 4030 may, therefore, comprise logical capacity that exceeds the number of LIDs requested by the storage client 116 .
- step 4030 may comprise allocating one or more noncontiguous LID ranges within one or more of the regions 3650 A-D, as disclosed above.
- Step 4040 comprises managing the segmented logical address space 136 .
- Step 4040 may comprise moving one or more storage entities (e.g., files) in response to allocation changes and/or balancing operations, as disclosed above.
- FIG. 41 is a flow diagram of another embodiment of a method 4100 for allocating storage resources.
- Step 4120 may comprise selecting an allocation region in response to a request.
- the request may comprise an allocation request, a storage request, and/or the like.
- selection of the allocation region may be based on one or more of: a size of data associated with the request, a size of a data structure associated with the request, a size of a storage entity associated with the request, a file associated with the request, an application associated with the request, a parameter of the request, a storage client associated with the request, an ioctrl parameter, an fadvise parameter, availability of storage resources, and/or the like.
- the region selected at step 4120 may comprise a logical allocation region 3650 A-D, a physical allocation region 3638 A-N, and/or a region comprising a combination of LID and data sector allocation granularity. Step 4120 may further comprise allocating storage resources within the selected region.
- Step 4130 may comprise performing one or more storage operations within the selected region and/or in accordance with the allocation granularity of the selected region.
- Step 4130 may comprise allocating a particular range of LIDs in accordance with a particular logical allocation region 3650 A-D, storing data within physical sectors of a predetermined size (in accordance with a particular physical allocation region 3638 A-N), and/or the like.
- Step 4140 may comprise moving data corresponding to the storage operations performed at step 4130 to a different allocation region.
- Step 4140 may comprise determining that the data should be moved. The determination may be based on receiving a request, through the interface 138 , to move the data. Alternatively, or in addition, the determination may be based on profiling metadata pertaining to storage operation(s), such as access characteristics of the data, changes in requested allocation size, and/or the like. For example, a file may be moved from a relatively small logical allocation region to a larger logical allocation region in response to continued expansion of the file. In another embodiment, a file may be moved from a large logical allocation region to a smaller logical allocation region in response to a reduction in file size.
- Step 4140 may further comprise performing one or more range move operations to move the data to/from different portions of the logical address space 136 , as disclosed herein.
- These computer program instructions may also be stored in a machine-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the machine-readable memory produce an article of manufacture, including implementing means that implement the function specified.
- the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- This application is a continuation-in-part of, and claims priority to, U.S. patent application Ser. No. 13/424,333, entitled, “Logical Interfaces for Contextual Storage,” filed Mar. 19, 2012 for David Flynn et al., and which claims priority to U.S. Provisional Patent Application No. 61/454,235, entitled, “Virtual Storage Layer Supporting Operations Ordering, a Virtual Address Space, Atomic Operations, and Metadata Discovery,” filed Mar. 18, 2011. This application also claims priority to U.S. Provisional Patent Application No. 61/625,647, entitled, “Systems, Methods, and Interfaces for Managing a Logical Address Space,” filed Apr. 17, 2012, for David Flynn et al., and to U.S. Provisional Patent Application No. 61/637,165, entitled, “Systems, Methods, and Interfaces for Managing a Logical Address Space,” filed Apr. 23, 2012, for David Flynn et al., each of which is incorporated by reference.
- This disclosure relates to storage systems and, in particular, to managing an address space of a storage system.
- A computing system may provide a logical address space of a storage device and/or system. The logical address space may comprise identifiers used by storage clients to reference storage resources. The computing system may further comprise a logical-to-physical translation layer configured to map identifiers of the logical address space with the storage location of data associated with the identifiers. The translation layer may comprise any-to-any mappings between identifiers and physical addresses. The logical address space may be independent of the underlying physical storage resources, and may exceed the capacity of the physical storage resources. Storage clients may allocate portions of the logical address space to perform storage operations. Maintaining allocation metadata pertaining to the logical address space may, however, impose significant overhead.
- Disclosed herein are embodiments of methods of managing storage allocation. The disclosed methods may comprise one or more machine-executable operations and/or steps. The disclosed operations and/or steps may be embodied as program code stored on a computer readable storage medium. Accordingly, embodiments of the methods disclosed herein may be embodied as a computer program product comprising a computer readable storage medium storing computer usable program code executable to cause a computing device to perform one or more method operations and/or steps.
- Embodiments of the disclosed method may comprise a computing device providing an address space of a storage device, the address space configured such that at least two or more addresses of the address space are associated with a different physical storage capacity, and allocating one of the at least two or more addresses to a storage client in response to a storage request. The allocation granularity may pertain to allocation of logical addresses within the address space. Alternatively, or in addition, the allocation granularity may pertain to data segment size corresponding to one or more logical addresses of the address space.
- The method may further comprise allocating a logical identifier within a first section of the address space corresponding to a first sector size on the storage device and allocating a logical identifier within a different section of the address space corresponding to a different sector size on the storage device.
- In some embodiments, the method includes allocating storage resources within a selected section of the address space in response to a request from a storage client, and selecting the section based on one or more of: a size of data associated with the request, a size of a data structure associated with the request, a size of a storage entity associated with the request, a file associated with the request, an application associated with the request, a parameter of the request, a storage client associated with the request, an input/output (I/O) control (ioctrl) parameter, an fadvise parameter, and availability of unallocated logical addresses within the sections. Dividing the address space may comprise partitioning logical addresses within the address space into an identifier portion and an offset portion, wherein relative sizes of the identifier portions to the offset portions vary between the sections.
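- By way of non-limiting illustration, partitioning logical addresses into an identifier portion and an offset portion whose relative sizes vary between sections may be sketched as follows. The section boundaries, the bit widths, and the name `split_lid` are assumptions chosen for the example.

```python
# Hypothetical sections of a 64-bit address space:
# (name, first LID, last LID, number of offset bits).
SECTIONS = [
    ("small-extents", 0x0000_0000_0000_0000, 0x7FFF_FFFF_FFFF_FFFF, 16),
    ("large-extents", 0x8000_0000_0000_0000, 0xFFFF_FFFF_FFFF_FFFF, 40),
]


def split_lid(lid):
    """Return (section name, identifier portion, offset portion) for a LID."""
    for name, first, last, offset_bits in SECTIONS:
        if first <= lid <= last:
            identifier = lid >> offset_bits
            offset = lid & ((1 << offset_bits) - 1)
            return name, identifier, offset
    raise ValueError("LID outside the address space")


small = split_lid(0x0000_0000_0001_0005)
large = split_lid(0x8000_0100_0000_0002)
```

In the first section each identifier spans 2^16 contiguous LIDs, whereas in the second each identifier spans 2^40, so a storage client expecting large storage entities may be directed to the second section.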
- Some embodiments of the method may further comprise moving data stored on the storage device to a different section of the address space. Moving the data may comprise associating the data with a different logical address than a logical address stored with the data on the storage device and/or updating persistent metadata of the data to reference the different logical address in response to relocating the data on the storage device.
- Disclosed herein are embodiments of an apparatus, comprising a translation module configured to manage a logical address space of a storage device, a partitioning module configured to segment the logical address space into a plurality of different regions, the individual regions having a different respective allocation granularity, and an allocation module configured to allocate logical identifiers within the regions in accordance with the allocation granularities of the regions. The apparatus may further include an interface module configured to provide for specifying a region of the logical address space in which to perform one or more of an allocation operation and a storage operation. The interface module may be configured to provide information pertaining to the allocation granularities of one or more of the regions to a storage client. The different respective allocation granularities of the regions may pertain to one or more of: a logical identifier block size for allocation operations performed within the respective regions and a data sector size associated with the logical identifiers of the respective regions. The allocation module may be configured to associate logical identifiers within the logical address space with one of a plurality of data sector sizes. In some embodiments, the apparatus includes a data read module configured to read a data segment associated with a logical identifier on the storage device, wherein a size of the data segment corresponds to a data sector size associated with the logical identifier.
- In some embodiments, the disclosed apparatus also includes a reallocation module configured to reallocate a set of logical identifiers corresponding to data stored on the storage device to a different set of logical identifiers. The reallocation module may be configured to modify a size of a block of logical identifiers associated with the data, and the reallocation module may be configured to move one or more of the logical identifiers to another region of the logical address space. Each region may comprise one or more blocks of logical identifiers within the logical address space. The reallocation module may be configured to combine a plurality of blocks allocated within a first region of the logical address space into a single, larger block of logical identifiers within a different region of the logical address space. Alternatively, or in addition, the reallocation module may be configured to reallocate a block of logical identifiers within a first region of the logical address space as one or more smaller blocks of logical identifiers within a different region of the logical address space.
- The apparatus may further include a log storage module configured to store data on the storage device in association with respective logical identifiers corresponding to the data. The reallocation module may be configured to modify the logical identifier associated with a data segment such that the logical identifier associated with the data segment on the storage device is inconsistent with the modified logical identifier. The apparatus may include a translation module configured to reference the data segment associated with the inconsistent logical identifier on the storage device by use of the modified logical identifier. The log storage module may be configured to store the data segment in association with the modified logical identifier on the storage device in response to grooming a storage division comprising the data segment.
- Disclosed herein are embodiments of a method comprising associating logical addresses of an address space with respective sector sizes, wherein the sector size associated with a logical address corresponds to a physical storage capacity on a storage device corresponding to the logical address, determining a sector size of one of the logical addresses in response to a request, and performing a storage operation on the storage device in accordance with the determined sector size. The method may further include selecting a sector size for the logical address based on one or more of a file associated with the logical address, an application associated with the logical address, the storage client associated with the logical address, an input/output (I/O) control parameter, and an fadvise parameter. In some embodiments, the method further includes determining an available physical storage capacity of the storage device based on sector sizes of logical addresses of the address space that are associated with valid data on the storage device and/or assigning a different respective sector size to each of a plurality of segments of the address space, wherein determining the sector size of the logical address comprises associating the logical address with one of the segments.
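- By way of non-limiting illustration, associating logical addresses with sector sizes by segment, and deriving available physical capacity from the sector sizes of logical addresses bound to valid data, may be sketched as follows. The segment boundaries and sector sizes are assumptions chosen for the example.

```python
import bisect

# Hypothetical segments: (first LID of the segment, sector size in bytes),
# sorted by first LID.
SEGMENTS = [(0, 512), (1 << 20, 4096), (1 << 30, 65536)]
_STARTS = [start for start, _ in SEGMENTS]


def sector_size(lid):
    """Determine the sector size of a LID from the segment that contains it."""
    if lid < 0:
        raise ValueError("LID must be non-negative")
    return SEGMENTS[bisect.bisect_right(_STARTS, lid) - 1][1]


def used_capacity(valid_lids):
    """Physical bytes consumed by LIDs that reference valid data."""
    return sum(sector_size(lid) for lid in valid_lids)


def available_capacity(physical_bytes, valid_lids):
    return physical_bytes - used_capacity(valid_lids)


valid = [0, 1, 1 << 20]
remaining = available_capacity(1 << 20, valid)
```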
- FIG. 1 is a block diagram of one embodiment of a storage system;
- FIG. 2 is a block diagram of another embodiment of a storage system;
- FIG. 3A is a block diagram of another embodiment of a storage system;
- FIG. 3B depicts one example of a contextual data format;
- FIG. 3C is a block diagram of an exemplary log storage format;
- FIG. 3D depicts one embodiment of an index;
- FIG. 4 is a block diagram of one embodiment of an apparatus to allocate data storage space;
- FIG. 5 is a block diagram of another embodiment of an apparatus to allocate data storage space;
- FIG. 6 is a flow diagram of one embodiment of a method for allocating data storage space;
- FIG. 7 is a flow diagram of one embodiment of a method for servicing a physical capacity request;
- FIG. 8 is a flow diagram of one embodiment of a method for reserving physical storage space;
- FIG. 9 is a flow chart diagram of one embodiment of a method for binding allocated logical identifiers to media storage locations;
- FIG. 10 is a flow diagram of another embodiment of a method for binding allocated logical identifiers to media storage locations;
- FIG. 11 is a flow diagram of one embodiment of a method for servicing an allocation query at a storage device;
- FIG. 12 is a schematic diagram of exemplary embodiments of indexes to associate logical identifiers with storage locations of a storage device;
- FIG. 13 is a schematic diagram of exemplary embodiments of indexes to associate logical identifiers with storage locations of a storage device;
- FIG. 14 depicts an example of an index for maintaining unallocated logical capacity;
- FIG. 15 is a flow diagram of one embodiment of a method for allocating a storage device;
- FIG. 16 is a flow diagram of one embodiment of a method for allocating a storage device;
- FIG. 17 is a schematic diagram of exemplary embodiments of storage metadata;
- FIG. 18 is a schematic diagram of exemplary embodiments of physical reservation metadata;
- FIG. 19A depicts a logical identifier that has been segmented into a first portion and a second portion;
- FIG. 19B is a schematic diagram of exemplary embodiments of storage metadata for segmented logical identifiers;
- FIG. 19C is a schematic diagram of exemplary embodiments of physical reservation metadata for segmented logical identifiers;
- FIG. 20A is a schematic diagram of exemplary embodiments of a file system storage client accessing a storage layer using segmented logical identifiers;
- FIG. 20B is a schematic diagram of exemplary embodiments of a file system storage client accessing a storage layer using segmented logical identifiers;
- FIG. 21 is a flow diagram of one embodiment of a method for providing a storage layer;
- FIG. 22 is a flow diagram of one embodiment of a method for segmenting logical identifiers of a logical address space;
- FIG. 23 is a flow diagram of one embodiment of a method for providing crash recovery and data integrity in a storage layer;
- FIG. 24A is a flow diagram of one embodiment of a method for servicing queries pertaining to the status of a logical identifier;
- FIG. 24B is a flow diagram of one embodiment of a method of servicing queries pertaining to a media storage location;
- FIG. 25A depicts one embodiment of a contextual, log-based data format;
- FIG. 25B depicts one embodiment of a persistent note;
- FIG. 25C is a flow diagram of one embodiment of a method for designating ephemeral data;
- FIG. 26 is a flow diagram of one embodiment of a method for reconstructing storage metadata and/or determining the status of media storage locations using a contextual, log-based data format;
- FIG. 27 is a flow diagram of one embodiment of a method for ordering storage operations using barriers;
- FIGS. 28A-E depict embodiments of clone operations;
- FIG. 28F depicts another embodiment of a storage layer;
- FIGS. 28G-J depict embodiments of clone operations using reference entries;
- FIG. 28K depicts one embodiment of an indirection layer;
- FIG. 28L depicts one embodiment of a clone operation performed using intermediate mapping layers;
- FIG. 28M depicts one embodiment of a deduplication operation;
- FIG. 28N depicts embodiments of snapshot operations;
- FIGS. 28O-S depict embodiments of range move operations;
- FIG. 29A depicts one embodiment of a storage layer configured to perform logical storage operations for a file system;
- FIG. 29B depicts one embodiment of a storage layer configured to implement mmap checkpoints;
- FIG. 29C depicts one embodiment of a storage layer configured to implement atomic storage operations;
- FIG. 30 is a flow diagram of one embodiment of a method for managing a logical interface of data storage in a contextual format on a non-volatile storage media;
- FIG. 31 is a flow diagram of one embodiment of a method for managing a logical interface of contextual data;
- FIG. 32 is a flow diagram of another embodiment of a method for managing a logical interface of contextual data;
- FIGS. 33A-B depict exemplary clone operations;
- FIG. 34 is a flow diagram of one embodiment of a method for managing a clone of contextual data;
- FIG. 35 is a flow diagram of one embodiment of a method for folding a clone of contextual data;
- FIG. 36A depicts another embodiment of a storage layer;
- FIG. 36B depicts one embodiment of a logical address space comprising a plurality of allocation regions;
- FIG. 36C depicts another embodiment of a logical address space comprising a plurality of allocation regions;
- FIG. 37A depicts one example of a move operation within a segmented logical address space;
- FIG. 37B depicts one example of an allocation operation within a segmented logical address space;
- FIG. 37C depicts another example of an allocation operation within a segmented logical address space; and
- FIGS. 38-41 are flow diagrams of embodiments of methods for managing storage allocation.
- According to various embodiments, a storage layer manages one or more storage devices. The storage device(s) may comprise non-volatile storage devices, such as solid-state storage device(s), that are arranged and/or partitioned into a plurality of addressable media storage locations. As used herein, a media storage location refers to any physical unit of storage (e.g., any physical storage media quantity on a storage device). Media storage units may include, but are not limited to: pages, storage divisions, erase blocks, sectors, blocks, collections or sets of physical storage locations (e.g., logical pages, logical erase blocks, etc., described below), or the like.
- The storage layer may be configured to present a logical address space to one or more storage clients. As used herein, a logical address space refers to a logical representation of storage resources. The logical address space may comprise a plurality (e.g., range) of logical identifiers. As used herein, a logical identifier (LID) refers to any identifier for referencing a storage resource (e.g., data), including, but not limited to: a logical block address (“LBA”), a cylinder/head/sector (“CHS”) address, a file name, an object identifier, an inode, a Universally Unique Identifier (“UUID”), a Globally Unique Identifier (“GUID”), a hash code, a signature, an index entry, a range, an extent, or the like. The logical address space, LIDs, and relationships between LIDs and storage resources define a “logical interface” through which storage clients access storage resources. As used herein, a logical interface refers to a handle, identifier, path, process, or other mechanism for referencing and/or interfacing with a storage resource. A logical interface may include, but is not limited to: a LID, a range or extent of LIDs, a reference to a LID (e.g., a link between LIDs, a pointer to a LID, etc.), a reference to a virtual storage unit, or the like. A logical interface may be used to reference data through a storage interface and/or application programming interface (“API”).
- The storage layer may maintain storage metadata, such as a forward index, to map LIDs of the logical address space to media storage locations on the storage device(s). The storage layer may provide for arbitrary, “any-to-any” mappings to physical storage resources. Accordingly, there may be no pre-defined and/or pre-set mappings between LIDs and particular media storage locations and/or media addresses. As used herein, a media address refers to an address of a storage resource that uniquely identifies one storage resource from another to a controller that manages a plurality of storage resources. By way of example, a media address includes, but is not limited to: the address of a media storage location, a physical storage unit, a collection of physical storage units (e.g., a logical storage unit), a portion of a media storage unit (e.g., a logical storage unit address and offset, range, and/or extent), or the like. Accordingly, the storage layer may map LIDs to physical data resources of any size and/or granularity, which may or may not correspond to the underlying data partitioning scheme of the storage device(s). For example, in some embodiments, the storage controller is configured to store data within logical storage units that are formed by logically combining a plurality of physical storage units, which may allow the storage controller to support many different virtual storage unit sizes and/or granularities.
- As used herein, a logical storage element refers to a set of two or more non-volatile storage elements that are or are capable of being managed in parallel (e.g., via an I/O and/or control bus). A logical storage element may comprise a plurality of logical storage units, such as logical pages, logical storage divisions (e.g., logical erase blocks), and so on. Each logical storage unit may be comprised of storage units on the non-volatile storage elements in the respective logical storage element. As used herein, a logical storage unit refers to a logical construct combining two or more physical storage units, each physical storage unit on a respective solid-state storage element in the respective logical storage element (each solid-state storage element being accessible in parallel). As used herein, a logical storage division refers to a set of two or more physical storage divisions, each physical storage division on a respective solid-state storage element in the respective logical storage element.
- The logical address space presented by the storage layer may have a logical capacity, which may comprise a finite set or range of LIDs. The logical capacity of the logical address space may correspond to the number of available LIDs in the logical address space and/or the size and/or granularity of the data referenced by the LIDs. For example, the logical capacity of a logical address space comprising 2^32 unique LIDs, each referencing 2048 bytes (2 KB) of data, may be 2^43 bytes. In some embodiments, the logical address space may be “thinly provisioned.” As used herein, a thinly provisioned logical address space refers to a logical address space having a logical capacity that exceeds the physical storage capacity of the underlying storage device(s). For example, the storage layer may present a 64-bit logical address space to the storage clients (e.g., a logical address space referenced by 64-bit LIDs), which exceeds the physical storage capacity of the underlying storage devices. The large logical address space may allow storage clients to allocate and/or reference contiguous ranges of LIDs, while reducing the chance of naming conflicts. The storage layer may leverage the “any-to-any” mappings between LIDs and physical storage resources to manage the logical address space independently of the underlying physical storage devices. For example, the storage layer may add and/or remove physical storage resources seamlessly, as needed, and without changing the logical interfaces used by the storage clients.
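- By way of non-limiting illustration, the logical capacity arithmetic and the thin provisioning condition described above may be checked as follows. The function names and the 1 TiB physical capacity are assumptions chosen for the example.

```python
def logical_capacity(lid_bits, bytes_per_lid):
    """Logical capacity in bytes: number of LIDs times the data referenced per LID."""
    return (1 << lid_bits) * bytes_per_lid


def is_thinly_provisioned(lid_bits, bytes_per_lid, physical_bytes):
    """True when the logical capacity exceeds the physical storage capacity."""
    return logical_capacity(lid_bits, bytes_per_lid) > physical_bytes


# 2^32 LIDs, each referencing 2048 (2^11) bytes, give 2^43 bytes (8 TiB).
capacity = logical_capacity(32, 2048)

# A 64-bit LID space over a hypothetical 1 TiB device is thinly provisioned.
thin = is_thinly_provisioned(64, 2048, 1 << 40)
```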
- The storage layer may be configured to store data in a contextual format. As used herein, a contextual format refers to a “self-describing” data format in which persistent metadata is associated with the data on the physical storage media (e.g., stored with the data in a packet, or other data structure). The persistent metadata provides context for the data with which it is stored. In certain embodiments, the persistent metadata uniquely identifies the data with which the persistent metadata is stored. For example, the persistent metadata may uniquely identify a sector of data owned by a storage client from other sectors of data owned by the storage client. In a further embodiment, the persistent metadata identifies an operation that is performed on the data. In a further embodiment, the persistent metadata identifies an order of a sequence of operations performed on the data. In a further embodiment, the persistent metadata identifies security controls, a data type, or other attributes of the data. In certain embodiments, the persistent metadata identifies at least one of a plurality of aspects, including data type, a unique data identifier, an operation, and an order of a sequence of operations performed on the data. The persistent metadata may include, but is not limited to: a logical interface of the data, an identifier of the data (e.g., a LID, file name, object id, label, unique identifier, or the like), reference(s) to other data (e.g., an indicator that the data is associated with other data), a relative position or offset of the data with respect to other data (e.g., file offset, etc.), data size and/or range, and the like. The contextual data format may comprise a packet format comprising a data segment and one or more headers. Alternatively, a contextual data format may associate data with context information in other ways (e.g., in a dedicated index on the non-volatile storage media, a storage division index, or the like).
Accordingly, a contextual data format refers to a data format that associates the data with a logical interface of the data (e.g., the “context” of the data). A contextual data format is self-describing in that the contextual data format includes the logical interface of the data.
- In some embodiments, the contextual data format may allow data context to be determined (and/or reconstructed) based upon the contents of the non-volatile storage media, and independently of other storage metadata, such as the arbitrary, “any-to-any” mappings discussed above. Since the media storage location of data is independent of the logical interface of the data, it may be inefficient (or impossible) to determine the context of data based solely upon the media storage location or media address of the data. Storing data in a contextual format on the non-volatile storage media may allow data context to be determined without reference to other storage metadata. For example, the contextual data format may allow the logical interface of data to be reconstructed based only upon the contents of the non-volatile storage media (e.g., reconstruct the “any-to-any” mappings between LID and media storage location).
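- By way of non-limiting illustration, a self-describing packet format, and recovery of LID-to-media-location mappings from media contents alone, may be sketched as follows. The header layout (a 64-bit LID followed by a 32-bit data length) and the names `pack_packet` and `scan_media` are assumptions chosen for the example and are not the packet format of the disclosure.

```python
import struct

# Hypothetical header: little-endian 64-bit LID followed by 32-bit data length.
HEADER = struct.Struct("<QI")


def pack_packet(lid, data):
    """Store the LID with the data so the packet describes itself."""
    return HEADER.pack(lid, len(data)) + data


def scan_media(media):
    """Yield (media_offset, lid, data) for each packet found in the byte string."""
    pos = 0
    while pos + HEADER.size <= len(media):
        lid, length = HEADER.unpack_from(media, pos)
        start = pos + HEADER.size
        yield pos, lid, media[start:start + length]
        pos = start + length


media = pack_packet(42, b"hello") + pack_packet(7, b"world!")

# Rebuild the LID -> media offset mappings without any other metadata.
recovered = {lid: offset for offset, lid, _ in scan_media(media)}
```

The sketch ignores multiple versions of the same LID; distinguishing the current version relies on the sequence information of the log format.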
- In some embodiments, the storage controller may be configured to store data on an asymmetric, write-once storage media, such as solid-state storage media. As used herein, a “write once” storage media refers to a storage media that is reinitialized (e.g., erased) each time new data is written or programmed thereon. As used herein, “asymmetric” storage media refers to storage media having different latencies for different storage operations. Many types of solid-state storage media are asymmetric; for example, a read operation may be much faster than a write/program operation, and a write/program operation may be much faster than an erase operation (e.g., reading the media may be hundreds of times faster than erasing, and tens of times faster than programming the media). The storage media may be partitioned into storage divisions that can be erased as a group (e.g., erase blocks) in order to, inter alia, account for the asymmetric properties of the media. As such, modifying a single data segment “in-place” may require erasing the entire erase block comprising the data, and rewriting the modified data to the erase block, along with the original, unchanged data. This may result in inefficient “write amplification,” which may excessively wear the media. Therefore, in some embodiments, the storage controller may be configured to write data “out-of-place.” As used herein, writing data “out-of-place” refers to writing data to different media storage location(s) rather than overwriting the data “in-place” (e.g., overwriting the original physical location of the data). Modifying data “out-of-place” may avoid write amplification, since existing, valid data on the erase block with the data to be modified need not be erased and recopied. Moreover, writing data “out-of-place” may remove erasure from the latency path of many storage operations (the erasure latency is no longer part of the “critical path” of a write operation).
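- By way of non-limiting illustration, writing data out-of-place may be sketched as follows. The class `OutOfPlaceStore` and its slot-based media model are assumptions chosen for the example.

```python
class OutOfPlaceStore:
    """Toy model: media is an append-only list of slots; nothing is overwritten."""

    def __init__(self):
        self.media = []       # slot number -> (lid, data)
        self.index = {}       # LID -> slot number holding the current version
        self.invalid = set()  # slots holding obsolete data, awaiting reclamation

    def write(self, lid, data):
        old = self.index.get(lid)
        if old is not None:
            self.invalid.add(old)      # mark the prior version; do not erase it
        self.media.append((lid, data))
        self.index[lid] = len(self.media) - 1
        return self.index[lid]

    def read(self, lid):
        return self.media[self.index[lid]][1]


store = OutOfPlaceStore()
first = store.write(5, b"X")
second = store.write(5, b"Y")   # modifies LID 5 without touching slot 0
```

No erase occurs on the write path in this sketch; the obsolete slot remains on the media until it is reclaimed separately.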
- The storage controller may comprise one or more processes that operate outside of the regular path for servicing of storage operations (the “path” for performing a storage operation and/or servicing a storage request). As used herein, the “regular path for servicing a storage request” or “path for servicing a storage operation” (also referred to as a “critical path”) refers to a series of processing operations needed to service the storage operation or request, such as a read, write, modify, or the like. The path for servicing a storage request may comprise receiving the request from a storage client, identifying the logical interface of the request (e.g., LIDs pertaining to the request), performing one or more storage operations on a non-volatile storage media, and returning a result, such as acknowledgement or data. Processes that occur outside of the path for servicing storage requests may include, but are not limited to: a groomer, deduplication, and so on. These processes may be implemented autonomously, and in the background from servicing storage requests, such that they do not interfere with or impact the performance of other storage operations and/or requests. Accordingly, these processes may operate independent of servicing storage requests.
- In some embodiments, the storage controller comprises a groomer, which is configured to reclaim storage divisions (erase blocks) for reuse. The write out-of-place paradigm implemented by the storage controller may result in obsolete or invalid data (data that has been erased, modified, and/or overwritten) remaining on the storage device. For example, overwriting data X with data Y may result in storing Y on a new storage division (rather than overwriting X in place), and updating the “any-to-any” mappings of the storage metadata to identify Y as the valid, up-to-date version of the data. The obsolete version of the data X may be marked as “invalid,” but may not be immediately removed (e.g., erased), since, as discussed above, erasing X may involve erasing an entire storage division, which is a time-consuming operation and may result in write amplification. Similarly, data that is no longer in use (e.g., deleted or trimmed data) may not be immediately removed. The non-volatile storage media may accumulate a significant amount of “invalid” data. A groomer process may operate outside of the “critical path” for servicing storage operations. The groomer process may reclaim storage divisions so that they can be reused for other storage operations. As used herein, reclaiming a storage division refers to erasing the storage division so that new data may be stored/programmed thereon. Reclaiming a storage division may comprise relocating valid data on the storage division to a new storage location. The groomer may identify storage divisions for reclamation based upon one or more factors, which may include, but are not limited to: the amount of invalid data in the storage division, the amount of valid data in the storage division, wear on the storage division (e.g., number of erase cycles), time since the storage division was programmed or refreshed, and so on.
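- By way of non-limiting illustration, selecting a storage division for reclamation and relocating its valid data may be sketched as follows. The selection policy shown (most invalid data first, with ties broken in favor of the least worn division) is only one assumed combination of the factors listed above, and the names `Division`, `pick_victim`, and `reclaim` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Division:
    ident: int
    capacity: int        # number of data slots in the division
    valid: dict          # LID -> data that is still valid
    invalid_count: int   # slots holding obsolete data
    erase_cycles: int = 0


def pick_victim(divisions):
    """Prefer the division with the most invalid data; break ties by least wear."""
    candidates = [d for d in divisions if d.invalid_count > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda d: (d.invalid_count, -d.erase_cycles))


def reclaim(victim, destination, index):
    """Relocate valid data, update the index, then erase the victim."""
    free = destination.capacity - len(destination.valid) - destination.invalid_count
    if len(victim.valid) > free:
        raise ValueError("destination lacks room for the relocated data")
    for lid, data in victim.valid.items():
        destination.valid[lid] = data
        index[lid] = destination.ident
    victim.valid = {}
    victim.invalid_count = 0
    victim.erase_cycles += 1


a = Division(0, 4, {1: b"a"}, invalid_count=3, erase_cycles=10)
b = Division(1, 4, {2: b"b", 3: b"c"}, invalid_count=2, erase_cycles=1)
c = Division(2, 4, {}, invalid_count=0)
index = {1: 0, 2: 1, 3: 1}   # LID -> division identifier

victim = pick_victim([a, b, c])
reclaim(victim, c, index)
```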
- The storage controller may be further configured to store data in a log format. As described above, a log format refers to a data format that defines an ordered sequence of storage operations performed on a non-volatile storage media. In some embodiments, the log format comprises storing data in a pre-determined sequence within the media address space of the non-volatile storage media (e.g., sequentially within pages and/or erase blocks of the media). The log format may further comprise associating data (e.g., each packet or data segment) with respective sequence indicators. The sequence indicators may be applied to data individually (e.g., applied to each data packet) and/or to data groupings (e.g., packets stored sequentially on a storage division, such as an erase block). In some embodiments, sequence indicators may be applied to storage divisions when the storage divisions are reclaimed (e.g., erased), as described above, and/or when the storage divisions are first used to store data.
- In some embodiments the log format may comprise storing data in an “append only” paradigm. The storage controller may maintain a current append point within a media address space of the storage device. The append point may be a current storage division and/or offset within a storage division. Data may then be sequentially appended from the append point. The sequential ordering of the data, therefore, may be determined based upon the sequence indicator of the storage division of the data in combination with the sequence of the data within the storage division. Upon reaching the end of a storage division, the storage controller may identify the “next” available storage division (the next storage division that is initialized and ready to store data). The groomer may reclaim storage divisions comprising invalid, stale, and/or deleted data, to ensure that data may continue to be appended to the media log.
- The log format described herein may allow valid data to be distinguished from invalid data based upon the contents of the non-volatile storage media, and independently of the storage metadata. As discussed above, invalid data may not be removed from the storage media until the storage division comprising the data is reclaimed. Therefore, multiple “versions” of data having the same context may exist on the non-volatile storage media (e.g., multiple versions of data having the same logical interface and/or same LID). The sequence indicators associated with the data may be used to distinguish “invalid” versions of data from the current, up-to-date version of the data; the data that is the most recent in the log is the current version, and all previous versions may be identified as invalid.
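- A minimal Python sketch of this idea follows: packets are replayed in log order and the last packet seen for each LID is treated as the current version. The tuple layout `(division_sequence, offset, lid, media_address)` and the function name `rebuild_index` are assumptions made for the example.

```python
def rebuild_index(packets):
    """Replay packets in log order and return (index, invalid_addresses).

    packets: iterable of (division_sequence, offset, lid, media_address) tuples,
             as they might be recovered from packet headers on the media.
    index:   maps each LID to the media address of its most recent version.
    invalid: media addresses of superseded (obsolete) versions.
    """
    index = {}
    invalid = []
    # Log order: sequence indicator of the division, then offset within it.
    for _seq, _offset, lid, address in sorted(packets, key=lambda p: (p[0], p[1])):
        if lid in index:
            # An earlier version of the same LID exists; it is now obsolete.
            invalid.append(index[lid])
        index[lid] = address
    return index, invalid
```

- Because only the contents of the media are consulted, the same replay could be used to reconstruct the mappings when the storage metadata is unavailable.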
- According to various embodiments, a logical interface of data stored in a contextual format is modified. The contextual format of the data may be inconsistent with the modified logical interface. As used herein, an inconsistent contextual data format refers to a contextual data format that defines a logical interface to data on storage media that is inconsistent with the logical interface of the data. The logical interface of the data may be maintained by a storage layer, storage controller, or other module. The inconsistency may include, but is not limited to: the contextual data format associating the data with a different LID than the logical interface; the contextual data format associating the data with a different set of LIDs than the logical interface; the contextual data format associating the data with a different LID reference than the logical interface; or the like. The storage controller may provide access to the data in the inconsistent contextual format and may update the contextual format of the data on the non-volatile storage media to be consistent with the modified logical interface. The update may require rewriting the data out-of-place and, as such, may be deferred. As used herein, a consistent contextual data format refers to a contextual data format that defines the same (or an equivalent) logical interface as the logical interface of the data, which may include, but is not limited to: the contextual data format associating the data with the same LID(s) (or equivalent LID(s)) as the logical interface; the contextual data format associating the data with the same set of LIDs as the logical interface; the contextual data format associating the data with the same reference LID as the logical interface; or the like.
- According to various embodiments, a storage controller and/or storage layer performs a method for managing a logical address space, comprising: modifying a logical interface of data stored in a contextual format on a non-volatile storage media, wherein the contextual format of the data on the non-volatile storage media is inconsistent with the modified logical interface of the data; accessing the data in the inconsistent contextual format through the modified logical interface; and updating the contextual format of the data on the non-volatile storage media to be consistent with the modified logical interface. The logical interface of the data may be modified in response to a request (e.g., a request from a storage client). The request may comprise a move, clone (e.g., copy), deduplication, or the like. The request may “return” (e.g., be acknowledged by the storage layer) before the contextual format of the data is updated on the non-volatile storage media. Modifying the logical interface may further comprise storing a persistent note on the non-volatile storage media indicative of the modification to the logical interface (e.g., associate the data with the modified logical interface). The contextual format of the data may be updated out-of-place, at other media storage locations on the non-volatile storage media. Updates to the contextual format may be deferred and/or made outside of the path of other storage operations (e.g., independent of servicing other storage operations and/or requests). For example, the contextual format of the data may be updated as part of a grooming process. When reclaiming a storage division, data that is in an inconsistent contextual format may be identified and updated as the data is relocated to new media storage locations. Providing access to the data through the modified logical interface may comprise referencing the data in the inconsistent contextual format through one or more reference entries and/or indirect entries in an index.
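- The method above can be illustrated with a small Python sketch of a clone request. The `ContextualStore` class, its dictionary-based packet layout, and the method names are hypothetical; the sketch only shows the ordering of events (index update and persistent note first, packet rewrite deferred until relocation).

```python
class ContextualStore:
    """Toy model: packets carry their LIDs; the index holds the logical interface."""

    def __init__(self):
        self.media = []       # append-only list of packets and persistent notes
        self.index = {}       # LID -> media address of the data packet
        self.invalid = set()  # media addresses holding obsolete packets

    def write(self, lid, payload):
        self.media.append({"kind": "data", "lids": (lid,), "payload": payload})
        self.index[lid] = len(self.media) - 1

    def clone(self, src_lid, dst_lid):
        """Modify the logical interface; the packet on the media is not rewritten."""
        address = self.index[src_lid]
        self.index[dst_lid] = address
        # Persistent note recording the modification to the logical interface.
        self.media.append({"kind": "note", "lids": (src_lid, dst_lid), "address": address})

    def read(self, lid):
        """Access data through the (possibly modified) logical interface."""
        return self.media[self.index[lid]]["payload"]

    def is_consistent(self, address):
        """True if the packet header names exactly the LIDs that reference it."""
        header_lids = set(self.media[address]["lids"])
        logical_lids = {lid for lid, addr in self.index.items() if addr == address}
        return header_lids == logical_lids

    def relocate(self, address):
        """Deferred update: rewrite the packet out-of-place with a consistent header."""
        lids = tuple(sorted(lid for lid, addr in self.index.items() if addr == address))
        payload = self.media[address]["payload"]
        self.media.append({"kind": "data", "lids": lids, "payload": payload})
        new_address = len(self.media) - 1
        for lid in lids:
            self.index[lid] = new_address
        self.invalid.add(address)  # old packet may be erased when its division is reclaimed
        return new_address
```

- In the sketch, `clone` returns before any packet is rewritten, reads through the new LID succeed immediately, and `relocate` (standing in for a grooming pass) is what brings the packet header back in line with the index.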
- In the following detailed description, reference is made to the accompanying drawings, which form a part thereof. The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
-
FIG. 1 is a block diagram of one embodiment of a system 100 for allocating storage resources. The storage system 102 comprises a storage controller 140 and storage layer 130, which may be configured for operation on a computing device 110. The computing device 110 may comprise a processor 111, volatile memory 112, a communication interface 113, and the like. The processor 111 may comprise one or more central processing units, one or more general-purpose processors, one or more application-specific processors, one or more virtual processors (e.g., the computing device 110 may be a virtual machine operating within a host), one or more processor cores, or the like. The communication interface 113 may comprise one or more network interfaces configured to communicatively couple the computing device 110 (and/or storage layer 130) to a communication network, such as an Internet Protocol network, a Storage Area Network, or the like. The computing device 110 may further comprise machine-readable storage media 114. The machine-readable storage media 114 may comprise machine-executable instructions configured to cause the computing device 110 (e.g., processor 111) to perform steps of one or more of the methods disclosed herein. Alternatively, or in addition, the storage layer 130 and/or one or more modules thereof may be embodied as one or more machine-readable instructions stored on the non-transitory storage media 114. The machine-readable storage media 114 may comprise one or more persistent, non-transitory storage devices. - The
storage layer 130 may be configured to provide storage services to one or more storage clients 116. The storage clients 116 may include local storage clients 116 operating on the computing device 110 and/or remote storage clients 116 accessible via the network 115 (and communication interface 113). The storage clients 116 may include, but are not limited to: operating systems, file systems, database applications, server applications, kernel-level processes, user-level processes, applications, and the like. - The
storage layer 130 comprises and/or is communicatively coupled to one or more storage devices 120A-N. The storage devices 120A-N may include different types of storage devices including, but not limited to: solid-state storage devices, hard drives, SAN storage resources, or the like. The storage devices 120A-N may comprise respective controllers 126A-N and non-volatile storage media 122A-N. The storage layer 130 may comprise an interface 138 configured to provide access to storage services and/or metadata 135 maintained by the storage layer 130. The interface 138 may comprise, but is not limited to: a block I/O interface 131, a virtual storage interface 132, a cache interface 133, and the like. Storage metadata 135 may be used to manage and/or track storage operations performed through any of the block I/O interface 131, virtual storage interface 132, cache interface 133, or other, related interfaces. - The
cache interface 133 may expose cache-specific features accessible through the storage layer 130. In some embodiments, the virtual storage interface 132 presented to the storage clients 116 provides access to data transformations implemented by the non-volatile storage device 120 and/or the non-volatile storage media controller 126. - The
storage layer 130 may provide storage services through one or more interfaces, which may include, but are not limited to: a block I/O interface, an extended virtual storage interface, a cache interface, and the like. The storage layer 130 may present a logical address space 136 to the storage clients 116 through one or more interfaces. As discussed above, the logical address space 136 may comprise a plurality of LIDs, each corresponding to respective media storage locations on one or more of the storage devices 120A-N. The storage layer 130 may maintain storage metadata 135 comprising “any-to-any” mappings between LIDs and media storage locations, as described above. The logical address space 136 and storage metadata 135 may, therefore, define a logical interface of data stored on the storage devices 120A-N. - The
storage layer 130 may further comprise a log storage module 137 that is configured to store data in a contextual, log format. The contextual, log data format may comprise associating data with persistent metadata, such as the logical interface of the data (e.g., LID), or the like. The contextual, log format may further comprise associating data with respective sequence identifiers on the non-volatile storage media 122A-N, which define an ordered sequence of storage operations performed on the storage devices 120A-N, as described above. - The
storage layer 130 may further comprise a storage device interface 139 configured to transfer data, commands, and/or queries to the storage devices 120A-N over a bus 127, which may include, but is not limited to: a peripheral component interconnect express (“PCI Express” or “PCIe”) bus, a serial Advanced Technology Attachment (“ATA”) bus, a parallel ATA bus, a small computer system interface (“SCSI”), FireWire, Fibre Channel, a Universal Serial Bus (“USB”), a PCIe Advanced Switching (“PCIe-AS”) bus, a network, Infiniband, SCSI RDMA, or the like. The storage device interface 139 may communicate with the storage devices 120A-N using input-output control (“IO-CTL”) command(s), IO-CTL command extension(s), remote direct memory access, or the like. - The
non-volatile storage devices 120A-N may comprise non-volatile storage media 122A-N, which may include, but is not limited to: NAND flash memory, NOR flash memory, nano random access memory (“nano RAM” or “NRAM”), magneto-resistive RAM (“MRAM”), dynamic RAM (“DRAM”), phase change RAM (“PRAM”), magnetic storage media (e.g., hard disk, tape), optical storage media, or the like. - Portions of the
storage layer 130 may be implemented by use of one or more drivers, kernel-level applications, user-level applications, and the like, which may be configured to operate within an operating system, guest operating system (e.g., in a virtualized computing environment), or the like. Other portions of the storage layer 130 may be implemented by use of hardware components, such as one or more controllers, Field-Programmable Gate Arrays (“FPGAs”), Application-Specific Integrated Circuits (“ASICs”), and/or the like. - The
storage layer 130 may present a logical address space 136 to the storage clients 116 (through one or more of the interfaces). The storage layer 130 may maintain storage metadata 135 comprising “any-to-any” mappings between LIDs in the logical address space 136 and media storage locations on one or more non-volatile storage devices 120A-N. The storage layer 130 may further comprise a log storage module 137 configured to store data on the storage device(s) 120A-N in a contextual, log format. The contextual, log data format may comprise storing data in association with persistent metadata, such as the logical interface of the data. The contextual, log format may further comprise associating data with respective sequence identifiers that define an ordered sequence of storage operations performed through the storage layer 130. -
FIG. 2 depicts another embodiment of a system 200 comprising a storage controller 140 configured to write and/or read data in a contextual, log-based format. The system 200 may comprise a non-volatile storage device 120 comprising non-volatile storage media 222. The non-volatile storage media 222 may comprise a plurality of non-volatile storage elements 223, which may be communicatively coupled to the storage media controller 126 via a bus 127. The storage media controller 126 may manage groups of non-volatile storage elements 223 (logical storage elements 229). - The
storage media controller 126 may comprise a storage request receiver module 228 configured to receive storage requests from the storage layer 130 via a bus 127. The storage request receiver module 228 may be further configured to transfer data to/from the storage layer 130 and/or storage clients 116 via the bus 127. Accordingly, the storage request receiver module 228 may comprise one or more direct memory access (“DMA”) modules, remote DMA modules, bus controllers, bridges, buffers, and so on. - The
storage media controller 126 may comprise a data write module 240 that is configured to store data on the non-volatile storage media 222 in a contextual format. Storage requests may include and/or reference data to be stored on the non-volatile storage media 222, may include the logical interface of the data (e.g., LID(s) of the data), and so on. The data write module 240 may comprise a contextual write module 242 and a write buffer 244. As described above, the contextual format may comprise storing a logical interface of the data (e.g., LID of the data) in association with the data on the non-volatile storage media 222. In some embodiments, the contextual write module 242 is configured to format data into packets, and may include the logical interface of the data in a packet header (or other packet field). The write buffer 244 may be configured to buffer data for storage on the non-volatile storage media 222. The data packets may comprise an arbitrary amount of data. In some embodiments, the write buffer 244 may comprise one or more synchronization buffers to synchronize a clock domain of the storage media controller 126 with a clock domain of the non-volatile storage media 222 (and/or bus 127). The data write module 240 may be configured to store data in arbitrarily-sized structures (packets) on the non-volatile storage media 222. - The
log storage module 248 may be configured to select media storage location(s) for the data and may provide addressing and/or control information to the non-volatile storage elements 223 via the bus 127. In some embodiments, the log storage module 248 is configured to store data sequentially in a log format within the media address space of the non-volatile storage media. The log storage module 248 may be further configured to groom the non-volatile storage media, as disclosed above. - Upon writing data to the non-volatile storage media, the
storage media controller 126 may be configured to update storage metadata 135 (e.g., a forward index) to associate the logical interface of the data (e.g., the LIDs of the data) with the media address(es) of the data on the non-volatile storage media 222. In some embodiments, the storage metadata 135 may be maintained on the storage media controller 126; for example, the storage metadata 135 may be stored on the non-volatile storage media 222, on a volatile memory (not shown), or the like. Alternatively, or in addition, the storage metadata 135 may be maintained within the storage layer 130 (e.g., on a volatile memory 112 of the computing device 110 of FIG. 1). In some embodiments, the storage metadata 135 may be maintained in a volatile memory by the storage layer 130, and may be periodically stored on the non-volatile storage media 222. - The
storage media controller 126 may further comprise a data read module 241 that is configured to read contextual data from the non-volatile storage media 222 in response to requests received via the storage request receiver module 228. The requests may comprise a LID of the requested data, a media address of the requested data, and so on. The contextual read module 243 may be configured to read data stored in a contextual format from the non-volatile storage media 222 and to provide the data to the storage layer 130 and/or a storage client 116. The contextual read module 243 may be configured to determine the media address of the data using a logical interface of the data and the storage metadata 135. Alternatively, or in addition, the storage layer 130 may determine the media address of the data and may include the media address in the request. The log storage module 248 may provide the media address to the non-volatile storage elements 223, and the data may stream into the data read module 241 via the read buffer 245. The read buffer 245 may comprise one or more read synchronization buffers for clock domain synchronization, as described above. - The
storage media controller 126 may further comprise a multiplexer 249 that is configured to selectively route data and/or commands to/from the data write module 240 and the data read module 241. In some embodiments, the storage media controller 126 may be configured to read data while filling the write buffer 244 and/or may interleave one or more storage operations on one or more banks of non-volatile storage elements 223 (not shown). -
FIG. 3A is a block diagram depicting another embodiment of a storage layer 130. As illustrated in FIG. 3A, the non-volatile storage elements 223 may be partitioned into storage divisions (e.g., erase blocks) 251, and each storage division 251 may be partitioned into physical storage units (e.g., pages) 252. An exemplary physical storage unit (page) 252 may be capable of storing 2048 bytes (“2 kb”). Each non-volatile storage element 223 may further comprise one or more registers for buffering data to be written to a page 252 and/or data read from a page 252. In some embodiments, the non-volatile storage elements 223 may be further arranged into a plurality of independent banks (not shown). - The
storage media controller 126 may manage the non-volatile storage elements 223 as a logical storage element 229. The logical storage element 229 may be formed by coupling the non-volatile storage elements 223 in parallel using the bus 127. Accordingly, storage operations may be performed on the non-volatile storage elements 223 concurrently, and in parallel (e.g., data may be written to and/or read from the non-volatile storage elements 223 in parallel). The logical storage element 229 may comprise a plurality of logical storage divisions (e.g., logical erase blocks) 253, each comprising a respective storage division of the non-volatile storage elements 223. The logical storage divisions 253 may comprise a plurality of logical storage units (e.g., logical pages) 254, each comprising a respective physical storage unit of the non-volatile storage elements 223. The storage capacity of a logical storage unit 254 may be a multiple of the number of parallel non-volatile storage elements 223 comprising the logical storage unit 254; for example, the capacity of a logical storage unit comprised of 2 kb pages on 25 non-volatile storage elements 223 is 50 kb (25 × 2 kb). In other embodiments, comprising 25 non-volatile storage elements 223 having an 8 kb page size, the logical page may have a storage capacity of 200 kb (25 × 8 kb). - As disclosed herein, the
storage controller 140 may be configured to store data within large constructs, such as logical storage divisions 253 and/or logical storage units 254, formed from a plurality of non-volatile storage elements 223. The storage controller 140 may, therefore, be capable of handling data storage operations of different sizes, independent of the underlying physical partitioning and/or arrangement of the non-volatile storage elements 223. In some embodiments, for example, the storage layer 130 may be configured to store data in 16 kb segments (sectors) within logical pages 254, despite the fact that the page size of the underlying non-volatile storage elements is only 2 kb. - Although
FIG. 3A depicts a particular embodiment of a logical storage element 229, the disclosure is not limited in this regard and could be adapted to differently sized logical storage elements 229 comprising any number of non-volatile storage elements 223. The size and number of erase blocks, pages, planes, or other logical and physical divisions within the non-volatile storage elements 223 are expected to change over time with advancements in technology; many embodiments adapted to such new configurations are possible and are consistent with the embodiments disclosed herein. - As described above, the
contextual write module 242 may be configured to store data in a contextual format. In some embodiments, the contextual format comprises a packet format. FIG. 3B depicts one example of a contextual data format (packet format 360). A packet 360 includes data (e.g., a data segment 362) that is associated with one or more LIDs. In some embodiments, the data segment 362 comprises compressed, encrypted, and/or whitened data. The data segment 362 may be a predetermined size (e.g., a fixed data “block” or “segment” size) or a variable size. The packet 360 may comprise persistent metadata 364 that is stored on the non-volatile storage media 222 with the data segment 362 (e.g., in a header of the packet format 360 as depicted in FIG. 3B). The persistent metadata 364 may include logical interface metadata 365 that defines the logical interface of the data segment 362. The logical interface metadata 365 may associate the data segment 362 with one or more LIDs, LID references (e.g., reference entries), a range, a size, and so on. The logical interface metadata 365 may be used to determine the context of the data independently of the storage metadata 135 and/or may be used to reconstruct the storage metadata 135 (e.g., reconstruct the “any-to-any” mappings, described above). The persistent metadata 364 may comprise other metadata, which may include, but is not limited to: data attributes (e.g., an access control list), data segment delimiters, signatures, links, metadata flags 367 (described below), and the like. - In some embodiments, the
packet 360 may be associated with a log sequence indicator 368. The log sequence indicator 368 may be persisted on the non-volatile storage media (e.g., page) with the data packet 360 and/or on the storage division (e.g., erase block) of the data packet 360. Alternatively, the sequence indicator 368 may be persisted in a separate storage division. In some embodiments, a sequence indicator 368 is applied when a storage division is reclaimed (e.g., erased, when the first or last storage unit is programmed, etc.). The log sequence indicator 368 may be used to determine an order of the packet 360 in a sequence of storage operations performed on the non-volatile storage media 222, as described above. - Referring back to
FIG. 3A, the contextual write module 242 may be configured to generate data packets of any suitable size. Data packets may be of a fixed size or a variable size. Due to the independence between the logical interface of data and the underlying media storage location of the data, the size of the packets generated by the contextual write module 242 may be independent of the underlying structure and/or partitioning of the non-volatile storage media 222. - The data write
module 240 may further comprise an ECC write module 346, which may be configured to encode the contextual data (e.g., data packets) into respective error-correcting code (“ECC”) words or chunks. The ECC encoding may be configured to detect and/or correct errors introduced through transmission and storage of data on the non-volatile storage media 222. In some embodiments, data packets stream to the ECC write module 346 as un-encoded blocks of length N (“ECC blocks”). An ECC block may comprise a single packet, multiple packets, or a portion of one or more packets. The ECC write module 346 may calculate a syndrome of length S for the ECC block, which may be appended and streamed as an ECC chunk of length N+S. The values of N and S may be selected according to testing and experience and may be based upon the characteristics of the non-volatile storage media 222 (e.g., error rate of the media 222) and/or performance, efficiency, and robustness constraints. The relative size of N and S may determine the number of bit errors that can be detected and/or corrected in an ECC chunk. - In some embodiments, there is no fixed relationship between the ECC input blocks and the packets; a packet may comprise more than one ECC block; the ECC block may comprise more than one packet; a first packet may end anywhere within the ECC block, and a second packet may begin after the end of the first packet within the same ECC block. The ECC algorithm implemented by the
ECC write module 346 and/or ECC read module 347 may be dynamically modified and/or may be selected according to a preference (e.g., communicated via the bus 127), in a firmware update, a configuration setting, or the like. - The ECC read
module 347 may be configured to decode ECC chunks read from the non-volatile storage media 222. Decoding an ECC chunk may comprise detecting and/or correcting errors therein. The contextual read module 243 may be configured to depacketize data packets read from the non-volatile storage media 222. Depacketizing may comprise removing and/or validating contextual metadata of the packet, such as the logical interface metadata 365, described above. In some embodiments, the contextual read module 243 may be configured to verify that the logical interface information in the packet matches a LID in the storage request. - In some embodiments, the
log storage module 248 is configured to store contextually formatted data, sequentially, in a log format. As described above, log storage refers to storing data in a format that defines an ordered sequence of storage operations, which may comprise storing data at sequential media addresses within the media address space of the non-volatile storage media (e.g., sequentially within one or more logical storage units 254). Alternatively, or in addition, sequential storage may refer to storing data in association with a sequence indicator, such as a sequence number, timestamp, or the like, such as the sequence indicator 368, described above. - The
log storage module 248 may store data sequentially at an append point. An append point may be located where data from the write buffer 244 will next be written. Once data is written at an append point, the append point moves to the end of the data. This process typically continues until a logical erase block 253 is full. The append point is then advanced to the next available logical erase block 253. The sequence of writing to logical erase blocks is maintained (e.g., using sequence indicators) so that if the storage metadata 135 is corrupted or lost, the log sequence of storage operations can be replayed to rebuild the storage metadata 135 (e.g., rebuild the “any-to-any” mappings of the storage metadata 135). -
FIG. 3C depicts one example of sequential, log-based data storage. FIG. 3C depicts a physical storage space 302 of a non-volatile storage media, such as the non-volatile storage media 222 of FIG. 3A. The physical storage space 302 is arranged into storage divisions (e.g., logical erase blocks 253A-N), each of which can be initialized (e.g., erased) in a single operation. As described above, each logical erase block 253A-N may comprise an erase block 251 of a respective non-volatile storage element 223, and each logical erase block 253A-N may comprise a plurality of logical storage units (e.g., logical pages) 254. As described above, each logical page 254 may comprise a page of a respective non-volatile storage element 223. Storage element delimiters are omitted from FIG. 3C to avoid obscuring the details of the embodiment. - The
logical storage units 254 may be assigned respective media addresses; in the FIG. 3C example, the media addresses range from zero (0) to N. The log storage module 248 may store data sequentially, at the append point 380; data may be stored sequentially within the logical page 382 and, when the logical page 382 is full, the append point 380 advances 381 to the next available logical page in the logical erase block, where the sequential storage continues. Each logical erase block 253A-N may comprise a respective sequence indicator. Accordingly, the order of the sequential storage operations may be determined based upon the sequence indicators of the logical erase blocks 253A-N, and the sequential order of data within each logical erase block 253A-N. - As used herein, an “available” logical page refers to a logical page that has been initialized (e.g., erased) and has not yet been programmed. Some
non-volatile storage media 222 can only be reliably programmed once after erasure. Accordingly, an available logical erase block may refer to a logical erase block that is in an initialized (or erased) state. The logical erase blocks 253A-N may be reclaimed by a groomer (or other process), which may comprise moving valid data thereon (if any) to other storage locations and erasing the logical erase block 253A-N. Reclaiming a logical erase block 253A-N may further comprise marking the logical erase block 253A-N with a sequence indicator, as described above. - The logical erase
block 253B may be unavailable for storage due to, inter alia: not being in an erased state (e.g., comprising valid data), being out-of-service due to high error rates or the like, and so on. In the FIG. 3C example, after storing data on the physical storage unit 382, the append point 380 may skip the unavailable logical erase block 253B, and continue at the next available logical erase block 253C. The log storage module 248 may store data sequentially starting at logical page 383, and continuing through logical page 385, at which point the append point 380 continues at a next available logical erase block, as described above. - After storing data on the “last” storage unit (e.g.,
storage unit N 389 of storage division 253N), the append point 380 wraps back to the first division 253A (or the next available storage division, if storage division 253A is unavailable). Accordingly, the append point 380 may treat the media address space 302 as a loop or cycle. - As disclosed above, the
storage controller 140 may be configured to modify and/or overwrite data out-of-place. Accordingly, in response to a storage request to overwrite data A stored at physical storage location 391 with data A′, the data A′ may be stored out-of-place on a different location (media address 393) within the physical address space 302. Storing the data A′ may comprise updating the storage metadata 135 to associate A′ with the new media address 393 and/or to invalidate the data A at media address 391. The groomer module 370 may be configured to scan the physical address space 302 to reclaim storage resources comprising invalidated data that no longer needs to be preserved on the storage device 120, such as the obsolete version of data A at media address 391. The storage metadata 135 may be reconstructed based on the contextual, log-based storage format disclosed herein. In the FIG. 3C embodiment, the current version of data A′ may be distinguished from the obsolete data A based on log ordering information on the storage device 120. Accordingly, the reconstructed index may identify the data A′ at media address 393 as the current, valid version of the data, and determine that the data A at media address 391 is obsolete and can be removed from the device. - Referring back to
FIG. 3A, the storage controller 324 may comprise a groomer module 370 that is configured to reclaim logical erase blocks, as described above. The groomer module 370 may monitor the non-volatile storage media and/or storage metadata 135 to identify logical erase blocks 253 for reclamation. The groomer module 370 may reclaim logical erase blocks in response to detecting one or more conditions, which may include, but are not limited to: a lack of available storage capacity, detecting a percentage of data marked as invalid within a particular logical erase block 253 reaching a threshold, a consolidation of valid data, an error detection rate reaching a threshold, improving data distribution, data refresh, or the like. - The
groomer module 370 may operate outside of the path for servicing storage operations and/or requests. Therefore, the groomer module 370 may operate as an autonomous, background process, which may be suspended and/or deferred while other storage operations are in process. The groomer 370 may manage the non-volatile storage media 222 so that data is systematically spread throughout the logical erase blocks 253, which may improve performance and data reliability and avoid overuse and underuse of any particular storage locations, thereby lengthening the useful life of the solid-state storage media 222 (e.g., wear-leveling, etc.). Although the groomer module 370 is depicted in the storage layer 130, the disclosure is not limited in this regard. In some embodiments, the groomer module 370 may operate on the storage media controller 126, may comprise a separate hardware component, or the like. - In some embodiments, the
groomer 370 may interleave grooming operations with other storage operations and/or requests. For example, reclaiming a logical erase block 253 may comprise relocating valid data thereon to another storage location. The groomer read and groomer write bypass modules 363 and 362 may allow the valid data to be read into module 241 and then be transferred directly to the data write module 240 without being routed out of the storage media controller 126. - The groomer read
bypass module 363 may coordinate reading data to be relocated from a reclaimed logical erase block 253. The groomer module 370 may be configured to interleave relocation data with other data being written to the non-volatile storage media 222 via the groomer write bypass 362. Accordingly, data may be relocated without leaving the storage media controller 126. In some embodiments, the groomer module 370 may be configured to fill the remainder of a logical page (or other data storage primitive) with relocation data, which may improve groomer efficiency, while minimizing the performance impact of grooming operations. - The
storage layer 130 may further comprise a deduplication module 374, which may be configured to identify duplicated data on the storage device 120. The deduplication module 374 may be configured to identify duplicated data and to modify a logical interface of the data, such that one or more LIDs reference the same set of data on the storage device 120 as opposed to referencing separate copies of the data. The deduplication module 374 may operate outside of the path for servicing storage operations and/or requests, as described above. - As described above, the storage controller may maintain an index corresponding to the
logical address space 136. FIG. 3D depicts one example of such an index 1204. The index 1204 may comprise one or more entries 1205A-N. Each entry 1205A-N may correspond to a LID (or LID range or extent) 1217 in the logical address space 136. The entries 1205A-N may represent LIDs that have been allocated for use by one or more storage clients 116. The index 1204 may comprise “any-to-any” mappings between LIDs and media storage locations on one or more storage devices 120. For example, the entry 1205B binds LIDs 072-083 to media storage locations 95-106. An entry 1205D may represent LIDs that have been allocated, but have not yet been used to store data, and as such, the LIDs may not be bound to any particular media storage locations (e.g., the LIDs 178-192 are “unbound”). As described above, deferring the allocation of physical storage resources may allow the storage controller 140 to more efficiently manage storage resources (e.g., prevent premature reservation of physical storage resources, so that the storage resources are available to other storage clients 116). One or more of the entries 1205A-N may comprise additional metadata 1219, which may include, but is not limited to: access control metadata (e.g., identify the storage client(s) authorized to access the entry), reference metadata, logical interface metadata, and so on. The index 1204 may be maintained by the storage layer 130 (e.g., translation module 134), and may be embodied as storage metadata 135 on a volatile memory 112 and/or a non-transitory machine-readable storage media 114 and/or 120. - The
index 1204 may be configured to provide for fast and efficient entry lookup. The index 1204 may be implemented using one or more datastructures, including, but not limited to: a B-tree, a content addressable memory (“CAM”), a binary tree, a hash table, or other datastructure that facilitates quickly searching a sparsely populated logical address space. The datastructure may be indexed by LID, such that, given a LID, the entry 1205A-N corresponding to the LID (if any) can be identified in a computationally efficient manner. - In some embodiments, the
index 1204 comprises one or more entries (not shown) to represent unallocated LIDs (e.g., LIDs that are available for allocation by one or more storage clients 116). The unallocated LIDs may be maintained in the index 1204 and/or in a separate index 1444 as depicted in FIG. 14. In some embodiments, the index 1204 may comprise one or more sub-indexes, such as a “reference index.” As described below, the reference index 1222 may comprise data that is being referenced by one or more other entries 1205A-N in the index (e.g., indirect references). Although particular examples and datastructures of storage metadata 135 are described herein, the disclosure is not limited in this regard; the storage layer 130 may be configured to incorporate any type of storage metadata embodied using any suitable datastructure. -
FIG. 4 is a schematic block diagram illustrating an embodiment of an apparatus 400 to allocate data storage space. The apparatus 400 includes an allocation request module 402, a logical capacity module 404, and an allocation reply module 406, which are described below. The allocation request module 402, the logical capacity module 404, and the allocation reply module 406 are depicted in the storage layer 130 in general, but all or part of the allocation request module 402, the logical capacity module 404, and the allocation reply module 406 may be in a storage layer 130, storage media controller 126, or the like. - The
apparatus 400 includes an allocation request module 402 that receives from a requesting device an allocation request to allocate logical capacity. The requesting device may be a storage client 116, or any other device or component capable of sending an allocation request. The storage layer 130 may comprise and/or be communicatively coupled to one or more storage devices 120 (as depicted in FIG. 1). The logical capacity associated with the allocation request may refer to storing data on a particular storage device 120 or on any of a plurality of storage devices 120A-N. - The allocation request may include a logical allocation request or may include a request to store data. A logical allocation request may comprise a request to allocate LIDs to a
storage client 116. A data storage request may comprise a request to store data corresponding to one or more LIDs that are allocated to the storage client 116, which are then bound to media storage locations. As described above, binding the LIDs may comprise associating the LIDs with media storage locations comprising the data in an index maintained in the storage metadata 135 (e.g., the index 1204). The LIDs may be bound to media storage locations at the time of allocation (e.g., the allocation request may comprise a request to store data). Alternatively, where the allocation request is separate from a request to store data, allocating LIDs to the data may be in a separate step from binding the LIDs to the media storage locations. In some embodiments, requests may come from a plurality of storage clients 116. Consequently, a client identifier may be associated with the request, and the apparatus 400 may use the client identifier to implement access control with respect to allocations for that storage client 116 and/or with respect to the LIDs available to allocate to the storage client 116. In addition, the client identifier may be used to manage how much physical capacity is allocated to a particular storage client 116 or set of storage clients 116. - The
apparatus 400 includes a logical capacity module 404 that determines if a logical address space 136 of the data storage device includes sufficient unallocated logical capacity to satisfy the allocation request. The logical capacity module 404 may determine if the logical address space 136 has sufficient unbound and/or unallocated logical capacity using an index (or other datastructure) maintaining LID bindings and/or LID allocations. In some embodiments, the logical capacity module 404 may search a logical-to-physical map or index maintained in the storage metadata 135 and/or an unallocated index 1444 described below. - As described above, unbound LIDs may refer to LIDs that do not correspond to valid data stored on a media storage location. An unbound LID may be allocated to a
client 116 or may be unallocated. In some embodiments, the logical-to-physical map is configured such that there are no other logical-to-logical mappings between the LIDs in the map and media addresses associated with the LIDs. - In some embodiments, the
logical capacity module 404 searches the logical-to-physical index 1204 (or other datastructure) to identify unbound LIDs and identifies unallocated logical space therein. For example, if a logical address space 136 includes a range of logical addresses from 0000 to FFFF and the logical-to-physical map indicates that the logical addresses 0000 to F000 are allocated and bound, the logical capacity module 404 may determine that LIDs F001 to FFFF are not allocated. If the LIDs F001 to FFFF are not allocated to another storage client 116, they may be available for allocation to satisfy the allocation request. - In some embodiments, the
translation module 134 may maintain a plurality of different logical address spaces, such as a separate logical address space for each storage client 116. Accordingly, each storage client 116 may operate in its own, separate logical storage space 136. The storage layer 130 may, therefore, comprise separate storage metadata 135 (e.g., indexes, capacity indicators, and so on), for each storage client 116 (or group of storage clients 116). Storage clients 116 may be distinguished by an identifier, which may include, but is not limited to: an address (e.g., network address), credential, name, context, or other identifier. The identifiers may be provided in storage requests and/or may be associated with a communication channel or protocol used by the storage client 116 to access the storage layer 130. - In some embodiments, the index 1204 (or other datastructure) may comprise an allocation index or allocation entries configured to track logical capacity allocations that have not yet been bound to media storage locations. For example, a LID (or other portion of logical capacity) may be allocated to a client, but may not be associated with data stored on a
storage device 120. Accordingly, although the logical capacity may be allocated, it may be “unbound,” and as such, may not be included in the logical-to-physical index. Accordingly, when determining the unallocated logical address space 136, the logical capacity module 404 may consult additional datastructures (e.g., allocation index, allocation entries, and/or an unallocated index 1444). Alternatively, the allocation entry may be included in the logical-to-physical index (e.g., entry 1205D), and may comprise an indicator showing that the entry is not bound to any particular media storage locations. - An allocation request may include a request for a certain number of LIDs. The
logical capacity module 404 may determine if the available logical capacity (e.g., unbound and/or unallocated logical capacity) is sufficient to meet or exceed the requested amount of logical addresses. In another example, if the allocation request specifies a list or range of LIDs to allocate, the logical capacity module 404 can determine whether all or a portion of the requested LIDs are unallocated or unbound. - The
apparatus 400 may further comprise an allocation reply module 406 that communicates a reply to the requesting device indicating whether the request can be satisfied. For example, if the logical capacity module 404 determines that the unallocated logical space is insufficient to satisfy the allocation request, the allocation reply module 406 may include in the reply that the allocation request failed, and if the logical capacity module 404 determines that the unallocated logical space is sufficient to satisfy the allocation request (and/or the specified LIDs are unallocated), the allocation reply module 406 may include in the reply an affirmative response. An affirmative response may comprise a list of allocated LIDs, a range of LIDs, or the like. - In some embodiments, the allocation request is for a specific group of LIDs and the
allocation reply module 406 may reply with the requested LIDs. In another embodiment, the allocation request is part of a write request. In one case the write request includes specific LIDs and the allocation reply module 406 may reply with the requested LIDs. In another case the write request only includes data or an indication of an amount of data and the allocation reply module 406 may reply by allocating LIDs sufficient for the write request and returning the allocated LIDs. Alternatively, if an indication of an amount of data is provided the reply may include LIDs that are unallocated. The allocation reply module 406 may reply before or after the data is written. If the allocation reply module 406 sends a reply after the data is written, the reply may be part of a confirmation of writing the data. One of skill in the art will recognize other ways that the allocation reply module 406 may reply in response to the logical capacity module 404 determining if the logical space of the data storage device has sufficient unallocated logical space to satisfy an allocation request. - The
storage layer 130 may expose portions of the logical address space maintained by the translation module 134 (e.g., index 1204) directly to storage clients 116 via the virtual storage interface 132 (or other interface). The storage clients 116 may use the virtual storage interface 132 to perform various functions including, but not limited to: identifying available logical capacity (e.g., particular LIDs or general LID ranges), determining available physical capacity, querying the health of the storage media 122, identifying allocated LIDs, identifying LIDs that are bound to media storage locations, etc. The interface 138 can expose all or a subset of the features and functionality of the apparatus 400 directly to clients, which may leverage the virtual storage interface 132 to delegate management of the logical address space 136 and/or LIDs to the storage layer 130. -
FIG. 5 is a schematic block diagram illustrating another embodiment of an apparatus 500 to allocate data storage space. The apparatus 500 includes an allocation request module 402, a logical capacity module 404, and an allocation reply module 406, which are substantially similar to those described above in relation to the apparatus 400 of FIG. 4. In addition, the apparatus 500 includes a physical capacity request module 502, a physical capacity allocation module 504, a physical capacity reply module 506, an allocation module 508, an allocation query request module 510, an allocation query determination module 512, an allocation query reply module 514, a logical space management module 516, a mapping module 518, a physical space reservation request module 520, a physical space reservation module 522, a physical space reservation return module 524, a physical space reservation cancellation module 526, a LID binding module 528, a DMA module 530, and a deletion module 532, which are described below. The modules 402-406 and 502-532 of the apparatus 500 of FIG. 5 may be included in the storage layer 130, a storage media controller 126, or any other appropriate location known to one of skill in the art. - The
apparatus 500 includes, in one embodiment, a physical capacity request module 502, a physical capacity allocation module 504, and a physical capacity reply module 506. The physical capacity request module 502 receives from a requesting device a physical capacity request. The physical capacity request is received at the data storage device and includes a request for the amount of available physical storage capacity in the data storage device (and/or physical storage capacity allocated to the requesting device). The physical capacity request may include a quantity of physical capacity or may indirectly request physical storage capacity, for example by indicating a size of a data unit to be stored. Another indirect physical storage capacity request may include logical addresses of data to be stored, which may correlate to a data size. One of skill in the art will recognize other forms of a physical capacity request. - The physical
capacity allocation module 504 determines the amount of available physical storage capacity on one or more storage devices 120 and/or 120A-N. The amount of available physical storage capacity includes a physical storage capacity of unbound media storage locations. In some embodiments, the amount of available physical storage capacity may be “budgeted”; for example, only a portion of the physical storage capacity of a storage device 120 may be available to the requesting device. The amount of available physical storage capacity may be budgeted based on a quota associated with each storage client 116 or group of storage clients 116. The apparatus 500 may enforce these quotas. The allocation of available physical storage capacity may be determined by configuration parameter(s), may be dynamically adjusted according to performance and/or quality of service policies, or the like. - The physical
capacity allocation module 504 may determine the amount of available physical storage capacity using an index (or other datastructure), such as the index 1204 described above. Index 1204 may identify the media storage locations that comprise valid data (e.g., entries 1205A-N that comprise bound media storage locations). The available storage capacity may be a total (or budgeted) physical capacity minus the capacity of the bound media storage locations. Alternatively, or in addition, an allocation index (or other datastructure) may maintain an indicator of the available physical storage capacity. The indicator may be updated responsive to storage operations performed on the storage device including, but not limited to: grooming operations, deallocations (e.g., TRIM), writing additional data, physical storage capacity reservations, physical storage capacity reservation cancellations, and so on. Accordingly, the module 504 may maintain a “running total” of available physical storage capacity that is available on request. - The physical
capacity reply module 506 communicates a reply to the requesting device in response to the physical capacity allocation module 504 determining the amount of available physical storage capacity on the data storage device. - The physical
capacity allocation module 504, in one embodiment, tracks bound media storage locations, unbound media storage locations, reserved physical storage capacity, unreserved physical storage capacity, and the like. The physical capacity allocation module 504 may track these parameters using a logical-to-physical map, a validity map, a free media address pool, a used media address pool, a physical-to-logical map, or other means known to one of skill in the art. - The reply may take many forms. In one embodiment where the physical capacity request includes a request for available physical capacity, the reply may include an amount of available physical storage capacity. In another embodiment where the physical capacity request includes a specific amount of physical capacity, the reply may include an acknowledgement that the data storage device has the requested available physical storage capacity. One of skill in the art will recognize other forms of a reply in response to a physical capacity request.
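The "running total" bookkeeping and the two reply forms described above can be illustrated with a minimal Python sketch. The class name, method names, and capacity units are hypothetical and are not taken from the disclosure. Subtracting outstanding reservations from the available capacity is likewise an assumption of this sketch; the disclosure states only that available capacity may be the total (or budgeted) capacity minus the capacity of the bound media storage locations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalCapacityTracker:
    """Hypothetical running total of available physical capacity (in storage units)."""
    budget: int        # total (or budgeted) physical capacity
    bound: int = 0     # capacity of bound media storage locations
    reserved: int = 0  # capacity held by reservations (assumption of this sketch)

    def available(self) -> int:
        # Total (or budgeted) capacity minus bound capacity, less reservations.
        return self.budget - self.bound - self.reserved

    def on_write(self, units: int) -> None:
        # Writing additional data binds more media storage locations.
        if units > self.available():
            raise ValueError("insufficient physical storage capacity")
        self.bound += units

    def on_reclaim(self, units: int) -> None:
        # Grooming or deallocation (e.g., TRIM) returns capacity.
        self.bound -= min(units, self.bound)

    def reserve(self, units: int) -> bool:
        # A reservation succeeds only if the capacity is currently available.
        if units > self.available():
            return False
        self.reserved += units
        return True

    def cancel_reservation(self, units: int) -> None:
        self.reserved -= min(units, self.reserved)

    def reply(self, requested: Optional[int] = None) -> dict:
        # No amount given: report the available capacity.
        # Amount given: acknowledge whether that amount is available.
        if requested is None:
            return {"available": self.available()}
        return {"requested": requested, "satisfiable": requested <= self.available()}
```

Because every operation updates the counters as it happens, a reply can be produced without scanning the index, which is the point of keeping a running total.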
- The
apparatus 500 with a physical capacity request module 502, a physical capacity allocation module 504, and a physical capacity reply module 506 is advantageous for storage devices 120 where a logical-to-physical mapping is not a one-to-one mapping. In a typical random access device where read and write requests include one or more LBAs, a file server storage client 116 may track physical storage capacity of a storage device 120 by tracking the LBAs that are bound to media storage locations. - For a log storage system where multiple media storage locations can be mapped to a single LID (e.g., multiple versions of data mapped to a LID) or vice versa (e.g., multiple LIDs mapped to the same media storage locations), tracking LIDs may not provide any indication of physical storage capacity. These many-to-one relationships may be used to support snapshots, cloning (e.g., logical copies), deduplication and/or backup. Examples of systems and methods for managing many-to-one LID to media storage location logical interfaces are disclosed in further detail below. The
apparatus 500 may track available physical storage space and may communicate the amount of available physical storage space to storage clients 116, which may allow the storage clients 116 to offload allocation management and physical capacity management to the storage layer 130. - In some embodiments, media storage locations are bound to corresponding LIDs. When data is stored in response to a write request, LIDs associated with the data are bound to the media storage location where the data is stored. For a log-structured file system where data is stored sequentially, the location where the data is stored is not apparent from the LID, even if the LID is an LBA. Instead, the data is stored at an append point and the address where the data is stored is mapped to the LID. If the data is a modification of data stored previously, the LID may be mapped to the current data as well as to a location where the old data is stored. There may be several versions of the data mapped to the same LID.
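The following Python sketch is one way to see why counting LIDs says little about physical usage in a log-structured layout. It is illustrative only: the `AppendLog` class, its methods, and the use of list positions as media addresses are assumptions of the sketch, and it omits grooming, validity tracking, and persistence.

```python
class AppendLog:
    """Hypothetical append-only log with a LID-to-media-address index."""

    def __init__(self):
        self.media = []    # media address -> (lid, data); appended sequentially
        self.forward = {}  # LID -> media address of the current version

    def write(self, lid, data):
        # Data is stored at the append point, not at an address derived
        # from the LID; the LID is then bound to that media address.
        addr = len(self.media)
        self.media.append((lid, data))
        self.forward[lid] = addr
        return addr

    def read(self, lid):
        return self.media[self.forward[lid]][1]

    def versions(self, lid):
        # Every media address holding data written under this LID, oldest
        # first. Older versions remain on the media until reclaimed.
        return [addr for addr, (owner, _) in enumerate(self.media) if owner == lid]

    def clone(self, src_lid, dst_lid):
        # Logical copy: a second LID references the same media address.
        self.forward[dst_lid] = self.forward[src_lid]


log = AppendLog()
log.write(5, b"A")        # stored at media address 0
log.write(5, b"A'")       # overwrite stored out-of-place at media address 1
log.clone(5, 9)           # LID 9 now references media address 1 as well
print(len(log.forward))   # bound LIDs
print(len(log.media))     # occupied media storage locations
```

After the overwrite, one LID corresponds to two occupied media locations; after the clone, two LIDs share a single current location. In neither case does the number of bound LIDs track the physical capacity consumed, which is why the storage layer is better placed than the client to report it.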
- The
apparatus 500, in one embodiment, includes an allocation module 508 that allocates the unallocated logical space sufficient to satisfy the allocation request of the requesting device. The allocation module 508 may allocate the unallocated logical space in response to the logical capacity module 404 determining that the logical space has sufficient unallocated logical space to satisfy the allocation request. - In one embodiment, the allocation request is part of a pre-allocation where logical space is not associated with a specific request to store data. For example, a
storage client 116 may request, using an allocation request, logical space and then may proceed to store data over time to the allocated logical space. The allocation module 508 allocates LIDs to the storage client 116 in response to an allocation request and to the logical capacity module 404 determining that the logical space has sufficient unallocated logical space to satisfy the allocation request. - The
allocation module 508 may also allocate LIDs based on an allocation request associated with a specific storage request. For example, if a storage request includes specific LIDs and the logical capacity module 404 determines that the LIDs are available, the allocation module 508 may allocate the LIDs in conjunction with storing the data of the storage request. In another example, if the storage request does not include LIDs and the logical capacity module 404 determines that there are sufficient LIDs for the storage request, the allocation module 508 may select and allocate LIDs for the data and the allocation reply module 406 may communicate the allocated LIDs. - The
allocation module 508 may be configured to locate unallocated LIDs to satisfy an allocation request. In some embodiments, the allocation module 508 may identify unallocated LIDs by receiving a list of requested LIDs to allocate from the storage client 116 and verifying that these LIDs are available for allocation. In another example, the allocation module 508 may identify unallocated LIDs by searching for unallocated LIDs that meet criteria received in conjunction with the request. The criteria may be LIDs that are associated with a particular storage device 120A-N, that are available in a RAID, that have some assigned metadata characteristic, etc. - In another example, the
allocation module 508 may identify unallocated LIDs by creating a subset of LIDs that meet criteria received in conjunction with the request identified in a pool of available LIDs. In one instance, the LIDs may be a subset of LIDs that have already been allocated to the client 116. For example, if a set or group of LIDs is allocated to a particular user, group, employer, etc., a subset of the LIDs may be allocated. A specific example is if a set of LIDs is allocated to an organization and then a subset of the allocated LIDs is further allocated to a particular user in the organization. One of skill in the art will recognize other ways that the allocation module 508 can identify one or more unallocated LIDs. - The
allocation module 508, in one embodiment, can expand the LIDs allocated to a storage client 116 by allocating LIDs in addition to LIDs already allocated to the storage client 116. In addition, LIDs allocated to a storage client 116 may be decreased by deallocating certain LIDs so that they return to a pool of unallocated LIDs. In other embodiments, subsets of allocated LIDs may be allocated, deallocated, increased, decreased, etc. For example, LIDs allocated to a user in an organization may be deallocated so that the LIDs allocated to the user are still allocated to the organization but not to the user. - The
apparatus 500, in one embodiment, includes an allocation query request module 510, an allocation query determination module 512, and an allocation query reply module 514. The allocation query request module 510 receives an allocation query from some requesting device, such as a storage client 116, etc. An allocation query may include a request for information about allocating logical space or associated management of the allocated logical space. For example, an allocation query may be a request to identify allocated LIDs, identify bound LIDs, identify allocated LIDs that are not bound to media storage locations, identify unallocated LIDs or a range of LIDs, and the like. - The allocation query may include information about logical allocation, logical capacity, physical capacity, or other information meeting criteria in the allocation query. The information may include metadata, status, logical associations, historical usage, flags, control, etc. One of skill in the art will recognize other allocation queries and the type of information returned in response to the allocation query.
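The kinds of allocation query listed above can be sketched against a small index of LID ranges. The Python below is a hypothetical illustration: the `Entry` type, the `query` function, and the state names are not part of the disclosure. It reuses the example values given earlier for the index 1204 (LIDs 072-083 bound to media storage locations starting at 95, and LIDs 178-192 allocated but unbound) and assumes that index entries do not overlap.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical index entry covering a contiguous LID range."""
    first_lid: int
    last_lid: int
    media_start: Optional[int] = None  # None: allocated but not yet bound


def query(index: List[Entry], lo: int, hi: int, state: str) -> List[Tuple[int, int]]:
    """Return LID ranges within [lo, hi] that are in the requested state.

    state is one of "allocated", "bound", "unbound" (allocated but not
    bound), or "unallocated". Entries are assumed not to overlap.
    """
    results = []
    cursor = lo  # lowest LID in [lo, hi] not yet accounted for
    for e in sorted(index, key=lambda e: e.first_lid):
        if e.last_lid < lo or e.first_lid > hi:
            continue
        first, last = max(e.first_lid, lo), min(e.last_lid, hi)
        if state == "unallocated" and first > cursor:
            results.append((cursor, first - 1))  # gap before this entry
        cursor = max(cursor, last + 1)
        is_bound = e.media_start is not None
        if (state == "allocated"
                or (state == "bound" and is_bound)
                or (state == "unbound" and not is_bound)):
            results.append((first, last))
    if state == "unallocated" and cursor <= hi:
        results.append((cursor, hi))  # trailing gap after the last entry
    return results


index = [Entry(72, 83, media_start=95), Entry(178, 192)]
print(query(index, 0, 255, "bound"))        # [(72, 83)]
print(query(index, 0, 255, "unbound"))      # [(178, 192)]
print(query(index, 0, 255, "unallocated"))  # [(0, 71), (84, 177), (193, 255)]
```

A sorted list is used here only to keep the sketch short; a B-tree or similar range-indexed structure would serve the same queries over a sparsely populated address space.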
- The allocation query includes some type of criteria that allows the allocation
query determination module 512 to service the allocation query. The allocation query determination module 512, in one embodiment, identifies one or more LIDs that meet the criteria specified in the allocation query. The identified LIDs include allocated LIDs that are bound to media storage locations, allocated LIDs that are unbound, unallocated LIDs, and the like. - The allocation
query reply module 514 communicates the results of the query to the requesting device (e.g., the client 110) or to another device as directed in the allocation query. The results of the allocation query may include a list of the identified LIDs, an acknowledgement that LIDs meeting the criteria were found, an acknowledgement that LIDs meeting the criteria in the allocation query were not found, bound/unbound status of LIDs, logical storage capacity, or the like. Typically the allocation query reply module 514 returns status information and the information returned may include any information related to managing and allocating LIDs known to those of skill in the art. - The
apparatus 500, in another embodiment, includes a logical space management module 516 that manages the logical space of the data storage device from within the data storage device. For example, the logical space management module 516 may manage the logical space from a storage layer 130 or driver associated with a storage device 120 of the data storage device. The logical space management module 516 may track unbound LIDs and bound LIDs, for example, in the logical-to-physical map, in an index, or in another datastructure. As described above, a bound LID refers to a LID corresponding to data; a bound LID is a LID associated with valid data stored on a media storage location of the storage device 120. - The logical
space management module 516, in various embodiments, may service allocation requests and allocation queries as described above, and perform other functions related to allocation. The logical space management module 516 can also receive a deallocation request from a requesting device. The deallocation request typically includes a request to return one or more allocated LIDs to an unallocated state; the module 516 then communicates the successful deallocation to the requesting device, or other designated device. The deallocation request may also include a request to return one or more storage locations associated with the allocated LIDs, after which the successful deallocation is likewise communicated to the requesting device, or other designated device. This might be transparent, or might require that the deallocation request be extended to include an indication that a logical/physical deallocation should accompany the request. Deallocation requests may be asynchronous and tied to the groomer. Thus, the deallocation request may be virtual (in time) until completed. The management of the allocations (logical and physical) may diverge from the actual available space at any point in time. The management module 516 is configured to deal with these differences. - The logical
space management module 516 may also receive a LID group command request from a requesting device and may communicate to the requesting device a reply indicating a response to the LID group command request. The LID group command request may include an action to take on, for example, two or more LIDs (“LID group”), metadata associated with the LID group, the data associated with the LID group, and the like. For example, if several users are each allocated LIDs and the users are part of a group, a LID group command may be to deallocate the LIDs for several of the users, allocate additional LIDs to each user, return usage information for each user, etc. The action taken in response to the LID group command may also include modifying the metadata, backing up the data, backing up the metadata, changing control parameters, changing access parameters, deleting data, copying the data, encrypting the data, deduplicating the data, compressing the data, decompressing the data, etc. One of skill in the art will recognize other logical space management functions that the logical space management module 516 may also perform. - The
apparatus 500, in one embodiment, includes a mapping module 518 that binds, in a logical-to-physical map (e.g., the index 1204), bound LIDs to media storage locations. The logical capacity module 404 determines if the logical space has sufficient unallocated logical space using the logical-to-physical map mapped by the mapping module 518. The index 1204 may be used to track allocation of the bound LIDs, the unbound LIDs, the allocated LIDs, the unallocated LIDs, the allocated LID capacity, the unallocated LID capacity, and the like. In one embodiment, the mapping module 518 binds LIDs to corresponding media addresses in multiple indexes and/or maps. - In addition, a reverse map may be used to quickly access information related to a media address and to link to a LID associated with the media address. The reverse map may be used to identify a LID from a media address. A reverse map may be used to map addresses in a
data storage device 120 into erase regions, such as erase blocks, such that a portion of the reverse map spans an erase region of thestorage device 120 erased together during a storage space recovery operation. Organizing a reverse map by erase regions facilitates tracking information useful during grooming operations. For example, the reverse map may include which media addresses in an erase region have valid data and which have invalid data. When valid data is copied from an erase region and the erase region erased, the reverse map can easily be changed to indicate that the erase region does not include data and is ready for sequential storage of data. - A more detailed discussion of forward and reverse mapping is included in U.S. patent application Ser. No. 12/098,434, titled “Apparatus, System, and Method for Efficient Mapping of Virtual and Media addresses, Non-Volatile Storage,” to David Flynn, et al., filed Apr. 8, 2008, which is incorporated herein by reference. By including any-to-any mappings between LIDs and media addresses, the
storage layer 130 efficiently consolidates functions such as thin provisioning, allocation functions, etc. that have traditionally been handled by other entities. Themapping module 518 may, therefore, provide an efficient way to eliminate layers of mapping used in traditional systems. - In a thinly provisioned storage system, one potential problem is that a
storage client 116 may attempt to write data to a storage device only to have the write request fail because the storage device is out of available physical storage capacity. For random access devices where the file server/file system tracks available physical storage capacity relying on the one-to-one mapping of LBAs to PBAs, the likelihood of a storage device running out of storage space is very low. The apparatus 500 includes a physical space reservation request module 520, located in the storage layer 130, that receives a request from a storage client 116 to reserve available physical storage capacity on the data storage device (i.e., the storage device 120 that is part of the data storage device) [hereinafter a “physical space reservation request”]. In one embodiment, the physical space reservation request includes an indication of an amount of physical storage capacity requested by the storage client 116. - The indication of an amount of physical storage capacity requested may be expressed in terms of physical capacity. The request to reserve physical storage capacity may also include a request to allocate the reserved physical storage capacity to a logical entity. The indication of an amount of physical storage capacity may be expressed indirectly as well. For example, a
storage client 116 may indicate a number of logical blocks and the data storage device may determine a particular fixed size for each logical block and then translate the number of logical blocks to a physical storage capacity. One of skill in the art will recognize other indicators of an amount of physical storage capacity in a physical space reservation request. - The physical space reservation request, in one embodiment, is associated with a write request. In one embodiment, the write request is a two-step process, and the physical space reservation request and the write request are separate. In another embodiment, the physical space reservation request is part of the write request or the write request is recognized as having an implicit physical space reservation request. In another embodiment, the physical space reservation request is not associated with a specific write request, but may instead be associated with planned storage, reserving storage space for a critical operation, etc., where mere allocation of storage space is insufficient.
- In certain embodiments, the data may be organized into atomic data units. For example, the atomic data unit may be a packet, a page, a logical page, a logical packet, a block, a logical block, a set of data associated with one or more logical block addresses (the logical block addresses may be contiguous or noncontiguous), a file, a document, or other grouping of related data.
- In one embodiment, an atomic data unit is associated with a plurality of noncontiguous and/or out of order logical block addresses or other identifiers that the data write
module 240 handles as a single atomic data unit. As used herein, writing noncontiguous and/or out of order logical blocks in a single write operation is referred to as an atomic write. In one embodiment, a hardware controller processes operations in the order received and a software driver of the client sends the operations to the hardware controller for a single atomic write together so that the data write module 240 can process the atomic write operation as normal. Because the hardware processes operations in order, this guarantees that the different logical block addresses or other identifiers for a given atomic write travel through the data write module 240 together to the nonvolatile memory. The client, in one embodiment, can back out, reprocess, or otherwise handle failed atomic writes and/or other failed or terminated operations upon recovery once power has been restored. - In one embodiment,
apparatus 500 may mark blocks of an atomic write with a metadata flag indicating whether a particular block is part of an atomic write. One example of metadata marking is to rely on the log write/append only protocol of the nonvolatile memory together with a metadata flag, or the like. The use of an append only log for storing data and prevention of any interleaving blocks enables the atomic write membership metadata to be a single bit. In one embodiment, the flag bit may be a 0, unless the block is a member of an atomic write, and then the bit may be a 1, or vice versa. If the block is a member of an atomic write and is the last block of the atomic write, in one embodiment, the metadata flag may be a 0 to indicate that the block is the last block of the atomic write. In another embodiment, different hardware commands may be sent to mark different headers for an atomic write, such as the first block in an atomic write, middle member blocks of an atomic write, tail of an atomic write, or the like. - On recovery from a power loss or other failure of the client or of the storage device, in one embodiment, the
apparatus 500 scans the log on the nonvolatile storage in a deterministic direction (for example, in one embodiment the start of the log is the tail and the end of the log is the head and data is always added at the head). In one embodiment, the power management apparatus scans from the head of the log toward the tail of the log. For atomic write recovery, in one embodiment, when scanning head to tail, if the metadata flag bit is a 0, then the block is either a single block atomic write or a non-atomic write block. In one embodiment, once the metadata flag bit changes from 0 to 1, the previous block scanned and potentially the current block scanned are members of an atomic write. The power management apparatus, in one embodiment, continues scanning the log until the metadata flag changes back to a 0; at that point in the log, the previous block scanned is the last member of the atomic write and the first block stored for the atomic write. - In one embodiment, the nonvolatile memory uses a sequential, append only write structured writing system where new writes are appended on the front of the log (i.e. at the head of the log). In a further embodiment, the storage controller reclaims deleted, stale, and/or invalid blocks of the log using a garbage collection system, a groomer, a cleaner agent, or the like. The storage controller, in a further embodiment, uses a forward map to map logical block addresses to media addresses to facilitate use of the append only write structure and garbage collection.
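The head-to-tail recovery scan described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Block` type and `scan_log` function are hypothetical names, the flag convention assumed is the one in which a 1 marks a non-final member of an atomic write and a 0 marks either the final member or an ordinary block, and treating flag-1 blocks found at the head of the log as an interrupted atomic write is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    lid: int   # logical identifier stored with the block (hypothetical field)
    flag: int  # 1 = non-final member of an atomic write; 0 = final member or ordinary block


def scan_log(log: List[Block]) -> Tuple[List[List[int]], List[int]]:
    """Scan an append-only log from head (newest) to tail (oldest).

    `log` is given in stored order: log[0] is the tail, log[-1] is the head.
    Returns (groups, incomplete). Each group lists, in stored order, the LIDs
    of one complete write (an atomic write or a single block); groups appear
    in scan order, newest first. `incomplete` lists the LIDs of flag-1 blocks
    at the head that were never followed by a flag-0 terminator (assumed here
    to be an atomic write interrupted by the failure).
    """
    groups: List[List[int]] = []
    incomplete: List[int] = []
    i = len(log) - 1

    # Flag-1 blocks at the head have no terminating flag-0 block after them.
    while i >= 0 and log[i].flag == 1:
        incomplete.append(log[i].lid)
        i -= 1
    incomplete.reverse()

    while i >= 0:
        # log[i].flag == 0 here: a terminator or an ordinary block.
        group = [log[i].lid]
        i -= 1
        # A change from 0 to 1 means the earlier blocks belong to the same
        # atomic write; keep going until the flag returns to 0.
        while i >= 0 and log[i].flag == 1:
            group.append(log[i].lid)
            i -= 1
        group.reverse()
        groups.append(group)
    return groups, incomplete


# Stored order: one ordinary block, a complete three-block atomic write,
# then two blocks of an atomic write that was cut short.
example = [Block(10, 0), Block(20, 1), Block(21, 1), Block(22, 0),
           Block(30, 1), Block(31, 1)]
complete, torn = scan_log(example)
```

Because the log is append-only and blocks of different writes are not interleaved, a single bit per block is enough to recover the group boundaries in one pass.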
- The
apparatus 500, in one embodiment, includes a physicalspace reservation module 522 that determines if the data storage device (i.e. storage device 120) has an amount of available physical storage capacity to satisfy the physical storage space request. If the physicalspace reservation module 522 determines that the amount of available physical storage capacity is adequate to satisfy the physical space reservation request, the physicalspace reservation module 522 reserves an amount of available physical storage capacity on thestorage device 120 to satisfy the physical storage space request. The amount of available physical storage capacity reserved to satisfy the physical storage space request is the reserved physical capacity. - The amount of reserved physical capacity may or may not be equal to the amount of storage space requested in the physical space reservation request. For example, the
storage layer 130 may need to store additional information with data written to astorage device 120, such as metadata, index information, error correcting code, etc. In addition, thestorage layer 130 may encrypt and/or compress data, which may affect storage size. - In one embodiment, the physical space reservation request includes an amount of logical space and the indication of an amount of physical storage capacity requested is derived from the requested logical space. In another embodiment, the physical space reservation request includes one or more LIDs and the indication of an amount of physical storage capacity requested is derived from an amount of data associated with the LIDs. In one example, the data associated with the LIDs is data that has been bound to the LIDs, such as in a write request. In another example, the data associated with the LIDs is a data capacity allocated to each LID, such as would be the case if a LID is an LBA and a logical block size could be used to derive the amount of requested physical storage capacity.
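As a rough sketch of how a requested capacity might be derived from a set of LIDs, the function below multiplies the number of LIDs by a fixed logical block size and then adjusts for per-block overhead and compression. The function name, its parameters, and the simple linear overhead model are assumptions made for illustration; the disclosure does not specify a formula.

```python
import math


def required_physical_bytes(num_lids: int,
                            logical_block_size: int,
                            per_block_overhead: int = 0,
                            compression_ratio: float = 1.0) -> int:
    """Estimate physical capacity needed for `num_lids` logical blocks.

    per_block_overhead: bytes of metadata, index information, or ECC stored
        with each block (hypothetical, assumed constant per block).
    compression_ratio: stored size divided by original size (1.0 = no
        compression, 0.5 = data shrinks to half).
    """
    payload = num_lids * logical_block_size
    stored_payload = math.ceil(payload * compression_ratio)
    return stored_payload + num_lids * per_block_overhead


# Eight 4 KiB logical blocks, with and without overhead and compression.
plain = required_physical_bytes(8, 4096)
with_overhead = required_physical_bytes(8, 4096, per_block_overhead=64)
compressed = required_physical_bytes(8, 4096, per_block_overhead=64,
                                     compression_ratio=0.5)
```

The three results illustrate why the reserved amount may be equal to, larger than, or smaller than the amount named in the request.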
- In another embodiment, the physical space reservation request is a request to store data. In this embodiment the physical space reservation request may be implied and the indication of an amount of physical storage capacity requested may be derived from the data and/or metadata associated with the data. In another embodiment, the physical space reservation request is associated with a request to store data. In this embodiment, the indication of an amount of physical storage capacity requested is indicated in the physical space reservation request and may be correlated to the data of the request to store data.
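One way to picture a capacity-based reservation is as a running count kept alongside the count of available physical capacity, with no media region set aside. The sketch below is illustrative only: `CapacityLedger` and its methods are hypothetical names, and the policy of refusing unreserved writes that would cut into reserved capacity is one assumed option among several.

```python
class CapacityLedger:
    """Tracks available and reserved physical capacity as byte counts."""

    def __init__(self, available_bytes: int) -> None:
        self.available = available_bytes  # physical capacity not yet holding data
        self.reserved = 0                 # portion of `available` promised to clients

    def reserve(self, nbytes: int) -> bool:
        """Reserve capacity if enough unreserved capacity remains."""
        if self.available - self.reserved >= nbytes:
            self.reserved += nbytes
            return True
        return False

    def write_unreserved(self, nbytes: int) -> bool:
        """Store data with no reservation; refuse if it would use reserved capacity."""
        if self.available - self.reserved >= nbytes:
            self.available -= nbytes
            return True
        return False

    def write_reserved(self, nbytes: int) -> bool:
        """Store data against an existing reservation, consuming part of it."""
        if nbytes > self.reserved:
            return False
        self.available -= nbytes
        self.reserved -= nbytes
        return True

    def release(self, nbytes: int) -> None:
        """Cancel all or part of a reservation (for example, after a timeout)."""
        self.reserved -= min(nbytes, self.reserved)

    def reclaim(self, nbytes: int) -> None:
        """Return capacity recovered by a storage space recovery operation."""
        self.available += nbytes


ledger = CapacityLedger(1000)
first = ledger.reserve(600)              # fits
second = ledger.reserve(500)             # would exceed available capacity
filled = ledger.write_unreserved(400)    # uses the last unreserved capacity
blocked = ledger.write_unreserved(1)     # refused: only reserved capacity is left
used = ledger.write_reserved(250)        # consumes part of the reservation
```

Because only counts are tracked, the sketch is indifferent to where data is actually appended or which erase blocks are recovered.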
- The physical
space reservation module 522 may also then factor metadata, compression, encryption, etc. to determine an amount of required physical capacity to satisfy the physical space reservation request. The amount of physical capacity required to satisfy the physical space reservation request may be equal to, larger than, or smaller than an amount indicated in the physical space reservation request. - Once the physical
space reservation module 522 determines an amount of physical capacity required to satisfy the physical space reservation request, the physicalspace reservation module 522 determines if one ormore storage devices 120A-N, either individually or combined, have enough available physical storage capacity to satisfy the physical space reservation request. The request may be for space on a particular storage device (e.g. 120A), a combination ofstorage devices 120A-N, such as would be the case if some of thestorage devices 120A-N are in a RAID configuration, or for available space generally. The physicalspace reservation module 522 may tailor a determination of available capacity to specifics of the physical space reservation request. - Where the physical space reservation request is for space on more than one storage device, the physical
space reservation module 522 will typically retrieve available physical storage capacity information from each logical-to-physical map of eachstorage device 120 or a combined logical-to-physical map of a group ofstorage devices 120A-N. The physicalspace reservation module 522 typically surveys bound media addresses. Note that the physicalspace reservation module 522 may not have enough information to determine available physical capacity by looking at bound LIDs, because there is typically not a one-to-one relationship between LIDs and media storage locations. - The physical
space reservation module 522 reserves physical storage capacity, in one embodiment, by maintaining enough available storage capacity to satisfy the amount of requested capacity in the physical space reservation request. Typically, in a log structured file system or other sequential storage device, the physicalspace reservation module 522 would not reserve a specific media region or media address range in thestorage device 120, but would instead reserve physical storage capacity. - For example, a
storage device 120 may have 500 gigabytes (“GB”) of available physical storage capacity. The storage device 120 may be receiving data and storing the data at one or more append points, thus reducing the available storage capacity. Meanwhile, a garbage collection or storage space recovery operation may be running in the background that would return recovered erase blocks to the storage pool, thus increasing storage space. The locations where data is stored and freed are constantly changing, so the physical space reservation module 522, in one embodiment, monitors storage capacity without reserving fixed media storage locations. - The physical
space reservation module 522 may reserve storage space in a number of ways. For example, the physical space reservation module 522 may halt storage of new data if the available physical storage capacity on the storage device 120 decreases to the reserved storage capacity, may send an alert if the available physical storage capacity on the storage device 120 falls to some level above the reserved physical storage capacity, or may take some other action or combination of actions that would preserve an available storage capacity above the reserved physical storage capacity. - In another embodiment, the physical
space reservation module 522 reserves a media region, range of media addresses, etc. on the data storage device. For example, if the physicalspace reservation module 522 reserved a certain quantity of erase blocks, data associated with the physical space reservation request may be stored in the reserved region or address range. The data may be stored sequentially in the reserved storage region or range. For example, it may be desirable to store certain data at a particular location. One of skill in the art will recognize reasons to reserve a particular region, address range, etc. in response to a physical space reservation request. - In one embodiment, the
apparatus 500 includes a physical spacereservation return module 524 that transmits to thestorage client 116 an indication of availability or unavailability of the requested amount of physical storage capacity in response to the physicalspace reservation module 522 determining if the data storage device has an amount of available physical storage space that satisfies the physical space reservation request. For example, if the physicalspace reservation module 522 determines that the available storage space is adequate to satisfy the physical space reservation request, the physical spacereservation return module 524 may transmit a notice that the physicalspace reservation module 522 has reserved the requested storage capacity or other appropriate notice. - If, on the other hand, the physical
space reservation module 522 determines that thestorage device 120 does not have enough available physical storage capacity to satisfy the physical space reservation request, the physical spacereservation return module 524 may transmit a failure notification or other indicator that the requested physical storage space was not reserved. The indication of availability or unavailability of the requested storage space, for example, may be used prior to writing data to reduce a likelihood of failure of a write operation. - The
apparatus 500, in another embodiment, includes a physical space reservation cancellation module 526 that cancels all or a portion of reserved physical storage space in response to a cancellation triggering event. The cancellation triggering event may come in many different forms. For example, the cancellation triggering event may include determining that data to be written to the storage device 120 and associated with available space reserved by the physical space reservation module 522 has been previously stored by the storage layer 130. - For example, if a deduplication process (deduplication module 374) determines that the data has already been stored, the data may not need to be stored again since the previously stored data could be mapped to two or more LIDs. In a more basic example, if reserved physical storage space is associated with a write request and the write request is executed, the cancellation triggering event could be completion of storing data of the write request. In this example, the physical space
reservation cancellation module 526 may reduce or cancel the reserved physical storage capacity. - If the data written is less than the reserved space, the physical space
reservation cancellation module 526 may merely reduce the reserved amount, or may completely cancel the reserved physical storage capacity associated with the write request. Writing less than the reserved physical space may be due to writing a portion of a data unit where the data unit is the basis of the request, where data associated with a physical space reservation request is written incrementally, etc. In one embodiment, physical storage space is reserved by the physical space reservation module 522 to match a request and then, due to compression or a similar procedure, the storage space of the data stored is less than the associated reserved physical storage capacity. - In another embodiment, the cancellation triggering event is a timeout. For example, if a physical space reservation request is associated with a write request and the physical
space reservation module 522 reserves physical storage capacity, if the data associated with the write request is not written before the expiration of a certain amount of time the physical spacereservation cancellation module 526 may cancel the reservation of physical storage space. One of skill in the art will recognize other reasons to cancel all or a portion of reserved physical capacity. - The physical
space reservation module 522, in one embodiment, may increase or otherwise change the amount of reserved physical storage capacity. For example, the physical space reservation request module 520 may receive another physical space reservation request, which may or may not be associated with an earlier physical space reservation request. Where the physical space reservation request is associated with previously reserved physical storage capacity, the physical space reservation module 522 may increase the reserved physical storage capacity. Where the physical space reservation request is not associated with previously reserved physical storage capacity, the physical space reservation module 522 may separately reserve physical storage capacity and track the additional storage capacity separately. One of skill in the art will recognize other ways to request and reserve available physical storage capacity and to change or cancel reserved capacity. Standard management should include thresholds, triggers, alarms, and the like for managing the physical storage capacity and for indicating to the user that action needs to be taken. Typically, this would be done in the management system. However, either the management system would have to poll the devices under management or, preferably, the devices would have to be configured or programmed to interrupt the manager when a criterion is met. - The
apparatus 500, in another embodiment, includes a LID binding module 528 that, in response to a request from a storage client 116 to write data, binds one or more unbound LIDs to media storage locations comprising the data and transmits the LIDs to the storage client 116. The LID binding module 528, in one embodiment, allows on-the-fly allocation and binding of LIDs. The request to write data, in another embodiment, may be a two-step process. The LID binding module 528 may allocate LIDs in a first step for data to be written and then in a second step the data may be written along with the allocated LIDs. - In one embodiment, the
LID allocation module 402 allocates LIDs in a contiguous range. TheLID binding module 528 may also allocate LIDs in a consecutive range. Where a logical space is large, theLID allocation module 402 may not need to fragment allocated LIDs but may be able to choose a range of LIDs that are consecutive. In another embodiment, theLID allocation module 402 binds LIDs that may not be contiguous and may use logical spaces that are interspersed with other allocated logical spaces. - The
apparatus 500, in another embodiment, includes a DMA module 530 that pulls data from a client 110 in a direct memory access (“DMA”) and/or a remote DMA (“RDMA”) operation. The data is first identified in a request to store data, such as a write request, and then the storage layer 130 executes a DMA and/or RDMA to pull data from the storage client 116 to a storage device 120. In another embodiment, the write request does not use a DMA or RDMA, but instead the write request includes the data. Again, the media storage locations of the data are bound to the corresponding LIDs. - In one embodiment, the
apparatus 500 includes a deletion module 532. In response to a request to delete data from the data storage device, in one embodiment, the deletion module 532 removes the mapping between storage space where the deleted data was stored and the corresponding LID. The deletion module 532 may also unbind the one or more media storage locations of the deleted data and also may deallocate the one or more logical addresses associated with the deleted data. -
FIG. 6 is a flow diagram of one embodiment of amethod 600 for allocating data storage space. In some embodiments, themethod 600, and the other methods and/or processes disclosed herein, may be embodied as instructions stored on a computer-readable storage medium. The instructions may be configured for execution by a computing device, and may be configured to cause the computing device to perform one or more of the disclosed method steps and/or operations. Alternatively, or in addition, one or more of the disclosed method steps and/or operations may be implemented by use of hardware components, such as special-purpose circuitry, logic elements, processors, ASICs, FPGAs, and/or the like. - Step 602 may comprise receiving an allocation request from a
storage client 116. The allocation request may be received through the interface 138 of the storage layer 130. The logical capacity module 404 determines 604 if a logical address space 136 includes sufficient unallocated logical capacity to satisfy the allocation request, where the determination includes a search of a logical-to-physical map (e.g., index 1204, or other datastructure). The logical-to-physical map includes bindings between LIDs of the logical space and corresponding media storage locations comprising data of the bound LIDs, wherein a bound LID differs from the one or more media storage location addresses bound to the LID. The allocation reply module 406 communicates 606 a reply to the requesting device and the method 600 ends. -
FIG. 7 is a schematic flow chart diagram illustrating one embodiment of amethod 700 for allocating data storage space. Themethod 700 begins and the physicalcapacity request module 502 receives 702 from a requesting device a physical capacity request. The physical capacity request is received at the data storage device. The physical capacity request includes a request of an amount of available physical storage capacity in the data storage device. The physical capacity request, for example, may be a specific amount of physical capacity, may be derived from a request to store data, etc. - The physical
capacity allocation module 504 determines 704 the amount of available physical storage capacity on the data storage device where the amount of available physical storage capacity includes a physical storage capacity of unbound storage locations in the data storage device. The physical capacity reply module 506 communicates 706 a reply to the requesting device in response to the physical capacity allocation module 504 determining the amount of available physical storage capacity on the data storage device, and the method 700 ends. -
FIG. 8 is a schematic flow chart diagram illustrating one embodiment of amethod 800 for reserving physical storage space. Themethod 800 begins and the physical spacereservation request module 520 receives 802 a physical space reservation request to reserve available physical storage space. The physical space reservation request includes an indication of an amount of physical storage capacity requested. The indication of an amount of physical storage capacity could take many forms, such as a number of bytes or a number of logical blocks, a request to store specific data, or other indirect indication where the indication of an amount of physical storage is derived from the request. - The physical
space reservation module 522 determines 804 if the data storage device has available physical storage capacity to satisfy the physical storage space request. If the physicalspace reservation module 522 determines 804 that the data storage device has available physical storage capacity to satisfy the physical storage space request, the physicalspace reservation module 522reserves 806 physical storage capacity adequate to service the physical space reservation request and the physical spacereservation return module 524 transmits 808 to the requestingstorage client 116 an indication that the requested physical storage space is reserved. - The
physical allocation module 404 maintains 810 enough available physical storage capacity to maintain the reservation of physical storage capacity until the reservation is used by storing data associated with the reservation or until the reservation is cancelled, and themethod 800 ends. If the physicalspace reservation module 522 determines 804 that the data storage device does not have available physical storage capacity to satisfy the physical storage space request, the physical spacereservation return module 524 transmits 812 to the requestingstorage client 116 an indication that the requested physical storage space is not reserved or an indication of insufficient capacity, and themethod 800 ends. -
FIG. 9 is a schematic flow chart diagram illustrating one embodiment of a method 900 for binding LIDs to media storage locations. The method 900 begins and the LID binding module 528 receives 902 a write request from a storage client 116. The write request is a request to write data to one or more storage devices 120. Step 902 may comprise determining whether the request is associated with any LIDs (e.g., determining whether LIDs have been allocated for the request). - Step 904 may comprise allocating LIDs to the storage client to service the write request (if necessary), as disclosed above. Step 904 may further comprise identifying LIDs allocated to the storage client for use in referencing the data of the write request. Step 904 may comprise indicating that the identified LIDs are allocated by the storage client and are currently being used to reference valid data on a storage device. Step 904 may further comprise allocating and/or reserving physical storage capacity for the write request (by use of the physical
capacity allocation module 504, as disclosed above). - Step 906 may comprise servicing the write request by, inter alia, storing data of the write request onto one or more storage device(s) 120. The data may be stored in a contextual, log-based format, as disclosed herein. The data may be stored at one or more physical storage locations, which may be referenced by respective media addresses. Step 908 may comprise binding the LIDs identified at
step 904 to the media addresses of step 906. Step 908 may, therefore, comprise the mapping module 518 binding the media addresses to the LIDs identified at step 904 (e.g., binding the LIDs to the media addresses in one or more entries 1205A-N of an index). In some embodiments, the media addresses may be determined while (or after) the data is stored at step 906. - In some embodiments, step 910 further comprises providing an indication of the LIDs used to satisfy the write request (the LIDs identified at step 904) to the
storage client 116. The LIDs may be communicated in an acknowledgement message, a return value, a callback, or other suitable mechanism. -
FIG. 10 is a schematic flow chart diagram illustrating another embodiment of amethod 1000 for binding allocated LIDs indata storage device 120. Themethod 1000 begins and theLID binding module 528 receives 1002 a request to bind LIDs to data where the LIDs are allocated to thestorage client 116 making the request. TheLID binding module 528binds 1004 LIDs to media storage locations comprising the data. TheLID binding module 528 communicates 1006 the bound LIDs to thestorage client 116. - The
storage layer 130 receives 1006 a write request to write data to astorage device 120 where the data is already associated with bound LIDs. In other embodiments, the write request is to store the data on more than onestorage device 120 in thestorage system 102, such as would be the case if thestorage devices 120 are RAIDed or if the data is written to aprimary storage device 120 and to amirror storage device 120. Thestorage controller 140stores 1010 the data on thestorage device 120 and themapping module 518maps 1012 one or more media storage locations where the data is stored to the bound LIDs (e.g., updates the binding between the LIDs and media storage locations in the index 1204).Step 1014 may further comprise communicating an indication that the request ofstep 1002 was successfully completed. -
FIG. 11 is a schematic flow chart diagram illustrating an embodiment of a method 1100 for servicing an allocation query at a storage device. The allocation query request module 510 receives 1102 an allocation query at the data storage device. The allocation query determination module 512 identifies 1104 one or more LIDs that meet criteria specified in the allocation query. The identified LIDs include allocated LIDs that are bound, allocated LIDs that are unbound, and/or unallocated LIDs. The allocation query reply module 514 communicates 1106 the results of the allocation query to a requesting device or other designated device and the method 1100 ends. The results may include a list of the identified LIDs, an acknowledgement that LIDs meeting the criteria were found, an acknowledgement that LIDs meeting the criteria in the allocation query were not found, etc. -
FIG. 12 depicts another example of an index 1204 for associating LIDs with storage locations on a non-volatile storage device. The index 1204 may comprise a tree (or other datastructure) comprising a plurality of entries. An entry of the index 1204 may associate a LID (or LID range, extent, or set) with one or more media storage locations, as described above. The LIDs may be contiguous (e.g., 072-083). Other entries, such as 1218, may comprise a discontiguous set of LIDs (e.g., LID 454-477 and 535-598). Accordingly, the index 1204 may be used to represent variable sized storage entries (e.g., storage entries corresponding to one or more storage locations of the non-volatile storage device 120 comprising data of an arbitrary set or range of LIDs). - The storage entries may further comprise and/or reference metadata 1219, which may comprise metadata pertaining to the LIDs, such as age, size, LID attributes (e.g., client identifier, data identifier, file name, group identifier), and so on. Since the metadata 1219 is associated with the storage entries, which are indexed by LID (e.g., address 1215), the metadata 1219 may remain associated with the
storage entry 1214 regardless of changes to the location of the underlying storage locations on the non-volatile storage device 120 (e.g., changes to the storage locations 1217). - The
index 1204 may be used to efficiently determine whether thenon-volatile storage device 120 comprises a storage entry referenced in a client request and/or to identify a storage location of data on thedevice 120. For example, thenon-volatile storage device 120 may receive a request to allocate a particular LID. The request may specify a particular LID, a LID and a length or offset (e.g., request 3 units of data starting from LID 074), a set of LIDs or the like. Alternatively, or in addition, the client request may comprise a set of LIDs, LID ranges (continuous or discontinuous), or the like. - The
non-volatile storage device 120 may determine whether a storage entry corresponding to the requested LIDs is in the index 1204 using a search operation. If a storage entry comprising the requested LIDs is found in the index 1204, the LID(s) associated with the request may be identified as being allocated and bound. Accordingly, data corresponding to the LID(s) may be stored on the non-volatile storage device 120. If the LID(s) are not found in the index 1204, the LID(s) may be identified as unbound (but may be allocated). Since the storage entries may represent sets of LIDs and/or LID ranges, a client request may result in partial allocation. For example, a request to allocate 068-073 may successfully allocate LIDs 068 to 071, but may fail to allocate 072 and 073 since these are included in the storage entry 1214. In the event of a partial allocation, the entire allocation request may fail, the available LIDs may be allocated and other LIDs may be substituted for the failed LIDs, or the like. - In the example depicted in
FIG. 12 , the storage entry corresponding to the storage request is in the index 1204 (storage entry 1214), and, as such, the LIDs associated with the request are identified as allocated and bound. Therefore, if the client request is to read data at the specified LIDs, data may be read from thestorage locations 1217 identified in thestorage entry 1214 and returned to the originator of the request. If the request is to allocate the identified LIDs, the allocation request may fail (and/or substitute LIDs may be allocated as described above). - When new storage entries are added to the
index 1204, a merge operation may occur. In a merge operation, an existing storage entry may be “merged” with one or more other storage entries. For instance, a new storage entry for LIDs 084-088 may be merged withentry 1214. The merge may comprise modifying theLID 1215 of the storage entry to include the new addresses (e.g., 072-088) and/or to reference thestorage locations 1217 to include the storage location on which the data was stored. - Although the storage entries in the
index 1204 are shown as comprising references to storage locations (e.g., addresses 1217), the disclosure is not limited in this regard. In other embodiments, the storage entries comprise reference or indirect links to the storage locations. For example, the storage entries may include a storage location identifier (or reference to the reverse map 1222). -
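The lookup and partial-allocation behavior described for the index 1204 can be sketched as follows. This is a simplified illustration, assuming a sorted list searched with `bisect` in place of the tree; the class `RangeIndex`, its method names, and the label `"locations-A"` are hypothetical. The request 068-073 against an entry covering 072-083 follows the example in the text.

```python
import bisect


class RangeIndex:
    """Sorted, non-overlapping LID ranges mapped to storage locations (sketch)."""

    def __init__(self):
        self._starts = []   # first LID of each entry, kept sorted
        self._entries = []  # parallel list of (first, last, locations)

    def insert(self, first, last, locations):
        i = bisect.bisect_left(self._starts, first)
        self._starts.insert(i, first)
        self._entries.insert(i, (first, last, locations))

    def find(self, lid):
        """Return the entry containing `lid`, or None if no entry covers it."""
        i = bisect.bisect_right(self._starts, lid) - 1
        if i >= 0:
            first, last, _ = self._entries[i]
            if first <= lid <= last:
                return self._entries[i]
        return None

    def split_request(self, first, last):
        """Split a requested LID range into (unbound, already_bound) LIDs."""
        unbound, already_bound = [], []
        for lid in range(first, last + 1):
            (already_bound if self.find(lid) else unbound).append(lid)
        return unbound, already_bound


index = RangeIndex()
index.insert(72, 83, "locations-A")  # hypothetical stand-in for the locations 1217
unbound, already_bound = index.split_request(68, 73)
```

The per-LID scan in `split_request` is chosen for clarity; an implementation concerned with large ranges would more likely compare range boundaries directly. What to do with the split result (fail the whole request, or allocate the unbound LIDs and substitute others) is a policy decision, as the text notes.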
FIG. 12 depicts another example of an index comprising areverse map 1222, which may associate storage locations of thenon-volatile storage device 120 with LIDs in thelogical address space 136. Thereverse map 1222 may also associate a storage location with metadata, such as avalidity indicator 1230, and/orother metadata 1236. In some embodiments, thestorage location address 1226 and/orlength 1228 may be explicitly included in thereverse map 1222. Alternatively, thestorage location address 1226 and/ordata length 1228 may be inferred from a location and/or arrangement of an entry in thereverse map 1222 and, as such, theaddress 1226 and/ordata length 1228 may be omitted. In some embodiments, thereverse map 1222 may include references toLIDs 1234. - As discussed above, the
reverse map 1222 may comprisemetadata 1236, which may include metadata pertaining to sequential storage operations performed on the storage locations, such as sequence indicators (e.g., timestamp) to indicate an ordered sequence of storage operations performed on the storage device (e.g., as well as an “age” of the storage locations and so on). Themetadata 1236 may further include metadata pertaining to the storage media, such as wear level, reliability, error rate, disturb status, and so on. Themetadata 1236 may be used to identify unreliable and/or unusable storage locations, which may reduce the physical storage capacity of thenon-volatile storage device 120. - The
reverse map 1222 may be organized according to storage divisions (e.g., erase blocks) of thenon-volatile storage device 120. In this example, theentry 1220 that corresponds tostorage entry 1218 is located in eraseblock n 1238. Eraseblock n 1238 is preceded by erase block n−1 1240 and followed by erase block n+1 1242 (the contents of erase blocks n−1 and n+1 are not shown). An erase block may comprise a predetermined number of storage locations. An erase block may refer to an area in thenon-volatile storage device 120 that is erased together in a storage recovery operation. - The
validity indicator 1230 may be used to selectively “invalidate” data. Data marked as invalid in thereverse index 1222 may correspond to obsolete versions of data (e.g., data that has been overwritten and/or modified in a subsequent storage operation). Similarly, data that does not have a corresponding entry in theindex 1204 may be marked as invalid (e.g., data that is no longer being referenced by a storage client 116). Therefore, as used herein, “invalidating” data may comprise marking the data as invalid in thestorage metadata 135, which may include removing a reference to the media storage location in theindex 1204 and/or marking avalidity indicator 1230 of the data in the reverse map. - In some embodiments, the
groomer module 370, described above, uses the validity indicators 1230 to identify storage divisions (e.g., erase blocks) for recovery. When recovering (or reclaiming) an erase block, valid data thereon (if any) may be relocated to new storage locations on the non-volatile storage media, and the erase block may be erased. The groomer module 370 may identify the data to relocate using the validity indicator(s) 1230. Data that is invalid need not be relocated (and may be deleted), whereas data that is still valid (e.g., still being referenced within the index 1204) may be relocated. After the relocation, the groomer module 370 (or other process) may update the index 1204 to reference the new media storage location(s) of the valid data. Accordingly, marking data as “invalid” in the storage metadata 135 may cause data to be removed from the non-volatile storage media 122. The removal of the data, however, may not occur immediately (when the data is marked “invalid”), but may occur in response to a grooming operation or another process that is outside of the path for servicing storage operations and/or requests. Moreover, when relocating data, the groomer module 370 may be configured to determine whether the contextual format of the data should be updated by referencing the storage metadata 135 (e.g., the reverse map 1222 and/or index 1204). - The
validity metadata 1230 may be used to determine an available physical storage capacity of the non-volatile storage device 120 (e.g., a difference between physical capacity (or budgeted capacity) and the storage locations comprising valid data). The reverse map 1222 may be arranged by storage division (e.g., erase blocks) or erase region to enable efficient traversal of the physical storage space (e.g., to perform grooming operations, determine physical storage capacity, and so on). Accordingly, in some embodiments, the available physical capacity may be determined by traversing the storage locations and/or erase blocks in the reverse map 1222 to identify the physical storage capacity that is available (and/or that is being used to store valid data). - Alternatively, or in addition, the reverse map 1222 (or other datastructure) may comprise an
indicator 1239 to track the available physical capacity of thenon-volatile storage device 120. The availablephysical capacity indicator 1239 may be initialized to the physical storage capacity (or budgeted capacity) of thenon-volatile storage device 120, and may be updated as storage operations are performed. The storage operations resulting in an update to the available physicalstorage capacity indicator 1239 may include, but are not limited to: storing data on thestorage device 120, reserving physical capacity on thestorage device 120, canceling a physical capacity reservation, storing data associated with a reservation where the size of the stored data differs from the reservation, detecting unreliable and/or unusable storage locations and/or storage division (e.g., taking storage locations out of service), and so on. - In some embodiments, the
metadata 1204 and/or 1222 may be configured to reflect reservations of physical storage capacity. As described above in conjunction withFIG. 8 , a storage client may reserve physical storage capacity for an operation that is to take place over time. Without a reservation, the storage client may begin the operation, but other clients may exhaust the physical capacity before the operation is complete. In some embodiments, thestorage client 116 issues a request to reserve physical capacity before beginning the storage operation. Thestorage layer 130 may update storage metadata 135 (e.g., theindexes 1204 and/or 1222, disclosed herein), to indicate that the requested portion has been reserved. The reserved portion may not be associated with any particular media storage locations; rather, the reservation may indicate that thestorage layer 130 is to maintain at least enough physical storage capacity to satisfy the reservation. For example, theindicator 1239 of remaining physical storage capacity may be reduced by the amount of reserved physical storage capacity. Requests subsequent to the reservation may be denied if satisfying the requests would exhaust the remaining physical storage capacity in the updatedindicator 1239. In some embodiments, a reservation of physical storage capacity may be valid for a pre-determined time, until released by the storage client, until another, higher-priority request is received, or the like. The reservation may expire once the storage client that reserved the physical capacity consumes the reserved physical storage capacity in one or more subsequent storage operations. If the storage operations occur over a series of storage operations (as opposed to a single operation), the reservation may be incrementally reduced accordingly. -
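The reservation bookkeeping described above can be sketched as a counter that plays the role of the indicator 1239. The class `PhysicalCapacity`, its methods, the client identifiers, and the unit counts are illustrative assumptions; expiry by timeout or by higher-priority request is omitted.

```python
class PhysicalCapacity:
    """Tracks available physical capacity and per-client reservations (sketch)."""

    def __init__(self, capacity):
        self.available = capacity  # plays the role of indicator 1239
        self.reservations = {}     # client id -> units still reserved

    def reserve(self, client, units):
        """Set capacity aside; fails if it would exhaust what remains."""
        if units > self.available:
            return False
        self.available -= units
        self.reservations[client] = self.reservations.get(client, 0) + units
        return True

    def store(self, client, units):
        """Consume the client's reservation first, then unreserved capacity."""
        reserved = self.reservations.get(client, 0)
        from_reservation = min(reserved, units)
        remainder = units - from_reservation
        if remainder > self.available:
            return False
        if from_reservation:
            left = reserved - from_reservation
            if left:
                self.reservations[client] = left  # incrementally reduced
            else:
                del self.reservations[client]     # fully consumed
        self.available -= remainder
        return True

    def cancel(self, client):
        """Release whatever the client still holds back to the pool."""
        self.available += self.reservations.pop(client, 0)


cap = PhysicalCapacity(100)
reserved_ok = cap.reserve("client-a", 60)
denied = cap.store("client-b", 50)   # would cut into client-a's reservation
partial = cap.store("client-a", 20)  # draws down the reservation
```

Because reserved units are subtracted from `available` at reservation time, later requests from other clients are checked only against what is genuinely left, which is the denial behavior the text describes.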
FIG. 13 depicts another example of anindex 1304 for managing storage allocation of a non-volatile storage device. In theFIG. 13 example, theindex 1304 may be modified to include one or more allocation entries (e.g., allocated entry 1314). An allocation entry may be used to track LIDs that are allocated to a client, but are not yet bound (e.g., are not associated with data stored on the non-volatile storage device 120). Therefore, unlike the storage entries (e.g.,entries 1308, 1316, and 1318), anallocation entry 1314 may not include references tostorage locations 1317; these references may be set to “unbound,” NULL, or may be omitted. Similarly,metadata 1319 associated with theallocation entry 1314 may indicate that the entry is not bound and/or associated with data. - The
index 1304 may be used to determine an available logical capacity of the logical address space 136 (e.g., by traversing the index 1304). The available logical capacity may consider LIDs that are bound (using the storage entries), as well as LIDs that are allocated, but not yet bound (using the allocation entries, such as 1314). - As shown in
FIG. 13 , in some embodiments, theallocation entries 1314 may be maintained in theindex 1304 with the storage entries. Alternatively, allocation entries may be maintained in a separate index (or other datastructure). When an allocation entry becomes associated with data on the non-volatile storage device 120 (e.g., as associated with storage locations), the allocation entry may be modified and/or replaced by a storage entry. - In some embodiments, the index 1304 (or index 1204) may comprise an
indicator 1330 to track the available logical capacity of the logical address space 136. The available logical capacity may be initialized according to the logical address space 136 presented by the storage device 120. Changes to the index 1304 may cause the available logical capacity indicator 1330 to be updated. The changes may include, but are not limited to: addition of new allocation entries, removal of allocation entries, addition of storage entries, removal of storage entries, or the like. -
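A minimal sketch of how an available-logical-capacity counter might be kept in step with index changes follows. The class `LogicalIndex` and its methods are hypothetical, a dictionary stands in for the index 1304, and overlap checking between ranges is deliberately left out.

```python
class LogicalIndex:
    """Allocation and storage entries plus a running capacity counter (sketch)."""

    def __init__(self, address_space_size):
        self.available = address_space_size  # plays the role of indicator 1330
        self.entries = {}                    # (first, last) -> location or None

    def allocate(self, first, last):
        """Add an allocation entry: LIDs are reserved but not bound to data."""
        self.entries[(first, last)] = None
        self.available -= last - first + 1

    def bind(self, first, last, location):
        """Turn an allocation entry into a storage entry; capacity is unchanged."""
        if (first, last) not in self.entries:
            raise KeyError((first, last))
        self.entries[(first, last)] = location

    def release(self, first, last):
        """Remove an entry and return its LIDs to the available pool."""
        del self.entries[(first, last)]
        self.available += last - first + 1


idx = LogicalIndex(1000)
idx.allocate(100, 199)
after_allocate = idx.available
idx.bind(100, 199, "loc-1")  # hypothetical location label
after_bind = idx.available
```

In this sketch binding does not change the counter, since the LIDs were already counted as unavailable when they were allocated.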
FIG. 14 depicts an example of an unallocated index 1444, which may be used to allocate storage in a non-volatile storage device. The index 1444 may comprise entries 1450, which may correspond to “holes” in the LID indexes 1204 and/or 1304 described above. Accordingly, an entry 1450 in the unallocated index 1444 may correspond to a LID (and/or LID range, set, or the like) that is available (e.g., is neither allocated nor bound). The index 1444 may be used to quickly determine the logical storage capacity of a logical storage space and/or to identify LIDs to allocate in response to client requests. In the FIG. 14 example, the entries in the index 1444 are shown as being indexed by LID. In some embodiments, however, the index 1444 may be indexed in other (or additional) ways. For example, the unallocated index 1444 may be indexed by LID range (e.g., by the size of the LID range) as well as LID. This indexing may be used to identify unallocated LIDs sized according to client requests (e.g., to efficiently fill “holes” in the logical address space 136). -
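Selecting a hole by size can be sketched with a best-fit search over a list of free ranges. The function `allocate_best_fit` and the hole values are illustrative assumptions; the linear scan below stands in for a real size-ordered index, which would avoid examining every hole.

```python
def allocate_best_fit(holes, count):
    """Allocate `count` LIDs from the smallest hole that fits.

    `holes` is a list of (first_lid, length) tuples and is modified in place.
    Returns the first allocated LID, or None if no hole is large enough.
    """
    candidates = [h for h in holes if h[1] >= count]
    if not candidates:
        return None
    # Smallest adequate hole wins; ties go to the lowest starting LID.
    first, length = min(candidates, key=lambda h: (h[1], h[0]))
    holes.remove((first, length))
    if length > count:
        # Keep the unused tail of the hole available for later requests.
        holes.append((first + count, length - count))
        holes.sort()
    return first


# Illustrative holes only: LIDs 0-71, 84-453, and 478-534 are unallocated.
holes = [(0, 72), (84, 370), (478, 57)]
small = allocate_best_fit(holes, 50)
large = allocate_best_fit(holes, 100)
too_big = allocate_best_fit(holes, 1000)
```

Best fit is one possible policy; first fit or a policy that favors contiguity with a client's existing ranges would be equally consistent with the text.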
FIG. 15 is a flow diagram of one embodiment of amethod 1500 for allocating storage. As described above, steps of themethod 1500 may be tied to particular machine components and/or may be implemented using machine-readable instructions stored on a non-transitory machine-readable storage medium. - At
step 1510, a non-volatile storage device may be initialized for use. The initialization may comprise allocating resources for the non-volatile storage device (e.g., solid-state storage device 120), such as communications interfaces (e.g., bus, network, and so on), allocating volatile memory, accessing solid-state storage media, and so on. The initialization may further comprise presenting alogical address space 136 tostorage clients 116, initializing one or more indexes (e.g., the indexes described above in conjunction withFIGS. 12-14 ), and so on. - At
step 1520, the non-volatile storage device may present a logical space to one or more clients.Step 1520 may comprise implementing and/or providing an interface (e.g., API) accessible to one or more clients, or the like. - At
step 1530, the non-volatile storage device may maintain metadata pertaining to logical allocation operations performed by themethod 1500. The logical allocation operations may pertain to operations in thelogical address space 136 presented atstep 1520, and may include, but are not limited to: allocating logical capacity, binding logical capacity to media storage locations, and so on. The metadata may include, but is not limited to: indexes associating LIDs in thelogical address space 136 with media storage locations on the non-volatile storage device; indexes associating storage locations with LIDs (e.g.,index 1204 ofFIG. 12 ), allocation entries indicating allocated LIDs having no associated storage location (e.g.,index 1304 ofFIG. 13 ), an unallocated index (e.g. index 1444 ofFIG. 14 ), maintaining an indicator of unallocated logical capacity (e.g.,indicator 1330 ofFIG. 13 ), and so on. - At
step 1540, a client request pertaining to a LID in thelogical address space 136 may be received. The client request may comprise a query to determine if a particular LID and/or logical capacity can be allocated, a request to allocate a LID and/or logical capacity, a request to store data on the non-volatile storage device, or the like. - At
step 1550, the metadata maintained atstep 1530 may be referenced to determine whether the client request can be satisfied.Step 1550 may comprise referencing the metadata (e.g., indexes and/or indicators) maintained atstep 1530 to determine an available logical capacity of thelogical address space 136 and/or to identify available LIDs (or LID range) as described above. - At
step 1560, themethod 1500 may provide a response to the client request, which if the request cannot be satisfied may comprise providing a response to indicate such. Providing the response may comprise one or more of: an indicator that the allocation can be satisfied, allocating LIDs satisfying the request, providing allocated LIDs satisfying the request, providing one or more requested LIDs and/or one or more additional LIDs, (e.g., if a portion of a requested set of LIDs can be allocated), or the like. - Following
step 1560, the flow may return to step 1530, where themethod 1500 may update the metadata (e.g., indexes, indicators, and so on) according to the allocation operation (if any) performed atstep 1560. -
FIG. 16 is a flow diagram depicting an embodiment of amethod 1600 for allocating storage. As described above, steps of themethod 1600 may be tied to particular machine components and/or may be implemented using machine-readable instructions stored on a non-transitory machine-readable storage medium. - At
the initial steps, including step 1630, the method 1600 may be initialized, may present a logical storage space to one or more clients, and/or may maintain metadata pertaining to logical operations performed by the method 1600. - At
step 1632, themethod 1600 may maintain metadata pertaining to physical storage operations performed by themethod 1600. The storage operations may include, but are not limited to: reserving physical storage capacity, canceling physical storage capacity reservations, storing data on the non-volatile storage device, deallocating physical storage capacity, grooming operations (e.g., garbage collection, error handling, and so on), physical storage space budgeting, and so on. As discussed above, metadata maintained atstep 1632 may include, but is not limited to: indexes associating LIDs in thelogical address space 136 with storage locations on the non-volatile storage device; indexes associating storage locations with LIDs (e.g.,index 1204 ofFIG. 12 ), allocation entries indicating allocated LIDs having no associated storage location (e.g.,index 1304 ofFIG. 13 ), an unallocated index (e.g. index 1444 ofFIG. 14 ), maintaining an indicator of unallocated logical address space 136 (e.g.,indicator 1330 ofFIG. 13 ), and so on. - At
step 1640, a client request pertaining to physical storage capacity of the non-volatile storage device may be received. The client request may comprise a query to determine if physical storage capacity is available, a request to reserve physical storage capacity, a request to store data, a request to deallocate data (e.g., TRIM), or the like. - At
step 1650, the metadata maintained atsteps 1630 and/or 1632 may be referenced to determine whether the client request can be satisfied.Step 1650 may comprise referencing the metadata atsteps 1630 and/or 1632 to determine an available physical storage capacity of the non-volatile storage device and/or to identify storage locations associated with particular LIDs (e.g., in a deallocation request or TRIM) as described above. - At
step 1660, themethod 1600 may provide a response to the client request, which if the request cannot be satisfied may comprise providing a response to indicate such. Providing the response may comprise one or more of: indicating that the client request can and/or was satisfied, reserving physical storage capacity for the client; cancelling a physical storage capacity reservation, storing data on the non-volatile storage device, deallocating physical storage capacity, or the like. - Referring back to
FIG. 1, the storage layer 130 may be configured to maintain allocations of the logical address space 136 and/or bindings between LIDs and media storage locations using, inter alia, the storage metadata 135. The storage layer 130 may be further configured to store data in contextual format; as disclosed above, the contextual format may comprise associating persistent, contextual metadata (e.g., logical interface) with the data. Accordingly, contextual metadata pertaining to the data may be determined independently of the storage metadata 135. Moreover, the storage layer 130 may be configured to store data in a sequential log, such that a sequence of storage operations performed through the storage layer 130 can be replayed and/or the storage metadata 135 may be reconstructed, based upon the contents of the storage device 120. In some embodiments, the storage layer 130 may maintain a large, thinly provisioned logical address space 136, which may simplify logical allocation operations for the storage clients (e.g., allow the storage clients 116 to operate within large, contiguous LID ranges, with low probability of LID collisions). The storage layer 130 may be further configured to defer the reservation of media storage locations until needed, to prevent premature exhaustion or over-reservation of physical storage resources. - The
storage layer 130 may expose access to the logical address space 136 and/or storage metadata 135 to the storage clients 116 through one or more interfaces 140. As disclosed herein, storage clients 116 may delegate certain functions to the storage layer 130. Storage clients 116 may leverage the virtual storage interface 132 to perform various operations, including, but not limited to: logical address space 136 management, media storage location management (e.g., mappings between LIDs and media storage locations, such as thin provisioning), deferred physical resource reservation, crash recovery, logging, backup (e.g., snapshots), data integrity, transactions, data move operations, cloning, deduplication, and so on. - In some embodiments,
storage clients 116 may leverage the contextual, log format to delegate crash recovery and/or data integrity functionality to thestorage layer 130. For instance, after an invalid shutdown and reconstruction operation, thestorage controller 130 may provide access to the reconstructedstorage metadata 135 tostorage clients 116 through theinterface 138. Thestorage clients 116 may, therefore, delegate crash-recovery and/or data integrity to thestorage layer 130. Filesystem storage clients 116 may require crash-recovery and/or data integrity services for certain data, such as I-node tables, file allocation tables, and so on. Thestorage client 116 may have to implement these services itself, which may impose significant overhead and/or complexity. Thestorage client 116 may be relieved from this overhead by delegating crash recovery and/or data integrity to thestorage layer 130, as disclosed herein. - In some embodiments,
storage clients 116 may also delegate logical allocation operations and/or physical storage reservations to thestorage layer 130. Astorage client 116, such as a file system, may maintain its own metadata to track logical and physical allocations for files; thestorage client 116 may maintain a set of logical addresses that “mirrors” the media storage locations of thenon-volatile storage device 120. If theunderlying storage device 120 provides a one-to-one mapping between logical block address and media storage locations, as with conventional storage devices, the block storage layer performs appropriate LBA-to-media address translations and implements the requested storage operations. If, however, the underlying non-volatile storage device does not support one-to-one mappings (e.g., the underlying storage device is a sequential, or write-out-of-place device, such as a solid-state storage device), another redundant set of translations are needed (e.g., a Flash Translation Layer, or other mapping). The redundant set of translations and the requirement that thestorage client 116 maintain logical address allocations may represent a significant overhead, and may make allocating contiguous LBA ranges difficult or impossible without time-consuming “defragmentation” operations. Thestorage client 116 may delegate such allocation functionality to thestorage layer 130. Thestorage layer 130 may leverage a thinly provisionedlogical address space 136 to manage large, contiguous LID ranges for thestorage client 116, without the need for redundant address translation layers. -
FIG. 17 depicts one exemplary embodiment of an index 1804 for maintaining allocations within a logical address space, such as the logical address space 136, described above. The index 1804 may be embodied as a datastructure on a volatile memory 112 and/or non-transitory, machine-readable storage media 114 (e.g., part of the storage metadata 135). The index 1804 may comprise an entry for each allocated range of LIDs. The allocated LIDs may or may not be associated with media storage locations on the non-volatile storage device (e.g., non-volatile storage device 120). The entries may be indexed and/or linked by LID. As discussed above, in some embodiments, the storage metadata (e.g., metadata 135) may comprise a separate index to track unallocated LIDs in the logical address space 136. - The entries in the
index 1804 may include LIDs that are allocated, but that are not associated with media storage locations on a non-volatile storage device. Like theindex 1204 described above, inclusion in theindex 1804 may indicate that a LID is both allocated and associated with valid data on thenon-volatile storage device 120. Alternatively, theindex 1804 may be implemented similarly to theindex 1304 ofFIG. 13 . In this case, theindex 1804 may comprise entries that are associated with valid data on thenon-volatile storage device 120 along with entries that are allocated but are not associated with stored data. The entries that are associated with valid data may identify the media storage location of the data, as described above. Entries that are not associated with valid, stored data (e.g., “allocation entries” such as theentry 1314 ofFIG. 13 ) may have a “NULL” media storage location indicator or some other suitable indicator. - In some embodiments, the
index 1804 may comprise security-related metadata, such as access control metadata, or the like. The security-related metadata may be associated with each respective entry (e.g., entry 1812) in the index 1804. When storage requests pertaining to a particular LID are received by the storage layer 130, the storage layer 130 may access and/or enforce the security-related metadata (if any) in the corresponding entry. In some embodiments, the storage layer 130 delegates enforcement of security-related policy to another device or service, such as an operating system, access control system, or the like. Accordingly, when implementing storage operations, the storage layer 130 may access security-related metadata and use a delegate to verify that the requester is authorized to perform the operation. If the delegate indicates that the requester is authorized, the storage layer 130 implements the requested storage operations; if not, the storage layer 130 returns a failure condition. - The
storage layer 130 may access thestorage metadata 135, such as theindex 1804, to allocate LIDs in thelogical address space 136, to determine a remaining logical capacity of thelogical address space 136, to determine the remaining physical storage capacity of the non-volatile storage device(s) 120, and so on. Thestorage layer 130 may respond to queries for the remaining logical capacity, remaining physical storage capacity, and the like via thevirtual storage interface 132. Similarly, thestorage layer 130 may service requests to reserve physical storage capacity on thenon-volatile storage device 120. As described above, astorage client 116 may wish to perform a sequence of storage operations that occur over time (e.g., receive a data stream, perform a DMA transfer, or the like). Thestorage client 116 may reserve sufficient logical and/or physical storage capacity to perform the sequence of storage operations up-front to ensure that the operations can be completed. Reserving logical capacity may comprise allocating LIDs through the storage layer 130 (using the virtual storage interface 132). Physical capacity may be similarly allocated. Thestorage client 116 may request to reserve physical capacity through thevirtual storage interface 132. If a sufficient amount of physical capacity is available, thestorage layer 130 acknowledges the request and updates the storage metadata accordingly (and as described above in conjunction withFIGS. 8 and 12 ). - The
storage layer 130 and/orstorage metadata 135 is not limited to the particular, exemplary datastructures described above. Thestorage metadata 135 may comprise any suitable datastructure (or datastructure combination) for efficiently trackinglogical address space 136 allocations and/or associations between LIDs and media storage locations. For example, theindex 1804 may be adapted such that entries in theindex 1804 comprise and/or are linked to respective physical binding metadata. The physical binding metadata may comprise a “sub-index” of associations between LIDs in a particular allocated range and corresponding media storage locations on the non-volatile storage medium. Each “sub-range” within the allocated LID comprises an entry associating the sub-range with a corresponding media storage location (if any). -
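A "sub-index" of this kind can be sketched as a sorted list of sub-ranges searched with `bisect`. The function `media_location` is hypothetical. The LID sub-ranges and the media locations 12763 through 12842 are taken from the FIG. 18 example that follows; the media address 5000 is a made-up placeholder, and the sketch assumes that media locations within a bound sub-range are contiguous.

```python
from bisect import bisect_right

# Each tuple: (first_lid, last_lid, first_media_location or None if unbound).
sub_index = [
    (31744, 31816, 5000),   # 5000 is a placeholder; no address is given for this sub-range
    (31817, 46000, None),   # allocated, but not bound to stored data
    (46001, 46080, 12763),  # bound to media locations 12763 through 12842
]
starts = [entry[0] for entry in sub_index]


def media_location(lid):
    """Translate a LID inside the allocated range to a media location.

    Returns None for an allocated-but-unbound LID and raises KeyError
    for a LID that falls outside the allocated range entirely.
    """
    i = bisect_right(starts, lid) - 1
    if i < 0:
        raise KeyError(lid)
    first, last, base = sub_index[i]
    if lid > last:
        raise KeyError(lid)
    return None if base is None else base + (lid - first)
```

For example, `media_location(46080)` resolves to the last location of the bound sub-range, while any LID in the middle sub-range resolves to `None`, mirroring the "NULL" media storage location in the text.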
FIG. 18 depicts one embodiment of an index entry comprising physical binding metadata. Theentry 1818 represents an allocated LID having a range from 31744 through 46080 in the logical address space. The entries of the physical binding metadata associate sub-ranges of the LID with corresponding media storage locations (if any). The physicalbinding metadata 1819 may be indexed by LID as described above. In theFIG. 18 example, the LID sub-range comprising 31817 to 46000 ofentry 1822 is not associated with valid data on the non-volatile storage device and, as such, is associated with a “NULL” media storage location. Theentry 1824 for the sub-range 46001 to 46080 is associated with valid data. Theentry 1824 identifies the media storage location of the data on the non-volatile storage device (locations 12763 through 12842). Theentry 1826 identifies the media storage location of the valid data associated with the sub-range for 31744-31816. - In some embodiments, the
storage layer 130 is configured to segment the LIDs in thelogical address space 136 into two or more portions. As shown inFIG. 19A , aLID 1900 is segmented into afirst portion 1952 and asecond portion 1954. In some embodiments, thefirst portion 1952 comprises “high-order” bits of theLID 1900, and the second portion comprises “low-order” bits. However, the disclosure is not limited in this regard and could segment LIDs using any suitable segmentation scheme. - The
first portion 1952 may serve as a reference or identifier for a storage entity. As used herein, a storage entity refers to any data or data structure that is capable of being persisted to thenon-volatile storage device 120; accordingly, a storage entity may include, but is not limited to: file system objects (e.g., files, streams, I-nodes, etc.), a database primitive (e.g., database table, extent, or the like), streams, persistent memory space, memory mapped files, virtual storage unit (VSU), logical unit number (LUN), virtual logical unit number (VLUN), logical storage unit (LSU), block storage device, or the like. - The
second portion 1954 may represent an offset into the storage entity. For example, the storage layer 130 may reference the logical address space 136 comprising 64-bit LIDs (the logical address space 136 may comprise 2^64 unique LIDs). The storage layer 130 may partition the LIDs into a first portion 1952 comprising the high-order 32 bits of the 64-bit LID and a second portion 1954 comprising the low-order 32 bits of the LID. The resulting logical address space 136 may be capable of representing 2^32−1 unique storage entities (e.g., using the first portion of the LIDs), each having a maximum size (or offset) of 2^32 virtual storage locations (e.g., 2 TB for a virtual storage location size of 512 bytes). The disclosure is not limited in this regard, however, and could be adapted to use any suitable segmentation scheme. For example, in implementations that require a large number of small storage entities (e.g., database applications, messaging applications, or the like), the first portion 1952 may comprise a larger proportion of the LID. For instance, the first portion 1952 may comprise 42 bits (providing 2^42−1 unique identifiers), and the second portion may comprise 22 bits (providing a maximum offset of 4 GB). Alternatively, where larger files are required, the segmentation scheme may be similarly modified. Furthermore, the storage layer 130 may present larger logical address spaces (e.g., 128 bits and so on) in accordance with the requirements of the storage clients 116, configuration of the computing device 110, and/or configuration of the non-volatile storage device 120. In some embodiments, the storage layer 130 segments the logical address space 136 in response to a request from a storage client 116 or other entity. - The
storage layer 130 may allocate LIDs based on the first portion 1952. For example, in a 64-bit address space, when the storage layer 130 allocates a LID comprising a first portion 1952 [0000 0000 0000 0000 0000 0000 0000 0100] (e.g., first portion 1952 logical address 4), the storage layer 130 is effectively allocating a logical address range comprising 2^32 unique LIDs 1956 (4,294,967,296 unique LIDs) ranging from: - [0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000]
- to,
- [0000 0000 0000 0000 0000 0000 0000 0100 1111 1111 1111 1111 1111 1111 1111 1111]
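The range above can be reproduced with a few bit operations. The following is a minimal sketch, assuming a 64-bit LID with a 32-bit second portion; the function names split_lid, make_lid, and lid_range are hypothetical and are not part of the disclosure.

```python
def split_lid(lid, offset_bits=32):
    """Split a LID into (first portion, second portion)."""
    return lid >> offset_bits, lid & ((1 << offset_bits) - 1)


def make_lid(first, offset, offset_bits=32):
    """Combine an entity identifier and an offset into a full LID."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the second portion")
    return (first << offset_bits) | offset


def lid_range(first, offset_bits=32):
    """Return the inclusive (lowest, highest) LID covered by a first portion."""
    lowest = first << offset_bits
    return lowest, lowest | ((1 << offset_bits) - 1)


# First portion 4 with a 32-bit second portion, as in the example above.
lowest, highest = lid_range(4)
print(f"{lowest:064b}")
print(f"{highest:064b}")
print(highest - lowest + 1)  # 4294967296
```

Changing offset_bits models a different segmentation scheme; a smaller value leaves more bits for identifiers and fewer for offsets.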
- In some embodiments, the
storage layer 130 uses the segmentation of the LIDs to simplify the storage metadata 135. In one example, the number of bits in the first portion 1952 is X, and the number of bits in the second portion 1954 is Y. The storage layer 130 may determine that the maximum number of unique LIDs that can be allocated is 2^X, and that the allocated LIDs can be referenced using only the first portion of the LID (e.g., the set of X bits). Therefore, the storage layer 130 may simplify the storage metadata index to use entries comprising only the first portion of a LID. Moreover, the storage layer 130 may determine that the LIDs are allocated in fixed-sized ranges of 2^Y. Accordingly, each entry in the storage metadata 135 (e.g., index 1904) may be of the same extent. Therefore, the range portion of the metadata entries may be omitted. -
FIG. 19B depicts one example of an allocation index 1904 that has been simplified by segmenting the logical address space 136. For clarity, the first portion 1952 of the LIDs in the logical address space 136 managed by the index 1904 is depicted using eight (8) bits. The remaining portion of the LID (e.g., remaining 56 bits) may be used as the second portion 1954. Alternatively, other portions of the LID may be used for other logical address space 136 segmentation schemes, such as logical volume identifiers, partition identifiers, and so on. - Each
entry 1912 in the index 1904 may be uniquely identified using the first portion (eight bits) of a LID. Accordingly, the entries 1912 may be indexed using only the first portion 1952 (e.g., 8 bits). This simplification may reduce the amount of data required to identify an entry 1912 from 64 bits to 8 bits (assuming a 64-bit LID with an 8-bit first portion). Moreover, the LIDs may be allocated in fixed-size logical ranges (e.g., in accordance with the second portion 1954). Therefore, each entry 1912 may represent the same range of allocated LIDs. As such, the entries 1912 may omit explicit range identifiers, which may save an additional 64 bits per entry 1912. - The
storage layer 130 may use the simplified index 1904 to maintain LID allocations in the logical address space 136 and/or identify LIDs to allocate in response to requests from storage clients 116. In some embodiments, the storage layer 130 maintains a listing of “first portions” that are unallocated. Since, in some embodiments, allocations occur in a pre-determined way (e.g., using only the first portion 1952, and within a fixed range 1956), the unallocated LIDs may be expressed in a simple list or map as opposed to an index or other data structure. As LIDs are allocated, they are removed from the data structure and are replaced when they are deallocated. - Associations between portions of the entry and valid data on the non-volatile storage device may be maintained in the index 1904 (using physical binding metadata as described above).
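The listing of unallocated first portions described above can be illustrated as follows. This is a minimal sketch, assuming an 8-bit first portion and an in-memory set; the class SegmentedAllocator and its methods are hypothetical names introduced only for illustration.

```python
class SegmentedAllocator:
    """Tracks allocations by first portion only (illustrative sketch)."""

    def __init__(self, id_bits=8):
        self.size = 1 << id_bits
        # Every first portion starts out unallocated.
        self.free = set(range(self.size))

    def allocate(self, first=None):
        """Allocate a requested first portion, or any available one if None."""
        if first is None:
            if not self.free:
                raise RuntimeError("logical address space exhausted")
            first = min(self.free)
        elif first not in self.free:
            raise ValueError("first portion is allocated or out of range")
        self.free.remove(first)
        return first

    def deallocate(self, first):
        """Return a first portion to the unallocated listing."""
        if not 0 <= first < self.size:
            raise ValueError("first portion is out of range")
        self.free.add(first)

    def is_allocated(self, first):
        return 0 <= first < self.size and first not in self.free


allocator = SegmentedAllocator(id_bits=8)
any_id = allocator.allocate()        # request for any available LID
named_id = allocator.allocate(0x7A)  # request for a particular first portion
```

Because every allocation covers the same fixed range, the structure stores no range information at all; membership in the set is the only state.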
FIG. 19C depicts an example of physical binding metadata for use in a segmented logical addressing scheme. For clarity, in the FIG. 19C example, LIDs are segmented such that the first portion 1952 comprises 56 bits, and the second portion 1954 comprises 8 bits (the reverse of FIG. 19B). The entry 1914 is identified using the first portion 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0111 1010. The entries 1922 of the index 1919 may be simplified to reference only offsets within the entry 1914 (e.g., within the second portion, which comprises 8 bits in the FIG. 19C example). Moreover, the head entry 1926 may omit the top-end of the second portion (e.g., may omit 1111 1111 since it can be determined that the top-most entry will necessarily include the maximal extent of the range defined by the second portion). Similarly, the tail entry 1924 may omit the bottom-end of the second portion 1954 (e.g., may omit 0000 0000 since it can be determined that the bottom-most entry will necessarily include the beginning of the range defined by the second portion 1954). Each entry 1914 associates a range within the second portion with valid data on the non-volatile storage device (if any), as described above. - As described above,
storage clients 116 may delegate LID allocation to the storage layer 130 using the virtual storage interface 132. The delegation may occur in a number of different ways. For example, a storage client 116 may query the storage layer 130 (via the storage layer 130 interface 138) for any available LID. If a LID is available, the storage layer 130 returns an allocated LID to the storage client 116. Alternatively, the storage client 116 may request a particular LID for allocation. The request may comprise the first portion of the LID or an entire LID (with an offset). The storage layer 130 may determine if the LID is unallocated and, if so, may allocate the LID for the client and return an acknowledgement. If the LID is allocated (or the LID falls within an allocated range), the storage layer 130 may allocate an alternative LID and/or may return an error condition. The storage layer 130 may indicate whether particular LIDs are allocated and/or whether particular LIDs are bound to media storage locations on the non-volatile storage device 120. The queries may be serviced via the virtual storage interface 132. - In embodiments in which the
storage layer 130 implements segmented LIDs, the storage layer 130 may expose the segmentation scheme to the storage clients 116. For example, storage clients 116 may query the storage layer 130 to determine the segmentation scheme currently in use. The storage clients 116 may also configure the storage layer 130 to use a particular LID segmentation scheme adapted to the needs of the storage client 116. - The
storage layer 130 may allocate LIDs using only the first portion 1952 of a LID. If the LID is unallocated, the storage layer 130 acknowledges the request, and the storage client 116 is allocated a range of LIDs in the logical address space 136 corresponding to the first portion 1952 and comprising the range defined by the second portion 1954. Similarly, when allocating a “nameless LID” (e.g., any available LID selected by the storage layer 130), the storage layer 130 may return only the first portion of the allocated LID. In some embodiments, when a client requests a LID using the first portion and the second portion, the storage layer 130 extracts the first portion from the requested LID, and allocates a LID corresponding to the first portion to the client (if possible). Advantageously, the disclosed embodiments support such a large number of addresses for the second portion over such a high number of contiguous addresses that storage requests that cross a LID boundary are anticipated to be very rare. In certain embodiments, the storage layer 130 may even prevent allocations that cross LID boundaries (as used herein, a LID boundary is between two contiguous LIDs, the first being the last addressable LID in a second portion of a LID and the second being the first addressable LID in a next successive first portion of a LID). If the request crosses a boundary between pre-determined LID ranges, the storage layer 130 may return an alternative LID range that is properly aligned to the LID segmentation scheme, return an error, or the like. In other embodiments, if the request crosses a boundary between pre-determined LID ranges, the storage layer 130 may allocate both LIDs (if available). - As described above, the
storage layer 130 may be leveraged by the storage clients 116 for logical allocations, physical storage bindings, physical storage reservations, crash-recovery, data integrity, and the like. FIG. 20A is a block diagram depicting a file system storage client 2016 leveraging the storage layer 130 to perform file system operations. - The file
system storage client 2016 accesses the storage layer 130 via the virtual storage interface 132 to allocate LIDs for storage entities, such as file system objects (e.g., files). In some embodiments, when a new file is created, the file system storage client 2016 queries the storage layer 130 for a LID. The allocation request may be implemented as described above. If the requested LIDs can be allocated, the storage layer 130 returns an allocated LID to the file system storage client 2016. The LID may be returned as a LID and an offset (indicating an initial size for the file), a LID range, a first portion of a LID, or the like. The FIG. 20A example shows the storage layer 130 implementing a segmented LID range and, as such, the storage layer 130 may return the first portion of a LID 2062 in response to an allocation request. - In some embodiments, the file
system storage client 2016 may implement a fast and efficient mapping between LIDs and storage entities. For example, when the first portion of the LID is sufficiently large, the file system storage client 2016 may hash file names into LID identifiers (into hash codes of the same length as the first portion of the LID 2062). When a new file is created, the file system storage client 2016 hashes the file name to generate the first portion of the LID 2062 and issues a request to the storage layer 130 to allocate the LID. If the LID is unallocated (e.g., no hash collisions have occurred), the storage layer 130 may grant the request. The file system storage client 2016 may not need to maintain an entry in the file system table 2060 for the new file (or may only be required to maintain an abbreviated version of a table entry 2061), since the LID 2062 can be derived from the file name. If a name collision occurs, the storage layer 130 may return an alternative LID, which may be derived from the hash code (or file name), which may obviate the need for the file system table 2060 to maintain the entire identifier. - The file
system storage client 2016 may maintain a file system table 2060 to associate file system objects (e.g., files) with corresponding LIDs in the logical address space 136 of the storage layer 130. In some embodiments, the file system table 2060 is persisted on the non-volatile storage device 120 at a pre-determined LID. Accordingly, the file system storage client 2016 may delegate crash recovery and/or data integrity for the file system table 2060 (as well as the file system objects themselves) to the storage layer 130. - The file
system storage client 2016 may reference files using the file system table 2060. To perform storage operations on a particular file, the file system storage client 2016 may access a file system entry 2061 corresponding to the file (e.g., using a file name lookup or another identifier, such as an I-node, or the like). The entry 2061 comprises a LID of the file, which, in the FIG. 20C example, is a first portion of a LID 2062. The file system storage client 2016 performs storage operations using the first portion 2062 of the LID along with an offset (the second portion 2064). The file system storage client 2016 may combine the file identifier (first portion 2062) with an offset 2064 to generate a full LID 2070. The LID 2070 may be sent to the storage layer 130 in connection with requests to perform storage operations within the logical address space 136. - The
storage layer 130 performs storage operations using the storage metadata 135. Storage requests to persist data in the logical address space 136 comprise the storage layer 130 causing the data to be stored on the non-volatile storage device 120 in a contextual, log-based format, as disclosed above. The storage layer 130 updates the storage metadata 135 to associate LIDs in the logical address space 136 with media storage locations on the non-volatile storage comprising data stored in the storage operation. - Storage operations to access persisted data on the non-volatile storage device may comprise the storage client, such as the file
system storage client 2016, requesting the data associated with one or more LIDs 2070 in the logical address space. The file system storage client 2016 may identify the LIDs using the file system table 2060 or another data structure. In response to the request, the storage layer 130 determines the media storage location of the LIDs 2070 on the non-volatile storage device 120 using the storage metadata 135, which is used to access the data. - In some embodiments, storage clients, such as the file
system storage client 2016, may deallocate a storage entity. Deallocating a storage entity may comprise issuing a deallocation request to the storage layer 130 via the virtual storage interface 132. In response to a deallocation request, the storage layer 130 removes the deallocated LIDs from the storage metadata 135 and/or may mark the deallocated LIDs as unallocated. The storage layer 130 may also invalidate the media storage locations corresponding to the deallocated LIDs in the storage metadata 135 and/or the non-volatile storage device 120 (e.g., using a reverse map, as disclosed above). A deallocation may be a “hint” to a groomer 370 of the non-volatile storage device 120 that the media storage locations associated with the deallocated LIDs are available for recovery. - The
groomer 370, however, may not actually remove the data for some time after the deallocation request is issued. Accordingly, in some embodiments, the virtual storage interface 132 may provide an interface through which storage clients may issue a deallocation “directive” (as opposed to a hint). The deallocation directive may configure the storage layer 130 to return a pre-determined value (e.g., “0” or “NULL”) for subsequent accesses to the deallocated LIDs (or the media storage locations associated therewith), even if the data is still available on the non-volatile storage device 120. The pre-determined value may continue to be returned until the LIDs are reallocated for another purpose. - In some embodiments, the
storage layer 130 implements a deallocation directive by removing the deallocated LIDs from the storage metadata and returning a pre-determined value in response to requests for LIDs that are not allocated in the storage metadata 135 and/or are not bound (e.g., are not associated with valid data on the non-volatile storage device). Alternatively, or in addition, in response to a deallocation directive, the storage layer 130 may cause the corresponding media storage locations on the non-volatile storage device 120 to be erased. The storage layer 130 may provide the file system storage client 2016 with an acknowledgement when the erasure is complete. Since erasures may take a significant amount of time to complete relative to other storage operations, the acknowledgement may be issued asynchronously. -
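The effect of a deallocation directive on subsequent reads can be sketched as follows. This is a minimal illustration, assuming an in-memory forward map, an append-only dictionary standing in for the media, and a 512-byte block of zeros as the pre-determined value; the class TinyStore and the constant BLOCK_SIZE are hypothetical.

```python
BLOCK_SIZE = 512  # assumed virtual storage location size


class TinyStore:
    """Append-only store with a LID-to-location forward map (sketch)."""

    def __init__(self):
        self.media = {}    # media location -> data; lingers until groomed
        self.forward = {}  # LID -> media location
        self.next_location = 0

    def write(self, lid, data):
        # Data is written out-of-place: every write uses a new location.
        location = self.next_location
        self.next_location += 1
        self.media[location] = data
        self.forward[lid] = location

    def deallocate(self, lid):
        # Directive semantics: the binding is dropped immediately, even
        # though the data may remain on the media for some time.
        self.forward.pop(lid, None)

    def read(self, lid):
        location = self.forward.get(lid)
        if location is None:
            return bytes(BLOCK_SIZE)  # pre-determined value
        return self.media[location]


store = TinyStore()
store.write(7, b"payload")
store.deallocate(7)
```

After the directive, store.read(7) yields zeros although b"payload" is still present in store.media, which mirrors data that has not yet been groomed.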
FIG. 20B is a block diagram depicting another embodiment 2001 of a storage client leveraging the storage layer 130. In the FIG. 20B example, the storage layer 130 presents a logical address space 136 to the file system storage client 2016 and maintains storage metadata 135 as described above. In addition, the storage layer 130 maintains name-to-LID association metadata 2036. This metadata 2036 may comprise associations between LIDs in the logical address space 136 and storage entity identifiers of storage clients 116. For example, a file system storage client 2016 may request LID allocations using a storage entity identifier or name 2071 (e.g., file name) as opposed to a LID. An operation in which the file system storage client 2016 relies on the storage layer 130 to select an available LID (as opposed to specifying a particular LID) is referred to as a “nameless write” or “nameless allocation.” In response, the storage layer 130 allocates a LID for the file system storage client 2016 within the logical address space 136. In addition, the storage layer 130 may maintain an association between the allocated LID and the name 2071 in name-to-LID metadata 2036. File system storage clients 2016 may request subsequent storage operations on the storage entity using the name 2071 (along with an offset, if needed). The file system table 2060 of the file system storage client 2016 may be simplified since entries 2063 need only maintain the name of a file as opposed to the name and LID. In response to storage requests comprising a name 2071, the storage layer 130 accesses the name-to-LID metadata 2036 to determine the LID associated with the name 2071 and implements the storage request as described above. - In some embodiments, the name-to-
LID metadata 2036 may be included with the storage metadata 135. For example, entries in the index 1804 of FIG. 18 may be indexed by name in addition to (or in place of) a LID. The storage layer 130 may persist the name-to-LID metadata 2036 on the non-volatile storage device 120, such that the integrity of the metadata 2036 is maintained despite invalid shutdown conditions. Alternatively, or in addition, the name-to-LID metadata 2036 may be reconstructed using the contextual, log-based data format on the non-volatile storage device 120. -
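A name-based allocation with hash-derived first portions may be sketched as follows. This is an illustration only: the use of SHA-256, the linear probing on collision, and the names hash_name and NameToLid are assumptions that do not appear in the disclosure, which leaves the hash function and the choice of alternative LID open.

```python
import hashlib


def hash_name(name, id_bits=32):
    """Hash a name to an id_bits-wide candidate first portion (id_bits <= 64)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - id_bits)


class NameToLid:
    """Name-to-LID association metadata kept as a plain dictionary (sketch)."""

    def __init__(self, id_bits=32):
        self.id_bits = id_bits
        self.by_name = {}

    def allocate(self, name):
        if name in self.by_name:
            return self.by_name[name]
        size = 1 << self.id_bits
        used = set(self.by_name.values())
        if len(used) >= size:
            raise RuntimeError("no unallocated first portion remains")
        candidate = hash_name(name, self.id_bits)
        while candidate in used:  # collision: probe for an alternative LID
            candidate = (candidate + 1) % size
        self.by_name[name] = candidate
        return candidate

    def lookup(self, name):
        """Resolve a name to its allocated first portion."""
        return self.by_name[name]


names = NameToLid(id_bits=32)
first_portion = names.allocate("report.txt")
```

A client holding only the name can then issue requests by name plus offset, with the lookup performed on its behalf.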
FIG. 21 is a flow diagram of one embodiment of a method 2100 for providing a storage layer. At step 2120, the method 2100 presents a logical address space 136 for the non-volatile device to storage clients. The logical address space 136 may be defined independently of the non-volatile storage device. Accordingly, the logical capacity of the logical address space 136 (e.g., the size of the logical address space 136 and/or the size of the virtual storage blocks thereof) may exceed the physical storage capacity of the non-volatile storage device. In some embodiments, the logical address space 136 is presented via an application-programming interface (API) that is accessible to storage clients, such as operating systems, file systems, database applications, and the like. - At
step 2130, storage metadata is maintained. The storage metadata may track allocations of LIDs within the logical address space 136, as well as bindings between LIDs and media storage locations of the non-volatile storage device. The metadata may further comprise indications of the remaining logical capacity of the logical address space 136, the remaining physical storage capacity of the non-volatile storage device, the status of particular LIDs, and so on. - In some embodiments, the metadata is maintained in response to storage operations performed within the logical address space. The storage metadata is updated to reflect allocations of LIDs by storage clients. When storage clients persist data to allocated LIDs, bindings between the LIDs and the media storage locations comprising the data are updated.
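The bookkeeping described for step 2130 can be illustrated with a small in-memory structure. This is a minimal sketch, assuming capacities counted in storage locations and no reclamation of obsolete locations; the class StorageMetadata and its method names are hypothetical.

```python
class StorageMetadata:
    """Tracks LID allocations, bindings, and remaining capacity (sketch)."""

    def __init__(self, logical_capacity, physical_capacity):
        self.logical_capacity = logical_capacity
        self.physical_capacity = physical_capacity
        self.allocated = set()  # allocated LIDs
        self.bindings = {}      # LID -> media storage location
        self._next_location = 0

    def allocate(self, first_lid, count):
        """Allocate a contiguous LID range without binding any storage."""
        if first_lid < 0 or first_lid + count > self.logical_capacity:
            raise ValueError("range outside the logical address space")
        lids = range(first_lid, first_lid + count)
        if any(lid in self.allocated for lid in lids):
            raise ValueError("range overlaps an existing allocation")
        self.allocated.update(lids)

    def record_write(self, lid):
        """Bind (or rebind) a LID when data is persisted to it."""
        if lid not in self.allocated:
            raise KeyError("LID is not allocated")
        if lid not in self.bindings and self.remaining_physical() == 0:
            raise RuntimeError("physical storage capacity exhausted")
        self.bindings[lid] = self._next_location
        self._next_location += 1
        return self.bindings[lid]

    def status(self, lid):
        if lid in self.bindings:
            return "bound"
        return "allocated" if lid in self.allocated else "unallocated"

    def remaining_logical(self):
        return self.logical_capacity - len(self.allocated)

    def remaining_physical(self):
        return self.physical_capacity - len(self.bindings)


# Logical capacity (1000) deliberately exceeds physical capacity (10).
meta = StorageMetadata(logical_capacity=1000, physical_capacity=10)
meta.allocate(100, 500)
meta.record_write(100)
```

Allocating 500 LIDs consumes logical capacity only; physical capacity is consumed when data is actually written to a LID.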
- At step 2140, storage operations are performed using a log-based sequence. As described above, the storage layer 130 (and non-volatile storage device) may be configured to store data in a log-based format, such that an ordered sequence of storage operations performed on the storage device can be reconstructed in the event of an invalid shutdown (or other loss of storage metadata 135). The ordered sequence of storage operations allows storage clients to delegate crash recovery, data integrity, and other functionality to the
storage layer 130. -
FIG. 22 is a flow diagram of one embodiment of a method 2200 for segmenting LIDs of a logical address space. - At
step 2220, the method 2200 segments LIDs of a logical address space 136 into at least a first portion and a second portion. The segmentation of step 2220 may be performed as part of a configuration process of the storage layer 130 and/or non-volatile storage device (e.g., when the device is initialized). Alternatively, or in addition, the segmentation of step 2220 may be performed in response to a request from a storage client. The storage client may request a particular type of LID segmentation, according to the storage requirements thereof. For example, if the storage client has a need to store a large number of relatively small storage entities, the storage client may configure the LID segmentation to dedicate a larger proportion of the LID to identification bits and a smaller proportion to offset bits. Alternatively, a storage client that requires a relatively small number of very large storage entities may configure the method 2200 to implement a different type of segmentation that uses a larger proportion of the LID for offset bits (allowing for larger storage entities). - At step 2230, the
storage layer 130 uses the first portion of the LID to reference storage client allocations (e.g., as a reference for storage entities). Step 2230 may comprise reconfiguring the storage metadata to allocate LIDs using only the first portion of the LID (e.g., the upper X bits of a LID). The size of the first portion may determine the number of unique storage entities that can be expressed in the storage metadata (e.g., as 2^X−1, where X is the number of bits in the first portion). Accordingly, a first portion comprising 32 bits may support approximately 2^32 unique storage entities. The reconfiguration may simplify the storage metadata, since each entry may be identified using a smaller amount of data (only the first portion of the LID as opposed to the entire LID). - At step 2240, the
storage layer 130 uses the second portion of the LID as an offset into a storage entity. The size of the second portion may define the maximum size of a storage entity (under the current segmentation scheme). The size of a LID may be defined as the virtual block size times 2^Y, where Y is the number of bits in the second portion. As discussed above, a virtual block size of 512 bytes and a second portion comprising 32 bits result in a maximum storage entity size of 2 TB. Step 2240 may comprise reconfiguring the storage metadata to reference LID to media storage location bindings using only the second portion of the LID. This may allow the storage metadata entries (e.g., entries in physical binding metadata) to be simplified, since the bindings can be expressed using a smaller number of bits. - At
step 2250, the storage layer 130 uses the LID segmentation of step 2220 to allocate LIDs comprising contiguous logical address ranges in the logical address space. Step 2250 may comprise the storage layer 130 allocating LIDs using only the first portion of the LID (e.g., the upper X bits). The allocated LID may comprise a contiguous logical address range corresponding to the number of bits in the second portion, as described above. - In some embodiments, allocating a LID at
step 2250 does not cause corresponding logical storage locations to be reserved or “bound” thereto. The bindings between allocated LIDs and media storage locations may not occur until the storage client actually performs storage operations on the LIDs (e.g., stores data in the LIDs). The delayed binding prevents the large, contiguous LID allocations from exhausting the physical storage capacity of the non-volatile storage device. -
FIG. 23 is a flow diagram of one embodiment of a method 2300 for providing crash recovery and data integrity in a storage layer 130. At step 2320, the storage layer 130 presents a logical address space 136 to one or more storage clients 116 (e.g., through the interface 138). Step 2330 may comprise maintaining metadata 135 configured to associate LIDs in the logical address space 136 with media storage locations on the non-volatile storage device 120. - At step 2340, the
storage layer 130 causes data to be stored on the non-volatile storage device in a contextual, log-based format. As described above, the contextual, log-based formatting of the data is configured such that, in the event of an invalid shutdown, the data (and metadata pertaining thereto) can be reconstructed. - At
step 2350, the storage layer 130 reconstructs data stored on the non-volatile storage device using the data formatted in the contextual, log-based format. As described above, the log-based format may comprise storing LID identifiers with data on the non-volatile storage device. The LID identifiers may be used to associate the data with LIDs in the logical address space 136 (e.g., reconstruct the storage metadata). Sequence indicators stored with the data on the non-volatile storage device are used to determine the most current version of data associated with the same LID; since data is written out-of-place, updated data may be stored on the non-volatile storage device along with previous, obsolete versions. The sequence indicators allow the storage layer 130 to distinguish older versions from the current version. The reconstruction of step 2350 may comprise reconstructing the storage metadata, determining the most current version of data for a particular LID (e.g., identifying the media storage location that comprises the current version of the data), and so on. - At step 2360, the
storage layer 130 provides access to the reconstructed data to storage clients. Accordingly, the storage clients may delegate crash recovery and/or data integrity functionality to the storage layer 130, which relieves the storage clients from implementing these features themselves. As a result, the storage clients can be simpler and more efficient. -
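The reconstruction of step 2350 can be illustrated by replaying stored packets and keeping, for each LID, the packet with the highest sequence indicator. This is a minimal sketch, assuming each packet records its LID and an integer sequence indicator; the Packet type and the function rebuild_index are hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    location: int  # media storage location of the packet
    lid: int       # LID stored with the data
    sequence: int  # sequence indicator; higher means more recent
    data: bytes


def rebuild_index(packets):
    """Return {LID: location} for the most current version of each LID."""
    latest = {}
    for packet in packets:
        current = latest.get(packet.lid)
        if current is None or packet.sequence > current.sequence:
            latest[packet.lid] = packet
    return {lid: packet.location for lid, packet in latest.items()}


log = [
    Packet(location=0, lid=10, sequence=1, data=b"old"),
    Packet(location=1, lid=11, sequence=2, data=b"other"),
    Packet(location=2, lid=10, sequence=3, data=b"new"),  # supersedes location 0
]
index = rebuild_index(log)
```

The packet at location 0 remains on the media but is treated as obsolete, because a later packet for the same LID carries a higher sequence indicator.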
FIG. 24A is a flow diagram of one embodiment of a method 2400 for servicing queries pertaining to the status of a LID. Step 2420 may comprise receiving a request pertaining to the status of a particular LID in the logical address space 136 presented by the storage layer 130. Alternatively, the request may pertain to the logical address space 136 as a whole (e.g., a query for the remaining logical capacity of the logical address space 136, or the like). Similarly, the query may pertain to the physical storage capacity of the non-volatile storage device, such as a query regarding the physical storage capacity that is bound to LIDs in the logical address space 136 (e.g., currently occupied), available physical storage capacity, and so on. - At step 2430, the
storage layer 130 accesses storage metadata to determine the status of the requested LID, logical capacity, physical storage capacity, or the like. The access may comprise identifying an entry for the LID in a logical-to-physical map, in an allocation index, or the like. If the particular LID falls within an entry in an allocation index and/or logical-to-physical index, the storage layer 130 may determine that the LID is allocated and/or may determine whether the LID is bound to a media storage location. The access may further comprise traversing a metadata index to identify unallocated LIDs, unused media storage locations, and so on. The traversal may further comprise identifying allocated (or unallocated) LIDs to determine current LID allocation (or unallocated LID capacity), to determine bound physical storage capacity, determine remaining physical storage capacity, or the like. At step 2440, the storage layer 130 returns the status determined at step 2430 to the storage client 116. -
FIG. 24B is a flow diagram of one embodiment of a method 2401 for servicing queries pertaining to the status of a media storage location (or range of media storage locations) of a non-volatile storage device. - At
step 2421, the storage layer 130 receives a request pertaining to the status of a particular media storage location on a non-volatile storage device. The media storage location may be associated with a LID in the logical address space 136 presented by the storage layer 130. Alternatively, the query may be “iterative” and may pertain to all media storage locations on the non-volatile storage device (e.g., a query regarding the status of all media storage locations on the device). Similarly, the query may pertain to the physical storage capacity of the non-volatile storage device, such as a query regarding the physical storage capacity that is bound to LIDs in the logical address space 136 (e.g., currently occupied), available physical storage capacity, and so on. - The query of
step 2421 may be useful in various different contexts. For example, in a RAID rebuild operation, a second non-volatile storage device may be configured to mirror the contents of a first non-volatile storage device. The data stored on the first logical storage device may be stored sequentially (e.g., in a contextual, log-based format). As such, the first non-volatile storage device may comprise “invalid” data (e.g., data was deleted, was made obsolete by a subsequent storage operation, etc.). The query of step 2421 may be issued by the second non-volatile storage device to determine which media storage locations on the first non-volatile storage device “exist” (e.g., are valid), and should be mirrored on the second non-volatile storage device. Accordingly, the query of step 2421 may be issued in the form of an iterator, configured to iterate over (e.g., discover) all media storage locations that comprise “valid data,” and the extent of the valid data. - Step 2431 comprises accessing storage metadata, such as the
index 1204 or reverse map 1222 described above in conjunction with FIG. 12, to determine whether the specified media storage location comprises valid data and/or to determine the extent (or range) of valid data in the specified media storage location. At step 2441, the storage layer 130 returns the status determined at step 2431 to the requester. - In some embodiments,
methods storage layer 130 may implement the conditional write if the specified LIDs do not exist (e.g., are not already allocated to another storage client), and the non-volatile storage comprises sufficient physical storage capacity to satisfy the request. Similarly, a conditional read may comprise a storage client requesting data from a particular set of LIDs. The storage layer 130 may implement the conditional read if the specified LIDs exist and are bound to valid data (e.g., are in storage metadata maintained by the storage layer 130, and are bound to media storage locations). In other examples, the storage layer 130 provides for “nameless” reads and writes, in which a storage client presents an identifier, and the storage layer 130 determines the LIDs associated with the identifier, and services the storage request accordingly (e.g., “nameless” writes as described above). In this case, the storage layer 130 offloads management of identifier-to-LID mappings for the storage client. - In some embodiments, the storage metadata maintained by the storage layer may provide for designating certain portions of the
logical address space 136 as being “temporary” or “ephemeral.” As used herein, an ephemeral address range is an address range that is set to be automatically deleted under certain conditions. The conditions may include, but are not limited to: a restart operation, a shutdown event (planned or unplanned), expiration of a pre-determined time, resource exhaustion, etc. - Data may be identified as ephemeral in storage metadata maintained by the
storage layer 130, in metadata persisted to the solid-state storage media, or the like. Referring back to FIG. 12, an entry 1214 in the index 1204 (forward map) may be identified as ephemeral in the metadata 1219 thereof. When the storage layer persists the index 1204 as part of a shutdown, restart, or other operation, entries that include an ephemeral indicator may be omitted, effectively “invalidating” the corresponding data. Alternatively, or in addition, the storage layer 130 may designate a portion of the large logical address space 136 as comprising ephemeral data. Any entries in the ephemeral address range may be designated as ephemeral in the index without additional modifications to entry metadata. - In some embodiments, an ephemeral indicator may be included in a media storage location on the non-volatile storage media.
FIG. 25A depicts one example of a contextual data format (e.g., packet format) 2500, which may be used to store a data segment 2520 on a non-volatile storage media. As described above, in some embodiments, packets 2500 may be subject to further processing before being persisted on a media storage location (e.g., packets may be encoded into ECC codewords by an ECC generator 304 as described above). - The
packet format 2500 may comprise persistent metadata 2564, which may include logical interface metadata 2565, as described above. The packet format 2500 may comprise and/or be associated with a sequence indicator 2518, which may include, but is not limited to, a sequence number, timestamp, or other suitable sequence indicator. The sequence indicator 2518 may be included in the persistent metadata 2564 (e.g., as another field, not shown). Alternatively, or in addition, a sequence indicator 2518 may be stored elsewhere on the non-volatile storage media 122. For example, a sequence indicator 2518 may be stored on a page (or virtual page) basis, on an erase-block basis, or the like. As described above, each logical erase block may be marked with a respective marking, and packets may be stored sequentially therein. Accordingly, the sequential order of packets may be determined by a combination of the logical erase block sequence indicators (e.g., indicators 2518) and the sequence of packets 2500 within each logical erase block. - The
storage layer 130 may be configured to reconstruct the storage metadata (e.g., index, etc.) using the contextual, log-based formatted data stored on the non-volatile storage media 122. Reconstruction may comprise the storage layer 130 (or another process) reading packets 2500 formatted in the contextual, log-based format from media storage locations of the solid-state storage media 122. As each packet 2500 is read, a corresponding entry in the storage metadata (e.g., the indexes described above) may be created. The LID range associated with the entry is derived from the LID 2516 in the header 2512 of the packet. The sequence indicator 2518 associated with the data packet may be used to determine the most up-to-date version of data 2514 for a particular LID. As described above, the storage layer 130 may write data “out-of-place” due to, inter alia, wear leveling, write amplification, and other considerations. Accordingly, data intended to overwrite an existing LID may be written to a different media storage location than the original data. The overwritten data is “invalidated” as described above; this data, however, remains on the solid-state storage media 122 until the erase block comprising the data is groomed (e.g., reclaimed and erased). The sequence indicator 2518 may be used to determine which of two (or more) contextual, log-based packets 2500 corresponding to the same LID comprises the current, valid version of the data. - In some embodiments, and as illustrated in
FIG. 25A, the header 2512 includes an ephemeral indicator 2568. When reconstructing the storage metadata, the ephemeral indicator 2568 may be used to identify data that should be invalidated (e.g., deleted). Invalidating ephemeral data may comprise omitting the LIDs 2514 referenced in the logical interface 2565 of the packet 2500, marking the data segment 2520 as invalid in a reverse-index, and so on. Similarly, if data marked as ephemeral is more “up-to-date” than other data per the sequence indicator 2518, the original, “older” data may be retained and the ephemeral data may be ignored. - The
storage layer 130 may provide an API through which storage clients may designate certain LID ranges (or other identifiers) as being ephemeral. Alternatively, or in addition, the storage layer 130 may implement higher-level interfaces using ephemeral data. For example, a multi-step atomic write (e.g., multi-block atomic write) may be implemented by issuing multiple write requests, each of which designates the data as being ephemeral. When all of the writes are completed, the ephemeral designation may be removed. If a failure occurs during the multi-step atomic write, data that was previously written can be ignored (no “roll-back” is necessary), since the data will be removed the next time the device is restarted. A similar approach may be used to provide support for transactions. As used herein, a “transaction” refers to a plurality of operations that are completed as a group. If any one of the transaction operations is not completed, the other transaction operations are rolled back. As a transaction is implemented, the constituent storage operations may be marked as ephemeral. Successful completion of the transaction comprises removing the ephemeral designation from the storage operations. If the transaction fails, the ephemeral data may be ignored. - In some embodiments, ephemeral data may be associated with a time-out indicator. The time-out indicator may be associated with the operation of a storage reclamation process, such as a groomer. When the groomer evaluates a storage division (e.g., erase block, page, etc.) for reclamation, ephemeral data therein may be treated as invalid data. As such, the ephemeral data may be omitted during reclamation processing (e.g., not considered for storage division selection and/or not stored in another media storage location during reclamation). In some embodiments, ephemeral data may not be treated as invalid until its age exceeds a threshold. The age of ephemeral data may be determined by the
sequence indicator 2518 associated therewith. When the age of ephemeral data exceeds a pre-determined threshold, it may be considered to be part of a failed transaction, and may be invalidated as described above. The threshold may be set on a per-packet basis (e.g., in the header 2512), may be set globally (through an API or setting of the storage layer), or the like. - As described above, removing an ephemeral designation may comprise updating storage metadata (e.g., index 1204) to indicate that a particular entry is no longer to be considered to be ephemeral. In addition, the
storage layer 130 may update the ephemeral indicator stored on the solid-state storage media (e.g., in persistent metadata 2564 of a packet 2500). However, if the solid-state storage media is write-out-of-place, it may not be practical to overwrite (or rewrite) these indicators. Therefore, in some embodiments, the storage layer 130 persists a “note” on the solid-state storage media (e.g., writes a persistent note to a media storage location of the solid-state storage media). As used herein, a persistent note refers to a “metadata note” that is persistently stored on the solid-state storage media. Removing the ephemeral designation may comprise persisting a metadata note indicating the removal to the solid-state storage media. As depicted in FIG. 25B, a persistent note 2501 may comprise a reference 2511 that identifies one or more packets 2500 on a media storage location. The reference 2511 may comprise any suitable identifying information including, but not limited to: a logical interface, a LID, a range, a media storage location identifier, a sequence indicator, or the like. The persistent note 2501 may also include a directive 2513, which, in the FIG. 25B example, may be a directive to remove an ephemeral designation from the identified packets. Additional details regarding persistent notes are disclosed in U.S. patent application Ser. No. 13/330,554, entitled “Apparatus, System, and Method for Persistent Metadata,” filed Dec. 19, 2011, and which is hereby incorporated by reference. - In some embodiments, the
logical address space 136 presented by the storage layer 130 may include an “ephemeral” LID range. As used herein, an ephemeral LID range comprises references to ephemeral data (e.g., LIDs that are to be “auto-deleted” on restart, or another condition). This segmentation may be possible due to the storage layer 130 maintaining a large (e.g., sparse) logical address space 136, as described above. The storage layer 130 maintains ephemeral data in the ephemeral logical address range; as such, each entry therein is considered to be ephemeral. An ephemeral indicator may also be included in contextual, log-based formatted data bound to the LIDs within the ephemeral range. -
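The reconstruction rules described above (newest sequence indicator wins, ephemeral packets are ignored unless a persistent note removed the designation) can be illustrated with a minimal Python sketch. The `Packet` and `Note` classes, their field names, and `rebuild_index` are hypothetical names chosen for illustration and are not part of the disclosure; the note here carries only an implied "remove ephemeral designation" directive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    lid: int                 # logical identifier taken from the packet header
    seq: int                 # sequence indicator (larger means written later)
    media_addr: int          # media storage location holding the packet
    ephemeral: bool = False  # ephemeral indicator in the header


@dataclass(frozen=True)
class Note:
    lid: int  # reference: LID of the packet the directive applies to
    seq: int  # reference: sequence indicator of that packet


def rebuild_index(packets, notes):
    """Return {lid: media_addr} for the newest non-ephemeral packet per LID."""
    # Collect all notes first, since a note may follow the packet in the log.
    committed = {(n.lid, n.seq) for n in notes}
    newest = {}
    for p in packets:
        if p.ephemeral and (p.lid, p.seq) not in committed:
            continue  # still ephemeral: ignore it, so older data stays current
        if p.lid not in newest or p.seq > newest[p.lid].seq:
            newest[p.lid] = p
    return {lid: p.media_addr for lid, p in newest.items()}


log = [
    Packet(lid=10, seq=1, media_addr=100),
    Packet(lid=10, seq=5, media_addr=140, ephemeral=True),  # never committed
    Packet(lid=11, seq=2, media_addr=101),
    Packet(lid=11, seq=6, media_addr=141, ephemeral=True),  # committed by note
]
index = rebuild_index(log, [Note(lid=11, seq=6)])
```

In this sketch LID 10 falls back to its older packet because the newer one is still ephemeral, while LID 11 resolves to the newer packet because a note removed its ephemeral designation.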
FIG. 25C depicts one example of a method for using ephemeral designations to implement a multi-step operation. At step 2520, the method 2503 may start and be initialized as described above. At step 2530, the method receives a request to allocate a range of LIDs in a logical address space. The request may indicate that the LIDs are to be designated as ephemeral. The request may be received from a storage client (e.g., an explicit allocation request). Alternatively, or in addition, the request may be made as part of a higher-level API provided by the storage layer 130, which may include, but is not limited to: a transaction API, a clone API, a move API, a deduplication API, an atomic-write API, or the like. - At
step 2540, the requested LIDs are allocated as described above (unless already allocated by another storage client). Step 2540 may further comprise updating storage metadata to indicate that the LIDs are ephemeral, which may include, but is not limited to: setting an indicator in an entry for the LIDs in the storage metadata (e.g., index), allocating the LIDs in an “ephemeral range” of the index, or the like. - At
step 2550, the storage client may request one or more persistent storage operations on the ephemeral LIDs of step 2540. The storage operations may comprise a multi-block atomic write, operations pertaining to a transaction, a snapshot operation, a clone (described in additional detail below), or the like. Step 2550 may comprise marking contextual, log-based data associated with the persistent storage operations as ephemeral as described above (e.g., in a header of a packet comprising the data). - At
step 2560, if the method receives a request to remove the ephemeral designation, the flow continues to step 2562; otherwise, the flow continues to step 2570. The request of step 2560 may be issued by a storage client and/or the request may be part of a higher-level API as described above. For example, the request may be issued when the constituent operations of a transaction or atomic operation are complete. - At
step 2562, the ephemeral designation applied at steps Step 2562 may comprise removing metadata indicators from storage metadata, “folding” the ephemeral range into a “non-ephemeral range” of the storage metadata index, or the like (folding is described in additional detail below). Step 2562 may further comprise storing one or more persistent notes on the non-volatile storage media that remove the ephemeral designation from data corresponding to the formerly ephemeral data as described above. - At
step 2570, the method 2503 may determine whether the ephemeral data should be removed. If not, the flow continues back to step 2560; otherwise, the flow continues to step 2780. At step 2780, the ephemeral data is removed (or omitted) when the storage metadata is persisted (as part of a shutdown or reboot operation). Alternatively, or in addition, data that is designated as ephemeral on the non-volatile storage media may be ignored during a reconstruction process. - At step 2790, the flow ends until a next request is received, at which point the flow continues at
step 2530. -
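The flow of FIG. 25C may be easier to follow with a small in-memory model. The following Python sketch is illustrative only: `EphemeralIndex`, `atomic_write`, and the `fail_after` parameter are hypothetical names, and a dictionary stands in for the storage metadata index. It shows a multi-block atomic write that allocates LIDs as ephemeral, writes each block, and removes the designation only after every write succeeds.

```python
class EphemeralIndex:
    """Toy index: lid -> {"data": ..., "ephemeral": bool}."""

    def __init__(self):
        self.entries = {}

    def allocate(self, lids, ephemeral=False):
        # Refuse LIDs that are already allocated (e.g., by another client).
        if any(lid in self.entries for lid in lids):
            raise ValueError("LID already allocated")
        for lid in lids:
            self.entries[lid] = {"data": None, "ephemeral": ephemeral}

    def write(self, lid, data):
        self.entries[lid]["data"] = data

    def remove_ephemeral(self, lids):
        for lid in lids:
            self.entries[lid]["ephemeral"] = False

    def persist(self):
        """Snapshot taken at shutdown or restart; ephemeral entries are omitted."""
        return {lid: e["data"] for lid, e in self.entries.items()
                if not e["ephemeral"]}


def atomic_write(index, blocks, fail_after=None):
    """Write all blocks or none; fail_after simulates a mid-operation failure."""
    lids = list(blocks)
    index.allocate(lids, ephemeral=True)
    for n, (lid, data) in enumerate(blocks.items()):
        if fail_after is not None and n == fail_after:
            return False  # failure: partial data stays ephemeral, no roll-back
        index.write(lid, data)
    index.remove_ephemeral(lids)  # all writes done: commit
    return True


good, bad = EphemeralIndex(), EphemeralIndex()
atomic_write(good, {1: b"a", 2: b"b"})
atomic_write(bad, {1: b"a", 2: b"b"}, fail_after=1)
```

After the simulated failure, `bad.persist()` contains none of the partially written blocks, which mirrors the observation that no explicit roll-back is needed.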
FIG. 26 depicts one example of a method for reconstructing storage metadata from data stored on a non-volatile storage medium in a contextual, log-based format. Step 2620 may comprise receiving a request to reconstruct storage metadata from the contents of a non-volatile storage medium or device. The request may be received in response to storage metadata maintained by the storage layer 130 (or another entity) being lost or out-of-sync with the contents of the physical storage media. For example, portions of the storage metadata described herein (e.g., the index 1204 and/or reverse map 1222) may be maintained in volatile memory. In an invalid shutdown, the contents of the volatile memory may be lost before the storage metadata can be stored in non-volatile storage. In another example, a second storage device may be configured to mirror the contents of a first storage device; accordingly, the second storage device may maintain storage metadata describing the contents of the first storage device. The second storage device may lose communication with the first storage device and/or may need to be rebuilt (e.g., initialized). The initialization may comprise reconstructing storage metadata from the contents of the first storage device (e.g., through queries to the first storage device as described above in conjunction with FIG. 24B). - At
step 2630, the method iterates over media storage locations of the storage device. The iteration may comprise accessing a sequence of media storage locations on the non-volatile storage medium, as described above in conjunction with FIG. 23. - At
step 2640, for each media storage location, the method 2600 accesses data formatted in the contextual, log-based format described above. The method 2600 may reconstruct the storage metadata using information determined from the contextual, log-based data format on the non-volatile storage media 122. Using the contextual, log-based data format, the method 2600 may determine the LIDs associated with the data, may determine whether the data is valid (e.g., using persistent notes and/or sequence indicators as described above), and so on. Alternatively, step 2640 may comprise issuing queries to another storage device to iteratively determine which media storage locations comprise valid data. The iterative query approach (described above in conjunction with FIG. 24B) may be used to mirror a storage device. - In addition, at
step 2650, the method 2600 determines whether a particular data packet is designated as being ephemeral. The determination may be based on an ephemeral indicator in a header of the packet. The determination may also comprise determining whether a persistent note that removes the ephemeral designation exists (e.g., a persistent note as described above in conjunction with FIG. 25B). Accordingly, step 2650 may comprise the method 2600 maintaining the metadata for the packet in a temporary (e.g., ephemeral) location, until the iteration of step 2630 completes and the method 2600 can determine whether a persistent note removing the ephemeral designation exists. - If
step 2650 determines that the data is ephemeral, the flow continues to step 2660; otherwise, the flow continues to step 2670. At step 2660, the method 2600 removes the ephemeral data. Removing the data may comprise omitting LIDs associated with the data from storage metadata (e.g., the index 1204 described above), marking the media storage location as “invalid” and available to be reclaimed (e.g., in the reverse map 1222), or the like. - At
step 2670, the method reconstructs the storage metadata as described above. In some embodiments, step 2670 may further comprise determining whether the data is valid (as described above in conjunction with FIG. 24B). If the data is valid, the method 2600 may be configured to perform further processing. For example, if the method 2600 is being used to construct a mirror of another storage device, step 2670 may comprise transferring the valid data to the mirror device. - In some embodiments, the
storage layer 130 may provide an API to order storage operations performed thereon. For example, the storage layer 130 may provide a “barrier” API to determine the order of operations. As used herein, a “barrier” refers to a primitive that enforces an order of storage operations. A barrier may specify that all storage operations that were issued before the barrier are completed before the barrier, and that all operations that were issued after the barrier complete after the barrier. A barrier may mark a “point-in-time” in the sequence of operations implemented on the non-volatile storage device. - In some embodiments, a barrier is persisted to the non-volatile storage media as a persistent note. A barrier may be stored on the non-volatile storage media, and may, therefore, act as a persistent record of the state of the non-volatile storage media at a particular time (e.g., a particular time within the sequence of operations performed on the non-volatile storage media). The
storage layer 130 may issue an acknowledgement when all operations issued previous to the barrier are complete. The acknowledgement may include an identifier that specifies the “time” (e.g., sequence pointer) corresponding to the barrier. In some embodiments, the storage layer 130 may maintain a record of the barrier in the storage metadata maintained thereby. - Barriers may be used to guarantee the ordering of storage operations. For example, a sequence of write requests may be interleaved with barriers. Enforcement of the barriers may be used to guarantee the ordering of the write requests. Similarly, interleaving barriers between write and read requests may be used to remove read-before-write hazards.
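The ordering guarantee described above can be sketched with a single-threaded Python model. `BarrierQueue` and its methods are hypothetical names used only for illustration; a real storage layer would complete requests asynchronously, and the `"BARRIER"` log record merely stands in for a persistent note.

```python
from collections import deque


class BarrierQueue:
    """Toy model: requests are queued, and a barrier drains everything before it."""

    def __init__(self):
        self.pending = deque()  # issued but not yet completed
        self.log = []           # (sequence, operation) in completion order
        self.seq = 0            # monotonically increasing sequence indicator

    def submit(self, op):
        self.pending.append(op)

    def _complete(self, op):
        self.seq += 1
        self.log.append((self.seq, op))

    def barrier(self):
        # Complete every request issued before the barrier, in issue order.
        while self.pending:
            self._complete(self.pending.popleft())
        # Record the barrier itself (stand-in for a persistent note).
        self._complete("BARRIER")
        return self.seq  # the "time" reported in the acknowledgement


q = BarrierQueue()
q.submit("write A")
q.submit("write B")
t = q.barrier()      # both writes complete before the barrier is acknowledged
q.submit("read A")   # issued after the barrier, so it cannot precede write A
q.barrier()
```

Because the read is submitted only after the first barrier is acknowledged, it cannot be reordered ahead of the write to the same location, which is the read-before-write hazard mentioned above.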
- Barriers may be used to enable atomic operations (similarly to the ephemeral designation described above). For example, the
storage layer 130 may issue a first barrier as a transaction is started, and then issue a second barrier when the transaction is complete. If the transaction fails, the storage layer 130 may “roll back” the sequence of storage operations between the first and second barriers to effectively “undo” the partial transaction. Similarly, a barrier may be used to obtain a “snapshot” of the state of the non-volatile storage device at a particular time. For instance, the storage layer 130 may provide an API to discover changes to the storage media that occurred between two barriers. - In another example, barriers may be used to synchronize distributed storage systems. As described above, a second storage device may be used to mirror the contents of a first storage device. The first storage device may be configured to issue barriers periodically (e.g., every N storage operations). The second storage device may lose communication with the first storage device for a certain period of time. To get back in sync, the second storage device may transmit its last barrier to the first storage device, and then may mirror only those changes that occurred since the last barrier.
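As a rough illustration of the resynchronization described above, the following Python sketch replays only the log records whose sequence indicator is later than the mirror's last acknowledged barrier. The record layout `(seq, lid, data)` and the function names `changes_since` and `resync` are assumptions made for this example.

```python
def changes_since(log, barrier_seq):
    """Return the log records written after the given barrier."""
    return [rec for rec in log if rec[0] > barrier_seq]


def resync(mirror, primary_log, last_barrier):
    """Apply to the mirror only the changes made since its last barrier."""
    for _seq, lid, data in changes_since(primary_log, last_barrier):
        mirror[lid] = data
    return mirror


# Primary log as (sequence indicator, LID, data); a barrier was issued at seq 2.
primary_log = [(1, 10, "a"), (2, 11, "b"), (3, 10, "c"), (4, 12, "d")]

# The mirror lost contact after the barrier at seq 2.
mirror = {10: "a", 11: "b"}
resync(mirror, primary_log, last_barrier=2)
```

Only the two records after the barrier are transferred, so the cost of getting back in sync scales with the changes missed rather than with the size of the device.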
- Distributed barriers may also be used to control access to and/or synchronize shared storage devices. For example, storage clients may be issued a credential that allows access to a particular range of LIDs (read only access, read/write, delete, etc.). The credentials may be tied to a particular point or range in time (e.g., as defined by a barrier). As the storage client interacts with the distributed storage device, the credential may be updated. However, if a storage client loses contact with the distributed storage device, the credential may expire. Before being allowed access to the distributed storage device, the client may first be required to access a new set of credentials and/or ensure that local data (e.g., cached data, etc.), is updated accordingly.
-
FIG. 27 is a flow chart of one embodiment of a method for providing barriers in storage system 102. At step 2710, the method 2700 starts and is initialized as disclosed herein. At step 2720, a request to issue a barrier is received. The request may be received from a storage client and/or as part of a high-level API provided by the storage layer 130 (e.g., an atomic write, transaction, snapshot, or the like). - At
step 2730, the method 2700 enforces the ordering constraints of the barrier. Accordingly, step 2730 may comprise causing all previously issued storage requests to complete. Step 2730 may further comprise queuing all subsequent requests until the previously issued requests complete, and the barrier is acknowledged (at step 2740). - At
step 2740, the method 2700 determines if the ordering constraints are met, and if so, the flow continues to step 2750; otherwise, the flow continues at step 2730. - At
step 2750, the barrier is acknowledged, which may comprise returning a current “time” (e.g., sequence indicator) at which the operations issued before the barrier were completed. Step 2750 may further comprise storing a persistent note of the barrier on the non-volatile storage. At step 2760, the method resumes operation on storage requests issued subsequent to the barrier at step 2720. At step 2770, the flow ends until a next request for a barrier is received. - In some embodiments, the
storage layer 130 leverages the logical address space 136 to manage “logical copies” of data (e.g., clones). As used herein, a “clone” or “logical cloning operation” refers to replicating a range (or set of ranges) of LIDs within the logical address space 136 and/or other addressing system. The cloned range may comprise different set(s) of LIDs, which may be bound to the same media storage locations as the original LIDs (source LIDs), allowing two or more LIDs and/or LID ranges to reference the same data. Clone operations may be used to perform higher-level operations, such as deduplication, snapshots, logical copies, atomic operations (e.g., atomic writes, transactions, etc.), and the like. - Creating a clone may comprise modifying the logical interface of data stored in a
non-volatile storage device 120 in order to, inter alia, allow the data to be referenced by use of two or more different LIDs and/or LID extents. Accordingly, creating a clone of a LID (or set of LIDs) may comprise allocating new LIDs in the logical address space 136 (or dedicated portion thereof), and associating the new LIDs with the same media storage location(s) as the original LIDs in the storage metadata 135. Creating a clone may, therefore, comprise adding one or more entries to a forward index 1204 configured to associate the new set of LIDs with the data. -
FIG. 28A depicts one embodiment of a range clone operation. The range clone operation of FIG. 28A may be implemented in response to a request from a storage client 116 and/or as part of a higher-level API provided by the storage layer 130, such as an atomic operation, snapshot, logical copy, or the like. In some embodiments, the interface 138 of the storage layer 130 may be configured to provide interfaces and/or APIs for performing clone operations. -
FIG. 28A depicts one embodiment of an index 2804 before the clone is created. The index 2804 of FIG. 28A comprises, inter alia, an entry 2814 that binds LIDs 1024-2048 to media storage locations 3453-4477. Other entries are omitted from FIG. 28A to avoid obscuring the details of the depicted embodiment. As disclosed herein, the entry 2814, and the bindings thereof, may define a logical interface 2811A through which storage clients 116 may reference the corresponding data (e.g., data segment 2812); storage clients 116 may access and/or reference the data segment 2812 (and/or portions thereof) through the storage layer 130 by use of the LIDs 1024-2048. - As disclosed herein, the
storage controller 140 may be configured to store data in a contextual format on a storage device 120. The contextual format may comprise associating data with corresponding persistent metadata that defines and/or references, inter alia, the logical interface of the data. In the FIG. 28A embodiment, the data stored at media addresses 3453-4477 comprises a packet format 2818 that includes persistent metadata 2864. The persistent metadata 2864 may comprise the logical interface of the data segment 2812 (logical interface metadata 2865), and as such, may associate the data segment 2812 with the LIDs 1024-2048 of the entry 2814. As disclosed herein, the contextual data format 2818 may enable the index 2804 (and/or other metadata 135) to be reconstructed from the contents of the storage device 120; in the FIG. 28A embodiment, the entry 2814 may be reconstructed by associating the data stored at media addresses 3453-4477 with the LIDs 1024-2048 identified in the persistent metadata 2864 of the packet 2818. Although FIG. 28A depicts a single packet 2818, the disclosure is not limited in this regard. In some embodiments, the data of the entry 2814 may be stored in multiple, different packets 2818, each comprising respective persistent metadata 2864 (e.g., a separate packet for each media storage location, etc.). - Creating a clone of the
entry 2814 may comprise allocating one or more LIDs in the logical address space 136, and binding the new LIDs to the same data segment 2812 as the entry 2814 (e.g., the data segment at media storage location 3453-4477). Creating the clone may, therefore, comprise modifying the storage metadata 135 without requiring the underlying data segment 2812 to be copied and/or replicated. -
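A clone that only touches the index can be sketched in a few lines of Python. The `Entry` class and `clone` function below are hypothetical and model the forward index as a simple list of range entries; the source range 1024-2048 and media addresses 3453-4477 come from the FIG. 28A discussion, while the clone's starting LID of 6144 is the value used in the FIG. 28B discussion of this disclosure.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lid_start: int    # first LID of the range
    lid_end: int      # last LID of the range (inclusive)
    media_start: int  # media address bound to lid_start

    def media_for(self, lid):
        """Translate a LID in this range to its media address."""
        if not self.lid_start <= lid <= self.lid_end:
            raise KeyError(lid)
        return self.media_start + (lid - self.lid_start)


def clone(index, source, new_lid_start):
    """Add an entry binding a new LID range to the source's media addresses."""
    new_end = new_lid_start + (source.lid_end - source.lid_start)
    for e in index:
        if new_lid_start <= e.lid_end and e.lid_start <= new_end:
            raise ValueError("requested LID range is already allocated")
    new_entry = Entry(new_lid_start, new_end, source.media_start)
    index.append(new_entry)  # metadata-only change: no data is copied
    return new_entry


index = [Entry(1024, 2048, 3453)]
cloned = clone(index, index[0], 6144)
```

Both entries now resolve to the same media addresses, so a lookup through either LID range reaches the same stored data.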
FIG. 28B depicts storage metadata 135 of one embodiment of a clone operation. Cloning the LIDs 1024-2048 may comprise allocating a new set of LIDs within the logical address space 136, and associating the new set of LIDs with the media addresses of LIDs 1024-2048. As depicted in FIG. 28B, cloning the LIDs 1024-2048 may comprise allocating a new set of LIDs 6144-7168 (represented in index entry 2824), and associating the LIDs with the media addresses 3453-4477 of entry 2814. The data stored at media addresses 3453-4477 may, therefore, be referenced through both the LIDs 1024-2048 of entry 2814 and the LIDs 6144-7168 of entry 2824. Accordingly, the clone operation modifies the logical interface 2811B of the data segment 2812, such that the data can be referenced through the LIDs of entry 2814 and/or the LIDs of entry 2824. - The modified
logical interface 2811B of the data may be inconsistent with the contextual format of the data segment 2812 on the storage device 120. As disclosed above, the persistent metadata 2864 of the data segment 2812 comprises logical interface metadata 2865 that associates the data segment 2812 with LIDs 1024-2048 of the logical interface 2811A, and not LIDs 6144-7168 of the modified logical interface 2811B. The contextual format of the data 2818 may be updated to be consistent with the modified logical interface 2811B (e.g., updated to associate the data with LIDs 1024-2048 and 6144-7168, as opposed to only LIDs 1024-2048). - Updating the contextual format of the
data segment 2812 may comprise updating the persistent metadata 2864 on the storage device 120. If the storage device 120 is a random-access, write-in-place storage device, the persistent metadata 2864 may be updated by overwriting and/or updating the persistent metadata 2864 without relocating the data segment 2812 and/or packet 2818. In other embodiments, however, the storage controller 140 may be configured to append data to a log and/or update data out-of-place on the storage device 120. In such embodiments, updating the contextual format of the data segment 2812 may comprise relocating and/or rewriting the data segment 2812 on the storage device 120, which may be a time-consuming process, and may be particularly inefficient if the data segment 2812 is large and/or the clone comprises a large number of LIDs. Therefore, in some embodiments, the storage layer 130 may defer updating the contextual format of cloned data and/or may update the contextual format in one or more background operations. In the meantime, the storage layer 130 may be configured to provide access to the data while stored in the inconsistent contextual format 2818. - The
storage layer 130 may be configured to acknowledge completion of clone operations before the contextual format of the corresponding data is updated. The data may be subsequently rewritten (e.g., relocated) in the updated contextual format on the storage device 120 in another process, which may be outside of the “critical path” of the clone operation and/or other storage operations (e.g., in one or more background operations). In some embodiments, the data segment 2812 is relocated using the groomer 370, or the like. Accordingly, storage clients 116 may be able to access the data segment 2812 through the modified logical interface 2811B (both 1024-2048 and 6144-7168) without waiting for the contextual format of the data segment 2812 to be updated to be consistent with the modified logical interface 2811B. - Until the contextual format of the
data segment 2812 is updated on the non-volatile storage media 122, the modified logical interface 2811B of the data segment 2812 may exist only in the index 2804. Therefore, if the index 2804 is lost, due to, inter alia, power failure or data corruption, the clone operation may not be reflected in the reconstructed storage metadata 135 (the clone operation may not be persistent and/or crash safe). When the contextual format of the data at 3453-4477 is accessed in a metadata reconstruction operation, the logical interface metadata 2865 of the persistent metadata 2864 indicates that the data is associated only with LIDs 1024-2048, not 1024-2048 and 6144-7168. Therefore, only entry 2814 will be reconstructed (as in FIG. 28A), and 2824 will be omitted; moreover, subsequent attempts to access the data segment 2812 through the modified logical interface 2811B (e.g., through 6144-7168) may fail. - In some embodiments, a clone operation may further comprise storing a persistent note on the
storage device 120 to make a clone operation persistent and/or crash safe. The persistent note may comprise an indication of the modified logical interface of the data. In the FIG. 28B embodiment, the persistent note 2866 corresponding to the depicted clone operation may comprise a persistent indicator 2868 that associates the data stored at media addresses 3453-4477 with both LID ranges 1024-2048 and 6144-7168. During reconstruction of the index 2804, the persistent note 2866 may indicate that the data segment 2812 is associated with both LID ranges, such that both entries storage layer 130 may acknowledge completion of a clone operation in response to updating the metadata 135 (e.g., creating the index entry 2824) and storing the persistent note 2866 on the storage device 120. The persistent note 2866 may be invalidated and/or marked for erasure in response to updating the contextual format of the data segment 2812 to be consistent with the logical interface 2811B (e.g., relocating the data segment 2812 by the groomer module 370 as disclosed above). - As disclosed above, the
storage controller 140 may be configured to store thedata segment 2812 in an updated contextual format that is consistent with the modifiedlogical interface 2811B. In some embodiments, the updated contextual format may comprise associating thedata segment 2812 with the LIDs of bothentries 2814 and 2824 (e.g., both LIDs 1024-2048 and 6144-7168).FIG. 28C depicts one embodiment of an updated contextual format (packet 2888) of thedata segment 2812. - As illustrated in the
FIG. 28C embodiment, the logical interface metadata 2865 of the updated packet 2888 indicates that the data segment 2812 is associated with both LID ranges 1024-2048 and 6144-7168 (as opposed to only 1024-2048). The updated contextual data format (packet 2888) may be written out-of-place, at different media addresses (64432-65456), which is reflected in the entries of the index 2804. In response to updating the contextual format of the packet 2888, the corresponding persistent note 2866 (if any) may be invalidated (removed and/or marked for subsequent removal) on the storage device 120. In some embodiments, removing the persistent note 2866 may comprise issuing one or more TRIM messages indicating that the persistent note 2866 no longer needs to be retained on the storage device 120. Alternatively, or in addition, portions of the index 2804 may be stored in a persistent crash safe storage location (e.g., non-transitory storage media 114 and/or the storage device 120). In response to persisting the storage metadata 2804 (e.g., the entries 2814 and 2824), the persistent note 2866 may be removed, even if the contextual format 2818 of the data has not yet been updated on the storage device 120. - Clones may operate in different modes. In a “copy on write” mode, storage operations that occur after creating the clone may cause the clones to diverge from one another (e.g., the
entries may come to reference different data). FIG. 28D depicts one embodiment of a storage operation performed within a cloned range in a copy-on-write mode. In the FIG. 28D embodiment, the storage controller 140 has written the data segment 2812 in the updated contextual data format (packet 2888) that is configured to associate the data with both LID ranges 1024-2048 and 6144-7168 (as depicted in FIG. 28C). A storage client 116 may then issue one or more storage requests to modify and/or overwrite data corresponding to the LIDs 6657-7168. In response to the storage request, the storage controller 140 may store the new and/or modified data on the storage device 120, which may comprise appending a data segment 2852 to the log in a contextual format (packet 2889). The packet 2889 may associate the data segment 2852 with the LIDs 6657-7168 as disclosed herein (e.g., by use of LID indicators 2875 within persistent metadata 2874 of the packet 2889). The index 2804 may be updated to associate the LIDs 6657-7168 with the data segment 2852, which may comprise splitting the entry 2824 into an entry 2852 configured to continue to reference the unmodified portion of the data in the data segment 2812 and an entry 2833 that references the new data segment 2852 stored at media addresses 78512-79024. In the copy-on-write mode depicted in FIG. 28D, the entry 2814 corresponding to the LIDs 1024-2048 may continue to reference the data segment 2812 at media addresses 64432-65456. Although not depicted in FIG. 28D, modifications within the LID range 1024-2048 may result in similar divergent changes affecting the entry 2814. Moreover, the storage request(s) are not limited to modifying and/or overwriting data. Other operations may comprise expanding a LID range (appending data), removing LIDs (deleting and/or trimming data), and/or the like. - In some embodiments, the
storage controller 130 may support other clone modes, such as a “synchronized clone” mode. In a synchronized clone mode, changes made within a cloned LID range may be reflected in one or more other, corresponding LID ranges. In the FIG. 28D embodiment, implementing the described storage operations in a “synchronized clone” mode may comprise updating the entry 2814 to reference the new data segment 2852, as disclosed herein, which may comprise, inter alia, splitting the entry 2814 into an entry configured to associate LIDs 1024-1536 with the original data segment 2812, and adding an entry configured to associate the LIDs 1537-2048 with the new data segment 2852. - Referring back to the copy-on-write embodiment of
FIG. 28D, the storage layer 130 may be further configured to manage clone merge operations. As used herein, a “range merge” or “clone merge” refers to an operation to combine two or more different sets of LIDs. In the FIG. 28D embodiment, a range merge operation may comprise merging the entry 2814 with the cloned entries. The storage layer 130 may be configured to implement range merge operations according to a merge policy, such as a recency policy in which more recent changes override earlier changes, a priority-based policy based on the relative priority of storage operations (e.g., based on properties of the storage client(s) 116, applications, and/or users associated with the storage operations), a completion indicator (e.g., completion of an atomic storage operation, failure of an atomic storage operation, or the like), fadvise parameters, ioctl parameters, and/or the like. -
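By way of non-limiting illustration, a recency merge policy of the kind described above may be sketched as follows. The sketch is an assumption-laden simplification, not the disclosed implementation: the index is modeled as a plain dictionary with one binding per LID, the names `Binding`, `seq`, and `merge_by_recency` are hypothetical, and the LIDs and media addresses are small illustrative values rather than those of the figures. Invalidation of the superseded data is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    media_addr: int  # media address bound to one LID
    seq: int         # log sequence number of the write (larger = more recent)


def merge_by_recency(index, src_base, dst_base, length):
    """Merge LIDs [src_base, src_base + length) into [dst_base, dst_base + length).

    For each offset, the more recently written binding wins; the source LIDs
    are removed from the index once they have been merged.
    """
    for offset in range(length):
        src = index.pop(src_base + offset, None)
        dst = index.get(dst_base + offset)
        if src is not None and (dst is None or src.seq > dst.seq):
            index[dst_base + offset] = src
    return index


# LIDs 10-13 and their clone at LIDs 50-53 initially share media addresses 100-103.
index = {lid: Binding(100 + i, seq=1) for i, lid in enumerate(range(10, 14))}
index.update({lid: Binding(100 + i, seq=1) for i, lid in enumerate(range(50, 54))})
# A later copy-on-write update to clone LIDs 52-53 lands at media addresses 200-201.
index[52] = Binding(200, seq=2)
index[53] = Binding(201, seq=2)

merged = merge_by_recency(index, src_base=50, dst_base=10, length=4)
```

In this sketch the merged range keeps its original bindings for LIDs 10-11 and takes over the newer bindings for LIDs 12-13, while the cloned LIDs 50-53 are deallocated. A priority-based policy could be substituted by comparing a different field than `seq`.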
FIG. 28E depicts one embodiment of a range merge operation. The range merge operation of FIG. 28E may be performed in accordance with a recency merge policy. The range merge operation may comprise merging the range 6144-7168 into the range 1024-2048. Accordingly, the range merge operation may comprise selectively applying changes made within the LID range 6144-7168 to the LID range 1024-2048 in accordance with the merge policy. In the FIG. 28E embodiment, the modify/overwrite operation to LIDs 6657-7168 may be applied to the merged LID range 1024-2048 in accordance with the recency merge policy. The range merge operation may, therefore, comprise updating the LID range 1024-2048 to associate LIDs 1537-2048 with the media addresses 78512-79024 comprising the new/modified data segment 2852. The resulting LID range may be split into two separate entries in the index 2804: entry 2815 may be configured to associate LIDs 1024-1536 with portions of the data segment 2812, and entry 2817 may be configured to associate LIDs 1537-2048 with the data segment 2852. Portions of the data segment 2812 no longer referenced by the LIDs 1537-2048 may be invalidated, as disclosed herein. The merged LID range 6144-7168 may be deallocated and/or removed from the index 2804. - The range merge operation illustrated in
FIG. 28E may result in modifying the logical interface 2811C to portions of the data. The contextual format 2889 of the data segment 2852 may associate the data with LIDs 6657-7168, rather than LIDs 1537-2048. As disclosed above, the storage layer 130 may provide access to the data stored in the inconsistent contextual format. The storage controller 140 may be configured to store the data in an updated contextual format, in which the data segment 2852 is associated with LIDs 1537-2048, in one or more background operations (e.g., grooming operations). In some embodiments, the range merge operation may further comprise storing a persistent note 2866 on the storage device 120 to associate the data segment 2852 with the updated logical interface 2811C (e.g., associate the data at media addresses 78512-79024 with LIDs 1537-2048). As disclosed above, the persistent note 2866 may be used to ensure that the range merge operation is persistent and crash safe. The persistent note 2866 may be removed in response to relocating the data segment 2852 in a contextual format that is consistent with the logical interface 2811C (e.g., associates the data segment 2852 with the LIDs 1537-2048). - The logical clone operations disclosed in conjunction with
FIGS. 28A-E may be used to implement other logical operations, such as a range move operation. Referring back to FIGS. 28A-C, the clone operation of entry 2814 comprises modifying the logical interface associated with the data segment 2812 to associate the data segment 2812 with the LIDs 1024-2048 of entry 2814 and the LIDs 6144-7168 of entry 2824. The cloning operation further includes storing a persistent note 2866 indicating the updated logical interface 2811B of the data and rewriting the data segment 2812 in accordance with the updated logical interface 2811B in one or more background storage operations (e.g., grooming operations). - The same set of operations may be used to perform a “range move” operation. As used herein, a “range move” operation refers to modifying the logical interface of one or more data segments to associate the data segments with a different set of LIDs. A range move operation may, therefore, comprise updating storage metadata 135 (e.g., the index 2804) to associate the one or more data segments with the updated logical interface, storing a
persistent note 2866 on the storage device 120 comprising the updated logical interface of the data segments, and rewriting the data segments in a contextual format (packet 2888) that is consistent with the updated logical interface (e.g., includes the updated logical interface 2865 in the persistent metadata 2864), as disclosed herein. Accordingly, the storage layer 130 may implement range move operations using the same mechanisms and/or processing steps as those disclosed above in conjunction with FIGS. 28A-E. - The logical clone operations disclosed in
FIGS. 28A-E, however, may impose certain limitations on the storage layer 130. As disclosed above, storing data in a contextual format (packet 2888) may comprise associating the data with each LID that references the data. In the FIG. 28C embodiment, the persistent metadata 2864 comprises references to both LID ranges 1024-2048 and 6144-7168. Increasing the number of references to a data segment may, therefore, impose a corresponding increase in persistent metadata overhead. In some embodiments, the size of the persistent metadata 2864 may be limited, which may limit the number of references and/or clones that can reference a particular data segment. Moreover, inclusion of multiple LID references may complicate groomer operations. The number of index entries that need to be updated in a grooming operation may vary in accordance with the number of LIDs that reference the data that is to be relocated. Referring back to FIG. 28C, relocating the data segment 2812 in a grooming and/or storage recovery operation may comprise updating two separate index entries and the persistent metadata 2864. This variable overhead may reduce the performance of background grooming operations and may limit the number of concurrent clones and/or references that can be supported. - In some embodiments, the
storage layer 130 may comprise and/or leverage an intermediate mapping layer to reduce the overhead imposed by clone operations. The intermediate mapping layer may comprise “reference entries” configured to facilitate efficient cloning operations (as well as other operations, as disclosed in further detail herein). As used herein, a reference entry refers to an entry that only exists while it is being referenced by one or more entries in the logical address space 136. Accordingly, a reference entry does not exist in its own right, but only exists as long as it is being referenced by one or more other index entries. In some embodiments, reference entries may be immutable. Multiple clones may reference the same set of data through a single reference entry. The contextual format of cloned data (data that is referenced by multiple LIDs) may be simplified to associate the data with a reference entry which, in turn, is associated with N other references through other persistent metadata (e.g., persistent notes 2866). Relocating cloned data may, therefore, comprise updating a single mapping between the reference entry and the new media address of the data. -
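By way of non-limiting illustration, the lifetime of a reference entry and the single-mapping relocation described above may be sketched as follows. The class `ReferenceIndex` and its methods `link`, `unlink`, and `relocate` are hypothetical names chosen for this sketch, as are the identifier 100000 and the media addresses; maintaining one count per reference entry is one assumed way of tracking whether the entry is still referenced.

```python
class ReferenceIndex:
    """Hypothetical reference index: reference id -> media address, plus a count."""

    def __init__(self):
        self.media = {}     # reference id -> media address of the cloned data
        self.refcount = {}  # reference id -> number of index entries linked to it

    def link(self, ref_id, media_addr):
        """Create the entry on first use, or add one more reference to it."""
        self.media.setdefault(ref_id, media_addr)
        self.refcount[ref_id] = self.refcount.get(ref_id, 0) + 1

    def unlink(self, ref_id):
        """Drop one reference; the entry disappears with its last reference."""
        self.refcount[ref_id] -= 1
        if self.refcount[ref_id] == 0:
            del self.refcount[ref_id]
            del self.media[ref_id]

    def relocate(self, ref_id, new_media_addr):
        """Relocation touches one mapping regardless of how many clones exist."""
        self.media[ref_id] = new_media_addr


ref = ReferenceIndex()
forward = {}  # forward index: LID -> reference id (indirect entries only)
for lid in (10, 400, 900):
    forward[lid] = 100000
    ref.link(100000, media_addr=20000)

ref.relocate(100000, 30000)  # one update serves all three clones
```

Because the forward index entries hold only the reference identifier, relocating the data does not require visiting them; unlinking all three entries would remove the reference entry altogether.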
FIG. 28F depicts one embodiment of a storage layer 130 configured to implement an intermediate mapping layer. As disclosed above, the metadata module 135 of the storage layer may comprise a forward index 2804 pertaining to the logical address space 136 that is exposed to the storage clients via the interface 138. The metadata 2804 may include information pertaining to LID allocations, bindings between LIDs and media addresses, and so on. The metadata module 135 may further comprise a reference index 2809 comprising reference entries. As disclosed above, the reference entries may be used to reference cloned data. The translation module 134 may monitor reference entries of the reference index 2809, and may remove reference entries that are no longer needed (e.g., are no longer being referenced by other entries in the forward index 2804). In some embodiments, reference entries may be maintained in a separate portion of the storage metadata 135 (within a separate reference index 2809). The reference entries may be identified by use of reference identifiers, which may be maintained in a separate namespace from the index 2804. Accordingly, the reference entries may be part of an intermediate, “virtual” or “reference” address space that is separate and distinct from the logical address space 136 exposed to the storage clients 116. Alternatively, in some embodiments, reference entries may be assigned LIDs selected from pre-determined ranges and/or portions of the logical address space 136 that are not directly accessible by storage clients 116. - A clone operation may comprise linking one or more LID entries in the
index 2804 to reference entries in the reference index 2809. The reference entries may comprise the media address(es) of the cloned data. Accordingly, LIDs that are associated with cloned data may reference the cloned data indirectly through the reference index 2809. Such entries may be referred to as “indirect entries.” As used herein, an indirect entry refers to an entry in the index 2804 that references and/or is linked to a reference entry in the reference index 2809. Indirect entries may be assigned a LID within the logical address space 136, and may be accessible to the storage clients 116. - As disclosed above, after cloning a particular address range,
storage clients 116 may perform storage operations within one or more of the cloned ranges, which may cause the clones to diverge from one another (in accordance with the clone mode). In a “copy on write” mode, changes made to a particular clone may not be reflected in the other cloned ranges. In the FIG. 28F embodiment, changes made to a clone may be reflected in “local” entries within an indirect entry. As used herein, a “local entry” or “local LID” refers to a portion of an indirect entry that is directly mapped to one or more media addresses on the storage device 120. Accordingly, local entries and/or local LIDs may be configured to reference data that has been changed in a particular clone and/or differs from the contents of other clones. - The
translation module 134 may be configured to provide access to cloned data. In some embodiments, the translation module 134 is configured to determine the media addresses associated with an indirect entry by use of the corresponding reference entries in the reference index 2809. The translation module 134 may further comprise a cascade lookup module 2855 configured to manage indirect entries that comprise local LIDs. The cascade lookup module 2855 may be configured to traverse local LIDs of indirect entries first and, if the LID is not found within local entries, to continue searching within the reference entries to which the indirect entry is linked. - The
log storage module 137 and groomer module 370 may be configured to manage the contextual format of cloned data. In the FIG. 28F embodiment, cloned data (data that is referenced by two or more LIDs and/or LID ranges within the index 2804) may be stored in a contextual format that associates the data with the corresponding reference entries. Accordingly, the logical interface metadata stored with cloned data segments may correspond to a single reference entry as opposed to identifying each LID and/or LID range of the clone. Creating a clone may, therefore, comprise updating the contextual format of the cloned data in one or more background operations by use of, inter alia, the groomer module 370. -
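By way of non-limiting illustration, the cascade lookup described above (local LIDs first, then the linked reference entry) may be sketched as follows. The names `IndirectEntry` and `cascade_lookup`, the dictionary-based reference index, and all identifiers and media addresses are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field


@dataclass
class IndirectEntry:
    lid_base: int  # first LID of the cloned range
    length: int    # number of LIDs in the range
    ref_base: int  # first identifier of the linked reference entry
    local: dict = field(default_factory=dict)  # LID -> media address of diverged data


def cascade_lookup(entry, reference_index, lid):
    """Resolve a LID to a media address: local entries first, then the reference entry."""
    if not entry.lid_base <= lid < entry.lid_base + entry.length:
        raise KeyError(lid)
    if lid in entry.local:
        return entry.local[lid]
    ref_id = entry.ref_base + (lid - entry.lid_base)
    return reference_index[ref_id]


# Reference identifiers 0-3 are bound to media addresses 500-503.
reference_index = {0: 500, 1: 501, 2: 502, 3: 503}
clone_a = IndirectEntry(lid_base=1024, length=4, ref_base=0)
clone_b = IndirectEntry(lid_base=6144, length=4, ref_base=0)

# A copy-on-write update through clone_a is recorded as a local entry.
clone_a.local[1024] = 900
```

In this sketch a request for LID 1024 is satisfied by the local entry of `clone_a`, whereas LID 1025 and every LID of `clone_b` continue to resolve through the shared reference entry.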
FIG. 28G depicts one embodiment of a clone operation using a reference index 2809. In state 2813A, an entry corresponding to logical identifier 10 extent 2 in the logical address space 136 (denoted 10,2 in FIG. 28G) may directly reference data at media address 20000 on the storage device 120. Other entries are omitted from FIG. 28G to avoid obscuring the details of the disclosed embodiment. In state 2813B, the storage controller 130 is configured to create a clone of the entry 10,2 at LID 400 (denoted 400,2 in FIG. 28G). The storage controller 130 may be configured to create the clone in response to a request from a storage client 116 and/or as part of a higher-level operation, such as an atomic storage operation, snapshot, or the like. In the FIG. 28G embodiment, creating a clone of entry 10,2 may comprise creating an entry in the logical address space 136 (index 2804) to represent the clone. The clone may be created at LID 400 (entry 400,2 in FIG. 28G). Creating the clone may further comprise creating an entry in the reference index 2809 through which the entries 10,2 and 400,2 may reference the cloned data at media address 20000. The reference entry may correspond to a particular portion of the logical address space 136 and/or may be part of a separate, reference address space. The reference entry is identified in FIG. 28G as entry 100000,2. The clone operation may further comprise associating the entries 10,2 and 400,2 in the index 2804 with the reference entry 100000,2, as illustrated at state 2813C. As disclosed above, associating the entries with the reference entry may comprise designating the entries as indirect entries. State 2813C may further comprise storing a persistent note 2866 on the storage device 120 to associate the data at media address 20000 with the reference entry 100000,2 and/or to associate the entries 10,2 and 400,2 with the reference entry in the reference index 2809. - The
storage layer 130 may provide access to the data at media address 20000 through either LID 10 or LID 400 (and by reference to the reference entry 100000,2). In response to a request pertaining to LID 10 or LID 400, the translation module 134 may determine that the corresponding entry in the index 2804 is an indirect entry that is associated with an entry in the reference index 2809. In response, the cascade lookup module 2855 may determine the media address associated with the LID by use of local entries (if any) and the corresponding reference entry 100000,2. - The data stored at
media address 20000 may be stored in a contextual format that is inconsistent with the clone configuration (e.g., the data may be associated with LID 10,2 as opposed to the reference entry 100000,2 and/or LID 400). The data may be stored in an updated contextual format (in state 2813D) in one or more background and/or grooming operations. The data may be stored with persistent metadata that associates the data with the reference entry 100000,2 as opposed to the separate LID ranges 10,2 and 400,2. Relocating the cloned data may only require updating a single entry in the reference index 2809 as opposed to multiple entries corresponding to each LID that references the data (e.g., the entries 10,2 and 400,2). Any number of entries in the index 2804 may reference the cloned data, without increasing the size of the persistent metadata associated with the cloned data and/or complicating the operation of the groomer module 370. -
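By way of non-limiting illustration, the role of the persistent note in states 2813A through 2813D may be sketched as a replay of an append-only log into a forward index and a reference index. The record layouts, the `REF_BASE` threshold used to distinguish reference identifiers from LIDs, the functions `rebuild` and `resolve`, and the relocated media address 30000 are all assumptions made for this sketch; extents are reduced to a single LID for brevity.

```python
REF_BASE = 100000  # assumption: identifiers at or above this value name reference entries


def rebuild(log):
    """Replay an append-only log and return (forward index, reference index)."""
    forward = {}    # LID -> ("media", address) or ("ref", reference id)
    reference = {}  # reference id -> media address
    for record in log:
        if record[0] == "packet":
            # Data packet whose persistent metadata names its owner.
            _, media_addr, ident = record
            if ident >= REF_BASE:
                reference[ident] = media_addr
            else:
                forward[ident] = ("media", media_addr)
        elif record[0] == "note":
            # Persistent note: binds data to a reference entry and LIDs to that entry.
            _, ref_id, media_addr, lids = record
            reference[ref_id] = media_addr
            for lid in lids:
                forward[lid] = ("ref", ref_id)
    return forward, reference


def resolve(forward, reference, lid):
    """Translate a LID to a media address, following an indirect entry if present."""
    kind, value = forward[lid]
    return value if kind == "media" else reference[value]


# State 2813A: LID 10 is stored at media address 20000 in its original format.
state_a = [("packet", 20000, 10)]
# State 2813C: a note links LIDs 10 and 400 to reference entry 100000.
state_c = state_a + [("note", 100000, 20000, (10, 400))]
# State 2813D: the data is rewritten (hypothetically at 30000) naming the reference entry.
state_d = state_c + [("packet", 30000, 100000)]
```

Replaying `state_a` alone yields no entry for LID 400, which corresponds to the loss of the clone when no persistent note has been stored; replaying `state_c` or `state_d` resolves both LIDs through the reference entry.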
FIG. 28H depicts another embodiment of a clone operation implemented using a reference index 2890. As disclosed above, in response to a request to create a clone of the logical identifiers 1024-2048 and/or data segment 2812, the storage layer 130 may be configured to create a reference entry 2891 in a designated portion of the index 2804 (e.g., the reference index 2890), or within a separate namespace. The reference entry 2891 may represent the cloned data segment 2812. Any number of LIDs and/or LID ranges in the index 2804 may reference the data through the reference index entry 2891. As depicted in FIG. 28H, the reference entry 2891 may be bound to the media storage locations of the cloned data segment 2812 (media addresses 3453-4477). The entries of the cloned LID ranges may be linked to the reference entry 2891. - The
reference entry 2891 may be assigned identifiers 0Z-1023Z. As disclosed above, the identifier(s) of the reference entry 2891 may correspond to a particular portion of the logical address space 136 or may correspond to a different, separate namespace. The storage layer links the entries to the reference entry 2891 by use of, inter alia, metadata that designates the entries as indirect entries referencing the reference entry 2891. The reference entry 2891 may not be directly accessible by storage clients 116 via the storage layer 130 and/or interface 138. - The clone operation may further comprise modifying the
logical interface 2811D of the data segment 2812; the modified logical interface 2811D may allow the data segment 2812 to be referenced through the LIDs 1024-2048 of the indirect entry 2894 and/or the LIDs 6144-7168 of the indirect entry 2895. Although the reference entry 2891 may not be used by storage clients 116 to reference the data segment 2812, FIG. 28H depicts the reference entry 2891 as part of the modified logical interface 2811D of the data segment 2812, since the reference entry 2891 is used by the translation module 134 to access the data (through the indirect entries 2894 and 2895). - Creating the clone may further comprise storing a
persistent note 2866 on the storage device 120. As disclosed above, the persistent note 2866 may identify the reference entry 2891 associated with the data segment 2812. Accordingly, the persistent note 2866 may associate the media addresses 64432-65456 with the identifier(s) of the reference entry 2891. The clone operation may further comprise storing another persistent note 2867 configured to associate the LIDs of entries 2894 and 2895 (LIDs 1024-2048 and 6144-7168) with the reference entry 2891. Alternatively, metadata pertaining to the association between the entries 2894 and 2895 and the reference entry 2891 may be included in the persistent note 2866. The persistent notes 2866 and/or 2867 may be retained on the storage device 120 until the data segment 2812 is relocated in an updated contextual format and/or the index 2804 (and/or reference index 2890) is persisted. As disclosed above, storage of the persistent note(s) 2866 and/or 2867 may ensure that the clone operation is persistent and crash safe. - The modified
logical interface 2811D of the data segment 2812 may be inconsistent with the contextual format of the data 2898A; the logical interface metadata 2865A of the persistent metadata 2864A may reference LIDs 1024-2048 rather than the identifiers of the reference entry 2891 and/or the cloned entry 2895. The storage controller 140 may be configured to store the cloned data segment 2812 in an updated contextual format 2864B that is consistent with the modified logical interface 2811D; the logical interface metadata 2865B of the persistent metadata 2864B may associate the data segment 2812 with the reference entry 2891, as opposed to separately identifying the LIDs within each cloned range (LIDs of entries 2894 and 2895). Accordingly, the use of the indirect entry 2894 allows the logical interface 2811D of the data segment 2812 to comprise any number of LIDs, independent of size limitations of the contextual data format 2898A-B (e.g., independent of the number of LIDs that can be included in the logical interface metadata 2865). Moreover, additional logical copies of the reference entry 2891 may be made without updating the contextual format 2864B of the data; such updates may be made by associating the LID ranges with the reference entry 2891 in the index 2804 and/or by use of, inter alia, persistent notes 2867. - As disclosed above, the
indirect entries 2894 and/or 2895 may initially reference the data segment 2812 through the reference entry 2891. Storage operations performed after creating the clones 2894 and/or 2895 may be reflected by use of local LIDs within the respective entries 2894 and/or 2895. FIG. 28I depicts one embodiment of the result of a storage operation pertaining to LIDs 1024-1052 performed after completing the clone operation of FIG. 28H. After completion of the clone operation, a storage client 116 may modify data associated with one or more of the clones. In the FIG. 28I embodiment, a storage client 116 modifies and/or overwrites data corresponding to LIDs 1024-1052 of entry 2894, which may comprise appending a new data segment 2892 to the storage device 120. The data segment 2892 may be stored in a contextual format 2898 comprising persistent metadata 2864 configured to associate the data segment 2892 stored at media addresses 7823-7851 with logical interface metadata 2865 (LIDs 1024-1052). The storage layer 130 may be configured to associate the data segment 2892 with the LIDs 1024-1052 in a local LID entry 2896. The local LID entry 2896 may reference the updated data directly, as opposed to referencing the data through a reference entry (e.g., reference entry 2891). - In response to a request pertaining to LIDs 1024-1052 (or a subset thereof), the
cascade lookup module 2855 may search for references to the LIDs in a cascading lookup operation, which may comprise searching for references to local LIDs (if available) followed by the reference entry 2891. In the FIG. 28I embodiment, the local entry 2896 may be used to satisfy requests pertaining to LIDs 1024-1052 (media addresses 7823-7851 rather than 64432-64460 per the reference entry 2891). Requests for LIDs that are not found in local entries (e.g., LIDs 1053-2048) may continue to be serviced through the reference entry 2891. Accordingly, the storage layer 130 may use the indirect entry 2894 and reference entry 2891 to implement a cascade lookup for LIDs pertaining to the clone range 1024-2048. The logical interface 2811E of the data may, therefore, comprise one or more local entries 2896 and/or one or more indirect and/or reference entries. - In a further embodiment, illustrated in
FIG. 28J, a storage client 116 may modify data of the clone through another one of the LIDs of the logical interface 2811E (e.g., LIDs 6144-6162); the logical interface delimiters are not shown in FIG. 28J to avoid obscuring the details of the illustrated embodiment. The modified data may be referenced using a local entry 2897, as disclosed above. Since each of the clones now has its own, respective version of the original clone data 0Z-52Z, neither clone references that portion of the reference entry 2891. The storage layer 130 may determine that the corresponding clone data (and reference identifiers) are no longer being referenced, and may remove them (as depicted in FIG. 28J). The clones may continue to diverge until neither 2894 nor 2895 references any portion of the reference entry 2891, at which point the reference entry 2891 may be removed. - Although
FIGS. 28I and 28J depict local entries within the indirect entries, the disclosure is not limited in this regard. The operation of FIG. 28I may be reflected by creating the local entry 2896 and modifying the indirect entry to reference LIDs 1053-2048. Similarly, the operation of FIG. 28J may comprise creating the local entry 2897 and modifying the indirect entry to reference LIDs 6163-7168. - Referring back to
FIG. 28F, the translation module 134 may be configured to “groom” the reference index 2809. In some embodiments, each entry in the reference index 2809 comprises metadata that includes a reference count (not shown). The reference count may be incremented as new references or links to the reference entry are added, and may be decremented in response to removing references to the entry. In some embodiments, reference counts may be maintained for each reference identifier in the reference index 2809. Alternatively, reference counts may be maintained for reference entries as a whole. When the reference count of a reference entry reaches 0, the reference entry 2891 (or a portion thereof) may be removed from the reference index 2809. Removing a reference entry (or portion of a reference entry) may comprise invalidating the corresponding data on the storage device 120, as disclosed herein (indicating that the data no longer needs to be retained on the storage device 120). - In another example, the
storage layer 130 may remove reference entries using a “mark-and-sweep” approach. The storage layer 130 (or other process, such as the translation module 134 and/or groomer 370) may periodically check references to entries in the reference index 2809 by, inter alia, following links to the reference entries from indirect entries (or other types of entries) in the index 2804. Entries that are not referenced by any entries during the mark-and-sweep may be removed, as disclosed above. The mark-and-sweep may operate as a background process, and may periodically perform a mark-and-sweep operation to garbage collect reference entries that are no longer in use. - The reference index disclosed in conjunction with
FIGS. 28F-28J may be created on demand (e.g., in response to creation of a clone, or other indirect data reference). In other embodiments, all data may be referenced through intermediate, two-layer mappings. In such embodiments, storage clients 116 may allocate indirect, virtual identifiers (VIDs) in a virtual address space, which may be linked to and/or reference media addresses through an intermediate mapping layer, such as the logical address space 136 of the storage layer 130. These embodiments may result in an additional mapping layer between storage clients 116 and the storage device(s) 120. Storage clients may reference data using VIDs of a virtualized address space that map to logical identifiers of the logical address space 136, which, in turn, are associated with media addresses on respective storage device(s) 120. -
FIG. 28K depicts one embodiment of an indirection layer 2830 configured to implement cloning operations using a two-layer, virtualized address space. The indirection layer 2830 may be configured to present a virtual address space 2836 to the storage clients 116. The indirection layer 2830 may implement the virtual address space 2836 using the same modules and/or interfaces for managing the logical address space 136 disclosed herein. The virtual address space 2836 may comprise 64-bit VIDs, which may be defined independently of the underlying logical address space 136 and/or storage device(s) 120. The indirection layer 2830 may comprise VID metadata 2835, which may comprise a VID index 2884. The VID index 2884 may be implemented as the index 2804 disclosed herein. The indirection layer 2830 may further comprise a VID translation module 2834 configured to map VIDs to LIDs within the logical address space 136 of the storage layer 130. - The
indirection layer 2830 may provide access to the virtual address space 2836 through the interface 2838. The interface 2838 may comprise one or more of a block device interface, virtual storage interface, cache interface, and the like, as disclosed herein. The clone module 2831 may be configured to manage clone operations within the virtual address space 2836. Although FIG. 28K depicts the indirection layer 2830 separately from the storage layer 130, the disclosure is not limited in this regard. In some embodiments, the virtual address space 2836, VID index 2884, VID translation module 2834, and/or the clone module 2831 may be implemented as part of the storage layer 130. - The VIDs of the virtual address space may be used to, inter alia, perform efficient cloning operations. Alternatively, or in addition, the additional mapping layer may be leveraged to enable logical clone operations on random access, write-in-place storage devices 120, such as hard disks. -
Storage clients 116 may perform storage operations in reference to VIDs of the virtual address space 2836. Accordingly, storage operations may comprise two (or more) translation layers. The VID index 2884 may comprise a first translation layer between VIDs of the virtual address space 2836 and LIDs of the logical address space 136. The index 2804 of the storage layer 130 may implement a second translation layer between the LIDs and media address(es) on respective storage devices 120. - The
indirection layer 2830 may be configured to manage allocations within the virtual address space 2836 by use of, inter alia, the VID metadata 2835, VID index 2884, and/or VID translation module 2834. The VID translation module 2834 may be configured to maintain associations between VIDs of the virtual address space 2836 and LIDs of the logical address space 136 (by use of the VID index 2884). In some embodiments, allocating a VID in the virtual address space 2836 may comprise allocating one or more corresponding LIDs in the logical address space 136. Accordingly, each VID allocated in the virtual address space 2836 may be mapped to one or more LIDs in the logical address space 136. The mappings may be sparse and/or any-to-any, as disclosed herein. The logical address space 136 may not be directly accessible to the storage clients 116 (e.g., the logical address space 136 may be used as an intermediate mapping layer). Performing a storage operation through the indirection layer 2830 may comprise: a) identifying the LIDs corresponding to one or more VIDs referenced in the storage operation by use of the VID translation module 2834 and/or VID index 2884; and b) implementing the storage operation within the storage layer 130 in reference to the identified LIDs. -
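By way of non-limiting illustration, steps a) and b) above may be sketched with two dictionaries standing in for the two translation layers. The classes `StorageLayer` and `IndirectionLayer`, their methods, the one-to-one VID-to-LID mapping, and the starting LID value are assumptions made for this sketch; the list position of a record stands in for its media address.

```python
class StorageLayer:
    """Hypothetical log-structured layer: LID -> media address -> data."""

    def __init__(self):
        self.index = {}  # second translation layer: LID -> media address
        self.media = []  # append-only log; list position stands in for media address

    def write(self, lid, data):
        self.media.append(data)  # data is always written out-of-place
        self.index[lid] = len(self.media) - 1

    def read(self, lid):
        return self.media[self.index[lid]]


class IndirectionLayer:
    """Hypothetical indirection layer presenting VIDs on top of a StorageLayer."""

    def __init__(self, storage, first_lid=100000):
        self.storage = storage
        self.vid_index = {}  # first translation layer: VID -> LID
        self._next_lid = first_lid

    def allocate(self, vid):
        # Allocating a VID also allocates a corresponding LID.
        self.vid_index[vid] = self._next_lid
        self._next_lid += 1

    def write(self, vid, data):
        lid = self.vid_index[vid]      # a) identify the LID for the VID
        self.storage.write(lid, data)  # b) perform the operation on that LID

    def read(self, vid):
        return self.storage.read(self.vid_index[vid])


layer = IndirectionLayer(StorageLayer())
layer.allocate(10)
layer.write(10, b"v1")
layer.write(10, b"v2")  # out-of-place update: only the LID -> media binding changes
```

In this sketch the second write moves the data to a new media address while the VID-to-LID mapping is left untouched, which reflects how relocation in the lower layer may remain invisible to the virtual address space.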
FIG. 28J depicts one embodiment of a clone operation using an indirection layer 2830. As disclosed above, the VID index 2884 may correspond to a virtual address space 2836 that is indirectly mapped to media addresses through the logical address space 136 of the storage layer 130. The indirection layer 2830 may provide access to the virtual address space 2836 through an interface 2838. Storage clients 116 may allocate portions of the virtual address space 2836 and/or perform storage operations using VIDs of the virtual address space through the indirection layer 2830 (and storage layer 130), as disclosed herein. - In
state 2863A, the VID index 2884 may comprise an entry 10,2 that represents two VIDs (10 and 11) in the virtual address space 2836. The VID index 2884 may be configured to map the VID entry 10,2 to LIDs within the logical address space 136. In the FIG. 28K embodiment, the VID index 2884 maps the VID entry 10,2 to the LID entry 100000,2. The entry 10,2 may be allocated to a storage client 116, which may perform storage operations in reference to the VIDs. In state 2863A, the storage layer 130 may be configured to map the LID entry 100000,2 to one or more media addresses on the storage device 120 (media address 20000). - In state 2863B, the
indirection layer 2830 is configured to implement a clone operation. The clone operation may comprise creating a clone of the VID entry 10,2. In FIG. 28L, the clone is identified as VID index entry 400,2. The clone operation may further comprise associating the cloned entry 400,2 with the corresponding LID entry 100000,2 in the VID index 2884. The corresponding entry 100000,2 in the index 2804 may remain unchanged. Alternatively, a reference count (or other indicator) of the LID entry 100000,2 may be updated to indicate that the entry is being referenced by multiple VID entries. The contextual format of the data stored at media address 20000 may be left unchanged (e.g., continue to associate the data with the LID entry 100000,2). The clone operation may further comprise storing a persistent note 2866 and/or 2867 on the storage device 120 to persist the association between VID entry 400,2 and the LID entry 100000,2. Alternatively, or in addition, the clone operation may be made persistent and/or crash safe by persisting the VID index 2884. - In state 2863C, the data at
media address 20000 may be relocated to media address 40000. The relocation may occur as part of a standard grooming operation, rather than to update the contextual format of the cloned data. Relocating the data may comprise updating a single entry in the index 2804. - The clone implementations disclosed herein may be used to efficiently implement storage operations, such as range clone operations, range move operations, snapshots, deduplication, atomic writes, and the like.
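The clone and relocation states described above may be illustrated with the following sketch, which reuses the entry values of the example (VID entries 10,2 and 400,2, LID entry 100000,2, media addresses 20000 and 40000). The dictionary layout, the `(start, length)` tuple keys, and the function names are assumptions made for illustration only.

```python
# Hypothetical structures: extents are keyed by (start, length) tuples.
vid_index = {(10, 2): (100000, 2)}   # VID entry 10,2 -> LID entry 100000,2
lid_index = {(100000, 2): 20000}     # LID entry 100000,2 -> media address
ref_count = {(100000, 2): 1}         # optional reference count per LID entry


def clone_vid_entry(src, dst):
    """Bind dst to the same LID entry as src; the LID index is untouched."""
    lid_entry = vid_index[src]
    vid_index[dst] = lid_entry
    ref_count[lid_entry] += 1


def relocate(lid_entry, new_media_addr):
    """Grooming relocates the data; only one LID index entry changes."""
    lid_index[lid_entry] = new_media_addr


def resolve(vid_entry):
    """Translate a VID entry to a media address through both layers."""
    return lid_index[vid_index[vid_entry]]


clone_vid_entry((10, 2), (400, 2))   # create the clone
relocate((100000, 2), 40000)         # relocate the shared data
```

Because both VID entries resolve through the same LID entry, the relocation is visible through the original and the clone after a single index update.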
- The embodiments for clone operations disclosed herein may be leveraged to manage snapshots of the logical address space 136 (or virtual address space 2836). Creating a snapshot of an address range may comprise maintaining an immutable copy of the address range (AR) and the corresponding data. As used herein, an address range (or AR) refers to a logical address range, a virtual address range, or the like.
- The embodiments for managing range clone and/or range move operations disclosed herein may be leveraged to perform one or more higher-level operations, such as deduplication operations. Referring back to
FIG. 3A, the storage layer 130 may comprise a deduplication module 374 configured to identify duplicate data on the storage device 120 and/or non-volatile storage media 122. Duplicate data may be identified using any suitable mechanism. In some embodiments, duplicated data is identified by scanning the contents of the storage device 120, generating signature values for various data segments, and comparing data signature values to identify duplicates. The signature values may include, but are not limited to: cryptographic signatures, hash codes, cyclic codes, and/or the like. Signature information may be stored within storage metadata 135, such as the index 2804 (e.g., in metadata associated with the entries) and/or may be maintained and/or indexed in one or more separate data structures (not shown). The deduplication module 374 may compare data signatures and, upon detecting a signature match, may perform one or more deduplication operations. The deduplication operations may comprise verifying the signature match (e.g., performing a byte-by-byte data comparison), and performing one or more range clone operations to reference the duplicate data within two or more LIDs and/or LID ranges. -
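One possible, simplified rendering of the signature-based deduplication described above is sketched below. The use of SHA-256 as the signature, the function names, the in-memory `segments` dictionary, and the media addresses 100, 200, and 300 are assumptions made for illustration; the disclosure permits any suitable signature mechanism.

```python
import hashlib


def find_duplicates(segments):
    """segments maps a LID range (start, end) to its bytes.

    Returns (kept_range, duplicate_range) pairs for verified duplicates.
    """
    by_signature = {}
    pairs = []
    for lid_range in sorted(segments):
        data = segments[lid_range]
        signature = hashlib.sha256(data).digest()
        kept = by_signature.get(signature)
        if kept is None:
            by_signature[signature] = lid_range
        elif segments[kept] == data:
            # Signature match verified by a byte-by-byte comparison.
            pairs.append((kept, lid_range))
    return pairs


def deduplicate(index, segments):
    """index maps LID range -> media address; rebind duplicates (range clone)."""
    for kept, duplicate in find_duplicates(segments):
        index[duplicate] = index[kept]
    return index


segments = {
    (1024, 2048): b"x" * 64,
    (6144, 7168): b"x" * 64,
    (9000, 9001): b"y" * 64,
}
index = {(1024, 2048): 100, (6144, 7168): 200, (9000, 9001): 300}
deduplicate(index, segments)
```

After the call, both duplicate LID ranges reference a single copy of the data, and the media storage formerly bound to the second range could be reclaimed in a grooming operation.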
FIG. 28M depicts one embodiment of a deduplication operation. Theindex 2804 may compriseentries entries logical interfaces data segment 2812 may be identified and/or verified by thededuplication module 374, as disclosed above. Alternatively, the duplicated data may be identified as data is received for storage at thestorage layer 130. Accordingly, the data may be deduplicated before an additional copy of the data is stored on thestorage device 120. - In response to identifying and/or verifying that
entries storage layer 130 may be configured to deduplicate the data, which may comprise creating one or more range clones. As disclosed above, creating a range clone may comprise modifying the logical interface 2811G of the duplicateddata segment 2812 to associate a single version of thedata segment 2812 with both sets of LIDs 1024-2048 and 6144-7168. - The range clone may be implemented using any of the clone embodiments disclosed herein including the range clone embodiments of
FIGS. 28A-E, the reference entry embodiments of FIGS. 28F-J, and/or the two-layer mapping embodiments of FIGS. 28K-L. In the deduplication embodiment of FIG. 28M, both LID ranges 1024-2048 and 6144-7168 may be modified to reference a single version of the data segment 2812 (the other data segment 2812 may be removed and/or groomed from the storage device 120). - The
FIG. 28M embodiment uses the reference entry implementation of FIGS. 28F-J. As such, the deduplication operation may comprise creating a reference entry 2891 to represent the deduplicated data segment 2812 (the cloned data). The deduplication operation may further comprise modifying and/or converting the entries indirect entries data segment 2812 through the reference entry 2891, as disclosed above. The deduplication operations may comprise modifying the logical interface 2811G to reference the data segment 2812 through both sets of LIDs 1024-2048 and 6144-7168 (as well as the reference entry 2891). The deduplication operations may further comprise storing a persistent note on the non-volatile storage media 122 to associate the data segment 2812 with the updated logical interface 2811G (e.g., associate the data segment 2812 with the reference entry 2891 and/or the linked indirect entries 2894 and 2895), as disclosed herein. The deduplication operations may further comprise updating the contextual format of the data segment 2812 to be consistent with the modified logical interface 2811G, as disclosed above. Updating the contextual format may comprise relocating (e.g., rewriting) the data segment 2812 in an updated contextual format 2898 to new media storage locations (e.g., media storage locations 84432-84556) in one or more background operations. The updated contextual format 2898 may comprise persistent metadata 2864 that includes logical interface metadata 2865 to associate the data segment 2812 with the reference entry 2891 (e.g., identifiers 0Z-1023Z). - Although
FIGS. 28A-G depict cloning and/or deduplicating a single entry or range of LIDs, the disclosure is not limited in this regard. In some embodiments, a plurality of LID ranges may be cloned in a single clone operation. For example, referring back to FIG. 12, a cloning operation may clone the entry 1214 along with all of its child entries. In another example, a clone operation may comprise copying the entire contents of the index 1204 (e.g., all of the entries in the index 1204). This type of clone operation may be used to create a “snapshot” of a logical address space 136 (or a particular LID range). As used herein, a snapshot refers to the state of a storage device (or set of LIDs) at a particular point in time. The snapshot may persist the state of a logical address range despite changes to the original. -
FIG. 28N depicts one embodiment of a storage layer configured to perform snapshot operations. The FIG. 28N embodiment pertains to an address range within a logical address space 136 (logical address range 2904). The disclosure is not limited in this regard, however, and could be adapted for use with other types of address ranges, such as ranges and/or extents within the virtual address space 2836, as disclosed above. - At
time t1 2913A, the storage layer 130 may be configured to create a snapshot of the logical address range LAS1. As used herein, a snapshot of an address range refers to an operation that is configured to maintain the state of the address range at a particular time (e.g., freeze the address range). The snapshot operation may comprise preserving the state of the logical address range (LAS1) at a particular time. The snapshot operation may further comprise preserving the logical address range while allowing subsequent storage operations to be performed within the logical address range. - As disclosed above, the
storage layer 130 may be configured to store data in an ordered log by use of, inter alia, the log storage module 137. The log order of storage operations may be determined using sequence information associated with the data, such as sequence indicators on storage divisions 253 of a solid-state storage medium (e.g., logical storage element 229 of FIG. 3A) and/or sequential storage locations within the physical address space of the storage device 120 (as disclosed in conjunction with FIG. 3C). - The
storage controller 140 may be further configured to maintain other types of ordering and/or timing information, such as the relative time ordering of data in the log. However, in some embodiments, the log order of data may not accurately reflect timing information. As disclosed above, the groomer module 370 may be configured to relocate data on the storage device 120. Relocating data may comprise reading the data from its original storage location on the storage device 120 and appending the data at a current append point in the log. As such, older, relocated data may be stored with newer, current data in the log. - In some embodiments, the
log storage module 137 is configured to associate data with timing information, which may be used to establish relative timing information of the storage operations performed in the log. In some embodiments, the timing information may comprise respective timestamps (maintained by the timing module 2862), which may be applied to each data packet stored in the log. The timestamps may be stored within persistent metadata 2864 of the data packets (e.g., in packet headers). Alternatively, or in addition, the timing module 2862 may be configured to track timing information at a higher level of granularity. In some embodiments, the timing module 2862 maintains one or more global timing indicators (an epoch identifier). As used herein, an “epoch identifier” refers to an identifier used to determine relative timing of storage operations performed through the storage layer 130. The log storage module 137 may be configured to include an indicator 2869 of the current epoch identifier in the persistent metadata 2864; the epoch indicator 2869 may correspond to the epoch in which the data segment 2812 was written to the log. The timing module 2862 may be configured to increment the global epoch identifier in response to certain events, such as the creation of new snapshots, user requests, and/or the like. The epoch indicator 2869 of the data segment 2812 may remain unchanged through relocation and/or other grooming operations. Accordingly, the epoch indicator 2869 may correspond to the original storage time of the data segment 2812 independent of the relative position of the contextual data format (packet 2918) in the log. - As disclosed above, a snapshot operation may comprise preserving the state of a particular logical address range (LAS1) at a particular time. A snapshot operation may, therefore, comprise preserving data pertaining to the LAS1 on the
storage device 120. Preserving the data may comprise a) identifying data pertaining to a particular timeframe (epoch), and b) preserving the identified data on the storage device 120 (e.g., preventing the identified data from being removed from the storage device 120 in, inter alia, grooming operations). Data that needs to be preserved for a particular snapshot may be identified by use of the epoch indicators 2869 disclosed above. - In
state 2873A (time t1, denoted by epoch indicator e0), the storage layer 130 may receive a request to implement a snapshot operation through the interface 138. In response to the request, the snapshot module 2860 may determine the current value of the epoch identifier maintained by the timing module 2862. The current value of the epoch identifier may be referred to as the current “snapshot epoch.” In the FIG. 28N embodiment, the snapshot epoch is 0. The snapshot module 2860 may be further configured to cause the timing module 2862 to increment the current, global epoch indicator (e.g., increment the epoch identifier to 1). Creating the snapshot may further comprise storing a persistent note 2966 on the storage device 120. The persistent note 2966 may indicate the current, updated epoch indicator, and may further indicate that data pertaining to the snapshot epoch is to be preserved. The persistent note 2966 may be used during a metadata reconstruction operation to a) determine the current epoch identifier and to b) configure the snapshot module 2860 and/or groomer module 370 to preserve data associated with the snapshot epoch e0. - The
snapshot module 2860 may be further configured to instruct the groomer 370 to preserve data associated with the snapshot epoch. In response, the groomer 370 may be configured to a) identify data to preserve for the snapshot (snapshot data), and b) prevent the identified data from being removed from the storage device 120 in, inter alia, grooming operations (e.g., storage recovery operations). The groomer module 370 may identify snapshot data in reference to the epoch indicators 2869 associated with the data. As disclosed in conjunction with FIG. 3C, data may be written out-of-place on the storage device 120. The most current version of data associated with a particular LID (or LID range) may be determined based on the order of the corresponding data packets 2918 within the log. The groomer 370 may be configured to identify the most current version of data within the snapshot epoch as data that needs to be preserved. Data that has been rendered obsolete by other data in the snapshot epoch may be removed. Referring to the FIG. 3C embodiment, if the data A and A′ (associated with the same LIDs) were both marked with the snapshot epoch 0, the groomer module 370 would identify the most current version of the data in epoch 0 as A′, and would identify the data A for removal. - In
state 2873B, the snapshot module 2860 may be configured to preserve data pertaining to the snapshot LAS1 (data associated with epoch e0), while allowing storage operations to continue to be performed during subsequent epochs (e.g., epoch e1). The storage operations may comprise storing data on the storage device. The data may be stored with an indicator of the current epoch (e1). The snapshot module 2860 may be configured to preserve data that is rendered obsolete and/or invalidated by storage operations performed during epoch e1 (and subsequent epochs). Referring back to the FIG. 3C embodiment, the groomer module 370 may identify data A′ as data to preserve for the snapshot LAS1 (the data A′ may be the most current version within epoch e0). The snapshot module 2860 and/or groomer 370 may be configured to preserve the data A′ even if the corresponding LIDs are trimmed and/or deleted during epoch e1. Similarly, the data A′ may be preserved in response to overwriting the data with a new version A″ during epoch e1. - The snapshot for LAS1 (data marked with epoch indicator e0) may be preserved until it is deleted. The snapshot may be deleted in response to a request received through the
interface 138. As indicated in state 2873C, the epoch 0 may persist on the storage device 120 even after other, intervening epochs (epochs e1-eN) have been created and/or deleted. Deleting the epoch e0 may comprise configuring the snapshot module 2860 and/or groomer module 370 to remove invalid/obsolete data associated with the epoch e0. - The storage operations performed after creating the snapshot at 2873A may modify the
logical address space 136 and, specifically, the index 2804. The modifications may comprise updating LID-to-media address bindings in response to appending data to the storage device 120, adding LIDs, removing and/or trimming LIDs, and so on. In some embodiments, the snapshot module 2860 is configured to preserve the LAS1 index in a separate storage location, such as a separate location in the logical address space 136, in a separate namespace, or the like. Alternatively, the snapshot module 2860 may allow the changes to take place in the index 2804 without preserving the original version of the index 2804 for LAS1 at time t1. The snapshot module 2860 may be configured to reconstruct the index 2804 for LAS1 at time t1 using the data stored in the contextual, log-based data format on the storage device 120. The LAS1 at time t1 may be reconstructed as disclosed above, which may comprise sequentially accessing data stored on the storage device 120 (in a log-order), and creating index entries based on persistent metadata 2864 associated with the data packet 2918. In the FIG. 28N embodiment, the LAS1 may be reconstructed by referencing data packets 2918 that are marked with the epoch indicator 2869 e0 (or lower). Data associated with epoch indicators 2869 greater than e0 may be ignored (since such data corresponds to operations after creation of the snapshot LAS1). -
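The epoch-based reconstruction described above may be sketched as follows. The packet layout `(lid, epoch, data)`, the function names, and the sample log are hypothetical; the sketch further assumes, for simplicity, that packets sharing an epoch appear in the log in the order they were written (i.e., it does not model reordering caused by relocation).

```python
def reconstruct_index(log, snapshot_epoch):
    """Rebuild LID -> log position from packets at or before snapshot_epoch."""
    index = {}
    for position, (lid, epoch, _data) in enumerate(log):
        if epoch > snapshot_epoch:
            continue  # written after the snapshot was created; ignore
        index[lid] = position  # later packets supersede earlier ones
    return index


def preserved_positions(log, snapshot_epoch):
    """Log positions a groomer would retain for the snapshot."""
    return set(reconstruct_index(log, snapshot_epoch).values())


# Packets are listed in log order (oldest first).
log = [
    (1, 0, b"A"),    # rendered obsolete within epoch 0
    (1, 0, b"A'"),   # most current version in epoch 0
    (2, 0, b"B"),
    (1, 1, b"A''"),  # overwrite performed during epoch 1
]
snap = reconstruct_index(log, 0)      # state of the range at the snapshot
current = reconstruct_index(log, 1)   # current state of the range
```

In this sketch the packet holding A may be reclaimed, while the packet holding A′ is retained for the snapshot even though it was overwritten during epoch 1.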
FIG. 28O depicts one example of a move operation. Theindex 2804 includesentries 2877 configured to bind LIDs 1023-1025 to respective data segments on the storage device. Theentries 2877 are depicted separately to better illustrate details of the embodiment, however, theentries 2877 could be included in a single entry comprising a range of LIDs 1023-1025. Theentries 2877 define alogical interface 2811A of the data stored atmedia storage locations entries 2877 may be stored in a contextual format that associates the data with the correspondingLIDs - The
storage layer 130 may be configured to implement a move operation. The move operation may comprise modifying the logical interface to thedata 2811B by, inter alia, replacing the association between theLIDs media storage locations logical interface 2811B for the data that includes a new set of LIDs (e.g., 9215, 9216, and 9217). The move operation may be performed in response to a request received via theinterface 138 and/or as part of a higher-level storage operation (e.g., a request to rename a file, operations to balance and/or defragment theindex 2804, or the like). - The move operation may be implemented in accordance with one or more of the cloning embodiments disclosed above. In some embodiments, the move operation may comprise associating the media addresses mapped to
LIDs destination LIDs logical interface 2811B of the data in accordance with the move operation. The move operation may further comprise storing apersistent note 2866 on thestorage device 120 to ensure that the move operation is persistent and crash safe. The data stored at media addresses 32, 872, and 3096 may be re-written in accordance with the updatedlogical interface 2811B in one or more background operations, as disclosed above. -
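A minimal sketch of the move operation described above follows, using the LIDs (1023-1025 and 9215-9217) and media addresses (32, 3096, 872) of the example. The function name `range_move`, the in-memory `notes` list standing in for the persistent note 2866, and the pairing of LID 1025 with media address 872 are assumptions made for illustration; the sketch also assumes the source and destination ranges do not overlap.

```python
def range_move(index, src_lids, dst_lids, notes):
    """Rebind media addresses from src_lids to dst_lids without rewriting data."""
    if len(src_lids) != len(dst_lids):
        raise ValueError("source and destination ranges differ in length")
    for src, dst in zip(src_lids, dst_lids):
        index[dst] = index.pop(src)
    # Stand-in for a persistent note that makes the move crash safe.
    notes.append(("move", tuple(src_lids), tuple(dst_lids)))


index = {1023: 32, 1024: 3096, 1025: 872}   # LID -> media address
notes = []
range_move(index, [1023, 1024, 1025], [9215, 9216, 9217], notes)
```

The data itself is not touched by the call; rewriting it in a contextual format consistent with the new LIDs could be deferred to background operations.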
FIG. 28P depicts another embodiment of a move operation. As above, the move operation may comprise moving the data associated with LIDs 1023-1025 to LIDs 9215-9217. The move operation ofFIG. 28P may utilize the reference entries as disclosed inFIGS. 28F-J . Accordingly, the move operation may comprise creatingreference entries 2899 in areference index 2809 to represent the logical move operation. Thereference entries 2899 may comprise thepre-move LIDs addresses logical interface 2811B of the data, therefore, may comprise theindirect LIDs entries 2879 and thecorresponding reference entries 2899. The move operation may further comprise storing apersistent note 2866 on thestorage device 120 to ensure that the move operation is persistent and crash safe. - As disclosed herein, the contextual format of the data on the media addresses 32, 3096, and 872 may be inconsistent with the updated
logical interface 2811B; the contextual format of the data may associate the respective data segments withLIDs persistent note 2866 may comprise the updated logical interface for the data, so that the storage metadata 135 (e.g., index 2804) can be correctly reconstructed if necessary. - The
storage layer 130 may provide access to the data in the inconsistent contextual format through the modified logical interface 2811B (LIDs logical interface 2811B subsequent to the move operation (outside of the path of the move operation and/or other storage operations). In some embodiments, the data at media addresses 32, 3096, and/or 872 may be rewritten by a groomer module 370 in one or more background grooming operations, as described above. Therefore, the move operation may complete (and/or return an acknowledgement) in response to updating the index 2804 (and/or storing the persistent note 2866). - As illustrated in
FIG. 28Q, the index 2804 may be updated in response to storing data in the consistent contextual format. The data segment 2823 at media storage location 32 may be relocated in a grooming operation, which may comprise storing the data in a contextual format 2883 that is consistent with the modified logical interface 2811B of the move operation (e.g., includes persistent metadata 2864 comprising the logical interface 2865 that associates the data segment 2823 with LID 9215). The index 2804 may be updated to reference the data in the updated contextual format, which may comprise modifying the entry for LID 9215, such that it no longer is linked to the reference entry for 1023. The entry for LID 9215 may revert from an indirect node to a standard index entry and the reference entry for LID 1023 may be removed. - Referring to
FIG. 28R, a storage client 116 may modify data associated with LID 9217, which may comprise storing the modified data out-of-place (at media address 772). The data may be written in a contextual format that is consistent with the modified logical interface 2811B (e.g., associates the data with LID 9217). In response, the index 2804 may be updated to associate the entry 9217 with the media storage location of the modified data (e.g., media storage location 772), and to remove the reference entry for LID 1025, as disclosed above. - In some embodiments, the
reference index 2809 may be maintained separately from the index 2804, such that the entries therein (e.g., entries 2899) cannot be directly referenced by storage clients 116. This segregation of the logical address space 136 may allow storage clients to operate more efficiently. For example, rather than stalling operations until data is rewritten and/or relocated in the updated contextual format, data operations may proceed while the data is rewritten in one or more processes outside of the path for servicing storage operations and/or requests. Referring to FIG. 28S, following the move operation disclosed above, a storage client 116 may store data in connection with the LID 1024. The reference entry 2899 corresponding to LID 1024 may be included in the reference index 2809, due to, inter alia, the data at 3096 not yet being rewritten in the updated contextual format. However, since the reference index 2809 is maintained separately from the index 2804, a name collision may not occur, and the storage operation may complete. The index 2804 may include a separate entry 2857 comprising the logical interface for the data stored at media storage location 4322, while continuing to provide access to the data formerly bound to 1024 through the logical interface 2811B by use of the reference index 2809. - When the
entries 2879 are no longer linked to any entries in the reference index 2809 (due to, inter alia, rewriting, relocating, modifying, deleting, and/or overwriting the data), the last of the reference entries 2899 may be removed. In addition, the persistent note associated with the move operation may be invalidated and/or removed from the storage device 120, as disclosed above. - Referring back to
FIG. 1, the interface 138 of the storage layer 130 may be configured to provide APIs and/or interfaces for performing the storage operations disclosed herein. The APIs and/or interfaces may be exposed through one or more of the block interface 131, an extended virtual storage interface 132, and/or the like. The block interface 131 may be extended to include additional APIs and/or functionality by use of interface extensions such as fadvise parameters, I/O control parameters, and the like. The interface 138 may provide APIs to perform range clone operations, range move operations, range merge operations, apply attributes and/or metadata to ranges (e.g., freeze a range), manage range snapshots, and the like. As disclosed above, a range clone operation comprises creating a logical copy of a set of one or more source LIDs. Range clone operations may be implemented using any of the embodiments disclosed herein including, but not limited to: the range clone embodiments depicted in FIGS. 28A-E (including the range merge embodiment of FIG. 28E), the reference entry embodiments of FIGS. 28F-J, and/or the two-layer mapping embodiments of FIGS. 28K-L. As disclosed above in conjunction with FIGS. 28A-E, the disclosed embodiments may be further configured to implement range move operations. - The lower-level interfaces disclosed herein may be used to implement higher-level operations, such as deduplication, file-level snapshots, efficient file copy operations (logical file copies), address space management, mmap checkpoints, atomic writes, and the like. These higher-level operations may also be exposed through the
interface 138 of the storage layer 130. -
FIG. 29A depicts one embodiment of a storage layer 130 configured to provide storage services to a file system 2916. The file system 2916 may be configured to leverage functionality of the storage layer 130 to reduce complexity, overhead, and the like. For example, the file system 2916 may delegate crash recovery functionality to the storage layer 130, as disclosed above. The file system 2916 may be further configured to leverage the range clone functionality of the storage layer to implement efficient file-level snapshot and/or copy operations. The file system 2916 may be configured to implement such operations in response to a request (e.g., a copy command, a file snapshot ioctl, or the like). The file system 2916 may be configured to implement efficient file copy and/or file-level snapshot operations on a source file by, inter alia: a) flushing dirty pages of the source file (if any), b) creating a new destination file to represent the copied file and/or file-level snapshot, and c) instructing the storage module to perform a range clone operation configured to clone the source file to the destination file. -
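The three steps a) through c) above may be sketched as follows. The function `copy_file`, its parameters, the dictionary-based file table, and the file names are hypothetical; the LIDs 10-13 and 40-43 and media address labels P1-P5 are used only as sample values.

```python
def copy_file(files, dirty_pages, index, src, dst, alloc_lids):
    """Logical file copy: flush, create the destination, then range clone."""
    # a) Flush dirty pages of the source file (if any).
    for lid, media_addr in dirty_pages.pop(src, {}).items():
        index[lid] = media_addr
    # b) Create a new destination file with its own LIDs.
    src_lids = files[src]
    dst_lids = alloc_lids(len(src_lids))
    files[dst] = dst_lids
    # c) Range clone: destination LIDs reference the source file data.
    for s, d in zip(src_lids, dst_lids):
        index[d] = index[s]
    return dst_lids


files = {"file1": [10, 11, 12, 13]}                     # file -> LIDs
index = {10: "P1", 11: "P2", 12: "P3", 13: "P4"}        # LID -> media address
dirty_pages = {"file1": {12: "P5"}}                     # unflushed update
copy_file(files, dirty_pages, index, "file1", "file1.1",
          lambda n: list(range(40, 40 + n)))
```

No file data is copied in this sketch; the destination file merely acquires bindings to the media addresses already holding the source file data.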
FIG. 29A depicts various embodiments for implementing the range clone operation. In some embodiments, and as depicted in state 2911A, the storage layer 130 may be configured to maintain a logical address space 136 in which LIDs of the source file are mapped to source file data in the index 2804 (e.g., as disclosed in FIGS. 28A-E). The corresponding range clone operation depicted in state 2911B may comprise mapping the LIDs of the source file and the destination file to the source file data. The range clone operation may further comprise storing a persistent note 2866 on the storage device 120 to indicate that the file data is associated with both the source file LIDs and the destination file LIDs. The range clone operation may further comprise storing the file data in accordance with the updated contextual format, as disclosed herein. - In other embodiments, the clone operation may leverage a reference index 2809 (e.g., as disclosed in
FIGS. 28F-J). Before the range clone operation, in state 2911C, the LIDs of the source file may be directly mapped to the corresponding file data in the index 2804. Creating the range clone in state 2911D may comprise creating a reference entry in the reference index 2809 and associating the source file LIDs and destination file LIDs with the reference entry. The range clone operation may further comprise storing a persistent note 2866 on the storage device and/or updating the contextual format of the file data, as disclosed herein. - The
storage layer 130 may implement the clone operation using a two-layer mapping embodiment (e.g., as disclosed in FIGS. 28K-L). Initially, the source file may correspond to virtual identifiers (VIDs) in a virtual address space (VID index 2884), which may be mapped to file data LIDs in the logical address space 136 (in the index 2804). Performing the range clone operation may comprise associating the destination file VIDs with the LIDs of the intermediate mapping layer. The range clone operation may further comprise storing a persistent note on the storage device 120 indicating that the destination VIDs are associated with the file data LIDs. Since the file data is already bound to the intermediate file data LIDs, the contextual format of the file data may not need to be updated. - The
file system 2916 may be further configured to leverage the storage layer 130 to checkpoint mmap operations. As used herein, an “mmap” operation refers to an operation in which the contents of files are accessed as pages of memory through standard load and store operations rather than the standard read/write interfaces provided by the file system 2916. An “msync” operation refers to an operation to flush the dirty pages of the file (if any) to the storage device 120. The use of mmap operations may make file checkpointing difficult. File operations are performed in memory and an msync is issued when the state has to be saved. However, the state of the file after msync represents the current in-memory state and the last saved state is lost. If the file system 2916 were to crash during an msync, the file could be left in an inconsistent state. - In some embodiments, the
file system 2916 is configured to checkpoint the state of an mmap-ed file during calls to msync. Checkpointing the file may comprise creating a file-level snapshot (and/or range clone), as disclosed above. The file-level snapshot may be configured to save the state of the file before the changes are applied. When the msync is issued, another clone may be created to reflect the changes applied in the msync operation. As depicted in FIG. 29B, in state 2913 (prior to the mmap operation), file 1 may be associated with LIDs 10-13 and corresponding media addresses P1-P4. In response to the mmap operation, the file system 2916 may perform a range clone operation through the interface 138 of the storage layer 130, which may comprise creating a cloned file 1.1. The cloned file 1.1 may be associated with a different set of LIDs 40-43 that reference the same data (same media addresses P1-P4). In other embodiments, the files may be cloned using a reference entry embodiment and/or two-layer mapping embodiment, as disclosed above. - In response to an msync call, the
file system 2916 may perform another range clone operation (through the interface 138). As illustrated in state 2913C, the range clone operation associated with the msync operation may comprise updating the file 1 with the contents of one or more dirty pages (media addresses P5 and P6) and cloning the updated file 1 as file 1.2. The file 1.1 may reflect the state of the file before the msync operation. Accordingly, in the event of a failure, the file system 2916 may be capable of reconstructing the previous state of the file 1. - The
storage layer 130 may be further configured to implement efficient atomic storage operations. Referring to FIG. 29C, in some embodiments, the storage layer 130 comprises an atomic storage module 2932. As used herein, an atomic storage operation refers to a storage operation that is either fully completed, or is rolled back as a whole. Accordingly, atomic storage operations may not remain in a “partially completed” state. Implementing atomic storage operations, and particularly, atomic storage operations comprising multiple steps and/or pertaining to multiple different LID ranges or vectors, may impose high overhead costs. For example, some database systems implement atomic storage operations using multiple sets of redundant write operations. The atomic storage module 2932 may leverage the range clone, range move, and/or other operations disclosed herein to increase the efficiency of atomic storage operations. - In some embodiments, the
interface 138 provides APIs and/or interfaces for performing vectored atomic storage operations. A vector may be defined as a data structure, such as: -
struct iovect {
    uint64 iov_base; // Base address of memory region for input or output
    uint32 iov_len;  // Size of the memory referenced by iov_base
    uint64 dest_lid; // Destination logical identifier
}
- The iov_base parameter may reference a memory or buffer location comprising data of the vector, iov_len may refer to a length or size of the data buffer, and dest_lid may refer to the destination logical identifier(s) for the vector (e.g., a base logical identifier; the length of the logical identifier range may be implied by and/or derived from the input buffer length iov_len).
- A vector storage request to write data to one or more vectors may, therefore, be defined as follows:
-
vector_write(int fileids, const struct iovect *iov, uint32 iov_cnt, uint32 flag)
- The vector write operation above may be configured to gather data from each of the vector data structures referenced by the *iov pointer and/or specified by the vector count parameter (iov_cnt), and write the data to the destination logical identifier(s) specified in the respective iovect structures (e.g., dest_lid). The flag parameter may specify whether the vector write operation should be implemented as an atomic vector operation.
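By way of non-limiting illustration, the gather behavior described above may be sketched as follows. The sketch is an assumption-laden toy and not the disclosed implementation: the names toy_store, toy_vector_write, LID_COUNT, and SECTOR_SIZE are hypothetical, the fileids and flag parameters are omitted, iov_base is typed as a pointer, and an in-memory array stands in for the storage device.

```c
#include <stdint.h>
#include <string.h>

#define LID_COUNT   64  /* hypothetical size of the toy logical address space */
#define SECTOR_SIZE 8   /* hypothetical bytes per LID, kept small for the demo */

/* Vector descriptor modeled on the text; iov_base is a pointer for convenience. */
struct iovect {
    const void *iov_base;  /* base address of the input buffer */
    uint32_t    iov_len;   /* size of the buffer in bytes */
    uint64_t    dest_lid;  /* first destination logical identifier */
};

/* Toy backing store: SECTOR_SIZE bytes per LID, laid out contiguously. */
static uint8_t toy_store[LID_COUNT * SECTOR_SIZE];

/* Number of LIDs a vector covers, derived from iov_len (rounded up). */
static uint64_t lids_for_len(uint32_t len)
{
    return ((uint64_t)len + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Gather each vector and copy it to its destination LID range.
 * Returns 0 on success, -1 if any vector falls outside the toy space.
 * Every vector is validated first, so a rejected request writes nothing. */
static int toy_vector_write(const struct iovect *iov, uint32_t iov_cnt)
{
    for (uint32_t i = 0; i < iov_cnt; i++) {
        uint64_t n = lids_for_len(iov[i].iov_len);
        if (iov[i].dest_lid >= LID_COUNT || n > LID_COUNT - iov[i].dest_lid)
            return -1;
    }
    for (uint32_t i = 0; i < iov_cnt; i++)
        memcpy(toy_store + iov[i].dest_lid * SECTOR_SIZE,
               iov[i].iov_base, iov[i].iov_len);
    return 0;
}
```

In this sketch the length of each destination LID range is derived from iov_len, consistent with the description of dest_lid above; validating all vectors before copying is a design choice of the sketch and is not, by itself, an atomic storage operation.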
- As illustrated above, a vector storage request may comprise performing the same operation on each of a plurality of vectors (e.g., implicitly perform a write operation pertaining to one or more different vectors). In some embodiments, a vector storage request may specify different I/O operations for each constituent vector. Accordingly, each iovect data structure may comprise a respective operation indicator. In some embodiments, the iovect structure may be extended as follows:
-
struct iovect {
    uint64 iov_base; // Base address of memory region for input or output
    uint32 iov_len;  // Size of the memory referenced by iov_base
    uint32 iov_flag; // Vector operation flag
    uint64 dest_lid; // Destination logical identifier
}
- The iov_flag parameter may specify the storage operation to perform on the vector. The iov_flag may specify any suitable storage operation, which may include, but is not limited to: a write, a read, an atomic write, a trim or discard request, a delete request, a format request, a patterned write request (e.g., a request to write a specified pattern), a write zero request, an atomic write operation with verification request, an allocation request, or the like. The vector storage request interface described above may be extended to accept vector structures:
-
vector_request(int fileids, const struct iovect *iov, uint32 iov_cnt, uint32 flag)
- The flag parameter may specify whether the vector operations of the vector_request are to be performed atomically.
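By way of non-limiting illustration, a request in which each vector carries its own operation indicator may be dispatched as sketched below. The numeric encodings of iov_flag (IOV_WRITE, IOV_TRIM, IOV_WRITE_ZERO), the store and valid arrays, and the function names are hypothetical; the disclosure names the operations but does not assign values to them. One byte per LID is assumed to keep the example small.

```c
#include <stdint.h>
#include <string.h>

#define STORE_BYTES 256  /* hypothetical toy store, one byte per LID */

/* Hypothetical encodings for iov_flag; not taken from the disclosure. */
enum iov_op { IOV_WRITE = 1, IOV_TRIM = 2, IOV_WRITE_ZERO = 3 };

struct iovect {
    const void *iov_base;  /* input buffer (unused for trim and write-zero) */
    uint32_t    iov_len;   /* number of bytes (and LIDs) covered */
    uint32_t    iov_flag;  /* per-vector operation indicator */
    uint64_t    dest_lid;  /* first destination logical identifier */
};

static uint8_t store[STORE_BYTES];  /* data bound to each LID */
static uint8_t valid[STORE_BYTES];  /* 1 if the LID is currently bound */

/* Apply a single vector according to its iov_flag. */
static int apply_vector(const struct iovect *v)
{
    if (v->dest_lid >= STORE_BYTES || v->iov_len > STORE_BYTES - v->dest_lid)
        return -1;
    switch (v->iov_flag) {
    case IOV_WRITE:
        if (v->iov_base == NULL)
            return -1;
        memcpy(store + v->dest_lid, v->iov_base, v->iov_len);
        memset(valid + v->dest_lid, 1, v->iov_len);
        return 0;
    case IOV_WRITE_ZERO:
        memset(store + v->dest_lid, 0, v->iov_len);
        memset(valid + v->dest_lid, 1, v->iov_len);
        return 0;
    case IOV_TRIM:
        memset(valid + v->dest_lid, 0, v->iov_len);  /* unbind the LIDs */
        return 0;
    default:
        return -1;  /* unknown operation indicator */
    }
}

/* Process the vectors in order, stopping at the first failure. */
static int toy_vector_request(const struct iovect *iov, uint32_t iov_cnt)
{
    for (uint32_t i = 0; i < iov_cnt; i++)
        if (apply_vector(&iov[i]) != 0)
            return -1;
    return 0;
}
```

The sketch processes vectors sequentially and does not roll back earlier vectors on failure; an atomic variant would additionally redirect the operations as described in the paragraphs that follow.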
- The
atomic storage module 2932 may be configured to redirect storage operations pertaining to an atomic storage operation to a pre-determined range (an “in-process” range). The in-process range may be a designated portion of the logical address space 136 that is not accessible to the storage clients 116. Alternatively, the in-process range may be implemented in a separate address namespace. After the atomic storage operation has been completed within the in-process range (e.g., all of the constituent I/O vectors have been processed), the atomic storage module 2932 may perform an atomic range move operation to move the data from the in-process range to the destination range(s). As disclosed above, the range move operation may comprise writing a single persistent note 2866 to the storage device 120. - A
storage client 116 may issue an atomic write request pertaining tovectors FIG. 29C , before the atomic storage operation is performed (atstate 2915A), the LIDs 10-13 ofvector 2940A may be bound to media addresses P1-P4 and the LIDs 36-38 ofvector 2940B may be bound to media addresses P6-8. As depicted instate 2915B, theatomic storage module 2932 may be configured to redirect the atomic storage operations to an in-process index 2836. As disclosed above, the in-process index 2836 may comprise a designated region of thelogical address space 136 and/or may be implemented within a separate index and/or address namespace. Thevector 2942A within the in-processes index 2836 may correspond to the LIDs 10-13 ofvector 2940A and the in-process vector 2942B may correspond to the LIDs 36-38 ofvector 2940B. Thevectors vectors index 2804. Implementing the atomic storage operations instate 2915B may comprise appending data to thestorage device 120 in association with the in-process LIDs Z0-Z3 and/or Z6-Z6 of the in-process vectors process index 2936. - If the atomic storage operation fails before completion, the original data of
vectors - As illustrated in
FIG. 29C, in state 2915B, the atomic storage operation(s) may be completed within the in-process index 2936. Completion of the atomic storage request may comprise performing a range move operation to move the data written to the in-process vectors logical address space 136. The range move operation may comprise performing an atomic storage operation to store a persistent note on the storage device to bind the media addresses P9-P13 to LIDs 10-13 and P100-102 to LIDs 36-38. The range move operation may be implemented in other ways including, but not limited to: the reference entry embodiments of FIGS. 28F-J and/or the two-layer mapping embodiments of FIGS. 28K-L. -
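By way of non-limiting illustration, the redirect-then-move pattern described above may be sketched as follows. The sketch is a toy model under stated assumptions: index_map and inproc_map are hypothetical in-memory stand-ins for the index and the in-process index, next_media models the append point of the log, the fail_after parameter is a hypothetical hook for simulating a failure, and the single commit loop stands in for the one persistent note of the range move.

```c
#include <stdint.h>

#define NLIDS   64  /* client-visible LIDs (hypothetical size) */
#define UNBOUND 0u  /* media address 0 is used to mean "not bound" */

static uint32_t index_map[NLIDS];   /* LID -> media address */
static uint32_t inproc_map[NLIDS];  /* in-process entry for each LID */
static uint32_t next_media = 1;     /* append point of the toy log */

struct lid_range { uint32_t first; uint32_t count; };

/* Atomically rewrite every LID of every vector, or none of them.
 * fail_after: number of appends allowed before a simulated failure
 * (a negative value means the operation never fails). */
static int atomic_write(const struct lid_range *vec, int nvec, int fail_after)
{
    int appended = 0, ok = 1;

    for (int v = 0; v < nvec; v++)
        if (vec[v].first >= NLIDS || vec[v].count > NLIDS - vec[v].first)
            return -1;

    /* Phase 1: append data bound only to the in-process entries. */
    for (int v = 0; v < nvec && ok; v++)
        for (uint32_t i = 0; i < vec[v].count; i++) {
            if (fail_after >= 0 && appended == fail_after) { ok = 0; break; }
            inproc_map[vec[v].first + i] = next_media++;
            appended++;
        }

    /* Phase 2: range move on success, discard on failure. The original
     * bindings in index_map are untouched unless every append succeeded. */
    for (int v = 0; v < nvec; v++)
        for (uint32_t i = 0; i < vec[v].count; i++) {
            uint32_t lid = vec[v].first + i;
            if (ok && inproc_map[lid] != UNBOUND)
                index_map[lid] = inproc_map[lid];
            inproc_map[lid] = UNBOUND;
        }
    return ok ? 0 : -1;
}
```

Because the destination bindings change only in the final phase, a failure at any earlier point leaves the original data reachable through the original LIDs, which is the property the in-process range is intended to provide.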
FIG. 30 is a flow diagram of one embodiment of a method 3000 for managing a logical interface of data stored in a contextual format on a non-volatile storage medium. -
Step 3020 may comprise modifying a logical interface of data stored in a contextual format on a non-volatile storage media. The logical interface may be modified at step 3020 in response to performing an operation on the data, which may include, but is not limited to: a clone operation, a deduplication operation, a move operation, or the like. The request may originate from a storage client 116, the storage layer 130 (e.g., deduplication module 374), or the like. - Modifying the logical interface may comprise modifying the LID(s) associated with the data, which may include, but is not limited to: referencing the data using one or more additional LIDs (e.g., clone, deduplication, etc.), changing the LID(s) associated with the data (e.g., a move), or the like. The modified logical interface may be inconsistent with the contextual format of the data on the
non-volatile storage media 122, as described above. -
Step 3020 may further comprise storing a persistent note on the non-volatile storage media 122 that identifies the modification to the logical interface. The persistent note may be used to make the logical operation persistent and crash safe, such that the modified logical interface (e.g., storage metadata 135) of the data may be reconstructed from the contents of the non-volatile storage media 122 (if necessary). Step 3020 may further comprise acknowledging that the logical interface has been modified (e.g., returning from an API call, returning an explicit acknowledgement, or the like). The acknowledgement (and access through the modified logical interface at step 3030) may occur before the contextual format of the data is updated on the non-volatile storage media 122. Accordingly, the logical operation may not wait until the data is rewritten and/or relocated; as discussed below, updating the contextual format of the data may be deferred and/or implemented in a process that is outside of the “critical path” of the method 3000 and/or the path for servicing other storage operations and/or requests. -
Step 3030 may comprise providing access to the data in the inconsistent contextual format through the modified logical interface of step 3020. As described above, updating the contextual format of the data to be consistent with the modified logical interface may comprise rewriting and/or relocating the data on the non-volatile storage media, which may impose additional latency on the operation of step 3020 and/or other storage operations pertaining to the modified logical interface. Therefore, the storage layer 130 may be configured to provide access to the data in the inconsistent contextual format while (or before) the contextual format of the data is updated. Providing access to the data at step 3030 may comprise referencing and/or linking to one or more reference entries corresponding to the data (via one or more indirect entries), as described above. -
Step 3040 may comprise updating the contextual format of the data on the non-volatile storage media 122 to be consistent with the modified logical interface of step 3020. Step 3040 may comprise rewriting and/or relocating the data to another media storage location on the non-volatile storage media 122 and/or on another non-volatile storage device 120A-N. As described above, step 3040 may be implemented using a process that is outside of the critical path of step 3020 and/or other storage requests performed by the storage layer 130; step 3040 may be implemented by another, autonomous module, such as groomer module 370, deduplication module 374, or the like. Accordingly, the contextual format of the data may be updated independent of servicing other storage operations and/or requests. As such, step 3040 may comprise deferring an immediate update of the contextual format of the data, and updating the contextual format of the data in one or more “background” processes, such as a groomer process. Alternatively, or in addition, updating the contextual format of the data may occur in response to (e.g., along with) other storage operations. For example, a subsequent request to modify the data may cause the data to be rewritten out-of-place and in the updated contextual format (e.g., as described above in connection with FIG. 29C). -
Step 3040 may further comprise updating storage metadata 135 as the contextual format of the data is updated. As data is rewritten and/or relocated in the updated contextual format, the storage layer 130 may update the storage metadata 135 (e.g., index) accordingly. The updates may comprise removing one or more links to reference entries in a reference index and/or replacing indirect entries with local entries, as described above. Step 3040 may further comprise invalidating and/or removing a persistent note from the non-volatile storage media 122 in response to updating the contextual format of the data and/or persisting the storage metadata 135, as described above. -
FIG. 31 is a flow diagram of another embodiment of a method 3100 for managing a logical interface of data stored in a contextual format on a non-volatile storage media. The method 3100 may be implemented by one or more modules and/or components of the storage controller 140, such as the groomer module 370, disclosed herein. -
Step 3120 comprises selecting a storage division for recovery, such as an erase block or logical erase block. As described above, the selection of step 3120 may be based upon a number of different factors, such as a lack of available storage capacity, detecting a percentage of data marked as invalid within a particular logical erase block reaching a threshold, a consolidation of valid data, an error detection rate reaching a threshold, improving data distribution, data refresh, or the like. Alternatively, or in addition, the selection criteria of step 3120 may include whether the storage division comprises data in a contextual format that is inconsistent with a corresponding logical interface thereof, as described above. - As discussed above, recovering (or reclaiming) a storage division may comprise erasing the storage division and relocating valid data thereon (if any) to other storage locations on the non-volatile storage media.
Step 3130 may comprise determining whether the contextual format of data to be relocated in a grooming operation should be updated (e.g., is inconsistent with the logical interface of the data). Step 3130 may comprise accessing storage metadata 135, such as the indexes described above, to determine whether the persistent metadata (e.g., logical interface metadata) of the data is consistent with the storage metadata 135 of the data. If the persistent metadata is not consistent with the storage metadata 135 (e.g., associates the data with different LIDs, as described above), the flow continues at step 3140; otherwise, the flow continues at step 3150. -
Step 3140 may comprise updating the contextual format of the data to be consistent with the logical interface of the data. Step 3140 may comprise modifying the logical interface metadata to reference a different set of LIDs (and/or reference entries), as described above. -
Step 3150 comprises relocating the data to a different storage location in a log format that, as described above, preserves an ordered sequence of storage operations performed on the non-volatile storage media. Accordingly, the relocated data (in the updated contextual format) may be identified as the valid and up-to-date version of the data when reconstructing the storage metadata 135 (if necessary). Step 3150 may further comprise updating the storage metadata 135 to bind the logical interface of the data to the new media storage locations of the data, remove indirect and/or reference entries to the data in the inconsistent contextual format, and so on, as disclosed herein. -
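By way of non-limiting illustration, steps 3130 through 3150 may be sketched as a single relocation routine. The sketch assumes a toy model that is not part of the disclosure: struct packet carries the LID recorded in the on-media header (the contextual format), reverse_map stands in for the storage metadata that associates each media slot with its current LID, and append_point models the head of the log. All names are hypothetical.

```c
#include <stdint.h>

#define MEDIA_SLOTS 16
#define NO_LID      UINT64_MAX

/* Toy on-media packet: a header LID (contextual format) plus a payload. */
struct packet { uint64_t header_lid; uint32_t payload; int valid; };

static struct packet media[MEDIA_SLOTS];
static uint64_t reverse_map[MEDIA_SLOTS];  /* media slot -> LID per storage metadata */
static uint32_t append_point;              /* next free slot at the head of the log */

/* Relocate one valid packet during grooming.
 * Returns the new slot, or -1 if nothing was relocated. */
static int groom_slot(uint32_t slot)
{
    if (slot >= MEDIA_SLOTS || !media[slot].valid || append_point >= MEDIA_SLOTS)
        return -1;

    struct packet p = media[slot];
    uint64_t lid = reverse_map[slot];

    /* Step 3130: is the persistent metadata consistent with the index? */
    if (p.header_lid != lid)
        p.header_lid = lid;  /* Step 3140: update the contextual format */

    /* Step 3150: append at the head of the log and rebind the metadata. */
    uint32_t dst = append_point++;
    media[dst] = p;
    reverse_map[dst] = lid;
    media[slot].valid = 0;
    reverse_map[slot] = NO_LID;
    return (int)dst;
}
```

In the sketch, the header is corrected only as a side effect of a relocation that the groomer would perform anyway, which reflects the deferred, off-critical-path update described above.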
FIG. 32 is a flow diagram of another embodiment of a method 3200 for managing logical interfaces of data stored in a contextual format. Step 3215 may comprise identifying duplicate data on one or more storage devices 120. Step 3215 may be performed by a deduplication module 374 operating within the storage layer 130. Alternatively, step 3215 may be performed by the storage layer 130 as storage operations are performed. -
Step 3215 may comprise determining and/or verifying that the non-volatile storage media 122 comprises duplicate data (or already comprises data of a write and/or modify request). Accordingly, step 3215 may occur within the path of a storage operation (e.g., as or before duplicate data is written to the non-volatile storage media 122) and/or may occur outside of the path of servicing storage operations (e.g., identify duplicate data already stored on the non-volatile storage media 122). Step 3215 may comprise generating and/or maintaining data signatures in storage metadata 135, and using the signatures to identify duplicate data. - In response to identifying the duplicate data at
step 3215, the storage layer 130 (or other module, such as the deduplication module 374) may modify a logical interface of a copy of the data, such that a single copy may be referenced by two (or more) sets of LIDs. The modification to the logical interface at step 3220 may comprise updating storage metadata 135 and/or storing a persistent note on the non-volatile storage media 122, as described above. Step 3220 may further comprise invalidating and/or removing other copies of the data on the non-volatile storage media, as described above. - The contextual format of the data on the
non-volatile storage media 122 may be inconsistent with the modified logical interface. Therefore, steps 3230 and 3240 may comprise providing access to the data in the inconsistent contextual format through the modified logical interface and updating the contextual format of the data on the non-volatile storage media 122, as described above. - Referring back to the cloning embodiments depicted in
FIGS. 28A and 28B, in other examples, clone operations may be used to perform atomic operations, such as multi-step writes or transactions. An atomic operation to modify data in a particular logical address range may comprise creating a clone of the logical address range, implementing storage operations within the clone, and, when the operations complete, “folding” the clone back into the logical address space 136 (e.g., overlaying the original logical address range with the clone). As used herein, “folding” a logical address range refers to combining two or more address ranges together (e.g., folding a logical address range with a clone thereof). The folding may occur according to one of a plurality of operational modes, which may include, but are not limited to: an “overwrite” mode, in which the contents of one logical address range “overwrite” the contents of another logical address range, a “merge” mode, in which the contents of the logical address ranges are merged together (e.g., in a logical OR operation), or the like. -
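By way of non-limiting illustration, the “overwrite” and “merge” folding modes may be sketched as follows. The sketch makes assumptions that go beyond the text: each range is modeled as an array of media addresses with 0 meaning unbound, and “merge” is interpreted as a union of bindings in which the original binding is kept wherever one exists. This is only one possible reading of the logical OR operation mentioned above. The names fold_mode and fold_clone are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical folding modes corresponding to the modes named in the text. */
enum fold_mode { FOLD_OVERWRITE, FOLD_MERGE };

/* Fold a clone back into the original range.
 * orig[i] and clone[i] hold the media address bound to offset i (0 = unbound).
 * OVERWRITE: the original takes every binding of the clone.
 * MERGE:     the original keeps its own bindings and takes the clone's
 *            binding only where the original is unbound (assumed semantics). */
static void fold_clone(uint32_t *orig, uint32_t *clone, size_t n,
                       enum fold_mode mode)
{
    for (size_t i = 0; i < n; i++) {
        if (mode == FOLD_OVERWRITE || orig[i] == 0)
            orig[i] = clone[i];
        clone[i] = 0;  /* the clone's entries are released after folding */
    }
}
```

Folding in this model changes only index entries; no data is copied, which is why a fold may complete quickly regardless of the size of the range.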
FIG. 33A depicts one example of a clone between entries 2814 and 2824 in the index 3304. Following the clone operation, a storage client modified the data within the clone 972-983, with the updated data being stored at media storage locations 195-206. Folding the clone 2824 back into the entry 2814 in an “overwrite” mode results in the entry 2814 being bound to the media storage locations of the clone 2824 (195-206). Portions of the clone 2824 that were not modified (if any) may remain unchanged in the entry 2814. - In another example, in which the LID range of the clone was modified (e.g., data was appended or deleted from the clone), the
LID 2814 would be modified in a corresponding way. Accordingly, a folding operation may comprise allocation of additional LIDs in the logical address space 136. Therefore, in some embodiments, clones may be tied to one another (e.g., using entry metadata 2819 and/or 2829). An extension to a clone, such as entry 2824, may be predicated on the logical address range being available to the original entry 2814. The link between the entries may be predicated on the “mode” of the clone as described above. For example, if the entries are not to be “folded” at a later time, the clones may not be linked. -
FIG. 33B depicts another example of a folding operation using reference and indirect entries. The clones 3314 and 3324 are linked to reference entries 3395 in a reference index 3390 associated with data of the clone. A storage client 116 may modify one clone 3324, resulting in modified data being bound to the clone 3324 (e.g., entry 9217 is bound to media storage location 8923). Accordingly, the clone 3324 has diverged from the clone 3314. When folding the clone 3324 into the clone 3314, the modified data of 9217 may overwrite the original data (e.g., the data at media storage location 872). - As described above, clones may be “tied” together, according to an operational mode of the clones. For example, changes to a clone may be automatically mirrored in the other clone. This mirroring may be uni-directional, bi-directional, or the like. The nature of the tie between clones may be maintained in storage metadata (e.g.,
metadata entries 2819 and/or 2829). The storage layer 130 may access the metadata entries 2819 and/or 2829 when storage operations are performed within the LID ranges 2815 and/or 2825 to determine what, if any, synchronization operations are to be performed. - In some embodiments, data of a clone may be designated as ephemeral, as described above. Accordingly, if upon reboot (or another condition), the ephemeral designation is not removed, the clone may be deleted (e.g., invalidated as described above).
FIG. 34 is a flow diagram of another embodiment of a method for cloning ranges of a logical address space 136. -
Step 3420 may comprise receiving a request to create a clone. The request may be received from a storage client 116 through an interface 138 and/or may be part of a higher-level API provided by the storage layer 130. The request may include an “operational mode” of the clone, which may include, but is not limited to: how the clones are to be synchronized, if at all, how folding is to occur, whether the copy is to be designated as ephemeral, and so on. -
Step 3430 may comprise allocating LIDs in the logical address space 136 to service the request. The allocation of step 3430 may further comprise reserving physical storage space to accommodate changes to the clone. The reservation of physical storage space may be predicated on the operational mode of the clone. For instance, if all changes are to be synchronized between the clone and the original address range, only a small portion (if any) of physical storage space may be reserved. Step 3430 may further comprise allocating the clone within a designated portion or segment of the logical address space 136 (e.g., a range dedicated for use with clones). -
Step 3440 may comprise updating the logical interface of data of the clone, as described above. Step 3440 may further comprise storing a persistent note on the non-volatile storage media to make the clone persistent and crash safe, as described above. - Step 3450 may comprise receiving a storage request and determining if a storage request pertains to the original LID range and/or the clone of the LID range. If so, the flow continues to step 3460, otherwise, the flow remains on step 3450.
-
Step 3460 may comprise determining what (if any) operations are to be taken on the other associated LID ranges (e.g., synchronize changes, allocate logical and/or physical storage resources, or the like). The determination of step 3460 may comprise accessing storage metadata describing the operational mode of the clone and/or the nature of the “tie” (if any) between the original LIDs and the clone thereof. -
Step 3470 may comprise performing the operations (if any) determined at step 3460 along with the requested storage operation. If one or more of the synchronization operations cannot be performed (e.g., additional logical address space 136 cannot be allocated), the underlying storage operation may fail. -
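By way of non-limiting illustration, steps 3450 through 3470 may be sketched for a pair of tied ranges. The tie modes (TIE_NONE, TIE_ORIG_TO_CLONE, TIE_BIDIRECTIONAL), the struct tied_ranges layout, and the tied_write function are hypothetical; the disclosure states that the nature of the tie is kept in storage metadata but does not define an encoding.

```c
#include <stdint.h>

#define RANGE_LEN 4  /* hypothetical number of LIDs in each range */

/* Hypothetical encodings of the "tie" between an original range and its clone. */
enum tie_mode { TIE_NONE, TIE_ORIG_TO_CLONE, TIE_BIDIRECTIONAL };

struct tied_ranges {
    enum tie_mode mode;         /* operational mode recorded in metadata */
    uint32_t orig[RANGE_LEN];   /* media address bound to each original LID */
    uint32_t clone[RANGE_LEN];  /* media address bound to each clone LID */
};

/* Bind offset `off` of one range to `media_addr` and mirror the change to the
 * other range when the tie requires it.
 * Returns the number of ranges updated, or -1 on a bad offset. */
static int tied_write(struct tied_ranges *t, int to_clone,
                      uint32_t off, uint32_t media_addr)
{
    if (off >= RANGE_LEN)
        return -1;

    uint32_t *target = to_clone ? t->clone : t->orig;
    uint32_t *other  = to_clone ? t->orig  : t->clone;

    /* Step 3460: consult the tie to decide whether to synchronize. */
    int mirror = (t->mode == TIE_BIDIRECTIONAL) ||
                 (t->mode == TIE_ORIG_TO_CLONE && !to_clone);

    /* Step 3470: perform the requested operation plus any synchronization. */
    target[off] = media_addr;
    if (mirror)
        other[off] = media_addr;
    return mirror ? 2 : 1;
}
```

A fuller model would also fail the underlying write when a required synchronization cannot be performed, as step 3470 describes; the sketch omits resource allocation and therefore cannot exhibit that failure.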
FIG. 35 is a flow diagram of another embodiment of a method for managing clones of contextual data. Step 3521 may comprise creating a clone of a logical address range as disclosed herein. At step 3531, one or more storage operations within the original logical address range and/or the clone thereof are performed along with additional, synchronization operations (if any), as described above. - At
step 3541, a request to fold the clone is received. The request may specify an operational mode of the fold and/or the operational mode may have been specified when the clone was created at step 3521. -
Step 3551 comprises folding the clone back into the logical address space 136 of the original logical range. Step 3551 may comprise overwriting the contents of the original logical address range with the contents of the clone, merging the logical address ranges (e.g., in an OR operation), or the like. In some embodiments, the merging comprises deleting (e.g., invalidating) the clone, which may comprise removing entries of the clone from the storage metadata index, removing shared references to media storage locations from a reference count data structure, and the like. Step 3551 may further comprise modifying a logical interface of the merged data, as described above. The modified logical interface may change the LIDs used to reference the data. The modified logical interface may be inconsistent with the contextual format of the data on the non-volatile storage media 122. Therefore, step 3551 may further comprise providing access to the data in the inconsistent contextual format and/or updating the contextual format of the data, as described above. - As disclosed above, in some embodiments, the
storage layer 130 may be configured to segment the logical address space 136 into a plurality of contiguous LID ranges. As illustrated in FIG. 19A, a LID (e.g., address) 1900 is segmented into a first portion 1952 and a second portion 1954. In some embodiments, the first portion 1952 comprises “high-order” bits of the LID 1900, and the second portion comprises “low-order” bits. The first portion 1952 may serve as a reference or identifier, and the second portion 1954 may represent a range (e.g., block size) offset within a contiguous range of LIDs. In this manner, the storage layer 130 may logically segment or divide the sparse logical address space into segments of contiguous LIDs that can be efficiently allocated as a group. In the FIG. 19A embodiment, segmenting LIDs into 32 high order and 32 low order bits may result in a logical address space 136 that is capable of representing 2^32−1 unique LID allocation ranges (e.g., using the first portion 1952 of the LIDs), each of which has a maximum size (or offset) of 2^32 virtual storage locations (e.g., 2 TB for a virtual storage location size of 512 bytes). In other embodiments, different segmentation schemes may be used. In embodiments that require a large number of small storage entities (e.g., database applications, messaging applications, or the like), the first portion 1952 may comprise a larger proportion of the LID than the second portion 1954 (e.g., first portion 1952 comprising 42 bits providing 2^42−1 unique identifiers). Alternatively, where larger storage entities are used, the ratio between the size of the first and second address portions - The LID segmentation scheme disclosed herein may be used to define an allocation granularity of the
logical address space 136. In the FIG. 19A embodiment, the allocation granularity is fixed according to the segmentation of the logical addresses 1900; each allocation operation in the logical address space 136 comprises allocating X LIDs where X is determined according to the size of the second portion 1954 of the logical addresses 1900. The allocation granularity may also determine the number of unique storage entities that can be represented within the logical address space: in the FIG. 19A embodiment, the logical address space is capable of supporting Y unique storage entities (Y unique LID ranges of size X) where Y is determined according to the size of the first portion 1952. - The fixed allocation granularity may result in wasted storage resources. In embodiments in which each file is allocated a pre-determined range of contiguous LIDs (e.g., 2^32−1 LIDs), a large proportion of the LIDs allocated for small files will likely never be used, which may result in increased metadata overhead and/or may reduce the number of unique files that can be represented within the
logical address space 136. Similarly, large files that do not fit within a single LID allocation range (e.g., require more than 2^32 LIDs) may have to allocate multiple LID ranges, which may result in additional wasted resources. These issues may be compounded in embodiments that have a more limited logical address space 136 (e.g., fewer bits available to represent LIDs 1900). In some embodiments, for example, LIDs may be limited to 48 bits rather than 64, due to, inter alia, operating system limitations, addressing limitations, addressing overhead (e.g., use of a portion of a LID to represent different virtual storage units), and so on. - Accordingly, in some embodiments, the
storage layer 130 may be configured to implement an adaptive and/or variable allocation scheme in which different portions of the logical address space 136 are configured to provide a different, respective allocation granularity. As used herein, “allocation granularity” refers to the amount of storage resources that are allocated in a single allocation operation. The allocation granularity of a region may refer to the size of LID blocks or ranges allocated in the region. In the FIG. 19A embodiment, the allocation granularity of the logical address space 1900 was determined according to the size of the first and second portions 1952 and 1954 of the LIDs 1900. - Alternatively, or in addition, allocation granularity may refer to physical storage allocations and/or operations. As disclosed above, LIDs in the
logical address space 136 may correspond to (be bound to) physical storage resources, such as physical sectors. As used herein a “physical sector,” “data sector,” or “sector” refers to physical storage capacity capable of storing a particular amount of data. The physical sector size may, therefore, determine the granularity of data storage operations performed on the storage device 120; the data sector size may determine the smallest granularity of write/read operations that can be performed on the storage device 120. As such, storage clients 116 may be configured to align storage operations in accordance with a particular data sector size. For example, in embodiments comprising a 512 byte sector size, storage clients 116 may adapt storage operations to fall within the 512 byte boundaries. In some storage systems, the sector size is based on physical characteristics of the underlying storage devices; a storage device may, for example, be physically partitioned into sectors or pages having a particular, pre-determined size. By contrast, the storage layer 130 disclosed herein may be capable of storing data within large, logical constructs, such as logical storage divisions 253 and/or logical pages 254, of the logical storage element 229, disclosed above in FIG. 3A. As such, the storage layer 130 may be capable of performing storage operations according to arbitrarily-sized physical sectors that are independent of the underlying partitioning of the storage device 120 and/or individual, non-volatile storage elements 123. Accordingly, in some embodiments, the physical sector size implemented by the storage layer 130 may be configurable and/or variable. 
As disclosed in further detail below, the physical sector size may vary within different regions of the logical address space 136; a LID in a first allocation region of the logical address space 136 may correspond to a 512 byte sector and a LID in a different allocation region may correspond to a 4 kb sector, and so on. Storage clients 116 may be configured to operate in different allocation regions in accordance with a preferred physical sector size. -
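By way of non-limiting illustration, the fixed 32/32 segmentation of the FIG. 19A embodiment and the resulting range capacity may be expressed as simple bit arithmetic. The function names (lid_make, lid_range_id, lid_offset, range_capacity_bytes) and the OFFSET_BITS constant are hypothetical and are used only for this sketch.

```c
#include <stdint.h>

#define OFFSET_BITS 32u  /* low-order bits: offset within a contiguous LID range */

/* Compose a 64-bit LID from a range identifier (high-order bits)
 * and an offset (low-order bits). */
static uint64_t lid_make(uint32_t range_id, uint32_t offset)
{
    return ((uint64_t)range_id << OFFSET_BITS) | offset;
}

/* High-order portion: identifies the LID allocation range. */
static uint32_t lid_range_id(uint64_t lid)
{
    return (uint32_t)(lid >> OFFSET_BITS);
}

/* Low-order portion: offset within the allocation range. */
static uint32_t lid_offset(uint64_t lid)
{
    return (uint32_t)(lid & 0xFFFFFFFFu);
}

/* Maximum bytes addressable within one range for a given sector size. */
static uint64_t range_capacity_bytes(uint32_t sector_size)
{
    return ((uint64_t)1 << OFFSET_BITS) * sector_size;
}
```

With 512 byte sectors, one range spans 2^32 × 512 = 2^41 bytes, which corresponds to the 2 TB figure given for the FIG. 19A embodiment; with a larger sector size in a different allocation region, the same number of LIDs would address proportionally more physical capacity.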
FIG. 36A is a block diagram of a system 3600 comprising another embodiment of a storage layer 130. The storage layer 130 of the FIG. 36A embodiment may comprise an allocation module 3660 configured to manage allocation within one or more of the logical address space 136 and storage device 120. As disclosed above, the storage layer 130 may be configured to store data on a storage device 120 in a contextual, log-based format. In the FIG. 36A embodiment, the storage device 120 may comprise a plurality of independent non-volatile storage elements 223. The non-volatile storage elements 223 may comprise solid-state storage elements, packages, die, chips, and/or the like. The storage controller 140 may be configured to manage the independent, non-volatile storage elements 223 as a logical storage element 229. The storage layer 130 may, therefore, be capable of storing data within logical storage units (e.g., logical pages) 254, which may be formed by combining physical storage units (e.g., pages) 252 of a plurality of the non-volatile storage elements 223. Accordingly, the storage layer 130 may be capable of storing data segments of different sizes (e.g., different physical sector sizes), independent of the underlying partitioning and/or configuration of the non-volatile storage elements 223. In some embodiments, for example, the non-volatile storage elements 223 may comprise 2 kb physical pages. The logical storage units 254 may comprise 25 physical pages of separate non-volatile storage elements 223, which may allow the storage layer 130 to perform read/write operations ranging from 0 to 50 kb. The disclosure is not limited in this regard, however, and could be adapted to include logical storage elements 229 comprising any number of non-volatile storage elements 223 having any suitable page size. - The
storage controller 140 may be further configured to store data in a contextual, log-based format (a packet format). As disclosed above, the data write module 240 may be configured to generate packets corresponding to any suitable physical sector size (comprising any sized data segment). The size of the packets may be independent of the underlying partitioning and/or arrangement of the non-volatile storage elements 223. Therefore, the storage layer 130 may be capable of performing storage operations corresponding to any suitable physical sector size and/or physical granularity from a few bytes (e.g., 256 byte sector sizes) to 50 kb, or more. The storage layer 130 may be configured to store a packet comprising a 512 byte data segment within a logical page 254 alongside a packet comprising a 2 kb data segment 3612B.
- In some embodiments, the
allocation module 3660 comprises a partitioning module 3662 configured to partition and/or segment the logical address space into two or more allocation regions. The allocation regions may correspond to different allocation granularities. The allocation granularity of a particular region may refer to the allocation of physical storage resources (e.g., physical sector size) and/or logical allocation granularity such as LID allocation block size. The allocation module 3660 may further comprise an allocation policy module 3664 configured to determine an allocation granularity for storage clients 116, storage requests, and/or storage entities and/or to selectively reallocate storage resources. The allocation module 3660 may also comprise a reallocation module 3666 configured to reallocate storage resources, which may comprise performing one or more of the range clone and/or range move operations, as disclosed herein.
FIG. 36B illustrates one embodiment of a logical address space 136 comprising allocation regions 3638A-N. The partitioning module 3662 may be configured to partition the logical address space 136 into any number of partitions, corresponding to any suitable physical allocation granularity (e.g., any suitable sector size). The allocation regions of FIG. 36B may correspond to the granularity of physical storage allocation: LIDs within the region 3638A may correspond to relatively small data sectors (e.g., 512 bytes); region 3638B may correspond to larger data sectors (e.g., 2 kb); and region 3638N may correspond to still larger 4 kb data sectors. Accordingly, storage operations pertaining to LIDs within the region 3638A may correspond to 512 byte physical sectors; LIDs within the region 3638A may correspond to data packets 3688A comprising 512 byte data segments 3612A. Storage operations performed within the region 3638A may, therefore, operate at a 512 byte sector granularity (the smallest read/write operation in region 3638A is 512 bytes).
- In the
FIG. 36B embodiment, the LID 3636A in region 3638A is bound to data packet 3688A (in the index 2804). The data packet 3688A may comprise a 512 byte data segment 3612A in accordance with the physical allocation granularity of the region 3638A. In some embodiments, the persistent metadata 3864 of the packets 3688A-N comprises respective size indicators 3687A-N indicating a size of the corresponding data segments 3612A-N. Alternatively, or in addition, the data segment size may be indicated in the index 2804 and/or other metadata 135. The LID 3636B in region 3638B may be bound to data packet 3688B. The data packet 3688B may comprise a 2 kb data segment 3612B in accordance with the physical sector size of region 3638B. LID 3636N may be bound to data packet 3688N, which may comprise a 4 kb data segment 3612N in accordance with the physical allocation granularity of region 3638N. The differently sized data packets 3688A-N may be stored at arbitrary physical storage locations within the storage device 120. In some embodiments, the data packets 3688A-N may be stored within large, logical storage units 254 of a logical storage element 229, as disclosed above. Although FIG. 36B depicts a particular embodiment of logical address partitioning, the disclosure is not limited in this regard and could be adapted to partition the logical address space 136 into any number of different allocation regions 3638A-N corresponding to any suitable physical and/or logical allocation granularity.
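The region-to-sector-size relationship of FIG. 36B can be pictured as a sorted table of region boundaries that is searched by LID. The following is a minimal sketch only: the region boundaries, the size of the address space, and the names `Region`, `REGIONS`, and `sector_size_for_lid` are hypothetical and are not taken from the disclosure; only the 512 byte, 2 kb, and 4 kb sector sizes come from the example above.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str         # label only, mirrors the reference numerals of FIG. 36B
    first_lid: int    # first LID of the region (inclusive); illustrative value
    sector_size: int  # bytes of data bound to each LID in the region


# Hypothetical layout: three regions of 2**20 LIDs each.
REGIONS = [
    Region("3638A", 0, 512),
    Region("3638B", 1 << 20, 2048),
    Region("3638N", 2 << 20, 4096),
]
ADDRESS_SPACE_END = 3 << 20  # one past the last valid LID (assumption)
_STARTS = [r.first_lid for r in REGIONS]


def region_for_lid(lid: int) -> Region:
    """Return the allocation region that contains the given LID."""
    if not 0 <= lid < ADDRESS_SPACE_END:
        raise ValueError(f"LID {lid} is outside the logical address space")
    # bisect_right finds the first region start greater than lid;
    # the region containing lid is the one just before it.
    return REGIONS[bisect_right(_STARTS, lid) - 1]


def sector_size_for_lid(lid: int) -> int:
    """Return the physical sector size associated with the given LID."""
    return region_for_lid(lid).sector_size
```

Because the lookup depends only on the LID value, no per-LID metadata is needed to recover the sector size in this sketch; an implementation could equally record the size in the index or in packet metadata, as the text notes.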
Certain storage clients 116 may operate more efficiently at specific sector sizes. For example, an application that processes large amounts of contiguous data may operate most efficiently with large 4 kb sector sizes. Other applications that rely on a large number of relatively small transactions may operate more efficiently using smaller sector sizes. In some embodiments, the interface 138 provides mechanisms for specifying a desired sector size for particular storage and/or allocation operations. A file system storage client 2916 may, for example, specify that storage operations pertaining to a particular file 2919A be performed at a 2 kb sector size. In response, the allocation module 3660 may allocate LIDs for the file 2919A within the region 3638B of the logical address space 136. Alternatively, or in addition, the file system 2916 (and/or other storage clients 116) may query the interface 138 for information pertaining to the available allocation regions 3638A-N and/or data sector sizes supported by the storage layer 130. The storage clients 116 may selectively allocate LIDs within the regions 3638A-N in accordance with a desired physical allocation granularity (sector size). The file system storage client 2916 may, therefore, be configured to allocate LIDs having different sector sizes for different files 2919A-N according to the access characteristics of the files 2919A-N. As such, the file system storage client 2916 may be capable of supporting files 2919A-N having different respective data sector sizes. In some embodiments, users may specify a desired file sector size through, inter alia, ioctrl parameters, an fadvise API, and/or the like.
- Referring back to
FIG. 36A, the log storage module 137 may be configured to provide for storing data according to the sector size assigned to the data (corresponding to the LID associated with the data). The log storage module 137 may determine the sector size in reference to, inter alia, the storage metadata 135, index 2804, and/or allocation module 3660. The log storage module 137 may configure the storage device controller 126 (data write module 240) to packetize the data in accordance with the sector size for storage within the log on the storage device 120. The log storage module 137 may be further configured to provide for reading data of various, different data sector sizes. In response to a read request pertaining to a particular LID, the log storage module 137 may determine the sector size corresponding to the LID (as above), and may configure the data read module 241 to read the corresponding data packet size.
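The write and read paths described above can be illustrated with packets that carry their own size indicator, so that packets of different sector sizes can sit side by side in the same log. This is a sketch under stated assumptions: the 12-byte header layout (`<QI`: an 8-byte LID followed by a 4-byte segment size), the zero padding of short segments, and the function names `packetize` and `read_packet` are all hypothetical and do not describe the packet format of the disclosure.

```python
import struct

# Hypothetical packet header: LID (unsigned 64-bit) + segment size (unsigned 32-bit).
HEADER = struct.Struct("<QI")


def packetize(first_lid: int, data: bytes, sector_size: int) -> list[bytes]:
    """Split data into sector-sized segments, one packet per LID."""
    packets = []
    for i in range(0, len(data), sector_size):
        # Pad the final segment so every packet holds a full sector.
        segment = data[i:i + sector_size].ljust(sector_size, b"\0")
        lid = first_lid + i // sector_size
        packets.append(HEADER.pack(lid, sector_size) + segment)
    return packets


def read_packet(log: bytes, offset: int) -> tuple[int, bytes, int]:
    """Read one packet at offset; return (lid, segment, offset of next packet)."""
    lid, size = HEADER.unpack_from(log, offset)
    start = offset + HEADER.size
    return lid, log[start:start + size], start + size
```

For example, appending the output of `packetize(10, payload, 512)` and `packetize(5000, other, 2048)` to one byte string yields a log in which `read_packet` can walk from packet to packet using only the size recorded in each header.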
FIG. 36C illustrates another embodiment of a logical address space 136 that has been partitioned into a plurality of allocation regions. In the FIG. 36C embodiment, the partitioning module 3662 may be configured to partition the logical address space 136 into four regions: 3650A, 3650B, 3650C, and 3650D. Each region 3650A-D may correspond to a different, respective allocation granularity. In the FIG. 36C embodiment, allocation operations within the regions 3650A-D correspond to different logical allocation granularities; logical allocation operations within the regions 3650A-D correspond to differently sized LID extents (blocks of LIDs). Accordingly, allocations within each region 3650A-D may result in allocating a different number of contiguous LIDs (different range of contiguous LIDs). The region 3650A may comprise large contiguous LID ranges 3651A, such that allocation operations therein result in allocating a large number of contiguous LIDs (e.g., 2^34 LIDs). The region 3650A may, therefore, be suited for large storage entities (e.g., large files). By contrast, the region 3650D may correspond to relatively small contiguous LID ranges 3651D, such that allocation operations therein result in allocating a smaller number of contiguous LIDs (e.g., 2^12 LIDs). As such, the region 3650D may be suited for use with smaller storage entities (e.g., small files, objects, database tables, or the like). The other regions
- Although
FIG. 36C depicts regions 3650A-D as being of approximately the same size (e.g., the logical address space 136 is equally segmented into four regions 3650A-D), the disclosure is not limited in this regard. In some embodiments, the partitioning module 3662 may be configured to segment the logical address space 136 into differently sized regions 3650A-D and/or into different numbers of regions 3650A-D. For example, the logical address space 136 may be segmented into two regions, a first region for large files and a second region for small files, and the large file region may be allocated a larger proportion of the logical address space than the small file region, or vice versa.
- Referring back to
FIG. 36C, each region 3650A-D may comprise and/or result in a different segmentation of the LIDs 1901A-D. The portion of a LID 1901A-D comprising the “identifier” portion of the LID 1952A-D versus the “offset” or “range” portion of the LID 1954A-D may vary depending on the size of the underlying contiguous LID range 3651A-D. For example, the LIDs 1901A of region 3650A comprise a larger proportion of offset bits 1954A as compared to the LIDs 1901D of region 3650D. Conversely, the LIDs 1901D of region 3650D comprise a larger number of identifier bits 1952D as compared to the LIDs 1901A of region 3650A. Although not depicted in FIG. 36C, the LIDs 1901A-D may further comprise bits for specifying the region 3650A-D of the LID, specifying a logical storage unit of the LID, and so on. In some embodiments, in the four-way segmentation of FIG. 36C, each LID 1901A-D may comprise two bits for specifying one of the regions 3650A-D. Alternatively, or in addition, the storage layer 130 may track LID region relationships based upon pre-determined LID values or ranges (in the index 2804 and/or other metadata 135), such that no region-specifying overhead is needed.
- In some embodiments, the
storage layer 130 may provide access to allocation information through the interface 138. In some embodiments, the interface 138 may be configured to publish information pertaining to the allocation regions of the logical address space 136, indicate the remaining, unallocated and/or unbound resources within a particular region and/or LID block, and the like. The interface 138 may be further configured to allow storage clients 116 to specify a desired allocation granularity, physical sector size, and/or the like. In some embodiments, for example, an allocation request may specify the number of contiguous LIDs requested for allocation. In response, the allocation module 3660 may allocate the LIDs within the appropriate region. For example, the region 3650A may comprise contiguous LID ranges 3651A comprising 65536 LIDs, region 3650B may comprise contiguous LID ranges 3651B comprising 16384 LIDs, region 3650C may comprise contiguous LID ranges 3651C comprising 4096 LIDs, and region 3650D may comprise contiguous LID ranges 3651D comprising 1024 LIDs. In response to a request to allocate 8024 LIDs, the storage layer 130 may allocate an available contiguous LID range 3651B within region 3650B. Alternatively, the storage layer 130 may allocate a contiguous LID range 3651B in region 3650B and a contiguous LID range 3651C in region 3650C.
- In some embodiments, the
allocation module 3660 comprises an allocation policy module 3664 that is configured to select a suitable allocation granularity (region 3650A-D and/or physical granularity region 3638A-N) based on one or more allocation policies, which may include, but are not limited to: availability of contiguous LID ranges in the regions 3650A-D, whether the LID range is expected to grow, information pertaining to the storage client 116 associated with the request, information pertaining to an application associated with the request, information pertaining to a storage entity associated with the request (e.g., file information), explicit requests, request parameters (ioctrl, fadvise, etc.), and/or the like. In some embodiments, LID allocation requests may specify a particular allocation region (e.g., LID region 3650A-D). For example, a storage client 116 may initially allocate a small LID range, but may know that the LID range may be required to grow over time (e.g., the storage client 116 may be receiving a stream of data over a network). Accordingly, the storage client 116 may request an initially small LID allocation, but may specify that the LID allocation be serviced in the region 3650A. In some embodiments, the allocation module 3660 may initially allocate LIDs in the smallest granularity region 3650D, and may move storage entities to larger regions as needed. As such, even if a storage client requests a larger number of LIDs, the allocation module 3660 may defer allocation of additional LIDs until needed.
- In some embodiments, the
allocation module 3660 may comprise a reallocation module 3666 configured to, inter alia, relocate storage entities between different allocation regions (e.g., physical allocation regions 3638A-N and/or logical allocation regions 3650A-D). In one embodiment, a file storage entity may be initially managed using LIDs within the region 3650D. However, the file may grow to require more than a single, contiguous LID range 3651D. In response, the storage layer 130 may allocate additional contiguous LID ranges 3651D within the region 3650D. Alternatively, the reallocation module 3666 may determine that the storage entity should be relocated, which may comprise a range move operation from the region 3650D to another region 3650A-C. As disclosed above in conjunction with FIGS. 28A-33B, the range move operation may comprise a) modifying the logical interface of the data corresponding to the storage entity (in the index 2804 and/or other storage metadata 135), b) storing a persistent note 2866 on the storage device 120 associating the data with the updated logical interface, and/or c) rewriting the data in the updated logical interface in one or more background operations. Alternatively, or in addition, the range move operation may comprise modifying a two-layer mapping between the logical identifiers and one or more intermediate mapping layers, as disclosed above in connection with FIGS. 28K-28L.
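The three parts of the range move operation can be sketched with an in-memory index and an append-only log. The sketch below is illustrative only: `Entry`, `StorageLayer`, the tuple used for the persistent note, and the choice to key the index by first LID are assumptions made for the example, and step c) is represented only by the fact that the media address is left unchanged until some later background rewrite.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    first_lid: int   # first LID of the contiguous range
    count: int       # number of LIDs in the range
    media_addr: int  # where the data currently resides on the device


@dataclass
class StorageLayer:
    index: dict = field(default_factory=dict)  # first_lid -> Entry
    log: list = field(default_factory=list)    # appended persistent notes

    def range_move(self, src_lid: int, dst_lid: int) -> Entry:
        """Rebind the data of one index entry to a new LID range."""
        entry = self.index.pop(src_lid)
        # a) Modify the logical interface: same media address, new LIDs.
        moved = Entry(dst_lid, entry.count, entry.media_addr)
        self.index[dst_lid] = moved
        # b) Record a persistent note so the new binding can be recovered
        #    from the log even though the stored data still names the old LIDs.
        self.log.append(("note", src_lid, dst_lid, entry.count))
        # c) Rewriting the data under the new LIDs is deferred to a
        #    background process and is not modeled here.
        return moved
```

The point of the sketch is that the move itself touches only metadata; the data bytes stay where they are until a background process rewrites them.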
FIG. 37A illustrates one embodiment of an operation to move a storage entity (a file 3720) that occupies three contiguous LID ranges within region 3650C to region 3650B. The original allocations for the file 3720 are represented in the index 2804 in respective entries 3722. The entries 3722 may comprise respective LID ranges 3723A-C and corresponding physical storage locations (media addresses) 3725A-C comprising data of the file on the storage device 120, as disclosed herein. Alternatively, the entries 3722 may be combined into a single entry (not shown). The entries 3722 may be maintained in the region 3650C of the logical address space 136 (range of LIDs corresponding to the region 3650C). Although not depicted in FIG. 37A to avoid obscuring the details of the illustrated embodiment, the index 2804 may comprise other entries corresponding to other storage entities (e.g., other files) within various regions 3650A-D of the logical address space 136.
- The
reallocation module 3666 may determine that the file 3720 should be moved from region 3650C to region 3650B of the logical address space 136 by use of, inter alia, the policy module 3664. The policy module 3664 may identify files that should be reallocated (moved) based on one or more of: requests to allocate additional capacity for the file 3720, a balancing operation within the logical address space 136, availability issues (e.g., lack of availability in the region 3650C), a move request from a storage client 116, and/or the like. In some embodiments, the reallocation module 3666 may be configured to periodically balance the logical address space 136 by moving relatively large files (files comprising a number of contiguous LID ranges) into larger regions, so that the files may benefit from larger contiguous LID ranges. Similarly, files that have not used their allocated capacity for a predetermined time period may be moved into smaller-granularity regions.
- Moving the
file 3720 may comprise allocating one or more contiguous LID ranges 3651B in the region 3650B. In the FIG. 37A embodiment, the reallocation module 3666 is configured to move the file 3720 from the region 3650C to the region 3650B. Moving the file 3720 may comprise allocating a contiguous LID range 3651B in region 3650B, and performing a move operation, as disclosed above (e.g., modifying the logical interface of the file data 3720, storing a persistent note 2866 on the storage device 120, and/or updating the contextual format of the data to be consistent with the logical interface). Moving the file 3720 may allow the file to be managed using contiguous LIDs of a single entry 3732 in the region 3650B of the index 3704, as opposed to multiple entries 3722.
- As illustrated in
FIG. 37B, the file 3720 has been moved to region 3650B. Following the range move operation, the file 3720 may grow by a relatively small increment. The increment may require additional capacity beyond the contiguous LID range 3733A allocated to the file 3720 in region 3650B. In response, the storage layer 130 may allocate additional LIDs in the region 3650B (e.g., another contiguous LID range 3651B). However, if the increase to LID capacity is relatively small, allocating another, relatively large contiguous LID range 3651B may be inefficient (e.g., result in a large number of unused LIDs). As such, the allocation module 3660 may allocate LIDs in a different region. In the FIG. 37B embodiment, the storage layer 130 allocates the additional LIDs in the region 3650D, which comprises relatively small contiguous LID ranges 3651D. The LID allocation may be represented in an entry 3742 in the index 3704 (within a region 3650D of the index 3704). The entry 3742 may comprise a range of LIDs 3743A allocated to the file 3720, along with corresponding physical storage locations (e.g., physical addresses) 3745A, as described above. The file 3720 may, therefore, be managed using two noncontiguous sets of LIDs. In some embodiments, the entries 3732 and 3742 may be associated with one another as entries of the same file 3720. Alternatively, a file system (and/or the storage layer 130) may maintain references to the entries 3732 and 3742 (e.g., an i-node or other data structure).
- In another embodiment, allocating additional LIDs comprises moving the
file 3720 into the region 3650A. Referring to FIG. 37C, the file 3720 may be moved in response to a request to expand the file 3720; in response, the reallocation module 3666 may be configured to move the file 3720 from the region 3650B to the region 3650A. The move operation may comprise a) allocating a contiguous range of LIDs 3651A in the region 3650A (represented by entry 3752) and b) performing a range move operation to modify the logical interface of the file data to the new LIDs 3651A, as disclosed herein.
- Although
FIGS. 37A-C depict range move operations to move data to different logical allocation regions within the logical address space 136, the disclosure is not limited in this regard; the same range move operations may be used to move data to/from different physical allocation regions 3638A-N of FIG. 36B. Referring to FIG. 36B, in some embodiments, data stored in a plurality of packets 3688A comprising 512 byte data segments 3612A may be moved to a smaller number of packets 3688B comprising 2 kb data segments 3612B within region 3638B in one or more range move operations, as disclosed herein. The range move operation may comprise maintaining the data in the smaller packets 3688A until the data is rewritten in one or more background processes (e.g., grooming operations). Storage metadata 135 associated with the data (corresponding entries in the index 2804) may be configured to indicate that data of the LIDs in region 3638B are stored with smaller physical segment sizes until the data is rewritten in the updated packet format 3688B.
- Referring back to
FIG. 36A, the interface 138 of the storage layer 130 may be configured to provide logical and/or physical allocation information to storage clients 116. In some embodiments, the file system 2916 may leverage such information to streamline file management operations. The file system 2916 may perform journaling operations to, inter alia, persist metadata pertaining to allocation operations performed for the files 2919A-N managed thereby. The journaling operations may comprise storing metadata pertaining to logical and/or physical storage allocation operations. In some embodiments, the file system 2916 may leverage allocation metadata to streamline such operations. The interface 138 may, for example, provide an indication of the remaining logical capacity of one or more of the files 2919A-N. For example, the file 2919A may be allocated within region 3650B of the logical address space 136 and, as such, may be allocated a particular range of LIDs. The file 2919A may only occupy a limited subset of the allocated LIDs. The file system 2916 may query the storage layer 130 (through the interface 138) to determine the remaining, allocated LID capacity for the file 2919A, such that subsequent file expansions can be performed without explicit allocation requests. The file system 2916 may be further configured to identify an appropriate allocation region 3650A-D in accordance with an expected file size.
- In some embodiments, the
reallocation module 3666 may be configured to move data to/from different regions 3638A-N of the logical address space 136. The reallocation module 3666 may move a storage entity (e.g., a file) in response to determining that the storage entity is stored at an unsuitable physical granularity. For example, a storage client 116 may perform a large number of small write operations to data stored in a large granularity region 3638N. The small write operations may, for example, comprise modifying 256 bytes of data within large 4 kb data sectors. The reallocation module 3666 may be configured to move the data to the region 3638A that has a smaller 512 byte granularity to improve the performance of the small write operations. The move may comprise a range move operation as disclosed above. The range move may further comprise rewriting one or more data packets 3688N comprising 4 kb data segments 3612N as a plurality of data packets 3688A comprising smaller 512 byte data segments 3612A.
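One way to decide that data is stored at an unsuitable physical granularity is to compare, for a sample of observed write sizes, how many bytes would have to be rewritten at each available sector size. The heuristic below is an assumption made for illustration and is not the policy of the disclosure: the cost model (every touched sector is rewritten in full, plus a fixed per-packet overhead), the `PACKET_OVERHEAD` value, the `min_samples` threshold, and the function names are all hypothetical, and write alignment is ignored.

```python
SECTOR_SIZES = (512, 2048, 4096)  # sizes named in the FIG. 36B example
PACKET_OVERHEAD = 12              # hypothetical per-packet header bytes


def rewrite_cost(write_sizes, sector_size):
    """Bytes written to the log if each write rewrites whole sectors."""
    total = 0
    for n in write_sizes:
        sectors = -(-n // sector_size)  # ceiling division
        total += sectors * (sector_size + PACKET_OVERHEAD)
    return total


def suggest_sector_size(current, write_sizes, min_samples=100):
    """Return a better sector size for the observed writes, or None."""
    if len(write_sizes) < min_samples:
        return None  # not enough evidence to justify a move
    best = min(SECTOR_SIZES, key=lambda s: rewrite_cost(write_sizes, s))
    return best if best != current else None
```

With the example from the text, a run of 256 byte modifications against data held in 4 kb sectors makes the 512 byte size the cheapest, which would suggest a move to the small-granularity region.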
FIG. 38 is a flow diagram of one embodiment of a method 3800 for managing storage allocation. Step 3820 may comprise defining a plurality of allocation regions within the logical address space 136. The allocation regions may correspond to a logical allocation granularity (e.g., LID block size) and/or a physical allocation granularity (e.g., data sector size). Step 3820 may comprise partitioning the logical address space 136 into different regions and/or sections, as disclosed above. Alternatively, step 3820 may comprise defining arbitrary ranges and/or sections of the logical address space to correspond to particular allocation regions.
Step 3830 may comprise receiving an allocation request. The allocation request may be received through the interface 138 of the storage layer 130. The allocation request may comprise a request to allocate one or more LIDs. Alternatively, the allocation request may comprise a request to perform a storage operation (e.g., write data to the storage device 120 in a nameless write operation, or the like). Step 3830 may, therefore, comprise selecting an allocation region for the request by use of, inter alia, the policy module 3664. The policy module 3664 may be configured to select the allocation region based on one or more request parameters, file-level knowledge (e.g., information about the data to be stored in connection with the allocated LIDs), application-level knowledge (e.g., information about the storage client 116 associated with the request, data access characteristics, and the like), and the like.
Step 3840 may comprise allocating storage resources within one of the defined allocation regions. Step 3840 may comprise allocating a contiguous range of LIDs within a particular LID allocation region 3650A-D. Alternatively, or in addition, step 3840 may comprise allocating LIDs and/or storing data at a particular physical granularity (e.g., having a particular data sector size in accordance with a selected region 3638A-N).
FIG. 39 is a flow diagram of another embodiment of a method 3900 for allocating storage resources. Step 3920 may comprise defining a plurality of regions 3638A-N within a logical address space 136. The regions defined at step 3920 may correspond to different, respective physical granularities. Accordingly, the LIDs of the defined regions 3638A-N may correspond to different physical sector sizes.
Step 3930 may comprise associating a LID with a particular data sector size based on, inter alia, the regions defined at step 3920. Step 3930 may be performed in response to receiving a storage request pertaining to the LID, such as a request to write and/or modify data associated with the LID, a request to read data associated with the LID, and/or the like. The sector size may be determined in reference to storage metadata 135, the index 2804, the allocation module 3660, and/or the like.
Step 3940 may comprise performing one or more storage operations in accordance with the determined sector size. Step 3940 may comprise configuring the data write module 240 to store data packets in accordance with the identified sector size. Alternatively, or in addition, step 3940 may comprise configuring the data read module 241 to read one or more data packets of a particular size, as disclosed above.
FIG. 40 is a flow diagram of another embodiment of a method 4000 for allocating storage resources. Step 4020 may comprise defining a plurality of regions 3650A-D within the logical address space. The regions 3650A-D may correspond to respective logical allocation granularities, as disclosed above. In some embodiments, step 4020 may comprise segmenting LIDs into respective identifier portions and/or offset or range portions. The segmentation of the LIDs may vary by region, as disclosed above. For example, regions comprising large contiguous LID ranges may use LIDs having a relatively large offset or range portion, and regions comprising relatively small contiguous LID ranges may use LIDs having a relatively small offset or range portion (and a larger identifier portion). The segmentation of step 4020 may comprise segmenting the logical address space 136 into equally sized regions. Alternatively, the regions may vary in size and/or extent.
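The per-region LID segmentation can be made concrete with bit fields. The sketch below assumes a 64-bit LID whose top two bits select one of four regions, with the remaining bits split between an identifier portion and an offset portion whose widths depend on the region. The 64-bit width, the field order, and the intermediate offset widths (26 and 18 bits) are assumptions for illustration; only the two region-selecting bits and the 2^34 and 2^12 range sizes appear in the description above.

```python
LID_BITS = 64     # assumed LID width
REGION_BITS = 2   # two bits select one of four regions

# Offset width per region index (0 = 3650A ... 3 = 3650D).
# 34 and 12 follow the 2^34 and 2^12 examples; 26 and 18 are assumed.
OFFSET_BITS = {0: 34, 1: 26, 2: 18, 3: 12}


def pack_lid(region: int, identifier: int, offset: int) -> int:
    """Combine region, identifier, and offset into a single LID."""
    off_bits = OFFSET_BITS[region]
    id_bits = LID_BITS - REGION_BITS - off_bits
    if not 0 <= identifier < (1 << id_bits):
        raise ValueError("identifier does not fit in this region")
    if not 0 <= offset < (1 << off_bits):
        raise ValueError("offset exceeds the region's contiguous range")
    return (region << (LID_BITS - REGION_BITS)) | (identifier << off_bits) | offset


def unpack_lid(lid: int) -> tuple[int, int, int]:
    """Split a LID into (region, identifier, offset)."""
    region = lid >> (LID_BITS - REGION_BITS)
    off_bits = OFFSET_BITS[region]
    id_bits = LID_BITS - REGION_BITS - off_bits
    offset = lid & ((1 << off_bits) - 1)
    identifier = (lid >> off_bits) & ((1 << id_bits) - 1)
    return region, identifier, offset
```

In this layout a region with a wide offset field has fewer identifier bits, so it can hold fewer but larger storage entities, which matches the trade-off described for the regions 3650A and 3650D.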
Step 4030 may comprise allocating one or more LIDs to a storage client 116 within a selected region of the logical address space 136. Step 4030 may comprise selecting a region of the logical address space 136. Selection of the region may be based upon, inter alia, a size of the request, a request parameter (e.g., the storage client may request allocation within a particular region and/or allocation of a particular range of contiguous LIDs), configuration and/or preferences of the storage client, availability, ioctrl and/or fadvise parameters, and/or the like. Allocating the one or more LIDs may comprise allocating a contiguous range of LIDs in accordance with the allocation granularity of the selected region of the logical address space 136. The contiguous range of LIDs allocated at step 4030 may, therefore, comprise logical capacity that exceeds the number of LIDs requested by the storage client 116. In some embodiments, step 4030 may comprise allocating one or more noncontiguous LID ranges within one or more of the regions 3650A-D, as disclosed above.
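The effect of granularity on allocation, including the surplus capacity noted above, can be sketched with the range sizes given earlier in this description (65536, 16384, 4096, and 1024 LIDs for the regions 3650A-D). The selection rule used here is an assumed policy for illustration only: prefer the smallest single range that covers the request, and otherwise fall back to several ranges of one region. The names `EXTENT_LIDS` and `plan_allocation`, and the `free_extents` bookkeeping, are hypothetical.

```python
# LIDs per contiguous range in each region, from the example in the description.
EXTENT_LIDS = {"3650A": 65536, "3650B": 16384, "3650C": 4096, "3650D": 1024}


def plan_allocation(requested: int, free_extents: dict) -> tuple[str, int, int]:
    """Return (region, number of ranges, surplus LIDs) for a request."""
    if requested <= 0:
        raise ValueError("requested LID count must be positive")
    by_size = sorted(EXTENT_LIDS.items(), key=lambda kv: kv[1])
    # First choice: the smallest single contiguous range that fits.
    for region, size in by_size:
        if size >= requested and free_extents.get(region, 0) >= 1:
            return region, 1, size - requested
    # Fallback: several (noncontiguous) ranges of one region, largest first
    # so that the number of ranges stays small.
    for region, size in reversed(by_size):
        needed = -(-requested // size)  # ceiling division
        if free_extents.get(region, 0) >= needed:
            return region, needed, needed * size - requested
    raise RuntimeError("no region can satisfy the request")
```

For the 8024-LID request used as an example earlier, this policy selects one range in region 3650B and leaves 16384 - 8024 = 8360 LIDs of allocated but unused logical capacity, which a client could later grow into without a further allocation request.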
Step 4040 comprises managing the segmented logical address space 136. Step 4040 may comprise moving one or more storage entities (e.g., files) in response to allocation changes and/or balancing operations, as disclosed above.
FIG. 41 is a flow diagram of another embodiment of a method 4100 for allocating storage resources. Step 4120 may comprise selecting an allocation region in response to a request. The request may comprise an allocation request, a storage request, and/or the like. As disclosed above, selection of the allocation region may be based on one or more of: a size of data associated with the request, a size of a data structure associated with the request, a size of a storage entity associated with the request, a file associated with the request, an application associated with the request, a parameter of the request, a storage client associated with the request, an ioctrl parameter, an fadvise parameter, availability of storage resources, and/or the like. The region selected at step 4120 may comprise a logical allocation region 3650A-D, a physical allocation region 3638A-N, and/or a region comprising a combination of LID and data sector allocation granularity. Step 4120 may further comprise allocating storage resources within the selected region.
Step 4130 may comprise performing one or more storage operations within the selected region and/or in accordance with the allocation granularity of the selected region. Step 4130 may comprise allocating a particular range of LIDs in accordance with a particular logical allocation region 3650A-D, storing data within physical sectors of a predetermined size (in accordance with a particular physical allocation region 3638A-N), and/or the like.
Step 4140 may comprise moving data corresponding to the storage operations performed at step 4130 to a different allocation region. Step 4140 may comprise determining that the data should be moved. The determination may be based on receiving a request, through the interface 138, to move the data. Alternatively, or in addition, the determination may be based on profiling metadata pertaining to storage operation(s), such as access characteristics of the data, changes in requested allocation size, and/or the like. For example, a file may be moved from a relatively small logical allocation region to a larger logical allocation region in response to continued expansion of the file. In another embodiment, a file may be moved from a large logical allocation region to a smaller logical allocation region in response to a reduction in file size. In other embodiments, data may be moved into regions 3638A-N having different data sector sizes, in accordance with observed data access characteristics. Step 4140 may further comprise performing one or more range move operations to move the data to/from different portions of the logical address space 136, as disclosed herein.
- This disclosure has been made with reference to various exemplary embodiments. However, those skilled in the art will recognize that changes and modifications may be made to the exemplary embodiments without departing from the scope of the present disclosure. For example, various operational steps, as well as components for carrying out operational steps, may be implemented in alternate ways depending upon the particular application or in consideration of any number of cost functions associated with the operation of the system (e.g., one or more of the steps may be deleted, modified, or combined with other steps). Therefore, this disclosure is to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope thereof.
Likewise, benefits, other advantages, and solutions to problems have been described above with regard to various embodiments. However, benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, a required, or an essential feature or element. As used herein, the terms “comprises,” “comprising,” and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, a method, an article, or an apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, system, article, or apparatus. Also, as used herein, the terms “coupled,” “coupling,” and any other variation thereof are intended to cover a physical connection, an electrical connection, a magnetic connection, an optical connection, a communicative connection, a functional connection, and/or any other connection.
- Additionally, as will be appreciated by one of ordinary skill in the art, principles of the present disclosure may be reflected in a computer program product on a machine-readable storage medium having machine-readable program code means embodied in the storage medium. Any tangible, non-transitory machine-readable storage medium may be utilized, including magnetic storage devices (hard disks, floppy disks, and the like), optical storage devices (CD-ROMs, DVDs, Blu-Ray discs, and the like), flash memory, and/or the like. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer or other programmable data processing apparatus create means for implementing the functions specified. These computer program instructions may also be stored in a machine-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the machine-readable memory produce an article of manufacture, including implementing means that implement the function specified. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified.
- While the principles of this disclosure have been shown in various embodiments, many modifications of structure, arrangements, proportions, elements, materials, and components that are particularly adapted for a specific environment and operating requirements may be used without departing from the principles and scope of this disclosure. These and other changes or modifications are intended to be included within the scope of the present disclosure.
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/865,153 US9563555B2 (en) | 2011-03-18 | 2013-04-17 | Systems and methods for storage allocation |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161454235P | 2011-03-18 | 2011-03-18 | |
US13/424,333 US8966191B2 (en) | 2011-03-18 | 2012-03-19 | Logical interface for contextual storage |
US201261625647P | 2012-04-17 | 2012-04-17 | |
US201261637165P | 2012-04-23 | 2012-04-23 | |
US13/865,153 US9563555B2 (en) | 2011-03-18 | 2013-04-17 | Systems and methods for storage allocation |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/424,333 Continuation-In-Part US8966191B2 (en) | 2011-03-18 | 2012-03-19 | Logical interface for contextual storage |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130227236A1 (en) | 2013-08-29 |
US9563555B2 (en) | 2017-02-07 |
Family
ID=49004575
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/865,153 Active 2033-09-16 US9563555B2 (en) | 2011-03-18 | 2013-04-17 | Systems and methods for storage allocation |
Country Status (1)
Country | Link |
---|---|
US (1) | US9563555B2 (en) |
Cited By (311)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140006708A1 (en) * | 2012-06-28 | 2014-01-02 | International Business Machines Corporation | Secure access to shared storage resources |
US20140068183A1 (en) * | 2012-08-31 | 2014-03-06 | Fusion-Io, Inc. | Systems, methods, and interfaces for adaptive persistence |
US20140173223A1 (en) * | 2011-12-13 | 2014-06-19 | Nathaniel S DeNeui | Storage controller with host collaboration for initialization of a logical volume |
US20140201384A1 (en) * | 2013-01-16 | 2014-07-17 | Cisco Technology, Inc. | Method for optimizing wan traffic with efficient indexing scheme |
US20140281126A1 (en) * | 2013-03-14 | 2014-09-18 | Sandisk Technologies Inc. | Overprovision capacity in a data storage device |
US20140344507A1 (en) * | 2013-04-16 | 2014-11-20 | Fusion-Io, Inc. | Systems and methods for storage metadata management |
WO2015034483A1 (en) * | 2013-09-04 | 2015-03-12 | Intel Corporation | Mechanism for facilitating dynamic storage management for mobile computing devices |
WO2015038741A1 (en) * | 2013-09-16 | 2015-03-19 | Netapp, Inc. | Management of extent based metadata with dense tree structures within a distributed storage architecture |
WO2015048140A1 (en) * | 2013-09-24 | 2015-04-02 | Intelligent Intellectual Property Holdings 2 Llc | Systems and methods for storage collision management |
US9069782B2 (en) | 2012-10-01 | 2015-06-30 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US20150248418A1 (en) * | 2013-10-09 | 2015-09-03 | Rahul M. Bhardwaj | Technology for managing cloud storage |
US9152684B2 (en) | 2013-11-12 | 2015-10-06 | Netapp, Inc. | Snapshots and clones of volumes in a storage system |
US9170938B1 (en) * | 2013-05-17 | 2015-10-27 | Western Digital Technologies, Inc. | Method and system for atomically writing scattered information in a solid state storage device |
US9201918B2 (en) | 2013-11-19 | 2015-12-01 | Netapp, Inc. | Dense tree volume metadata update logging and checkpointing |
US9218279B2 (en) | 2013-03-15 | 2015-12-22 | Western Digital Technologies, Inc. | Atomic write command support in a solid state drive |
US20150370844A1 (en) * | 2014-06-24 | 2015-12-24 | Google Inc. | Processing mutations for a remote database |
US20160004460A1 (en) * | 2013-10-29 | 2016-01-07 | Hitachi, Ltd. | Computer system and control method |
US20160026408A1 (en) * | 2014-07-24 | 2016-01-28 | Fusion-Io, Inc. | Storage device metadata synchronization |
US20160070652A1 (en) * | 2014-09-04 | 2016-03-10 | Fusion-Io, Inc. | Generalized storage virtualization interface |
US9306997B2 (en) | 2013-01-16 | 2016-04-05 | Cisco Technology, Inc. | Method for optimizing WAN traffic with deduplicated storage |
US9348517B2 (en) | 2014-08-28 | 2016-05-24 | International Business Machines Corporation | Using a migration threshold and a candidate list for cache management of sequential write storage |
WO2016081166A1 (en) * | 2014-11-18 | 2016-05-26 | Netapp, Inc. | N-way merge for updating volume metadata in a storage i/o stack |
US9411528B1 (en) * | 2015-04-22 | 2016-08-09 | Ryft Systems, Inc. | Storage management systems and methods |
US9411613B1 (en) | 2015-04-22 | 2016-08-09 | Ryft Systems, Inc. | Systems and methods for managing execution of specialized processors |
EP3061008A1 (en) * | 2013-10-24 | 2016-08-31 | Western Digital Technologies, Inc. | Data storage device supporting accelerated database operations |
US20160253097A1 (en) * | 2015-02-27 | 2016-09-01 | Kyocera Document Solutions Inc. | Information processing device that extends service life of non-volatile semiconductor memory and recording medium |
US20160283157A1 (en) * | 2015-03-23 | 2016-09-29 | Kabushiki Kaisha Toshiba | Memory device |
US9459998B2 (en) | 2015-02-04 | 2016-10-04 | International Business Machines Corporation | Operations interlock under dynamic relocation of storage |
US20160342645A1 (en) * | 2015-05-18 | 2016-11-24 | Oracle International Corporation | Efficient storage using automatic data translation |
US9509736B2 (en) | 2013-01-16 | 2016-11-29 | Cisco Technology, Inc. | Method for optimizing WAN traffic |
US9542244B2 (en) | 2015-04-22 | 2017-01-10 | Ryft Systems, Inc. | Systems and methods for performing primitive tasks using specialized processors |
US20170046351A1 (en) * | 2015-08-10 | 2017-02-16 | International Business Machines Corporation | File migration in a hierarchical storage system |
US20170085636A1 (en) * | 2015-09-21 | 2017-03-23 | Intel Corporation | Method and Apparatus for Dynamically Offloading Execution of Machine Code in an Application to a Virtual Machine |
US20170126663A1 (en) * | 2015-10-29 | 2017-05-04 | Airbus Defence and Space GmbH | Forward-Secure Crash-Resilient Logging Device |
US9671960B2 (en) | 2014-09-12 | 2017-06-06 | Netapp, Inc. | Rate matching technique for balancing segment cleaning and I/O workload |
US9684460B1 (en) | 2010-09-15 | 2017-06-20 | Pure Storage, Inc. | Proactively correcting behavior that may affect I/O performance in a non-volatile semiconductor storage device |
US9710165B1 (en) | 2015-02-18 | 2017-07-18 | Pure Storage, Inc. | Identifying volume candidates for space reclamation |
US9710317B2 (en) | 2015-03-30 | 2017-07-18 | Netapp, Inc. | Methods to identify, handle and recover from suspect SSDS in a clustered flash array |
US9720601B2 (en) | 2015-02-11 | 2017-08-01 | Netapp, Inc. | Load balancing technique for a storage array |
US9727485B1 (en) | 2014-11-24 | 2017-08-08 | Pure Storage, Inc. | Metadata rewrite and flatten optimization |
US9727249B1 (en) * | 2014-02-06 | 2017-08-08 | SK Hynix Inc. | Selection of an open block in solid state storage systems with multiple open blocks |
US9740566B2 (en) | 2015-07-31 | 2017-08-22 | Netapp, Inc. | Snapshot creation workflow |
US20170249246A1 (en) * | 2015-03-13 | 2017-08-31 | Hitachi Data Systems Corporation | Deduplication and garbage collection across logical databases |
US9762460B2 (en) | 2015-03-24 | 2017-09-12 | Netapp, Inc. | Providing continuous context for operational information of a storage system |
US9767284B2 (en) | 2012-09-14 | 2017-09-19 | The Research Foundation For The State University Of New York | Continuous run-time validation of program execution: a practical approach |
US9767271B2 (en) | 2010-07-15 | 2017-09-19 | The Research Foundation For The State University Of New York | System and method for validating program execution at run-time |
US9773007B1 (en) * | 2014-12-01 | 2017-09-26 | Pure Storage, Inc. | Performance improvements in a storage system |
US20170277709A1 (en) * | 2016-03-25 | 2017-09-28 | Amazon Technologies, Inc. | Block allocation for low latency file systems |
US20170277739A1 (en) * | 2016-03-25 | 2017-09-28 | Netapp, Inc. | Consistent method of indexing file system information |
US9779268B1 (en) | 2014-06-03 | 2017-10-03 | Pure Storage, Inc. | Utilizing a non-repeating identifier to encrypt data |
US9792045B1 (en) | 2012-03-15 | 2017-10-17 | Pure Storage, Inc. | Distributing data blocks across a plurality of storage devices |
US9798728B2 (en) | 2014-07-24 | 2017-10-24 | Netapp, Inc. | System performing data deduplication using a dense tree data structure |
US9804973B1 (en) | 2014-01-09 | 2017-10-31 | Pure Storage, Inc. | Using frequency domain to prioritize storage of metadata in a cache |
US9811551B1 (en) | 2011-10-14 | 2017-11-07 | Pure Storage, Inc. | Utilizing multiple fingerprint tables in a deduplicating storage system |
US9817608B1 (en) | 2014-06-25 | 2017-11-14 | Pure Storage, Inc. | Replication and intermediate read-write state for mediums |
US9823842B2 (en) | 2014-05-12 | 2017-11-21 | The Research Foundation For The State University Of New York | Gang migration of virtual machines using cluster-wide deduplication |
US20180004649A1 (en) * | 2016-07-01 | 2018-01-04 | Intel Corporation | Techniques to Format a Persistent Memory File |
US9864769B2 (en) | 2014-12-12 | 2018-01-09 | Pure Storage, Inc. | Storing data utilizing repeating pattern detection |
CN107564558A (en) * | 2016-06-30 | 2018-01-09 | 希捷科技有限公司 | Realize scattered atom I/O write-ins |
US9864761B1 (en) | 2014-08-08 | 2018-01-09 | Pure Storage, Inc. | Read optimization operations in a storage system |
US9880779B1 (en) | 2013-01-10 | 2018-01-30 | Pure Storage, Inc. | Processing copy offload requests in a storage system |
US9965398B2 (en) | 2016-01-12 | 2018-05-08 | Samsung Electronics Co., Ltd. | Method and apparatus for simplified nameless writes using a virtual address table |
US20180165321A1 (en) * | 2016-12-09 | 2018-06-14 | Qumulo, Inc. | Managing storage quotas in a shared storage system |
WO2018118453A1 (en) * | 2016-12-19 | 2018-06-28 | Pure Storage, Inc. | Block consolidation in a direct-mapped flash storage system |
US10013425B1 (en) * | 2016-03-31 | 2018-07-03 | EMC IP Holding Company LLC | Space-efficient persistent block reservation optimized for compression |
US10073656B2 (en) | 2012-01-27 | 2018-09-11 | Sandisk Technologies Llc | Systems and methods for storage virtualization |
US10089180B2 (en) * | 2015-07-31 | 2018-10-02 | International Business Machines Corporation | Unfavorable storage growth rate abatement |
US10102144B2 (en) | 2013-04-16 | 2018-10-16 | Sandisk Technologies Llc | Systems, methods and interfaces for data virtualization |
US10108547B2 (en) * | 2016-01-06 | 2018-10-23 | Netapp, Inc. | High performance and memory efficient metadata caching |
US10114574B1 (en) | 2014-10-07 | 2018-10-30 | Pure Storage, Inc. | Optimizing storage allocation in a storage system |
US10126982B1 (en) | 2010-09-15 | 2018-11-13 | Pure Storage, Inc. | Adjusting a number of storage devices in a storage system that may be utilized to simultaneously service high latency operations |
US10133511B2 (en) | 2014-09-12 | 2018-11-20 | Netapp, Inc. | Optimized segment cleaning technique |
US10146694B1 (en) * | 2017-04-28 | 2018-12-04 | EMC IP Holding Company LLC | Persistent cache layer in a distributed file system |
US10156998B1 (en) | 2010-09-15 | 2018-12-18 | Pure Storage, Inc. | Reducing a number of storage devices in a storage system that are exhibiting variable I/O response times |
US10164841B2 (en) | 2014-10-02 | 2018-12-25 | Pure Storage, Inc. | Cloud assist for storage systems |
US10162523B2 (en) | 2016-10-04 | 2018-12-25 | Pure Storage, Inc. | Migrating data between volumes using virtual copy operation |
US10176216B2 (en) | 2016-02-01 | 2019-01-08 | International Business Machines Corporation | Verifying data consistency |
US10180879B1 (en) | 2010-09-28 | 2019-01-15 | Pure Storage, Inc. | Inter-device and intra-device protection data |
US10185505B1 (en) | 2016-10-28 | 2019-01-22 | Pure Storage, Inc. | Reading a portion of data to replicate a volume based on sequence numbers |
US10191662B2 (en) | 2016-10-04 | 2019-01-29 | Pure Storage, Inc. | Dynamic allocation of segments in a flash storage system |
US20190042097A1 (en) * | 2018-05-18 | 2019-02-07 | Intel Corporation | Non-volatile memory cloning with hardware copy-on-write support |
US10222984B1 (en) * | 2015-12-31 | 2019-03-05 | EMC IP Holding Company LLC | Managing multi-granularity flash translation layers in solid state drives |
US10235065B1 (en) | 2014-12-11 | 2019-03-19 | Pure Storage, Inc. | Datasheet replication in a cloud computing environment |
US10248516B1 (en) | 2014-12-11 | 2019-04-02 | Pure Storage, Inc. | Processing read and write requests during reconstruction in a storage system |
US10257274B2 (en) * | 2014-09-15 | 2019-04-09 | Foundation for Research and Technology—Hellas (FORTH) | Tiered heterogeneous fast layer shared storage substrate apparatuses, methods, and systems |
US10261708B1 (en) * | 2017-04-26 | 2019-04-16 | EMC IP Holding Company LLC | Host data replication allocating single memory buffers to store multiple buffers of received host data and to internally process the received host data |
US10263770B2 (en) | 2013-11-06 | 2019-04-16 | Pure Storage, Inc. | Data protection in a storage system using external secrets |
US20190129982A1 (en) * | 2017-10-30 | 2019-05-02 | Nicira, Inc. | Just-in-time multi-indexed tables in a shared log |
US10284367B1 (en) | 2012-09-26 | 2019-05-07 | Pure Storage, Inc. | Encrypting data in a storage system using a plurality of encryption keys |
US10296469B1 (en) | 2014-07-24 | 2019-05-21 | Pure Storage, Inc. | Access control in a flash storage system |
US10296354B1 (en) | 2015-01-21 | 2019-05-21 | Pure Storage, Inc. | Optimized boot operations within a flash storage array |
US10313427B2 (en) | 2014-09-24 | 2019-06-04 | Intel Corporation | Contextual application management |
US10310740B2 (en) | 2015-06-23 | 2019-06-04 | Pure Storage, Inc. | Aligning memory access operations to a geometry of a storage device |
US10318495B2 (en) | 2012-09-24 | 2019-06-11 | Sandisk Technologies Llc | Snapshots for a non-volatile device |
US10318202B2 (en) * | 2017-03-20 | 2019-06-11 | Via Technologies, Inc. | Non-volatile memory apparatus and data deduplication method thereof |
US20190188040A1 (en) * | 2017-12-19 | 2019-06-20 | Western Digital Technologies, Inc. | Multi-constraint dynamic resource manager |
CN110019248A (en) * | 2017-09-29 | 2019-07-16 | 英特尔公司 | Technology for the more storage format database access of dynamic |
US10359942B2 (en) | 2016-10-31 | 2019-07-23 | Pure Storage, Inc. | Deduplication aware scalable content placement |
US10365858B2 (en) | 2013-11-06 | 2019-07-30 | Pure Storage, Inc. | Thin provisioning in a storage device |
US10365848B2 (en) * | 2015-12-02 | 2019-07-30 | Netapp, Inc. | Space reservation for distributed storage systems |
US10394660B2 (en) | 2015-07-31 | 2019-08-27 | Netapp, Inc. | Snapshot restore workflow |
US20190266260A1 (en) * | 2018-02-23 | 2019-08-29 | Microsoft Technology Licensing, Llc | Location and context for computer file system |
US10402266B1 (en) | 2017-07-31 | 2019-09-03 | Pure Storage, Inc. | Redundant array of independent disks in a direct-mapped flash storage system |
US10430079B2 (en) | 2014-09-08 | 2019-10-01 | Pure Storage, Inc. | Adjusting storage capacity in a computing system |
US10430282B2 (en) | 2014-10-07 | 2019-10-01 | Pure Storage, Inc. | Optimizing replication by distinguishing user and system write activity |
US10437677B2 (en) * | 2015-02-27 | 2019-10-08 | Pure Storage, Inc. | Optimized distributed rebuilding within a dispersed storage network |
US10444998B1 (en) | 2013-10-24 | 2019-10-15 | Western Digital Technologies, Inc. | Data storage device providing data maintenance services |
US10452289B1 (en) | 2010-09-28 | 2019-10-22 | Pure Storage, Inc. | Dynamically adjusting an amount of protection data stored in a storage system |
US10452297B1 (en) | 2016-05-02 | 2019-10-22 | Pure Storage, Inc. | Generating and optimizing summary index levels in a deduplication storage system |
US10482049B2 (en) * | 2017-02-03 | 2019-11-19 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Configuring NVMe devices for redundancy and scaling |
US10496556B1 (en) | 2014-06-25 | 2019-12-03 | Pure Storage, Inc. | Dynamic data protection within a flash storage system |
US10509776B2 (en) | 2012-09-24 | 2019-12-17 | Sandisk Technologies Llc | Time sequence data management |
US10545987B2 (en) | 2014-12-19 | 2020-01-28 | Pure Storage, Inc. | Replication to the cloud |
US10545861B2 (en) | 2016-10-04 | 2020-01-28 | Pure Storage, Inc. | Distributed integrated high-speed solid-state non-volatile random-access memory |
CN110780809A (en) * | 2018-07-31 | 2020-02-11 | 爱思开海力士有限公司 | Apparatus and method for managing metadata for interfacing of multiple memory systems |
WO2020033167A1 (en) * | 2018-08-10 | 2020-02-13 | Micron Technology, Inc. | Data validity tracking in a non-volatile memory |
US10564882B2 (en) | 2015-06-23 | 2020-02-18 | Pure Storage, Inc. | Writing data to storage device based on information about memory in the storage device |
US10565230B2 (en) | 2015-07-31 | 2020-02-18 | Netapp, Inc. | Technique for preserving efficiency for replication between clusters of a network |
TWI687806B (en) * | 2015-10-29 | 2020-03-11 | 韓商愛思開海力士有限公司 | Data storage device and operating method thereof |
US10599580B2 (en) * | 2018-05-23 | 2020-03-24 | International Business Machines Corporation | Representing an address space of unequal granularity and alignment |
US20200099692A1 (en) * | 2018-09-24 | 2020-03-26 | Nutanix, Inc. | System and method for protection of entities across availability zones |
US10614033B1 (en) | 2019-01-30 | 2020-04-07 | Qumulo, Inc. | Client aware pre-fetch policy scoring system |
US10623386B1 (en) | 2012-09-26 | 2020-04-14 | Pure Storage, Inc. | Secret sharing data protection in a storage system |
CN111026615A (en) * | 2019-12-20 | 2020-04-17 | 浪潮电子信息产业股份有限公司 | Method and device for acquiring logical volume list, electronic equipment and storage medium |
US20200134041A1 (en) * | 2018-10-31 | 2020-04-30 | Alibaba Group Holding Limited | Journaling overhead reduction with remapping interface |
US10649981B2 (en) | 2017-10-23 | 2020-05-12 | Vmware, Inc. | Direct access to object state in a shared log |
US10656864B2 (en) | 2014-03-20 | 2020-05-19 | Pure Storage, Inc. | Data replication within a flash storage array |
US10657068B2 (en) | 2018-03-22 | 2020-05-19 | Intel Corporation | Techniques for an all persistent memory file system |
US10678436B1 (en) | 2018-05-29 | 2020-06-09 | Pure Storage, Inc. | Using a PID controller to opportunistically compress more data during garbage collection |
US10678433B1 (en) | 2018-04-27 | 2020-06-09 | Pure Storage, Inc. | Resource-preserving system upgrade |
US10693964B2 (en) | 2015-04-09 | 2020-06-23 | Pure Storage, Inc. | Storage unit communication within a storage system |
US10719480B1 (en) * | 2016-11-17 | 2020-07-21 | EMC IP Holding Company LLC | Embedded data valuation and metadata binding |
US10719265B1 (en) * | 2017-12-08 | 2020-07-21 | Pure Storage, Inc. | Centralized, quorum-aware handling of device reservation requests in a storage system |
US10725977B1 (en) | 2019-10-21 | 2020-07-28 | Qumulo, Inc. | Managing file system state during replication jobs |
US10756816B1 (en) | 2016-10-04 | 2020-08-25 | Pure Storage, Inc. | Optimized fibre channel and non-volatile memory express access |
US10776202B1 (en) | 2017-09-22 | 2020-09-15 | Pure Storage, Inc. | Drive, blade, or data shard decommission via RAID geometry shrinkage |
US10776034B2 (en) | 2016-07-26 | 2020-09-15 | Pure Storage, Inc. | Adaptive data migration |
US10776046B1 (en) | 2018-06-08 | 2020-09-15 | Pure Storage, Inc. | Optimized non-uniform memory access |
CN111708716A (en) * | 2019-03-18 | 2020-09-25 | 爱思开海力士有限公司 | Data storage device, computing device having the same, and method of operation |
US10789211B1 (en) | 2017-10-04 | 2020-09-29 | Pure Storage, Inc. | Feature-based deduplication |
US10795796B1 (en) | 2020-01-24 | 2020-10-06 | Qumulo, Inc. | Predictive performance analysis for file systems |
US10831935B2 (en) | 2017-08-31 | 2020-11-10 | Pure Storage, Inc. | Encryption management with host-side data reduction |
US10838620B2 (en) | 2016-05-26 | 2020-11-17 | Nutanix, Inc. | Efficient scaling of distributed storage systems |
US10838923B1 (en) * | 2015-12-18 | 2020-11-17 | EMC IP Holding Company LLC | Poor deduplication identification |
US10846216B2 (en) | 2018-10-25 | 2020-11-24 | Pure Storage, Inc. | Scalable garbage collection |
US10860372B1 (en) | 2020-01-24 | 2020-12-08 | Qumulo, Inc. | Managing throughput fairness and quality of service in file systems |
US10860414B1 (en) | 2020-01-31 | 2020-12-08 | Qumulo, Inc. | Change notification in distributed file systems |
US10860547B2 (en) | 2014-04-23 | 2020-12-08 | Qumulo, Inc. | Data mobility, accessibility, and consistency in a data storage system |
US10860475B1 (en) * | 2017-11-17 | 2020-12-08 | Pure Storage, Inc. | Hybrid flash translation layer |
US10877942B2 (en) | 2015-06-17 | 2020-12-29 | Qumulo, Inc. | Filesystem capacity and performance metrics and visualizations |
US10884919B2 (en) | 2017-10-31 | 2021-01-05 | Pure Storage, Inc. | Memory management in a storage system |
US10911328B2 (en) | 2011-12-27 | 2021-02-02 | Netapp, Inc. | Quality of service policy based load adaption |
US10908835B1 (en) | 2013-01-10 | 2021-02-02 | Pure Storage, Inc. | Reversing deletion of a virtual machine |
US10915813B2 (en) | 2018-01-31 | 2021-02-09 | Pure Storage, Inc. | Search acceleration for artificial intelligence |
US10922142B2 (en) | 2018-10-31 | 2021-02-16 | Nutanix, Inc. | Multi-stage IOPS allocation |
US10929046B2 (en) | 2019-07-09 | 2021-02-23 | Pure Storage, Inc. | Identifying and relocating hot data to a cache determined with read velocity based on a threshold stored at a storage device |
US10929022B2 (en) | 2016-04-25 | 2021-02-23 | Netapp, Inc. | Space savings reporting for storage system supporting snapshot and clones |
US10936538B1 (en) | 2020-03-30 | 2021-03-02 | Qumulo, Inc. | Fair sampling of alternate data stream metrics for file systems |
US10936551B1 (en) | 2020-03-30 | 2021-03-02 | Qumulo, Inc. | Aggregating alternate data stream metrics for file systems |
US10944671B2 (en) | 2017-04-27 | 2021-03-09 | Pure Storage, Inc. | Efficient data forwarding in a networked device |
US10951488B2 (en) | 2011-12-27 | 2021-03-16 | Netapp, Inc. | Rule-based performance class access management for storage cluster performance guarantees |
US10970395B1 (en) | 2018-01-18 | 2021-04-06 | Pure Storage, Inc. | Security threat monitoring for a storage system |
US10983866B2 (en) | 2014-08-07 | 2021-04-20 | Pure Storage, Inc. | Mapping defective memory in a storage system |
US10990480B1 (en) | 2019-04-05 | 2021-04-27 | Pure Storage, Inc. | Performance of RAID rebuild operations by a storage group controller of a storage system |
US10990297B1 (en) * | 2017-07-21 | 2021-04-27 | EMC IP Holding Company LLC | Checkpointing of user data and metadata in a non-atomic persistent storage environment |
US10997098B2 (en) | 2016-09-20 | 2021-05-04 | Netapp, Inc. | Quality of service policy sets |
US11010233B1 (en) | 2018-01-18 | 2021-05-18 | Pure Storage, Inc. | Hardware-based system monitoring |
US11032259B1 (en) | 2012-09-26 | 2021-06-08 | Pure Storage, Inc. | Data protection in a storage system |
US11036583B2 (en) | 2014-06-04 | 2021-06-15 | Pure Storage, Inc. | Rebuilding data across storage nodes |
US11036596B1 (en) | 2018-02-18 | 2021-06-15 | Pure Storage, Inc. | System for delaying acknowledgements on open NAND locations until durability has been confirmed |
US11036438B2 (en) * | 2017-05-31 | 2021-06-15 | Fmad Engineering Kabushiki Gaisha | Efficient storage architecture for high speed packet capture |
US11070382B2 (en) | 2015-10-23 | 2021-07-20 | Pure Storage, Inc. | Communication in a distributed architecture |
US11080154B2 (en) | 2014-08-07 | 2021-08-03 | Pure Storage, Inc. | Recovering error corrected data |
US11086713B1 (en) | 2019-07-23 | 2021-08-10 | Pure Storage, Inc. | Optimized end-to-end integrity storage system |
US11093146B2 (en) | 2017-01-12 | 2021-08-17 | Pure Storage, Inc. | Automatic load rebalancing of a write group |
US11099986B2 (en) | 2019-04-12 | 2021-08-24 | Pure Storage, Inc. | Efficient transfer of memory contents |
US11113409B2 (en) | 2018-10-26 | 2021-09-07 | Pure Storage, Inc. | Efficient rekey in a transparent decrypting storage array |
US11119657B2 (en) | 2016-10-28 | 2021-09-14 | Pure Storage, Inc. | Dynamic access in flash system |
US11128448B1 (en) | 2013-11-06 | 2021-09-21 | Pure Storage, Inc. | Quorum-aware secret sharing |
US11128740B2 (en) | 2017-05-31 | 2021-09-21 | Fmad Engineering Kabushiki Gaisha | High-speed data packet generator |
US11133076B2 (en) | 2018-09-06 | 2021-09-28 | Pure Storage, Inc. | Efficient relocation of data between storage devices of a storage system |
US11132126B1 (en) | 2021-03-16 | 2021-09-28 | Qumulo, Inc. | Backup services for distributed file systems in cloud computing environments |
CN113448920A (en) * | 2020-03-27 | 2021-09-28 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for managing indexes in a storage system |
US11144638B1 (en) | 2018-01-18 | 2021-10-12 | Pure Storage, Inc. | Method for storage system detection and alerting on potential malicious action |
US11151092B2 (en) | 2019-01-30 | 2021-10-19 | Qumulo, Inc. | Data replication in distributed file systems |
US11151001B2 (en) | 2020-01-28 | 2021-10-19 | Qumulo, Inc. | Recovery checkpoints for distributed file systems |
CN113535597A (en) * | 2020-04-14 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Memory management method, memory management unit and Internet of things equipment |
US11157458B1 (en) | 2021-01-28 | 2021-10-26 | Qumulo, Inc. | Replicating files in distributed file systems using object-based data storage |
US11163721B1 (en) * | 2017-04-25 | 2021-11-02 | EMC IP Holding Company LLC | Snapshot change list and file system indexing |
US11188269B2 (en) | 2015-03-27 | 2021-11-30 | Pure Storage, Inc. | Configuration for multiple logical storage arrays |
US11194473B1 (en) | 2019-01-23 | 2021-12-07 | Pure Storage, Inc. | Programming frequently read data to low latency portions of a solid-state storage array |
US11194759B2 (en) | 2018-09-06 | 2021-12-07 | Pure Storage, Inc. | Optimizing local data relocation operations of a storage device of a storage system |
CN113766027A (en) * | 2021-09-09 | 2021-12-07 | 瀚高基础软件股份有限公司 | Method and equipment for forwarding data by flow replication cluster node |
WO2021254598A1 (en) * | 2020-06-16 | 2021-12-23 | Huawei Technologies Co., Ltd. | Devices for memory management |
US11231956B2 (en) | 2015-05-19 | 2022-01-25 | Pure Storage, Inc. | Committed transactions in a storage system |
US11249688B2 (en) | 2017-05-31 | 2022-02-15 | Fmad Engineering Kabushiki Gaisha | High-speed data packet capture and storage with playback capabilities |
US11249999B2 (en) | 2015-09-04 | 2022-02-15 | Pure Storage, Inc. | Memory efficient searching |
US20220050898A1 (en) * | 2019-11-22 | 2022-02-17 | Pure Storage, Inc. | Selective Control of a Data Synchronization Setting of a Storage System Based on a Possible Ransomware Attack Against the Storage System |
US11269884B2 (en) | 2015-09-04 | 2022-03-08 | Pure Storage, Inc. | Dynamically resizable structures for approximate membership queries |
US11275509B1 (en) | 2010-09-15 | 2022-03-15 | Pure Storage, Inc. | Intelligently sizing high latency I/O requests in a storage environment |
US11281577B1 (en) | 2018-06-19 | 2022-03-22 | Pure Storage, Inc. | Garbage collection tuning for low drive wear |
US11281394B2 (en) | 2019-06-24 | 2022-03-22 | Pure Storage, Inc. | Replication across partitioning schemes in a distributed storage system |
US11281501B2 (en) * | 2018-04-04 | 2022-03-22 | Micron Technology, Inc. | Determination of workload distribution across processors in a memory system |
US11288238B2 (en) | 2019-11-01 | 2022-03-29 | EMC IP Holding Company LLC | Methods and systems for logging data transactions and managing hash tables |
US11288211B2 (en) | 2019-11-01 | 2022-03-29 | EMC IP Holding Company LLC | Methods and systems for optimizing storage resources |
US11294725B2 (en) | 2019-11-01 | 2022-04-05 | EMC IP Holding Company LLC | Method and system for identifying a preferred thread pool associated with a file system |
US11294604B1 (en) | 2021-10-22 | 2022-04-05 | Qumulo, Inc. | Serverless disk drives based on cloud storage |
US11307772B1 (en) | 2010-09-15 | 2022-04-19 | Pure Storage, Inc. | Responding to variable response time behavior in a storage environment |
US11334254B2 (en) | 2019-03-29 | 2022-05-17 | Pure Storage, Inc. | Reliability based flash page sizing |
US11341136B2 (en) | 2015-09-04 | 2022-05-24 | Pure Storage, Inc. | Dynamically resizable structures for approximate membership queries |
US11341236B2 (en) | 2019-11-22 | 2022-05-24 | Pure Storage, Inc. | Traffic-based detection of a security threat to a storage system |
US11347699B2 (en) | 2018-12-20 | 2022-05-31 | Qumulo, Inc. | File system cache tiers |
US11347743B2 (en) * | 2020-04-01 | 2022-05-31 | Sap Se | Metadata converter and memory management system |
US11347709B2 (en) | 2020-04-01 | 2022-05-31 | Sap Se | Hierarchical metadata enhancements for a memory management system |
US11354273B1 (en) | 2021-11-18 | 2022-06-07 | Qumulo, Inc. | Managing usable storage space in distributed file systems |
US11360936B2 (en) | 2018-06-08 | 2022-06-14 | Qumulo, Inc. | Managing per object snapshot coverage in filesystems |
US11379119B2 (en) | 2010-03-05 | 2022-07-05 | Netapp, Inc. | Writing data in a distributed data storage system |
US11386120B2 (en) | 2014-02-21 | 2022-07-12 | Netapp, Inc. | Data syncing in a distributed system |
US11385792B2 (en) | 2018-04-27 | 2022-07-12 | Pure Storage, Inc. | High availability controller pair transitioning |
US11392317B2 (en) | 2017-05-31 | 2022-07-19 | Fmad Engineering Kabushiki Gaisha | High speed data packet flow processing |
US11392464B2 (en) | 2019-11-01 | 2022-07-19 | EMC IP Holding Company LLC | Methods and systems for mirroring and failover of nodes |
US11397674B1 (en) | 2019-04-03 | 2022-07-26 | Pure Storage, Inc. | Optimizing garbage collection across heterogeneous flash devices |
US11399063B2 (en) | 2014-06-04 | 2022-07-26 | Pure Storage, Inc. | Network authentication for a storage system |
US11403043B2 (en) | 2019-10-15 | 2022-08-02 | Pure Storage, Inc. | Efficient data compression by grouping similar data within a data segment |
US11403019B2 (en) | 2017-04-21 | 2022-08-02 | Pure Storage, Inc. | Deduplication-aware per-tenant encryption |
US11409720B2 (en) * | 2019-11-13 | 2022-08-09 | Western Digital Technologies, Inc. | Metadata reduction in a distributed storage system |
US11409696B2 (en) | 2019-11-01 | 2022-08-09 | EMC IP Holding Company LLC | Methods and systems for utilizing a unified namespace |
US11422751B2 (en) | 2019-07-18 | 2022-08-23 | Pure Storage, Inc. | Creating a virtual storage system |
KR20220119348A (en) * | 2020-03-12 | 2022-08-29 | 웨스턴 디지털 테크놀로지스, 인코포레이티드 | Snapshot management in partitioned storage |
US11436023B2 (en) | 2018-05-31 | 2022-09-06 | Pure Storage, Inc. | Mechanism for updating host file system and flash translation layer based on underlying NAND technology |
US11442919B2 (en) * | 2015-07-31 | 2022-09-13 | Accenture Global Services Limited | Data reliability analysis |
US11449485B1 (en) | 2017-03-30 | 2022-09-20 | Pure Storage, Inc. | Sequence invalidation consolidation in a storage system |
US11461241B2 (en) | 2021-03-03 | 2022-10-04 | Qumulo, Inc. | Storage tier management for file systems |
US11487665B2 (en) | 2019-06-05 | 2022-11-01 | Pure Storage, Inc. | Tiered caching of data in a storage system |
US20220350543A1 (en) * | 2021-04-29 | 2022-11-03 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components |
US11494109B1 (en) | 2018-02-22 | 2022-11-08 | Pure Storage, Inc. | Erase block trimming for heterogenous flash memory storage devices |
US11500788B2 (en) | 2019-11-22 | 2022-11-15 | Pure Storage, Inc. | Logical address based authorization of operations with respect to a storage system |
US20220374393A1 (en) * | 2021-05-19 | 2022-11-24 | Oracle International Corporation | Snapshot at the beginning marking in z garbage collector |
US11513902B1 (en) * | 2016-09-29 | 2022-11-29 | EMC IP Holding Company LLC | System and method of dynamic system resource allocation for primary storage systems with virtualized embedded data protection |
US11520907B1 (en) | 2019-11-22 | 2022-12-06 | Pure Storage, Inc. | Storage system snapshot retention based on encrypted data |
WO2022271412A1 (en) * | 2021-06-24 | 2022-12-29 | Pure Storage, Inc. | Efficiently writing data in a zoned drive storage system |
US11550481B2 (en) | 2016-12-19 | 2023-01-10 | Pure Storage, Inc. | Efficiently writing data in a zoned drive storage system |
US11567704B2 (en) | 2021-04-29 | 2023-01-31 | EMC IP Holding Company LLC | Method and systems for storing data in a storage pool using memory semantics with applications interacting with emulated block devices |
US11567660B2 (en) | 2021-03-16 | 2023-01-31 | Qumulo, Inc. | Managing cloud storage for distributed file systems |
US11579976B2 (en) | 2021-04-29 | 2023-02-14 | EMC IP Holding Company LLC | Methods and systems parallel raid rebuild in a distributed storage system |
US20230050976A1 (en) * | 2021-08-12 | 2023-02-16 | Seagate Technology Llc | File system aware computational storage block |
US11588633B1 (en) | 2019-03-15 | 2023-02-21 | Pure Storage, Inc. | Decommissioning keys in a decryption storage system |
US11599508B1 (en) | 2022-01-31 | 2023-03-07 | Qumulo, Inc. | Integrating distributed file systems with object stores |
US11614893B2 (en) | 2010-09-15 | 2023-03-28 | Pure Storage, Inc. | Optimizing storage device access based on latency |
US11615185B2 (en) | 2019-11-22 | 2023-03-28 | Pure Storage, Inc. | Multi-layer security threat detection for a storage system |
US11625481B2 (en) | 2019-11-22 | 2023-04-11 | Pure Storage, Inc. | Selective throttling of operations potentially related to a security threat to a storage system |
US20230111251A1 (en) * | 2020-02-28 | 2023-04-13 | Nebulon, Inc. | Metadata store in multiple reusable append logs |
US11636031B2 (en) | 2011-08-11 | 2023-04-25 | Pure Storage, Inc. | Optimized inline deduplication |
CN116036604A (en) * | 2023-01-28 | 2023-05-02 | 腾讯科技(深圳)有限公司 | Data processing method, device, computer and readable storage medium |
US11645162B2 (en) | 2019-11-22 | 2023-05-09 | Pure Storage, Inc. | Recovery point determination for data restoration in a storage system |
US11651075B2 (en) | 2019-11-22 | 2023-05-16 | Pure Storage, Inc. | Extensible attack monitoring by a storage system |
US11657155B2 (en) | 2019-11-22 | 2023-05-23 | Pure Storage, Inc. | Snapshot delta metric based determination of a possible ransomware attack against data maintained by a storage system |
US11669259B2 (en) | 2021-04-29 | 2023-06-06 | EMC IP Holding Company LLC | Methods and systems for methods and systems for in-line deduplication in a distributed storage system |
US11669255B2 (en) | 2021-06-30 | 2023-06-06 | Qumulo, Inc. | Distributed resource caching by reallocation of storage caching using tokens and agents with non-depleted cache allocations |
US11677633B2 (en) | 2021-10-27 | 2023-06-13 | EMC IP Holding Company LLC | Methods and systems for distributing topology information to client nodes |
US11675898B2 (en) | 2019-11-22 | 2023-06-13 | Pure Storage, Inc. | Recovery dataset management for security threat monitoring |
US11681470B2 (en) | 2017-05-31 | 2023-06-20 | Fmad Engineering Kabushiki Gaisha | High-speed replay of captured data packets |
US11687418B2 (en) | 2019-11-22 | 2023-06-27 | Pure Storage, Inc. | Automatic generation of recovery plans specific to individual storage elements |
US11693985B2 (en) | 2015-02-27 | 2023-07-04 | Pure Storage, Inc. | Stand-by storage nodes in storage network |
US20230214322A1 (en) * | 2020-05-18 | 2023-07-06 | Cambricon (Xi'an) Semiconductor Co., Ltd. | Method and device for allocating storage addresses for data in memory |
US11704036B2 (en) | 2016-05-02 | 2023-07-18 | Pure Storage, Inc. | Deduplication decision based on metrics |
US11720714B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Inter-I/O relationship based detection of a security threat to a storage system |
US11720692B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Hardware token based management of recovery datasets for a storage system |
US11722150B1 (en) | 2022-09-28 | 2023-08-08 | Qumulo, Inc. | Error resistant write-ahead log |
US11729269B1 (en) | 2022-10-26 | 2023-08-15 | Qumulo, Inc. | Bandwidth management in distributed file systems |
US11733908B2 (en) | 2013-01-10 | 2023-08-22 | Pure Storage, Inc. | Delaying deletion of a dataset |
US11741056B2 (en) * | 2019-11-01 | 2023-08-29 | EMC IP Holding Company LLC | Methods and systems for allocating free space in a sparse file system |
US11740822B2 (en) | 2021-04-29 | 2023-08-29 | EMC IP Holding Company LLC | Methods and systems for error detection and correction in a distributed storage system |
US20230280910A1 (en) * | 2017-10-31 | 2023-09-07 | Pure Storage, Inc. | Allocation Of Differing Erase Block Sizes |
US11755751B2 (en) | 2019-11-22 | 2023-09-12 | Pure Storage, Inc. | Modify access restrictions in response to a possible attack against data stored by a storage system |
US11762682B2 (en) | 2021-10-27 | 2023-09-19 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components with advanced data services |
US11768623B2 (en) | 2013-01-10 | 2023-09-26 | Pure Storage, Inc. | Optimizing generalized transfers between storage systems |
US11775189B2 (en) | 2019-04-03 | 2023-10-03 | Pure Storage, Inc. | Segment level heterogeneity |
US11775481B2 (en) | 2020-09-30 | 2023-10-03 | Qumulo, Inc. | User interfaces for managing distributed file systems |
WO2023196249A1 (en) * | 2022-04-05 | 2023-10-12 | Western Digital Technologies, Inc. | Aligned and unaligned data deallocation |
US11789870B2 (en) * | 2019-05-24 | 2023-10-17 | Microsoft Technology Licensing, Llc | Runtime allocation and utilization of persistent memory as volatile memory |
US11869586B2 (en) | 2018-07-11 | 2024-01-09 | Pure Storage, Inc. | Increased data protection by recovering data from partially-failed solid-state devices |
US11868247B1 (en) * | 2013-01-28 | 2024-01-09 | Radian Memory Systems, Inc. | Storage system with multiplane segments and cooperative flash management |
US11875193B2 (en) | 2021-03-25 | 2024-01-16 | Oracle International Corporation | Tracking frame states of call stack frames including colorless roots |
US11892983B2 (en) | 2021-04-29 | 2024-02-06 | EMC IP Holding Company LLC | Methods and systems for seamless tiering in a distributed storage system |
US11921677B1 (en) | 2023-11-07 | 2024-03-05 | Qumulo, Inc. | Sharing namespaces across file system clusters |
US11922071B2 (en) | 2021-10-27 | 2024-03-05 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components and a GPU module |
US20240089313A1 (en) * | 2020-10-28 | 2024-03-14 | Vivo Mobile Communication Co., Ltd. | File sending method and apparatus, and electronic device |
US11934660B1 (en) | 2023-11-07 | 2024-03-19 | Qumulo, Inc. | Tiered data storage with ephemeral and persistent tiers |
US11934322B1 (en) | 2018-04-05 | 2024-03-19 | Pure Storage, Inc. | Multiple encryption keys on storage drives |
US11941116B2 (en) | 2019-11-22 | 2024-03-26 | Pure Storage, Inc. | Ransomware-based data protection parameter modification |
US11941265B2 (en) | 2021-01-22 | 2024-03-26 | EMC IP Holding Company LLC | Method, electronic equipment and computer program product for managing metadata storage unit |
US11947968B2 (en) | 2015-01-21 | 2024-04-02 | Pure Storage, Inc. | Efficient use of zone in a storage device |
US11963321B2 (en) | 2019-09-11 | 2024-04-16 | Pure Storage, Inc. | Low profile latching mechanism |
US11966592B1 (en) | 2022-11-29 | 2024-04-23 | Qumulo, Inc. | In-place erasure code transcoding for distributed file systems |
US11977832B2 (en) | 2016-03-28 | 2024-05-07 | Microsoft Technology Licensing, Llc | Map note annotations at corresponding geographic locations |
US11995336B2 (en) | 2018-04-25 | 2024-05-28 | Pure Storage, Inc. | Bucket views |
US12007942B2 (en) | 2021-10-27 | 2024-06-11 | EMC IP Holding Company LLC | Methods and systems for seamlessly provisioning client application nodes in a distributed system |
US12008266B2 (en) | 2010-09-15 | 2024-06-11 | Pure Storage, Inc. | Efficient read by reconstruction |
US12019541B2 (en) | 2022-10-17 | 2024-06-25 | Oracle International Corporation | Lazy compaction in garbage collection |
US12045487B2 (en) | 2017-04-21 | 2024-07-23 | Pure Storage, Inc. | Preserving data deduplication in a multi-tenant storage system |
US12050689B2 (en) | 2019-11-22 | 2024-07-30 | Pure Storage, Inc. | Host anomaly-based generation of snapshots |
US12067118B2 (en) | 2019-11-22 | 2024-08-20 | Pure Storage, Inc. | Detection of writing to a non-header portion of a file as an indicator of a possible ransomware attack against a storage system |
US12079333B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Independent security threat detection and remediation by storage systems in a synchronous replication arrangement |
US12079356B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Measurement interval anomaly detection-based generation of snapshots |
US12079502B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Storage element attribute-based determination of a data protection policy for use within a storage system |
US12087382B2 (en) | 2019-04-11 | 2024-09-10 | Pure Storage, Inc. | Adaptive threshold for bad flash memory blocks |
US12093435B2 (en) | 2021-04-29 | 2024-09-17 | Dell Products, L.P. | Methods and systems for securing data in a distributed storage system |
US12131074B2 (en) | 2021-10-27 | 2024-10-29 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using GPUS |
US12135888B2 (en) | 2019-07-10 | 2024-11-05 | Pure Storage, Inc. | Intelligent grouping of data based on expected lifespan |
US12141058B2 (en) | 2023-04-24 | 2024-11-12 | Pure Storage, Inc. | Low latency reads using cached deduplicated data |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015172107A1 (en) | 2014-05-09 | 2015-11-12 | Nutanix, Inc. | Mechanism for providing external access to a secured networked virtualization environment |
US9959059B2 (en) * | 2014-10-20 | 2018-05-01 | Sandisk Technologies Llc | Storage error management |
US10291739B2 (en) * | 2015-11-19 | 2019-05-14 | Dell Products L.P. | Systems and methods for tracking of cache sector status |
US10809998B2 (en) | 2016-02-12 | 2020-10-20 | Nutanix, Inc. | Virtualized file server splitting and merging |
JP6578982B2 (en) * | 2016-02-12 | 2019-09-25 | 富士通株式会社 | Information processing apparatus, failure information storage program, and failure information storage method |
WO2017182063A1 (en) | 2016-04-19 | 2017-10-26 | Huawei Technologies Co., Ltd. | Vector processing for segmentation hash values calculation |
SG11201704733VA (en) | 2016-04-19 | 2017-11-29 | Huawei Tech Co Ltd | Concurrent segmentation using vector processing |
US11218418B2 (en) | 2016-05-20 | 2022-01-04 | Nutanix, Inc. | Scalable leadership election in a multi-processing computing environment |
US11568073B2 (en) | 2016-12-02 | 2023-01-31 | Nutanix, Inc. | Handling permissions for virtualized file servers |
US10824455B2 (en) | 2016-12-02 | 2020-11-03 | Nutanix, Inc. | Virtualized server systems and methods including load balancing for virtualized file servers |
US11562034B2 (en) | 2016-12-02 | 2023-01-24 | Nutanix, Inc. | Transparent referrals for distributed file servers |
US10728090B2 (en) | 2016-12-02 | 2020-07-28 | Nutanix, Inc. | Configuring network segmentation for a virtualization environment |
US11294777B2 (en) | 2016-12-05 | 2022-04-05 | Nutanix, Inc. | Disaster recovery for distributed file servers, including metadata fixers |
US11281484B2 (en) | 2016-12-06 | 2022-03-22 | Nutanix, Inc. | Virtualized server systems and methods including scaling of file system virtual machines |
US11288239B2 (en) | 2016-12-06 | 2022-03-29 | Nutanix, Inc. | Cloning virtualized file servers |
KR102498668B1 (en) | 2017-05-17 | 2023-02-09 | 삼성전자주식회사 | Method and host device for flash-aware heap memory management |
US10547683B2 (en) * | 2017-06-26 | 2020-01-28 | Christopher Squires | Object based storage systems that utilize direct memory access |
JP2019057178A (en) * | 2017-09-21 | 2019-04-11 | 東芝メモリ株式会社 | Memory system and control method |
JP2019057172A (en) | 2017-09-21 | 2019-04-11 | 東芝メモリ株式会社 | Memory system and control method |
US11016665B2 (en) | 2018-01-23 | 2021-05-25 | Seagate Technology Llc | Event-based dynamic memory allocation in a data storage device |
US11086826B2 (en) | 2018-04-30 | 2021-08-10 | Nutanix, Inc. | Virtualized server systems and methods including domain joining techniques |
US11263263B2 (en) | 2018-05-30 | 2022-03-01 | Palantir Technologies Inc. | Data propagation and mapping system |
US11194680B2 (en) | 2018-07-20 | 2021-12-07 | Nutanix, Inc. | Two node clusters recovery on a failure |
US11770447B2 (en) | 2018-10-31 | 2023-09-26 | Nutanix, Inc. | Managing high-availability file servers |
US11232070B2 (en) | 2019-06-24 | 2022-01-25 | Western Digital Technologies, Inc. | Metadata compaction in a distributed storage system |
US11704035B2 (en) | 2020-03-30 | 2023-07-18 | Pure Storage, Inc. | Unified storage on block containers |
US12079162B2 (en) | 2020-03-30 | 2024-09-03 | Pure Storage, Inc. | Snapshot management in a storage system |
US11768809B2 (en) | 2020-05-08 | 2023-09-26 | Nutanix, Inc. | Managing incremental snapshots for fast leader node bring-up |
US12131192B2 (en) | 2021-03-18 | 2024-10-29 | Nutanix, Inc. | Scope-based distributed lock infrastructure for virtualized file server |
US20230066137A1 (en) | 2021-08-19 | 2023-03-02 | Nutanix, Inc. | User interfaces for disaster recovery of distributed file servers |
US12117972B2 (en) | 2021-08-19 | 2024-10-15 | Nutanix, Inc. | File server managers and systems for managing virtualized file servers |
US20230259294A1 (en) * | 2022-02-11 | 2023-08-17 | Samsung Electronics Co., Ltd. | Systems, methods, and apparatus for copy destination atomicity in devices |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6477612B1 (en) * | 2000-02-08 | 2002-11-05 | Microsoft Corporation | Providing access to physical memory allocated to a process by selectively mapping pages of the physical memory with virtual memory allocated to the process |
US7278008B1 (en) * | 2004-01-30 | 2007-10-02 | Nvidia Corporation | Virtual address translation system with caching of variable-range translation clusters |
Family Cites Families (251)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4571674A (en) | 1982-09-27 | 1986-02-18 | International Business Machines Corporation | Peripheral storage system having multiple data transfer rates |
US5359726A (en) | 1988-12-22 | 1994-10-25 | Thomas Michael E | Ferroelectric storage device used in place of a rotating disk drive unit in a computer system |
US5247658A (en) | 1989-10-31 | 1993-09-21 | Microsoft Corporation | Method and system for traversing linked list record based upon write-once predetermined bit value of secondary pointers |
US5261068A (en) | 1990-05-25 | 1993-11-09 | Dell Usa L.P. | Dual path memory retrieval system for an interleaved dynamic RAM memory unit |
US5193184A (en) | 1990-06-18 | 1993-03-09 | Storage Technology Corporation | Deleted data file space release system for a dynamically mapped virtual data storage subsystem |
US5307497A (en) | 1990-06-25 | 1994-04-26 | International Business Machines Corp. | Disk operating system loadable from read only memory using installable file system interface |
US5325509A (en) | 1991-03-05 | 1994-06-28 | Zitel Corporation | Method of operating a cache memory including determining desirability of cache ahead or cache behind based on a number of available I/O operations |
US5438671A (en) | 1991-07-19 | 1995-08-01 | Dell U.S.A., L.P. | Method and system for transferring compressed bytes of information between separate hard disk drive units |
US5469555A (en) | 1991-12-19 | 1995-11-21 | Opti, Inc. | Adaptive write-back method and apparatus wherein the cache system operates in a combination of write-back and write-through modes for a cache-based microprocessor system |
US6256642B1 (en) | 1992-01-29 | 2001-07-03 | Microsoft Corporation | Method and system for file system management using a flash-erasable, programmable, read-only memory |
US5596736A (en) | 1992-07-22 | 1997-01-21 | Fujitsu Limited | Data transfers to a backing store of a dynamically mapped data storage system in which data has nonsequential logical addresses |
US6330426B2 (en) | 1994-05-23 | 2001-12-11 | Stephen J. Brown | System and method for remote education using a memory card |
US5845329A (en) | 1993-01-29 | 1998-12-01 | Sanyo Electric Co., Ltd. | Parallel computer |
US5459850A (en) | 1993-02-19 | 1995-10-17 | Conner Peripherals, Inc. | Flash solid state drive that emulates a disk drive and stores variable length and fixed lenth data blocks |
JP2856621B2 (en) | 1993-02-24 | 1999-02-10 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Batch erase nonvolatile memory and semiconductor disk device using the same |
US5404485A (en) | 1993-03-08 | 1995-04-04 | M-Systems Flash Disk Pioneers Ltd. | Flash file system |
JP2784440B2 (en) | 1993-04-14 | 1998-08-06 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Data page transfer control method |
CA2121852A1 (en) | 1993-04-29 | 1994-10-30 | Larry T. Jost | Disk meshing and flexible storage mapping with enhanced flexible caching |
US5499354A (en) | 1993-05-19 | 1996-03-12 | International Business Machines Corporation | Method and means for dynamic cache management by variable space and time binding and rebinding of cache extents to DASD cylinders |
US5535399A (en) | 1993-09-30 | 1996-07-09 | Quantum Corporation | Solid state disk drive unit having on-board backup non-volatile memory |
US5809527A (en) | 1993-12-23 | 1998-09-15 | Unisys Corporation | Outboard file cache system |
JPH086854A (en) | 1993-12-23 | 1996-01-12 | Unisys Corp | Outboard-file-cache external processing complex |
GB9326499D0 (en) | 1993-12-24 | 1994-03-02 | Deas Alexander R | Flash memory system with arbitrary block size |
US5553261A (en) | 1994-04-01 | 1996-09-03 | Intel Corporation | Method of performing clean-up of a solid state disk while executing a read command |
US5696917A (en) | 1994-06-03 | 1997-12-09 | Intel Corporation | Method and apparatus for performing burst read operations in an asynchronous nonvolatile memory |
US5504882A (en) | 1994-06-20 | 1996-04-02 | International Business Machines Corporation | Fault tolerant data storage subsystem employing hierarchically arranged controllers |
DE19540915A1 (en) | 1994-11-10 | 1996-05-15 | Raymond Engineering | Redundant arrangement of solid state memory modules |
US6170047B1 (en) | 1994-11-16 | 2001-01-02 | Interactive Silicon, Inc. | System and method for managing system memory and/or non-volatile memory using a memory controller with integrated compression and decompression capabilities |
US6002411A (en) | 1994-11-16 | 1999-12-14 | Interactive Silicon, Inc. | Integrated video and memory controller with data processing and graphical processing capabilities |
US5586291A (en) | 1994-12-23 | 1996-12-17 | Emc Corporation | Disk controller with volatile and non-volatile cache memories |
US5651133A (en) | 1995-02-01 | 1997-07-22 | Hewlett-Packard Company | Methods for avoiding over-commitment of virtual capacity in a redundant hierarchic data storage system |
US5701434A (en) | 1995-03-16 | 1997-12-23 | Hitachi, Ltd. | Interleave memory controller with a common access queue |
EP0747825B1 (en) | 1995-06-06 | 2001-09-19 | Hewlett-Packard Company, A Delaware Corporation | SDRAM data allocation system and method |
US5682499A (en) | 1995-06-06 | 1997-10-28 | International Business Machines Corporation | Directory rebuild method and apparatus for maintaining and rebuilding directory information for compressed data on direct access storage device (DASD) |
US5845313A (en) | 1995-07-31 | 1998-12-01 | Lexar | Direct logical block addressing flash memory mass storage architecture |
US5930815A (en) | 1995-07-31 | 1999-07-27 | Lexar Media, Inc. | Moving sequential sectors within a block of information in a flash memory mass storage architecture |
US5754563A (en) | 1995-09-11 | 1998-05-19 | Ecc Technologies, Inc. | Byte-parallel system for implementing reed-solomon error-correcting codes |
US6014724A (en) | 1995-10-27 | 2000-01-11 | Scm Microsystems (U.S.) Inc. | Flash translation layer block indication map revision system and method |
US5787486A (en) | 1995-12-15 | 1998-07-28 | International Business Machines Corporation | Bus protocol for locked cycle cache hit |
US5757567A (en) | 1996-02-08 | 1998-05-26 | International Business Machines Corporation | Method and apparatus for servo control with high efficiency gray code for servo track ID |
US6385710B1 (en) | 1996-02-23 | 2002-05-07 | Sun Microsystems, Inc. | Multiple-mode external cache subsystem |
US5960462A (en) | 1996-09-26 | 1999-09-28 | Intel Corporation | Method and apparatus for analyzing a main memory configuration to program a memory controller |
US5754567A (en) | 1996-10-15 | 1998-05-19 | Micron Quantum Devices, Inc. | Write reduction in flash memory systems through ECC usage |
TW349196B (en) | 1996-10-18 | 1999-01-01 | Ibm | Cached synchronous DRAM architecture having a mode register programmable cache policy |
US6279069B1 (en) | 1996-12-26 | 2001-08-21 | Intel Corporation | Interface for flash EEPROM memory arrays |
US5802602A (en) | 1997-01-17 | 1998-09-01 | Intel Corporation | Method and apparatus for performing reads of related data from a set-associative cache memory |
US6073232A (en) | 1997-02-25 | 2000-06-06 | International Business Machines Corporation | Method for minimizing a computer's initial program load time after a system reset or a power-on using non-volatile storage |
JP3459868B2 (en) | 1997-05-16 | 2003-10-27 | 日本電気株式会社 | Group replacement method in case of memory failure |
US6418478B1 (en) | 1997-10-30 | 2002-07-09 | Commvault Systems, Inc. | Pipelined high speed data transfer mechanism |
US6101601A (en) | 1998-04-20 | 2000-08-08 | International Business Machines Corporation | Method and apparatus for hibernation within a distributed data processing system |
US5957158A (en) | 1998-05-11 | 1999-09-28 | Automatic Switch Company | Visual position indicator |
US6185654B1 (en) | 1998-07-17 | 2001-02-06 | Compaq Computer Corporation | Phantom resource memory address mapping system |
US6507911B1 (en) | 1998-07-22 | 2003-01-14 | Entrust Technologies Limited | System and method for securely deleting plaintext data |
US6209088B1 (en) | 1998-09-21 | 2001-03-27 | Microsoft Corporation | Computer hibernation implemented by a computer operating system |
US6629112B1 (en) | 1998-12-31 | 2003-09-30 | Nortel Networks Limited | Resource management for CORBA-based applications |
US6412080B1 (en) | 1999-02-23 | 2002-06-25 | Microsoft Corporation | Lightweight persistent storage system for flash memory devices |
KR100330164B1 (en) | 1999-04-27 | 2002-03-28 | 윤종용 | A method for simultaneously programming plural flash memories having invalid blocks |
US7194740B1 (en) | 1999-05-28 | 2007-03-20 | Oracle International Corporation | System for extending an addressable range of memory |
US7660941B2 (en) | 2003-09-10 | 2010-02-09 | Super Talent Electronics, Inc. | Two-level RAM lookup table for block and page allocation and wear-leveling in limited-write flash-memories |
US6336174B1 (en) | 1999-08-09 | 2002-01-01 | Maxtor Corporation | Hardware assisted memory backup system and method |
KR100577380B1 (en) | 1999-09-29 | 2006-05-09 | 삼성전자주식회사 | A flash-memory and a it's controling method |
US8171204B2 (en) | 2000-01-06 | 2012-05-01 | Super Talent Electronics, Inc. | Intelligent solid-state non-volatile memory device (NVMD) system with multi-level caching of multiple channels |
US6785785B2 (en) | 2000-01-25 | 2004-08-31 | Hewlett-Packard Development Company, L.P. | Method for supporting multi-level stripping of non-homogeneous memory to maximize concurrency |
US6671757B1 (en) | 2000-01-26 | 2003-12-30 | Fusionone, Inc. | Data transfer and synchronization system |
US7089391B2 (en) | 2000-04-14 | 2006-08-08 | Quickshift, Inc. | Managing a codec engine for memory compression/decompression operations using a data movement engine |
US6523102B1 (en) | 2000-04-14 | 2003-02-18 | Interactive Silicon, Inc. | Parallel compression/decompression system and method for implementation of in-memory compressed cache improving storage density and access speed for industry standard memory subsystems and in-line memory modules |
AU2001275147A1 (en) | 2000-06-23 | 2002-01-08 | Intel Corporation | Non-volatile cache |
US6813686B1 (en) | 2000-06-27 | 2004-11-02 | Emc Corporation | Method and apparatus for identifying logical volumes in multiple element computer storage domains |
US6981070B1 (en) | 2000-07-12 | 2005-12-27 | Shun Hang Luk | Network storage device having solid-state non-volatile memory |
US6658438B1 (en) | 2000-08-14 | 2003-12-02 | Matrix Semiconductor, Inc. | Method for deleting stored digital data from write-once memory device |
JP3671138B2 (en) | 2000-08-17 | 2005-07-13 | ジャパンコンポジット株式会社 | Breathable waterproof covering structure and construction method thereof |
US6404647B1 (en) | 2000-08-24 | 2002-06-11 | Hewlett-Packard Co. | Solid-state mass memory storage device |
US6883079B1 (en) | 2000-09-01 | 2005-04-19 | Maxtor Corporation | Method and apparatus for using data compression as a means of increasing buffer bandwidth |
US6625685B1 (en) | 2000-09-20 | 2003-09-23 | Broadcom Corporation | Memory controller with programmable configuration |
US7039727B2 (en) | 2000-10-17 | 2006-05-02 | Microsoft Corporation | System and method for controlling mass storage class digital imaging devices |
US6779088B1 (en) | 2000-10-24 | 2004-08-17 | International Business Machines Corporation | Virtual uncompressed cache size control in compressed memory systems |
US20020154633A1 (en) | 2000-11-22 | 2002-10-24 | Yeshik Shin | Communications architecture for storage-based devices |
US6754785B2 (en) | 2000-12-01 | 2004-06-22 | Yan Chiew Chow | Switched multi-channel network interfaces and real-time streaming backup |
US6976060B2 (en) | 2000-12-05 | 2005-12-13 | Agami Sytems, Inc. | Symmetric shared file storage system |
US20020103819A1 (en) | 2000-12-12 | 2002-08-01 | Fresher Information Corporation | Technique for stabilizing data in a non-log based information storage and retrieval system |
US7013376B2 (en) | 2000-12-20 | 2006-03-14 | Hewlett-Packard Development Company, L.P. | Method and system for data block sparing in a solid-state storage device |
KR100365725B1 (en) | 2000-12-27 | 2002-12-26 | 한국전자통신연구원 | Ranked Cleaning Policy and Error Recovery Method for File Systems Using Flash Memory |
JP4818812B2 (en) | 2006-05-31 | 2011-11-16 | 株式会社日立製作所 | Flash memory storage system |
US6731447B2 (en) | 2001-06-04 | 2004-05-04 | Xerox Corporation | Secure data file erasure |
US6839808B2 (en) | 2001-07-06 | 2005-01-04 | Juniper Networks, Inc. | Processing cluster having multiple compute engines and shared tier one caches |
US6785776B2 (en) | 2001-07-26 | 2004-08-31 | International Business Machines Corporation | DMA exclusive cache state providing a fully pipelined input/output DMA write mechanism |
US7275135B2 (en) | 2001-08-31 | 2007-09-25 | Intel Corporation | Hardware updated metadata for non-volatile mass storage cache |
US20030061296A1 (en) | 2001-09-24 | 2003-03-27 | International Business Machines Corporation | Memory semantic storage I/O |
GB0123416D0 (en) | 2001-09-28 | 2001-11-21 | Memquest Ltd | Non-volatile memory control |
US6938133B2 (en) | 2001-09-28 | 2005-08-30 | Hewlett-Packard Development Company, L.P. | Memory latency and bandwidth optimizations |
US6892264B2 (en) | 2001-10-05 | 2005-05-10 | International Business Machines Corporation | Storage area network methods and apparatus for associating a logical identification with a physical identification |
US7173929B1 (en) | 2001-12-10 | 2007-02-06 | Incipient, Inc. | Fast path for performing data operations |
US7013379B1 (en) | 2001-12-10 | 2006-03-14 | Incipient, Inc. | I/O primitives |
JP4061272B2 (en) | 2002-01-09 | 2008-03-12 | 株式会社ルネサステクノロジ | Memory system and memory card |
JP4154893B2 (en) | 2002-01-23 | 2008-09-24 | 株式会社日立製作所 | Network storage virtualization method |
US20030145230A1 (en) | 2002-01-31 | 2003-07-31 | Huimin Chiu | System for exchanging data utilizing remote direct memory access |
US7010662B2 (en) | 2002-02-27 | 2006-03-07 | Microsoft Corporation | Dynamic data structures for tracking file system free space in a flash memory device |
US7533214B2 (en) | 2002-02-27 | 2009-05-12 | Microsoft Corporation | Open architecture flash driver |
US7085879B2 (en) | 2002-02-27 | 2006-08-01 | Microsoft Corporation | Dynamic data structures for tracking data stored in a flash memory device |
JP2003281071A (en) | 2002-03-20 | 2003-10-03 | Seiko Epson Corp | Data transfer controller, electronic equipment and data transfer control method |
JP4050548B2 (en) | 2002-04-18 | 2008-02-20 | 株式会社ルネサステクノロジ | Semiconductor memory device |
US7043599B1 (en) | 2002-06-20 | 2006-05-09 | Rambus Inc. | Dynamic memory supporting simultaneous refresh and data-access transactions |
US7562089B2 (en) | 2002-06-26 | 2009-07-14 | Seagate Technology Llc | Systems and methods for storing information to allow users to manage files |
US7082495B2 (en) | 2002-06-27 | 2006-07-25 | Microsoft Corporation | Method and apparatus to reduce power consumption and improve read/write performance of hard disk drives using non-volatile memory |
JP4001516B2 (en) | 2002-07-05 | 2007-10-31 | 富士通株式会社 | Degeneration control device and method |
US7051152B1 (en) | 2002-08-07 | 2006-05-23 | Nvidia Corporation | Method and system of improving disk access time by compression |
KR100505638B1 (en) | 2002-08-28 | 2005-08-03 | 삼성전자주식회사 | Apparatus and method for saving and restoring of working context |
US7340566B2 (en) | 2002-10-21 | 2008-03-04 | Microsoft Corporation | System and method for initializing a memory device from block oriented NAND flash |
US7171536B2 (en) | 2002-10-28 | 2007-01-30 | Sandisk Corporation | Unusable block management within a non-volatile memory system |
US7035974B2 (en) | 2002-11-06 | 2006-04-25 | Synology Inc. | RAID-5 disk having cache memory implemented using non-volatile RAM |
US6996676B2 (en) | 2002-11-14 | 2006-02-07 | International Business Machines Corporation | System and method for implementing an adaptive replacement cache policy |
US7093101B2 (en) | 2002-11-21 | 2006-08-15 | Microsoft Corporation | Dynamic data structures for tracking file system free space in a flash memory device |
US20040225881A1 (en) | 2002-12-02 | 2004-11-11 | Walmsley Simon Robert | Variant keys |
US6957158B1 (en) | 2002-12-23 | 2005-10-18 | Power Measurement Ltd. | High density random access memory in an intelligent electric device |
US20040148360A1 (en) | 2003-01-24 | 2004-07-29 | Hewlett-Packard Development Company | Communication-link-attached persistent memory device |
US6959369B1 (en) | 2003-03-06 | 2005-10-25 | International Business Machines Corporation | Method, system, and program for data backup |
US8041878B2 (en) | 2003-03-19 | 2011-10-18 | Samsung Electronics Co., Ltd. | Flash file system |
US7610348B2 (en) | 2003-05-07 | 2009-10-27 | International Business Machines | Distributed file serving architecture system with metadata storage virtualization and data access at the data server connection speed |
JP2004348818A (en) | 2003-05-20 | 2004-12-09 | Sharp Corp | Method and system for controlling writing in semiconductor memory device, and portable electronic device |
US7243203B2 (en) | 2003-06-13 | 2007-07-10 | Sandisk 3D Llc | Pipeline circuit for low latency memory |
US7047366B1 (en) | 2003-06-17 | 2006-05-16 | Emc Corporation | QOS feature knobs |
US20040268359A1 (en) | 2003-06-27 | 2004-12-30 | Hanes David H. | Computer-readable medium, method and computer system for processing input/output requests |
US7487235B2 (en) | 2003-09-24 | 2009-02-03 | Dell Products L.P. | Dynamically varying a raid cache policy in order to optimize throughput |
US7173852B2 (en) | 2003-10-03 | 2007-02-06 | Sandisk Corporation | Corrected data storage and handling methods |
US7096321B2 (en) | 2003-10-21 | 2006-08-22 | International Business Machines Corporation | Method and system for a cache replacement technique with adaptive skipping |
WO2005065084A2 (en) | 2003-11-13 | 2005-07-21 | Commvault Systems, Inc. | System and method for providing encryption in pipelined storage operations in a storage network |
WO2005050453A1 (en) | 2003-11-18 | 2005-06-02 | Matsushita Electric Industrial Co., Ltd. | File recording device |
US7139864B2 (en) | 2003-12-30 | 2006-11-21 | Sandisk Corporation | Non-volatile memory and method with block management system |
US7328307B2 (en) | 2004-01-22 | 2008-02-05 | Tquist, Llc | Method and apparatus for improving update performance of non-uniform access time persistent storage media |
US7356651B2 (en) | 2004-01-30 | 2008-04-08 | Piurata Technologies, Llc | Data-aware cache state machine |
US7305520B2 (en) | 2004-01-30 | 2007-12-04 | Hewlett-Packard Development Company, L.P. | Storage system with capability to allocate virtual storage segments among a plurality of controllers |
US7130956B2 (en) | 2004-02-10 | 2006-10-31 | Sun Microsystems, Inc. | Storage system including hierarchical cache metadata |
US7130957B2 (en) | 2004-02-10 | 2006-10-31 | Sun Microsystems, Inc. | Storage system structure for storing relational cache metadata |
US7231590B2 (en) | 2004-02-11 | 2007-06-12 | Microsoft Corporation | Method and apparatus for visually emphasizing numerical data contained within an electronic document |
US7725628B1 (en) | 2004-04-20 | 2010-05-25 | Lexar Media, Inc. | Direct secondary device interface by a host |
US20050240713A1 (en) | 2004-04-22 | 2005-10-27 | V-Da Technology | Flash memory device with ATA/ATAPI/SCSI or proprietary programming interface on PCI express |
JP4755642B2 (en) | 2004-04-26 | 2011-08-24 | ストアウィズ インク | Method and system for file compression and operation of compressed files for storage |
US7644239B2 (en) | 2004-05-03 | 2010-01-05 | Microsoft Corporation | Non-volatile memory cache performance improvement |
US7360015B2 (en) | 2004-05-04 | 2008-04-15 | Intel Corporation | Preventing storage of streaming accesses in a cache |
US7386663B2 (en) | 2004-05-13 | 2008-06-10 | Cousins Robert E | Transaction-based storage system and method that uses variable sized objects to store data |
US20050257017A1 (en) | 2004-05-14 | 2005-11-17 | Hideki Yagi | Method and apparatus to erase hidden memory in a memory card |
US7831561B2 (en) | 2004-05-18 | 2010-11-09 | Oracle International Corporation | Automated disk-oriented backups |
US7447847B2 (en) | 2004-07-19 | 2008-11-04 | Micron Technology, Inc. | Memory device trims |
US7395384B2 (en) | 2004-07-21 | 2008-07-01 | Sandisk Corporation | Method and apparatus for maintaining data on non-volatile memory systems |
US8407396B2 (en) | 2004-07-30 | 2013-03-26 | Hewlett-Packard Development Company, L.P. | Providing block data access for an operating system using solid-state memory |
US7203815B2 (en) | 2004-07-30 | 2007-04-10 | International Business Machines Corporation | Multi-level page cache for enhanced file system performance via read ahead |
US7664239B2 (en) | 2004-08-09 | 2010-02-16 | Cox Communications, Inc. | Methods and computer-readable media for managing and configuring options for the real-time notification and disposition of voice services in a cable services network |
US7398348B2 (en) | 2004-08-24 | 2008-07-08 | Sandisk 3D Llc | Method and apparatus for using a one-time or few-time programmable memory with a host device designed for erasable/rewritable memory |
US20060075057A1 (en) | 2004-08-30 | 2006-04-06 | International Business Machines Corporation | Remote direct memory access system and method |
WO2006025322A1 (en) | 2004-08-30 | 2006-03-09 | Matsushita Electric Industrial Co., Ltd. | Recorder |
US7603532B2 (en) | 2004-10-15 | 2009-10-13 | Netapp, Inc. | System and method for reclaiming unused space from a thinly provisioned data container |
US8131969B2 (en) | 2004-10-20 | 2012-03-06 | Seagate Technology Llc | Updating system configuration information |
US7873782B2 (en) | 2004-11-05 | 2011-01-18 | Data Robotics, Inc. | Filesystem-aware block storage system, apparatus, and method |
AU2004325580A1 (en) | 2004-12-06 | 2006-06-15 | Peter Jensen | System and method of erasing non-volatile recording media |
US8074041B2 (en) | 2004-12-09 | 2011-12-06 | International Business Machines Corporation | Apparatus, system, and method for managing storage space allocation |
US7581118B2 (en) | 2004-12-14 | 2009-08-25 | Netapp, Inc. | Disk sanitization using encryption |
US7487320B2 (en) | 2004-12-15 | 2009-02-03 | International Business Machines Corporation | Apparatus and system for dynamically allocating main memory among a plurality of applications |
KR100684887B1 (en) | 2005-02-04 | 2007-02-20 | 삼성전자주식회사 | Data storing device including flash memory and merge method of thereof |
US20060136657A1 (en) | 2004-12-22 | 2006-06-22 | Intel Corporation | Embedding a filesystem into a non-volatile device |
US20060143396A1 (en) | 2004-12-29 | 2006-06-29 | Mason Cabot | Method for programmer-controlled cache line eviction policy |
US7246195B2 (en) | 2004-12-30 | 2007-07-17 | Intel Corporation | Data storage management for flash memory devices |
US9104315B2 (en) | 2005-02-04 | 2015-08-11 | Sandisk Technologies Inc. | Systems and methods for a mass data storage system having a file-based interface to a host and a non-file-based interface to secondary storage |
US20060184719A1 (en) | 2005-02-16 | 2006-08-17 | Sinclair Alan W | Direct data file storage implementation techniques in flash memories |
US20060190552A1 (en) | 2005-02-24 | 2006-08-24 | Henze Richard H | Data retention system with a plurality of access protocols |
US7254686B2 (en) | 2005-03-31 | 2007-08-07 | International Business Machines Corporation | Switching between mirrored and non-mirrored volumes |
US7620773B2 (en) | 2005-04-15 | 2009-11-17 | Microsoft Corporation | In-line non volatile memory disk read cache and write buffer |
US20060236061A1 (en) | 2005-04-18 | 2006-10-19 | Creek Path Systems | Systems and methods for adaptively deriving storage policy and configuration rules |
US8452929B2 (en) | 2005-04-21 | 2013-05-28 | Violin Memory Inc. | Method and system for storage of data in non-volatile media |
US7702873B2 (en) | 2005-04-25 | 2010-04-20 | Network Appliance, Inc. | Managing common storage by allowing delayed allocation of storage after reclaiming reclaimable space in a logical volume |
US20060265636A1 (en) | 2005-05-19 | 2006-11-23 | Klaus Hummler | Optimized testing of on-chip error correction circuit |
US7457910B2 (en) | 2005-06-29 | 2008-11-25 | Sandisk Corporation | Method and system for managing partitions in a storage device |
US7716387B2 (en) | 2005-07-14 | 2010-05-11 | Canon Kabushiki Kaisha | Memory control apparatus and method |
US7552271B2 (en) | 2005-08-03 | 2009-06-23 | Sandisk Corporation | Nonvolatile memory with block management |
US7984084B2 (en) | 2005-08-03 | 2011-07-19 | SanDisk Technologies, Inc. | Non-volatile memory with scheduled reclaim operations |
KR100739722B1 (en) | 2005-08-20 | 2007-07-13 | 삼성전자주식회사 | A method for managing a flash memory and a flash memory system |
US7580287B2 (en) | 2005-09-01 | 2009-08-25 | Micron Technology, Inc. | Program and read trim setting |
JP5008845B2 (en) | 2005-09-01 | 2012-08-22 | 株式会社日立製作所 | Storage system, storage apparatus and control method thereof |
US20070061508A1 (en) | 2005-09-13 | 2007-03-15 | Quantum Corporation | Data storage cartridge with built-in tamper-resistant clock |
US7437510B2 (en) | 2005-09-30 | 2008-10-14 | Intel Corporation | Instruction-assisted cache management for efficient use of cache and memory |
US7529905B2 (en) | 2005-10-13 | 2009-05-05 | Sandisk Corporation | Method of storing transformed units of data in a memory system having fixed sized storage blocks |
US7516267B2 (en) | 2005-11-03 | 2009-04-07 | Intel Corporation | Recovering from a non-volatile memory failure |
US7739472B2 (en) | 2005-11-22 | 2010-06-15 | Sandisk Corporation | Memory system for legacy hosts |
US7366808B2 (en) | 2005-11-23 | 2008-04-29 | Hitachi, Ltd. | System, method and apparatus for multiple-protocol-accessible OSD storage subsystem |
US7526614B2 (en) | 2005-11-30 | 2009-04-28 | Red Hat, Inc. | Method for tuning a cache |
US7877540B2 (en) | 2005-12-13 | 2011-01-25 | Sandisk Corporation | Logically-addressed file storage methods |
US20070143566A1 (en) | 2005-12-21 | 2007-06-21 | Gorobets Sergey A | Non-volatile memories with data alignment in a directly mapped file storage system |
US20070143567A1 (en) | 2005-12-21 | 2007-06-21 | Gorobets Sergey A | Methods for data alignment in non-volatile memories with a directly mapped file storage system |
US20070143560A1 (en) | 2005-12-21 | 2007-06-21 | Gorobets Sergey A | Non-volatile memories with memory allocation for a directly mapped file storage system |
US20070156998A1 (en) | 2005-12-21 | 2007-07-05 | Gorobets Sergey A | Methods for memory allocation in non-volatile memories with a directly mapped file storage system |
US7769978B2 (en) | 2005-12-21 | 2010-08-03 | Sandisk Corporation | Method and system for accessing non-volatile storage devices |
US7747837B2 (en) | 2005-12-21 | 2010-06-29 | Sandisk Corporation | Method and system for accessing non-volatile storage devices |
US7831783B2 (en) | 2005-12-22 | 2010-11-09 | Honeywell International Inc. | Effective wear-leveling and concurrent reclamation method for embedded linear flash file systems |
US20070150663A1 (en) | 2005-12-27 | 2007-06-28 | Abraham Mendelson | Device, system and method of multi-state cache coherence scheme |
JP4392049B2 (en) | 2006-02-27 | 2009-12-24 | 富士通株式会社 | Cache control device and cache control program |
US20070208790A1 (en) | 2006-03-06 | 2007-09-06 | Reuter James M | Distributed data-storage system |
US20070233937A1 (en) | 2006-03-31 | 2007-10-04 | Coulson Richard L | Reliability of write operations to a non-volatile memory |
US7676628B1 (en) | 2006-03-31 | 2010-03-09 | Emc Corporation | Methods, systems, and computer program products for providing access to shared storage by computing grids and clusters with large numbers of nodes |
US7636829B2 (en) | 2006-05-02 | 2009-12-22 | Intel Corporation | System and method for allocating and deallocating memory within transactional code |
US20070261030A1 (en) | 2006-05-04 | 2007-11-08 | Gaurav Wadhwa | Method and system for tracking and prioritizing applications |
US8307148B2 (en) | 2006-06-23 | 2012-11-06 | Microsoft Corporation | Flash management techniques |
US7721059B2 (en) | 2006-07-06 | 2010-05-18 | Nokia Corporation | Performance optimization in solid-state media |
US20080052377A1 (en) | 2006-07-11 | 2008-02-28 | Robert Light | Web-Based User-Dependent Customer Service Interaction with Co-Browsing |
US7870306B2 (en) | 2006-08-31 | 2011-01-11 | Cisco Technology, Inc. | Shared memory message switch and cache |
JP4452261B2 (en) | 2006-09-12 | 2010-04-21 | 株式会社日立製作所 | Storage system logical volume management method, logical volume management program, and storage system |
JP4942446B2 (en) | 2006-10-11 | 2012-05-30 | 株式会社日立製作所 | Storage apparatus and control method thereof |
US7685178B2 (en) | 2006-10-31 | 2010-03-23 | Netapp, Inc. | System and method for examining client generated content stored on a data container exported by a storage system |
US20080120469A1 (en) | 2006-11-22 | 2008-05-22 | International Business Machines Corporation | Systems and Arrangements for Cache Management |
US7904647B2 (en) | 2006-11-27 | 2011-03-08 | Lsi Corporation | System for optimizing the performance and reliability of a storage controller cache offload circuit |
US8935302B2 (en) | 2006-12-06 | 2015-01-13 | Intelligent Intellectual Property Holdings 2 Llc | Apparatus, system, and method for data block usage information synchronization for a non-volatile storage volume |
US8151082B2 (en) | 2007-12-06 | 2012-04-03 | Fusion-Io, Inc. | Apparatus, system, and method for converting a storage request into an append data storage command |
WO2008070798A1 (en) | 2006-12-06 | 2008-06-12 | Fusion Multisystems, Inc. (Dba Fusion-Io) | Apparatus, system, and method for managing commands of solid-state storage using bank interleave |
US20080140737A1 (en) | 2006-12-08 | 2008-06-12 | Apple Computer, Inc. | Dynamic memory management |
US20080140918A1 (en) | 2006-12-11 | 2008-06-12 | Pantas Sutardja | Hybrid non-volatile solid state memory system |
US7660911B2 (en) | 2006-12-20 | 2010-02-09 | Smart Modular Technologies, Inc. | Block-based data striping to flash memory |
US7913051B1 (en) | 2006-12-22 | 2011-03-22 | Emc Corporation | Methods and apparatus for increasing the storage capacity of a zone of a storage system |
US20080229045A1 (en) | 2007-03-16 | 2008-09-18 | Lsi Logic Corporation | Storage system provisioning architecture |
US8135900B2 (en) | 2007-03-28 | 2012-03-13 | Kabushiki Kaisha Toshiba | Integrated memory management and memory management method |
US20080243966A1 (en) | 2007-04-02 | 2008-10-02 | Croisettier Ramanakumari M | System and method for managing temporary storage space of a database management system |
US8429677B2 (en) | 2007-04-19 | 2013-04-23 | Microsoft Corporation | Composite solid state drive identification and optimization technologies |
US9207876B2 (en) | 2007-04-19 | 2015-12-08 | Microsoft Technology Licensing, Llc | Remove-on-delete technologies for solid state drive optimization |
US7853759B2 (en) | 2007-04-23 | 2010-12-14 | Microsoft Corporation | Hints model for optimization of storage devices connected to host and write optimization schema for storage devices |
JP2008276646A (en) | 2007-05-02 | 2008-11-13 | Hitachi Ltd | Storage device and data management method for storage device |
US8850154B2 (en) | 2007-09-11 | 2014-09-30 | 2236008 Ontario Inc. | Processing system having memory partitioning |
US20090070526A1 (en) | 2007-09-12 | 2009-03-12 | Tetrick R Scott | Using explicit disk block cacheability attributes to enhance i/o caching efficiency |
US7873803B2 (en) | 2007-09-25 | 2011-01-18 | Sandisk Corporation | Nonvolatile memory with self recovery |
TWI366828B (en) | 2007-09-27 | 2012-06-21 | Phison Electronics Corp | Wear leveling method and controller using the same |
KR101148319B1 (en) | 2007-11-05 | 2012-05-24 | 노키아 지멘스 네트웍스 오와이 | Buffer status reporting system and method |
JP2009122850A (en) | 2007-11-13 | 2009-06-04 | Toshiba Corp | Block device control device and access range management method |
US8195912B2 (en) | 2007-12-06 | 2012-06-05 | Fusion-Io, Inc. | Apparatus, system, and method for efficient mapping of virtual and physical addresses |
KR101086855B1 (en) | 2008-03-10 | 2011-11-25 | 주식회사 팍스디스크 | Solid State Storage System with High Speed and Controlling Method thereof |
US8051243B2 (en) | 2008-04-30 | 2011-11-01 | Hitachi, Ltd. | Free space utilization in tiered storage systems |
US20090276654A1 (en) | 2008-05-02 | 2009-11-05 | International Business Machines Corporation | Systems and methods for implementing fault tolerant data processing services |
JP5159421B2 (en) | 2008-05-14 | 2013-03-06 | 株式会社日立製作所 | Storage system and storage system management method using management device |
US8775718B2 (en) | 2008-05-23 | 2014-07-08 | Netapp, Inc. | Use of RDMA to access non-volatile solid-state memory in a network storage system |
US8554983B2 (en) | 2008-05-27 | 2013-10-08 | Micron Technology, Inc. | Devices and methods for operating a solid state drive |
WO2009149386A1 (en) | 2008-06-06 | 2009-12-10 | Pivot3 | Method and system for distributed raid implementation |
US7917803B2 (en) | 2008-06-17 | 2011-03-29 | Seagate Technology Llc | Data conflict resolution for solid-state memory devices |
US8843691B2 (en) | 2008-06-25 | 2014-09-23 | Stec, Inc. | Prioritized erasure of data blocks in a flash storage device |
US8135907B2 (en) | 2008-06-30 | 2012-03-13 | Oracle America, Inc. | Method and system for managing wear-level aware file systems |
JP5242264B2 (en) | 2008-07-07 | 2013-07-24 | 株式会社東芝 | Data control apparatus, storage system, and program |
US20100017556A1 (en) | 2008-07-19 | 2010-01-21 | Nanostar Corporation, U.S.A. | Non-volatile memory storage system with two-stage controller architecture |
KR101086857B1 (en) | 2008-07-25 | 2011-11-25 | 주식회사 팍스디스크 | Control Method of Solid State Storage System for Data Merging |
US7941591B2 (en) | 2008-07-28 | 2011-05-10 | CacheIQ, Inc. | Flash DIMM in a standalone cache appliance system and methodology |
JP5216463B2 (en) | 2008-07-30 | 2013-06-19 | 株式会社日立製作所 | Storage device, storage area management method thereof, and flash memory package |
US8205063B2 (en) | 2008-12-30 | 2012-06-19 | Sandisk Technologies Inc. | Dynamic mapping of logical ranges to write blocks |
US20100235597A1 (en) | 2009-03-10 | 2010-09-16 | Hiroshi Arakawa | Method and apparatus for conversion between conventional volumes and thin provisioning with automated tier management |
US8433845B2 (en) | 2009-04-08 | 2013-04-30 | Google Inc. | Data storage device which serializes memory device ready/busy signals |
US20100262979A1 (en) | 2009-04-08 | 2010-10-14 | Google Inc. | Circular command queues for communication between a host and a data storage device |
US8639871B2 (en) | 2009-04-08 | 2014-01-28 | Google Inc. | Partitioning a flash memory data storage device |
US8516219B2 (en) | 2009-07-24 | 2013-08-20 | Apple Inc. | Index cache tree |
US20120159040A1 (en) | 2010-12-15 | 2012-06-21 | Dhaval Parikh | Auxiliary Interface for Non-Volatile Memory System |
- 2013-04-17: US application US13/865,153, patent US9563555B2 (en), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6477612B1 (en) * | 2000-02-08 | 2002-11-05 | Microsoft Corporation | Providing access to physical memory allocated to a process by selectively mapping pages of the physical memory with virtual memory allocated to the process |
US7278008B1 (en) * | 2004-01-30 | 2007-10-02 | Nvidia Corporation | Virtual address translation system with caching of variable-range translation clusters |
Cited By (485)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11379119B2 (en) | 2010-03-05 | 2022-07-05 | Netapp, Inc. | Writing data in a distributed data storage system |
US9767271B2 (en) | 2010-07-15 | 2017-09-19 | The Research Foundation For The State University Of New York | System and method for validating program execution at run-time |
US10353630B1 (en) | 2010-09-15 | 2019-07-16 | Pure Storage, Inc. | Simultaneously servicing high latency operations in a storage system |
US10126982B1 (en) | 2010-09-15 | 2018-11-13 | Pure Storage, Inc. | Adjusting a number of storage devices in a storage system that may be utilized to simultaneously service high latency operations |
US9684460B1 (en) | 2010-09-15 | 2017-06-20 | Pure Storage, Inc. | Proactively correcting behavior that may affect I/O performance in a non-volatile semiconductor storage device |
US10228865B1 (en) | 2010-09-15 | 2019-03-12 | Pure Storage, Inc. | Maintaining a target number of storage devices for variable I/O response times in a storage system |
US11307772B1 (en) | 2010-09-15 | 2022-04-19 | Pure Storage, Inc. | Responding to variable response time behavior in a storage environment |
US11275509B1 (en) | 2010-09-15 | 2022-03-15 | Pure Storage, Inc. | Intelligently sizing high latency I/O requests in a storage environment |
US11614893B2 (en) | 2010-09-15 | 2023-03-28 | Pure Storage, Inc. | Optimizing storage device access based on latency |
US12008266B2 (en) | 2010-09-15 | 2024-06-11 | Pure Storage, Inc. | Efficient read by reconstruction |
US10156998B1 (en) | 2010-09-15 | 2018-12-18 | Pure Storage, Inc. | Reducing a number of storage devices in a storage system that are exhibiting variable I/O response times |
US11435904B1 (en) | 2010-09-28 | 2022-09-06 | Pure Storage, Inc. | Dynamic protection data in a storage system |
US11579974B1 (en) | 2010-09-28 | 2023-02-14 | Pure Storage, Inc. | Data protection using intra-device parity and intra-device parity |
US10810083B1 (en) | 2010-09-28 | 2020-10-20 | Pure Storage, Inc. | Decreasing parity overhead in a storage system |
US10817375B2 (en) | 2010-09-28 | 2020-10-27 | Pure Storage, Inc. | Generating protection data in a storage system |
US12086030B2 (en) | 2010-09-28 | 2024-09-10 | Pure Storage, Inc. | Data protection using distributed intra-device parity and inter-device parity |
US10452289B1 (en) | 2010-09-28 | 2019-10-22 | Pure Storage, Inc. | Dynamically adjusting an amount of protection data stored in a storage system |
US10180879B1 (en) | 2010-09-28 | 2019-01-15 | Pure Storage, Inc. | Inter-device and intra-device protection data |
US11797386B2 (en) | 2010-09-28 | 2023-10-24 | Pure Storage, Inc. | Flexible RAID layouts in a storage system |
US11636031B2 (en) | 2011-08-11 | 2023-04-25 | Pure Storage, Inc. | Optimized inline deduplication |
US10061798B2 (en) | 2011-10-14 | 2018-08-28 | Pure Storage, Inc. | Method for maintaining multiple fingerprint tables in a deduplicating storage system |
US11341117B2 (en) | 2011-10-14 | 2022-05-24 | Pure Storage, Inc. | Deduplication table management |
US10540343B2 (en) | 2011-10-14 | 2020-01-21 | Pure Storage, Inc. | Data object attribute based event detection in a storage system |
US9811551B1 (en) | 2011-10-14 | 2017-11-07 | Pure Storage, Inc. | Utilizing multiple fingerprint tables in a deduplicating storage system |
US20140173223A1 (en) * | 2011-12-13 | 2014-06-19 | Nathaniel S DeNeui | Storage controller with host collaboration for initialization of a logical volume |
US10951488B2 (en) | 2011-12-27 | 2021-03-16 | Netapp, Inc. | Rule-based performance class access management for storage cluster performance guarantees |
US10911328B2 (en) | 2011-12-27 | 2021-02-02 | Netapp, Inc. | Quality of service policy based load adaption |
US11212196B2 (en) | 2011-12-27 | 2021-12-28 | Netapp, Inc. | Proportional quality of service based on client impact on an overload condition |
US10073656B2 (en) | 2012-01-27 | 2018-09-11 | Sandisk Technologies Llc | Systems and methods for storage virtualization |
US10089010B1 (en) | 2012-03-15 | 2018-10-02 | Pure Storage, Inc. | Identifying fractal regions across multiple storage devices |
US10521120B1 (en) | 2012-03-15 | 2019-12-31 | Pure Storage, Inc. | Intelligently mapping virtual blocks to physical blocks in a storage system |
US9792045B1 (en) | 2012-03-15 | 2017-10-17 | Pure Storage, Inc. | Distributing data blocks across a plurality of storage devices |
US8931054B2 (en) * | 2012-06-28 | 2015-01-06 | International Business Machines Corporation | Secure access to shared storage resources |
US20140006708A1 (en) * | 2012-06-28 | 2014-01-02 | International Business Machines Corporation | Secure access to shared storage resources |
US9058123B2 (en) | 2012-08-31 | 2015-06-16 | Intelligent Intellectual Property Holdings 2 Llc | Systems, methods, and interfaces for adaptive persistence |
US10346095B2 (en) * | 2012-08-31 | 2019-07-09 | Sandisk Technologies, Llc | Systems, methods, and interfaces for adaptive cache persistence |
US10359972B2 (en) * | 2012-08-31 | 2019-07-23 | Sandisk Technologies Llc | Systems, methods, and interfaces for adaptive persistence |
US20140068183A1 (en) * | 2012-08-31 | 2014-03-06 | Fusion-Io, Inc. | Systems, methods, and interfaces for adaptive persistence |
US20140068197A1 (en) * | 2012-08-31 | 2014-03-06 | Fusion-Io, Inc. | Systems, methods, and interfaces for adaptive cache persistence |
US9767284B2 (en) | 2012-09-14 | 2017-09-19 | The Research Foundation For The State University Of New York | Continuous run-time validation of program execution: a practical approach |
US10318495B2 (en) | 2012-09-24 | 2019-06-11 | Sandisk Technologies Llc | Snapshots for a non-volatile device |
US10509776B2 (en) | 2012-09-24 | 2019-12-17 | Sandisk Technologies Llc | Time sequence data management |
US10623386B1 (en) | 2012-09-26 | 2020-04-14 | Pure Storage, Inc. | Secret sharing data protection in a storage system |
US11032259B1 (en) | 2012-09-26 | 2021-06-08 | Pure Storage, Inc. | Data protection in a storage system |
US11924183B2 (en) | 2012-09-26 | 2024-03-05 | Pure Storage, Inc. | Encrypting data in a non-volatile memory express (‘NVMe’) storage device |
US10284367B1 (en) | 2012-09-26 | 2019-05-07 | Pure Storage, Inc. | Encrypting data in a storage system using a plurality of encryption keys |
US10324795B2 (en) | 2012-10-01 | 2019-06-18 | The Research Foundation for the State University o | System and method for security and privacy aware virtual machine checkpointing |
US9552495B2 (en) | 2012-10-01 | 2017-01-24 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US9069782B2 (en) | 2012-10-01 | 2015-06-30 | The Research Foundation For The State University Of New York | System and method for security and privacy aware virtual machine checkpointing |
US11662936B2 (en) | 2013-01-10 | 2023-05-30 | Pure Storage, Inc. | Writing data using references to previously stored data |
US11573727B1 (en) | 2013-01-10 | 2023-02-07 | Pure Storage, Inc. | Virtual machine backup and restoration |
US9891858B1 (en) | 2013-01-10 | 2018-02-13 | Pure Storage, Inc. | Deduplication of regions with a storage system |
US10013317B1 (en) | 2013-01-10 | 2018-07-03 | Pure Storage, Inc. | Restoring a volume in a storage system |
US9880779B1 (en) | 2013-01-10 | 2018-01-30 | Pure Storage, Inc. | Processing copy offload requests in a storage system |
US12099741B2 (en) | 2013-01-10 | 2024-09-24 | Pure Storage, Inc. | Lightweight copying of data using metadata references |
US10585617B1 (en) | 2013-01-10 | 2020-03-10 | Pure Storage, Inc. | Buffering copy requests in a storage system |
US11768623B2 (en) | 2013-01-10 | 2023-09-26 | Pure Storage, Inc. | Optimizing generalized transfers between storage systems |
US10908835B1 (en) | 2013-01-10 | 2021-02-02 | Pure Storage, Inc. | Reversing deletion of a virtual machine |
US11733908B2 (en) | 2013-01-10 | 2023-08-22 | Pure Storage, Inc. | Delaying deletion of a dataset |
US11853584B1 (en) | 2013-01-10 | 2023-12-26 | Pure Storage, Inc. | Generating volume snapshots |
US11099769B1 (en) | 2013-01-10 | 2021-08-24 | Pure Storage, Inc. | Copying data without accessing the data |
US10530886B2 (en) | 2013-01-16 | 2020-01-07 | Cisco Technology, Inc. | Method for optimizing WAN traffic using a cached stream and determination of previous transmission |
US9300748B2 (en) * | 2013-01-16 | 2016-03-29 | Cisco Technology, Inc. | Method for optimizing WAN traffic with efficient indexing scheme |
US9306997B2 (en) | 2013-01-16 | 2016-04-05 | Cisco Technology, Inc. | Method for optimizing WAN traffic with deduplicated storage |
US9509736B2 (en) | 2013-01-16 | 2016-11-29 | Cisco Technology, Inc. | Method for optimizing WAN traffic |
US20140201384A1 (en) * | 2013-01-16 | 2014-07-17 | Cisco Technology, Inc. | Method for optimizing WAN traffic with efficient indexing scheme |
US11868247B1 (en) * | 2013-01-28 | 2024-01-09 | Radian Memory Systems, Inc. | Storage system with multiplane segments and cooperative flash management |
US20140281126A1 (en) * | 2013-03-14 | 2014-09-18 | Sandisk Technologies Inc. | Overprovision capacity in a data storage device |
US9804960B2 (en) * | 2013-03-14 | 2017-10-31 | Western Digital Technologies, Inc. | Overprovision capacity in a data storage device |
US9218279B2 (en) | 2013-03-15 | 2015-12-22 | Western Digital Technologies, Inc. | Atomic write command support in a solid state drive |
US9594520B2 (en) | 2013-03-15 | 2017-03-14 | Western Digital Technologies, Inc. | Atomic write command support in a solid state drive |
US10254983B2 (en) | 2013-03-15 | 2019-04-09 | Western Digital Technologies, Inc. | Atomic write command support in a solid state drive |
US10102144B2 (en) | 2013-04-16 | 2018-10-16 | Sandisk Technologies Llc | Systems, methods and interfaces for data virtualization |
US20140344507A1 (en) * | 2013-04-16 | 2014-11-20 | Fusion-Io, Inc. | Systems and methods for storage metadata management |
US10558561B2 (en) * | 2013-04-16 | 2020-02-11 | Sandisk Technologies Llc | Systems and methods for storage metadata management |
US9170938B1 (en) * | 2013-05-17 | 2015-10-27 | Western Digital Technologies, Inc. | Method and system for atomically writing scattered information in a solid state storage device |
US9513831B2 (en) | 2013-05-17 | 2016-12-06 | Western Digital Technologies, Inc. | Method and system for atomically writing scattered information in a solid state storage device |
WO2015034483A1 (en) * | 2013-09-04 | 2015-03-12 | Intel Corporation | Mechanism for facilitating dynamic storage management for mobile computing devices |
US11394775B2 (en) | 2013-09-04 | 2022-07-19 | Intel Corporation | Mechanism for facilitating dynamic storage management for mobile computing devices |
US9563654B2 (en) | 2013-09-16 | 2017-02-07 | Netapp, Inc. | Dense tree volume metadata organization |
US9268502B2 (en) | 2013-09-16 | 2016-02-23 | Netapp, Inc. | Dense tree volume metadata organization |
WO2015038741A1 (en) * | 2013-09-16 | 2015-03-19 | Netapp, Inc. | Management of extent based metadata with dense tree structures within a distributed storage architecture |
WO2015048140A1 (en) * | 2013-09-24 | 2015-04-02 | Intelligent Intellectual Property Holdings 2 Llc | Systems and methods for storage collision management |
US20150248418A1 (en) * | 2013-10-09 | 2015-09-03 | Rahul M. Bhardwaj | Technology for managing cloud storage |
US9547654B2 (en) * | 2013-10-09 | 2017-01-17 | Intel Corporation | Technology for managing cloud storage |
EP3061008A4 (en) * | 2013-10-24 | 2017-05-03 | Western Digital Technologies, Inc. | Data storage device supporting accelerated database operations |
EP3061008A1 (en) * | 2013-10-24 | 2016-08-31 | Western Digital Technologies, Inc. | Data storage device supporting accelerated database operations |
US10444998B1 (en) | 2013-10-24 | 2019-10-15 | Western Digital Technologies, Inc. | Data storage device providing data maintenance services |
US20160004460A1 (en) * | 2013-10-29 | 2016-01-07 | Hitachi, Ltd. | Computer system and control method |
US11899986B2 (en) | 2013-11-06 | 2024-02-13 | Pure Storage, Inc. | Expanding an address space supported by a storage system |
US11128448B1 (en) | 2013-11-06 | 2021-09-21 | Pure Storage, Inc. | Quorum-aware secret sharing |
US10263770B2 (en) | 2013-11-06 | 2019-04-16 | Pure Storage, Inc. | Data protection in a storage system using external secrets |
US11706024B2 (en) | 2013-11-06 | 2023-07-18 | Pure Storage, Inc. | Secret distribution among storage devices |
US10887086B1 (en) | 2013-11-06 | 2021-01-05 | Pure Storage, Inc. | Protecting data in a storage system |
US11169745B1 (en) | 2013-11-06 | 2021-11-09 | Pure Storage, Inc. | Exporting an address space in a thin-provisioned storage device |
US10365858B2 (en) | 2013-11-06 | 2019-07-30 | Pure Storage, Inc. | Thin provisioning in a storage device |
US9471248B2 (en) | 2013-11-12 | 2016-10-18 | Netapp, Inc. | Snapshots and clones of volumes in a storage system |
US9152684B2 (en) | 2013-11-12 | 2015-10-06 | Netapp, Inc. | Snapshots and clones of volumes in a storage system |
US9201918B2 (en) | 2013-11-19 | 2015-12-01 | Netapp, Inc. | Dense tree volume metadata update logging and checkpointing |
US9405473B2 (en) | 2013-11-19 | 2016-08-02 | Netapp, Inc. | Dense tree volume metadata update logging and checkpointing |
US9804973B1 (en) | 2014-01-09 | 2017-10-31 | Pure Storage, Inc. | Using frequency domain to prioritize storage of metadata in a cache |
US10191857B1 (en) | 2014-01-09 | 2019-01-29 | Pure Storage, Inc. | Machine learning for metadata cache management |
US9727249B1 (en) * | 2014-02-06 | 2017-08-08 | SK Hynix Inc. | Selection of an open block in solid state storage systems with multiple open blocks |
US11386120B2 (en) | 2014-02-21 | 2022-07-12 | Netapp, Inc. | Data syncing in a distributed system |
US10656864B2 (en) | 2014-03-20 | 2020-05-19 | Pure Storage, Inc. | Data replication within a flash storage array |
US11847336B1 (en) | 2014-03-20 | 2023-12-19 | Pure Storage, Inc. | Efficient replication using metadata |
US10860547B2 (en) | 2014-04-23 | 2020-12-08 | Qumulo, Inc. | Data mobility, accessibility, and consistency in a data storage system |
US11461286B2 (en) | 2014-04-23 | 2022-10-04 | Qumulo, Inc. | Fair sampling in a hierarchical filesystem |
US10156986B2 (en) | 2014-05-12 | 2018-12-18 | The Research Foundation For The State University Of New York | Gang migration of virtual machines using cluster-wide deduplication |
US9823842B2 (en) | 2014-05-12 | 2017-11-21 | The Research Foundation For The State University Of New York | Gang migration of virtual machines using cluster-wide deduplication |
US11841984B1 (en) | 2014-06-03 | 2023-12-12 | Pure Storage, Inc. | Encrypting data with a unique key |
US9779268B1 (en) | 2014-06-03 | 2017-10-03 | Pure Storage, Inc. | Utilizing a non-repeating identifier to encrypt data |
US10607034B1 (en) | 2014-06-03 | 2020-03-31 | Pure Storage, Inc. | Utilizing an address-independent, non-repeating encryption key to encrypt data |
US10037440B1 (en) | 2014-06-03 | 2018-07-31 | Pure Storage, Inc. | Generating a unique encryption key |
US11036583B2 (en) | 2014-06-04 | 2021-06-15 | Pure Storage, Inc. | Rebuilding data across storage nodes |
US11399063B2 (en) | 2014-06-04 | 2022-07-26 | Pure Storage, Inc. | Network authentication for a storage system |
US11455291B2 (en) | 2014-06-24 | 2022-09-27 | Google Llc | Processing mutations for a remote database |
US10545948B2 (en) * | 2014-06-24 | 2020-01-28 | Google Llc | Processing mutations for a remote database |
US20150370844A1 (en) * | 2014-06-24 | 2015-12-24 | Google Inc. | Processing mutations for a remote database |
US10521417B2 (en) * | 2014-06-24 | 2019-12-31 | Google Llc | Processing mutations for a remote database |
US11003380B1 (en) | 2014-06-25 | 2021-05-11 | Pure Storage, Inc. | Minimizing data transfer during snapshot-based replication |
US11561720B2 (en) | 2014-06-25 | 2023-01-24 | Pure Storage, Inc. | Enabling access to a partially migrated dataset |
US10346084B1 (en) | 2014-06-25 | 2019-07-09 | Pure Storage, Inc. | Replication and snapshots for flash storage systems |
US10496556B1 (en) | 2014-06-25 | 2019-12-03 | Pure Storage, Inc. | Dynamic data protection within a flash storage system |
US11221970B1 (en) | 2014-06-25 | 2022-01-11 | Pure Storage, Inc. | Consistent application of protection group management policies across multiple storage systems |
US9817608B1 (en) | 2014-06-25 | 2017-11-14 | Pure Storage, Inc. | Replication and intermediate read-write state for mediums |
US12079143B2 (en) | 2014-06-25 | 2024-09-03 | Pure Storage, Inc. | Dynamically managing protection groups |
US20160026408A1 (en) * | 2014-07-24 | 2016-01-28 | Fusion-Io, Inc. | Storage device metadata synchronization |
US10296469B1 (en) | 2014-07-24 | 2019-05-21 | Pure Storage, Inc. | Access control in a flash storage system |
US9798728B2 (en) | 2014-07-24 | 2017-10-24 | Netapp, Inc. | System performing data deduplication using a dense tree data structure |
US10114576B2 (en) * | 2014-07-24 | 2018-10-30 | Sandisk Technologies Llc | Storage device metadata synchronization |
US10348675B1 (en) | 2014-07-24 | 2019-07-09 | Pure Storage, Inc. | Distributed management of a storage system |
US11080154B2 (en) | 2014-08-07 | 2021-08-03 | Pure Storage, Inc. | Recovering error corrected data |
US10983866B2 (en) | 2014-08-07 | 2021-04-20 | Pure Storage, Inc. | Mapping defective memory in a storage system |
US9864761B1 (en) | 2014-08-08 | 2018-01-09 | Pure Storage, Inc. | Read optimization operations in a storage system |
US9348517B2 (en) | 2014-08-28 | 2016-05-24 | International Business Machines Corporation | Using a migration threshold and a candidate list for cache management of sequential write storage |
US10380026B2 (en) * | 2014-09-04 | 2019-08-13 | Sandisk Technologies Llc | Generalized storage virtualization interface |
US20160070652A1 (en) * | 2014-09-04 | 2016-03-10 | Fusion-Io, Inc. | Generalized storage virtualization interface |
US11163448B1 (en) | 2014-09-08 | 2021-11-02 | Pure Storage, Inc. | Indicating total storage capacity for a storage device |
US10430079B2 (en) | 2014-09-08 | 2019-10-01 | Pure Storage, Inc. | Adjusting storage capacity in a computing system |
US11914861B2 (en) | 2014-09-08 | 2024-02-27 | Pure Storage, Inc. | Projecting capacity in a storage system based on data reduction levels |
US10133511B2 (en) | 2014-09-12 | 2018-11-20 | Netapp, Inc. | Optimized segment cleaning technique |
US9671960B2 (en) | 2014-09-12 | 2017-06-06 | Netapp, Inc. | Rate matching technique for balancing segment cleaning and I/O workload |
US10210082B2 (en) | 2014-09-12 | 2019-02-19 | Netapp, Inc. | Rate matching technique for balancing segment cleaning and I/O workload |
US10257274B2 (en) * | 2014-09-15 | 2019-04-09 | Foundation for Research and Technology—Hellas (FORTH) | Tiered heterogeneous fast layer shared storage substrate apparatuses, methods, and systems |
US10313427B2 (en) | 2014-09-24 | 2019-06-04 | Intel Corporation | Contextual application management |
US11811619B2 (en) | 2014-10-02 | 2023-11-07 | Pure Storage, Inc. | Emulating a local interface to a remotely managed storage system |
US10999157B1 (en) | 2014-10-02 | 2021-05-04 | Pure Storage, Inc. | Remote cloud-based monitoring of storage systems |
US10164841B2 (en) | 2014-10-02 | 2018-12-25 | Pure Storage, Inc. | Cloud assist for storage systems |
US11444849B2 (en) | 2014-10-02 | 2022-09-13 | Pure Storage, Inc. | Remote emulation of a storage system |
US11442640B1 (en) | 2014-10-07 | 2022-09-13 | Pure Storage, Inc. | Utilizing unmapped and unknown states in a replicated storage system |
US10430282B2 (en) | 2014-10-07 | 2019-10-01 | Pure Storage, Inc. | Optimizing replication by distinguishing user and system write activity |
US10838640B1 (en) | 2014-10-07 | 2020-11-17 | Pure Storage, Inc. | Multi-source data replication |
US12079498B2 (en) | 2014-10-07 | 2024-09-03 | Pure Storage, Inc. | Allowing access to a partially replicated dataset |
US10114574B1 (en) | 2014-10-07 | 2018-10-30 | Pure Storage, Inc. | Optimizing storage allocation in a storage system |
WO2016081166A1 (en) * | 2014-11-18 | 2016-05-26 | Netapp, Inc. | N-way merge for updating volume metadata in a storage i/o stack |
US10365838B2 (en) | 2014-11-18 | 2019-07-30 | Netapp, Inc. | N-way merge technique for updating volume metadata in a storage I/O stack |
US9836229B2 (en) | 2014-11-18 | 2017-12-05 | Netapp, Inc. | N-way merge technique for updating volume metadata in a storage I/O stack |
US11662909B2 (en) | 2014-11-24 | 2023-05-30 | Pure Storage, Inc. | Metadata management in a storage system |
US9977600B1 (en) | 2014-11-24 | 2018-05-22 | Pure Storage, Inc. | Optimizing flattening in a multi-level data structure |
US10254964B1 (en) | 2014-11-24 | 2019-04-09 | Pure Storage, Inc. | Managing mapping information in a storage system |
US9727485B1 (en) | 2014-11-24 | 2017-08-08 | Pure Storage, Inc. | Metadata rewrite and flatten optimization |
US9773007B1 (en) * | 2014-12-01 | 2017-09-26 | Pure Storage, Inc. | Performance improvements in a storage system |
US10482061B1 (en) * | 2014-12-01 | 2019-11-19 | Pure Storage, Inc. | Removing invalid data from a dataset in advance of copying the dataset |
US11061786B1 (en) | 2014-12-11 | 2021-07-13 | Pure Storage, Inc. | Cloud-based disaster recovery of a storage system |
US11775392B2 (en) | 2014-12-11 | 2023-10-03 | Pure Storage, Inc. | Indirect replication of a dataset |
US10838834B1 (en) | 2014-12-11 | 2020-11-17 | Pure Storage, Inc. | Managing read and write requests targeting a failed storage region in a storage system |
US10248516B1 (en) | 2014-12-11 | 2019-04-02 | Pure Storage, Inc. | Processing read and write requests during reconstruction in a storage system |
US10235065B1 (en) | 2014-12-11 | 2019-03-19 | Pure Storage, Inc. | Dataset replication in a cloud computing environment |
US9864769B2 (en) | 2014-12-12 | 2018-01-09 | Pure Storage, Inc. | Storing data utilizing repeating pattern detection |
US10783131B1 (en) | 2014-12-12 | 2020-09-22 | Pure Storage, Inc. | Deduplicating patterned data in a storage system |
US11561949B1 (en) | 2014-12-12 | 2023-01-24 | Pure Storage, Inc. | Reconstructing deduplicated data |
US11803567B1 (en) | 2014-12-19 | 2023-10-31 | Pure Storage, Inc. | Restoration of a dataset from a cloud |
US10545987B2 (en) | 2014-12-19 | 2020-01-28 | Pure Storage, Inc. | Replication to the cloud |
US11947968B2 (en) | 2015-01-21 | 2024-04-02 | Pure Storage, Inc. | Efficient use of zone in a storage device |
US10296354B1 (en) | 2015-01-21 | 2019-05-21 | Pure Storage, Inc. | Optimized boot operations within a flash storage array |
US11169817B1 (en) | 2015-01-21 | 2021-11-09 | Pure Storage, Inc. | Optimizing a boot sequence in a storage system |
US9459998B2 (en) | 2015-02-04 | 2016-10-04 | International Business Machines Corporation | Operations interlock under dynamic relocation of storage |
US9720601B2 (en) | 2015-02-11 | 2017-08-01 | Netapp, Inc. | Load balancing technique for a storage array |
US11487438B1 (en) | 2015-02-18 | 2022-11-01 | Pure Storage, Inc. | Recovering allocated storage space in a storage system |
US9710165B1 (en) | 2015-02-18 | 2017-07-18 | Pure Storage, Inc. | Identifying volume candidates for space reclamation |
US11886707B2 (en) | 2015-02-18 | 2024-01-30 | Pure Storage, Inc. | Dataset space reclamation |
US10782892B1 (en) | 2015-02-18 | 2020-09-22 | Pure Storage, Inc. | Reclaiming storage space in a storage subsystem |
US10809921B1 (en) | 2015-02-18 | 2020-10-20 | Pure Storage, Inc. | Optimizing space reclamation in a storage system |
US20160253097A1 (en) * | 2015-02-27 | 2016-09-01 | Kyocera Document Solutions Inc. | Information processing device that extends service life of non-volatile semiconductor memory and recording medium |
US10437677B2 (en) * | 2015-02-27 | 2019-10-08 | Pure Storage, Inc. | Optimized distributed rebuilding within a dispersed storage network |
US11693985B2 (en) | 2015-02-27 | 2023-07-04 | Pure Storage, Inc. | Stand-by storage nodes in storage network |
US9875065B2 (en) * | 2015-02-27 | 2018-01-23 | Kyocera Document Solutions Inc. | Information processing device that extends service life of non-volatile semiconductor memory and recording medium |
US20170249246A1 (en) * | 2015-03-13 | 2017-08-31 | Hitachi Data Systems Corporation | Deduplication and garbage collection across logical databases |
US10853242B2 (en) * | 2015-03-13 | 2020-12-01 | Hitachi Vantara Llc | Deduplication and garbage collection across logical databases |
US20160283157A1 (en) * | 2015-03-23 | 2016-09-29 | Kabushiki Kaisha Toshiba | Memory device |
US10223037B2 (en) * | 2015-03-23 | 2019-03-05 | Toshiba Memory Corporation | Memory device including controller for controlling data writing using writing order confirmation request |
US9762460B2 (en) | 2015-03-24 | 2017-09-12 | Netapp, Inc. | Providing continuous context for operational information of a storage system |
US11188269B2 (en) | 2015-03-27 | 2021-11-30 | Pure Storage, Inc. | Configuration for multiple logical storage arrays |
US9710317B2 (en) | 2015-03-30 | 2017-07-18 | Netapp, Inc. | Methods to identify, handle and recover from suspect SSDS in a clustered flash array |
US10693964B2 (en) | 2015-04-09 | 2020-06-23 | Pure Storage, Inc. | Storage unit communication within a storage system |
US9411613B1 (en) | 2015-04-22 | 2016-08-09 | Ryft Systems, Inc. | Systems and methods for managing execution of specialized processors |
US9542244B2 (en) | 2015-04-22 | 2017-01-10 | Ryft Systems, Inc. | Systems and methods for performing primitive tasks using specialized processors |
US9411528B1 (en) * | 2015-04-22 | 2016-08-09 | Ryft Systems, Inc. | Storage management systems and methods |
US10073899B2 (en) * | 2015-05-18 | 2018-09-11 | Oracle International Corporation | Efficient storage using automatic data translation |
US20160342645A1 (en) * | 2015-05-18 | 2016-11-24 | Oracle International Corporation | Efficient storage using automatic data translation |
US11231956B2 (en) | 2015-05-19 | 2022-01-25 | Pure Storage, Inc. | Committed transactions in a storage system |
US10877942B2 (en) | 2015-06-17 | 2020-12-29 | Qumulo, Inc. | Filesystem capacity and performance metrics and visualizations |
US10310740B2 (en) | 2015-06-23 | 2019-06-04 | Pure Storage, Inc. | Aligning memory access operations to a geometry of a storage device |
US10564882B2 (en) | 2015-06-23 | 2020-02-18 | Pure Storage, Inc. | Writing data to storage device based on information about memory in the storage device |
US11010080B2 (en) | 2015-06-23 | 2021-05-18 | Pure Storage, Inc. | Layout based memory writes |
US11442919B2 (en) * | 2015-07-31 | 2022-09-13 | Accenture Global Services Limited | Data reliability analysis |
US10089180B2 (en) * | 2015-07-31 | 2018-10-02 | International Business Machines Corporation | Unfavorable storage growth rate abatement |
US10678642B2 (en) | 2015-07-31 | 2020-06-09 | Pure Storage, Inc. | Unfavorable storage growth rate abatement |
US9740566B2 (en) | 2015-07-31 | 2017-08-22 | Netapp, Inc. | Snapshot creation workflow |
US10565230B2 (en) | 2015-07-31 | 2020-02-18 | Netapp, Inc. | Technique for preserving efficiency for replication between clusters of a network |
US10394660B2 (en) | 2015-07-31 | 2019-08-27 | Netapp, Inc. | Snapshot restore workflow |
US20170046351A1 (en) * | 2015-08-10 | 2017-02-16 | International Business Machines Corporation | File migration in a hierarchical storage system |
US10169346B2 (en) * | 2015-08-10 | 2019-01-01 | International Business Machines Corporation | File migration in a hierarchical storage system |
US11249999B2 (en) | 2015-09-04 | 2022-02-15 | Pure Storage, Inc. | Memory efficient searching |
US11341136B2 (en) | 2015-09-04 | 2022-05-24 | Pure Storage, Inc. | Dynamically resizable structures for approximate membership queries |
US11269884B2 (en) | 2015-09-04 | 2022-03-08 | Pure Storage, Inc. | Dynamically resizable structures for approximate membership queries |
US20170085636A1 (en) * | 2015-09-21 | 2017-03-23 | Intel Corporation | Method and Apparatus for Dynamically Offloading Execution of Machine Code in an Application to a Virtual Machine |
US10305976B2 (en) * | 2015-09-21 | 2019-05-28 | Intel Corporation | Method and apparatus for dynamically offloading execution of machine code in an application to a virtual machine |
US11070382B2 (en) | 2015-10-23 | 2021-07-20 | Pure Storage, Inc. | Communication in a distributed architecture |
US20170126663A1 (en) * | 2015-10-29 | 2017-05-04 | Airbus Defence and Space GmbH | Forward-Secure Crash-Resilient Logging Device |
US10511588B2 (en) * | 2015-10-29 | 2019-12-17 | Airbus Defence and Space GmbH | Forward-secure crash-resilient logging device |
TWI687806B (en) * | 2015-10-29 | 2020-03-11 | 韓商愛思開海力士有限公司 | Data storage device and operating method thereof |
US10929043B2 (en) | 2015-12-02 | 2021-02-23 | Netapp, Inc. | Space reservation for distributed storage systems |
US10365848B2 (en) * | 2015-12-02 | 2019-07-30 | Netapp, Inc. | Space reservation for distributed storage systems |
US10838923B1 (en) * | 2015-12-18 | 2020-11-17 | EMC IP Holding Company LLC | Poor deduplication identification |
US10222984B1 (en) * | 2015-12-31 | 2019-03-05 | EMC IP Holding Company LLC | Managing multi-granularity flash translation layers in solid state drives |
US10108547B2 (en) * | 2016-01-06 | 2018-10-23 | Netapp, Inc. | High performance and memory efficient metadata caching |
US9965398B2 (en) | 2016-01-12 | 2018-05-08 | Samsung Electronics Co., Ltd. | Method and apparatus for simplified nameless writes using a virtual address table |
US10956403B2 (en) | 2016-02-01 | 2021-03-23 | International Business Machines Corporation | Verifying data consistency |
US10176216B2 (en) | 2016-02-01 | 2019-01-08 | International Business Machines Corporation | Verifying data consistency |
US10474636B2 (en) * | 2016-03-25 | 2019-11-12 | Amazon Technologies, Inc. | Block allocation for low latency file systems |
US10437521B2 (en) * | 2016-03-25 | 2019-10-08 | Netapp, Inc. | Consistent method of indexing file system information |
US20170277709A1 (en) * | 2016-03-25 | 2017-09-28 | Amazon Technologies, Inc. | Block allocation for low latency file systems |
US11061865B2 (en) | 2016-03-25 | 2021-07-13 | Amazon Technologies, Inc. | Block allocation for low latency file systems |
US20170277739A1 (en) * | 2016-03-25 | 2017-09-28 | Netapp, Inc. | Consistent method of indexing file system information |
US11977832B2 (en) | 2016-03-28 | 2024-05-07 | Microsoft Technology Licensing, Llc | Map note annotations at corresponding geographic locations |
US10013425B1 (en) * | 2016-03-31 | 2018-07-03 | EMC IP Holding Company LLC | Space-efficient persistent block reservation optimized for compression |
US10929022B2 (en) | 2016-04-25 | 2021-02-23 | Netapp, Inc. | Space savings reporting for storage system supporting snapshot and clones |
US11704036B2 (en) | 2016-05-02 | 2023-07-18 | Pure Storage, Inc. | Deduplication decision based on metrics |
US10452297B1 (en) | 2016-05-02 | 2019-10-22 | Pure Storage, Inc. | Generating and optimizing summary index levels in a deduplication storage system |
US11169706B2 (en) | 2016-05-26 | 2021-11-09 | Nutanix, Inc. | Rebalancing storage I/O workloads by storage controller selection and redirection |
US10838620B2 (en) | 2016-05-26 | 2020-11-17 | Nutanix, Inc. | Efficient scaling of distributed storage systems |
US11070628B1 (en) | 2016-05-26 | 2021-07-20 | Nutanix, Inc. | Efficient scaling of computing resources by accessing distributed storage targets |
CN107564558A (en) * | 2016-06-30 | 2018-01-09 | 希捷科技有限公司 | Implementing scattered atomic I/O writes |
US9977626B2 (en) | 2016-06-30 | 2018-05-22 | Seagate Technology Llc | Implementing scattered atomic I/O writes |
US20180004649A1 (en) * | 2016-07-01 | 2018-01-04 | Intel Corporation | Techniques to Format a Persistent Memory File |
US10776034B2 (en) | 2016-07-26 | 2020-09-15 | Pure Storage, Inc. | Adaptive data migration |
US10997098B2 (en) | 2016-09-20 | 2021-05-04 | Netapp, Inc. | Quality of service policy sets |
US11886363B2 (en) | 2016-09-20 | 2024-01-30 | Netapp, Inc. | Quality of service policy sets |
US11327910B2 (en) | 2016-09-20 | 2022-05-10 | Netapp, Inc. | Quality of service policy sets |
US11513902B1 (en) * | 2016-09-29 | 2022-11-29 | EMC IP Holding Company LLC | System and method of dynamic system resource allocation for primary storage systems with virtualized embedded data protection |
US10613974B2 (en) | 2016-10-04 | 2020-04-07 | Pure Storage, Inc. | Peer-to-peer non-volatile random-access memory |
US10162523B2 (en) | 2016-10-04 | 2018-12-25 | Pure Storage, Inc. | Migrating data between volumes using virtual copy operation |
US11029853B2 (en) | 2016-10-04 | 2021-06-08 | Pure Storage, Inc. | Dynamic segment allocation for write requests by a storage system |
US11385999B2 (en) | 2016-10-04 | 2022-07-12 | Pure Storage, Inc. | Efficient scaling and improved bandwidth of storage system |
US10191662B2 (en) | 2016-10-04 | 2019-01-29 | Pure Storage, Inc. | Dynamic allocation of segments in a flash storage system |
US10756816B1 (en) | 2016-10-04 | 2020-08-25 | Pure Storage, Inc. | Optimized fibre channel and non-volatile memory express access |
US10545861B2 (en) | 2016-10-04 | 2020-01-28 | Pure Storage, Inc. | Distributed integrated high-speed solid-state non-volatile random-access memory |
US11036393B2 (en) | 2016-10-04 | 2021-06-15 | Pure Storage, Inc. | Migrating data between volumes using virtual copy operation |
US10656850B2 (en) | 2016-10-28 | 2020-05-19 | Pure Storage, Inc. | Efficient volume replication in a storage system |
US10185505B1 (en) | 2016-10-28 | 2019-01-22 | Pure Storage, Inc. | Reading a portion of data to replicate a volume based on sequence numbers |
US11119657B2 (en) | 2016-10-28 | 2021-09-14 | Pure Storage, Inc. | Dynamic access in flash system |
US11640244B2 (en) | 2016-10-28 | 2023-05-02 | Pure Storage, Inc. | Intelligent block deallocation verification |
US10359942B2 (en) | 2016-10-31 | 2019-07-23 | Pure Storage, Inc. | Deduplication aware scalable content placement |
US11119656B2 (en) | 2016-10-31 | 2021-09-14 | Pure Storage, Inc. | Reducing data distribution inefficiencies |
US10719480B1 (en) * | 2016-11-17 | 2020-07-21 | EMC IP Holding Company LLC | Embedded data valuation and metadata binding |
US20190243818A1 (en) * | 2016-12-09 | 2019-08-08 | Qumulo, Inc. | Managing storage quotas in a shared storage system |
US11256682B2 (en) * | 2016-12-09 | 2022-02-22 | Qumulo, Inc. | Managing storage quotas in a shared storage system |
US10095729B2 (en) * | 2016-12-09 | 2018-10-09 | Qumulo, Inc. | Managing storage quotas in a shared storage system |
US20180165321A1 (en) * | 2016-12-09 | 2018-06-14 | Qumulo, Inc. | Managing storage quotas in a shared storage system |
EP3333732B1 (en) * | 2016-12-09 | 2023-03-08 | Qumulo, Inc. | Managing storage quotas in a shared storage system |
US11550481B2 (en) | 2016-12-19 | 2023-01-10 | Pure Storage, Inc. | Efficiently writing data in a zoned drive storage system |
US11054996B2 (en) | 2016-12-19 | 2021-07-06 | Pure Storage, Inc. | Efficient writing in a flash storage system |
US20200073559A1 (en) * | 2016-12-19 | 2020-03-05 | Pure Storage, Inc. | Efficient writing in a flash storage system |
US10452290B2 (en) | 2016-12-19 | 2019-10-22 | Pure Storage, Inc. | Block consolidation in a direct-mapped flash storage system |
CN110023896A (en) * | 2016-12-19 | 2019-07-16 | 净睿存储股份有限公司 | Block consolidation in a direct-mapped flash storage system |
WO2018118453A1 (en) * | 2016-12-19 | 2018-06-28 | Pure Storage, Inc. | Block consolidation in a direct-mapped flash storage system |
US11093146B2 (en) | 2017-01-12 | 2021-08-17 | Pure Storage, Inc. | Automatic load rebalancing of a write group |
US10482049B2 (en) * | 2017-02-03 | 2019-11-19 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Configuring NVMe devices for redundancy and scaling |
US10318202B2 (en) * | 2017-03-20 | 2019-06-11 | Via Technologies, Inc. | Non-volatile memory apparatus and data deduplication method thereof |
US11449485B1 (en) | 2017-03-30 | 2022-09-20 | Pure Storage, Inc. | Sequence invalidation consolidation in a storage system |
US11403019B2 (en) | 2017-04-21 | 2022-08-02 | Pure Storage, Inc. | Deduplication-aware per-tenant encryption |
US12045487B2 (en) | 2017-04-21 | 2024-07-23 | Pure Storage, Inc. | Preserving data deduplication in a multi-tenant storage system |
US11163721B1 (en) * | 2017-04-25 | 2021-11-02 | EMC IP Holding Company LLC | Snapshot change list and file system indexing |
US10261708B1 (en) * | 2017-04-26 | 2019-04-16 | EMC IP Holding Company LLC | Host data replication allocating single memory buffers to store multiple buffers of received host data and to internally process the received host data |
US10944671B2 (en) | 2017-04-27 | 2021-03-09 | Pure Storage, Inc. | Efficient data forwarding in a networked device |
US10146694B1 (en) * | 2017-04-28 | 2018-12-04 | EMC IP Holding Company LLC | Persistent cache layer in a distributed file system |
US11704063B2 (en) | 2017-05-31 | 2023-07-18 | Fmad Engineering Kabushiki Gaisha | Efficient storage architecture for high speed packet capture |
US11836385B2 (en) | 2017-05-31 | 2023-12-05 | Fmad Engineering Kabushiki Gaisha | High speed data packet flow processing |
US11392317B2 (en) | 2017-05-31 | 2022-07-19 | Fmad Engineering Kabushiki Gaisha | High speed data packet flow processing |
US11128740B2 (en) | 2017-05-31 | 2021-09-21 | Fmad Engineering Kabushiki Gaisha | High-speed data packet generator |
US11249688B2 (en) | 2017-05-31 | 2022-02-15 | Fmad Engineering Kabushiki Gaisha | High-speed data packet capture and storage with playback capabilities |
US11036438B2 (en) * | 2017-05-31 | 2021-06-15 | Fmad Engineering Kabushiki Gaisha | Efficient storage architecture for high speed packet capture |
US11681470B2 (en) | 2017-05-31 | 2023-06-20 | Fmad Engineering Kabushiki Gaisha | High-speed replay of captured data packets |
US10990297B1 (en) * | 2017-07-21 | 2021-04-27 | EMC IP Holding Company LLC | Checkpointing of user data and metadata in a non-atomic persistent storage environment |
US10402266B1 (en) | 2017-07-31 | 2019-09-03 | Pure Storage, Inc. | Redundant array of independent disks in a direct-mapped flash storage system |
US11093324B2 (en) | 2017-07-31 | 2021-08-17 | Pure Storage, Inc. | Dynamic data verification and recovery in a storage system |
US10901660B1 (en) | 2017-08-31 | 2021-01-26 | Pure Storage, Inc. | Volume compressed header identification |
US11921908B2 (en) | 2017-08-31 | 2024-03-05 | Pure Storage, Inc. | Writing data to compressed and encrypted volumes |
US10831935B2 (en) | 2017-08-31 | 2020-11-10 | Pure Storage, Inc. | Encryption management with host-side data reduction |
US11436378B2 (en) | 2017-08-31 | 2022-09-06 | Pure Storage, Inc. | Block-based compression |
US11520936B1 (en) | 2017-08-31 | 2022-12-06 | Pure Storage, Inc. | Reducing metadata for volumes |
US10776202B1 (en) | 2017-09-22 | 2020-09-15 | Pure Storage, Inc. | Drive, blade, or data shard decommission via RAID geometry shrinkage |
CN110019248A (en) * | 2017-09-29 | 2019-07-16 | 英特尔公司 | Techniques for dynamic multi-storage format database access |
US11176091B2 (en) * | 2017-09-29 | 2021-11-16 | Intel Corporation | Techniques for dynamic multi-storage format database access |
US10789211B1 (en) | 2017-10-04 | 2020-09-29 | Pure Storage, Inc. | Feature-based deduplication |
US11537563B2 (en) | 2017-10-04 | 2022-12-27 | Pure Storage, Inc. | Determining content-dependent deltas between data sectors |
US10649981B2 (en) | 2017-10-23 | 2020-05-12 | Vmware, Inc. | Direct access to object state in a shared log |
US20190129982A1 (en) * | 2017-10-30 | 2019-05-02 | Nicira, Inc. | Just-in-time multi-indexed tables in a shared log |
US11392567B2 (en) * | 2017-10-30 | 2022-07-19 | Vmware, Inc. | Just-in-time multi-indexed tables in a shared log |
US20230280910A1 (en) * | 2017-10-31 | 2023-09-07 | Pure Storage, Inc. | Allocation Of Differing Erase Block Sizes |
US10884919B2 (en) | 2017-10-31 | 2021-01-05 | Pure Storage, Inc. | Memory management in a storage system |
US11741003B2 (en) * | 2017-11-17 | 2023-08-29 | Pure Storage, Inc. | Write granularity for storage system |
US11275681B1 (en) | 2017-11-17 | 2022-03-15 | Pure Storage, Inc. | Segmented write requests |
US10860475B1 (en) * | 2017-11-17 | 2020-12-08 | Pure Storage, Inc. | Hybrid flash translation layer |
US20220164281A1 (en) * | 2017-11-17 | 2022-05-26 | Pure Storage, Inc. | Write granularity for storage system |
US10719265B1 (en) * | 2017-12-08 | 2020-07-21 | Pure Storage, Inc. | Centralized, quorum-aware handling of device reservation requests in a storage system |
US20190188040A1 (en) * | 2017-12-19 | 2019-06-20 | Western Digital Technologies, Inc. | Multi-constraint dynamic resource manager |
US11030007B2 (en) * | 2017-12-19 | 2021-06-08 | Western Digital Technologies, Inc. | Multi-constraint dynamic resource manager |
US11010233B1 (en) | 2018-01-18 | 2021-05-18 | Pure Storage, Inc. | Hardware-based system monitoring |
US10970395B1 (en) | 2018-01-18 | 2021-04-06 | Pure Storage, Inc. | Security threat monitoring for a storage system |
US11734097B1 (en) | 2018-01-18 | 2023-08-22 | Pure Storage, Inc. | Machine learning-based hardware component monitoring |
US11144638B1 (en) | 2018-01-18 | 2021-10-12 | Pure Storage, Inc. | Method for storage system detection and alerting on potential malicious action |
US10915813B2 (en) | 2018-01-31 | 2021-02-09 | Pure Storage, Inc. | Search acceleration for artificial intelligence |
US11036596B1 (en) | 2018-02-18 | 2021-06-15 | Pure Storage, Inc. | System for delaying acknowledgements on open NAND locations until durability has been confirmed |
US11249831B2 (en) | 2018-02-18 | 2022-02-15 | Pure Storage, Inc. | Intelligent durability acknowledgment in a storage system |
US11494109B1 (en) | 2018-02-22 | 2022-11-08 | Pure Storage, Inc. | Erase block trimming for heterogenous flash memory storage devices |
US10929443B2 (en) * | 2018-02-23 | 2021-02-23 | Microsoft Technology Licensing, Llc | Location and context for computer file system |
US20190266260A1 (en) * | 2018-02-23 | 2019-08-29 | Microsoft Technology Licensing, Llc | Location and context for computer file system |
US10657068B2 (en) | 2018-03-22 | 2020-05-19 | Intel Corporation | Techniques for an all persistent memory file system |
US11281501B2 (en) * | 2018-04-04 | 2022-03-22 | Micron Technology, Inc. | Determination of workload distribution across processors in a memory system |
US11934322B1 (en) | 2018-04-05 | 2024-03-19 | Pure Storage, Inc. | Multiple encryption keys on storage drives |
US11995336B2 (en) | 2018-04-25 | 2024-05-28 | Pure Storage, Inc. | Bucket views |
US11385792B2 (en) | 2018-04-27 | 2022-07-12 | Pure Storage, Inc. | High availability controller pair transitioning |
US11327655B2 (en) | 2018-04-27 | 2022-05-10 | Pure Storage, Inc. | Efficient resource upgrade |
US10678433B1 (en) | 2018-04-27 | 2020-06-09 | Pure Storage, Inc. | Resource-preserving system upgrade |
US20190042097A1 (en) * | 2018-05-18 | 2019-02-07 | Intel Corporation | Non-volatile memory cloning with hardware copy-on-write support |
US10725690B2 (en) * | 2018-05-18 | 2020-07-28 | Intel Corporation | Non-volatile memory cloning with hardware copy-on-write support |
US11030111B2 (en) | 2018-05-23 | 2021-06-08 | International Business Machines Corporation | Representing an address space of unequal granularity and alignment |
US10599580B2 (en) * | 2018-05-23 | 2020-03-24 | International Business Machines Corporation | Representing an address space of unequal granularity and alignment |
US10678436B1 (en) | 2018-05-29 | 2020-06-09 | Pure Storage, Inc. | Using a PID controller to opportunistically compress more data during garbage collection |
US11436023B2 (en) | 2018-05-31 | 2022-09-06 | Pure Storage, Inc. | Mechanism for updating host file system and flash translation layer based on underlying NAND technology |
US11360936B2 (en) | 2018-06-08 | 2022-06-14 | Qumulo, Inc. | Managing per object snapshot coverage in filesystems |
US10776046B1 (en) | 2018-06-08 | 2020-09-15 | Pure Storage, Inc. | Optimized non-uniform memory access |
US11281577B1 (en) | 2018-06-19 | 2022-03-22 | Pure Storage, Inc. | Garbage collection tuning for low drive wear |
US11869586B2 (en) | 2018-07-11 | 2024-01-09 | Pure Storage, Inc. | Increased data protection by recovering data from partially-failed solid-state devices |
CN110780809A (en) * | 2018-07-31 | 2020-02-11 | 爱思开海力士有限公司 | Apparatus and method for managing metadata for interfacing of multiple memory systems |
KR20210019577A (en) * | 2018-08-10 | 2021-02-22 | 마이크론 테크놀로지, 인크. | Data validity tracking in non-volatile memory |
US10795828B2 (en) | 2018-08-10 | 2020-10-06 | Micron Technology, Inc. | Data validity tracking in a non-volatile memory |
KR102281750B1 (en) | 2018-08-10 | 2021-07-28 | 마이크론 테크놀로지, 인크. | Tracking data validity in non-volatile memory |
US11586561B2 (en) | 2018-08-10 | 2023-02-21 | Micron Technology, Inc. | Data validity tracking in a non-volatile memory |
WO2020033167A1 (en) * | 2018-08-10 | 2020-02-13 | Micron Technology, Inc. | Data validity tracking in a non-volatile memory |
US11194759B2 (en) | 2018-09-06 | 2021-12-07 | Pure Storage, Inc. | Optimizing local data relocation operations of a storage device of a storage system |
US11133076B2 (en) | 2018-09-06 | 2021-09-28 | Pure Storage, Inc. | Efficient relocation of data between storage devices of a storage system |
US10728255B2 (en) * | 2018-09-24 | 2020-07-28 | Nutanix, Inc. | System and method for protection of entities across availability zones |
US20200099692A1 (en) * | 2018-09-24 | 2020-03-26 | Nutanix, Inc. | System and method for protection of entities across availability zones |
US10846216B2 (en) | 2018-10-25 | 2020-11-24 | Pure Storage, Inc. | Scalable garbage collection |
US11216369B2 (en) | 2018-10-25 | 2022-01-04 | Pure Storage, Inc. | Optimizing garbage collection using check pointed data sets |
US12019764B2 (en) | 2018-10-26 | 2024-06-25 | Pure Storage, Inc. | Modifying encryption in a storage system |
US11113409B2 (en) | 2018-10-26 | 2021-09-07 | Pure Storage, Inc. | Efficient rekey in a transparent decrypting storage array |
US11243909B2 (en) * | 2018-10-31 | 2022-02-08 | Alibaba Group Holding Limited | Journaling overhead reduction with remapping interface |
US20200134041A1 (en) * | 2018-10-31 | 2020-04-30 | Alibaba Group Holding Limited | Journaling overhead reduction with remapping interface |
US11494241B2 (en) | 2018-10-31 | 2022-11-08 | Nutanix, Inc. | Multi-stage IOPS allocation |
US10922142B2 (en) | 2018-10-31 | 2021-02-16 | Nutanix, Inc. | Multi-stage IOPS allocation |
US11347699B2 (en) | 2018-12-20 | 2022-05-31 | Qumulo, Inc. | File system cache tiers |
US11194473B1 (en) | 2019-01-23 | 2021-12-07 | Pure Storage, Inc. | Programming frequently read data to low latency portions of a solid-state storage array |
US11151092B2 (en) | 2019-01-30 | 2021-10-19 | Qumulo, Inc. | Data replication in distributed file systems |
US10614033B1 (en) | 2019-01-30 | 2020-04-07 | Qumulo, Inc. | Client aware pre-fetch policy scoring system |
US11588633B1 (en) | 2019-03-15 | 2023-02-21 | Pure Storage, Inc. | Decommissioning keys in a decryption storage system |
CN111708716A (en) * | 2019-03-18 | 2020-09-25 | 爱思开海力士有限公司 | Data storage device, computing device having the same, and method of operation |
US11334254B2 (en) | 2019-03-29 | 2022-05-17 | Pure Storage, Inc. | Reliability based flash page sizing |
US11397674B1 (en) | 2019-04-03 | 2022-07-26 | Pure Storage, Inc. | Optimizing garbage collection across heterogeneous flash devices |
US11775189B2 (en) | 2019-04-03 | 2023-10-03 | Pure Storage, Inc. | Segment level heterogeneity |
US10990480B1 (en) | 2019-04-05 | 2021-04-27 | Pure Storage, Inc. | Performance of RAID rebuild operations by a storage group controller of a storage system |
US12087382B2 (en) | 2019-04-11 | 2024-09-10 | Pure Storage, Inc. | Adaptive threshold for bad flash memory blocks |
US11099986B2 (en) | 2019-04-12 | 2021-08-24 | Pure Storage, Inc. | Efficient transfer of memory contents |
US11789870B2 (en) * | 2019-05-24 | 2023-10-17 | Microsoft Technology Licensing, Llc | Runtime allocation and utilization of persistent memory as volatile memory |
US20240078181A1 (en) * | 2019-05-24 | 2024-03-07 | Microsoft Technology Licensing, Llc | Runtime allocation and utilization of persistent memory as volatile memory |
US11487665B2 (en) | 2019-06-05 | 2022-11-01 | Pure Storage, Inc. | Tiered caching of data in a storage system |
US11281394B2 (en) | 2019-06-24 | 2022-03-22 | Pure Storage, Inc. | Replication across partitioning schemes in a distributed storage system |
US10929046B2 (en) | 2019-07-09 | 2021-02-23 | Pure Storage, Inc. | Identifying and relocating hot data to a cache determined with read velocity based on a threshold stored at a storage device |
US12135888B2 (en) | 2019-07-10 | 2024-11-05 | Pure Storage, Inc. | Intelligent grouping of data based on expected lifespan |
US11422751B2 (en) | 2019-07-18 | 2022-08-23 | Pure Storage, Inc. | Creating a virtual storage system |
US11086713B1 (en) | 2019-07-23 | 2021-08-10 | Pure Storage, Inc. | Optimized end-to-end integrity storage system |
US11963321B2 (en) | 2019-09-11 | 2024-04-16 | Pure Storage, Inc. | Low profile latching mechanism |
US11403043B2 (en) | 2019-10-15 | 2022-08-02 | Pure Storage, Inc. | Efficient data compression by grouping similar data within a data segment |
US10725977B1 (en) | 2019-10-21 | 2020-07-28 | Qumulo, Inc. | Managing file system state during replication jobs |
US11409696B2 (en) | 2019-11-01 | 2022-08-09 | EMC IP Holding Company LLC | Methods and systems for utilizing a unified namespace |
US11288238B2 (en) | 2019-11-01 | 2022-03-29 | EMC IP Holding Company LLC | Methods and systems for logging data transactions and managing hash tables |
US11392464B2 (en) | 2019-11-01 | 2022-07-19 | EMC IP Holding Company LLC | Methods and systems for mirroring and failover of nodes |
US11294725B2 (en) | 2019-11-01 | 2022-04-05 | EMC IP Holding Company LLC | Method and system for identifying a preferred thread pool associated with a file system |
US11741056B2 (en) * | 2019-11-01 | 2023-08-29 | EMC IP Holding Company LLC | Methods and systems for allocating free space in a sparse file system |
US11288211B2 (en) | 2019-11-01 | 2022-03-29 | EMC IP Holding Company LLC | Methods and systems for optimizing storage resources |
US11409720B2 (en) * | 2019-11-13 | 2022-08-09 | Western Digital Technologies, Inc. | Metadata reduction in a distributed storage system |
US12050683B2 (en) * | 2019-11-22 | 2024-07-30 | Pure Storage, Inc. | Selective control of a data synchronization setting of a storage system based on a possible ransomware attack against the storage system |
US11500788B2 (en) | 2019-11-22 | 2022-11-15 | Pure Storage, Inc. | Logical address based authorization of operations with respect to a storage system |
US11645162B2 (en) | 2019-11-22 | 2023-05-09 | Pure Storage, Inc. | Recovery point determination for data restoration in a storage system |
US20220050898A1 (en) * | 2019-11-22 | 2022-02-17 | Pure Storage, Inc. | Selective Control of a Data Synchronization Setting of a Storage System Based on a Possible Ransomware Attack Against the Storage System |
US11941116B2 (en) | 2019-11-22 | 2024-03-26 | Pure Storage, Inc. | Ransomware-based data protection parameter modification |
US11675898B2 (en) | 2019-11-22 | 2023-06-13 | Pure Storage, Inc. | Recovery dataset management for security threat monitoring |
US11651075B2 (en) | 2019-11-22 | 2023-05-16 | Pure Storage, Inc. | Extensible attack monitoring by a storage system |
US11687418B2 (en) | 2019-11-22 | 2023-06-27 | Pure Storage, Inc. | Automatic generation of recovery plans specific to individual storage elements |
US11625481B2 (en) | 2019-11-22 | 2023-04-11 | Pure Storage, Inc. | Selective throttling of operations potentially related to a security threat to a storage system |
US11520907B1 (en) | 2019-11-22 | 2022-12-06 | Pure Storage, Inc. | Storage system snapshot retention based on encrypted data |
US11615185B2 (en) | 2019-11-22 | 2023-03-28 | Pure Storage, Inc. | Multi-layer security threat detection for a storage system |
US11341236B2 (en) | 2019-11-22 | 2022-05-24 | Pure Storage, Inc. | Traffic-based detection of a security threat to a storage system |
US12079502B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Storage element attribute-based determination of a data protection policy for use within a storage system |
US11720691B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Encryption indicator-based retention of recovery datasets for a storage system |
US11720714B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Inter-I/O relationship based detection of a security threat to a storage system |
US11720692B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Hardware token based management of recovery datasets for a storage system |
US12079356B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Measurement interval anomaly detection-based generation of snapshots |
US12079333B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Independent security threat detection and remediation by storage systems in a synchronous replication arrangement |
US11755751B2 (en) | 2019-11-22 | 2023-09-12 | Pure Storage, Inc. | Modify access restrictions in response to a possible attack against data stored by a storage system |
US11657146B2 (en) | 2019-11-22 | 2023-05-23 | Pure Storage, Inc. | Compressibility metric-based detection of a ransomware threat to a storage system |
US12067118B2 (en) | 2019-11-22 | 2024-08-20 | Pure Storage, Inc. | Detection of writing to a non-header portion of a file as an indicator of a possible ransomware attack against a storage system |
US12050689B2 (en) | 2019-11-22 | 2024-07-30 | Pure Storage, Inc. | Host anomaly-based generation of snapshots |
US11657155B2 (en) | 2019-11-22 | 2023-05-23 | Pure Storage, Inc. | Snapshot delta metric based determination of a possible ransomware attack against data maintained by a storage system |
CN111026615A (en) * | 2019-12-20 | 2020-04-17 | 浪潮电子信息产业股份有限公司 | Method and device for acquiring logical volume list, electronic equipment and storage medium |
US10795796B1 (en) | 2020-01-24 | 2020-10-06 | Qumulo, Inc. | Predictive performance analysis for file systems |
US11734147B2 (en) | 2020-01-24 | 2023-08-22 | Qumulo, Inc. | Predictive performance analysis for file systems |
US11294718B2 (en) | 2020-01-24 | 2022-04-05 | Qumulo, Inc. | Managing throughput fairness and quality of service in file systems |
US10860372B1 (en) | 2020-01-24 | 2020-12-08 | Qumulo, Inc. | Managing throughput fairness and quality of service in file systems |
US11151001B2 (en) | 2020-01-28 | 2021-10-19 | Qumulo, Inc. | Recovery checkpoints for distributed file systems |
US11372735B2 (en) | 2020-01-28 | 2022-06-28 | Qumulo, Inc. | Recovery checkpoints for distributed file systems |
US10860414B1 (en) | 2020-01-31 | 2020-12-08 | Qumulo, Inc. | Change notification in distributed file systems |
US11928084B2 (en) * | 2020-02-28 | 2024-03-12 | Nebulon, Inc. | Metadata store in multiple reusable append logs |
US20230111251A1 (en) * | 2020-02-28 | 2023-04-13 | Nebulon, Inc. | Metadata store in multiple reusable append logs |
US11640371B2 (en) * | 2020-03-12 | 2023-05-02 | Western Digital Technologies, Inc. | Snapshot management in partitioned storage |
KR102589609B1 (en) | 2020-03-12 | 2023-10-13 | 웨스턴 디지털 테크놀로지스, 인코포레이티드 | Snapshot management in partitioned storage |
KR20220119348A (en) * | 2020-03-12 | 2022-08-29 | 웨스턴 디지털 테크놀로지스, 인코포레이티드 | Snapshot management in partitioned storage |
CN113448920A (en) * | 2020-03-27 | 2021-09-28 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for managing indexes in a storage system |
US11243932B2 (en) * | 2020-03-27 | 2022-02-08 | EMC IP Holding Company LLC | Method, device, and computer program product for managing index in storage system |
US10936551B1 (en) | 2020-03-30 | 2021-03-02 | Qumulo, Inc. | Aggregating alternate data stream metrics for file systems |
US10936538B1 (en) | 2020-03-30 | 2021-03-02 | Qumulo, Inc. | Fair sampling of alternate data stream metrics for file systems |
US11347709B2 (en) | 2020-04-01 | 2022-05-31 | Sap Se | Hierarchical metadata enhancements for a memory management system |
US11347743B2 (en) * | 2020-04-01 | 2022-05-31 | Sap Se | Metadata converter and memory management system |
CN113535597A (en) * | 2020-04-14 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Memory management method, memory management unit and Internet of things equipment |
US20230214322A1 (en) * | 2020-05-18 | 2023-07-06 | Cambricon (Xi'an) Semiconductor Co., Ltd. | Method and device for allocating storage addresses for data in memory |
WO2021254598A1 (en) * | 2020-06-16 | 2021-12-23 | Huawei Technologies Co., Ltd. | Devices for memory management |
US11775481B2 (en) | 2020-09-30 | 2023-10-03 | Qumulo, Inc. | User interfaces for managing distributed file systems |
US20240089313A1 (en) * | 2020-10-28 | 2024-03-14 | Vivo Mobile Communication Co., Ltd. | File sending method and apparatus, and electronic device |
US11941265B2 (en) | 2021-01-22 | 2024-03-26 | EMC IP Holding Company LLC | Method, electronic equipment and computer program product for managing metadata storage unit |
US11372819B1 (en) | 2021-01-28 | 2022-06-28 | Qumulo, Inc. | Replicating files in distributed file systems using object-based data storage |
US11157458B1 (en) | 2021-01-28 | 2021-10-26 | Qumulo, Inc. | Replicating files in distributed file systems using object-based data storage |
US11461241B2 (en) | 2021-03-03 | 2022-10-04 | Qumulo, Inc. | Storage tier management for file systems |
US11435901B1 (en) | 2021-03-16 | 2022-09-06 | Qumulo, Inc. | Backup services for distributed file systems in cloud computing environments |
US11132126B1 (en) | 2021-03-16 | 2021-09-28 | Qumulo, Inc. | Backup services for distributed file systems in cloud computing environments |
US11567660B2 (en) | 2021-03-16 | 2023-01-31 | Qumulo, Inc. | Managing cloud storage for distributed file systems |
US11875193B2 (en) | 2021-03-25 | 2024-01-16 | Oracle International Corporation | Tracking frame states of call stack frames including colorless roots |
US11567704B2 (en) | 2021-04-29 | 2023-01-31 | EMC IP Holding Company LLC | Method and systems for storing data in a storage pool using memory semantics with applications interacting with emulated block devices |
US11604610B2 (en) * | 2021-04-29 | 2023-03-14 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components |
US11892983B2 (en) | 2021-04-29 | 2024-02-06 | EMC IP Holding Company LLC | Methods and systems for seamless tiering in a distributed storage system |
US11579976B2 (en) | 2021-04-29 | 2023-02-14 | EMC IP Holding Company LLC | Methods and systems parallel raid rebuild in a distributed storage system |
US11669259B2 (en) | 2021-04-29 | 2023-06-06 | EMC IP Holding Company LLC | Methods and systems for methods and systems for in-line deduplication in a distributed storage system |
US20220350543A1 (en) * | 2021-04-29 | 2022-11-03 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components |
US12093435B2 (en) | 2021-04-29 | 2024-09-17 | Dell Products, L.P. | Methods and systems for securing data in a distributed storage system |
US11740822B2 (en) | 2021-04-29 | 2023-08-29 | EMC IP Holding Company LLC | Methods and systems for error detection and correction in a distributed storage system |
US11741004B2 (en) | 2021-05-19 | 2023-08-29 | Oracle International Corporation | Colorless roots implementation in Z garbage collector |
US20220374393A1 (en) * | 2021-05-19 | 2022-11-24 | Oracle International Corporation | Snapshot at the beginning marking in z garbage collector |
US11734171B2 (en) * | 2021-05-19 | 2023-08-22 | Oracle International Corporation | Snapshot at the beginning marking in Z garbage collector |
WO2022271412A1 (en) * | 2021-06-24 | 2022-12-29 | Pure Storage, Inc. | Efficiently writing data in a zoned drive storage system |
US11669255B2 (en) | 2021-06-30 | 2023-06-06 | Qumulo, Inc. | Distributed resource caching by reallocation of storage caching using tokens and agents with non-depleted cache allocations |
US20230050976A1 (en) * | 2021-08-12 | 2023-02-16 | Seagate Technology Llc | File system aware computational storage block |
CN113766027A (en) * | 2021-09-09 | 2021-12-07 | 瀚高基础软件股份有限公司 | Method and equipment for forwarding data by flow replication cluster node |
US11294604B1 (en) | 2021-10-22 | 2022-04-05 | Qumulo, Inc. | Serverless disk drives based on cloud storage |
US12131074B2 (en) | 2021-10-27 | 2024-10-29 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using GPUS |
US11762682B2 (en) | 2021-10-27 | 2023-09-19 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components with advanced data services |
US11677633B2 (en) | 2021-10-27 | 2023-06-13 | EMC IP Holding Company LLC | Methods and systems for distributing topology information to client nodes |
US12007942B2 (en) | 2021-10-27 | 2024-06-11 | EMC IP Holding Company LLC | Methods and systems for seamlessly provisioning client application nodes in a distributed system |
US11922071B2 (en) | 2021-10-27 | 2024-03-05 | EMC IP Holding Company LLC | Methods and systems for storing data in a distributed system using offload components and a GPU module |
US11354273B1 (en) | 2021-11-18 | 2022-06-07 | Qumulo, Inc. | Managing usable storage space in distributed file systems |
US11599508B1 (en) | 2022-01-31 | 2023-03-07 | Qumulo, Inc. | Integrating distributed file systems with object stores |
WO2023196249A1 (en) * | 2022-04-05 | 2023-10-12 | Western Digital Technologies, Inc. | Aligned and unaligned data deallocation |
US11853554B2 (en) | 2022-04-05 | 2023-12-26 | Western Digital Technologies, Inc. | Aligned and unaligned data deallocation |
US11722150B1 (en) | 2022-09-28 | 2023-08-08 | Qumulo, Inc. | Error resistant write-ahead log |
US12019541B2 (en) | 2022-10-17 | 2024-06-25 | Oracle International Corporation | Lazy compaction in garbage collection |
US11729269B1 (en) | 2022-10-26 | 2023-08-15 | Qumulo, Inc. | Bandwidth management in distributed file systems |
US11966592B1 (en) | 2022-11-29 | 2024-04-23 | Qumulo, Inc. | In-place erasure code transcoding for distributed file systems |
CN116036604A (en) * | 2023-01-28 | 2023-05-02 | 腾讯科技(深圳)有限公司 | Data processing method, device, computer and readable storage medium |
US12141058B2 (en) | 2023-04-24 | 2024-11-12 | Pure Storage, Inc. | Low latency reads using cached deduplicated data |
US12038877B1 (en) | 2023-11-07 | 2024-07-16 | Qumulo, Inc. | Sharing namespaces across file system clusters |
US12019875B1 (en) | 2023-11-07 | 2024-06-25 | Qumulo, Inc. | Tiered data storage with ephemeral and persistent tiers |
US11934660B1 (en) | 2023-11-07 | 2024-03-19 | Qumulo, Inc. | Tiered data storage with ephemeral and persistent tiers |
US11921677B1 (en) | 2023-11-07 | 2024-03-05 | Qumulo, Inc. | Sharing namespaces across file system clusters |
Also Published As
Publication number | Publication date |
---|---|
US9563555B2 (en) | 2017-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9563555B2 (en) | | Systems and methods for storage allocation |
US9323465B2 (en) | | Systems and methods for persistent atomic storage operations |
US9442844B2 (en) | | Apparatus, system, and method for a storage layer |
US9983993B2 (en) | | Apparatus, system, and method for conditional and atomic storage operations |
US10019320B2 (en) | | Systems and methods for distributed atomic storage operations |
US9342256B2 (en) | | Epoch based storage management for a storage device |
KR101769465B1 (en) | | Systems and methods for atomic storage operations |
US9606914B2 (en) | | Apparatus, system, and method for allocating storage |
US10102075B2 (en) | | Systems and methods for storage collision management |
US10133511B2 (en) | | Optimized segment cleaning technique |
JP6290405B2 (en) | | System and method for memory consistency |
JP6677740B2 (en) | | Storage system |
US20160246522A1 (en) | | Exactly once semantics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: FUSION-IO, INC., UTAH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FLYNN, DAVID;PIGGIN, NICK;TALAGALA, NISHA;SIGNING DATES FROM 20130728 TO 20140612;REEL/FRAME:034732/0617 |
| | AS | Assignment | Owner name: FUSION-IO, LLC, DELAWARE Free format text: CHANGE OF NAME;ASSIGNOR:FUSION-IO, INC;REEL/FRAME:034838/0091 Effective date: 20141217 |
| | AS | Assignment | Owner name: SANDISK TECHNOLOGIES, INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FUSION-IO, LLC;REEL/FRAME:035168/0366 Effective date: 20150219 |
| | AS | Assignment | Owner name: SANDISK TECHNOLOGIES, INC., TEXAS Free format text: CORRECTIVE ASSIGNMENT TO REMOVE APPL. NO'S 13/925,410 AND 61/663,464 PREVIOUSLY RECORDED AT REEL: 035168 FRAME: 0366. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:FUSION-IO, LLC;REEL/FRAME:035603/0582 Effective date: 20150219 Owner name: FUSION-IO, LLC, DELAWARE Free format text: CORRECTIVE ASSIGNMENT TO REMOVE APPL. NO'S 13/925,410 AND 61/663,464 PREVIOUSLY RECORDED AT REEL: 034838 FRAME: 0091. ASSIGNOR(S) HEREBY CONFIRMS THE CHANGE OF NAME;ASSIGNOR:FUSION-IO, INC;REEL/FRAME:035603/0748 Effective date: 20141217 |
| | AS | Assignment | Owner name: SANDISK TECHNOLOGIES LLC, TEXAS Free format text: CHANGE OF NAME;ASSIGNOR:SANDISK TECHNOLOGIES INC;REEL/FRAME:038807/0807 Effective date: 20160516 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |