
(Figure: an illustration of the paging structure.)

To access a cell, its page must be located first. For a one-dimensional array, the page index and cell index are calculated as:

```csharp
int pageIndex = index / pageSize;
int cellIndex = index % pageSize;
```

If the page size is a power of 2, it is more efficient to use bit operations instead of division.
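For example, with a page size of 128 (2^7), the division and modulo can be replaced by a right shift and a bit mask. A quick sketch of the equivalence (Python used here just for illustration):

```python
PAGE_SIZE = 128            # 2**7
PAGE_BITS = 7              # log2(PAGE_SIZE)
PAGE_MASK = PAGE_SIZE - 1  # 0b1111111

def locate(index):
    # division/modulo form
    page_div, cell_div = index // PAGE_SIZE, index % PAGE_SIZE
    # equivalent bit-operation form
    page_bit, cell_bit = index >> PAGE_BITS, index & PAGE_MASK
    assert (page_div, cell_div) == (page_bit, cell_bit)
    return page_bit, cell_bit

print(locate(1000))  # → (7, 104): page 7, cell 104
```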

All cell data is attached to the bottom level of the index pages, for example:

Index pages and cells that have never been accessed remain null, so they consume no memory.

To access a cell, the cell index is decomposed once per level of the tree; for a tree of depth 2, the calculation is performed twice:

A smaller page size uses less memory, but if pages are too small, a deeper tree is needed to reach the same array capacity, and a deeper tree costs more time on every cell access. Finding a good balance between page size and tree depth is therefore the most important consideration. Currently ReoGrid uses the following page-size values:

- Number of rows: 128
- Number of columns: 32
- Depth of tree: 3

With these values, the maximum capacity is:

- Number of rows: 128^3 = 2,097,152
- Number of columns: 32^3 = 32,768
- Cell elements: up to 2,097,152 x 32,768 = 68,719,476,736
- Maximum memory usage: 68,719,476,736 * 1.5 * 8 / 1024 / 1024 / 1024 = 768 GB
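The capacity figures above can be checked with a little arithmetic (the 1.5 * 8 factor is the per-cell byte estimate used in the list above):

```python
ROW_PAGE, COL_PAGE, DEPTH = 128, 32, 3

max_rows = ROW_PAGE ** DEPTH      # 128^3
max_cols = COL_PAGE ** DEPTH      # 32^3
max_cells = max_rows * max_cols   # total addressable cells

# estimated worst-case memory, using the 1.5 * 8 bytes-per-cell factor
max_memory_gb = max_cells * 1.5 * 8 / 1024 ** 3

print(max_rows, max_cols, max_cells, max_memory_gb)
# → 2097152 32768 68719476736 768.0
```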

Getter:

```csharp
get
{
	Node node = root;

	// MaxDepth is 3
	for (int d = MaxDepth - 1; d > 0; d--)
	{
		// RowBitLen is 7 when row page size is 128
		int r = (row >> (RowBitLen * d)) % RowSize;
		// ColBitLen is 5 when col page size is 32
		int c = (col >> (ColBitLen * d)) % ColSize;

		node = node.nodes[r, c];
		if (node == null) return default(T);
	}

	// RowSize is 128, and ColSize is 32
	return node.data[row % RowSize, col % ColSize];
}
```

Setter:

```csharp
set
{
	Node node = root;

	for (int d = MaxDepth - 1; d > 0; d--)
	{
		// RowBitLen is 7 when row page size is 128
		int r = (row >> (RowBitLen * d)) % RowSize;
		// ColBitLen is 5 when col page size is 32
		int c = (col >> (ColBitLen * d)) % ColSize;

		Node child = node.nodes[r, c];

		if (child == null)
		{
			if (value == null)
			{
				return;
			}
			else
			{
				child = node.nodes[r, c] = new Node();

				if (d > 1)
				{
					child.nodes = new Node[RowSize, ColSize];
				}
			}
		}

		node = child;
	}

	if (node.data == null)
	{
		if (value == null)
		{
			return;
		}
		else
		{
			node.data = new T[RowSize, ColSize];
		}
	}

	node.data[row % RowSize, col % ColSize] = value;
}
```
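The getter and setter above can be modeled compactly. Below is a minimal Python sketch of the same lazily-allocated page idea (a single level of pages keyed by page coordinates; the depth, member names, and dictionary-based index are simplifications, not ReoGrid's actual code):

```python
class PagedGrid:
    """Pages are allocated lazily; untouched regions cost no memory."""

    def __init__(self, row_page=128, col_page=32):
        self.row_page, self.col_page = row_page, col_page
        self.pages = {}  # (page_row, page_col) -> 2D list of cells

    def get(self, row, col):
        page = self.pages.get((row // self.row_page, col // self.col_page))
        if page is None:
            return None  # page never touched: nothing allocated
        return page[row % self.row_page][col % self.col_page]

    def set(self, row, col, value):
        key = (row // self.row_page, col // self.col_page)
        page = self.pages.get(key)
        if page is None:
            if value is None:
                return  # writing null into a missing page allocates nothing
            page = self.pages[key] = [
                [None] * self.col_page for _ in range(self.row_page)
            ]
        page[row % self.row_page][col % self.col_page] = value

g = PagedGrid()
g.set(1000, 40, "hello")
print(g.get(1000, 40), g.get(0, 0), len(g.pages))  # → hello None 1
```

Note how, exactly as in the C# setter, assigning null into an unallocated page is a no-op, so sparse sheets stay sparse.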

Solution 1, iterating only over the allocated pages and their used ranges:

```csharp
double sum = 0;

foreach (var page in sheetData)
{
	for (int r = page.minRow; r <= page.maxRow; r++)
	{
		for (int c = page.minCol; c <= page.maxCol; c++)
		{
			sum += page[r, c].data;
		}
	}
}
```

Solution 2, iterating over all rows and columns but skipping the empty ones:

```csharp
double sum = 0;

for (int r = 0; r < sheet.RowCount; r++)
{
	if (sheet.IsEmptyRow(r)) continue;

	for (int c = 0; c < sheet.ColCount; c++)
	{
		if (sheet.IsEmptyCol(c)) continue;

		sum += sheet[r, c].data;
	}
}
```

In fact, a mix of solutions 1 and 2 would probably be even better. :)
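One way the mix could look: iterate only the allocated pages (solution 1), and inside each page skip cells that are empty (solution 2). A hedged Python sketch; the page records with `min_row`/`max_row` bounds and a sparse `cells` map are hypothetical, just to make the idea concrete:

```python
# Hypothetical page records: each allocated page tracks its used bounds
# and stores only non-empty cells.
pages = [
    {"cells": {(0, 0): 1.5, (2, 1): 2.5},
     "min_row": 0, "max_row": 2, "min_col": 0, "max_col": 1},
    {"cells": {(5, 5): 4.0},
     "min_row": 5, "max_row": 5, "min_col": 5, "max_col": 5},
]

total = 0.0
for page in pages:  # solution 1: visit allocated pages only
    for r in range(page["min_row"], page["max_row"] + 1):
        for c in range(page["min_col"], page["max_col"] + 1):
            cell = page["cells"].get((r, c))  # solution 2: skip empty cells
            if cell is not None:
                total += cell

print(total)  # → 8.0
```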

Test Item | total cycles | elapsed (ms) | one cycle | memory usage
---|---|---|---|---
Sequential Read & Write (1000 rows x 1000 columns) | 2,000,000 | 161 | 0.0805 μs | 9.6 MB
Sequential Read & Write (100,000 rows x 100 columns) | 20,000,000 | 1624 | 0.0812 μs | 46 MB
Interval Read & Write (10000 rows x 10000 columns, interval 100) | 20,000 | 67 | 3.35 μs | 35 MB
Random Read & Write (rows 0-10000, columns 0-10000) | 20,000 | 183 | 9.15 μs | 73 MB
Iterate Over Empty Range (1,000,000 rows x 200 columns) | 200,000,000 | 152 | 0.00076 μs | -
Fill 1,000,000 Rows (1,000,000 rows x 200 columns) | 1,000,000 | 130 | 0.13 μs | 145 MB
Sum 1,000,000 Rows, Iterator (1 column) | 1,000,000 | 146 | 0.146 μs | -
Sum 1,000,000 Rows, For loop (1 column) | 1,000,000 | 67 | 0.067 μs | -

Excel: DataStructPerformance20140111.xlsx

I found a bug: the first level of the tree was never used, so the tree was one level deeper than expected. After fixing this, performance improved a little. But I don't know why random access became slower.

Monday, January 13, 2014

Test Item | total cycles | elapsed (ms) | one cycle | memory usage
---|---|---|---|---
Sequential Read & Write (1000 rows x 1000 columns) | 2,000,000 | 130 | 0.065 μs | 9.6 MB
Sequential Read & Write (100,000 rows x 100 columns) | 20,000,000 | 1297 | 0.06485 μs | 46 MB
Interval Read & Write (10000 rows x 10000 columns, interval 100) | 20,000 | 81 | 4.05 μs | 35 MB
Random Read & Write (rows 0-10000, columns 0-10000) | 20,000 | 263 | 13.15 μs | 73 MB
Iterate Over Empty Range (1,000,000 rows x 200 columns) | 200,000,000 | 150 | 0.00075 μs | -
Fill 1,000,000 Rows (1,000,000 rows x 200 columns) | 1,000,000 | 110 | 0.11 μs | 145 MB
Sum 1,000,000 Rows, Iterator (1 column) | 1,000,000 | 113 | 0.113 μs | -
Sum 1,000,000 Rows, For loop (1 column) | 1,000,000 | 50 | 0.05 μs | -

Excel: DataStructPerformance20140113.xlsx

Last edited Jan 14, 2014 at 1:44 PM by unvell, version 39