2011 Volume E94.D Issue 7 Pages 1409-1418
Conventional array processors repeatedly perform random accesses to input and coefficient data stored in memory during three-dimensional discrete cosine transform (3D-DCT) calculations, which creates a calculation bottleneck. In this paper, a 3D array processor dedicated to 3D-DCT is proposed. The array processor drastically reduces data swapping and replacement during the calculation and thus improves performance. The time complexity of the proposed N×N×N array processor is O(N) for an input data cube of size N³, whereas that of the sequential 3D-DCT calculation is O(N⁴). A specific I/O architecture, throughput-improved architectures, and a more scalable architecture are also discussed in terms of practical implementation. Experimental results from an implementation on a field-programmable gate array (FPGA) suggest that our architecture provides good performance for real-time 3D-DCT calculations.
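
The O(N^4) figure for the sequential calculation can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that the sequential baseline is the separable (row-column) method with a direct O(N^2) one-dimensional DCT-II along each axis and orthonormal scaling. The names `dct_matrix` and `dct3d_separable` are hypothetical and are not taken from the paper. The sketch counts multiply-accumulate operations rather than modeling the proposed array processor.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix: c[k][i] = a(k) * cos((2i + 1) * k * pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def _get(data, axis, i, a, b):
    # Read the i-th element of the line that runs along `axis`.
    if axis == 0:
        return data[i][a][b]
    if axis == 1:
        return data[a][i][b]
    return data[a][b][i]


def _set(data, axis, k, a, b, value):
    # Write the k-th coefficient of the line that runs along `axis`.
    if axis == 0:
        data[k][a][b] = value
    elif axis == 1:
        data[a][k][b] = value
    else:
        data[a][b][k] = value


def dct3d_separable(cube):
    """Separable 3D-DCT of an n x n x n nested list.

    Returns (coefficients, multiply_accumulate_count).
    Assumption: direct 1D DCT per line, no fast algorithm.
    """
    n = len(cube)
    c = dct_matrix(n)
    macs = 0
    data = cube
    for axis in range(3):              # one pass per dimension
        out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for a in range(n):             # n * n lines per pass
            for b in range(n):
                for k in range(n):     # n outputs per line
                    acc = 0.0
                    for i in range(n):  # n multiply-accumulates per output
                        acc += c[k][i] * _get(data, axis, i, a, b)
                        macs += 1
                    _set(out, axis, k, a, b, acc)
        data = out
    return data, macs
```

Under these assumptions each of the three passes transforms N^2 lines at N^2 multiply-accumulates per line, so the counter returns 3N^4 (768 for N = 4), consistent with the O(N^4) sequential bound quoted above.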