The grid-free octree DSMC approach is parallelized on GPGPUs using CUDA. A space-filling Morton Z-curve is employed to represent the three-dimensional octree structure in a linear array, and the advantages of this implementation are discussed. The methodology for a multi-GPU, hybrid MPI-CUDA implementation, involving load-balanced domain decomposition and inter-GPU communication, is presented. External flow at 1 atm over a fractal-like spherical aggregate is modeled, and strong scaling studies for this test case are analyzed. The analysis showed that the CHAOS DSMC solver was 92% efficient with 8 GPUs, whereas in the 16-GPU case the domain was not partitioned equally among the 16 GPUs. Argon gas flow through a fibrous microstructure represented by 1.5 million triangular panels was also simulated and showed good scaling. The material permeability of the microstructure was computed to be 253 × 10⁻¹² m².