This paper proposes an energy-efficient deep in-memory architecture for NAND flash (DIMA-F) to perform machine learning and inference algorithms on NAND flash memory. Algorithms for data analytics, inference, and decision-making require processing of large data volumes and are hence limited by data access costs. DIMA-F achieves energy savings and throughput improvements for such algorithms by reading and processing data in the analog domain at the periphery of the NAND flash memory. This paper also provides behavioral models of DIMA-F that can be used for analysis and large-scale system simulations in the presence of circuit non-idealities and variations. DIMA-F is studied in the context of linear support vector machines (SVMs) and k-nearest neighbor (k-NN) for face detection and recognition, respectively. DIMA-F achieves an estimated 8×-to-23× reduction in energy and a 9×-to-15× improvement in throughput, resulting in energy-delay product (EDP) gains of up to 345× over a conventional NAND flash architecture incorporating an external digital ASIC for computation.