In CuTeDSL, it is often convenient to use StaticPersistentTileScheduler when writing persistent kernels. In this blogpost I give a brief intro to how it works.

Usage in Kernel

Typical usage within a CuTeDSL program is that we first set up the grid and the scheduler params in a dedicated function:

    tile_sched_params = utils.PersistentTileSchedulerParams(
        problem_shape_ntile_mnl,
        self.cluster_shape_mnk,
        swizzle_size=self.swizzle_size,
        raster_along_m=self.raster_along_m,
    )
    grid = utils.StaticPersistentTileScheduler.get_grid_shape(
        tile_sched_params, self.max_active_clusters
    )

Within the kernel we then create the scheduler and use it as an iterator over work tiles:

    tile_sched = utils.StaticPersistentTileScheduler.create(
        tile_sched_params, cute.arch.block_idx(), cute.arch.grid_dim()
    )
    work_tile = tile_sched.initial_work_tile_info()

    while work_tile.is_valid_tile:
        # Get coordinate for current tile
        cur_tile_coord = work_tile.tile_idx
        mma_tile_coord_mnl = (
            cur_tile_coord[0] //…
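To build some intuition for the iteration pattern above, here is a minimal pure-Python sketch of static persistent scheduling. It is a toy model, not the CuTeDSL implementation: `ToyStaticScheduler`, `WorkTile`, `advance`, and `tiles_for_block` are hypothetical names, and the model assumes a cluster shape of 1x1 and no swizzling, so tiles are simply rastered along M first. The assumed idea is that each persistent block starts at its own linear block index and then strides by the total number of launched blocks until it runs past the last tile.

```python
from dataclasses import dataclass


@dataclass
class WorkTile:
    """Hypothetical stand-in for the work tile info used in the kernel loop."""
    tile_idx: tuple        # (m, n, l) tile coordinate
    is_valid_tile: bool    # False once the block has run out of tiles


class ToyStaticScheduler:
    """Toy model (assumption): stride over a linearized tile space."""

    def __init__(self, ntile_mnl, block_linear_idx, num_blocks):
        self.ntile_mnl = ntile_mnl
        m_tiles, n_tiles, l_tiles = ntile_mnl
        self.num_tiles = m_tiles * n_tiles * l_tiles
        self.current = block_linear_idx   # first tile owned by this block
        self.stride = num_blocks          # number of persistent blocks launched

    def _decode(self, linear_idx):
        # Raster along M first, then N, then the batch mode L.
        m_tiles, n_tiles, _ = self.ntile_mnl
        m = linear_idx % m_tiles
        n = (linear_idx // m_tiles) % n_tiles
        l = linear_idx // (m_tiles * n_tiles)
        return (m, n, l)

    def initial_work_tile_info(self):
        return WorkTile(self._decode(self.current), self.current < self.num_tiles)

    def advance(self):
        # Static schedule: the next tile is a fixed stride away.
        self.current += self.stride
        return self.initial_work_tile_info()


def tiles_for_block(ntile_mnl, block_linear_idx, num_blocks):
    """Collect every tile coordinate a given block would process."""
    sched = ToyStaticScheduler(ntile_mnl, block_linear_idx, num_blocks)
    work_tile = sched.initial_work_tile_info()
    tiles = []
    while work_tile.is_valid_tile:
        tiles.append(work_tile.tile_idx)
        work_tile = sched.advance()
    return tiles


if __name__ == "__main__":
    # 2 x 3 x 1 = 6 tiles shared by 4 persistent blocks.
    for block in range(4):
        print(block, tiles_for_block((2, 3, 1), block, 4))
```

With 6 tiles and 4 blocks, blocks 0 and 1 each process two tiles while blocks 2 and 3 process one. The assignment is fixed before the kernel runs, which is what makes the schedule static.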