151 questions
2
votes
2
answers
114
views
What is the correct way to perform a 4D FFT in CUDA by implementing a 1D FFT in each dimension using the cufftPlanMany API?
CUDA does not have any direct implementation of a 4D FFT. Hence I want to decompose a 4D FFT into 4 x 1D FFTs along the X, Y, Z, and W dimensions. I understand that the cufftPlanMany API is best suited for ...
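The decomposition described in this excerpt comes down to element strides, which can be sketched on the host without calling cuFFT. This is a minimal sketch, assuming a row-major array with extents ordered {W, Z, Y, X} (X fastest); `row_major_strides` and `single_dist_covers_axis` are hypothetical helper names, not part of cuFFT.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Row-major element strides for a 4D array with extents dims = {W, Z, Y, X}.
// Assumption: X is the fastest-varying axis.
std::array<std::size_t, 4> row_major_strides(const std::array<std::size_t, 4>& dims) {
    std::array<std::size_t, 4> strides{};
    std::size_t acc = 1;
    for (std::size_t k = 4; k-- > 0;) {
        strides[k] = acc;   // step between neighbours along axis k
        acc *= dims[k];
    }
    return strides;
}

// Whether the start offsets of all 1D lines along `axis` form a single
// arithmetic progression (base + b * dist), which is what one batched plan
// with one dist value can describe. Ignores degenerate extents of 1.
bool single_dist_covers_axis(std::size_t axis) {
    assert(axis < 4);
    return axis == 0 || axis == 3;  // outermost (dist 1) or innermost (dist X)
}
```

Under these assumptions the stride for an axis would be passed as `istride`/`ostride`; the two middle axes would likely need a loop over sub-blocks (or a transpose), because their line starts do not fit a single `idist`.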
1
vote
0
answers
47
views
Why does cuFFT throughput increase as the transform size gets larger?
(Updated) I am trying to understand how CUDA parallelism works in cuFFT while learning CUDA coding.
I wrote my version of 1-D FFT in CUDA C++ and compared it with cuFFT. Below are the throughputs I ...
2
votes
0
answers
61
views
How do you manage register usage with cuFFT LTO callbacks?
When swapping cuFFT callbacks from the legacy callbacks to the new LTO callbacks, I encountered errors with certain FFT sizes combined with certain FFT callbacks. The error would occur when calling ...
-1
votes
1
answer
174
views
CUDA image upsampling with FFT method
I'm trying to do image upsampling with FFT in CUDA. I first do a forward FFT on the image, then I pad the result with zeros as shown below:
for a transformed image:
1 2
3 4
Pad it to:
1 0 0 2
0 0 0 0
0 0 0 ...
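The padding pattern in this excerpt (values pushed to the corners, zeros in between) matches zero-padding an unshifted spectrum. This is a minimal host-side sketch, assuming a square n x n row-major spectrum with the DC term at index 0; `pad_spectrum` is a hypothetical helper, and the Nyquist bin for even n is moved whole rather than split.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Zero-pad an n x n spectrum into an m x m spectrum (m >= n) by keeping the
// non-negative frequencies at the start of each axis and moving the negative
// frequencies to the end of each axis.
std::vector<cplx> pad_spectrum(const std::vector<cplx>& in, std::size_t n, std::size_t m) {
    assert(m >= n && in.size() == n * n);
    std::vector<cplx> out(m * m, cplx(0.0f, 0.0f));
    const std::size_t half = (n + 1) / 2;  // bins [0, half) are non-negative frequencies
    for (std::size_t r = 0; r < n; ++r) {
        const std::size_t dst_r = (r < half) ? r : r + (m - n);
        for (std::size_t c = 0; c < n; ++c) {
            const std::size_t dst_c = (c < half) ? c : c + (m - n);
            out[dst_r * m + dst_c] = in[r * n + c];
        }
    }
    return out;
}
```

For the 2 x 2 example above padded to 4 x 4, this puts 1, 2, 3, 4 in the four corners. Since cuFFT transforms are unnormalized, the inverse result would still need dividing by the original element count (n * n) to keep the image amplitude.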
0
votes
0
answers
35
views
How to fix the error caused by the cuFFT library not being defined in Google Colab
I am a beginner in CUDA programming and I need to use the cuFFT library for my research in Google Colab. But when I execute the code, the calls to the cuFFT library fail with an error.
The ...
1
vote
1
answer
765
views
I get CUFFT_INTERNAL_ERROR when calling cufftPlanMany
Is there any other reason that CUFFT_INTERNAL_ERROR occurs?
I run 2D cuFFT transforms on inputs of the same size but with a different batch size for every set.
The input array size is 360 (rows) x 90 (cols) and the batch size is usually ...
1
vote
1
answer
340
views
Problem compiling DLL files with CUDA FFT package (Windows 64)
I'm trying to compile some DLL files with some C++ and CUDA functions to quickly process some data that I receive in a Python program (160 MB/s from an acquisition card, to be FFT'd). The DLL works fine ...
0
votes
0
answers
108
views
Issue with cuFFT library and fftshift on odd image dimensions
I'm facing an issue with code I'm implementing for an exam using the GPU. Specifically, the code I'm writing is in C++, and I'm using the cuFFT library to perform the Fast Fourier Transform (FFT). The purpose ...
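The excerpt is cut off before the actual symptom, so the following is only a guess at a common pitfall: for odd lengths, fftshift and its inverse are different circular shifts, so applying fftshift twice does not restore the input. This is a minimal 1D host-side sketch; `circ_shift`, `fftshift1d` and `ifftshift1d` are hypothetical helpers that follow the usual floor(n/2) / ceil(n/2) convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Circular shift: out[(i + shift) % n] = in[i].
template <typename T>
std::vector<T> circ_shift(const std::vector<T>& in, std::size_t shift) {
    const std::size_t n = in.size();
    assert(shift <= n);
    if (n == 0) return in;
    std::vector<T> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[(i + shift) % n] = in[i];
    }
    return out;
}

// Moves the zero-frequency bin to the centre: shift by floor(n / 2).
template <typename T>
std::vector<T> fftshift1d(const std::vector<T>& in) {
    return circ_shift(in, in.size() / 2);
}

// Undoes fftshift1d: shift by ceil(n / 2). Identical to fftshift1d for even n.
template <typename T>
std::vector<T> ifftshift1d(const std::vector<T>& in) {
    return circ_shift(in, in.size() - in.size() / 2);
}
```

For a 2D image the same shifts would be applied per axis. The multiply-by-(-1)^(x+y) shortcut sometimes used in place of a shift only holds for even dimensions.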
1
vote
1
answer
145
views
Batching multiple 2D FFTs from within a 4D array using planMany() from FFTW/cuFFT
I have a 4D array of dimensions (N, 128, 128, 4) and I want to perform a 2D FFT for the two middle dimensions. My question: is it possible to do this with the xxxPlanMany() function from FFTW/cuFFT/...
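One way to reason about this layout is to check candidate stride and distance values against cuFFT's documented 2D advanced-layout indexing, `b * idist + (x * inembed[1] + y) * istride`. This is a minimal host-side sketch, assuming the (N, 128, 128, 4) array is row-major (the excerpt does not say); the constants and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Extents of the three trailing axes from the question, assumed row-major.
constexpr std::size_t H = 128, W = 128, C = 4;

// Candidate advanced-layout values for a 2D transform over the middle axes.
constexpr std::size_t kStride = C;    // istride: step between neighbouring columns
constexpr std::size_t kInembed1 = W;  // inembed[1]: storage extent of the inner axis
constexpr std::size_t kDist = 1;      // idist: step between the 4 interleaved signals

// Offset cuFFT would address for batch b, row x, column y under these values.
constexpr std::size_t layout_offset(std::size_t b, std::size_t x, std::size_t y) {
    return b * kDist + (x * kInembed1 + y) * kStride;
}

// Plain row-major offset of element (n, i, j, c).
constexpr std::size_t row_major_offset(std::size_t n, std::size_t i,
                                       std::size_t j, std::size_t c) {
    return ((n * H + i) * W + j) * C + c;
}

// Start of the n-th (128, 128, 4) slab, to be added to the base pointer.
constexpr std::size_t slab_offset(std::size_t n) {
    return n * H * W * C;
}
```

With these values a single plan with batch = 4 would cover one slab. The N slabs would likely need a loop that offsets the data pointer by `slab_offset(n)`, since a single `idist` cannot express both the step of 1 between the interleaved signals and the step of 128 * 128 * 4 between slabs.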
1
vote
1
answer
2k
views
torch fft with a GPU is much slower than fft with CPU
I'm running the following simple code on a strong server with a bunch of Nvidia RTX A5000/6000 with CUDA 11.8. For some reason, FFT with the GPU is much slower than with the CPU (200-800 times). Does ...
0
votes
1
answer
1k
views
CMake CUDA: static link with cublas
I want to compile CUDALibrarySamples. cuFFT uses CMake and I want to compile and link the 1d_c2c application with the static version of the cuFFT lib (-lcufft_static). Using Makefiles this is trivial: I have added -...
-2
votes
1
answer
230
views
Blockwise/Strided reduction using CUDA
TLDR: I am trying to write a GPU code that computes a blockwise reduction on an array. The input looks like [block_0, trash_0, block_1, trash_1, ..., block_n, trash_n], and I want to compute block_0 + ...
1
vote
0
answers
206
views
Fourier transform with cuFFT, are complex-to-complex transforms more efficient?
I'm writing code that integrates a PDE in time in Fourier space, and I'm doing so in CUDA/C++.
There is one real-valued array I need to evolve in time.
I've written the code in two different ways, ...
0
votes
1
answer
133
views
How to set cuFFT timeout?
I am looking for a way to interrupt a cuFFT computation if it runs for too long. How can it be accomplished?
I was looking for some timeout setting in the API, but I found no such option. When ...
0
votes
0
answers
35
views
How do I use complex thrust::device_vector in cuFFT functions [duplicate]
I have working code (not shown) that performs a series of complex->complex fast Fourier transforms using the cuFFT library. I have been attempting to simplify this code by using the thrust ...