We'll have to wait for benchmarks, but based on specs, this new Titan X looks like the best single GPU card you can buy for deep learning today. For certain deep learning applications, if properly configured, two GTX 1080's might outperform the Titan X and cost about the same, but that's not an apples-to-apples comparison.

The new Titan X also has 3,584 CUDA cores (1.4x the GTX 1080's 2,560) and 12GB of RAM (1.5x the GTX 1080's 8GB). Nvidia quotes a memory bandwidth of 480GB/s, which will probably have the most impact on deep learning applications (lots of giant matrices must be fetched from memory, repeatedly, for extended periods of time). For comparison, the GTX 1080's memory bandwidth is quoted as 320GB/s.

A beefy desktop computer with four of these Titan X's will have 44 Teraflops of raw computing power, about "one fifth" of the raw computing power of the world's current 500th most powerful supercomputer. While those 44 Teraflops are usable only for certain kinds of applications (involving 32-bit floating point linear algebra operations), the figure is still kind of incredible.

Great news all-around for deep learning practitioners.

The truth is that most of the work in Deep Learning is developing new NN architectures and other algorithmic optimisations. If you are working in the field there is no reason to put up with second-class support from non-NVidia vendors - just build in TensorFlow, Torch or a couple of other frameworks and wait for the day (one day, we are promised!) when OpenCL is competitive. Then the framework backends get ported, your code keeps running the same, and it can run on all those other architectures.

But the truth is that without the hardware vendors putting significant resources into OpenCL it just isn't competitive, and it won't be until that happens. Everyone has been waiting for that day since One Weird Trick. There isn't really anything to indicate it is getting closer, and AMD's dismissal of NVidia as "doing something in the car industry" doesn't give me a lot of confidence. Anyway, I hope I'm wrong.

Whether bandwidth or raw flops is the limit depends upon the op / byte loaded intensity. Nvidia packs their GPUs with a lot of float32 (or float64) units because some problems (e.g., convolution, or more typical HPC problems like PDEs, which will probably be done in float64) have a high flop / byte ratio. A problem calculating, say, hamming distance, at 1-2 integer bit ops per integer word loaded, will probably be memory bandwidth bound rather than integer op throughput limited. More complicated operations (e.g., cryptographic hashing) that do more integer ops per byte loaded will instead be limited by the comparatively lower throughput of the integer op functional units rather than by memory bandwidth. For "deep learning", convolution is one of the few operations that tends to be compute- rather than memory-b/w-bound. It's my understanding that Sgemm (float32 matrix multiplication) has been memory b/w limited for a while on Nvidia GPUs. Though, if you muck around with the architecture (as with Pascal), the balance of compute, memory b/w, and on-chip resources (smem, register file memory) may shift those ratios.
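To make the op / byte argument concrete, here is a minimal roofline-style sketch in Python (an illustration, not a benchmark). The peak numbers are the Titan X (Pascal) figures from above; the kernel intensities are idealized assumptions chosen for the example, and real sgemm kernels achieve far less reuse than the ideal bound used here.

```python
# Minimal roofline-style estimate: a kernel is memory-b/w bound when its
# arithmetic intensity (ops per byte moved) falls below the machine's ridge
# point (peak ops/s divided by peak bytes/s). All numbers are illustrative.

PEAK_OPS = 11e12  # ~11 Tflop/s fp32 peak per Titan X (Pascal)
PEAK_BW = 480e9   # ~480 GB/s memory bandwidth

RIDGE = PEAK_OPS / PEAK_BW  # ~23 ops/byte needed to keep the ALUs busy


def classify(name: str, ops: float, bytes_moved: float) -> None:
    """Print whether an idealized kernel is compute- or memory-b/w bound."""
    intensity = ops / bytes_moved
    attainable = min(PEAK_OPS, PEAK_BW * intensity)  # roofline ceiling
    kind = "compute-bound" if intensity >= RIDGE else "memory-b/w bound"
    print(f"{name:>20}: {intensity:8.2f} ops/byte, "
          f"~{attainable / 1e12:6.2f} Tops/s attainable ({kind})")


N = 4096
# sgemm with ideal reuse: 2*N**3 flops over three N x N fp32 matrices.
# Real kernels move much more data (per-tile traffic), so their effective
# intensity is far lower -- which is why sgemm often looks b/w limited.
classify("sgemm (ideal reuse)", 2 * N**3, 3 * N**2 * 4)
# hamming distance: ~2 integer ops per 4-byte word streamed in once.
classify("hamming distance", 2 * N, 4 * N)

print(f"ridge point: {RIDGE:.1f} ops/byte")
```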
For comparison, the first supercomputer in the Teraflops range that was available outside of the nuclear labs was the Pittsburgh Terascale machine. (I think that was on 32 bit floats, but I can't find the specs.) "Total TCS floor space is roughly that of a basketball court. It uses 14 miles of high-bandwidth interconnect cable to maintain communication among its 3,000 processors. Another seven miles of serial, copper cable and a mile of fiber-optic cable provide for data handling. The TCS requires 664 kilowatts of power, enough to power 500 homes. It produces heat equivalent to burning 169 pounds of coal an hour, much of which is used in heating the Westinghouse Energy Center."
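For a sense of scale against the four-card desktop above, a rough sketch follows. The TCS peak of about 6 Tflop/s is my own assumption from public accounts of the machine (and likely a float64 figure), so the ratio is an order-of-magnitude illustration, not an apples-to-apples comparison.

```python
# Rough scale comparison: hypothetical 4x Titan X desktop vs the TCS.
# The ~6 Tflop/s TCS peak is an unverified assumption (and likely float64),
# so treat these ratios as order-of-magnitude illustration only.

TITAN_X_TFLOPS = 11.0  # fp32 peak per card, from the specs above
TITAN_X_WATTS = 250    # published board power per card
TCS_TFLOPS = 6.0       # assumed TCS peak (unverified)
TCS_KILOWATTS = 664    # from the fact sheet quoted above

cards = 4
desktop_tflops = cards * TITAN_X_TFLOPS    # the "44 Teraflops" figure
desktop_kw = cards * TITAN_X_WATTS / 1000  # GPUs only; ignores the host

print(f"desktop: {desktop_tflops:.0f} Tflop/s at ~{desktop_kw:.1f} kW (GPUs only)")
print(f"TCS:     {TCS_TFLOPS:.0f} Tflop/s at {TCS_KILOWATTS} kW")
print(f"ratio:   {desktop_tflops / TCS_TFLOPS:.1f}x the flops at roughly "
      f"1/{TCS_KILOWATTS / desktop_kw:.0f} the power")
```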