Intel is the king of a shrinking kingdom. Every traditional desktop or laptop PC runs on the Santa Clara company’s processors, but that tradition is fast being eroded by more mobile, ARM-powered alternatives. Apple’s most important personal computers now run iOS, Google’s flagship Chromebook has an ARM flavor, and Microsoft just announced Windows for ARM. What’s more, the burden of processing tasks is shifting away from the personal device and out to networks of server farms up in the proverbial cloud, leaving Intel with a big portfolio of chips and no obvious customer to sell millions of them to.
You may have noticed some new Nvidia drivers have just popped up, but the best course of action seems to be holding off on updating for now: the company has acknowledged a number of problems, including an issue that seriously messes with memory clock speeds on certain graphics cards.
Self-driving cars are the future, and Nvidia wants in. CEO Jen-Hsun Huang announced today at the inaugural GPU Technology Conference Europe that the company is developing a simplified supercomputer that can power self-driving cars.
The supercomputer, called Xavier, is a system-on-chip (SoC) design that features both CPU and GPU on a single chip. Nvidia worked hard to shrink the silicon down to minimize space and maximize efficiency.
Zotac has revealed a pair of GeForce GTX 1060 cards including a new Mini offering designed to fit in small PC cases.
The GeForce GTX 1060 Mini (pictured above) is 6.85 inches long (a tad under 17.5cm), meaning it should fit in a Mini-ITX case.
There may be something fishy going on with Nvidia’s drivers that is causing the company’s new GeForce GTX 1000 Series “Pascal” graphics chips to run at a higher idle clock speed than the Windows desktop requires. While this issue won’t trigger the apocalypse, this “bug” means a hotter desktop and a larger overall power draw.
The problem, it seems, only happens when m
Nvidia has formally announced its next-generation GeForce GTX 1080 and GeForce GTX 1070 GPUs, ahead of competitor AMD. The two are the first consumer products to use Nvidia's new Pascal architecture, and promise giant leaps of performance and power efficiency compared to previous-generation GeForce GTX 9xx series cards and even the $1000 Titan X GPU.
Felix Kjellberg (better known as PewDiePie) has made a career out of playing video games while recording both the games and himself. His YouTube channel earned him $12 million (approx. Rs. 82 crores) in 2015, and since then he has expanded his partnership with Disney-owned Maker Studios into newer avenues: producing original content for YouTube's paid subscription service, YouTube Red, and collaborating with fellow web stars on a variety of projects under Revelmode.
VR has been called the next frontier in PC gaming, but the path to headset-wearing, arms-waving bliss is one that's still full of questions.
Nvidia is hoping to answer one of them by launching the GeForce GTX VR Ready program, which will help gamers identify PC components and entire systems that can cope with VR's heavy hardware demands.
As part of the program, Nvidia's partners will apply a "VR-ready" badge to products that meet Nvidia's minimum hardware requirements for virtual reality. Gamers can then pick out combinations for a new build or go to a retailer for an entire VR-ready system without worrying about stuttering frame rates or system lock-ups further down the line.
The announced partners so far, a mix of PC makers, system builders, add-in-card manufacturers and retailers, include Acer, Alienware, Asus, Falcon Northwest, Hewlett Packard, Maingear, Amazon, NCIX, Newegg, EVGA, MSI and Zotac.
Additionally, Nvidia has worked with developers and hardware makers to set minimum system requirements for VR.
Nvidia's Maxwell-based GTX 970 graphics card has been listed as the minimum requirement for a GPU, with the GTX 980 recommended for more demanding scenarios. That means you're looking at no lower than the R9 390 if you're in the AMD camp. According to Nvidia, you'll also need to meet minimum CPU, memory and connectivity requirements.
If you meet the minimum requirements for VR, then you're one of the lucky few (million): Nvidia reckons that just 13 million PCs will have the capabilities needed to run VR in 2016.
That sounds like a lot, but it's actually less than 1 per cent of the 1.43 billion PCs expected to be in use globally this year, according to Bloomberg, which points to figures from research firm Gartner.
Ever since DirectX 12 was announced, AMD and Nvidia have jockeyed for position regarding which of them would offer better support for the new API and its various features. One capability that AMD has talked up extensively is GCN’s support for asynchronous compute. Asynchronous compute allows all GPUs based on AMD’s GCN architecture to perform graphics and compute workloads simultaneously. Last week, an Oxide Games employee reported that, contrary to general belief, Nvidia hardware couldn’t perform asynchronous compute and that the performance impact of attempting to do so was disastrous on the company’s hardware.
This announcement kicked off a flurry of research into what Nvidia hardware did and did not support, as well as anecdotal claims that people would (or already did) return their GTX 980 Tis based on Ashes of the Singularity performance. We’ve spent the last few days in conversation with various sources working on the problem, including Mahigan and CrazyElf at Overclock.net, as well as parsing through various data sets and performance reports. Nvidia has not yet responded to our request for clarification, but here’s the situation as we currently understand it.
When AMD and Nvidia talk about supporting asynchronous compute, they aren’t talking about the same hardware capability. The Asynchronous Command Engines in AMD’s GPUs (between two and eight, depending on which card you own) are capable of executing new workloads at latencies as low as a single cycle. A high-end AMD card has eight ACEs, and each ACE has eight queues. Maxwell, in contrast, has two pipelines, one of which is a high-priority graphics pipeline. The other has a queue depth of 31 — but Nvidia can’t switch contexts anywhere near as quickly as AMD can.
According to a talk given at GDC 2015, there are restrictions on Nvidia’s preemption capabilities. Additional text below the slide explains that “the GPU can only switch contexts at draw call boundaries” and “On future GPUs, we’re working to enable finer-grained preemption, but that’s still a long way off.” To explore the various capabilities of Maxwell and GCN, users at Beyond3D and Overclock.net have used an asynchronous compute test that evaluates the capability on both AMD and Nvidia hardware. The benchmark has been revised multiple times over the past week, so early results aren’t comparable to the data we’ve seen in later runs.
Note that this is a test of asynchronous compute latency, not performance. This doesn’t test overall throughput — in other words, just how long it takes to execute — and the test is designed to demonstrate if asynchronous compute is occurring or not. Because this is a latency test, lower numbers (closer to the yellow “1” line) mean the results are closer to ideal.
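The core idea behind a test like this can be sketched in a few lines. The following is a minimal illustration of the reasoning, not the Beyond3D benchmark itself; the `classify_async` helper and its tolerance value are invented for this example. If graphics and compute genuinely overlap, the combined runtime should approach the longer of the two individual runtimes; if the GPU serializes them, it should approach their sum.

```python
def classify_async(graphics_ms, compute_ms, combined_ms, tol=0.10):
    """Infer whether graphics and compute work overlapped on a GPU.

    Concurrent execution: combined time is close to the max of the
    individual times. Serialized execution: combined time is close to
    their sum. `tol` is a fractional tolerance for measurement noise.
    """
    ideal_async = max(graphics_ms, compute_ms)
    ideal_serial = graphics_ms + compute_ms
    if combined_ms <= ideal_async * (1 + tol):
        return "concurrent"
    if combined_ms >= ideal_serial * (1 - tol):
        return "serialized"
    return "partial overlap"
```

A GPU that hides an 8ms compute job entirely behind a 10ms graphics job would land near 10ms combined and be classed as concurrent; one that runs them back to back would land near 18ms.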
Here’s the R9 290’s performance. The yellow line is perfection — that’s what we’d get if the GPU switched and executed instantaneously. The y-axis of the graph shows normalized performance to 1x, which is where we’d expect perfect asynchronous latency to be. The red line is what we are most interested in. It shows GCN performing nearly ideally in the majority of cases, holding performance steady even as thread counts rise. Now, compare this to Nvidia’s GTX 980 Ti.
Attempting to execute graphics and compute concurrently on the GTX 980 Ti causes dips and spikes in performance and little in the way of gains. Right now, there are only a few thread counts where Nvidia matches ideal performance (latency, in this case) and many cases where it doesn’t. Further investigation has indicated that Nvidia’s async pipeline appears to lean on the CPU for some of its initial steps, whereas AMD’s GCN handles the job in hardware.
Right now, the best available evidence suggests that when AMD and Nvidia talk about asynchronous compute, they are talking about two very different capabilities. “Asynchronous compute,” in fact, isn’t necessarily the best name for what’s happening here. The question is whether or not Nvidia GPUs can run graphics and compute workloads concurrently. AMD can, courtesy of its ACE units.
It’s been suggested that AMD’s approach is more like Hyper-Threading, which allows the GPU to work on disparate compute and graphics workloads simultaneously without a loss of performance, whereas Nvidia may be leaning on the CPU for some of its initial setup steps and attempting to schedule simultaneous compute + graphics workloads for ideal execution. Obviously that process isn’t working well yet. Since our initial article, Oxide has stated the following:
“We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute.”
Here’s what that likely means, given Nvidia’s own presentations at GDC and the various test benchmarks that have been assembled over the past week. Maxwell does not have a GCN-style configuration of asynchronous compute engines, and it cannot switch between graphics and compute workloads as quickly as GCN. According to Beyond3D user Ext3h:
“There were claims originally, that Nvidia GPUs wouldn’t even be able to execute async compute shaders in an async fashion at all, this myth was quickly debunked. What become clear, however, is that Nvidia GPUs preferred a much lighter load than AMD cards. At small loads, Nvidia GPUs would run circles around AMD cards. At high load, well, quite the opposite, up to the point where Nvidia GPUs took such a long time to process the workload that they triggered safeguards in Windows. Which caused Windows to pull the trigger and kill the driver, assuming that it got stuck.
“Final result (for now): AMD GPUs are capable of handling a much higher load. About 10x times what Nvidia GPUs can handle. But they also need also about 4x the pressure applied before they get to play out there capabilities.”
Ext3h goes on to say that preemption in Nvidia’s case is only used when switching between graphics contexts (1x graphics + 31 compute mode) and “pure compute context,” but claims that this functionality is “utterly broken” on Nvidia cards at present. He also states that while Maxwell 2 (GTX 900 family) is capable of parallel execution, “The hardware doesn’t profit from it much though, since it has only little ‘gaps’ in the shader utilization either way. So in the end, it’s still just sequential execution for most workload, even though if you did manage to stall the pipeline in some way by constructing an unfortunate workload, you could still profit from it.”
Nvidia, meanwhile, has represented to Oxide that it can implement asynchronous compute, and that this capability simply was not fully enabled in drivers. Like Oxide, we’re going to wait and see how the situation develops. The analysis thread at Beyond3D makes it very clear that this is an incredibly complex question, and much of what Nvidia and Maxwell may or may not be doing is unclear.
Earlier, we mentioned that AMD’s approach to asynchronous computing superficially resembled Hyper-Threading. There’s another way in which that analogy may prove accurate: When Hyper-Threading debuted, many AMD fans asked why Team Red hadn’t copied the feature to boost performance on K7 and K8. AMD’s response at the time was that the K7 and K8 processors had much shorter pipelines and very different architectures, and were intrinsically less likely to benefit from Hyper-Threading as a result. The P4, in contrast, had a long pipeline and a relatively high stall rate. If one thread stalled, HT allowed another thread to continue executing, which boosted the chip’s overall performance.
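That stall-hiding argument can be made concrete with a toy model. What follows is a deliberately simplified, hypothetical single-issue pipeline (the `simulate` helper and the 'I'/'S' stream encoding are invented here; no real CPU works this simply): each stream is a string in which 'I' is an instruction that needs the issue port for one cycle and 'S' is a stall cycle spent waiting on something like memory.

```python
def simulate(streams, smt=False):
    """Count cycles to retire all instruction streams on a single-issue core.

    Without SMT, the core runs each stream to completion in turn, so every
    stall cycle is simply wasted. With SMT, stalls in all streams elapse in
    parallel, and each cycle the core issues one instruction from some
    stream that is ready, hiding one stream's stalls behind another's work.
    """
    if not smt:
        # Sequential execution: every issue slot and stall cycle is serial.
        return sum(len(s) for s in streams)
    pos = [0] * len(streams)
    cycles = 0
    while any(p < len(s) for p, s in zip(pos, streams)):
        cycles += 1
        issued = False
        for i, s in enumerate(streams):
            if pos[i] >= len(s):
                continue  # this stream has already retired
            if s[pos[i]] == "S":
                pos[i] += 1  # waits elapse in parallel across streams
            elif not issued:
                pos[i] += 1  # single issue port: one instruction per cycle
                issued = True
    return cycles
```

Two streams of the shape "ISSI" (an instruction, two stall cycles, another instruction) take 8 cycles run back to back but only 5 with SMT, while two stall-free streams gain nothing from it, which mirrors why the long-pipeline, high-stall P4 benefited from Hyper-Threading where the short-pipeline K7 and K8 would not have.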
GCN-style asynchronous computing is unlikely to boost Maxwell performance, in other words, because Maxwell isn’t really designed for these kinds of workloads. Whether Nvidia can work around that limitation (or implement something even faster) remains to be seen.