Ever since DirectX 12 was announced, AMD and Nvidia have jockeyed for position regarding which of them would offer better support for the new API and its various features. One capability that AMD has talked up extensively is GCN’s support for asynchronous compute. Asynchronous compute allows all GPUs based on AMD’s GCN architecture to perform graphics and compute workloads simultaneously. Last week, an Oxide Games employee reported that, contrary to general belief, Nvidia hardware couldn’t perform asynchronous compute and that the performance impact of attempting to do so was disastrous on the company’s hardware.
This announcement kicked off a flurry of research into what Nvidia hardware did and did not support, as well as anecdotal claims that people would (or already did) return their GTX 980 Ti’s based on Ashes of the Singularity performance. We’ve spent the last few days in conversation with various sources working on the problem, including Mahigan and CrazyElf at Overclock.net, as well as parsing through various data sets and performance reports. Nvidia has not responded to our request for clarification as of yet, but here’s the situation as we currently understand it.
When AMD and Nvidia talk about supporting asynchronous compute, they aren’t talking about the same hardware capability. The Asynchronous Command Engines (ACEs) in AMD’s GPUs (between two and eight, depending on which card you own) are capable of executing new workloads at latencies as low as a single cycle. A high-end AMD card has eight ACEs and each ACE has eight queues. Maxwell, in contrast, has two pipelines, one of which is a high-priority graphics pipeline. The other has a queue depth of 31, but Nvidia can’t switch contexts anywhere near as quickly as AMD can.
According to a talk given at GDC 2015, there are restrictions on Nvidia’s preemption capabilities. Additional text below the slide explains that “the GPU can only switch contexts at draw call boundaries” and “On future GPUs, we’re working to enable finer-grained preemption, but that’s still a long way off.” To explore the various capabilities of Maxwell and GCN, users at Beyond3D and Overclock.net have used an asynchronous compute test that evaluates the capability on both AMD and Nvidia hardware. The benchmark has been revised multiple times over the week, so early results aren’t comparable to the data we’ve seen in later runs.
Note that this is a test of asynchronous compute latency, not performance. It doesn’t measure overall throughput, only how long the work takes to execute, and it is designed to demonstrate whether asynchronous compute is occurring or not. Because this is a latency test, lower numbers (closer to the yellow “1” line) mean the results are closer to ideal.
Here’s the R9 290’s performance. The yellow line is perfection — that’s what we’d get if the GPU switched and executed instantaneously. The y-axis of the graph shows normalized performance to 1x, which is where we’d expect perfect asynchronous latency to be. The red line is what we are most interested in. It shows GCN performing nearly ideally in the majority of cases, holding performance steady even as thread counts rise. Now, compare this to Nvidia’s GTX 980 Ti.
Attempting to execute graphics and compute concurrently on the GTX 980 Ti causes dips and spikes in performance and little in the way of gains. Right now, there are only a few thread counts where Nvidia matches ideal performance (latency, in this case) and many cases where it doesn’t. Further investigation has indicated that Nvidia’s asynch pipeline appears to lean on the CPU for some of its initial steps, whereas AMD’s GCN handles the job in hardware.
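One way to read a latency test like this can be sketched in a few lines of Python. The normalization used here (combined time divided by the longer of the two standalone times) is our assumption for illustration and not necessarily how the Beyond3D benchmark builds its graphs; the function names, the tolerance, and every timing figure are hypothetical.

```python
def normalized_latency(graphics_ms, compute_ms, combined_ms):
    """Combined run time relative to the ideal, fully overlapped case.

    Assumption: with perfect overlap, running both workloads together
    takes as long as the slower one alone, so 1.0 is the ideal score.
    """
    ideal_ms = max(graphics_ms, compute_ms)
    return combined_ms / ideal_ms


def serialized_score(graphics_ms, compute_ms):
    """Score a GPU would get if it simply ran one workload after the other."""
    return (graphics_ms + compute_ms) / max(graphics_ms, compute_ms)


def classify(graphics_ms, compute_ms, combined_ms, tolerance=0.10):
    """Label a measurement as overlapped, serialized, or somewhere between."""
    score = normalized_latency(graphics_ms, compute_ms, combined_ms)
    worst = serialized_score(graphics_ms, compute_ms)
    if score <= 1.0 + tolerance:
        return "overlapped"
    if score >= worst * (1.0 - tolerance):
        return "serialized"
    return "partial"


# Hypothetical timings in milliseconds, not measured data.
print(classify(20.0, 10.0, 21.0))
print(classify(20.0, 10.0, 30.0))
```

A score near 1.0 suggests the two workloads ran concurrently, while a score near the serialized figure suggests the GPU ran them back to back.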
Right now, the best available evidence suggests that when AMD and Nvidia talk about asynchronous compute, they are talking about two very different capabilities. “Asynchronous compute,” in fact, isn’t necessarily the best name for what’s happening here. The question is whether or not Nvidia GPUs can run graphics and compute workloads concurrently. AMD can, courtesy of its ACE units.
It’s been suggested that AMD’s approach is more like Hyper-Threading, which allows the GPU to work on disparate compute and graphics workloads simultaneously without a loss of performance, whereas Nvidia may be leaning on the CPU for some of its initial setup steps and attempting to schedule simultaneous compute and graphics workloads for ideal execution. Obviously that process isn’t working well yet. Since our initial article, Oxide has stated the following:
“We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute.”
Here’s what that likely means, given Nvidia’s own presentations at GDC and the various test benchmarks that have been assembled over the past week. Maxwell does not have a GCN-style configuration of asynchronous compute engines and it cannot switch between graphics and compute workloads as quickly as GCN. According to Beyond3D user Ext3h:
“There were claims originally, that Nvidia GPUs wouldn’t even be able to execute async compute shaders in an async fashion at all, this myth was quickly debunked. What become clear, however, is that Nvidia GPUs preferred a much lighter load than AMD cards. At small loads, Nvidia GPUs would run circles around AMD cards. At high load, well, quite the opposite, up to the point where Nvidia GPUs took such a long time to process the workload that they triggered safeguards in Windows. Which caused Windows to pull the trigger and kill the driver, assuming that it got stuck.
“Final result (for now): AMD GPUs are capable of handling a much higher load. About 10x times what Nvidia GPUs can handle. But they also need also about 4x the pressure applied before they get to play out there capabilities.”
Ext3h goes on to say that preemption in Nvidia’s case is only used when switching between graphics contexts (1x graphics + 31 compute mode) and “pure compute context,” but claims that this functionality is “utterly broken” on Nvidia cards at present. He also states that while Maxwell 2 (GTX 900 family) is capable of parallel execution, “The hardware doesn’t profit from it much though, since it has only little ‘gaps’ in the shader utilization either way. So in the end, it’s still just sequential execution for most workload, even though if you did manage to stall the pipeline in some way by constructing an unfortunate workload, you could still profit from it.”
Nvidia, meanwhile, has represented to Oxide that it can implement asynchronous compute and that this capability was not fully enabled in drivers. Like Oxide, we’re going to wait and see how the situation develops. The analysis thread at Beyond3D makes it very clear that this is an incredibly complex question, and much of what Nvidia and Maxwell may or may not be doing is unclear.
Earlier, we mentioned that AMD’s approach to asynchronous computing superficially resembled Hyper-Threading. There’s another way in which that analogy may prove accurate: When Hyper-Threading debuted, many AMD fans asked why Team Red hadn’t copied the feature to boost performance on K7 and K8. AMD’s response at the time was that the K7 and K8 processors had much shorter pipelines and very different architectures, and were intrinsically less likely to benefit from Hyper-Threading as a result. The P4, in contrast, had a long pipeline and a relatively high stall rate. If one thread stalled, HT allowed another thread to continue executing, which boosted the chip’s overall performance.
GCN-style asynchronous computing is unlikely to boost Maxwell performance, in other words, because Maxwell isn’t really designed for these kinds of workloads. Whether Nvidia can work around that limitation (or implement something even faster) remains to be seen.
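The Hyper-Threading analogy above can be made concrete with a deliberately crude model. The sketch below assumes each thread stalls independently for a fixed fraction of cycles and that an execution slot is wasted only when every thread is stalled at once. Real CPUs and GPUs are far more complicated, and the stall fractions are illustrative guesses, not measurements of the P4, K8, GCN, or Maxwell.

```python
def utilization(stall_fraction, threads):
    """Fraction of cycles in which at least one thread has work ready.

    Toy assumption: threads stall independently, so all of them are
    stalled together with probability stall_fraction ** threads.
    """
    return 1.0 - stall_fraction ** threads


def gain_from_second_thread(stall_fraction):
    """Relative throughput gain from adding a second hardware thread."""
    one = utilization(stall_fraction, 1)
    two = utilization(stall_fraction, 2)
    return two / one - 1.0


# Hypothetical stall rates: a design that stalls often versus one that rarely does.
for label, stall in [("high stall rate", 0.30), ("low stall rate", 0.05)]:
    print(f"{label}: {gain_from_second_thread(stall):.1%} gain")
```

In this model the relative gain works out to exactly the stall fraction, which is the point of the analogy: a design that rarely leaves its execution units idle has little for a second stream of work to fill.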
In the self-driving car business, decisions are only as good as your sensor data. While the state-of-the-art Velodyne LIDAR that adorns benchmark research vehicles will set you back $80,000, winning the DARPA Urban Challenge probably requires the better part of a cool million. Unfortunately, all it may take to spoof sophisticated sensors like these is a cheap laser pointer pulsed by something as simple as an Arduino or Raspberry Pi.
As IEEE Spectrum reports, security specialist Jonathan Petit will be presenting a disturbingly easy new hack this November at the Black Hat Europe conference. After recording the probe signals from an IBEO Lux lidar unit, Petit simply fired them back at the emitter using his laser. As long as they were synchronized, the lidar unit ‘saw’ an illusory object in front of it. The trick works up to 100 meters away in any direction — at the front, back or side — and doesn’t even require a tightly focused beam.
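Why synchronization matters follows from how a time-of-flight lidar measures range: it converts the delay between its outgoing pulse and the returning echo into a distance. The sketch below is a simplified model of that arithmetic, not a description of Petit’s actual setup or of the IBEO Lux internals; the function names and the example distances are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def round_trip_time_s(distance_m):
    """Time for a pulse to reach an object at distance_m and return."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


def apparent_distance_m(echo_delay_s):
    """Range a time-of-flight sensor infers from an echo delay."""
    return SPEED_OF_LIGHT_M_PER_S * echo_delay_s / 2.0


def attacker_wait_s(phantom_distance_m, attacker_distance_m):
    """How long an attacker waits after seeing a pulse before firing back.

    A negative result means the fake echo would have to leave before the
    real pulse arrives, so the attacker must anticipate the next pulse
    instead of reacting to the current one.
    """
    return round_trip_time_s(phantom_distance_m) - round_trip_time_s(attacker_distance_m)


# Hypothetical scenario: attacker 30 m away, phantom object placed at 100 m.
delay = round_trip_time_s(100.0)
print(f"echo delay for a 100 m phantom: {delay * 1e9:.0f} ns")
print(f"attacker wait: {attacker_wait_s(100.0, 30.0) * 1e9:.0f} ns")
```

For a phantom at 100 meters the fake echo has to arrive roughly two-thirds of a microsecond after the outgoing pulse, which is why the replayed signal must be timed against the lidar’s own pulse train.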
Although other hacks like spoofing the vehicle’s GPS or tire sensors have been done before, Petit’s hack could potentially bring a vehicle at speed to a full stop. Several 3D-rendered vehicles could be placed not only in front of the car, but actively moving toward it. That would present quite a gauntlet to any control system now on the market.
Lidar systems don’t operate in a radiation band that is licensed like short-range radar. Nor do they typically encode or encrypt their pulses. These realities make them particularly vulnerable to anyone deliberately targeting them. But lidar systems are evolving rapidly, becoming not just cheaper, but more capable. So-called ‘sensor-fusion’ technology is also evolving to the point where hacking just a single sensor, or a single kind of sensor, may not be enough to overwhelm the system.
For example, last week a radically new kind of laser system was reported in Scientific Reports that combines an electrically-pumped vertical-cavity surface-emitting laser (VCSEL) with a micromechanical resonator on a single chip. This device would be able to sweep the output beam across a broad wavelength band in a microsecond (as opposed to 10 milliseconds) to create a highly efficient LIDAR source beam. Putting the wavelength control functions inside the laser itself in this way would mean tiny, fast, and low-power sensors at a fraction of the cost.
What will gadget-loving UAE residents be doing at 9pm on 9-9 (tonight)?
They’re bound to be glued to their screens to find out what Apple has in store for them in its latest instalment of iPhones – the iPhone 6s and the iPhone 6s Plus.
With just hours to go before the covers are taken off the new devices, rumours are flying thick and fast about what the new iPhones may or may not sport under their hoods.
UAE residents are indeed early adopters of technology, and that’s one of the reasons why smartphone manufacturers from BlackBerry to Samsung have Dubai in their global launch schedules.
Dubai Holding is currently re-working the master plan of the mega Mall of the World development and now aims to build it as the 'future city' of Dubai.
The new components of the development include residential and office space, with the master plan being re-engineered to integrate public transport systems in the form of Metro, tram, buses, water transport, etc., to ease traffic within and outside the development.
“The project has not been stalled… it is in redevelopment stage.
“What Dubai Holding wants to do, given the strategic location of the land, which is as large as Downtown Dubai, is to have patience and search for what is going to be the very best result for the site,” said Morgan Parker, Chief Operating Officer of Sufouh Development, the new company set up to oversee the development of Mall of the World.
“We are trying to forecast what Dubai is going to be 50 years from now and so we are not building a project that is a statement on the world today,” he asserted.
The site in the Al Sufouh area is currently occupied by the Dubai Police Academy.
“The academy is not moving for next two years and so we have time to find the best solutions not just for development but also for the surrounding area.
“Our challenge is to create a tourism destination in a climate-controlled environment and not to create any congestion and traffic jams.”
Phase one of the development is likely to begin in the next 18 months, with full construction planned only after the existing academy relocates to its new location in Dubai Academic City.
The project, announced at Cityscape 2014, is likely to cost Dh25 billion. It will include a shopping mall with an area of eight million square feet; the world’s largest theme park, covered by a glass dome that will be open during the winter months; a dedicated wellness zone; a cultural celebration district; and a wide range of hospitality options comprising 20,000 hotel rooms.
“It is not just about creating the world’s largest mall, but seamlessly integrating the hospitality, residential, commercial and entertainment lifestyle options into the bigger picture.”
In fact, the Mall of the World will not have one large mall, but three urban malls, almost two-thirds the size of Mall of the Emirates. There will be dozens of public plazas and entertainment zones, 23 parks, hotels, etc., that will be linked to each other through air-conditioned arcades and climate-controlled spaces.
It sounds crazy, but you really can pre-order that Pip Boy device along with the game, which goes on sale on November 10, 2015.
Get ready for the nuclear apocalypse!
An international group of biologists is building an elementary biocomputer designed to control the behavior of bacteria. The invention could become a breakthrough in medicine. Placed in the gut, such biocomputers could have a positive effect on a person’s health, for example by activating the microflora or even triggering the production of a medicine.
The primitive logic circuit involves three types of bacteria. It works as follows: two strains of E. coli are equipped with two opposing sets of genes. In this way, the scientists made the bacteria synthesize signal molecules that switch a particular group of genes in the third type of bacteria on or off. Placed together in one container, the three components synchronize their activity, forming a peculiar analog of an electronic “pendulum”. The biological “pendulum” could be used to control the operation of more sophisticated biocomputers in the gut.
North Dakota police will be free to use armed drones. However, the drones may be equipped only with nonlethal weapons. The permitted weapons are rubber bullets, pepper spray, police tasers, sound cannons and tear gas. The softening of the law limiting the use of drones in law enforcement was pushed by commercial companies and police.
The first hearings on the law took place in March, when it was meant to ban the weaponizing of drones completely. Later, companies producing unmanned aerial vehicles joined the debate. A law prohibiting armed drones was against their interests, since it would have cost them lucrative orders from North Dakota police. As a result, the legislators allowed drones to be armed with nonlethal weapons. However, according to The Guardian, at least 39 people have been killed by police tasers in 2015 alone so far.
American civil society is seriously concerned about the depersonalization of police actions. Officers would be handling weapons remotely, unable to fully assess and control a particular situation, which could lead to new victims.