2005-08-27

Intel's Continued Marketing Evolution

As an engineer, I had real hope for Intel. I had long theorized that Intel's project, codenamed "Yamhill," to bring AMD's x86-64 instruction set to the Pentium Pro (i686) series of processors was actually a two-part endeavor. I believed the second, more complete effort would produce a new series of innovative Intel products that would challenge AMD to match Intel in intertwined multi-core, cross-core scheduling (true threading), as well as PAE36/52 virtualization atop a new, true 64-bit architecture.

Oh, how could I have been so presumptuous, and so DEAD WRONG?

Intel is a company that stopped designing x86 in 1994. For the last 12 years it has A) marketed an aging set of designs while it B) slapped on lossy math pipelines, C) extended pipeline stages for clock-speed marketing, and D) maintained market share only thanks to the combination of 1) Tier-1 distribution control and 2) the raw dollars it can put into its leading-edge fabs, which give it a 9-12 month packaging technology lead over AMD. And as Intel Developer Forum (IDF) 2005 draws to a close, make no mistake: INTEL IS NOT CHANGING.

Intel has no innovative designs or strategies left for hardware other than packaging, which Centrino has proved can be very, very profitable, even if it is not innovative at all and could only be considered marginally evolutionary. Everything is going to stay the same as it has been for the last 12 years: reuse what it can, extend what it must, lean on software hacks and software-based solutions, and never, ABSOLUTELY NEVER, redesign its aging x86 architecture.

To summarize, I will define this as "Intel's 5-step program" for the next 5+ years ...


Step 1: No new x86 = x86 "Refit Attempt #2"

The last full x86 design by Intel was the Pentium Pro (i686) in 1994. All current Pentium products trace their lineage to this design, which was itself largely a set of "lessons learned" from the massive set of flaws in the 1992 Pentium (i586), Intel's first superscalar x86 (although not quite the first superscalar x86: NexGen, now AMD, actually holds that title with the Nx586). By 1997, Intel believed the future was Explicitly Parallel Instruction Computing (EPIC) and branch predication, replacing the x86 architecture, traditional out-of-order execution, register renaming and branch prediction logic with compile-time optimizations.

By 2000, Intel realized it had made a colossal mistake: compile-time optimizations for EPIC/predication could not replace the real need for traditional out-of-order execution, register renaming and, most of all, branch prediction. Itanium flopped, and Itanium 2 offered little competition to Intel's own i686 design, by then six years old, in the then-current Pentium 3. A quick 18-month refit known as the NetBurst architecture produced the Pentium 4, which relied largely on long, staggered pipeline stages for much higher clock speeds, plus more instruction extensions with "lossy math" pipelines dedicated to them. The result was a power-hungry architecture that was 50% slower than the P3 clock-for-clock, though it did benefit from a few interconnect tweaks.

Intel now regrets that decision and is moving back to the last true i686 design, the Pentium 3. But instead of designing a new architecture, it is just respinning the existing i686/P3 design, and the result is little different from the Socket-479 Pentium M: a few improvements in the staging (hovering around 14 stages on average, little changed from the P3), the ability to issue work to 4 pipes simultaneously, and newer interconnect technologies for DDR2 memory. Again, this is only an 18-24 month refit instead of a full, true 36-48 month redesign.

Most notably absent from the new architecture announcements is HyperThreading. As I have said time and time again, HyperThreading is a hack that only makes sense for the Pentium 4. With pipelines as long as 31 stages, the Pentium 4 spends a lot of time doing absolutely nothing. HyperThreading is a simple hardware hack that lets the OS schedule processes on the CPU as if it were two processors, in an attempt to put those idle stages to use. It also mitigates the ultimate P4 stall, the branch mispredict: rather than forcing a complete flush of the P4, a mispredict only flushes the stages belonging to the thread where it occurred. It works well for the P4; the added context switching costs 5% or less, while in many cases performance improves by more than 20%.

By going back to a tighter, more efficient design with only 14 stages, new out-of-order optimizations and other capabilities, Intel makes HyperThreading totally inapplicable to its new processor. When Intel uses the term "multithreading" in the future, it will mean the same thing AMD does: threading multiple streams of instructions over multiple cores, not the same core. The concept of threading on a single core was designed explicitly for, and dies with, the grossly inefficient P4 architecture.

Step 2: Tier-1 commodity volume = Integration, not innovation

If it has not become obvious by now: unless you are a Tier-1 original equipment manufacturer (OEM) that does nothing but PCs, Intel is not a partner you want. Intel has shifted more and more focus to high-volume, lock-stock-and-barrel system designs with little differentiation. Now more than ever there is virtually no difference between a Dell, Gateway or other major Tier-1 PC product; they are Intel-designed, Intel-integrated, Intel-specified boxes, everything short of shipped from Intel itself.

This is an excellent strategy for Intel: more and more integration at a constant, guaranteed cost and, more importantly, a guaranteed profit margin per unit. Tier-1 OEMs incur no R&D costs and still reap the direct margins, doing little more than marketing and service, and even there Intel has R&D money to throw at them. Since this is where 80% of Americans get their PCs, this channel is not likely to change anytime soon. So Intel has little incentive to open up its products to 2nd- or 3rd-party designers; it wants to sell one integrated product for all.

Not surprisingly, Intel's new commodity products will start to integrate the entire memory controller hub (MCH) and graphics processing unit (GPU) into a single package, in essence putting the northbridge on the CPU. This differs from AMD's approach in the Athlon 64/Opteron (see the next step). It is more akin to what Cyrix/National Semiconductor did with the Geode, although that targeted embedded use (and the Geode line now belongs to AMD), or even to non-x86 processors like Sun's UltraSPARC "i" products (e.g., IIi, IIIi, etc.), which integrate memory and I/O control into a single, uniprocessor-only design.

So there will soon be two types of Intel desktop processors. One with everything integrated for desktops and general consumer usage, and another for more enthusiast or workstation users.

If I may be so bold, it would not surprise me if the sheer volume of the former quickly outstrips that of the latter, especially since more and more of the enthusiast/workstation volume is going to AMD and its "2nd/3rd-party inclusion" of major designers in the US, Europe and Taiwan. Intel might not mind one bit giving up this segment, because it will almost always be under 25% market share, whereas AMD cannot break into the 75%+ market share held by heavy integration.

This may leave a very permanent mark on the industry, with Intel continuing to dominate in volume and at the Tier-1 level while AMD caters to more flexible designs and performance. In fact, we're already starting to see this: AMD no longer sells on cost but on quality (to me included), and matching Intel's pricing is no longer a rule for AMD. Which brings me to the server.

Step 3: GTL here to stay = bridging, bridging and more bridging

Probably the most pitiful aspect of Intel's strategy is its utter failure to introduce a commodity systems interconnect, relying instead on late-'80s Gunning Transceiver Logic (GTL) carrying legacy IBM PC/AT signals. The Advanced GTL Plus (AGTL+) "bus" is just that: a bunch of wires that share all the same controls (again, tied largely to hardwired IBM PC/AT signals), with no more than two components talking at the same time. Any time Intel has to add another component, it bridges that component, which then ties up the bus whenever it needs to talk to anything else. Whether Pentium or Itanium, this is Intel's approach, and the only way around it is a costly proprietary design from HP, IBM, SGI or another vendor.

More recently, the simple bus design through a single memory controller hub (MCH) has become a challenge for Intel with dual-core processors, requiring additional bridging. So it is not surprising to hear that Intel is now introducing MCH designs with two independent connections for two processors. At first this might seem like an innovation, but it's really just an evolution of the additional bridge logic that dual-core already required inside the processor. Plus, without a full redesign of the processor around a real systems interconnect, there is only so much bridging that can be done inside the CPU before diminishing returns set in.

Make no mistake, the two-processor AGTL+ MCH is not even at the same redesign level as AMD's original adoption of the up-to-16-port Alpha EV6 "crossbar switch." In other words, the performance of two processors on this new MCH has more to do with signaling improvements than with an actual EV6-style approach like the 32-bit Athlon MP or Alpha 21264, quite aged as those now are. And Intel is still very, very far from anything like the partial mesh used in Athlon 64/Opteron, with glueless non-uniform memory access (NUMA) to each CPU, as well as tunneled HyperTransport for multiple inter-CPU and inter-I/O access.

Other than proprietary designs like the few from HP, IBM and SGI, AMD has won the commodity 2-8-way (currently 4-16 core) battle, especially for low-cost, InfiniBand-connected supercomputing clusters, where AMD has a price/performance lead of more than 2x.

Step 4: XScale to the rescue = I/O Processors in the chipset

About the only "cool thing" I have noticed in Intel's server designs is something I have argued for for a long while, and it's definitely an area where Intel has actually done well. In its commodity dual-processor designs, Intel is now starting to embed an IOP332 (a 500-1000+MHz XScale I/O processor for PCI-X/PCIe) on the mainboard, possibly in the I/O Controller Hub (ICH), aka the "southbridge," itself. It will be interesting to see if AMD comes up with something similar directly in a HyperTransport I/O tunnel. So, what does this afford Intel?

In a nutshell, instead of network/storage controllers being either "dumb," relying on the host CPU/memory for software-based processing (i.e., lots of redundant, inefficient data streams), or "expensive" ($500+) with their own on-board intelligence, the chipset can offer some intelligence of its own. This intelligence works at the chipset, without bothering the CPU, using main memory for buffering. So instead of pushing all data from the disks up to the CPU, and affecting other service loads, just to do a RAID-1 mirror or RAID-3/4/5/6 XOR operation, or to handle network layer 2/3/4 frame/packet/transport resolution for general network services, possibly iSCSI, etc., the chipset can do this directly.
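For readers unfamiliar with the parity work being off-loaded here, this is a minimal Python sketch of the RAID-4/5 style XOR. It is only an illustration of the arithmetic, not how an IOP332 or any real driver implements it, and the function names are hypothetical. The point it shows: every byte of every data block in a stripe has to pass through whatever computes the parity, which is exactly the traffic a chipset-level XScale could keep off the host CPU.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-4/5 style parity).

    Hypothetical helper for illustration only.
    """
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks in a stripe must be the same length")
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        for i, byte in enumerate(blk):
            out[i] ^= byte  # every data byte is read once per parity computation
    return bytes(out)


def rebuild_missing(surviving, parity):
    """Recover one lost data block: XOR the surviving blocks with the parity."""
    return xor_blocks(list(surviving) + [parity])


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(stripe)
    print(parity.hex())  # ee22
    lost = rebuild_missing([stripe[0], stripe[2]], parity)
    print(lost == stripe[1])  # True
```

The same XOR both builds the parity and rebuilds a lost block, which is why a small dedicated processor sitting next to memory can handle it without the host CPU ever touching the data.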

It may seem like a small, insignificant addition to the chipset, but a 500-1000+MHz superscalar XScale microcontroller embedded there can off-load a lot of traditional I/O services, processing such network/storage data streams directly and cutting the load on, and duplication in, the host CPU interconnect by a good 2-5x. That interconnect was _never_ built for such operations, only for processing data. I can see drivers for many OSes starting to take advantage of this, letting cheaper network/storage hardware deliver the same performance and reduced CPU load as products that cost 3-5x as much.

It might be the one thing that could make me reconsider an Intel server purchase over an AMD one, but only if Intel's new EM64T processors have an I/O MMU that removes the need for I/O-performance-killing "bounce buffers" on systems with more than 1-4 GiB of memory.
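To make the bounce-buffer cost concrete, here is a deliberately simplified Python model. It assumes a device that can only DMA into the low 4 GiB of physical memory (a 32-bit DMA limit) and treats the I/O MMU as a simple yes/no flag; the function name and the 4 GiB figure are illustrative assumptions, not a description of EM64T or of any real OS DMA API. Without an I/O MMU, any buffer beyond the device's reach has to be copied through a buffer in low memory, adding a full extra copy to that transfer.

```python
DMA_LIMIT = 1 << 32  # assumption: device can only address the low 4 GiB


def plan_dma(phys_addr, length, has_iommu):
    """Return (strategy, extra_bytes_copied) for one hypothetical DMA transfer."""
    if length <= 0:
        raise ValueError("length must be positive")
    if phys_addr + length <= DMA_LIMIT:
        return ("direct", 0)  # buffer already reachable by the device
    if has_iommu:
        return ("iommu", 0)  # I/O MMU maps a low bus address onto the high page
    # No I/O MMU: data is copied through a bounce buffer below the limit.
    return ("bounce", length)


if __name__ == "__main__":
    four_gib = 1 << 32
    print(plan_dma(0x1000_0000, 65536, has_iommu=False))      # ('direct', 0)
    print(plan_dma(four_gib + 4096, 65536, has_iommu=False))  # ('bounce', 65536)
    print(plan_dma(four_gib + 4096, 65536, has_iommu=True))   # ('iommu', 0)
```

The model ignores real-world details such as per-direction copies and scatter/gather lists; it only shows why an I/O MMU turns the "bounce" case into a zero-copy remap.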

Step 5: Virtualization is for software = software-based hardware products

And, alas, Intel's failure is complete. As much as Intel loves to bash Sun as proprietary (an ironic charge, given that SPARC is a documented IEEE standard available for license under "fair and non-discriminatory" terms while Itanium has no such option), Intel seems to be following Sun's playbook. Instead of coming out with a real, hardware-based virtualization option (one I seriously hope AMD delivers in the next 2 years, and I suspect it will), Intel is going completely to software-based virtualization of loads over a network. If this sounds exactly like a replay of Sun's recent virtualization moves, it is.

The larger question is whether Intel is going to be a virtualization enabler for Microsoft Virtual Server (fka Virtual PC) and EMC VMware ESX/GSX, or possibly a competitor in the long run. I mean, if Intel offers NO hardware virtualization features and does everything in software, at what point is Intel needed, other than maybe for access to design/interface information by Microsoft, VMware, etc.? It's clear that without a serious x86 redesign/rethink, all Intel can offer in the future is multi-threading over multi-core, no different than AMD (except that AMD might do it in hardware, which will be interesting if they do ;-).

So my past assumptions about what Intel might be up to have fallen completely flat. Intel's plan is to keep reusing what it designed over a decade ago, leverage its 17+ fabs (~6 leading-edge) against AMD's 4 (only 1 leading-edge) to maintain its 9-12 month packaging technology lead, and build a new Pentium world of software, not hardware, options.

Hardware innovation is dead at Intel. It died long ago, and with the failure of IA-64 (which Digital Semiconductor predicted back in the mid-'90s), it is going to stay dead at Intel in the future too.
