Making an FSF Endorseable Mass-volume Embedded Processor

| release: 02 Dec 2012 | edited: 08 Dec 2012 |

The goal:

To bring into production a mass-volume high-performance processor with:
  • modern interfaces (SATA-II, Gigabit Ethernet, USB3)
  • modern capabilities (3D Graphics and 1080p30 Video Decode)
  • no DRM
that is guaranteed at all times to be:
  • 100% documented
  • 100% software-programmable
  • and 100% supported by Free Software licensed toolchains
and requiring no proprietary libraries of any kind, all the
way from boot-up, right through to the applications level.

The Target markets:

    Laptops, Desktops, Low-power Low-end Servers, Scientific Processing
    and Cluster Computing, Tablets, Games Consoles, IPTVs and so on.

The justification:

    Proprietary 3D Graphics and Video Engines come with software libraries
    that are too complex for the average factory to deal with, and where
    bugs or unsuitable compiler options cannot be corrected (at all).
    The strategy is therefore to dramatically simplify product development,
    streamlining the entire process of delivering real products to market.

The deadline:

    July 2013 for first mass-produced silicon

The cost:

    $USD 10 million


Introduction

There are plenty of embedded System-on-a-Chip processors available: Apple has one that is custom-made for its products; Samsung has dozens of processors for its Mobile Phones and Smart Tablet products; Freescale, Texas Instruments, Rockchips, Allwinner, Ingenic - the list goes on and on. Yet not a single one of these modern mass-volume embedded processors is 100% Free Software Friendly. The reason is that in the embedded world the core processor is actually quite slow, and cannot on its own deliver the modern performance (for example 1080p video decode) expected of systems today (see January issue "Rise of the mobile processors", PCPro). So every single one of them has a hardware block dedicated to accelerating 3D Graphics or Video Decode, absolutely every single one of those hardware blocks is proprietary, and all of them come with a proprietary software library.

Whilst reverse-engineering efforts are being called for, the sheer scale of each of the reverse-engineering tasks being faced is enormous, and can be viewed from one perspective as a criminal waste of Free Software Engineers' time (and yet, from another perspective, as absolutely necessary to guarantee Software Freedom).

Purely in practical business terms, that proprietary software has been demonstrated to have real detrimental effects on the ability of companies to deliver products based around such SoCs. It is well-known in the Software Industry that any practical problem encountered with 3rd party proprietary software leaves a business critically dependent on that proprietary hardware/software vendor. In this instance, however, the consequences of software failure leave product designers with a very stark outcome: abandon the product and start again. The proprietary vendor is too busy working on the latest version of their silicon to deal with 3rd or 4th hand users of their older products. To understand this last statement it's important to remember that Software Engineers who merely use the proprietary libraries when developing a product based around a particular SoC are at least 3 or 4 times removed from the actual GPU's origins. The GPU hardware often does not even originate with the SoC vendor: it is licensed in as a component. So even if you could contact them to explain that there is a problem, not even the SoC vendor could tell you how the GPU in their own processor actually works!

By contrast, look at what happened recently when Valve Software released the source code of their 3D Gaming Engine and Intel released the source code of their 3D OpenGL Libraries. Both software teams were, when faced with bugs, able to get together and thrash out the problems. They didn't have to sign Mutual NDAs; they didn't have to get permission from the CEO to sign off the meeting. One of the developers described it as "the most successful work trip I've ever had" (see Linux Game News).

So this article describes why it is necessary to go to the trouble of getting an entire mass-volume embedded System-on-a-Chip manufactured that is affordable, desirable, and at the same time happens to be FSF-Endorseable (http://www.fsf.org/news/endorsement-criteria). It sounds absolutely crazy to have to go through with this, but the sheer bloody-minded intransigence of the existing processor design houses to honour Free Software Licenses and respect Software Freedom - even for practical business reasons - is getting beyond a joke.
It is however CRITICALLY IMPORTANT to emphasise that the requirement to be FSF-Endorseable is a key strategic business decision that openly declares and recognises the value that Free Software represents: decreased development costs, simplified development and faster time to market.

The goal REALLY IS to create a highly-successful processor that will be sold in volumes of several million, that has modern up-to-date features such as HDMI, SATA-II, the capability to do 3D Graphics and the capability to do 1080p30 video decode - all using general-purpose Software algorithms. The target markets include Laptops, Desktops, Low-power Low-end Servers, Scientific Processing and Cluster Computing, Tablets, Games Consoles, IPTVs and so on. The fact that it is a general-purpose processor with enhanced integrated capability to perform as a GPU (Graphics and Video Processing Unit) is a deliberate choice.

First, though, it's necessary to go through the options that are presently available, to see why they are either unuseable or undesirable.

Systems with MALI 400, Vivante GC, PowerVR etc.

These are undesirable because the libraries are a one-shot deal, compiled by the designers of the hardware: if you don't like the compile options or you encounter a bug, it is impossible for you to fix the problem. When building hardware with a built-in GPU, if the software doesn't work but the rest of the hardware is otherwise perfect, you have just wasted all of the hardware and software development time invested in that SoC up to that point, because you cannot replace a built-in GPU.

As an example: two SoCs by Samsung, the S3C6410 and the S5PC100, used an unusual GPU. The OpenGL ES 2 software library worked great... as long as the initialisation sequence was called in a very specific order that happened to be commonly used in Android's core. All other software - including 3rd party Android applications - which happened to call those functions in a different sequence would crash. Fixing this situation across the entire industry is impossible: requests have been made for several years to each of the designers of these GPUs, and they are not listening.
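
To illustrate the kind of failure being described, here is a minimal sketch using standard EGL initialisation boilerplate. The function calls are real EGL/OpenGL ES APIs; the specific ordering constraint in the comments is hypothetical, standing in for whatever undocumented assumption the binary-only library actually made.

    /* Illustrative only: standard EGL setup.  The point is that a
     * binary-only driver may silently depend on one specific call order. */
    #include <EGL/egl.h>

    int init_gl(EGLDisplay *dpy, EGLContext *ctx, EGLSurface *surf,
                EGLNativeWindowType win)
    {
        EGLint cfg_attribs[] = { EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
                                 EGL_NONE };
        EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
        EGLConfig cfg;
        EGLint ncfg;

        *dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
        eglInitialize(*dpy, NULL, NULL);
        eglChooseConfig(*dpy, cfg_attribs, &cfg, 1, &ncfg);

        /* Hypothetically, the proprietary library may only have been
         * tested with the surface created *before* the context (the
         * order Android happens to use).  Swap these two calls and an
         * opaque binary blob crashes, with no way for anyone outside
         * the vendor to diagnose or fix it. */
        *surf = eglCreateWindowSurface(*dpy, cfg, win, NULL);
        *ctx  = eglCreateContext(*dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);

        return eglMakeCurrent(*dpy, *surf, *surf, *ctx);
    }

With source code, a one-line reordering fix; without it, an industry-wide dead end.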

Systems with NVidia, ATI, SiS, Intel Graphics etc. with source code available

These fall into two categories: separate GPUs and integrated GPUs. It doesn't matter which: all of them are too power-hungry (between 8 and 1,000 watts), or they are integrated into CPUs that are themselves too power-hungry to be considered viable for the Embedded market. So even in cases where the GPU has been reverse-engineered, or the Graphics Design Team has had permission to release and work on Free Software Drivers (as is the case with the Intel Team), such that there is a full Free Software 3D Driver Stack, it makes no difference: either the GPU or its CPU is way over the power budget for an embedded processor.

The only exception is the upcoming 22nm Intel ValleyView SoC, which is apparently to be sampling at the end of 2013 (over a year away). It is targeted at low-power applications, yet will provide high-performance 3D Graphics and will run "Gen 7 (Ivy Bridge) Graphics". Even here, however, Intel still plan to use PowerVR's Video Decode engines. These are proprietary engines with proprietary libraries: it is unlikely that Imagination Technologies will provide details, thus making the entire chip non-endorseable.

So there really aren't any choices here either, even when the 3D GPU Engine's source code is available, because those GPUs that have been around for long enough were from a generation where power consumption was not an issue.

So if the existing CPU-GPU combinations aren't viable, what's the alternative?

The alternative is to use an existing processor design that has general-purpose instructions which happen to have been designed from the ground up with a view to accelerating both Video processing and 3D Graphics. And, rather than have separate processors for each purpose, combine them into one single multi-threaded, multi-core processor.
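
As a purely hypothetical illustration of what "general-purpose instructions accelerating video and 3D" means in practice: kernels like the YUV-to-RGB conversion below are plain, portable C, and it becomes the compiler's job - not a proprietary library's - to map them onto the processor's parallel hardware threads.

    /* Hypothetical example: a plain-C YUV-to-RGB inner loop of the kind
     * found in any software video decoder (standard BT.601 integer
     * coefficients; 4:4:4 chroma assumed for simplicity).  On a
     * conventional SoC this work is pushed into a proprietary VPU block;
     * here it is just C code that the Free Software toolchain compiles
     * into accelerated form. */
    #include <stdint.h>

    static inline uint8_t clamp8(int v)
    {
        return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
    }

    void yuv_to_rgb(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                    uint8_t *rgb, int npixels)
    {
        for (int i = 0; i < npixels; i++) {
            int c = y[i] - 16, d = u[i] - 128, e = v[i] - 128;
            rgb[3 * i + 0] = clamp8((298 * c + 409 * e + 128) >> 8);
            rgb[3 * i + 1] = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
            rgb[3 * i + 2] = clamp8((298 * c + 516 * d + 128) >> 8);
        }
    }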

Does this design already exist?

Yes it does. The first version has been through an MVP Programme and has proven silicon: it's a dual-core part in 65nm. Running at a lowly 600MHz and using only 6 of its 8 hardware threads, it can out-perform a 1.5GHz ARM Cortex A9 with NEON instructions by a factor of 2 in 720p H.264 decode: this chip manages 30fps (at 600MHz) whereas the 1.5GHz ARM Cortex A9 can barely manage 15fps. Bringing it up to 1080p30 therefore requires a raw performance increase of 2.25 (1920x1080 is 2.25 times as many pixels as 1280x720), as well as increased memory throughput and so on. That performance increase can be provided by raising the clock rate to 800MHz and doubling the number of cores, giving a raw performance increase of about 2.7 times. In order to bring the power consumption back down, yet also keep costs to a minimum and increase the chances of success, a move to 40nm is the safest bet.
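
The scaling arithmetic can be sanity-checked in a few lines (illustrative only; real decode performance also depends on memory bandwidth and other factors):

    /* Quick check of the performance-scaling arithmetic above. */
    #include <stdio.h>

    int main(void)
    {
        double pixels_1080p = 1920.0 * 1080.0;
        double pixels_720p  = 1280.0 * 720.0;
        double need = pixels_1080p / pixels_720p;   /* 2.25x required   */

        double clock = 800.0 / 600.0;               /* 1.33x from clock */
        double cores = 4.0 / 2.0;                   /* 2.00x from cores */
        double have  = clock * cores;               /* 2.67x provided   */

        printf("required: %.2fx  provided: %.2fx\n", need, have);
        return 0;
    }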

What interfaces does the current version have?

Not enough to make it spectacularly successful in the current market. If the current design had been released in 2006 it would have been highly successful.

What interfaces will the future version have?

SATA-II, Gigabit Ethernet, HDMI 1.4, NAND Flash, DDR3 32-bit @ 1333MHz, USB-3, USB-OTG, I2S, I2C, RS232, SD/MMC, GPIO, PWMs and so on - all the things to be expected of a modern general-purpose mass-volume System-on-a-Chip.

Why FSF-Endorseable? Why not use a proprietary GPU or VPU?

One of the key reasons not to use a proprietary GPU or VPU is that it is incredibly hard to work with hardware-software combinations where you cannot see what is going on. Talk to Canonical, Linaro and so on, as well as to countless developers: the pain that they go through to get 3D Graphics or Video Decoding working with the proprietary GPU and VPU vendors is just disgraceful. Various developers create new standards that make life simple, yet the SoC developers (silicon houses) have absolutely no idea what those standards are, and just want something to "work". Usually that means "Android", pre-compiled and packaged up to sell to Factories in ready-made and often Copyright-violating form. So whilst Free Software Developers are working on the latest standards for Video and 3D interoperability, the hardware that's hitting the retail stores is using software that is years behind the times, and is impossible to work with even "inside the box", let alone "outside".

Contrast this with a situation where it would LITERALLY be possible to take the C source code for ffmpeg and its associated libraries, and merely hit "compile". Likewise for the Free Software OpenGL Reference Platform source code, MesaGL: simply hit "compile" and it's turned into accelerated code by the specialist compiler provided by this new SoC's designers.

The difference here is in the architecture typically used in proprietary GPUs and VPUs. Normally these are separate units, which require complex communications paths, asynchronous computing, complex multi-threaded approaches across userspace-kernelspace boundaries and so on. With the approach taken by the chosen hardware designers, all that hardware-software complexity is gone: the CPU is the GPU; the CPU is the VPU.

So there is a clear-cut business case for using a processor core that's backed up by a highly optimised Free Software compiler, which can take straight C and C++ source code and turn it into accelerated binaries. The time saved on product development is immense. What is absolutely fantastic is that this is not only a more profitable business and a way to make technology work better and more reliably: those clear-cut business goals can be achieved whilst also respecting Software Freedom.
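
As a hypothetical sketch of what "the CPU is the GPU" means in practice, consider the span-fill inner loop below - the basic building block of a software rasteriser such as Mesa's software path. On a conventional SoC this work is handed across a userspace-kernelspace boundary to a separate GPU; here it is ordinary C, compiled directly onto the same hardware threads that run the application.

    /* Hypothetical sketch: a flat-shaded span-fill inner loop, the
     * basic building block of a software rasteriser.  There is no
     * command queue, no DMA to a separate GPU, no kernel-driver round
     * trip: the same cores that run the application run this loop. */
    #include <stdint.h>
    #include <stddef.h>

    void fill_span(uint32_t *fb, int fb_width,
                   int y, int x0, int x1, uint32_t colour)
    {
        uint32_t *row = fb + (size_t)y * fb_width;
        for (int x = x0; x < x1; x++)
            row[x] = colour;      /* prime candidate for the specialist
                                     compiler to parallelise */
    }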

How much is it going to cost to get this processor made?

About $USD 10m, to be on the safe side. $1.5m will be for licensing of the modern interfaces such as DDR3, HDMI, SATA-II, USB-3 and RGMII; around $1m will be for the Mask charges; around $2m will go on engineering costs. The remaining funds (roughly $5.5m) are there to ensure safety margins in case the masks need to be redone.

What's the target price in mass-volume?

This is tricky to answer! We know what price to aim for: around $USD 7.50, set by Ingenic's latest 1GHz MIPS-compatible processor (the Jz4770). However, the actual performance and features of the planned chip meet or exceed those of many Quad-Core ARM Cortex A9 SoCs at 1.5GHz, so it should in theory be priced closer to those. According to figures published online, Freescale's Quad-core 1GHz iMX6 is planned for around $30, and the Dual-core 1GHz for around $23, in 1k volumes, which sets an upper comparable limit.

Taking things a step further, and going slightly outside of the "embedded" processor range to a 3 Watt power budget by putting in 8 cores, a raw performance figure of 38 GFLOPS can be achieved at a lowly 800MHz. According to a Processor GFLOPS Comparison chart, that would make it comparable to a 3GHz AMD Phenom II X4 940 as well as an Intel Core i7 920 at 3.2GHz. Those processors are around the 100 Watt mark, and Intel Core i7 processors can cost as much as $700. So there is quite a range here. One distinct advantage that this SoC has is that there are no licensing costs involved (no NREs and no royalties) for 3D or Video Processing Engines.

What's the next step?

Find investors! We need to move quickly: there's an opportunity to hit Christmas sales if the processor is ready by July 2013. This should be possible to achieve if the engineers start NOW (because the design's already done and proven: it's a matter of bolting on the modern interfaces, compiling for FPGA to make sure it works, then running verification etc. No actual "design" work is needed).

Contact details

web : http://www.qimod.com email: info@qimod.com skype: lkcl..