Making an FSF Endorseable Mass-volume Embedded Processor

| release: 02 Dec 2012 | edited: 08 Dec 2012 |

The goal:

To bring into production a mass-volume high-performance processor with:
  • modern interfaces (SATA-II, Gigabit Ethernet, USB3)
  • modern capabilities (3D Graphics and 1080p30 Video Decode)
  • no DRM
that is guaranteed at all times to be:
  • 100% documented
  • 100% software-programmable
  • and 100% supported by Free Software licensed toolchains
and requiring no proprietary libraries of any kind, all the
way from boot-up, right through to the applications level.

The Target markets:

    Laptops, Desktops, Low-power Low-end Servers, Scientific Processing
    and Cluster Computing, Tablets, Games Consoles, IPTVs and so on.

The justification:

    Proprietary 3D Graphics and Video Engines come with software libraries
    that are too complex for the average factory to deal with, and where
    bugs or unsuitable compiler options cannot be corrected (at all).
    The strategy is therefore to dramatically simplify product development,
    streamlining the entire process of delivering real products to market.

The deadline:

    July 2013 for first mass-produced silicon

The cost:

    $USD 10 million


Introduction

There are plenty of embedded System-on-a-Chip processors available: Apple has one that is custom-made for its products; Samsung has dozens of processors for its Mobile Phones and Smart Tablet products; Freescale, Texas Instruments, Rockchips, Allwinner, Ingenic - the list goes on and on. Yet not a single one of these modern mass-volume embedded processors is 100% Free Software Friendly. The reason is that in the embedded world the core processor is actually quite slow, and cannot on its own deliver the modern performance (for example 1080p video decode) expected of systems today (see January issue "Rise of the mobile processors", PCPro). So every single one of them has a hardware block dedicated to accelerating 3D Graphics or Video Decode, absolutely every single one of those hardware blocks is proprietary, and all of them come with a proprietary software library.

Whilst reverse-engineering efforts are being called for, the sheer scale of each of the reverse-engineering tasks being faced is enormous, and can be viewed from one perspective as a criminal waste of Free Software Engineers' time (and yet, from another perspective, as absolutely necessary to guarantee Software Freedom).

Purely in practical business terms, that proprietary software has been demonstrated to have real detrimental effects on the ability of companies to deliver products based around such SoCs. It is well-known in the Software Industry that any practical problem encountered with 3rd party proprietary software leaves a business critically dependent on that proprietary hardware/software vendor. In this instance, however, the consequences of software failure leave product designers with a very stark outcome: abandon the product and start again. The proprietary vendor is too busy working on the latest version of their silicon to deal with 3rd or 4th hand users of their older products. To understand this last statement it's important to remember that Software Engineers who merely use the proprietary libraries when developing a product based around a particular SoC are at least 3 or 4 times removed from the actual GPU's origins. The GPU hardware often does not even originate with the SoC vendor: it is licensed in as a component. So even if you could contact them to explain that there is a problem, not even the SoC vendor could tell you how the GPU in their own processor actually works!

By contrast, look at what happened recently when Valve Software released the source code of their 3D Gaming Engine and Intel released the source code of their 3D OpenGL Libraries. Both software teams were, when faced with bugs, able to get together and thrash out the problems. They didn't have to sign Mutual NDAs; they didn't have to get permission from the CEO to sign off the meeting. One of the developers described it as "the most successful work trip I've ever had" (see Linux Game News).

So this article describes why it is necessary to go to the trouble of getting an entire mass-volume embedded System-on-a-Chip manufactured that is affordable, desirable, and at the same time happens to be FSF-Endorseable (http://www.fsf.org/news/endorsement-criteria). It sounds absolutely crazy to have to go through with this, but the sheer bloody-minded intransigence of the existing processor design houses to honour Free Software Licenses and respect Software Freedom - even for practical business reasons - is getting beyond a joke.
It is however CRITICALLY IMPORTANT to emphasise that the requirement to be FSF-Endorseable is a key strategic business decision that openly declares and recognises the value that Free Software represents: decreased development costs, simplified development and faster time to market.

The goal REALLY IS to create a highly-successful processor that will be sold in volumes of several million, that has modern up-to-date features such as HDMI, SATA-II, the capability to do 3D Graphics and the capability to do 1080p30 video decode - all using general-purpose Software algorithms. The target markets include Laptops, Desktops, Low-power Low-end Servers, Scientific Processing and Cluster Computing, Tablets, Games Consoles, IPTVs and so on. The fact that it is a general-purpose processor with enhanced integrated capability to perform as a GPU (Graphics and Video Processing Unit) is a deliberate choice.

First, though, it's necessary to go through the options that are presently available, to see why they are either unuseable or undesirable.

Systems with MALI 400, Vivante GC, PowerVR etc.

These are undesirable because the libraries are a one-shot deal, compiled by the designers of the hardware: if you don't like the compile options or you encounter a bug, it is impossible for you to fix the problem. When building hardware with a built-in GPU, if the software doesn't work but the rest of the hardware is otherwise perfect, you have just wasted all of the hardware and software development time invested in that SoC up to that point, because you cannot replace a built-in GPU.

As an example: two SoCs by Samsung, the S3C6410 and the S5PC100, used an unusual GPU. The OpenGL ES 2 software library worked great... as long as the initialisation sequence was called in a very specific order that happened to be commonly used in Android's core. All other software - including 3rd party Android applications - which happened to call those functions in a different sequence would crash. Fixing this situation across the entire industry is impossible: requests have been made for several years to each of the designers of these GPUs, and they are not listening.
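
To illustrate the kind of failure being described, here is a minimal sketch using standard EGL initialisation boilerplate. The function calls are real EGL/OpenGL ES APIs; the specific ordering constraint in the comments is hypothetical, standing in for whatever undocumented assumption the binary-only library actually made.

    /* Illustrative only: standard EGL setup.  The point is that a
     * binary-only driver may silently depend on one specific call order. */
    #include <EGL/egl.h>

    int init_gl(EGLDisplay *dpy, EGLContext *ctx, EGLSurface *surf,
                EGLNativeWindowType win)
    {
        EGLint cfg_attribs[] = { EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
                                 EGL_NONE };
        EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
        EGLConfig cfg;
        EGLint ncfg;

        *dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
        eglInitialize(*dpy, NULL, NULL);
        eglChooseConfig(*dpy, cfg_attribs, &cfg, 1, &ncfg);

        /* Hypothetically, the proprietary library may only have been
         * tested with the surface created *before* the context (the
         * order Android happens to use).  Swap these two calls and an
         * opaque binary blob crashes, with no way for anyone outside
         * the vendor to diagnose or fix it. */
        *surf = eglCreateWindowSurface(*dpy, cfg, win, NULL);
        *ctx  = eglCreateContext(*dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);

        return eglMakeCurrent(*dpy, *surf, *surf, *ctx);
    }

With source code, a one-line reordering fix; without it, an industry-wide dead end.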

Systems with NVidia, ATI, SiS, Intel Graphics etc. with source code available

These fall into two categories: separate GPUs and integrated GPUs. It doesn't matter which: all of them are too power-hungry (between 8 and 1,000 watts), or they are integrated into CPUs that are themselves too power-hungry to be considered viable for the Embedded market. So even in cases where the GPU has been reverse-engineered, or the Graphics Design Team has had permission to release and work on Free Software Drivers (as is the case with the Intel Team), such that there is a full Free Software 3D Driver Stack, it makes no difference: either the GPU or its CPU is way over the power budget for an embedded processor.

The only exception is the upcoming 22nm Intel ValleyView SoC, which is apparently to be sampling at the end of 2013 (over a year away). It is targeted at low-power applications, yet will provide high-performance 3D Graphics and will run "Gen 7 (Ivy Bridge) Graphics". Even here, however, Intel still plan to use PowerVR's Video Decode engines. These are proprietary engines with proprietary libraries: it is unlikely that Imagination Technologies will provide details, thus making the entire chip non-endorseable.

So there really aren't any choices here either, even when the 3D GPU Engine's source code is available, because those GPUs that have been around for long enough were from a generation where power consumption was not an issue.

So if the existing CPU-GPU combinations aren't viable, what's the alternative?

The alternative is to use an existing processor design that has general-purpose instructions which happen to have been designed from the ground up with a view to accelerating both Video processing and 3D Graphics. And, rather than have separate processors for each purpose, combine them into one single multi-threaded, multi-core processor.
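
As a purely hypothetical illustration of what "general-purpose instructions accelerating video and 3D" means in practice: kernels like the YUV-to-RGB conversion below are plain, portable C, and it becomes the compiler's job - not a proprietary library's - to map them onto the processor's parallel hardware threads.

    /* Hypothetical example: a plain-C YUV-to-RGB inner loop of the kind
     * found in any software video decoder (standard BT.601 integer
     * coefficients; 4:4:4 chroma assumed for simplicity).  On a
     * conventional SoC this work is pushed into a proprietary VPU block;
     * here it is just C code that the Free Software toolchain compiles
     * into accelerated form. */
    #include <stdint.h>

    static inline uint8_t clamp8(int v)
    {
        return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
    }

    void yuv_to_rgb(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                    uint8_t *rgb, int npixels)
    {
        for (int i = 0; i < npixels; i++) {
            int c = y[i] - 16, d = u[i] - 128, e = v[i] - 128;
            rgb[3 * i + 0] = clamp8((298 * c + 409 * e + 128) >> 8);
            rgb[3 * i + 1] = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
            rgb[3 * i + 2] = clamp8((298 * c + 516 * d + 128) >> 8);
        }
    }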

Does this design already exist?

Yes it does. The first version has been through an MVP Programme and has proven silicon: it's a dual-core part in 65nm. Running at a lowly 600MHz and using only 6 of its 8 hardware threads, it can out-perform a 1.5GHz ARM Cortex A9 with NEON instructions by a factor of 2 in 720p H.264 decode: this chip manages 30fps (at 600MHz) whereas the 1.5GHz ARM Cortex A9 can barely manage 15fps. Bringing it up to 1080p30 therefore requires a raw performance increase of 2.25 (1920x1080 is 2.25 times as many pixels as 1280x720), as well as increased memory throughput and so on. That performance increase can be provided by raising the clock rate to 800MHz and doubling the number of cores, giving a raw performance increase of about 2.7 times. In order to bring the power consumption back down, yet also keep costs to a minimum and increase the chances of success, a move to 40nm is the safest bet.
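
The scaling arithmetic can be sanity-checked in a few lines (illustrative only; real decode performance also depends on memory bandwidth and other factors):

    /* Quick check of the performance-scaling arithmetic above. */
    #include <stdio.h>

    int main(void)
    {
        double pixels_1080p = 1920.0 * 1080.0;
        double pixels_720p  = 1280.0 * 720.0;
        double need = pixels_1080p / pixels_720p;   /* 2.25x required   */

        double clock = 800.0 / 600.0;               /* 1.33x from clock */
        double cores = 4.0 / 2.0;                   /* 2.00x from cores */
        double have  = clock * cores;               /* 2.67x provided   */

        printf("required: %.2fx  provided: %.2fx\n", need, have);
        return 0;
    }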

What interfaces does the current version have?

Not enough to make it spectacularly successful in the current market. If the current design had been released in 2006 it would have been highly successful.

What interfaces will the future version have?

SATA-II, Gigabit Ethernet, HDMI 1.4, NAND Flash, DDR3 32-bit @ 1333MHz, USB-3, USB-OTG, I2S, I2C, RS232, SD/MMC, GPIO, PWMs and so on - all the things to be expected of a modern general-purpose mass-volume System-on-a-Chip.

Why FSF-Endorseable? Why not use a proprietary GPU or VPU?

One of the key reasons not to use a proprietary GPU or VPU is that it is incredibly hard to work with hardware-software combinations where you cannot see what is going on. Talk to Canonical, Linaro and so on, as well as to countless developers: the pain that they go through to get 3D Graphics or Video Decoding working with the proprietary GPU and VPU vendors is just disgraceful. Various developers create new standards that make life simple, yet the SoC developers (silicon houses) have absolutely no idea what those standards are, and just want something to "work". Usually that means "Android", pre-compiled and packaged up to sell to Factories in ready-made and often Copyright-violating form. So whilst Free Software Developers are working on the latest standards for Video and 3D interoperability, the hardware that's hitting the retail stores is using software that is years behind the times, and is impossible to work with even "inside the box", let alone "outside".

Contrast this with a situation where it would LITERALLY be possible to take the C source code for ffmpeg and its associated libraries, and merely hit "compile". Likewise for the Free Software OpenGL Reference Platform source code, MesaGL: simply hit "compile" and it's turned into accelerated code by the specialist compiler provided by this new SoC's designers.

The difference here is in the architecture typically used in proprietary GPUs and VPUs. Normally these are separate units, which require complex communications paths, asynchronous computing, complex multi-threaded approaches across userspace-kernelspace boundaries and so on. With the approach taken by the chosen hardware designers, all that hardware-software complexity is gone: the CPU is the GPU; the CPU is the VPU.

So there is a clear-cut business case for using a processor core that's backed up by a highly optimised Free Software compiler, which can take straight C and C++ source code and turn it into accelerated binaries. The time saved on product development is immense. What is absolutely fantastic is that this is not only a more profitable business and a way to make technology work better and more reliably: those clear-cut business goals can be achieved whilst also respecting Software Freedom.
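
As a hypothetical sketch of what "the CPU is the GPU" means in practice, consider the span-fill inner loop below - the basic building block of a software rasteriser such as Mesa's software path. On a conventional SoC this work is handed across a userspace-kernelspace boundary to a separate GPU; here it is ordinary C, compiled directly onto the same hardware threads that run the application.

    /* Hypothetical sketch: a flat-shaded span-fill inner loop, the
     * basic building block of a software rasteriser.  There is no
     * command queue, no DMA to a separate GPU, no kernel-driver round
     * trip: the same cores that run the application run this loop. */
    #include <stdint.h>
    #include <stddef.h>

    void fill_span(uint32_t *fb, int fb_width,
                   int y, int x0, int x1, uint32_t colour)
    {
        uint32_t *row = fb + (size_t)y * fb_width;
        for (int x = x0; x < x1; x++)
            row[x] = colour;      /* prime candidate for the specialist
                                     compiler to parallelise */
    }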

How much is it going to cost to get this processor made?

About $USD 10m, to be on the safe side. $1.5m will be for licensing of the modern interfaces such as DDR3, HDMI, SATA-II, USB-3 and RGMII; around $1m will be for the Mask charges; around $2m will go on engineering costs. The remaining funds (roughly $5.5m) are there to ensure safety margins in case the masks need to be redone.

What's the target price in mass-volume?

This is tricky to answer! We know what price to aim for: around $USD 7.50, set by Ingenic's latest 1GHz MIPS-compatible processor (the Jz4770). However, the actual performance and features of the planned chip meet or exceed those of many Quad-Core ARM Cortex A9 SoCs at 1.5GHz, so it should in theory be priced closer to those. According to figures published online, Freescale's Quad-core 1GHz iMX6 is planned for around $30, and the Dual-core 1GHz for around $23, in 1k volumes, which sets an upper comparable limit.

Taking things a step further, and going slightly outside of the "embedded" processor range to a 3 Watt power budget by putting in 8 cores, a raw performance figure of 38 GFLOPS can be achieved at a lowly 800MHz. According to a Processor GFLOPS Comparison chart, that would make it comparable to a 3GHz AMD Phenom II X4 940 as well as an Intel Core i7 920 at 3.2GHz. Those processors are around the 100 Watt mark, and Intel Core i7 processors can cost as much as $700. So there is quite a range here. One distinct advantage that this SoC has is that there are no licensing costs involved (no NREs and no royalties) for 3D or Video Processing Engines.

What's the next step?

Find investors! We need to move quickly: there's an opportunity to hit Christmas sales if the processor is ready by July 2013. This should be possible to achieve if the engineers start NOW (because the design's already done and proven: it's a matter of bolting on the modern interfaces, compiling for FPGA to make sure it works, then running verification etc. No actual "design" work is needed).

Contact details

web : http://www.qimod.com email: info@qimod.com skype: lkcl..