Apr 21 2014
 

First of all, remove xpra and cython if you had them installed:

aptitude purge xpra cython

Update your package lists, as we are going to install a lot of packages:

aptitude update

Prepare required prerequisites

Then follow the instructions on the xpra Wiki for building Ubuntu / Debian style:

apt-get install libx11-dev libxtst-dev libxcomposite-dev libxdamage-dev \
    python-all-dev python-gobject-dev python-gtk2-dev

apt-get install xvfb xauth x11-xkb-utils
apt-get install libx264-dev libvpx-dev libswscale-dev libavcodec-dev

The file mentioned in the how-to, vpx.pc, should exist:

cat /usr/lib/pkgconfig/vpx.pc

You will need to install and compile Cython from sources, as the version in the Raspbian repository is too old (0.15.1 vs. 0.16 minimum needed).

wget http://www.cython.org/release/Cython-0.20.1.tar.gz
tar -xzf Cython-0.20.1.tar.gz

Change into the newly extracted directory (Cython-0.20.1) and install Cython:

python setup.py install

This will take quite a while. Test that you have the correct cython version:

cython --version

This should yield: Cython version 0.20.1
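If you want to script this check, the comparison against the 0.16 minimum can be sketched as follows. This is only an illustration: the function names version_at_least and check_cython are made up here, and the sketch assumes GNU coreutils (for sort -V) and that the version number is the last word of the line cython prints.

```shell
# Succeeds (exit status 0) when version $1 is at least version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Cython against the 0.16 minimum quoted above.
# Assumption: the version is the last word of the first output line;
# 2>&1 is used because the line may be printed on stderr.
check_cython() {
    command -v cython > /dev/null 2>&1 || { echo "cython not found" >&2; return 1; }
    installed=$(cython --version 2>&1 | awk '{ print $NF; exit }')
    if version_at_least "$installed" 0.16; then
        echo "Cython $installed is new enough"
    else
        echo "Cython $installed is too old (need 0.16 or newer)" >&2
        return 1
    fi
}
```

Calling check_cython after the installation should then report whether the freshly built version is the one found first in your PATH.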

Download and extract source

wget https://www.xpra.org/src/xpra-0.12.3.tar.bz2
tar -xjf xpra-0.12.3.tar.bz2

Note: there may be a newer release; please check.

Change into the extracted directory. We need to apply a patch:

patch < patches/old-libav.patch

Enter xpra/codecs/dec_avcodec/decoder.pyx as the file to patch

Next patch (several files in one go):

patch < patches/old-libav-pixfmtconsts.patch

Simply copy and paste the “Index file” the patcher asks for, for example xpra/codecs/csc_swscale/colorspace_converter.pyx

Next patch (also several files):

patch < patches/old-libav-no0RGB.patch

Act like above (copy & paste file name, without leading / ).
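The prompts for file names may be avoidable. When patch is run without a -p option it tends to look only at the base name of each file, which it cannot find from the top of the source tree. The sketch below passes -p0 so that the full relative paths from the patch headers are used. It assumes (this was not verified against the xpra patches) that the paths in the patches are relative to the source directory, as the file name entered above suggests. The function name apply_patches is made up for this example.

```shell
# Apply each patch relative to the current directory, keeping the full
# paths from the patch headers (-p0). A --dry-run pass comes first, so
# a patch that does not apply cleanly leaves the tree untouched.
apply_patches() {
    for p in "$@"; do
        if ! patch -p0 --dry-run < "$p" > /dev/null; then
            echo "would not apply cleanly: $p" >&2
            return 1
        fi
        patch -p0 < "$p" || return 1
    done
}

# From inside the extracted xpra source directory:
# apply_patches patches/old-libav.patch \
#               patches/old-libav-pixfmtconsts.patch \
#               patches/old-libav-no0RGB.patch
```

If the dry run fails, fall back to the interactive method described above.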

The source directory also contains a useful README, which tells you the next step is:

./setup.py install --home=install

After the compilation is done, you should either (always) set the PYTHONPATH to include the install subdirectory, like this:

export PYTHONPATH=$PWD/install/lib/python:$PYTHONPATH

or install the “finished” files to the appropriate targets. From the install directory do:

cp bin/* /usr/bin/.
cp -R lib/* /usr/lib/.
cp -R share/* /usr/share/.

xpra will now be the newest version:

xpra --version

xpra v0.12.3

You will still have to set the PYTHONPATH environment variable so that it points to the new files in /usr/lib/python, though:

export PYTHONPATH=/usr/lib/python:$PYTHONPATH
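Since this export is needed in every new shell, a small helper in your shell profile can save some typing. The following is a minimal sketch; the function name prepend_pythonpath is made up for this example. It adds a directory to the front of PYTHONPATH only if it is not listed yet, and it avoids a trailing colon when PYTHONPATH was empty.

```shell
# Prepend directory $1 to PYTHONPATH unless it is already listed.
prepend_pythonpath() {
    case ":${PYTHONPATH}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}" ;;
    esac
    export PYTHONPATH
}

# Example for the uninstalled build, run from the xpra source directory:
# prepend_pythonpath "$PWD/install/lib/python"
```

Pass whichever directory actually holds the xpra Python modules on your system.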

 

Test & Test results

OK, here’s how to set up a test session:

Set up a test server which has xpra installed (you can install it through the winswitch packages, which will get you the newest xpra version on Ubuntu & Debian).

Start X Windows, open LXTerminal, run the following commands.

export PYTHONPATH=/usr/lib/python:$PYTHONPATH

Start an xpra session via SSH (can be killed using Ctrl-C, and reconnected to using the same command):

xpra start ssh:maxcs@192.168.1.61:122 --start-child=xterm --encoding=h264

Read the manpage (man xpra) to have a look at some other options

Test results

xpra-raspberry-h264

rgb, png encodings are too high-latency.

jpeg is barely usable, even when the application (for instance AbiWord) is resized to less than full screen.

webm encoding delivers worse quality, but seems a bit more usable

h264 decoding is NOT done in hardware in the default code (we’ll look into this). Surprisingly it is still the “most fluid to use” one.

I suspect that no decoding in H.264 is taking place, and server side xpra falls back to a different encoder (webm?) Anyways, one can even “watch” videos (a couple of frames each second with heavy artifacts) with this.

For very light administration / checking of remote contents, etc. xpra can be used as is. We will need to enable hardware decoding of h264, though, for it to yield real benefits.

Please note: our interests solely rest in streaming TO the Raspberry Pi, not FROM the Raspberry Pi – we will not test / patch in order to speed up administration of the Pi at this point.

 

Notes & Further reading

Dependencies of xpra package:

(you can show this using “apt-cache showpkg xpra” on a machine which has the package in the newer version, e.g. Ubuntu AMD64):

Dependencies:
0.12.3-1 – python2.7 (0 (null)) python (2 2.7.1-0ubuntu2) python (3 2.8) libavcodec53 (18 4:0.8-1~) libavcodec-extra-53 (2 4:0.8-1~) libavutil51 (18 4:0.8-1~) libavutil-extra-51 (2 4:0.8-1~) libc6 (2 2.14) libgtk2.0-0 (2 2.24.0) libswscale2 (18 4:0.8-1~) libswscale-extra-2 (2 4:0.8-1~) libvpx1 (2 1.0.0) libx11-6 (0 (null)) libx264-120 (0 (null)) libxcomposite1 (2 1:0.3-1) libxdamage1 (2 1:1.1) libxext6 (0 (null)) libxfixes3 (0 (null)) libxrandr2 (2 4.3) libxtst6 (0 (null)) python-gtk2 (0 (null)) x11-xserver-utils (0 (null)) xvfb (0 (null)) python-gtkglext1 (0 (null)) python-opengl (0 (null)) python-numpy (0 (null)) python-imaging (0 (null)) python-appindicator (0 (null)) openssh-server (0 (null)) python-pyopencl (0 (null)) pulseaudio (0 (null)) pulseaudio-utils (0 (null)) python-dbus (0 (null)) gstreamer0.10-plugins-base (0 (null)) gstreamer0.10-plugins-good (0 (null)) gstreamer0.10-plugins-ugly (0 (null)) python-gst0.10 (0 (null)) openssh-client (0 (null)) ssh-askpass (0 (null)) python-numeric (0 (null)) python-lz4 (0 (null)) keyboard-configuration (0 (null)) xpra:i386 (0 (null))

CheckInstall

Optional: install checkinstall, to create a package which you can easily remove or re-deploy to other computers:

aptitude install checkinstall

 

Troubleshooting

Patches

error: implicit declaration of function ‘avcodec_free_frame’

you need to apply the patch patches/old-libav.patch

error: ‘AV_PIX_FMT_YUV420P’ undeclared

you need to apply the patch patches/old-libav-pixfmtconsts.patch

error: ‘PIX_FMT_0RGB’ undeclared

you need to apply the patch patches/old-libav-no0RGB.patch

The other patches were NOT needed in my experimental compilation.

 

ImportError: No module named xpra.platform

Once you try to execute xpra (from LXTerminal preferably), you may get this message. The PYTHONPATH environment variable needs to be set:

export PYTHONPATH=/usr/lib/python:$PYTHONPATH

Apr 20 2014
 

We’re working on streaming a multimedia remote desktop to the Raspberry Pi.

In the future we envision, you will be able to use a web browser on the Pi at normal speed – including YouTube videos, etc. – and work with CPU / GPU-intensive applications, as all the processing is done on a server and the Raspberry Pi only renders the H.264 stream.

First steps

A very interesting application stack to serve this purpose is already available: WinSwitch & xpra

Look at the WinSwitch homepage for installation instructions – it is really quite easy.

WinSwitch bundles several remote clients (VNC / xpra / RDP) with an easy-to-use interface, and broadcasts servers / clients (via Avahi / Bonjour).

A very first demonstration of the capabilities of this stack can be obtained by installing WinSwitch on your “normal” desktop machine, and on a server.

Server

As a server we use the “fastest available” Intel Atom processor currently on the market – the Intel(R) Atom(TM) CPU C2750. We are looking into using AMD’s ARM 64 bit processor as server hardware in the future (power-efficiency!), and the performance should be roughly comparable.

Client

As a client we use a Dell Inspiron notebook (with Windows 8.1), Core i7 processor, FullHD resolution.

There is an xpra package available for the Raspberry Pi, but it is based on a quite old version of xpra and will not connect to our server. This may be related to the huge version difference between the two packages, a wrong setup, or special tweaking done by winswitch to xpra.

Test results

YouTube videos

We can stream a webbrowser running YouTube fullscreen in FullHD, which will use about 50 % of the server’s total resources (decoding one or several videos in FullHD, encoding one FullHD stream). This is possible in low-latency, at about 30fps and high quality. Yes, this does include an audio stream, too.

The stream uses about 40 Mbit/s of bandwidth, and is much more reliable (less choppy) if streamed over LAN instead of WLAN. In fast-moving scenes video will still be a bit choppy, but tolerably so.
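To put the bandwidth figure into perspective, here is a rough back-of-the-envelope calculation. It assumes 24 bits per pixel for the uncompressed picture and exactly 30 frames per second; neither value was measured, and the function name raw_bitrate is made up for this example.

```shell
# Raw bitrate of an uncompressed video stream in bit/s.
# Arguments: width height bits_per_pixel frames_per_second
raw_bitrate() {
    echo $(( $1 * $2 * $3 * $4 ))
}

raw=$(raw_bitrate 1920 1080 24 30)    # FullHD, 24 bit colour (assumed), 30 fps
stream=40000000                       # the roughly 40 Mbit/s quoted above

echo "raw:         $raw bit/s"
echo "compression: about $(( raw / stream )):1"
echo "per frame:   about $(( stream / 30 / 8 / 1000 )) kB"
```

With these assumptions the raw stream would be about 1.5 Gbit/s, so x264 compresses it by a factor of roughly 37, and each frame averages about 166 kB on the wire.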

The encoding is done in software, using x264.

Streaming ONE browser window is possible with our server hardware in good quality (for video content) from either the host directly, or from a virtual Ubuntu machine (KVM-virtualized) inside it.

TWO browser windows will start to degrade the quality, even if trying to force best quality and lowest latency.

image

This screenshot shows the YouTube video in the browser being streamed on the client.

Application streaming

Winswitch allows you to stream single applications. Performance / latency is very good on a local network, keyboard / mouse delay is barely noticeable.

In general, applications will be quite useable and seem responsive.

Applications requiring precise mouse / screen coordination, like graphics software, will not be usable (at least with our hardware setup).

image

This screenshot shows Firefox being streamed through xpra.

image

VLC media player being streamed – sound works

image

GIMP: barely usable (too much delay)

image

word processing with AbiWord: OK performance (could be better, but it’s usable)

 

Desktop streaming

Streaming a desktop with xnest / Xephyr / xpra from inside a virtualised Ubuntu container on the base hardware is NOT possible at low-latency (30 fps) with high-quality. (With our server hardware)

In order to test it, you have to install additional packages on the server:

aptitude install xnest xserver-xephyr

and set the desktop default to xpra, possibly after restarting the server / client:

image

Apparently frame-grabbing / mirroring the desktop / going through the X layer uses up much more processor resources.

image

Gameplay is quite smooth – but full screen video playback would be a problem.

Some hints

  • authentication with private/public key pairs may be problematic without additional configuration; for first tests I recommend re-enabling password login for SSH.
  • audio for the browser may require alsa and pulseaudio
  • This does not work on the Raspberry Pi yet, this is our next step (compiling a package for it).

 

Outlook

  • encoding performance and latency may be enhanced significantly using NVidia’s NVENC hardware encoding / framegrab API – which xpra supports.
Mar 15 2014
 

With this interesting tool you can redirect input over the network – control other Linux boxes as if you were physically sitting in front of them and using a USB mouse and keyboard.

The project’s GitHub repository is found here:

https://github.com/MerlijnWajer/uinput-mapper

More documentation is available on this site: http://hetgrotebos.org/wiki/uinput-mapper

Here is an introduction to setting up keyboard and mouse forwarding via SSH to a second Linux box.

keyboard-mouse

Installing uinput-mapper

as user root (sudo su):

aptitude install git-core

cd /opt

git clone https://github.com/MerlijnWajer/uinput-mapper.git

cd uinput-mapper

make

This will check out the tool into the /opt directory. This is not a requirement per se; you can also install it in a different directory of your liking.

make will build the file “uinputmapper/uinput_gen.py”

Use the same procedure on the server.

Connect to server

Test the connection by logging in via SSH and the appropriate key to the server (you need to set this private / public authentication up first, of course – see this article, for instance).

ssh root@192.168.1.61

This should log you in to the remote server. Change the username, the IP address and, if needed, the path to the key (which can be specified with the -i option) according to your setup.

Log out again.

Have a look at the input devices on your “local” machine, from which you will be redirecting the input:

ls -alh /dev/input/*

Sometimes, keyboards will create two devices – one for the additional (multimedia?) keys.

Connect (with error / stdout logging)

./input-read -G /dev/input/event0 -G /dev/input/event1 -D | ssh root@192.168.1.61 "/opt/uinput-mapper/input-create &>>/tmp/errorlog"

If nothing happens, have a look at /tmp/errorlog on your server:

Traceback (most recent call last):
  File "/opt/uinput-mapper/input-create", line 73, in <module>
    fd, ev = in_f.load()
EOFError

Try connecting with the compatibility option for Python < 2.7 in this case.

./input-read -C -G /dev/input/event0 -G /dev/input/event1 -D | ssh root@192.168.1.61 "/opt/uinput-mapper/input-create -C &>>/tmp/errorlog2"

Please note that the compatibility flag needs to be given on BOTH sides (local and remote part) of the command.

You can verify your Python version like this:

root@cloudsource2:/opt/uinput-mapper# python --version
Python 2.7.3
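The decision whether the compatibility flag is needed can also be scripted. The sketch below only encodes the rule stated above (use -C for Python older than 2.7); the function name compat_flag is made up for this example, and sort -V from GNU coreutils is assumed to be available.

```shell
# Print "-C" when the given Python version is older than 2.7,
# and nothing otherwise.
compat_flag() {
    oldest=$(printf '%s\n%s\n' "$1" 2.7 | sort -V | head -n 1)
    if [ "$oldest" = "$1" ] && [ "$1" != 2.7 ]; then
        echo "-C"
    fi
}

# Example: query both machines and use -C if either one needs it.
# local_v=$(python -c 'import platform; print(platform.python_version())')
# remote_v=$(ssh root@192.168.1.61 "python -c 'import platform; print(platform.python_version())'")
# flag=$(compat_flag "$local_v")
# [ -n "$flag" ] || flag=$(compat_flag "$remote_v")
```

The resulting flag would then be passed to both input-read and input-create.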

If everything works all right, you can leave out the last part (starting with the ampersand “& …”), which just redirects standard output and error from the server for debugging.

Jan 31 2014
 

Open source code allows your operating system and application stack to be recompiled for different systems.

Today, with many applications being migrated into the cloud, good performance per Watt of power usage is paramount in keeping power-costs down.

x86 traditionally has not been optimized for best per-Watt performance – Intel is catching up with Atom, especially with the BayTrail SoC for the mobile application area. For microservers Intel has introduced the C2000 “Avoton” Atom SoC.

Let’s look at a couple of alternatives for modern cloud computing.

The ARM architecture is quite big already in the mobile market, getting more and more into the desktop markets (take the Raspberry Pi for instance), and now it’s taking big strides towards servers.

ARM Contender #1: Calxeda

Calxeda developed one of the first (or maybe THE first) ARM-based server module solutions. Their design “EnergyCore ECX-1000” is based on 4 x ARMv7 Cortex A9 cores (32 bit), running at 1.1 – 1.4 GHz.

Each board has one memory slot for up to 4 GB of RAM (remember, 32 bit!), and four SATA ports per socket, and five 10 GBit/s on-board LAN-ports. They were specified at 1.5 W power usage per core, and 5 W per node.

Calxeda originally planned to produce a “Midway” chip, which would allow for 40 bit memory addressing. Being socket compatible with the ECX-1000, the chip would have been able to address 16 GB of memory.

According to this article, Calxeda was looking to provide a 15 – 20 x price/performance advantage over “traditional” server processors. This article claims Calxeda was also looking at a 5x – 10x performance / Watt increase.

Dell has built a server based on the Calxeda board architecture and donated it to the Apache Software foundation, so they can tweak Apache, Hadoop, Cassandra, … for the architecture. In this server architecture, up to 360 ECX-1000 nodes can be put in a 4U chassis.

HP has also tested the waters with its experimental Redstone ARM Server, based on Calxeda technology. It allows up to 288 ECX-1000 nodes in 4U rack space.

Avantek announced machines based on the Calxeda architecture at the end of 2012, with a 3U base machine (four ECX-1000 cards, some disk drives) weighing in at about 4000 GBP (about 4900 €), and a fully “loaded” machine with 48 cards, giving 192 cores and 192 GB of memory, with a mix of disk and flash, at about 40.000 GBP (about 49.000 €). Here’s Avantek’s info page, which also has a comparison to Xeon E5450 on it.

“Ten times the performance at the same power in the same space”.

Calxeda ran out of money in mid-December 2013, and it looks as though they are shutting down operations. The intellectual property may very well be bought by Dell or HP. It had roughly 125 employees by the time the news hit, and they had raised about 90 – 100 Million $ in venture funding. (Have a look at the article to see an actual Calxeda card, with the SATA ports next to each core). Calxeda was also backed by ARM Holdings Inc.

 

Tilera

Tilera has its own RISC based design (non-ARM), including many cores (up to 72) in one SoC, interconnected with the “iMesh” non-blocking interconnect, with “Terabits of on-chip bandwidth”. The cores can be programmed in ANSI C/C++ or Java. Linux runs on the system – support for the Tilera architecture was added in October 2010, with ver. 2.6.36 of the Linux kernel. The CPU series itself was launched in October 2011.

Facebook claimed that in their tests the Tilera architecture was about four times more power efficient than the Intel x86 architecture. They ran memcached in these tests.

Router & Wireless company MikroTik has a product called “Cloud Core Router” which is based on a 36 core Tilera CPU. To give you an idea of its cost: the router retails (depending on the version) for about 1000 € including VAT.

Have a look at this page to see the Cloud Core Router. Tilera has also some evaluation platforms of their own.

 

ARM Contender #2: Marvell ARMADA XP

This is a series of multicore processors (quad-core ARM). The XP apparently stands for “extreme performance”.

Marvell powers Dell’s “Copper” ARM server.

Chinese search giant Baidu has deployed these.

 

ARM Contender #3: AMD

AMD’s getting on the ARM bandwagon. I always liked that company (and despite my criticism of it these days, I also like Intel!) – they are not disappointing me!

The Opteron A1100 is based on the first true 64 bit addressing ARMv8 core, Cortex A57.

The Octo-Core version of the Opteron A1100 is claimed to be “two to four times faster” when compared with the Opteron X2150, with four x64 Jaguar cores. This is an interesting comparison, because both are targeted to be available on the Moonshot platform (see below).

The TDP of the octo-core version of A1100 is 25 W. It contains two 10 GbE ports, eight SATA 6G ports, eight PCIe-3.0 lanes.

Development platforms based on the Opteron A1100 should be available soon. On the developer board, the chip can address up to 4 x 32 GB of memory (four DIMM slots).

AMD predicts that in 2019 the ARM platform will take up about 25 % of the server market.

Read more in AMD’s press release

 

The Moonshot platform

HP is offering different server-modules for the ProLiant Moonshot. The Moonshot platform is intended for cloud computing centers.

Calxeda’s modules (EnergyCore, see above) were also intended to be used for this platform.

HP also uses Intel’s Atom chips for Moonshot. They plan to use Avoton for it (see below for more information about Avoton).

The first Moonshot system is Moonshot 1500 – taking up 4.3 Rack Units, with 45 ProLiant Moonshot Atom S1200, ethernet switch and some more gear, prices start at 50.605 €.

HP wants to offer KeyStone Chips from TI including many DSPs, interesting for instance for content delivery networks (transcoding), etc.

 

Intel: Avoton

With the BayTrail SoC being targeted at the mobile market, Intel has introduced a different SoC for microservers, which is called Avoton (Atom C2000 series being the first representatives). They also have a SoC Rangeley, which shares some of the Avoton platform and manufacturing process, but is targeted at the communications / networking market.

Avoton has eight CPU cores based on the new Silvermont microarchitecture – the first true reworking of the Atom architecture since its beginnings. Intel finally introduced out-of-order execution for it.

Configured with two DIMMs per channel, a single Avoton node can support up to 64 GB RAM. It supports four Gigabit Ethernet connections – but no 10 GBit connection.

They have integrated power control tightly into the chip, and have made sensible tradeoffs – for instance wake up latency has not been compromised upon to avoid dropped packets and such.

They offer a range of products based on Avoton and Rangeley, from two cores and 6 W, clocked at 1.7 GHz, to eight cores and 20 W, clocked at 2.4 GHz.

Figures released by Intel indicate that the Atom C2750 (2.4 GHz, 8 cores, 20 Watt) easily outperforms Marvell’s ARMADA XP (1.33 GHz, 4 cores, A9) and Calxeda’s ECX-1000 (1.4 GHz, 4 cores, A9) in memory bandwidth and general purpose computing. I agree with the article that AMD’s Cortex A57 core with true 64-bit addressing will be the real rival for Avoton, the one it should be compared against.

Intel is targeting the C2000 at “cold storage” applications. Have a look at this PDF to read more about it.

The C2750 supports Intel’s virtualization feature (VT-x), but not VT-d apparently (which is used to “pass through devices” to the virtualized system, e.g. graphics cards, …)

Performance

Have a look at these charts. They even measure against a Raspberry Pi!

Prices

The Atom C2750’s list price is 171 $.

A1SAi-2750F

Supermicro has a motherboard, the A1SAi-2750F, which integrates the C2750.

This board is available at about 340 € including VAT in Germany. It has 4 DDR-3 SO-DIMM slots, 1 x PCIe 2.0 x8, 1 x VGA, 2 x 2 x USB 3.0, 2 x USB 2.0, 4 x Gigabit LAN, as well as 2 x 6 Gbit/s SATA and 4 x 3 Gbit/s SATA.

It is a Mini-ITX board.

ASRock C2750D4I

This is another option, but more expensive, and with only 2 GbE ports.

 

SPEC_int_rate Benchmarks

source one 

  • Opteron A1100, eight core: 80 (simulated) @ 25 W
  • Opteron X2150, four core: 28.1 @ 22 W
  • Atom C2750 (Avoton): 105 @ 28 W
  • Intel Xeon E7-8870 (2.4 GHz), Deca Core: 1770 @ 105.63 W
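Since this article is about performance per Watt, the figures above can be turned into a score-per-Watt ratio. The sketch below simply divides each listed score by the listed wattage. The function name perf_per_watt and the record format are made up for this example, and the result is only as good as the listed numbers (the A1100 score is simulated).

```shell
# Print the benchmark score per Watt for "name score watts" records,
# one record per line on standard input.
perf_per_watt() {
    awk '{ printf "%s %.2f\n", $1, $2 / $3 }'
}

perf_per_watt <<'EOF'
Opteron-A1100 80 25
Opteron-X2150 28.1 22
Atom-C2750 105 28
Xeon-E7-8870 1770 105.63
EOF
```

This prints roughly 3.20 for the A1100, 1.28 for the X2150, 3.75 for the C2750 and 16.76 for the Xeon.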