    Unleash the power of the TenByTen6410

    The TenByTen6410 is a powerhouse, no question. However, coming from the MINI2440v2, the visual effects and performance are not as “earth shaking” as expected or hoped for. Are you one of those users who feel disappointed by the graphical performance? Do you want to know what you have been doing wrong, in the past or even now?

    In this five part series of How To's, Sven explains step by step how to unleash the hidden power within the S3C6410, and how to make pleasant GUIs for the MINI2440 and TenByTen6410 boards.

    • Part I discusses the hardware and some common principles
    • Part II dives deep into the TFT controller (virtual screens, panning and more)
    • Part III explains how to create stunning visuals just by using the TFT controller at its max
    • Part IV helps to understand the principles of hardware 2D acceleration
    • Part V squeezes everything together into UltimaGUI

    Part I

    Why do vanilla /dev/fb0, directFB, SDL and XORG disappoint?

    Did you buy an S3C6410 based board and expect razor sharp visuals at astounding speeds, like those you grew used to on your iPhone? Did you try several packages from debian/emdebian, only to be disappointed by all of them? Well, that was to be expected! Don’t get me wrong here, the graphics are "OK", but they should be better, MUCH better, and we’ll show you how to do it.

    To understand why they disappoint, we have to understand what is going on in our system.

    Let’s say we connect a 7inch (800x480px) display, such as the MegaDisplay 7, to our board. I know that this is a red flag for a lot of you because of bad experiences. However, it is also by far Hiteg's best selling display, thus support is needed.

    The display's depth is normally set to 16Bit RGB in 565 layout, which means that red gets 5Bits, green gets 6Bits, and blue gets 5Bits out of our 16Bits. That works out to 800x480x2 Bytes = 750KB of frame buffer. We refresh the TFT sixty times a second, which equals approx. 44MB per second. That sounds fair, as we use DDR RAM at 133MHz with a 32Bit bus on the TenByTen6410. That means the RAM can, ideally, move 133Mx2x4Bytes, which is approximately 1GByte/sec. This is, no wonder, double the rate the MINI2440’s RAMs are able to move.

    Please keep in mind that these are theoretical values and not what the RAMs really move on request. The RAM needs approximately nine cycles to latch the address and bank registers for any given address, so every time you read a single 32Bit word from a random address you have to wait those nine cycles. Thus the realistic rate is around 2/3 of the theoretical values above, provided the addresses are not random. But that still gives you 667MBytes on the 6410 and 330MBytes on the MINI2440 side. SAMSUNG has a nice example in the User’s Manual to calculate the memory bus occupation in %:

    For the MINI2440

    With 32bit SDRAM (Trp=2 (HCLK) ; Trcd=2 (HCLK) ; CL=2 (HCLK) ) and HCLK frequency is 100MHz
    LCD DMA Burst Count (Times/s) = LCD Data Rate(Byte/s) /32(Byte) ; LCD DMA using 8words(32Byte) burst
    LCD Data Rate = 16(bpp) x 800 x 480 x 60 / 8 = 43.95Mbyte/s 
    LCD DMA Burst Count = 43.95MB / 32 = 1.375M/s
    Pdma = (Trp+Trcd+CL+(2 x 4)+1) x (1/100MHz) = 0.150µs
    LCD System Load = 1.375 x 0.150 = 0.206
    System Bus Occupation Rate = (0.206/1) x 100 = 20.6%

    For the TenByTen6410

    With 32bit DDR SDRAM (Trp=1 (HCLK) ; Trcd=1 (HCLK) ; CL=2 (HCLK) ) and HCLK frequency is 133MHz.    
    LCD DMA Burst Count (Times/s) = LCD Data Rate(Byte/s)/64(Byte) ; LCD DMA using 16words(64Byte) burst
    LCD Data Rate = 16(bpp) x 800 x 480 x 60 / 8 = 43.95Mbyte/s 
    LCD DMA Burst Count = 43.95MB / 64 = 0.69M/s
    Pdma = (Trp+Trcd+CL+(2 x 4)+1) x (1/133MHz) = 0.098µs
    LCD System Load = 0.69 x 0.098 = 0.068
    System Bus Occupation Rate = (0.068/1) x 100 = 6.8%

    This means we spend about 20% of the memory bandwidth on the MINI2440 and about 7% on the TenByTen6410 for “just” updating the display. At least we no longer wonder why modern graphics cards need a 128/256bit memory bus with GDDR3/4/5.
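    The calculation above can be sketched in a few lines of C, which makes it easy to try other resolutions or timings. The struct and function names are made up for this example, and the formula is simply the one quoted above, including its fixed (2 x 4) term.

```c
/* Timing parameters of one board, named after the terms used in the formula.
 * All names here are illustrative and not taken from any SDK. */
struct lcd_bus_params {
    unsigned trp, trcd, cl;   /* SDRAM timings in HCLK cycles */
    double hclk_hz;           /* bus clock in Hz */
    unsigned burst_bytes;     /* bytes fetched per LCD DMA burst */
};

/* Bytes per second the LCD controller has to fetch for a given mode. */
static double lcd_data_rate(unsigned width, unsigned height,
                            unsigned bpp, unsigned refresh_hz)
{
    return (double)width * height * bpp / 8.0 * refresh_hz;
}

/* Fraction of bus time (0..1) consumed by the LCD DMA. */
static double lcd_bus_occupation(const struct lcd_bus_params *p, double data_rate)
{
    double bursts_per_s = data_rate / p->burst_bytes;
    double cycles_per_burst = p->trp + p->trcd + p->cl + (2 * 4) + 1;
    double seconds_per_burst = cycles_per_burst / p->hclk_hz;

    return bursts_per_s * seconds_per_burst;
}
```

    Fed with { 2, 2, 2, 100e6, 32 } for the MINI2440 and { 1, 1, 2, 133e6, 64 } for the TenByTen6410, this returns roughly 0.216 and 0.070 for 800x480 at 16bpp and 60Hz. That is slightly above the 20.6% and 6.8% listed above, because the figures above divide the rounded 43.95M instead of the exact 46,080,000 Bytes per second.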

    If we now look at how common software deals with graphics under LinuxFB, we can see that most packages use a back buffer for their drawing. That buffer later gets copied as the resulting image to the frame buffer, sometimes with color reduction from 32Bit (ARGB) to 16Bit RGB(565) and additional dithering (whoa!). Most modern packages use alpha blending and translucent effects to improve their visuals. Those techniques need to a) read the source pixel and b) read the destination pixel, sometimes other pixels in the neighbourhood too. The software then calculates the effect (blending, translucence, blurring, etc.) pixel by pixel and c) stores the result back into the destination. Once finished, the result has to be copied into the Linux frame buffer, which costs, if no dithering is involved, at least one more read and one more write.

    On the MINI2440 we can easily see how QT beats the guts out of the system, as each pixel needs at least two read and one write access to the RAM. The 6410 copes better here and is still rocking, but is the speed impressive? "Nahh!" Some might say, "sure it’s that slow, you’d have to use OpenGLES shaders for that, those free the CPU". And they are right as far as the CPU is concerned, but the memory bandwidth is still affected, and that is a big slowdown for the entire system. What is true and fast on the desktop is mostly wrong on a SOC! Everyone is doing OpenGL accelerated GUIs/Desktops now, right?
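    To make the cost visible, here is a minimal sketch of what a software blend of one line of 32Bit ARGB pixels over an RGB565 destination has to do. It is not taken from QT or any other toolkit, and the function names are invented for this example; the point is only the two reads and one write per pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit channels into RGB565: 5 bits red, 6 bits green, 5 bits blue. */
static inline uint16_t rgb565_pack(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Blend a line of ARGB8888 pixels over an RGB565 destination, in place. */
static void blend_line_argb_over_565(uint16_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t s = src[i];                 /* read 1: source pixel      */
        uint16_t d = dst[i];                 /* read 2: destination pixel */
        unsigned a  = s >> 24;
        unsigned sr = (s >> 16) & 0xff;
        unsigned sg = (s >> 8) & 0xff;
        unsigned sb = s & 0xff;
        /* expand the destination to 8 bits per channel */
        unsigned dr = (unsigned)(d >> 11) << 3;
        unsigned dg = (unsigned)((d >> 5) & 0x3f) << 2;
        unsigned db = (unsigned)(d & 0x1f) << 3;
        /* three multiplies and divides per channel, for every pixel */
        unsigned r = (sr * a + dr * (255 - a)) / 255;
        unsigned g = (sg * a + dg * (255 - a)) / 255;
        unsigned b = (sb * a + db * (255 - a)) / 255;

        dst[i] = rgb565_pack(r, g, b);       /* write: blended result     */
    }
}
```

    If dst is a back buffer, the copy into the real frame buffer adds another read and write per pixel on top of this.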

    Tip for now:

    Do not use extensive alpha blending, translucent effects, etc.

    So what can I use then?

    The approach shown below is usable for the MINI2440 and TenByTen6410 alike. It’s the easiest approach and also the very end of what’s possible on the MINI2440 due to its limited TFT controller. We promise you won’t get it any faster under Linux! GUI toolkits like QT, GTK and others are developed with desktop systems in mind. The graphics cards found there bear no comparison to what we have to deal with here.

    Our tip would be to write your own drawing routine and use one color as the transparency stencil, preferably 0. Don’t let QT draw it for you, as it makes extensive use of alpha buffers. Extend QT so that it uses this old stencil technique, or better, use QT on your desktop to work out your GUI needs, and then write your own small GUI kit. You will see that your application only uses a very basic subset of the widgets found in QT: a Canvas, a Label, a Button, a Slider, a List and maybe some sort of pane to group things. That’s 90% of your application's needs.

    Furthermore, your application doesn’t need to be versatile. You can bake all buttons with their text in each state (normal, pressed, disabled) to save memory bandwidth. Do the same with labels which you know will not change (static text). Use 8Bit graphics and a color table with RGB565 values. It will all fit into the CPU data cache and be very fast. Use 0 as the stencil color, as a test against zero normally results in a slightly faster binary (a branch-when-zero / branch-when-not-zero style test instead of an explicit compare against a constant). This gives nice looking visuals in a 16Bit RGB frame buffer. If your application can live with 255 colors, switch the frame buffer to 8Bit mode. You can then let the TFT controller do the color look up for you.
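    A minimal sketch of such a stencil blit, assuming an 8Bit indexed source image, a color table holding RGB565 values, and a 16Bit frame buffer. The function name and the convention of giving strides in pixels are choices made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy an 8-bit indexed image into an RGB565 buffer.
 * Index 0 is the stencil: those pixels are skipped, so the background stays.
 * Strides are given in pixels (elements), not bytes. */
static void blit_indexed_stencil(uint16_t *dst, size_t dst_stride,
                                 const uint8_t *src, size_t src_stride,
                                 const uint16_t *color_map,
                                 unsigned width, unsigned height)
{
    for (unsigned y = 0; y < height; y++) {
        const uint8_t *s = src;
        uint16_t *d = dst;

        for (unsigned x = 0; x < width; x++, s++, d++) {
            uint8_t index = *s;          /* one byte read per pixel */
            if (index)                   /* plain test against zero */
                *d = color_map[index];   /* table lookup, one write */
        }
        src += src_stride;               /* pointer steps only,     */
        dst += dst_stride;               /* no multiply in the loop */
    }
}
```

    Compared to alpha blending, the destination is never read, transparent pixels cost no write at all, and the source needs a quarter of the bandwidth of a 32Bit image.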

    Don’t know how to start? Have a look at tslib’s demo programs! Combine all your images (except font bitmaps) into one big image for one given color map, instead of loading each one separately. You will see that you end up with just one image, as long as your GUI does not feature a rainbow theme. We suggest using the BMP file format, as it is simple enough to be implemented in an evening and very light, so you don’t waste memory on a PNG library, for example. You could, however, use a run-length encoding scheme to shorten NAND loading times, but we don’t think that it’s useful, as you’d have to a) load the encoded image into memory and b) decode it to another part of the memory. Again, we want to save memory bandwidth, and NAND Flash reads are not that slow. But test it yourself and use the fastest approach!
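    As an illustration of how little is needed, here is a sketch of a parser for uncompressed 8Bit BMP files that are already loaded into memory. It assumes the common 40 byte (or larger) info header; the struct and function names are invented for this example, and anything unusual is simply rejected.

```c
#include <stddef.h>
#include <stdint.h>

struct bmp8_info {
    uint32_t width, height;
    uint32_t colors;          /* number of palette entries            */
    uint32_t row_bytes;       /* stored row length, padded to 4 bytes */
    int bottom_up;            /* 1 if the first stored row is the bottom line */
    const uint8_t *palette;   /* 4 bytes per entry: blue, green, red, reserved */
    const uint8_t *pixels;    /* first stored row */
};

/* BMP fields are little endian and not necessarily aligned. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if this is not an uncompressed 8-bit BMP. */
static int bmp8_parse(const uint8_t *file, size_t size, struct bmp8_info *out)
{
    if (size < 54 || file[0] != 'B' || file[1] != 'M')
        return -1;

    uint32_t data_offset = rd32(file + 10);
    uint32_t header_size = rd32(file + 14);
    int32_t w = (int32_t)rd32(file + 18);
    int32_t h = (int32_t)rd32(file + 22);   /* negative means top-down */
    uint32_t colors = rd32(file + 46);

    if (header_size < 40 || rd16(file + 28) != 8 || rd32(file + 30) != 0)
        return -1;                           /* not 8 bpp or compressed */
    if (w <= 0 || h == 0)
        return -1;
    if (colors == 0)
        colors = 256;                        /* 0 means "all colors"    */
    if (colors > 256)
        return -1;

    uint32_t height = h < 0 ? 0u - (uint32_t)h : (uint32_t)h;
    uint32_t row_bytes = ((uint32_t)w + 3u) & ~3u;

    if ((uint64_t)14 + header_size + (uint64_t)colors * 4 > size ||
        (uint64_t)data_offset + (uint64_t)row_bytes * height > size)
        return -1;                           /* truncated file          */

    out->width = (uint32_t)w;
    out->height = height;
    out->colors = colors;
    out->row_bytes = row_bytes;
    out->bottom_up = h > 0;
    out->palette = file + 14 + header_size;
    out->pixels = file + data_offset;
    return 0;
}

/* Convert the BMP palette into a color table with RGB565 values. */
static void bmp8_palette_to_rgb565(const struct bmp8_info *bmp, uint16_t table[256])
{
    for (uint32_t i = 0; i < bmp->colors; i++) {
        const uint8_t *e = bmp->palette + 4 * i;
        table[i] = (uint16_t)(((e[2] >> 3) << 11) | ((e[1] >> 2) << 5) | (e[0] >> 3));
    }
}
```

    Note that BMP rows are normally stored bottom-up, so either flip the image once after loading or walk the rows backwards when blitting.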

    A Minimal GUI

    As a starting point, here is a C struct for your own little GUI:

    typedef struct widget_t {
        unsigned int ID;    // identifier of the widget
        unsigned int x, y, width, height; // where to draw
        unsigned int state; // 0=disabled, 1=normal, 2=pressed
        unsigned int ts_x, ts_y, ts_width, ts_height; // your touch screen rect
        char * str_value;
        float  float_value;
        int    int_value;
        unsigned short *color_map; // pointer to the color map used for this widget
        unsigned int need_update; // if true, blit the data to the frame buffer
        unsigned char *source_data; // the beginning of your big image in memory
        unsigned int src_x, src_y;  // the position where to find the data;
                                    // we normally chain the single state images horizontally
                                    // and use src_x + state * width when drawing.
                                    // use src_y accordingly for vertically stacked images
        struct widget_t *first_sibling;
        struct widget_t *last_sibling; // a linked list for collection widgets, otherwise NULL
        struct widget_t *next;        // pointer to the next widget
        void (* callback)(struct widget_t *); // you can register a callback function for a widget; it gets a pointer to the widget as parameter
        void (* draw)(struct widget_t *); // the pointer to the draw function, which can be different for each widget
        // ... you could add some getter and setter functions for the widget's value to keep all three types in sync if needed.
    } widget_t;

    Starting from here you can derive different widgets and give each type its own drawing routine as needed. If you bitblit your data, use clipping and maybe some sort of index to identify which (expensive) bitblits you really have to do. Is a widget in the viewport? Calculate the clipping width and height before your loop starts, minimize calculations inside the loops, no multiplications!! ...you know the drill, right? In addition you’d need some functions to add widgets to a list, run through lists and maybe delete elements from a list. Obviously you’d also need an event handler which reacts to user input (touch screen, keyboard, GPIO) and distributes it to listeners. A good starting point for reading / investigating is GP32x related source code, where a lot of bright people programmed solutions for the SAMSUNG S3C2400 SOC. In Mirco’s SDK (shipped with DevKit ARM) you can find assembler bitblit routines which will serve all your needs, and which, of course, can be used under Linux as well!
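    The helper functions mentioned above could look roughly like this. To keep the sketch self-contained it uses a reduced widget struct (mini_widget) with only the fields needed here; all function names are invented for this example, and firing the callback on release is just one possible policy.

```c
#include <stddef.h>

struct mini_widget {
    unsigned int x, y, width, height;               /* where to draw     */
    unsigned int ts_x, ts_y, ts_width, ts_height;   /* touch screen rect */
    unsigned int state;                             /* 0=disabled, 1=normal, 2=pressed */
    unsigned int need_update;
    struct mini_widget *next;
    void (*callback)(struct mini_widget *);
    void (*draw)(struct mini_widget *);
};

struct clip_rect { unsigned int x, y, width, height; };

/* Append a widget to the end of a singly linked list. */
static void widget_append(struct mini_widget **head, struct mini_widget *w)
{
    w->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = w;
}

/* Find the first enabled widget whose touch rect contains (tx, ty). */
static struct mini_widget *widget_at(struct mini_widget *head,
                                     unsigned int tx, unsigned int ty)
{
    for (struct mini_widget *w = head; w; w = w->next)
        if (w->state != 0 &&
            tx >= w->ts_x && tx - w->ts_x < w->ts_width &&
            ty >= w->ts_y && ty - w->ts_y < w->ts_height)
            return w;
    return NULL;
}

/* Route a touch event: update the state, mark dirty, fire callback on release. */
static struct mini_widget *widget_touch(struct mini_widget *head,
                                        unsigned int tx, unsigned int ty, int pressed)
{
    struct mini_widget *w = widget_at(head, tx, ty);
    if (!w)
        return NULL;
    w->state = pressed ? 2 : 1;
    w->need_update = 1;
    if (!pressed && w->callback)
        w->callback(w);
    return w;
}

/* Clip a widget against the viewport once, before any blit loop starts.
 * Returns 0 if nothing is visible. Coordinates are unsigned here, so only
 * the right and bottom edges can cut the widget. */
static int widget_clip(const struct mini_widget *w, unsigned int view_w,
                       unsigned int view_h, struct clip_rect *out)
{
    if (w->width == 0 || w->height == 0 || w->x >= view_w || w->y >= view_h)
        return 0;
    out->x = w->x;
    out->y = w->y;
    out->width  = w->width  < view_w - w->x ? w->width  : view_w - w->x;
    out->height = w->height < view_h - w->y ? w->height : view_h - w->y;
    return 1;
}

/* Draw only the widgets that asked for it; returns how many were drawn. */
static unsigned int widgets_redraw(struct mini_widget *head)
{
    unsigned int drawn = 0;
    for (struct mini_widget *w = head; w; w = w->next)
        if (w->need_update && w->draw) {
            w->draw(w);
            w->need_update = 0;
            drawn++;
        }
    return drawn;
}
```

    A main loop would then read touch events (for example via tslib), pass them to widget_touch() and call widgets_redraw() afterwards, so that an idle GUI causes no memory traffic at all.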

    How to handle fonts?

    Fonts are handled in exactly the same way. The main problem here is to decide whether you only want to support fixed width fonts, like Courier for example, or not. Fixed width fonts are very easy to handle, as each character's starting point within a bitmap can be easily calculated. We can’t do that for proportional fonts, though. While it is not difficult per se, it is a lot of work. You could use freetype to render a string into a bitmap. That would be easy enough, but also slow. With some hours of good old manual labour you can create a record for each letter used.

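    typedef struct prop_font_t {
        unsigned int start_x; // x offset of the glyph inside the font bitmap
        unsigned int width;   // width of the glyph in pixels
    } PROP_FONT;

    PROP_FONT vera_sans_12[256];   // one record per 8Bit character code
    vera_sans_12[32].start_x = 0;  // offset in bitmap
    vera_sans_12[32].width = 12;   // width
    // ... and so on.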

    Obviously you want to do this only for those letters you are really going to use. Your font BitBlit routine would be fed with the values from above for each character. This is definitely not pleasant work, but the result will be lightning fast.
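    How such a table feeds the blit routine can be sketched as follows. The PROP_FONT record is repeated so the example stands on its own; glyph_run, font_layout() and the other function names are invented here, and treating a width of 0 as "letter not in the bitmap" is an assumption of this sketch.

```c
#include <stddef.h>

typedef struct prop_font_t {
    unsigned int start_x;   /* x offset of the glyph in the font bitmap */
    unsigned int width;     /* glyph width in pixels, 0 = not available */
} PROP_FONT;

/* One blit job: copy `width` columns from src_x in the bitmap to dst_x. */
struct glyph_run {
    unsigned int src_x, width, dst_x;
};

/* Fixed width fonts need no table: the offset is a simple calculation. */
static unsigned int fixed_glyph_x(unsigned char ch, unsigned char first_char,
                                  unsigned int cell_width)
{
    return (unsigned int)(ch - first_char) * cell_width;
}

/* Pixel width of a string, e.g. for centering a label. */
static unsigned int text_width(const PROP_FONT font[256], const char *text)
{
    unsigned int w = 0;
    for (; *text; text++)
        w += font[(unsigned char)*text].width;
    return w;
}

/* Turn a string into blit jobs, starting at pen position pen_x.
 * Returns the number of runs written (at most max_runs). */
static size_t font_layout(const PROP_FONT font[256], const char *text,
                          unsigned int pen_x, struct glyph_run *runs,
                          size_t max_runs)
{
    size_t n = 0;
    for (; *text && n < max_runs; text++) {
        const PROP_FONT *g = &font[(unsigned char)*text];
        if (g->width == 0)
            continue;               /* letter was never baked, skip it */
        runs[n].src_x = g->start_x;
        runs[n].width = g->width;
        runs[n].dst_x = pen_x;
        pen_x += g->width;          /* advance by addition only */
        n++;
    }
    return n;
}
```

    Each glyph_run can then be handed to the same bitblit routine that draws the widgets, with the font bitmap as source.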

    A good source for further reading would be game programming manuals from the early 1990s, meaning something from before the 3D revolution. Something like >>Game Gems<< comes to mind. Make sure it's for MS-DOS, as the Amiga, NES and similar gaming platforms always had some sort of hardware support. You don't have that!

    Changing the way the TFT controller works

    This is only interesting for TenByTen6410 and modern SOCs. The MINI2440 lacks the ability.

    One reason for poor memory performance, besides the amount of bandwidth needed by the LCD controller, is the way the TFT controller’s DMA works. Whenever a line has to be drawn on the display, the frame buffer is read and the data is transferred. In the time between two lines the DMA has nothing to do, and the CPU (and other DMA channels) can access the RAM. This sounds fair! But it's bad. The RAMs are only fast when they move continuous data with increasing addresses (bulk transfer). Every interruption costs time for readjusting the addresses (approx. 9 cycles). More importantly, the RAM can deliver data much faster than the current pixel clock consumes it.

    This is the main reason why the TFT displays in nearly all handsets and other mobile devices are equipped with a special panel controller. These controllers contain dedicated memory and refresh the panel from there. The data transfer from the CPU to the display can now happen as one block move per VSync at max speed (see the system controller registers for DMA priorities). As said, the TFT panel controller refreshes the attached screen out of its own RAM. Thus, in an ideal world, the CPU would only have to update the display controller’s RAM when changes are needed. The 6410 features the ability to communicate with those panel controllers through what is called the CPU interface, i80 interface or LDI interface, and not just with one, but with two in parallel! The 6410 is able to update both in a one-shot manner or continuously. The beauty is that you can lower the update rate according to your current application's needs by simply setting a frame skip value in a register within the 6410’s TFT controller.

    Sounds cool? It is! You can set it to >>no frame skip<< when you play video and to as many as 30 skipped frames when idle. COOL! What has to be done?

    Add an IOCTL to the frame buffer device which lets you adjust the frame skip if a CPU interface display, like Hiteg's MegaDisplay7C(S), is connected to your board. Again, this only works if you are in charge of all the drawing involved. If QT's main loop ignores your settings and keeps updating the frame buffer sixty times per second, you win nothing!
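    From user space, using such an IOCTL could look like the sketch below. Please note that the request number is purely hypothetical: no such IOCTL exists in the stock Linux frame buffer API, so the driver side has to be written first and both sides must agree on the number. The activity levels and the value for interactive use are assumptions as well; only "no skip for video" and "up to 30 when idle" come from the text above.

```c
#include <sys/ioctl.h>

/* HYPOTHETICAL request number, to be implemented in your own frame buffer
 * driver. 'F' is the magic already used by the standard fb ioctls. */
#define FBIO_HYPOTHETICAL_SET_FRAMESKIP _IOW('F', 0x80, unsigned int)

enum ui_activity { UI_IDLE, UI_INTERACTIVE, UI_VIDEO };

/* Pick a frame skip value for the current situation. */
static unsigned int frame_skip_for(enum ui_activity activity)
{
    switch (activity) {
    case UI_VIDEO:
        return 0;    /* no frame skip while playing video          */
    case UI_INTERACTIVE:
        return 3;    /* arbitrary example value, tune for your GUI */
    default:
        return 30;   /* idle: update the panel controller rarely   */
    }
}

/* Ask the (modified) driver to apply the value.
 * Returns 0 on success, -1 on error with errno set, like ioctl() itself. */
static int fb_set_frame_skip(int fb_fd, unsigned int skip)
{
    return ioctl(fb_fd, FBIO_HYPOTHETICAL_SET_FRAMESKIP, &skip);
}
```

    With fb_fd obtained from open("/dev/fb0", O_RDWR), your own event loop would call fb_set_frame_skip(fb_fd, frame_skip_for(UI_IDLE)) whenever nothing has been touched for a while, and switch back as soon as input arrives.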

    I believe that there is a way to tune directFB in a way to support frame skipping, but I honestly doubt QT would listen. Please, prove me wrong!

    We learned so far:

    1) RGB interface is BAD
    2) CPU interface is GOOD

    The main problem is, and here we come full circle, that no GUI toolkit supports these kinds of features natively. However, if you follow our steps and implement a mini GUI on your own, we assure you, you will not be disappointed. Be the lord of your widget, not the slave of a framework!

    How to do it right then?

    This interesting question will be explained in the next part.

    We hope you have enjoyed our small how to and got some inspiration for your next projects around the MINI2440V2 or TenByTen6410.


    Sven Riemann


