New article –
SMP affinity and proper interrupt handling in Linux

Today I finished working on a new article called “SMP affinity and proper interrupt handling in Linux”. It describes a problem encountered by many system engineers and administrators: how to configure the system to handle interrupts properly.

This article concludes research whose purpose was to find out what can really be done on the subject. I found a few simple approaches that, despite their simplicity, improve the situation drastically. I hope you will find the article useful. You can find the article right here.

SMP affinity and proper interrupt handling in Linux

Introduction

Hardware interrupts have always been expensive. Somehow these small pieces of software consume a lot of CPU power, and hardware and software engineers have always been trying to change this state of affairs. Significant progress has been made, yet hardware interrupts still consume lots of CPU power.

You will rarely see the effects of interrupt handling on desktop systems. Take a look at your /proc/interrupts file. This file lists all of your hardware devices and how many interrupts each of them has delivered to each CPU. If you are on a regular desktop system, you will see that the number of interrupts your computer handles is relatively small. Even powerful servers handling millions of packets per second handle only tens of thousands of interrupts per second. Yet these interrupts consume CPU power, and handling them properly undoubtedly helps to improve the system’s performance.

But really, what can we do about interrupts?

There are many things that can be done. Many Linux distributions ship with kernels that include modifications that significantly improve the situation. Technologies such as NAPI reduce the number of interrupts and the interrupt handling overhead so dramatically that, without them, a modern server probably would not be able to sustain a 1Gbps Ethernet link. NAPI has been part of the kernel for quite some time. Other techniques include interrupt coalescing.

In this article I would like to address one of the most powerful techniques to optimize interrupt handling.

SMP affinity

The term SMP affinity, or processor affinity, has quite a broad meaning and requires an explanation. The word affinity refers to the proximity of a certain task to a certain processor within a multi-processor system. That is, when processor X runs process Y, they are affine to each other. The processor has parts of the process’s memory in its cache, so constantly moving the process to a different processor when scheduling it would probably mean less effective scheduling.

As far as interrupts are concerned, SMP affinity refers to the question of which processor handles a certain interrupt. Unlike with processes, binding interrupts to a certain CPU will most likely cause performance degradation, and here’s why. Interrupt handlers are usually very small. An interrupt’s memory footprint is relatively small, so keeping an interrupt on a certain CPU will not improve cache hits. Instead, multiple interrupts will keep one of the cores overloaded while the others remain relatively free. The scheduler has no idea about this state of affairs. It assumes that our interrupt handling core is as busy as any other core. As a result, you may face bottlenecks, as one of the processes or threads will occasionally run on a core that has only 90% of its power available.

Things may be even worse because core 0 often handles all interrupts by default. On busy systems, interrupts may consume as much as 30% of core 0’s power. Because we assume that all cores are equally powerful, we may find ourselves in a situation where our software system effectively uses only 70% of the total CPU power.

Who’s responsible

The APIC, or Advanced Programmable Interrupt Controller, has been an integral part of all modern x86 based systems for many years, both SP (single-processor) and MP (multi-processor). This component is responsible for delivering interrupts. It also decides which interrupt goes where, in terms of cores.

By default, the APIC delivers ALL interrupts to core 0. This is why /proc/interrupts looks like this on the vast majority of modern Linux systems:

         CPU0     CPU1     CPU2     CPU3
  0:   123357        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
 11:        0        0        0        0  IO-APIC-level  acpi
169:        0        0        0        0  IO-APIC-level  uhci_hcd:usb1
177:        0        0        0        0  IO-APIC-level  qla2xxx
185:        0        0        0        0  IO-APIC-level  qla2xxx
193:    12252        0        0        0  IO-APIC-level  ioc0
209:        0        0        0        0  IO-APIC-level  uhci_hcd:usb2
217:      468        0        0        0  IO-APIC-level  eth0
225:      285        0        0        0  IO-APIC-level  eth1
NMI:      120       66       76       45
LOC:   123239   123220   123187   123065
ERR:        0
MIS:        0

See anything suspicious? Well, CPU0 is handling all hardware interrupts. All of them. This is what you see on a system with misconfigured interrupt SMP affinity.
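If you would rather check for this pattern in a program than by eye, here is a minimal sketch in C. It assumes the line layout shown above: the IRQ number, a colon, one counter per CPU, and then the controller and device names. The function names parse_irq_counts() and only_cpu0_handles() are my own illustration and not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one device line of /proc/interrupts, for example
 *   "217:      468        0        0        0  IO-APIC-level  eth0"
 * and store up to max_cpus per-CPU counters in counts[].
 * Returns the number of counters found, or -1 if the line has no colon.
 */
int parse_irq_counts(const char *line, unsigned long *counts, int max_cpus)
{
    const char *p = strchr(line, ':');
    int n = 0;

    if (p == NULL)
        return -1;
    p++;                              /* skip the colon */

    while (n < max_cpus) {
        char *end;
        unsigned long v = strtoul(p, &end, 10);

        if (end == p)                 /* not a number: the names follow */
            break;
        counts[n++] = v;
        p = end;
    }
    return n;
}

/* Returns 1 when CPU0 handled interrupts on this line and no other CPU did. */
int only_cpu0_handles(const unsigned long *counts, int n)
{
    for (int i = 1; i < n; i++)
        if (counts[i] != 0)
            return 0;
    return n > 0 && counts[0] > 0;
}
```

Pass the number of CPUs from the header line as max_cpus, so that a device name starting with a digit is never mistaken for a counter.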

Simple solution for the problem

A solution for this problem has been around pretty much since the introduction of the APIC. It has several interrupt destination and delivery modes: physical and logical, fixed and low priority, and so on. The important fact is that it is capable of delivering interrupts to any of the cores and can even do load balancing between them.

Its configuration is limited to the first eight cores. That is, if you have more than eight cores, don’t expect any core higher than 7 to receive interrupts.

By default it operates in physical/fixed mode. This means that it delivers a certain interrupt to a certain core. You already know that by default this is core 0. The thing is that you can easily change the core that receives a certain interrupt.

For each IRQ number in the first column of the /proc/interrupts file, there’s a sub-directory in /proc/irq/. That directory contains a file named smp_affinity. Using this file you can change which core handles that interrupt. Reading from this file produces a hexadecimal number, which is a bitmask with a single bit for each core. When a certain bit is set, the APIC will deliver the interrupt to the corresponding core.

Let’s see an example…

#
# cat /proc/interrupts
         CPU0     CPU1     CPU2     CPU3
  0: 19599546        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
 11:        0        0        0        0  IO-APIC-level  acpi
169:        0        0        0        0  IO-APIC-level  uhci_hcd:usb1
177:        0        0        0        0  IO-APIC-level  qla2xxx
185:        0        0        0        0  IO-APIC-level  qla2xxx
193:    95337        0        0        0  IO-APIC-level  ioc0
209:        0        0        0        0  IO-APIC-level  uhci_hcd:usb2
217:   100778        0        0        0  IO-APIC-level  eth0
225:    56651        0        0        0  IO-APIC-level  eth1
NMI:      466      393      422      372
LOC: 19600453 19600434 19600401 19600279
ERR:        0
MIS:        0
#
#
# echo "2" > /proc/irq/217/smp_affinity
# cat /proc/interrupts
         CPU0     CPU1     CPU2     CPU3
  0: 19606722        0        0        0   IO-APIC-edge  timer
  8:        0        0        0        0   IO-APIC-edge  rtc
 11:        0        0        0        0  IO-APIC-level  acpi
169:        0        0        0        0  IO-APIC-level  uhci_hcd:usb1
177:        0        0        0        0  IO-APIC-level  qla2xxx
185:        0        0        0        0  IO-APIC-level  qla2xxx
193:    95349        0        0        0  IO-APIC-level  ioc0
209:        0        0        0        0  IO-APIC-level  uhci_hcd:usb2
217:   101027       49        0        0  IO-APIC-level  eth0
225:    56655        0        0        0  IO-APIC-level  eth1
NMI:      466      393      422      372
LOC: 19607629 19607610 19607577 19607455
ERR:        0
MIS:        0
#

As we can see, once we enter the magical command, CPU1 begins receiving interrupts from eth0 instead of CPU0. The echo command that changed the state of affairs is especially interesting. It is “2” that we’re echoing into the file. Writing “4” to the file would cause the eth0 interrupt to be handled by CPU2 instead of CPU1. As I already mentioned, it is a bitmask where each bit corresponds to a single CPU.
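The mask arithmetic is easy to get wrong by one bit, so here is a small sketch in C that builds the value to write. The helper names cpus_to_mask(), format_mask() and set_irq_affinity() are hypothetical. Only the /proc/irq/<irq>/smp_affinity path comes from the kernel. The sketch assumes no more CPUs than there are bits in an unsigned long, and writing the file requires root privileges.

```c
#include <stdio.h>

/* Build an smp_affinity bitmask: bit N set means CPU N may get the interrupt. */
unsigned long cpus_to_mask(const int *cpus, int n)
{
    unsigned long mask = 0;

    for (int i = 0; i < n; i++)
        mask |= 1UL << cpus[i];
    return mask;
}

/* Format the mask as plain hexadecimal, the form echoed into smp_affinity. */
int format_mask(unsigned long mask, char *buf, size_t len)
{
    return snprintf(buf, len, "%lx", mask);
}

/* Write the mask to /proc/irq/<irq>/smp_affinity. Returns 0 on success, -1 on error. */
int set_irq_affinity(int irq, unsigned long mask)
{
    char path[64];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%lx\n", mask) > 0;
    if (fclose(f) != 0)   /* a write error may only show up when the buffer is flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```

With these helpers, CPU1 alone gives the mask 2, CPU2 alone gives 4, and CPU0 together with CPU1 gives 3, matching the values discussed in this section.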

How about writing “3” into the file? In theory, this should cause the APIC to divert interrupts to CPU0 and CPU1. Unfortunately, things are a little more complicated here. It all depends on whether the APIC works in physical “destination mode” and low priority “delivery mode”. If it does, then you most likely would not be seeing CPU0 handling all interrupts. This is because when the kernel configures the APIC to work in physical/low priority modes, it automatically tells the APIC to load balance interrupts between the first eight cores.

So if CPU0 handles all interrupts by default on your system, this probably means that the APIC is configured ambiguously.

Ultimate solution

First of all, unfortunately there is no choice but to replace the kernel. The software that configures the APIC is part of the kernel, and the APIC-related settings are not configurable, so if we want to change things we have to fix them in the kernel. The only question is: replace the kernel with what?

I tested this with OpenSuSE 10.2, which comes with kernel 2.6.18. Installing kernel 2.6.24.3 (the latest at the moment) with OpenSuSE’s default kernel configuration (/proc/config.gz) fixes the problem. With this kernel, things look like this, right from the start:

# cat /proc/interrupts
         CPU0     CPU1     CPU2     CPU3
  0:   728895   728796   728624   728895  IO-APIC-edge     timer
  8:        0        0        0        0  IO-APIC-edge     rtc
 11:        0        0        0        0  IO-APIC-fasteoi  acpi
 16:        0        0        0        0  IO-APIC-fasteoi  uhci_hcd:usb1
 19:        0        0        0        0  IO-APIC-fasteoi  uhci_hcd:usb2
 24:    14090    14090    14327    14056  IO-APIC-fasteoi  ioc0
 49:        7        9        7        8  IO-APIC-fasteoi  qla2xxx
 50:        8       12       11       10  IO-APIC-fasteoi  qla2xxx
 77:     2849     2759     2841     2827  IO-APIC-fasteoi  eth0
 78:    25072    25138    24996    24980  IO-APIC-fasteoi  eth1
NMI:        0        0        0        0
LOC:  2915270  2915256  2915228  2915092
ERR:        0

Looks good, doesn’t it? All cores handle interrupts, thus working with maximum efficiency. Now how about getting this result with just any kernel version? It appears to be doable.

There’s a kernel configuration option that stands in our way. Once it is removed, you will get a similar picture with probably any kernel newer than 2.6.10. The option is CONFIG_HOTPLUG_CPU. It adds support for hot-pluggable CPUs. It appears that turning this option off makes the kernel configure the APIC properly.

Actually, it is quite understandable. You see, the APIC has to be told which processors should receive interrupts. You need an additional piece of code that tells the APIC how to handle processor removals, and processor removal is one of the things that CONFIG_HOTPLUG_CPU allows you to do. I assume that this functionality was missing from earlier kernels and was added by 2.6.24.3.

Conclusion

We saw that we can achieve really nice results with some modifications to the kernel configuration. On a very busy system, this small configuration change can boost the server’s productivity by a large margin.

I hope you will find this information useful and use the techniques I described in this article.

WordPress 2.5 trials – failure

A few days ago I started testing WordPress 2.5. Those of you following my web-site could probably see themes changing every few hours, articles becoming screwed up, etc. Unfortunately, I am writing this post using WordPress 2.3.3. I downgraded back to the older version.

I absolutely loved the look and feel of the new version. The administrator panel is a candy. However, one thing ruined everything: the editor. WordPress 2.5 comes with TinyMCE 3. It has an annoying bug: it removes any extra space or tabulation character, even inside <pre> tags. This creates a huge problem for someone who posts pieces of code, someone such as myself. It removes indentation from the code, making it impossible to read. Pity.

Luckily I created backup copies and now the web-site is back online.

How to obtain a unique thread identifier on Linux

For some reason this topic never got enough attention in libc. The POSIX threads library does address this issue, however what starts in the POSIX library stays in the POSIX library. pthread_self() and friends will get you an identifier that is unique across your program, but not across your system. Although a thread is a system object, the system is unaware of the identifier the POSIX library allocated for the thread. Thus the thread identifier allocated by the POSIX library does identify your thread within the boundaries of your program, yet everyone else knows nothing about this identifier and its meaning.

New article – How debugger works

In this article I show how a real debugger works and even demonstrate a small program that debugs another program. It’s right here and I hope you’ll find it interesting.

How debugger works

Introduction

In this article, I’d like to tell you how a real debugger works: what happens under the hood and why it happens. We’ll even write our own small debugger and see it in action.

I will talk about Linux, although the same principles apply to other operating systems. Also, we’ll talk about the x86 architecture, because it is the most common architecture today. On the other hand, even if you’re working with another architecture, you will find this article useful because, again, the same principles work everywhere.

Kernel support

Actual debugging requires operating system kernel support, and here’s why. Think about it. We’re living in a world where one process reading memory belonging to another process is a serious security vulnerability. Yet, when debugging a program, we would like to access memory that is part of the debugged process’s (the debuggee’s) memory space from the debugger process. It is a bit of a problem, isn’t it? We could, of course, try somehow to use the same memory space for both debugger and debuggee, but then what if the debuggee itself creates processes? This really complicates things.

Debugger support has to be part of the operating system kernel. The kernel is able to read and write memory that belongs to each and every process in the system. Furthermore, as long as a process is not running, the kernel can see the values of its registers, and the debugger has to be able to learn the values of the debuggee’s registers. Otherwise it won’t be able to tell you where the debuggee has stopped (when we press CTRL-C in gdb, for instance).

While discussing where debugger support starts, we already mentioned several of the features that we need in order to have debugging support in the operating system. We don’t want just any process to be able to debug other processes. Someone has to monitor debuggers and debuggees. Hence the debugger has to tell the kernel that it is going to debug a certain process, and the kernel has to either permit or deny this request. Therefore, we need the ability to tell the kernel that a certain process is a debugger and that it is about to debug another process. We also need the ability to query and set values in the debuggee’s memory space. And we need the ability to query and set the values of the debuggee’s registers when it stops.

And the operating system lets us do all this. Each operating system does it in its own manner, of course. Linux provides a single system call named ptrace() (defined in sys/ptrace.h), which allows us to do all these operations and much more.

ptrace()

ptrace() accepts four arguments. The first is one of the values from enum __ptrace_request, which is defined in sys/ptrace.h. This argument specifies what operation we would like to perform, whether it is reading the debuggee’s registers or altering values in its memory. The second argument specifies the pid of the debuggee process. It’s not very obvious, but a single process can debug several other processes, so we have to tell exactly which process we’re referring to. The last two arguments are optional arguments for the call.

Starting to debug

One of the first things a debugger does to start debugging a certain process is attaching to it or running it. There is a ptrace() operation for each of these cases.

The first, called PTRACE_TRACEME, tells the kernel that the calling process wants its parent to debug it. That is, me calling ptrace( PTRACE_TRACEME ) means I want my dad to debug me. This comes in handy when you want the debugger process to spawn the debuggee. In this case you fork() to create a new process, call ptrace( PTRACE_TRACEME ) in it and then call exec() or execve().

The second operation is called PTRACE_ATTACH. It tells the kernel that the calling process should become the debugging parent of the target process. Debugging parent means a debugger and a parent process.
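Attaching is not shown in the listings of this article, so here is a minimal sketch of it in C. The function name attach_and_detach() is hypothetical. The sketch assumes that the caller is allowed to trace the target, which on many systems means the target is its own child or the caller is root.

```c
#include <stdio.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Attach to an already running process, wait until it stops, then detach.
 * Returns the signal that stopped the target (normally SIGSTOP), or -1 on error.
 */
int attach_and_detach(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("ptrace(PTRACE_ATTACH)");
        return -1;
    }

    /* The kernel sends SIGSTOP to the target; wait until it has really stopped. */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    if (!WIFSTOPPED(status)) {
        fprintf(stderr, "target did not stop\n");
        return -1;
    }

    /* A real debugger would inspect memory and registers at this point. */

    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1) {
        perror("ptrace(PTRACE_DETACH)");
        return -1;
    }
    return WSTOPSIG(status);
}
```

After PTRACE_DETACH the target simply continues running as if nothing had happened.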

Debugger-debuggee synchronization

Alright. Now we have told the operating system that we are going to debug a certain process. The operating system made it our child process. Good. This is a great time for us to have the debuggee stopped while we make preparations before we actually start to debug. We may want to, for instance, analyze the executable that we run and place breakpoints before we actually start debugging. So, how do we stop the debuggee and let the debugger do its thing?

The operating system does that for us using signals. Actually, the operating system notifies us, the debugger, about all kinds of events that occur in the debuggee, and it does all that with signals. This includes the “debuggee is ready to shoot” signal. In particular, if we attach to an existing process, it receives SIGSTOP and we receive SIGCHLD once it actually stops. If we spawn a new process and it did ptrace( PTRACE_TRACEME ), it will receive a SIGTRAP signal once it attempts to exec() or execve(). We will be notified with SIGCHLD about this, of course.

A new debugger was born

Now let’s see code that actually demonstrates that. The complete listing can be found here.

The debuggee does the following…

.
.
.
    if (ptrace( PTRACE_TRACEME, 0, NULL, NULL ))
    {
        perror( "ptrace" );
        return;
    } 

    execve( "/bin/ls", argv, envp );
.
.
.

Note the ptrace( PTRACE_TRACEME ) followed by execve(). This is what real debuggers do to spawn the process that is going to be debugged. As you know, execve() replaces the executable image and memory of the current process with the executable and memory space of the program being execve()’d. Once the kernel finishes this operation, it sends SIGTRAP to the calling process and SIGCHLD to the debugger. The debugger receives the appropriate notifications via signals and via wait() returning. Here is the debugger’s code.

.
.
.
    do {
        child = wait( &status );
        printf( "Debugger exited wait()\n" );
        if (WIFSTOPPED( status ))
        {
            printf( "Child has stopped due to signal %d\n",
                WSTOPSIG( status ) );
        }
        if (WIFSIGNALED( status ))
        {
            printf( "Child %ld received signal %d\n",
                    (long)child,
                    WTERMSIG(status) );
        }
    } while (!WIFEXITED( status ));
.
.
.

Compiling and running listing1.c produces the following output:

In debuggee process 14095
In debugger process 14094
Process 14094 received signal 17
Debugger exited wait()
Child has stopped due to signal 5

Here we can clearly see that the debugger indeed receives a signal and gets notified via wait(). If we want to place a breakpoint before we start to debug the process, this is our chance. Let’s talk about how we can do something like that.

The magic behind INT 3

It is time to dig a bit into a subject that most programmers do not adore: assembly language. I am afraid we don’t have much choice, because breakpoints work at the assembly level.

We have to understand that each compiled program is actually a set of instructions that tell the CPU what to do. Some of our C expressions are translated into a single instruction, while others may be translated into hundreds or even thousands of instructions. Instructions may be bigger or smaller: from 1 byte up to 15 bytes long on modern CPUs (Intel x86_64).

Debuggers mostly operate at the CPU instruction level. The fact that gdb understands C/C++ code and allows you to place breakpoints at a certain C/C++ line is only an enhancement over gdb’s basic ability to place breakpoints on a certain instruction.

There are several ways to place breakpoints. The most widely used is the INT 3 instruction. It is an instruction with a single byte operation code that, once reached by the CPU, tells it to call a special breakpoint interrupt handler, provided by the operating system during its initialization. Since the INT 3 operation code is so small, we can safely substitute any instruction with it. Once the operating system’s interrupt handler is called, it figures out which process reached a breakpoint and notifies it and its debugging process via signals.

Breakpoints hands on

Let’s return to our debuggee/debugger friends. As we mentioned, the debugger does have a chance to place a breakpoint before letting the debuggee process run. Let’s see how this can be done.

Breakpoints are placed with the INT 3 instruction. Before writing the actual 0xcc (the INT 3 operation code), we should figure out where to place the instruction. For the purpose of this article we will do it manually. Real debuggers, in contrast, include complex logic that calculates where and when to place breakpoints. gdb places several breakpoints by itself, without you even knowing about it. And obviously it has functionality that places breakpoints once you ask it to do so.

In our previous example we had our debuggee process executing ls. It is not suitable for our next demonstration. We need a sample program that lets us easily demonstrate breakpoints in action. Here it is.

#include <stdio.h>

int main()
{
        printf( "~~~~~~~~~~~~> Before breakpoint\n" );
        // The breakpoint
        printf( "~~~~~~~~~~~~> After breakpoint\n" );

        return 0;
}

And here is the disassembler output of the main() routine.

0000000000400508 <main>:
  400508:       55                      push   %rbp
  400509:       48 89 e5                mov    %rsp,%rbp
  40050c:       bf 18 06 40 00          mov    $0x400618,%edi
  400511:       e8 12 ff ff ff          callq  400428 <puts@plt>
  400516:       bf 2a 06 40 00          mov    $0x40062a,%edi
  40051b:       e8 08 ff ff ff          callq  400428 <puts@plt>
  400520:       b8 00 00 00 00          mov    $0x0,%eax
  400525:       c9                      leaveq
  400526:       c3                      retq

We can see that if we place a breakpoint at address 0x400516, we will see one printout before reaching the breakpoint and another right after it. For the sake of our demonstration, we will place a breakpoint at this address. Once we reach the breakpoint, we will sleep and then let the debuggee continue running. We should see the debuggee producing the first printout, then sleeping for a few seconds and then producing the second printout.

We’ll achieve our goal in several steps.

  1. First of all, we should fork() off the debuggee. We already did something similar.
  2. The next step is to intercept the execve() call in the debuggee. Been there, done that.
  3. Here’s something new. We should modify the byte at address 0x400516 from 0xbf to 0xcc, saving the original value (0xbf). This is how we place the breakpoint.
  4. Next, we’re going to wait() for the process. Once it reaches the breakpoint, we’ll be notified.
  5. Once the debuggee reaches the breakpoint, we want to restore the code we broke with our 0xcc to its original state.
  6. In addition, we want to fix the value of the RIP register. This register tells the CPU the memory location of the next instruction to execute. Its value will be 0x400517, one byte past the 0xcc that we placed. We want to set the RIP register to 0x400516 because we don’t want the CPU to skip over the MOV instruction that we broke with our 0xcc.
  7. Finally, we want to wait five seconds for the sake of demonstration and let the debuggee continue running.

First things first. Let’s see how we do step 3.

.
.
.
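        addr = 0x400516;

        /* Read the word at addr and remember it so we can restore it later. */
        data = ptrace( PTRACE_PEEKTEXT, child, (void *)addr, NULL );
        orig_data = data;
        /* Replace the lowest byte with 0xcc (INT 3) and write the word back. */
        data = (data & ~0xff) | 0xcc;
        ptrace( PTRACE_POKETEXT, child, (void *)addr, (void *)data );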
.
.
.

Again, we can see how ptrace() does the job for us. First we peek 8 (sizeof( long )) bytes from address 0x400516. On some architectures this could cause lots of headaches because of unaligned memory access. Luckily, we’re on x86_64, where unaligned memory accesses are permitted. Next we set the lowest byte to 0xcc, the INT 3 instruction. Finally, we write the 8 bytes back to their place.
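To make the byte manipulation explicit, here it is in isolation as a tiny sketch in C. The names word_with_breakpoint() and saved_opcode() are my own illustration. The sketch assumes a little-endian machine with 64-bit long, such as x86_64, where the byte at the lowest address is the least significant byte of the word.

```c
/* INT 3 is the one-byte opcode 0xcc. */
#define INT3_OPCODE 0xccUL

/*
 * Given the word read with PTRACE_PEEKTEXT, return the word to write back
 * with PTRACE_POKETEXT: the lowest byte becomes INT 3, the rest is unchanged.
 */
unsigned long word_with_breakpoint(unsigned long word)
{
    return (word & ~0xffUL) | INT3_OPCODE;
}

/* The original opcode byte that the breakpoint overwrites. */
unsigned char saved_opcode(unsigned long word)
{
    return (unsigned char)(word & 0xffUL);
}
```

For the eight bytes that start at 0x400516 in the disassembly (bf 2a 06 40 00 e8 08 ff), the word is 0xff08e80040062abf and the patched word is 0xff08e80040062acc.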

We’ve seen how we can wait for a certain event in the debuggee. We also now know how to restore the original value at address 0x400516. So we can skip over steps 4-5 and jump right into step 6. This is something that we haven’t done so far.

What we have to do is read the debuggee’s registers, change them and write them back. Again, ptrace() does the whole job for us.

.
.
.
        struct user_regs_struct regs;
.
.
.
        ptrace( PTRACE_GETREGS, child, NULL, &regs );
        regs.rip = addr;
        ptrace( PTRACE_SETREGS, child, NULL, &regs );
.
.
.

Things are not too well documented here. For instance, the ptrace() documentation never mentions struct user_regs_struct, yet this is what the ptrace() system call expects to receive in the kernel. Once we know what we should use as ptrace() arguments, it is easy. We use the PTRACE_GETREGS operation to obtain the values of the debuggee’s registers, modify the RIP register and write them back with the PTRACE_SETREGS operation. Clear and simple.

Let’s see how things actually work. You can find the complete listing of the debugger process here. Compiling and running listing2.c produces the following output.

In debuggee process 29843
In debugger process 29842
Process 29842 received signal 17
~~~~~~~~~~~~> Before breakpoint
Process 29842 received signal 17
RIP before resuming child is 400517
Time before debugger falling asleep: 1206346035
Time after debugger falling asleep: 1206346040. Resuming debuggee...
~~~~~~~~~~~~> After breakpoint
Process 29842 received signal 17
Debuggee exited...
Debugger exiting...

You can see that the “Before breakpoint” printout appears 5 seconds before the “After breakpoint” printout. The “RIP before resuming child is 400517” line clearly indicates that the debuggee stopped at address 0x400517, as we expected.
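The same steps can also be tried without hard-coding an address. The sketch below is not listing2.c. It is a simplified variant that assumes x86_64 Linux and skips execve(): the debuggee is a fork()ed copy of the debugger, so the hypothetical function target() lives at the same address in both processes. The helper names give_up() and run_breakpoint_demo() are also my own, and the five-second sleep is left out.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Hypothetical debuggee function; the breakpoint goes on its first byte. */
static int target(void)
{
    return 42;
}

/* Kill and reap the debuggee when the debugger gives up. */
static int give_up(pid_t child)
{
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return -1;
}

/*
 * Fork a debuggee, stop it at target(), undo the breakpoint and resume it.
 * Returns the debuggee's exit code (42 on success) or -1 on any error.
 */
int run_breakpoint_demo(void)
{
    int (*volatile call_target)(void) = target;  /* forces a real call */
    void *addr = (void *)target;
    struct user_regs_struct regs;
    long orig;
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        /* Debuggee: ask to be traced, then stop until the debugger is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(call_target());
    }

    /* Debugger: wait for the debuggee's initial stop. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return give_up(child);

    /* Step 3: save the original word, then put 0xcc into its lowest byte. */
    errno = 0;
    orig = ptrace(PTRACE_PEEKTEXT, child, addr, NULL);
    if (orig == -1 && errno != 0)
        return give_up(child);
    if (ptrace(PTRACE_POKETEXT, child, addr,
               (void *)((orig & ~0xffL) | 0xcc)) == -1)
        return give_up(child);

    /* Step 4: resume the debuggee and wait for the SIGTRAP raised by INT 3. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1 ||
        waitpid(child, &status, 0) == -1 ||
        !WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return give_up(child);

    /* Step 5: restore the original code. */
    if (ptrace(PTRACE_POKETEXT, child, addr, (void *)orig) == -1)
        return give_up(child);

    /* Step 6: RIP is one byte past the breakpoint; move it back. */
    if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == -1 ||
        regs.rip != (unsigned long)addr + 1)
        return give_up(child);
    regs.rip = (unsigned long)addr;
    if (ptrace(PTRACE_SETREGS, child, NULL, &regs) == -1)
        return give_up(child);

    /* Step 7, without the sleep: let the debuggee finish. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1 ||
        waitpid(child, &status, 0) == -1)
        return give_up(child);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling target() through a volatile function pointer keeps the compiler from optimizing the call away, so the debuggee really executes the patched byte.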

Single steps

After seeing how easy it is to place a breakpoint, you can guess that stepping over one line of C/C++ code is simply a matter of placing a breakpoint on the next line of code. This is exactly what gdb does when you want it to single step over some expression.

Conclusion

Debuggers and how they work are often associated with some kind of magic.

Debuggers, and gdb as an example, are exceptionally complicated pieces of software. Placing breakpoints and single stepping are only a small fraction of what gdb is able to do. gdb in particular works on dozens of hardware architectures. It supports remote debugging. It is perhaps the most advanced and complicated executable analyzer. It knows when a program loads a dynamic library and analyzes the code of that library automatically. It supports a bunch of programming languages, from C/C++ to Ada. And these are just a few of its features.

On the other hand, we’ve seen how easy it is to start debugging a certain process, place a breakpoint, etc. The basic functionality that allows debugging is in the operating system and in the CPU, waiting for us to use it.

New article –
Irex Technologies Iliad, more than a year together

I finally finished writing this review. It took me almost a month to complete, but it’s finally here.

I decided to write this review for two reasons. First of all, although there are so many reviews out there, no one has tracked the device and its development or evolution for over a year. I did. Also, there is a personal story of using Irex Technologies Iliad that I would like to tell.

Hope you’ll find it useful. Please feel free to leave comments. Here’s the link.

Irex Technologies, Iliad – More than a year together

Introduction

I decided to write this review for two reasons. First of all, although there are so many reviews out there, no one has tracked the device and its development or evolution for over a year. I did. Also, there is a personal story of using Irex Technologies Iliad that I would like to tell.

History

I started tracking the development of e-Ink or e-Paper based devices sometime in 2002 or 2003, when I heard about this wonderful technology for the first time. The prospect of having a screen with paper-like DPI thrilled me. I respect paper books, yet in terms of convenience and ease of use, analogue media has already lost to digital media in nearly every possible aspect. Consider MP3, personal media players with movie playing capabilities, etc. Yet the one direction in which digital media didn’t make much progress was reading and books. LCD screen based e-book readers lack many features that are a must in devices of this kind: readability, battery life, etc. As far as readability is concerned, I assume you have either experienced headaches yourself or heard about people suffering from this problem when reading from an LCD screen. Not to mention the endless problem of reading in sunlight. On the other hand, constantly emitting light, as LCD screens do, causes high power consumption and thus raises the requirements for the power source, the battery. The problem of power consumption is even worse when combined with the issue of reading in sunlight.

Considering this, no wonder LCD screen based e-book readers didn’t take off. What about other technologies, you may ask? There is no monitor technology that would be suitable for intensive reading. On the contrary, other technologies are even more problematic than LCD. Something had to change and it did, with the introduction of e-Ink screens.

A word on technology

I don’t want to dig too deeply into the technical details of how e-Ink screens work. Yet, at least on paper, the idea is quite simple. The screen consists of positively charged dark particles and negatively charged white particles. A charge is applied to one side of the screen. A positive charge draws the negative particles, pushing the positive particles to the other side of the screen. The same thing, with opposite polarity, happens when a negative charge is applied.

[Figure: e-Ink technology diagram]

Irex Technologies Iliad

After I had waited several years for the new dream gadget, such gadgets finally began to appear. At first they were available in Japan only. Irex Technologies, a Philips spin-off, was the first firm to release an e-Ink based device in Europe. A few months after its initial release it became available in the Middle East and I couldn’t wait any longer. It cost me almost $900 including taxes, delivery and customs paid to the Israeli authorities, who didn’t know how to handle the device and detained it until I paid another $60 and explained to them myself what the hell it was. I didn’t think about the price. I had to get one and I did.

First impression

At first, I was astonished by the simplicity of the packaging. The cool looking box contained only three items: a USB cable, a charger/USB/Ethernet hub and the device itself. I expected to see a manual, yet it wasn’t there. After thinking about it, I concluded that all the documentation should be in the device itself. After all, this is a device for reading texts.

So, I decided to turn it on and see for myself. My first guess was that the On/Off button is the big round button in the top right corner, yet after pushing it for a while I concluded that either the device is broken or the On/Off button is somewhere else. This is when I thought that it might still be a good idea to see a manual before turning it on. Luckily I was next to a computer. I browsed to Irex’s web-site and downloaded the manual. It turned out that the power button is not the round button in the top right corner but a little needle at the bottom edge of the device. After I pulled it, the screen blinked and I saw the e-Ink technology in action for the first time in my life.

Iliad for Dummies

I must say that it is one amazing technology. Well, you can see the pixels, but they are small. And by small I mean really small. You can see the text from any angle and it does look like paper. The surface color is a bright grayish beige. The darkest black appears as a dark gray color.

The device has a resolution of 1024×768 pixels. And since the screen size is only 8 inches (diagonal), you get a nice 160 DPI (Dots Per Inch) or so. This is twice as much as you have on a regular LCD panel and you enjoy every dot per every inch of it :-)
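For the curious, here is a small sketch of where the 160 DPI figure comes from. It assumes square pixels and takes the diagonal as exactly 8 inches, as stated above; the function name pixels_per_inch is just something I made up for the illustration.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density of a screen, assuming square pixels."""
    # Length of the screen diagonal measured in pixels (Pythagoras).
    diagonal_px = math.hypot(width_px, height_px)
    # Divide by the physical diagonal to get pixels per inch.
    return diagonal_px / diagonal_in


# 1024x768 pixels on an 8 inch diagonal: the diagonal is 1280 pixels long.
iliad_dpi = pixels_per_inch(1024, 768, 8.0)
print(round(iliad_dpi))  # 160
```

The same function can be used to compare any other screen, given its resolution and diagonal.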

Second impression

After playing around with the device, I decided to see what kind of settings are available to me. The device’s user interface is very intuitive so it didn’t take me much time to figure out where the goodies are. It turned out that the software version installed on my Iliad was 2.5, even though the specs on the package clearly stated that it was 2.6. Moreover, Irex’s web-site stated that all new units are sold with 2.6. And what’s the first thing you do with a gadget when you know that a newer version of the software is available? Of course, you upgrade. So did I.

I configured the wireless network, plugged the device into the power hub and did a long press on the connect button (the button in the top right corner – more on buttons later). The upgrade went almost smoothly. The only problem was that I had to run the installation twice – the first time it stalled and I had to reset. My prayers were answered because after the reset it booted with good old 2.5. The second attempt went so much better and I ended up with brand new 2.6.

Third impression

After reading a few documents that were already on the device (readme, release-notes, etc :-) ), I decided it’s time to try some of my own content. I had the latest Intel x86 PDF specs and I decided to give them a try. I’ll spare you the story of what happened during the next couple of hours. I think you can guess yourself, knowing that the USB cable I received with the device wasn’t good and I had to use my own – luckily, it is a standard USB to square-USB cable. So after a few hours of headache I was finally reading my beloved x86 spec.

Well… When you think about it, it really shouldn’t surprise anyone. I mean, PDFs made for A4 paper will not render well on an A5 screen. Yet, it was something of a disappointment. On the other hand it really has nothing to do with Iliad, since most e-Ink based devices have a relatively small display. Later, Irex partially solved this problem by adding a zoom-in feature to the PDF viewer.

Something that drew my attention immediately after I turned it on was how long it takes to boot. I mean, my PC usually boots faster than Iliad. By long I mean 40 seconds, more or less. I’ve seen people wondering if this will ever change, and the prognoses on this are quite pessimistic. More on this later.

Features overview

Now let’s say a few words about Iliad’s features. After all, Irex always claimed Iliad is a business oriented device, thus it should be packed with lots of stuff. And actually, it is:

  1. Support for several different formats, PDF, HTML, Mobipocket, etc.
  2. Supports MP3 and has audio jack. The only problem with this feature is that they never implemented it. More on this later.
  3. Touch screen – correct me if I am wrong, but this is the only e-Ink device with touch screen at the moment. More on this later.
  4. Wi-Fi, Ethernet and USB connectivity.
  5. MMC/SD and CF (I/II) slots for more memory.
  6. Exceptionally (for this kind of device) powerful CPU (xScale at 400MHz).
  7. 64MB of RAM and 128MB of internal memory.

Unfortunately, as I mentioned, MP3 support never actually appeared. HTML is indeed supported, but only for a few languages (this is something that changes, so keep up with the updates). This is a serious problem for someone who reads other languages, such as myself.

The touch screen is nice, but here’s something that Irex never mention when they advertise Iliad: e-Ink is quite slow. This means that no matter what you draw, it will appear something like 0.5-1 second after you drew it, and this is really annoying. In my case, it is so annoying that I stopped using Iliad for sketching and writing almost immediately, even though I had hoped it would replace the paper notebook that I often use.

You can easily guess why you need USB connectivity, but Wi-Fi and Ethernet? At the moment, the only feature that uses Wi-Fi and Ethernet is downloading firmware updates. Otherwise they are completely useless. Now obviously there is a reason why Iliad has them, but a year and a half after Iliad was first introduced, we can still only speculate. One possible reason is that Iliad is going to let you browse the Mobipocket book catalog and buy stuff right from the device. We will see.

Of course you need a memory card slot. Of course you don’t need two slots. You could argue that some people may not have an MMC/SD or CF card and that having two slots is convenient, but truly, do you know a person who doesn’t own an MMC/SD card?

Finally, I simply cannot explain why there’s such a powerful CPU in it. You could argue that Irex are probably going to add more features that will raise CPU usage, but nothing like this has happened since Iliad was first released. And I doubt it ever will. Actually, after reading the release notes each time they released a software upgrade, I noticed that Irex spent a great deal of time trying to keep the CPU running at 100MHz most of the time (to reduce power consumption). Go figure.

Having a too powerful CPU, useless Ethernet and Wi-Fi controllers, a useless memory card slot and a touch screen makes the price of the device skyrocket. No wonder this is one of the most expensive e-Ink based devices at the moment.

The backside thing

Iliad Backside

Take a look at the picture. What is it that you see in it? This is the backside of the 1st edition of Iliad. The backside seems to be designed for some stand or holding device. Yet there is neither a holding device nor a stand for the Irex Iliad.

Moreover, in September 2007 Irex announced a new edition of Iliad. Let me quote “The iLiad 2nd edition features a fully redesigned backside that looks more elegant and also provides more stability when the iLiad is laying on a surface”.

Now I must say that this whole backside thing sounds, smells and looks like a complete joke. I suppose there are things that you can do with the backside of the device. Things like adding a retractable leg that would let Iliad stand rather than lie flat. Yet these features bypassed Iliad. Instead it features a useless backside design that was redesigned in the 2nd edition. Unfortunately, I couldn’t find pictures of the backside of the 2nd edition.

Buttons

One of the things Irex advertise is the flip button used to turn pages. I certainly agree that the button is indeed very convenient. Yet, it may start getting jammed a bit, sometimes causing Iliad to interpret a single press as a long press, turning five pages instead of one. And it is exceptionally annoying if all you wanted to do is to turn to the next page and instead you jump five pages ahead and try to get back, page by page, but then it suddenly jumps five pages back and eventually you end up two pages back instead of one page forward (other combinations are possible, taking you to any page within a ten page range but not to the next page). I can’t tell if this is a software or a hardware problem. I also can’t tell you if it will happen to you too. It did happen to me.

Other buttons are handy. I suppose you’ve seen them in pictures. There are four dedicated buttons that take you to different sections of the device’s memory – the news, books, docs and notes buttons. You can configure the folder to which you’ll be taken once you press each of these buttons. Pressing any of these buttons twice will take you to the last document you’ve read from that category.

Below the flip button, there are the next and prev buttons and the enter button. Above the flip button there’s a menu button that takes you to the main menu of the device and an up button that takes you one level higher in the menu.

You are probably wondering what the next and prev buttons are needed for. After all, there’s a flip button. Actually, next and prev complement the flip button. In a menu, the flip button lets you step through pages of menu items (six items per page), while the next and prev buttons let you walk through the menu item by item.

The power button is a completely different story. It is exceptionally inconvenient. You really must have your fingernails in good shape to turn Iliad on. Furthermore, not only do you have to reach that needle, but you also have to hold it for a second or two. There is absolutely no chance of pressing it accidentally. Yet you still have to hold it to turn the device on. This is perhaps the most annoying imperfection that Irex allowed themselves.

Software

Irex Iliad is a Linux based device. Its interface is built using, believe it or not, the X Window toolkit. From one point of view, this should ease software development. From another, it gives the interface a slightly oldish look and feel – considering the monochromatic nature of the device, that might be an advantage.

The Iliad’s interface is relatively well thought out. There are eight items in the main menu:

  1. Iliad Settings
  2. Iliad Profiles
  3. Reference Materials
  4. Recent Documents
  5. CF Card
  6. MMC Card
  7. USB Stick
  8. Main Memory

As I already mentioned, there are four dedicated buttons that take you into the news, books, docs or notes folders on either the CF card, MMC card, USB stick or main memory (default). You can also use the main menu to get to one of these.

Anyway, things seem to be more or less good. The only thing that I didn’t like was the operation of the menu itself. First, it is too bulky on the screen. There is absolutely no reason to limit the number of items appearing on such a large screen to six – compare with the Sony PRS-50X, which squeezes tens of items onto a smaller screen. The other thing that I didn’t like very much is that the menu is so slow. I mean, with so much RAM and CPU power, Iliad could render every possible menu page in memory long before I attempt to access it and only show the picture once the menu is accessed. Instead it takes over a second, and sometimes even more, to open a menu item.

On the other hand, Irex tend to solve this kind of problem, and in this particular case I think it will be solved in time.

Ergonomics

One of the problems with all the e-Ink based devices that I’ve seen is that they are not comfortable to hold. Although this is mostly a hand-held device, Iliad in particular doesn’t have much space for a grip. This is bad for people with big hands, such as myself. Iliad is also not very comfortable for people with small hands because it is quite heavy. It seems that Irex didn’t put too much effort into thinking about how people are going to hold it.

This reinforces the negative impression I got when I was looking for the power button. As far as ease of use and ergonomics are concerned, Irex could have done a much better job.

To be fair, I admit that I haven’t seen other companies putting much effort into solving this problem either.

The power thing

Irex are doing an excellent job keeping the device working for as long as possible. Also, this is one of the things that improves over time. When I received Iliad it was working for 6-8 hours. Now, with version 2.11, it’s supposed to be 12. Yet, just to make things a little bitter, I must remind you that Irex were selling 20 hours non-stop. They are no longer advertising anything like this today, but this is what I paid for.

Still, having a device like this working for 10 or even more hours straight is almost too much to ask.

The boot time thing

Iliad boots in 40 seconds. This is absolutely awful. Perhaps they planned for people to keep Iliad on for the whole day. I don’t know. I was mostly using it for periods ranging from a couple of minutes to a couple of hours, and while in the latter case waiting 40 seconds for it to boot is more or less acceptable, waiting 40 seconds just to read for two minutes (or actually what is left of them) is too much.

I encountered several discussions regarding this issue on different forums on the web. The final verdict, delivered by Irex themselves, was that it is impossible to shorten the boot time. This is mostly due to the number of hardware devices on board that have to be initialized. Also, it is impossible to implement suspend/resume features because the hardware was never designed for something like this. I am not sure this is completely true, but there is nothing we can do about it. The 2nd edition of the device didn’t bring the long awaited salvation. 40 seconds boot time it is.

Upgrades

Once every couple of months Irex release a new version of the software for Iliad. Although it usually brings nothing exceptionally new to your Iliad experience, most of the time it does include nice improvements here and there. Over time, Iliad’s battery life got better. New features, such as the zoom-in that I already mentioned and landscape mode, have appeared. It now supports Mobipocket, which is a very nice addition to the overall feature set.

One of the things that significantly improved the Iliad experience was a faster page rendering time in the PDF viewer. As I mentioned, e-Ink screens are very slow. It takes the screen almost a second to redraw itself. Irex made software changes to make this process much faster.

SDK for Iliad

One of the things Irex promised and delivered was the release of an SDK for Iliad. Iliad is a Linux based device, running Linux kernel 2.4. It took them some time, but eventually the SDK is here and the results can already be seen – take the Sudoku game for Iliad, for example.

The release of the SDK makes me believe that MP3 support will arrive – and if Irex won’t add MP3 support, someone else will do it for them.

Last time I checked there were some problems installing home brew Iliad software – the process isn’t standardized yet. However, things are changing and I believe more home brew software will appear and it will become much easier to install.

Technical support

I applied for technical support twice. In both cases it was an exceptionally slow and tedious process.

I’ve heard about people who were luckier than me. I think it depends on how you buy it and whether there is a local distributor in your country. If you’re wondering how tech support is in your country, you’d better check the local forums and see what people are saying.

In case there is no local distributor in your country and you intend to buy directly from Irex, the technical support process is awful. If something goes wrong with your Iliad and it has to be repaired, they send you a special box. You should place your Iliad into the box and send it back to them. Then, a few weeks later, they will send you a replacement unit. No, they won’t repair your unit. They will send you someone else’s repaired unit. And they won’t delete the information from your Iliad before sending it to someone else, so make sure you don’t have any sensitive information on your Iliad before sending it in for repair. In addition, you may run into problems with your local tax authorities – most likely they have no idea what an Irex Iliad is and why you and Irex Technologies LTD. want to exchange expensive goods without paying any customs.

I did it twice (see “Iliad vs. me” below) and in both cases the whole process took a month and a half.

Their technical support personnel aren’t exactly listening to you (reading your emails). Once they understand what the problem is, they follow a procedure. They have procedures for every occasion and they will follow the procedure, ignoring any of your attempts to speed things up. And you do want to speed things up because it is really slow.

Finally, luckily, it won’t cost you money (except for customs) if your Iliad is under warranty. Otherwise you will have to buy a “Repair Voucher” that can cost as much as 112$.

Conclusions

I think I covered most of the interesting aspects of Irex Iliad. In my opinion the technology is absolutely amazing. Yet so far the Irex Iliad has failed to deliver what Irex are selling. Perhaps this can be excused by the fact that the technology is so new. Hopefully one day they will overcome the problems that Iliad has and it will become a killer device. They are making some progress (although I am not sure that the direction is right). In the meantime it is up to you to decide whether to spend your money on the device.

Iliad vs. me.

As a postscript to this review, I would like to tell you what happened to me personally.

I bought Iliad the second it became available here, in Israel. I paid a small fortune for it. At first I liked it very much, but after two months two horizontal stripes appeared in the middle of the screen. The stripes didn’t disappear when the device was turned off and on. I ignored them, but then they became wider and eventually I had to turn to Irex’s technical support.

First stripe

It took two months to complete the box/replacement unit cycle. Most of that time I spent without Iliad, reading books on my old Pocket PC. I paid 30$ to the Israeli customs authorities when I received the box for the first time – this is because they decided to check what kind of item costs 1$ and detained it. I had to pay a 30$ storage fee (when they detain things, they store them).

I asked technical support, several times, what caused this problem and if there is a way to avoid this problem in the future. They ignored all my questions.

There was one problem with the replacement unit – the stylus for the touch screen was very loose in its slot. It fell out of the slot whenever you simply turned Iliad upside down. I thought it was nothing, until one day I lost it. You can use Iliad without a stylus if you only need it for reading, but there are some things that require that darn stylus. For instance, you can’t change network settings without it.

I decided not to buy a new stylus because it was simply too expensive. Consider paying 50$ for a cheap, crappy piece of plastic that you are going to lose anyway. It is cheaper now. Only 30$. Yet still too expensive for me.

Three months after I received the replacement unit, once again, a horizontal stripe appeared in the middle of the screen. It widened with time, so I applied for technical support again. This time I wasn’t too patient and got almost blunt in my emails to them. I don’t know if it had anything to do with my bluntness or it was bad luck, but this time the cycle took months, again.

Second stripe

The replacement unit I received obviously wasn’t configured to work with my home wireless network. Since I didn’t have a stylus to configure it, I got stuck with version 2.10 of the software. Still, I kept using it, until three months ago a vertical stripe appeared in the middle of the screen. This time the 1 year warranty was over, so I decided not to apply for technical support but to write this review instead.

Third stripe

At the moment I’m saving some money to buy myself a new e-book reader. I haven’t decided which one to buy yet, but it certainly won’t be the Irex Technologies Iliad.

I hope you found this article useful. Feel free to leave comments.

Alexander Sandler.

New article – About this web-site

Internet Explorer Shield

Just finished working on a new article – About this web-site. It explains the history of this web-site and what tools have been used to create it. Grab it here.

About this web-site

I’ve been running this web-site for something like five years. Well… Not this one in particular, but many different sites with different domain names, different CMS systems, etc.

I guess it started because I actually wanted to do web design. I started learning Photoshop back in 1997-1998, and by 2002 I had become quite proficient with it. As proficient as an amateur can be, of course. Also, HTML is a simple language and I used it to create simple web-pages for many years. Actually, there’s a page that I created back in 1997, still hosted somewhere on the internet. Imagine how amazed I was when I found it a couple of years ago. By 2002 I was, again, proficient enough to do more complex web-sites. I learned PHP and JavaScript and knew how to do some nice things with these tools. Eventually, after using some free hosting solutions, I concluded that paid hosting is a must and purchased a hosting plan. Obviously I had to register a domain name. Back then it was asandler.org. Later, when I managed to come up with something that could be opened to the public, I tried to register asandler.com. It turned out that it was already taken by Adam Sandler. Anyway, now it’s just someone trying to sell something – I doubt it has anything to do with Adam anymore.

As I mentioned, after a year or so I managed to come up with a design and content that could actually be shown to someone. Later, I concluded that the content was useless and threw it in the garbage. The same happened to the design and the PHP/JavaScript code backing it up.

Later, I figured out that the true reason I was doing this web-site was that I wanted to create a web-site. Note the difference between having a web-site and building one. Once I had created a design and written the PHP code to back it up, I became bored and threw everything away. It happened several times in a row. Once I even wrote a complete CMS system. One with a built-in WYSIWYG editor (TinyMCE) and a database driven file-system. I think I have this code somewhere on my hard-drive.

Two years ago, after being asandler.org, asandler.net, obosnuy.org, obosnuy.com, etc, I registered alexandersandler.net. I registered it on dyndns. This allowed me to do some advanced configuration with the domain, including redirecting some sub-domains to my home computer and the primary domain to my web-site. A few months later I decided to stop playing around and to do a site that would stay. I went shopping for a CMS system that could meet my demands. Back then, all I wanted was to manage a few documents online, do a little theme and that’s it. I was quite terrified by the number of features some CMS systems have. I really wanted something simple, that would just do those simple tasks I wanted it to do. Eventually, I found Etomite. Etomite is a simple CMS system that allows you to write your own plug-ins, easily create themes and manage your content. Everything is done online, using the administrator panel.

It turned out that doing themes for Etomite is a very simple task. Basically, it is like having several PHP files included from one PHP file to create a complete web-page. I.e. one PHP file contains only the header, a second contains the side-bar, etc. I created several themes, some of which were quite nice. Also, I started working on some content, again. This is when I started the g15dods page. Eventually, after playing around with Etomite for about a year, I started seeing its problems and disadvantages.

In my opinion, one major problem with Etomite is that its community is very small. You can write plug-ins for it, of course. And as with themes, writing plug-ins for it is very easy. Yet for some reason, when I started to look for something useful, I couldn’t find it. I was looking for a gallery plug-in – I found a whole bunch of them, but no one had touched them since 2005, meaning nobody maintained and fixed them. Also, I wanted an Ajax gallery. There is none. Same with comments. I wanted to allow comments on my web-site, yet I couldn’t find a working comments plug-in. When I asked about it on a forum, it turned out that I actually needed a guest book plug-in, which with some effort could be turned into an inconvenient comments plug-in. And forget about Ajax.

So I sat down to create another theme. And a few months after it was ready, I decided it was time to move to some other CMS. This time I was looking for something mainstream. Something that would be easy to use, that would have lots of plug-ins and a large community of people using it and enhancing it. Finally, after doing some research, I ended up with two options – Drupal and WordPress.

It really had to be a blog. CMSes in their pure form no longer exist. Instead, every CMS tries to be either a portal or a blog. I didn’t want to install portal software on my web-site, so it really had to be a blog. Among the many blog platforms, I picked the two biggest.

It took me 30 minutes to figure out that it can’t be Drupal. It was really simple to understand that Drupal is way too unintuitive for me. On the other hand, frankly, my demands have changed too. I wanted something that works out of the box, preferably with tons of themes and plug-ins – as time passes I have less and less time to work on themes and plug-ins myself, hence the new demands. Drupal has tons of features, tons of themes and tons of plug-ins, yet it lacks simplicity. WordPress, on the other hand, has them all.

At the moment, my web-site is WordPress based. It didn’t take much time to get used to it and I found a way to make things work my way. At first I started working on a new theme, but over time I figured that I’d better use one of the ready-made themes, especially considering that there are so many of them – creating a theme for WordPress is far from simple. Yet, since I am so impatient about getting things perfect, I often change the theme. Of course, since this is WordPress, I don’t really have to worry about the content – it’s there, no matter what theme I use.

A few things were difficult with WordPress. For instance, I wanted some more advanced features in the WYSIWYG editor that is built into WordPress. These features are available in a plug-in that, on the other hand, breaks the look and feel of the formatted text. Luckily, the plug-in allows you to use your own style for the edited text, so eventually I got it all the way I want it.

I’ll continue adding new paragraphs to this article as this web-site evolves. In the meantime, wish me good luck with WordPress. I hope this setup will last longer than its predecessors.