Thursday, July 1, 2010

Developing for Android: Decaf EC2 Client

“Android would have the spirit of Linux and reach of Windows.”

This is the sentence Andy Rubin, the guy behind Android, used to get the interest of the Google founders. It is also the promise he has made to everyone. So far Android is off to a very good start: according to Mr. Rubin, it sold 10 million devices in its first year (2009) and is now activating 160,000 a day.

For us this 'Linux server in your pocket' meant something. With a device this powerful we could build a tool to manage and monitor your Amazon EC2 cloud. In this article we walk through the main features of Decaf and use them to show how you can develop for Android. Hopefully it will also spark your imagination about what to do with this platform.

Passion for Android

Android is the Linux-based, open source smartphone operating system from Google. It is still a bit 'the new kid on the block', but it grew to 10 million users in its first year. Android is also open: you can do almost anything you want. As a user you decide for yourself what happens with your phone, and as a developer you have the opportunity to do something truly innovative.

So, what can you do with a Linux server in your pocket? What we wanted was an app to maintain our Amazon EC2 cloud. Without our own physical datacenter there is no hardware to replace anymore, and because we can replace our (virtual) hardware instantly, we can handle calamities in a totally different way. On top of that, monitoring services, whether homegrown or commercial, are necessary but way too expensive.

A year ago we started building Decaf, an app to manage and monitor your Amazon EC2 account. We were in a bit of a hurry, because we wanted to enter the Android Developer Challenge. We are very proud to be a finalist, especially with an enterprise app (system administration) in a "consumer competition".

Decaf needs to do a couple of things:

  • manage an Amazon EC2 account (launch instances, create volumes, associate IPs, etc.)
  • monitor the availability of instances
  • show the health of your Amazon EC2 account

We will walk through the core of Decaf, showing the basic building blocks of an Android app; this covers most of what you need to know to get started. For monitoring we need something that runs regularly to check your instances without eating all your resources. And finally, we want charts of the CPU, network and disk I/O of your Amazon EC2 account, readily accessible on your home screen.

Decaf, the app

Before we start talking about building an app, you need some tools. I work on a MacBook with 4 GB of memory; some of my fellow Android developers work on laptops running Ubuntu. To start developing you need two more things: the Android SDK and a development environment. The Android Development Tools for Eclipse are good, so I always use Eclipse, but you can also organize your development around Ant. Though the emulator is very good, you do need a phone. Any HTC phone can easily be used as a developer phone; if you want Google's base Android, I recommend the developer phones. (Both are ok!)

The manifest 

Every Android app needs to declare its intentions: how it acts and what it listens to. You tell the system what you intend to do through the manifest file (AndroidManifest.xml). You have to ask the user for permission to use system facilities like vibration, internet access or GPS. The manifest also declares the Activities and Receivers; more on what those are later.

Excerpt from the Decaf AndroidManifest.xml:

<uses-permission
    android:name="android.permission.VIBRATE" />
<uses-permission
    android:name="android.permission.INTERNET" />
<uses-permission
    android:name="android.permission.WAKE_LOCK" />

<activity
        android:name=".Dashboard"
        android:label="@string/app_name">
    <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
    </intent-filter>
</activity>

<receiver android:name=".OnAlarmReceiver" >
    <intent-filter>
        <action android:name="android.net.conn.CONNECTIVITY_CHANGE" />
    </intent-filter>
</receiver>

The Dashboard Activity

This Activity shows an overview of what you have in your account. It shows the number of instances, images, etc. This activity is not just an index of where to go, it holds valuable overview information for the user. We think that every screen has the responsibility to inform the user of something. Just a screen to get you to another screen is a waste of time and effort.

Decaf interacts with the Amazon EC2 web service through its API, but to create a responsive application we decided to persist the data in a local database. Android offers several ways to store data; we chose SQLite. The app refreshes its data when it expires (configurable in the settings) or when the user explicitly asks for it.
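The refresh rule (reload when the cached data has expired, or when the user asks for it) can be sketched in plain Java. This is a hypothetical illustration, not Decaf's code: `RefreshPolicy`, its parameters and the 60-minute example are assumptions. In the app the last-refresh timestamp would live next to the SQLite data and the expiry would come from the settings screen.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: decides when locally cached EC2 data must be reloaded. */
final class RefreshPolicy {
    private final long maxAgeMillis; // configurable expiry, e.g. taken from the settings

    RefreshPolicy(long maxAgeMillis) {
        if (maxAgeMillis <= 0) {
            throw new IllegalArgumentException("maxAgeMillis must be positive");
        }
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * @param lastRefreshMillis when the cache was last filled (0 if never)
     * @param nowMillis         the current time
     * @param userRequested     true when the user explicitly asked for a refresh
     */
    boolean shouldRefresh(long lastRefreshMillis, long nowMillis, boolean userRequested) {
        if (userRequested || lastRefreshMillis == 0) {
            return true; // explicit refresh, or nothing cached yet
        }
        return nowMillis - lastRefreshMillis >= maxAgeMillis; // has the data expired?
    }

    public static void main(String[] args) {
        RefreshPolicy policy = new RefreshPolicy(TimeUnit.MINUTES.toMillis(60));
        long now = System.currentTimeMillis();
        System.out.println(policy.shouldRefresh(now - TimeUnit.MINUTES.toMillis(10), now, false)); // false
        System.out.println(policy.shouldRefresh(now - TimeUnit.MINUTES.toMillis(90), now, false)); // true
    }
}
```

Keeping the decision in one small function makes it easy to call from every screen before it queries the local database.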

We encapsulated the databases with a Content Provider. The reason was stability: in a multithreaded/multitasking environment it is easier to manage the resources in one single place. For this article you can regard a Content Provider as a SQL database whose resources are managed by the OS.
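A Content Provider is addressed through URIs such as the `content://decaf/dashboard` URI the Dashboard queries. The sketch below shows, in plain Java, how such a URI could be mapped to a table. It is an assumption for illustration only: `DecafUriRouter` and the table names are hypothetical, and on a device this job is normally done with Android's `UriMatcher`.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of how a provider could map content URIs to SQLite tables. */
final class DecafUriRouter {
    private final String authority;
    private final Map<String, String> tablesByPath = new HashMap<String, String>();

    DecafUriRouter(String authority) {
        this.authority = authority;
    }

    void register(String path, String table) {
        tablesByPath.put(path, table);
    }

    /** Returns the table for a URI such as content://decaf/dashboard, or null if unknown. */
    String tableFor(String uriString) {
        URI uri = URI.create(uriString);
        if (!"content".equals(uri.getScheme()) || !authority.equals(uri.getAuthority())) {
            return null; // not a URI this provider is responsible for
        }
        String path = uri.getPath(); // e.g. "/dashboard"
        if (path == null || path.length() < 2) {
            return null;
        }
        return tablesByPath.get(path.substring(1));
    }
}
```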

Every activity does a couple of things:

  • sets up the screen in onCreate based on the XML layout
  • gets a cursor referring to the information (you can see a cursor as a query together with its results)
  • draws the information on the screen
  • adds interaction to the screen elements

// cursor, context, instanceCount and instancesTitle are fields of the
// activity (not shown in this excerpt)
@Override
public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // inflate the screen from res/layout/dashboard.xml
    setContentView(R.layout.dashboard);
}

@Override
protected void onResume() {
    super.onResume();
    // acquire the cursor only while the activity is in the foreground
    Uri uri = Uri.parse("content://decaf/dashboard");
    cursor = getContentResolver().query(uri, null, null, null, null);
}

@Override
protected void onPause() {
    super.onPause();
    // give the cursor back as soon as the activity leaves the foreground
    cursor.close();
}

// redraw is also called when the data changes
protected void redrawScreen() {
    cursor.requery();
    if (cursor.moveToNext()) {
        int instances = cursor.getInt(cursor.getColumnIndex(INSTANCES
                .column()));
        instanceCount.setText(String.valueOf(instances));
        if (instances > 1) {
            instancesTitle.setText("Running Instances");
        }
    }
}

// only called once for every DecafActivity
protected void drawScreen() {
    LinearLayout container = (LinearLayout) findViewById(R.id.dashboard_instances);
    container.setOnClickListener(new View.OnClickListener() {
        public void onClick(View view) {
            // open the instance list when the block is tapped
            Intent i = new Intent(context, InstancesDetails.class);
            context.startActivity(i);
        }
    });
}

If you use activities in combination with system resources like a database or GPS, you have to (politely) request access and carefully give it back. The way to do this is to follow the Activity lifecycle. In our experience memory is the scarcest resource; the processor is rarely the problem. The consequence of not properly releasing resources is most directly felt in battery drain: if your app does not behave, people will notice it in their battery usage. Only hang on to resources when you really need them. That is why we use onResume/onPause to handle the cursor instead of onCreate/onDestroy. (A lesson learned the hard way.)

You can add interaction to any element you can identify. Apart from this there are two standard ways to let the user do things: the menu (opened with the menu button every Android device has) and the context menu (opened by long-pressing a view). As a user I expect these two to always be present.

Monitoring your instances

We just built an application to maintain our Amazon EC2 account. We can do everything from terminating instances to creating volumes and managing security groups. Decaf is already the perfect tool to repair your cloud when necessary. But when is it necessary? When does my cloud need fixing?

With Android we have the ability to do things in the background, and not only while the phone is awake: we can wake the phone whenever we want, provided the user granted us that right when we requested permission. What we want is for the phone to wake up and check whether instances are running.

Just asking Amazon for the status of an instance is not enough, because Amazon does not know the internal state of an instance. But for us it is usually enough to regularly check whether the service behind a port is still there.
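A port check of this kind boils down to a TCP connect with a timeout. The sketch below is a minimal plain-Java version under that assumption; `PortCheck` and the timeout value are hypothetical, not Decaf's implementation. A successful connect only proves that something accepts connections on that port, which is exactly the limited guarantee described here.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch: is something still listening on host:port? */
final class PortCheck {
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            // connect() blocks for at most timeoutMillis
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do on close failure
            }
        }
    }
}
```

For example, `PortCheck.isPortOpen("ec2-host.example", 80, 5000)` (placeholder host name) would test a web server.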

In Decaf the user can specify many things in the settings, including how often to check the instances and how to be notified. The defaults are conservative: we check every hour, vibrate and flash the LED, but do not make a sound. If monitoring is off, Decaf does not use any of the phone's resources.
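The conservative defaults can be captured in a small settings object. This is a hypothetical sketch, not Decaf's settings code; the flag values are chosen to mirror Android's `Notification.DEFAULT_SOUND` (1), `DEFAULT_VIBRATE` (2) and `DEFAULT_LIGHTS` (4), so on a device you would use those constants instead.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of monitoring settings with conservative defaults. */
final class MonitorSettings {
    // mirror android.app.Notification.DEFAULT_SOUND / DEFAULT_VIBRATE / DEFAULT_LIGHTS
    static final int SOUND = 1;
    static final int VIBRATE = 2;
    static final int LIGHTS = 4;

    boolean enabled = true;
    long intervalMillis = TimeUnit.HOURS.toMillis(1); // check every hour
    boolean vibrate = true;
    boolean led = true;
    boolean sound = false; // silent by default

    /** Combined notification flags; 0 means monitoring is off and nothing is signalled. */
    int notificationFlags() {
        if (!enabled) {
            return 0;
        }
        int flags = 0;
        if (sound) flags |= SOUND;
        if (vibrate) flags |= VIBRATE;
        if (led) flags |= LIGHTS;
        return flags;
    }
}
```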

We have a task that needs to be performed regularly. Android has the perfect feature for that, the repeating alarm. We don't need to be always there, so we don't need a long running service. We just need to be woken up, do some work, and go away again.

A repeating alarm is like a cron job, except that a cron job specifies exact times while a repeating alarm fires after a fixed interval. An alarm sends a broadcast to a receiver (our receivers are declared in the manifest).

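This code example shows the basics:

AlarmManager mgr = (AlarmManager) context.getSystemService(Context.ALARM_SERVICE);
Intent i = new Intent(context, OnAlarmReceiver.class);
PendingIntent pi = PendingIntent.getBroadcast(context, 0, i, 0);

// interval between checks; in Decaf this comes from the settings (default: one hour)
long period = AlarmManager.INTERVAL_HOUR;

// ELAPSED_REALTIME_WAKEUP wakes the device if it is asleep when the alarm fires
mgr.setRepeating(AlarmManager.ELAPSED_REALTIME_WAKEUP,
            SystemClock.elapsedRealtime(), period, pi);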

The receiver is started as soon as the alarm fires: the system sends the broadcast and the broadcast receiver is woken up. You can either handle it or pass it on, so multiple receivers can handle the same broadcast.

In our case the receiver needs to check some instances, but it has no idea how long that will take. On Android you can't assume the phone will stay awake just for you. But if you have permission, you can keep the phone awake as long as you need.

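This is the broadcast receiver:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;

public class OnAlarmReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(final Context context, final Intent intent) {
        // safe way to force full execution: the wake lock is taken here,
        // before onReceive() returns, and always released in finally
        ExecutorService queryingThread = Executors.newSingleThreadExecutor();

        DecafWakeLock.acquire(context);
        Runnable task = new Runnable() {
            @Override
            public void run() {
                try {
                    check();   // check the monitored instances
                    report();  // notify the user if something is wrong
                } finally {
                    DecafWakeLock.release();
                }
            }
        };

        queryingThread.submit(task);
        // let the worker thread end once the submitted task is done
        queryingThread.shutdown();
    }
}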
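`DecafWakeLock` itself is not shown. The plain-Java sketch below is a hypothetical stand-in that only counts holders (on a device the counter would be a `PowerManager.WakeLock`), to show why the receiver is written this way: the lock is taken on the calling thread before `onReceive()` returns, because the system only keeps the phone awake while `onReceive()` runs, and it is released in `finally` even if the check fails.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a helper like DecafWakeLock; it only counts holders. */
final class KeepAwake {
    private static final AtomicInteger holders = new AtomicInteger();

    static void acquire() { holders.incrementAndGet(); }
    static void release() { holders.decrementAndGet(); }
    static int held() { return holders.get(); }

    /** Acquire before handing work to another thread; release when it ends, even on failure. */
    static void runAwake(ExecutorService executor, final Runnable work) {
        acquire(); // taken on the calling thread, before the receiver would return
        executor.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    work.run();
                } finally {
                    release(); // never leak the lock, or the battery drains
                }
            }
        });
    }
}
```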

When checking a port we have to take into account that today's mobile networks are not yet 100% ready for 'always online'. The phone is continuously switching between towers or networks, and some areas have good coverage while others have very bad coverage. When there is no network we don't check; Decaf gets notified when the network is back and starts checking again. By default Decaf checks each port 3 times, to make sure a failure is not, for example, just a timeout.
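The policy of skipping the check while offline and retrying a port up to three times can be sketched independently of the network code. `RetryingCheck` and its `Probe` interface are hypothetical; the probe abstraction exists only so the policy can be tested without a network. On a device, `networkAvailable` could come from `ConnectivityManager.getActiveNetworkInfo()`.

```java
/** Hypothetical sketch: probe a port up to N times, skipping the check when offline. */
final class RetryingCheck {
    /** One connection attempt, e.g. a TCP connect with a timeout. */
    interface Probe {
        boolean attempt();
    }

    enum Result { UP, DOWN, SKIPPED }

    static Result check(boolean networkAvailable, Probe probe, int attempts) {
        if (!networkAvailable) {
            return Result.SKIPPED; // wait for the connectivity broadcast instead
        }
        for (int i = 0; i < attempts; i++) {
            if (probe.attempt()) {
                return Result.UP; // one success is enough
            }
        }
        return Result.DOWN; // every attempt failed: time to notify the user
    }
}
```

Returning `SKIPPED` rather than `DOWN` when offline is what keeps a dead spot in coverage from being reported as a dead server.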

Now I have an app that I can use to maintain my Amazon account. It also monitors my instances and notifies me when something is not as expected. But I want something more: a sense of what is happening with my application. I want to see charts!

The Widget

Android has another very interesting feature: it allows you to add widgets to the home screen. These widgets are small pieces of user interface, drawn by your app but hosted by the home screen. Of course you need to live up to some expectations, but you can do very nice things with dynamic information. We are going to use a widget to show charts of aggregated CPU, network and disk information, so that we always have an idea of what is going on with our cloud.

A smartphone is not yet smart enough to be a full Nagios or Cacti: for SNMP-style monitoring you need to be able to poll every minute, or more often. But Amazon AWS has released Amazon CloudWatch. This service is used by other Amazon services to determine how instances are used; Amazon AutoScaling, for example, uses CloudWatch information to start or stop instances depending on the utilization of a group.

CloudWatch gives you particular metrics you can query: CPU, network and disk I/O. These metrics can be aggregated over groups of instances, for example based on the image they were started from. You can also aggregate over the entire account.

A metric has measurements. A measurement is either an average or an aggregate over a period; you can, for example, ask for the 5-minute aggregate of CPU usage over the last two days. CloudWatch keeps a backlog of two weeks of this information.
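To make "an average over a period" concrete: at a 5-minute period, two days of data is 2 × 24 × 12 = 576 points per metric. CloudWatch computes these statistics on Amazon's side, so the plain-Java sketch below is not the CloudWatch API; it only illustrates, under that assumption, how raw samples roll up into per-period averages. `PeriodAggregator` and `Sample` are hypothetical names, and a sum per period would work the same way without the division.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of CloudWatch-style statistics: raw samples rolled up per fixed period. */
final class PeriodAggregator {
    /** One raw sample, e.g. the CPU utilisation of one instance at one moment. */
    static final class Sample {
        final long timeMillis;
        final double value;

        Sample(long timeMillis, double value) {
            this.timeMillis = timeMillis;
            this.value = value;
        }
    }

    /** Average per period; keys are period start times in ascending order (times assumed >= 0). */
    static Map<Long, Double> average(List<Sample> samples, long periodMillis) {
        Map<Long, double[]> acc = new TreeMap<Long, double[]>(); // start -> {sum, count}
        for (Sample s : samples) {
            long start = s.timeMillis - (s.timeMillis % periodMillis); // align to the period
            double[] sumCount = acc.get(start);
            if (sumCount == null) {
                sumCount = new double[2];
                acc.put(start, sumCount);
            }
            sumCount[0] += s.value;
            sumCount[1] += 1;
        }
        Map<Long, Double> result = new TreeMap<Long, Double>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }
}
```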

For Decaf we implemented a widget showing the basic metrics over the entire Amazon EC2 account. You can choose the time frame you want to watch, anything from the last 4 hours up to the full two weeks. These three metrics are shown graphically in the widget.

Widgets are declared in the manifest and can be selected by the user to be shown on the home screen. When your widget is selected, you have to configure it (or let the user configure it). After that, the widget update service is woken up regularly to do the updating. You could also implement this with repeating alarms, but the advantage of following the 'normal' pattern is that Android handles the resources automatically: widgets are not always updated when they are not shown, for example, and widget updates are 'bundled' so the phone doesn't need to be woken up many times.

What is interesting about our widget is that we use Google Charts to draw the charts for the metrics. We use Android's basic HTTP support (plain java.net.URL) to fetch the chart image and feed it directly to the widget.

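import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

public static Bitmap getBitmap(Series series1, Series series2) {
    Bitmap diskBitmap = null;
    String chco = "B6F1FE";  // line colour(s)
    String chls = "3,1,0";   // line style(s): thickness, dash length, space length
    String series = series1.toString();

    // a second series gets its own colour and line style
    if (series2 != null) {
        chco = String.format("FEB6DB,%s", chco);
        chls = String.format("%s|3,1,0", chls);
        series = String.format("%s|%s", series, series2.toString());
    }

    InputStream in = null;
    try {
        // sparkline (cht=ls) on a black chart area and background (chf)
        URL diskURL = new URL("http://chart.apis.google.com/chart?"
                + "chs=226x42&" + "cht=ls&"
                + String.format("chco=%s&", chco)
                + String.format("chls=%s&", chls)
                + "chf=c,s,000000|bg,s,000000&" + "chd=t:" + series);
        // read the raw PNG bytes and decode them directly
        in = diskURL.openStream();
        diskBitmap = BitmapFactory.decodeStream(in);
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (in != null) {
            try { in.close(); } catch (IOException ignored) { }
        }
    }
    return diskBitmap;
}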
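The `Series` class passed to `getBitmap` is not shown. The Google Chart API's text encoding (`chd=t:`) expects comma-separated values in the 0-100 range, with `|` between series, which matches how `getBitmap` joins two series. The following is a hypothetical sketch of such a `Series`, assuming non-negative metric values; it is not Decaf's class.

```java
import java.util.Locale;

/** Hypothetical sketch of a Series that renders itself for the chd=t: parameter. */
final class Series {
    private final double[] values;

    Series(double... values) {
        this.values = values.clone();
    }

    /** Comma-separated values rescaled to 0-100, as text encoding expects. */
    @Override
    public String toString() {
        double max = 0;
        for (double v : values) {
            max = Math.max(max, v); // assumes non-negative values (CPU, bytes, ops)
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            double scaled = max > 0 ? values[i] * 100.0 / max : 0;
            // Locale.US keeps '.' as the decimal mark whatever the phone's locale
            sb.append(String.format(Locale.US, "%.1f", scaled));
        }
        return sb.toString();
    }
}
```

Each series is scaled to its own maximum here, which suits a sparkline; if two series in one chart must share a scale, compute the maximum over both instead.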

Conclusion

With Android you get a lot of freedom. Freedom to choose as a user, and freedom to innovate using the device as a developer. But all this freedom comes at a price. Native Android apps are a lot of fun to build, but not necessarily easy to get right.

There is much talk about the 'divergence of devices' on Android. It is a concern, and the Android team is visibly addressing it. But if you follow the guidelines and stick to the APIs, 99% of apps will run on all available devices.

When you get deep into hardware subsystems like the camera and/or GPS, it is worthwhile to buy as many devices as possible. But for the majority of apps the emulators are enough.

I do advise walking around with the different devices yourself. An HTC Tattoo has a small screen, and handling this device and the apps on it is different from a G1, let alone a Nexus One.

An app like Decaf has to be reliable. If and when something fails in your infrastructure you will need tools that help you repair the problem.

Amazon EC2 in one paragraph

The 'cloud' comes in many varieties, one of which is Amazon AWS. Amazon AWS includes Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3). These two services allow you to manage your compute infrastructure (EC2) and storage (S3) virtually, which radically changes the way infrastructure is part of the development lifecycle. Amazon EC2 calls servers instances. An instance is started from an AMI, short for Amazon Machine Image. Instances have volumes attached (exposed as block devices) and Elastic IPs associated. Volumes can be saved as snapshots, a simple form of incremental backup. S3 is designed to be an extremely reliable storage system. It is not as fast as local or network storage, but in combination with Amazon CloudFront it is fast enough for many types of media.


Wednesday, June 23, 2010

Decaf per Country

Zoran had the idea that it might be interesting to look at where Decaf is sold. Talking about it, we thought it might show where Amazon AWS is most popular. But because the data tells two stories at once, AWS and Android, it is not that straightforward.

Users from the United States are by far the largest group, more than 56%. The Netherlands is a bit misleading because we have a lot of friends there (Bas, thank you). Germany and Spain are living up to their relative sizes, but France is disappointing.

With the increasing daily activations (Andy Rubin says 160,000 per day) we also see an increase in our user base.

Sunday, June 13, 2010

Lucid Lynx not so cloud friendly?!

Over the last couple of weeks we have been slowly making progress getting kulitzer.com up and running. Most of our production systems run Ubuntu 9.10, or Karmic Koala. When we started setting up kulitzer, Ubuntu had just released its latest server version, 10.04 LTS or Lucid Lynx. (LTS stands for Long Term Support.)

For infrastructures like kulitzer we always use images, EBS volumes, Elastic IP addresses, etc. And we do the housekeeping automatically with init scripts, so in case of emergency we just launch a replacement from a custom image. We want to mix and match our cloud assets in much the same way as containers are handled in transport.

For as long as I can remember, you add your mount points to /etc/fstab, and whether mounting succeeds or not, the boot continues and finishes. Not anymore, as we found out: if one of the entries in /etc/fstab does not mount successfully, the boot process just halts and no SSH daemon is started. Luckily we found out why, thanks to other people discussing this. That is not really cloud friendly yet, if you need a console to fix it.

Stuck with no running instances and no working images, I was somewhat irritated. Luckily you can start and stop servers as much as you want without incurring huge costs, and it is also very, very easy to create volumes from snapshots and vice versa. The cause of the problem was also the solution: the lack of hardware meant we had no console to fix this, but it also gave us unlimited ways to create servers and disks and combine them however we wanted.

In the end we had to rebuild the instance from scratch. We had documented this, so it was not much of a problem, and we mounted the 'old' root volume so we could copy all of our configuration to the new instance. Finally we changed our init script a little bit. Instead of mounting

$ mount /var/www

we now do

$ mount -t xfs -o defaults /dev/sdf /var/www

No entries in /etc/fstab, so a clean boot process. And our script first attaches the volume, and only tries to mount it if that has succeeded. We created an image, relaunched it to test, and now it works again!
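The "attach first, mount only on success" step can be sketched in Java (the language used for the other examples here) under stated assumptions: the EC2 attach call itself is left out, an attached EBS volume is taken to show up as the device node `/dev/sdf` from the command above, and `SafeMount` plus its timings are hypothetical. It waits for the device and only then returns the explicit mount command; if the device never appears, it returns nothing, so boot carries on instead of halting.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the "mount only after the volume is attached" step of an init script. */
final class SafeMount {
    /**
     * Waits up to timeoutMillis for the block device to appear, then returns the explicit
     * mount command (to run with ProcessBuilder, say). Returns an empty list if the
     * device never shows up, so the caller can skip the mount instead of blocking boot.
     */
    static List<String> mountCommand(File device, String fsType, String mountPoint,
                                     long timeoutMillis, long pollMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!device.exists()) {
            if (System.currentTimeMillis() >= deadline) {
                return Collections.emptyList(); // volume not attached: do not mount
            }
            Thread.sleep(pollMillis);
        }
        return Arrays.asList("mount", "-t", fsType, "-o", "defaults", device.getPath(), mountPoint);
    }
}
```

For the case in this post the call would be `mountCommand(new File("/dev/sdf"), "xfs", "/var/www", 60000, 1000)`.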

Is Lucid Lynx cloud friendly? I think it is a great operating system, and yes, it is very ready for the cloud. But it takes time to change a complex system like Ubuntu/Linux, and we have been relying on things like consoles for decades. Now that we don't have them anymore (at least not where we choose to spend our time), we can't rely on them either. What this shows is that we are really in the middle of a paradigm shift...

Thursday, May 27, 2010

Oprah, KeepAlive and ELB

HTTP KeepAlive, for those who don't know, is a way to minimize overhead on HTTP requests. The client and server agree they are in a session, and until the session times out (a server setting in Apache) it is cheaper to reuse this connection than to set up another one. If you share an office, you do not greet your colleagues when you come back from the toilet, but you do when you return from a holiday.

I like KeepAlive because it can help you get optimal performance for your clients, and that is what it is all about. But it is not always convenient when you have to handle huge spikes; in that case you would like to get rid of KeepAlive altogether. Most of the time you have to do both with the same web servers.

At Layar we have mobile clients that do multiple HTTP requests whenever they get something from us. Optimally you want to serve this using KeepAlive. But we also host the Layar website, which is sometimes the target of a Slashdot effect or what we experienced last week, an 'Oprah phenomenon'.

KeepAlive is not the only element to take into account; we are also using Amazon Elastic Load Balancing to serve our requests. ELB introduced a feature called 'sticky sessions', and I always wondered how these relate to KeepAlive, because ELB balances at a lower level than TCP.

I thought I finally understood, a little bit. You can't rely on KeepAlive when using ELB without sticky sessions. It might even be detrimental, because KeepAlive itself adds a bit of overhead and you might be setting up a KeepAlive session with every HTTP request. So if you want KeepAlive, you had better enable sticky sessions to forge a (short-lived) monogamous relationship between client and server. This has since been 'officially' debunked by Amazon AWS.

The 'front end' HTTP sessions are totally different and separate from the 'back end' HTTP sessions. The front-end sessions have HTTP KeepAlive on by default for HTTP 1.1 clients. For the back-end sessions, enabling KeepAlive is recommended.

We first decided to turn off KeepAlive. We don't need it functionally (for example, to work around some odd mobile client implementation of the HTTP stack), and we have many more ways of optimizing client responsiveness before it pays off to look at KeepAlive. But after reading the Amazon AWS reply we will enable KeepAlive, with the lowest possible timeout, in case Oprah plays up again...


Wednesday, May 19, 2010

High Availability MySQL with Amazon RDS

Amazon AWS never ceases to amaze me. They did acknowledge they were working on a High Availability version of Amazon RDS, but I guess most people expected 'just' a replication option. Instead they launched something infinitely more valuable: Multi-AZ RDS.

I didn't come up with the name Multi-AZ RDS, but apart from being cryptic, I have to admit it IS a bit sexy. Multi Availability Zone Relational Database Service is not nearly as catchy. In short, it means you start one RDS instance that is automatically replicated to another zone. But that is not all: it also gives you automatic failover.

One feature of RDS is that it handles updates automatically. It relieves you of the administrative burden of upgrading and patching your database instance, which is not something to look forward to. Though I personally think this is one of the best features of RDS, it does come at the small price of a few minutes of downtime once a week. If you don't want this, Multi-AZ RDS is the solution.

In the same announcement Amazon says it is working on a read-only replica service. This would let you distribute your database reads to other instances to scale out. Although it is not my favorite way to scale out a data store under really high load (see this post for other approaches), it makes your 'idle replica' do some work.
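Spreading reads over replicas usually means the application routes each statement itself. The sketch below is hypothetical (read replicas were only announced, and this is not an RDS API): `ReadWriteRouter` sends anything that is not a `SELECT` to the primary and distributes `SELECT`s round-robin over the replicas. The endpoint names are placeholders.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: writes go to the primary, reads are spread over replicas round-robin. */
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    /** Returns the endpoint that should execute the given SQL statement. */
    String endpointFor(String sql) {
        boolean isRead = sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
        if (!isRead || replicas.isEmpty()) {
            return primary; // writes, and everything when there are no replicas
        }
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```

Two caveats apply to this simple rule: replicas lag behind the primary, so a read that must see the write just made should still go to the primary, and statements like `SELECT ... FOR UPDATE` would be misrouted by the prefix check.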

Amazon AWS, thank you again! For more detailed information see the Amazon Web Services Blog.