January 07, 2016

Running Go binaries efficiently on AWS Lambda

I have been using Lambda and API Gateway quite a lot recently. While I was wondering how this part works, I came across the article "AWS LambdaでJavaとNode.jsとGoの簡易ベンチマークをしてみた" (A simple benchmark of Java, Node.js, and Go on AWS Lambda), so I decided to write a related article.

Premise

AWS Lambda lets you use Node.js and other runtimes, but not Go (for now). So if you want to write your function in Go, you have to do what the article above describes:

run Node.js -> Node.js launches the Go binary with child_process.spawn

As a result, every request pays the cost of launching a process, which comes to about 500 ms each time.

lambda_proc is a library that solves this problem.

lambda_proc

Lambda runs each function in its own container. Once started, a container stays alive for a certain period and is reused for the requests that arrive during that time. The idea behind lambda_proc is that, instead of launching the Go process again on every request, you launch it once and leave it running, so the Go startup cost goes away.

Node.js and Go communicate over stdin/stdout. Specifically, it looks like this:

request
client -> node --(stdin)--> go

response
go --(stdout)--> node -> client

In lambda_proc, the Node.js side serializes the request and passes it to Go as a single line of JSON (line-delimited JSON). The Go-side helper library decodes that line and passes it to your Go handler function.

For the reply, when the handler returns a suitable struct, the lambda_proc helper library serializes it to JSON and returns it to Node.js.

The actual Go source code used for the benchmark is shown below. Anything written to standard output is sent to the Node.js side, so logs have to be written to standard error.

package main

import (
     "encoding/json"
     "log"
     "os"

     "github.com/aws/aws-sdk-go/aws"
     "github.com/aws/aws-sdk-go/aws/session"
     "github.com/aws/aws-sdk-go/service/dynamodb"
     "github.com/bitly/go-simplejson"
     "github.com/jasonmoo/lambda_proc"
)

// Write logs to standard error
var logObj = log.New(os.Stderr, "", 0)

// The function that forms the main part of the original article
func parse(jsonStr string) {
     js, err := simplejson.NewJson([]byte(jsonStr))
     if err != nil {
             logLn("failed to parse event:", err)
             return
     }
     records := js.Get("Records")
     size := len(records.MustArray())

     for i := 0; i < size; i++ {
             record := records.GetIndex(i)

             logLn(record.Get("eventName").MustString())  // fmt.Println cannot be used here
             logLn(record.Get("eventID").MustString())
             logLn(record.Get("dynamodb").MustMap())
     }

     ddb := dynamodb.New(session.New(), aws.NewConfig().WithRegion("ap-northeast-1"))
     tableName := "mytest"
     keyValue := "test"
     attribute := dynamodb.AttributeValue{S: &keyValue}
     query := map[string]*dynamodb.AttributeValue{"id": &attribute}

     getItemInput := dynamodb.GetItemInput{
             TableName: &tableName,
             Key:       query,
     }

     obj, err := ddb.GetItem(&getItemInput)
     if err != nil {
             logLn("GetItem failed:", err)
             return
     }
     logLn(obj)
}

// Anything written to standard output with fmt.Println would be parsed on the JS side, so write to standard error instead
func logLn(a ...interface{}) {
     logObj.Println(a...)
}

// Something has to be returned, so define a dummy struct. In normal code, being able to return a struct is actually an advantage.
type Return struct {
     Id    string
     Value string
}

// The main handler function
func handlerFunc(context *lambda_proc.Context, eventJSON json.RawMessage) (interface{}, error) {
     parse(string(eventJSON))
     return Return{Id: "test", Value: "somevalue"}, nil
}

// main registers the handler with lambda_proc
func main() {
     lambda_proc.Run(handlerFunc)
}

Benchmark

Lambda provides standard JSON events for testing. For this benchmark I saved the DynamoDB Update test JSON locally. I set up an API endpoint for the Lambda function so that it could be hit with curl, and ran curl 10 times at 0.5-second intervals with the following script.

for I in `seq 1 10`
do
curl -X POST -H "Content-type: application/json" --data @body.json https://hoge.execute-api.ap-northeast-1.amazonaws.com/prod/benchmark
sleep 0.5
done

Running this produced two log streams in CloudWatch. The first:

Duration	Billed Duration	Used Memory
367.42 ms	400 ms	14 MB
36.92 ms	100 ms	14 MB
44.00 ms	100 ms	14 MB
46.05 ms	100 ms	14 MB
61.44 ms	100 ms	15 MB
50.48 ms	100 ms	15 MB

The second:

Duration	Billed Duration	Used Memory
393.30 ms	400 ms	14 MB
44.13 ms	100 ms	14 MB
47.99 ms	100 ms	14 MB
52.30 ms	100 ms	14 MB

You can see from these logs that two containers were used. The first request in each container takes about 400 msec, but subsequent requests take only about 40 msec. Memory usage is also minimal. I only made 10 calls this time, but a larger number works just as well. Also, if the execution time limit is extended, the process should live longer, so I think the startup cost will fall within a negligible range.

A note on log output

As noted in the code above, logging with fmt.Println writes to standard output, which is passed on to the Node.js side. For that reason the code writes its logs to standard error instead. That solves the problem here, but you need to be careful when using a logging library.

The Go program became simpler

As a side effect of using lambda_proc this time, the go program has been simplified.

With an ordinary application server, you have to deal with contexts and various other things with HTTP in mind. With this approach, only stdin/stdout matters. AWS Lambda (and API Gateway) take care of everything related to HTTP. The Go side only needs to look at standard I/O, and the format is JSON, which has become a standard.

This restriction narrows the range to be implemented, making processing very easy to write.

It also makes testing easier. Conventionally you would have to think about things like setting up "net/http/httptest", but here you only need standard I/O.

Summary

I showed that by using lambda_proc, a Go program can be invoked efficiently on AWS Lambda. With this approach, I think Go can be used not only for occasional processing but also for applications that have to handle a steady stream of requests.

Lambda lets you use a fair amount of computing resources for free, so I want to use it well and save money.

Posted by r_rudi

Sokohaka