Long Running User-Driven Tasks in the Cloud

Step up Your Cloud Project with Containers using AWS Copilot

In the most common use cases for serverless applications in AWS, there's an API created in API Gateway and the operations are handled by Lambda functions. The operations are quick and atomic, usually retrieving and writing small amounts of data. If any operation takes a little too long, the Lambda function may time out, causing the application to seem broken (or sluggish, if the quick fix is simply to increase the Lambda execution timeout).

This is usually avoided by applying design strategies where the slow processes are done as back-end administrative processes and the user gets data that's been optimized and cached, but what if there's no way around a long-running task based directly on user input? If the user doesn't see results right away, they need to see some indication of progress or they'll likely abandon the app.

On the client, modern UI frameworks are capable of rendering progress indicators that keep the user engaged while the task completes, but on the backend, a different architecture is going to be needed.

Lambda functions may be too atomic, but if we move everything back to a traditional server, we lose many of the advantages of scalability. There's no sense in having all the long-running tasks compete for the same CPU cycles needed to handle API traffic when they can run in separate single-purpose services, likely at a lower cost. If those services are implemented as containers, we keep all the serverless advantages.

Introducing containers can seem like it's going to blow up the project timeline with learning curves if your AWS experience is limited to SAM or Amplify, but only if you haven't tried AWS Copilot. Copilot provides the CLI tooling for building and deploying a solution made of individual services running in their own containers. It's as easy to use as SAM and Amplify, but it can manage and deploy far more flexible architectural designs.

Plan the Architecture

I'll demonstrate by building a simple solution with two services. The amazing thing about using AWS with Copilot and open source dependencies is that the architecture can be refactored! Get started building right away, let Copilot provision the stack in AWS, hit some early milestones and demo to your stakeholders, and once the project starts evolving, just add or change things and Copilot will compare against the CloudFormation stack and delete, update, or provision services as needed.

My fictional sample will be one of those apps where the user takes a photo of a receipt and uploads it to get some kind of paltry rebate while the app owner gets a bunch of data to mine for shopping habits and consumer prices and stuff. My back-end will need to scan and analyze the receipt image and let the user know if it's accepted or not.

First I'll sketch out the pieces I'm going to need from front to back:

  • A client to take and upload a photo
  • A back-end endpoint to upload the image to
  • Some place to store the image
  • A back-end service to scan the image with OCR
  • A database to keep all the data

Next, I'll expand a little to think about the resources needed for each piece:

The front-end client: Normally this would be a mobile app with native access to the camera, but I'm just going to serve a simple web app from the API container to test this.

The back-end API: This will be a container running a web server with an endpoint to accept file uploads, plus other API routes as needed.

A place to store the images: Since separate containers will be reading and writing the images, an S3 bucket will keep things simple.

The image scanning service: This will be another container running a process that picks up the images, runs them through AWS Textract, and processes the results.

The database: The data needs to live in some form of database, but I'm going to leave that outside the scope of this article.

I also need to consider the communication channels needed between services:

API tells the back-end service a new image is ready to be processed: Use SNS/SQS - The API will publish a message to an SNS topic, which will be delivered to an SQS queue that the image scanning service is monitoring for work to do.

Image scanning service keeps the API updated with progress: Use SNS - The service will publish a series of status messages back to the API via another SNS topic.

API passes along the progress updates to the client: Use WebSocket - Upon uploading a receipt, the client will receive an ID and will open a WebSocket connection back to the API. Progress updates matched with the same ID will be sent to the client on this channel so that the UI can be updated.

Finally, I'll consider the costs of what I've put together:

  • ECS containers: I need two tasks, one for the API running full time, another for the receipt scanning service that I'd like to run only when needed. Initially, each task will get 1 vCPU for $0.04/hr and 2GB RAM for $0.009/hr. I'll estimate $1.20 - $2 per day for the computing power.
  • Load Balancer: Required to route internet traffic to my ECS container and must run full time at a cost of about $0.03 per hour for normal capacity, so $0.72 per day.
  • S3 Storage: This will depend on usage. It will be $0.023 per GB per month for standard storage, $0.005 per 1,000 uploads, and $0.0004 per 1,000 requests, so this will be an insignificant component of the cost at first.
  • SNS/SQS: Very minimal cost for data transfer between SNS and SQS, and the first million requests to the SQS queue per month are free, $0.40 per million after that.
  • Textract: Detecting the text from the receipts will cost $1.50 per 1,000 pages. This part of the cost would be the highest and most variable, but directly proportional to usage.
  • Database: There are a lot of options with a wide variance in cost.

Not considering the database, I'm guessing around $3 per day to run this app with low traffic. AWS billing is complex and very fine-grained, but it scales along with your app.
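As a rough sanity check on that guess (my own back-of-the-envelope, assuming one task running full time and the scanning task running roughly half the day):

1 full-time task: ($0.04 + $0.009) × 24 hrs ≈ $1.18/day
scanning task (part time): roughly $0.50/day
Load Balancer: $0.03 × 24 hrs ≈ $0.72/day
Running total before S3, SNS/SQS, Textract, and the database: about $2.40/day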

There is another option if I'm not expecting high traffic at first and need to reduce costs. The API container and the load balancer must run full time, incurring hourly charges 24 hours a day 7 days a week. If I use AWS App Runner instead of Fargate, the load balancer will not be required and the container will not need to run full time.

With AWS App Runner, I'll pay full time for just the RAM to keep provisioned containers warm, and when requests come in, the provisioned containers will quickly become active (without cold start delays) and handle the requests. I'll only need to pay the full hourly cost for those active hours.

App Runner can still scale and add more instances of my container to handle traffic surges, but only via Route 53 Multivalue Answer Routing, which means the client gets a list of servers to choose from and randomly chooses which server to hit. An Elastic Load Balancer still brings more functionality for the additional cost.

My project cannot use App Runner, however. One of my requirements is the WebSocket connection. App Runner is labeled 'Request-Driven Web Service' in the Copilot config, and that indicates it's only going to handle plain requests over HTTPS. A WebSocket connection starts off as a regular HTTPS request, but has to be 'upgraded' to a persistent connection back to the browser, which App Runner cannot support.

Establish a Project Structure

I've got my architecture planned out, but nothing needs to be provisioned yet. With Copilot and containerized development, I can start with some of the code first and then use Copilot to provision my stack in the cloud only once I need to start accessing AWS resources. The containers will run the same on my laptop as they will in AWS ECS.

Based on my planning, I've decided to organize my code into two repos - one for the API and one for the image processing service. Copilot will put some config in both of these projects, but they can be part of the same application. Copilot connects to your AWS account at initialization time and stores application metadata in SSM Parameter Store. The config still lives alongside the code, but state is maintained in AWS. In a professional scenario, different teams might own these different areas of the app. I just want to demonstrate that a micro-service application deployed with Copilot can be maintained as one big mono-repo, or span multiple projects.

Create the API with an Image Upload Endpoint

I'm going to start by creating a quick API with one endpoint that will accept the binary upload of the image. I'm going to use Go with Gin Gonic to build it. There's no factor other than personal preference in my case. Using containers means you can choose any runtime that has a Docker image that matches your platform, so choose one based on whatever needs you can foresee for your project, or simply your developer experience.

If you're coming from SAM or Amplify projects, you're used to those CLIs generating your project structure and a whole bunch of code from templates. Not so with Copilot. First write the code (from scratch or from any starter out there), then containerize it, and then initialize the Copilot service based on the container.

The Code

Initialize a new go module to run a simple web server with Gin Gonic:

mkdir copilot-receipt-scanner
cd copilot-receipt-scanner
go mod init copilot-receipt-scanner/api
go get github.com/gin-gonic/gin

Add main.go and implement an upload handler:

// this code is abbreviated, see the GitHub repo for full code
import (
  "fmt"
  "net/http"
  "os"
  "path/filepath"

  "github.com/gin-gonic/gin"
)

func main() {
  r := gin.Default()
  r.POST("/upload", func(c *gin.Context) {
    file, err := c.FormFile("image")
    if err != nil {
      panic(err)
    }
    id := GenerateID()
    tmp := os.TempDir()
    uploadedFile := filepath.Join(tmp, fmt.Sprintf("%s.png", id))
    c.SaveUploadedFile(file, uploadedFile)

    // TODO: handle the file

    err = os.Remove(uploadedFile)
    if err != nil {
      panic(err)
    }

    // respond with the generated ID so the client can track this upload
    c.JSON(http.StatusOK, gin.H{"id": id})
  })
  r.Run()
}

It doesn't do much yet, but right now I just want to see it take an uploaded image without panicking.

The Container

I can run this with the command go run main.go, but that's with the Go version installed on my laptop, running in my customized environment, which I can't expect to match how it will run in AWS. I need to run it from a Docker container.

Create a Dockerfile in the root of the project:

FROM golang:1.18

WORKDIR /usr/src/app

# pre-fetch the dependencies in a separate layer
COPY go.mod go.sum ./
RUN go mod download && go mod verify

# copy in the full codebase and build
COPY . .
RUN go build -v -o /usr/local/bin/app ./

# open the default port and run the executable
EXPOSE 8080
CMD ["app"]

This will pull the official golang Docker image and build our Go code into an image, so that wherever this image is pushed, it will run our web server from a compiled executable in a consistent environment.

Now I can run this little web server either with my local Go installation, or from a Docker container:

go run or docker run
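For local testing, that looks something like this (the image tag receipt-api is just a name I've chosen for this sketch):

# run directly with the local Go toolchain
go run main.go

# or build the image and run it in a container, mapping Gin's default port
docker build -t receipt-api .
docker run --rm -p 8080:8080 receipt-api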

The Copilot Config

With some runnable code and a container to run it in, I can establish the Copilot config. It's a single command, copilot init, run in the root of the project, but you'll need AWS credentials configured for the AWS CLI first.

copilot init

This didn't deploy anything yet, but it needed AWS credentials to run, so what did it do?

Locally, there's a new copilot directory in your project. This is where the manifest files go to describe the infrastructure and configure runtime details for AWS.

In AWS, there is some core infrastructure created. If you explore the console, or run a few AWS CLI commands, the following can be found:

  • SSM Parameters - aws ssm describe-parameters shows 2 parameters, one for the main app, and one for the api service we've created in the app.
  • ECR Registry - aws ecr describe-registry shows that a registry exists, which is where Copilot will push the Docker image when we deploy.
  • CloudFormation Stack - aws cloudformation describe-stacks shows that one stack was created for just the infrastructure IAM roles.

Create the Client

I need a simple client in order to test the back-end. It just needs to do the file upload and show a progress indicator, so I'm going to make a simple React app and bundle it into my API container. Normally it would be more scalable to host it separately, in an S3 bucket behind CloudFront, but I can refactor my architecture to do it that way later. Right now this is all I need to hit an early full-stack milestone.

[Image: simple web client]

I'm going to create the react app in a path under my existing API project so that I can easily build it in to the Docker image.

cd copilot-receipt-scanner
npx create-react-app web

To containerize it, I'll use a multi-stage Docker build. The React scripts offer a build script that optimizes all the code and dependencies for production. I can have Docker run that build script in an intermediate stage and then copy the resulting files into a path in the final image. Here's what the Dockerfile looks like:

FROM golang:1.18 AS go-builder

WORKDIR /usr/src/app

# pre-fetch the dependencies for the go app in a separate layer
COPY go.mod go.sum ./
RUN go mod download && go mod verify

# copy in the go codebase and build
COPY . .
RUN go build -v -o /usr/local/bin/app ./


FROM node AS node-builder

WORKDIR /usr/src/app

# pre-fetch the dependencies for the web app in a separate layer
COPY ./web/package.json ./web/package-lock.json ./
RUN npm i

# copy the rest of the web app
COPY ./web .
RUN npm run build


FROM golang:1.18 AS prod

# copy the compiled binary from go-builder
COPY --from=go-builder /usr/local/bin/app /usr/local/bin/app

# copy the React app build from node-builder
COPY --from=node-builder /usr/src/app/build /var/www/html

# open the default port and run the executable
EXPOSE 8080
CMD ["app"]

There's quite a bit more to explain here.

First, I modified the Dockerfile from a few steps ago so that the image it was building is just my 'go-builder' stage. FROM golang:1.18 AS go-builder. It copies in the source code and runs the build, producing a compiled binary. I need that binary in the final image, but not the source code.

Next, I added another stage by appending a whole separate Docker image definition, starting with FROM node AS node-builder. It's similar to the go-builder stage, but I need Node so that it can run the build with npm run build.

Finally, I appended a 'prod' stage, which is the one that ships. This one starts with FROM golang:1.18 AS prod. I need golang since that's what runs the server, but I don't need any source code copied in. The COPY --from=go-builder command copies just the compiled binary from the first stage, and the COPY --from=node-builder command copies just the optimized HTML, CSS, and JavaScript from the second stage into the /var/www/html path. The Gin Gonic server can then serve the contents of that static directory.

When I build the whole thing with just one docker build command, it will go through those first two stages and do all the build work, but the resulting image will only contain the build output - the minimum needed to run. It doesn't seem like a big deal initially, but it shrinks the size of the image that gets deployed to ECR, and AWS bills for ECR by the amount of data stored.

[Image: docker staged build]

Docker Tip #1: Notice how in both builder stages I copy in my source with two separate COPY commands. For the Go build I bring in go.mod and go.sum first, and for the npm build I bring in package.json and package-lock.json first. Then I run the respective dependency installations, and only then do I copy in the rest of the source code. Docker builds these images in layers, with each command in the Dockerfile generally creating a layer, and if it doesn't detect any differences in a given layer from one build to the next, it uses the cache. It takes time to install dependencies, and I don't want to waste that time if I didn't add or update any dependencies. If my code changes but the dependency manifests don't, the Docker build will get past the dependency installation layers before it sees differences and starts rebuilding layers.

Docker Tip #2: Create a .dockerignore file in the project root and put /copilot in it. That way, when you get to the part where you're troubleshooting a problem in your Copilot manifest and running the deploy command over and over again, the copilot files will not be copied into your image (where they're not needed at all), and you won't have to wait for Docker to rebuild and push even though you only changed the manifest.
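For reference, the .dockerignore from that tip is just one line:

# keep Copilot manifests out of the Docker build context
/copilot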

Send the Uploaded Image to a Backend Service

At this point, uploaded images are just sitting in the filesystem of the API container. I don't want to do much work in that container since it's handling requests from the users, and I don't want the images sitting on that ephemeral filesystem for longer than needed. I want a separate service in another container to be standing by to pick the image up and process it. I need to establish the channel to send the image off somewhere for processing. Here's what I'll do:

  • Move the uploaded image to an S3 bucket.
  • Publish a message to an SNS topic that a new image is ready to process.

Use S3 for Storage

It's so common to add storage for a service using either S3 for file-based storage or DynamoDB as a simple database that Copilot will take care of it for you easily with a guided CLI command.

copilot storage init -t S3

[Image: add an S3 bucket with Copilot CLI]

This didn't create the bucket in AWS just yet; what it did was add some 'add-on' config to my Copilot manifest. There's a new file in my project, copilot/api/addons/receipt-uploads.yml, which contains a bunch of YAML that may look familiar if you've worked with AWS CloudFormation templates before. When I deploy, Copilot will deploy this config as a sub-stack, the S3 bucket will be provisioned, and my container will get another runtime environment variable with the name of the S3 bucket. I provided 'receipt-uploads' as the storage name, so the env var I can use in my code to get the full bucket name is RECEIPTUPLOADS_NAME.

Back in my code, when the file is uploaded, I'm going to upload it to this S3 bucket:

// this code is abbreviated, see the GitHub repo for full code
import (
  "context"
  "fmt"
  "os"

  "github.com/aws/aws-sdk-go-v2/aws"
  "github.com/aws/aws-sdk-go-v2/config"
  "github.com/aws/aws-sdk-go-v2/feature/s3/manager"
  "github.com/aws/aws-sdk-go-v2/service/s3"
  "github.com/aws/aws-sdk-go-v2/service/sns"
  "github.com/gin-gonic/gin"
)

func main() {
  // Copilot injects the bucket name at runtime
  bucket := os.Getenv("RECEIPTUPLOADS_NAME")

  r.POST("/upload", func(c *gin.Context) {
    id := GenerateID()

    // upload the image to S3
    uploader := manager.NewUploader(s3.NewFromConfig(cfg))
    openedFile, err := os.Open(uploadedFile)
    newKey := fmt.Sprintf("uploads/%s/image.png", id)
    s3Result, err := uploader.Upload(context.TODO(), &s3.PutObjectInput{
      Bucket: &bucket,
      Key:    aws.String(newKey),
      Body:   openedFile,
    })

    // send the S3 location as the SNS message
    input := &sns.PublishInput{
      Message:  &s3Result.Location,
      TopicArn: &snsTopics.NewImage,
    }
    snsResult, err := client.Publish(context.TODO(), input)
  })
}

Publish to an SNS Topic

The key to getting micro-service architectures to work is Pub/Sub - publishing messages from one service and subscribing to them in another. In AWS, SNS/SQS takes care of this. Simple Notification Service handles the messaging, and since there could be multiple services on the receiving end and they might be asleep, Simple Queue Service collects the messages and hands them out to services when they're ready.

SNS/SQS is very simple to use, but it has to be configured. Either several steps in the AWS console, extra config in the CloudFormation template, or a shell script of AWS CLI commands will need to accompany the deployment. Since the configuration is pretty standard, Copilot will take care of this with just a bit of config in the manifest.

Find the manifest for the API service we already created:

copilot/api/manifest.yml

publish:
  topics:
    - name: newImage

All that's needed is a list of one or more meaningful topic names under the publish key in the Copilot manifest. The Copilot CLI will provision the SNS topic and configure permissions so that the service can publish to it and so that SQS queues in the AWS account can subscribe to it.

It will also add an environment variable to my container: COPILOT_SNS_TOPIC_ARNS. Its value at runtime will be a JSON-serialized string containing the ARN for each topic. To use it, I need to add the AWS SDK to my dependencies. The SDK is available for just about every popular modern programming language. I'm using Go, so I'll 'go get' the SDK:

go get github.com/aws/aws-sdk-go-v2
go get github.com/aws/aws-sdk-go-v2/config
go get github.com/aws/aws-sdk-go-v2/service/sns

In main.go, I'll publish to the SNS topic when an image is uploaded:

// this code is abbreviated, see the GitHub repo for full code
import (
  "context"
  "encoding/json"
  "fmt"
  "os"

  "github.com/aws/aws-sdk-go-v2/config"
  "github.com/aws/aws-sdk-go-v2/service/sns"
  "github.com/gin-gonic/gin"
)

// field names match the topic names in the Copilot manifest
type SNSTopics struct {
  NewImage string
}

func main() {
  cfg, err := config.LoadDefaultConfig(context.TODO())

  // Copilot provides the topic ARNs as a JSON string
  var snsTopics SNSTopics
  err = json.Unmarshal([]byte(os.Getenv("COPILOT_SNS_TOPIC_ARNS")), &snsTopics)

  r.POST("/upload", func(c *gin.Context) {
    client := sns.NewFromConfig(cfg)
    message := "New image uploaded"
    input := &sns.PublishInput{
      Message:  &message,
      TopicArn: &snsTopics.NewImage,
    }

    result, err := client.Publish(context.TODO(), input)
    fmt.Printf("Message ID: %s", *result.MessageId)
  })
}

Create the Backend Service to Receive the SNS Messages

In my architecture, processing the image is a completely separate concern running a separate container. I'm going to create a separate code repo for it, but initialize it with Copilot as part of the same application.

It will require the same steps I did to create the API - I need to write the code, containerize it, and then have Copilot initialize it as an additional service based on the Dockerfile I create.

The Code

Initialize a new go module that will use the AWS SDK to wait on messages sent from the API:

mkdir copilot-receipt-scanner-backend
cd copilot-receipt-scanner-backend
go mod init copilot-receipt-scanner/image-handler
go get github.com/aws/aws-sdk-go-v2
go get github.com/aws/aws-sdk-go-v2/config
go get github.com/aws/aws-sdk-go-v2/service/sqs

My Go module will poll the SQS queue for messages and handle the image whenever a message comes in. Receiving messages from SQS is a polling operation, so I'll have an endless loop that keeps calling the method to receive a message.

// this code is abbreviated, see the GitHub repo for full code
import (
  "context"
  "os"

  "github.com/aws/aws-sdk-go-v2/config"
  "github.com/aws/aws-sdk-go-v2/service/sqs"
)

func main() {
  cfg, err := config.LoadDefaultConfig(context.TODO())

  client := sqs.NewFromConfig(cfg)
  queueUrl := os.Getenv("COPILOT_QUEUE_URI")
  input := &sqs.ReceiveMessageInput{
    QueueUrl:            &queueUrl,
    MaxNumberOfMessages: 1,
    MessageAttributeNames: []string{
      "All",
    },
    WaitTimeSeconds: 10,
  }

  // long-poll the queue in an endless loop
  for {
    response, err := client.ReceiveMessage(context.TODO(), input)
    for _, msg := range response.Messages {
      process(msg.Body)
    }
  }
}

If I were anticipating high, profitable traffic, I would have many instances of this service running at once to make sure each image gets processed quickly. But if not, and if I need to control costs, I wouldn't leave this loop running full time. I would have a Lambda function invoked whenever new uploads come in that uses the ECS API to set the desired task count above zero, and then a CloudWatch event to set the desired count back down to zero when the queue is empty.
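A minimal sketch of what that Lambda's scale-up call might look like, assuming hypothetical cluster and service names (Copilot names these for you, so check your own stack):

import (
  "context"

  "github.com/aws/aws-sdk-go-v2/aws"
  "github.com/aws/aws-sdk-go-v2/config"
  "github.com/aws/aws-sdk-go-v2/service/ecs"
)

// scaleWorker sets the desired task count for the worker service.
// The cluster and service names below are placeholders for this sketch.
func scaleWorker(ctx context.Context, desired int32) error {
  cfg, err := config.LoadDefaultConfig(ctx)
  if err != nil {
    return err
  }
  client := ecs.NewFromConfig(cfg)
  _, err = client.UpdateService(ctx, &ecs.UpdateServiceInput{
    Cluster:      aws.String("my-app-test-Cluster"), // placeholder
    Service:      aws.String("image-handler"),       // placeholder
    DesiredCount: aws.Int32(desired),
  })
  return err
}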

SQS Tip: If you're doing something expensive to process the messages received via SQS (like sending an image to Textract to detect text), make sure your code deletes the message once it has handled it without error. SQS will hide the message for 30 seconds when it's received by a worker, but if it's not deleted by the worker that received it, the message becomes visible again like nothing happened. When your code isn't working yet, this is handy because you can keep repeating the process on the same SQS message without starting over, but once your code is working, it's easy to forget this part and come back to find all your free tier used up.
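A minimal sketch of that cleanup, assuming the client, queueUrl, and msg variables from the polling loop above:

// after process(msg.Body) returns with no error, remove the message
// from the queue so it won't become visible and be processed again
_, err = client.DeleteMessage(context.TODO(), &sqs.DeleteMessageInput{
  QueueUrl:      &queueUrl,
  ReceiptHandle: msg.ReceiptHandle,
})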

To be safe and avoid leaving a service running in the test environment overnight, or during any other period when you aren't working on the project, use the Copilot command to delete the service from the environment:

copilot svc delete -e test --name image-handler

The rest of the infrastructure for the environment remains in place, and the service will come right back the next time you run copilot deploy (which you're doing frequently anyway).

The Container

The Dockerfile is pretty similar to what I used for the API container, except I don't need to build my web client or expose any ports to the Internet.

FROM golang:1.18 AS go-builder

WORKDIR /usr/src/app

# pre-fetch the dependencies for the go app in a separate layer
COPY go.mod go.sum ./
RUN go mod download && go mod verify

# copy in the go codebase and build
COPY . .
RUN go build -v -o /usr/local/bin/app ./


FROM golang:1.18 AS prod

ENV STAGE=prod

# copy the compiled binary from go-builder
COPY --from=go-builder /usr/local/bin/app /usr/local/bin/app

CMD ["app"]

The Copilot Config

I can just run copilot init in this new repo, and the guided CLI will help me configure this as an additional service on my existing app.

[Image: copilot init - add a worker service]

This generates a manifest file for a Worker Service, which includes the capability to create the SQS queue to receive messages from another service. I need to configure it in the manifest.

copilot/image-handler/manifest.yml

subscribe:
  topics:
    - name: newImage
      service: api

The config is very simple, but it gives Copilot enough information to set up the SQS queue and subscribe it to the SNS topic from the other service. It will provide the queue URL in an environment variable named COPILOT_QUEUE_URI, which the code above is already expecting.

Deploy and Test

Both my modules are now fully dependent on AWS resources, using environment variables that Copilot is going to provide when this is deployed to AWS. Nothing has been provisioned yet, either. I'm going to need to run a deployment before I can test anything.

Copilot introduces the concept of distinct environments. I can deploy this application to the same AWS account multiple times as different environments. Each environment's services and all its infrastructure, down to the VPC, will be isolated in their own stacks. I can start by creating a 'test' environment, and later, once I've deployed a 'prod' environment, I can still push updates to 'test', or create a 'stage' environment to stand up and test a new release without touching 'prod' until it's ready.

copilot env init --name test

[Image: create the test environment]

That created the AWS infrastructure for an environment. Earlier, when I ran copilot init for the first time, it created a few AWS resources that were global to the application. Now it's created a lot more AWS resources that are specific to this environment I've labeled 'test'.

Now I can deploy my two services to the 'test' environment. Just go into each project's root and run the simple deploy command:

copilot deploy

Copilot will pick everything up from context. It will build the Docker image, push it to ECR, and then provision the stack with all the AWS resources for that service. After running it for the API service, it displays the service's public URL, which can be opened in your browser to try the app running directly from the test environment.

Establish a Channel Back to the Client for Progress Updates

We have arrived at the novel challenge. Up to this point everything has just been getting up to speed with Copilot. Now I need to solve something beyond the documentation.

To recap: the user uploaded an image, the API put the image in S3 and published an SNS message about it, and a worker service monitoring the SQS queue has picked up this message and is doing the work on the image. The user needs to know whether the image was acceptable or not, so the progress needs to be communicated from the worker service all the way back to the browser.

Traditionally, this would be solved by having both the worker service and the API access the same database or other storage, and the client would just poll an API endpoint, waiting for something to update. I'd rather not poll; that can end up meaning a huge number of hits and a lot of work for an expensive database.

The worker service can easily publish to another SNS topic, and the browser can establish a WebSocket connection to the API server, but if I use SQS in the middle, my API server is going to be pulling double duty, handling requests and monitoring the queue for messages. All that would do is lift the polling up one level. What else can I do in the middle?

SNS is pretty versatile; besides delivering to an SQS queue, it can send e-mails, text messages, or POST to an http/https endpoint. 💡 There it is, that's what I want to try! My API will be able to stick to its single duty of handling requests, and instead of receiving extra poll requests from a client based simply on the passage of seconds, it will receive a single POST request from SNS based on the actual status update sent from the worker service, which it will then pass along to the browser via the WebSocket connection.

Publish to SNS from the Backend Service

It was so easy to set up the first SNS topic because Copilot did it for us. This one goes in the opposite direction - the worker service has to have permission to publish to it. Copilot doesn't do any of that for us, but it does have the wide-open ability to handle add-ons defined in any CloudFormation template.

Copilot added an add-on earlier when we set up the S3 storage; now I'm going to create one myself under the backend project. I need to define an AWS::SNS::Topic and give the worker service permission to publish to it, but in this add-on template, I don't have any of the service's resources to reference. I'll need to add an AWS::IAM::ManagedPolicy that grants access to publish to the SNS topic and then list it in the 'Outputs'. Copilot will see that and inject it into the service role.

copilot/image-handler/addons/image-status.yml

Parameters:
  App:
    Type: String
  Env:
    Type: String
  Name:
    Type: String

Resources:
  imageStatusSNSTopic:
    Metadata:
      'aws:copilot:description': 'A SNS topic to broadcast image status events'
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub '${AWS::StackName}-imageStatus'
      KmsMasterKeyId: 'alias/aws/sns'
  imageStatusSNSAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: 'Allow'
            Action: 'sns:Publish'
            Resource:
              - !Ref imageStatusSNSTopic

Outputs:
  ImageStatusAccessPolicy:
    Value: !Ref imageStatusSNSAccessPolicy
  ImageStatusSNSTopic:
    Value: !Ref imageStatusSNSTopic

This will be deployed as a sub-stack - Copilot will pass in values for the 'App', 'Env', and 'Name' of the service, then my IAM access policy output will be injected into the service role for my worker service, and the topic ARN will be added as an environment variable.

Receive the Message in the API

It's time for the inevitable obstacle. I need to set up an https subscription for this SNS topic. I could do that directly in the CloudFormation template with the Subscription property, but I need the URL of the deployed app, which isn't exported from the main stack or provided as a parameter or anything else that would get it into my template without hardcoding.

I'm going to need to do the hard-coding, but at least I can do it safely by setting up a mapping in the template so it keys on the Copilot environment.

Mappings:
  EndpointMap:
    test:
      Url: # APP RUNNER URL /imageStatus

Resources:
  imageStatusSNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !FindInMap [EndpointMap, !Ref Env, Url]
          Protocol: 'https'

It's annoying because we're going to have to paste the URL in here after deploying the API server and before deploying the worker service, but at least it only needs to be done once for each new environment, and it will never accidentally use a URL from the wrong environment.

I would love to see Copilot support cross-stack imports. Service discovery is supported, so services can find each other, at runtime, via networking within the VPC, but if one service configures an S3 bucket or a database as a storage add-on and another service wants to share it, we have to do all this hard-coding between deployments.

Updating the worker service code is a simple task, at least. Since I included 'ImageStatusSNSTopic' in my Outputs, referencing the SNS Topic resource I created, Copilot will put the resolved ARN in an environment variable named IMAGE_STATUS_SNS_TOPIC for my worker service to use to publish the message.

This time my message will be a JSON document with the image identifier (the original message) and a status message.

// this code is abbreviated, see the GitHub repo for full code
import (
  "context"
  "encoding/json"
  "os"

  "github.com/aws/aws-sdk-go-v2/aws"
  "github.com/aws/aws-sdk-go-v2/service/sns"
)

type StatusMessage struct {
  ImageID string
  Status  string
}

var topic = os.Getenv("IMAGE_STATUS_SNS_TOPIC")

func process(cfg *aws.Config, location *string) {
  client := sns.NewFromConfig(*cfg)
  u, err := json.Marshal(StatusMessage{ImageID: *location, Status: "started"})
  message := string(u)

  input := &sns.PublishInput{
    Message:  &message,
    TopicArn: &topic,
  }
  snsResult, err := client.Publish(context.TODO(), input)
}

The API is going to need a new endpoint to receive these posts from SNS, and I'm going to need a good way to dispatch the messages based on the data enclosed in the message.

SNS Tip: If you create an SNS subscription with an endpoint to anything other than an SQS queue, it has to be confirmed on the receiving end. Before you start testing and trying to troubleshoot missing SNS messages, go into the console and look under AWS SNS Subscriptions to see if any are 'Pending Confirmation.'
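To sketch what that new endpoint could look like (this is my own rough take, not something prescribed by Copilot or SNS - the /imageStatus route name and the dispatch helper are placeholders, and dispatch gets defined in the WebSocket section at the end), the handler has to deal with two message types from SNS: the one-time SubscriptionConfirmation, confirmed by fetching the SubscribeURL, and regular Notification messages carrying the status JSON from the worker:

// a sketch only - route name and dispatch() are placeholders
import (
  "encoding/json"
  "io"
  "net/http"

  "github.com/gin-gonic/gin"
)

// mirrors the fields SNS includes in an https delivery
type snsEnvelope struct {
  Type         string
  Message      string
  SubscribeURL string
}

// mirrors the JSON published by the worker service
type StatusMessage struct {
  ImageID string
  Status  string
}

func main() {
  r := gin.Default()
  r.POST("/imageStatus", func(c *gin.Context) {
    body, _ := io.ReadAll(c.Request.Body)
    var envelope snsEnvelope
    if err := json.Unmarshal(body, &envelope); err != nil {
      c.Status(http.StatusBadRequest)
      return
    }

    switch envelope.Type {
    case "SubscriptionConfirmation":
      // confirm the subscription by visiting the SubscribeURL once
      http.Get(envelope.SubscribeURL)
    case "Notification":
      // the Message field holds the StatusMessage JSON from the worker
      var status StatusMessage
      json.Unmarshal([]byte(envelope.Message), &status)
      dispatch(status.ImageID, status.Status) // placeholder: route to the right websocket
    }
    c.Status(http.StatusOK)
  })
  r.Run()
}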

Establish a Websocket Connection with the client

Now we can tie it all together. In the React app, the post to upload the image is going to return the image identifier that my app generates, and with this I'll create a new Websocket connection on another endpoint:

var c = new WebSocket(`${apiUrl}/imageStatus/${imageId}/ws`)
c.onmessage = (msg) => {
    console.log(msg)
}

On the server, this is a new route where the request gets 'upgraded'.

import (
  "net/http"

  "github.com/gin-gonic/gin"
  "github.com/gorilla/websocket"
)

func main() {
  r.GET("/imageStatus/:id/ws", func(c *gin.Context) {
    // handle the request as a websocket request
    wsHandler(c.Writer, c.Request, c.Param("id"))
  })
}

func wsHandler(w http.ResponseWriter, r *http.Request, id string) {
  // use the request and response writer to upgrade to a websocket connection
  conn, err := wsUpgrader.Upgrade(w, r, nil)

  // a separate goroutine can communicate over this connection
  go func() {
    conn.WriteJSON(gin.H{"status": "message"})
  }()
}

var wsUpgrader = websocket.Upgrader{
  ReadBufferSize:  1024,
  WriteBufferSize: 1024,
  CheckOrigin: func(req *http.Request) bool {
    return true
  },
}

That gives me a thread that can send messages to the browser via a websocket connection, but the status messages that I want to push through this connection aren't here, they're coming in via another route in a completely different thread, and they'll be coming through for every different image that's being processed in the whole app. How do I get the right messages in that thread?

This last challenge brings up one of my favorite features of the Go programming language: channels. Programming with multiple threads usually gets complicated and risky when sharing data structures in memory between threads. Channels are a native medium of communication between goroutines. The main thread can create a channel, pass it as an argument to a goroutine, and push values onto the channel, and the goroutine can wait for and receive them.

What I'm going to do to solve my last problem is make a map of channels at the top level. Each image that's sent off for processing will get a new channel created and stored in the map by the image identifier. When the websocket connection is created, the goroutine will be passed the channel and will wait to receive something from it. When a status update comes in via the SNS post, it will find the channel in the map based on the image identifier and send the status message into the channel. When a status message comes in that says the process is finished, the goroutine will send that last message and then destroy the channel and return.

// one buffered channel per in-flight image, keyed by image ID
statusChannels := make(map[string]chan string)
statusChannels[imageID] = make(chan string, 3)

// status updates arrive from SNS and are pushed onto the channel
statusChannels[imageID] <- message

// the websocket goroutine receives the message and forwards it to the browser
message := <-statusChannels[imageID]
conn.WriteJSON(gin.H{"status": message})
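The snippet above is just the idea; here's a slightly fuller sketch of how I could wire it together, sharing the map between the SNS endpoint and the websocket goroutines. The mutex, the registerImage and forwardStatuses helpers, and the 'finished' status value are my own additions for this sketch, not code from the repo:

import (
  "sync"

  "github.com/gin-gonic/gin"
  "github.com/gorilla/websocket"
)

// guard the map since the SNS route and websocket routes run concurrently
var (
  mu             sync.Mutex
  statusChannels = make(map[string]chan string)
)

// called when an image is sent off for processing
func registerImage(imageID string) {
  mu.Lock()
  defer mu.Unlock()
  statusChannels[imageID] = make(chan string, 3)
}

// called by the SNS endpoint when a status update arrives
func dispatch(imageID, status string) {
  mu.Lock()
  ch, ok := statusChannels[imageID]
  mu.Unlock()
  if ok {
    ch <- status
  }
}

// run in the goroutine behind the websocket connection
func forwardStatuses(conn *websocket.Conn, imageID string) {
  mu.Lock()
  ch := statusChannels[imageID]
  mu.Unlock()

  for status := range ch {
    conn.WriteJSON(gin.H{"status": status})
    if status == "finished" { // assumed terminal status for this sketch
      break
    }
  }

  // clean up the channel once processing is done
  mu.Lock()
  delete(statusChannels, imageID)
  mu.Unlock()
}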

When those messages are received on the other end of the Websocket connection in the browser, React state can be updated which will re-render the view and give near-instant status updates.

⚠️Caution: If you're building your own Copilot app and you skipped down here to see how I handled the WebSocket bit, make sure you're using a 'Load Balanced Web Service' as your front end, and not the less costly 'Request-Driven Web Service'. The latter uses AWS App Runner, and it will not support WebSocket connections. Those connections use the protocol 'ws://' or 'wss://' and you'll get a 403 response without any evidence that the request even reached your web server. The beauty of Copilot is that you can simply update the manifest to change the service type, delete the service, and redeploy with the same code.

Wrap Up

That was a lot of concepts to cover in one article, but I created a functioning multi-service web application that works enough to get a quick screen capture.

I'm not offering a demo URL. I don't want anyone actually using this app to upload receipts and blowing through my AWS Free Tier. The URL might be visible in one of those screen captures, but that's another thing I love about Copilot:

copilot app delete

If this were a real app that I'd launched and didn't need to work on for a while, I could just delete the 'test' environment to reduce costs. But this was an educational project, and I don't want to risk incurring charges or leaving it out there vulnerable to attack; I just want to preserve my notes and my source code. One simple command wipes the whole app out of my AWS account and leaves me with a clean slate for my next project.