In this example, we use Python to listen to the Tracking Stream endpoint on AWS Lambda.
We log the arriving target updates of a filtered query that only shows target updates over Hartsfield–Jackson Atlanta International Airport. Shortly before a set time-out, we gracefully disconnect right after receiving a position_token; this way we avoid duplicate target update delivery. We store the position_token to re-connect to the same point in the stream on the next scheduled Lambda invocation.
The components of the infrastructure are:
- AWS Lambda to configure the connection and to receive target updates;
- AWS CloudWatch Events to trigger the Lambda regularly; and
- AWS S3 to store the location of the stream at time-out.
You can find and download the source code for this example here.
After reading this section you will know the purpose of each file in this example, and which role it plays.
- client.py contains production-ready sample client code that wraps the v2/targets/stream API. It exposes target updates via callbacks and handles graceful disconnection to avoid duplicate target update delivery; client_test.py contains the corresponding tests.
- The Lambda handler drives the client in client.py and manages loading and storing position_tokens, which encode the position in the stream that the client has progressed to. This is also where the TargetProcessor class is located: this class processes target updates as they come in, and exposes a callback function to do so.
- terraform.tf contains the infrastructure definition for AWS. Using Terraform 0.13 you can create the Lambda function, the S3 bucket, and the CloudWatch Event.
- Pipfile.lock defines all Python package dependencies for the client and the handler, like boto3 to write to S3, the requests package to easily call network resources, and pytest for testing and development.
- assemble-lambda.sh contains a script that assembles all installed Python packages and local Python files into one zip file that can be uploaded to AWS Lambda via the AWS CLI, Terraform, or the AWS console.
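The graceful-disconnect behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the actual client.py API: the function name, the update dictionaries, and the 20-second token window are assumptions made for the example.

```python
import time

def run_until_deadline(updates, deadline, on_target, grace_s=20, now=time.monotonic):
    """Consume target updates until a position_token arrives within
    grace_s seconds of the deadline, then disconnect cleanly.

    Returns the last position_token seen, so the next scheduled
    invocation can resume the stream without duplicate deliveries.
    """
    last_token = None
    for update in updates:
        if "position_token" in update:
            last_token = update["position_token"]
            # Near the deadline, stop right after a token: everything
            # delivered so far is then covered exactly once.
            if now() >= deadline - grace_s:
                break
        else:
            on_target(update)  # e.g. the TargetProcessor callback
    return last_token
```

Disconnecting only right after a position_token is what makes the resume exact: the token marks a point in the stream where nothing in flight would be re-delivered.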
The next section lists all prerequisites, with links and notes on how to install them.
To execute this tutorial you need the following tools and accounts on your system:
- An AWS account, with configured credential profiles;
- Terraform 0.13 to create the AWS infrastructure;
- Access to a bash shell with zip on the PATH to execute assemble-lambda.sh;
- pyenv to download and load the correct Python version;
- Pipenv to create the virtual environment and install dependencies; and
- Git to download the source code.
Both pyenv and Pipenv can be installed via pipx.
Following the installation of these prerequisites, you can walk through the next section to set up the example on your account.
This section contains the necessary commands to build and deploy the example and describes what they do.
- Clone the repository, and navigate to the resulting directory.
- Run pyenv init and follow the instructions to get pyenv ready to load the correct Python version.
- Run pyenv install 3.6-dev to install the latest version of Python 3.6.
- Set the local Python version by running pyenv shell 3.6-dev; python --version should now return Python 3.6.11+ or similar.
- Create and load a virtual environment by executing pipenv shell.
First, we test the client and handler to see if we have all dependencies installed.
- Run pipenv sync --dev to install all development dependencies as specified by Pipfile.lock.
- Test the client and handler by running pytest --cov=. -vv. The following steps assume that this test passed.
Now we minimize the installed dependencies to keep the Lambda handler small. We remove the virtual environment and recreate it, installing only the production dependencies. Do this by running pipenv --rm and then pipenv sync, this time without the --dev flag.
At this point, we have tested that the source code works, and we have only the production dependencies installed. Now we assemble all code and dependencies into a zip file for AWS Lambda by calling assemble-lambda.sh. This creates a file called lambda.zip in the current working directory.
- It is time to create the infrastructure on AWS. Run terraform init followed by terraform apply to create an S3 bucket, a Lambda function, and a CloudWatch Event that will trigger the Lambda every 5 minutes. Terraform will ask for the value of three variables: airsafe2_token, the Tracking Stream token that has been issued by Spire; profile, the AWS credential profile you want to use (defined by a heading in your AWS credentials file); and region, which determines the AWS region the service will be deployed in. Follow the instructions given by Terraform, and confirm if you agree to create the infrastructure and roles. Note that this will create infrastructure that will incur fees determined by AWS.
Note also: If the S3 bucket name is already in use, it might be necessary to choose a different value for resource "aws_s3_bucket" "last-position-token" in terraform.tf. Terraform resolves dependencies between components automatically, so there is no need to reconfigure environment variables etc. in e.g. the Lambda function.
- Navigate to the AWS Console (e.g. for region us-west-1: https://us-west-1.console.aws.amazon.com/lambda/home?region=us-west-1#/functions/airsafe-2-streamer?tab=monitoring) to see Lambda invocations, and click on "View logs in CloudWatch" to see logs.
In this section, you created a Lambda function, an S3 bucket, and a CloudWatch event to schedule the Lambda function. At this point these components should be active on AWS, and called every 5 minutes, logging target updates over Hartsfield–Jackson Atlanta International Airport.
The next section discusses some aspects of this setup worth considering.
While the working stream should be visible now in CloudWatch Logs, here are some things to consider when consuming the Tracking Stream product.
CloudWatch Events triggers the Lambda periodically, and S3 is used to persist state across invocations. Having the event trigger outside the running system ensures that the stream can recover automatically if, e.g., a write to S3 or a Lambda execution fails.
A different approach, without this kind of resiliency, would be to trigger the Lambda from SNS, and publish the last position_token to SNS again to trigger the next Lambda invocation. That circular dependency means a single failure halts the system.
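The state persisted in S3 is just the last position_token. A minimal sketch of that load/store logic, with a boto3-style client passed in; the bucket and key names here are illustrative assumptions:

```python
def load_position_token(s3, bucket="last-position-token", key="position_token"):
    """Return the stored position_token, or None on the very first run."""
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
        return obj["Body"].read().decode("utf-8")
    except Exception:
        # boto3 raises ClientError ("NoSuchKey") before the first write;
        # treating that as "no token yet" starts the stream from live data.
        return None

def store_position_token(s3, token, bucket="last-position-token", key="position_token"):
    """Persist the token so the next invocation resumes from it."""
    s3.put_object(Bucket=bucket, Key=key, Body=token.encode("utf-8"))
```

Passing the client in (rather than creating it at module level) also keeps the functions testable without AWS credentials.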
Handling timeouts gracefully means that for each interval at which we trigger the Lambda, we have less than that interval to retrieve all newly available data. This effect gets more pronounced the shorter the interval.
Graceful timeout means that the client disconnects from the stream when it receives its last guaranteed position_token (v2/targets/stream sends a position_token at least every 15 seconds, plus latency). With the Lambda limited to a hard timeout of e.g. 300 seconds, we set a timeout of about 295 seconds for the stream client. Gracefully timing out on a 295-second timeout means the client terminates the connection as soon as it receives a position_token within 20 seconds of that timeout (i.e. sometime after 275 seconds). The client therefore has 275 seconds to download 300 seconds worth of data (~92%). A hard Lambda timeout of 30 seconds would instead leave 5 seconds to fetch 30 seconds worth of data (~17%).
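The arithmetic above can be checked in a few lines. The 5-second client margin and the 20-second token window are the assumptions used in this example, not fixed properties of the API:

```python
def fetch_fraction(lambda_timeout_s, client_margin_s=5, token_window_s=20):
    """Rough fraction of each interval left for actually reading the stream."""
    client_timeout_s = lambda_timeout_s - client_margin_s  # e.g. 300 -> 295
    listen_s = client_timeout_s - token_window_s           # e.g. 295 -> 275
    return listen_s / lambda_timeout_s

print(round(fetch_fraction(300), 2))  # ~0.92 for a 300 s Lambda timeout
print(round(fetch_fraction(30), 2))   # ~0.17 for a 30 s Lambda timeout
```

The fixed token window eats a constant 20-plus seconds out of every interval, which is why short intervals leave so little of the budget for data.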
It is important to keep timeouts high, when gracefully timing out. An alternative would be to store each position token as it arrives, listen until receiving a hard timeout, and deal with some duplicate target updates that will arrive when reconnecting.
The volume of data in the stream can be high, depending on the filter parameters used. Spending a lot of time processing each target update can mean that the consumer falls behind in the stream. At a certain point, the consumer will be so far behind that the server disconnects it, and the consumer misses available data.
It is useful to keep the amount of processing that is happening directly at data ingestion very low. Instead, any processing that takes more than a minimal amount of time can be executed in a scalable environment or asynchronously.
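One way to keep ingestion cheap is to hand each update to a background worker (or an external queue such as SQS) instead of processing it in the callback. A hedged in-process sketch; the helper names here are illustrative, not part of the example's code:

```python
import queue
import threading

def start_worker(process, work_queue):
    """Run `process` on queued updates in a background thread, so the
    stream callback itself only ever does a cheap enqueue."""
    def loop():
        while True:
            update = work_queue.get()
            if update is None:      # sentinel: shut the worker down
                break
            process(update)         # the slow part happens off the hot path
    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker

# The ingestion callback handed to the stream client is then simply:
#   callback = work_queue.put
```

For processing that exceeds what one Lambda can absorb, the same pattern applies with a durable queue (e.g. SQS) in place of the in-process one.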
To destroy the infrastructure, execute terraform destroy.