Skip to content

Latest commit

 

History

History
 
 

scale-out

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 

Amazon Elastic File System (Amazon EFS)

Scale-out Tutorial

Version 1.0.2

efs-st-1.0.2


© 2018 Amazon Web Services, Inc. and its affiliates. All rights reserved. This sample code is made available under the MIT-0 license. See the LICENSE file.

Errors or corrections? Email us at [email protected].


Tutorial Overview

Overview

This tutorial is designed to help you better understand the distributed data storage design of Amazon Elastic File System (Amazon EFS) and how to best leverage this design by taking advantage of scale-out achitectures.

Prerequisites

  • An AWS account with administrative level access
  • An Amazon EC2 key pair
  • An Amazon VPC
  • An Amazon EFS file system

If a key pair has not been previously created in your account, please refer to Creating a Key Pair Using Amazon EC2 from the AWS EC2 User's Guide.

Verify that the key pair is created in the same AWS region you will use for the tutorial.

If a VPC and EFS file system have not been previously created, please refer to Create a File System tutorial

The Environment

The tutorial steps will launch an EC2 Spot Fleet. Please use the recommended default instance type. All instances will automatically mount the Amazon EFS file system and register with EC2 SSM, which will be used to remotely execute a script to transfer objects from Amazon S3 to Amazon EFS in parallel.

Each instance will download all S3 objects from the public dataset https://aws.amazon.com/public-datasets/dc-lidar/.

For more information about this dataset, please refer to http://geospatial.dcgis.dc.gov/templates/dcfinder/s2.html?appid=62c9607bcfb349d5988e39390d50e995

WARNING!! This tutorial environment will exceed your free-usage tier. You will incur charges as a result of building this environment and executing the scripts included in this tutorial. Delete all files on the EFS file system that were created during this tutorial and delete the stack so you don’t continue to incur additional compute and storage charges.

Tutorial

Step 1: Create Amazon EC2 Spot Fleet Role

This step involves creating an IAM role used to run and launch an Amazon EC2 spot fleet.

1.1. Sign in to the AWS Management Console and navigate to the IAM service page

1.2. Select Roles on the left frame and select Create Role

1.3. Select AWS Service as the type of trusted entity, EC2 as the service that will use this role, and EC2 Spot Fleet Role as the use case. Select Next: Permissions.

1.4. Verify the AmazonEC2SpotFleetRole policy is listed and select Next: Review.

1.5. Give the role a name, like efs-scale-out-tutorial-ec2-spot-fleet-role and select Create Role.

Step 2: Create Amazon EC2 Role

This step involves creating an IAM role used by the EC2 instances.

2.1. Sign in to the AWS Management Console and navigate to the IAM service page

2.2. Select Roles on the left frame and select the Create Role.

2.3. Select AWS Service. as the type of trusted entity, EC2 as the service that will use this role, and EC2 Role for Simple Systems Manager as the use case. Select Next: Permissions.

2.4. Verify the AmazonEC2RoleforSSM policy is listed and select Next: Review.

2.5. Give the role a name, like efs-scale-out-tutorial-ec2-instance-role and select Create Role.

2.6. Find the role you just created by typing in it's name in the search field efs-scale-out-tutorial-ec2-instance-role and click the role's name link.

2.7. Select the +Add inline policy link on the bottom right side of the window.

2.8. Select Custom Policy and Select.

2.9. Give the policy a name, like efs-scale-out-tutorial-ec2-instance-policy and paste the policy snippet below into the Policy Document field.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:CreateTags",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeNetworkInterfaceAttribute",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcs",
                "elasticfilesystem:Describe*",
                "kms:ListAliases"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

2.10. Select Validate Policy to make sure the policy is valid. Select Apply Policy.

Step 3: Create an Amazon EC2 Spot Request

This step involves creating an Amazon EC2 Spot request

3.1. Sign in to the AWS Management Console and navigate to the EC2 service page

3.2. Select Spot Instances on the left frame under INSTANCES and select the Request Spot Instances

3.3. Create a Spot Instance Request using the following settings:

Select request type section

  • Request type: Request

  • Target capacity: Select the size of the fleet (recommend 5)

  • AMI: Select the latest Amazon Linus AMI (e.g. Amazon Linux AMI 2017.09.0.20170930 x86_64 HVM GP2)

  • Instance type(s): r4.large

  • Allocation strategy: Lowest price

  • Network: Select the appropriate VPC

  • Availability Zone: Select all availability zones

  • Maximum price Use automated bidding (recommended)

  • Select Next

Configure storage section

  • Keep all defaults

  • Set instance details section

  • Monitoring: Enable CloudWatch detailed monitoring

  • Tenancy: Default - run a shared hardware instance

  • User data: As text paste the cloud_config snippet below into the text box. IMPORTANT!! There is one placeholder in the script below that needs to be updated with the EFS file system id you want to use. In the runcmd: section replace !!!Add_file_system_id_here!!! with the file system id you want to use. e.g. - filesystem=fs-123456af

#cloud-config
repo_update: true
repo_upgrade: all

packages:
- parallel

write_files: 
- content: |
    #!/bin/bash -xe
    FILE_SYSTEM_ID=$1

    # get instance-id, availability-zone, and region from intance metadata
    instanceid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    availabilityzone=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
    region=${availabilityzone:0:-1}

    # wait for file system DNS name to be propagated
    results=1
    while [[ $results != 0 ]]; do
      nslookup $FILE_SYSTEM_ID.efs.$region.amazonaws.com
      results=$?
      if [[ results = 1 ]]; then
        sleep 30
      fi
    done

    cd /$FILE_SYSTEM_ID/lidar/${instanceid}

    # create threads variable
    threads=$(($(nproc --all) * 8))

    aws ec2 create-tags --resources ${instanceid} --tags "Key=Name,Value=Scale-out Tutorial parallel s3 cp (waiting)" --region ${region}
    
    # create a list of objecst to get
    aws s3api list-objects --bucket dc-lidar-2015 --output text --query 'Contents[].{Key:Key}|[?contains(Key, `.las`) == `true`]|[?contains(Key, `.lasx`) == `false`]' | awk '{print "s3://dc-lidar-2015/" $1}' > /tmp/$FILE_SYSTEM_ID-dc-lidar-2015.txt

    aws ec2 create-tags --resources ${instanceid} --tags "Key=Name,Value=Scale-out Tutorial parallel s3 cp (running)" --region ${region}

    # get the s3 objects in parallel
    sudo cat /tmp/$FILE_SYSTEM_ID-dc-lidar-2015.txt | parallel --will-cite -j ${threads} 'aws s3 cp {} .; grep -o ".\{8\}$"'

    aws ec2 create-tags --resources ${instanceid} --tags "Key=Name,Value=Scale-out Tutorial parallel s3 cp (complete)" --region ${region}

  path: /tmp/scale-out-tutorial-get-lidar-data.sh
  permissions: 0777

runcmd:
# set variables
- instanceid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
- availabilityzone=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
- region=${availabilityzone:0:-1}
- filesystem=!!!Add_file_system_id_here!!!

# installs
- yum --enablerepo=epel install nload -y

# make directories and mount file system
- mkdir -p /${filesystem}
- chown ec2-user:ec2-user /${filesystem}
- echo "${filesystem}.efs.${region}.amazonaws.com:/ /${filesystem} nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0" | sudo tee /etc/fstab
- mount -a -t nfs4
- mkdir -p /${filesystem}/lidar/${instanceid}

# tag the instance
- aws ec2 create-tags --resources ${instanceid} --tags "Key=Name,Value=Scale-out Tutorial" --region ${region}
  • Instance tags: Add the following Kay:Value pair
  • Key: Name
  • Value: Scale-out Tutorial

Set keypair and role section

  • Key pair name: Select an existing EC2 key pair
  • IAM instance profile: Select the EC2 instance role you created in Step 2 above. (e.g. efs-scale-out-tutorial-ec2-instance-role)

Manage firewall rules section

  • Security groups: Select a security group that has NFS (TCP 2049) inbound access to the file system's mount targets

  • auto-assign IPv4 Public IP: Use subnet setting (Enable)

  • Accept the defaults for the remaining settings on this page.

  • Select Review

  • Review all the settings.

  • Select Launch

It will take a few minutes for the Amazon EC2 Spot request to be accepted and fulfilled. Before continuing, wait for all EC2 instances requested to show up in the EC2 console with the key:value pair Name : Scale-out Tutorial.

Step 4: Use SSM to remotely start file transfer from Amazon S3 to Amazon EFS

This step involves remotely running a script using SSM to start the transfer from S3 to EFS.

4.1. Sign in to the AWS Management Console and navigate to the EC2 service page

4.2. Select Managed Instances on the left frame under SYSTEMS MANAGER SHARED RESOURCES and select the Request Spot Instances

4.3. Verify all instances in the spot fleet are displayed. Instances should have the key:value pair Name : Scale-out Tutorial.

4.4. Select Run a command

4.5. Create a command using the following settings:

  • Select AWS-RunShellScript as the Command document.

  • Select Specifying a Tag as the Select Targets by option and select the key:value pair Name : Scale-out Tutorial.

  • Paste the command below into the Commands text box, replacing !!!Add_file_system_id_here!!! with the your EFS file system id. (e.g. /tmp/scale-out-tutorial-get-lidar-data.sh fs-123456af)

/tmp/scale-out-tutorial-get-lidar-data.sh !!!Add_file_system_id_here!!!
  • Select Run

Step 5: Monitor the transfer

5.1. Verify the script is executing on each EC2 instance Once the script has been successfully executed, the Name tag of each instance will change from Scale-out Tutorial to Scale-out Tutorial parallel s3 cp (running).

5.2. Monitor the total throughput to the EFS file system using CloudWatch Select the CloudWatch dashboard for your EFS file system that was created in the Create file system tutorial.

5.3. Monitor for 10 minutes Monitor the CloudWatch metrics for 10 minutes or so, looking at the Throughput & IOPS widgets. Optionally, you can SSH into one or more of the EC2 instances and monitor the throughput of that instance in realtime by running the script below on the instance.

nload -u M

Step 6: Terminate the EC2 Spot Request

6.1. Sign in to the AWS Management Console and navigate to the EC2 service page

6.2. Select Spot Requests on the left frame under INSTANCES and select the checkbox next to the request id you just created.

6.3. Select Actions then Cancel Spot request. Confirm the Terminate instances checkbox is checked and select Confirm.

Sample Results

Click on the image to watch a screen capture*

video

*this was achieved using a file system with a permitted throughput greater than 3 GB/s

Conclusion

The distributed data storage design of Amazon EFS enables high levels of availability, durability, and scalability. This distributed architecture results in a small latency overhead for each file operation. Due to this per-operation latency, overall throughput generally increases as the average I/O size increases, because the overhead is amortized over a larger amount of data. Amazon EFS supports highly parallelized workloads (for example, using concurrent operations from multiple threads and multiple Amazon EC2 instances), which enables high levels of aggregate throughput and operations per second.


Bonus

Having issues? - Want to save time?

Launch the AWS CloudFormation Stack

This AWS CloudFormation stack will automatically create the environment created in Steps 1 - 3 above. Steps 4 - 5 are also scripted out below and you can run those in a terminal session. You will still need to complete Step 6 to delete the environment when you're finished with the tutorial.

Click the cloudformation-launch-stack link below to create the AWS CloudFormation stack in your account and desired AWS region. This region must an existing Amazon EFS file system which you will use with this tutorial.

AWS Region Code Name Launch
us-east-1 US East (N. Virginia) cloudformation-launch-stack
us-east-2 US East (Ohio) cloudformation-launch-stack
us-west-1 US West (N. California) cloudformation-launch-stack
us-west-2 US West (Oregon) cloudformation-launch-stack
eu-west-1 EU (Ireland) cloudformation-launch-stack
eu-central-1 EU (Frankfurt) cloudformation-launch-stack
ap-southeast-2 AP (Sydney) cloudformation-launch-stack

After launching the AWS CloudFormation Stack above, you should see the Amazon EC2 instances running in your VPC. Wait for the Name tag of each instance to read "Scale-out Tutorial" before continuing.

Verify all instances have registered with SSM

Run this command to verify all EC2 instances in the spot fleet have registered with SSM. You should see one entry per instance.

aws ssm describe-instance-information --query "InstanceInformationList[*]" --output json

Remotely execute script to download objects from S3 to Amazon EFS

Run this command to remotely execute the scale-out-tutorial-get-lidar-data.sh script which will download all objects from the DC-LiDAR public dataset to the Amazon EFS file system specified in the CloudFormation template parameters.

aws ssm send-command --targets "Key=tag:Name,Values=Scale-out Tutorial" --document-name "AWS-RunShellScript" --comment "Scale-out Tutorial" --parameters commands=/tmp/scale-out-tutorial-get-lidar-data.sh --output text --query "Command.CommandId"

Verify the script is executing on each EC2 instance

Once the script has been successfully executed, the Name tag of each instance will change from 'Scale-out Tutorial' to 'Scale-out Tutorial parallel s3 cp (running)'

Delete the CloudFormation stack

Delete the cloud formation stack to terminate all EC2 instances in the fleet.

Troubleshooting

For feedback, suggestions, or corrections, please email me at [email protected].

License

This sample code is made available under the MIT-0 license. See the LICENSE file.