Getting Started with File Dumps
42matters generates file data dumps containing the metadata of mobile apps and charts available on Google Play and iTunes, and makes them available to its customers for download.
Access Credentials
In order to obtain your Access Credentials, please read the File Dumps page or contact us.
Note: Do not share your access credentials publicly, for example in emails, source control, or chats. Doing so will lead to your credentials being disabled.
Authentication
Clients need AWS S3 credentials to download the data files. Your credentials are available in your 42matters account under Launchpad.
Many tools can be used to access the data. Always check with your security and DevOps teams which tool is the most suitable for your company.
Cyberduck
To verify that your credentials work, try connecting with a tool such as Cyberduck. After installing the software, launch it, click Open Connection, and choose Amazon S3 from the dropdown.
Enter external.42matters.com.s3.amazonaws.com as the Server, along with the access credentials from your account.

After successfully authenticating with the server, you will be able to navigate to the target path locations that we supplied in the onboarding email.

ExpanDrive
An alternative tool is ExpanDrive. After installing the software, launch it, click the large "+" button in the bottom-left section, and choose Amazon S3.
Enter s3.amazonaws.com as the Server and external.42matters.com as the bucket, along with the access credentials from your account.

Automation with AWS CLI
The following example shows how to use awscli to list the bucket's contents and download a standard playstore dump for a particular date.
# 1) Install awscli - https://aws.amazon.com/cli/
pip install awscli --upgrade --user
# 2) Configure awscli with the credentials we've provided in your account.
aws configure --profile YOUR_COMPANY
# 3) List the contents of a folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/ --profile YOUR_COMPANY
# 4) List the contents of the timestamped folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/ --profile YOUR_COMPANY
# 5) Download the file locally
aws s3 cp s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/playstore-00.tar.gz playstore.tar.gz --profile YOUR_COMPANY
# 6) Unpack the file
tar xvfz playstore.tar.gz
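For scripts that would rather not shell out to the CLI, the listing in steps 3 and 4 can be sketched in Python. This is a minimal sketch under stated assumptions, not an official 42matters example: list_folder is a hypothetical helper name, and it expects a boto3 S3 client to be passed in. It uses the list_objects_v2 paginator with a "/" delimiter, which groups deeper keys into folder-like prefixes.

```python
def list_folder(client, bucket, prefix):
    """Return (subfolder prefixes, object keys) found directly under `prefix`.

    `client` is assumed to be a boto3 S3 client; `list_folder` itself is a
    hypothetical helper, not part of boto3.
    """
    folders, files = [], []
    paginator = client.get_paginator("list_objects_v2")
    # Delimiter="/" makes S3 report deeper keys as CommonPrefixes ("folders").
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        folders += [entry["Prefix"] for entry in page.get("CommonPrefixes", [])]
        files += [entry["Key"] for entry in page.get("Contents", [])]
    return folders, files
```

A client for the profile configured in step 2 could be created with boto3.Session(profile_name="YOUR_COMPANY").client("s3") and passed in together with the bucket external.42matters.com and the prefix 1/42apps/v0.1/production/playstore/lookup/.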
Automation with Python Boto3
To download the feed files programmatically, use an AWS S3 client library in the language of your choice. For Python we recommend Boto3, or the S3Transfer tool for bulk downloads.
The following Python example uses Boto3 to find the latest available Monthly Playstore Standard file dump and then download it.
Set up boto3 with the required credentials, which you can find under Launchpad.
import boto3

# Replace "xxx" with the access key ID and secret access key from Launchpad.
s3 = boto3.resource(
    "s3",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    use_ssl=True,
)
Download the file named "current", which lies in the root folder of the dump type and is updated whenever a new dump is ready. It contains a link to the dump's location on S3.
Note: Daily GPlay & iTunes dumps produce two files, one for updated apps and one for removed apps. Accordingly, there are two current files: "current-updated" and "current-removed".
current_file = s3.Object(
    "external.42matters.com",
    key="1/42apps/v0.1/production/playstore/lookup/current",
)
# Read the link from the file; strip() drops any trailing whitespace or newline.
current_url = current_file.get()["Body"].read().decode("utf-8").strip()
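Because daily dumps use two pointer files instead of one, a small helper can keep the key names in one place. The sketch below is illustrative only: pointer_keys and its daily flag are hypothetical names, and the root folder of a daily dump type is not given here, so the caller has to supply it.

```python
def pointer_keys(dump_root, daily=False):
    """Return the S3 keys of the pointer file(s) for one dump type.

    `dump_root` is the root folder of the dump type. Daily dumps are assumed
    to use "current-updated" and "current-removed"; all others use "current".
    """
    root = dump_root.rstrip("/")
    names = ["current-updated", "current-removed"] if daily else ["current"]
    return [f"{root}/{name}" for name in names]
```

Each returned key can then be read with s3.Object in the same way as the single "current" file.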
The format of the link inside is:
"https://s3.amazonaws.com/external.42matters.com/1/42apps/VERSION/production/PLATFORM/DUMP_TYPE/YYYY-MM-DD/FILE_NAME.tar.gz"
Boto3 requires the bucket and key to be provided separately, so split the URL and extract the two values.
bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)
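The string replacement above assumes the link always starts with exactly https://s3.amazonaws.com/. A slightly more defensive variant, sketched here with the standard library's urllib.parse, reads the bucket and key from the URL path instead. split_s3_url is a hypothetical helper name, and the sketch assumes the path-style link format shown above.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split a path-style S3 URL into (bucket, key).

    Assumes the form https://s3.amazonaws.com/BUCKET/KEY; raises ValueError
    when either part is missing.
    """
    path = urlparse(url.strip()).path.lstrip("/")
    # The first path segment is the bucket; everything after it is the key.
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"not a path-style S3 URL: {url!r}")
    return bucket, key
```

Raising an error on an unexpected format makes a changed or empty pointer file fail loudly instead of producing a confusing download error later.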
Download the file.
# download_file() writes to disk and returns None, so there is no result to assign.
s3.Object(bucket, key).download_file(Filename="current_playstore.tar.gz")
Complete script example
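import boto3

# Replace "xxx" with the access key ID and secret access key from Launchpad.
s3 = boto3.resource(
    "s3",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    use_ssl=True,
)

# Read the "current" pointer file, which contains the link to the latest dump.
current_file = s3.Object(
    "external.42matters.com",
    key="1/42apps/v0.1/production/playstore/lookup/current",
)
current_url = current_file.get()["Body"].read().decode("utf-8").strip()

# Split the link into the bucket and key that Boto3 expects.
bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)

# Download the dump; download_file() returns None.
s3.Object(bucket, key).download_file(Filename="current_playstore.tar.gz")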
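Once the script has finished, the archive still needs to be unpacked, as in step 6 of the AWS CLI example. A minimal sketch using Python's standard tarfile module follows. unpack_dump and the destination folder are hypothetical names, and nothing is assumed about the archive beyond it being a gzip-compressed tar.

```python
import tarfile
from pathlib import Path


def unpack_dump(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Python versions with extraction filters get the safer "data" filter;
    # older versions fall back to the default behaviour.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
        archive.extractall(path=dest, **extra)
    return [dest / name for name in names]
```

For example, unpack_dump("current_playstore.tar.gz", "dump") would extract the downloaded archive into a local folder named dump.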
Examine file dump data
A good command-line tool for working with the line-separated JSON files is jq. Here are a couple of terminal commands to get you started after you unpack the files:
Print the first 100 app titles for GPlay apps
head -n 100 playstore-00 | jq -c '.title' -r
Print the first 100 titles for iTunes apps
head -n 100 itunes-00 | jq -c '.trackCensoredName' -r
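The same kind of inspection can be done in Python when jq is not installed. The sketch below reads the first lines of an unpacked, line-separated JSON file and collects one field from each record. first_values is a hypothetical helper name; the field names come from the jq commands above, and records without the field yield None.

```python
import json
from itertools import islice


def first_values(path, field, limit=100):
    """Return `field` from the first `limit` lines of a line-separated JSON file."""
    values = []
    with open(path, encoding="utf-8") as handle:
        # islice stops after `limit` lines, so large dumps are not read in full.
        for line in islice(handle, limit):
            line = line.strip()
            if not line:
                continue  # skip blank lines rather than failing to parse them
            values.append(json.loads(line).get(field))
    return values
```

Calling first_values("playstore-00", "title") would mirror the first jq command, and first_values("itunes-00", "trackCensoredName") the second.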
Last Modified: 07 Feb 2022