Getting Started with File Dumps
Our mobile and connected TV (CTV) app file dumps provide insight into: app metadata, performance metrics, tech stacks, permissions, localization techniques, top chart rankings, classifications, content ratings, ad space resellers, and more.
Currently, our file dumps provide intelligence on Google Play, the Apple App Store, Amazon Appstore, Tencent MyApp, Huawei AppGallery, Roku Channel Store, Apple TV tvOS App Store, Amazon Fire TV, Google TV, Samsung Smart TV Apps, LG Content Store, and Vizio SmartCast Apps.
Access Credentials
To obtain your access credentials, review the File Dumps page or contact us directly.
Note: Do not share your access credentials publicly (in emails, source control, chats, etc.). Your credentials will be automatically decommissioned if you do so.
Authentication
You need AWS S3 credentials to download the data files. Your credentials are available in your 42matters account under Launchpad.
There are many tools that can be used to access our file dump data. Check with your security or DevOps team to determine which is the most suitable for your company.
Cyberduck
You can use Cyberduck to make sure your credentials work.
After installing and launching the Cyberduck software, click Open Connection and choose Amazon S3 from the dropdown.
Enter external.42matters.com.s3.amazonaws.com in the server field and your account's access credentials in the relevant fields.

After successfully authenticating, you will be able to navigate to the target path locations, which we've supplied in the onboarding email.

ExpanDrive
An alternative tool you can use is ExpanDrive.
After installing the software, launch it and click on the large "+" button in the bottom-left. Then choose Amazon S3.
Enter s3.amazonaws.com in the server field and external.42matters.com in the bucket field.
Then enter your access credentials in the relevant fields.

Automation with AWS CLI
Here we show an example of how to use the AWS CLI (awscli) to list a bucket's contents and download a standard Play Store dump for a particular date.
# 1) Install awscli - https://aws.amazon.com/cli/
pip install awscli --upgrade --user
# 2) Configure awscli with the credentials we've provided in your account.
aws configure --profile YOUR_COMPANY
# 3) List the contents of a folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/ --profile YOUR_COMPANY
# 4) List the contents of the timestamped folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/ --profile YOUR_COMPANY
# 5) Download the file locally
aws s3 cp s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/playstore-00.tar.gz playstore.tar.gz --profile YOUR_COMPANY
# 6) Unpack the file
tar xvfz playstore.tar.gz
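The dump paths used above follow a predictable layout (bucket, API version, platform, dump type, date), so scripted downloads can build them from variables. A minimal shell sketch, where PLATFORM and DATE are placeholders you substitute for your own dump:

```shell
# Build the S3 path for a given platform and date.
# PLATFORM and DATE are placeholders; the layout matches the paths above.
PLATFORM="playstore"
DATE="2018-08-16"
S3_PATH="s3://external.42matters.com/1/42apps/v0.1/production/${PLATFORM}/lookup/${DATE}/"
echo "$S3_PATH"
```

You can then pass the resulting path to `aws s3 ls` or `aws s3 cp` as in the steps above.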
Automation with Python Boto3
To programmatically download the file dumps, use an AWS S3 client library in the language of your choice. For Python, we recommend Boto3 or S3Transfer for bulk downloads.
Here we show an example of how to use Boto3 to find the latest available Monthly Playstore Standard file dump and then download it.
Set up boto3 with the required credentials, which you can find in the Launchpad.
import boto3
s3 = boto3.resource(
"s3",
aws_access_key_id="xxx",
aws_secret_access_key="xxx",
use_ssl=True,
)
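Hardcoding keys in source code conflicts with the credential-sharing warning above. As an alternative sketch, you can read them from environment variables; boto3 also picks up the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables automatically (the "xxx" defaults below are placeholders):

```python
import os

# Placeholders only: in practice, export the real values in your shell or
# CI environment instead of setting them in code.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "xxx")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "xxx")

access_key = os.environ["AWS_ACCESS_KEY_ID"]
secret_key = os.environ["AWS_SECRET_ACCESS_KEY"]
```

With the variables exported, `boto3.resource("s3")` can be called without passing the keys explicitly.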
Download the file named "current", which can be found in the root folder of each dump type and is updated whenever a new dump is ready.
It contains a link to the dump's location on S3. Note: the daily Google Play and App Store dumps produce two files, one for updated and one for removed apps.
Accordingly, there are two current files: "current-updated" and "current-removed".
current_file = s3.Object(
"external.42matters.com",
key="1/42apps/v0.1/production/playstore/lookup/current",
)
current_url = current_file.get()["Body"].read().decode("utf-8")
The format of the link inside is:
"https://s3.amazonaws.com/external.42matters.com/1/42apps/VERSION/production/PLATFORM/DUMP_TYPE/YYYY-MM-DD/FILE_NAME.tar.gz"
Boto3 requires the bucket and key to be provided separately, so split the URL and extract the two values.
bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
"/", maxsplit=1
)
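To see the split in action, here is the same one-liner applied to a sample URL that matches the documented link format (the date and file name are illustrative):

```python
# Split a sample dump URL into its bucket and key parts.
# The date and file name below are illustrative, not a real "current" link.
sample_url = (
    "https://s3.amazonaws.com/external.42matters.com/"
    "1/42apps/v0.1/production/playstore/lookup/2018-08-16/playstore-00.tar.gz"
)
bucket, key = sample_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)
print(bucket)  # external.42matters.com
print(key)     # 1/42apps/v0.1/production/playstore/lookup/2018-08-16/playstore-00.tar.gz
```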
Download the file.
s3.Object(bucket, key).download_file(Filename="current_playstore.tar.gz")
Complete Script Example
import boto3
s3 = boto3.resource(
"s3",
aws_access_key_id="xxx",
aws_secret_access_key="xxx",
use_ssl=True,
)
current_file = s3.Object(
"external.42matters.com",
key="1/42apps/v0.1/production/playstore/lookup/current",
)
current_url = current_file.get()["Body"].read().decode("utf-8")
bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
"/", maxsplit=1
)
s3.Object(bucket, key).download_file(Filename="current_playstore.tar.gz")
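The downloaded archive can also be unpacked from Python instead of shelling out to tar. A self-contained sketch (it creates a tiny sample archive so the snippet runs on its own; with a real dump you would open "current_playstore.tar.gz" directly):

```python
import pathlib
import tarfile

# Create a tiny sample archive so this sketch is self-contained.
# With a real dump, skip this step and open "current_playstore.tar.gz".
pathlib.Path("sample.txt").write_text("hello")
with tarfile.open("sample.tar.gz", "w:gz") as archive:
    archive.add("sample.txt")

# Unpack the archive (the Python equivalent of `tar xvfz`).
with tarfile.open("sample.tar.gz", "r:gz") as archive:
    archive.extractall(path="unpacked")

print(pathlib.Path("unpacked/sample.txt").read_text())  # hello
```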
Example File Dump Data
A good command-line tool for exploring the newline-delimited JSON files is jq.
Here are a couple of terminal commands to get you started after unpacking the files:
Print the first 100 app titles for Google Play apps:
head -n 100 playstore-00 | jq -c '.title' -r
Print the first 100 titles for Apple App Store apps:
head -n 100 itunes-00 | jq -c '.trackCensoredName' -r
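If you prefer Python to jq, the same files can be parsed line by line with the standard json module, since each line is one app record. A sketch using illustrative records (real dump lines contain many more fields):

```python
import io
import json

# Stand-in for an unpacked dump file: newline-delimited JSON, one app
# record per line. These records are illustrative, not real dump data.
sample_lines = io.StringIO(
    '{"title": "App One"}\n'
    '{"title": "App Two"}\n'
)

# Parse each line independently, as jq does with `.title`.
titles = [json.loads(line)["title"] for line in sample_lines]
print(titles)
```

With a real dump, replace the StringIO with `open("playstore-00")`.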
Last Modified: 03 May 2023