Getting Started with File Dumps

Our mobile and connected TV (CTV) app file dumps provide insight into app metadata, performance metrics, tech stacks, permissions, localization techniques, top chart rankings, classifications, content ratings, ad space resellers, and more.

Currently, our file dumps provide intelligence on Google Play, the Apple App Store, Amazon Appstore, Tencent Appstore, Huawei AppGallery, Roku Channel Store, Apple TV tvOS App Store, Amazon Fire TV, Google TV, Samsung Smart TV Apps, LG Content Store, and Vizio SmartCast Apps.

Available File Dumps

We offer the following app details file dumps:

Android App Details
iOS App Details
Tencent App Details
Amazon App Details
Huawei App Details
Connected TV App Details (Roku, Apple TV, Fire TV, Google TV, Samsung, LG, and Vizio)

We also offer the following file dumps for app store top charts:

And finally, we offer the following app intelligence file dumps:

Data Format

Data is stored in a single gzipped file of line-delimited JSON with the following characteristics:

Each line is a valid JSON object
UTF-8 encoding
Line separator is '\n'
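As a minimal sketch of what these characteristics mean in practice, the following Python reads a gzipped stream line by line and parses each line as its own JSON object. The helper name iter_records and the two sample records are made up for illustration and are not real dump data.

```python
import gzip
import io
import json


def iter_records(stream):
    """Yield one parsed JSON object per non-empty line of a binary stream."""
    for raw_line in stream:
        # Each line is UTF-8 encoded and terminated by '\n'.
        line = raw_line.decode("utf-8").rstrip("\n")
        if line:
            yield json.loads(line)


# Build a tiny gzipped sample in memory (hypothetical records).
sample = b'{"title": "App A"}\n{"title": "App B"}\n'
compressed = gzip.compress(sample)

with gzip.open(io.BytesIO(compressed), "rb") as fh:
    records = list(iter_records(fh))
```

Because every line is parsed independently, a large dump can be processed as a stream without loading the whole file into memory.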

Access Credentials

To obtain your access credentials, review the File Dumps page or contact us directly.

Note: Do not share your access credentials publicly (in emails, source control, chats, etc.). If you do, your credentials will be automatically decommissioned.
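One common way to keep credentials out of source control is to read them from environment variables at run time. The sketch below assumes the standard AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the helper load_credentials is hypothetical and not part of any 42matters tooling.

```python
import os


def load_credentials(env=None):
    """Return boto3-style keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("AWS credentials are not set in the environment")
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
```

The returned dictionary can be unpacked into the Boto3 setup shown later in this guide, for example boto3.resource("s3", **load_credentials()).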

Authentication

Clients require AWS S3 credentials in order to obtain the data files. Your credentials are available in your 42matters account under Launchpad.

There are many tools that can be used to access our file dump data. Check with your security or DevOps team to determine which is the most suitable for your company.

Cyberduck

You can use Cyberduck to make sure your credentials work. After installing and launching the Cyberduck software, click Open Connection and choose Amazon S3 from the dropdown.

Enter external.42matters.com.s3.amazonaws.com in the server field and your account's access credentials in the relevant fields.

After successfully authenticating, you will be able to navigate to the target path locations, which we've supplied in the onboarding email.

ExpanDrive

An alternative tool you can use is ExpanDrive. After installing the software, launch it and click on the large "+" button in the bottom-left. Then choose Amazon S3.

Enter s3.amazonaws.com in the server field and external.42matters.com in the bucket field. Then enter your access credentials in the relevant fields.

Automation with AWS CLI

Here we show an example of how to use awscli to list a bucket's contents and download a standard Play Store dump for a particular date.

# 1) Install awscli - https://aws.amazon.com/cli/
pip install awscli --upgrade --user

# 2) Configure awscli with the credentials we've provided in your account.
aws configure --profile YOUR_COMPANY

# 3) list the contents of a folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/ --profile YOUR_COMPANY

# 4) list the contents of the timestamped folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/ --profile YOUR_COMPANY

# 5) download the file locally
aws s3 cp s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/playstore-00.tar.gz playstore.tar.gz --profile YOUR_COMPANY

# 6) unpack the file
tar xvfz playstore.tar.gz

Automation with Python Boto3

To download the file dumps programmatically, use an AWS S3 client library in the language of your choice. For Python, we recommend Boto3 or S3Transfer for bulk downloads.

Here we show a Python example of how to use Boto3 to find the latest available Monthly Playstore Standard file dump and then download it.

Set up Boto3 with the required credentials, which you can find in the Launchpad.

import boto3

s3 = boto3.resource(
    "s3",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    use_ssl=True,
)

Download the file named "current", which is located in the root folder of the dump type and is updated whenever a new dump is ready. It contains a link to the dump's location on S3. Note: Daily Google Play and App Store dumps produce two files, one for updated apps and one for removed apps. Accordingly, there are two current files: "current-updated" and "current-removed".

current_file = s3.Object(
    "external.42matters.com",
    key="1/42apps/v0.1/production/playstore/lookup/current",
)
# Strip surrounding whitespace so a trailing newline does not end up in the key.
current_url = current_file.get()["Body"].read().decode("utf-8").strip()

The format of the link inside is:

"https://s3.amazonaws.com/external.42matters.com/1/42apps/VERSION/production/PLATFORM/DUMP_TYPE/YYYY-MM-DD/FILE_NAME.tar.gz"

Boto3 requires the bucket and key to be provided separately, so split the URL and extract the two values.

bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)
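As an alternative sketch, the same split can be done with the standard library's URL parser, which avoids depending on the exact scheme and host prefix. This assumes the path-style link format shown above (bucket as the first path segment); the helper name split_s3_url is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split a path-style S3 URL into (bucket, key)."""
    # strip() guards against a trailing newline in the "current" file.
    parsed = urlparse(url.strip())
    # The path looks like "/BUCKET/KEY..."; the first segment is the bucket.
    bucket, key = parsed.path.lstrip("/").split("/", maxsplit=1)
    return bucket, key


bucket, key = split_s3_url(
    "https://s3.amazonaws.com/external.42matters.com/1/42apps/v0.1/"
    "production/playstore/lookup/2018-08-16/playstore-00.tar.gz\n"
)
```

A URL without a key after the bucket raises ValueError on unpacking, which surfaces a malformed link early.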

Download the file.

# download_file writes to disk and returns None, so there is nothing to assign.
s3.Object(bucket, key).download_file(
    Filename="current_playstore.tar.gz"
)
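Once the archive is on disk, it can be read without unpacking it first. The following is a minimal sketch, assuming the archive contains one or more line-delimited JSON files as described under Data Format. The helper iter_tar_records is hypothetical, and the in-memory archive with made-up records only serves to exercise it.

```python
import io
import json
import tarfile


def iter_tar_records(fileobj):
    """Yield (member_name, record) for each JSON line in each file of a .tar.gz."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            extracted = archive.extractfile(member)
            for raw_line in extracted:
                line = raw_line.decode("utf-8").strip()
                if line:
                    yield member.name, json.loads(line)


# Build a small in-memory archive with made-up records.
payload = b'{"title": "App A"}\n{"title": "App B"}\n'
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="playstore-00")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

rows = list(iter_tar_records(buffer))
```

For a downloaded dump, the same function could be called on a file opened in binary mode, for example open("current_playstore.tar.gz", "rb").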

Complete Script Example

import boto3

s3 = boto3.resource(
    "s3",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    use_ssl=True,
)

current_file = s3.Object(
    "external.42matters.com",
    key="1/42apps/v0.1/production/playstore/lookup/current",
)
# Strip surrounding whitespace so a trailing newline does not end up in the key.
current_url = current_file.get()["Body"].read().decode("utf-8").strip()

bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)

# download_file writes to disk and returns None, so there is nothing to assign.
s3.Object(bucket, key).download_file(
    Filename="current_playstore.tar.gz"
)

Example File Dump Data

A good command-line tool for working with the line-delimited JSON files is jq. Here are a couple of terminal commands to get you started after you unpack the files:

Print the titles of the first 100 Google Play apps:

head -n 100 playstore-00 | jq -c '.title' -r

Print the titles of the first 100 Apple App Store apps:

head -n 100 itunes-00 | jq -c '.trackCensoredName' -r
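If jq is not available, roughly the same result can be obtained in Python. This is a sketch only: the helper first_values is hypothetical, and the in-memory sample stands in for an unpacked file such as playstore-00.

```python
import io
import json
from itertools import islice


def first_values(lines, field, limit=100):
    """Return `field` from the first `limit` lines of a line-delimited JSON file."""
    values = []
    for line in islice(lines, limit):
        line = line.strip()
        if line:
            # .get returns None when the field is missing from a record.
            values.append(json.loads(line).get(field))
    return values


# Stand-in for an unpacked dump file (records are made up).
sample = io.StringIO(
    '{"title": "App A"}\n{"title": "App B"}\n{"title": "App C"}\n'
)
titles = first_values(sample, "title", limit=2)
```

With a real file, pass an open handle instead of the sample, for example open("playstore-00", encoding="utf-8"), and use "trackCensoredName" as the field for App Store dumps.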

Last Modified: 03 May 2023
