Getting Started with File Dumps


Our mobile and connected TV (CTV) app file dumps provide insight into: app metadata, performance metrics, tech stacks, permissions, localization techniques, top chart rankings, classifications, content ratings, ad space resellers, and more.

Currently, our file dumps provide intelligence on Google Play, the Apple App Store, Amazon Appstore, Tencent MyApp, Huawei AppGallery, Roku Channel Store, Apple TV tvOS App Store, Amazon Fire TV, Google TV, Samsung Smart TV Apps, LG Content Store, and Vizio SmartCast Apps.

Available File Dumps

We offer the following app details file dumps:

We also offer the following file dumps for app store top charts:

And finally, we offer the following app intelligence file dumps:

Data Format

Data is stored in a single gzipped file of line-delimited JSON with the following characteristics:

  • Each line is a valid JSON object
  • UTF-8 encoding
  • Line separator is '\n'
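As a rough illustration of the format described above, the following Python sketch reads such a file one record at a time. It assumes the file is a plain gzip stream of JSON lines; an archive ending in .tar.gz would need to be unpacked first. The function name iter_records and the file name in the usage comment are hypothetical.

```python
import gzip
import json


def iter_records(path):
    """Yield one dict per line from a gzipped, line-delimited JSON file."""
    # newline="\n" keeps line splitting aligned with the documented separator.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="\n") as handle:
        for line in handle:
            if line.strip():  # tolerate a blank trailing line
                yield json.loads(line)


# Hypothetical usage:
# for record in iter_records("sample-dump.gz"):
#     print(record.get("title"))
```

Reading line by line keeps memory use flat, which matters because a full dump can be much larger than available RAM.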

Access Credentials

To obtain your access credentials, review the File Dumps page or contact us directly.

Note: Do not share your access credentials publicly (in emails, source control, chats, etc.). If you do, your credentials will be automatically decommissioned.

Authentication

Clients require AWS S3 credentials in order to obtain the data files. Your credentials are available in your 42matters account under Launchpad.

There are many tools that can be used to access our file dump data. Check with your security or DevOps team to determine which is the most suitable for your company.

Cyberduck

You can use Cyberduck to make sure your credentials work. After installing and launching the Cyberduck software, click Open Connection and choose Amazon S3 from the dropdown.

Enter external.42matters.com.s3.amazonaws.com in the server field and your account's access credentials in the relevant fields.

After authenticating successfully, you can navigate to the target paths supplied in your onboarding email.

ExpanDrive

An alternative tool you can use is ExpanDrive. After installing the software, launch it and click on the large "+" button in the bottom-left. Then choose Amazon S3.

Enter s3.amazonaws.com in the server field and external.42matters.com in the bucket field. Then enter your access credentials in the relevant fields.

Automation with AWS CLI

The following example uses awscli to list the bucket's contents and download a standard Google Play dump for a particular date.

# 1) Install awscli - https://aws.amazon.com/cli/
pip install awscli --upgrade --user

# 2) Configure awscli with the credentials we've provided in your account.
aws configure --profile YOUR_COMPANY

# 3) list the contents of a folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/ --profile YOUR_COMPANY

# 4) list the contents of the timestamped folder
aws s3 ls s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/ --profile YOUR_COMPANY

# 5) download the file locally
aws s3 cp s3://external.42matters.com/1/42apps/v0.1/production/playstore/lookup/2018-08-16/playstore-00.tar.gz playstore.tar.gz --profile YOUR_COMPANY

# 6) unpack the file
tar xvfz playstore.tar.gz
            

Automation with Python Boto3

In order to programmatically download the file dumps, use a client library for AWS S3 in the language of your choice. For Python, we recommend the Boto3 or S3Transfer tools for bulk downloads.

The following Python example uses Boto3 to find the latest available Monthly Playstore Standard file dump and then download it.

Set up boto3 with the required credentials, which you can find in the Launchpad.

import boto3

s3 = boto3.resource(
    "s3",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    use_ssl=True,
)

Download the file named "current". It sits in the root folder of each dump type, is updated whenever a new dump is ready, and contains a link to that dump's location on S3. Note: Daily Google Play and App Store dumps produce two files, one for updated apps and one for removed apps. Accordingly, there are two pointer files for them: "current-updated" and "current-removed".

current_file = s3.Object(
    "external.42matters.com",
    key="1/42apps/v0.1/production/playstore/lookup/current",
)
# Strip surrounding whitespace (such as a trailing newline) so the key parses cleanly.
current_url = current_file.get()["Body"].read().decode("utf-8").strip()
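For daily dumps, both pointer files could be read in the same way. The sketch below assumes the two files live in a root folder that you pass in as dump_root; that parameter, the function name read_current_pointers, and the prefix in the usage comment are hypothetical, so take the real path from your onboarding email.

```python
def read_current_pointers(s3, bucket, dump_root,
                          names=("current-updated", "current-removed")):
    """Return {pointer name: link stored in that pointer file}.

    `s3` is a boto3 S3 resource, e.g. boto3.resource("s3", ...).
    `dump_root` is an assumed folder prefix, not a documented path.
    """
    pointers = {}
    for name in names:
        key = f"{dump_root.rstrip('/')}/{name}"
        body = s3.Object(bucket, key).get()["Body"].read()
        pointers[name] = body.decode("utf-8").strip()
    return pointers


# Hypothetical usage:
# links = read_current_pointers(s3, "external.42matters.com", "YOUR/DAILY/DUMP/ROOT")
```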

The format of the link inside is:

"https://s3.amazonaws.com/external.42matters.com/1/42apps/VERSION/production/PLATFORM/DUMP_TYPE/YYYY-MM-DD/FILE_NAME.tar.gz"

Boto3 requires the bucket and key to be provided separately, so split the URL and extract the two values.

bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)
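A slightly more defensive variant of the split above can be sketched with the standard library's URL parser. It assumes the link always follows the path-style format shown earlier, and it strips surrounding quotes in case the pointer file stores the link quoted; the function name split_dump_url is hypothetical.

```python
from urllib.parse import urlparse


def split_dump_url(url):
    """Split a path-style S3 link into (bucket, key)."""
    # Drop whitespace and optional surrounding quotes before parsing.
    parsed = urlparse(url.strip().strip('"'))
    if parsed.netloc != "s3.amazonaws.com":
        raise ValueError(f"unexpected host: {parsed.netloc!r}")
    # The first path segment is the bucket; the remainder is the object key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"cannot find bucket and key in: {url!r}")
    return bucket, key
```

Failing loudly on an unexpected host or an empty key makes a changed link format easier to spot than a download error later on.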

Download the file.

# download_file writes to disk and returns None, so there is nothing to assign.
s3.Object(bucket, key).download_file(
    Filename="current_playstore.tar.gz"
)

Complete Script Example

import boto3

s3 = boto3.resource(
    "s3",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    use_ssl=True,
)

current_file = s3.Object(
    "external.42matters.com",
    key="1/42apps/v0.1/production/playstore/lookup/current",
)
# Strip surrounding whitespace (such as a trailing newline) so the key parses cleanly.
current_url = current_file.get()["Body"].read().decode("utf-8").strip()

bucket, key = current_url.replace("https://s3.amazonaws.com/", "").split(
    "/", maxsplit=1
)

# download_file writes to disk and returns None, so there is nothing to assign.
s3.Object(bucket, key).download_file(
    Filename="current_playstore.tar.gz"
)
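The downloaded archive still has to be unpacked, which the AWS CLI walkthrough does with tar. A Python equivalent might look like the sketch below. It uses the standard tarfile module and refuses members that would land outside the destination folder; the function name unpack_dump and the folder name in the usage comment are hypothetical.

```python
import os
import tarfile


def unpack_dump(archive_path, dest_dir):
    """Extract regular files from a .tar.gz archive and return their paths."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Reject entries that would escape the destination folder.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name!r}")
            if not member.isfile():
                continue  # skip directories, links, and special files
            archive.extract(member, path=dest_root)
            extracted.append(target)
    return extracted


# Hypothetical usage:
# files = unpack_dump("current_playstore.tar.gz", "dumps")
```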

Example File Dump Data

A good command-line tool for exploring the line-delimited JSON files is jq. Here are a couple of terminal commands to get you started after you unpack the files:

Print the titles of the first 100 Google Play apps:

head -n 100 playstore-00 | jq -c '.title' -r

Print the titles of the first 100 Apple App Store apps:

head -n 100 itunes-00 | jq -c '.trackCensoredName' -r
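The same lookups can be sketched in Python for use inside a script. This assumes the dump has already been unpacked to a plain text file and that every line is a valid JSON object, as described under Data Format. The function name first_values is hypothetical; the field names are the ones used in the jq commands above.

```python
import json
from itertools import islice


def first_values(path, field, limit=100):
    """Return `field` from the first `limit` records of an unpacked dump file."""
    with open(path, encoding="utf-8") as handle:
        # islice stops after `limit` lines, like `head -n`.
        return [json.loads(line).get(field) for line in islice(handle, limit)]


# Usage, mirroring the jq commands:
# first_values("playstore-00", "title")
# first_values("itunes-00", "trackCensoredName")
```

Records that lack the field yield None rather than raising an error.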

Last Modified: 03 May 2023

