Keyforge Data Scraping Part 2: The Script

In the previous post in this series, we discussed the API for the new procedurally generated card game called Keyforge.  In this post, we’ll walk through a simple implementation of the API via PowerShell.  The script in its entirety can be found on my GitHub here.

Overview

The goal of this script was to run through all of the decks currently registered and save data about the decks and the cards in the decks.  This project was intended as a working proof of concept for a larger project I’d like to build out.  In the end, I’d like to have the data stored in Azure SQL Server and build a simple API and a simple PowerBI report in front of it so that users can get answers to their Keyforge data questions.

Requirements

  • SQL Server (Express edition is free)
  • PowerShell (Comes preinstalled on Windows machines)
  • PowerShell Module dbatools
    • within PowerShell run: install-module dbatools

The Script

The first part of the script checks to see if the database has been created or not.  If it hasn’t been created, it creates the database:

Next, it tries to get information on as many of the cards as it can.  It does this by looping through the API pages and collecting distinct cards until it has 350 cards, 7 houses, or it gets to a page that hasn’t had a new card.  This part of the code was more or less taken from BatmanAMA on GitHub and was used as a starting point.

 

A major issue I had early on in this process was that PowerShell wasn’t bringing in special characters correctly with the default settings of Invoke-RestMethod.  Figuring this part out is what took, by far, the longest amount of time during this project.

To get around the text issue, I used Invoke-WebRequest instead of RestMethod.  I added the ContentType parameter with the charset=utf8.  From there, I added the UTF8.GetSTring and updated the encoding.  This seemed to correct most or all of the issues I was having.

The script now has 3 main objects that we care about:

  • $decks – information such as deck name and houses
  • $houses – house names
  • $cards – card names, traits, number, etc

However, the data isn’t yet being stored anywhere!  Before we can store the data, we need to create the tables in the database (since we know the script created the database).

The dimCards table will hold information on the individual cards.  Create this table:

The dbatools PowerShell library is really powerful — it can write objects straight to a table.  However, I have only been able to get it to work if I add another column to the beginning of the object.

A single card can have multiple traits, so it will require a junction table to correctly account for everything.  This creates the dimTraits and factCardTraits tables:

Next, I get an object that holds card id and traits for the card.  I loop through each card in the object and each trait for the card and insert into the two tables above.

The rest of the code is very similar to the above, but it loops through each individual deck to get the cards in the decks.  The first step is to create the dimDecks, factDeckHouses, and factDeckCards tables.  DimDecks will hold deck information such as name, factDeckHouses will link decks to their 3 houses, and factDeckCards will link decks to their 36 cards.

The next and final section loops through all decks on every page of the API and tracks the data on the page.  Future versions will likely only contain the meat of what is in this part of the script, as opposed to getting card information first.

As discussed in the previous post, the first item in the API is called Count which holds the total number of decks available.  We set the number of decks per page in the $PageSize variable at the beginning of this script.  To calculate the total number of pages, it is just the count / page size:

I then loop through all of those pages and get the same type of information we’ve gathered previously:

 

 

 Next Steps

  • Database Updates
    • Performance tune it (it’s super slow right now!)
    • Create Foreign Keys where appropriate
    • Clean up some of the data that is messed up.  Specifically, the dimTraits table has some issues such as “Cyborg â?¢ Beast”.  Break this into Cyborg and Beast.
  • Script Updates
    • Create functions for the repetitive portions
    • Use a queue to pull in all of the JSON via the API and split the work among multiple nodes to speed up the process
  • Project Updates
    • Move the script to Azure
    • Create a schedule to run regularly
    • Create the API
    • Create the PowerBI report

 

Have questions or suggestions?  Please feel free to comment below or contact me.

6 thoughts on “Keyforge Data Scraping Part 2: The Script

  1. Is it necessary to use the db? I’m trying to write a Mastodon bot to post random decks/cards to Mastodon. This is my first bot project, I’m using javascript, but know significant powershell (of course, that’s not really going to be helpful for a mastodon bot). In anycase, I had imagined just making a call to the api, and then formatting the returned text for the content of the post.

    1. No, it isn’t necessary. I’m trying to become a better database developer which is the main reason I wanted to move the data to a database. The only real issue I see is that I’m not sure how to search the API for a specific deck. If you’re wanting to do a truly random deck, you could pull a random API page and then a random deck and its cards from that page.

      I think there must be a way to use the API to search for a specific deck.. but I haven’t seen anything on that online and I haven’t figured it out myself. If you do run across that, I’d love to know!

      1. So, the ID you get from the QR code is different than the id used in the API and deck URLs. Has anyone already figured out how to get the deck id from the 15 digit code from the QR code?

Leave a Reply

Your email address will not be published. Required fields are marked *