Keyforge Data Scraping Part 2: The Script

In the previous post in this series, we discussed the API for the new procedurally generated card game called Keyforge.  In this post, we’ll walk through a simple implementation of the API via PowerShell.  The script in its entirety can be found on my GitHub here.

Overview

The goal of this script was to run through all of the decks currently registered and save data about the decks and the cards in the decks.  This project was intended as a working proof of concept for a larger project I’d like to build out.  In the end, I’d like to have the data stored in Azure SQL Server and build a simple API and a simple PowerBI report in front of it so that users can get answers to their Keyforge data questions.

Requirements

  • SQL Server (Express edition is free)
  • PowerShell (Comes preinstalled on Windows machines)
  • PowerShell Module dbatools
    • within PowerShell run: install-module dbatools

The Script

The first part of the script checks to see if the database has been created or not.  If it hasn’t been created, it creates the database:

if (-not (Find-DbaDatabase -SqlInstance $server -Pattern $database)) { # If the database doesn't exist
    New-DbaDatabase -SqlInstance $server -Name $database # create the database

Next, it tries to get information on as many of the cards as it can.  It does this by looping through the API pages and collecting distinct cards until it has 350 cards, 7 houses, or it gets to a page that hasn’t had a new card.  This part of the code was more or less taken from BatmanAMA on GitHub and was used as a starting point.

 

    #####################
    ##  Get Card Data  ##
    #####################
    $Url = "https://www.keyforgegame.com/api/decks/?page={0}&page_size={1}&search={2}&links=cards" -f "{0}", $pagesize, $search
    $page = 1
    $cards = [System.Collections.Generic.List[object]]::new()
    $houses = [System.Collections.Generic.List[object]]::new()
    do {
        $Response = Invoke-WebRequest ($url -f $page++) -ContentType 'application/json; charset=utf8'
        $jsonCorrected = [Text.Encoding]::UTF8.GetString(
                  [Text.Encoding]::GetEncoding(28591).GetBytes($Response.Content)
                )
        $decks = $jsonCorrected |ConvertFrom-Json
        if (($decks._linked.cards | where-object id -notin ($cards.id))) {
            $cards.AddRange(($decks._linked.cards | where-object id -notin ($cards.id)))
        }
        if (($decks._linked.houses | where-object id -notin ($houses.id)).count) {
            $houses.AddRange(($decks._linked.houses | where-object id -notin ($houses.id)))
        }
    } while ($cards.Count -lt $totalCards -or $houses.Count -lt $totalHouses -or $page * $pagesize -ge $decks.count)

A major issue I had early on in this process was that PowerShell wasn’t bringing in special characters correctly with the default settings of Invoke-RestMethod.  Figuring this part out is what took, by far, the longest amount of time during this project.

To get around the text issue, I used Invoke-WebRequest instead of RestMethod.  I added the ContentType parameter with the charset=utf8.  From there, I added the UTF8.GetSTring and updated the encoding.  This seemed to correct most or all of the issues I was having.

The script now has 3 main objects that we care about:

  • $decks – information such as deck name and houses
  • $houses – house names
  • $cards – card names, traits, number, etc

However, the data isn’t yet being stored anywhere!  Before we can store the data, we need to create the tables in the database (since we know the script created the database).

The dimCards table will hold information on the individual cards.  Create this table:

    # Create dimCards table (without traits)
    $query = "CREATE TABLE dimCards (
    CardID INT IDENTITY(1,1) PRIMARY KEY
    , CardKeyforgeID VARCHAR(255)
    , CardNumber INT
    , CardName NVARCHAR(MAX)
    , House NVARCHAR(MAX)
    , CardType NVARCHAR(MAX)
    , FrontImageURL NVARCHAR(MAX)
    , CardText NVARCHAR(MAX)
    , CardAmber INT
    , CardPower INT
    , CardArmor INT
    , CardRarity NVARCHAR(MAX)
    , FlavorText NVARCHAR(MAX)
    , Expansion INT
    , IsMaverick BIT
    , CONSTRAINT UC_dimCards_KeyforgeID UNIQUE(CardKeyforgeID))"
    Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query

The dbatools PowerShell library is really powerful — it can write objects straight to a table.  However, I have only been able to get it to work if I add another column to the beginning of the object.

    $CardTable = $cards | Select-Object @{N="FirstColumn";E={0}}, id, card_number, card_title, house, card_type, front_image, card_text, amber, power, armor, rarity, flavor_text, expansion, is_maverick
    Write-DbaDataTable -SqlInstance $server -Database $database -Table dimCards -InputObject ($CardTable)  -Verbose 

A single card can have multiple traits, so it will require a junction table to correctly account for everything.  This creates the dimTraits and factCardTraits tables:

    # Create dimTraits table
    Invoke-DbaQuery -ServerInstance $server -Database $database -Query "CREATE TABLE dimTraits (TraitID INT IDENTITY(1,1) PRIMARY KEY, TraitName VARCHAR(50) UNIQUE)"
    # Create factCardTraits table
    Invoke-DbaQuery -ServerInstance $server -Database $database -Query "CREATE TABLE factCardTraits (CardID INT, TraitID INT, CONSTRAINT UC_CardTrait UNIQUE(CardID, TraitID) )"

Next, I get an object that holds card id and traits for the card.  I loop through each card in the object and each trait for the card and insert into the two tables above.

# Clean up traits (a card can have multiple traits)
    $TraitsXCards = $cards | Where-Object {$_.traits} | Select-Object id, traits
    foreach ($card in $TraitsXCards | Select-Object id, @{Name="Traits";Expression={$_.traits -split (' • ')}}){
        $KeyforgeCardID = $card.id
        $CardID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query "SELECT CardID FROM dimCards WHERE CardKeyforgeID = '$KeyforgeCardID'"
        foreach ($trait in $card.traits){
            $query = "select TraitID from dimTraits where TraitName = '$trait'"
            $TraitID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
            if (-not $TraitID){
                $InsertQuery = "INSERT INTO dimTraits VALUES ('$trait')"
                Invoke-DbaQuery -SqlInstance $server -Database $database -Query $InsertQuery
                $TraitID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
            }
            $query = "SELECT * from factCardTraits WHERE CardID = $($CardID.CardID) AND TraitID = $($TraitID.TraitID)"
            $Response = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
            if (-not $Response){
                $query = "INSERT INTO factCardTraits VALUES ($($CardID.CardID), $($TraitID.TraitID))"
                Invoke-DbaQuery -ServerInstance $server -Database $database -Query $query
            }
        }
    }

The rest of the code is very similar to the above, but it loops through each individual deck to get the cards in the decks.  The first step is to create the dimDecks, factDeckHouses, and factDeckCards tables.  DimDecks will hold deck information such as name, factDeckHouses will link decks to their 3 houses, and factDeckCards will link decks to their 36 cards.

    #####################
    ##  Get Deck Data  ##
    #####################
    Invoke-DbaQuery -ServerInstance $server -Database $database -Query "CREATE TABLE dimDecks (DeckID INT IDENTITY(1,1) PRIMARY KEY, DeckKeyforgeID VARCHAR(255) NOT NULL, DeckName NVARCHAR(255), Expansion INT, PowerLevel INT, Chains INT, Wins INT, Losses INT, Notes NVARCHAR(MAX), CONSTRAINT UC_DeckKeyforgeID UNIQUE(DeckKeyforgeID))"
    Invoke-DbaQuery -ServerInstance $server -Database $database -Query "CREATE TABLE factDeckHouses (DeckID INT NOT NULL, House VARCHAR(255) NOT NULL, CONSTRAINT UC_DeckHouses UNIQUE(DeckID, House))"
    Invoke-DbaQuery -ServerInstance $server -Database $database -Query "CREATE TABLE factDeckCards (DeckID INT NOT NULL, CardID INT NOT NULL)" # originally had a unique constraint, but decks can have the same card multiple times

The next and final section loops through all decks on every page of the API and tracks the data on the page.  Future versions will likely only contain the meat of what is in this part of the script, as opposed to getting card information first.

As discussed in the previous post, the first item in the API is called Count which holds the total number of decks available.  We set the number of decks per page in the $PageSize variable at the beginning of this script.  To calculate the total number of pages, it is just the count / page size:

    $totalPages = [math]::Ceiling($decks.count / $pagesize)

I then loop through all of those pages and get the same type of information we’ve gathered previously:

    foreach ($page in 1..$totalPages){
        Write-Host "On page $page of $totalPages"
        $url = "https://www.keyforgegame.com/api/decks/?links=cards&page_size=25&page=$page"

        $Response = Invoke-WebRequest $url -ContentType 'application/json; charset=utf8'
        $jsonCorrected = [Text.Encoding]::UTF8.GetString(
                  [Text.Encoding]::GetEncoding(28591).GetBytes($Response.Content)
                )
        $body = $jsonCorrected |ConvertFrom-Json

        $totalPages = [math]::Ceiling($body.count / $pagesize)


        $DeckCount = 0
        foreach($deck in $body.data){
            $DeckCount ++ 
            write-host "       Deck # $DeckCount of $pagesize"

            $query = "SELECT * FROM dimDecks WHERE deckkeyforgeid = '$($deck.id)'"
            $response = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query

            if (-not $Response){ # Deck hasn't been added

                $DeckTable = $deck | Select-Object @{N="FirstColumn";E={0}}, id, name, expansion, power_level, chains, wins, losses, notes
                Write-DbaDataTable -SqlInstance $server -Database $database -Table dimDecks -InputObject $DeckTable

                $DeckID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query "SELECT deckID FROM dimDecks WHERE DeckKeyforgeID = '$($deck.id)'"
                $DeckID = $DeckID.DeckID

                foreach ($house in $deck._links.houses){
                    Invoke-DbaQuery -SqlInstance $server -Database $database -Query "INSERT INTO factDeckHouses VALUES ($DeckID, '$house')"
                }

                foreach ($card in $deck._links.cards){
                    $CardID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query "SELECT cardID from dimCards WHERE CardKeyforgeID = '$card'"
                    $CardID = $CardID.CardID

                    if (-not $CardID){
                        Write-Host "Adding New Card to Database.  ID: $card"
                        $card = $body._linked.cards | Where-Object {$_.id -eq $card}
                        $CardTable = $card | Select-Object @{N="FirstColumn";E={0}}, id, card_number, card_title, house, card_type, front_image, card_text, amber, power, armor, rarity, flavor_text, expansion, is_maverick
                        Write-DbaDataTable -SqlInstance $server -Database $database -Table dimCards -InputObject ($CardTable) 

                        $query = "SELECT cardID from dimCards WHERE CardKeyforgeID = '$($card.id)'"
                        $CardID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
                        $CardID = $CardID.CardID

                        $TraitsXCards = $card | Where-Object {$_.traits} | Select-Object id, traits
                        foreach ($card in $TraitsXCards | Select-Object id, @{Name="Traits";Expression={$_.traits -split (' • ')}}){
                            $query = "select TraitID from dimTraits where TraitName = '$trait'"
                            $TraitID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
                            if (-not $TraitID){
                                $InsertQuery = "INSERT INTO dimTraits VALUES ('$trait')"
                                Invoke-DbaQuery -SqlInstance $server -Database $database -Query $InsertQuery
                                $TraitID = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
                            }
                            $query = "SELECT * from factCardTraits WHERE CardID = $CardID AND TraitID = $($TraitID.TraitID)"
                            $Response = Invoke-DbaQuery -SqlInstance $server -Database $database -Query $query
                            if (-not $Response){
                                $query = "INSERT INTO factCardTraits VALUES ($CardID, $($TraitID.TraitID))"
                                Invoke-DbaQuery -ServerInstance $server -Database $database -Query $query
                            }
                        }
                    }

                    #Write-Host "INSERT INTO factDeckCards VALUES ($DeckID, $CardID); $card"
                    Invoke-DbaQuery -SqlInstance $server -Database $database -Query "INSERT INTO factDeckCards VALUES ($DeckID, $CardID)"
                }
            }
        }

 

 

 Next Steps

  • Database Updates
    • Performance tune it (it’s super slow right now!)
    • Create Foreign Keys where appropriate
    • Clean up some of the data that is messed up.  Specifically, the dimTraits table has some issues such as “Cyborg â?¢ Beast”.  Break this into Cyborg and Beast.
  • Script Updates
    • Create functions for the repetitive portions
    • Use a queue to pull in all of the JSON via the API and split the work among multiple nodes to speed up the process
  • Project Updates
    • Move the script to Azure
    • Create a schedule to run regularly
    • Create the API
    • Create the PowerBI report

 

Have questions or suggestions?  Please feel free to comment below or contact me.

8 thoughts to “Keyforge Data Scraping Part 2: The Script”

  1. Is it necessary to use the db? I’m trying to write a Mastodon bot to post random decks/cards to Mastodon. This is my first bot project, I’m using javascript, but know significant powershell (of course, that’s not really going to be helpful for a mastodon bot). In anycase, I had imagined just making a call to the api, and then formatting the returned text for the content of the post.

    1. No, it isn’t necessary. I’m trying to become a better database developer which is the main reason I wanted to move the data to a database. The only real issue I see is that I’m not sure how to search the API for a specific deck. If you’re wanting to do a truly random deck, you could pull a random API page and then a random deck and its cards from that page.

      I think there must be a way to use the API to search for a specific deck.. but I haven’t seen anything on that online and I haven’t figured it out myself. If you do run across that, I’d love to know!

      1. So, the ID you get from the QR code is different than the id used in the API and deck URLs. Has anyone already figured out how to get the deck id from the 15 digit code from the QR code?

  2. Hey Allison,

    thank you for this posts u made, i am thinking about creating an easy mobile app to keep track of win or lose ratio with your own decks and optional the chance to scan the opponents deck for further documentation. also i wanna add a section for a deck analyzer (Amber, Board Controll, Amber Control, Effiency etc.) with some easy graphs like burgertokens.com uses..

    any idea how i can start this project?

  3. Allison,

    I have your script working – I think. Thanks for posting this! It is exactly the kind of help I needed for a project. If you would be willing to talk about this work, there’s still a few questions I have.

    Ben

Leave a Reply to casktapper Cancel reply

Your email address will not be published. Required fields are marked *