Collecting Data from Steam

Introduction

Steam, a digital distribution platform developed by Valve, is a go-to hub for gamers around the world. While it primarily focuses on distributing and managing video games, it also fosters a thriving community through features like:

  • Friends
  • Groups
  • Discussions
  • Reviews

Recently, while browsing the Steam page for Helldivers™️ 2, something caught my eye: a sharp drop in its user review rating over just a few days. I was initially interested in buying the game, but that sudden dip made me curious. What was driving the negative feedback? So, I dove into the reviews to see what was going on. Turns out, many players were frustrated by Sony’s decision to require a PlayStation Network (PSN) account to play the game, and that didn’t sit well with them.

Despite the backlash, my friends were eager to play, so I caved and bought the game anyway. Reviews almost swayed my decision, but peer pressure won out this time. For many others, though, those reviews can be make-or-break when it comes to hitting “buy.” They can also offer developers invaluable insights into what keeps their community happy—or, as in this case, what makes them really upset. 😦

That got me thinking: how much can we learn from these Steam reviews? Time to dig in and see what interesting patterns or insights they reveal!

Data Collection

First Attempt - Using BeautifulSoup4

From what I studied in university, I had some familiarity with web crawlers, so I thought, why not try building something similar to gather my data? The plan was simple: create a script that visits Steam’s review pages (one at a time) and collects user reviews. To get started, I used a handy library called BeautifulSoup.

import urllib.request
import bs4
import time
import re

GAMEID = 220 # Half Life 2

urlopener = urllib.request.urlopen(f"https://steamcommunity.com/app/{GAMEID}/positivereviews/?browsefilter=toprated&snr=1_5_100010_#scrollTop=2000")
time.sleep(4)

soup = bs4.BeautifulSoup(urlopener.read(), 'html.parser')

After going through the HTML of a typical Steam review page, it wasn’t too hard to locate the data I needed. Here’s an example of how I implemented the basic scraping logic.

urlopener = urllib.request.urlopen(f"https://steamcommunity.com/app/{GAMEID}/positivereviews/?browsefilter=toprated&snr=1_5_100010_#scrollTop=2000")
time.sleep(4)

soup = bs4.BeautifulSoup(urlopener.read(), 'html.parser')
 The year was 2004. "Friends" had its series finale, "The Apprentice" had its premiere, and I as a young teenager in a small town spent long winter afternoons at my friend's house playing video games. It was always at his house. He had a shiny new GeForce 6800, compared to my aging Voodoo. PC games never really held much more than a passing fascination for me since any game worth playing was available on a console, largely Nintendo ones. What a fool I was. On Christmas, he received a copy of a computer game he said would change everything. That game was Half-Life 2. That December everything changed.We were the right man in the wrong place. We picked up the can. We learned why we don't go to Ravenholm anymore. We launched barrels into the sky just to watch them fall.My PC had no chance to deal with all the physics calculations, let alone the lighting and particle effects. It ran, at best, 15 fps. So I played the whole thing with my friend. Then I upgraded my computer, and explored the world of City 17 once more. A few years later I upgraded again, and played again. And then I played Portal. And countless others. I upgraded once more, and played more.This game is the reason why I got into PC gaming. This game is the reason why I have experienced the length and breadth of every game I've experienced since.Fast forward fifteen years. "Friends" is the most-streamed show on Netflix, "The Apprentice" is our President, and we will soon revisit City 17 in virtual reality. This game is no longer the technical marvel it was, its innovations copied and improved upon by those developers who followed. The physics and lighting effects that brought computers to their knees are now standard. Indeed, my mid-range build now runs this game at 300fps. But the story, although unfinished, is still as engrossing as it ever was. Leading the resistance against the Combine is just as exciting as it ever was. Gordon is just as enigmatic as he ever was.HL2 has aged, and it has aged well. 
It's still fun, it's still a masterpiece. But at that time, in the long nights and deep snows of December 2004, it was perfect.
  I just recently finished Half-Life 2, for the second time.  The first was some time ago.  But this time, finishing the game was different.  My daughter watched me play through the game, always pointing out little things.  After I completed the game, she said "Dad, I really liked watching you play Half-Life 2"  And I said "I know, it was fun, but now it's over and I will be playing something else."  She looked at me, full of innocence, and said, "Well, maybe you can get Half-Life 3, and I can watch you play it?"  I looked at her, moisture rimming my eyes, and said, "I don't think that is going to happen hun, there isn't a Half-Life 3, and likely won't ever be."  I will give this game a recommend, because it is amazing.  But with no closure in sight, it's a difficult decision.  All I can say is:"Wake up, Mr. Freeman"
  Whoever invented the poison headcrab is a danger to society
 Product received for free  I will continue to pick up cans until HL3.
  This one of the best first person shooter games I've ever played, plain and simple.Half Life 2 begins 20 years after the original Half Life. As Gordon Freeman awakens from stasis he finds himself on a train heading to City 17 which is overrun by the combine. You find yourself as the leader of the resistance and you must take down the Combine. Half Life 2 is at this point one of the best received games and I'm sure pretty much everyone has played it. If not well what the hell are you doing reading this review ? You need to go buy this game straight away and play it right now.The game is a simple first person shooter, but it does plenty things different than the regular first person shooters. You have a health bar and you have to do platforming to solve some of the puzzles.  Not to mention the source engine physics are absolutely insane. For a game that came out back in 2004 it still looks amazing. Now while it does have so low res textures it's still looks breath taking, like in the final chapters.The game for the most part is pretty easy. I played through the game without a problem, even though some segments were tough due to the fast movement you have to pull off to avoid gunfire. Not to mention the Strider battles are incredibly intense and quite difficult.The game took me 10 hours to finish. I felt this was the perfect amount of time for the price I paid for it and this was worth every single penny.Apart from the single playthrough there's not much replay value. The only reason to replay it twice is to look for the G-Man Easter eggs. In the end Half Life 2 is a fantastic game. It's a must own for everyone who is a gamer. This game is a piece of history and without a doubt one of the best games I have ever played in my entire life.I experienced zero problems while playing the game. I ran the game on ultra settings 60 FPS the entire time of playing. 
This game runs really well and should run on pretty much all systems.Final Rating:9.5/10 -  A Must own and must playIf you liked this review please consider giving it a thumbs up and if you disliked post in the comments what you disliked about the review.For more reviews follow Snort's Review curator page here - follow for regular updates on reviews for other games!
 All I want is a chance to play HL3 before I die. 
  Don't cry because it's over.Smile because it happenedgoodnight, Half-LifeEDIT: NEVERMIND
 Rise and shine, Mr. Freeman. Rise and shine. 10/10 
  "GAME THAT DESERVE THE SEQUEL" award
  I was 12 years old. My uncle (HAI ALLAN) built me a computer, my family didn't have internet and we booted up steam at his house installed Half Life 2 and checked the box to boot in offline mode. STEAM IS STILL UPDATING YAY.I had an AMD Athlon 64 Dual Core (♥♥♥♥♥♥♥♥♥), it might have been my uncles old Voodoo Geforce 3, I can't remember, a whole 1 GB of RAM (He was super jelly lol it was Geil memory which served me well). I also had a 12GB HDD (might be wrong on that) and an Iiyama 1280x1024 native CRT! I was ready to play the greatest game of my life.The second I saw G-mans face was just... I had never seen animations like it. DAT LIP SYNCING. Dem loading screens though.Here I am 13 years later. 13. Years. Later. Im ♥♥♥♥♥♥♥ 26 with 2 kids.I play the series once a year as is my tradition and I will always remember fondly that time with my uncle loading Half Life 2. The Half Life 2. I could even tell you the clothes I was wearing.Fantastic game and many cherished memories.

While the code was functional, my initial results were a bit underwhelming. Each request returned only the handful of reviews shown above, far fewer than I expected for a game this popular. The likely cause is that the page loads further reviews dynamically as you scroll, so the initial HTML only contains the first batch. I tried tweaking my crawler, but it didn’t lead to better results.
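For reference, the extraction step could look roughly like the sketch below. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies, and it assumes the review text sits in a div with the class apphub_CardTextContent; that class name is my assumption about the page markup, not something shown above, and the sample HTML is hand-written.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text inside every <div> carrying the target class."""

    def __init__(self, target_class="apphub_CardTextContent"):  # assumed class name
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # nesting depth of <div>s inside a review block
        self.reviews = []
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested <div> inside a review block
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            # Entering a new review block
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag != "div" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Leaving the review block: join the pieces and normalise whitespace
            self.reviews.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def extract_reviews(html):
    parser = ReviewTextParser()
    parser.feed(html)
    return parser.reviews


# Hand-written sample markup, not a real Steam page
sample_html = (
    '<div class="apphub_Card">'
    '<div class="apphub_CardTextContent">'
    '<div class="date_posted">Posted: 1 January</div>'
    'I will continue to pick up cans until HL3.'
    '</div></div>'
    '<div class="apphub_CardTextContent">Rise and shine,<br>Mr. Freeman.</div>'
)
print(extract_reviews(sample_html))
```

Nested elements such as the posting date end up in the extracted text, which matches the "Product received for free" prefix visible in the output above.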

Second Attempt - Using Steamworks API

That’s when I discovered the ✨Steamworks API✨. Why didn’t I think of this earlier?! 😭 Going through the User Review API, I quickly wrote another script that goes over a list of games and starts dumping their reviews.

import requests
import json

response = requests.get("https://store.steampowered.com/appreviews/220?json=1")
response_dict = json.loads(response.text)

Using a simple request, I could pull the 20 most helpful reviews from the last 365 days for Half-Life 2 (still holding out hope for HL3 🙏). The first review in the response?

I will continue to pick up cans until HL3.

Classic.

The API offers several parameters to fine-tune the reviews you want to fetch. The ones I found most useful were language, review_type, and num_per_page. You also need to provide the appid to pull reviews for specific games.
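As a rough illustration of how these parameters fit together, the sketch below builds the request URL with the standard library. The helper name build_reviews_url and its default values are my own choices for the example; only the endpoint and the parameter names come from the text and code in this post.

```python
from urllib.parse import urlencode

BASE_URL = "https://store.steampowered.com/appreviews/{appid}"


def build_reviews_url(appid, language="english", review_type="all",
                      num_per_page=20, cursor="*"):
    """Return the review URL for one app (hypothetical helper)."""
    params = {
        "json": 1,
        "language": language,
        "review_type": review_type,      # "all", "positive" or "negative"
        "num_per_page": num_per_page,
        "cursor": cursor,                # "*" asks for the first page
    }
    # urlencode percent-encodes the values, including the "*" cursor
    return BASE_URL.format(appid=appid) + "?" + urlencode(params)


print(build_reviews_url(220, review_type="positive", num_per_page=50))
```

Letting urlencode handle the escaping avoids having to quote cursor values by hand.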

Once I had everything working, I needed a way to organize the data. Each review gets saved in a .txt file, and the filename includes important metadata like the review type and app ID. I decided to structure the file names as <REVIEW_TYPE>_<INDEX_OF_REVIEW>_<APPID>.txt, where REVIEW_TYPE is 1 for positive and 0 for negative, and INDEX_OF_REVIEW is a unique identifier for each review. This setup makes it easy to dump reviews and start analyzing them without much fuss. Things weren’t supposed to be this simple (cue foreshadowing).

import json
import os
import requests

APPIDS = [386360, 578080, 548430, 271590, 2357570, 730, 435150, 632360, 457140, 322330, 289070, 4000, 1794680, 1145360, 892970, 220, 359550, 648800, 413150, 1174180, 1245620, 379720, 945360, 1938090, 2195250]
LANGUAGE = "english"
NUM_PER_PAGE = 50

# Create the output directory if it doesn't already exist
os.makedirs("data", exist_ok=True)

for APPID in APPIDS:
  positive_response = requests.get(f"https://store.steampowered.com/appreviews/{APPID}?json=1&num_per_page={NUM_PER_PAGE}&language={LANGUAGE}&review_type=positive")
  negative_response = requests.get(f"https://store.steampowered.com/appreviews/{APPID}?json=1&num_per_page={NUM_PER_PAGE}&language={LANGUAGE}&review_type=negative")

  positive_response_dict = json.loads(positive_response.text)
  negative_response_dict = json.loads(negative_response.text)

  # Open in "w" mode so that re-running the script overwrites each file
  # instead of appending a second review to it
  for index, review in enumerate(positive_response_dict['reviews']):
    with open(f"data/1_{index:02d}_{APPID}.txt", "w", encoding="utf-8") as f:
      f.write(review["review"])

  for index, review in enumerate(negative_response_dict['reviews']):
    with open(f"data/0_{index:02d}_{APPID}.txt", "w", encoding="utf-8") as f:
      f.write(review["review"])
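
To read the data back for analysis, the label and app ID have to be recovered from each filename. A minimal sketch of that step, assuming the <REVIEW_TYPE>_<INDEX_OF_REVIEW>_<APPID>.txt scheme described above (the helper name parse_review_filename is hypothetical):

```python
from pathlib import Path


def parse_review_filename(path):
    """Split '<REVIEW_TYPE>_<INDEX_OF_REVIEW>_<APPID>.txt' into its parts."""
    review_type, index, appid = Path(path).stem.split("_")
    return {
        "positive": review_type == "1",  # 1 = positive, 0 = negative
        "index": int(index),
        "appid": int(appid),
    }


print(parse_review_filename("data/1_07_220.txt"))
```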

As it turns out, while the reviews were being saved just fine, managing individual files like this quickly became a nightmare. On top of that, I was discarding a lot of additional data that might come in handy later. Clearly, this plan wasn’t going to work.

So, I scrapped the whole thing. After revisiting the API response, I realized that its structure wasn’t all that complicated—it would actually fit perfectly into a table. And we all know what that means: SQL time! This time around, I decided to save all the extra data for future use, just in case.

I also decided to expand the set of games I was working with, and guess what? I found an API call (ISteamApps/GetAppList) that returns a list of all the apps available on Steam. Isn’t that sweet?

With that in mind, I created four tables in PostgreSQL:

  • all_apps: A table to list all the games, their name and description.
                      Table "all_apps"
        Column    |  Type   | Collation | Nullable | Default 
      -------------+---------+-----------+----------+---------
      appid       | integer |           | not null | 
      name        | text    |           |          | 
      description | text    |           |          | 
      Indexes:
          "all_apps_pkey" PRIMARY KEY, btree (appid)
      Referenced by:
          TABLE "all_reviews" CONSTRAINT "all_reviews_appid_fkey1" FOREIGN KEY (appid) REFERENCES all_apps(appid)
          TABLE "all_status" CONSTRAINT "all_status_appid_fkey" FOREIGN KEY (appid) REFERENCES all_apps(appid)
    
  • all_reviews: A table to dump the response from the Steam Review API.
                                      Table "all_reviews"
                Column            |            Type             | Collation | Nullable | Default 
      -----------------------------+-----------------------------+-----------+----------+---------
      reccomendationid            | text                        |           | not null | 
      appid                       | integer                     |           |          | 
      language                    | text                        |           |          | 
      review                      | text                        |           |          | 
      timestamp_created           | timestamp without time zone |           |          | 
      timestamp_updated           | timestamp without time zone |           |          | 
      voted_up                    | boolean                     |           |          | 
      votes_up                    | bigint                      |           |          | 
      votes_funny                 | bigint                      |           |          | 
      weighted_vote_score         | double precision            |           |          | 
      comment_count               | integer                     |           |          | 
      steam_purchase              | boolean                     |           |          | 
      received_for_free           | boolean                     |           |          | 
      written_during_early_access | boolean                     |           |          | 
      developer_response          | text                        |           |          | 
      timestamp_dev_response      | timestamp without time zone |           |          | 
      primarily_steam_deck        | boolean                     |           |          | 
      Indexes:
          "all_reviews_pkey1" PRIMARY KEY, btree (reccomendationid)
      Foreign-key constraints:
          "all_reviews_appid_fkey1" FOREIGN KEY (appid) REFERENCES all_apps(appid)
          "all_reviews_reccomendationid_fkey" FOREIGN KEY (reccomendationid) REFERENCES all_users(reccomendationid)
    
  • all_users: A table to dump user data from the Steam Review API.
                                      Table "all_users"
              Column          |            Type             | Collation | Nullable | Default 
      -------------------------+-----------------------------+-----------+----------+---------
      reccomendationid        | text                        |           | not null | 
      steamid                 | text                        |           |          | 
      num_games_owned         | integer                     |           |          | 
      num_reviews             | integer                     |           |          | 
      playtime_forever        | integer                     |           |          | 
      playtime_last_two_weeks | integer                     |           |          | 
      playtime_at_review      | integer                     |           |          | 
      deck_playtime_at_review | integer                     |           |          | 
      last_played             | timestamp without time zone |           |          | 
      Indexes:
          "all_users_pkey1" PRIMARY KEY, btree (reccomendationid)
      Referenced by:
          TABLE "all_reviews" CONSTRAINT "all_reviews_reccomendationid_fkey" FOREIGN KEY (reccomendationid) REFERENCES all_users(reccomendationid)
    
  • all_status: A temporary table to log and resume the review collection process.
                    Table "public.all_status"
        Column    |  Type   | Collation | Nullable | Default 
      -------------+---------+-----------+----------+---------
      appid       | integer |           | not null | 
      is_complete | boolean |           |          | 
      last_cursor | text    |           |          | 
      Indexes:
          "all_status_pkey" PRIMARY KEY, btree (appid)
      Foreign-key constraints:
          "all_status_appid_fkey" FOREIGN KEY (appid) REFERENCES all_apps(appid)
    

Now, you might be wondering why I used reccomendationid as the primary key for the all_users table when steamid seems like a more natural choice for identifying users. Initially, I thought the same thing. But after collecting over a million data points, I realized that each author record the API returns belongs to one review, that is, one user’s history with one game, rather than to the user alone. Fields like playtime_forever and playtime_last_two_weeks are game-specific metrics, so the same steamid shows up with different values for every game that user has reviewed. That’s why using reccomendationid as the primary key made more sense.
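
The effect is easy to reproduce on a small scale. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL and a cut-down all_users table; the two sample records are hand-written (not real API output) and represent one user reviewing two different games.

```python
import sqlite3

# Hand-written sample records: the same steamid with two different playtimes
SAMPLE = [
    {"recommendationid": "1001",
     "author": {"steamid": "76561190000000001", "playtime_forever": 600}},
    {"recommendationid": "1002",
     "author": {"steamid": "76561190000000001", "playtime_forever": 12000}},
]


def load_users(conn, key_column):
    """Create a cut-down all_users table keyed on key_column and insert SAMPLE."""
    conn.execute("drop table if exists all_users")
    conn.execute(
        "create table all_users (reccomendationid text, steamid text, "
        f"playtime_forever int, primary key ({key_column}))"
    )
    for review in SAMPLE:
        author = review["author"]
        conn.execute(
            "insert into all_users values (?, ?, ?)",
            (review["recommendationid"], author["steamid"], author["playtime_forever"]),
        )
    return conn.execute("select count(*) from all_users").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows_by_recommendation = load_users(conn, "reccomendationid")  # both rows fit

try:
    load_users(conn, "steamid")
    steamid_key_works = True
except sqlite3.IntegrityError:
    # The second row repeats the steamid, so the primary key rejects it
    steamid_key_works = False

print(rows_by_recommendation, steamid_key_works)
```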

So now, without further ado, let me introduce what you’ve all been waiting for: THE FINAL DATA COLLECTION SCRIPT ✨

import json
import requests
import urllib.parse
import tqdm

import psycopg2

def get_all_steam_public_apps():
    response = requests.get("https://api.steampowered.com/ISteamApps/GetAppList/v2/")
    response_dict = json.loads(response.text)
    return response_dict['applist']['apps']

def get_steam_reviews(
    cursor,
    app_id,
    language,
):
    response = requests.get(f"https://store.steampowered.com/appreviews/{app_id}?json=1&num_per_page=100&language={language}&review_type=all&cursor={cursor}&filter_offtopic_activity=0&filter=updated")
    try:
        response_dict = json.loads(response.text)
    except json.JSONDecodeError:
        try:
            # Some responses start with a UTF-8 byte order mark; strip it and retry
            response_dict = json.loads(response.content.decode('utf-8-sig'))
        except ValueError:
            # Still not valid JSON: treat the request as failed
            response_dict = {}

    # A failed request or an empty page both mean "no more reviews"
    reviews = response_dict.get('reviews', [])
    if response_dict.get('success') == 1 and reviews:
        return urllib.parse.quote(response_dict['cursor']), reviews
    return urllib.parse.quote("*"), []

# Optional fields; any that the API omits are filled in with None
AUTHOR_FIELDS = (
    'num_games_owned', 'num_reviews', 'playtime_forever', 'playtime_last_two_weeks',
    'playtime_at_review', 'deck_playtime_at_review', 'last_played',
)
REVIEW_FIELDS = (
    'language', 'review', 'timestamp_created', 'timestamp_updated', 'voted_up',
    'votes_up', 'votes_funny', 'weighted_vote_score', 'comment_count',
    'steam_purchase', 'received_for_free', 'written_during_early_access',
    'developer_response', 'timestamp_dev_response', 'primarily_steam_deck',
)

def postprocess_review(review):
    # The review id and the author's steamid are required
    if 'recommendationid' not in review:
        raise KeyError("recommendationid not found in review")

    if 'author' not in review:
        raise KeyError("author not found in review")

    if 'steamid' not in review['author']:
        raise KeyError("steamid not found in review")

    # Make sure every key used by the insert statements exists
    for field in AUTHOR_FIELDS:
        review['author'].setdefault(field, None)

    for field in REVIEW_FIELDS:
        review.setdefault(field, None)

    return review
    

if __name__=="__main__":
    connection = psycopg2.connect(database = "steam_data", user = "ayushchamoli")
    cursor = connection.cursor()
    
    all_apps = get_all_steam_public_apps()
    all_apps.sort(key=lambda app: app['appid'])
    
    _b_table_exists = cursor.execute("select exists(select * from information_schema.tables where table_name=%s)", ("all_apps",))
    _b_table_exists = cursor.fetchone()[0]
    if not _b_table_exists:    
        print("Creating table")
        cursor.execute(f"create table all_apps (appid int primary key, name text, description text)")
        
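    for app in tqdm.tqdm(all_apps):
        # Skip apps that are already stored. A failed insert would abort the
        # PostgreSQL transaction and discard the earlier uncommitted inserts.
        cursor.execute("insert into all_apps (appid, name) values (%s, %s) on conflict (appid) do nothing", (app['appid'], app['name']))

    connection.commit()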
    
    _b_user_table_exists = cursor.execute("select exists(select * from information_schema.tables where table_name=%s)", ("all_users",))
    _b_user_table_exists = cursor.fetchone()[0]
    if not _b_user_table_exists:
        print("Creating user table")
        cursor.execute("""create table all_users (
                            reccomendationid text,
                            steamid text,
                            num_games_owned int,
                            num_reviews int,
                            playtime_forever int,
                            playtime_last_two_weeks int,
                            playtime_at_review int,
                            deck_playtime_at_review int,
                            last_played timestamp,
                            primary key (reccomendationid)
                            )""")
    
    _b_review_table_exists = cursor.execute("select exists(select * from information_schema.tables where table_name=%s)", ("all_reviews",))
    _b_review_table_exists = cursor.fetchone()[0]
    if not _b_review_table_exists:
        print("Creating review table")
        cursor.execute("""create table all_reviews (
                            reccomendationid text references all_users(reccomendationid), 
                            appid int references all_apps(appid),
                            language text,
                            review text,
                            timestamp_created timestamp,
                            timestamp_updated timestamp,
                            voted_up boolean,
                            votes_up bigint,
                            votes_funny bigint,
                            weighted_vote_score float,
                            comment_count int,
                            steam_purchase boolean,
                            received_for_free boolean,
                            written_during_early_access boolean,
                            developer_response text,
                            timestamp_dev_response timestamp,
                            primarily_steam_deck boolean,
                            primary key (reccomendationid)
                            )
                            """)
    
    _b_all_status_exists = cursor.execute("select exists(select * from information_schema.tables where table_name=%s)", ("all_status",))
    _b_all_status_exists = cursor.fetchone()[0]
    if not _b_all_status_exists:
        print("Creating review collection status table")
        cursor.execute("""create table all_status (
                            appid int references all_apps(appid),
                            is_complete boolean,
                            last_cursor text,
                            primary key (appid)
                            )""")
    
    cursor.execute("select appid from all_apps")
    all_app_ids = cursor.fetchall()
    
    for app_id in tqdm.tqdm(all_app_ids):
        app_id = app_id[0]
        app_id_status = cursor.execute("select is_complete from all_status where appid=%s", (app_id,))
        app_id_status = cursor.fetchone()
        if app_id_status is not None and app_id_status[0]:
            continue
        
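        cursor.execute("select last_cursor from all_status where appid=%s", (app_id,))
        row = cursor.fetchone()
        if row is not None and row[0] == "-":
            continue

        # all_status stores the cursor already URL-encoded (as returned by
        # get_steam_reviews), so only the initial "*" needs quoting here.
        # Quoting a stored cursor a second time would corrupt it.
        if row is not None and row[0]:
            steam_cursor = row[0]
        else:
            steam_cursor = urllib.parse.quote("*")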
        while True:
            steam_cursor, reviews = get_steam_reviews(steam_cursor, app_id, "english")
            cursor.execute("insert into all_status (appid, last_cursor) values (%s, %s) on conflict (appid) do update set last_cursor = excluded.last_cursor", (app_id, steam_cursor))

            for review in reviews:
                review = postprocess_review(review)
                try:
                    cursor.execute("""insert into all_users (reccomendationid, steamid, num_games_owned, num_reviews, playtime_forever, playtime_last_two_weeks, playtime_at_review, deck_playtime_at_review, last_played)
                                        values (%s, %s, %s, %s, %s, %s, %s, %s, (to_timestamp(%s)))
                                        """, 
                                        (review['recommendationid'], review['author']['steamid'], review['author']['num_games_owned'], review['author']['num_reviews'], review['author']['playtime_forever'], review['author']['playtime_last_two_weeks'], review['author']['playtime_at_review'], review['author']['deck_playtime_at_review'], review['author']['last_played']))
                                    
                    cursor.execute("""insert into all_reviews (reccomendationid, appid, language, review, timestamp_created, timestamp_updated, voted_up, votes_up, votes_funny, weighted_vote_score, comment_count, steam_purchase, received_for_free, written_during_early_access, developer_response, timestamp_dev_response, primarily_steam_deck) 
                                        values (%s, %s, %s, %s, (to_timestamp(%s)), (to_timestamp(%s)), %s, %s, %s, %s, %s, %s, %s, %s, %s, (to_timestamp(%s)), %s)
                                        """,
                                        (review['recommendationid'], app_id, review['language'], review['review'], review['timestamp_created'], review['timestamp_updated'], review['voted_up'], review['votes_up'], review['votes_funny'], review['weighted_vote_score'], review['comment_count'], review['steam_purchase'], review['received_for_free'], review['written_during_early_access'], review['developer_response'], review['timestamp_dev_response'], review['primarily_steam_deck']))
            
                except Exception as e:
                    print(f"Error in inserting review: {e}")
                    print(review)
                    # The failed statement aborted the transaction, so roll it back explicitly
                    connection.rollback()
                    raise SystemExit(1)
                    
            connection.commit()
            if len(reviews) == 0:
                break
        
        cursor.execute("insert into all_status (appid, is_complete, last_cursor) values (%s, %s, %s) on conflict (appid) do update set last_cursor = excluded.last_cursor, is_complete = excluded.is_complete", (app_id, True, "-"))
        connection.commit()

The script pulls reviews for each game, processes the data, and dumps it into the right tables. It’s a smart setup because it handles everything from storing how many games a user owns to tracking whether they played on Steam Deck. Plus, it uses cursors to keep track of progress—if something crashes, it just picks up where it left off. 💾 The script even handles those pesky cases where the API response might be missing fields or has odd encodings. And finally, the all_status table ensures we can log what’s been collected and what’s still pending. It’s efficient, it’s clean, and it’s built to scale—exactly how data scraping should be! 💪
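
Stripped of the database code, the resume logic boils down to the loop below. This is a simplified sketch rather than the script itself: a dict stands in for the all_status row, a list stands in for all_reviews, and the page contents and cursor values are made up so the example runs offline.

```python
def collect_reviews(fetch_page, state, store):
    """Page through reviews with a cursor, recording progress after every page.

    fetch_page(cursor) returns (next_cursor, reviews).
    """
    if state.get("is_complete"):
        return
    cursor = state.get("last_cursor", "*")   # "*" means start from the first page
    while True:
        next_cursor, reviews = fetch_page(cursor)
        if not reviews:
            break                            # an empty page marks the end
        store.extend(reviews)
        state["last_cursor"] = next_cursor   # progress survives a crash
        cursor = next_cursor
    state["is_complete"] = True


# Made-up pages keyed by cursor
PAGES = {"*": ("c1", ["r1", "r2"]), "c1": ("c2", ["r3"]), "c2": ("c2", [])}


def fake_fetch(cursor):
    return PAGES[cursor]


def flaky_fetch(cursor):
    # Simulate a crash while fetching the second page
    if cursor == "c1":
        raise ConnectionError("simulated crash")
    return PAGES[cursor]


state, store = {}, []
try:
    collect_reviews(flaky_fetch, state, store)
except ConnectionError:
    pass

# The second run resumes from the saved cursor instead of starting over
collect_reviews(fake_fetch, state, store)
print(store, state)
```

Because the cursor is saved together with each page, a restart neither skips reviews nor fetches the first page again.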

And guess how many reviews I’ve collected so far? 1.5 million. 😅 Phew… but here’s the wild part: that’s only from 1,586 apps, while Steam has over 200,000! 😱 For now, I think I’ve got plenty of data to dive into, but trust me, I’ll be scraping more as I go. Oh, and before I forget, here’s the data [I will be updating it from time to time].



