BIT

Analyzing my 11k bash commands

history | wc informs me that I’ve typed exactly 11005 bash commands in the 278 days I’ve used Ubuntu (ls -lt /var/log/installer). However, I only expanded my history storage about 3 months ago, so I’ve probably lost around 60% of the commands I’ve ever typed.

Still, I’m curious which commands I’ve been typing, so I set out to write some scripts to find out. I hypothesize the front runners to be cd, ls, and docker.

Top 10 commands executed

I wrote the following script:

import os
import sys
from collections import Counter
from itertools import dropwhile

if __name__ == "__main__":
    history_path = os.path.join(os.path.expanduser("~"), ".bash_history")
    # read the whole history file and close it right away
    with open(history_path, 'r') as history_file:
        raw_history = history_file.read().splitlines()

    def nonEmpty(row: str) -> bool:
        return len(row) > 0

    def isSegmentEnvVar(segment: str):
        return '=' in segment

    def isSegmentDate(segment: str):
        return segment[0].isnumeric()

    # strip leading dates and env variables, then drop blank lines
    history = list(map(lambda row: ' '.join(
        dropwhile(lambda segment: isSegmentDate(segment) or isSegmentEnvVar(segment), row.split())), raw_history
        ))
    history = list(filter(nonEmpty, history))
    
    def top_n_command_name(n: int):
        command_names = map(lambda row: row.split(" ")[0], history)
        return Counter(command_names).most_common(n)
        
    print(top_n_command_name(10))

Results:

Command       Times Executed
cd            1522
ls            1446
npm           725
sudo          684
python3       588
git           513
cb-dev-kit    449
cb-cli        444
code          396
node          296

isSegmentEnvVar serves to remove leading environment variables (e.g., ENV1=hello python3 driver.py).

isSegmentDate serves to remove the leading date information that might appear in a standard history entry.
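To make the prefix stripping concrete, here's a minimal standalone sketch of the same dropwhile idea. The snake_case helper names and the sample rows are made up for illustration; the two tests mirror isSegmentEnvVar and isSegmentDate from the script.

```python
from itertools import dropwhile


def is_env_var(segment: str) -> bool:
    # Same test as isSegmentEnvVar: any '=' marks the segment as an assignment.
    return "=" in segment


def is_date(segment: str) -> bool:
    # Same test as isSegmentDate: the segment starts with a digit.
    return segment[0].isnumeric()


def strip_prefix(row: str) -> str:
    # Drop leading segments while they look like dates or env vars,
    # then keep everything from the first other segment onward.
    segments = dropwhile(lambda s: is_date(s) or is_env_var(s), row.split())
    return " ".join(segments)


# Hypothetical history rows, invented for this example.
samples = [
    "ENV1=hello python3 driver.py",
    "2021-03-04 12:00:01 git status",
    "git commit -m a=b",
    "7z x archive.7z",
]
for row in samples:
    print(repr(row), "->", repr(strip_prefix(row)))
```

Only leading segments are dropped, so the a=b after git commit survives. One caveat of the digit test: a command whose name starts with a digit (like 7z) gets stripped too.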

Looks like I use a lot of node and python. Interesting, I would’ve thought that I’ve used more gcc than the other two.

10 most frequent commands with 2 keywords

    def top_n_k_keyword(n: int, k: int):
        keywords_list = [' '.join(row.split()[:k])
                          for row in history if len(row.split()) >= k]
        return Counter(keywords_list).most_common(n)
        
    print(top_n_k_keyword(10, 2))

Results:

Command                Times Executed
cd ..                  324
code .                 142
npm start              74
cd packages/server/    71
ls -R                  44
cb-cli init            43
cd ../..               42
stack build            39
cb-dev-kit generate    38
git push               37
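If you want to try the k-keyword counting without touching your real history, here's a small self-contained sketch. It takes history as a parameter instead of reading the outer variable, and the toy rows are invented for illustration.

```python
from collections import Counter


def top_n_k_keyword(history, n, k):
    # Keep only rows with at least k words, then count their first k words.
    prefixes = [" ".join(row.split()[:k])
                for row in history if len(row.split()) >= k]
    return Counter(prefixes).most_common(n)


# Toy history, made up for this example.
toy_history = [
    "cd ..",
    "cd ..",
    "ls",
    "git push",
    "cd ..",
    "git push origin main",
]
print(top_n_k_keyword(toy_history, 10, 2))
```

The one-word ls row is skipped because it has fewer than 2 words, and git push origin main is counted under git push.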

My full script with 3 more options

unfooling-blog-snippets/bash-history-analysis at main · BlastWind/unfooling-blog-snippets
Code snippets for my blog posts at https://unfooling.com

I designed 3 more options:

  • Shortest N commands
  • Longest N commands
  • N most frequent full commands
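As a rough idea of what these three options could look like, here's a minimal sketch. The function names, the deduplication, and the alphabetical tie-breaking are my assumptions for this example; the linked script may do it differently.

```python
from collections import Counter


def shortest_n(history, n):
    # Deduplicate first so one repeated short command doesn't fill the list;
    # ties on length are broken alphabetically to keep the output stable.
    return sorted(set(history), key=lambda cmd: (len(cmd), cmd))[:n]


def longest_n(history, n):
    # Same idea, with the length negated to sort longest first.
    return sorted(set(history), key=lambda cmd: (-len(cmd), cmd))[:n]


def top_n_full(history, n):
    # Count whole command lines rather than just the command name.
    return Counter(history).most_common(n)


# Toy history, made up for this example.
toy_history = [
    "ls", "cd ..", "ls", "git push origin main", "cd ..", "ls", "npm start",
]
print(shortest_n(toy_history, 2))
print(longest_n(toy_history, 1))
print(top_n_full(toy_history, 2))
```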

Run the script I linked above with no arguments to get some default output. Join my Discord and let me know your results!