AI System
This document explains how the computer players work in Notty.
Overview
Computer players in Notty use Q-Learning, a reinforcement learning technique, to make strategic decisions. The AI learns from experience and improves over time.
Q-Learning Basics
Q-Learning is a type of machine learning where an agent learns to make decisions by:
- Observing the current state of the game
- Choosing an action to perform
- Receiving a reward based on the outcome
- Learning from the reward to improve future decisions
Key Concepts
- State: The current situation in the game (hand size, deck size, etc.)
- Action: What the AI can do (draw cards, steal, discard, etc.)
- Reward: Points given for good or bad actions
- Q-Table: A lookup table that stores the value of each action in each state
How the AI Works
State Representation
The AI simplifies the game state into four features:
- Hand size bucket: 0-4 cards, 5-9 cards, 10-14 cards, or 15-20 cards
- Deck size bucket: Low (<30), Medium (30-59), or High (≥60)
- Can discard: Whether the AI can discard a valid group
- Opponent hand size: Average hand size of other players (bucketed)
This simplified state makes learning faster and more efficient.
Action Selection
The AI uses an epsilon-greedy strategy:
- Exploration (ε% of the time): Try a random action to discover new strategies
- Exploitation (1-ε% of the time): Choose the best known action based on past experience
The exploration rate (epsilon) starts at 20% and gradually decreases to 5% as the AI learns.
Reward System
The AI receives rewards based on its actions:
| Action | Reward | Reason |
|---|---|---|
| Win the game | +100.0 | Biggest reward for achieving the goal |
| Discard group | +10.0 | Good - reduces hand size |
| Reduce hand size | +2.0 per card | Encourages getting closer to winning |
| Steal card | +0.5 | Neutral - takes from opponent |
| Draw cards | -0.5 | Small penalty - increases hand size |
| Pass turn | -1.0 | Penalty for not making progress |
Learning Process
After each action, the AI updates its Q-table using this formula:
Q(state, action) = Q(state, action) + α ×
[reward + γ × max(Q(next_state)) - Q(state, action)]
Where:
- α (alpha) = 0.1 (learning rate - how much to update)
- γ (gamma) = 0.9 (discount factor - how much to value future rewards)
AI Strategy Components
Draw Count Selection
When drawing cards, the AI considers:
- Hand size: Draw fewer if close to 20-card limit
- Card potential: Draw more if close to forming groups
- Opponent status: Draw more if opponents have few cards
- Deck size: Be conservative if deck is low
Target Player Selection
When stealing, the AI targets:
- Players with the most cards (more likely to have useful cards)
- Avoids players with very few cards (close to winning)
Card Discard Selection
When using Draw & Discard, the AI discards:
- Cards that don't fit into potential groups
- Duplicate cards that aren't useful
- Cards with numbers/colors that are isolated
Group Discard Selection
The AI finds the best group to discard by:
- Finding all valid groups in hand
- Prioritizing larger groups (more cards discarded)
- Choosing groups that leave useful cards in hand
Persistent Learning
The AI's Q-table is saved to disk periodically:
- Auto-save: Every 100 actions
- Manual save: When the game exits
- Save location:
~/.notty/qlearning/notty_qtable.pkl
This means the AI remembers what it learned across multiple games and continues to improve.
Learning Statistics
The AI tracks:
- Total actions: How many actions it has taken
- Exploration actions: How many were random (exploring)
- Epsilon: Current exploration rate
- Q-table size: Number of state-action pairs learned
AI Behavior
Early Games (Exploration Phase)
- Takes more random actions (20% exploration)
- Tries different strategies
- Builds up the Q-table
- May make suboptimal moves
Later Games (Exploitation Phase)
- Takes fewer random actions (5% exploration)
- Uses learned strategies
- Makes more optimal moves
- Still explores occasionally to find better strategies
Implementation Details
Files
notty/src/qlearning_agent.py: Q-Learning agent implementationnotty/src/computer_action_selection.py: AI strategy and decision-making
Key Classes
- QLearningAgent: Manages Q-table, learning, and action selection
- computer_chooses_action(): Main function for AI decision-making
Timing
- Computer players have a 1-second delay between actions
- This makes the game easier to follow visually
- The delay is purely cosmetic and doesn't affect AI performance
Customization
You can modify the AI behavior by
changing these parameters in computer_action_selection.py:
agent = QLearningAgent(
alpha=0.1, # Learning rate (0.0 to 1.0)
gamma=0.9, # Discount factor (0.0 to 1.0)
epsilon=0.2, # Initial exploration rate (0.0 to 1.0)
epsilon_decay=0.9995, # How fast epsilon decreases
epsilon_min=0.05 # Minimum exploration rate
)
Future Improvements
Potential enhancements to the AI system:
- More sophisticated state representation
- Deep Q-Learning with neural networks
- Multi-agent learning (AIs learn from each other)
- Different difficulty levels
- Opponent modeling
For game rules, see Game Mechanics.