GSoC 2024 Report - PictoPy

Project Overview

PictoPy is a project developed during Google Summer of Code 2024 for AOSSIE. The goal was to create a backend system for image processing, object detection, and face recognition. The documentation for the entire project can be found here.

Phase 1 (Mid-Phase)

Setup and Initial Development

Backend Setup
- Initialized a FastAPI backend with a standard directory structure
- Faced challenges with large library sizes for object detection models
- Switched to ONNX format, reducing environment size from 5GB (with PyTorch GPU) to ~400MB
Related PRs
- PR-25 (Merged)
- PR-26 (Closed)
Routing Logic with Parallel Processing
- Implemented non-blocking requests using FastAPI
- Explored various options for parallel processing (threading, multiprocessing)
- Settled on asyncio for concurrent processing of multiple images
- Switched from uvicorn to hypercorn for better cross-platform compatibility
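
As an illustration of the asyncio approach described above, here is a minimal sketch of processing several images concurrently. The `detect_objects` function is a hypothetical stand-in for a blocking model call, and offloading it with `asyncio.to_thread` is an assumption made for this example; the report does not describe how the actual routes are written.

```python
import asyncio
import time


def detect_objects(image_path: str) -> dict:
    """Hypothetical stand-in for a blocking model call (not PictoPy's real code)."""
    time.sleep(0.05)  # simulate blocking inference work
    return {"path": image_path, "objects": []}


async def process_images(image_paths: list[str]) -> list[dict]:
    # Run each blocking call in a worker thread so the event loop stays free
    # to serve other requests in the meantime.
    tasks = [asyncio.to_thread(detect_objects, p) for p in image_paths]
    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(process_images(["a.jpg", "b.jpg", "c.jpg"]))
    print([r["path"] for r in results])
```

In a FastAPI route, the same `process_images` coroutine could be awaited directly from an `async def` handler. `asyncio.to_thread` requires Python 3.9 or later.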
Database Design
- Developed schemas for storing image and album information
- Image schema includes file path, object detection results, metadata, and potential face embeddings
- Album schema contains multiple images and album-specific information
- Planned to refine the schemas with more concrete mappings and to handle edge cases
Related PRs
- PR-29 (Merged)
- PR-30 (Merged)
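
To make the image and album schemas above more concrete, the following sketch shows one possible layout. It assumes SQLite with JSON-encoded text columns; the table names, column names, and the helper functions `open_db` and `add_image` are all hypothetical and are not taken from the PictoPy codebase.

```python
import json
import sqlite3

# Hypothetical schema: an image row holds the fields listed above, and a
# join table lets one album contain multiple images.
SCHEMA = """
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    detections TEXT,   -- JSON list of detected class names
    metadata TEXT,     -- JSON object with file metadata
    embeddings TEXT    -- JSON list of face embeddings, NULL if no faces
);
CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT
);
CREATE TABLE album_images (
    album_id INTEGER NOT NULL REFERENCES albums(id) ON DELETE CASCADE,
    image_id INTEGER NOT NULL REFERENCES images(id) ON DELETE CASCADE,
    PRIMARY KEY (album_id, image_id)
);
"""


def open_db(location: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(location)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_image(conn, path, detections, metadata, embeddings=None) -> int:
    cur = conn.execute(
        "INSERT INTO images (path, detections, metadata, embeddings) "
        "VALUES (?, ?, ?, ?)",
        (
            path,
            json.dumps(detections),
            json.dumps(metadata),
            json.dumps(embeddings) if embeddings is not None else None,
        ),
    )
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_db()
    image_id = add_image(conn, "/photos/a.jpg", ["person"], {"width": 640})
    album_id = conn.execute(
        "INSERT INTO albums (name) VALUES (?)", ("Trip",)
    ).lastrowid
    conn.execute("INSERT INTO album_images VALUES (?, ?)", (album_id, image_id))
    # Deleting the image also removes it from every album via ON DELETE CASCADE.
    conn.execute("DELETE FROM images WHERE id = ?", (image_id,))
    print(conn.execute("SELECT COUNT(*) FROM album_images").fetchone()[0])
```

The cascade on the join table is one way to handle the edge case of an image being deleted while it still belongs to an album.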

Phase 2 (Final Phase)

Feature Implementation

Face Embeddings
- Integrated an Ultralytics model for object and face detection
- Tested various models for face embeddings (ArcFace, VGG, etc.)
- Selected FaceNet for its balance of model size and performance, and used it to generate face embeddings
- Converted all models to ONNX format for efficiency
Related PR
- PR-34 (Merged)
Face Recognition and Schema Updates
- Implemented face clustering using DBSCAN algorithm
- Updated schemas to accommodate clustering results
- Ensured proper handling of image operations (add/delete) affecting clusters
- Completed core project requirements (image database, album database, object detection, face detection, and recognition)
- Tuned DBSCAN parameters to suit the output of the ONNX models
Related PR
- PR-36 (Merged)
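
The clustering step can be illustrated with a small pure-Python DBSCAN over toy embeddings. The report does not say which DBSCAN implementation or distance metric the project uses, so the cosine distance, the `eps` and `min_samples` values, and the two-dimensional "embeddings" below are all assumptions chosen for readability; real FaceNet embeddings have many more dimensions.

```python
import math


def cosine_distance(a, b):
    # Assumes non-zero vectors; a zero vector would divide by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Includes i itself, so min_samples counts the point being tested.
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    # Toy embeddings: three faces of one person, two of another, one outlier.
    faces = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1],
             [0.0, 1.0], [0.05, 0.99], [-1.0, -1.0]]
    print(dbscan(faces, eps=0.05, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

Because DBSCAN does not need the number of clusters in advance, it fits a photo library where the number of distinct people is unknown. Adding or deleting an image changes the set of embeddings, so cluster labels may need to be recomputed afterwards.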
Documentation and API Collection
- Created comprehensive documentation on setup, directory structure, and model architecture
- Developed a Postman Collection for API testing and development
- Utilized mkdocs-material for documentation, hosted here
- Provided a Dockerfile for backend containerization
Related PRs
- PR-37 (Closed)
- PR-39 (Merged)
- PR-41 (Merged)

Project Completion Status

Status: Completed
Significant Changes: None. The project adhered to the original plan with minor adjustments (e.g., using ONNX Runtime instead of exploring OpenVINO).

Future Plans

- Potential development of an Electron-based frontend
- Continued contribution to the project
- Promotion of PictoPy to leverage its potential and encourage further development