GSoC 2024 Report - PictoPy

Project Overview

PictoPy is a project developed during Google Summer of Code 2024 for AOSSIE. The goal was to create a backend system for image processing, object detection, and face recognition. The documentation for the entire project can be found here.

Phase 1 (Mid-Phase)

Setup and Initial Development

  1. Backend Setup

    • Initialized a FastAPI backend with a standard directory structure
    • Faced challenges with large library sizes for object detection models
    • Switched to ONNX format, reducing environment size from 5GB (with PyTorch GPU) to ~400MB

    Related PRs

  2. Routing Logic with Parallel Processing

    • Implemented non-blocking requests using FastAPI
    • Explored various options for parallel processing (threading, multiprocessing)
    • Settled on asyncio for concurrent processing of multiple images
    • Switched from uvicorn to hypercorn for better cross-platform compatibility
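
The asyncio approach described above can be sketched roughly as follows. This is an illustration only, not PictoPy's actual code: `detect_objects` is a hypothetical stand-in for a blocking model call, the file names are invented, and `asyncio.to_thread` assumes Python 3.9 or newer.

```python
import asyncio
import time


def detect_objects(image_path: str) -> dict:
    """Hypothetical stand-in for a blocking inference call on one image."""
    time.sleep(0.05)  # simulate model latency
    return {"path": image_path, "objects": []}


async def process_images(image_paths: list[str]) -> list[dict]:
    """Run the blocking call for every image concurrently in worker threads."""
    tasks = [asyncio.to_thread(detect_objects, p) for p in image_paths]
    # gather returns results in the same order as the inputs
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(process_images(["a.jpg", "b.jpg", "c.jpg"]))
    print([r["path"] for r in results])
```

Inside a FastAPI application, an `async def` route handler could await a coroutine like `process_images` directly, so the event loop stays free to accept other requests while the worker threads run.
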
  3. Database Design

    • Developed schemas for storing image and album information
    • Image schema includes file path, object detection results, metadata, and potential face embeddings
    • Album schema contains multiple images and album-specific information
    • Planned to refine the schemas for more concrete mappings and to handle edge cases

    Related PRs
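
To make the image and album schemas above more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The report does not state the storage engine or the exact fields, so SQLite, the table names, and every column name below are assumptions made purely for illustration.

```python
import json
import sqlite3

# Hypothetical schema: names and column types are illustrative assumptions.
SCHEMA = """
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    detected_objects TEXT,   -- JSON list of class labels
    metadata TEXT,           -- JSON object (dimensions, timestamps, ...)
    face_embeddings TEXT     -- JSON list of vectors, NULL if no faces
);
CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE album_images (
    album_id INTEGER NOT NULL REFERENCES albums(id) ON DELETE CASCADE,
    image_id INTEGER NOT NULL REFERENCES images(id) ON DELETE CASCADE,
    PRIMARY KEY (album_id, image_id)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_db()
conn.execute(
    "INSERT INTO images (path, detected_objects, metadata) VALUES (?, ?, ?)",
    ("photos/cat.jpg", json.dumps(["cat"]), json.dumps({"width": 640})),
)
conn.execute("INSERT INTO albums (name) VALUES (?)", ("Pets",))
conn.execute("INSERT INTO album_images (album_id, image_id) VALUES (1, 1)")

# Deleting an image also removes it from every album via ON DELETE CASCADE.
conn.execute("DELETE FROM images WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM album_images").fetchone()[0]
print(remaining)
```

A separate join table is one way to let an album hold many images while keeping deletions consistent; other layouts, such as storing a list of image IDs on the album row, would also fit the description in this report.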

Phase 2 (Final Phase)

Feature Implementation

  1. Face Embeddings

    • Integrated object detection model
    • Implemented face detection and embedding generation
    • Tested various models (ArcFace, VGG, etc.)
    • Selected FaceNet as the best trade-off between model size and performance
    • Used an Ultralytics model for object and face detection
    • Generated face embeddings using FaceNet
    • Converted all models to ONNX format for efficiency

    Related PR

    PR-34 (Merged)
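
Once each detected face has an embedding, comparing two faces reduces to comparing two vectors. The sketch below assumes cosine similarity as the measure, which this report does not confirm, and uses invented three-dimensional vectors in place of real FaceNet outputs.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the two unit-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Toy embeddings (assumed values): face_a and face_b point in similar directions.
face_a = [0.9, 0.1, 0.0]
face_b = [0.8, 0.2, 0.1]
face_c = [0.0, 0.1, 0.9]

same_person_score = cosine_similarity(face_a, face_b)
different_person_score = cosine_similarity(face_a, face_c)
print(same_person_score > different_person_score)
```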

  2. Face Recognition and Schema Updates

    • Implemented face clustering using DBSCAN algorithm
    • Updated schemas to accommodate clustering results
    • Ensured proper handling of image operations (add/delete) affecting clusters
    • Completed core project requirements (image database, album database, object detection, face detection, and recognition)
    • Tuned DBSCAN parameters to work well with the embeddings produced by the ONNX models

    Related PR

    PR-36 (Merged)
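
For readers unfamiliar with DBSCAN, the following is a compact pure-Python sketch of the algorithm. It is not the project's implementation (the report does not say which one was used, and a library version such as scikit-learn's `DBSCAN` is the more usual choice). The points are toy 2-D coordinates standing in for face embeddings, and the `eps` and `min_samples` values are illustrative, not the tuned ones.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [n for n in neighbors if n != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # first dense group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # second dense group
    (10.0, 0.0),                          # isolated point
]
labels = dbscan(points, eps=0.5, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Because DBSCAN does not need the number of clusters in advance, it suits a photo library where the number of distinct people is unknown; faces that match no dense group are simply labelled as noise.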

  3. Documentation and API Collection

    • Created comprehensive documentation on setup, directory structure, and model architecture
    • Developed a Postman Collection for API testing and development
    • Utilized mkdocs-material for documentation, hosted here
    • Provided a Dockerfile for backend containerization

    Related PRs

Project Completion Status

  • Status: Completed
  • Significant Changes: None. The project adhered to the original plan, with minor adjustments (e.g., using ONNX Runtime instead of exploring OpenVINO)

Future Plans

  • Potential development of an Electron-based frontend
  • Continued contribution to the project
  • Promotion of PictoPy to leverage its potential and encourage further development

Blog Posts