Avitai Knowledge
Multi-Modal Knowledge Extraction
Extract, structure, and query knowledge from scientific literature, databases, and experimental data. Multi-modal AI for biological understanding.
Repository Coming Soon
This project is under active development and will be open-sourced soon.
Overview
Avitai Knowledge is a comprehensive platform for extracting, structuring, and querying scientific knowledge. Millions of papers are published every year and biological databases hold petabytes of data, so finding relevant information has become a major bottleneck in research. Avitai Knowledge is designed to remove that bottleneck.
The platform uses multi-modal AI models to understand scientific content in all its forms. Beyond reading text, it interprets figures, parses tables, recognizes chemical structures, and models the relationships between different types of biological entities. This combined view enables capabilities such as semantic search, knowledge graph construction, and automated synthesis of findings.
What makes Avitai Knowledge particularly powerful is its ability to work with both public and private data. While it can mine knowledge from PubMed and other public sources, it can also process your internal documents, experimental data, and proprietary information. This creates a unified knowledge base that combines what's known publicly with your organization's unique insights.
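One way to picture a unified public and private knowledge base is as a set of facts where each fact remembers every source that asserts it. The sketch below is a minimal illustration under that assumption, not the platform's implementation: facts are plain (subject, relation, object) triples, and the source labels `pubmed` and `internal_eln` as well as `compound_X` are hypothetical placeholders.

```python
from collections import defaultdict

# A fact is a (subject, relation, object) triple.
Triple = tuple[str, str, str]


def merge_sources(sources: dict[str, list[Triple]]) -> dict[Triple, set[str]]:
    """Merge triples from several sources, keeping every source that asserts each fact."""
    merged: dict[Triple, set[str]] = defaultdict(set)
    for source_name, triples in sources.items():
        for triple in triples:
            merged[triple].add(source_name)  # record provenance per fact
    return dict(merged)


# Hypothetical example data: one public source, one internal source.
public = [("aspirin", "inhibits", "PTGS1"), ("aspirin", "inhibits", "PTGS2")]
internal = [("aspirin", "inhibits", "PTGS1"), ("compound_X", "binds", "EGFR")]
kb = merge_sources({"pubmed": public, "internal_eln": internal})
print(sorted(kb[("aspirin", "inhibits", "PTGS1")]))  # ['internal_eln', 'pubmed']
```

Keeping provenance as a set per fact lets a query distinguish findings confirmed both publicly and internally from findings that exist only in proprietary data.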
The platform is designed for both programmatic access and interactive exploration. Researchers can use natural language queries to find information, browse knowledge graphs visually, or integrate the platform's APIs into computational workflows. Whether you're conducting a literature review, validating a hypothesis, or exploring a new research area, Avitai Knowledge helps you find what you need faster.
Avitai Knowledge also powers the Research foundation model in our main platform, supplying the broad biological knowledge that AI-assisted scientific discovery requires. By open-sourcing the core technology, we aim to let the research community build their own knowledge extraction systems tailored to specific domains.
Key Features
Literature Mining
Extract structured knowledge from millions of scientific papers. Identify entities, relationships, and claims with state-of-the-art NLP models.
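To make "identify entities and relationships" concrete, here is a deliberately simple sketch that swaps trained NLP models for a hypothetical hand-written lexicon and treats entities mentioned in the same sentence as candidate relations (sentence-level co-occurrence). `LEXICON`, `extract_entities`, and `cooccurrence_relations` are illustrative names, not part of the Avitai Knowledge API.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicon mapping lowercase terms to entity types.
LEXICON = {
    "aspirin": "Compound",
    "gefitinib": "Compound",
    "egfr": "Gene",
    "ptgs2": "Gene",
}


def extract_entities(sentence: str) -> list[tuple[str, str]]:
    """Return (entity, type) pairs for lexicon terms found in the text."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]


def cooccurrence_relations(text: str) -> list[tuple[str, str]]:
    """Propose candidate relations between entities mentioned in the same sentence."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = sorted({name for name, _ in extract_entities(sentence)})
        relations.extend(combinations(entities, 2))
    return relations


abstract = "Gefitinib inhibits EGFR signaling. Aspirin reduces PTGS2 activity."
print(extract_entities(abstract))
print(cooccurrence_relations(abstract))  # [('egfr', 'gefitinib'), ('aspirin', 'ptgs2')]
```

Co-occurrence only yields candidates; a real extractor would also classify the relation type (for example "inhibits") and attach the supporting sentence as evidence.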
Multi-Modal Learning
Understand scientific content across text, tables, figures, and chemical structures. Unified representation of diverse data types.
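The description does not say how the unified representation is built; multi-modal models typically learn joint embeddings. The sketch below illustrates only the simpler schema-level idea, assuming that facts from a table and facts from text are mapped into one hypothetical `Assertion` record tagged with their modality, so a single query covers both.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One fact in a shared schema, tagged with the modality it came from."""
    subject: str
    relation: str
    obj: str
    modality: str  # e.g. "text", "table", "figure"


def assertions_from_table(csv_text: str, relation: str) -> list[Assertion]:
    """Turn a two-column table (subject, object) into Assertion records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    subj_col, obj_col = reader.fieldnames[:2]  # first two header names
    return [Assertion(row[subj_col], relation, row[obj_col], "table") for row in reader]


table = "compound,target\ngefitinib,EGFR\nerlotinib,EGFR\n"
facts = assertions_from_table(table, relation="inhibits")
facts.append(Assertion("aspirin", "inhibits", "PTGS2", "text"))  # e.g. from a sentence

egfr_inhibitors = sorted(a.subject for a in facts if a.obj == "EGFR")
print(egfr_inhibitors)  # ['erlotinib', 'gefitinib']
```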
Knowledge Graphs
Build and query biological knowledge graphs connecting genes, proteins, pathways, diseases, and compounds with rich metadata.
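A knowledge graph of this kind can be sketched as typed nodes plus edges that carry a relation name and metadata. The toy class below is a hypothetical stand-in written for illustration; it is not the `KnowledgeGraph` class from the Quick Start, and its `neighbors` filter only approximates what a query such as "proteins interacting with EGFR" would do.

```python
from collections import defaultdict
from typing import Optional


class MiniKnowledgeGraph:
    """Toy typed graph: nodes carry a type, edges carry a relation and free-form metadata."""

    def __init__(self) -> None:
        self.node_types: dict[str, str] = {}
        self.edges: dict[str, list[tuple[str, str, dict]]] = defaultdict(list)

    def add_node(self, name: str, node_type: str) -> None:
        self.node_types[name] = node_type

    def add_edge(self, src: str, relation: str, dst: str, **metadata) -> None:
        # Index the edge under both endpoints; this toy does not track direction.
        self.edges[src].append((relation, dst, metadata))
        self.edges[dst].append((relation, src, metadata))

    def neighbors(self, name: str, node_type: Optional[str] = None,
                  relation: Optional[str] = None) -> list[str]:
        """Neighbours of `name`, optionally filtered by node type and relation."""
        return sorted(
            other
            for rel, other, _meta in self.edges.get(name, [])
            if (node_type is None or self.node_types.get(other) == node_type)
            and (relation is None or rel == relation)
        )


graph = MiniKnowledgeGraph()
for name, node_type in [("EGFR", "Protein"), ("GRB2", "Protein"),
                        ("ERBB2", "Protein"), ("gefitinib", "Compound")]:
    graph.add_node(name, node_type)
graph.add_edge("EGFR", "interacts_with", "GRB2", source="example")
graph.add_edge("EGFR", "interacts_with", "ERBB2", source="example")
graph.add_edge("gefitinib", "inhibits", "EGFR", source="example")
print(graph.neighbors("EGFR", node_type="Protein"))  # ['ERBB2', 'GRB2']
```

A production graph would store edge direction and typed, validated metadata (evidence, source, confidence) rather than a free-form dictionary.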
Semantic Search
Find relevant information using natural language queries. Search across literature, databases, and internal experimental data.
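Semantic search usually ranks documents by vector similarity to the query. Learned embeddings capture meaning; the sketch below swaps in TF-IDF vectors with cosine similarity, a purely lexical stand-in, to show the ranking mechanics. The smoothed IDF formula and all function names are choices made for this example only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Build TF-IDF vectors for the documents and return them with the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the corpus are dropped: they cannot match any document.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q_vec, v), d) for v, d in zip(vectors, docs)]
    return sorted(scored, reverse=True)[:top_k]


docs = [
    "Aspirin irreversibly inhibits cyclooxygenase enzymes PTGS1 and PTGS2.",
    "Gefitinib is an EGFR tyrosine kinase inhibitor used in lung cancer.",
    "CRISPR screens identify essential genes in cancer cell lines.",
]
results = search("Which enzymes does aspirin inhibit?", docs)
print(results[0][1])  # the aspirin document ranks first
```

Note that "inhibit" in the query does not match "inhibits" in the document; closing gaps like this (and synonyms such as COX-2 for PTGS2) is exactly what embedding-based semantic search is meant to do.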
Document Understanding
Process complex scientific documents including PDFs, supplementary materials, and patents. Extract methods, results, and conclusions.
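Extracting methods, results, and conclusions starts with locating section boundaries. The sketch below assumes the PDF has already been converted to plain text and that section headings sit on their own lines, optionally numbered; the heading list and `split_sections` are hypothetical, and real papers with unusual headings would need a trained layout or section classifier instead.

```python
import re

# Heading variants recognised by this sketch (an assumption, not an exhaustive list).
SECTION_HEADING = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]*)?"
    r"(abstract|introduction|materials and methods|methods|results|discussion|conclusions?)"
    r"[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(text: str) -> dict[str, str]:
    """Split plain text into sections keyed by lowercase heading name."""
    matches = list(SECTION_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).lower()] = text[start:end].strip()
    return sections


paper = """Abstract
We test compound X.
1. Methods
Cells were treated for 24 h.
2. Results
Viability dropped.
Conclusion
Compound X is cytotoxic.
"""
sections = split_sections(paper)
print(sections["methods"])  # Cells were treated for 24 h.
```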
Database Integration
Connect to major biological databases (UniProt, PDB, ChEMBL, etc.) and integrate external knowledge into your workflows.
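How the platform connects to these databases is not described here. As a rough sketch, records can be pulled directly from the public services over HTTP; the URL patterns below reflect the UniProt REST API and the RCSB file server as commonly documented, but verify them against the official documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request


def uniprot_entry_url(accession: str) -> str:
    """URL of a UniProtKB entry in JSON format (UniProt REST API)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.json"


def pdb_file_url(pdb_id: str) -> str:
    """URL of a PDB-format coordinate file on the RCSB file server."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download and decode a JSON document (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(uniprot_entry_url("P00533"))  # P00533 is the UniProt accession for human EGFR
print(pdb_file_url("4hhb"))
```

With network access, `fetch_json(uniprot_entry_url("P00533"))` would return the EGFR entry as a dictionary that can then be mapped into the knowledge graph.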
Use Cases
Automated literature review for drug target identification
Extract experimental protocols from published papers
Build company-specific knowledge bases from internal documents
Find similar experiments or compounds across literature
Track research trends and emerging technologies in real time
Generate research hypotheses by connecting disparate findings
Question answering over scientific literature and databases
Prior art searches for patent applications
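Generating hypotheses by connecting disparate findings can be illustrated with Swanson's ABC model of literature-based discovery: if term A is linked to B, and B to C, but A and C are never linked directly, then A to C is a candidate hypothesis. The sketch below runs this "open discovery" search from a starting term over toy co-occurrence pairs loosely modeled on Swanson's classic fish oil and Raynaud's disease example; the data and `abc_hypotheses` are illustrative only, and this is not a description of how Avitai Knowledge generates hypotheses.

```python
from collections import defaultdict


def abc_hypotheses(pairs: list[tuple[str, str]], start: str) -> list[tuple[str, list[str]]]:
    """Find terms C linked to `start` (A) only through intermediate terms B."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)

    b_terms = neighbors[start]  # terms already linked directly to A
    candidates: dict[str, set[str]] = defaultdict(set)
    for b in b_terms:
        for c in neighbors[b]:
            # Keep C only if it is not A itself and not already directly linked to A.
            if c != start and c not in b_terms:
                candidates[c].add(b)

    # Rank by number of supporting intermediates, then alphabetically.
    return sorted(((c, sorted(bs)) for c, bs in candidates.items()),
                  key=lambda item: (-len(item[1]), item[0]))


pairs = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's disease"),
    ("platelet aggregation", "Raynaud's disease"),
    ("aspirin", "platelet aggregation"),
]
hypotheses = abc_hypotheses(pairs, "fish oil")
print(hypotheses)
# [("Raynaud's disease", ['blood viscosity', 'platelet aggregation']), ('aspirin', ['platelet aggregation'])]
```

The second candidate (aspirin) shows why raw ABC output needs filtering, for example by entity type (keeping only diseases when looking for new indications) or by evidence strength.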
Installation
# Install from PyPI
pip install avitai-knowledge
# With all NLP models (requires significant disk space; quotes keep shells such as zsh from expanding the brackets)
pip install "avitai-knowledge[full]"
# Or install from source
git clone https://github.com/avitai/avitai-knowledge.git
cd avitai-knowledge
pip install -e .
Quick Start
from avitai_knowledge import KnowledgeExtractor, KnowledgeGraph
from avitai_knowledge.search import SemanticSearch
# Extract knowledge from papers
extractor = KnowledgeExtractor()
knowledge = extractor.process_papers([
"path/to/paper1.pdf",
"path/to/paper2.pdf"
])
# Build a knowledge graph
kg = KnowledgeGraph()
kg.add_knowledge(knowledge)
# Semantic search
search = SemanticSearch(kg)
results = search.query(
"What are the protein targets of aspirin?"
)
# Or query the knowledge graph
proteins = kg.find_proteins_interacting_with("EGFR")