C library for handling Kindle (MOBI) formats of ebook documents

Libmobi

C library for handling Mobipocket/Kindle (MOBI) ebook format documents.

For examples of how to use the library, have a look at the tools folder.

Features:

  • reading and parsing:
    • some older text Palmdoc formats (pdb),
    • Mobipocket files (prc, mobi),
    • newer MOBI files including KF8 format (azw, azw3),
    • Replica Print files (azw4)
  • recreating source files using indices
  • reconstructing references (links and embedded) in html files
  • reconstructing source structure that can be fed back to kindlegen
  • reconstructing dictionary markup (orth, infl tags)
  • writing back loaded documents
  • metadata editing
  • handling encrypted documents

Todo:

  • improve writing
  • serialize rawml into raw records
  • process RESC records

Doxygen documentation:

Source:

Installation:

[for git] $ ./autogen.sh
$ ./configure
$ make
[optionally] $ make test
$ sudo make install

On macOS, you can install via Homebrew with brew install libmobi.

Xcode and MSVC++ project files are also provided as an option.

Usage

  • single include file: #include <mobi.h>
  • linker flag: -lmobi
  • basic usage:
    #include <stdio.h>
    #include <mobi.h>

    /* Placeholder return codes for this example */
    #define SUCCESS 0
    #define ERROR 1

    /* Example function: load and parse the document at fullpath */
    int load_document(const char *fullpath) {
      /* Initialize main MOBIData structure */
      /* Must be deallocated with mobi_free() when not needed */
      MOBIData *m = mobi_init();
      if (m == NULL) {
        return ERROR;
      }

      /* Open file for reading */
      FILE *file = fopen(fullpath, "rb");
      if (file == NULL) {
        mobi_free(m);
        return ERROR;
      }

      /* Load file into MOBIData structure */
      /* This structure will hold raw data/metadata from mobi document */
      MOBI_RET mobi_ret = mobi_load_file(m, file);
      fclose(file);
      if (mobi_ret != MOBI_SUCCESS) {
        mobi_free(m);
        return ERROR;
      }

      /* Initialize MOBIRawml structure */
      /* Must be deallocated with mobi_free_rawml() when not needed */
      /* In the next step this structure will be filled with parsed data */
      MOBIRawml *rawml = mobi_init_rawml(m);
      if (rawml == NULL) {
        mobi_free(m);
        return ERROR;
      }
      /* Raw data from MOBIData will be converted to html, css, fonts, media resources */
      /* Parsed data will be available in MOBIRawml structure */
      mobi_ret = mobi_parse_rawml(rawml, m);
      if (mobi_ret != MOBI_SUCCESS) {
        /* Free in reverse order of allocation */
        mobi_free_rawml(rawml);
        mobi_free(m);
        return ERROR;
      }

      /* Do something useful here */
      /* ... */
      /* For examples of how to access data in MOBIRawml structure see mobitool.c */

      /* Free MOBIRawml structure */
      mobi_free_rawml(rawml);

      /* Free MOBIData structure */
      mobi_free(m);

      return SUCCESS;
    }
  • for more usage examples, see the tools folder

Requirements

  • compiler supporting C99
  • zlib (optional, configure --with-zlib=no to use included miniz.c instead)
  • libxml2 (optional, configure --with-libxml2=no to use internal xmlwriter)
  • tested with gcc (>=4.2.4), clang (llvm >=3.4), Sun C (>=5.13), MSVC++ (2015)
  • builds on Linux, MacOS X, Windows (MSVC++, MinGW), Android, Solaris
  • tested architectures: x86, x86-64, arm, ppc
  • works cross-compiled on Kindle :)


Projects using libmobi

License:

  • LGPL, either version 3 or any later version

Credits:

  • The Huffman decompression and KF8 parsing algorithms were learned by studying the Python source code of KindleUnpack.
  • Thanks to all contributors to the MobileRead MOBI wiki.
Comments
  • convert mobi ebook to epub error

    I converted a MOBI file to EPUB successfully, but the resulting EPUB file is invalid: it can't be opened by iBooks or many Android EPUB readers. I checked the EPUB file with calibre-edit and got the error below:

    ERROR: Parsing failed: xmlParseEntityRef: no name, line 1, column 807    [OEBPS/part00000.html]
    INFO: File too large    [OEBPS/part00000.html]
    

    123_test.epub.zip

  • Can't get image from mobi

    Hello @bfabiszewski, I am using your other library, QLMobi, together with libmobi to parse HTML and images from MOBI books. Most books work great, but for some books I cannot get the media images. I have tried to fix it but could not find the cause. I hope you can help; I think this is the last problem for me. Both QLMobi and libmobi are great, nearly perfect libraries. Thank you very much for your great work! World of Warcraft - Dawn of the Aspects Part I.mobi.zip

    Also, I am the developer of Alook Browser - 2x Speed (https://itunes.apple.com/us/app/alook-web-browser-2x-speed/id1261944766?mt=8); if you are using iOS, here is a promotional code: JWYTH3FE4JJK. Forgive my poor English. Best regards.

  • Trying to get in touch regarding a security issue

    Hey there!

    I'd like to report a security issue but cannot find contact instructions on your repository.

    If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

    Thank you for your consideration, and I look forward to hearing from you!

    (cc @huntr-helper)

  • Fix multiple definition of buffer_init when linking with libmagic

    Hi, first of all, thank you for the work you put into this library.

    I'm using it in a project where it's statically linked with libmagic and I'm getting this error:

    third-party/libscan/third-party/ext_libmobi/src/libmobi/src/.libs//libmobi.a(libmobi_la-buffer.o): In function `buffer_init':
    /root/agent/work/f1ce04c709a195f3/third-party/libscan/third-party/ext_libmobi/src/libmobi/src/buffer.c:26: multiple definition of `buffer_init'
    /vcpkg/installed/x64-linux/lib/libmagic.a(buffer.o):buffer.c:(.text+0x0): first defined here
    collect2: error: ld returned 1 exit status
    

    buffer_init in libmagic: file.h

    I managed to work around this by renaming the function to mobi_buffer_init in my fork. I'd appreciate it if we could merge this upstream (if you really want to keep the name, that's fine, but the rename would make my life much easier).

    Thanks!

  • Mobi file can't parse

  • Export some useful api

    Hi Bartek, I'm currently writing dictionary software and want to add MOBI dictionary support to it. I'm glad to have found such a great MOBI library! This PR is basically what I found in ismail's mobdict project, so credit goes to him. It adds two useful functions to get the start and length of an entry. And since getting the error string is so useful, I also exported libmobi_msg. Let me know what you think of it. :)

  • README question: can libmobi also create new documents from scratch?

    The README lists a lot of features, but they're all apparently centered around reading or modifying an existing file.

    Can libmobi also create new ebooks from scratch (for use in EPUB->MOBI conversion software)? If so, another bullet point in the README clarifying that might be useful :slightly_smiling_face:

    Thanks for creating this cool library!

  • convert azw3 ebook to epub error

    printf("Could not initialize zip archive\n"); Here is the link to the file I tested. https://1drv.ms/u/s!AkaVccfysLmAhI5Odqj2pZ1QCMci6g?e=U9lkC3

  • add CMake support

    This PR adds very basic CMake support: it just compiles the source code into a static library. It uses the zlib and libxml2 libraries on Unix-like OSs, and the internal implementations otherwise. I've tested it on Linux and Windows (MinGW). Because I have no experience with autoconf tools, it's difficult for me to come up with a CMakeLists.txt fully equivalent to configure.ac right now. My plan is to have basic support now and improve it later.

  • I am confused about a function.

    I am puzzled by the function _buffer_get_varlen: why does it read 7 bits at a time? Its description says it "Stops when byte has bit 7 set", and I am also confused about this condition. Shouldn't it read 8 bits at each step?

  • AddressSanitizer: heap-buffer-overflow at buffer.c:212

    Using our fuzzer, we found several heap-buffer-overflow errors when libmobi is compiled with AddressSanitizer and run with the command mobitool -i7m $file. Someone else has also found a few others here.

    We will list them separately in the following issue threads; this is the first one.

    POC (proof-of-crash) files: https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A212_1.mobi https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A212_2.mobi

    gdb output: https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c:212_1.mobi.gdb.txt https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c:212_2.mobi.gdb.txt

  • an issue with the function implementation of mobi_buffer_get_varlen_internal in src/buffer.c

    When you create a MOBIBuffer object:

        typedef struct {
        size_t offset; /**< Current offset in respect to buffer start */
        size_t maxlen; /**< Length of the buffer data */
        unsigned char *data; /**< Pointer to buffer data */
        MOBI_RET error; /**< MOBI_SUCCESS = 0 if operation on buffer is successful, non-zero value on failure */
    } MOBIBuffer;
    

    the initial value of buf->offset is 0:

    MOBIBuffer * mobi_buffer_init_null(unsigned char *data, const size_t len) {
        MOBIBuffer *buf = malloc(sizeof(MOBIBuffer));
        if (buf == NULL) {
            debug_print("%s", "Buffer allocation failed\n");
            return NULL;
        }
        buf->data = data;
        buf->offset = 0;
        buf->maxlen = len;
        buf->error = MOBI_SUCCESS;
        return buf;
    }
    

    I think there is a problem when mobi_buffer_get_varlen_internal is called with direction -1 (reading the buffer backwards) and buf->offset equal to 3. According to its description, the function "Reads maximum 4 bytes from the buffer. Stops when byte has bit 7 set.", so it should read byte 3, byte 2, byte 1 and then byte 0. However, before byte 0 is read, the check at line 267, if (buf->offset < 1) {, finds that 0 is less than 1, so an error is printed and only the 3 bytes read so far are returned instead of all 4 (even though, according to the pull request, it should return 0).

    If it needs to read byte 0, it should read it and then return without decrementing buf->offset, because decrementing an offset of 0 causes an integer underflow that leaves the maximum size_t value in buf->offset. So I suggest checking whether buf->offset is 0 after the byte has been read into byte and val has been updated. If buf->offset is 0, we should check byte_count and, depending on its value, decide whether to execute

                    debug_print("%s", "End of buffer\n");
                    buf->error = MOBI_BUFFER_END;
                    return 0;
    

    or to set byte to stop_flag so that the loop stops reading and returns val, while keeping buf->offset at 0.
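The two comments above about _buffer_get_varlen and mobi_buffer_get_varlen_internal both come down to how MOBI variable-length integers are stored. Each byte carries 7 payload bits, and bit 7 (0x80) is reserved as the stop flag that marks the last byte of a value. That reserved bit is why the function does not consume 8 data bits per step. The following is a minimal standalone sketch of forward and backward decoding. It assumes the behaviour quoted in the issue (at most 4 bytes, stop when bit 7 is set) and applies the commenter's suggestion for offset 0. `Buffer`, `varlen_forward` and `varlen_backward` are hypothetical names used for illustration, not libmobi's API, and error reporting is reduced to a single int flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for MOBIBuffer (not libmobi's type) */
typedef struct {
    size_t offset;             /* current position in data */
    size_t maxlen;             /* number of bytes in data */
    const unsigned char *data; /* buffer contents */
    int error;                 /* 0 on success, 1 if the read failed */
} Buffer;

#define VARLEN_STOP 0x80u /* bit 7 set: this byte ends the value */
#define VARLEN_MASK 0x7fu /* bits 0-6: payload */
#define VARLEN_MAX 4      /* read at most 4 bytes per value */

/* Forward read: the first byte holds the most significant 7 bits.
   On success *len is the number of bytes consumed. */
uint32_t varlen_forward(Buffer *buf, size_t *len) {
    uint32_t val = 0;
    unsigned char byte = 0;
    size_t count = 0;
    assert(buf != NULL && len != NULL);
    *len = 0;
    do {
        if (buf->offset >= buf->maxlen) {
            buf->error = 1; /* ran out of data before the stop flag */
            return 0;
        }
        byte = buf->data[buf->offset++];
        val = (val << 7) | (byte & VARLEN_MASK);
        count++;
    } while (!(byte & VARLEN_STOP) && count < VARLEN_MAX);
    *len = count;
    return val;
}

/* Backward read: starts at buf->offset and walks toward index 0.
   The first byte read holds the least significant 7 bits.
   Index 0 is read like any other byte, and offset is never
   decremented below 0, so it cannot wrap around to SIZE_MAX. */
uint32_t varlen_backward(Buffer *buf, size_t *len) {
    uint32_t val = 0;
    unsigned shift = 0;
    size_t count = 0;
    assert(buf != NULL && len != NULL);
    *len = 0;
    if (buf->offset >= buf->maxlen) {
        buf->error = 1; /* start position is outside the buffer */
        return 0;
    }
    for (;;) {
        unsigned char byte = buf->data[buf->offset];
        val |= (uint32_t)(byte & VARLEN_MASK) << shift;
        shift += 7;
        count++;
        int done = (byte & VARLEN_STOP) || count == VARLEN_MAX;
        if (buf->offset == 0) {
            if (!done) {
                buf->error = 1; /* no bytes left and no stop flag seen */
                return 0;
            }
            break; /* keep offset at 0 instead of underflowing */
        }
        buf->offset--;
        if (done) {
            break;
        }
    }
    *len = count;
    return val;
}
```

For example, reading forward from {0x05, 0x83} yields (5 << 7) | 3 = 643 in 2 bytes. Reading backward from index 2 of {0x81, 0x02, 0x03} yields 3 + (2 << 7) + (1 << 14) = 16643, and that result depends on byte 0 being read. Keeping offset at 0 means a caller cannot tell from offset alone whether byte 0 was consumed, so this sketch reports the byte count through *len.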
