Vowpal Wabbit

This is the Vowpal Wabbit fast online learning code.

Why Vowpal Wabbit?

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. There is a specific focus on reinforcement learning, with several contextual bandit algorithms implemented, and VW's online nature lends itself well to the problem. Vowpal Wabbit is a destination for implementing and maturing state-of-the-art algorithms with performance in mind.

  • Input Format. The input format for the learning algorithm is substantially more flexible than might be expected. Examples can have features consisting of free form text, which is interpreted in a bag-of-words way. There can even be multiple sets of free form text in different namespaces.
  • Speed. The learning algorithm is fast -- similar to the few other online algorithm implementations out there. There are several optimization algorithms available with the baseline being sparse gradient descent (GD) on a loss function.
  • Scalability. This is not the same as fast. Instead, the important characteristic here is that the memory footprint of the program is bounded independent of data. This means the training set is not loaded into main memory before learning starts. In addition, the size of the set of features is bounded independent of the amount of training data using the hashing trick.
  • Feature Interaction. Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems. The alternative of explicitly expanding the features before feeding them into the learning algorithm can be both computation and space intensive, depending on how it's handled.
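The hashing trick, namespace crossing, and sparse gradient descent described above can be illustrated with a toy Python sketch. This is not VW's implementation, and all function names are hypothetical:

- `zlib.crc32` stands in for VW's own hash.
- The 18-bit table mirrors VW's default `-b 18`.
- The parser handles only a simplified `label |namespace feature[:value]` line, with no tag, importance weight, or namespace scaling.
- The update is plain squared-loss SGD, without VW's adaptive, normalized, or invariant updates.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

NUM_BITS = 18                       # assumed table size, mirroring VW's default -b 18
MASK = (1 << NUM_BITS) - 1

Namespaces = Dict[str, List[Tuple[str, float]]]

def parse_line(line: str) -> Tuple[float, Namespaces]:
    """Parse a simplified 'label |ns f1 f2:0.5 | f3' line (no tag, importance, or ns scaling)."""
    label_part, *ns_parts = line.split("|")
    namespaces: Namespaces = {}
    for part in ns_parts:
        if not part or part[0].isspace():   # '| f3' means the default namespace
            ns, tokens = " ", part.split()
        else:
            ns, *tokens = part.split()
        feats = namespaces.setdefault(ns, [])
        for tok in tokens:
            name, _, value = tok.partition(":")
            feats.append((name, float(value) if value else 1.0))  # bare token -> value 1
    return float(label_part), namespaces

def bucket(namespace: str, name: str) -> int:
    """Hash a feature into one of 2**NUM_BITS slots (crc32 is a stand-in for VW's hash)."""
    return zlib.crc32(f"{namespace}^{name}".encode()) & MASK

def to_sparse(namespaces: Namespaces,
              quadratic: Iterable[Tuple[str, str]] = ()) -> Dict[int, float]:
    """Hashed sparse vector; each quadratic pair adds the cross product of two namespaces."""
    x: Dict[int, float] = {}
    for ns, feats in namespaces.items():
        for name, value in feats:
            i = bucket(ns, name)
            x[i] = x.get(i, 0.0) + value     # colliding features simply share a weight
    for a, b in quadratic:
        for name_a, va in namespaces.get(a, []):
            for name_b, vb in namespaces.get(b, []):
                i = bucket(a + "*" + b, name_a + "*" + name_b)
                x[i] = x.get(i, 0.0) + va * vb
    return x

def predict(weights: List[float], x: Dict[int, float]) -> float:
    return sum(weights[i] * v for i, v in x.items())

def sgd_step(weights: List[float], x: Dict[int, float], label: float, lr: float = 0.1) -> float:
    """One squared-loss gradient step that touches only the example's nonzero features."""
    pred = predict(weights, x)
    g = pred - label                          # d/dpred of 0.5 * (pred - label) ** 2
    for i, v in x.items():
        weights[i] -= lr * g * v
    return pred

# Memory is fixed by NUM_BITS, however many distinct features the data contains.
weights = [0.0] * (1 << NUM_BITS)
data = ["1 |u alice |a sports", "0 |u alice |a politics",
        "0 |u bob |a sports", "1 |u bob |a politics"]
examples = [parse_line(line) for line in data]
for _ in range(100):                          # several passes over a tiny dataset
    for label, ns in examples:
        sgd_step(weights, to_sparse(ns, quadratic=[("u", "a")]), label)
```

The four labels follow an XOR pattern over user and topic. No model that is linear in the original features can fit that pattern. The `u*a` cross gives each combination its own hashed weight, so the sketch fits it. The weight table has 2**18 entries regardless of how many distinct features appear, which is the bounded-memory property described under Scalability.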

Visit the wiki to learn more.

Getting Started

For the most up-to-date instructions for getting started on Windows, MacOS or Linux, please see the wiki.

Comments
  • C# refactoring, memory leak fixes, general goodness,...

    • fixed runtime library mismatch between zlib, libvw, vw.exe, VowpalWabbitCore.dll (CLR),... by using the MSBuild targets provided by the zlib/boost NuGet packages
    • included Visual Leak Detector for memory leak detection on Windows
    • refactored the C# API to allow users to dynamically construct serializers based on alternate descriptions (not just on static annotations)
    • string marshalling is compatible with the command line (either escaping or splitting)
    • schema-based pre-hashing: if a hash can be determined from the schema, it is generated only once and reused for each example
    • added a type extension API for marshalling
    • allow users to generate native and string examples in parallel in both debug and release
    • keep the marshalling expression tree for debugging
    • refactored marshalling expression tree generation to improve readability
    • added a sweeping helper
    • improved C# label parsing extensibility
    • added assembly signing
    • fixed memory leaks in C# usage of VW
    • fixed model hashing/reload interaction
    • fixed handling of empty line examples within a set of action dependent features
    • fixed an order issue when predicting ADF examples containing empty action dependent features
    • fixed default namespace incompatibility (space vs. 0)
    • improved RunTests to C# test wrapping (detects inter-test dependencies and input files)
    • unit tests are run in the test/ folder, so there is no need to copy all input files
    • added user-supplied model id support
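The schema-based pre-hashing item amounts to memoization. When a feature's hash depends only on schema information, such as namespace and feature name, it can be computed once and reused for every example. Below is a minimal Python sketch. It assumes a hypothetical `schema_hash` built on `zlib.crc32` in place of VW's real hash. The actual change lives in the C# marshalling code, which is not reproduced here.

```python
import zlib
from functools import lru_cache
from typing import Dict, List, Tuple

hash_calls = 0   # counts how often the hash function actually runs

@lru_cache(maxsize=None)
def schema_hash(namespace: str, feature_name: str) -> int:
    """Depends only on schema data, so caching it is safe (crc32 is a stand-in hash)."""
    global hash_calls
    hash_calls += 1
    return zlib.crc32(f"{namespace}^{feature_name}".encode()) & ((1 << 18) - 1)

def featurize(example: Dict[str, float], namespace: str = "a") -> List[Tuple[int, float]]:
    """Turn {feature_name: value} into (hash, value) pairs, reusing cached hashes."""
    return [(schema_hash(namespace, name), float(value)) for name, value in example.items()]

examples = [{"age": 0.05, "price": 0.23}, {"age": 0.35, "price": 0.18}, {"age": 0.87, "price": 0.53}]
rows = [featurize(ex) for ex in examples]
print(hash_calls)   # 2: one hash per distinct (namespace, feature) pair, not per example
```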

  • VWRegressor provides very different performance for loss_function = 'quantile' , quantile_tau = 0.5 and loss_function = 'squared'

    Describe the bug

    VWRegressor provides very different performance for loss_function = 'quantile' , quantile_tau = 0.5 and loss_function = 'squared'

    loss_function = 'squared' provides a very GOOD (low) MAE.
    loss_function = 'quantile', quantile_tau = 0.5 provides a bad (high) MAE.

    The data is a mixture of categorical and continuous data, 600 rows (a sample of the rows was attached as an image).

    To Reproduce

    from vowpalwabbit.sklearn_vw import VWRegressor

    # passes, l1 and l2 are defined elsewhere in the reporter's script.
    use_quantile = True  # True: quantile run; False: squared-loss run
    if use_quantile:
        model = VWRegressor(convert_to_vw=False, normalized=True,
                            passes=passes,
                            power_t=0.5,  # 1.0,
                            readable_model='my_VW.model', cache_file='my_VW.cache',
                            learning_rate=2.3, l2=l2, l1=l1,
                            quadratic='CC', cubic='CCC',
                            loss_function='quantile', quantile_tau=0.5)
        q = 0
    else:
        model = VWRegressor(convert_to_vw=False, normalized=True,
                            passes=passes,
                            power_t=0.5,  # 1.0,
                            readable_model='my_VW.model', cache_file='my_VW.cache',
                            learning_rate=2.1, loss_function='squared', l2=l2, l1=l1,
                            quadratic='CC', cubic='CCC')
    

    Expected behavior

    My guess is that the MAE for loss_function = 'quantile', quantile_tau = 0.5 and for loss_function = 'squared' should be very similar.

    In addition, loss_function = 'quantile', quantile_tau = 0.9 and loss_function = 'quantile', quantile_tau = 0.1 give very wide confidence intervals, even nonsensical ones.

    Observed Behavior

    How did VW behave? Please include any stack trace, log messages or crash logs.

    Environment

    What version of VW did you use? The latest. OS: Windows 10.

    Additional context

    Do you have a code example using VWRegressor with loss_function = 'quantile', quantile_tau = 0.9 and loss_function = 'quantile', quantile_tau = 0.1?
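Some background that may help with this report:

- The quantile (pinball) loss with tau = 0.5 equals half the absolute error, so its minimizer is the median. Squared loss targets the mean. Since the median minimizes expected absolute error, the tau = 0.5 loss is in principle the better-matched objective for MAE.
- Its gradient has constant magnitude (tau or 1 - tau), while the squared-loss gradient grows with the error. A learning rate tuned for one loss therefore need not suit the other.

The Python sketch below only illustrates these properties on a hypothetical toy sample. It does not use VW, does not reproduce the reporter's data, and does not diagnose the reported behavior.

```python
from statistics import mean, median
from typing import Callable, List

def pinball_loss(y: float, pred: float, tau: float) -> float:
    """Quantile loss: tau * r when under-predicting (r >= 0), (tau - 1) * r otherwise."""
    r = y - pred
    return tau * r if r >= 0 else (tau - 1) * r

def pinball_grad(y: float, pred: float, tau: float) -> float:
    """Subgradient w.r.t. pred: magnitude is tau or 1 - tau, whatever the error size."""
    return -tau if y > pred else 1 - tau

def squared_grad(y: float, pred: float) -> float:
    """Gradient of 0.5 * (pred - y) ** 2 w.r.t. pred: grows with the error."""
    return pred - y

def best_constant(ys: List[float], loss: Callable[[float, float], float],
                  grid: List[float]) -> float:
    """Brute-force the constant prediction with the lowest total loss."""
    return min(grid, key=lambda c: sum(loss(y, c) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 100.0]          # toy sample with one outlier
grid = [i / 10 for i in range(1001)]      # candidate predictions 0.0 .. 100.0

q50 = best_constant(ys, lambda y, p: pinball_loss(y, p, 0.5), grid)
sq = best_constant(ys, lambda y, p: (y - p) ** 2, grid)
print(q50, median(ys))   # 3.0 3.0   (tau = 0.5 recovers the median)
print(sq, mean(ys))      # 22.0 22.0 (squared loss recovers the mean)
```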

  • JNI Layer throws Exceptions when close method is called in parallel ON DIFFERENT MODELS

    Problem

    In the JNI layer, when multiple passes are enabled (> 1) and an attempt is made to close separate models in parallel, exceptions can be thrown. This happens even though each model has its own lock guarding all access to the native code paths. Only a global lock around calls to the models' close methods seems to avoid this issue.

    I'm looking for help identifying any critical sections of the C code that can be guarded by a lock to avoid thread-safety issues. I'm not asking for the C code to lock. I just want help figuring out where to put the locks in the Java code that wraps the C code.

    Scope

    This seems to be present in PR #1291 but was not fixed by PR #1295.

    Discussion

    From empirical testing, it appears that one of these lines is the problem. I am wondering whether any of them use global state. I am trying to figure out if we can lock over only a short critical section to avoid thread-safety issues.

    1. adjust_used_index(*vwInstance);
    2. vwInstance->do_reset_source = true;
    3. VW::start_parser(*vwInstance);
    4. LEARNER::generic_driver(*vwInstance);
    5. VW::end_parser(*vwInstance);

    Previous Conversation

    In PR #1295 there was the following conversation:

    @JohnLangford

    There should be zero shared state between multiple created VW objects. Is that what it's doing? (Creating multiple distinct VW objects?)

    @deaktator

    @JohnLangford. It looks like just one VW object. Each Java call does the following on the C side:

    vw* vwInstance = VW::initialize(env->GetStringUTFChars(command, NULL));

    @JohnLangford

    A single VW object can not be operated on in multiple threads because the code inside VW is not thread safe. If you want to have a model which is shared by multiple threads, you set this up more explicitly by initializing a new VW object with an existing model.

    @deaktator

    Hey @JohnLangford. We take care of multi-threaded access to VW by locking anywhere that requires access to the C code. The thread-safety issues I encountered before were on an incomplete version of the code that locked in the wrong place. When I run the tests in parallel, they seem to work just fine now. I ran them a bunch of times with forking in the tests and didn't see any issues.

    Tracking Down What's Happening

    It appears @jon-morra-zefr pretty much copied the C# code for multiple passes, so this seems like it might apply to C# as well. Both C# and JNI C++ code appear below as well as the calling code that blows up.

    I've seen a bunch of different errors that occur at the same spot: invalidated cache, malformed LDF feature exceptions, etc.

    Example Code That Triggers Exceptions

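    import java.util.concurrent.locks.ReentrantLock
    import vowpalWabbit.learner._

    // One lock for the whole JVM, shared by every thread and every model.
    object GlobalVwLock { val lock = new ReentrantLock() }

    // Doing this many times in parallel with no global lock causes problems.
    // vwLearnString is the VW command line, defined elsewhere.
    val vwJNI = VWLearners.create[VWTypedLearner[_]](vwLearnString)
    // Learning in here using vwJNI.learn

    // PROBLEM AREA:
    GlobalVwLock.lock.lock()              // <== NEED GLOBAL LOCKING OR EXCEPTIONS THROWN
    try vwJNI.close()
    finally GlobalVwLock.lock.unlock()    // <== NEED GLOBAL LOCKING OR EXCEPTIONS THROWN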
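The workaround described in this issue combines per-model locks around native calls with one process-wide lock around `close()`. It can be sketched in Python with `threading.Lock`. `FakeModel` is a hypothetical stand-in for a JNI-backed learner. The sketch only shows where the two kinds of lock sit and makes no claim about which native call is unsafe.

```python
import threading
import time

CLOSE_LOCK = threading.Lock()      # one lock for the whole process, shared by all models

class FakeModel:
    """Hypothetical stand-in for a learner backed by native (JNI/C++) code."""
    running_closes = 0             # close() calls in progress, updated under CLOSE_LOCK
    max_running_closes = 0

    def __init__(self) -> None:
        self.lock = threading.Lock()   # per-model lock guarding every native call
        self.closed = False

    def learn(self, example: int) -> None:
        with self.lock:
            pass                       # the native learn call would go here

    def close(self) -> None:
        # Acquire the global lock first, then the model's own lock. learn() never
        # takes CLOSE_LOCK, so this fixed order cannot deadlock.
        with CLOSE_LOCK, self.lock:
            FakeModel.running_closes += 1
            FakeModel.max_running_closes = max(FakeModel.max_running_closes,
                                               FakeModel.running_closes)
            time.sleep(0.001)          # stands in for remaining passes and parser shutdown
            FakeModel.running_closes -= 1
            self.closed = True

def worker(models: list) -> None:
    model = FakeModel()
    for i in range(10):
        model.learn(i)
    model.close()
    models.append(model)

models: list = []
threads = [threading.Thread(target=worker, args=(models,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The cost of this design is that every model's shutdown is serialized, even though learning on different models can still run in parallel.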
    

    Similarity of the C# and JNI C++ Code

    C# Code: vowpal_wabbit/cs/cli/vowpalwabbit.cpp

    void VowpalWabbit::RunMultiPass()
    { if (m_vw->numpasses > 1)
      { try
        { adjust_used_index(*m_vw);
          m_vw->do_reset_source = true;
          VW::start_parser(*m_vw);
          LEARNER::generic_driver(*m_vw);
          VW::end_parser(*m_vw);
        }
        CATCHRETHROW
      }
    }
    

    JNI C++ Code: vowpal_wabbit/java/src/main/c++/vowpalWabbit_learner_VWLearners.cc

    JNIEXPORT void JNICALL Java_vowpalWabbit_learner_VWLearners_performRemainingPasses(JNIEnv *env, jclass obj, jlong vwPtr)
    { try
      { vw* vwInstance = (vw*)vwPtr;
        if (vwInstance->numpasses > 1)
          { adjust_used_index(*vwInstance);
            vwInstance->do_reset_source = true;
            VW::start_parser(*vwInstance);
            LEARNER::generic_driver(*vwInstance);
            VW::end_parser(*vwInstance);
          }
      }
      catch(...)
      { rethrow_cpp_exception_as_java_exception(env);
      }
    }
    

    Any thoughts?

  • Continuous actions

    This is the preliminary PR for continuous actions.

    This includes the cats_tree (continuous action tree with smoothing) algorithm, conversion from a PMF (discrete) to a PDF (continuous) distribution, sampling from a continuous PDF, etc.

    The code is for the paper available at https://arxiv.org/pdf/2006.06040.pdf

    We will add more details.
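The PMF-to-PDF conversion and continuous sampling mentioned above can be illustrated under simplifying assumptions:

- The action range [min_value, max_value] is split into equal-width bins.
- Each discrete action's probability is spread uniformly over its bin.
- No smoothing bandwidth is applied.

This is a hypothetical sketch of the idea, not the cats_tree implementation from this PR. The function names are illustrative.

```python
import random
from typing import List, Tuple

Segment = Tuple[float, float, float]    # (left, right, density)

def pmf_to_pdf(pmf: List[float], min_value: float, max_value: float) -> List[Segment]:
    """Spread each discrete action's probability uniformly over its equal-width bin."""
    if abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("pmf must sum to 1")
    width = (max_value - min_value) / len(pmf)
    return [(min_value + k * width, min_value + (k + 1) * width, p / width)
            for k, p in enumerate(pmf)]

def sample_pdf(pdf: List[Segment], rng: random.Random) -> float:
    """Choose a segment with probability density * width, then draw uniformly inside it."""
    u = rng.random()
    cumulative = 0.0
    for left, right, density in pdf:
        cumulative += density * (right - left)
        if u < cumulative:
            return rng.uniform(left, right)
    # Floating-point round-off can leave u just above the total mass.
    left, right, _ = next(seg for seg in reversed(pdf) if seg[2] > 0)
    return rng.uniform(left, right)

pdf = pmf_to_pdf([0.1, 0.6, 0.3], min_value=0.0, max_value=30.0)
rng = random.Random(0)
samples = [sample_pdf(pdf, rng) for _ in range(10_000)]
```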

  • Bug fixes

    • Fixed NRE on empty hashes
    • Skip model load/initialize when seeding from in-memory model
    • Fixed progressive validation in Azure trainer
    • Includes mixed JSON string and JSON direct support
    • Includes native C++ JSON parsing

  • Try coveralls

    Added 3 new make targets: vw_gcov, library_example_gcov, and test_gcov, which build vw and the examples with GCOV support, then run tests. This allows Coveralls to analyze test coverage of the source code, but slows the tests down significantly. I also edited the Travis .yml file to upload the results to coveralls.io and added the badge to the readme.

    Someone will need to set up a Coveralls account for the main VW project and point the readme badge at it. Currently the Coveralls badge points only to my fork.

  • Pandas to vw text format

    1. Overview

    The goal of this PR is to fix issue #2308.

    The PR introduces a new class DFtoVW in vowpalwabbit.pyvw that takes as input a pandas.DataFrame and special types (SimpleLabel, Feature, Namespace) that specify the desired VW conversion.

    These classes make extensive use of a class Col that refers to a given column in the user-specified dataframe.

    A simpler interface, DFtoVW.from_colnames, can also be used for simple use cases. Its main benefit is that the user does not need to use the specific types.


    Below are some examples of how to use this class. They all rely on the following pandas.DataFrame, called df:

      house_id  need_new_roof  price  sqft   age  year_built
    0      id1              0   0.23  0.25  0.05        2006
    1      id2              1   0.18  0.15  0.35        1976
    2      id3              0   0.53  0.32  0.87        1924
    

    2. Simple usage using DFtoVW.from_colnames

    Let's say we want to build a VW dataset with the target need_new_roof and the features age and year_built:

    from vowpalwabbit.pyvw import DFtoVW
    conv = DFtoVW.from_colnames(y="need_new_roof", x=["age", "year_built"], df=df)
    

    Then we can use the method process_df:

    conv.process_df()
    

    that outputs the following list:

    ['0 | 0.05 2006', '1 | 0.35 1976', '0 | 0.87 1924']
    

    This list can then directly be consumed by the method pyvw.model.learn.
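Conceptually, the simple interface maps each row to `label | value value ...`. The minimal pure-Python sketch below uses a hypothetical `rows_to_vw` helper over a list of dicts, so it runs without pandas, and it reproduces the output above. It is not the DFtoVW implementation. With pandas, `df.to_dict("records")` produces this list-of-dicts shape.

```python
from typing import Dict, List, Sequence

def rows_to_vw(rows: List[Dict[str, object]], y: str, x: Sequence[str]) -> List[str]:
    """Build 'label | v1 v2 ...' lines with all features in the default namespace."""
    return [f"{row[y]} | " + " ".join(str(row[col]) for col in x) for row in rows]

rows = [
    {"house_id": "id1", "need_new_roof": 0, "price": 0.23, "sqft": 0.25, "age": 0.05, "year_built": 2006},
    {"house_id": "id2", "need_new_roof": 1, "price": 0.18, "sqft": 0.15, "age": 0.35, "year_built": 1976},
    {"house_id": "id3", "need_new_roof": 0, "price": 0.53, "sqft": 0.32, "age": 0.87, "year_built": 1924},
]
print(rows_to_vw(rows, y="need_new_roof", x=["age", "year_built"]))
# ['0 | 0.05 2006', '1 | 0.35 1976', '0 | 0.87 1924']
```

A side note on the text format itself: VW reads a token without `:` as a feature name with an implicit value of 1. A bare `0.05` is therefore parsed as a feature named "0.05" rather than as a numeric value, whereas `age:0.05` would keep it numeric.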

    3. Advanced usages using default constructor

    The class DFtoVW also allows the following patterns in its default constructor:

    • tag
    • (named) namespaces, with scaling factor
    • (named) features, with constant feature possible

    To use these more complex patterns we need to import them using:

    from vowpalwabbit.pyvw import SimpleLabel, Namespace, Feature, Col
    

    3.1. Named namespace with scaling, and named feature

    Let's create a VW dataset that includes a named namespace (with scaling) and a named feature:

    conv = DFtoVW(
            df=df,
            label=SimpleLabel(Col("need_new_roof")),
            namespaces=Namespace(name="Imperial", value=0.092, features=Feature(value=Col("sqft"), name="sqm"))
            )
    conv.process_df()
    

    which yields:

    ['0 |Imperial:0.092 sqm:0.25',
     '1 |Imperial:0.092 sqm:0.15',
     '0 |Imperial:0.092 sqm:0.32']
    

    3.2. Multiple namespaces, multiple features, and tag

    Let's create a more complex example with a tag and multiple namespaces with multiple features.

    conv = DFtoVW(
            df=df, 
            label=SimpleLabel(Col("need_new_roof")),
            tag=Col("house_id"),
            namespaces=[
                    Namespace(name="Imperial", value=0.092, features=Feature(value=Col("sqft"), name="sqm")),
                    Namespace(name="DoubleIt", value=2, features=[Feature(value=Col("price")), Feature(Col("age"))])
                    ]
            )
    conv.process_df()
    

    which yields:

    ['0 id1|Imperial:0.092 sqm:0.25 |DoubleIt:2 0.23 0.05',
     '1 id2|Imperial:0.092 sqm:0.15 |DoubleIt:2 0.18 0.35',
     '0 id3|Imperial:0.092 sqm:0.32 |DoubleIt:2 0.53 0.87']
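The strings above follow VW's text format:

- an optional tag written directly before the first `|`;
- each namespace as `|Name:scale`, followed by its features;
- `name:value` for named features, and the bare value otherwise.

The small pure-Python sketch below uses hypothetical helpers `format_namespace` and `format_example` and reproduces the 3.2 output. It illustrates the format only and is not how DFtoVW is implemented.

```python
from typing import List, Optional, Sequence, Tuple

FeatureSpec = Tuple[Optional[str], float]    # (feature name or None if unnamed, value)

def format_namespace(name: str, scale: float, features: Sequence[FeatureSpec]) -> str:
    """Render '|Name:scale f1:v1 v2': named features as name:value, unnamed as the bare value."""
    feats = " ".join(f"{fname}:{value}" if fname else str(value) for fname, value in features)
    return f"|{name}:{scale} {feats}"

def format_example(label: object, namespaces: List[str], tag: Optional[str] = None) -> str:
    """Label, then the optional tag written directly against the first '|'."""
    return f"{label} {tag or ''}" + " ".join(namespaces)

houses = [("id1", 0, 0.23, 0.25, 0.05),       # (house_id, need_new_roof, price, sqft, age)
          ("id2", 1, 0.18, 0.15, 0.35),
          ("id3", 0, 0.53, 0.32, 0.87)]
lines = [
    format_example(
        roof,
        [format_namespace("Imperial", 0.092, [("sqm", sqft)]),
         format_namespace("DoubleIt", 2, [(None, price), (None, age)])],
        tag=house_id,
    )
    for house_id, roof, price, sqft, age in houses
]
for line in lines:
    print(line)
```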
    

    4. Implementation details

    • The class DFtoVW and the specific types are located in vowpalwabbit/pyvw.py. The class only depends on the pandas module.
    • The code includes docstrings
    • 8 tests are included in tests/test_pyvw.py

    5. Extensions

    • This PR does not yet handle multiline examples or more complex label types.
    • To convert a very large dataset that can't fit in RAM, one can use the pandas import option chunksize and process one chunk at a time. I could also implement this functionality directly in the class using a generator. The generator would then be consumed either by a VW learning interface or written to an external file (for conversion purposes only).

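The chunked approach suggested above amounts to a generator that converts one bounded chunk at a time. The minimal sketch below assumes a hypothetical `convert_chunk` callback. With pandas, the chunks could instead be the DataFrames yielded by `pd.read_csv(path, chunksize=N)`, each passed through `DFtoVW.from_colnames(y=..., x=..., df=chunk).process_df()`.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

Row = TypeVar("Row")

def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows, never holding more than one chunk in memory."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def iter_vw_lines(rows: Iterable[Row], convert_chunk: Callable[[List[Row]], List[str]],
                  chunk_size: int = 10_000) -> Iterator[str]:
    """Convert chunk by chunk and stream the resulting VW lines."""
    for chunk in chunked(rows, chunk_size):
        yield from convert_chunk(chunk)

def to_lines(chunk: List[int]) -> List[str]:
    """Hypothetical converter standing in for DFtoVW on one chunk."""
    return [f"{r % 2} | x:{r}" for r in chunk]

sizes = [len(c) for c in chunked(range(25), 10)]             # [10, 10, 5]
first = next(iter_vw_lines(range(10**9), to_lines, 1000))    # only the first chunk is built
```

Because the generator is lazy, pulling the first line from a billion-row source converts only the first 1000 rows.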
  • Tests don't pass on Mac OS 10.10

    Mac OS 10.10 and Boost 1.58.0

    gcc --version
    Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
    Target: x86_64-apple-darwin14.0.0
    Thread model: posix

    I get some warnings, and test 16 doesn't pass.

    Here is the full log:

    make
    cd vowpalwabbit; /Library/Developer/CommandLineTools/usr/bin/make -j 4 things
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c main.cc -o main.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c global_data.cc -o global_data.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_regressor.cc -o parse_regressor.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_primitives.cc -o parse_primitives.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c unique_sort.cc -o unique_sort.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cache.cc -o cache.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c rand48.cc -o rand48.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c simple_label.cc -o simple_label.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multiclass.cc -o multiclass.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c oaa.cc -o oaa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multilabel_oaa.cc -o multilabel_oaa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c boosting.cc -o boosting.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c ect.cc -o ect.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c autolink.cc -o autolink.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c binary.cc -o binary.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lrq.cc -o lrq.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cost_sensitive.cc -o cost_sensitive.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multilabel.cc -o multilabel.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c label_dictionary.cc -o label_dictionary.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c csoaa.cc -o csoaa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb.cc -o cb.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb_adf.cc -o cb_adf.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb_algs.cc -o cb_algs.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search.cc -o search.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_meta.cc -o search_meta.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_sequencetask.cc -o search_sequencetask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_dep_parser.cc -o search_dep_parser.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_hooktask.cc -o search_hooktask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_multiclasstask.cc -o search_multiclasstask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_entityrelationtask.cc -o search_entityrelationtask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_graph.cc -o search_graph.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_example.cc -o parse_example.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c scorer.cc -o scorer.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c network.cc -o network.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_args.cc -o parse_args.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c accumulate.cc -o accumulate.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c gd.cc -o gd.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c learner.cc -o learner.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lda_core.cc -o lda_core.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c gd_mf.cc -o gd_mf.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c mf.cc -o mf.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c bfgs.cc -o bfgs.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c noop.cc -o noop.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c print.cc -o print.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c example.cc -o example.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parser.cc -o parser.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c loss_functions.cc -o loss_functions.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c sender.cc -o sender.o
    parser.cc:452:26: warning: 'daemon' is deprecated: first deprecated in OS X 10.5
          [-Wdeprecated-declarations]
          if (!all.active && daemon(1,1))
                             ^
    /usr/include/stdlib.h:267:6: note: 'daemon' has been explicitly marked
          deprecated here
    int      daemon(int, int) __DARWIN_1050(daemon) __OSX_AVAILABLE_BUT_DEPR...
             ^
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c nn.cc -o nn.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c bs.cc -o bs.o
    1 warning generated.
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cbify.cc -o cbify.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c topk.cc -o topk.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c stagewise_poly.cc -o stagewise_poly.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c log_multi.cc -o log_multi.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c active.cc -o active.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c kernel_svm.cc -o kernel_svm.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c best_constant.cc -o best_constant.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c ftrl.cc -o ftrl.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c svrg.cc -o svrg.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lrqfa.cc -o lrqfa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c interact.cc -o interact.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c comp_io.cc -o comp_io.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c interactions.cc -o interactions.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c vw_exception.cc -o vw_exception.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c allreduce.cc -o allreduce.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o active_interactor active_interactor.cc
    ar rcs liballreduce.a allreduce.o
    ar rcs libvw.a hash.o global_data.o io_buf.o parse_regressor.o parse_primitives.o unique_sort.o cache.o rand48.o simple_label.o multiclass.o oaa.o multilabel_oaa.o boosting.o ect.o autolink.o binary.o lrq.o cost_sensitive.o multilabel.o label_dictionary.o csoaa.o cb.o cb_adf.o cb_algs.o search.o search_meta.o search_sequencetask.o search_dep_parser.o search_hooktask.o search_multiclasstask.o search_entityrelationtask.o search_graph.o parse_example.o scorer.o network.o parse_args.o accumulate.o gd.o learner.o lda_core.o gd_mf.o mf.o bfgs.o noop.o print.o example.o parser.o loss_functions.o sender.o nn.o bs.o cbify.o topk.o stagewise_poly.o log_multi.o active.o kernel_svm.o best_constant.o ftrl.o svrg.o lrqfa.o interact.o comp_io.o interactions.o vw_exception.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o vw main.o -L. -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    cd cluster; /Library/Developer/CommandLineTools/usr/bin/make
    /usr/bin/clang++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c spanning_tree.cc -o spanning_tree.o
    spanning_tree.cc:161:9: warning: 'daemon' is deprecated: first deprecated in OS
          X 10.5 [-Wdeprecated-declarations]
        if (daemon(1,1))
            ^
    /usr/include/stdlib.h:267:6: note: 'daemon' has been explicitly marked
          deprecated here
    int      daemon(int, int) __DARWIN_1050(daemon) __OSX_AVAILABLE_BUT_DEPR...
             ^
    1 warning generated.
    /usr/bin/clang++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o spanning_tree spanning_tree.o 
    cd library; /Library/Developer/CommandLineTools/usr/bin/make things
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o ezexample_predict ezexample_predict.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o ezexample_train ezexample_train.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o library_example library_example.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o recommend recommend.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o gd_mf_weights gd_mf_weights.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o test_search test_search.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    
    make test
    cd vowpalwabbit; /Library/Developer/CommandLineTools/usr/bin/make -j 4 things
    make[1]: Nothing to be done for `things'.
    cd library; /Library/Developer/CommandLineTools/usr/bin/make things
    make[1]: Nothing to be done for `things'.
    vw running test-suite...
    (cd test && ./RunTests -d -fe -E 0.001 ../vowpalwabbit/vw ../vowpalwabbit/vw)
    Testing on: hostname=air-mac OS=darwin
    Testing vw: ../vowpalwabbit/vw
    Testing lda: ../vowpalwabbit/vw
    RunTests: '-D' to see any diff output
    RunTests: '-o' to force overwrite references
    RunTests: test 1: stderr OK
    RunTests: test 2: stderr OK
    RunTests: test 2: predict OK
    RunTests: test 3: stderr OK
    RunTests: test 4: stdout OK
    RunTests: test 4: stderr OK
    RunTests: test 5: stderr OK
    RunTests: test 6: stderr OK
    RunTests: test 6: minor (<0.001) precision differences ignored
    RunTests: test 6: predict OK
    RunTests: test 7: stderr OK
    RunTests: test 8: stderr OK
    RunTests: test 8: minor (<0.001) precision differences ignored
    RunTests: test 8: predict OK
    RunTests: test 9: stderr OK
    RunTests: test 9: predict OK
    RunTests: test 10: stderr OK
    RunTests: test 10: predict OK
    RunTests: test 11: stderr OK
    RunTests: test 12: stderr OK
    RunTests: test 13: stderr OK
    RunTests: test 14: stdout OK
    RunTests: test 14: minor (<0.001) precision differences ignored
    RunTests: test 14: stderr OK
    RunTests: test 15: stdout OK
    RunTests: test 15: stderr OK
    RunTests: test 16: stdout OK
    --- diff -u --minimal train-sets/ref/rcv1_small.stderr stderr.tmp
    --- train-sets/ref/rcv1_small.stderr    2015-08-13 00:22:20.000000000 +0300
    +++ stderr.tmp  2015-08-13 00:33:33.000000000 +0300
    @@ -17,7 +17,7 @@
      5 0.47879     0.00006     0.00617      0.595892   0.183063                            0.47184     1.00000   
      6 0.47750     0.00000     0.00221      0.703360   0.403715                            0.68626     1.00000   
      7 0.47680     0.00000     0.00038      0.588395   0.175459                            0.08911     1.00000   
    - 8 0.47671     0.00000     0.00002      0.568445   0.136827                            0.00444     1.00000   
    + 8 0.47671     0.00000     0.00002      0.568443   0.136827                            0.00444     1.00000   
    
     finished run
     number of examples = 8000
    RunTests: test 16: FAILED: ref(train-sets/ref/rcv1_small.stderr) != stderr(stderr.tmp)
        cmd: ../vowpalwabbit/vw -k -c -d train-sets/rcv1_small.dat --loss_function=logistic -b 20 --bfgs --mem 7 --passes 20 --termination 0.001 --l2 1.0 --holdout_off
    
    
  • Trying to upgrade from vw-jni-8.2.0 to something close to vw-jni-8.4.1-SNAPSHOT

    Using VW 8.4.0 installed with brew, I created a simple test set, initializing VW with:

    $ vw --csoaa 10  -b 24  --l2 0.0  -l 0.1  -c -k --passes 100  -f /Users/pat/big-data/harness/models/test_resource  --save_resume
    

    Then I paste examples in:

    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    save_
    

    At the save_ line, the file /Users/pat/big-data/harness/models/test_resource is updated, so all is well.

    Using the last available JNI binary wrapper (8.2.0), doing the same thing from Java does not update the model file. I'm not running in --quiet mode, and VW does not report any problem.

    The save_ pseudo-example apparently does work in 8.2.0, since another user relies on it in CLI and daemon mode.

    Is this feature not supported with JNI?

    On the advice of @arielf, it appears I need something like vw-jni-8.4.1-SNAPSHOT, so I am trying to build it for my dev machine (MBP) and my deploy machine (Ubuntu), starting with the dev machine.

  • Python wrapper installation fails

    I'm not able to install the Python wrapper with pip install vowpalwabbit. I don't know enough to understand why it's failing, but I thought it was worth bringing to someone's attention.

    I'm on OS X, using an Anaconda environment, and I installed Vowpal Wabbit from Homebrew.

    Here's my traceback:

    Collecting vowpalwabbit
      Using cached vowpalwabbit-8.2.0.tar.gz
    Building wheels for collected packages: vowpalwabbit
      Running setup.py bdist_wheel for vowpalwabbit ... error
      Complete output from command /Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/tmp5FosrOpip-wheel- --python-tag cp27:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.5-x86_64-2.7
      creating build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      copying vowpalwabbit/__init__.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      copying vowpalwabbit/pyvw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      copying vowpalwabbit/sklearn_vw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      running egg_info
      writing vowpalwabbit.egg-info/PKG-INFO
      writing top-level names to vowpalwabbit.egg-info/top_level.txt
      writing dependency_links to vowpalwabbit.egg-info/dependency_links.txt
      warning: manifest_maker: standard file '-c' not found
    
      reading manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*.o' found anywhere in distribution
      warning: no previously-included files matching '*.exe' found anywhere in distribution
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
      running build_ext
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 184, in <module>
          tests_require=['tox'],
    
    [...]
    
        File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 38, in find_boost
          raise Exception('Could not find boost python library')
      Exception: Could not find boost python library
    
      ----------------------------------------
      Failed building wheel for vowpalwabbit
      Running setup.py clean for vowpalwabbit
    Failed to build vowpalwabbit
    Installing collected packages: vowpalwabbit
      Running setup.py install for vowpalwabbit ... error
        Complete output from command /Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-4GRRiq-record/install-record.txt --single-version-externally-managed --compile:
        running install
        running build
        running build_py
        creating build
        creating build/lib.macosx-10.5-x86_64-2.7
        creating build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        copying vowpalwabbit/__init__.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        copying vowpalwabbit/pyvw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        copying vowpalwabbit/sklearn_vw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        running egg_info
        creating vowpalwabbit.egg-info
        writing vowpalwabbit.egg-info/PKG-INFO
        writing top-level names to vowpalwabbit.egg-info/top_level.txt
        writing dependency_links to vowpalwabbit.egg-info/dependency_links.txt
        writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
        warning: manifest_maker: standard file '-c' not found
    
        reading manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no files found matching '*' under directory 'src'
        warning: no previously-included files matching '*.o' found anywhere in distribution
        warning: no previously-included files matching '*.exe' found anywhere in distribution
        warning: no previously-included files matching '*.pyc' found anywhere in distribution
        writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
        running build_ext
        make: *** No rule to make target `clean'.  Stop.
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 184, in <module>
            tests_require=['tox'],
    
    [...]
    
        subprocess.CalledProcessError: Command '['make', 'clean']' returned non-zero exit status 2
    
        ----------------------------------------
    Command "/Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-4GRRiq-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/
    
  • Accept multiline examples in the JNI interface

    This is a pretty major refactoring of the JNI layer. The impetus was the ability to accept multiline examples, but it led to a much larger change. The biggest change is decoupling the return type from the prediction function, which proved necessary to support all the different ways of extracting data with --cb_explore.

    I am going to go over this in detail with @deaktator offline, so @JohnLangford, let's hold off on merging this for now. If anyone else has comments at this time, they are certainly welcome.

  • build: Experiment with using nix to manage dev tooling starting with clang-tidy

    Nix allows us to have a consistent developer environment. Perhaps the most useful consequence is that commands run locally use exactly the same tool versions as CI.

    Documentation:

    All args are passed through to the underlying run-clang-tidy call.

    To run clang-tidy on all files, as is done in CI, run:

    nix develop -c vw-clang-tidy
    

    To just check specific files you can pass them as args:

    nix develop -c vw-clang-tidy vowpalwabbit/core/src/kskip_ngram_transformer.cc
    

    To apply fixes, use -fix:

    nix develop -c vw-clang-tidy vowpalwabbit/core/src/kskip_ngram_transformer.cc -fix
    

    Using from Windows:

    • Using WSL: the commands are identical to those above
    • Using Docker:
      docker run -v %cd%:/src -it nixos/nix nix --extra-experimental-features nix-command --extra-experimental-features flakes develop /src -c vw-clang-tidy
      
  • perf: arm64 performance optimizations

    Recently at @liftoff, we have been using arm64 machines to train our models with Vowpal Wabbit, which has helped us reduce the cost of our ML infrastructure.

    To get the most out of Vowpal Wabbit on these machines, we added arm64 compiler optimizations and ported the SIMD instructions used in Vowpal Wabbit via sse2neon. To do so, we followed the AWS guide on optimizing builds for arm64 machines. There are probably more optimizations to apply, but they require deeper knowledge of the training algorithm.

    These optimizations improved our ML pipeline time by roughly 20%. The pipeline contains steps other than training with Vowpal Wabbit, so more experiments would be needed to measure the improvement in Vowpal Wabbit itself.

    The proper solution would be to fill in the placeholder in lda_core.cc with the corresponding instructions, but that would require a deeper understanding of the code.

  • How To interpret VW model file

    Describe the bug

    I am trying to use Vowpal Wabbit in CATS mode, and my use case involves online learning. These are the steps involved:

    1. Spawn a new model using the following args:
     vw = vowpalwabbit.Workspace(
                "--cats "
                + str(self.action)
                + "  --bandwidth "
                + str(self.bandwidth)
                + " --min_value {} --max_value {} --json --chain_hash --coin --epsilon {}".format(self.min_val,
                                                                                                  self.max_val,
                                                                                                  self.epsilon))
    
    2. To save the model, I use vw.save("model.vw").

    3. This saved model is used to serve predictions and is then reloaded in the backend with vw = vowpalwabbit.Workspace("-i {} --json -q :: --quiet".format(filename)), after which it is trained on the latest data.

    4. Go to step 2.

    To train on each example, I use the following method to convert it to the CATS JSON format, as specified here:

    import json

    def to_vw_example_json_format(context, cat_features, num_features, cats_label=None):
        example_dict = {}
        if cats_label is not None:
            bid, cost, pdf_value = cats_label
            example_dict["_label_ca"] = {
                "action": bid,
                "cost": cost,
                "pdf_value": pdf_value,
            }
        example_dict["c"] = {}
        for feature in cat_features:
            example_dict["c"]["{feature}={feature_value}".format(feature=feature, feature_value=str(context[feature]))] = 1
        for feature in num_features:
            example_dict["c"]["{feature}:{feature_value}".format(feature=feature, feature_value=float(context[feature]))] = 1
        return json.dumps(example_dict)
    

    And finally, this to train:

     txt_ex = to_vw_example_json_format(context, self.config.cat_features, self.config.num_features, (action, cost, pdf_value))
     vw_format = vw.parse(txt_ex, vowpalwabbit.LabelType.CONTINUOUS)
     vw.learn(vw_format)
     vw.finish_example(vw_format)
    

    And this to predict:

    vw_text_example = to_vw_example_json_format(context, self.config.cat_features, self.config.num_features)
    return vw.predict(vw_text_example)
    

    I also saved a human-readable version of the model using vw1 = vowpalwabbit.Workspace("-i model.vw --invert_hash model.human").

    Here are the first few lines of model.human:

    Version 9.1.0
    Id 
    Min label:-1
    Max label:1
    bits:18
    lda:0
    0 ngram:
    0 skip:
    options: --bandwidth 0.001953125 --binary --cats 512 --cats_pdf 512 --cats_tree 512 --cb_explore_pdf --chain_hash --coin --epsilon 0.200000002980232 --get_pmf --max_value 1 --min_value 0 --pmf_to_pdf 512 --quadratic :: --sample_pdf --tree_bandwidth 1 --random_seed 16723685891146756571
    Checksum: 1316569494
    :1
    initial_t 0
    norm normalizer 1.29372e+08
    t 126717
    sum_loss 0
    sum_loss_since_last_dump 0
    dump_interval 1
    min_label -1
    max_label 1
    weighted_labeled_examples 0
    weighted_labels 0
    weighted_unlabeled_examples 0
    example_number 0
    total_features 0
    total_weight 1.29461e+07
    sd::oec.weighted_labeled_examples 0
    current_pass 0
    l1_state 0
    l2_state 1
    14:0.00977461 40.954 40.954 1 2 20.477
    29:-0.00902544 -44.3535 44.3535 1 2 22.1767
    60:0.00896538 44.6506 44.6506 1 2 22.3253
    63:0.0315597 0.090195 0.090195 1 0.00992318 1
    82:0.00196092 204.145 204.145 1 2 102.073
    93:0 101.693 101.693 1 0 101.693
    100:-0.0129796 -30.84 30.84 1 2 15.42
    119:-0.0832061 -4.81091 4.81091 1 2 2.40546
    120:-0.0130022 -30.7865 30.7865 1 2 15.3932
    121:-0.0183661 -91.893 91.893 1 6.71185 30.8929
    124:-0.00315753 -310.143 310.143 1 7.66512 133.944
    126:-0.0312913 -12.7924 12.7924 1 2 6.39622
    127:0.00679762 -0.059022 2.3972 1 0.210799 1
    139:-0.0329065 -0.0895731 0.0895731 1 0 1
    152:-0.140299 -0.500891 0.500891 1 0.200642 1
    162:-0.00195466 -204.8 204.8 1 2 102.4
    164:-0.00195466 -204.8 204.8 1 2 102.4
    

    I wanted to understand the following things:

    1. How to interpret the model.human file.
    2. The exact role of the --chain_hash argument, which is not very clear in the VW documentation.
    3. Given a context and the model.human file, how can I derive the VW prediction?
    4. With each training iteration, the model's output distribution for the same context changes (which is expected). However, after training, I do not see variance in the model's predictions across different contexts. Can someone please explain why this is happening?

    Thanks

    How to reproduce

    Provided in description itself.

    Version

    9.1.0

    OS

    Linux

    Language

    Python

    Additional context

    No response

  • VW calls predict twice during learn for cb_explore_adf

    Describe the bug

    If you call vw with --cb_explore_adf on an example that does not have a label, predict is called twice. This seems suboptimal and unintuitive for reductions above cb_explore_adf, and it can cause bugs.

    For example, when called with LAS, learn paths do not get full predictions if a label is missing and -p <file> is specified (see #4273).

    If a label is missing during learn what should the reduction contract be? Should learn be called anyway? Should prediction be called? Should we warn? Should we throw?

    How to reproduce

    Input, modified from the cb_test.ldf file:

    shared | s_1 s_2
     | a_1 b_1 c_1
     | a_2 b_2 c_2
     | a_3 b_3 c_3
    

    Run with --cb_explore_adf and add a print statement or breakpoint to see that predict gets called twice from cb_explore_adf_common.h.

    Version

    9.5.0

    OS

    Linux

    Language

    CLI

    Additional context

    No response
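
The JNI upgrade report above pastes csoaa examples in VW's cost-sensitive text format. Each line holds space-separated `label:cost` pairs, then a `|` separator, then feature names. A bare `save_` line follows, and per that report it makes the CLI rewrite the `-f` model file. The sketch below generates that input from Python. `format_csoaa_line` and `build_session` are hypothetical helpers written for illustration, not part of any VW wrapper. They only reproduce the line layout shown in the report.

```python
def format_csoaa_line(costs, features):
    """Render one csoaa example: "label:cost ... | feature feature ...".

    Hypothetical helper: `costs` maps class label -> cost, and `features`
    lists feature names for the default (unnamed) namespace.
    """
    label_part = " ".join(f"{label}:{float(cost)}" for label, cost in costs.items())
    return f"{label_part} | {' '.join(features)}"


def build_session(rows, save_tag="save_"):
    """Join example lines and end with the save_ pseudo-example.

    `rows` is a list of (costs, features) pairs. Per the report above, the
    CLI rewrites the -f model file when it reads the save_ line.
    """
    lines = [format_csoaa_line(costs, features) for costs, features in rows]
    lines.append(save_tag)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same eight examples as in the report, followed by save_.
    rows = [({0: 0.0, 1: 1.0}, ["user_user_2", "testGroupId_1"])] * 4
    rows += [({0: 1.0, 1: 0.0}, ["user_user_1", "testGroupId_1"])] * 4
    print(build_session(rows), end="")
```

Piping this script's output into the `vw --csoaa 10 ...` command from the report should mirror the interactive session, assuming the same flags are used.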
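
The CATS report above encodes numeric features as JSON keys of the form `name:value`, each with value 1. This sketch assumes that VW's JSON parser treats every key as a literal feature name and reads a numeric JSON value as that feature's value; confirm this against the VW JSON input documentation. Under that assumption, those keys would behave like one indicator feature per distinct number rather than one scaled feature. The minimal variant of the reporter's helper below puts the number in the JSON value instead. `to_vw_cats_json` and the sample context are hypothetical, and the `_label_ca` layout is copied from the report.

```python
import json


def to_vw_cats_json(context, cat_features, num_features, cats_label=None):
    """Build a CATS example as a JSON string.

    Hypothetical variant of the reporter's helper, not a VW API.
    Categorical features keep the "name=value": 1 encoding from the report.
    Numeric features are written as "name": number, on the assumption that
    VW's JSON parser reads a numeric JSON value as the feature's value.
    """
    example = {}
    if cats_label is not None:
        action, cost, pdf_value = cats_label
        # Label layout copied from the report's "_label_ca" object.
        example["_label_ca"] = {"action": action, "cost": cost, "pdf_value": pdf_value}
    features = {}
    for name in cat_features:
        features[f"{name}={context[name]}"] = 1  # one indicator per category value
    for name in num_features:
        features[name] = float(context[name])  # magnitude carried in the JSON value
    example["c"] = features
    return json.dumps(example)


if __name__ == "__main__":
    ctx = {"city": "paris", "age": 31}  # hypothetical context
    print(to_vw_cats_json(ctx, ["city"], ["age"], (0.4, 1.0, 0.25)))
    # prints {"_label_ca": {"action": 0.4, "cost": 1.0, "pdf_value": 0.25}, "c": {"city=paris": 1, "age": 31.0}}
```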
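
For the first question in the CATS report, a small parser can at least separate the weight lines of the model.human dump from its header. The sketch assumes the index-only layout shown there, where each weight line is `hash_index:value` followed by extra columns. It keeps the first number as the weight and does not interpret the remaining columns, which are presumably per-weight state kept for `--coin`. `parse_readable_weights` and `top_weights` are hypothetical helpers. The options line lists several stacked reductions (`--cats_tree`, `--pmf_to_pdf`, `--sample_pdf`), so a CATS prediction is likely not a single dot product over these weights, and this sketch does not try to reproduce one.

```python
def parse_readable_weights(text):
    """Collect weight lines from a VW readable/invert_hash model dump.

    Hypothetical helper. Assumes the index-only layout
    "hash_index:weight extra extra ...". Header lines such as "bits:18",
    "0 ngram:" or "initial_t 0" are skipped because the text before the
    first ':' in their first token is not a plain integer.
    Returns {hash_index: [weight, *extra_columns]}.
    """
    weights = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        head, sep, first_value = tokens[0].partition(":")
        if not sep or not head.isdigit() or not first_value:
            continue  # header or metadata line, not a weight
        weights[int(head)] = [float(first_value)] + [float(t) for t in tokens[1:]]
    return weights


def top_weights(weights, k=3):
    """Return the k hash indices whose first column has the largest magnitude."""
    return sorted(weights, key=lambda i: abs(weights[i][0]), reverse=True)[:k]


if __name__ == "__main__":
    sample = """bits:18
0 ngram:
:1
initial_t 0
14:0.00977461 40.954 40.954 1 2 20.477
93:0 101.693 101.693 1 0 101.693
152:-0.140299 -0.500891 0.500891 1 0.200642 1
"""
    weights = parse_readable_weights(sample)
    print(top_weights(weights, k=2))  # prints [152, 14]
```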
