My Cryptocurrency Hackathon Strategy
One fine afternoon, I was scrolling through LinkedIn and stumbled upon a post about a Cryptocurrency Hackathon. I quickly checked the problem statement and the event location. The problem statement was a perfect match for me, as it involved developing a fintech ML solution, and the event was happening just 400m from my home. I signed up without thinking twice.
This was going to be my second hackathon. The first one had left a sour taste. We were asked to use the organiser's API to collect data and build a predictive model on it. It was a 2-day hackathon and the API was broken for much of that time. When the API finally started working, the data was so scarce that nobody could build an ML model out of it. The only consolation was that the office space was great and the food was tasty. We were also a team of 3: me (a data scientist), a friend (another data scientist) and an economist (an ideator, not a coder). Having 2 data scientists in the team led to chaos and duplicated work. Having a pure ideator on the team adds little value in a hackathon. You need people who can come up with ideas that can be built in a limited time-frame and who can then build them. No wonder it was a disaster.
This time, I was determined not to have another data scientist on the team. Ideally, I wanted a full-stack developer with a practical mindset. The only such person I knew was my room-mate. I felt that the 2 of us were enough. In hackathons, less is more. Trust me.
We had 4 problem statements to choose from:

coinberg.tech
Arbitrage trading: I had a grip on the concept of arbitrage but hadn't implemented it. One of my friends has deployed such a trading system and makes around $1000 every month. To actually prove that it works, you have to set up an account on 2 exchanges and define trading rules so that the price difference is larger than the transaction cost. You also need to account for slippage so that you don't end up making a loss. It's all very straightforward and doesn't require any ML. Maybe you could train an ML model to spot arbitrage opportunities, but I was not sure. Hence I passed on this problem statement.

The red line is the price of crypto1 on exchange1 and the blue line is its price on exchange2. Since blue is lower than red at the shown time t, we can go long on blue (buy now, sell later) and short on red (sell now, buy back later). The assumption is that the two prices should converge, i.e. stay as close to each other as possible.
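The rule described above (trade only when the price gap beats transaction cost and slippage) can be sketched in a few lines. This is a minimal illustration, not a real trading system: `fee_rate` and `slippage_rate` are hypothetical fractions I picked for the example, not the numbers of any actual exchange.

```python
def arbitrage_profit(buy_price, sell_price, quantity,
                     fee_rate=0.001, slippage_rate=0.0005):
    """Net profit from buying on the cheaper exchange and selling on the dearer one.

    fee_rate and slippage_rate are illustrative assumptions, expressed as
    fractions of the traded amount.
    """
    # Slippage worsens both fills: we pay a bit more and receive a bit less.
    effective_buy = buy_price * (1 + slippage_rate)
    effective_sell = sell_price * (1 - slippage_rate)
    # Fees are charged on both legs of the trade.
    cost = effective_buy * quantity * (1 + fee_rate)
    proceeds = effective_sell * quantity * (1 - fee_rate)
    return proceeds - cost


def should_trade(buy_price, sell_price, quantity, **costs):
    """Trade only if the gap survives fees and slippage."""
    return arbitrage_profit(buy_price, sell_price, quantity, **costs) > 0
```

With these assumed costs, a gap of 0.1 on a price of 100 is not worth trading, while a gap of 1 is.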
Sentiment analysis: It would be hard to find a data scientist who has not worked on or read about sentiment analysis. It is everyone's darling project. I have worked extensively in NLP myself and could easily have taken on this problem statement. But there were issues. Firstly, I know for a fact that it is not easy to collect the data required to train the model. There is no readily available financial text data tagged as positive or negative. Interestingly, I do have such data. I work at Morningstar, a finance research company, and we have tons of textual and numerical data. I had recently trained a TensorFlow model for this very task on the cleanest tagged data possible. I could have used it and tried to correlate price fluctuations with sentiment. But that would have been a breach of data privacy, so I dropped the idea. There was also no other way to collect such clean tagged data. I made a conscious decision that I would rather lose the competition than tarnish my own and my organisation's name by acting as an unethical data scientist.
Portfolio management: This is an old problem. You select a bunch of cryptos and use a library to do interior-point optimisation and obtain the Markowitz efficient frontier. Recently I had also been reading about portfolio optimisation using Reinforcement Learning, but it requires tons of computational resources and you cannot be sure that the solution will converge. I didn't want to work on a boring old idea and felt uncomfortable with RL. Hence I was not keen on this problem, but it was my 2nd choice.
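To give a feel for the Markowitz idea without an optimisation library, here is a small sketch for the simplest case: the minimum-variance mix of just two assets, which has a closed-form answer. This swaps the interior-point solver mentioned above for a textbook formula, so treat it as an illustration only. The function names and the sample returns are made up for the example.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def min_variance_weights(returns_a, returns_b):
    """Weights (w_a, w_b) that minimise portfolio variance, with w_a + w_b = 1.

    Unconstrained closed form: a weight can come out negative, which means
    shorting that asset.
    """
    var_a = covariance(returns_a, returns_a)
    var_b = covariance(returns_b, returns_b)
    cov_ab = covariance(returns_a, returns_b)
    denom = var_a + var_b - 2 * cov_ab
    if denom == 0:
        # The two series differ only by a constant, so every split
        # has the same variance; return an even split.
        return 0.5, 0.5
    w_a = (var_b - cov_ab) / denom
    return w_a, 1 - w_a
```

For two uncorrelated assets where the second has four times the variance of the first, this puts 80% in the calmer asset. A real frontier over many coins, with no-shorting constraints, is where a proper solver comes in.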

Trend forecasting: This is almost like predicting the unpredictable. But I have heard of a lot of people working on ML models for predicting the direction of price movement. Also, if you can predict the size of the return, you know how much money to bet, i.e. if you expect a return of 5% after time t instead of 2%, you will bet more money because the signal is stronger. Initially, I was not sure what time horizon I wanted to use for the prediction. After I got the data, I analysed the distribution of returns for 1hr, 2hr and 3hr horizons. From what I know about cryptos, you cannot predict far into the future, and there is no point in predicting very near-term moves because they are noisy and the returns are small. Based on this logic and the distribution of returns, 3 hours looked practical. Also, from a trading perspective, you need to make a lot of trades to average out the prediction errors. If you trade only a few times, you might just end up lucky or unlucky. A time-frame of 3 hours allowed us to make many trades. Hence, I selected 3 hours as the prediction window, i.e. I built an ML model which looks at historical data up to time t and predicts the return at time t + 3hr.
Once we were sure what we had to do to come up with a good solution, we created the whole data science pipeline:
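The horizon comparison above boils down to computing forward returns for each candidate window and looking at how they are spread. Below is a minimal sketch of that step. It assumes `closes` is a list of hourly closing prices, which is my assumption for the example; the function names are hypothetical too.

```python
from statistics import mean, pstdev


def forward_returns(closes, horizon):
    """Return (close[t + horizon] - close[t]) / close[t] for every t with a future value."""
    return [
        (closes[t + horizon] - closes[t]) / closes[t]
        for t in range(len(closes) - horizon)
    ]


def summarize(closes, horizons=(1, 2, 3)):
    """Basic spread statistics of forward returns for each horizon (in candles)."""
    summary = {}
    for h in horizons:
        r = forward_returns(closes, h)
        summary[h] = {
            "mean": mean(r),
            "std": pstdev(r),
            "max_abs": max(abs(x) for x in r),
        }
    return summary
```

The forward return at the chosen horizon is also the prediction target: features come from data up to time t, and the label is the return at t + 3hr.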
Collected data using Binance API
Technical analysis feature engineering
Train / validation / test split: We used data from July 2017 to April 2018 for training and validation, and tested on May-June 2018 data. We used BTCUSDT, ETHUSDT, LTCUSDT and XRPBTC as they are among the most heavily traded pairs.
A pipeline for algorithm selection, feature selection and hyperparameter tuning with cross-validation
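Two of the steps above are easy to get subtly wrong with time series: building indicators without peeking into the future, and splitting by time instead of at random. Here is a rough sketch of both. The simple moving average stands in for whatever technical indicators were actually used, and `validation_fraction` is a hypothetical parameter, since the exact train/validation cut is not stated here.

```python
from datetime import datetime

# The test window starts in May 2018, matching the split described above.
TEST_START = datetime(2018, 5, 1)


def sma(values, window):
    """Simple moving average; None until a full window of history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough past data yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def chronological_split(rows, test_start=TEST_START, validation_fraction=0.2):
    """Split rows of (timestamp, features, target), sorted by timestamp.

    Everything before test_start is development data; its most recent slice
    becomes the validation set, so no set ever sees data from its own future.
    """
    development = [r for r in rows if r[0] < test_start]
    test = [r for r in rows if r[0] >= test_start]
    cut = len(development) - int(len(development) * validation_fraction)
    return development[:cut], development[cut:], test
```

Each indicator value at time t uses only prices up to t, and the validation rows always come after the training rows in time.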
Result
The algorithm traded ~8% of the time, used a maximum of ~$700 and made ~$1400 in a 33-day backtest.
I was happy to see that out of the ~1000 trades the algo made, it lost on only one. It predicted the wrong direction of price movement only once.
We were able to achieve this high directional accuracy and make profits because of the special trade entry and exit conditions we had enforced.
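The entry and exit conditions themselves are not spelled out in this post. Purely as an illustration of how a filter can make an algorithm trade rarely, here is one generic approach: enter only when the predicted return is large enough, and exit at the end of the prediction window. The `threshold` and `stake` values are hypothetical, fees are ignored, and this is not the rule set we actually used.

```python
def backtest(predicted, realized, threshold=0.01, stake=100.0):
    """Toy backtest over paired predicted and realized forward returns.

    threshold and stake are illustrative assumptions; fees and slippage
    are ignored.
    """
    trades = 0
    wins = 0
    pnl = 0.0
    for p, r in zip(predicted, realized):
        if abs(p) < threshold:
            continue  # signal too weak: stay out of the market
        direction = 1 if p > 0 else -1  # long on a predicted rise, short on a fall
        trade_pnl = direction * r * stake
        trades += 1
        if trade_pnl > 0:
            wins += 1
        pnl += trade_pnl
    trade_rate = trades / len(predicted) if predicted else 0.0
    return {"trades": trades, "wins": wins, "trade_rate": trade_rate, "pnl": pnl}
```

Raising the threshold lowers the share of time spent in the market and keeps only the trades where the model is most confident.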

Originally published at ml-dl.com on June 21, 2018.